Popularity: 9.1 (Growing)
Activity: 7.9
Stars: 9,552
Watchers: 246
Forks: 1,139
Last Commit: 4 days ago

Description

A General-purpose Parallel and Heterogeneous Task Programming System

Programming language: C++

License: GNU General Public License v3.0 or later

Tags: Artificial Intelligence, Concurrency, Scientific Computing, Parallel Processing

Latest version: v2.7.0

Taskflow alternatives and similar libraries

Based on the "Concurrency" category.

moodycamel (popularity 9.1, activity 3.9, code quality L3)
A fast multi-producer, multi-consumer lock-free concurrent queue for C++11

Thrust (popularity 8.3, activity 6.9, code quality L4)
DISCONTINUED. [ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl


ArrayFire (popularity 8.0, activity 7.8, code quality L2)
ArrayFire: a general purpose GPU library.

readerwriterqueue (popularity 7.8, activity 0.0)
A fast single-producer, single-consumer lock-free queue for C++

NCCL (popularity 7.6, activity 5.8)
Optimized primitives for collective multi-GPU communication

C++ Actor Framework (popularity 7.6, activity 9.7)
An Open Source Implementation of the Actor Model in C++

HPX (popularity 7.3, activity 9.8, code quality L2)
The C++ Standard Library for Parallelism and Concurrency

libcds (popularity 7.3, activity 0.0, code quality L2)
A C++ library of Concurrent Data Structures

libmill (popularity 7.2, activity 0.0)
Go-style concurrency in C

ck (popularity 7.0, activity 6.9, code quality L3)
Concurrency primitives, safe memory reclamation mechanisms and non-blocking (including lock-free) data structures designed to aid in the research, design and implementation of high performance concurrent systems developed in C99+.

Boost.Compute (popularity 6.5, activity 0.0, code quality L3)
A C++ GPU Computing Library for OpenCL

moderngpu (popularity 6.4, activity 2.6, code quality L3)
Patterns and behaviors for GPU computing

libdill (popularity 6.3, activity 0.0)
Structured concurrency in C

junction (popularity 6.0, activity 0.0, code quality L2)
Concurrent data structures in C++

C++React (popularity 5.6, activity 0.0, code quality L4)
C++React: A reactive programming library for C++11.

MPMCQueue.h (popularity 5.6, activity 2.8)
A bounded multi-producer multi-consumer concurrent queue written in C++11

RaftLib (popularity 5.3, activity 5.7)
The RaftLib C++ library, streaming/dataflow concurrency via C++ iostream-like operators

stdgpu (popularity 5.3, activity 7.1)
stdgpu: Efficient STL-like Data Structures on the GPU

SPSCQueue.h (popularity 5.0, activity 4.6)
A bounded single-producer single-consumer wait-free and lock-free queue written in C++11

VexCL (popularity 4.8, activity 0.0, code quality L1)
VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP

continuable (popularity 4.7, activity 5.2, code quality L4)
C++14 asynchronous allocation aware futures (supporting then, exception handling, coroutines and connections)

A C++14 library for executors (popularity 4.2, activity 0.0, code quality L4)
C++ library for executors

Bolt (popularity 4.0, activity 0.0, code quality L1)
Bolt is a C++ template library optimized for GPUs. Bolt provides high-performance library implementations for common algorithms such as scan, reduce, transform, and sort.

xenium (popularity 3.9, activity 7.1)
A C++ library providing various concurrent data structures and reclamation schemes.

thread-pool (popularity 3.5, activity 6.8)
A modern, fast, lightweight thread pool library based on C++20

CUB (popularity 2.6, activity 2.7)
DISCONTINUED. THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.

SObjectizer (popularity 2.2, activity 0.0, code quality L4)
SObjectizer: it's all about in-process message dispatching!

Light Actor Framework (popularity 2.2, activity 0.0)
DISCONTINUED. Laughably simple yet effective Actor concurrency framework for C++20

BlockingCollection (popularity 2.1, activity 0.0)
C++11 thread safe, multi-producer, multi-consumer blocking queue, stack & priority queue class

Libclsph (popularity 1.9, activity 0.0, code quality L1)
OpenCL based GPU accelerated SPH fluid simulation library

Easy Creation of GnuPlot Scripts from C++ (popularity 1.8, activity 0.7)
A simple C++17 lib that helps you to quickly plot your data with GnuPlot

cupla (popularity 1.2, activity 0.0)
The project alpaka has moved to https://github.com/alpaka-group/cupla

alpaka (popularity 1.2, activity 0.0)
The project alpaka has moved to https://github.com/alpaka-group/alpaka

eXtended Template Library (popularity 1.1, activity 0.0)
eXtended Template Library

wstpool (popularity 0.9, activity 0.0)
Work Stealing Thread Pool

OpenMP (no scores listed)
The OpenMP API.

OpenCL (no scores listed)
The open standard for parallel programming of heterogeneous systems.

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.


README

Taskflow


Taskflow helps you quickly write parallel and heterogeneous task programs in modern C++

Why Taskflow?

Taskflow is faster, more expressive, and easier to drop into an existing project than many existing task programming frameworks at handling complex parallel workloads.


Taskflow lets you quickly implement task decomposition strategies that incorporate both regular and irregular compute patterns, together with an efficient work-stealing scheduler to optimize your multithreaded performance.
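The core idea behind a work-stealing scheduler can be illustrated with a per-worker double-ended queue. The sketch below is a simplified, mutex-based stand-in written with the standard library only; the class name `WorkStealingDeque` is hypothetical, and this is not Taskflow's scheduler, which relies on its own lock-free queue. The owning worker pushes and pops at the back, while idle workers steal from the front.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Simplified, lock-based work-stealing deque (illustration only, hypothetical
// name). Owner: push/pop at the back (LIFO). Thieves: steal at the front (FIFO).
template <typename T>
class WorkStealingDeque {
 public:
  // Called by the owning worker to enqueue newly spawned work.
  void push(T item) {
    std::lock_guard<std::mutex> lock(mutex_);
    items_.push_back(std::move(item));
  }

  // Called by the owning worker: takes the most recently pushed item.
  std::optional<T> pop() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (items_.empty()) return std::nullopt;
    T item = std::move(items_.back());
    items_.pop_back();
    return item;
  }

  // Called by another (idle) worker: takes the oldest item.
  std::optional<T> steal() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (items_.empty()) return std::nullopt;
    T item = std::move(items_.front());
    items_.pop_front();
    return item;
  }

 private:
  std::mutex mutex_;
  std::deque<T> items_;
};
```

Popping the newest item keeps the owner on recently spawned, cache-warm tasks, while thieves take the oldest items, which tend to represent larger portions of the remaining work.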

[Figure: Static Tasking | Dynamic Tasking]

Taskflow supports conditional tasking, which lets you make rapid control-flow decisions across dependent tasks and implement cycles and conditions that are otherwise difficult to express with existing tools.

[Figure: Conditional Tasking]

Taskflow is composable. You can create large parallel graphs through composition of modular and reusable blocks that are easier to optimize at an individual scope.

[Figure: Taskflow Composition]

Taskflow supports heterogeneous tasking, which lets you accelerate a wide range of scientific computing applications by harnessing CPU-GPU collaborative computing.

[Figure: Concurrent CPU-GPU Tasking]

Taskflow provides visualization and tooling needed for profiling Taskflow programs.

[Figure: Taskflow Profiler]

We are committed to supporting trustworthy development for both academic and industrial research projects in parallel computing. Check out Who is Using Taskflow and what our users say:

"Taskflow is the cleanest Task API I've ever seen." Damien Hocking @Corelium Inc
"Taskflow has a very simple and elegant tasking interface. The performance also scales very well." Glen Fraser
"Taskflow lets me handle parallel processing in a smart way." Hayabusa @Learning
"Taskflow improves the throughput of our graph engine in just a few hours of coding." Jean-Michaël @KDAB
"Best poster award for open-source parallel programming library." Cpp Conference 2018
"Second Prize of Open-source Software Competition." ACM Multimedia Conference 2019

See a quick presentation and visit the documentation to learn more about Taskflow. For technical details, refer to our IEEE TPDS paper.

Start Your First Taskflow Program

The following program (simple.cpp) creates four tasks A, B, C, and D, where A runs before B and C, and D runs after B and C. When A finishes, B and C can run in parallel.

#include <iostream>               // std::cout
#include <taskflow/taskflow.hpp>  // Taskflow is header-only

int main(){

  tf::Executor executor;
  tf::Taskflow taskflow;

  auto [A, B, C, D] = taskflow.emplace(  // create four tasks
    [] () { std::cout << "TaskA\n"; },
    [] () { std::cout << "TaskB\n"; },
    [] () { std::cout << "TaskC\n"; },
    [] () { std::cout << "TaskD\n"; }
  );

  A.precede(B, C);  // A runs before B and C
  D.succeed(B, C);  // D runs after  B and C

  executor.run(taskflow).wait();  // submit the graph and block until it finishes

  return 0;
}

Taskflow is header-only, so there is nothing to install. To compile the program, clone the Taskflow project and add it to the compiler's include path.

~$ git clone https://github.com/taskflow/taskflow.git  # clone it only once
~$ g++ -std=c++17 simple.cpp -I taskflow -O2 -pthread -o simple
~$ ./simple
TaskA
TaskC
TaskB
TaskD
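To see what the two dependency calls mean without any Taskflow machinery, here is a standard-library-only sketch of the same A, B, C, D graph. The function name `run_diamond` is hypothetical; it runs A, launches B and C with `std::async`, and joins both before D, so the only guaranteed order is A first and D last. That is also why TaskB and TaskC may print in either order in the output above.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <vector>

// Runs the A -> {B, C} -> D graph with std::async only (no Taskflow) and
// returns the order in which the tasks executed. Hypothetical helper.
std::vector<char> run_diamond() {
  std::vector<char> order;
  std::mutex m;
  auto record = [&](char name) {
    std::lock_guard<std::mutex> lock(m);  // B and C may record concurrently
    order.push_back(name);
  };

  record('A');                                            // A has no predecessors
  auto b = std::async(std::launch::async, record, 'B');   // B and C depend only on A,
  auto c = std::async(std::launch::async, record, 'C');   // so they may run in parallel
  b.get();                                                // D waits for both B and C
  c.get();
  record('D');
  return order;
}
```

This is only a mental model: `tf::Executor` reuses a pool of worker threads, whereas `std::async` with `std::launch::async` may start a new thread for every call.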

Visualize Your First Taskflow Program

Taskflow comes with a built-in profiler, TFProf, for you to profile and visualize taskflow programs in an easy-to-use web-based interface.


# run the program with the environment variable TF_ENABLE_PROFILER enabled
~$ TF_ENABLE_PROFILER=simple.json ./simple
~$ cat simple.json
[
{"executor":"0","data":[{"worker":0,"level":0,"data":[{"span":[172,186],"name":"0_0","type":"static"},{"span":[187,189],"name":"0_1","type":"static"}]},{"worker":2,"level":0,"data":[{"span":[93,164],"name":"2_0","type":"static"},{"span":[170,179],"name":"2_1","type":"static"}]}]}
]
# paste the profiling json data to https://taskflow.github.io/tfprof/

In addition to the execution diagram, you can dump the graph in DOT format and visualize it with any of a number of free GraphViz tools.

// dump the taskflow graph to a DOT format through std::cout
taskflow.dump(std::cout);
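The DOT format itself is simple: a named `digraph` with one `from -> to` line per dependency. The helper below is a hypothetical hand-rolled emitter, not Taskflow's `dump` implementation, so its output for the four-task example may differ in formatting from what `taskflow.dump` prints.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Writes dependency edges in GraphViz DOT syntax (hypothetical helper;
// Taskflow's own dump output may differ in detail).
std::string to_dot(const std::string& name,
                   const std::vector<std::pair<std::string, std::string>>& edges) {
  std::ostringstream os;
  os << "digraph " << name << " {\n";
  for (const auto& [from, to] : edges) {
    os << "  " << from << " -> " << to << ";\n";  // one line per dependency
  }
  os << "}\n";
  return os.str();
}
```

Save the returned string to a file such as `simple.dot` and render it with GraphViz, for example `dot -Tpng simple.dot -o simple.png`.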

Express Task Graph Parallelism

Taskflow empowers users with both static and dynamic task graph constructions to express end-to-end parallelism in a task graph that embeds in-graph control flow.

Create a Subflow Graph
Integrate Control Flow to a Task Graph
Offload a Task to a GPU
Compose Task Graphs
Launch Asynchronous Tasks
Execute a Taskflow
Leverage Standard Parallel Algorithms

Create a Subflow Graph

Taskflow supports dynamic tasking: a task can create a subflow graph while it executes, which lets you express dynamic parallelism. The following program spawns a task dependency graph parented at task B.

tf::Task A = taskflow.emplace([](){}).name("A");  
tf::Task C = taskflow.emplace([](){}).name("C");  
tf::Task D = taskflow.emplace([](){}).name("D");  

tf::Task B = taskflow.emplace([] (tf::Subflow& subflow) { 
  tf::Task B1 = subflow.emplace([](){}).name("B1");  
  tf::Task B2 = subflow.emplace([](){}).name("B2");  
  tf::Task B3 = subflow.emplace([](){}).name("B3");  
  B3.succeed(B1, B2);  // B3 runs after B1 and B2
}).name("B");

A.precede(B, C);  // A runs before B and C
D.succeed(B, C);  // D runs after  B and C
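The key property of a subflow is that it is created while its parent task runs and is joined before the parent counts as finished, so D above still waits for B1, B2, and B3. A minimal standard-library sketch of that behavior follows; `run_with_subflow` is a hypothetical name, and the sketch spawns threads with `std::async` rather than using Taskflow's executor.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <string>
#include <vector>

// Dynamic tasking with the standard library only (hypothetical helper):
// task B builds its child graph (B1, B2 -> B3) at run time and joins it
// before returning, so D also waits for the whole subflow.
std::vector<std::string> run_with_subflow() {
  std::vector<std::string> log;
  std::mutex m;
  auto record = [&](std::string name) {
    std::lock_guard<std::mutex> lock(m);
    log.push_back(std::move(name));
  };

  auto task_b = [&]() {
    auto b1 = std::async(std::launch::async, record, "B1");
    auto b2 = std::async(std::launch::async, record, "B2");
    b1.get();
    b2.get();
    record("B3");  // B3 runs after B1 and B2
  };

  record("A");
  auto b = std::async(std::launch::async, task_b);      // B (with its subflow)
  auto c = std::async(std::launch::async, record, "C"); // runs alongside B
  b.get();
  c.get();
  record("D");  // D runs after B's subflow and C
  return log;
}
```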

Integrate Control Flow to a Task Graph

Taskflow supports conditional tasking for you to make rapid control-flow decisions across dependent tasks to implement cycles and conditions in an end-to-end task graph.

tf::Task init = taskflow.emplace([](){}).name("init");
tf::Task stop = taskflow.emplace([](){}).name("stop");

// create a condition task that returns a random binary value (0 or 1)
tf::Task cond = taskflow.emplace(
  [](){ return std::rand() % 2; }
).name("cond");

init.precede(cond);

// creates a feedback loop {0: cond, 1: stop}
cond.precede(cond, stop);
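A condition task returns an integer that selects which of its successors runs next, in the order they were passed to `precede`; in the graph above, 0 jumps back to `cond` and 1 goes to `stop`. The sketch below models that rule in plain C++. `run_condition_loop` is hypothetical, and a caller-supplied `decide` function replaces `std::rand()` so the resulting trace is reproducible.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Models condition-task semantics in plain C++ (hypothetical helper, not
// Taskflow's API). The return value picks a successor by index, mirroring
// cond.precede(cond, stop): 0 -> cond again, 1 -> stop.
std::vector<std::string> run_condition_loop(const std::function<int()>& decide) {
  std::vector<std::string> trace{"init"};  // init runs once, then hands off to cond
  while (true) {
    trace.push_back("cond");
    const int next = decide();
    if (next == 0) continue;                 // successor 0: loop back to cond
    if (next == 1) trace.push_back("stop");  // successor 1: run stop
    break;                                   // any other value: nothing runs next (this sketch's choice)
  }
  return trace;
}
```

With a `decide` that returns 0 twice and then 1, the trace is init, cond, cond, cond, stop.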

Offload a Task to a GPU

Taskflow supports GPU tasking for you to accelerate a wide range of scientific computing applications by harnessing the power of CPU-GPU collaborative computing using CUDA.

// hx, hy: host std::vector<float> of size N; dx, dy: device buffers of N floats
// (assumed to be allocated elsewhere)
__global__ void saxpy(size_t n, float a, float* x, float* y) {
  size_t i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n) {
    y[i] = a*x[i] + y[i];
  }
}

tf::Task cudaflow = taskflow.emplace([&](tf::cudaFlow& cf) {

  // data copy tasks
  tf::cudaTask h2d_x = cf.copy(dx, hx.data(), N).name("h2d_x");
  tf::cudaTask h2d_y = cf.copy(dy, hy.data(), N).name("h2d_y");
  tf::cudaTask d2h_x = cf.copy(hx.data(), dx, N).name("d2h_x");
  tf::cudaTask d2h_y = cf.copy(hy.data(), dy, N).name("d2h_y");

  // kernel task with parameters to launch the saxpy kernel
  // (named "kernel" so it does not shadow the saxpy function)
  tf::cudaTask kernel = cf.kernel(
    (N+255)/256, 256, 0, saxpy, N, 2.0f, dx, dy
  ).name("saxpy");

  kernel.succeed(h2d_x, h2d_y)
        .precede(d2h_x, d2h_y);
}).name("cudaFlow");
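The kernel launch uses `(N+255)/256` blocks of 256 threads, which rounds up so every element gets a thread; the `if (i < n)` guard discards the surplus threads in the last block. The CPU reference below walks the same launch geometry serially, which can be handy for checking GPU results. It is a plain C++ sketch: `saxpy_reference` is a hypothetical name and nothing here calls CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial CPU reference for the saxpy kernel (hypothetical helper). It visits
// (n + 255) / 256 blocks of 256 "threads" and applies the same global index
// computation and bounds guard as the kernel.
void saxpy_reference(std::size_t n, float a, const std::vector<float>& x,
                     std::vector<float>& y) {
  const std::size_t block_dim = 256;
  const std::size_t grid_dim = (n + block_dim - 1) / block_dim;  // round up
  for (std::size_t block = 0; block < grid_dim; ++block) {
    for (std::size_t thread = 0; thread < block_dim; ++thread) {
      const std::size_t i = block * block_dim + thread;  // blockIdx.x*blockDim.x + threadIdx.x
      if (i < n) {  // the last block may contain surplus threads
        y[i] = a * x[i] + y[i];
      }
    }
  }
}
```

For N = 300, for example, the launch uses 2 blocks (512 threads), and the guard skips the last 212.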

Compose Task Graphs

Taskflow is composable. You can create large parallel graphs through composition of modular and reusable blocks that are easier to optimize at an individual scope.

tf::Taskflow f1, f2;

// create taskflow f1 of two tasks
tf::Task f1A = f1.emplace([]() { std::cout << "Task f1A\n"; })
                 .name("f1A");
tf::Task f1B = f1.emplace([]() { std::cout << "Task f1B\n"; })
                 .name("f1B");

// create taskflow f2 with one module task composed of f1
tf::Task f2A = f2.emplace([]() { std::cout << "Task f2A\n"; })
                 .name("f2A");
tf::Task f2B = f2.emplace([]() { std::cout << "Task f2B\n"; })
                 .name("f2B");
tf::Task f2C = f2.emplace([]() { std::cout << "Task f2C\n"; })
                 .name("f2C");

tf::Task f1_module_task = f2.composed_of(f1)
                            .name("module");

f1_module_task.succeed(f2A, f2B)
              .precede(f2C);
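Conceptually, a module task is one node of the outer graph whose work is to run the inner graph to completion. The sketch below shows that idea with a tiny sequential graph executed in topological order (Kahn's algorithm). The `Graph` struct and its `add`, `precede`, and `run` members are hypothetical and are not Taskflow's API.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Minimal sequential task graph (hypothetical, not Taskflow's API).
struct Graph {
  std::vector<std::function<void()>> work;    // one callable per node
  std::vector<std::vector<int>> successors;   // outgoing edges per node

  int add(std::function<void()> f) {
    work.push_back(std::move(f));
    successors.emplace_back();
    return static_cast<int>(work.size()) - 1;
  }

  void precede(int from, int to) { successors[from].push_back(to); }

  // Runs every node once, in a topological order (Kahn's algorithm).
  void run() const {
    std::vector<int> indegree(work.size(), 0);
    for (const auto& succ : successors)
      for (int s : succ) ++indegree[s];
    std::queue<int> ready;
    for (int i = 0; i < static_cast<int>(work.size()); ++i)
      if (indegree[i] == 0) ready.push(i);
    while (!ready.empty()) {
      const int n = ready.front();
      ready.pop();
      work[n]();
      for (int s : successors[n])
        if (--indegree[s] == 0) ready.push(s);  // all predecessors done
    }
  }
};
```

Composing `f1` into `f2` through a node such as `module = f2.add([&] { f1.run(); })`, with edges from f2A and f2B to `module` and from `module` to f2C, produces the order f2A, f2B, f1A, f1B, f2C in this sequential sketch.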

Launch Asynchronous Tasks

Taskflow supports asynchronous tasking. You can launch tasks asynchronously to incorporate independent, dynamic parallelism in your taskflows.

tf::Executor executor;
tf::Taskflow taskflow;

// create asynchronous tasks directly from an executor
tf::Future<std::optional<int>> future = executor.async([](){ 
  std::cout << "async task returns 1\n";
  return 1;
}); 
executor.silent_async([](){ std::cout << "async task of no return\n"; });

// launch an asynchronous task from a running task
taskflow.emplace([&](){
  executor.async([](){ std::cout << "async task within a task\n"; });
});

executor.run(taskflow).wait();
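Each async call returns a handle to a result that is produced later. The same pattern can be sketched with the standard library: the hypothetical `sum_of_async_squares` below launches independent tasks with `std::async` and collects their results through `std::future`. Note that `std::future<int>` here is the standard type, whereas the Taskflow snippet above uses `tf::Future<std::optional<int>>`.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Launches n independent tasks with std::async and sums their results
// (hypothetical helper; standard library only, not Taskflow's API).
int sum_of_async_squares(int n) {
  std::vector<std::future<int>> futures;
  for (int i = 1; i <= n; ++i) {
    // each future is a handle to one task's eventual result
    futures.push_back(std::async(std::launch::async, [i] { return i * i; }));
  }
  int total = 0;
  for (auto& f : futures) total += f.get();  // get() waits for that task
  return total;
}
```

For example, `sum_of_async_squares(4)` returns 30 (1 + 4 + 9 + 16). Taskflow runs async tasks on the executor's existing workers, whereas `std::async` may create a thread per call.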

Execute a Taskflow

The executor provides several thread-safe methods to run a taskflow. You can run a taskflow once, multiple times, or until a stopping criterion is met. These methods are non-blocking and return a tf::Future<void> that lets you query the execution status.

// run the taskflow once
tf::Future<void> run_once = executor.run(taskflow);

// wait on this run to finish
run_once.get();

// run the taskflow four times
executor.run_n(taskflow, 4);

// run the taskflow five times (the lambda must be mutable to modify its captured counter)
executor.run_until(taskflow, [counter=5]() mutable { return counter-- == 0; });

// block the caller until all submitted taskflows complete
executor.wait_for_all();
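Because a lambda's by-copy captures are const unless the lambda is declared `mutable`, a stateful predicate such as the counter above needs `mutable` to compile. The loop below is a hypothetical model of `run_until`-style repetition, under the assumption that the predicate is checked before every run; under that assumption, `counter-- == 0` starting from 5 allows exactly five runs.

```cpp
#include <cassert>
#include <functional>

// Hypothetical model of run_until-style repetition (not Taskflow's code).
// Assumption: pred is evaluated before each run; work repeats until pred is true.
// Returns how many times work ran.
int run_until_sketch(const std::function<void()>& work,
                     std::function<bool()> pred) {
  int runs = 0;
  while (!pred()) {
    work();
    ++runs;
  }
  return runs;
}
```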

Leverage Standard Parallel Algorithms

Taskflow defines algorithms that let you quickly express common parallel patterns, such as parallel iteration, parallel reduction, and parallel sort, using standard C++ syntax.

// standard parallel CPU algorithms
tf::Task task1 = taskflow.for_each( // assign each element to 100 in parallel
  first, last, [] (auto& i) { i = 100; }    
);
tf::Task task2 = taskflow.reduce(   // reduce a range of items in parallel
  first, last, init, [] (auto a, auto b) { return a + b; }
);
tf::Task task3 = taskflow.sort(     // sort a range of items in parallel
  first, last, [] (auto a, auto b) { return a < b; }
);

// standard parallel GPU algorithms
tf::cudaTask cuda1 = cudaflow.for_each( // assign each element to 100 on GPU
  dfirst, dlast, [] __device__ (auto& i) { i = 100; }
);
tf::cudaTask cuda2 = cudaflow.reduce(   // reduce a range of items on GPU
  dfirst, dlast, init, [] __device__ (auto a, auto b) { return a + b; }
);
tf::cudaTask cuda3 = cudaflow.sort(     // sort a range of items on GPU
  dfirst, dlast, [] __device__ (auto a, auto b) { return a < b; }
);
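Parallel reduction works by splitting the range into chunks, reducing each chunk independently, and combining the partial results. The function below is a minimal `std::thread` sketch of that technique, not Taskflow's implementation; `parallel_reduce` is a hypothetical name, and the operation is assumed to be associative so that regrouping does not change the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Chunked parallel reduction with std::thread (hypothetical helper).
// Assumes op is associative and T is default-constructible.
template <typename T, typename BinaryOp>
T parallel_reduce(const std::vector<T>& data, T init, BinaryOp op,
                  std::size_t num_threads) {
  if (data.empty()) return init;
  num_threads = std::max<std::size_t>(1, std::min(num_threads, data.size()));
  const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

  std::vector<T> partial(num_threads);  // one slot per chunk
  std::vector<std::thread> workers;
  std::size_t used = 0;                 // number of chunks actually created
  for (std::size_t lo = 0; lo < data.size(); lo += chunk, ++used) {
    const std::size_t hi = std::min(lo + chunk, data.size());
    workers.emplace_back([&data, &partial, &op, lo, hi, used] {
      auto first = data.begin() + static_cast<std::ptrdiff_t>(lo);
      auto last = data.begin() + static_cast<std::ptrdiff_t>(hi);
      // Seed with the chunk's first element so op needs no identity value.
      partial[used] = std::accumulate(first + 1, last, *first, op);
    });
  }
  for (auto& w : workers) w.join();

  // Combine the partial results serially, in chunk order.
  return std::accumulate(partial.begin(),
                         partial.begin() + static_cast<std::ptrdiff_t>(used),
                         init, op);
}
```

For example, reducing the integers 1 to 1000 with `std::plus<int>()` on four threads yields 500500, the same as a sequential sum.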

Additionally, Taskflow provides composable graph building blocks for you to efficiently implement common parallel algorithms, such as a parallel pipeline.

// create a pipeline to propagate five tokens through three serial stages
tf::Pipeline pl(num_parallel_lines,
  tf::Pipe{tf::PipeType::SERIAL, [](tf::Pipeflow& pf) {
    if(pf.token() == 5) {
      pf.stop();
    }
  }},
  tf::Pipe{tf::PipeType::SERIAL, [](tf::Pipeflow& pf) {
    printf("stage 2: input buffer[%zu] = %d\n", pf.line(), buffer[pf.line()]);
  }},
  tf::Pipe{tf::PipeType::SERIAL, [](tf::Pipeflow& pf) {
    printf("stage 3: input buffer[%zu] = %d\n", pf.line(), buffer[pf.line()]);
  }}
);
taskflow.composed_of(pl);
executor.run(taskflow).wait();
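With several parallel lines, consecutive tokens occupy different lines, which is why the stages index `buffer` by `pf.line()`. The sequential model below only illustrates that bookkeeping and is not a parallel pipeline. It assumes a round-robin assignment of tokens to lines (`token % num_lines`) and a hypothetical first stage that writes the token into its line's buffer slot; `model_pipeline` is also a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sequential model of pipeline bookkeeping (hypothetical, not parallel).
// Assumptions: tokens map to lines round-robin (token % num_lines), and
// stage 1 writes the token into its line's buffer slot.
std::vector<std::string> model_pipeline(std::size_t num_lines) {
  assert(num_lines > 0);
  std::vector<int> buffer(num_lines, 0);
  std::vector<std::string> events;
  for (std::size_t token = 0;; ++token) {
    if (token == 5) break;                        // stage 1 stops the pipeline at token 5
    const std::size_t line = token % num_lines;
    buffer[line] = static_cast<int>(token);       // stage 1 fills this line's slot
    for (int stage = 2; stage <= 3; ++stage) {    // serial stages 2 and 3 read the slot
      events.push_back("stage " + std::to_string(stage) + ": input buffer[" +
                       std::to_string(line) + "] = " + std::to_string(buffer[line]));
    }
  }
  return events;
}
```

With four lines, tokens 0 to 4 produce ten events, and token 4 wraps around to line 0.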

Supported Compilers

To use Taskflow, you only need a compiler that supports C++17:

GNU C++ Compiler at least v8.4 with -std=c++17
Clang C++ Compiler at least v6.0 with -std=c++17
Microsoft Visual Studio at least v19.27 with /std:c++17
AppleClang Xcode Version at least v12.0 with -std=c++17
Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
Intel C++ Compiler at least v19.0.1 with -std=c++17
Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17 and SYCL20

Taskflow works on Linux, Windows, and Mac OS X.
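Because everything depends on compiling as C++17, a small compile-time guard can turn a missing flag into a clear error message. This is a hypothetical check, not part of Taskflow, and the macro name `CPP_STANDARD_LEVEL` is made up for the example. On MSVC it reads `_MSVC_LANG`, since `__cplusplus` stays at 199711L there unless `/Zc:__cplusplus` is also given.

```cpp
#include <cassert>

// Hypothetical build guard: fail early unless compiled as C++17 or later.
#if defined(_MSVC_LANG)
#define CPP_STANDARD_LEVEL _MSVC_LANG   // MSVC's reliable language-level macro
#else
#define CPP_STANDARD_LEVEL __cplusplus
#endif

#if CPP_STANDARD_LEVEL < 201703L
#error "C++17 is required: compile with -std=c++17 (or /std:c++17 on MSVC)"
#endif

// Exposes the detected language level, e.g. for logging.
constexpr long cpp_standard_level() { return CPP_STANDARD_LEVEL; }
```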

Learn More about Taskflow

Visit our project website and documentation to learn more about Taskflow. To get involved:

See the release notes to stay up to date with the newest versions
Read the step-by-step tutorials in the cookbook
Submit an issue on GitHub issues
Find our technical details in the references
Watch our technical talks on YouTube

[Videos: CppCon20 Tech Talk | MUC++ Tech Talk]

We are committed to supporting trustworthy development for both academic and industrial research projects in parallel and heterogeneous computing. If you are using Taskflow, please cite the following paper, published in IEEE TPDS:

Tsung-Wei Huang, Dian-Lun Lin, Chun-Xun Lin, and Yibo Lin, "Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System," IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 33, no. 6, pp. 1303-1320, June 2022

More importantly, we appreciate all Taskflow contributors and the organizations that sponsor the Taskflow project!


License

Taskflow is licensed under the MIT License. You are completely free to redistribute your work derived from Taskflow.

*Note that all licence references and agreements mentioned in the Taskflow README section above are relevant to that project's source code only.