Popularity

5.4

Stable

Activity

6.0

Stars 1,234

Watchers 30

Forks 91

Last Commit 21 days ago

Programming language: C++

License: Apache License 2.0

Tags: Concurrency Parallel Processing GPU C++14 C++17

Latest version: v1.3.0

stdgpu alternatives and similar libraries

Based on the "Concurrency" category.

moodycamel

9.2 4.8 L3 stdgpu VS moodycamel

A fast multi-producer, multi-consumer lock-free concurrent queue for C++11
Taskflow

9.1 9.3 stdgpu VS Taskflow

A General-purpose Task-parallel Programming System using Modern C++

Thrust

8.3 6.9 L4 stdgpu VS Thrust

DISCONTINUED. [ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
NCCL

8.1 6.6 stdgpu VS NCCL

Optimized primitives for collective multi-GPU communication
ArrayFire

7.9 8.5 L2 stdgpu VS ArrayFire

ArrayFire: a general purpose GPU library.
readerwriterqueue

7.9 3.9 stdgpu VS readerwriterqueue

A fast single-producer, single-consumer lock-free queue for C++
C++ Actor Framework

7.6 9.5 stdgpu VS C++ Actor Framework

An Open Source Implementation of the Actor Model in C++
HPX

7.3 9.7 L2 stdgpu VS HPX

The C++ Standard Library for Parallelism and Concurrency
libcds

7.2 0.0 L2 stdgpu VS libcds

A C++ library of Concurrent Data Structures
libmill

7.2 0.0 stdgpu VS libmill

Go-style concurrency in C
ck

7.1 6.4 L3 stdgpu VS ck

Concurrency primitives, safe memory reclamation mechanisms and non-blocking (including lock-free) data structures designed to aid in the research, design and implementation of high performance concurrent systems developed in C99+.
Boost.Compute

6.5 1.2 L3 stdgpu VS Boost.Compute

A C++ GPU Computing Library for OpenCL
moderngpu

6.5 2.6 L3 stdgpu VS moderngpu

Patterns and behaviors for GPU computing
libdill

6.3 0.0 stdgpu VS libdill

Structured concurrency in C
junction

5.9 0.0 L2 stdgpu VS junction

Concurrent data structures in C++
MPMCQueue.h

5.8 2.8 stdgpu VS MPMCQueue.h

A bounded multi-producer multi-consumer concurrent queue written in C++11
C++React

5.4 0.0 L4 stdgpu VS C++React

C++React: A reactive programming library for C++11.
RaftLib

5.3 5.7 stdgpu VS RaftLib

The RaftLib C++ library, streaming/dataflow concurrency via C++ iostream-like operators
SPSCQueue.h

5.2 4.6 stdgpu VS SPSCQueue.h

A bounded single-producer single-consumer wait-free and lock-free queue written in C++11
VexCL

4.7 5.2 L1 stdgpu VS VexCL

VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP
continuable

4.7 5.2 L4 stdgpu VS continuable

C++14 asynchronous allocation aware futures (supporting then, exception handling, coroutines and connections)
A C++14 library for executors

4.2 0.0 L4 stdgpu VS A C++14 library for executors

C++ library for executors
xenium

4.1 4.7 stdgpu VS xenium

A C++ library providing various concurrent data structures and reclamation schemes.
Bolt

3.9 0.0 L1 stdgpu VS Bolt

Bolt is a C++ template library optimized for GPUs. Bolt provides high-performance library implementations for common algorithms such as scan, reduce, transform, and sort.
thread-pool

3.8 4.1 stdgpu VS thread-pool

A modern, fast, lightweight thread pool library based on C++2x
CUB

2.6 2.7 stdgpu VS CUB

DISCONTINUED. THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.
Light Actor Framework

2.2 0.0 stdgpu VS Light Actor Framework

DISCONTINUED. Laughably simple yet effective Actor concurrency framework for C++20
BlockingCollection

2.2 0.0 stdgpu VS BlockingCollection

C++11 thread safe, multi-producer, multi-consumer blocking queue, stack & priority queue class
SObjectizer

2.2 0.0 L4 stdgpu VS SObjectizer

SObjectizer: it's all about in-process message dispatching!
Libclsph

1.9 0.0 L1 stdgpu VS Libclsph

OpenCL based GPU accelerated SPH fluid simulation library
Easy Creation of GnuPlot Scripts from C++

1.9 0.7 stdgpu VS Easy Creation of GnuPlot Scripts from C++

A simple C++17 lib that helps you to quickly plot your data with GnuPlot
alpaka

1.2 0.0 stdgpu VS alpaka

The project alpaka has moved to https://github.com/alpaka-group/alpaka
cupla

1.2 0.0 stdgpu VS cupla

The project alpaka has moved to https://github.com/alpaka-group/cupla
eXtended Template Library

1.1 0.0 stdgpu VS eXtended Template Library

eXtended Template Library
gocxx

1.0 6.5 stdgpu VS gocxx

Go-inspired standard libraries for modern C++
wstpool

0.9 0.0 stdgpu VS wstpool

Work Stealing Thread Pool
OpenCL

- stdgpu VS OpenCL

The open standard for parallel programming of heterogeneous systems.
OpenMP

- stdgpu VS OpenMP

The OpenMP API.

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.

README

stdgpu: Efficient STL-like Data Structures on the GPU

Features

stdgpu is an open-source library providing several generic GPU data structures for fast and reliable data management. Multiple platforms such as CUDA, OpenMP, and HIP are supported, allowing you to rapidly write highly complex agnostic and native algorithms that look like sequential CPU code but are executed in parallel on the GPU.

Productivity. Previous libraries such as thrust, VexCL, ArrayFire or Boost.Compute focus on the fast and efficient implementation of various algorithms for contiguously stored data to enhance productivity. stdgpu follows an orthogonal approach and focuses on fast and reliable data management to enable the rapid development of more general and flexible GPU algorithms just like their CPU counterparts.
Interoperability. Instead of providing yet another ecosystem, stdgpu is designed to be a lightweight container library. Therefore, a core feature of stdgpu is its interoperability with previously established frameworks, i.e. the thrust library, to enable seamless integration into new as well as existing projects.
Maintainability. Following the trend in recent C++ standards of providing functionality for safer and more reliable programming, the philosophy of stdgpu is to provide clean and familiar functions with strong guarantees that encourage users to write more robust code while giving them full control to achieve high performance.

At its heart, stdgpu offers the following GPU data structures and containers:

Data Structure	Description
atomic & atomic_ref	Atomic primitive types and references
bitset	Space-efficient bit array
deque	Dynamically sized double-ended queue
queue & stack	Container adapters
unordered_map & unordered_set	Hashed collection of unique keys and key-value pairs
vector	Dynamically sized contiguous array

In addition, stdgpu also provides commonly required functionality in algorithm, bit, cmath, contract, cstddef, functional, iterator, limits, memory, mutex, ranges, utility to complement the GPU data structures and to increase their usability and interoperability.

Examples

To reliably perform complex tasks on the GPU, stdgpu offers flexible interfaces that can be used both in agnostic code, e.g. via the algorithms provided by thrust, and in native code, e.g. in custom CUDA kernels.

For instance, stdgpu is extensively used in SLAMCast, a scalable live telepresence system, to implement real-time, large-scale 3D scene reconstruction as well as real-time 3D data streaming between a server and an arbitrary number of remote clients.

Agnostic code. In the context of SLAMCast, a simple task is the integration of a range of updated blocks into the duplicate-free set of queued blocks for data streaming, which can be expressed very conveniently:

#include <stdgpu/cstddef.h>             // stdgpu::index_t
#include <stdgpu/iterator.h>            // stdgpu::make_device
#include <stdgpu/unordered_set.cuh>     // stdgpu::unordered_set

class stream_set
{
public:
    void
    add_blocks(const short3* blocks,
               const stdgpu::index_t n)
    {
        set.insert(stdgpu::make_device(blocks),
                   stdgpu::make_device(blocks + n));
    }

    // Further functions

private:
    stdgpu::unordered_set<short3> set;
    // Further members
};
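
For readers without a GPU toolchain at hand, the deduplicating effect of the range insertion above can be sketched on the CPU with std::unordered_set. This is only an analogy under stated assumptions: Block3, Block3Hash, and cpu_stream_set are hypothetical names introduced for illustration (Block3 stands in for CUDA's short3), and nothing here reflects how stdgpu implements its containers internally.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical stand-in for CUDA's short3 so the sketch compiles without CUDA.
struct Block3
{
    short x, y, z;

    bool operator==(const Block3& other) const
    {
        return x == other.x && y == other.y && z == other.z;
    }
};

// Hypothetical hash; any reasonable combination of the coordinates would do.
struct Block3Hash
{
    std::size_t operator()(const Block3& b) const
    {
        std::size_t h = std::hash<short>{}(b.x);
        h = h * 31 + std::hash<short>{}(b.y);
        h = h * 31 + std::hash<short>{}(b.z);
        return h;
    }
};

// CPU analog of stream_set: inserting a range keeps the set duplicate-free.
class cpu_stream_set
{
public:
    void add_blocks(const Block3* blocks, const std::ptrdiff_t n)
    {
        set.insert(blocks, blocks + n);
    }

    std::size_t size() const { return set.size(); }

    bool contains(const Block3& b) const { return set.count(b) != 0; }

private:
    std::unordered_set<Block3, Block3Hash> set;
};
```

Adding the same block several times, within one call or across calls, leaves a single entry in the set, which is the property the streaming queue relies on.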

Native code. More complex operations such as the creation of the duplicate-free set of updated blocks or other algorithms can be implemented natively, e.g. in custom CUDA kernels with stdgpu's CUDA backend enabled:

#include <stdgpu/cstddef.h>             // stdgpu::index_t
#include <stdgpu/unordered_map.cuh>     // stdgpu::unordered_map
#include <stdgpu/unordered_set.cuh>     // stdgpu::unordered_set

__global__ void
compute_update_set(const short3* blocks,
                   const stdgpu::index_t n,
                   const stdgpu::unordered_map<short3, voxel*> tsdf_block_map,
                   stdgpu::unordered_set<short3> mc_update_set)
{
    // Global thread index
    stdgpu::index_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    short3 b_i = blocks[i];

    // Neighboring candidate blocks for the update
    // make_short3 is CUDA's helper for building a short3 from its components
    short3 mc_blocks[8]
    = {
        make_short3(b_i.x - 0, b_i.y - 0, b_i.z - 0),
        make_short3(b_i.x - 1, b_i.y - 0, b_i.z - 0),
        make_short3(b_i.x - 0, b_i.y - 1, b_i.z - 0),
        make_short3(b_i.x - 0, b_i.y - 0, b_i.z - 1),
        make_short3(b_i.x - 1, b_i.y - 1, b_i.z - 0),
        make_short3(b_i.x - 1, b_i.y - 0, b_i.z - 1),
        make_short3(b_i.x - 0, b_i.y - 1, b_i.z - 1),
        make_short3(b_i.x - 1, b_i.y - 1, b_i.z - 1),
    };

    for (stdgpu::index_t j = 0; j < 8; ++j)
    {
        // Only consider existing neighbors
        if (tsdf_block_map.contains(mc_blocks[j]))
        {
            mc_update_set.insert(mc_blocks[j]);
        }
    }
}
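
A sequential CPU reference of the same logic can be handy for checking a kernel's result on small inputs. The following is a minimal sketch, assuming standard containers in place of the stdgpu ones; Block3, Block3Hash, Voxel, BlockMap, BlockSet, and compute_update_set_cpu are hypothetical names used only for illustration. Unlike the kernel, which receives the containers by value, the CPU version takes the output set by reference.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for CUDA's short3.
struct Block3
{
    short x, y, z;

    bool operator==(const Block3& other) const
    {
        return x == other.x && y == other.y && z == other.z;
    }
};

// Hypothetical hash combining the three coordinates.
struct Block3Hash
{
    std::size_t operator()(const Block3& b) const
    {
        std::size_t h = std::hash<short>{}(b.x);
        h = h * 31 + std::hash<short>{}(b.y);
        h = h * 31 + std::hash<short>{}(b.z);
        return h;
    }
};

// Hypothetical placeholder for the voxel payload of a block.
struct Voxel
{
};

using BlockMap = std::unordered_map<Block3, Voxel*, Block3Hash>;
using BlockSet = std::unordered_set<Block3, Block3Hash>;

// For every input block, visit the block itself and its 7 neighbors at
// offset -1 along each axis combination, and collect those present in the map.
void compute_update_set_cpu(const std::vector<Block3>& blocks,
                            const BlockMap& tsdf_block_map,
                            BlockSet& mc_update_set)
{
    for (const Block3& b : blocks)
    {
        // Bits 0, 1, 2 of mask select the -1 offset in x, y, z respectively
        for (int mask = 0; mask < 8; ++mask)
        {
            const Block3 candidate{static_cast<short>(b.x - (mask & 1)),
                                   static_cast<short>(b.y - ((mask >> 1) & 1)),
                                   static_cast<short>(b.z - ((mask >> 2) & 1))};

            // Only consider existing neighbors
            if (tsdf_block_map.count(candidate) != 0)
            {
                mc_update_set.insert(candidate);
            }
        }
    }
}
```

The bit mask enumerates the same eight offsets that the kernel spells out explicitly, although in a different order, which does not matter for a set.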

More examples can be found in the examples directory.

Documentation

A comprehensive introduction to the design and API of stdgpu can be found here:

stdgpu API documentation
thrust algorithms documentation
Research paper

Since a core feature and design goal of stdgpu is its interoperability with thrust, it offers full support for all thrust algorithms instead of reinventing the wheel. More information about the design can be found in the related research paper.

Building

Before building the library, please make sure that all required tools and dependencies are installed on your system. The versions listed below are minimum requirements; newer versions are supported as well.

Required

C++14 compiler
- GCC 7
  - (Ubuntu 18.04/20.04) sudo apt install g++ make
- Clang 6
  - (Ubuntu 18.04/20.04) sudo apt install clang make
- MSVC 19.20
  - (Windows) Visual Studio 2019 https://visualstudio.microsoft.com/downloads/
CMake 3.15
- (Ubuntu 18.04) https://apt.kitware.com
- (Ubuntu 20.04) sudo apt install cmake
- (Windows) https://cmake.org/download
thrust 1.9.2
- (Ubuntu/Windows) https://github.com/NVIDIA/thrust
- May already be installed by backend dependencies

Required for CUDA backend

CUDA compiler
- NVCC
  - Already included in CUDA Toolkit
- Clang 10
  - (Ubuntu 18.04/20.04) sudo apt install clang-10 or https://apt.llvm.org/
  - Requires at least CMake 3.18
CUDA Toolkit 10.0
- (Ubuntu/Windows) https://developer.nvidia.com/cuda-downloads
- Includes thrust

Required for OpenMP backend

OpenMP 2.0
- GCC 7
  - (Ubuntu 18.04/20.04) Already installed
- Clang 6
  - (Ubuntu 18.04/20.04) sudo apt install libomp-dev
- MSVC 19.20
  - (Windows) Already installed

Required for HIP backend (experimental)

ROCm 5.1
- (Ubuntu) https://github.com/RadeonOpenCompute/ROCm
- Includes thrust
CMake 3.21.3
- (Ubuntu 18.04/20.04) https://apt.kitware.com
- (Windows) https://cmake.org/download
- Required for first-class HIP language support

The library can be built like any other project that uses the CMake build system.

In addition, we also provide cross-platform scripts to make the build process more convenient. Each script takes an optional build type argument, which defaults to Release.

Command	Effect
bash scripts/setup.sh [<build_type>]	Performs a full clean build of the project. Removes old build, configures the project (build path: `./build`, default build type: `Release`), builds the project, and runs the unit tests.
bash scripts/build.sh [<build_type>]	(Re-)Builds the project. Requires that the project is set up (default build type: `Release`).
bash scripts/run_tests.sh [<build_type>]	Runs the unit tests. Requires that the project is built (default build type: `Release`).
bash scripts/install.sh [<build_type>]	Installs the project to the configured install path (default install dir: `./bin`, default build type: `Release`).
bash scripts/uninstall.sh [<build_type>]	Uninstalls the project from the configured install path (default build type: `Release`).

Integration

In the following, we show some examples of how the library can be integrated into and used in a project.

CMake Integration. To use the library in your project, you can either install it externally first and then include it using find_package:

find_package(stdgpu 1.0.0 REQUIRED)

add_library(foo ...)

target_link_libraries(foo PUBLIC stdgpu::stdgpu)

Or you can embed it into your project and build it from a subdirectory:

# Exclude the examples from the build
set(STDGPU_BUILD_EXAMPLES OFF CACHE INTERNAL "")

# Exclude the benchmarks from the build
set(STDGPU_BUILD_BENCHMARKS OFF CACHE INTERNAL "")

# Exclude the tests from the build
set(STDGPU_BUILD_TESTS OFF CACHE INTERNAL "")

add_subdirectory(stdgpu)

add_library(foo ...)

target_link_libraries(foo PUBLIC stdgpu::stdgpu)

CMake Options. To configure the library, two sets of options are provided. The following build options control the build process:

Build Option	Effect	Default
`STDGPU_BACKEND`	Device system backend	`STDGPU_BACKEND_CUDA`
`STDGPU_BUILD_SHARED_LIBS`	Builds the project as a shared library, if set to `ON`, or as a static library, if set to `OFF`	`BUILD_SHARED_LIBS`
`STDGPU_SETUP_COMPILER_FLAGS`	Constructs the compiler flags	`ON` if standalone, `OFF` if included via `add_subdirectory`
`STDGPU_TREAT_WARNINGS_AS_ERRORS`	Treats compiler warnings as errors	`OFF`
`STDGPU_BUILD_EXAMPLES`	Build the examples	`ON`
`STDGPU_BUILD_BENCHMARKS`	Build the benchmarks	`ON`
`STDGPU_BUILD_TESTS`	Build the unit tests	`ON`
`STDGPU_BUILD_TEST_COVERAGE`	Build a test coverage report	`OFF`
`STDGPU_ANALYZE_WITH_CLANG_TIDY`	Analyzes the code with clang-tidy	`OFF`
`STDGPU_ANALYZE_WITH_CPPCHECK`	Analyzes the code with cppcheck	`OFF`

In addition, the implementation of some functionality can be controlled via configuration options:

Configuration Option	Effect	Default
`STDGPU_ENABLE_CONTRACT_CHECKS`	Enable contract checks	`OFF` if `CMAKE_BUILD_TYPE` equals `Release` or `MinSizeRel`, `ON` otherwise
`STDGPU_USE_32_BIT_INDEX`	Use 32-bit instead of 64-bit signed integer for `index_t`	`ON`
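
To illustrate what a compile-time switch such as `STDGPU_USE_32_BIT_INDEX` can mean for user code, here is a minimal sketch of selecting a signed index type with a preprocessor flag. The macro SKETCH_USE_32_BIT_INDEX, the namespace sketch, and the helper max_index are hypothetical and exist only in this example; how stdgpu itself defines `index_t` is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical flag playing the role of a 32-bit index configuration option.
#ifndef SKETCH_USE_32_BIT_INDEX
    #define SKETCH_USE_32_BIT_INDEX 1
#endif

namespace sketch
{
#if SKETCH_USE_32_BIT_INDEX
// Signed integer with at least 32 bits
using index_t = std::int_least32_t;
#else
// Signed integer as wide as pointer differences, typically 64 bits
using index_t = std::ptrdiff_t;
#endif

// Largest index value representable with the selected type
constexpr index_t
max_index()
{
    return std::numeric_limits<index_t>::max();
}
} // namespace sketch
```

A narrower index type limits the number of addressable elements, so code that may exceed max_index() elements would need the wider setting.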

Contributing

For detailed information on how to contribute, see CONTRIBUTING.

License

Distributed under the Apache 2.0 License. See LICENSE for more information.

If you use stdgpu in one of your projects, please cite the following publications:

stdgpu: Efficient STL-like Data Structures on the GPU

@UNPUBLISHED{stotko2019stdgpu,
    author = {Stotko, P.},
     title = {{stdgpu: Efficient STL-like Data Structures on the GPU}},
      year = {2019},
     month = aug,
      note = {arXiv:1908.05936},
       url = {https://arxiv.org/abs/1908.05936}
}

SLAMCast: Large-Scale, Real-Time 3D Reconstruction and Streaming for Immersive Multi-Client Live Telepresence

@article{stotko2019slamcast,
    author = {Stotko, P. and Krumpen, S. and Hullin, M. B. and Weinmann, M. and Klein, R.},
     title = {{SLAMCast: Large-Scale, Real-Time 3D Reconstruction and Streaming for Immersive Multi-Client Live Telepresence}},
   journal = {IEEE Transactions on Visualization and Computer Graphics},
    volume = {25},
    number = {5},
     pages = {2102--2112},
      year = {2019},
     month = may
}

Contact

Patrick Stotko - [email protected]

*Note that all licence references and agreements mentioned in the stdgpu README section above are relevant to that project's source code only.