NCCL alternatives and similar libraries
Based on the "Concurrency" category.
- Thrust: DISCONTINUED. [ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
- ck: Concurrency primitives, safe memory reclamation mechanisms and non-blocking (including lock-free) data structures designed to aid in the research, design and implementation of high performance concurrent systems developed in C99+.
- SPSCQueue.h: A bounded single-producer single-consumer wait-free and lock-free queue written in C++11
- continuable: C++14 asynchronous allocation aware futures (supporting then, exception handling, coroutines and connections)
- Bolt: A C++ template library optimized for GPUs, providing high-performance library implementations for common algorithms such as scan, reduce, transform, and sort.
- CUB: DISCONTINUED. The repository has moved to github.com/nvidia/cub, which is automatically mirrored here.
- BlockingCollection: C++11 thread-safe, multi-producer, multi-consumer blocking queue, stack & priority queue class
- Light Actor Framework: DISCONTINUED. Laughably simple yet effective Actor concurrency framework for C++20
- Easy Creation of GnuPlot Scripts from C++: A simple C++17 lib that helps you to quickly plot your data with GnuPlot
README
NCCL
Optimized primitives for inter-GPU communication.
Introduction
NCCL (pronounced "Nickel") is a stand-alone library of standard communication routines for GPUs, implementing all-reduce, all-gather, reduce, broadcast, reduce-scatter, as well as any send/receive based communication pattern. It has been optimized to achieve high bandwidth on platforms using PCIe, NVLink, NVswitch, as well as networking using InfiniBand Verbs or TCP/IP sockets. NCCL supports an arbitrary number of GPUs installed in a single node or across multiple nodes, and can be used in either single- or multi-process (e.g., MPI) applications.
For more information on NCCL usage, please refer to the NCCL documentation.
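As an illustration of the routines described above, the following is a minimal single-process sketch that runs one all-reduce across every GPU visible on a node, using ncclCommInitAll, ncclGroupStart/ncclGroupEnd, and ncclAllReduce. The file name, buffer size, memset initialization, and the reduced error handling are arbitrary placeholders, not part of NCCL.

/* all_reduce_minimal.c -- illustrative sketch, not part of the NCCL source tree.
 * Runs one all-reduce across every GPU visible to a single process. */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <nccl.h>

#define CHECK_CUDA(cmd) do { cudaError_t e = (cmd); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA error %s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e)); exit(1); } } while (0)
#define CHECK_NCCL(cmd) do { ncclResult_t r = (cmd); if (r != ncclSuccess) { \
    fprintf(stderr, "NCCL error %s:%d: %s\n", __FILE__, __LINE__, ncclGetErrorString(r)); exit(1); } } while (0)

int main(void) {
    int ndev = 0;
    CHECK_CUDA(cudaGetDeviceCount(&ndev));

    const size_t count = 1 << 20;                 /* elements per GPU, arbitrary */
    ncclComm_t   *comms    = malloc(ndev * sizeof(ncclComm_t));
    float       **sendbuff = malloc(ndev * sizeof(float *));
    float       **recvbuff = malloc(ndev * sizeof(float *));
    cudaStream_t *streams  = malloc(ndev * sizeof(cudaStream_t));

    /* One send/receive buffer pair and one stream per GPU. */
    for (int i = 0; i < ndev; i++) {
        CHECK_CUDA(cudaSetDevice(i));
        CHECK_CUDA(cudaMalloc((void **)&sendbuff[i], count * sizeof(float)));
        CHECK_CUDA(cudaMalloc((void **)&recvbuff[i], count * sizeof(float)));
        CHECK_CUDA(cudaMemset(sendbuff[i], 1, count * sizeof(float)));
        CHECK_CUDA(cudaStreamCreate(&streams[i]));
    }

    /* Single-process initialization: one communicator per visible GPU. */
    CHECK_NCCL(ncclCommInitAll(comms, ndev, NULL));

    /* Group the per-GPU calls so they form a single collective operation. */
    CHECK_NCCL(ncclGroupStart());
    for (int i = 0; i < ndev; i++)
        CHECK_NCCL(ncclAllReduce(sendbuff[i], recvbuff[i], count,
                                 ncclFloat, ncclSum, comms[i], streams[i]));
    CHECK_NCCL(ncclGroupEnd());

    /* Wait for the collective to finish, then release resources. */
    for (int i = 0; i < ndev; i++) {
        CHECK_CUDA(cudaSetDevice(i));
        CHECK_CUDA(cudaStreamSynchronize(streams[i]));
        CHECK_CUDA(cudaFree(sendbuff[i]));
        CHECK_CUDA(cudaFree(recvbuff[i]));
        CHECK_CUDA(cudaStreamDestroy(streams[i]));
        CHECK_NCCL(ncclCommDestroy(comms[i]));
    }
    free(comms); free(sendbuff); free(recvbuff); free(streams);
    printf("All-reduce completed on %d GPU(s)\n", ndev);
    return 0;
}

Link the program against the CUDA runtime and NCCL (e.g. -lcudart -lnccl). Multi-process or multi-node applications instead exchange an ncclUniqueId (for example over MPI) and create one communicator per rank with ncclCommInitRank.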
Build
Note: the official and tested builds of NCCL can be downloaded from: https://developer.nvidia.com/nccl. You can skip the following build steps if you choose to use the official builds.
To build the library:
$ cd nccl
$ make -j src.build
If CUDA is not installed in the default /usr/local/cuda path, you can define the CUDA path with:
$ make src.build CUDA_HOME=<path to cuda install>
NCCL will be compiled and installed in build/ unless BUILDDIR is set.
By default, NCCL is compiled for all supported architectures. To accelerate the compilation and reduce the binary size, consider redefining NVCC_GENCODE (defined in makefiles/common.mk) to only include the architecture of the target platform:
$ make -j src.build NVCC_GENCODE="-gencode=arch=compute_70,code=sm_70"
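NVCC_GENCODE can also list several -gencode pairs when the build should cover more than one target architecture; the architectures below are chosen arbitrarily for illustration:
$ make -j src.build NVCC_GENCODE="-gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80"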
Install
To install NCCL on the system, create a package, then install it as root.
Debian/Ubuntu:
$ # Install tools to create debian packages
$ sudo apt install build-essential devscripts debhelper fakeroot
$ # Build NCCL deb package
$ make pkg.debian.build
$ ls build/pkg/deb/
RedHat/CentOS:
$ # Install tools to create rpm packages
$ sudo yum install rpm-build rpmdevtools
$ # Build NCCL rpm package
$ make pkg.redhat.build
$ ls build/pkg/rpm/
OS-agnostic tarball:
$ make pkg.txz.build
$ ls build/pkg/txz/
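Once a package has been produced, it can be installed with the distribution's package manager; for example (assuming the built packages sit directly under the directories listed by the ls commands above, adjust the paths to match what ls actually shows):
$ # Debian/Ubuntu
$ sudo dpkg -i build/pkg/deb/*.deb
$ # RedHat/CentOS
$ sudo rpm -i build/pkg/rpm/*.rpm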
Tests
Tests for NCCL are maintained separately at https://github.com/nvidia/nccl-tests.
$ git clone https://github.com/NVIDIA/nccl-tests.git
$ cd nccl-tests
$ make
$ ./build/all_reduce_perf -b 8 -e 256M -f 2 -g <ngpus>
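Here, -b 8 -e 256M -f 2 sweeps the message size from 8 bytes to 256 MB, doubling at each step, and -g sets the number of GPUs used by the process. If nccl-tests is built with its MPI support enabled (an assumption about that separate project; check its documentation for the exact variables), the same benchmark can also be launched with one rank per GPU across processes or nodes, for example:
$ make MPI=1 MPI_HOME=<path to mpi install>
$ mpirun -np <nranks> ./build/all_reduce_perf -b 8 -e 256M -f 2 -g 1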
Copyright
All source code and accompanying documentation is copyright (c) 2015-2020, NVIDIA CORPORATION. All rights reserved.