Changelog History
v1.4.1 Changes
May 04, 2017

A bug fix release.

- Improvements for the CMake scripts.
- Bug fixes.
v1.4.0 Changes
April 19, 2017

- Modernized the CMake build system. Provide `VexCL::OpenCL`, `VexCL::Compute`, `VexCL::CUDA`, and `VexCL::JIT` imported targets, so that users may just write

  ```cmake
  add_executable(myprogram myprogram.cpp)
  target_link_libraries(myprogram VexCL::OpenCL)
  ```

  to build a program using the corresponding VexCL backend. Also stopped polluting the global CMake namespace with calls like `add_definitions()`, `include_directories()`, etc. See http://vexcl.readthedocs.io/en/latest/cmake.html.
- Made `vex::backend::kernel::config()` return a reference to the kernel, so that it is possible to configure and launch a kernel in a single line: `K.config(nblocks, nthreads)(queue, prm1, prm2, prm3);`.
- Implemented the `vector<T>::reinterpret<U>()` method. It returns a new vector that reinterprets the same data (no copies are made) as the new type.
- Implemented a new backend: JIT. The backend generates and compiles C++ kernels with OpenMP support at runtime. The generated code will not be more efficient than hand-written OpenMP code, but it makes it easy to debug the generated code with a host-side debugger. The backend may also be used to develop and test new code when the other backends are not available.
- Allow `VEX_CONSTANT`s to be cast to their values in host code, so that a constant defined with `VEX_CONSTANT(name, expr)` can be used in host code as `name`. Constants are still usable in vector expressions as `name()`.
- Allow passing generated kernel arguments for each GPU (#202). Kernel arguments packed into an `std::vector` are unpacked and passed to the generated kernels on the respective devices.
- Reimplemented `vex::SpMat` as `vex::sparse::ell`, `vex::sparse::crs`, `vex::sparse::matrix` (automatically chooses one of the two formats based on the current compute device), and `vex::sparse::distributed<format>` (this one may span several compute devices). The new matrix-vector products are normal vector expressions, while the old `vex::SpMat` could only be used in additive expressions. The old implementation is still available. `vex::sparse::ell` is now converted from the host-side CRS format on the compute device, which makes the conversion faster.
- Bug fixes and minor improvements.
v1.3.3 Changes
April 06, 2015

- Added the `vex::tensordot()` operation. Given two tensors (arrays of dimension greater than or equal to one), A and B, and a list of axis pairs (where each pair represents corresponding axes from the two tensors), it sums the products of A's and B's elements over the given axes. Inspired by Python's numpy.tensordot operation.
- Exposed the constant memory space in the OpenCL backend.
- Provided shortcut filters `vex::Filter::{CPU,GPU,Accelerator}` for the OpenCL backend.
- Added a Boost.Compute backend. The core functionality of the Boost.Compute library is used as a replacement for the Khronos C++ API, which seems to be getting more and more outdated. The Boost.Compute backend is still based on OpenCL, so there are two OpenCL backends now. Define `VEXCL_BACKEND_COMPUTE` to use this backend, and make sure the Boost.Compute headers are in the include path.
v1.3.2 Changes
September 04, 2014

- Improved thread safety.
- Implemented any_of and all_of primitives.
- Minor bugfixes and improvements.
v1.3.1 Changes
May 14, 2014

- Adopted the `scan_by_key` algorithm from HSA-Libraries/Bolt.
- Minor improvements and bug fixes.
v1.3.0 Changes
April 14, 2014

- API breaking change: the `vex::purge_kernel_caches()` family of functions is renamed to `vex::purge_caches()`, as the online cache may now hold objects of arbitrary type. The overloads that used to take `vex::backend::kernel_cache_key` now take `const vex::backend::command_queue&`.
- The online cache is now purged whenever a `vex::Context` is destroyed. This allows for clean release of OpenCL/CUDA contexts.
- Code for random number generators has been unified between the OpenCL and CUDA backends.
- The Fast Fourier Transform is now supported on both the OpenCL and CUDA backends.
- The `vex::backend::kernel` constructor now takes an optional parameter with command line options.
- Performance of the CLOGS algorithms has been improved.
- The VEX_BUILTIN_FUNCTION macro has been made public.
- Minor bug fixes and improvements.
v1.2.0 Changes
April 02, 2014

- API breaking change: the definition of the `VEX_FUNCTION` family of macros has changed. The previous versions are available as `VEX_FUNCTION_V1`.
- Wrapping code for the clogs library was added by @bmerry (the author of clogs).
- `vector`/`multivector` iterators are now standard-conforming iterators.
- Other minor improvements and bug fixes.
v1.1.2 Changes
December 24, 2013

- `reduce_by_key()` may take several tied keys (see e09d249).
- It is possible to reduce OpenCL vector types (`cl_float2`, `cl_double4`, etc.).
- `VEXCL_SHOW_KERNELS` may be set as an environment variable as well as a preprocessor macro. This allows controlling kernel source output without recompiling the program.
- Added a compute capability filter for the CUDA backend (`vex::Filter::CC(major, minor)`).
- Fixed compilation errors and warnings generated by Visual Studio.
v1.1.1 Changes
December 05, 2013

- Sorting algorithms may take tuples of keys/values (in fact, any Boost.Fusion sequence will do). In this case one has to explicitly specify the comparison functor. Both the host and device variants of the comparison functor should take 2n arguments, where n is the number of keys. The first n arguments correspond to the left set of keys, and the second n arguments correspond to the right set of keys. Here is an example that sorts values by a tuple of two keys:

  ```cpp
  vex::vector<int>    keys1(ctx, n);
  vex::vector<float>  keys2(ctx, n);
  vex::vector<double> vals (ctx, n);

  struct {
      VEX_FUNCTION(device, bool(int, float, int, float),
          "return (prm1 == prm3) ? (prm2 < prm4) : (prm1 < prm3);"
          );
      bool operator()(int a1, float a2, int b1, float b2) const {
          return std::make_tuple(a1, a2) < std::make_tuple(b1, b2);
      }
  } comp;

  vex::sort_by_key(std::tie(keys1, keys2), vals, comp);
  ```
v1.1.0 Changes
November 29, 2013

- The `vex::SpMat<>` class uses the CUSPARSE library on the CUDA backend when the `VEXCL_USE_CUSPARSE` macro is defined. This results in a more efficient sparse matrix-vector product, but disables inlining of the SpMV operation.
- Provided an example of CUDA backend interoperation with Thrust.
- When the `VEXCL_CHECK_SIZES` macro is defined to 1 or 2, runtime checks for vector expression correctness are enabled (see #81, #82).
- Added `sort()` and `sort_by_key()` functions.
- Added `inclusive_scan()` and `exclusive_scan()` functions.
- Added `reduce_by_key()` function. Only works with single-device contexts.
- Added `convert_<type>()` and `as_<type>()` builtin functions for the OpenCL backend.