All Versions: 10
Latest Version:
Avg Release Cycle: 139 days
Latest Release: 1398 days ago

Changelog History

  • v1.4.1

    May 04, 2017

    A bug fix release.

    • Improvements to the CMake scripts.
    • Bug fixes.
  • v1.4.0

    April 19, 2017
    • Modernize the CMake build system.
      Provide VexCL::OpenCL, VexCL::Compute, VexCL::CUDA, and VexCL::JIT
      imported targets, so that users may simply write

      ```cmake
      add_executable(myprogram myprogram.cpp)
      target_link_libraries(myprogram VexCL::OpenCL)
      ```

      to build a program using the corresponding VexCL backend.
      Also stop polluting the global CMake namespace with calls such as
      add_definitions(), include_directories(), etc.
      See http://vexcl.readthedocs.io/en/latest/cmake.html.

    • Make vex::backend::kernel::config() return a reference to the kernel, so
      that it is possible to configure and launch the kernel in a single line:
      K.config(nblocks, nthreads)(queue, prm1, prm2, prm3);.
    • Implement vector<T>::reinterpret<U>() method. It returns a new vector that
      reinterprets the same data (no copies are made) as the new type.
    • Implemented a new backend: JIT. The backend generates C++ kernels with
      OpenMP support and compiles them at runtime. The result will not be more
      efficient than hand-written OpenMP code, but it allows the generated code
      to be easily debugged with a host-side debugger. The backend may also be
      used to develop and test new code when other backends are not available.
    • Allow VEX_CONSTANTs to be cast to their values in host code, so that a
      constant defined with VEX_CONSTANT(name, expr) can be used in host code
      as name. Constants are still usable in vector expressions as name().
    • ๐Ÿ‘ Allow passing generated kernel args for each GPU (#202).
      Kernel args packed into std::vector will be unpacked and passed
      to the generated kernels on respective devices.
    • Reimplemented vex::SpMat as vex::sparse::ell, vex::sparse::crs,
      vex::sparse::matrix (which automatically chooses one of the two formats
      based on the current compute device), and vex::sparse::distributed<format>
      (which may span several compute devices). The new matrix-vector products
      are normal vector expressions, while the old vex::SpMat could only be
      used in additive expressions. The old implementation is still available.
      vex::sparse::ell is now converted from the host-side CRS format on the
      compute device, which makes the conversion faster.
    • ๐Ÿ› Bug fixes and minor improvements.
  • v1.3.3

    April 06, 2015
    • Added the vex::tensordot() operation. Given two tensors (arrays of
      dimension greater than or equal to one), A and B, and a list of axis
      pairs (where each pair represents corresponding axes from the two
      tensors), it sums the products of A's and B's elements over the given
      axes. Inspired by Python's numpy.tensordot operation.
    • Expose the constant memory space in the OpenCL backend.
    • Provide shortcut filters vex::Filter::{CPU,GPU,Accelerator} for OpenCL backend.
    • Added a Boost.Compute backend. The core functionality of the Boost.Compute library is used as a replacement for the Khronos C++ API, which seems to be getting more and more outdated. The Boost.Compute backend is still based on OpenCL, so there are now two OpenCL backends. Define VEXCL_BACKEND_COMPUTE to use this backend and make sure the Boost.Compute headers are in the include path.
  • v1.3.2

    September 04, 2014
    • Improved thread safety.
    • Implemented the any_of and all_of primitives.
    • Minor bug fixes and improvements.
  • v1.3.1

    May 14, 2014
    • Adopted scan_by_key algorithm from HSA-Libraries/Bolt.
    • Minor improvements and bug fixes.
  • v1.3.0

    April 14, 2014
    • API breaking change: the vex::purge_kernel_caches() family of functions
      is renamed to vex::purge_caches(), as the online cache may now hold
      objects of arbitrary type. The overloads that used to take
      vex::backend::kernel_cache_key now take const vex::backend::command_queue&.
    • The online cache is now purged whenever a vex::Context is destroyed. This
      allows for clean release of OpenCL/CUDA contexts.
    • Code for random number generators has been unified between OpenCL and CUDA
      backends.
    • ๐Ÿ‘ Fast Fourier Transform is now supported both for OpenCL and CUDA backends.
    • vex::backend::kernel constructor now takes optional parameter with command
      line options.
    • ๐ŸŽ Performance of CLOGS algorithms has been improved.
    • VEX_BUILTIN_FUNCTION macro has been made public.
    • ๐Ÿ›  Minor bug fixes and improvements.
  • v1.2.0

    April 02, 2014
    • API breaking change: the definition of the VEX_FUNCTION family of macros has changed. The previous versions are available as VEX_FUNCTION_V1.
    • Wrapping code for the clogs library has been added by @bmerry
      (the author of clogs).
    • vector/multivector iterators are now standard-conforming iterators.
    • Other minor improvements and bug fixes.
  • v1.1.2

    December 24, 2013
    • reduce_by_key() may take several tied keys (see e09d249).
    • It is possible to reduce OpenCL vector types (cl_float2, cl_double4, etc).
    • VEXCL_SHOW_KERNELS may be set as an environment variable as well as a
      preprocessor macro. This makes it possible to control kernel source
      output without recompiling the program.
    • Added a compute capability filter for the CUDA backend
      (vex::Filter::CC(major, minor)).
    • Fixed compilation errors and warnings generated by Visual Studio.
  • v1.1.1

    December 05, 2013

    Sorting algorithms may take tuples of keys/values (in fact, any Boost.Fusion sequence will do). In this case the comparison functor has to be specified explicitly. Both the host and device variants of the comparison functor should take 2n arguments, where n is the number of keys: the first n arguments correspond to the left set of keys, and the second n arguments correspond to the right set of keys. Here is an example that sorts values by a tuple of two keys:

```cpp
vex::vector<int>    keys1(ctx, n);
vex::vector<float>  keys2(ctx, n);
vex::vector<double> vals (ctx, n);

struct {
    VEX_FUNCTION(device, bool(int, float, int, float),
        "return (prm1 == prm3) ? (prm2 < prm4) : (prm1 < prm3);"
        );
    bool operator()(int a1, float a2, int b1, float b2) const {
        return std::make_tuple(a1, a2) < std::make_tuple(b1, b2);
    }
} comp;

vex::sort_by_key(std::tie(keys1, keys2), vals, comp);
```
    
  • v1.1.0

    November 29, 2013
    • The vex::SpMat<> class uses the CUSPARSE library on the CUDA backend when the VEXCL_USE_CUSPARSE macro is defined. This results in a more efficient sparse matrix-vector product, but disables inlining of the SpMV operation.
    • Provided an example of CUDA backend interoperation with Thrust.
    • When the VEXCL_CHECK_SIZES macro is defined to 1 or 2, runtime checks for
      vector expression correctness are enabled (see #81, #82).
    • Added sort() and sort_by_key() functions.
    • Added inclusive_scan() and exclusive_scan() functions.
    • Added reduce_by_key() function. Only works with single-device contexts.
    • Added convert_<type>() and as_<type>() builtin functions for OpenCL backend.