OpenBLAS v0.3.0 Release Notes

Release Date: 2018-05-23
    common:

    * fixed some more thread race and locking bugs
    * added preliminary support for calling an OpenMP build of the library from multiple threads
    * removed performance impact of thread locks added in 0.2.20 on OpenMP code
    * general code cleanup 
    * optimized DSDOT implementation
    * improved thread distribution for GEMM
    * corrected IMATCOPY/OMATCOPY implementation
    * fixed out-of-bounds accesses in the multithreaded xBMV/xPMV and SYMV implementations
    * cmake build improvements
    * pkgconfig file now contains build options
    * openblas_get_config() now reports USE_OPENMP and NUM_THREADS settings used for the build
    * corrections and improvements for systems with more than 64 CPUs
    * LAPACK code updated to 3.8.0 including later fixes
    * added ReLAPACK, a recursive implementation of several LAPACK functions
    * rewrote ROTMG to handle cases that the netlib code failed to address
    * disabled (broken) multithreading code for xTRMV
    * corrected prototypes of complex CBLAS functions to make our cblas.h match the generally accepted standard
    * shared memory access failures on startup are now handled more gracefully
    * restored utests from earlier releases (and made them pass on all affected systems)
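
The build-option reporting mentioned above can be inspected at run time. The following is a minimal sketch, assuming an OpenBLAS shared library is installed where the system loader can find it and that it exports openblas_get_config() under its plain name. The helper names parse_config and read_openblas_config are hypothetical, and the exact tokens in the returned string vary by build, so check them against real output rather than relying on this parser's idea of them.

```python
import ctypes
import ctypes.util


def parse_config(config: str) -> dict:
    """Split a whitespace-separated config string into flags and KEY=VALUE pairs."""
    flags = set()
    settings = {}
    for token in config.split():
        if "=" in token:
            key, _, value = token.partition("=")
            settings[key] = value
        else:
            flags.add(token)
    return {"flags": flags, "settings": settings}


def read_openblas_config():
    """Return the build config string, or None if OpenBLAS cannot be loaded."""
    path = ctypes.util.find_library("openblas")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        func = lib.openblas_get_config
    except (OSError, AttributeError):
        # Library failed to load, or the symbol is exported under another name.
        return None
    # openblas_get_config() takes no arguments and returns a C string.
    func.restype = ctypes.c_char_p
    func.argtypes = []
    return func().decode("ascii", errors="replace")


if __name__ == "__main__":
    config = read_openblas_config()
    if config is None:
        print("OpenBLAS shared library not found")
    else:
        parsed = parse_config(config)
        print(config)
        # Assumption: an OpenMP build lists a USE_OPENMP token in the string.
        print("OpenMP build:", "USE_OPENMP" in parsed["flags"])
        print("settings:", parsed["settings"])
```

Going through ctypes keeps the check independent of any particular Python numerical package. From C, the same string is available by calling openblas_get_config() directly after including cblas.h.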
    

    SPARC:

    * several fixes for CPU autodetection
    

    POWER:

    * corrected vector register overwriting in several Power8 kernels
    * optimized additional BLAS functions
    

    ARM:

    * added support for Cortex-A53 and Cortex-A72
    * added autodetection for ThunderX2T99
    * made most optimized kernels the default for generic ARMv8 targets 
    

    x86_64:

    * parallelized DDOT kernel for Haswell
    * changed alignment directives in assembly kernels to boost performance on OS X
    * fixed register handling in the GEMV microkernels (bug exposed by GCC 7)
    * added support for building on OpenBSD and DragonFly BSD
    * updated compiler options to work with the Intel 2018 compiler release
    * support fully optimized build with clang/flang on Microsoft Windows
    * fixed building on AIX
    

    IBM Z:

    * added optimized BLAS level 1 and level 2 functions
    

    MIPS:

    * fixed CPU autodetection helper code
    * added support for the mips32 1004K CPU (MediaTek MT7621 and similar SoCs)
    * added support for the mips64 I6500 CPU
    
