OpenBLAS v0.3.6 Release Notes

Release Date: 2019-04-29
    common:

    - the build tools now check that a given cpu TARGET is actually valid
    - the build-time check of system features (c_check) has been made
      less dependent on particular perl features (this should mainly
      benefit building on Windows)
    - several problems with ReLAPACK and its integration were fixed,
      including INTERFACE64 support and building a shared library
    - building with CMAKE on BSD systems was improved
    - a non-absolute SUM function was added based on the
      existing optimized code for ASUM
    - CBLAS interfaces to the IxMIN and IxMAX functions were added
    - a name clash between LAPACKE and BOOST headers was resolved
    - CMAKE builds with OpenMP now include the appropriate getrf_parallel
      kernels, which were previously left out
    - a crash on thread (key) deletion with the USE_TLS=1 memory management
      option was fixed
    - restored several earlier fixes, in particular for OpenMP performance,
      building on BSD, and calling fork on CYGWIN, which had inadvertently
      been dropped in the 0.3.3 rewrite of the memory management code.
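
    As a rough illustration of what the new SUM function and the IxMAX/IxMIN
    index functions compute compared with ASUM, here is a minimal Python sketch
    of the semantics. The helper names (strided, asum, plain_sum, imax, imin) are
    hypothetical and are not OpenBLAS entry points. The sketch assumes n >= 1,
    a positive increment, 0-based result indices, and that ties resolve to the
    first matching element.

```python
def strided(x, n, inc):
    """Return the n elements of x visited with a positive stride inc."""
    return [x[i * inc] for i in range(n)]


def asum(n, x, inc=1):
    """ASUM-style reduction: sum of absolute values."""
    return sum(abs(v) for v in strided(x, n, inc))


def plain_sum(n, x, inc=1):
    """SUM-style reduction: plain sum, signs are kept."""
    return sum(strided(x, n, inc))


def imax(n, x, inc=1):
    """Index (0-based, assumed) of the first largest element, no absolute value."""
    vals = strided(x, n, inc)
    return max(range(n), key=lambda i: vals[i])


def imin(n, x, inc=1):
    """Index (0-based, assumed) of the first smallest element, no absolute value."""
    vals = strided(x, n, inc)
    return min(range(n), key=lambda i: vals[i])


x = [3.0, -7.0, 2.0, 5.0]
print(asum(4, x))       # 17.0
print(plain_sum(4, x))  # 3.0
print(imax(4, x))       # 3
print(imin(4, x))       # 1
```

    The IMIN bug listed for several targets below corresponds to imin returning
    what imax returns, which is index 3 instead of 1 for this input.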
    

    POWER:

    - single precision BLAS1/2 functions have received optimized POWER8 kernels
    - POWER9 is now a separate target, with an optimized DGEMM/DTRMM kernel
    - building on PPC970 systems under OSX Leopard or Tiger is now supported
    - out-of-bounds memory accesses in the gemm_beta microkernels were fixed
    - building a shared library on AIX is now supported for POWER6
    - DYNAMIC_ARCH support has been added for POWER6 and newer
    

    ARMV7:

    - corrected xDOT behaviour with zero INC_X or INC_Y 
    - a bug in the IMIN implementation that made it return the result of IMAX was fixed
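
    The note on xDOT does not spell out the corrected behaviour. Assuming it
    follows the loop used by the reference BLAS, a zero increment means the same
    element is read on every iteration. The Python sketch below shows that
    assumed behaviour; ref_dot is a hypothetical helper, not the ARMV7 kernel.

```python
def ref_dot(n, x, inc_x, y, inc_y):
    """Dot product with reference-BLAS style index stepping (assumed semantics)."""
    if n <= 0:
        return 0.0
    # A negative increment starts at the far end; zero or positive starts at 0.
    ix = (1 - n) * inc_x if inc_x < 0 else 0
    iy = (1 - n) * inc_y if inc_y < 0 else 0
    acc = 0.0
    for _ in range(n):
        acc += x[ix] * y[iy]
        ix += inc_x  # with inc_x == 0 the index never moves
        iy += inc_y
    return acc


x = [2.0, 10.0, 100.0]
y = [1.0, 2.0, 3.0]
print(ref_dot(3, x, 1, y, 1))  # 322.0, the ordinary dot product
print(ref_dot(3, x, 0, y, 1))  # 12.0, x[0] multiplied by every y element
print(ref_dot(3, x, 0, y, 0))  # 6.0, x[0] * y[0] accumulated three times
```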
    

    ARMV8:

    - added support for HiSilicon TSV110 cpus
    - the CMAKE build system now recognizes 32bit userspace on 64bit hardware 
    - cross-compilation with CMAKE now works again
    - a bug in the IMIN implementation that made it return the result of IMAX was fixed
    - ARMV8 builds with the BINARY=32 option are now automatically handled as ARMV7
    

    x86_64:

    - the AVX512 DGEMM kernel has been disabled again due to unsolved problems
    - building with old versions of MSVC was fixed
    - it is now possible to build a static library on Windows with CMAKE
    - accessing environment variables on CYGWIN at run time was fixed
    - the CMAKE build system now recognizes 32bit userspace on 64bit hardware
    - Intel "Denverton" Atom and Hygon "Dhyana" Zen CPUs are now autodetected
    - building for DYNAMIC_ARCH with a DYNAMIC_LIST of targets is now supported
      with CMAKE as well
    - building for DYNAMIC_ARCH with GENERIC as the default target is now supported
    - a buffer overflow in the SSE GEMM kernel for Intel Nano targets was fixed
    - assembly bugs involving undeclared modification of input operands were fixed
      in the AXPY, DOT, GEMV, GER, SCAL, SYMV and TRSM microkernels for Nehalem, 
      Sandybridge, Haswell, Bulldozer and Piledriver. These would typically cause
      test failures or segfaults when compiled with gcc 8 or later.
    - a similar bug was fixed in the blas_quickdivide code used to split workloads
      in most functions
    - a bug in the IxMIN implementation for the GENERIC target that made it return the result of IxMAX was fixed
    - fixed building on SkylakeX systems when either the compiler or the (emulated) operating 
      environment does not support AVX512
    - improved GEMM performance on ZEN targets
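
    For readers unfamiliar with the blas_quickdivide helper mentioned above,
    the following Python sketch shows one common way to do such a fast integer
    division: multiplying by a precomputed reciprocal and shifting. It is only
    an illustration of the idea, not the inline-assembly code that was fixed.
    The table size of 64, the rounding of the table entries, and the split
    helper are assumptions made for this example.

```python
SHIFT = 32
MAX_DIVISOR = 64  # assumed table size for this sketch

# TABLE[y] holds ceil(2**32 / y); entries 0 and 1 are unused placeholders.
TABLE = [0, 0] + [-(-(1 << SHIFT) // y) for y in range(2, MAX_DIVISOR + 1)]


def quickdivide(x, y):
    """Integer x // y via reciprocal multiplication (exact for small x)."""
    if y <= 1:
        return x
    if y > MAX_DIVISOR:
        return x // y  # fall back to ordinary division outside the table
    return (x * TABLE[y]) >> SHIFT


def split(n, nthreads):
    """Split n items into at most nthreads contiguous (start, stop) chunks."""
    chunks = []
    start = 0
    threads_left = nthreads
    while start < n:
        # Rounded-up share of the remaining work for the next thread.
        width = quickdivide(n - start + threads_left - 1, threads_left)
        chunks.append((start, start + width))
        start += width
        threads_left -= 1
    return chunks


print(split(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

    With these table entries the result matches ordinary integer division for
    the small operand sizes used here; a production version would need to state
    its exact valid range.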
    

    x86:

    - build failures caused by the recently added checks for AVX512 were fixed
    - an inline assembly bug involving undeclared modification of an input argument was
      fixed in the blas_quickdivide code used to split workloads in most functions
    - a bug in the IMIN implementation for the GENERIC target that made it return the result of IMAX was fixed
    

    MIPS32:

    - a bug in the IMIN implementation that made it return the result of IMAX was fixed
    

    IBM Z:

    - optimized microkernels for single precision BLAS1/2 functions have been added for Z13 and Z14
    
