OpenBLAS v0.3.11 Release Notes

Release Date: 2020-10-17
  • NOTE: this version appears to contain several defects and should not be redistributed or used in a production environment

    common:

    • API change:

      the newly added BFLOAT16 functions were renamed to use the
      letter "B" instead of "H" to avoid potential confusion with
      the IEEE "half precision float" type, i.e. the 0.3.10
      SHGEMM is now SBGEMM and the corresponding build option
      was changed from "BUILD_HALF" to "BUILD_BFLOAT16".
      
    • Reduced the default BLAS3_MEM_ALLOC_THRESHOLD (used as an upper
      limit for placing temporary arrays on the stack) to be compatible
      with a stack size of 1 MB (as imposed by the Java runtime library)

    • Added mixed-precision dot function SBDOT and utility functions
      shstobf16, shdtobf16, sbf16tos and dbf16tod to convert between
      single or double precision float arrays and bfloat16 arrays

    • Fixed prototypes of LAPACK_?ggsvp and LAPACK_?ggsvd functions
      in lapack.h

    • Fixed underflow and rounding errors in LAPACK SLANV2 and DLANV2
      (causing miscalculations in e.g. SHSEQR/DHSEQR, LAPACK issue #263)

    • Fixed workspace calculation in LAPACK ?GELQ (LAPACK issue #415)

    • Fixed several bugs in the LAPACK testsuite

    • Improved performance of TRMM and TRSM for certain problem sizes

    • Fixed infinite recursions and workspace miscalculations in ReLAPACK

    • CMAKE builds no longer require pkg-config for creating the .pc file

    • Makefile builds no longer misread NO_CBLAS=0 or NO_LAPACK=0 as
      enabling these options

    • Fixed detection of gfortran when invoked through an mpi wrapper

    • Improved thread reinitialization performance with OpenMP after a fork

    • Added support for building only the subset of the library required
      for a particular precision by specifying BUILD_SINGLE, BUILD_DOUBLE

    • Optional function name prefixes and suffixes are now correctly
      reflected in the generated cblas.h

    • Added CMAKE build support for the LAPACK and multithreading tests
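
    The bfloat16 conversion utilities and the mixed-precision SBDOT
    listed above work on a 16-bit format that keeps the sign bit, all
    8 exponent bits and the top 7 mantissa bits of an IEEE single.
    The Python sketch below only illustrates that layout. It does not
    call OpenBLAS; the helper names float_to_bf16, bf16_to_float and
    mixed_precision_dot are hypothetical, and round-to-nearest-even is
    an assumption, since these notes do not say how the library rounds.

```python
import struct


def float_to_bf16(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (assumed round-to-nearest-even)."""
    # Reinterpret x as an IEEE single and take its raw 32 bits.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: truncate and force a mantissa bit so the result stays a NaN.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040
    # Add a bias so that dropping the low 16 bits rounds to nearest, ties to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF


def bf16_to_float(h: int) -> float:
    """Widen a bfloat16 bit pattern by padding the low 16 mantissa bits with zeros."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


def mixed_precision_dot(x_bf16, y_bf16) -> float:
    """Dot product of two bfloat16 sequences, accumulated in a wider float.

    Illustration only: Python accumulates in double precision, which is
    an assumption of this sketch and not a statement about SBDOT.
    """
    acc = 0.0
    for a, b in zip(x_bf16, y_bf16):
        acc += bf16_to_float(a) * bf16_to_float(b)
    return acc
```

    Because the exponent field is unchanged, widening back to single
    precision is exact; only the narrowing direction loses mantissa
    bits.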

    POWER:

    • Added optimized support for POWER10
    • Added support for compiling for POWER8 in 32bit mode
    • Added support for compilation with LLVM/clang
    • Added support for compilation with NVIDIA/PGI compilers
    • Fixed building on big-endian POWER8
    • Fixed miscompilation of ZDOTC by gcc10
    • Fixed alignment errors in the POWER8 SAXPY kernel
    • Improved CPU detection on AIX
    • Supported building with older compilers on POWER9

    x86_64:

    • Added support for Intel Cooperlake
    • Added autodetection of AMD Renoir/Matisse/Zen3 cpus
    • Added autodetection of Intel Comet Lake cpus
    • Reimplemented ?sum, ?dot and daxpy using universal intrinsics
    • Reset the fpu state before using the fpu on Windows as a workaround
      for a problem introduced in Windows 10 build 19041 (a.k.a. SDK 2004)
    • Fixed potentially undefined behaviour in the dot and gemv_t kernels
    • Fixed a potential segmentation fault in DYNAMIC_ARCH builds
    • Fixed building for ZEN with PGI/NVIDIA and AMD AOCC compilers

    ARMV7:

    • Fixed cpu detection on BSD-like systems

    ARMV8:

    • Added preliminary support for Apple Vortex cpus
    • Added support for the Cavium ThunderX3T110 cpu
    • Fixed cpu detection on BSD-like systems
    • Fixed compilation in -std=C18 mode

    IBM Z:

    • Added support for compiling with the clang compiler
    • Improved GEMM performance on Z14

    md5sums:
    dd211b73398383a44ebd75fffabd937a OpenBLAS-0.3.11.tar.gz
    a76bfee7c125071bce6b24eae5b07468 OpenBLAS-0.3.11.zip
    bad36be9fe4fe40372b06d326cfc5a2f OpenBLAS-0.3.11-x64.zip
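
    The md5sums above can be compared against a downloaded archive.
    A minimal Python sketch using the standard hashlib module; the
    helper names md5_of_file and matches_published_md5 and the local
    file path are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected_hex):
    """Compare a file's MD5 with a published value, ignoring case and whitespace."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

    For example, matches_published_md5("OpenBLAS-0.3.11.tar.gz",
    "dd211b73398383a44ebd75fffabd937a") should return True for an
    intact copy of the tarball. An MD5 match guards against accidental
    corruption only and is not a defence against deliberate tampering.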