OpenBLAS v0.3.10 Release Notes

Release Date: 2020-06-14
    common:

    • Improved thread locking behaviour in blas_server and parallel getrf
    • Imported bugfix 394 from LAPACK (spurious reference to "XERBL"
      due to overlong lines)
    • Imported bugfix 403 from LAPACK (compile option "recursive" required
      for correctness with Intel and PGI)
    • Imported bugfix 408 from LAPACK (wrong scaling in ZHEEQUB)
    • Imported bugfix 411 from LAPACK (infinite loop in LARGV/LARTG/LARTGP)
    • Fixed mismatches between BUFFERSIZE and GEMM_UNROLL parameters that
      could lead to crashes at large matrix sizes
    • Restored internal soname in dynamic libraries on FreeBSD and Dragonfly
    • Added API (openblas_setaffinity) to set thread affinity
      programmatically on Linux
    • Added initial infrastructure for half-precision floating point
      (bfloat16) support with a generic implementation of SHGEMM
    • Added CMAKE build system support for building the cblas_Xgemm3m
      functions
    • Fixed CMAKE support for building in a path with embedded spaces
    • Fixed CMAKE (non)handling of NO_EXPRECISION and MAX_STACK_ALLOC
    • Fixed GCC version detection in the Makefiles
    • Allowed overriding the names of AR, AS and LD in Makefile builds
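
To make the bfloat16 item above concrete, here is a minimal sketch of the number format and of what a generic (unoptimized) GEMM over it might look like. It is not OpenBLAS code: the names bf16_t, float_to_bf16, bf16_to_float and naive_bf16_gemm are hypothetical, and the sketch assumes bfloat16 inputs with single-precision accumulation and output in column-major storage. The only fact relied on is that a bfloat16 value is the upper 16 bits of an IEEE 754 single-precision float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit IEEE 754 float");

/* Hypothetical storage type for this sketch, not OpenBLAS's own typedef. */
typedef uint16_t bf16_t;

/* Convert float32 to bfloat16 using round-to-nearest-even. */
static bf16_t float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: truncate and set a mantissa bit so the result stays NaN. */
        return (bf16_t)((bits >> 16) | 0x0040u);
    }
    /* Add 0x7fff, plus 1 if the kept LSB is odd, then drop the low half. */
    bits += 0x7fffu + ((bits >> 16) & 1u);
    return (bf16_t)(bits >> 16);
}

/* Widen bfloat16 back to float32 by zero-filling the low 16 bits. */
static float bf16_to_float(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Naive reference product C = alpha*A*B + beta*C (no transposes).
 * A is m-by-k and B is k-by-n in bfloat16; C is m-by-n in float32.
 * All matrices are column-major with leading dimensions lda, ldb, ldc. */
static void naive_bf16_gemm(int m, int n, int k, float alpha,
                            const bf16_t *a, int lda,
                            const bf16_t *b, int ldb,
                            float beta, float *c, int ldc) {
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            float acc = 0.0f; /* accumulate in single precision */
            for (int p = 0; p < k; p++) {
                acc += bf16_to_float(a[i + p * lda]) *
                       bf16_to_float(b[p + j * ldb]);
            }
            /* BLAS convention: C is not read when beta is zero. */
            c[i + j * ldc] = (beta == 0.0f)
                                 ? alpha * acc
                                 : alpha * acc + beta * c[i + j * ldc];
        }
    }
}
```

A caller would convert its float inputs with float_to_bf16, pass the resulting arrays to naive_bf16_gemm, and read the float result from C. An optimized kernel would block and vectorize the loops, but the arithmetic contract would stay the same under the assumptions stated above.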

    POWER:

    • Fixed big-endian POWER8 ELFv2 builds on FreeBSD
    • Fixed GCC version checks and DYNAMIC_ARCH builds on POWER9
    • Fixed CMAKE build support for POWER9
    • Fixed a potential race condition in the thread buffer allocation
    • Worked around LAPACK test failures on PPC G4

    MIPS:

    • Fixed a potential race condition in the thread buffer allocation
    • Added support for MIPS 24K/24KE family based on P5600 kernels

    MIPS64:

    • Fixed a potential race condition in the thread buffer allocation
    • Added TARGET=GENERIC

    ARMV7:

    • Fixed a race condition in the thread buffer allocation

    ARMV8:

    • Fixed a race condition in the thread buffer allocation
    • Fixed zero initialisation in the assembly for SGEMM and DGEMM BETA
    • Improved performance of the ThunderX2 DAXPY kernel
    • Added an optimized SGEMM kernel for Cortex A53
    • Fixed Makefile support for INTERFACE64 (8-byte integer)

    x86_64:

    • Fixed a syntax error in the CMAKE setup for SkylakeX
    • Improved performance of STRSM on Haswell, SkylakeX and Ryzen
    • Improved SGEMM performance for workloads where ldc is a
      multiple of 1024
    • Improved DGEMM performance on SkylakeX
    • Fixed unwanted AVX512-dependency of SGEMM in DYNAMIC_ARCH
      builds created on SkylakeX
    • Removed data alignment requirement in the SSE2 copy kernels
      that could cause spurious crashes
    • Added a workaround for an optimizer bug in AppleClang 11.0.3
    • Fixed LAPACK-TEST failures with Intel Fortran
    • Fixed compilation and LAPACK test results with recent Flang
      and AMD AOCC
    • Fixed DYNAMIC_ARCH builds with CMAKE on OS X
    • Fixed missing exports of cblas_i?amin, cblas_i?min, cblas_i?max,
      cblas_?sum, cblas_?gemm3m in the shared library on OS X
    • Fixed reporting of CPU name in DYNAMIC_ARCH builds (would sometimes
      show the name of an older generation chip supported by the same kernels)

    IBM Z:

    • Improved performance of SGEMM/STRMM and DGEMM/DTRMM on Z14
