OpenBLAS v0.3.4 Release Notes

Release Date: 2018-12-02
    common:

    • The new, experimental thread-local memory allocation had
      inadvertently been left enabled for gmake builds in 0.3.3
      despite the announcement. It is now disabled by default, and
      single-threaded builds will keep using the old allocator even
      if the USE_TLS option is turned on.
    • OpenBLAS will now provide enough buffer space for at least 50
      threads by default.
    • The output of openblas_get_config() now contains the version
      number.
    • A serious thread safety bug in the GEMV operation with small M and
      large N has been fixed.
    • The code will now automatically call blas_thread_init after a
      fork, if needed, before handling a call to openblas_set_num_threads.
    • Accesses to parallelized level3 functions from multiple callers
      are now serialized to avoid thread races (unless using OpenMP).
      This should provide better performance than the known-threadsafe
      (but non-default) USE_SIMPLE_THREADED_LEVEL3 option.
    • When building LAPACK with gfortran, -frecursive is now (again)
      enabled by default to ensure correct behaviour.
    • The OpenBLAS version of cblas.h now supports both CBLAS_ORDER and
      CBLAS_LAYOUT as the name of the matrix row/column order option.
    • Externally set LDFLAGS are now passed through to the final compile/link
      steps to facilitate setting platform-specific linker flags.
    • A potential race condition during the build of LAPACK (that would
      usually manifest itself as a failure to build TESTING/MATGEN) has been
      fixed.
    • xHEMV has been changed to stay single-threaded for small input sizes
      where the overhead of multithreading exceeds any possible gains.
    • CSWAP and ZSWAP have been limited to a single thread except on ARMV8 or
      ThunderX hardware with sizable input.
    • Linker flags for the PGI compiler have been updated.
    • Behaviour of AXPY with zero increments is now handled in the C interface,
      correcting the result on at least Intel Atom.
    • The result matrix from calling SGELSS with an all-zero input matrix is
      now zeroed completely.
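
To illustrate the AXPY item above, here is a minimal Python sketch of the
textbook strided AXPY loop (y := alpha*x + y). It shows what zero increments
mean under that loop: every iteration reads and updates the same element.
This models the reference semantics only and is not OpenBLAS's C interface
code; the name axpy_reference and its argument order are hypothetical, chosen
for this example.

```python
def axpy_reference(n, alpha, x, incx, y, incy):
    """Strided AXPY model: y := alpha * x + y over n elements, in place.

    Hypothetical helper for illustration; it mirrors the reference BLAS
    loop and is not taken from the OpenBLAS sources.
    """
    if n <= 0 or alpha == 0:
        return y  # nothing to do, as in the reference routine

    # With a negative increment the reference loop starts at the far end.
    ix = (1 - n) * incx if incx < 0 else 0
    iy = (1 - n) * incy if incy < 0 else 0

    for _ in range(n):
        y[iy] += alpha * x[ix]
        ix += incx  # a zero increment keeps pointing at the same element
        iy += incy
    return y


# Zero increments: y[0] receives alpha * x[0] a total of n times.
y = [1.0, 5.0]
axpy_reference(3, 2.0, [4.0], 0, y, 0)
print(y)  # [25.0, 5.0], i.e. 1.0 + 3 * (2.0 * 4.0); y[1] is untouched
```

An optimized kernel that assumes distinct elements can give a different
answer in this corner case, which is one reason to settle it in a single
place ahead of the architecture-specific kernels.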

    x86_64:

    • Autodetection of AMD Ryzen2 has been fixed (again).
    • CMAKE builds now support labeling of an INTERFACE64=1 build of
      the library with the _64 suffix.
    • An AVX512 version of DGEMM has been added, and the AVX512 SGEMM kernel
      has been sped up by rewriting it with C intrinsics.
    • Compilation on RHEL5/CentOS5 has been fixed (issue with typename
      __WAIT_STATUS).

    POWER:

    • Support for building on AIX (with gcc and GNU tools from AIX Toolbox)
      has been added.
    • CPU type detection has been implemented for AIX.
    • CPU type detection has been fixed for NetBSD.

    MIPS64:

    • AXPY on LOONGSON3A has been corrected to pass the "zero increment" utest.
    • DSDOT on LOONGSON3A has been fixed.
    • The SGEMM microkernel has been hardened against potential data loss.

    ARMV8:

    • DYNAMIC_ARCH support is now available for 64-bit ARM.
    • Cross-compiling for ARMV8 under iOS now works.
    • CPU-specific code has been rearranged to make better use of both
      hardware commonalities and model-specific compiler optimizations.
    • XGENE1 has been removed as a TARGET, superseded by the improved generic
      ARMV8 support.

    ARMV7:

    • Older assembly mnemonics have been converted to UAL form to allow
      building with clang 7.0.
    • Cross-compiling LAPACKE for Android has been fixed again (broken by
      the update to LAPACK 3.7.0 some time ago).
