OpenBLAS v0.3.7 Release Notes
Release Date: 2019-08-11
common:
- having the gmake special variables TARGET_ARCH or TARGET_MACH defined no longer causes build failures in ctest or utest
- defining NO_AFFINITY or USE_TLS to zero in gmake builds no longer has the same effect as setting them to one
- a new test program was added to allow checking the library for thread safety
- a new option USE_LOCKING was added to ensure thread safety when OpenBLAS itself is built without multithreading but will be called from multiple threads
- a build failure on Linux with glibc versions earlier than 2.5 was fixed
- a runtime error with CPU enumeration (and NO_AFFINITY not set) on glibc 2.6 was fixed
- NO_AFFINITY was added to the CMAKE options (and defaults to being active on Linux, as in the gmake builds)
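A minimal sketch of how the USE_LOCKING and NO_AFFINITY options might be passed on the command line, assuming an unpacked OpenBLAS source tree and, for CMake, a build directory created inside it. USE_THREAD=0 is the existing switch for a single-threaded build; it is not named in these notes and appears here only as an assumption about the typical use case for USE_LOCKING.

```shell
# gmake: single-threaded library that may still be called from several
# application threads (USE_THREAD=0 is an assumption, see above)
make USE_THREAD=0 USE_LOCKING=1

# CMake: state NO_AFFINITY explicitly rather than relying on the Linux default
mkdir -p build && cd build
cmake -DNO_AFFINITY=1 ..
```

With the fix listed above, passing NO_AFFINITY=0 or USE_TLS=0 to gmake should now be treated as "off" rather than "on".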
x86_64:
- the build-time logic for detection of AVX512 availability in the processor and compiler was fixed
- gmake builds on OSX now set the internal name of the library to libopenblas.0.dylib (consistent with CMAKE)
- the Haswell DGEMM kernel received a significant speedup through improved prefetch and load instructions
- performance of DGEMM, DTRMM, DTRSM and ZDOT on Zen/Zen2 was markedly increased by avoiding vpermpd instructions
- the SKYLAKEX (AVX512) DGEMM helper functions have now been disabled to fix remaining errors in DGEMM, DSYMM and DTRMM
POWER:
- added support for building on FreeBSD/powerpc64 and FreeBSD/ppc970
- added optimized kernels for POWER9 single and double precision complex BLAS3
- added optimized kernels for POWER9 SGEMM and STRMM
ARMV7:
- fixed the softfp implementations of xAMAX and IxAMAX
- removed the predefined -march= flags on both ARMV5 and ARMV6 as they were appropriate for only a subset of platforms