OpenBLAS v0.3.10 Release Notes
Release Date: 2020-06-14
common:
- Improved thread locking behaviour in blas_server and parallel getrf
- Imported bugfix 394 from LAPACK (spurious reference to "XERBL"
  due to overlong lines)
- Imported bugfix 403 from LAPACK (compile option "recursive" required
  for correctness with Intel and PGI)
- Imported bugfix 408 from LAPACK (wrong scaling in ZHEEQUB)
- Imported bugfix 411 from LAPACK (infinite loop in LARGV/LARTG/LARTGP)
- Fixed mismatches between BUFFERSIZE and GEMM_UNROLL parameters that
  could lead to crashes at large matrix sizes
- Restored internal soname in dynamic libraries on FreeBSD and Dragonfly
- Added API (openblas_setaffinity) to set thread affinity
  programmatically on Linux
- Added initial infrastructure for half-precision floating point
  (bfloat16) support with a generic implementation of SHGEMM
- Added CMAKE build system support for building the cblas_Xgemm3m
  functions
- Fixed CMAKE support for building in a path with embedded spaces
- Fixed CMAKE (non)handling of NO_EXPRECISION and MAX_STACK_ALLOC
- Fixed GCC version detection in the Makefiles
- Allowed overriding the names of AR, AS and LD in Makefile builds
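
For readers unfamiliar with the bfloat16 format named in the list above, the following is a minimal sketch of the storage format only, not OpenBLAS's implementation. A bfloat16 value has the same layout as the upper 16 bits of an IEEE 754 single-precision float (1 sign bit, 8 exponent bits, 7 mantissa bits). The type name `bf16_t` and all three helper functions are hypothetical names introduced for this illustration. Truncation instead of rounding, and accumulation in float, are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type name for this sketch; not a typedef taken from OpenBLAS. */
typedef uint16_t bf16_t;

/* Narrow a float to bfloat16 by keeping its upper 16 bits.
 * This truncates the low mantissa bits (no rounding), which is the
 * simplest possible conversion and an assumption of this sketch. */
static bf16_t float_to_bf16_trunc(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined way to read the bit pattern */
    return (bf16_t)(bits >> 16);
}

/* Widen a bfloat16 to float by zero-filling the low 16 mantissa bits.
 * This direction is exact: every bfloat16 value is representable as a float. */
static float bf16_to_float(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Reference dot product over bfloat16 inputs, accumulating in float.
 * A generic (unoptimized) mixed-precision kernel could be built from
 * loops of this kind; that design is assumed here, not quoted. */
static float bf16_dot(const bf16_t *a, const bf16_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += bf16_to_float(a[i]) * bf16_to_float(b[i]);
    return acc;
}
```

Because only 7 mantissa bits survive, values such as 1.0f, 2.0f and 0.5f round-trip exactly, while most other floats lose precision when narrowed.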
POWER:
- Fixed big-endian POWER8 ELFv2 builds on FreeBSD
- Fixed GCC version checks and DYNAMIC_ARCH builds on POWER9
- Fixed CMAKE build support for POWER9
- Fixed a potential race condition in the thread buffer allocation
- Worked around LAPACK test failures on PPC G4
MIPS:
- Fixed a potential race condition in the thread buffer allocation
- Added support for MIPS 24K/24KE family based on P5600 kernels
MIPS64:
- Fixed a potential race condition in the thread buffer allocation
- Added TARGET=GENERIC
ARMV7:
- Fixed a race condition in the thread buffer allocation
ARMV8:
- Fixed a race condition in the thread buffer allocation
- Fixed zero initialisation in the assembly for SGEMM and DGEMM BETA
- Improved performance of the ThunderX2 DAXPY kernel
- Added an optimized SGEMM kernel for Cortex A53
- Fixed Makefile support for INTERFACE64 (8-byte integer)
x86_64:
- Fixed a syntax error in the CMAKE setup for SkylakeX
- Improved performance of STRSM on Haswell, SkylakeX and Ryzen
- Improved SGEMM performance for workloads with ldc a
  multiple of 1024
- Improved DGEMM performance on SkylakeX
- Fixed unwanted AVX512-dependency of SGEMM in DYNAMIC_ARCH
  builds created on SkylakeX
- Removed data alignment requirement in the SSE2 copy kernels
  that could cause spurious crashes
- Added a workaround for an optimizer bug in AppleClang 11.0.3
- Fixed LAPACK-TEST failures with Intel Fortran
- Fixed compilation and LAPACK test results with recent Flang
  and AMD AOCC
- Fixed DYNAMIC_ARCH builds with CMAKE on OS X
- Fixed missing exports of cblas_i?amin, cblas_i?min, cblas_i?max,
  cblas_?sum, cblas_?gemm3m in the shared library on OS X
- Fixed reporting of CPU name in DYNAMIC_ARCH builds (would sometimes
  show the name of an older generation chip supported by the same kernels)
IBM Z:
- Improved performance of SGEMM/STRMM and DGEMM/DTRMM on Z14