OpenBLAS v0.3.13 Release Notes
Release Date: 2020-12-12
common:
- ➕ Added a generic bfloat16 SBGEMV kernel
- 🛠 Fixed a potentially severe memory leak after fork in OpenMP builds
that was introduced in 0.3.12
- ➕ Added detection of the Fujitsu Fortran compiler
- ➕ Added detection of the (e)gfortran compiler on OpenBSD
- ➕ Added support for overriding the default name of the library independently
from symbol suffixing in the gmake builds (already supported in cmake)
RISC V:
- ➕ Added a RISC V port optimized for C910V
POWER:
- ➕ Added optimized POWER10 kernels for SAXPY, CAXPY, SDOT, DDOT and DGEMV_N
- 👌 Improved DGEMM performance on POWER10
- 👌 Improved STRSM and DTRSM performance on POWER9 and POWER10
- 🛠 Fixed segmentation faults in DYNAMIC_ARCH builds
- 🛠 Fixed compilation with the PGI compiler
x86:
- 🛠 Fixed compilation of kernels that require SSE2 intrinsics since 0.3.12
x86_64:
- ➕ Added an optimized bfloat16 SBGEMV kernel for SkylakeX and Cooperlake
- 👌 Improved the performance of SASUM and DASUM kernels through parallelization
- 👌 Improved the performance of SROT and DROT kernels
- 👌 Improved the performance of multithreaded xSYRK
- 🛠 Fixed OpenMP builds that use the LLVM Clang compiler together with GNU gfortran
(where linking of both the LLVM libomp and GNU libgomp could lead to lockups or
wrong results)
- 🛠 Fixed miscompilations by old gcc 4.6
- 🛠 Fixed misdetection of AVX2 capability in some Sandybridge CPUs
- 🛠 Fixed lockups in builds combining DYNAMIC_ARCH with TARGET=GENERIC on OpenBSD
ARM64:
- 🛠 Fixed segmentation faults in DYNAMIC_ARCH builds
MIPS:
- 👌 Improved kernels for Loongson 3R3 ("3A") and 3R4 ("3B") models, including MSA
- 🛠 Fixed bugs in the MSA kernels for CGEMM, CTRMM, CGEMV and ZGEMV
- ➕ Added handling of zero increments in the MSA kernels for SSWAP and DSWAP
- ➕ Added DYNAMIC_ARCH support for MIPS64 (currently Loongson3R3/3R4 only)
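The notes do not say how the MSA kernels now treat zero increments. As a rough illustration of what a zero increment means for SSWAP and DSWAP, the sketch below mirrors the general-increment loop of the reference BLAS routine: a zero stride keeps that vector's index fixed while the other one advances. The function name `reference_swap` is hypothetical and is not part of OpenBLAS.

```python
def reference_swap(n, x, incx, y, incy):
    """Swap n elements of x and y in place, stepping by incx and incy.

    Illustrative only: follows the reference BLAS loop, not the MSA kernel.
    """
    # Negative increments start from the far end, as in the reference BLAS.
    ix = (1 - n) * incx if incx < 0 else 0
    iy = (1 - n) * incy if incy < 0 else 0
    for _ in range(n):
        x[ix], y[iy] = y[iy], x[ix]
        ix += incx  # stays put when incx == 0
        iy += incy  # stays put when incy == 0


# Zero increment on x: the single element of x is swapped with each y[i] in turn.
x = [10.0]
y = [1.0, 2.0, 3.0]
reference_swap(3, x, 0, y, 1)
print(x, y)  # [3.0] [10.0, 1.0, 2.0]
```

With both increments zero, the same pair is swapped n times, so the result depends only on whether n is odd or even.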
SPARC:
- 🛠 Fixed building 32 and 64 bit SPARC kernels with the SolarisStudio compilers
md5sum:
2ca05b9cee97f0d1a8ab15bd6ea2b747 OpenBLAS-0.3.13.tar.gz
ab433ae7ed37ad282a67c2cfcc7c4301 OpenBLAS-0.3.13.zip
855469f768c6e32cf68f9cdb6f5fa69e OpenBLAS-0.3.13-x64.zip
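
A downloaded archive can be checked against the checksums listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_release` are hypothetical, and the archive is assumed to sit in the current directory under its release file name.

```python
import hashlib

# Checksums copied from the md5sum list above.
EXPECTED_MD5 = {
    "OpenBLAS-0.3.13.tar.gz": "2ca05b9cee97f0d1a8ab15bd6ea2b747",
    "OpenBLAS-0.3.13.zip": "ab433ae7ed37ad282a67c2cfcc7c4301",
    "OpenBLAS-0.3.13-x64.zip": "855469f768c6e32cf68f9cdb6f5fa69e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_release(path, name):
    """True if the file at path has the MD5 listed for the release file name."""
    return md5_of_file(path) == EXPECTED_MD5[name]


if __name__ == "__main__":
    import os

    for name in EXPECTED_MD5:
        if os.path.exists(name):
            print(name, "OK" if matches_release(name, name) else "MISMATCH")
```

An MD5 match only guards against corrupted or truncated downloads; it is not a defence against deliberate tampering.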
Previous changes from v0.3.12
common:
- 🛠 Fixed missing BLAS/LAPACK functions (inadvertently dropped during
the build system restructuring to support selective compilation)
- 🛠 Fixed argument conversion macro in LAPACKE_zgesvdq (LAPACK #458)
POWER:
- ➕ Added optimized SCOPY/CCOPY kernels for POWER10
- 0️⃣ Increased and unified the default size of the GEMM buffer
- 🛠 Fixed building for POWER10 in DYNAMIC_ARCH mode
- ✅ POWER10 compatibility test now checks binutils version as well
- ⚠ Cleaned up compiler warnings
x86_64:
- Corrected compiler version checks for AVX2 compatibility
- ➕ Added compiler option -mavx2 for building with flang
- 🛠 Fixed direct SGEMM pathway for small matrix sizes (broken by
the code refactoring in 0.3.11)
- 🛠 Fixed unhandled partial register clobbers in several kernels
for AXPY, DOT, GEMV_N and GEMV_T flagged by gcc10 tree-vectorizer
ARMV8:
- 👌 Improved Apple Vortex support to include cross-compiling
md5sums:
03bff4558fc701b7d0e689814055ecb2 OpenBLAS-0.3.12.zip
baf8c58c0ef6ebe0f9eb74a5c4acd662 OpenBLAS-0.3.12.tar.gz
4df4ebb7b5c4f1b5ec8fa58f48be6a51 OpenBLAS-0.3.12-x64.zip