OpenBLAS v0.3.0 Release Notes
Release Date: 2018-05-23
common:
* fixed some more thread race and locking bugs
* added preliminary support for calling an OpenMP build of the library from multiple threads
* removed performance impact of thread locks added in 0.2.20 on OpenMP code
* general code cleanup
* optimized DSDOT implementation
* improved thread distribution for GEMM
* corrected IMATCOPY/OMATCOPY implementation
* fixed out-of-bounds accesses in the multithreaded xBMV/xPMV and SYMV implementations
* cmake build improvements
* pkgconfig file now contains build options
* openblas_get_config() now reports USE_OPENMP and NUM_THREADS settings used for the build
* corrections and improvements for systems with more than 64 cpus
* LAPACK code updated to 3.8.0 including later fixes
* added ReLAPACK, a recursive implementation of several LAPACK functions
* rewrote ROTMG to handle cases that the netlib code failed to address
* disabled (broken) multithreading code for xTRMV
* corrected prototypes of complex CBLAS functions to make our cblas.h match the generally accepted standard
* shared memory access failures on startup are now handled more gracefully
* restored utests from earlier releases (and made them pass on all affected systems)
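For readers unfamiliar with DSDOT, mentioned in the list above: it takes single-precision vectors but accumulates and returns the dot product in double precision. The following is a minimal reference sketch of that behaviour in plain C. It is not the optimized OpenBLAS kernel, and `ref_dsdot` is a hypothetical name chosen for illustration, not a library symbol.

```c
#include <stddef.h>

/* Reference sketch of DSDOT semantics (hypothetical helper, not OpenBLAS code):
   x and y hold single-precision values, but every product is widened to
   double before it is added to a double-precision accumulator. */
double ref_dsdot(int n, const float *x, int incx, const float *y, int incy)
{
    double acc = 0.0;
    ptrdiff_t ix, iy;
    int i;

    if (n <= 0)
        return 0.0;

    /* BLAS convention: with a negative increment the vector is walked
       backwards, so the first element used is at index (1 - n) * inc. */
    ix = (incx < 0) ? (ptrdiff_t)(1 - n) * incx : 0;
    iy = (incy < 0) ? (ptrdiff_t)(1 - n) * incy : 0;

    for (i = 0; i < n; i++) {
        acc += (double)x[ix] * (double)y[iy];
        ix += incx;
        iy += incy;
    }
    return acc;
}
```

The wider accumulator matters when large terms cancel: for x = {16777216, 1, -16777216} and y = {1, 1, 1} the intermediate sum 16777217 is not representable in single precision, but it is exact in double precision, so the small term survives.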
SPARC:
* several fixes for cpu autodetection
POWER:
* corrected vector register overwriting in several Power8 kernels
* optimized additional BLAS functions
ARM:
* added support for CortexA53 and A72
* added autodetection for ThunderX2T99
* made most optimized kernels the default for generic ARMv8 targets
x86_64:
* parallelized DDOT kernel for Haswell
* changed alignment directives in assembly kernels to boost performance on OSX
* fixed register handling in the GEMV microkernels (bug exposed by gcc7)
* added support for building on OpenBSD and Dragonfly
* updated compiler options to work with Intel release 2018
* support fully optimized build with clang/flang on Microsoft Windows
* fixed building on AIX
IBM Z:
* added optimized BLAS 1/2 functions
MIPS:
* fixed cpu autodetection helper code
* added mips32 1004K cpu (Mediatek MT7621 and similar SoC)
* added mips64 I6500 cpu