OpenBLAS v0.3.13 Release Notes

Release Date: 2020-12-12
    common:

    • Added a generic bfloat16 SBGEMV kernel
    • Fixed a potentially severe memory leak after fork in OpenMP builds
      that was introduced in 0.3.12
    • Added detection of the Fujitsu Fortran compiler
    • Added detection of the (e)gfortran compiler on OpenBSD
    • Added support for overriding the default name of the library independently
      from symbol suffixing in the gmake builds (already supported in cmake)
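The SBGEMV kernels listed here compute a matrix-vector product on bfloat16 data. The Python sketch below is only a rough illustration. It assumes the usual GEMV form y := alpha*A*x + beta*y, with a bfloat16 matrix A and vector x and single-precision alpha, beta and y. That argument layout is an assumption, not a transcription of the OpenBLAS interface. The helper names (float_to_bf16_bits, bf16_bits_to_float, sbgemv_ref) are hypothetical. bfloat16 keeps the upper 16 bits of an IEEE-754 float32 (sign, 8-bit exponent, 7-bit mantissa), so widening back to float32 is exact while narrowing loses precision.

```python
import struct

def float_to_bf16_bits(x: float) -> int:
    """Narrow a float32 value to bfloat16 by keeping its upper 16 bits.
    Plain truncation is used for simplicity; real converters often round
    to nearest even instead."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16

def bf16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern to float32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

def sbgemv_ref(m, n, alpha, a_bf16, x_bf16, beta, y):
    """Hypothetical reference for y := alpha*A*x + beta*y.
    A is an m x n row-major matrix stored as bfloat16 bit patterns,
    x is a bfloat16 vector of length n, and y (length m) is updated in place.
    Accumulation here uses Python floats (double precision), which may differ
    from an optimized kernel's accumulation precision."""
    for i in range(m):
        acc = 0.0
        for j in range(n):
            acc += bf16_bits_to_float(a_bf16[i * n + j]) * bf16_bits_to_float(x_bf16[j])
        y[i] = alpha * acc + beta * y[i]
    return y

if __name__ == "__main__":
    a = [float_to_bf16_bits(v) for v in (1.0, 2.0, 3.0, 4.0)]  # [[1, 2], [3, 4]]
    x = [float_to_bf16_bits(v) for v in (1.0, 1.0)]
    print(sbgemv_ref(2, 2, 1.0, a, x, 0.0, [0.0, 0.0]))  # prints [3.0, 7.0]
```

The test values are exactly representable in bfloat16, so the result is exact. With a value such as 0.1, the bfloat16 round trip yields a nearby but different number.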

    RISC V:

    • Added a RISC V port optimized for the C910V

    POWER:

    • Added optimized POWER10 kernels for SAXPY, CAXPY, SDOT, DDOT and DGEMV_N
    • Improved DGEMM performance on POWER10
    • Improved STRSM and DTRSM performance on POWER9 and POWER10
    • Fixed segmentation faults in DYNAMIC_ARCH builds
    • Fixed compilation with the PGI compiler

    x86:

    • Fixed compilation of kernels that require SSE2 intrinsics (broken since 0.3.12)

    x86_64:

    • Added an optimized bfloat16 SBGEMV kernel for SkylakeX and Cooperlake
    • Improved the performance of SASUM and DASUM kernels through parallelization
    • Improved the performance of SROT and DROT kernels
    • Improved the performance of multithreaded xSYRK
    • Fixed OpenMP builds that use the LLVM Clang compiler together with GNU gfortran
      (where linking both the LLVM libomp and the GNU libgomp could lead to lockups or
      wrong results)
    • Fixed miscompilations by the old gcc 4.6
    • Fixed misdetection of AVX2 capability on some Sandybridge CPUs
    • Fixed lockups in builds combining DYNAMIC_ARCH with TARGET=GENERIC on OpenBSD
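SASUM and DASUM compute the sum of absolute values of a strided vector. That reduction splits naturally into independent partial sums. The Python sketch below shows this partitioning idea with a thread pool. It is an assumption about the general approach, not a description of how OpenBLAS divides work among its threads, and asum_ref and asum_partitioned are hypothetical names. In CPython the GIL prevents real speedups for pure-Python arithmetic, so the sketch only demonstrates the split-and-combine structure.

```python
from concurrent.futures import ThreadPoolExecutor

def asum_ref(n, x, incx):
    """Sequential reference: sum of |x[i*incx]| for i in 0..n-1.
    Like reference BLAS, returns 0 for n <= 0 or incx <= 0."""
    if n <= 0 or incx <= 0:
        return 0.0
    return float(sum(abs(x[i * incx]) for i in range(n)))

def asum_partitioned(n, x, incx, nthreads=4):
    """Split the n logical elements into nthreads contiguous chunks,
    sum each chunk independently, then add the partial sums."""
    if n <= 0 or incx <= 0:
        return 0.0
    # Chunk t covers logical indices bounds[t] .. bounds[t+1]-1 (possibly empty).
    bounds = [(t * n) // nthreads for t in range(nthreads + 1)]

    def partial(lo, hi):
        return sum(abs(x[i * incx]) for i in range(lo, hi))

    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        parts = list(pool.map(partial, bounds[:-1], bounds[1:]))
    return float(sum(parts))
```

Floating-point addition is not associative, so for general inputs a partitioned sum can differ from the sequential one in the last bits. Small integer-valued inputs, like those in a quick check, are summed exactly either way.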

    ARM64:

    • Fixed segmentation faults in DYNAMIC_ARCH builds

    MIPS:

    • Improved kernels for Loongson 3R3 ("3A") and 3R4 ("3B") models, including MSA
    • Fixed bugs in the MSA kernels for CGEMM, CTRMM, CGEMV and ZGEMV
    • Added handling of zero increments in the MSA kernels for SSWAP and DSWAP
    • Added DYNAMIC_ARCH support for MIPS64 (currently Loongson3R3/3R4 only)
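Reference BLAS does not reject a zero increment, so SSWAP and DSWAP kernels have to cope with one. The Python sketch below follows the general-increment loop of the reference (Netlib) SSWAP/DSWAP routine to show what a zero increment means: the same element of that vector is touched on every iteration. It illustrates reference BLAS semantics only and is not OpenBLAS's MSA code. swap_ref is a hypothetical name.

```python
def swap_ref(n, x, incx, y, incy):
    """Swap n strided elements of x and y in place, mirroring the
    general-increment loop of reference BLAS ?SWAP (0-based indices here).
    A negative increment starts from the far end of the vector; a zero
    increment keeps hitting the same element."""
    if n <= 0:
        return
    ix = 0 if incx >= 0 else (1 - n) * incx
    iy = 0 if incy >= 0 else (1 - n) * incy
    for _ in range(n):
        x[ix], y[iy] = y[iy], x[ix]
        ix += incx
        iy += incy
```

With x = ['a'], y = ['p', 'q', 'r'], incx = 0 and incy = 1, the single element of x is swapped with each element of y in turn. This leaves x == ['r'] and y == ['a', 'p', 'q'], a rotation rather than a plain exchange. With both increments zero, an even n leaves both vectors unchanged.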

    SPARC:

    • Fixed building 32- and 64-bit SPARC kernels with the SolarisStudio compilers

    md5sum:
    2ca05b9cee97f0d1a8ab15bd6ea2b747 OpenBLAS-0.3.13.tar.gz
    ab433ae7ed37ad282a67c2cfcc7c4301 OpenBLAS-0.3.13.zip
    855469f768c6e32cf68f9cdb6f5fa69e OpenBLAS-0.3.13-x64.zip
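To check a download against these checksums, you could use a small Python script based on the standard hashlib module. The sketch below assumes the archive keeps the file name listed here. md5_of and verify are hypothetical helper names. MD5 catches corrupted or truncated downloads, but it is no defense against deliberate tampering.

```python
import hashlib
import os
import sys

# Checksums copied from the 0.3.13 md5sum list in these release notes.
EXPECTED_MD5 = {
    "OpenBLAS-0.3.13.tar.gz": "2ca05b9cee97f0d1a8ab15bd6ea2b747",
    "OpenBLAS-0.3.13.zip": "ab433ae7ed37ad282a67c2cfcc7c4301",
    "OpenBLAS-0.3.13-x64.zip": "855469f768c6e32cf68f9cdb6f5fa69e",
}

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path):
    """True if the file's MD5 matches the listed checksum for its base name."""
    expected = EXPECTED_MD5.get(os.path.basename(path))
    return expected is not None and md5_of(path) == expected

if __name__ == "__main__":
    # Usage: python check_md5.py OpenBLAS-0.3.13.tar.gz ...
    for p in sys.argv[1:]:
        print(p, "OK" if verify(p) else "MISMATCH or unknown file")
```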


Previous changes from v0.3.12

    common:

    • Fixed missing BLAS/LAPACK functions (inadvertently dropped during
      the build system restructuring to support selective compilation)
    • Fixed argument conversion macro in LAPACKE_zgesvdq (LAPACK #458)

    POWER:

    • Added optimized SCOPY/CCOPY kernels for POWER10
    • Increased and unified the default size of the GEMM buffer
    • Fixed building for POWER10 in DYNAMIC_ARCH mode
    • The POWER10 compatibility test now checks the binutils version as well
    • Cleaned up compiler warnings

    x86_64:

    • Corrected compiler version checks for AVX2 compatibility
    • Added the compiler option -mavx2 for building with flang
    • Fixed the direct SGEMM pathway for small matrix sizes (broken by
      the code refactoring in 0.3.11)
    • Fixed unhandled partial register clobbers in several kernels
      for AXPY, DOT, GEMV_N and GEMV_T flagged by the gcc 10 tree vectorizer

    ARMV8:

    • Improved Apple Vortex support to include cross-compiling

    md5sums:
    03bff4558fc701b7d0e689814055ecb2 OpenBLAS-0.3.12.zip
    baf8c58c0ef6ebe0f9eb74a5c4acd662 OpenBLAS-0.3.12.tar.gz
    4df4ebb7b5c4f1b5ec8fa58f48be6a51 OpenBLAS-0.3.12-x64.zip