PyTorch v0.3.1 Release Notes

Release Date: 2018-02-14
  • Binaries

    • Removed support for CUDA capability 3.0 and 5.0 in binaries (they still work in source builds for now, but we no longer commit to supporting them going forward)
    • Stopped binary releases for CUDA 7.5
    • Added CPU-only binary releases that are 10x smaller than the full binaries with CUDA support.

    As always, links to our binaries are on http://pytorch.org

    New features

    Bug Fixes

    Data Loader / Datasets / Multiprocessing

    • Made DataLoader workers more verbose on bus error and segfault. Additionally, added a timeout option to DataLoader, which raises an error if sample loading time exceeds the given value. #3474
    • DataLoader workers used to share the same random number generator (RNG) seed because of the semantics of the fork syscall. Now each worker has its RNG seed set to base_seed + worker_id, where base_seed is a random int64 value generated by the parent process. You can read this value with torch.initial_seed() in worker_init_fn and use it to seed other libraries (e.g. NumPy) before data loading. worker_init_fn is an optional argument that is called on each worker subprocess with the worker id as input, after seeding and before data loading (see the sketch after this list). #4018
    • Add additional signal handling in DataLoader worker processes for when workers abruptly die.
    • A negative value for num_workers now raises a ValueError #4019
    • Fixed a typo in the ConcatDataset.cumulative_sizes attribute name #3534
    • Accept longs in default_collate for DataLoader in Python 2 #4001
    • Re-initialize autograd engine in child processes #4158
    • Fix distributed DataLoader so it pins memory to the current GPU, not GPU 0. #4196
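
    A minimal sketch of the new per-worker seeding semantics and the timeout option; the dataset and constants here are illustrative, not part of the release:

        import numpy as np
        import torch
        from torch.utils.data import DataLoader, TensorDataset

        # Illustrative dataset: 100 samples with 3 features and a dummy target.
        dataset = TensorDataset(torch.randn(100, 3), torch.zeros(100))

        def worker_init_fn(worker_id):
            # Inside a worker, torch.initial_seed() reflects base_seed + worker_id,
            # so it can be used to derive seeds for other libraries such as NumPy.
            np.random.seed(torch.initial_seed() % 2 ** 32)

        loader = DataLoader(
            dataset,
            batch_size=10,
            num_workers=4,
            worker_init_fn=worker_init_fn,  # called in each worker after seeding
            timeout=30,                     # error if a sample takes longer than 30s
        )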

    CUDA / cuDNN

    • Allow cuDNN for fp16 batch norm #4021
    • Use the enabled argument in torch.autograd.profiler.emit_nvtx (it was being ignored; example below) #4032
    • Fix cuBLAS arguments for fp16 torch.dot #3660
    • Fix CUDA index_fill_ boundary check with small tensor size #3953
    • Fix CUDA multinomial checks #4009
    • Fix CUDA version typo in warning #4175
    • Initialize CUDA before setting CUDA tensor types as default #4788
    • Add missing lazy_init in the cuda Python module #4907
    • Fix lazy init ordering in set device; it should not be called in getDevCount #4918
    • Make torch.cuda.empty_cache() a no-op when CUDA is not initialized (example below) #4936
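
    A short sketch of the two behaviors flagged above (tensor shapes are illustrative; a CUDA device is required for the profiled section):

        import torch
        from torch.autograd import profiler

        # empty_cache() is now a no-op when CUDA has not been initialized yet,
        # so calling it early (or unconditionally) is safe.
        torch.cuda.empty_cache()

        # The enabled argument is now honored: with enabled=False the context
        # manager does nothing and no NVTX ranges are emitted.
        with profiler.emit_nvtx(enabled=False):
            y = torch.randn(4, 4).cuda().mm(torch.randn(4, 4).cuda())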

    CPU

    • Assert MKL ld* conditions for ger, gemm, and gemv #4056

    torch operators

    • Fix tensor.repeat when the underlying storage is not owned by torch (for example, coming from numpy) #4084
    • Add proper shape checking to torch.cat (see the sketch after this list) #4087
    • Add check for slice shape match in index_copy_ and index_add_. #4342
    • Fix use-after-free when advanced indexing tensors with tensors #4559
    • Fix triu and tril for zero-strided inputs on GPU #4962
    • Fix blas addmm (gemm) condition check #5048
    • Fix topk work size computation #5053
    • Fix reduction functions to respect the stride of the output #4995
    • Improve float precision stability of the linspace op, fixes #4419. #4470
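
    A minimal sketch of the stricter torch.cat shape checking (the shapes, and the assumption that the mismatch surfaces as a RuntimeError, are illustrative):

        import torch

        a = torch.randn(2, 3)
        b = torch.randn(2, 4)

        # All dimensions other than the concatenation dimension must match;
        # mismatches now raise a clear error instead of misbehaving.
        try:
            torch.cat([a, b], dim=0)
        except RuntimeError as e:
            print(e)

        c = torch.cat([a, torch.randn(5, 3)], dim=0)  # OK: shape (7, 3)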

    autograd

    • Fix Python GC race condition with THPVariable_traverse #4437

    nn layers

    • Fix padding_idx getting ignored in backward for Embedding(sparse=True) #3842
    • Fix cosine_similarity's output shape #3811
    • Add rnn args check #3925
    • NLLLoss now works for arbitrary dimensions (see the sketch after this list) #4654
    • More strict shape checking on Conv operators #4637
    • Fix maxpool3d / avgpool3d crashes #5052
    • Fix the setting that controls the use of running stats in InstanceNorm*d #4444
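
    A minimal sketch of NLLLoss on input with extra trailing dimensions, using the 0.3-era Variable API (shapes are illustrative):

        import torch
        import torch.nn as nn
        import torch.nn.functional as F
        from torch.autograd import Variable

        # Per-pixel classification: input (N, C, H, W), target (N, H, W).
        N, C, H, W = 2, 5, 4, 4
        log_probs = F.log_softmax(Variable(torch.randn(N, C, H, W)), dim=1)
        target = Variable(torch.LongTensor(N, H, W).random_(0, C))

        loss = nn.NLLLoss()(log_probs, target)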

    Multi-GPU

    • Fix DataParallel scattering for empty lists / dicts / tuples (see the sketch after this list) #3769
    • Fix reference cycles in DataParallel scatter and gather (fixes elevated memory usage) #4988
    • Broadcast output requires_grad only if the corresponding input requires_grad #5061
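
    A minimal sketch of passing an empty container through DataParallel; the Net module, shapes, and keyword argument are illustrative, and CUDA is required:

        import torch
        import torch.nn as nn
        from torch.autograd import Variable

        class Net(nn.Module):
            def forward(self, x, meta=()):
                # Empty tuples / lists / dicts passed as arguments are now
                # replicated to each device instead of breaking scatter.
                return x * 2

        model = nn.DataParallel(Net().cuda())
        out = model(Variable(torch.randn(8, 3).cuda()), meta=())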

    core

    • Remove hard file offset reset in load() #3695
    • Have sizeof account for the size of stored elements #3821
    • Fix undefined FileNotFoundError #4384
    • Make torch.set_num_threads also set MKL threads (take 2; see the sketch after this list) #5002
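
    A one-line sketch of the set_num_threads change (the thread count is illustrative):

        import torch

        # One call now configures both torch's own CPU thread pool and MKL's,
        # so plain torch ops and MKL-backed ops (e.g. gemm) use the same count.
        torch.set_num_threads(4)
        print(torch.get_num_threads())  # 4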

    others

    • Fix wrong learning rate evaluation in CosineAnnealingLR in Python 2 (see the sketch below) #4656
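
    A minimal usage sketch of CosineAnnealingLR (the model, optimizer, and hyperparameters are illustrative). The schedule is eta_min + (eta_max - eta_min) * (1 + cos(pi * T_cur / T_max)) / 2, and under Python 2 the T_cur / T_max term was evaluated with integer division:

        import torch
        import torch.nn as nn
        from torch.optim.lr_scheduler import CosineAnnealingLR

        model = nn.Linear(10, 2)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        scheduler = CosineAnnealingLR(optimizer, T_max=100, eta_min=0.001)

        for epoch in range(100):
            scheduler.step()  # lr now follows the cosine curve under Python 2 too
            # ... train for one epoch ...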

    ๐ŸŽ Performance improvements

    • Slightly simplified the math in IndexToOffset #4040
    • Improve performance of max pooling backward #4106
    • Add cuBLAS batched gemm support (see the sketch after this list). #4151
    • Rearrange dimensions for pointwise operations for better performance. #4174
    • Improve memory access patterns for index operations. #4493
    • Improve CUDA softmax performance #4973
    • Fixed double memory accesses of several pointwise operations. #5068
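
    A sketch of a batched matrix multiply, under the assumption that torch.bmm on CUDA tensors is among the ops that dispatch to the new cuBLAS batched gemm path (shapes are illustrative; CUDA is required):

        import torch

        # One batched GEMM call instead of a Python-level loop of 64 gemms.
        a = torch.randn(64, 32, 48).cuda()
        b = torch.randn(64, 48, 16).cuda()
        c = torch.bmm(a, b)  # shape (64, 32, 16)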

    Documentation and UX Improvements

    • ๐Ÿ‘ Better error messages for blas ops with cuda.LongTensor #4160
    • โž• Add missing trtrs, orgqr, ormqr docs #3720
    • ๐Ÿ”„ change doc for Adaptive Pooling #3746
    • ๐Ÿ›  Fix MultiLabelMarginLoss docs #3836
    • ๐Ÿ“„ More docs for Conv1d Conv2d #3870
    • ๐Ÿ‘Œ Improve Tensor.scatter_ doc #3937
    • 0๏ธโƒฃ [docs] rnn.py: Note zero defaults for hidden state/cell #3951
    • ๐Ÿ‘Œ Improve Tensor.new doc #3954
    • ๐Ÿ‘Œ Improve docs for torch and torch.Tensor #3969
    • โž• Added explicit tuple dimensions to doc for Conv1d. #4136
    • ๐Ÿ‘Œ Improve svd doc #4155
    • Correct instancenorm input size #4171
    • ๐Ÿ›  Fix StepLR example docs #4478