PyTorch v0.3.1 Release Notes
Release Date: 2018-02-14
Binaries
- Removed support for CUDA capability 3.0 and 5.0 (they still work in source builds for now, but the commitment to support them going forward is removed)
- Stopped binary releases for CUDA 7.5
- Added CPU-only binary releases that are 10x smaller in size than the full binaries with CUDA capabilities.
As always, links to our binaries are on http://pytorch.org
New features
- Add Cosine Annealing Learning Rate Scheduler #3311
- Add a `reduce` argument to `PoissonNLLLoss` to be able to compute unreduced losses #3770
- Allow `target.requires_grad=True` in `l1_loss` and `mse_loss` (compute loss with respect to `target`) #3876
- Add `random_split`, which randomly splits a dataset into non-overlapping new datasets of given lengths #4435
- Introduced scopes to annotate ONNX graphs for better TensorBoard visualization of models #5153
- Allow `map_location` in `torch.load` to be a string, such as `map_location='cpu'` or `map_location='cuda:2'` #4203
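A quick sketch of two of these additions, the `random_split` helper and the string form of `map_location` (written against a recent PyTorch; the dataset contents and file name are illustrative):

```python
import os
import tempfile

import torch
from torch.utils.data import TensorDataset, random_split

# random_split: partition a 100-sample dataset into non-overlapping
# subsets of the requested lengths.
dataset = TensorDataset(torch.arange(100).float().unsqueeze(1))
train_set, val_set = random_split(dataset, [80, 20])
print(len(train_set), len(val_set))  # 80 20

# map_location as a string: remap all storages to the CPU on load,
# instead of passing a callable or a dict.
path = os.path.join(tempfile.mkdtemp(), 'checkpoint.pt')
torch.save(torch.randn(3), path)
t = torch.load(path, map_location='cpu')
print(t.device)  # cpu
```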
Bug Fixes
Data Loader / Datasets / Multiprocessing
- Made DataLoader workers more verbose on bus errors and segfaults. Additionally, added a `timeout` option to the DataLoader, which will error if sample loading time exceeds the given value. #3474
- DataLoader workers used to all have the same random number generator (RNG) seed because of the semantics of the `fork` syscall. Now, each worker has its RNG seed set to `base_seed + worker_id`, where `base_seed` is a random int64 value generated by the parent process. You may use `torch.initial_seed()` to access this value in `worker_init_fn`, which can be used to set other seeds (e.g. NumPy) before data loading. `worker_init_fn` is an optional argument that is called on each worker subprocess with the worker id as input, after seeding and before data loading #4018
- Added additional signal handling in DataLoader worker processes for when workers abruptly die.
- A negative value for `n_workers` now gives a ValueError #4019
- Fixed a typo in the `ConcatDataset.cumulative_sizes` attribute name #3534
- Accept longs in `default_collate` for the DataLoader in Python 2 #4001
- Re-initialize the autograd engine in child processes #4158
- Fix the distributed DataLoader so it pins memory to the current GPU, not GPU 0. #4196
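A minimal sketch of the per-worker seeding described above, using `worker_init_fn` to derive seeds for other RNGs (Python's `random` here; the same pattern applies to NumPy). The toy dataset and batch size are illustrative:

```python
import random

import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # Inside a worker, torch.initial_seed() returns base_seed + worker_id,
    # so seeds derived from it differ across workers.
    random.seed(torch.initial_seed())

dataset = TensorDataset(torch.arange(8).float())
loader = DataLoader(dataset, batch_size=2, num_workers=2,
                    worker_init_fn=worker_init_fn)
```

With `num_workers=2`, each worker process calls `worker_init_fn` exactly once, after its torch seed is set and before it loads any data.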
CUDA / CuDNN
- Allow cuDNN for fp16 batch norm #4021
- Use the `enabled` argument in `torch.autograd.profiler.emit_nvtx` (it was being ignored) #4032
- Fix cuBLAS arguments for fp16 `torch.dot` #3660
- Fix CUDA `index_fill_` boundary check with small tensor size #3953
- Fix CUDA Multinomial checks #4009
- Fix CUDA version typo in warning #4175
- Initialize CUDA before setting CUDA tensor types as default #4788
- Add missing `lazy_init` in the cuda python module #4907
- Fix lazy init order in set device; it should not be called in getDevCount #4918
- Make `torch.cuda.empty_cache()` a no-op when CUDA is not initialized #4936
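The `empty_cache()` fix means the call below is safe even on a machine where CUDA has never been initialized (a small sketch against a recent PyTorch):

```python
import torch

# Before this fix, calling empty_cache() could touch the CUDA context;
# now it simply returns when no CUDA context exists, so this is safe
# to run even on CPU-only machines.
torch.cuda.empty_cache()
print('ok')
```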
CPU
- Assert MKL ld* conditions for ger, gemm, and gemv #4056
torch operators
- Fix `tensor.repeat` when the underlying storage is not owned by `torch` (for example, coming from numpy) #4084
- Add proper shape checking to `torch.cat` #4087
- Add a check for slice shape match in `index_copy_` and `index_add_`. #4342
- Fix use-after-free when advanced indexing tensors with tensors #4559
- Fix `triu` and `tril` for zero-strided inputs on GPU #4962
- Fix blas addmm (gemm) condition check #5048
- Fix topk work size computation #5053
- Fix reduction functions to respect the stride of the output #4995
- Improve float precision stability of the `linspace` op, fixes #4419. #4470
autograd
- Fix Python GC race condition with THPVariable_traverse #4437
nn layers
- Fix `padding_idx` getting ignored in backward for `Embedding(sparse=True)` #3842
- Fix `cosine_similarity`'s output shape #3811
- Add RNN args check #3925
- `NLLLoss` works for arbitrary dimensions #4654
- More strict shape check on Conv operators #4637
- Fix maxpool3d / avgpool3d crashes #5052
- Fix setting using running stats in `InstanceNorm*d` #4444
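A sketch of the `NLLLoss` change above: the loss now accepts inputs with arbitrary trailing dimensions, e.g. `(N, C, d1, d2)` log-probabilities with `(N, d1, d2)` targets (shapes here are illustrative; written against a recent PyTorch):

```python
import torch
import torch.nn.functional as F

# (N=2, C=5) class scores over a 3x3 spatial grid.
log_probs = F.log_softmax(torch.randn(2, 5, 3, 3), dim=1)
targets = torch.randint(0, 5, (2, 3, 3))

# nll_loss reduces over batch and all trailing dimensions by default.
loss = F.nll_loss(log_probs, targets)
print(loss.shape)  # torch.Size([]) - a scalar
```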
Multi-GPU
- Fix DataParallel scattering for empty lists / dicts / tuples #3769
- Fix refcycles in DataParallel scatter and gather (fixes elevated memory usage) #4988
- Broadcast output requires_grad only if corresponding input requires_grad #5061
core
- Remove hard file offset reset in `load()` #3695
- Have `sizeof` account for the size of stored elements #3821
- Fix undefined FileNotFoundError #4384
- Make `torch.set_num_threads` also set MKL threads (take 2) #5002
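The `set_num_threads` change means one call now configures both the intra-op and MKL thread counts (when PyTorch is built with MKL); a minimal check, with the thread count of 2 chosen arbitrarily:

```python
import torch

# One call now limits both PyTorch's own thread pool and MKL's,
# so BLAS calls honor the same limit.
torch.set_num_threads(2)
print(torch.get_num_threads())  # 2
```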
others
- Fix wrong learning rate evaluation in CosineAnnealingLR in Python 2 #4656
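For reference, a minimal sketch of driving `CosineAnnealingLR` (the fix above concerned its learning-rate arithmetic under Python 2); the model, base learning rate, and `T_max` are illustrative, written against a recent PyTorch:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = CosineAnnealingLR(opt, T_max=10)

for _ in range(10):
    opt.step()        # one (empty) optimization step per epoch
    scheduler.step()  # lr follows a cosine curve from 0.1 toward 0

print(round(opt.param_groups[0]['lr'], 6))  # ~0.0 after T_max steps
```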
Performance improvements
- Slightly simplified math in IndexToOffset #4040
- Improve performance of maxpooling backwards #4106
- Add cuBLAS batched gemm support. #4151
- Rearrange dimensions for pointwise operations for better performance. #4174
- Improve memory access patterns for index operations. #4493
- Improve CUDA softmax performance #4973
- Fixed double memory accesses of several pointwise operations. #5068
Documentation and UX Improvements
- Better error messages for blas ops with `cuda.LongTensor` #4160
- Add missing `trtrs`, `orgqr`, `ormqr` docs #3720
- Change doc for Adaptive Pooling #3746
- Fix MultiLabelMarginLoss docs #3836
- More docs for Conv1d and Conv2d #3870
- Improve `Tensor.scatter_` doc #3937
- [docs] rnn.py: note zero defaults for hidden state/cell #3951
- Improve `Tensor.new` doc #3954
- Improve docs for `torch` and `torch.Tensor` #3969
- Added explicit tuple dimensions to doc for Conv1d. #4136
- Improve svd doc #4155
- Correct instancenorm input size #4171
- Fix StepLR example docs #4478