PyTorch v0.3.1 Release Notes
Release Date: 2018-02-14
Binaries
- Removed support for CUDA capability 3.0 and 5.0 (they still work in source builds for now, but the commitment to support them going forward is removed)
- Stopped binary releases for CUDA 7.5
- Added CPU-only binary releases that are 10x smaller in size than the full binaries with CUDA capabilities.
As always, links to our binaries are on http://pytorch.org
New features
- Add Cosine Annealing Learning Rate Scheduler #3311
- Add a `reduce` argument to `PoissonNLLLoss` to be able to compute unreduced losses #3770
- Allow `target.requires_grad=True` in `l1_loss` and `mse_loss` (compute loss with respect to `target`) #3876
- Add `random_split`, which randomly splits a dataset into non-overlapping new datasets of given lengths #4435
- Introduced scopes to annotate ONNX graphs for better TensorBoard visualization of models #5153
- Allow `map_location` in `torch.load` to be a string, such as `map_location='cpu'` or `map_location='cuda:2'` #4203
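A quick sketch of two of these additions, the `random_split` helper and the string form of `map_location` (written against a recent PyTorch; the dataset contents and file name are illustrative):

```python
import os
import tempfile

import torch
from torch.utils.data import TensorDataset, random_split

# random_split: partition a 100-sample dataset into non-overlapping
# subsets of the requested lengths.
dataset = TensorDataset(torch.arange(100).float().unsqueeze(1))
train_set, val_set = random_split(dataset, [80, 20])
print(len(train_set), len(val_set))  # 80 20

# map_location as a string: remap all storages to the CPU on load,
# instead of passing a callable or a dict.
path = os.path.join(tempfile.mkdtemp(), 'checkpoint.pt')
torch.save(torch.randn(3), path)
t = torch.load(path, map_location='cpu')
print(t.device)  # cpu
```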
Bug Fixes
Data Loader / Datasets / Multiprocessing
- Made DataLoader workers more verbose on bus errors and segfaults. Additionally, added a `timeout` option to the DataLoader, which will error if sample loading time exceeds the given value. #3474
- DataLoader workers used to all have the same random number generator (RNG) seed because of the semantics of the `fork` syscall. Now, each worker has its RNG seed set to `base_seed + worker_id`, where `base_seed` is a random int64 value generated by the parent process. You may use `torch.initial_seed()` to access this value in `worker_init_fn`, which can be used to set other seeds (e.g. NumPy) before data loading. `worker_init_fn` is an optional argument that is called on each worker subprocess with the worker id as input, after seeding and before data loading #4018
- Added additional signal handling in DataLoader worker processes for when workers abruptly die.
- A negative value for `n_workers` now gives a ValueError #4019
- Fixed a typo in the `ConcatDataset.cumulative_sizes` attribute name #3534
- Accept longs in `default_collate` for the DataLoader in Python 2 #4001
- Re-initialize the autograd engine in child processes #4158
- Fix the distributed DataLoader so it pins memory to the current GPU, not GPU 0. #4196
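A minimal sketch of the per-worker seeding described above, using `worker_init_fn` to derive seeds for other RNGs (Python's `random` here; the same pattern applies to NumPy). The toy dataset and batch size are illustrative:

```python
import random

import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # Inside a worker, torch.initial_seed() returns base_seed + worker_id,
    # so seeds derived from it differ across workers.
    random.seed(torch.initial_seed())

dataset = TensorDataset(torch.arange(8).float())
loader = DataLoader(dataset, batch_size=2, num_workers=2,
                    worker_init_fn=worker_init_fn)
```

With `num_workers=2`, each worker process calls `worker_init_fn` exactly once, after its torch seed is set and before it loads any data.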
CUDA / CuDNN
- Allow cuDNN for fp16 batch norm #4021
- Use the `enabled` argument in `torch.autograd.profiler.emit_nvtx` (it was being ignored) #4032
- Fix cuBLAS arguments for fp16 `torch.dot` #3660
- Fix CUDA `index_fill_` boundary check with small tensor size #3953
- Fix CUDA Multinomial checks #4009
- Fix CUDA version typo in warning #4175
- Initialize CUDA before setting CUDA tensor types as default #4788
- Add missing `lazy_init` in the cuda python module #4907
- Fix lazy init order in set device; it should not be called in getDevCount #4918
- Make `torch.cuda.empty_cache()` a no-op when CUDA is not initialized #4936
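The `empty_cache()` fix means the call below is safe even on a machine where CUDA has never been initialized (a small sketch against a recent PyTorch):

```python
import torch

# Before this fix, calling empty_cache() could touch the CUDA context;
# now it simply returns when no CUDA context exists, so this is safe
# to run even on CPU-only machines.
torch.cuda.empty_cache()
print('ok')
```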
CPU
- Assert MKL ld* conditions for ger, gemm, and gemv #4056
torch operators
- Fix `tensor.repeat` when the underlying storage is not owned by `torch` (for example, coming from numpy) #4084
- Add proper shape checking to `torch.cat` #4087
- Add a check for slice shape match in `index_copy_` and `index_add_`. #4342
- Fix use-after-free when advanced indexing tensors with tensors #4559
- Fix `triu` and `tril` for zero-strided inputs on GPU #4962
- Fix blas addmm (gemm) condition check #5048
- Fix topk work size computation #5053
- Fix reduction functions to respect the stride of the output #4995
- Improve float precision stability of the `linspace` op, fixes #4419. #4470
autograd
- Fix Python GC race condition with THPVariable_traverse #4437
nn layers
- Fix `padding_idx` getting ignored in backward for `Embedding(sparse=True)` #3842
- Fix `cosine_similarity`'s output shape #3811
- Add RNN args check #3925
- `NLLLoss` works for arbitrary dimensions #4654
- More strict shape check on Conv operators #4637
- Fix maxpool3d / avgpool3d crashes #5052
- Fix setting using running stats in `InstanceNorm*d` #4444
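A sketch of the `NLLLoss` change above: the loss now accepts inputs with arbitrary trailing dimensions, e.g. `(N, C, d1, d2)` log-probabilities with `(N, d1, d2)` targets (shapes here are illustrative; written against a recent PyTorch):

```python
import torch
import torch.nn.functional as F

# (N=2, C=5) class scores over a 3x3 spatial grid.
log_probs = F.log_softmax(torch.randn(2, 5, 3, 3), dim=1)
targets = torch.randint(0, 5, (2, 3, 3))

# nll_loss reduces over batch and all trailing dimensions by default.
loss = F.nll_loss(log_probs, targets)
print(loss.shape)  # torch.Size([]) - a scalar
```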
Multi-GPU
- Fix DataParallel scattering for empty lists / dicts / tuples #3769
- Fix refcycles in DataParallel scatter and gather (fixes elevated memory usage) #4988
- Broadcast output requires_grad only if corresponding input requires_grad #5061
core
- Remove hard file offset reset in `load()` #3695
- Have `sizeof` account for the size of stored elements #3821
- Fix undefined FileNotFoundError #4384
- Make `torch.set_num_threads` also set MKL threads (take 2) #5002
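The `set_num_threads` change means one call now configures both the intra-op and MKL thread counts (when PyTorch is built with MKL); a minimal check, with the thread count of 2 chosen arbitrarily:

```python
import torch

# One call now limits both PyTorch's own thread pool and MKL's,
# so BLAS calls honor the same limit.
torch.set_num_threads(2)
print(torch.get_num_threads())  # 2
```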
others
- Fix wrong learning rate evaluation in CosineAnnealingLR in Python 2 #4656
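For reference, a minimal sketch of driving `CosineAnnealingLR` (the fix above concerned its learning-rate arithmetic under Python 2); the model, base learning rate, and `T_max` are illustrative, written against a recent PyTorch:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = CosineAnnealingLR(opt, T_max=10)

for _ in range(10):
    opt.step()        # one (empty) optimization step per epoch
    scheduler.step()  # lr follows a cosine curve from 0.1 toward 0

print(round(opt.param_groups[0]['lr'], 6))  # ~0.0 after T_max steps
```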
Performance improvements
- Slightly simplified math in IndexToOffset #4040
- Improve performance of maxpooling backwards #4106
- Add cuBLAS batched gemm support. #4151
- Rearrange dimensions for pointwise operations for better performance. #4174
- Improve memory access patterns for index operations. #4493
- Improve CUDA softmax performance #4973
- Fixed double memory accesses of several pointwise operations. #5068
Documentation and UX Improvements
- Better error messages for blas ops with `cuda.LongTensor` #4160
- Add missing `trtrs`, `orgqr`, `ormqr` docs #3720
- Change doc for Adaptive Pooling #3746
- Fix MultiLabelMarginLoss docs #3836
- More docs for Conv1d and Conv2d #3870
- Improve `Tensor.scatter_` doc #3937
- [docs] rnn.py: note zero defaults for hidden state/cell #3951
- Improve `Tensor.new` doc #3954
- Improve docs for `torch` and `torch.Tensor` #3969
- Added explicit tuple dimensions to doc for Conv1d. #4136
- Improve svd doc #4155
- Correct instancenorm input size #4171
- Fix StepLR example docs #4478