PyTorch v1.1.0 Release Notes

Release Date: 2019-05-01
  • Note: CUDA 8.0 is no longer supported

    Highlights

    TensorBoard (currently experimental)

    🌐 First-class and native support for visualization and model debugging with TensorBoard, a web application suite for inspecting and understanding training runs, tensors, and graphs. PyTorch now supports TensorBoard logging with a simple from torch.utils.tensorboard import SummaryWriter command. Histograms, embeddings, scalars, images, text, graphs, and more can be visualized across training runs. TensorBoard support is currently experimental. You can browse the docs here.

    [JIT] Attributes in ScriptModules

    Attributes can be assigned on a ScriptModule by wrapping them with torch.jit.Attribute and specifying the type. Attributes are similar to parameters or buffers, but can be of any type. They will be serialized along with any parameters/buffers when you call torch.jit.save(), so they are a great way to store arbitrary state in your model. See the docs for more info.

    Example:

    from typing import Dict, List

    import torch

    class Foo(torch.jit.ScriptModule):
      def __init__(self, a_dict):
        super(Foo, self).__init__(False)
        self.words = torch.jit.Attribute([], List[str])
        self.some_dict = torch.jit.Attribute(a_dict, Dict[str, int])

      @torch.jit.script_method
      def forward(self, input: str) -> int:
        self.words.append(input)
        return self.some_dict[input]
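
    As a quick, illustrative usage sketch, the attributes above travel with the module when it is saved and loaded:

    m = Foo({"hello": 1})
    m.forward("hello")            # appends "hello" to m.words and returns 1
    torch.jit.save(m, "foo.pt")   # words and some_dict are serialized with the module
    loaded = torch.jit.load("foo.pt")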
    

    [JIT] Dictionary and List Support in TorchScript

    TorchScript now has robust support for list and dictionary types. They behave much like Python lists and dictionaries, supporting most built-in methods, as well as simple comprehensions and for…in constructs.
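
    A minimal sketch of list and dict use inside TorchScript (the function names here are illustrative):

    import torch
    from typing import Dict, List

    @torch.jit.script
    def word_lengths(words: List[str]) -> Dict[str, int]:
        lengths = torch.jit.annotate(Dict[str, int], {})  # typed empty dict
        for w in words:                                    # for...in over a list
            lengths[w] = len(w)
        return lengths

    @torch.jit.script
    def doubled(xs: List[int]) -> List[int]:
        return [x * 2 for x in xs]                         # simple list comprehension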

    [JIT] User-defined classes in TorchScript (experimental)

    For more complex stateful operations, TorchScript now supports annotating a class with @torch.jit.script. Classes used this way can be JIT-compiled and loaded in C++ like other TorchScript modules. See the docs for more info.

    import torch

    @torch.jit.script
    class Pair:
        def __init__(self, first, second):
            self.first = first
            self.second = second

        def sum(self):
            return self.first + self.second
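
    A hypothetical use of Pair from a scripted function (Tensor arguments assumed):

    @torch.jit.script
    def sum_pair(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return Pair(a, b).sum()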
    

    DistributedDataParallel new functionality and tutorials

    nn.parallel.DistributedDataParallel: can now wrap multi-GPU modules, which enables use cases such as model parallel (tutorial) on one server and data parallel (tutorial) across servers. (19271).
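
    A hedged sketch of the new capability, assuming the process group environment is already set up and MyTwoGpuModel is a hypothetical module that places its layers on cuda:0 and cuda:1:

    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel

    dist.init_process_group(backend="nccl", init_method="env://")
    model = MyTwoGpuModel()                     # hypothetical multi-GPU module
    ddp_model = DistributedDataParallel(model)  # no device_ids when the module spans devices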

    Breaking Changes

    • Tensor.set_: the device of a Tensor can no longer be changed via Tensor.set_. This would most commonly happen when setting up a Tensor with the default CUDA device and later swapping in a Storage on a different CUDA device. Instead, set up the Tensor on the correct device from the beginning. (18832).
    • lr_scheduler.step(): note the change in call order; optimizer.step() should now be called before lr_scheduler.step() (see the sketch after this list). (7889).
    • torch.unique: changed the default value of sorted to True. (15379).
    • [JIT] Renamed the isTensor API to isCompleteTensor. #18437
    • [JIT] Remove GraphExecutor's python bindings. #19141
    • [C++]: many methods on Type no longer exist; use the functional or Tensor method equivalent. (17991).
    • [C++]: the Backend constructor of TensorOptions no longer exists. (18137).
    • [C++, Distributed]: c10d ProcessGroup::getGroupRank has been removed. (19147).
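
    The scheduler ordering change, illustrated (the model, optimizer, and schedule values here are arbitrary):

    import torch

    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

    for epoch in range(100):
        # ... forward, backward ...
        optimizer.step()    # update parameters first
        scheduler.step()    # then advance the learning-rate schedule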

    New Features

    Operators

    • torch.tril_indices, torch.triu_indices: added operator with same behavior as NumPy. (14904, 15203).
    • torch.combinations, torch.cartesian_prod: added new itertools-like operators. (9393).
    • torch.repeat_interleave: new operator similar to numpy.repeat. (18395).
    • torch.from_file: new operator similar to Storage.from_file, but returning a tensor. (18688).
    • torch.unique_consecutive: new operator with semantics similar to std::unique in C++. (19060).
    • torch.tril, torch.triu, torch.trtrs: now support batching. (15257, 18025).
    • torch.gather: add support for sparse_grad option. (17182).
    • torch.std, torch.max_values, torch.min_values, torch.logsumexp can now operate over multiple dimensions at once. (14535, 15892, 16475).
    • torch.cdist: added operator equivalent to scipy.spatial.distance.cdist. (16168, 17173).
    • torch.__config__.show(): reports detailed version of all libraries. (18579).
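
    A few of the operators above on toy inputs (the values shown are illustrative):

    import torch

    torch.repeat_interleave(torch.tensor([1, 2, 3]), 2)  # tensor([1, 1, 2, 2, 3, 3])
    torch.combinations(torch.tensor([1, 2, 3]), r=2)     # all pairs, like itertools.combinations
    torch.tril_indices(3, 3)                             # row/col indices of the lower triangle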

    NN

    • nn.MultiheadAttention: new module implementing multi-head attention from Attention Is All You Need. (18334).
    • nn.functional.interpolate: added support for bicubic. (9849).
    • nn.SyncBatchNorm: support synchronous Batch Normalization. (14267).
    • nn.Conv: added support for Circular Padding via mode='circular'. (17240).
    • nn.EmbeddingBag: now supports trainable per_sample_weights. (18799).
    • nn.EmbeddingBag: add support for from_pretrained method, as in nn.Embedding. (15273).
    • RNNs: automatically handle unsorted variable-length sequences via enforce_sorted. (15225).
    • nn.Identity: new module for easier model surgery. (19249).
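
    Two quick illustrations of the additions above: nn.Identity for model surgery, and unsorted variable-length sequences via enforce_sorted=False (shapes and lengths are illustrative):

    import torch
    import torch.nn as nn
    from torch.nn.utils.rnn import pack_padded_sequence

    model = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
    model[1] = nn.Identity()          # swap out a layer without touching surrounding code

    seqs = torch.zeros(3, 5, 4)       # batch of 3 padded sequences, max length 5
    lengths = torch.tensor([3, 5, 2]) # not sorted by length
    packed = pack_padded_sequence(seqs, lengths, batch_first=True, enforce_sorted=False)
    output, hidden = nn.RNN(4, 6, batch_first=True)(packed)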

    Tensors / dtypes

    • torch.bool: added support for torch.bool dtype and Tensors with that dtype (1-byte storage). NumPy conversion is supported, but operations are currently limited. (16810).
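
    A minimal illustration of the NumPy round-tripping described above:

    import numpy as np
    import torch

    mask = torch.from_numpy(np.array([True, False, True]))  # torch.bool tensor, 1 byte per element
    arr = mask.numpy()                                       # round-trips back to NumPy bools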

    Optim

    • optim.lr_scheduler.CyclicLR: support for Cyclical Learning Rate and Momentum. (18001).
    • optim.lr_scheduler.CosineAnnealingWarmRestarts: new scheduler implementing Stochastic Gradient Descent with Warm Restarts. (17226).
    • Support multiple simultaneous LR schedulers. (14010)
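
    A minimal sketch of the new cyclical scheduler (the model and schedule bounds are illustrative):

    import torch

    model = torch.nn.Linear(4, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.001, max_lr=0.1)

    for step in range(1000):
        # ... forward, backward ...
        optimizer.step()
        scheduler.step()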

    Distributions

    • torch.distributions: now support multiple inheritance. (16772).

    Samplers

    • quasirandom.SobolEngine: new sampler. (10505).

    DistributedDataParallel

    • nn.parallel.DistributedDataParallel: now supports modules with unused parameters (e.g., control flow such as adaptive softmax). (18251, 18953).

    TorchScript and Tracer

    • Allow early returns from if-statements. (#154463)
    • Add an @ignore annotation, which statically tells the TorchScript compiler to ignore the Python function. (#16055)
    • Simple for...in loops on lists. (#16726)
    • Ellipses (...) in Tensor indexing. (#17763)
    • None in Tensor indexing. (#18615)
    • Support for basic list comprehensions. (#17267)
    • Add implicit unwrapping of optionals on if foo is not None. (#15587)
    • Tensors, ints, and floats will once again be implicitly cast to bool if used in a conditional. (#18755).
    • Implement to(), cpu(), and cuda() on ScriptModules. (#15340, #15904)
    • Add support for various methods on lists: clear(), pop(), reverse(), copy(), extend(), index(), count(), insert(), remove().
    • Add support for sort() on lists of specialized type (Tensors, int, float, bool). (#19572)
    • Add support for various methods on strings: index(), slice(), len().
    • Support Tensor.to() in TorchScript. (#15976)
    • Support for torch.tensor() in TorchScript. (#14913, #19445)
    • Support for torch.manual_seed() in TorchScript. (#19510)
    • Support for nn.LSTM in TorchScript. (#15744)
    • Support for nn.init in TorchScript. (#19640)
    • Add hash() builtin. (#18258)
    • Add min() and max() builtins for numerical types. (#15680)
    • Add isinstance() builtin, which performs a static type check. (#15076)
    • Add train() / eval() / is_training() to C++ ScriptModule API. (#16044)
    • Allow List arguments to Python functions called from TorchScript. (#15721)
    • Allow using std::vector and std::unordered_map as arguments to custom operators. (#17587)
    • Tracer: now allows passing static dicts and lists as trace inputs. (#18092, #19580)
    • Allow generic containers as ScriptModule inputs. (#16482)
    • Allow nn.Sequential in ModuleList. (#16882)
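
    A compact sketch touching a few of the constructs above (early returns, for...in over a list, and implicit Optional unwrapping); the function itself is illustrative:

    import torch
    from typing import List, Optional

    @torch.jit.script
    def first_positive(xs: List[int], default: Optional[int]) -> int:
        for x in xs:
            if x > 0:
                return x        # early return from inside an if-statement
        if default is not None:
            return default      # 'default' is implicitly unwrapped to int here
        return 0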

    Experimental Features

    • [Quantization] (API unstable): added limited support for quantized datatypes via the torch.qint8 dtype and the torch.quantize_linear conversion function. (18230).
    • [MKLDNN tensor] (API unstable): added limited (opaque) support for MKLDNN tensors via Tensor.to_mkldnn(); operator coverage is currently limited to the operators used in ResNext101. (17748).
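
    A brief sketch of the opaque MKLDNN conversion mentioned above (CPU tensors only; the shape is illustrative):

    import torch

    x = torch.randn(8, 3, 32, 32)
    y = x.to_mkldnn()    # opaque MKLDNN layout; only a limited set of ops accept it
    z = y.to_dense()     # convert back to a regular dense CPU tensor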

    Improvements

    • torch.min, torch.max, torch.median, torch.mode, torch.kthvalue, torch.symeig, torch.eig, torch.pstrf, torch.qr, torch.geqrf, torch.solve, torch.slogdet, torch.sort, torch.topk, torch.gels, torch.triangular_solve, torch.svd now return namedtuples describing their outputs (see the sketch after this list). (16186, 16950, 17093, 17195, 15429).
    • torch.empty (and other factory functions): now take a pin_memory kwarg; memory can now be pinned without going through the torch.Storage interface. (18455).
    • torch.histc: Now supported on CUDA. (15842)
    • torch.unique: Add return_counts. (18391, 18651).
    • torch.logspace: add the ability to specify a base. (19542).
    • torch.set_printoptions: added scientific notation support. (16876).
    • torch.btrifact now handles tensors with greater than 3 dimensions. (14964).
    • torch.kthvalue: now supported on CUDA. (17544).
    • torch.abs: now supported on uint8 and int8 dtypes. (16893).
    • torch.stack, torch.cat: now supported for CPU half tensors. (16389).
    • torch.cross: added support for negative dimensions. (17582).
    • torch.lerp: add support for weight as a Tensor. (17348).
    • torch.transpose: Made consistent with NumPy: 1-d and 0-d arrays are accepted and returned as-is. (17462, 17535).
    • torch.linspace, torch.logspace can now be used with steps=1 and start != end. (14748).
    • torch.cholesky: changed the derivative from a triangular matrix to symmetric matrix. (19116).
    • torch.lerp: Improved numerical stability. (18871).
    • torch.logdet, torch.slogdet: improve numerical precision. (18449).
    • Tensor.__contains__ is now supported. (17733).
    • Tensor.fill_ and torch.zeros now support half on CPU. (17536).
    • Tensor.resize_as_, Tensor.view: now supported on half CPU tensors. (18821).
    • Tensor indexing: allow indexing via NumPy booleans. (14932).
    • nn.EmbeddingBag: enable half precision dense backward. (19293).
    • nn.Embedding: fix dense Embedding to work with double backwards. (9078).
    • nn.MaxPool1d: Allow list and tuples to be passed as output_size. (16489).
    • nn.CTCLoss: support zeroing infinite losses via zero_infinity argument. (16199).
    • nn.Dropout: add support for enabling during eval. (17549).
    • nn.MSELoss: add warning about unexpected broadcasting. (18349).
    • nn.Module.load_state_dict: also return missing_keys and unexpected_keys. (18668).
    • nn.parallel.data_parallel: Enforce devices match device_ids. (17129).
    • torch.device: handle in more places that used to accept only device ordinals. (14929)
    • dtype.int8 tensors can now be converted to NumPy arrays. (14710).
    • nn.functional.gumbel_softmax: allow multidimensional input with dim argument. (13339).
    • nn.functional.cosine_similarity: improved precision. (18250).
    • torch.autograd: Don't keep unnecessary saved_inputs alive, increasing memory efficiency. (16583).
    • torch.autograd.profiler: report self (non-nested) CPU time in addition to total CPU time. (19378).
    • DataLoader: support accepting a custom memory pinning function. (16743).
    • DataLoader: retry libshm on EINTR. (15964).
    • DataLoader: fixed an issue with pin_memory and PackedSequence. (18079)
    • data.utils.collate, data.utils.pin_memory: now preserve namedtuples. (16440)
    • Use IndexError instead of RuntimeError on many indexing error cases. (17049, 17114).
    • Support indexing a torch.float16 tensor on CPU. (17645).
    • Add (limited) error checking in case of internal overlap on inplace operators. (19317, 17927).
    • utils.checkpoint.checkpoint: support None as an argument to checkpoint function. (17969).
    • torch.autograd: added more information to the "one of the variables needed for gradient computation has been modified by an inplace operation" exception. (18523).
    • cuda.synchronize: add a device argument. (19573).
    • cuda.reset_max_memory_*: now supported. (15985).
    • distributions.Independent: can now calculate KL Divergence. (17681).
    • torch.distributed.new_group: now supports overriding default backend. (18595).
    • torch.distributed.init_process_group: will now propagate timeout to underlying Store. (16571).
    • [JIT] Preserve module hierarchy on traced modules. (#15101)
    • [JIT] Add metadata for TracedModules. (#17311)
    • [JIT] Improve portability of int and float checks. (#19532)
    • [JIT] Preserve method parameter names during serialization. (#16750)
    • [JIT] Add a correctness check for C++ types to custom operators. (#15247)
    • [JIT] Added a few extra python bindings to help with walking the IR graph from Python. #17822
    • [JIT Error Messages] Print out operator suggestions for "unknown builtin op" error. (#15183)
    • [JIT Error Messages] Better error message when creating a module instance in TorchScript. (#16416)
    • [JIT Error Messages] Print suggestion to add nn.Module attributes to __constants__ when they are used in TorchScript. (#18164)
    • [JIT Error Messages] torch.save(): Improve error message when you try to save a ScriptModule. (#15321)
    • [JIT Error Messages] torch.jit.save(): Improve error message when trying to save a model with Python code. (#16850)
    • [JIT Error Messages] Better errors when trying to close over a Tensor with grad enabled while tracing. (#18298, #19645)
    • [JIT Error Messages] Better error when trying to add a Tensor to __constants__. (#16724)
    • [JIT Error Messages] Better error when a module list isn't added to __constants__. (#17167)
    • [JIT Error Messages] Add a warning when attempting to trace legacy constructors. (#16770)
    • [JIT Error Messages] Improve hint when trying to trace non-deterministic nodes. (#17957)
    • [C++] nn::Module: added Python interop. (13481).
    • [C++] autograd::profiler: is now supported. (16580)
    • [C++] allow detection of C++ ABI flag for cpp extensions from available runtime information. (18994).
    • [C++] torch.argsort is now supported in C++. (17099).
    • [C++] Tensor.isnan: now supported in C++. (15722).
    • [C++]: Added named submodule support to nn::Sequential. (17552).
    • [C++]: Kaiming Initialization. (14718).
    • [C++] torch::data::transforms::Normalize: now supported in C++. (15891).
    • [C++]: Support call operator on module holder calling forward. (15831).
    • [C++]: Random and Sequential distributed samplers. (16910).
    • [C++]: pretty printing of C++ Modules. (15326).
    • [C++] Support serializing std::vector<torch::Tensor>. (19677).
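
    The namedtuple return values referenced above, illustrated with torch.sort (the input values are arbitrary):

    import torch

    result = torch.sort(torch.tensor([3.0, 1.0, 2.0]))
    result.values               # tensor([1., 2., 3.])
    result.indices              # tensor([1, 2, 0])
    values, indices = result    # still unpacks like a plain tuple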

    Bug Fixes

    Serious

    • torch.prod: correct erroneous calculation on large tensors. (15653).
    • torch.mean (and other reductions): fix incorrect calculation on CUDA on large inputs. (16023).
    • nn.Conv: correctly handle non-contiguous inputs on MKLDNN convolution codepath. (16300).
    • Tensor.eq_: Fix erroneous calculation. (15475).
    • torch.mean: Fix fp16 output calculation. (14878).
    • nn.PoissonNLLLoss: Properly handle reduction=None. (17358).
    • [JIT] Fix bug where custom ops could get optimized out if their outputs weren't used. (#18711).
    • [JIT] Fix bug where the model serializer would accidentally reorder statements. (#17557).

    Other

    • Tensor.round: now consistently rounds half to even. (17443).
    • Tensor.resize_: Fix some 0-element cases. (14874).
    • Tensor.numpy: Fix conversion of torch.int8 dtype. (15194).
    • Tensor.grad: correctly handle del. (16525).
    • Tensor.clamp: correctly handle NaN on CUDA. (15479).
    • Tensor.topk: properly set launch bounds on CUDA. (17296).
    • Tensor.kthvalue: treat NaN as bigger than any number. (17824).
    • Tensor.copy_: Properly synchronize on src and dst streams. (16966).
    • Tensor indexing: Fix incorrect dimension error message. (16495).
    • Tensor.coalesce, Tensor.clone, Tensor.to_dense: fixed for sparse 0-dimensional tensors. (17379).
    • torch.isinf: Don't error out on integral tensors. (15489).
    • torch.argsort, torch.sort: Match NumPy by considering NaNs to be larger than any number. (15886).
    • torch.geqrf, torch.ormqr: when an out parameter is specified, dispatch to the correct function. (16964).
    • torch.cuda.get_device_name / torch.cuda.get_device_capability: Fix handling of optional. (17222).
    • Tensor.tril_ / Tensor.triu_: properly reuse input memory. (17031).
    • torch.arange: fix shape inconsistency between CPU and CUDA. (18462).
    • torch.empty (and other size-based factory functions): properly enforce non-negative sizes. (17077).
    • torch.load: support serializing / deserializing pathlib.Path object. (18562).
    • nn.BatchNorm: correctly handle very large batches. (17047).
    • nn.Softmax / nn.LogSoftmax: fix double backward for torch.half. (17330).
    • nn.Softmax: handle empty inputs in backward. (17259).
    • nn.NLLLoss: Fix crash when ignore_index is out-of-bounds on CPU. (17328).
    • nn.Softmax, nn.LogSoftmax: handle 0-element inputs. (17651).
    • nn.CTCLoss: correct error checking. (16269).
    • nn.Conv: report convolution size mismatches more clearly. (17436).
    • torch.nn.functional.cosine_similarity: fix output sometimes returning result > 1.0. (18168).
    • nn.parallel.data_parallel: Fix handling of buffers that require_grad. (13352).
    • nn.parallel.data_parallel: fixed a case where tensors could be freed before all pending operations on them finished. (18465).
    • torch.distributed.broadcast: fixed repeated calls leading to OOM. (19219).
    • torch.multiprocessing: fix serialization of integer nn.Parameters. (18639).
    • torch.multiprocessing: Fix handling of distributions on CUDA. (16854).
    • torch.nonzero: Fix for 0-dimensional tensors on CUDA. (17406).
    • torch.slogdet: Fix sign requiring grad when input required grad. (16337).
    • torch.cuda.Stream: Properly restore stream on destination device when switching devices. (17439).
    • torch.cuda.Stream: Fixed synchronization issue when used with non-current device. (15689).
    • torch.cuda.Stream: properly change device in stream context manager. (16128).
    • DataLoader: fixed a hang when no data was read and the buffer size is smaller than the chunk size. (17409).
    • DataLoader: _utils.collate.default_collate now converts bool lists to byte Tensors, not integer tensors. (14669).
    • DataLoader: ensure dataset is indexed by integers. (17649).
    • torch.sparse.mm: Handle transposed dense tensors in backwards. (18737).
    • torch.sparse.sum: Fix parsing of dim. (16517).
    • torch.sparse.mm / torch.sparse.addmm: fix broadcasting and using uninitialized data. (16572).
    • Tensor.to_sparse: Fix for 0-dimensional tensors. (17406).
    • SparseTensor: fix add with non-contiguous values tensors. (18179).
    • Fix compare_exchange_weak in weak_intrusive_ptr. (16302).
    • utils.model_zoo.load_url: Fix race condition. (16578).
    • utils.data.RandomSampler: have len properly take into account num_samples. (15991).
    • torch.distributions: Fix precision issue with expansion that prefers probs over logits. (18614).
    • distributions.dirichlet.Dirichlet: fixed an underflow issue. (17488).
    • distributions.binomial.Binomial.log_prob: fixed numerical stability issue. (15962).
    • Caching Allocator: Free all blocks with outstanding events on OOM-retry. (19222).
    • torch.dtype: fix pickling issue with Python 2. (18045).
    • utils.data.DataLoader: Fix SIGCHLD checking. (19421).
    • optim.Optimizer: Properly copy defaults. (19308).
    • optim.lr_scheduler.CosineAnnealingLR: Fix division-by-zero error. (19180).
    • optim.lr_scheduler.ReduceLROnPlateau: fix bug when the argument to step is reused outside the function. (16697).
    • cuDNN: fix race condition with multiple threads calling into the same device. (15080).
    • cuDNN: Properly specify accumulation types. (16825).
    • cuDNN: Fix incorrectly selecting slower algorithms in certain cases. (15881).
    • cuFFT: Properly handle CUDA contexts. (19300)
    • Fix infinite loop in reduction functions when get_max_threads is nonzero but num_threads is 1. (15114).
    • Fix tensor printing bug with Python 2. (12732).
    • MKLDNN: fix thread safety. (17022).
    • [JIT] floordiv: Fix integer division and divide-by-zero semantics. (#15813).
    • [JIT] Fix bug in alias analysis that disabled optimizations even in models without mutation. (#18416).
    • [JIT] ord(): Fix handling of utf8 chars. (#19423).
    • [JIT] Fix error when too many parameters are passed to a fused CUDA kernel. (#18063).
    • [JIT] Fix bug where common subexpression elimination accidentally introduced aliasing to function outputs. (#19576).
    • [JIT] Fix infinite loop in requires_grad analysis pass. (#18361).
    • [JIT] Fix ordering of parameters in rnn.py. (#18198).
    • [JIT] Fix contiguous autodiff and AutoGradZero inconsistency. (#18633).
    • [JIT] Fix error reporting in NVRTC use of the fuser. (#18327).
    • [JIT] Ensure GIL is acquired before doing module lookup on import. (#17135).
    • [JIT] Fix bug where _unique_state_dict could contain duplicate Tensors. (#18139).
    • [C++]: Fix module serialization issue where one submodule doesn't have any parameters, but its submodules do. (15033).
    • [C++]: Add Stream and Event APIs. (15937).
    • [C++]: Fix Module serialization incompatibility between Python and C++ with weight-less layers. (19740).
    • [C++]: Properly pass extra_cuda_cflags to C++ extensions on Windows. (18638).
    • [C++] Make SGD semantics match python. (15840).
    • [C++] torch::nn::init::orthogonal_: match Python API. (18915).

    Deprecations

    • torch.btrifact: the deprecated info argument has been removed. (14935).
    • torch.potrs has been deprecated; use torch.cholesky_solve instead. Note that upper defaults to False for torch.cholesky_solve, and True for torch.potrs. (15334).
    • torch.pstrf is deprecated; use torch.cholesky instead. Note that upper defaults to False for torch.cholesky, and True for torch.pstrf. (17866).
    • torch.potri is deprecated; use torch.cholesky_inverse instead. Note that upper defaults to False for torch.cholesky_inverse, and True for torch.potri. (19498).
    • torch.btrifact_with_info has been deprecated; use torch.lu with get_infos=True instead. (18435).
    • torch.btrifact has been deprecated; use the new name torch.lu instead. (18435).
    • torch.gesv is deprecated; use the new name torch.solve instead. (18060).
    • torch.trtrs has been deprecated; use the new name torch.triangular_solve instead. (18213).
    • torch.btriunpack has been deprecated; use the new name torch.lu_unpack instead. (18529).
    • torch.btrisolve has been deprecated; use the new name torch.lu_solve instead (see the sketch after this list). (18726).
    • [C++] IntList has been deprecated, use IntArrayRef instead, as it better describes the type and ownership semantics in C++. (16751).
    • [C++] Dispatch macros with Type parameters, e.g. AT_DISPATCH_ALL_TYPES(tensor.type(), ..., are now deprecated; use ScalarType instead, e.g. AT_DISPATCH_ALL_TYPES(tensor.scalar_type(), .... (17527, 17996).
    • [C++] the deprecated variable_tensor_functions have been removed. (15003).
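
    The renamed linear-algebra entry points side by side, as of this release (shapes are illustrative):

    import torch

    A = torch.randn(3, 3)
    b = torch.randn(3, 1)

    torch.solve(b, A)                          # replaces torch.gesv
    torch.triangular_solve(b, torch.triu(A))   # replaces torch.trtrs
    LU, pivots = torch.lu(A)                   # replaces torch.btrifact
    torch.lu_solve(b, LU, pivots)              # replaces torch.btrisolve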

    🐎 Performance

    Highlights

    • nn.BatchNorm CPU inference speed increased up to ~19x. (19152).
    • nn.AdaptiveAvgPool: speed up common-case of size=1 output by ~30x. (17011).
    • nn.EmbeddingBag CPU performance increased by ~4x. (19329).
    • Tensor.copy_: sped up larger tensor copy ~2-3x, small regression in small tensor copy. (18618).
    • torch.nonzero: is now ~2x faster than NumPy on CPU. (15190)
    • Improve caching allocator for Pascal and newer GPUs; 10-20% better memory utilization on Mask-RCNN. (17120).
    • reduction functions: Speed up some large Tensor cases by 50-80%. (17428).
    • [JIT] Graph fuser: better fusion for backwards graphs in the presence of broadcasting. (#14957)
    • [JIT] Graph fuser: batch_norm fusion for inference. (#15146)
    • [JIT] Graph fuser: layer_norm fusion for inference. (#18266)

    Other

    • torch.abs, torch.frac, torch.reciprocal, torch.neg have been vectorized and parallelized. (19041).
    • torch.bmm: CPU performance increased by 2x. (19338).
    • torch.sort: CUDA performance increased by ~2x. (19379).
    • torch.cat on CPU is now ~4x faster in the case where inputs are contiguous and dim != 0. (17032).
    • torch.multinomial: fixed a 2x performance regression. (17121).
    • torch.empty (and other factory functions): reduce overhead by 20-40%. (17565).
    • torch.linspace has been parallelized on CPU. (15320).
    • torch.logspace has been parallelized on CPU. (15438).
    • torch.range has been parallelized on CPU. (15484).
    • torch.arange has been parallelized on CPU. (15667).
    • torch.load: avoid unnecessary CPU-to-CUDA copy. (17297).
    • reduction functions: improve efficiency on CUDA. (16224, 17040).
    • Speed up some GEMM cases on CPU by up to 7x. (17730)
    • Tensor iterator loop unrolling. (17667).
    • sparse/dense matrix multiply: improve speed by ~5x. (16905).
    • distributions.MultivariateNormal: sped up. (17294).
    • [JIT] Graph fuser: pow scalar exponent / base autodiff, fusion (#19324)
    • [JIT] Graph fuser: allow fusion of function float arguments. (#18087)
    • [JIT] Shape analysis: specialize optional Tensor inputs to graphs. (#18360)
    • [JIT] Shape analysis: various correctness improvements. (#18271)
    • [JIT] Shape analysis: aten::_convolution now participates in shape analysis. (#16837)
    • [JIT] Autodiff: coverage for ops used in maskrcnn & BERT. (#16689)
    • [JIT] Autodiff: support for scalar comparison ops and randlike. (#14740)
    • [JIT] Autodiff: support for adaptive_avg_pool2d. (#15459)
    • [JIT] Autodiff: support for erf and erfc. (#15139)
    • [JIT] Autodiff: support for layernorm. (#17702)
    • [JIT] Autodiff: support for tanh. (#17816)
    • [JIT] Autodiff: support for matmul/dropout. (#17523)
    • [JIT] Autodiff: specialized CUDA impl for dropout. (#17756)
    • [JIT] Constant folding: improved inlining of control flow. (#16244)

    Documentation

    • Tensor.scatter_: add documentation about value parameter. (17467).
    • Tensor.unfold: correctly document dimension parameter, not dim. (19020).
    • Tensor.is_floating_point() is now documented. (15704).
    • torch.cholesky: Fix broken upper example in documentation. (15215).
    • torch.gesv: document out parameter. (15649).
    • torch.mul: better explain elementwise multiplication. (15664).
    • torch.eig, torch.symeig: better explain backwards limitations. (15929).
    • torch.ormqr: fixed output specification. (15694).
    • torch.from_numpy: replaced usage with torch.as_tensor in documentation. (16587).
    • torch.mvlgamma: Fix the constant in the docs. (17045).
    • torch.mode: more precisely describe what is returned. (17069).
    • torch.upsample: documentation now matches torch.interpolate. (17134)
    • torch.arange: correct dtype documentation. (18604)
    • torch.cumprod: document out parameter. (19340).
    • torch.nonzero: document indices being returned lexicographically. (19539).
    • torch.nn.functional.interpolate: better explain the align_corners parameter. (14806).
    • torch.nn.functional.pad: documentation has been made consistent with other functional ops. (15984).
    • nn.functional.grid_sample: clarify behavior of padding. (19754).
    • nn.TripletMarginLoss: correct type of swap parameter. (18115).
    • nn.CrossEntropyLoss: clarify ignore_index documentation. (18117).
    • nn.CrossEntropyLoss: the input format is more clearly explained. (15990).
    • nn.CTCLoss: Clarify a number of ambiguities. (18415).
    • nn.BCEWithLogitsLoss: add better explanation. (19212).
    • nn.BCEWithLogitsLoss: better explain positive samples. (17258).
    • nn.ModuleList / nn.ParameterList: update documentation. (17731).
    • nn.Module.load_state_dict: correct semantics of strict. (17618)
    • nn.parallel.DataParallel: more accurately specify how different argument types are handled. (15993).
    • nn.parallel.DistributedDataParallel: Clarified batch size requirements. (16010).
    • torch.distributed: Document mixed-precision training. (15440).
    • torch.multiprocessing: Include example multiprocessing code. (16345).
    • torch.autograd: Better explain computing Jacobian-vector product. (15197).
    • torch.cuda.get_rng_state, torch.cuda.set_rng_state: document taking a device object. (14324).
    • torch.device: Fix example of passing device to tensor factory. (16839).
    • DataLoader: update documentation to describe how workers are managed. (18091).
    • Unified shape formats throughout the documentation. (15741).
    • Update documentation for reduction arguments to use non-deprecated format. (17300).
    • mark_non_differentiable: document correct semantics. (17891).
    • Warn about memory overlaps on inplace operations. (17576).
    • Fix a number of small issues with conv and pooling docstrings. (17052).
    • Fix a number of small issues with padding and activation docstrings. (17197).
    • [C++]: mention packed accessors in Tensor basics. (19464).

    ONNX

    Exporting More Torch Operators to ONNX

    • Export torch.isnan to ONNX (17698).
    • Export torch.flatten to ONNX (16240).
    • Export torch.where, torch.ceil, torch.floor to ONNX (18571).
    • Export torch.narrow to ONNX (17550).
    • Export torch.argmax and torch.argmin to ONNX (17382, 18264, 18261).
    • Export adaptive_avg_pool1D, adaptive_avg_pool2D, adaptive_avg_pool3D, adaptive_max_pool1D, adaptive_max_pool2D, adaptive_max_pool3D to ONNX (17412).
    • Export torch.nonzero to ONNX (17036, 18047).
    • Export torch.erf to ONNX (16106).
    • Export torch.split (15092).
    • Export torch.lt, torch.gt, torch.le, torch.ge, torch.eq, torch.ne to ONNX (15677).
    • Export torch.expand and torch.ne to ONNX (15050).
    • Export torch.nn.LogSigmoid to ONNX (14830).
    • Export torch.nn.RReLU to ONNX (14781).
    • Export torch.reshape and torch.reshape_as to ONNX (16632, 16971).
    • Replace use of ConstantLike with ConstantOfShape (16095, 16214).
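
    A generic export call exercising a couple of the newly exportable ops; the small module here is illustrative, not taken from the release notes:

    import torch

    class Flattener(torch.nn.Module):
        def forward(self, x):
            return torch.flatten(torch.where(x > 0, x, torch.zeros_like(x)), 1)

    torch.onnx.export(Flattener(), torch.randn(2, 3, 4), "flattener.onnx", opset_version=9)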

    Extending Existing Exporting Logic

    • Enable dim support in torch.nn.Softmax's export (18482).
    • Support exporting squeeze & unsqueeze with negative dim attribute (19297).
    • Support exporting max_pool1d, max_pool2d, max_pool3d with indices (16455).
    • Add dtype support in torch.logsoftmax and torch.softmax's export (17672).
    • Support ceil_mode in max_pool_1d, max_pool2d, max_pool3d, avg_pool1d, avg_pool2d, avg_pool3d's export (16769).

    ⚑️ Optimizing Exported ONNX Graph

    • Add constant folding in ONNX exporter (18698).
    • Retain the parameter names in ONNX exporter (17551).
    • Omit slice op if it is a non-op (19155).
    • Add a flag to strip doc_string from exported ONNX models (18882).
    • Omit torch.dropout if the model is in eval mode (16547).

    Adding Utility Functions and Refactoring

    • Remove unused arg f from _model_to_graph(). (19647).
    • Add the support for stable ONNX opsets in exporter (16068, 17419).
    • Set the default ONNX opset to the latest stable opset (i.e., 9) (17736).
    • Add a utility function to check whether it's in the middle of ONNX export or not (19050).
    • Refactoring serialization of ONNX initializers to be name-based (17830).
    • Expose dim() on type and use it in ONNX symbolics (15933).
    • Add scalar_type_to_pytorch_type dict in ONNX symbolic (15965).
    • Add an assertion to check the number of the parameters passed to ONNX exporter (18145).

    Bugfixes

    • Fix a bug caused by differing types in rsub (15707).
    • Fix list structure support in ONNX exporter (19102).
    • Fix case for activations attribute in nn.RNN ONNX export. (19368).
    • Minor fix for onnx ConstantOfShape export (18199).
    • Fix the torch.(reduce)min and torch.(reduce)max's export (15241).
    • Fix ONNX export of logical ops to have correct output datatype (15185).
    • Fix typo in docstring (18216).