Changelog History

  • v0.20.1 Changes

    December 11, 2019

    🆕 New features:

    • Make leaf_estimation_method=Exact the default for the MAPE loss
    • 🌲 Add CatBoostClassifier.predict_log_proba(), PR #1095
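
    A minimal sketch of the new predict_log_proba() method on toy, illustrative data:

    ```python
    from catboost import CatBoostClassifier

    # Toy binary classification data, for illustration only
    X = [[1, 4], [2, 5], [3, 6], [4, 7]]
    y = [0, 0, 1, 1]

    model = CatBoostClassifier(iterations=10, verbose=False)
    model.fit(X, y)

    # Same shape as predict_proba() output, but with natural-log probabilities
    log_proba = model.predict_log_proba(X)
    ```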

    🐛 Bug fixes:

    • 🛠 Fix usability of read-only numpy arrays, #1101
    • 🔋 Fix python3 compatibility for get_feature_importance, PR #1090
    • Fix loading model from snapshot for boost_from_average mode
  • v0.20 Changes

    November 28, 2019

    🆕 New submodule for text processing!
    It contains two classes to help you make text features ready for training:

    • Tokenizer -- use this class to split text into tokens (with automatic lowercasing and punctuation removal)
    • Dictionary -- use this class to build a dictionary that maps tokens to numeric identifiers, which you then use as new features (see the sketch below)
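
    A minimal sketch of the two classes (the exact constructor arguments are an assumption based on the Python package; treat this as a sketch, not a definitive reference):

    ```python
    from catboost.text_processing import Dictionary, Tokenizer

    texts = ["Cats are great!", "CatBoost handles text now."]

    # Split each text into lowercased tokens with punctuation removed
    tokenizer = Tokenizer(lowercasing=True, separator_type='BySense', token_types=['Word'])
    tokenized = [tokenizer.tokenize(t) for t in texts]

    # Build a token -> numeric id mapping from the tokenized corpus
    dictionary = Dictionary(occurence_lower_bound=0)
    dictionary.fit(tokenized)

    # Replace tokens with their identifiers, ready to use as new features
    ids = dictionary.apply(tokenized)
    ```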

    🆕 New features:

    • Enabled boost_from_average for MAPE loss function

    🐛 Bug fixes:

    • 🛠 Fixed Pool creation from pandas.DataFrame with discontinuous columns, #1079
    • 🛠 Fixed standalone_evaluator, PR #1083

    Speedups:

    • 📦 Huge speedup of preprocessing in the python-package for datasets with many samples (>10 million)

    🚀 We also release precompiled packages for Python 3.8

  • v0.19.1 Changes

    November 19, 2019

    🆕 New features:

    • With this release we support text features for classification on GPU. To specify text columns, use the text_features parameter. Use the text information in your dataset to achieve better quality. See more in Learning CatBoost with text features (a usage sketch follows this list)
    • The MultiRMSE loss function is now available on CPU. Labels for the multi-regression mode should be specified in separate Label columns
    • MonoForest framework for model analysis, based on our NeurIPS 2019 paper. Learn more in MonoForest tutorial
    • boost_from_average is now True by default for Quantile and MAE loss functions, which improves the resulting quality
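
    A sketch of GPU training with a text column (the DataFrame and column names are hypothetical):

    ```python
    import pandas as pd
    from catboost import CatBoostClassifier, Pool

    # Hypothetical dataset with one raw-text column
    df = pd.DataFrame({
        'review': ['great product', 'terrible service', 'loved it', 'would not buy'],
        'label': [1, 0, 1, 0],
    })

    # Mark the text column via text_features; this release supports text on GPU only
    train_pool = Pool(data=df[['review']], label=df['label'], text_features=['review'])

    model = CatBoostClassifier(iterations=10, task_type='GPU', verbose=False)
    model.fit(train_pool)
    ```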

    Speedups:

    • Huge reduction of preprocessing time for datasets loaded from files and for datasets with many samples (> 10 million), which was a bottleneck for GPU training
    • 3x speedup for small datasets
  • v0.18.1 Changes

    October 31, 2019

    🆕 New features:

    • Now datasets.msrank() returns the full msrank dataset. Previously, it returned only the first 10k samples.
      We have added the msrank_10k() dataset, which implements the previous behaviour (see the sketch below).
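
    A sketch of the two loaders:

    ```python
    from catboost.datasets import msrank, msrank_10k

    # Full MSRank dataset (a large download)
    train_full, test_full = msrank()

    # Only the first 10k samples, matching the previous behaviour of msrank()
    train_10k, test_10k = msrank_10k()
    ```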

    🐛 Bug fixes:

    • get_object_importance() now respects parameter top_size, #1045 by @ibuda
  • v0.18 Changes

    October 21, 2019
    • 🚀 The main feature of this release is a huge speedup on small datasets. We now use MVS sampling for CPU regression and binary classification training by default, together with the Plain boosting scheme for both small and large datasets. This change not only gives a huge speedup but also improves quality!
    • The boost_from_average parameter is available in CatBoostClassifier and CatBoostRegressor
    • We have added new formats for describing monotonic constraints. For example, "(1,0,0,-1)", "0:1,3:-1", and "FeatureName0:1,FeatureName3:-1" are all valid specifications. With Python and the params-file JSON, lists and dictionaries can also be used (see the sketch below)
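
    A sketch of equivalent constraint specifications (feature 0 increasing, feature 3 decreasing) and of the newly exposed boost_from_average parameter:

    ```python
    from catboost import CatBoostRegressor

    # The specifications below all describe the same constraints
    m1 = CatBoostRegressor(monotone_constraints="(1,0,0,-1)")
    m2 = CatBoostRegressor(monotone_constraints="0:1,3:-1")
    m3 = CatBoostRegressor(monotone_constraints=[1, 0, 0, -1])
    m4 = CatBoostRegressor(monotone_constraints={0: 1, 3: -1})

    # boost_from_average is now available in the scikit-learn-style estimators
    m5 = CatBoostRegressor(boost_from_average=True)
    ```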

    🐛 Bugs fixed:

    • Error in Multiclass classifier training, #1040
    • 👻 Unhandled exception when saving quantized pool, #1021
    • Python 3.7: RuntimeError raised in StagedPredictIterator, #848
  • v0.17.5 Changes

    October 10, 2019

    🐛 Bugs fixed:

    • 🏁 System of linear equations is not positive definite when training MultiClass on Windows, #1022
    • Categorical feature values could be taken from floating-point data; this is now forbidden
    • Corrected handling of numpy.ndarray feature data with categorical features
  • v0.17.4 Changes

    October 01, 2019

    👌 Improvements:

    • Massive 2x speedup for MultiClass with many classes
    • Updated MVS implementation. See Minimal Variance Sampling in Stochastic Gradient Boosting by Bulat Ibragimov and Gleb Gusev at NeurIPS 2019
    • ➕ Added sum_models in R-package, #1007

    🐛 Bugs fixed:

    • Multi-model initialization in Python, #995
    • Mishandling of 255 borders in training on GPU, #1010
  • v0.17.3 Changes

    September 24, 2019

    👌 Improvements:

    • New visualization for parameter tuning. Use the plot=True parameter in the grid_search and randomized_search methods to show plots in a Jupyter notebook (see the sketch after this list)
    • 🏁 Switched to the jemalloc allocator instead of LFalloc in the CLI and model interfaces to fix some problems on Windows 7 machines, #881
    • Binary classification AUC calculation is now up to 1.3x faster
    • Added a tutorial on using the fast CatBoost applier with LightGBM models
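
    A sketch of the new tuning visualization on toy data (run in a Jupyter notebook to see the plot):

    ```python
    from catboost import CatBoostClassifier

    # Toy data, for illustration only
    X = [[1, 4], [2, 5], [3, 6], [4, 7], [5, 8], [6, 9]]
    y = [0, 0, 1, 1, 0, 1]

    model = CatBoostClassifier(iterations=10, verbose=False)
    grid = {'learning_rate': [0.03, 0.1], 'depth': [4, 6]}

    # plot=True renders an interactive chart of the search in the notebook
    result = model.grid_search(grid, X=X, y=y, plot=True)
    ```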

    🐛 Bugs fixed:

    • SHAP values for the MultiClass objective no longer give a constant 0 value for the last class in case of GPU training.
      SHAP values for the MultiClass objective are now calculated as follows. First, predictions are normalized so that the average of all predictions is zero in each tree; the normalized predictions produce the same probabilities as the non-normalized ones. Then the SHAP values are calculated for every class separately. Note that since the SHAP values are calculated on the normalized predictions, their sum for every class equals the normalized prediction (see the sketch after this list)
    • 🛠 Fixed bug in ranking tutorial, #955
    • Allow string value for per_float_feature_quantization parameter, #996
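
    A sketch of computing the per-class SHAP values described above (toy data):

    ```python
    from catboost import CatBoostClassifier, Pool

    # Toy three-class data, for illustration only
    X = [[1, 4], [2, 5], [3, 6], [4, 7], [5, 8], [6, 9]]
    y = [0, 1, 2, 0, 1, 2]

    model = CatBoostClassifier(iterations=10, loss_function='MultiClass', verbose=False)
    model.fit(X, y)

    # Shape (n_samples, n_classes, n_features + 1); the last slot per class holds
    # the expected value, and each class's values sum to its normalized prediction
    shap_values = model.get_feature_importance(Pool(X, y), type='ShapValues')
    ```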
  • v0.17.2 Changes

    September 19, 2019

    👌 Improvements:

    • 0️⃣ For the MAE metric on CPU, the default value of leaf-estimation-method is now Exact
    • Sped up LossFunctionChange feature strength computation

    🐛 Bugs fixed:

    • Broken label converter in grid search for multiclass classification, #993
    • Incorrect prediction with monotonic constraint, #994
    • Invalid value of eval_metric in output of get_all_params(), #940
    • Train AUC is not computed because the hint skip_train~false is ignored, #970
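
    A sketch of re-enabling train-set AUC with the hint (toy data; the eval_set reuses the train data purely for illustration):

    ```python
    from catboost import CatBoostClassifier

    X = [[1, 4], [2, 5], [3, 6], [4, 7]]
    y = [0, 0, 1, 1]

    # AUC skips the train set by default; the hint turns that computation back on
    model = CatBoostClassifier(iterations=10, eval_metric='AUC:hints=skip_train~false')
    model.fit(X, y, eval_set=(X, y), verbose=False)
    ```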
  • v0.17.1 Changes

    September 13, 2019

    🐛 Bugs fixed:

    • 🏁 Incorrect estimation of total RAM size on Windows and Mac OS, #989
    • Failure when dataset is a numpy.ndarray with order='F'
    • boost_from_average is now disabled when a baseline is specified

    👌 Improvements:

    • Polymorphic raw features storage (2x to 25x faster data preparation for numeric features in non-float32 columns, passed as either pandas.DataFrame or numpy.ndarray with order='F')
    • 👌 Support the AUC metric for the CrossEntropy loss on CPU
    • ➕ Added datasets.rotten_tomatoes(), a textual dataset (see the sketch after this list)
    • Improved usability of monotone_constraints, #950
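
    A sketch of loading the new dataset (assuming it follows the (train, test) DataFrame pair convention of the other catboost.datasets loaders):

    ```python
    from catboost.datasets import rotten_tomatoes

    # Assumption: returns a (train, test) pair of pandas DataFrames like other loaders
    train_df, test_df = rotten_tomatoes()
    ```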

    Speedups:

    • ⚡️ Optimized computation of CrossEntropy metric on CPUs with SSE3