Changelog History

  • v0.17 Changes

    September 10, 2019

    🆕 New features:

    • 📜 Sparse data support
    • We've implemented boost_from_average in RMSE mode and made it the default. It gives a boost in quality, especially for a small number of iterations.

    👌 Improvements:

    • Quantile regression on CPU
    • 0️⃣ default parameters for Poisson regression

    Speedups:

    • A number of speedups for training on CPU
    • Huge speedups for loading datasets with categorical features represented as pandas.Categorical.
      Hint: use pandas.Categorical instead of object to speed up loading by up to 200x (a short sketch follows this entry).
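
    A minimal sketch of this hint, with a toy DataFrame standing in for a real dataset (the column names are made up for illustration):

    ```python
    import pandas as pd
    from catboost import Pool

    # Toy stand-in for a real dataset with string-typed categorical columns.
    df = pd.DataFrame({
        "city": ["London", "Paris", "London", "Berlin"],
        "clicks": [3, 7, 1, 5],
        "target": [0, 1, 0, 1],
    })
    cat_cols = ["city"]

    # Converting object columns to pandas.Categorical before building the Pool
    # is where the loading speedup comes from.
    for col in cat_cols:
        df[col] = df[col].astype("category")

    train_pool = Pool(df.drop(columns=["target"]), label=df["target"], cat_features=cat_cols)
    ```
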
  • v0.16.5 Changes

    August 20, 2019

    💥 Breaking changes:

    • 0️⃣ All metrics except for AUC metric now use weights by default.

    🆕 New features:

    • Added boost_from_average parameter for RMSE training on CPU which might give a boost in quality.
    • Added conversion from ONNX to CatBoost. Now you can convert an XGBoost or LightGBM model to ONNX, then convert it to CatBoost and use our fast applier. Use model.load_model(model_path, format="onnx") for that (see the sketch at the end of this entry).

    Speedups:

    • Training is ~15% faster for datasets with categorical features.

    🐛 Bug fixes:

    • 🔋 R language: get_features_importance with ShapValues for MultiClass, #868
    • NormalizedGini was not calculated, #962
    • 🐛 Bug in leaf calculation which could result in slightly worse quality if you use weights in binary classification mode
    • Fixed __builtins__ import in Python3 in PR #957, thanks to @AbhinavanT
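
    A hedged sketch of the ONNX route described above; the file path is a placeholder, and the commented-out predict call assumes you supply a feature matrix matching the original model:

    ```python
    from catboost import CatBoost

    # "model.onnx" is a placeholder path to a model exported to ONNX
    # from XGBoost or LightGBM (for example via onnxmltools).
    cb_model = CatBoost()
    cb_model.load_model("model.onnx", format="onnx")   # convert ONNX -> CatBoost

    # The converted model is then applied with CatBoost's fast applier:
    # preds = cb_model.predict(X)   # X: feature matrix with the columns the model expects
    ```
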
  • v0.16.4 Changes

    August 14, 2019

    🐛 Bug fixes:

    • 🔖 Versions 0.16.* had a bug in the Python applier with categorical features when applying to more than 128 documents.

    🆕 New features:

    • It is now possible to use pairwise modes for datasets without groups

    👌 Improvements:

    • 1.8x faster evaluation on asymmetrical trees
  • v0.16.3 Changes

    August 11, 2019

    💥 Breaking changes:

    • 🔋 Renamed column Feature Index to Feature Id in the prettified output of the Python method get_feature_importance(), because it now supports feature names
    • Renamed option per_float_feature_binarization (--per-float-feature-binarization) to per_float_feature_quantization (--per-float-feature-quantization)
    • ✂ Removed parameter inverted from the Python cv method. Added a type parameter instead, which can be set to Inverted
    • Method get_features() now works only for datasets without categorical features

    🆕 New features

    • A new multiclass version of the AUC metric, called AUC Mu, which was proposed by Ross S. Kleiman at NeurIPS 2019
    • ➕ Added time series cv
    • ➕ Added MeanWeightedTarget in fstat
    • Added utils.get_confusion_matrix() (a short example follows this list)
    • Now feature importance can be calculated for non-symmetric trees
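
    A tiny illustration of the new utils.get_confusion_matrix() helper on toy data (a sketch, not a reference for the exact output format):

    ```python
    from catboost import CatBoostClassifier, Pool
    from catboost.utils import get_confusion_matrix

    # Toy data; a real dataset would come from files or DataFrames.
    X = [[1, 4], [2, 5], [3, 6], [4, 7]]
    y = [0, 0, 1, 1]

    model = CatBoostClassifier(iterations=10, verbose=False).fit(X, y)
    print(get_confusion_matrix(model, Pool(X, label=y)))
    ```
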
  • v0.16.2 Changes

    August 02, 2019

    💥 Breaking changes:

    • Removed get_group_id() and get_features() methods of Pool class

    🆕 New model analysis tools:

    • 🔋 Added the PredictionDiff type of the get_feature_importance() method, a new tool for model analysis. It shows how the features influenced the fact that one of two samples got a higher prediction than the other. This helps debug ranking models: take a pair of samples that were ranked incorrectly and look at which features caused that (a sketch follows at the end of this entry).
    • ➕ Added plot_predictions() method

    🆕 New features:

    • 🔋 model.set_feature_names() method in Python
    • ➕ Added stratified split to parameter search methods
    • 👌 Support catboost.load_model() from CPU snapshots for numerical-only datasets
    • 👍 CatBoostClassifier.score() now supports y as DataFrame
    • Added sampling_frequency, per_float_feature_binarization, monotone_constraints parameters to CatBoostClassifier and CatBoostRegressor

    Speedups:

    • 2x speedup of multi-classification mode

    🛠 Bugfixes:

    • 🛠 Fixed score() for multiclassification, #924
    • Fixed get_all_params() function, #926

    Other improvements:

    • Clear error messages when a model cannot be saved
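
    A rough sketch of the PredictionDiff analysis described above; the model and the pair of samples are toy placeholders, and in practice the model would be a ranking model:

    ```python
    from catboost import CatBoostClassifier, Pool

    # Toy model standing in for a ranking model.
    X = [[1, 4], [2, 5], [3, 6], [4, 7]]
    y = [0, 0, 1, 1]
    model = CatBoostClassifier(iterations=10, verbose=False).fit(X, y)

    # Exactly two samples: the pair whose ordering you want to explain.
    pair = Pool([[2, 5], [3, 6]])
    diff = model.get_feature_importance(data=pair, type="PredictionDiff")
    print(diff)   # per-feature contribution to why one sample scored higher
    ```
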
  • v0.16.1 Changes

    July 30, 2019

    💥 Breaking changes:

    • The parameter fold_count is now called cv in grid_search() and randomized_search()
    • cv results are now returned from grid_search() and randomized_search() in the res['cv_results'] field (see the example at the end of this entry)

    🆕 New features:

    • 👍 R-language function catboost.save_model() now supports PMML, ONNX and other formats
    • The monotone_constraints parameter in the Python API allows specifying numerical features that the prediction shall depend on monotonically

    🐛 Bug fixes:

    • 🛠 Fixed eval_metric calculation for training with weights (in release 0.16, evaluation of a metric that was equal to the optimized loss did not use weights by default, so the overfitting detector worked incorrectly)

    👌 Improvements:

    • Added option verbose to grid_search() and randomized_search()
    • Added tutorial on grid_search() and randomized_search()
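
    A minimal example of the renamed cv argument and the new res['cv_results'] field; the data and parameter grid are toy placeholders:

    ```python
    from catboost import CatBoostClassifier

    X = [[1, 4], [2, 5], [3, 6], [4, 7], [5, 8], [6, 9]]
    y = [0, 0, 0, 1, 1, 1]

    model = CatBoostClassifier(iterations=20, verbose=False)
    grid = {"learning_rate": [0.03, 0.1], "depth": [2, 4]}

    res = model.grid_search(grid, X=X, y=y, cv=3, verbose=False)   # fold_count is now cv
    print(res["params"])       # best parameter combination
    print(res["cv_results"])   # cross-validation curves for the best combination
    ```
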
  • v0.16 Changes

    July 24, 2019

    💥 Breaking changes:

    • The MultiClass loss now has the same sign as Logloss. Previously it had the opposite sign and was maximized; now it is minimized.
    • CatBoostRegressor.score now returns the value of the R2 metric instead of RMSE, to be more consistent with the behavior of scikit-learn regressors.
    • 🔄 Changed metric parameter use_weights default value to false (except for ranking metrics)

    🆕 New features:

    • It is now possible to apply a model on GPU
    • We have published two new real-world datasets with monotonic constraints, catboost.datasets.monotonic1() and catboost.datasets.monotonic2(). Before that, california_housing was the only open-source dataset with monotonic constraints. Now you can use these two to benchmark algorithms with monotonic constraints.
    • We've added several new metrics to catboost, including DCG, FairLoss, HammingLoss, NormalizedGini and FilteredNDCG
    • Introduced efficient GridSearch and RandomSearch implementations.
    • The get_all_params() Python function returns the values of all training parameters, both user-defined and default (see the example at the end of this entry).
    • ➕ Added more synonyms for training parameters to be more compatible with other GBDT libraries.

    Speedups:

    • The AUC metric is computationally very expensive. We've implemented a parallelized calculation, so it can now be computed on every iteration (or every k-th iteration) about 4x faster.

    Educational materials:

    • We've improved our command-line tutorial, now it has examples of files and more information.

    🛠 Fixes:

    • Automatic Logloss or MultiClass loss function deduction for CatBoostClassifier.fit now also works if the training dataset is specified as Pool or filename string.
    • 🛠 And some other fixes
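
    To illustrate get_all_params(), a small hedged example on toy data; only two parameters are set by the user, yet the returned dictionary also contains the defaults that training actually used:

    ```python
    from catboost import CatBoostClassifier

    X = [[0, 1], [1, 0], [0, 0], [1, 1]]
    y = [0, 1, 0, 1]

    model = CatBoostClassifier(iterations=5, depth=3, verbose=False).fit(X, y)
    params = model.get_all_params()
    print(params["depth"], params["learning_rate"])   # user-defined and default values together
    ```
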
  • v0.15.2 Changes

    June 28, 2019

    💥 Breaking changes:

    • 🔋 Function get_feature_statistics is replaced by calc_feature_statistics
    • Scoring function Correlation is renamed to Cosine
    • Parameter efb_max_conflict_fraction is renamed to sparse_features_conflict_fraction

    🆕 New features:

    • Models can be saved in PMML format now.

    Note: PMML does not have full categorical features support, so to export a model in PMML format for datasets with categorical features you need to set the one_hot_max_size parameter to a large value, so that all categorical features are one-hot encoded (a sketch follows at the end of this entry)

    • 🔋 Feature names can be used to specify ignored features

    🐛 Bug fixes, including:

    • 🛠 Fixed restarting of CV on GPU for datasets without categorical features
    • 🛠 Fixed learning continuation errors with changed dataset (#879) and with model loaded from file (#884)
    • 🛠 Fixed NativeLib for JDK 9+ (PR #857)
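
    A hedged sketch of PMML export with one_hot_max_size raised as the note above suggests; the training data is a toy placeholder:

    ```python
    from catboost import CatBoostClassifier

    # Toy dataset with one categorical feature (column 0).
    X = [["a", 1.0], ["b", 2.0], ["a", 3.0], ["b", 4.0]]
    y = [0, 1, 0, 1]

    # A large one_hot_max_size forces one-hot encoding of all categorical features,
    # which the note above says is needed for PMML export with categorical data.
    model = CatBoostClassifier(iterations=10, one_hot_max_size=255, verbose=False)
    model.fit(X, y, cat_features=[0])
    model.save_model("model.pmml", format="pmml")
    ```
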
  • v0.15.1 Changes

    May 31, 2019

    🐛 Bug fixes:

    • ⏪ restored parameter fstr_type in Python and R interfaces
  • v0.15 Changes

    May 27, 2019

    💥 Breaking changes

    • 0️⃣ cv is now stratified by default for Logloss, MultiClass and MultiClassOneVsAll.
    • 🚚 We have removed the border parameter of the Logloss metric. You now need to use target_border as a separate training parameter.
    • CatBoostClassifier now runs MultiClass if more than 2 different values are present in training dataset labels.
    • model.best_score_["validation_0"] is replaced with model.best_score_["validation"] if a single validation dataset is present.
    • get_object_importance function parameter ostr_type is renamed to type in Python and R.

    Model analysis

    • Tree visualisation by @karina-usmanova.
    • 🆕 New feature analysis: plotting information about how a feature was used in the model by @alexrogozin12.
    • Added plot parameter to get_roc_curve, get_fpr_curve and get_fnr_curve functions from catboost.utils.
    • 👌 Supported prettified format for all types of feature importances.

    🆕 New ways of doing predictions

    • Rust applier by @shuternay.
    • DotNet applier by @17minutes.
    • One-hot encoding for categorical features in CatBoost CoreML model by Kseniya Valchuk and Ekaterina Pogodina.

    🆕 New objectives

    Speedups

    • Speedup of SHAP values calculation for a single object or a small number of objects by @Lokutrus.
    • Cheap preprocessing and no overfitting countermeasures when the number of iterations is small (since you will not overfit anyway).

    🆕 New functionality

    • Prediction of leaf indices (a short sketch follows at the end of this entry).

    🆕 New educational materials

    • Rust tutorial by @shuternay.
    • C# tutorial.
    • Leaf indices tutorial.
    • Tree visualisation tutorial by @karina-usmanova.
    • Google Colab tutorial for regression in catboost by @col14m.

    🛠 And a set of fixes for your issues.
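
    A small hedged sketch of leaf-index prediction; calc_leaf_indexes is assumed to be the Python entry point for this feature, and the data is a toy placeholder:

    ```python
    from catboost import CatBoostClassifier, Pool

    X = [[1, 4], [2, 5], [3, 6], [4, 7]]
    y = [0, 0, 1, 1]

    model = CatBoostClassifier(iterations=5, depth=2, verbose=False).fit(X, y)
    # calc_leaf_indexes is assumed here: for each object it returns the index
    # of the leaf it falls into in every tree of the model.
    leaf_idx = model.calc_leaf_indexes(Pool(X))
    print(leaf_idx.shape)   # (number of objects, number of trees)
    ```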