Changelog History

  • v0.17 Changes

    September 10, 2019

    🆕 New features:

    • 📜 Sparse data support
    • We've implemented boost_from_average in RMSE mode and made it the default. It gives a boost in quality, especially for a small number of iterations.

    👌 Improvements:

    • Quantile regression on CPU
    • 0️⃣ default parameters for Poisson regression

    Speedups:

    • A number of speedups for training on CPU
    • Huge speedups for loading datasets with categorical features represented as pandas.Categorical.
      Hint: use pandas.Categorical instead of object to speed up loading by up to 200x (a short sketch follows this entry).
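
    A minimal sketch of this hint, with a toy DataFrame standing in for a real dataset (the column names are made up for illustration):

    ```python
    import pandas as pd
    from catboost import Pool

    # Toy stand-in for a real dataset with string-typed categorical columns.
    df = pd.DataFrame({
        "city": ["London", "Paris", "London", "Berlin"],
        "clicks": [3, 7, 1, 5],
        "target": [0, 1, 0, 1],
    })
    cat_cols = ["city"]

    # Converting object columns to pandas.Categorical before building the Pool
    # is where the loading speedup comes from.
    for col in cat_cols:
        df[col] = df[col].astype("category")

    train_pool = Pool(df.drop(columns=["target"]), label=df["target"], cat_features=cat_cols)
    ```
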
  • v0.16.5 Changes

    August 20, 2019

    💥 Breaking changes:

    • 0️⃣ All metrics except for AUC metric now use weights by default.

    🆕 New features:

    • Added boost_from_average parameter for RMSE training on CPU which might give a boost in quality.
    • Added conversion from ONNX to CatBoost. Now you can convert an XGBoost or LightGBM model to ONNX, then convert it to CatBoost and use our fast applier. Use model.load_model(model_path, format="onnx") for that (see the sketch at the end of this entry).

    Speedups:

    • Training is ~15% faster for datasets with categorical features.

    🐛 Bug fixes:

    • 🔋 R language: get_features_importance with ShapValues for MultiClass, #868
    • NormalizedGini was not calculated, #962
    • 🐛 Bug in leaf calculation which could result in slightly worse quality if you use weights in binary classification mode
    • Fixed __builtins__ import in Python3 in PR #957, thanks to @AbhinavanT
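
    A hedged sketch of the ONNX route described above; the file path is a placeholder, and the commented-out predict call assumes you supply a feature matrix matching the original model:

    ```python
    from catboost import CatBoost

    # "model.onnx" is a placeholder path to a model exported to ONNX
    # from XGBoost or LightGBM (for example via onnxmltools).
    cb_model = CatBoost()
    cb_model.load_model("model.onnx", format="onnx")   # convert ONNX -> CatBoost

    # The converted model is then applied with CatBoost's fast applier:
    # preds = cb_model.predict(X)   # X: feature matrix with the columns the model expects
    ```
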
  • v0.16.4 Changes

    August 14, 2019

    🐛 Bug fixes:

    • 🔖 Versions 0.16.* had a bug in the Python applier with categorical features when applying to more than 128 documents.

    🆕 New features:

    • It is now possible to use pairwise modes for datasets without groups

    👌 Improvements:

    • 1.8x faster evaluation on asymmetrical trees
  • v0.16.3 Changes

    August 11, 2019

    💥 Breaking changes:

    • 🔋 Renamed column Feature Index to Feature Id in the prettified output of the Python method get_feature_importance(), because it now supports feature names
    • Renamed option per_float_feature_binarization (--per-float-feature-binarization) to per_float_feature_quantization (--per-float-feature-quantization)
    • ✂ Removed parameter inverted from the Python cv method. Added a type parameter instead, which can be set to Inverted
    • Method get_features() now works only for datasets without categorical features

    🆕 New features

    • A new multiclass version of the AUC metric, called AUC Mu, which was proposed by Ross S. Kleiman at NeurIPS 2019
    • ➕ Added time series cv
    • ➕ Added MeanWeightedTarget in fstat
    • Added utils.get_confusion_matrix() (a short example follows this list)
    • Now feature importance can be calculated for non-symmetric trees
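
    A tiny illustration of the new utils.get_confusion_matrix() helper on toy data (a sketch, not a reference for the exact output format):

    ```python
    from catboost import CatBoostClassifier, Pool
    from catboost.utils import get_confusion_matrix

    # Toy data; a real dataset would come from files or DataFrames.
    X = [[1, 4], [2, 5], [3, 6], [4, 7]]
    y = [0, 0, 1, 1]

    model = CatBoostClassifier(iterations=10, verbose=False).fit(X, y)
    print(get_confusion_matrix(model, Pool(X, label=y)))
    ```
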
  • v0.16.2 Changes

    August 02, 2019

    💥 Breaking changes:

    • Removed get_group_id() and get_features() methods of Pool class

    🆕 New model analysis tools:

    • 🔋 Added the PredictionDiff type of the get_feature_importance() method, a new tool for model analysis. It shows how the features influenced the fact that one of two samples got a higher prediction than the other. This helps debug ranking models: take a pair of samples that were ranked incorrectly and look at which features caused that (a sketch follows at the end of this entry).
    • ➕ Added plot_predictions() method

    🆕 New features:

    • 🔋 model.set_feature_names() method in Python
    • ➕ Added stratified split to parameter search methods
    • 👌 Support catboost.load_model() from CPU snapshots for numerical-only datasets
    • 👍 CatBoostClassifier.score() now supports y as DataFrame
    • Added sampling_frequency, per_float_feature_binarization, monotone_constraints parameters to CatBoostClassifier and CatBoostRegressor

    Speedups:

    • 2x speedup of multi-classification mode

    🛠 Bugfixes:

    • 🛠 Fixed score() for multiclassification, #924
    • Fixed get_all_params() function, #926

    Other improvements:

    • Clear error messages when a model cannot be saved
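
    A rough sketch of the PredictionDiff analysis described above; the model and the pair of samples are toy placeholders, and in practice the model would be a ranking model:

    ```python
    from catboost import CatBoostClassifier, Pool

    # Toy model standing in for a ranking model.
    X = [[1, 4], [2, 5], [3, 6], [4, 7]]
    y = [0, 0, 1, 1]
    model = CatBoostClassifier(iterations=10, verbose=False).fit(X, y)

    # Exactly two samples: the pair whose ordering you want to explain.
    pair = Pool([[2, 5], [3, 6]])
    diff = model.get_feature_importance(data=pair, type="PredictionDiff")
    print(diff)   # per-feature contribution to why one sample scored higher
    ```
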
  • v0.16.1 Changes

    July 30, 2019

    💥 Breaking changes:

    • The parameter fold_count is now called cv in grid_search() and randomized_search()
    • cv results are now returned from grid_search() and randomized_search() in the res['cv_results'] field (see the example at the end of this entry)

    🆕 New features:

    • 👍 R-language function catboost.save_model() now supports PMML, ONNX and other formats
    • The monotone_constraints parameter in the Python API allows specifying numerical features that the prediction shall depend on monotonically

    🐛 Bug fixes:

    • 🛠 Fixed eval_metric calculation for training with weights (in release 0.16, evaluation of a metric that was equal to the optimized loss did not use weights by default, so the overfitting detector worked incorrectly)

    👌 Improvements:

    • Added option verbose to grid_search() and randomized_search()
    • Added tutorial on grid_search() and randomized_search()
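
    A minimal example of the renamed cv argument and the new res['cv_results'] field; the data and parameter grid are toy placeholders:

    ```python
    from catboost import CatBoostClassifier

    X = [[1, 4], [2, 5], [3, 6], [4, 7], [5, 8], [6, 9]]
    y = [0, 0, 0, 1, 1, 1]

    model = CatBoostClassifier(iterations=20, verbose=False)
    grid = {"learning_rate": [0.03, 0.1], "depth": [2, 4]}

    res = model.grid_search(grid, X=X, y=y, cv=3, verbose=False)   # fold_count is now cv
    print(res["params"])       # best parameter combination
    print(res["cv_results"])   # cross-validation curves for the best combination
    ```
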
  • v0.16 Changes

    July 24, 2019

    💥 Breaking changes:

    • The MultiClass loss now has the same sign as Logloss. Previously it had the opposite sign and was maximized; now it is minimized.
    • CatBoostRegressor.score now returns the value of the R2 metric instead of RMSE, to be more consistent with the behavior of scikit-learn regressors.
    • 🔄 Changed metric parameter use_weights default value to false (except for ranking metrics)

    🆕 New features:

    • It is now possible to apply a model on GPU
    • We have published two new real-world datasets with monotonic constraints, catboost.datasets.monotonic1() and catboost.datasets.monotonic2(). Before that, california_housing was the only open-source dataset with monotonic constraints. Now you can use these two to benchmark algorithms with monotonic constraints.
    • We've added several new metrics to catboost, including DCG, FairLoss, HammingLoss, NormalizedGini and FilteredNDCG
    • Introduced efficient GridSearch and RandomSearch implementations.
    • The get_all_params() Python function returns the values of all training parameters, both user-defined and default (see the example at the end of this entry).
    • ➕ Added more synonyms for training parameters to be more compatible with other GBDT libraries.

    Speedups:

    • The AUC metric is computationally very expensive. We've implemented a parallelized calculation, so it can now be computed on every iteration (or every k-th iteration) about 4x faster.

    Educational materials:

    • We've improved our command-line tutorial, now it has examples of files and more information.

    🛠 Fixes:

    • Automatic Logloss or MultiClass loss function deduction for CatBoostClassifier.fit now also works if the training dataset is specified as Pool or filename string.
    • 🛠 And some other fixes
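
    To illustrate get_all_params(), a small hedged example on toy data; only two parameters are set by the user, yet the returned dictionary also contains the defaults that training actually used:

    ```python
    from catboost import CatBoostClassifier

    X = [[0, 1], [1, 0], [0, 0], [1, 1]]
    y = [0, 1, 0, 1]

    model = CatBoostClassifier(iterations=5, depth=3, verbose=False).fit(X, y)
    params = model.get_all_params()
    print(params["depth"], params["learning_rate"])   # user-defined and default values together
    ```
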
  • v0.15.2 Changes

    June 28, 2019

    💥 Breaking changes:

    • 🔋 Function get_feature_statistics is replaced by calc_feature_statistics
    • Scoring function Correlation is renamed to Cosine
    • Parameter efb_max_conflict_fraction is renamed to sparse_features_conflict_fraction

    🆕 New features:

    • Models can be saved in PMML format now.

    Note: PMML does not have full categorical features support, so to export a model in PMML format for datasets with categorical features you need to set the one_hot_max_size parameter to a large value, so that all categorical features are one-hot encoded (a sketch follows at the end of this entry)

    • 🔋 Feature names can be used to specify ignored features

    🐛 Bug fixes, including:

    • 🛠 Fixed restarting of CV on GPU for datasets without categorical features
    • 🛠 Fixed learning continuation errors with changed dataset (#879) and with model loaded from file (#884)
    • 🛠 Fixed NativeLib for JDK 9+ (PR #857)
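
    A hedged sketch of PMML export with one_hot_max_size raised as the note above suggests; the training data is a toy placeholder:

    ```python
    from catboost import CatBoostClassifier

    # Toy dataset with one categorical feature (column 0).
    X = [["a", 1.0], ["b", 2.0], ["a", 3.0], ["b", 4.0]]
    y = [0, 1, 0, 1]

    # A large one_hot_max_size forces one-hot encoding of all categorical features,
    # which the note above says is needed for PMML export with categorical data.
    model = CatBoostClassifier(iterations=10, one_hot_max_size=255, verbose=False)
    model.fit(X, y, cat_features=[0])
    model.save_model("model.pmml", format="pmml")
    ```
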
  • v0.15.1 Changes

    May 31, 2019

    🐛 Bug fixes:

    • ⏪ restored parameter fstr_type in Python and R interfaces
  • v0.15 Changes

    May 27, 2019

    💥 Breaking changes

    • 0️⃣ cv is now stratified by default for Logloss, MultiClass and MultiClassOneVsAll.
    • 🚚 We have removed the border parameter of the Logloss metric. You now need to use target_border as a separate training parameter.
    • CatBoostClassifier now runs MultiClass if more than 2 different values are present in training dataset labels.
    • model.best_score_["validation_0"] is replaced with model.best_score_["validation"] if a single validation dataset is present.
    • get_object_importance function parameter ostr_type is renamed to type in Python and R.

    Model analysis

    • Tree visualisation by @karina-usmanova.
    • 🆕 New feature analysis: plotting information about how a feature was used in the model by @alexrogozin12.
    • Added plot parameter to get_roc_curve, get_fpr_curve and get_fnr_curve functions from catboost.utils.
    • 👌 Supported prettified format for all types of feature importances.

    🆕 New ways of doing predictions

    • Rust applier by @shuternay.
    • DotNet applier by @17minutes.
    • One-hot encoding for categorical features in CatBoost CoreML model by Kseniya Valchuk and Ekaterina Pogodina.

    🆕 New objectives

    Speedups

    • Speedup of SHAP values calculation for a single object or a small number of objects by @Lokutrus.
    • Cheap preprocessing and no overfitting countermeasures when the number of iterations is small (since you will not overfit anyway).

    🆕 New functionality

    • Prediction of leaf indices (a short sketch follows at the end of this entry).

    🆕 New educational materials

    • Rust tutorial by @shuternay.
    • C# tutorial.
    • Leaf indices tutorial.
    • Tree visualisation tutorial by @karina-usmanova.
    • Google Colab tutorial for regression in catboost by @col14m.

    🛠 And a set of fixes for your issues.
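
    A small hedged sketch of leaf-index prediction; calc_leaf_indexes is assumed to be the Python entry point for this feature, and the data is a toy placeholder:

    ```python
    from catboost import CatBoostClassifier, Pool

    X = [[1, 4], [2, 5], [3, 6], [4, 7]]
    y = [0, 0, 1, 1]

    model = CatBoostClassifier(iterations=5, depth=2, verbose=False).fit(X, y)
    # calc_leaf_indexes is assumed here: for each object it returns the index
    # of the leaf it falls into in every tree of the model.
    leaf_idx = model.calc_leaf_indexes(Pool(X))
    print(leaf_idx.shape)   # (number of objects, number of trees)
    ```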