Changelog History (Page 3)
v0.17 Changes
September 10, 2019

New features:
- Sparse data support
- We've implemented `boost_from_average` and enabled it by default in RMSE mode. It gives a boost in quality, especially for a small number of iterations.
Improvements:
- Quantile regression on CPU
- Default parameters for Poisson regression
Speedups:
- A number of speedups for training on CPU
- Huge speedups for loading datasets with categorical features represented as `pandas.Categorical`. Hint: use `pandas.Categorical` instead of `object` to speed up loading by up to 200x (see the sketch below).
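A minimal sketch of that hint, assuming a pandas `DataFrame` with an object-typed column; the column names and values are placeholders:

```python
import pandas as pd
from catboost import Pool

# Hypothetical dataset with a string-typed categorical column.
df = pd.DataFrame({
    "city": ["London", "Paris", "London", "Berlin"],
    "price": [10.0, 12.5, 9.8, 11.2],
})
labels = [1, 0, 1, 0]

# Converting object columns to pandas.Categorical before building the Pool
# is what gives the large loading speedup on big datasets.
df["city"] = df["city"].astype("category")

pool = Pool(df, label=labels, cat_features=["city"])
```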
v0.16.5 Changes
August 20, 2019

Breaking changes:
- All metrics except for the AUC metric now use weights by default.

New features:
- Added `boost_from_average` parameter for RMSE training on CPU, which might give a boost in quality.
- Added conversion from ONNX to CatBoost. Now you can convert an XGBoost or LightGBM model to ONNX, then convert it to CatBoost and use our fast applier. Use `model.load_model(model_path, format="onnx")` for that (see the sketch below).
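A minimal sketch of that conversion path, assuming `model.onnx` already exists on disk (e.g. exported from XGBoost or LightGBM with an ONNX converter); the file name and feature values are placeholders:

```python
from catboost import CatBoost

# Load an ONNX model into a CatBoost model object so that
# CatBoost's fast applier can be used for predictions.
cb_model = CatBoost()
cb_model.load_model("model.onnx", format="onnx")

# Apply the converted model to new data (placeholder feature values).
print(cb_model.predict([[0.1, 2.3, 4.5]]))
```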
Speed ups:
- Training is ~15% faster for datasets with categorical features.
Bug fixes:
- R language: `get_features_importance` with `ShapValues` for `MultiClass`, #868
- `NormalizedGini` was not calculated, #962
- Bug in leaf calculation which could result in slightly worse quality if you use weights in binary classification mode
- Fixed `__builtins__` import in Python 3 in PR #957, thanks to @AbhinavanT
v0.16.4 Changes
August 14, 2019

Bug fixes:
- Versions 0.16.* had a bug in the Python applier with categorical features when applying on more than 128 documents.
New features:
- It is now possible to use pairwise modes for datasets without groups

Improvements:
- 1.8x faster evaluation speed on asymmetrical trees
v0.16.3 Changes
August 11, 2019

Breaking changes:
- Renamed column `Feature Index` to `Feature Id` in the prettified output of the Python method `get_feature_importance()`, because it supports feature names now
- Renamed option `per_float_feature_binarization` (`--per-float-feature-binarization`) to `per_float_feature_quantization` (`--per-float-feature-quantization`)
- Removed parameter `inverted` from the Python `cv` method. Added a `type` parameter instead, which can be set to `Inverted`
- Method `get_features()` now works only for datasets without categorical features
New features:
- A new multiclass version of the AUC metric, called `AUC Mu`, which was proposed by Ross S. Kleiman at NeurIPS 2019, link
- Added time series cv
- Added `MeanWeightedTarget` in `fstat`
- Added `utils.get_confusion_matrix()` (see the sketch after this list)
- Now feature importance can be calculated for non-symmetric trees
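A minimal sketch of the confusion-matrix helper, assuming it takes a trained model and an evaluation `Pool`; the training data is a placeholder:

```python
from catboost import CatBoostClassifier, Pool
from catboost.utils import get_confusion_matrix

# Tiny placeholder dataset.
X = [[1, 4], [2, 5], [3, 6], [4, 7]]
y = [0, 1, 0, 1]

model = CatBoostClassifier(iterations=10, verbose=False)
model.fit(X, y)

# Confusion matrix of the model's predictions on an evaluation pool.
eval_pool = Pool(X, label=y)
print(get_confusion_matrix(model, eval_pool))
```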
v0.16.2 Changes
August 02, 2019

Breaking changes:
- Removed `get_group_id()` and `get_features()` methods of the `Pool` class
New model analysis tools:
- Added `PredictionDiff` type of the `get_feature_importance()` method, a new method for model analysis. It shows how the features influenced the fact that, of two samples, one has a higher prediction. This makes it possible to debug ranking models: you find a pair of samples ranked incorrectly and look at which features caused that (see the sketch after this list).
- Added `plot_predictions()` method
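A minimal sketch of how `PredictionDiff` might be used, assuming a `Pool` holding exactly the two objects being compared and that the importance type can be passed as a string; the data is a placeholder:

```python
from catboost import CatBoostClassifier, Pool

# Placeholder training data.
X = [[1, 4], [2, 5], [3, 6], [4, 7]]
y = [0, 1, 0, 1]
model = CatBoostClassifier(iterations=10, verbose=False)
model.fit(X, y)

# A pool with exactly the two objects whose predictions we want to compare.
pair = Pool([[1, 4], [4, 7]])

# Per-feature contributions to the difference between the two predictions.
print(model.get_feature_importance(data=pair, type="PredictionDiff"))
```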
New features:
- `model.set_feature_names()` method in Python (see the sketch after this list)
- Added stratified split to parameter search methods
- Support `catboost.load_model()` from CPU snapshots for numerical-only datasets
- `CatBoostClassifier.score()` now supports `y` as a `DataFrame`
- Added `sampling_frequency`, `per_float_feature_binarization`, `monotone_constraints` parameters to `CatBoostClassifier` and `CatBoostRegressor`
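A minimal sketch of `set_feature_names()`, assuming it accepts a list with one name per feature; the names and data are placeholders:

```python
from catboost import CatBoostClassifier

# Placeholder data with two numerical features.
X = [[35, 1], [42, 0], [28, 1], [51, 0]]
y = [0, 1, 0, 1]
model = CatBoostClassifier(iterations=10, verbose=False)
model.fit(X, y)

# Attach human-readable names so feature importances and prettified
# outputs refer to them instead of numeric indices.
model.set_feature_names(["age", "has_discount"])
print(model.feature_names_)
```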
Speedups:
- 2x speedup of multi-classification mode
Bugfixes:
Other improvements:
- Clear error messages when a model cannot be saved
v0.16.1 Changes
July 30, 2019

Breaking changes:
- Parameter `fold_count` is now called `cv` in `grid_search()` and `randomized_search()`
- CV results are now returned from `grid_search()` and `randomized_search()` in the `res['cv_results']` field (see the sketch below)
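A minimal sketch of the renamed parameter and the result field, assuming the scikit-learn-style estimator API; the grid and data are placeholders:

```python
from catboost import CatBoostClassifier

X = [[1, 4], [2, 5], [3, 6], [4, 7], [5, 8], [6, 9]]
y = [0, 1, 0, 1, 0, 1]

model = CatBoostClassifier(iterations=20, verbose=False)
grid = {"learning_rate": [0.03, 0.1], "depth": [2, 4]}

# fold_count is now called cv; cross-validation metrics for the best
# parameter set are returned under the 'cv_results' key.
res = model.grid_search(grid, X=X, y=y, cv=2, verbose=False)
print(res["params"])
print(list(res["cv_results"].keys()))
```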
New features:
- R-language function `catboost.save_model()` now supports PMML, ONNX and other formats
- Parameter `monotone_constraints` in the Python API allows specifying numerical features that the prediction shall depend on monotonically (see the sketch below)
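A minimal sketch of a monotonic constraint, assuming the per-feature list form where 1 means non-decreasing, -1 non-increasing and 0 unconstrained; the data is a placeholder:

```python
from catboost import CatBoostRegressor

X = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0], [4.0, 0.5]]
y = [1.0, 2.0, 3.0, 4.0]

# Force the prediction to be non-decreasing in the first feature
# and leave the second feature unconstrained.
model = CatBoostRegressor(iterations=50,
                          monotone_constraints=[1, 0],
                          verbose=False)
model.fit(X, y)
print(model.predict([[2.5, 1.5]]))
```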
Bug fixes:
- Fixed `eval_metric` calculation for training with weights (in release 0.16, evaluation of a metric that was equal to the optimized loss did not use weights by default, so the overfitting detector worked incorrectly)
Improvements:
- Added option `verbose` to `grid_search()` and `randomized_search()`
- Added tutorial on `grid_search()` and `randomized_search()`
v0.16 Changes
July 24, 2019

Breaking changes:
- `MultiClass` loss now has the same sign as Logloss. It had the opposite sign before and was maximized; now it is minimized.
- `CatBoostRegressor.score` now returns the value of the R2 metric instead of RMSE, to be more consistent with the behavior of scikit-learn regressors.
- Changed the metric parameter `use_weights` default value to false (except for ranking metrics)
New features:
- It is now possible to apply the model on GPU
- We have published two new real-world datasets with monotonic constraints, `catboost.datasets.monotonic1()` and `catboost.datasets.monotonic2()`. Before that, `california_housing` was the only open-source dataset with monotonic constraints. Now you can use these two to benchmark algorithms with monotonic constraints.
- We've added several new metrics to CatBoost, including `DCG`, `FairLoss`, `HammingLoss`, `NormalizedGini` and `FilteredNDCG`
- Introduced efficient `GridSearch` and `RandomSearch` implementations.
- The `get_all_params()` Python function returns the values of all training parameters, both user-defined and default (see the sketch after this list).
- Added more synonyms for training parameters to be more compatible with other GBDT libraries.
Speedups:
- The AUC metric is computationally very expensive. We've implemented parallelized calculation of this metric; now it can be calculated on every iteration (or every k-th iteration) about 4x faster.
Educational materials:
- We've improved our command-line tutorial; it now includes example files and more information.
Fixes:
- Automatic `Logloss` or `MultiClass` loss function deduction for `CatBoostClassifier.fit` now also works if the training dataset is specified as a `Pool` or a filename string.
- And some other fixes
v0.15.2 Changes
June 28, 2019

Breaking changes:
- Function `get_feature_statistics` is replaced by `calc_feature_statistics`
- Scoring function `Correlation` is renamed to `Cosine`
- Parameter `efb_max_conflict_fraction` is renamed to `sparse_features_conflict_fraction`
New features:
- Models can now be saved in PMML format (see the sketch after this list). Note: PMML does not have full categorical-features support, so to export a model in PMML format for datasets with categorical features you need to set the `one_hot_max_size` parameter to some large value, so that all categorical features are one-hot encoded.
- Feature names can be used to specify ignored features
Bug fixes, including:
v0.15.1 Changes
May 31, 2019

Bug fixes:
- Restored parameter `fstr_type` in Python and R interfaces
v0.15 Changes
May 27, 2019

Breaking changes
- cv is now stratified by default for `Logloss`, `MultiClass` and `MultiClassOneVsAll`.
- We have removed the `border` parameter of the `Logloss` metric. You need to use `target_border` as a separate training parameter now.
- `CatBoostClassifier` now runs `MultiClass` if more than 2 different values are present in the training dataset labels.
- `model.best_score_["validation_0"]` is replaced with `model.best_score_["validation"]` if a single validation dataset is present.
- `get_object_importance` function parameter `ostr_type` is renamed to `type` in Python and R.
Model analysis
- Tree visualisation by @karina-usmanova.
- New feature analysis: plotting information about how a feature was used in the model, by @alexrogozin12.
- Added `plot` parameter to `get_roc_curve`, `get_fpr_curve` and `get_fnr_curve` functions from `catboost.utils` (see the sketch after this list).
- Supported prettified format for all types of feature importances.
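A minimal sketch of the new `plot` option on `get_roc_curve`, assuming the FPR/TPR/threshold arrays are returned either way; the data is a placeholder:

```python
from catboost import CatBoostClassifier, Pool
from catboost.utils import get_roc_curve

X = [[1, 4], [2, 5], [3, 6], [4, 7]]
y = [0, 1, 0, 1]
model = CatBoostClassifier(iterations=10, verbose=False)
model.fit(X, y)

# With plot=True the curve is drawn (e.g. in a Jupyter notebook);
# the arrays are returned either way.
eval_pool = Pool(X, label=y)
fpr, tpr, thresholds = get_roc_curve(model, eval_pool, plot=True)
```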
New ways of doing predictions
- Rust applier by @shuternay.
- DotNet applier by @17minutes.
- One-hot encoding for categorical features in CatBoost CoreML model by Kseniya Valchuk and Ekaterina Pogodina.
New objectives
- Expectile Regression by @david-waterworth.
- Huber loss by @atsky.
Speedups
- Speed up of SHAP values calculation for a single object or a small number of objects, by @Lokutrus.
- Cheap preprocessing and no overfitting countermeasures when there are few iterations (since you will not overfit anyway).
New functionality
- Prediction of leaf indices (see the sketch below).
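A minimal sketch of leaf-index prediction, assuming the Python model exposes a `calc_leaf_indexes` method for this (the method name is an assumption, as is the placeholder data):

```python
from catboost import CatBoostClassifier

X = [[1, 4], [2, 5], [3, 6], [4, 7]]
y = [0, 1, 0, 1]
model = CatBoostClassifier(iterations=5, depth=2, verbose=False)
model.fit(X, y)

# One leaf index per (object, tree) pair: shape (n_objects, n_trees).
leaf_indexes = model.calc_leaf_indexes(X)  # method name is an assumption
print(leaf_indexes.shape)
```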
New educational materials
- Rust tutorial by @shuternay.
- C# tutorial.
- Leaf indices.
- Tree visualisation tutorial by @karina-usmanova.
- Google Colab tutorial for regression in catboost by @col14m.
And a set of fixes for your issues.