Changelog History

v0.20.1 Changes
December 11, 2019

New features:
- `leaf_estimation_method=Exact` is now the default for the MAPE loss
- Added `CatBoostClassifier.predict_log_proba()`, PR #1095 (see the sketch below)
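
A minimal sketch of the new method on made-up data; the dataset and parameter values are illustrative only, and `predict_log_proba()` is expected to return the natural log of what `predict_proba()` returns:

```python
import numpy as np
from catboost import CatBoostClassifier

# Tiny made-up binary-classification dataset, for illustration only.
rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = (X[:, 0] > 0.55).astype(int)

model = CatBoostClassifier(iterations=50, verbose=False)
model.fit(X, y)

# New in 0.20.1: log of the predicted class probabilities,
# i.e. the same values as np.log(model.predict_proba(X)).
log_proba = model.predict_log_proba(X)
print(log_proba.shape)  # (200, 2): one column per class
```
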
Bug fixes:

v0.20 Changes
November 28, 2019

New submodule for text processing!
It contains two classes to help you get text features ready for training (see the sketch below):
- Tokenizer -- use this class to split text into tokens (automatic lowercasing and punctuation removal)
- Dictionary -- use this class to build a dictionary that maps tokens to numeric identifiers; these identifiers can then be used as new features.
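
A rough sketch of how the two classes might fit together. The constructor argument `occurence_lower_bound` and the exact `fit`/`apply` signatures are assumptions here, not taken from the release notes, so treat this as an outline and check the CatBoost documentation:

```python
from catboost.text_processing import Tokenizer, Dictionary

text = "CatBoost now ships a Tokenizer and a Dictionary for text features."

# Tokenizer: splits a string into tokens; per the release notes it also
# lowercases and strips punctuation automatically.
tokens = Tokenizer().tokenize(text)

# Dictionary: learns a token -> numeric id mapping, then converts tokens
# into identifiers usable as features.
# occurence_lower_bound=0 is assumed to keep even tokens seen only once.
dictionary = Dictionary(occurence_lower_bound=0)
dictionary.fit(tokens)

# apply() is assumed here to accept the same flat token list fit() received.
token_ids = dictionary.apply(tokens)
print(tokens)
print(token_ids)
```
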
New features:
- Enabled `boost_from_average` for the MAPE loss function
Bug fixes:
- Fixed `Pool` creation from `pandas.DataFrame` with discontinuous columns, #1079
- Fixed `standalone_evaluator`, PR #1083

Speedups:
- Huge speedup of preprocessing in the python-package for datasets with many samples (>10 million)

We also release precompiled packages for Python 3.8.

v0.19.1 Changes
November 19, 2019

New features:
- With this release we support `Text` features for classification on GPU. To specify text columns, use the `text_features` parameter (see the sketch below). Achieve better quality by using the text information in your dataset. See more in the "Learning CatBoost with text features" tutorial.
- The `MultiRMSE` loss function is now available on CPU. Labels for the multi-regression mode should be specified in separate `Label` columns.
- MonoForest framework for model analysis, based on our NeurIPS 2019 paper. Learn more in the MonoForest tutorial.
- `boost_from_average` is now `True` by default for the `Quantile` and `MAE` loss functions, which improves the resulting quality.
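
A sketch of training with a text column. The data and column names are invented, and, per the notes above, text features required GPU training in this release (hence `task_type="GPU"`):

```python
import pandas as pd
from catboost import CatBoostClassifier, Pool

# Invented example data: one free-text column plus one numeric column.
df = pd.DataFrame({
    "review": ["great movie, loved it", "boring and far too long",
               "a masterpiece", "not my taste at all"],
    "review_len": [22, 24, 13, 19],
    "label": [1, 0, 1, 0],
})

train_pool = Pool(
    data=df[["review", "review_len"]],
    label=df["label"],
    text_features=["review"],   # mark the text column by name
)

# Text features were GPU-only in 0.19.1.
model = CatBoostClassifier(iterations=100, task_type="GPU", verbose=False)
model.fit(train_pool)
```
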
Speedups:
- Huge reduction of preprocessing time for datasets loaded from files and for datasets with many samples (> 10 million), which was a bottleneck for GPU training
- 3x speedup for small datasets
v0.18.1 Changes
October 31, 2019

v0.18 Changes
October 21, 2019

- The main feature of the release is a huge speedup on small datasets. We now use MVS sampling for CPU regression and binary classification training by default, together with the `Plain` boosting scheme for both small and large datasets. This change not only gives the huge speedup but also improves quality!
- The `boost_from_average` parameter is available in `CatBoostClassifier` and `CatBoostRegressor`
- We have added new formats for describing monotonic constraints. For example, `"(1,0,0,-1)"`, `"0:1,3:-1"` and `"FeatureName0:1,FeatureName3:-1"` are all valid specifications. With Python and params-file JSON, lists and dictionaries can also be used (see the sketch below)
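
A sketch showing the equivalent ways to spell the constraints quoted above for a four-feature dataset. The feature names used in the name-based form (`FeatureName0`, `FeatureName3`) come from the release notes and are assumed to exist in the training data:

```python
from catboost import CatBoostRegressor

common = dict(iterations=100, verbose=False)

# All four of these describe the same constraint set on a 4-feature dataset:
# feature 0 must have a monotonically increasing effect, feature 3 decreasing.
m_str_full  = CatBoostRegressor(monotone_constraints="(1,0,0,-1)", **common)
m_str_index = CatBoostRegressor(monotone_constraints="0:1,3:-1", **common)
m_list      = CatBoostRegressor(monotone_constraints=[1, 0, 0, -1], **common)
m_dict      = CatBoostRegressor(
    monotone_constraints={"FeatureName0": 1, "FeatureName3": -1}, **common)
```
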
Bugs fixed:

v0.17.5 Changes
October 10, 2019

Bugs fixed:
- "System of linear equations is not positive definite" when training MultiClass on Windows, #1022
- Cat feature values could be taken from floating-point data. We have forbidden this
- Handling of `numpy.ndarray` features data with categorical features is corrected
v0.17.4 Changes
October 01, 2019

Improvements:
- Massive 2x speedup for `MultiClass` with many classes
- Updated MVS implementation. See "Minimal Variance Sampling in Stochastic Gradient Boosting" by Bulat Ibragimov and Gleb Gusev at NeurIPS 2019
- Added `sum_models` in the R-package, #1007
Bugs fixed:

v0.17.3 Changes
September 24, 2019

Improvements:
- New visualization for parameter tuning. Use the `plot=True` parameter in the `grid_search` and `randomized_search` methods to show plots in a Jupyter notebook (see the sketch below)
- Switched to the jemalloc allocator instead of LFAlloc in the CLI and model interfaces to fix some problems on Windows 7 machines, #881
- Calculation of binary-class AUC is up to 1.3x faster
- Added a tutorial on using the fast CatBoost applier with LightGBM models
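
A sketch of the new tuning visualization. The toy dataset comes from scikit-learn and the parameter grid is arbitrary; `plot=True` only renders when run inside a Jupyter notebook:

```python
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = CatBoostClassifier(iterations=100, verbose=False)
grid = {
    "learning_rate": [0.03, 0.1],
    "depth": [4, 6, 8],
}

# plot=True shows the interactive tuning plots in a Jupyter notebook.
result = model.grid_search(grid, X=X, y=y, cv=3, plot=True)
print(result["params"])   # best parameter combination found
# model.randomized_search(...) accepts the same plot=True argument.
```
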
Bugs fixed:
- SHAP values for the `MultiClass` objective no longer give a constant 0 value for the last class in case of GPU training.
  SHAP values for the `MultiClass` objective are now calculated in the following way. First, predictions are normalized so that the average of all predictions is zero in each tree. The normalized predictions produce the same probabilities as the non-normalized ones. Then the SHAP values are calculated for every class separately. Note that since the SHAP values are calculated on the normalized predictions, their sum for every class is equal to the normalized prediction (see the sketch after this list)
- Fixed a bug in the ranking tutorial, #955
- Allow a string value for the `per_float_feature_quantization` parameter, #996
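
A sketch of pulling the per-class SHAP values described above. The data is synthetic, and the output shape is an assumption based on how CatBoost reports multiclass SHAP values (one block per class, with a trailing expected-value column):

```python
import numpy as np
from catboost import CatBoostClassifier, Pool

rng = np.random.RandomState(0)
X = rng.rand(300, 5)
y = rng.randint(0, 3, size=300)        # three classes, synthetic labels
pool = Pool(X, y)

model = CatBoostClassifier(iterations=50, loss_function="MultiClass",
                           verbose=False)
model.fit(pool)

# SHAP values for a MultiClass model: computed per class on the normalized
# predictions as described above; the last column is the expected value.
shap_values = model.get_feature_importance(data=pool, type="ShapValues")
print(shap_values.shape)   # assumed (n_samples, n_classes, n_features + 1)
```
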
v0.17.2 Changes
September 19, 2019

Improvements:
- For the MAE metric on CPU, the default value of `leaf-estimation-method` is now `Exact`
- Sped up the `LossFunctionChange` feature strength computation
Bugs fixed:

v0.17.1 Changes
September 13, 2019

Bugs fixed:
- Incorrect estimation of total RAM size on Windows and macOS, #989
- Failure when the dataset is a `numpy.ndarray` with `order='F'`
- Disable `boost_from_average` when a baseline is specified
Improvements:
- Polymorphic raw features storage (2x to 25x faster data preparation for numeric features in non-float32 columns, as either `pandas.DataFrame` or `numpy.ndarray` with `order='F'`)
- Support the AUC metric for the `CrossEntropy` loss on CPU
- Added `datasets.rotten_tomatoes()`, a textual dataset (see the sketch below)
- Improved usability of `monotone_constraints`, #950
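
A sketch of loading the new dataset; it is assumed to follow the same convention as the other loaders in `catboost.datasets` (download on first use, return a train/test pair of DataFrames):

```python
from catboost.datasets import rotten_tomatoes

# Assumed to behave like the other catboost.datasets loaders:
# downloads on first call and returns (train, test) pandas DataFrames.
train_df, test_df = rotten_tomatoes()
print(train_df.shape, test_df.shape)
print(train_df.columns.tolist()[:5])
```
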
Speedups:
- Optimized computation of the `CrossEntropy` metric on CPUs with SSE3