catboost v0.23.2 Release Notes
Release Date: 2020-05-26
🆕 New functionality
- Added `plot_partial_dependence` method in the python-package (currently it works for models with symmetric trees trained on datasets with numerical features only). Implemented by @felixandrer.
- Allowed using the `boost_from_average` option together with the `model_shrink_rate` option. In this case shrinkage is applied to the starting value as well.
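As a rough, hypothetical illustration of how a constant shrink rate can interact with a non-zero starting value (the function and update rule below are simplified assumptions for intuition, not CatBoost's actual implementation):

```python
def shrunk_prediction(start_value, tree_preds, shrink_rate):
    """Toy model: at each boosting step the current model output,
    including the starting value, is multiplied by (1 - shrink_rate)
    before the next tree's contribution is added."""
    pred = start_value
    for tree_pred in tree_preds:
        pred = pred * (1 - shrink_rate) + tree_pred
    return pred

# With no trees the starting value passes through untouched; every
# added tree shrinks everything accumulated so far, starting value
# included.
```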
- Added new `auto_class_weights` option in the python-package, R-package and CLI with possible values `Balanced` and `SqrtBalanced`. For `Balanced`, every class is weighted `maxSumWeightInClass / sumWeightInClass`, where `sumWeightInClass` is the sum of the weights of all samples in that class (if no weights are present, each sample weight is 1) and `maxSumWeightInClass` is the maximum such sum among all classes. For `SqrtBalanced` the formula is `sqrt(maxSumWeightInClass / sumWeightInClass)`. This option is supported for binclass and multiclass tasks. Implemented by @egiby.
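The weighting formulas above can be reproduced in a few lines of plain Python (a sketch of the formulas only; `class_weights_for` below is a hypothetical helper, not part of the catboost API):

```python
import math
from collections import defaultdict

def class_weights_for(labels, weights=None, mode="Balanced"):
    """Compute per-class weights per the release-note formulas."""
    if weights is None:
        weights = [1.0] * len(labels)  # unweighted samples count as 1
    sum_weight_in_class = defaultdict(float)
    for label, weight in zip(labels, weights):
        sum_weight_in_class[label] += weight
    max_sum = max(sum_weight_in_class.values())
    if mode == "Balanced":
        return {c: max_sum / s for c, s in sum_weight_in_class.items()}
    if mode == "SqrtBalanced":
        return {c: math.sqrt(max_sum / s) for c, s in sum_weight_in_class.items()}
    raise ValueError(f"unknown mode: {mode}")
```

The minority class gets the largest weight; `SqrtBalanced` dampens that correction.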
- Supported `model_size_reg` option on GPU. Set to 0.5 by default (same as on CPU). This regularization works slightly differently on GPU: feature combinations are regularized more aggressively than on CPU. On CPU, the cost of a combination equals the number of different feature values of this combination that are present in the training dataset. On GPU, the cost equals the number of all possible different values of this combination. For example, if the combination contains two categorical features c1 and c2, the cost will be `#categories in c1 * #categories in c2`, even though many of the value combinations may not be present in the dataset.
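The difference between the two cost definitions can be sketched as follows (hypothetical helper functions illustrating the counting rules above, not CatBoost internals):

```python
def gpu_combination_cost(categories_per_feature):
    # GPU: product of the category counts of each feature in the
    # combination, regardless of which value tuples actually occur.
    cost = 1
    for n_categories in categories_per_feature:
        cost *= n_categories
    return cost

def cpu_combination_cost(rows, feature_idxs):
    # CPU: number of distinct value tuples of the combination that
    # are actually present in the training dataset.
    return len({tuple(row[i] for i in feature_idxs) for row in rows})

# With two features of 3 and 4 categories the GPU cost is 12, while
# the CPU cost can be much smaller if only a few pairs occur in the
# data, so the GPU penalizes combinations more aggressively.
```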
- Added calculation of exact Shapley values (see formula (2) from https://arxiv.org/pdf/1802.03888.pdf). By default, the estimation from this paper (Algorithm 2) is calculated, which is much faster. To compute exact values, set the `shap_calc_type` parameter of the `CatBoost.get_feature_importance` function to `"Exact"`. Implemented by @LordProtoss.
🛠 Bugs fixed
- Fixed the onnx converter for old onnx versions.