catboost v0.24.1 Release Notes
Release Date: 2020-08-27
Uncertainty prediction
🚀 The main feature of this release is support for total uncertainty prediction via virtual ensembles.
🖨 You can read the theoretical background in the preprint "Uncertainty in Gradient Boosting via Ensembles" from our research team.
We introduced a new training parameter, `posterior_sampling`, that allows estimating total uncertainty.
Setting `posterior_sampling=True` enables Langevin boosting and sets `model_shrink_rate` to `1/(2*N)` and `diffusion_temperature` to `N`, where `N` is the dataset size.
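As a quick arithmetic check of the values described above, here is a minimal sketch that computes them for a given dataset size. The helper name `implied_posterior_sampling_settings` and the dictionary key `langevin` are illustrative assumptions, not part of the release. CatBoost applies these settings itself when `posterior_sampling=True`.

```python
def implied_posterior_sampling_settings(dataset_size: int) -> dict:
    """Return the values the notes say posterior_sampling=True implies.

    Hypothetical helper for illustration only; it does not call CatBoost.
    """
    if dataset_size <= 0:
        raise ValueError("dataset_size must be a positive integer")
    return {
        # Langevin boosting is switched on (key name is an assumption).
        "langevin": True,
        # model_shrink_rate = 1 / (2 * N)
        "model_shrink_rate": 1.0 / (2 * dataset_size),
        # diffusion_temperature = N
        "diffusion_temperature": float(dataset_size),
    }


if __name__ == "__main__":
    for name, value in implied_posterior_sampling_settings(1000).items():
        print(f"{name} = {value}")
```

Note that the shrink rate decreases and the diffusion temperature grows as the dataset gets larger, so both depend on the training set rather than being fixed constants.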
The CatBoost object method `virtual_ensembles_predict` splits the model into `virtual_ensembles_count` submodels.
Calling `model.virtual_ensembles_predict(.., prediction_type='TotalUncertainty')` returns the mean prediction and variance (and knowledge uncertainty for models trained with the `RMSEWithUncertainty` loss function).
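To make these quantities concrete, the following is a rough sketch of how a mean prediction and uncertainty estimates can be derived from the outputs of several submodels for one object. It assumes each submodel yields a (mean, variance) pair and uses the standard ensemble decomposition: knowledge uncertainty as the spread of submodel means, data uncertainty as the average predicted variance. This is an illustration of the idea, not a confirmed description of CatBoost's internal computation, and `summarize_virtual_ensemble` is a hypothetical name.

```python
from statistics import fmean, pvariance


def summarize_virtual_ensemble(member_predictions):
    """Combine per-submodel (mean, variance) pairs for a single object.

    Hypothetical helper; assumes the usual law-of-total-variance split.
    """
    if not member_predictions:
        raise ValueError("at least one submodel prediction is required")
    means = [mean for mean, _ in member_predictions]
    variances = [variance for _, variance in member_predictions]

    mean_prediction = fmean(means)
    # Disagreement between submodels: population variance of their means.
    knowledge_uncertainty = pvariance(means)
    # Noise the submodels attribute to the data: average predicted variance.
    data_uncertainty = fmean(variances)
    return {
        "mean_prediction": mean_prediction,
        "knowledge_uncertainty": knowledge_uncertainty,
        "data_uncertainty": data_uncertainty,
        "total_uncertainty": knowledge_uncertainty + data_uncertainty,
    }


if __name__ == "__main__":
    # Made-up predictions from three submodels for one object.
    summary = summarize_virtual_ensemble([(2.0, 0.5), (2.2, 0.4), (1.8, 0.6)])
    for name, value in summary.items():
        print(f"{name}: {value:.4f}")
```

The split is useful in practice: high knowledge uncertainty suggests the object is unlike the training data, while high data uncertainty suggests the target itself is noisy.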
Calling `model.virtual_ensembles_predict(.., prediction_type='VirtEnsembles')` returns `virtual_ensembles_count` predictions of virtual submodels for each object.
🆕 New functionality
- 👌 Supported non-owning model deserialization for models with categorical feature counters
Speedups
- 📜 We've made many speedups to sparse data loading. For example, preprocessing of the Bosch sparse dataset became 4.5x faster when running with 28 threads.
🛠 Bugfixes: