catboost v0.24.1 Release Notes
Release Date: 2020-08-27
Uncertainty prediction
The main feature of this release is support for total uncertainty prediction via virtual ensembles.
You can read the theoretical background in the preprint "Uncertainty in Gradient Boosting via Ensembles" from our research team.
We introduced a new training parameter, `posterior_sampling`, that allows estimating total uncertainty.
Setting `posterior_sampling=True` enables Langevin boosting, sets `model_shrink_rate` to `1/(2*N)`, and sets `diffusion_temperature` to `N`, where `N` is the dataset size.
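As a quick arithmetic check of the values described above, here is a minimal sketch in plain Python. The helper `implied_posterior_sampling_params` is hypothetical and not part of CatBoost, and the `langevin` key is an assumed name for the Langevin boosting switch. The sketch only restates the formulas `1/(2*N)` and `N` for a given dataset size.

```python
def implied_posterior_sampling_params(n_objects: int) -> dict:
    """Return the settings that posterior_sampling=True is described as implying.

    Hypothetical helper for illustration only; CatBoost applies these
    values internally when posterior_sampling=True is set.
    """
    if n_objects <= 0:
        raise ValueError("n_objects must be a positive dataset size")
    return {
        "langevin": True,                            # assumed key for Langevin boosting
        "model_shrink_rate": 1 / (2 * n_objects),    # 1/(2*N)
        "diffusion_temperature": n_objects,          # N
    }


# Example: a dataset with N = 1000 objects.
params = implied_posterior_sampling_params(1000)
print(params)
```

For `N = 1000` this gives a shrink rate of `0.0005` and a diffusion temperature of `1000`, so larger datasets imply a smaller shrink rate and a higher temperature.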
The CatBoost object method `virtual_ensembles_predict` splits the model into `virtual_ensembles_count` submodels.
Calling `model.virtual_ensembles_predict(.., prediction_type='TotalUncertainty')` returns the mean prediction and the variance (plus the knowledge uncertainty for models trained with the `RMSEWithUncertainty` loss function).
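To make "mean prediction" and "variance" concrete, the sketch below summarizes a set of per-submodel predictions for each object in plain Python. The input values and the helper name `summarize_submodel_predictions` are hypothetical, and using the population variance across submodels is an assumption made for illustration; CatBoost's internal computation may differ.

```python
from statistics import fmean, pvariance


def summarize_submodel_predictions(per_object_predictions):
    """Return a (mean, variance) pair across submodels for each object.

    Hypothetical helper: each inner list holds the predictions that
    several submodels made for one object.
    """
    summary = []
    for submodel_predictions in per_object_predictions:
        mean_prediction = fmean(submodel_predictions)
        # Population variance of the submodel predictions around their mean.
        spread = pvariance(submodel_predictions, mu=mean_prediction)
        summary.append((mean_prediction, spread))
    return summary


# Hypothetical predictions: 2 objects, 4 submodels each.
hypothetical_predictions = [
    [1.0, 1.0, 1.0, 1.0],  # submodels agree
    [0.0, 2.0, 0.0, 2.0],  # submodels disagree
]
summary = summarize_submodel_predictions(hypothetical_predictions)
print(summary)
```

Both objects get the same mean prediction of `1.0`, but only the second has a non-zero variance, which is the kind of disagreement between submodels that an uncertainty estimate is meant to surface.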
Calling `model.virtual_ensembles_predict(.., prediction_type='VirtEnsembles')` returns `virtual_ensembles_count` predictions of virtual submodels for each object.
New functionality
- Supported non-owning model deserialization for models with categorical feature counters
Speedups
- We've made many speedups for sparse data loading. For example, preprocessing of the Bosch sparse dataset became 4.5x faster when running with 28 threads.
Bugfixes: