catboost v0.24 Release Notes

Release Date: 2020-08-05
    New functionality

    • We've finally implemented MVS sampling for GPU training. The default bootstrap algorithm is now MVS for the RMSE loss function when training on GPU.
    • Implemented near-zero-cost model deserialization from a memory blob. Currently, if your model uses neither CTR counters for categorical features nor text features, you can deserialize it from, for example, a memory-mapped file.
    • Added the ability to load trained models from a binary string or a file-like stream. To load a model from a bytes string, use load_model(blob=b'....'); to deserialize from a file-like stream, use load_model(stream=gzip.open('model.cbm.gz', 'rb')).
    • Fixed auto-learning rate estimation params for GPU.
    • Added support for the beta parameter of the QuerySoftMax function on CPU and GPU.

    New losses and metrics

    • New loss function RMSEWithUncertainty, which lets you estimate data uncertainty for trained regression models. The trained model returns a two-element vector for each object: the first element is the regression prediction and the second is an estimate of the data uncertainty for that prediction.

    Speedups

    • Our team and our contributors (thanks, @dmsivkov!) have made major speedups for CPU training: kdd98 -9%, higgs -18%, msrank -28%.

    Bugfixes:

    • Fixed CatBoost model export as Python code
    • Fixed AUC metric creation
    • Added text features to model.feature_names_. Issue #1314
    • Allowed models trained on datasets with NaN values (Min treatment) and models trained on datasets without NaNs to be used together in model_sum() or as the base model in init_model=. Issue #1271

    Educational materials

    • Published a new tutorial on categorical feature parameters. Thanks @garkavem