catboost v0.24.3 Release Notes

Release Date: 2020-11-18
  • 🚀 Release 0.24.3

    🆕 New functionality

    • 👌 Support fstr for text features and embeddings. Issue #1293

    🛠 Bugfixes:

    • 🛠 Fix model apply speed regression introduced in 0.24.1
    • 🛠 Several fixes in embeddings support: fixed model apply and serialization, and fixed apply on datasets with both texts and embeddings
    • 🛠 Fixed virtual ensembles prediction: use proper scaling and fixed apply (issue #1462)
    • 🛠 Fixed the score() method for RMSEWithUncertainty (issue #1482)
    • Automatically use the correct prediction_type in score()

Previous changes from v0.24.2

  • Uncertainty prediction

    • 👌 Supported uncertainty prediction for classification models.
    • 🛠 Fixed RMSEWithUncertainty data uncertainty prediction - now it predicts variance, not standard deviation.

    🆕 New functionality

    • 👍 Allow categorical feature counters for the MultiRMSE loss function.
    • The group_weight parameter was added to the catboost.utils.eval_metric method to allow passing weights for object groups. This allows weighted ranking metrics to be computed correctly when group weights are present.
    • 🚚 Faster non-owning deserialization from memory with less memory overhead: some dynamically computed data was moved into the model file, and other data is computed lazily, only when needed.

    Experimental functionality

    • 👌 Supported embedding features as input and linear discriminant analysis for embeddings preprocessing. Try adding your embeddings as new columns with embedding value arrays in a pandas.DataFrame and passing the corresponding column names to the Pool constructor or the fit function with the embedding_features=['EmbeddingFeaturesColumnName1', ...] parameter. Another way to add your embedding vectors is the new Column Description file column type NumVector: add a semicolon-separated embeddings column to your XSV file: ClassLabel\t0.1;0.2;0.3\t....

    Educational materials

    • Published new tutorial on uncertainty prediction.

    🛠 Bugfixes:

    • ⬇️ Reduced GPU memory usage in multi-GPU training when there is no need to compute categorical feature counters.
    • Now CatBoost allows specifying use_weights for metrics when the auto_class_weights parameter is set.
    • Correctly handle NaN values in the plot_predictions function.
    • 🛠 Fixed bugs related to floating-point precision loss during multiclass training with many objects; in our case the bug was triggered while training on 25 million objects on a single GPU card.
    • Now the average parameter is passed to the TotalF1 metric while training on GPU.
    • ➕ Added class label checks
    • Disallow feature remapping in model predict when there are empty feature names in the model.