catboost v0.20 Release Notes
Release Date: 2019-11-28
New submodule for text processing!
It contains two classes to help you make text features ready for training:
- Tokenizer -- use this class to split text into tokens (automatic lowercasing and punctuation removal)
- Dictionary -- use this class to create a dictionary which maps tokens to numeric identifiers. You then use these identifiers as new features.
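
The two steps described above (tokenizing text, then mapping tokens to numeric identifiers) can be sketched in plain Python. This is a conceptual illustration only, not CatBoost's implementation or API: the helper names `tokenize`, `build_dictionary`, and `apply_dictionary` are hypothetical, and the real Tokenizer and Dictionary classes may handle punctuation, unknown tokens, and identifier ordering differently.

```python
import string


def tokenize(text):
    """Lowercase the text, drop ASCII punctuation, and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def build_dictionary(tokenized_texts):
    """Assign each distinct token an integer id in order of first appearance."""
    token_to_id = {}
    for tokens in tokenized_texts:
        for token in tokens:
            # setdefault only inserts when the token has not been seen yet
            token_to_id.setdefault(token, len(token_to_id))
    return token_to_id


def apply_dictionary(token_to_id, tokens):
    """Replace tokens with their ids, skipping tokens missing from the dictionary."""
    return [token_to_id[t] for t in tokens if t in token_to_id]


# Hypothetical sample texts, used only for illustration.
texts = ["Cats are so cute :)", "Mouse scares me!"]
tokenized = [tokenize(t) for t in texts]
token_to_id = build_dictionary(tokenized)
ids = [apply_dictionary(token_to_id, tokens) for tokens in tokenized]
```

The resulting lists of integer ids are the kind of numeric representation that can then be fed to a model as new features.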
New features:
- Enabled boost_from_average for MAPE loss function
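
As a rough sketch of how this option might be switched on, the training parameters below combine the MAPE loss with boost_from_average. The `iterations` value and the `X_train` / `y_train` names in the comments are placeholders, not taken from these notes.

```python
# Training parameters as a plain dict, using existing CatBoost parameter names.
params = {
    "loss_function": "MAPE",
    "boost_from_average": True,  # initialize approximations with a constant baseline
    "iterations": 100,           # illustrative value, not from the release notes
}

# With the catboost package installed, the dict could be passed as:
#   from catboost import CatBoostRegressor
#   model = CatBoostRegressor(**params)
#   model.fit(X_train, y_train)  # X_train / y_train are placeholder datasets
```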
Bug fixes:
- Fixed Pool creation from pandas.DataFrame with discontinuous columns, #1079
- Fixed standalone_evaluator, PR #1083
Speedups:
- Huge speedup of preprocessing in the Python package for datasets with many samples (>10 million)
We also release precompiled packages for Python 3.8