MeTA v1.3.7 Release Notes
Bug fixes
- Fix inconsistent behavior of `utf::segmenter` (and thus `icu_tokenizer`) for different locales. Thanks @CanoeFZH and @tng-konrad for helping debug this!
Enhancements
- Allow specifying the language and country for locale generation when setting up `utf::segmenter` (and thus `icu_tokenizer`).
- Allow suppressing the `<s>` and `</s>` tags within `icu_tokenizer`, mostly useful for information retrieval experiments with unigram words. Thanks @husseinhazimeh for the suggestion!
- Add a `default-unigram-chain` filter chain preset which is suitable for information retrieval experiments using unigram words. Thanks @husseinhazimeh for the suggestion!
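
As a rough illustration of how these enhancements might be used from a MeTA configuration file, the sketch below shows two alternative analyzer setups. The key names `language`, `country`, and `suppress-tags`, and the locale values `en` and `US`, are assumptions made for illustration and are not confirmed by these notes; check the MeTA analyzer documentation for the exact spelling before relying on them.

```toml
# Option 1: use the new preset (preset name taken from these notes).
[[analyzers]]
method = "ngram-word"
ngram = 1
filter = "default-unigram-chain"

# Option 2: configure the tokenizer explicitly.
# NOTE: the three keys below the "type" line are assumed names, not confirmed here.
[[analyzers]]
method = "ngram-word"
ngram = 1
    [[analyzers.filter]]
    type = "icu-tokenizer"
    language = "en"        # assumed key: language used for locale generation
    country = "US"         # assumed key: country used for locale generation
    suppress-tags = true   # assumed key: do not emit <s> and </s> tags
```

The preset is likely the simpler choice for unigram retrieval experiments, while the explicit form would only be needed when the locale has to be pinned.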