MeTA v1.3 Release Notes

  • 🆕 New features

    • ➕ additions to the graph library:
      • myopic search
      • BFS
      • preferential attachment graph generation model (supports node attractiveness from different distributions)
      • betweenness centrality
      • eigenvector centrality
    • ➕ added a new natural language parsing library:
      • parse tree library (visitor-based)
      • shift-reduce constituency parser for generating phrase structure trees
      • reimplementation of evalb metrics for evaluating parsers
      • new filter for Penn Treebank-style normalization
    • ➕ added a greedy averaged Perceptron-based tagger
    • demo application (profile) for various basic text processing tasks
    • 👍 basic iostreams that support gzip compression (if compiled with ZLib support)
    • ➕ added iteration method for stats::multinomial seen events
    • ➕ added expected value and entropy functions to stats namespace
    • ➕ added linear_model: a generic multiclass classifier storage class
    • added gz_corpus: a compressed version of line_corpus
    • ➕ added macros for generating type safe identifiers with user defined literal suffixes
    • ➕ added a persistent stack data structure to meta::util
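
For readers unfamiliar with the last item: a persistent stack is one where push and pop return a new stack and leave the original untouched, with the versions sharing their common tail. The sketch below shows the idea only. The class name persistent_stack_sketch and its member names are hypothetical and are not taken from meta::util; the actual interface may differ.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical illustration of a persistent (immutable) stack.
// This is NOT the meta::util implementation; all names are assumptions.
template <class T>
class persistent_stack_sketch
{
  public:
    persistent_stack_sketch() = default;

    // Returns a new stack with `value` on top; *this is left unchanged.
    persistent_stack_sketch push(T value) const
    {
        return persistent_stack_sketch{
            std::make_shared<node>(std::move(value), head_), size_ + 1};
    }

    // Returns a new stack without the top element; *this is left unchanged.
    persistent_stack_sketch pop() const
    {
        if (!head_)
            throw std::out_of_range{"pop() on empty stack"};
        return persistent_stack_sketch{head_->next, size_ - 1};
    }

    // Read-only access to the top element.
    const T& peek() const
    {
        if (!head_)
            throw std::out_of_range{"peek() on empty stack"};
        return head_->value;
    }

    std::size_t size() const { return size_; }
    bool empty() const { return size_ == 0; }

  private:
    // Nodes are immutable and shared between every stack version
    // that contains them.
    struct node
    {
        node(T v, std::shared_ptr<const node> n)
            : value(std::move(v)), next(std::move(n))
        {
        }
        T value;
        std::shared_ptr<const node> next;
    };

    persistent_stack_sketch(std::shared_ptr<const node> head, std::size_t size)
        : head_{std::move(head)}, size_{size}
    {
    }

    std::shared_ptr<const node> head_;
    std::size_t size_ = 0;
};
```

Because every version keeps the nodes it refers to alive, holding on to an older stack after pushing onto it costs nothing extra: `auto two = one.push(2);` leaves `one` with a single element while `two` shares that element as its tail.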

  • ✨ Enhancements

    • ➕ added operator== for util::optional<T>
    • 👍 better CMake support for building the libsvm modules
    • 👍 better CMake support for downloading unit-test data
    • 👌 improved setup guide in README (for OS X, Ubuntu, Arch, and EWS/ENGRIT)
    • 🔨 tree analyzers refactored to use the new parser library (removes dependency on outside toolkits for generating tree files)
    • 🚚 analyzers that are not part of the "core" have been moved into their respective folders (so ngram_pos_analyzer is in src/sequence, tree_analyzer is in src/parser)
    • make_index now checks if the files exist before loading an index, and if they are missing creates a new one (as opposed to just throwing an exception on a nonexistent file)
    • ⬆️ cpptoml upgraded to support TOML v0.4.0
    • ⚠ enable extra warnings (-Wextra) for clang++ and g++

  • 🐛 Bug fixes

    • 🛠 fix sequence_analyzer::analyze() const when applied to untagged sequences (was throwing when it shouldn't)
    • ensure that the inverted index object is destroyed before uninverting occurs during the creation of a forward_index
    • 🛠 fix bug where icu_tokenizer would output spaces as tokens
    • 🛠 fix bugs where index objects were not destroyed before trying to delete their files in the unit tests
    • 🛠 fix bug in sparse_vector::find() where it would return a non-end iterator when asked to find an element that does not exist
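
The notes do not say what caused the sparse_vector::find() bug. One common way a lookup on sorted storage ends up returning a non-end iterator for a missing key is to return the result of a lower-bound search without checking that the key actually matches. The sketch below illustrates that pitfall and the guard against it. The class sparse_vector_sketch and its members are hypothetical and are not MeTA's sparse_vector.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sparse vector stored as (index, value)
// pairs sorted by index. NOT MeTA's sparse_vector; names are assumptions.
class sparse_vector_sketch
{
  public:
    using pair_type = std::pair<std::uint64_t, double>;
    using const_iterator = std::vector<pair_type>::const_iterator;

    // Inserts a new entry or overwrites an existing one, keeping the
    // storage sorted by index.
    void set(std::uint64_t index, double value)
    {
        auto it = std::lower_bound(
            storage_.begin(), storage_.end(), index,
            [](const pair_type& p, std::uint64_t i) { return p.first < i; });
        if (it != storage_.end() && it->first == index)
            it->second = value;
        else
            storage_.emplace(it, index, value);
    }

    // lower_bound yields the first entry whose index is >= the query,
    // which may belong to a different key. Returning it unchecked would
    // hand back a non-end iterator for a missing element, so the index
    // is compared explicitly before returning.
    const_iterator find(std::uint64_t index) const
    {
        auto it = std::lower_bound(
            storage_.begin(), storage_.end(), index,
            [](const pair_type& p, std::uint64_t i) { return p.first < i; });
        if (it != storage_.end() && it->first == index)
            return it;
        return storage_.end();
    }

    const_iterator end() const { return storage_.end(); }

  private:
    std::vector<pair_type> storage_;
};
```

With entries at indices 2 and 7, `find(5)` lands on the entry for 7 during the search, and only the equality check turns that into `end()`.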