xgboost v0.7 Release Notes

Release Date: 2017-12-30
  • 🔄 Changes

    • 🚀 This version represents a major change from the last release (v0.6), which was released one and a half years ago.
    • ⚡️ Updated Sklearn API
      • Add compatibility layer for scikit-learn v0.18: sklearn.cross_validation now deprecated
      • Updated to allow use of all XGBoost parameters via **kwargs.
      • Updated nthread to n_jobs and seed to random_state (as per Sklearn convention); nthread and seed are now marked as deprecated
      • Updated to allow choice of Booster (gbtree, gblinear, or dart)
      • XGBRegressor now supports instance weights (specify sample_weight parameter)
      • Pass n_jobs parameter to the DMatrix constructor
      • Add xgb_model parameter to fit method, to allow continuation of training
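The renames above (nthread to n_jobs, seed to random_state) and the **kwargs pass-through follow a common wrapper pattern. The following is a hypothetical sketch only: the helper normalize_params and its alias table are illustrative and are not XGBoost's source. It shows how a wrapper might translate deprecated names with a DeprecationWarning while forwarding every other keyword argument unchanged.

```python
import warnings

# Hypothetical alias table: deprecated name -> sklearn-style name.
DEPRECATED_ALIASES = {"nthread": "n_jobs", "seed": "random_state"}


def normalize_params(**kwargs):
    """Return a new dict with deprecated parameter names replaced.

    Illustrative sketch, not XGBoost's implementation. Keys that are not in
    the alias table are forwarded unchanged, mirroring a **kwargs pass-through.
    """
    params = {}
    for key, value in kwargs.items():
        new_key = DEPRECATED_ALIASES.get(key)
        if new_key is None:
            # A new-style key always overwrites a value set via its alias.
            params[key] = value
            continue
        warnings.warn(
            f"'{key}' is deprecated; use '{new_key}' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Only fill the new-style key if it was not supplied explicitly.
        params.setdefault(new_key, value)
    return params


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    params = normalize_params(nthread=4, seed=42, booster="dart", max_depth=6)

print(params)  # {'n_jobs': 4, 'random_state': 42, 'booster': 'dart', 'max_depth': 6}
print(len(caught))  # 2
```

An explicitly passed new-style name wins over its deprecated alias regardless of argument order, which keeps the migration path unambiguous.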
    • 🔨 Refactored gbm to allow a friendlier cache strategy
      • Specialized some prediction routines
    • 📜 Robust DMatrix construction from a sparse matrix
    • Faster construction of DMatrix from 2D NumPy matrices: elide copies and use multiple threads
    • 🚚 Automatically remove NaN from input data when it is sparse.
      • This can solve some user-reported problems of istart != hist.size
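To illustrate what removing NaN from sparse input means, here is a minimal sketch over a CSR matrix held as three plain lists (data, indices, indptr). It is an illustration under assumptions, not XGBoost's code: the function name drop_nan_csr is hypothetical, and it simply treats every stored NaN as an absent entry while keeping the row pointers consistent with what remains.

```python
import math


def drop_nan_csr(data, indices, indptr):
    """Remove NaN entries from a CSR matrix given as plain lists.

    Hypothetical helper for illustration. Returns new (data, indices, indptr)
    lists in which every stored value is a real number.
    """
    new_data, new_indices, new_indptr = [], [], [0]
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            value = data[k]
            if math.isnan(value):
                continue  # treat NaN as an absent (missing) entry
            new_data.append(value)
            new_indices.append(indices[k])
        # Each row ends where the filtered output currently ends.
        new_indptr.append(len(new_data))
    return new_data, new_indices, new_indptr


# Row 0 stores columns 0 and 2 (column 2 holds NaN); row 1 stores column 1.
data = [1.5, float("nan"), 3.0]
indices = [0, 2, 1]
indptr = [0, 2, 3]
print(drop_nan_csr(data, indices, indptr))  # ([1.5, 3.0], [0, 1], [0, 1, 2])
```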
    • 🛠 Fix the single-instance prediction function to obtain correct predictions
    • 🛠 Minor fixes
      • Thread-local variables are upgraded so they are automatically freed at thread exit.
      • Fix saving and loading count::poisson models
      • Fix CalcDCG to use base-2 logarithm
      • Messages are now written to stderr instead of stdout
      • Keep built-in evaluations while using customized evaluation functions
      • Use bst_float consistently to minimize type conversion
      • Copy the base margin when slicing DMatrix
      • Evaluation metrics are now saved to the model file
      • Use int32_t explicitly when serializing version
      • In distributed training, synchronize the number of features after loading a data matrix.
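The CalcDCG fix concerns the logarithm base used in discounted cumulative gain. A minimal sketch of the textbook definition with a base-2 discount follows: DCG = sum over 1-based positions i of (2^rel_i - 1) / log2(i + 1). The gain form 2^rel - 1 is an assumption (it is the usual convention for NDCG-style ranking metrics), and the functions dcg and ndcg are illustrative rather than XGBoost's C++ routine.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain with a base-2 logarithmic discount.

    relevances: graded relevance labels in ranked order (best-ranked first).
    k: optional cutoff; only the top-k positions contribute.
    """
    if k is not None:
        relevances = relevances[:k]
    total = 0.0
    for i, rel in enumerate(relevances):  # i is the 0-based position
        gain = 2 ** rel - 1
        total += gain / math.log2(i + 2)  # 1-based position i+1 -> log2(i+2)
    return total


def ndcg(relevances, k=None):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


print(dcg([3, 2, 0]))  # 7/1 + 3/log2(3), approximately 8.8928
print(ndcg([3, 2, 0]))  # 1.0, the ranking is already ideal
```

With a natural logarithm instead of log2, every discount after the first position changes, so NDCG values would not match the standard metric; that is the kind of discrepancy the fix addresses.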
    • Migrate to C++11
      • The current master version now requires a C++11-enabled compiler (g++ 4.8 or higher)
    • ⚡️ Predictor interface was factored out (in a manner similar to the updater interface).
    • 👉 Makefile support for Solaris and ARM
    • ✅ Test code coverage using Codecov
    • ➕ Add CPP tests
    • ➕ Add Dockerfile and Jenkinsfile to support continuous integration for GPU code
    • 🆕 New functionality
      • Ability to adjust a tree model's statistics to a new dataset without changing the tree structure.
      • Ability to extract feature contributions from individual predictions, as described here and here.
      • Faster, histogram-based tree algorithm (tree_method='hist').
      • GPU/CUDA accelerated tree algorithms (tree_method='gpu_hist' or 'gpu_exact'), including the GPU-based predictor.
      • Monotonic constraints: when other features are fixed, force the prediction to be monotonically increasing with respect to a certain specified feature.
      • Faster gradient calculation using AVX SIMD
      • Ability to export models in JSON format
      • Support for Tweedie regression
      • Additional dropout options for DART: binomial+1, epsilon
      • Ability to update an existing model in-place: this is useful for many applications, such as determining feature importance
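The histogram-based method (tree_method='hist') bins each feature once and then evaluates candidate splits per bin instead of per distinct value. Below is a minimal single-feature sketch under stated assumptions: precomputed bin edges, the usual second-order split gain 0.5 * (G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda) - G^2/(H+lambda)) without the gamma penalty, and hypothetical helper names. It is not XGBoost's implementation, which additionally handles missing values, quantile sketching and multithreading.

```python
import bisect


def build_histogram(values, grads, hessians, edges):
    """Accumulate gradient and hessian sums into bins.

    edges: sorted bin boundaries; a value v goes into the first bin whose
    edge is >= v, and values above the last edge go into a final bin.
    """
    n_bins = len(edges) + 1
    grad_hist = [0.0] * n_bins
    hess_hist = [0.0] * n_bins
    for v, g, h in zip(values, grads, hessians):
        b = bisect.bisect_left(edges, v)
        grad_hist[b] += g
        hess_hist[b] += h
    return grad_hist, hess_hist


def best_split(grad_hist, hess_hist, reg_lambda=1.0):
    """Scan bin boundaries left to right and return (gain, bin_index).

    A split after bin b sends bins 0..b left and the remaining bins right.
    """
    g_total = sum(grad_hist)
    h_total = sum(hess_hist)

    def score(g, h):
        return g * g / (h + reg_lambda)

    parent = score(g_total, h_total)
    best_gain, best_bin = 0.0, None
    g_left = h_left = 0.0
    for b in range(len(grad_hist) - 1):
        g_left += grad_hist[b]
        h_left += hess_hist[b]
        g_right = g_total - g_left
        h_right = h_total - h_left
        gain = 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent)
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_gain, best_bin


# Six rows of one feature; squared-error gradients (pred - label) with pred = 0
# and labels [2, 2, 0, 0, -1, -1]; unit hessians.
values = [0.5, 0.8, 1.5, 1.7, 2.5, 2.9]
grads = [-2.0, -2.0, 0.0, 0.0, 1.0, 1.0]
hessians = [1.0] * 6
gh, hh = build_histogram(values, grads, hessians, edges=[1.0, 2.0])
print(gh, hh)  # [-4.0, 0.0, 2.0] [2.0, 2.0, 2.0]
print(best_split(gh, hh)[1])  # 0, i.e. split between bin 0 and bin 1
```

The scan touches one entry per bin rather than one per row, which is where the speedup over exact greedy split enumeration comes from once the histogram is built.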
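For Tweedie regression, a common formulation (assumed here) models the mean as mu = exp(p) for a raw margin p and minimizes the Tweedie negative log-likelihood up to constants, -y * mu^(1-rho)/(1-rho) + mu^(2-rho)/(2-rho), with variance power 1 < rho < 2. The sketch below computes the per-instance gradient and hessian with respect to p that a second-order booster consumes; the function name tweedie_grad_hess and the default rho = 1.5 are illustrative choices, not taken from XGBoost's source.

```python
import math


def tweedie_grad_hess(y, margin, rho=1.5):
    """Gradient and hessian of the Tweedie negative log-likelihood.

    Assumes a log link (mu = exp(margin)) and 1 < rho < 2. Up to terms that
    do not depend on margin, the loss is
        -y * mu**(1 - rho) / (1 - rho) + mu**(2 - rho) / (2 - rho).
    """
    if not 1.0 < rho < 2.0:
        raise ValueError("rho must lie strictly between 1 and 2")
    a = math.exp((1.0 - rho) * margin)  # mu**(1 - rho)
    b = math.exp((2.0 - rho) * margin)  # mu**(2 - rho)
    grad = -y * a + b
    hess = -y * (1.0 - rho) * a + (2.0 - rho) * b
    return grad, hess


g, h = tweedie_grad_hess(y=2.0, margin=math.log(2.0))
# exp(margin) equals the label, so g is zero up to rounding and h is positive.
print(abs(g) < 1e-12, h > 0)  # True True
```

For non-negative labels both hessian terms are non-negative when 1 < rho < 2, so the Newton step stays well defined.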
    • 📦 Python package:
      • New parameters:
        • learning_rates in cv()
        • shuffle in mknfold()
        • max_features and show_values in plot_importance()
        • sample_weight in XGBRegressor.fit()
      • Support binary wheel builds
      • Fix MultiIndex detection to support Pandas 0.21.0 and higher
      • Support metrics and evaluation sets whose names contain a hyphen (-)
      • Support feature maps when plotting trees
      • Compatibility fix for Python 2.6
      • Call print_evaluation callback at last iteration
      • Use appropriate integer types when calling native code, to prevent truncation and memory errors
      • Fix shared library loading on Mac OS X
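The new shuffle option for mknfold() controls whether rows are permuted before being dealt into cross-validation folds. A minimal stdlib sketch of that idea follows, using a hypothetical make_folds helper; it is not XGBoost's function, and XGBoost's exact fold assignment may differ. It returns one (train, test) pair of row-index lists per fold.

```python
import random


def make_folds(n_rows, nfold, shuffle=True, seed=0):
    """Split row indices 0..n_rows-1 into nfold (train, test) pairs.

    Hypothetical helper: with shuffle=True the indices are permuted with a
    seeded RNG first; otherwise folds are contiguous blocks in row order.
    """
    if not 1 < nfold <= n_rows:
        raise ValueError("nfold must be between 2 and n_rows")
    idx = list(range(n_rows))
    if shuffle:
        random.Random(seed).shuffle(idx)
    # Spread rows as evenly as possible: the first n_rows % nfold folds
    # receive one extra row.
    base, extra = divmod(n_rows, nfold)
    folds, start = [], 0
    for k in range(nfold):
        size = base + (1 if k < extra else 0)
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        folds.append((train, test))
        start += size
    return folds


for train, test in make_folds(7, nfold=3, shuffle=False):
    print(test)
# [0, 1, 2]
# [3, 4]
# [5, 6]
```

Without shuffling, data sorted by label or time would produce unrepresentative folds; a seeded shuffle avoids that while keeping runs reproducible.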
    • 📦 R package:
      • New parameters:
        • silent in xgb.DMatrix()
        • use_int_id in xgb.model.dt.tree()
        • predcontrib in predict()
        • monotone_constraints in xgb.train()
      • Default value of the save_period parameter in xgboost() changed to NULL (consistent with xgb.train()).
      • It's possible to custom-build the R package with GPU acceleration support.
      • Enable JVM build for Mac OS X and Windows
      • Integration with AppVeyor CI
      • Improved safety for garbage collection
      • Store numeric attributes with higher precision
      • Easier installation for devel version
      • Improved xgb.plot.tree()
      • Various minor fixes to improve user experience and robustness
      • Register native code to pass CRAN check
      • Updated CRAN submission
    • 📦 JVM packages
      • Add Spark pipeline persistence API
      • Fix data persistence: loss evaluation on test data had wrongly used caches for training data.
      • Clean external cache after training
      • Implement early stopping
      • Enable training of multiple models by distinguishing stage IDs
      • Better Spark integration: support RDD / dataframe / dataset, integrate with Spark ML package
      • XGBoost4j now supports ranking task
      • Support training with missing data
      • Refactor JVM package to separate regression and classification models to be consistent with other machine learning libraries
      • Support XGBoost4j compilation on Windows
      • Parameter tuning tool
      • Publish source code for XGBoost4j to maven local repo
      • Scala implementation of the Rabit tracker (drop-in replacement for the Java implementation)
      • Better exception handling for the Rabit tracker
      • Persist num_class (the number of classes, for classification tasks)
      • XGBoostModel now holds BoosterParams
      • libxgboost4j is now part of CMake build
      • Release DMatrix when no longer needed, to conserve memory
      • Expose baseMargin, to allow initialization of boosting with predictions from an external model
      • Support instance weights
      • Use SparkParallelismTracker to prevent jobs from hanging forever
      • Expose train-time evaluation metrics via XGBoostModel.summary
      • Option to specify host-ip explicitly in the Rabit tracker
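Early stopping, newly implemented in the JVM packages, halts boosting once a validation metric has stopped improving for a given number of rounds. The JVM API itself is not shown here; the following is a language-neutral Python sketch of the stopping rule only, with a hypothetical early_stopping_round helper and lower-is-better as the default assumption.

```python
def early_stopping_round(metric_history, patience, maximize=False):
    """Return (best_round, stop_round) for a sequence of per-round metrics.

    Training stops after `patience` consecutive rounds without improvement
    over the best value seen so far. If that never happens, stop_round is
    None and training would run for all rounds.
    """
    best_value = None
    best_round = None
    for rnd, value in enumerate(metric_history):
        improved = best_value is None or (
            value > best_value if maximize else value < best_value
        )
        if improved:
            best_value, best_round = value, rnd
        elif rnd - best_round >= patience:
            return best_round, rnd
    return best_round, None


history = [0.50, 0.42, 0.40, 0.41, 0.40, 0.43]
print(early_stopping_round(history, patience=3))  # (2, 5)
```

Only strict improvements reset the counter here, so a round that merely ties the best value (round 4 above) still counts toward the patience limit.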
    • 📚 Documentation
      • Better math notation for gradient boosting
      • Updated build instructions for Mac OS X
      • Template for GitHub issues
      • Add CITATION file for citing XGBoost in scientific writing
      • Fix dropdown menu in xgboost.readthedocs.io
      • Document updater_seq parameter
      • Style fixes for Python documentation
      • Links to additional examples and tutorials
      • Clarify installation requirements
    • 🔄 Changes that break backward compatibility
      • #1519 XGBoost-spark no longer contains APIs for DMatrix; use the public booster interface instead.
      • #2476 XGBoostModel.predict() now has a different signature