GBM Framework
A unified framework for Gradient Boosting Models with SHAP analysis and system optimization.
GBM Framework is a comprehensive Python package designed to streamline the process of building and evaluating gradient boosting models for binary classification. It provides a consistent API across multiple boosting implementations, automated hyperparameter optimization, and built-in explainability through SHAP analysis.
Project Overview
The purpose of this project was to create an all-in-one solution for students and resource-strapped analysts to efficiently build and evaluate boosted tree models. The framework supports four powerful tree-based ensemble methods, each with unique strengths, while providing a standardized workflow for model training, evaluation, and interpretation.
Gradient Boosting Algorithms
The framework supports these major tree-based ensemble methods:
XGBoost
Excellent on medium-sized datasets, with built-in regularization to control overfitting. Handles sparse data well and delivers high overall performance.
LightGBM
Very fast on wide datasets with many features. Uses Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) for efficient training with low memory usage.
CatBoost
Superior handling of categorical features without preprocessing. Robust against overfitting, with excellent performance out of the box.
Random Forest
Good baseline performance with fewer hyperparameters to tune. Less prone to overfitting and provides good estimates of predictive uncertainty.
Algorithm Comparison
| Algorithm | Very Wide Data (many features) | Very Tall Data (many rows) | Categorical Features | Training Speed | Default Performance |
|---|---|---|---|---|---|
| XGBoost | Good | Moderate | Requires encoding | Moderate | Very Good |
| LightGBM | Excellent | Excellent | Good | Very Fast | Good |
| CatBoost | Good | Good | Excellent | Moderate | Excellent |
| Random Forest | Moderate | Good | Requires encoding | Fast | Moderate |
Key Features
System Optimization
The framework includes a SystemOptimizer that automatically detects system resources and configures optimal thread counts and memory usage for training and SHAP calculations:
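A minimal usage sketch is shown below; `SystemOptimizer` is named by the framework, but the import path and constructor arguments here are illustrative assumptions rather than the confirmed API.

```python
# Illustrative sketch: the import path and argument names are assumptions,
# not the package's confirmed API.
from gbmframework.optimizer import SystemOptimizer

# Detect CPU cores and available memory, then derive thread counts and
# memory limits to use for training and SHAP computation.
optimizer = SystemOptimizer(enable_parallel=True)
```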
Consistent Training API
All training functions follow a consistent pattern, with algorithm-specific additions:
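A sketch of what that pattern might look like for XGBoost is below; the function name, its arguments, and the returned dictionary are assumptions for illustration, not the documented signatures.

```python
# Hypothetical call pattern; train_xgboost and its arguments are assumptions,
# not the documented API.
result = train_xgboost(
    X_train, y_train,        # training features and binary labels
    X_test, y_test,          # held-out evaluation data
    optimizer=optimizer,     # SystemOptimizer instance from the section above
    hyperopt_evals=50,       # assumed knob for the built-in hyperopt search
)
model = result["model"]      # analogous train_lightgbm, train_catboost, and
                             # train_random_forest calls would mirror this pattern
```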
Integrated SHAP Analysis
The framework provides built-in SHAP value integration for model explainability:
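The integration is built on the `shap` library. The sketch below uses `shap` directly, with a plain XGBoost classifier, to show the kind of analysis the framework automates; the framework's own wrapper functions are not shown here, since their names would be assumptions.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic binary classification data for a self-contained example.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X_train, y_train)

# TreeExplainer supports XGBoost, LightGBM, CatBoost, and scikit-learn ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Global importance: mean absolute SHAP value per feature.
shap.summary_plot(shap_values, X_test)
```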
Installation
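Assuming the package is published to PyPI (the distribution name `gbmframework` is an assumption), it can be installed with `pip install gbmframework`. The underlying libraries (`xgboost`, `lightgbm`, `catboost`, `shap`, `hyperopt`) can be installed the same way if they are not already present.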
Project Benefits
- Efficiency: Streamlines the workflow for gradient boosting models with a consistent API
- Resource Optimization: Automatically configures optimal thread counts and memory usage
- Hyperparameter Tuning: Built-in hyperparameter optimization with hyperopt (see the sketch after this list)
- Explainability: Integrated SHAP analysis for model interpretation
- Flexibility: Support for multiple boosting algorithms with algorithm-specific optimizations
- Consistency: Standardized evaluation metrics and visualization across different algorithms
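The tuning layer builds on the `hyperopt` library. The sketch below searches XGBoost hyperparameters with `hyperopt` directly to show the kind of optimization involved; the parameter ranges are illustrative choices, not the framework's defaults, and the framework's own wrapper interface is not shown.

```python
from hyperopt import fmin, tpe, hp, Trials
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Synthetic binary classification data for a self-contained example.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Search space: the ranges here are illustrative, not the framework's defaults.
space = {
    "max_depth": hp.quniform("max_depth", 3, 10, 1),
    "learning_rate": hp.loguniform("learning_rate", -5, 0),
    "n_estimators": hp.quniform("n_estimators", 50, 500, 50),
}

def objective(params):
    model = XGBClassifier(
        max_depth=int(params["max_depth"]),
        learning_rate=params["learning_rate"],
        n_estimators=int(params["n_estimators"]),
        eval_metric="logloss",
    )
    # hyperopt minimizes the objective, so return negative cross-validated AUC.
    return -cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=25, trials=trials)
print(best)
```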