GBM Framework
A unified framework for Gradient Boosting Models with SHAP analysis and system optimization.
GBM Framework is a comprehensive Python package designed to streamline the process of building and evaluating gradient boosting models for binary classification. It provides a consistent API across multiple boosting implementations, automated hyperparameter optimization, and built-in explainability through SHAP analysis.
Project Overview
The purpose of this project was to create an all-in-one solution for students and resource-strapped analysts to efficiently build and evaluate boosted tree models. The framework supports four powerful tree-based ensemble methods, each with unique strengths, while providing a standardized workflow for model training, evaluation, and interpretation.
Gradient Boosting Algorithms
The framework supports these major tree-based ensemble methods:
XGBoost
Excellent on medium-sized datasets, with built-in regularization to control overfitting. Handles sparse data well and delivers high overall performance.
LightGBM
Very fast on wide datasets with many features. Uses Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) for efficient training with low memory usage.
CatBoost
Superior handling of categorical features without preprocessing. Robust against overfitting, with excellent performance out of the box.
Random Forest
Good baseline performance with fewer hyperparameters to tune. Less prone to overfitting and provides good estimates of predictive uncertainty.
Algorithm Comparison
| Algorithm | Very Wide Data (many features) | Very Tall Data (many rows) | Categorical Features | Training Speed | Default Performance |
|---|---|---|---|---|---|
| XGBoost | Good | Moderate | Requires encoding | Moderate | Very Good |
| LightGBM | Excellent | Excellent | Good | Very Fast | Good |
| CatBoost | Good | Good | Excellent | Moderate | Excellent |
| Random Forest | Moderate | Good | Requires encoding | Fast | Moderate |
Key Features
System Optimization
The framework includes a SystemOptimizer that automatically detects system resources and configures optimal thread counts and memory usage for training and SHAP calculations:
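A minimal usage sketch is shown below; `SystemOptimizer` is named by the framework, but the import path and constructor arguments here are illustrative assumptions rather than the confirmed API.

```python
# Illustrative sketch: the import path and argument names are assumptions,
# not the package's confirmed API.
from gbmframework.optimizer import SystemOptimizer

# Detect CPU cores and available memory, then derive thread counts and
# memory limits to use for training and SHAP computation.
optimizer = SystemOptimizer(enable_parallel=True)
```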
Consistent Training API
All training functions follow a consistent pattern, with algorithm-specific additions:
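A sketch of what that pattern might look like for XGBoost is below; the function name, its arguments, and the returned dictionary are assumptions for illustration, not the documented signatures.

```python
# Hypothetical call pattern; train_xgboost and its arguments are assumptions,
# not the documented API.
result = train_xgboost(
    X_train, y_train,        # training features and binary labels
    X_test, y_test,          # held-out evaluation data
    optimizer=optimizer,     # SystemOptimizer instance from the section above
    hyperopt_evals=50,       # assumed knob for the built-in hyperopt search
)
model = result["model"]      # analogous train_lightgbm, train_catboost, and
                             # train_random_forest calls would mirror this pattern
```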
Integrated SHAP Analysis
The framework provides built-in SHAP value integration for model explainability:
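The integration is built on the `shap` library. The sketch below uses `shap` directly, with a plain XGBoost classifier, to show the kind of analysis the framework automates; the framework's own wrapper functions are not shown here, since their names would be assumptions.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic binary classification data for a self-contained example.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X_train, y_train)

# TreeExplainer supports XGBoost, LightGBM, CatBoost, and scikit-learn ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Global importance: mean absolute SHAP value per feature.
shap.summary_plot(shap_values, X_test)
```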
Installation
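Assuming the package is published to PyPI (the distribution name `gbmframework` is an assumption), it can be installed with `pip install gbmframework`. The underlying libraries (`xgboost`, `lightgbm`, `catboost`, `shap`, `hyperopt`) can be installed the same way if they are not already present.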
Project Benefits
- Efficiency: Streamlines the workflow for gradient boosting models with a consistent API
- Resource Optimization: Automatically configures optimal thread counts and memory usage
- Hyperparameter Tuning: Built-in hyperparameter optimization with hyperopt (see the sketch after this list)
- Explainability: Integrated SHAP analysis for model interpretation
- Flexibility: Support for multiple boosting algorithms with algorithm-specific optimizations
- Consistency: Standardized evaluation metrics and visualization across different algorithms
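The tuning layer builds on the `hyperopt` library. The sketch below searches XGBoost hyperparameters with `hyperopt` directly to show the kind of optimization involved; the parameter ranges are illustrative choices, not the framework's defaults, and the framework's own wrapper interface is not shown.

```python
from hyperopt import fmin, tpe, hp, Trials
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Synthetic binary classification data for a self-contained example.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Search space: the ranges here are illustrative, not the framework's defaults.
space = {
    "max_depth": hp.quniform("max_depth", 3, 10, 1),
    "learning_rate": hp.loguniform("learning_rate", -5, 0),
    "n_estimators": hp.quniform("n_estimators", 50, 500, 50),
}

def objective(params):
    model = XGBClassifier(
        max_depth=int(params["max_depth"]),
        learning_rate=params["learning_rate"],
        n_estimators=int(params["n_estimators"]),
        eval_metric="logloss",
    )
    # hyperopt minimizes the objective, so return negative cross-validated AUC.
    return -cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=25, trials=trials)
print(best)
```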