GBM Framework

A unified framework for Gradient Boosting Models with SHAP analysis and system optimization.

GBM Framework is a comprehensive Python package designed to streamline the process of building and evaluating gradient boosting models for binary classification. It provides a consistent API across multiple boosting implementations, automated hyperparameter optimization, and built-in explainability through SHAP analysis.

Project Overview

This project provides an all-in-one solution for students and resource-strapped analysts to build and evaluate boosted tree models efficiently. The framework supports four powerful tree-based ensemble methods, each with unique strengths, while providing a standardized workflow for model training, evaluation, and interpretation.

Gradient Boosting Algorithms

The framework supports these major tree-based ensemble methods:

XGBoost

Excellent on medium-sized datasets with regularization to control overfitting. Handles sparse data well and provides overall high performance.

LightGBM

Very fast on wide datasets with many features. Uses GOSS (Gradient-based One-Side Sampling) and EFB (Exclusive Feature Bundling) for efficient training with low memory usage.
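To make the GOSS idea concrete: it keeps the instances with the largest gradients (which carry the most information about the current error) and randomly subsamples the rest, up-weighting the sampled small-gradient instances so the gain estimates stay unbiased. A minimal illustrative sketch of that sampling step (not LightGBM's actual implementation; `goss_sample` is a hypothetical helper):

```python
import random

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Illustrative GOSS: keep the top_rate fraction of instances by
    |gradient|, randomly sample other_rate of the remainder, and
    up-weight the sampled instances by (1 - top_rate) / other_rate."""
    rng = random.Random(seed)
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    top, rest = order[:n_top], order[n_top:]
    sampled = rng.sample(rest, int(other_rate * n))
    amplify = (1 - top_rate) / other_rate
    # (index, weight) pairs that would be used when computing split gains
    return [(i, 1.0) for i in top] + [(i, amplify) for i in sampled]

grads = [0.9, -0.8, 0.05, -0.02, 0.7, 0.01, -0.6, 0.03, 0.4, -0.1]
subset = goss_sample(grads)
print(len(subset))  # 2 kept + 1 sampled = 3 of 10 instances
```

Training on this weighted subset instead of all rows is where LightGBM's speed advantage on tall datasets comes from.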

CatBoost

Superior handling of categorical features without preprocessing. Robust against overfitting with excellent performance out-of-the-box.
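CatBoost's native categorical handling rests on ordered target statistics: each row's category is encoded using the target values of only the rows that come before it in a permutation, which prevents target leakage. A simplified stdlib sketch of that idea (illustrative only; `ordered_target_encode` is a hypothetical helper, not CatBoost's API):

```python
def ordered_target_encode(categories, targets, prior=0.5, strength=1.0):
    """Illustrative ordered target statistics (simplified from CatBoost):
    encode each row using only earlier rows' targets for its category,
    smoothed toward a prior so first occurrences get a sensible value."""
    sums, counts = {}, {}
    encoded = []
    for cat, y in zip(categories, targets):
        s, c = sums.get(cat, 0.0), counts.get(cat, 0)
        encoded.append((s + prior * strength) / (c + strength))
        sums[cat] = s + y       # update running statistics *after* encoding,
        counts[cat] = c + 1     # so the current row never sees its own target
    return encoded

cats = ["red", "blue", "red", "red", "blue"]
ys = [1, 0, 1, 0, 1]
print(ordered_target_encode(cats, ys))
```

Because the encoding for each row excludes that row's own label, the model cannot memorize the target through the categorical column, which is a large part of CatBoost's out-of-the-box robustness.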

Random Forest

Good baseline performance with fewer hyperparameters. Less prone to overfitting and provides good predictive uncertainty estimates.

Algorithm Comparison

| Algorithm | Very Wide Data | Very Tall Data | Categorical Features | Training Speed | Default Performance |
|---|---|---|---|---|---|
| XGBoost | Good | Moderate | Requires encoding | Moderate | Very Good |
| LightGBM | Excellent | Excellent | Good | Very Fast | Good |
| CatBoost | Good | Good | Excellent | Moderate | Excellent |
| Random Forest | Moderate | Good | Requires encoding | Fast | Moderate |

Key Features

System Optimization

The framework includes a SystemOptimizer that automatically detects system resources and configures optimal thread counts and memory usage for training and SHAP calculations:

```python
from gbmframework.optimizer import SystemOptimizer

optimizer = SystemOptimizer(
    enable_parallel=True,  # Whether to enable parallel computation
    memory_safety=0.8,     # Memory safety factor (0.0-1.0)
    verbose=True           # Whether to print optimization information
)
```
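The general idea behind this kind of optimizer is to inspect the machine's CPU count and memory, then reserve headroom via the safety factor. A stdlib sketch of that detection logic (the actual SystemOptimizer internals may differ; `detect_resources` and its return keys are hypothetical):

```python
import os

def detect_resources(memory_safety=0.8, total_memory_gb=16.0):
    """Illustrative resource detection: derive thread counts from the CPU
    count and cap the memory budget by a safety factor. total_memory_gb
    is a parameter here; a real implementation would query the OS."""
    cpus = os.cpu_count() or 1
    return {
        "training_threads": max(1, cpus - 1),  # leave one core for the OS
        "shap_threads": max(1, cpus // 2),     # SHAP is memory-hungry; use fewer threads
        "memory_budget_gb": total_memory_gb * memory_safety,
    }

config = detect_resources()
print(config)
```

With `memory_safety=0.8` and 16 GB of RAM, the sketch budgets 12.8 GB, leaving a 20% buffer so SHAP computations on large samples do not exhaust memory.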

Consistent Training API

All training functions follow a consistent pattern, with algorithm-specific additions:

```python
from gbmframework.models import train_xgboost

result = train_xgboost(
    X_train,                 # Training features (DataFrame or ndarray)
    y_train,                 # Training labels (Series or ndarray)
    X_test,                  # Test features for evaluation during training
    y_test,                  # Test labels for evaluation
    hyperopt_space=None,     # Custom hyperopt search space dictionary (optional)
    max_evals=50,            # Number of hyperparameter evaluations to perform
    handle_imbalance=False,  # Whether to handle class imbalance
    random_state=42,         # Random seed for reproducibility
    optimizer=None           # SystemOptimizer instance (optional)
)
```

Integrated SHAP Analysis

The framework provides built-in SHAP value integration for model explainability:

```python
from gbmframework.shap_utils import generate_shap_values, visualize_shap

# Generate SHAP values
shap_result = generate_shap_values(
    model,             # Trained model object
    X,                 # Feature dataset (typically X_test or a sample)
    sample_size=None,  # Number of samples to use (default: auto-detect)
    verbose=1,         # Verbosity level (0: silent, 1: normal, 2: detailed)
    optimizer=None     # SystemOptimizer instance
)

# Visualize SHAP values
figure = visualize_shap(
    shap_result,          # Result from generate_shap_values()
    plot_type='summary',  # Plot type: 'summary', 'bar', 'beeswarm', 'waterfall', 'dependence'
    max_display=20,       # Maximum number of features to display
    plot_size=(12, 8)     # Size of the plot in inches
)
```

Installation

```bash
# Basic installation
pip install gbmframework

# With specific boosting libraries
pip install gbmframework[xgboost]   # With XGBoost
pip install gbmframework[lightgbm]  # With LightGBM
pip install gbmframework[catboost]  # With CatBoost
pip install gbmframework[shap]      # With SHAP for explainability
pip install gbmframework[all]       # All dependencies
```

Project Benefits