Custom Gradient Boosting with PackBoost

PackBoost is a domain-specific gradient boosting algorithm designed to handle constraints not supported by standard libraries.

Gradient boosting decision trees and aggregation of weak learners.

Design Choices

PackBoost was developed for a public data science competition focused on financial markets. Its design reflects this context:

Synchronized ensemble feature sampling
- Ensemble of weak learners for robustness
- Feature synchronization to encourage orthogonality
Era-aware split selection
- Improved robustness across market regimes
Round-forward sample paths
- Enables massive parallelization
- Preserves orthogonality across boosting rounds

Ensemble Feature Synchronization

For a given round, features are never reused across folds. When
split_feature_candidates << total_features, this enforces approximate orthogonality between ensemble members.

Pre-sampled feature schedule per fold and tree depth.

Era-Aware Splitting Criterion

Splits are selected using an era-aware criterion ([reference TBD]). Instead of a single global score, splits are evaluated per era and then aggregated.

Era-level scoring leads to different optimal splits than global criteria.

Shared Tree Paths for Parallel Training

Instead of recursive tree growth, PackBoost reuses tree paths from previous rounds. This non-optimal strategy enables large-scale parallelization.

Reusing paths across rounds enables parallel split evaluation.