dantegd commented Dec 19, 2025

This PR introduces a bulk tree construction API that speeds up importing scikit-learn RandomForest models into Treelite. In my benchmarks, the new API is roughly 7-10x faster than the node-by-node construction used by the current sklearn loader.

The current implementation spends significant time in per-node overhead due to:

  • Repeated ModelBuilder method calls for each node
  • Python-C++ boundary crossing overhead accumulating across millions of nodes
  • Memory allocation patterns that don't benefit from bulk operations
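The per-node cost listed above can be illustrated with a toy model that only counts calls across a simulated binding boundary. This is a minimal sketch, not Treelite's API: `FakeBinding`, `build_node_by_node`, and `build_bulk` are hypothetical names, and the three-calls-per-node figure is an assumption chosen for illustration.

```python
from array import array


class FakeBinding:
    """Hypothetical stand-in for a native library; counts boundary crossings."""

    def __init__(self):
        self.calls = 0
        self.features = array("i")
        self.thresholds = array("d")

    # Node-by-node style: several small calls per node (assumed shape).
    def start_node(self, node_id):
        self.calls += 1

    def set_test(self, feature, threshold):
        self.calls += 1
        self.features.append(feature)
        self.thresholds.append(threshold)

    def end_node(self):
        self.calls += 1

    # Bulk style: one call that receives whole arrays.
    def load_arrays(self, features, thresholds):
        self.calls += 1
        self.features = array("i", features)
        self.thresholds = array("d", thresholds)


def build_node_by_node(binding, features, thresholds):
    for node_id, (f, t) in enumerate(zip(features, thresholds)):
        binding.start_node(node_id)
        binding.set_test(f, t)
        binding.end_node()
    return binding.calls


def build_bulk(binding, features, thresholds):
    binding.load_arrays(features, thresholds)
    return binding.calls


features = [0, 2, 1, 3]
thresholds = [0.5, 1.5, 2.5, 3.5]
slow, fast = FakeBinding(), FakeBinding()
per_node_calls = build_node_by_node(slow, features, thresholds)
bulk_calls = build_bulk(fast, features, thresholds)
print(per_node_calls, bulk_calls)
```

In this model the number of crossings grows linearly with the node count in the first style and stays constant in the second, while both end up holding identical arrays.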

This becomes a bottleneck in workflows like cuML's RandomForestClassifier.from_sklearn(), where Treelite import time dominates the conversion process.

This PR implements a BulkConstructTree friend function that directly populates the Tree class's internal ContiguousArray members in a single pass, bypassing the ModelBuilder abstraction for sklearn imports.
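The single-pass idea can be sketched in pure Python. This is an illustration under stated assumptions, not the PR's C++ code: `FlatTree`, `bulk_construct_tree`, and `predict` are hypothetical, the field names do not reflect Treelite's actual `Tree` members, and the input layout mimics the parallel arrays scikit-learn exposes on a fitted tree (`children_left`, `children_right`, `feature`, `threshold`), with a simplified one-scalar-per-node `value`.

```python
from array import array
from dataclasses import dataclass

TREE_LEAF = -1  # scikit-learn's sentinel for "no child"


@dataclass
class FlatTree:
    """Hypothetical struct-of-arrays tree; not Treelite's actual Tree class."""
    left_child: array
    right_child: array
    split_index: array
    threshold: array
    leaf_value: array


def bulk_construct_tree(children_left, children_right, feature, threshold, value):
    """Copy whole node arrays at once instead of adding nodes one by one."""
    n = len(children_left)
    if not all(len(a) == n for a in (children_right, feature, threshold, value)):
        raise ValueError("all node arrays must have the same length")
    return FlatTree(
        left_child=array("i", children_left),
        right_child=array("i", children_right),
        split_index=array("i", feature),
        threshold=array("d", threshold),
        leaf_value=array("d", value),
    )


def predict(tree, row):
    """Walk from the root; go left when row[feature] <= threshold."""
    node = 0
    while tree.left_child[node] != TREE_LEAF:
        if row[tree.split_index[node]] <= tree.threshold[node]:
            node = tree.left_child[node]
        else:
            node = tree.right_child[node]
    return tree.leaf_value[node]


# A three-node stump: the root splits on feature 0 at 0.5.
tree = bulk_construct_tree(
    children_left=[1, -1, -1],
    children_right=[2, -1, -1],
    feature=[0, -2, -2],
    threshold=[0.5, -2.0, -2.0],
    value=[0.0, 10.0, 20.0],
)
print(predict(tree, [0.2]), predict(tree, [0.9]))
```

The point of the sketch is that construction is a handful of array copies whose count does not depend on the number of nodes; validation (here only a length check) happens once per tree rather than once per node.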

Initial benchmarks:

| Configuration | Total Nodes | Old API (ms) | Bulk API (ms) | Speedup |
|---|---:|---:|---:|---:|
| classifier, 50 trees, depth=10 | 39,844 | 13.5 | 1.8 | 7.45x |
| classifier, 100 trees, depth=15 | 351,826 | 77.3 | 10.2 | 7.54x |
| classifier, 300 trees, depth=20 | 2,520,062 | 544.9 | 60.7 | 8.98x |
| regressor, 100 trees, depth=15 | 978,436 | 195.6 | 18.8 | 10.42x |
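For readers who want to re-derive the numbers, the short script below recomputes the speedup and a nodes-per-millisecond throughput from the figures in the benchmark table. It is only a sketch: the ratios come from the rounded timings shown here, so they differ slightly from the reported speedups, which were presumably computed from unrounded measurements (an assumption).

```python
# (configuration, total nodes, old API ms, bulk API ms, reported speedup)
rows = [
    ("classifier, 50 trees, depth=10", 39_844, 13.5, 1.8, 7.45),
    ("classifier, 100 trees, depth=15", 351_826, 77.3, 10.2, 7.54),
    ("classifier, 300 trees, depth=20", 2_520_062, 544.9, 60.7, 8.98),
    ("regressor, 100 trees, depth=15", 978_436, 195.6, 18.8, 10.42),
]


def summarize(rows):
    """Return the recomputed ratio and per-API throughput for each row."""
    out = []
    for name, nodes, old_ms, bulk_ms, reported in rows:
        out.append({
            "config": name,
            "ratio": old_ms / bulk_ms,
            "reported": reported,
            "old_nodes_per_ms": nodes / old_ms,
            "bulk_nodes_per_ms": nodes / bulk_ms,
        })
    return out


summary = summarize(rows)
for s in summary:
    print(
        f"{s['config']}: ratio {s['ratio']:.2f}x "
        f"(reported {s['reported']:.2f}x), "
        f"old {s['old_nodes_per_ms']:.0f} nodes/ms, "
        f"bulk {s['bulk_nodes_per_ms']:.0f} nodes/ms"
    )
```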
