Optimize Scikit-learn model loading by adding Bulk Tree Construction API #651

dantegd · 2025-12-19T20:10:47Z

This PR introduces a bulk tree construction API that significantly improves performance when importing scikit-learn RandomForest models into Treelite. In my benchmarks, the new API achieves a ~7-10x speedup over the node-by-node construction used by the current sklearn loader.

The current implementation spends significant time in per-node overhead due to:

- Repeated ModelBuilder method calls for each node
- Python-C++ boundary crossing overhead accumulating across millions of nodes
- Memory allocation patterns that don't benefit from bulk operations

This becomes a bottleneck in workflows like cuML's RandomForestClassifier.from_sklearn(), where Treelite import time dominates the conversion process.

This PR implements a BulkConstructTree friend function that directly populates the Tree class's internal ContiguousArray members in a single pass, bypassing the ModelBuilder abstraction for sklearn imports.
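To illustrate the difference between the two construction styles, here is a minimal pure-Python sketch. It is not Treelite's API and not the code in this PR: `ToyTree`, `add_node`, and `from_arrays` are hypothetical names made up for the example, and the real `BulkConstructTree` is a C++ function. The input layout mirrors the parallel arrays scikit-learn exposes on a fitted tree (`children_left`, `children_right`, `feature`, `threshold`), where a missing child is `-1` and leaf nodes carry `-2` as feature and threshold.

```python
from dataclasses import dataclass, field
from typing import List

TREE_LEAF = -1  # scikit-learn marks a missing child with -1


@dataclass
class ToyTree:
    """Hypothetical stand-in for a tree stored as parallel arrays."""

    left: List[int] = field(default_factory=list)
    right: List[int] = field(default_factory=list)
    feature: List[int] = field(default_factory=list)
    threshold: List[float] = field(default_factory=list)

    def add_node(self, left: int, right: int, feature: int, threshold: float) -> None:
        """Node-by-node path: one method call and four appends per node."""
        self.left.append(left)
        self.right.append(right)
        self.feature.append(feature)
        self.threshold.append(threshold)

    @classmethod
    def from_arrays(cls, left, right, feature, threshold) -> "ToyTree":
        """Bulk path: validate once, then copy each array in one operation."""
        n = len(left)
        if not (len(right) == len(feature) == len(threshold) == n):
            raise ValueError("all node arrays must have the same length")
        return cls(list(left), list(right), list(feature), list(threshold))


# A three-node tree in scikit-learn's array layout:
# node 0 splits on feature 2 at 0.5; nodes 1 and 2 are leaves.
children_left = [1, TREE_LEAF, TREE_LEAF]
children_right = [2, TREE_LEAF, TREE_LEAF]
feature = [2, -2, -2]
threshold = [0.5, -2.0, -2.0]

# Node-by-node: the per-node cost is paid once for every node.
node_by_node = ToyTree()
for i in range(len(children_left)):
    node_by_node.add_node(children_left[i], children_right[i], feature[i], threshold[i])

# Bulk: a single call per tree, regardless of node count.
bulk = ToyTree.from_arrays(children_left, children_right, feature, threshold)

# Both paths must produce the same tree.
assert bulk == node_by_node
```

In this toy version both paths stay inside Python, so it only shows the shape of the two call patterns. The gain described in this PR comes from paying the per-call and boundary-crossing cost once per tree rather than once per node.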

Initial benchmarks:

| Configuration | Total Nodes | Old API (ms) | Bulk API (ms) | Speedup |
|---|---|---|---|---|
| classifier, 50 trees, depth=10 | 39,844 | 13.5 | 1.8 | 7.45x |
| classifier, 100 trees, depth=15 | 351,826 | 77.3 | 10.2 | 7.54x |
| classifier, 300 trees, depth=20 | 2,520,062 | 544.9 | 60.7 | 8.98x |
| regressor, 100 trees, depth=15 | 978,436 | 195.6 | 18.8 | 10.42x |

FEA Add new optimized scikit-learn loading functionality

6fd15d8
