[Performance] Require Advanced Profiling when running with DmlExecutionProvider #24306
Labels
ep:DML
issues related to the DirectML execution provider
performance
issues related to performance regressions
stale
issues that have not been addressed in a while; categorized by a bot
Describe the issue
When I enable profiling with the DML EP, this is the trace I get:
[{"cat" : "Node","pid" :27840,"tid" :1980,"dur" :378,"ts" :1260840,"ph" : "X","name" :"DmlFusedNode_0_41_kernel_time"},
{"cat" : "Node","pid" :27840,"tid" :1980,"dur" :150,"ts" :1261232,"ph" : "X","name" :"DmlFusedNode_4_63_kernel_time"},
{"cat" : "Node","pid" :27840,"tid" :1980,"dur" :193,"ts" :1261392,"ph" : "X","name" :"DmlFusedNode_3_62_kernel_time"},
{"cat" : "Node","pid" :27840,"tid" :1980,"dur" :661,"ts" :1261591,"ph" : "X","name" :"/0/0.0/STFT_kernel_time"},
{"cat" : "Node","pid" :27840,"tid" :1980,"dur" :541,"ts" :1262265,"ph" : "X","name" :"DmlFusedNode_1_43_kernel_time"},
{"cat" : "Node","pid" :27840,"tid" :1980,"dur" :108,"ts" :1262815,"ph" : "X","name" :"/1/prenet/prenet.1/lstm/LSTM_kernel_time"},
{"cat" : "Node","pid" :27840,"tid" :1980,"dur" :8298,"ts" :1262929,"ph" : "X","name" :"DmlFusedNode_2_45_kernel_time"},
{"cat" : "Session","pid" :27840,"tid" :1980,"dur" :10417,"ts" :1260828,"ph" : "X","name" :"SequentialExecutor::Execute"},
{"cat" : "Session","pid" :27840,"tid" :1980,"dur" :31038,"ts" :1253564,"ph" : "X","name" :"model_run"}]
As the trace shows, the model_run event has a dur of 31038, while SequentialExecutor::Execute has a dur of 10417.
That means the graph itself executes for 10417 microseconds, but the whole run takes 31038 microseconds, leaving 20621 microseconds (31038 - 10417) unexplained. Can I get details about where exactly the rest of the time is being spent?
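The gap described above can be split further using only the timestamps already in the trace. The following is a minimal sketch, not an ONNX Runtime tool: it assumes that `ts` marks the start of each event and that `ts` and `dur` are both in microseconds (the usual convention for Chrome-trace `"ph": "X"` events). The helper name `summarize` and the trimmed event list are illustrative only; the numbers are copied from the trace in this issue.

```python
import json

# Events copied from the trace above, trimmed to the fields used here.
TRACE_JSON = """
[
 {"cat": "Node", "dur": 378, "ts": 1260840, "name": "DmlFusedNode_0_41_kernel_time"},
 {"cat": "Node", "dur": 150, "ts": 1261232, "name": "DmlFusedNode_4_63_kernel_time"},
 {"cat": "Node", "dur": 193, "ts": 1261392, "name": "DmlFusedNode_3_62_kernel_time"},
 {"cat": "Node", "dur": 661, "ts": 1261591, "name": "/0/0.0/STFT_kernel_time"},
 {"cat": "Node", "dur": 541, "ts": 1262265, "name": "DmlFusedNode_1_43_kernel_time"},
 {"cat": "Node", "dur": 108, "ts": 1262815, "name": "/1/prenet/prenet.1/lstm/LSTM_kernel_time"},
 {"cat": "Node", "dur": 8298, "ts": 1262929, "name": "DmlFusedNode_2_45_kernel_time"},
 {"cat": "Session", "dur": 10417, "ts": 1260828, "name": "SequentialExecutor::Execute"},
 {"cat": "Session", "dur": 31038, "ts": 1253564, "name": "model_run"}
]
"""


def summarize(events):
    """Split model_run into time before, during, and after graph execution."""
    by_name = {event["name"]: event for event in events}
    run = by_name["model_run"]
    execute = by_name["SequentialExecutor::Execute"]

    # Assumption: ts is the event start, so start + dur is the event end.
    run_end = run["ts"] + run["dur"]
    execute_end = execute["ts"] + execute["dur"]

    return {
        "model_run_us": run["dur"],
        "execute_us": execute["dur"],
        "kernel_sum_us": sum(e["dur"] for e in events if e["cat"] == "Node"),
        "before_execute_us": execute["ts"] - run["ts"],
        "after_execute_us": run_end - execute_end,
        "unexplained_us": run["dur"] - execute["dur"],
    }


summary = summarize(json.loads(TRACE_JSON))
for key, value in summary.items():
    print(f"{key:>20}: {value}")
```

Under those assumptions, the 20621 unexplained microseconds divide into 7264 before SequentialExecutor::Execute starts and 13357 after it returns. The trace itself does not say what happens in either interval, so this only narrows down where to look.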
I assume it goes into resource allocation and CPU-GPU memory transfers. How can I get exact information about this? Once I have it, I would also like ideas on how to reduce this extra time.
Thanks, looking forward to your help!
To reproduce
This is our proprietary model, so I cannot share it. I need help with profiling.
Urgency
Medium
Platform
Windows
OS Version
10.0.26100 Build 26100
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.21.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
DirectML
Execution Provider Library Version
1.15.2.0 [DirectML.dll version]
Model File
No response
Is this a quantized model?
No