Only enable CUDA language if needed #24256
#ifdef USE_CUDA_MINIMAL
Cuda::RegisterOps(domain);
Cuda::RegisterOps(domain_v2);
#endif
If a memcpy were used as the custom op, that should work without needing nvcc at compile time. Let me know if that would be an acceptable change.
It is not clear to me why we no longer test Cuda::RegisterOps() when USE_CUDA_MINIMAL is not defined, i.e. in the normal CUDA build.
/azp run Big Models, Linux CPU Minimal Build E2E CI Pipeline, Linux QNN CI Pipeline, ONNX Runtime Web CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 8 pipeline(s).
The minimal CUDA Windows CI has the following error: cuda_provider_factory.obj(0,0): Error LNK2019: unresolved external symbol "void __cdecl onnxruntime::cuda::Explicit_Impl_Cast(struct CUstream_st *,float const *,double *,unsigned __int64)" (?Explicit_Impl_Cast@cuda@onnxruntime@@YAXPEAUCUstream_st@@PEBMPEAN_K@Z) referenced in function "void __cdecl onnxruntime::cuda::Impl_Cast<float,double>(struct CUstream_st *,float const *,double *,unsigned __int64)" (??$Impl_Cast@MN@cuda@onnxruntime@@YAXPEAUCUstream_st@@PEBMPEAN_K@Z)
Yeah, found the remaining kernel: the cast ops for float to double that TRT has registered as a fallback.
/azp run Big Models, Linux CPU Minimal Build E2E CI Pipeline, Linux QNN CI Pipeline, ONNX Runtime Web CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 7 pipeline(s).
But the cuda::Impl_Cast() called in the TRT EP still needs the corresponding Explicit_Impl_Cast() implementation in unary_elementwise_ops_impl.cu, which you excluded in the last commit. So it will still end up with a linker error in the CUDA minimal build CI.
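The dependency behind that linker error can be sketched in plain C++. This is only an illustration of the pattern, not the actual ONNX Runtime code: `FakeStream`, the `sketch` namespace, and the host-side loop are hypothetical stand-ins, whereas the real `Explicit_Impl_Cast` takes a `CUstream_st*` and is implemented as a CUDA kernel launch in unary_elementwise_ops_impl.cu.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the CUDA stream handle (the real type is CUstream_st*).
struct FakeStream {};

namespace sketch {

// Declared in a header; in the real build the definition lives in a .cu file.
void Explicit_Impl_Cast(FakeStream* stream, const float* in, double* out, std::size_t count);

// Header-only template that forwards to the non-template function above.
// Any translation unit that instantiates Impl_Cast<float, double> therefore
// references the Explicit_Impl_Cast symbol at link time.
template <typename In, typename Out>
void Impl_Cast(FakeStream* stream, const In* in, Out* out, std::size_t count) {
  Explicit_Impl_Cast(stream, in, out, count);
}

// Assumed host-side definition, used here only so the sketch links.
// If the file containing this definition is dropped from the build while
// Impl_Cast is still called, the linker reports an unresolved external symbol.
void Explicit_Impl_Cast(FakeStream* /*stream*/, const float* in, double* out, std::size_t count) {
  assert(in != nullptr && out != nullptr);
  for (std::size_t i = 0; i < count; ++i) {
    out[i] = static_cast<double>(in[i]);
  }
}

}  // namespace sketch
```

Removing the definition of `Explicit_Impl_Cast` while keeping a call to `sketch::Impl_Cast<float, double>` reproduces the same class of failure: the template compiles, but the link step cannot resolve the symbol.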
Description
In my testing, it is not even necessary to enable the CUDA language when building with CUDA_MINIMAL. If the language is not enabled, the CUDA toolkit does not have to be installed and fully registered with Visual Studio on Windows, for example.
cc @chilo-ms