Only enable CUDA language if needed #24256


Draft
gedoensmax wants to merge 4 commits into main

Conversation

gedoensmax (Contributor)

Description

In my testing, there is no need to enable the CUDA language at all when building with CUDA_MINIMAL. Skipping it means the CUDA toolkit does not have to be installed and fully registered with Visual Studio on Windows, for example.
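As a rough sketch of the idea (the option names and placement below are assumptions for illustration, not the exact diff in this PR), the CUDA language would only be requested from CMake for a full CUDA build:

# Sketch: enable the CUDA language only for a full (non-minimal) CUDA build.
# A CUDA_MINIMAL build then compiles as plain C++, so the CUDA toolkit and its
# Visual Studio integration never have to be present on the build machine.
if (onnxruntime_USE_CUDA AND NOT onnxruntime_CUDA_MINIMAL)
  enable_language(CUDA)
endif()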

cc @chilo-ms

Comment on lines +39 to +42
#ifdef USE_CUDA_MINIMAL
Cuda::RegisterOps(domain);
Cuda::RegisterOps(domain_v2);

#endif
gedoensmax (Contributor Author)

If memcpy were registered as a custom op, that should work without needing nvcc at compile time. Let me know whether that would be an accepted change.

chilo-ms (Contributor) commented Apr 3, 2025

It's not very clear to me why we no longer exercise Cuda::RegisterOps() outside of USE_CUDA_MINIMAL, i.e. in the normal CUDA build?

snnn (Member) commented Mar 31, 2025

/azp run Big Models, Linux CPU Minimal Build E2E CI Pipeline, Linux QNN CI Pipeline, ONNX Runtime Web CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline


Azure Pipelines successfully started running 8 pipeline(s).

chilo-ms (Contributor) commented Apr 1, 2025

The minimal CUDA Windows CI has the following error:

cuda_provider_factory.obj(0,0): Error LNK2019: unresolved external symbol "void __cdecl onnxruntime::cuda::Explicit_Impl_Cast(struct CUstream_st *,float const *,double *,unsigned __int64)" (?Explicit_Impl_Cast@cuda@onnxruntime@@YAXPEAUCUstream_st@@PEBMPEAN_K@Z) referenced in function "void __cdecl onnxruntime::cuda::Impl_Cast<float,double>(struct CUstream_st *,float const *,double *,unsigned __int64)" (??$Impl_Cast@MN@cuda@onnxruntime@@YAXPEAUCUstream_st@@PEBMPEAN_K@Z)

gedoensmax (Contributor Author)

Yeah, found the remaining kernel: the float-to-double cast ops that TRT has registered as a fallback.

chilo-ms (Contributor) commented Apr 3, 2025

/azp run Big Models, Linux CPU Minimal Build E2E CI Pipeline, Linux QNN CI Pipeline, ONNX Runtime Web CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline


Azure Pipelines successfully started running 7 pipeline(s).

chilo-ms (Contributor) commented Apr 3, 2025

Yeah, found the remaining kernel: the float-to-double cast ops that TRT has registered as a fallback.

But the cuda::Impl_Cast() being called in the TRT EP still needs the corresponding Explicit_Impl_Cast() implementation in unary_elementwise_ops_impl.cu, which you excluded in the last commit.

So it will still end up with a linker error in the CUDA-minimal build CI:
Error LNK2019: unresolved external symbol "void __cdecl onnxruntime::cuda::Explicit_Impl_Cast...
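
In other words (a minimal CMake sketch; the source-list variable and file path are assumptions for illustration, not the PR's actual build files), the cast kernels cannot simply be dropped from the CUDA-minimal build while the TRT EP still instantiates cuda::Impl_Cast<float, double>:

# Sketch: Explicit_Impl_Cast is defined in unary_elementwise_ops_impl.cu.
# If that file is only compiled for full CUDA builds, the TRT EP's call to
# cuda::Impl_Cast<float, double> leaves an unresolved reference in minimal builds,
# which is the LNK2019 reported by the CI above.
if (NOT onnxruntime_CUDA_MINIMAL)
  list(APPEND onnxruntime_providers_cuda_src
       "${ONNXRUNTIME_ROOT}/core/providers/cuda/math/unary_elementwise_ops_impl.cu")
endif()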

gedoensmax marked this pull request as draft on April 16, 2025.

3 participants