
error converting to onnx model #24198

Open

geraldstanje opened this issue Mar 27, 2025 · 3 comments
Labels
model:transformer: issues related to a transformer model (BERT, GPT2, Hugging Face, Longformer, T5, etc.)
.NET: Pull requests that update .net code
stale: issues that have not been addressed in a while; categorized by a bot

Comments


geraldstanje commented Mar 27, 2025

Describe the issue

Hi,

I fine-tuned a ModernBERT classifier and am now trying to convert it to ONNX. I get the warnings below. Can someone explain what these warnings mean and how to fix them?
Also, I see: -[x] values not close enough, max diff: 0.004712104797363281 (atol: 0.0001). Does that mean I will get wrong output?

cc @tianleiwu @ms1design @mszhanyi

logs:

optimum-cli export onnx --model ModernBERT-domain-classifier-save --task text-classification --device cuda --opset 14 ModernBERT-domain-classifier-save-onnx
2025-03-27 03:30:28.962170504 [W:onnxruntime:, transformer_memcpy.cc:83 ApplyImpl] 50 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-03-27 03:30:28.973787050 [W:onnxruntime:, session_state.cc:1263 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-03-27 03:30:28.973816140 [W:onnxruntime:, session_state.cc:1265 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2025-03-27 03:30:34.834608989 [W:onnxruntime:, transformer_memcpy.cc:83 ApplyImpl] 22 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-03-27 03:30:34.844736863 [W:onnxruntime:, session_state.cc:1263 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-03-27 03:30:34.844761063 [W:onnxruntime:, session_state.cc:1265 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:140: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
                -[x] values not close enough, max diff: 0.004712104797363281 (atol: 0.0001)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 0.0001:
- logits: max diff = 0.004712104797363281.
 The exported model was saved at: ModernBERT-domain-classifier-save-onnx
[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

To reproduce

optimum-cli export onnx --model ModernBERT-domain-classifier-save --task text-classification --device cuda --opset 14 ModernBERT-domain-classifier-save-onnx

More info:

pip list
Package                   Version
------------------------- --------------
absl-py                   2.1.0
accelerate                1.5.2
aiohappyeyeballs          2.5.0
aiohttp                   3.11.13
aiosignal                 1.3.2
annotated-types           0.7.0
antlr4-python3-runtime    4.9.3
anyio                     4.8.0
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
asttokens                 3.0.0
async-lru                 2.0.4
async-timeout             5.0.1
attrs                     23.2.0
babel                     2.17.0
backoff                   2.2.1
beautifulsoup4            4.13.3
bleach                    6.2.0
boto3                     1.37.9
botocore                  1.37.9
Brotli                    1.1.0
cachetools                5.5.2
certifi                   2025.1.31
cffi                      1.17.1
charset-normalizer        3.4.1
click                     8.1.8
cloudpickle               3.1.1
coloredlogs               15.0.1
comm                      0.2.2
contourpy                 1.3.1
cycler                    0.12.1
datasets                  3.4.1
debugpy                   1.8.13
decorator                 5.2.1
defusedxml                0.7.1
dill                      0.3.9
docker                    7.1.0
einops                    0.8.1
exceptiongroup            1.2.2
executing                 2.2.0
fastapi                   0.115.11
fastjsonschema            2.21.1
filelock                  3.17.0
flatbuffers               25.2.10
fonttools                 4.56.0
fqdn                      1.5.1
frozenlist                1.5.0
fsspec                    2024.12.0
gevent                    24.11.1
geventhttpclient          2.3.3
google-auth               2.38.0
google-auth-oauthlib      1.2.1
google-pasta              0.2.0
greenlet                  3.1.1
grpcio                    1.70.0
h11                       0.14.0
httpcore                  1.0.7
httptools                 0.6.4
httpx                     0.28.1
huggingface-hub           0.29.3
humanfriendly             10.0
idna                      3.10
importlib-metadata        6.11.0
ipykernel                 6.26.0
ipython                   8.17.2
ipywidgets                8.1.1
isoduration               20.11.0
jedi                      0.19.2
Jinja2                    3.1.6
jmespath                  1.0.1
joblib                    1.4.2
json5                     0.10.0
jsonpointer               3.0.0
jsonschema                4.23.0
jsonschema-specifications 2024.10.1
jupyter_client            8.6.3
jupyter_core              5.7.2
jupyter-events            0.12.0
jupyter-lsp               2.2.5
jupyter_server            2.15.0
jupyter_server_terminals  0.5.3
jupyterlab                4.2.0
jupyterlab_pygments       0.3.0
jupyterlab_server         2.27.3
jupyterlab_widgets        3.0.13
kiwisolver                1.4.8
lightning                 2.5.0.post0
lightning-cloud           0.5.70
lightning_sdk             0.2.4
lightning-utilities       0.14.0
litdata                   0.2.32
litserve                  0.2.6
llvmlite                  0.44.0
Markdown                  3.7
markdown-it-py            3.0.0
MarkupSafe                3.0.2
matplotlib                3.8.2
matplotlib-inline         0.1.7
mdurl                     0.1.2
mistune                   3.1.2
mock                      4.0.3
mpmath                    1.3.0
multidict                 6.1.0
multiprocess              0.70.17
nbclient                  0.10.2
nbconvert                 7.16.6
nbformat                  5.10.4
nest-asyncio              1.6.0
networkx                  3.4.2
notebook_shim             0.2.4
numba                     0.61.0
numpy                     1.26.4
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         8.9.2.26
nvidia-cufft-cu12         11.0.2.54
nvidia-curand-cu12        10.3.2.106
nvidia-cusolver-cu12      11.4.5.107
nvidia-cusparse-cu12      12.1.0.106
nvidia-nccl-cu12          2.19.3
nvidia-nvjitlink-cu12     12.8.93
nvidia-nvtx-cu12          12.1.105
oauthlib                  3.2.2
omegaconf                 2.3.0
onnx                      1.17.0
onnxruntime-gpu           1.21.0
optimum                   1.24.0
overrides                 7.7.0
packaging                 24.2
pandas                    2.1.4
pandocfilters             1.5.1
parso                     0.8.4
pathos                    0.3.3
pexpect                   4.9.0
pillow                    11.1.0
pip                       25.0.1
platformdirs              4.3.6
pox                       0.3.5
ppft                      1.7.6.9
prometheus_client         0.21.1
prompt_toolkit            3.0.50
propcache                 0.3.0
protobuf                  4.23.4
psutil                    7.0.0
ptyprocess                0.7.0
pure_eval                 0.2.3
pyarrow                   19.0.1
pyasn1                    0.6.1
pyasn1_modules            0.4.1
pycparser                 2.22
pydantic                  2.10.6
pydantic_core             2.27.2
Pygments                  2.19.1
PyJWT                     2.10.1
pyparsing                 3.2.1
python-dateutil           2.9.0.post0
python-dotenv             1.0.1
python-json-logger        3.3.0
python-multipart          0.0.20
python-rapidjson          1.20
pytorch-lightning         2.5.0.post0
pytz                      2025.1
PyYAML                    6.0.2
pyzmq                     26.2.1
referencing               0.36.2
regex                     2024.11.6
requests                  2.32.3
requests-oauthlib         2.0.0
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rich                      13.9.4
rpds-py                   0.23.1
rsa                       4.9
s3transfer                0.11.4
safetensors               0.5.3
sagemaker                 2.242.0
sagemaker-core            1.0.25
schema                    0.7.7
scikit-learn              1.3.2
scipy                     1.11.4
seaborn                   0.13.2
Send2Trash                1.8.3
setuptools                75.8.0
shap                      0.47.1
simple-term-menu          1.6.6
six                       1.17.0
slicer                    0.0.8
smdebug-rulesconfig       1.0.1
sniffio                   1.3.1
soupsieve                 2.6
stack-data                0.6.3
starlette                 0.46.1
sympy                     1.13.3
tblib                     3.0.0
tensorboard               2.15.1
tensorboard-data-server   0.7.2
terminado                 0.18.1
threadpoolctl             3.5.0
tinycss2                  1.4.0
tokenizers                0.21.1
tomli                     2.2.1
torch                     2.2.1+cu121
torchmetrics              1.3.1
torchvision               0.17.1+cu121
tornado                   6.4.2
tqdm                      4.67.1
traitlets                 5.14.3
transformers              4.49.0
triton                    2.2.0
tritonclient              2.55.0
types-python-dateutil     2.9.0.20241206
typing_extensions         4.12.2
tzdata                    2025.1
uri-template              1.3.0
urllib3                   2.3.0
uvicorn                   0.34.0
uvloop                    0.21.0
watchfiles                1.0.4
wcwidth                   0.2.13
webcolors                 24.11.1
webencodings              0.5.1
websocket-client          1.8.0
websockets                15.0.1
Werkzeug                  3.1.3
wget                      3.2
wheel                     0.45.1
widgetsnbextension        4.0.13
xxhash                    3.5.0
yarl                      1.18.3
zipp                      3.21.0
zope.event                5.0
zope.interface            7.2

nvidia-smi
Thu Mar 27 03:42:39 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.230.02             Driver Version: 535.230.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10G                    Off | 00000000:00:1E.0 Off |                    0 |
|  0%   27C    P8              15W / 300W |      0MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Urgency

No response

Platform

Linux

OS Version

Ubuntu

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

12.2

github-actions bot added the .NET and model:transformer labels on Mar 27, 2025
geraldstanje changed the title from "error creating converting to onnx model" to "error converting to onnx model" on Mar 27, 2025
tianleiwu (Contributor) commented:

@geraldstanje,
The max diff of 0.0047 is not very large, so the ONNX model is probably fine. You can use your relevance metrics to measure the end-to-end result to verify.

You can also remove --device cuda from the command line to export with the CPU provider and compare.
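A minimal sketch of such a comparison, assuming the exported directory contains model.onnx plus the tokenizer files; the paths and sample sentence below are placeholders for this setup:

```python
# Sketch: compare logits from the fine-tuned PyTorch model against the exported
# ONNX model on CPU. Paths and the sample text are placeholders.
import numpy as np
import onnxruntime as ort
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

pt_dir = "ModernBERT-domain-classifier-save"          # fine-tuned checkpoint
onnx_dir = "ModernBERT-domain-classifier-save-onnx"   # optimum-cli output

tokenizer = AutoTokenizer.from_pretrained(pt_dir)
enc = tokenizer("example text to classify", return_tensors="pt")

# Reference logits from the original PyTorch model
model = AutoModelForSequenceClassification.from_pretrained(pt_dir).eval()
with torch.no_grad():
    ref = model(**enc).logits.numpy()

# Logits from the exported ONNX model (CPU provider)
sess = ort.InferenceSession(f"{onnx_dir}/model.onnx", providers=["CPUExecutionProvider"])
names = {i.name for i in sess.get_inputs()}
out = sess.run(None, {k: v.numpy() for k, v in enc.items() if k in names})[0]

print("max abs diff:", np.abs(ref - out).max())
print("argmax matches:", bool((ref.argmax(-1) == out.argmax(-1)).all()))
```

If the predicted class (argmax) matches on a representative set of inputs, a logit difference of ~0.005 is usually harmless.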

geraldstanje (Author) commented Mar 27, 2025

@tianleiwu can you run inference on the GPU with an ONNX model that was exported without --device cuda?

Here is the output without CUDA. Does that look better, and what about the fallback warning?

optimum-cli export onnx --model ModernBERT-domain-classifier-save --task text-classification --opset 14 ModernBERT-domain-classifier-save-onnx
Compiling the model with `torch.compile` and using a `torch.cpu` device is not supported. Falling back to non-compiled mode.

Also, what about the warning about Memcpy? Can that be fixed?

50 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
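A minimal sketch of running the CPU-exported model on the GPU, assuming onnxruntime-gpu is installed and the exported directory contains model.onnx plus the tokenizer files; it also sets session_options.log_severity_level = 1 as the warning suggests, so the node-to-EP assignments behind the Memcpy insertions become visible (paths and the sample text are placeholders):

```python
# Sketch: run the exported ONNX model on GPU with onnxruntime-gpu, with
# detailed session logs enabled to inspect execution-provider assignments.
import onnxruntime as ort
from transformers import AutoTokenizer

onnx_dir = "ModernBERT-domain-classifier-save-onnx"   # optimum-cli output

so = ort.SessionOptions()
so.log_severity_level = 1  # INFO-level logs show node placement details

sess = ort.InferenceSession(
    f"{onnx_dir}/model.onnx",
    sess_options=so,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

tokenizer = AutoTokenizer.from_pretrained(onnx_dir)
enc = tokenizer("example text to classify", return_tensors="np")
names = {i.name for i in sess.get_inputs()}
# Cast ids/mask to int64, which the exported graph expects
feed = {k: v.astype("int64") for k, v in enc.items() if k in names}
logits = sess.run(None, feed)[0]
print(logits)
```

As far as I know, the exported ONNX graph itself is provider-agnostic: --device cuda mainly affects the device used during export and validation, not which execution provider can run the model afterwards.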


This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label on Apr 26, 2025