
error converting to onnx model #24198

Open

geraldstanje opened this issue Mar 27, 2025 · 3 comments
Labels
model:transformer: issues related to a transformer model (BERT, GPT2, Hugging Face, Longformer, T5, etc.)
.NET: Pull requests that update .net code
stale: issues that have not been addressed in a while; categorized by a bot

Comments


geraldstanje commented Mar 27, 2025

Describe the issue

Hi,

I fine-tuned a ModernBERT classifier and am now trying to convert it to ONNX. I get the warnings below. Can someone explain what these warnings mean and how to fix them?
Also, I see: -[x] values not close enough, max diff: 0.004712104797363281 (atol: 0.0001). Does that mean I will get wrong output?

cc @tianleiwu @ms1design @mszhanyi

logs:

optimum-cli export onnx --model ModernBERT-domain-classifier-save --task text-classification --device cuda --opset 14 ModernBERT-domain-classifier-save-onnx
2025-03-27 03:30:28.962170504 [W:onnxruntime:, transformer_memcpy.cc:83 ApplyImpl] 50 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-03-27 03:30:28.973787050 [W:onnxruntime:, session_state.cc:1263 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-03-27 03:30:28.973816140 [W:onnxruntime:, session_state.cc:1265 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2025-03-27 03:30:34.834608989 [W:onnxruntime:, transformer_memcpy.cc:83 ApplyImpl] 22 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2025-03-27 03:30:34.844736863 [W:onnxruntime:, session_state.cc:1263 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2025-03-27 03:30:34.844761063 [W:onnxruntime:, session_state.cc:1265 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:140: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
                -[x] values not close enough, max diff: 0.004712104797363281 (atol: 0.0001)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 0.0001:
- logits: max diff = 0.004712104797363281.
 The exported model was saved at: ModernBERT-domain-classifier-save-onnx
[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

To reproduce

optimum-cli export onnx --model ModernBERT-domain-classifier-save --task text-classification --device cuda --opset 14 ModernBERT-domain-classifier-save-onnx

More info:

pip list
Package                   Version
------------------------- --------------
absl-py                   2.1.0
accelerate                1.5.2
aiohappyeyeballs          2.5.0
aiohttp                   3.11.13
aiosignal                 1.3.2
annotated-types           0.7.0
antlr4-python3-runtime    4.9.3
anyio                     4.8.0
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
asttokens                 3.0.0
async-lru                 2.0.4
async-timeout             5.0.1
attrs                     23.2.0
babel                     2.17.0
backoff                   2.2.1
beautifulsoup4            4.13.3
bleach                    6.2.0
boto3                     1.37.9
botocore                  1.37.9
Brotli                    1.1.0
cachetools                5.5.2
certifi                   2025.1.31
cffi                      1.17.1
charset-normalizer        3.4.1
click                     8.1.8
cloudpickle               3.1.1
coloredlogs               15.0.1
comm                      0.2.2
contourpy                 1.3.1
cycler                    0.12.1
datasets                  3.4.1
debugpy                   1.8.13
decorator                 5.2.1
defusedxml                0.7.1
dill                      0.3.9
docker                    7.1.0
einops                    0.8.1
exceptiongroup            1.2.2
executing                 2.2.0
fastapi                   0.115.11
fastjsonschema            2.21.1
filelock                  3.17.0
flatbuffers               25.2.10
fonttools                 4.56.0
fqdn                      1.5.1
frozenlist                1.5.0
fsspec                    2024.12.0
gevent                    24.11.1
geventhttpclient          2.3.3
google-auth               2.38.0
google-auth-oauthlib      1.2.1
google-pasta              0.2.0
greenlet                  3.1.1
grpcio                    1.70.0
h11                       0.14.0
httpcore                  1.0.7
httptools                 0.6.4
httpx                     0.28.1
huggingface-hub           0.29.3
humanfriendly             10.0
idna                      3.10
importlib-metadata        6.11.0
ipykernel                 6.26.0
ipython                   8.17.2
ipywidgets                8.1.1
isoduration               20.11.0
jedi                      0.19.2
Jinja2                    3.1.6
jmespath                  1.0.1
joblib                    1.4.2
json5                     0.10.0
jsonpointer               3.0.0
jsonschema                4.23.0
jsonschema-specifications 2024.10.1
jupyter_client            8.6.3
jupyter_core              5.7.2
jupyter-events            0.12.0
jupyter-lsp               2.2.5
jupyter_server            2.15.0
jupyter_server_terminals  0.5.3
jupyterlab                4.2.0
jupyterlab_pygments       0.3.0
jupyterlab_server         2.27.3
jupyterlab_widgets        3.0.13
kiwisolver                1.4.8
lightning                 2.5.0.post0
lightning-cloud           0.5.70
lightning_sdk             0.2.4
lightning-utilities       0.14.0
litdata                   0.2.32
litserve                  0.2.6
llvmlite                  0.44.0
Markdown                  3.7
markdown-it-py            3.0.0
MarkupSafe                3.0.2
matplotlib                3.8.2
matplotlib-inline         0.1.7
mdurl                     0.1.2
mistune                   3.1.2
mock                      4.0.3
mpmath                    1.3.0
multidict                 6.1.0
multiprocess              0.70.17
nbclient                  0.10.2
nbconvert                 7.16.6
nbformat                  5.10.4
nest-asyncio              1.6.0
networkx                  3.4.2
notebook_shim             0.2.4
numba                     0.61.0
numpy                     1.26.4
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         8.9.2.26
nvidia-cufft-cu12         11.0.2.54
nvidia-curand-cu12        10.3.2.106
nvidia-cusolver-cu12      11.4.5.107
nvidia-cusparse-cu12      12.1.0.106
nvidia-nccl-cu12          2.19.3
nvidia-nvjitlink-cu12     12.8.93
nvidia-nvtx-cu12          12.1.105
oauthlib                  3.2.2
omegaconf                 2.3.0
onnx                      1.17.0
onnxruntime-gpu           1.21.0
optimum                   1.24.0
overrides                 7.7.0
packaging                 24.2
pandas                    2.1.4
pandocfilters             1.5.1
parso                     0.8.4
pathos                    0.3.3
pexpect                   4.9.0
pillow                    11.1.0
pip                       25.0.1
platformdirs              4.3.6
pox                       0.3.5
ppft                      1.7.6.9
prometheus_client         0.21.1
prompt_toolkit            3.0.50
propcache                 0.3.0
protobuf                  4.23.4
psutil                    7.0.0
ptyprocess                0.7.0
pure_eval                 0.2.3
pyarrow                   19.0.1
pyasn1                    0.6.1
pyasn1_modules            0.4.1
pycparser                 2.22
pydantic                  2.10.6
pydantic_core             2.27.2
Pygments                  2.19.1
PyJWT                     2.10.1
pyparsing                 3.2.1
python-dateutil           2.9.0.post0
python-dotenv             1.0.1
python-json-logger        3.3.0
python-multipart          0.0.20
python-rapidjson          1.20
pytorch-lightning         2.5.0.post0
pytz                      2025.1
PyYAML                    6.0.2
pyzmq                     26.2.1
referencing               0.36.2
regex                     2024.11.6
requests                  2.32.3
requests-oauthlib         2.0.0
rfc3339-validator         0.1.4
rfc3986-validator         0.1.1
rich                      13.9.4
rpds-py                   0.23.1
rsa                       4.9
s3transfer                0.11.4
safetensors               0.5.3
sagemaker                 2.242.0
sagemaker-core            1.0.25
schema                    0.7.7
scikit-learn              1.3.2
scipy                     1.11.4
seaborn                   0.13.2
Send2Trash                1.8.3
setuptools                75.8.0
shap                      0.47.1
simple-term-menu          1.6.6
six                       1.17.0
slicer                    0.0.8
smdebug-rulesconfig       1.0.1
sniffio                   1.3.1
soupsieve                 2.6
stack-data                0.6.3
starlette                 0.46.1
sympy                     1.13.3
tblib                     3.0.0
tensorboard               2.15.1
tensorboard-data-server   0.7.2
terminado                 0.18.1
threadpoolctl             3.5.0
tinycss2                  1.4.0
tokenizers                0.21.1
tomli                     2.2.1
torch                     2.2.1+cu121
torchmetrics              1.3.1
torchvision               0.17.1+cu121
tornado                   6.4.2
tqdm                      4.67.1
traitlets                 5.14.3
transformers              4.49.0
triton                    2.2.0
tritonclient              2.55.0
types-python-dateutil     2.9.0.20241206
typing_extensions         4.12.2
tzdata                    2025.1
uri-template              1.3.0
urllib3                   2.3.0
uvicorn                   0.34.0
uvloop                    0.21.0
watchfiles                1.0.4
wcwidth                   0.2.13
webcolors                 24.11.1
webencodings              0.5.1
websocket-client          1.8.0
websockets                15.0.1
Werkzeug                  3.1.3
wget                      3.2
wheel                     0.45.1
widgetsnbextension        4.0.13
xxhash                    3.5.0
yarl                      1.18.3
zipp                      3.21.0
zope.event                5.0
zope.interface            7.2

nvidia-smi
Thu Mar 27 03:42:39 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.230.02             Driver Version: 535.230.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10G                    Off | 00000000:00:1E.0 Off |                    0 |
|  0%   27C    P8              15W / 300W |      0MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Urgency

No response

Platform

Linux

OS Version

Ubuntu

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

12.2

github-actions bot added the .NET and model:transformer labels on Mar 27, 2025
geraldstanje changed the title from "error creating converting to onnx model" to "error converting to onnx model" on Mar 27, 2025
tianleiwu (Contributor) commented:

@geraldstanje,
The max diff of 0.0047 is not very large, so the ONNX model is probably fine. You can use your relevance metrics to measure the end-to-end result to verify.

You can also remove --device cuda from the command line to export with the CPU provider and compare.
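A minimal sketch of such a comparison, assuming the exported directory contains model.onnx plus the tokenizer files; the paths and sample sentence below are placeholders for this setup:

```python
# Sketch: compare logits from the fine-tuned PyTorch model against the exported
# ONNX model on CPU. Paths and the sample text are placeholders.
import numpy as np
import onnxruntime as ort
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

pt_dir = "ModernBERT-domain-classifier-save"          # fine-tuned checkpoint
onnx_dir = "ModernBERT-domain-classifier-save-onnx"   # optimum-cli output

tokenizer = AutoTokenizer.from_pretrained(pt_dir)
enc = tokenizer("example text to classify", return_tensors="pt")

# Reference logits from the original PyTorch model
model = AutoModelForSequenceClassification.from_pretrained(pt_dir).eval()
with torch.no_grad():
    ref = model(**enc).logits.numpy()

# Logits from the exported ONNX model (CPU provider)
sess = ort.InferenceSession(f"{onnx_dir}/model.onnx", providers=["CPUExecutionProvider"])
names = {i.name for i in sess.get_inputs()}
out = sess.run(None, {k: v.numpy() for k, v in enc.items() if k in names})[0]

print("max abs diff:", np.abs(ref - out).max())
print("argmax matches:", bool((ref.argmax(-1) == out.argmax(-1)).all()))
```

If the predicted class (argmax) matches on a representative set of inputs, a logit difference of ~0.005 is usually harmless.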

geraldstanje (Author) commented Mar 27, 2025

@tianleiwu can you run inference on the GPU with an ONNX model that was exported without --device cuda?

Here is the output without CUDA. Does that look better, and what about the fallback warning?

optimum-cli export onnx --model ModernBERT-domain-classifier-save --task text-classification --opset 14 ModernBERT-domain-classifier-save-onnx
Compiling the model with `torch.compile` and using a `torch.cpu` device is not supported. Falling back to non-compiled mode.

Also, what about the warning about Memcpy? Can that be fixed?

50 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
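A minimal sketch of running the CPU-exported model on the GPU, assuming onnxruntime-gpu is installed and the exported directory contains model.onnx plus the tokenizer files; it also sets session_options.log_severity_level = 1 as the warning suggests, so the node-to-EP assignments behind the Memcpy insertions become visible (paths and the sample text are placeholders):

```python
# Sketch: run the exported ONNX model on GPU with onnxruntime-gpu, with
# detailed session logs enabled to inspect execution-provider assignments.
import onnxruntime as ort
from transformers import AutoTokenizer

onnx_dir = "ModernBERT-domain-classifier-save-onnx"   # optimum-cli output

so = ort.SessionOptions()
so.log_severity_level = 1  # INFO-level logs show node placement details

sess = ort.InferenceSession(
    f"{onnx_dir}/model.onnx",
    sess_options=so,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

tokenizer = AutoTokenizer.from_pretrained(onnx_dir)
enc = tokenizer("example text to classify", return_tensors="np")
names = {i.name for i in sess.get_inputs()}
# Cast ids/mask to int64, which the exported graph expects
feed = {k: v.astype("int64") for k, v in enc.items() if k in names}
logits = sess.run(None, feed)[0]
print(logits)
```

As far as I know, the exported ONNX graph itself is provider-agnostic: --device cuda mainly affects the device used during export and validation, not which execution provider can run the model afterwards.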


This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label on Apr 26, 2025