
Commit 4b3477f

Authored Aug 23, 2023
Add Whisper scripts (#17043)
### Description

This PR adds benchmark scripts for Whisper. It is a follow-up to [this PR](#17020) that adds the LLaMA scripts.

### Motivation and Context

This PR enables benchmarking Whisper across various configurations.
1 parent 5842144 commit 4b3477f

5 files changed: +1138 −14 lines
 

onnxruntime/python/tools/transformers/models/whisper/README.md (+149 −10)
## Exporting Whisper with Beam Search

There are several ways to export Whisper with beam search (using Whisper tiny as an example).

### Option 1: from convert_to_onnx

```
# From source
$ git clone https://github.com/microsoft/onnxruntime
$ cd onnxruntime/onnxruntime/python/tools/transformers/
$ python3 -m models.whisper.convert_to_onnx -m openai/whisper-tiny --output whispertiny --use_external_data_format

# From wheel
$ python3 -m onnxruntime.transformers.models.whisper.convert_to_onnx -m openai/whisper-tiny --output whispertiny --use_external_data_format
```
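
After the export finishes, you can sanity-check the generated model by listing its inputs with onnxruntime. The file name below is a guess at what the command above writes; use whatever actually appears in the `whispertiny` output directory.

```
import onnxruntime as ort

# Hypothetical output file name; check the whispertiny/ directory for the real one.
session = ort.InferenceSession("whispertiny/whisper-tiny_beamsearch.onnx")

# Print the name, shape, and element type of each input the model expects.
for tensor in session.get_inputs():
    print(tensor.name, tensor.shape, tensor.type)
```
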
### Option 2: end-to-end model from [Olive](https://github.com/microsoft/Olive/tree/main/examples/whisper)

Please follow the [README instructions](https://github.com/microsoft/Olive/tree/main/examples/whisper#prerequisites) in Olive.

### Option 3: from [Hugging Face Optimum](https://github.com/huggingface/optimum)

Run the following Python code to export:

```
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

model_name = "openai/whisper-large-v2"
model = ORTModelForSpeechSeq2Seq.from_pretrained(
    model_name,
    export=True,
)
model.save_pretrained(model_name.split("/")[-1] + "-onnx")
```
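
Once exported, the model can be loaded back for a quick smoke test. This is a minimal sketch: loading audio via librosa and the 16 kHz sampling rate are assumptions about your setup, the audio path is a placeholder, and the directory name matches the `save_pretrained` call above.

```
import librosa
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from transformers import WhisperProcessor

# Load the exported ONNX model and the matching processor.
model = ORTModelForSpeechSeq2Seq.from_pretrained("whisper-large-v2-onnx")
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")

# Whisper expects 16 kHz mono audio; the file path is a placeholder.
audio, _ = librosa.load("audio.mp3", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Generate and decode a transcription.
predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```
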
## Exporting + Optimizing + Quantizing Whisper with Beam Search

Here are some additional examples for exporting Whisper with beam search.
Export with Forced Decoder Input Ids
```
# From source:
$ python3 -m models.whisper.convert_to_onnx -m openai/whisper-tiny --output whispertiny --use_external_data_format --use_forced_decoder_ids

# From wheel:
$ python3 -m onnxruntime.transformers.models.whisper.convert_to_onnx -m openai/whisper-tiny --output whispertiny --use_external_data_format --use_forced_decoder_ids
```
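
For context, forced decoder input ids are the prompt tokens that pin Whisper's language and task before free-running decoding starts, which is the behavior the `--use_forced_decoder_ids` flag exposes in the exported model. Here is a sketch of the equivalent Hugging Face call; the language and task values are examples only.

```
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")

# Each pair is (decoder position, token id) that generation is forced
# to emit, fixing the language and task up front.
forced_decoder_ids = processor.get_decoder_prompt_ids(language="english", task="transcribe")
print(forced_decoder_ids)
```
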
Export + Optimize for FP32
```
# From source:
$ python3 -m models.whisper.convert_to_onnx -m openai/whisper-tiny --output whispertiny --use_external_data_format --optimize_onnx --precision fp32

# From wheel:
$ python3 -m onnxruntime.transformers.models.whisper.convert_to_onnx -m openai/whisper-tiny --output whispertiny --use_external_data_format --optimize_onnx --precision fp32
```
Export + Optimize for FP16 and GPU
```
# From source:
$ python3 -m models.whisper.convert_to_onnx -m openai/whisper-tiny --output whispertiny --use_external_data_format --optimize_onnx --precision fp16 --use_gpu --provider cuda

# From wheel:
$ python3 -m onnxruntime.transformers.models.whisper.convert_to_onnx -m openai/whisper-tiny --output whispertiny --use_external_data_format --optimize_onnx --precision fp16 --use_gpu --provider cuda
```
Export + Quantize for INT8
```
# From source:
$ python3 -m models.whisper.convert_to_onnx -m openai/whisper-tiny --output whispertiny --use_external_data_format --precision int8 --quantize_embedding_layer

# From wheel:
$ python3 -m onnxruntime.transformers.models.whisper.convert_to_onnx -m openai/whisper-tiny --output whispertiny --use_external_data_format --precision int8 --quantize_embedding_layer
```
## Benchmark Whisper

Here are some examples of how you can benchmark Whisper across various end-to-end (E2E) implementations.

Note: In the examples below, `PyTorch` refers to running in PyTorch without `torch.compile` and `PyTorch 2.0` refers to running in PyTorch with `torch.compile`; a minimal sketch of the distinction follows.
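
This sketch is illustrative only: the model name is an example, and the benchmark script performs this wiring internally.

```
import torch
from transformers import WhisperForConditionalGeneration

# Load a Hugging Face Whisper model (example checkpoint).
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny").eval()

# "PyTorch": eager-mode inference; the model is called as-is.
eager_model = model

# "PyTorch 2.0": the same model wrapped with torch.compile (requires
# PyTorch >= 2.0), which traces and fuses kernels on the first call.
compiled_model = torch.compile(model)
```
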
### Variants

1. PyTorch (without `torch.compile`), FP32
```
python3 -m models.whisper.benchmark \
    --benchmark-type hf-pt \
    --audio-path 1272-141231-0002.mp3 \
    --model-name openai/whisper-large-v2 \
    --precision fp32 \
    --device cpu
```

2. PyTorch 2.0 (with `torch.compile`), FP16
```
python3 -m models.whisper.benchmark \
    --benchmark-type hf-pt2 \
    --audio-path 1272-141231-0002.mp3 \
    --model-name openai/whisper-large-v2 \
    --precision fp16 \
    --device cuda
```

3. Optimum + ONNX Runtime, FP32, export via Optimum
```
python3 -m models.whisper.benchmark \
    --benchmark-type hf-ort \
    --audio-path 1272-141231-0002.mp3 \
    --model-name openai/whisper-large-v2 \
    --hf-ort-model-path ./whisper-large-v2-onnx/ \
    --precision fp32 \
    --device cpu
```

4. ONNX Runtime, FP32, export via Olive or convert_to_onnx
```
python3 -m models.whisper.benchmark \
    --benchmark-type ort \
    --audio-path 1272-141231-0002.mp3 \
    --model-name openai/whisper-large-v2 \
    --ort-model-path ./wlarge-fp32/whisper-large-v2_beamsearch.onnx \
    --precision fp32 \
    --device cpu
```

5. ONNX Runtime, FP16, export via Olive or convert_to_onnx
```
python3 -m models.whisper.benchmark \
    --benchmark-type ort \
    --audio-path 1272-141231-0002.mp3 \
    --model-name openai/whisper-large-v2 \
    --ort-model-path ./wlarge-fp32/whisper-large_all.onnx \
    --precision fp16 \
    --device cuda
```

6. ONNX Runtime, INT8, export via Olive or convert_to_onnx
```
python3 -m models.whisper.benchmark \
    --benchmark-type ort \
    --audio-path 1272-141231-0002.mp3 \
    --model-name openai/whisper-large-v2 \
    --ort-model-path ./wlarge-fp32/whisper-large-v2_all.onnx \
    --precision int8 \
    --device cpu
```

You can profile a variant by adding the `--profile` flag.

### Benchmark All

You can use `benchmark_all.py` to benchmark across various platforms and automatically store the results in a CSV file. Here is an example.

```
python3 -m models.whisper.benchmark_all \
    --audio-path ./whisper-test-audios/ \
    --hf-ort-model-path ./whisper-large-v2-onnx/ \
    --ort-model-path ./wlarge-fp32/whisper-large-v2_all.onnx \
    --model-name openai/whisper-large-v2 \
    --precision fp32 \
    --device cpu
```
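
Once a run completes, the CSV can be inspected with standard tools. A hedged sketch: the file name below is hypothetical, and the actual column schema is whatever `benchmark_all.py` writes.

```
import pandas as pd

# Hypothetical file name; benchmark_all.py decides the real CSV path.
df = pd.read_csv("benchmark_results.csv")

# List the recorded metrics, then preview the first rows.
print(df.columns.tolist())
print(df.head())
```
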
### Benchmarking on NVIDIA A100

Here is a benchmark for an MP3 file with 20.7s of audio.

#### FP16

| Engine       | Size     | Per-Token Latency | Real-Time Factor |
| ------------ | -------- | ----------------- | ---------------- |
| PyTorch      | Tiny     | 4.697 ms/token    | 0.004697         |
| PyTorch 2.0  | Tiny     | 3.406 ms/token    | 0.003406         |
| ONNX Runtime | Tiny     | 0.746 ms/token    | 0.000746         |
| PyTorch      | Medium   | 17.837 ms/token   | 0.017837         |
| PyTorch 2.0  | Medium   | 18.124 ms/token   | 0.018124         |
| ONNX Runtime | Medium   | 3.894 ms/token    | 0.003894         |
| PyTorch      | Large v2 | 23.470 ms/token   | 0.023470         |
| PyTorch 2.0  | Large v2 | 23.146 ms/token   | 0.023146         |
| ONNX Runtime | Large v2 | 6.262 ms/token    | 0.006262         |

#### FP32

| Engine       | Size     | Per-Token Latency | Real-Time Factor |
| ------------ | -------- | ----------------- | ---------------- |
| PyTorch      | Tiny     | 6.220 ms/token    | 0.006220         |
| PyTorch 2.0  | Tiny     | 3.944 ms/token    | 0.003944         |
| ONNX Runtime | Tiny     | 1.545 ms/token    | 0.001545         |
| PyTorch      | Medium   | 19.093 ms/token   | 0.019093         |
| PyTorch 2.0  | Medium   | 20.459 ms/token   | 0.020459         |
| ONNX Runtime | Medium   | 9.440 ms/token    | 0.009440         |
| PyTorch      | Large v2 | 25.844 ms/token   | 0.025844         |
| PyTorch 2.0  | Large v2 | 26.397 ms/token   | 0.026397         |
| ONNX Runtime | Large v2 | 7.492 ms/token    | 0.007492         |

onnxruntime/python/tools/transformers/models/whisper/__init__.py (+1 −1)

@@ -2,7 +2,7 @@
 # Copyright (c) Microsoft Corporation. All rights reserved.
 # Licensed under the MIT License.
 # --------------------------------------------------------------------------
-import os.path
+import os
 import sys

 sys.path.append(os.path.dirname(__file__))
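
The import change works because the `os` module imports and re-exports its `path` submodule, so `import os` already makes `os.path.dirname` available. A quick illustration:

```
import os

# os imports os.path internally, so no separate "import os.path" is needed.
print(os.path.dirname("/tmp/pkg/__init__.py"))  # -> /tmp/pkg
```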
