"--thread=n" switch does not seem to work #293
Hi @PapperYZ, can you try rerunning the command with the tag
Hi @arjunsuresh, we have a custom ASIC with a similar core structure to the 4 small cores in the Orange Pi Plus system, so we are attempting a comparison by turning off the 4 big cores in the Orange Pi and running exactly the same benchmark, to see whether the custom ASIC delivers the same benchmark performance. Do you see a good way of supporting that?
Hi @sujik18, I tried using
Hi @PapperYZ, I believe the script failed with an out-of-index error, which might have been caused by main-memory constraints, so running it with a smaller batch size might help. Here is the updated command; it will do a quick performance test run on a short dataset (500 images):

```shell
mlcr run-mlperf,inference,_find-performance,_short,_r5.0-dev \
  --model=resnet50 \
  --implementation=reference \
  --framework=onnxruntime \
  --category=edge \
  --scenario=Offline \
  --execution_mode=test \
  --device=cpu \
  --quiet \
  --batch_size=8 \
  --threads=4 \
  --test_query_count=1000
```
"@arjunsuresh, would you help to make some adjustments so that --threads=4 applies globally?" Unfortunately, that's out of the scope of our automations. The automation code only passes the inputs to the respective implementations; if an implementation does not support an option, the automation cannot enforce it. @sujik18 Did either "--batch_size=8" or "--threads=4" work for you with the ResNet50 onnxruntime implementation?
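(When an implementation ignores the threads option, one workaround is to cap the process at the OS level instead, for example with `taskset` or CPU affinity. A minimal Linux-only sketch; the function name and the "first n CPUs" choice are illustrative, not part of the automation:)

```python
import os

def limit_to_cpus(n):
    """Restrict this process (and any threads it spawns) to at most
    n of the CPUs it is currently allowed to run on. Linux-only."""
    allowed = sorted(os.sched_getaffinity(0))   # CPUs we may use now
    os.sched_setaffinity(0, set(allowed[:n]))   # keep only the first n
    return sorted(os.sched_getaffinity(0))      # report the new mask
```

Running the benchmark under `taskset -c 0-3 ...` achieves the same effect without touching any code.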
@arjunsuresh Yes, it did work for me. My system only has 8 GB of RAM, and if I try to run the script with the default values for threads and batch_size, it usually fails with an out-of-index error.
@sujik18 What was the perf difference between running with batch size 1 and batch size 8? And to see whether threads=4 worked, you can monitor the output of htop.
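(Besides eyeballing htop, the kernel's thread count for a process can be read directly from /proc; a small sketch, with the current shell's PID standing in for the benchmark process:)

```shell
# Read the Threads: field (NLWP) from /proc/<pid>/status. Replace "$$"
# (the current shell) with the PID of the running benchmark process.
pid=$$
threads=$(awk '/^Threads:/ {print $2}' "/proc/$pid/status")
echo "process $pid is using $threads threads"
```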
@arjunsuresh I haven't tried running the perf test with batch size 1; I stuck with batch size 8 instead of the default value of 32 for all perf evaluations.
This could be for the preprocessing stage; you need to watch htop during inference to confirm it.
For batch_size, yes. The code for resnet50 handles the batching here:
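(The batching pattern in question boils down to slicing the queried samples into chunks of at most max_batchsize before each inference call; a simplified sketch, not the actual reference code:)

```python
def make_batches(samples, max_batchsize):
    """Split samples into consecutive chunks of at most max_batchsize,
    mirroring how the reference backend batches queries for inference."""
    return [samples[i:i + max_batchsize]
            for i in range(0, len(samples), max_batchsize)]
```

With max_batchsize=8 and 1000 queries, the final chunk is the only one that may be smaller than 8.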
Hi @sujik18, I tried your recommended command above (noting that it still runs with 4 cores); below are the error logs. Please kindly take a look and advise... appreciate it!
Hi @PapperYZ, can you please share your system details, like OS, main memory size, and available disk storage? Since you mentioned earlier that your ASIC is similar to the Orange Pi system, I assume it only supports single-thread execution. Also, as @arjunsuresh mentioned initially, the threads parameter value is not taken into account: I tried a test run with the threads=4 tag, but the htop stats made it clear that the system was still using the maximum number of threads available to it (in my case, 12).
Hi @sujik18, I am using Ubuntu 22.04. I turned off the 4 big cores (cores 4-7) using the commands below to disable them (I guess you could turn off some of your 12 cores to mimic what I have done here).
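(The exact commands were not captured in this excerpt; on Linux, cores are typically taken offline through sysfs. A hypothetical dry-run sketch that only prints the writes it would make — run the printed commands as root to actually offline cores 4-7, and write 1 to bring a core back:)

```shell
# Dry run: print the sysfs writes that would take cores 4-7 offline.
# Actually applying them requires root and a kernel built with CPU hotplug.
for cpu in 4 5 6 7; do
  echo "echo 0 > /sys/devices/system/cpu/cpu$cpu/online"
done
```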
Hi @sujik18, batch_size=1 does not help with the error; see the bolded assertion message below. Do you have any thoughts about it?

```
INFO:main:Namespace(dataset='imagenet', dataset_path='/home/orangepi/MLC/repos/local/cache/get-preprocessed-dataset-imagenet_236f9d1d', dataset_list='/home/orangepi/MLC/repos/local/cache/extract-file_c7f93c2e/val.txt', data_format=None, profile='resnet50-onnxruntime', scenario='Offline', max_batchsize=1, model='/home/orangepi/MLC/repos/local/cache/download-file_f6ae226d/resnet50_v1.onnx', output='/home/orangepi/MLC/repos/local/cache/get-mlperf-inference-results-dir_c7cc8ae6/test_results/orangepi5plus-reference-cpu-onnxruntime-v1.21.0-default_config/resnet50/offline/performance/run_1', inputs=None, outputs=['ArgMax:0'], backend='onnxruntime', device=None, model_name='resnet50', threads=4, qps=None, cache=0, cache_dir='/home/orangepi/MLC/repos/local/cache/get-preprocessed-dataset-imagenet_236f9d1d', preprocessed_dir=None, use_preprocessed_dataset=True, accuracy=False, find_peak_performance=False, debug=False, user_conf='/home/orangepi/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/316769885346468f8e8ff829aa739448.conf', audit_conf='audit.config', time=None, count=None, performance_sample_count=None, max_latency=None, samples_per_query=8)
```
@PapperYZ, lowering the batch_size is not required, as your system has enough RAM to support the default batch_size value. But I was curious about the reported main memory size of 31G; shouldn't it be 32G? @arjunsuresh Can you please look into this error?
Hi @sujik18, removing the cached dataset does not seem to help; it generates the same error as shown above. Would you be able to turn off some of your cores using the commands I provided above and see if you get the same error? One more data point: when I turned those 4 A76 cores back online, the error immediately disappeared.
Hi @PapperYZ, I will try running the script after disabling half of my system cores tonight or tomorrow. In the meantime, could you please check the number of threads being used, both after starting the script and just before it fails, as I asked earlier? You can do this by observing the number of threads in the idle state, rerunning the script with the rerun tag to avoid using the cached state, and then noting how many new threads are created during the process.
Hi @sujik18, if you look at the htop screen I shared above: once I disable the 4 A76 cores, htop only shows 4 cores, so I am not sure how I would monitor threads... though I can clearly see that preprocessing uses 4 threads and works fine; the error only shows up during the final step.
@sujik18 Please see the htop state below while the preprocessing is ongoing.
@sujik18 @arjunsuresh, I believe the issue is now understood (with Grok's help); I submitted a ticket here:
Hi @PapperYZ, I tried running the benchmarking test after disabling one of the CPU threads (my processor is a 4600H with 6 physical cores and 12 threads), but I was unable to reproduce this error; my test run completed successfully. Here is a screenshot of htop during execution.
Thank you for the test, it would be great to disable one cluster instead of just one core... |
I am trying to use "--thread=4" to control the number of cores involved during the benchmark, but it does not seem to work and causes the failure below. BTW, I have manually disabled 4 cores, leaving only 4 cores available on my Orange Pi Plus.
The command I used is as below:

```shell
(mlc) orangepi@orangepi5plus:~$ mlcr run-mlperf,inference,_full,_r5.0-dev --model=resnet50 --implementation=reference --framework=onnxruntime --category=edge --scenario=Offline --execution_mode=valid --device=cpu --quiet --test_query_count=1000 --thread=4
```

And the log is as below: