EdgeMM: Multi-Core CPU with Heterogeneous AI-Extension and Activation-aware Weight Pruning for Multimodal LLMs at Edge

Bai, Kangbo; Ye, Le; Huang, Ru; Jia, Tianyu

arXiv:2505.10782 (cs)

[Submitted on 16 May 2025]

Abstract: Emerging multimodal LLMs (MLLMs) exhibit strong cross-modality perception and reasoning capabilities and hold great potential for various applications at the edge. However, MLLMs typically consist of a compute-intensive modality encoder and a memory-bound LLM decoder, leading to distinct bottlenecks for hardware designs. In this work, we present a multi-core CPU solution with heterogeneous AI extensions, which are based on either compute-centric systolic array or memory-centric digital compute-in-memory (CIM) co-processors. In addition, dynamic activation-aware weight pruning and bandwidth management are developed to enhance bandwidth efficiency and core utilization, improving overall performance. We implemented our solution in a commercial 22nm technology. For representative MLLMs, our evaluations show that EdgeMM can achieve a 2.84x performance speedup compared to a laptop 3060 GPU.
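The abstract names dynamic activation-aware weight pruning but does not state the pruning rule. The following is only a minimal sketch of one common form of the idea, under the assumption that, during memory-bound decoding, the weight columns paired with the smallest-magnitude input activations are skipped so they need not be fetched from memory. The function names, the `keep_ratio` parameter, and the top-k selection rule are hypothetical illustrations, not details taken from the paper.

```python
from typing import List, Tuple


def select_active_channels(x: List[float], keep_ratio: float) -> List[int]:
    """Return indices of the largest-magnitude activations, in ascending order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Keep at least one channel so the output is never empty of contributions.
    k = max(1, round(keep_ratio * len(x)))
    ranked = sorted(range(len(x)), key=lambda j: abs(x[j]), reverse=True)
    return sorted(ranked[:k])


def pruned_matvec(
    W: List[List[float]], x: List[float], keep_ratio: float
) -> Tuple[List[float], float]:
    """Approximate y = W @ x using only the weight columns of kept channels.

    Returns the approximate output and the fraction of weight columns read,
    which stands in for the memory traffic of the weight fetch (assumption).
    """
    active = select_active_channels(x, keep_ratio)
    y = [sum(row[j] * x[j] for j in active) for row in W]
    fetched = len(active) / len(x)
    return y, fetched


if __name__ == "__main__":
    W = [[1.0, 2.0, 3.0, 4.0],
         [0.5, -1.0, 2.0, 0.0]]
    x = [0.01, 3.0, -0.02, 2.0]  # two activations are near zero
    y, fetched = pruned_matvec(W, x, keep_ratio=0.5)
    print(y, fetched)
```

In this toy input the exact product is approximately [13.95, -3.035], so dropping the two near-zero channels halves the weight columns read while changing the output only slightly. How a real design chooses the threshold, and how it trades accuracy against bandwidth, is outside what the abstract specifies.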

Comments:	Accepted by DAC 2025
Subjects:	Hardware Architecture (cs.AR)
Cite as:	arXiv:2505.10782 [cs.AR]
	(or arXiv:2505.10782v1 [cs.AR] for this version)
	https://doi.org/10.48550/arXiv.2505.10782

Submission history

From: Kangbo Bai
[v1] Fri, 16 May 2025 01:46:37 UTC (13,111 KB)
