Asynchronous Decentralized SGD with Quantized and Local Updates

Nadiradze, Giorgi; Sabour, Amirmojtaba; Davies, Peter; Li, Shigang; Alistarh, Dan

Computer Science > Machine Learning

arXiv:1910.12308 (cs)

[Submitted on 27 Oct 2019 (v1), last revised 25 Mar 2022 (this version, v4)]

Abstract: Decentralized optimization is emerging as a viable alternative for scalable distributed machine learning, but it also introduces new challenges in terms of synchronization costs. To reduce these costs, several communication-reduction techniques, such as non-blocking communication, quantization, and local steps, have been explored in the decentralized setting. Due to the complexity of analyzing optimization in such a relaxed setting, this line of work often assumes global communication rounds, which require additional synchronization. In this paper, we consider decentralized optimization in the simpler, but harder to analyze, asynchronous gossip model, in which communication occurs in discrete, randomly chosen pairings among nodes. Perhaps surprisingly, we show that a variant of SGD called SwarmSGD still converges in this setting, even if non-blocking communication, quantization, and local steps are all applied in conjunction, and even if the node data distributions and underlying graph topology are both heterogeneous. Our analysis is based on a new connection with multi-dimensional load-balancing processes. We implement this algorithm and deploy it in a supercomputing environment, showing that it can outperform previous decentralized methods in terms of end-to-end training time, and that it can even rival carefully tuned large-batch SGD for certain tasks.
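The abstract describes the asynchronous gossip model only at a high level. As a rough illustration, the sketch below simulates it on scalar toy problems: each interaction picks one random edge of the communication graph, both endpoints run a few local SGD steps on their own (heterogeneous) objectives, exchange quantized copies of their models, and average. Everything here is an assumption made for illustration and is not the authors' SwarmSGD implementation: the per-node quadratic objectives, the stochastic-rounding quantizer, the step sizes, and the names `swarm_sketch`, `local_sgd`, and `stochastic_round` are hypothetical, and non-blocking communication is not modelled at all.

```python
import math
import random


def local_sgd(x, target, steps, lr, rng, noise=0.1):
    """Take `steps` noisy SGD steps on the toy objective 0.5 * (x - target) ** 2."""
    for _ in range(steps):
        grad = (x - target) + rng.gauss(0.0, noise)  # stochastic gradient
        x -= lr * grad
    return x


def stochastic_round(x, step, rng):
    """Assumed quantizer: round x to a neighbouring multiple of `step`, unbiasedly."""
    lower = step * math.floor(x / step)
    p_up = (x - lower) / step  # rounding up with this probability keeps E[q(x)] == x
    return lower + step if rng.random() < p_up else lower


def swarm_sketch(targets, edges, interactions, local_steps=2, lr=0.02,
                 q_step=0.01, seed=0):
    """Hypothetical pairwise-gossip simulation with scalar models.

    Node i owns its own target (heterogeneous data) and `edges` is the
    communication graph. There are no global rounds: every interaction
    involves exactly one randomly chosen pair of neighbouring nodes.
    """
    rng = random.Random(seed)
    models = [0.0] * len(targets)
    for _ in range(interactions):
        i, j = rng.choice(edges)
        xi = local_sgd(models[i], targets[i], local_steps, lr, rng)
        xj = local_sgd(models[j], targets[j], local_steps, lr, rng)
        # Each endpoint averages its own full-precision model with the
        # quantized copy of its partner's model that it received.
        models[i] = 0.5 * (xi + stochastic_round(xj, q_step, rng))
        models[j] = 0.5 * (xj + stochastic_round(xi, q_step, rng))
    return models


if __name__ == "__main__":
    targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ring = [(k, (k + 1) % 6) for k in range(6)]
    models = swarm_sketch(targets, ring, interactions=20000)
    print("mean model:", sum(models) / len(models))
    print("spread:", max(models) - min(models))
```

With a constant step size, the average of the node models should keep fluctuating around the minimizer of the average objective (3.5 for these targets), while the per-node models stay clustered much more tightly than the targets themselves; the differing local pulls between interactions are what prevent exact consensus in this toy setting.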

Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
Cite as:	arXiv:1910.12308 [cs.LG]
	(or arXiv:1910.12308v4 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.1910.12308

Submission history

From: Giorgi Nadiradze
[v1] Sun, 27 Oct 2019 17:40:26 UTC (1,070 KB)
[v2] Fri, 20 Mar 2020 14:24:00 UTC (219 KB)
[v3] Mon, 14 Dec 2020 18:32:01 UTC (652 KB)
[v4] Fri, 25 Mar 2022 14:41:38 UTC (661 KB)