Document details
ID

oai:arXiv.org:2403.15654

Topic
Computer Science - Machine Learning; Mathematics - Optimization and Control
Author
Wu, Tongle; Sun, Ying
Category

Computer Science

Year

2024

Listing date

27.03.2024

Keywords
updates, gradient, DGT, decentralized
Abstract

We revisit two fundamental decentralized optimization methods, Decentralized Gradient Tracking (DGT) and Decentralized Gradient Descent (DGD), with multiple local updates.

We consider two settings and demonstrate that incorporating $K > 1$ local update steps can reduce communication complexity.

Specifically, for $\mu$-strongly convex and $L$-smooth loss functions, we prove that local DGT achieves communication complexity $\tilde{\mathcal{O}} \Big(\frac{L}{\mu K} + \frac{\delta}{\mu (1 - \rho)} + \frac{\rho}{(1 - \rho)^2} \cdot \frac{L + \delta}{\mu}\Big)$, where $\rho$ measures the network connectivity and $\delta$ measures the second-order heterogeneity of the local losses.

Our result reveals the tradeoff between communication and computation and shows that increasing $K$ can effectively reduce communication costs when the data heterogeneity is low and the network is well connected.
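As a minimal numerical sketch of this tradeoff (dropping constants and logarithmic factors, and assuming illustrative values for $L$, $\mu$, $\delta$, and $\rho$ that are not taken from the paper), the bound above can be evaluated as a function of $K$:

import numpy as np

def local_dgt_comm_bound(L, mu, delta, rho, K):
    """Evaluate the O-tilde communication bound for local DGT
    (constants and log factors dropped)."""
    return (L / (mu * K)
            + delta / (mu * (1 - rho))
            + rho / (1 - rho) ** 2 * (L + delta) / mu)

# Hypothetical parameters: well-connected network, low heterogeneity.
L, mu, delta, rho = 100.0, 1.0, 0.1, 0.1
for K in (1, 10, 100):
    print(K, local_dgt_comm_bound(L, mu, delta, rho, K))

With these values the $L/(\mu K)$ term dominates at $K = 1$, so the bound drops sharply as $K$ grows until the fixed heterogeneity and network terms take over.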

We then consider the over-parameterized regime, where the local losses share the same minima, and prove that employing local updates in DGD, even without gradient correction, can yield a similar effect as DGT in reducing communication complexity.

Numerical experiments validate our theoretical results.
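As a hedged illustration of the DGD-with-local-updates scheme discussed above, the following self-contained Python sketch runs $K$ local gradient steps per communication round on toy heterogeneous quadratic losses; the ring mixing matrix, step size, and losses are illustrative assumptions, not the paper's experimental setup.

import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, K, rounds, step = 4, 5, 5, 50, 0.05

# Toy heterogeneous quadratic losses f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = rng.standard_normal((n_agents, dim, dim))
b = rng.standard_normal((n_agents, dim))
grad = lambda i, x: A[i].T @ (A[i] @ x - b[i])

# Doubly stochastic mixing matrix for a ring topology (illustrative choice).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

X = np.zeros((n_agents, dim))           # one row of parameters per agent
for _ in range(rounds):                 # each round = one communication
    for i in range(n_agents):           # K local gradient steps, no communication
        for _ in range(K):
            X[i] -= step * grad(i, X[i])
    X = W @ X                           # single gossip (averaging) step

print("consensus error:", np.linalg.norm(X - X.mean(axis=0)))

Setting K = 1 recovers standard DGD with one gossip step per gradient step; larger K trades extra local computation for fewer communication rounds.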

Wu, Tongle; Sun, Ying, 2024, The Effectiveness of Local Updates for Decentralized Learning under Data Heterogeneity
