oai:arXiv.org:2401.10688
Computer Science
2024
5/29/2024
Generalized Reed-Solomon (RS) codes are a common choice for efficient, reliable error correction in memory and communications systems.
These codes add $2t$ extra parity symbols to a block of memory, and can efficiently and reliably correct up to $t$ symbol errors in that block.
Decoding is possible beyond this bound, but it is imperfectly reliable and often computationally expensive.
Beyond-bound decoding is an important problem to solve for error-correcting Dynamic Random Access Memory (DRAM).
These memories are often designed so that each access touches two extra memory devices, which allows a failure in any one device to be corrected.
But system architectures increasingly require DRAM to store metadata in addition to user data.
When the metadata replaces some of the parity data, a single-device failure falls beyond the bound.
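The parity arithmetic behind this can be illustrated with a minimal sketch. It assumes only the relation stated above ($2t$ parity symbols guarantee correction of $t$ symbol errors); the function names and the device geometry (4 symbols per device, 2 extra devices) are hypothetical values chosen for illustration, not figures from the paper.

```python
def guaranteed_correctable(parity_symbols: int) -> int:
    """Number of symbol errors t that 2t parity symbols are guaranteed to correct."""
    return parity_symbols // 2


def single_device_within_bound(symbols_per_device: int,
                               extra_devices: int,
                               metadata_symbols: int) -> bool:
    """Check whether a whole-device failure stays within the guaranteed bound.

    Hypothetical model: the extra devices supply the redundancy, and any
    metadata symbols stored there reduce the parity that remains.
    """
    parity = symbols_per_device * extra_devices - metadata_symbols
    return guaranteed_correctable(parity) >= symbols_per_device


# Illustrative geometry (an assumption): 4 symbols per device, 2 extra devices.
print(single_device_within_bound(4, 2, metadata_symbols=0))
print(single_device_within_bound(4, 2, metadata_symbols=2))
```

With no metadata, the 8 parity symbols cover the 4 symbols of a failed device; once 2 symbols are given to metadata, only 3 errors are guaranteed correctable, so the same failure is beyond-bound.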
An error-correction system can either protect each access with a single RS code, or divide it into several segments protected with a shorter code, usually in an Interleaved Reed-Solomon (IRS) configuration.
The full-block RS approach is more reliable, both at correcting errors and at preventing silent data corruption (SDC).
The IRS option is faster, and is especially efficient at beyond-bound correction of single- or double-device failures.
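One way to see why an interleaved layout suits device failures is the sketch below. It assumes a hypothetical layout in which each device contributes one symbol to every segment, so a failed device corrupts the same position in each shorter codeword; the layout, function names, and sizes are illustrative assumptions, and no actual RS encoding or decoding is performed.

```python
def split_into_segments(block, num_devices, symbols_per_device):
    """Split a device-major block into segments with one symbol per device.

    Assumed layout: block[d * symbols_per_device + b] is symbol b of device d.
    Segment b then holds symbol b from every device, in device order.
    """
    assert len(block) == num_devices * symbols_per_device
    return [block[b::symbols_per_device] for b in range(symbols_per_device)]


def corrupt_device(block, device, symbols_per_device):
    """Return a copy of the block with every symbol of one device flipped."""
    out = list(block)
    start = device * symbols_per_device
    for i in range(start, start + symbols_per_device):
        out[i] ^= 0xFF
    return out


def error_positions(received, original):
    """For each segment, list the positions where the symbols differ."""
    return [[i for i, (r, o) in enumerate(zip(rs, os)) if r != o]
            for rs, os in zip(received, original)]


# Illustrative sizes (assumptions): 5 devices, 4 symbols per device.
clean = list(range(20))
bad = corrupt_device(clean, device=2, symbols_per_device=4)
print(error_positions(split_into_segments(bad, 5, 4),
                      split_into_segments(clean, 5, 4)))
```

Under this layout, every segment reports an error at the position belonging to the failed device, which is the shared-location structure that interleaved decoders can exploit.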
Here we describe a new family of "unraveling" Reed-Solomon codes that bridges the gap between these options.
Our codes are full-block generalized RS codes, but they can also be decoded using an IRS decoder.
As a result, they combine the speed and beyond-bound correction capabilities of interleaved codes with the robustness of full-block codes, including the ability of the latter to reliably correct failures across multiple devices.
We show that unraveling codes are an especially good fit for high-reliability DRAM error correction.
Comment: Changes vs. first arXiv version: wordsmithing, typo corrections, and citation fixes
Hamburg, Mike; Linstadt, Eric; Moore, Danny; Vogelsang, Thomas, 2024, Unraveling codes: fast, robust, beyond-bound error correction for DRAM