Document detail
ID

oai:arXiv.org:2401.10688

Topic
Computer Science - Information The...
Computer Science - Hardware Archit...
Author
Hamburg, Mike; Linstadt, Eric; Moore, Danny; Vogelsang, Thomas
Category

Computer Science

Year

2024

Listing date

5/29/2024

Keywords
access, reliable, full-block, IRS, data, unraveling, error, correction, memory, DRAM, codes, beyond-bound

Abstract

Generalized Reed-Solomon (RS) codes are a common choice for efficient, reliable error correction in memory and communications systems.

These codes add $2t$ extra parity symbols to a block of memory, and can efficiently and reliably correct up to $t$ symbol errors in that block.
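The relationship between $2t$ parity symbols and a correction radius of $t$ can be checked with a small sketch. It assumes a plain RS evaluation code (a generalized RS code with all column multipliers equal to 1) over the prime field GF(929) rather than the binary extension fields used in DRAM, and it uses Berlekamp-Welch decoding, which is not necessarily the decoder the authors use. All function names are hypothetical.

```python
P = 929  # prime modulus; GF(929) is an assumption chosen to keep arithmetic simple

def inv(a):
    # Multiplicative inverse via Fermat's little theorem
    return pow(a, P - 2, P)

def poly_eval(coeffs, x):
    # Horner's rule; coeffs[i] is the coefficient of x**i
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def encode(msg, n):
    # Codeword symbol i is msg(x_i) with evaluation point x_i = i + 1;
    # n - len(msg) = 2t extra symbols give correction radius t
    return [poly_eval(msg, x) for x in range(1, n + 1)]

def solve_mod_p(rows):
    # Gauss-Jordan elimination on an augmented matrix over GF(P).
    # Returns one solution (free variables set to 0) or None if inconsistent.
    rows = [r[:] for r in rows]
    ncols = len(rows[0]) - 1
    pivots, r = [], 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        f = inv(rows[r][c])
        rows[r] = [v * f % P for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                g = rows[i][c]
                rows[i] = [(a - g * b) % P for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    if any(row[-1] for row in rows[r:]):
        return None
    sol = [0] * ncols
    for i, c in enumerate(pivots):
        sol[c] = rows[i][-1]
    return sol

def poly_divmod(num, den):
    # Polynomial long division over GF(P); lists are low-to-high coefficients
    num = num[:]
    q = [0] * max(len(num) - len(den) + 1, 1)
    dlead = inv(den[-1])
    for i in range(len(num) - len(den), -1, -1):
        coef = num[i + len(den) - 1] * dlead % P
        q[i] = coef
        for j, d in enumerate(den):
            num[i + j] = (num[i + j] - coef * d) % P
    return q, num[:len(den) - 1]

def decode(received, k):
    # Berlekamp-Welch: find monic E (deg t) and Q (deg < k + t) with
    # Q(x_i) = y_i * E(x_i) for every i, then recover msg = Q / E.
    n = len(received)
    t = (n - k) // 2
    rows = []
    for i, y in enumerate(received):
        x = i + 1
        q_part = [pow(x, j, P) for j in range(k + t)]
        e_part = [(-y * pow(x, j, P)) % P for j in range(t)]
        rows.append(q_part + e_part + [y * pow(x, t, P) % P])
    sol = solve_mod_p(rows)
    if sol is None:
        return None  # uncorrectable pattern detected
    Q = sol[:k + t]
    E = sol[k + t:] + [1]  # monic error-locator polynomial
    msg, rem = poly_divmod(Q, E)
    if any(rem):
        return None  # uncorrectable pattern detected
    # With more than t errors this may instead return a wrong message
    return msg
```

With $n = 10$ and $k = 4$ the code has $2t = 6$ parity symbols, so any 3 corrupted symbols are corrected. With more than $t$ errors the function either reports failure or returns a different codeword's message, which is a simple instance of why beyond-bound decoding is imperfectly reliable.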

Decoding is possible beyond this bound, but it is imperfectly reliable and often computationally expensive.

Beyond-bound decoding is an important problem to solve for error-correcting Dynamic Random Access Memory (DRAM).

These memories are often designed so that each access touches two extra memory devices, allowing a failure in any one device to be corrected.

But system architectures increasingly require DRAM to store metadata in addition to user data.

When metadata replaces some of the parity data, the remaining parity can no longer correct every symbol contributed by one device, so a single-device failure becomes beyond-bound.
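The parity budget behind this argument can be made concrete with a short sketch. It assumes a hypothetical layout in which every device contributes the same number of symbols to an access, a single full-block RS code covers the whole access, and the failed device's location is not known in advance. The numbers used (8 symbols per device, 2 extra devices) are illustrative, not taken from the paper.

```python
def correction_radius(parity_symbols):
    # An RS code with 2t parity symbols corrects up to t symbol errors
    return parity_symbols // 2

def single_device_correctable(symbols_per_device, ecc_devices, metadata_symbols):
    # Parity = all symbols on the extra devices minus those repurposed for metadata
    parity = symbols_per_device * ecc_devices - metadata_symbols
    # A failed device can corrupt every symbol it contributes to the access
    return symbols_per_device <= correction_radius(parity)
```

With 8 symbols per device and 2 extra devices, the full 16 parity symbols give $t = 8$, enough for one device. Giving even 2 of those symbols to metadata leaves 14 parity symbols and $t = 7$, so a single-device failure falls outside the guaranteed bound.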

An error-correction system can either protect each access with a single RS code, or divide it into several segments protected with a shorter code, usually in an Interleaved Reed-Solomon (IRS) configuration.
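One common way to arrange the interleaved option is to give each device one symbol position in every segment, so that a failed device shows up as a single error at the same position in each shorter codeword. The sketch below assumes a hypothetical layout where `access[d][b]` is the symbol device `d` delivers on beat `b` and segment `b` collects beat `b` from all devices; the paper's actual layout may differ.

```python
def interleave(access, num_segments):
    # Segment b gathers beat b from every device, so device d owns
    # position d in every segment (hypothetical layout).
    return [[dev[b] for dev in access] for b in range(num_segments)]

def error_positions(clean_segments, bad_segments):
    # For each segment, list the positions where the symbols differ
    return [[i for i, (a, b) in enumerate(zip(sc, sb)) if a != b]
            for sc, sb in zip(clean_segments, bad_segments)]

# Illustrative access: 10 devices, 8 beats, byte-sized symbols
access = [[(d * 8 + b) % 256 for b in range(8)] for d in range(10)]
# Device 3 fails and returns every symbol with all bits flipped
failed = [[s ^ 0xFF for s in dev] if d == 3 else dev[:] for d, dev in enumerate(access)]
positions = error_positions(interleave(access, 8), interleave(failed, 8))
```

Every segment then sees exactly one error, at position 3. Errors that share a location across segments are the structure that collaborative decoding of interleaved codes generally exploits, which is one reason the IRS arrangement handles device failures efficiently.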

The full-block RS approach is more reliable, both at correcting errors and at preventing silent data corruption (SDC).

The IRS option is faster, and is especially efficient at beyond-bound correction of single- or double-device failures.

Here we describe a new family of "unraveling" Reed-Solomon codes that bridges the gap between these options.

Our codes are full-block generalized RS codes, but they can also be decoded using an IRS decoder.

As a result, they combine the speed and beyond-bound correction capabilities of interleaved codes with the robustness of full-block codes, including the ability of the latter to reliably correct failures across multiple devices.

We show that unraveling codes are an especially good fit for high-reliability DRAM error correction.

Comment: Changes vs. first arXiv version: wordsmithing, typo corrections and citation fixes.

Hamburg, Mike; Linstadt, Eric; Moore, Danny; Vogelsang, Thomas (2024). Unraveling codes: fast, robust, beyond-bound error correction for DRAM.

