November 2018 - Computer Repair Blog

The basis of all error detection and correction in hard disks is the inclusion of redundant information and special hardware or software to use it. Each sector of data on the hard disk contains 512 bytes, or 4,096 bits, of user data. In addition to these bits, extra bits are added to each sector to implement an error correcting code, or ECC (sometimes also called error correction code or error correcting circuits). These bits do not contain user data; rather, they contain information about the data that can be used to correct problems encountered when reading the real data bits.

There are several different types of error correcting codes that have been invented over the years, but the type commonly used on PCs is the Reed-Solomon algorithm, named for researchers Irving Reed and Gustave Solomon, who first described the general technique that the algorithm employs. Reed-Solomon codes are widely used for error detection and correction in various computing and communications media, including magnetic storage, optical storage, high-speed modems, and data transmission channels. They have been chosen because they are easier to decode than most other similar codes, can detect (and correct) large numbers of missing or damaged bits of data, and require the fewest extra ECC bits for a given number of data bits. Look in the memory section for much more general information on error detection and correction.

When a sector is written to the hard disk, the appropriate ECC bits are generated and stored in the space reserved for them. When the sector is read back, the controller combines the user data it read with the ECC bits to determine whether any errors occurred during the read. Errors that can be corrected using the redundant information are corrected before the data is passed to the rest of the system. The system can also tell when there is too much damage to the data to correct, and will issue an error notification in that event. The sophisticated firmware present in all modern drives uses ECC as part of its overall error management protocols. This is all done “on the fly” with no intervention from the user required, and generally with no noticeable slowdown in performance even when errors are encountered and must be corrected.
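The write-then-verify flow described above can be sketched in a few lines of Python. This is only an illustration: it swaps in an extended Hamming (8,4) code, which is far simpler and far less space-efficient than the Reed-Solomon codes real drives use, and the names `write_sector` and `read_sector` are hypothetical, not part of any drive firmware. It shows the three outcomes the paragraph mentions: a clean read, a read that is silently corrected, and a read that is too damaged to fix.

```python
def encode_nibble(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits)."""
    code = [0] * 8                      # index 0: overall parity, 1..7: Hamming positions
    code[3] = nibble & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]   # check bit for positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]   # check bit for positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]   # check bit for positions 4,5,6,7
    code[0] = sum(code[1:]) % 2             # makes the total number of 1 bits even
    return code


def decode_nibble(code):
    """Return (nibble, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of set positions is 0 for a valid codeword
    parity_broken = sum(code) % 2 == 1
    if syndrome == 0 and not parity_broken:
        status = "ok"
    elif parity_broken:
        code[syndrome] ^= 1             # single-bit error: syndrome names the bad position
        status = "corrected"
    else:
        return None, "uncorrectable"    # two bits flipped: detectable but not fixable
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


def write_sector(data):
    """Generate redundancy at write time: two codewords per byte of user data."""
    stored = []
    for byte in data:
        stored.append(encode_nibble(byte & 0x0F))
        stored.append(encode_nibble(byte >> 4))
    return stored


def read_sector(stored):
    """Check and repair at read time; raise if the damage exceeds the code's ability."""
    out = bytearray()
    corrected = 0
    for i in range(0, len(stored), 2):
        low, low_status = decode_nibble(stored[i])
        high, high_status = decode_nibble(stored[i + 1])
        if "uncorrectable" in (low_status, high_status):
            raise IOError("uncorrectable read error")
        corrected += (low_status == "corrected") + (high_status == "corrected")
        out.append(low | (high << 4))
    return bytes(out), corrected
```

Flipping one stored bit, for example `stored[5][3] ^= 1`, still lets `read_sector` return the original bytes along with a correction count of 1. Flipping two bits in the same codeword makes it raise, which corresponds to the error notification described above.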

The capability of a Reed-Solomon ECC implementation is based on the number of additional ECC bits it includes. The more bits that are included for a given amount of data, the more errors can be tolerated. There are multiple trade-offs involved in deciding how many bits of ECC information to use. Including more bits per sector of data allows for more robust error detection and correction, but means fewer sectors can be put on each track, since more of the linear distance of the track is used up with non-data bits. On the other hand, if you make the system more capable of detecting and correcting errors, you make it possible to increase areal density or make other performance improvements, which could pay back the “investment” of extra ECC bits, and then some. Another complicating factor is that the more ECC bits included, the more processing power the controller must possess to process the Reed-Solomon algorithm. The engineers who design hard disks take these various factors into account in deciding how many ECC bits to include for each sector.
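To put rough numbers on this trade-off, the sketch below applies the standard Reed-Solomon property that a codeword carrying 2t check symbols can correct up to t corrupted symbols. The function name `rs_capability` and the symbol counts are illustrative assumptions, not figures from any real drive, and the sketch ignores practical details such as symbol size limits and the interleaving of several codewords within a sector.

```python
def rs_capability(data_symbols, check_symbols):
    """Summarize a hypothetical Reed-Solomon layout.

    data_symbols  -- symbols of user data per codeword (assumed value)
    check_symbols -- extra ECC symbols appended per codeword (assumed value)
    """
    total = data_symbols + check_symbols
    return {
        "codeword_symbols": total,
        # Standard Reed-Solomon bound: 2t check symbols correct up to t symbol errors.
        "correctable_symbol_errors": check_symbols // 2,
        # Share of the stored codeword that holds no user data.
        "overhead_fraction": check_symbols / total,
    }


if __name__ == "__main__":
    # Compare three hypothetical ECC budgets for the same amount of user data.
    for check in (16, 32, 64):
        print(check, rs_capability(512, check))
```

Doubling the check symbols doubles the number of correctable symbol errors, but it also raises the share of each track spent on non-data bits, which is the cost the paragraph above describes.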

This is an archive of Charles M. Kozierok’s PCGuide (pcguide.com) which disappeared from the internet in 2018. We wanted to preserve Charles M. Kozierok’s knowledge about computers and are permanently hosting a selection of important pages from PCGuide.


Karls Technology Computer Repair


Hard Drive Error Correcting Code (ECC)