RAID (Redundant Array of Independent Disks) combines multiple physical drives into a logical unit to improve performance, redundancy, or both.

Table of Contents#

  1. Overview
  2. RAID Level Comparison
  3. RAID 0 - Striping
  4. RAID 1 - Mirroring
  5. RAID 5 - Striping with Parity
  6. RAID 6 - Striping with Double Parity
  7. RAID 10 - Stripe of Mirrors
  8. Hardware vs Software RAID
  9. Rebuild Failure Risk
  10. Data Scrubbing
  11. Cost Analysis
  12. RAID Is Not a Backup
  13. Troubleshooting
  14. See Also
  15. Sources

1. Overview#

RAID technology aggregates multiple physical disks into a single logical volume. Depending on the RAID level, this can provide:

  • Increased performance via parallel I/O across disks (striping)
  • Data redundancy via mirroring or parity calculations
  • A combination of both performance and redundancy

RAID is implemented in two ways:

  • Software RAID - managed by the operating system (e.g., mdadm, LVM, ZFS, BTRFS)
  • Hardware RAID - managed by a dedicated controller card with its own processor and cache

2. RAID Level Comparison#

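| RAID Level | Redundancy | Drive Utilization | Read Performance | Write Performance | Min Drives |
|---|---|---|---|---|---|
| 0 | No | 100% | nX (best) | nX (best) | 2 |
| 1 | Yes | 50% | Up to nX (multi-read) | 1X | 2 |
| 5 | Yes | 67%-94% | (n-1)X | (n-1)X | 3 |
| 6 | Yes | 50%-88% | (n-2)X | (n-2)X | 4 |
| 10 (far2) | Yes | 50% | nX (best) | (n/2)X | 2 |
| 10 (near2) | Yes | 50% | Up to nX (multi-read) | (n/2)X | 2 |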

Where n is the number of disks in the array.
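
The utilization ranges in the table follow from how many drives' worth of space each level spends on redundancy. The following is a minimal Python sketch, assuming equal-size drives, two-way mirroring for RAID 1 and RAID 10, and a 16-drive upper bound for the ranges. The function name `usable_fraction` is illustrative and not part of any RAID tool.

```python
def usable_fraction(level: str, n: int) -> float:
    """Fraction of raw capacity that is usable with n equal-size drives.

    Assumes two-way mirroring for RAID 1 and RAID 10 (an assumption of this sketch).
    """
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 2}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if n < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} drives")
    if level == "0":
        return 1.0  # no redundancy, all space is usable
    if level in ("1", "10"):
        return 0.5  # every block is stored twice
    parity_drives = 1 if level == "5" else 2
    return (n - parity_drives) / n


if __name__ == "__main__":
    # Smallest array versus an assumed 16-drive array for each parity level.
    for level, low, high in (("5", 3, 16), ("6", 4, 16)):
        print(f"RAID {level}: {usable_fraction(level, low):.1%} "
              f"to {usable_fraction(level, high):.1%}")
```

With these assumptions the RAID 5 range runs from 2/3 to 15/16 and the RAID 6 range from 2/4 to 14/16, which round to the 67%-94% and 50%-88% shown in the table.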

3. RAID 0 - Striping#

Data is split into blocks and distributed across all drives. RAID 0 provides maximum performance and full capacity utilization, but offers zero redundancy.
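
As a rough illustration of how striping distributes blocks, the sketch below maps a logical chunk number to a drive and an offset using simple round-robin placement. This is a simplified model with a hypothetical function name (`locate_chunk`); real implementations also account for chunk size and per-drive data offsets.

```python
def locate_chunk(chunk_index: int, n_drives: int) -> tuple[int, int]:
    """Map a logical chunk number to (drive, chunk offset on that drive)."""
    if n_drives < 2:
        raise ValueError("RAID 0 needs at least two drives")
    if chunk_index < 0:
        raise ValueError("chunk index must be non-negative")
    # Consecutive chunks go to consecutive drives, wrapping around.
    return chunk_index % n_drives, chunk_index // n_drives


if __name__ == "__main__":
    for chunk in range(6):
        drive, offset = locate_chunk(chunk, n_drives=3)
        print(f"chunk {chunk} -> drive {drive}, offset {offset}")
```

Because consecutive chunks land on different drives, a large sequential transfer keeps every drive busy at once, and the loss of any one drive removes every n-th chunk of the volume.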

Figure: RAID 0 diagram.

Advantages:

  • Best read and write performance of any RAID level
  • 100% storage utilization, no overhead
  • Simple to implement

Disadvantages:

  • Any single drive failure destroys the entire array
  • Not suitable for any data that cannot be easily recreated

Ideal use: Temporary scratch storage, video editing, build caches, or any workload where data is disposable and speed matters.

4. RAID 1 - Mirroring#

Data is written identically to two or more drives. If one drive fails, the other continues serving data without interruption.

Figure: RAID 1 diagram.

Advantages:

  • Excellent read speed; writes are comparable to a single drive
  • Simplest redundancy; no parity calculation overhead
  • Fast rebuild: data is copied directly from the surviving mirror

Disadvantages:

  • Only 50% of total capacity is usable with two drives (1/n with an n-way mirror)
  • Poor cost efficiency beyond two disks: each additional mirror adds redundancy but no usable capacity

Ideal use: Boot drives, small servers with two drives, mission-critical applications requiring the simplest possible redundancy.

5. RAID 5 - Striping with Parity#

Data blocks and parity blocks are distributed across three or more drives. Each stripe holds one parity block computed from its data blocks, so if one drive fails, its contents can be reconstructed from the remaining drives. The parity block rotates across all drives from stripe to stripe to distribute the I/O load.
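
The reconstruction step can be sketched with XOR parity, the scheme commonly used for single-parity RAID. This is a minimal Python illustration on in-memory byte strings, not how any particular implementation lays out data on disk; the helper name `xor_blocks` is hypothetical.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte position by byte position."""
    if not blocks:
        raise ValueError("need at least one block")
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(xor, column) for column in zip(*blocks))


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]  # one stripe: three data chunks
    parity = xor_blocks(data)            # parity chunk on a fourth drive

    # Simulate losing the second chunk and rebuilding it from the rest plus parity.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])
```

XOR-ing the surviving chunks with the parity chunk cancels everything except the missing chunk, which is why exactly one lost drive per stripe can be recovered.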

Figure: RAID 5 diagram.

Advantages:

  • Good balance of performance, redundancy, and capacity
  • Efficient storage utilization (67%-94% depending on drive count)
  • Read performance scales nearly linearly with drive count

Disadvantages:

  • Write penalty: each write requires reading old data and parity, computing new parity, then writing both
  • Long rebuild times on large disks (hours to days for multi-TB drives)
  • Vulnerable during rebuild: a second drive failure during rebuild results in total data loss
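
The write penalty described above can be sketched as a read-modify-write parity update. The example assumes XOR parity and operates on in-memory byte strings; the names `xor` and `updated_parity` are illustrative only.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Return the new parity for a single-chunk update (read-modify-write).

    I/O implied: read old data, read old parity, write new data, write new parity.
    """
    # Remove the old chunk's contribution from parity, then add the new one.
    return xor(xor(old_parity, old_data), new_data)


if __name__ == "__main__":
    chunks = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
    parity = xor(xor(chunks[0], chunks[1]), chunks[2])

    new_chunk = b"\xaa\x55"
    new_parity = updated_parity(chunks[1], parity, new_chunk)

    # The shortcut must agree with recomputing parity over the whole stripe.
    full = xor(xor(chunks[0], new_chunk), chunks[2])
    print(new_parity == full)
```

A small write therefore costs four disk operations instead of one, which is why parity RAID tends to suit read-heavy or large sequential workloads better than small random writes.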

Ideal use: General-purpose file and application servers with a moderate number of drives. Consider RAID 6 for arrays with drives larger than 2 TB.

6. RAID 6 - Striping with Double Parity#

Similar to RAID 5, but each stripe carries two independent parity blocks instead of one, rotated across the drives. This allows the array to survive two simultaneous drive failures.

Figure: RAID 6 diagram.

Advantages:

  • Survives two simultaneous drive failures
  • More resilient during long rebuild times with large disks
  • Read performance comparable to RAID 5

Disadvantages:

  • Higher write penalty than RAID 5 (two parity blocks must be updated per stripe)
  • Write performance roughly 20% lower than RAID 5 as a rule of thumb; the actual gap depends on workload and implementation
  • Requires a minimum of four drives

Ideal use: Large arrays with many high-capacity drives where rebuild times are long and a second failure during rebuild is a realistic risk.

7. RAID 10 - Stripe of Mirrors#

Combines RAID 1 (mirroring) and RAID 0 (striping). Data is first mirrored, then the mirror pairs are striped for performance.
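
To illustrate which failures a stripe of mirrors tolerates, the sketch below models a simple nested layout in which drives 2k and 2k+1 form mirror pair k. This pairing is an assumption of the example and does not reproduce the mdadm near2 or far2 layouts; the function names are hypothetical.

```python
from itertools import combinations


def survives(failed: set[int], n_drives: int) -> bool:
    """True if every mirror pair still has at least one working drive.

    Drives 2k and 2k+1 are assumed to form mirror pair k.
    """
    if n_drives < 2 or n_drives % 2:
        raise ValueError("this sketch needs an even number of drives (>= 2)")
    return all(not {2 * k, 2 * k + 1} <= failed for k in range(n_drives // 2))


def surviving_double_failures(n_drives: int) -> tuple[int, int]:
    """Count two-drive failure combinations that leave the array readable."""
    combos = list(combinations(range(n_drives), 2))
    ok = sum(survives(set(c), n_drives) for c in combos)
    return ok, len(combos)


if __name__ == "__main__":
    for n in (4, 8):
        ok, total = surviving_double_failures(n)
        print(f"{n} drives: survives {ok} of {total} two-drive failures")
```

Under this model the array is lost only when both drives of the same pair fail, so the share of survivable double failures grows with the number of pairs.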

Figure: RAID 10 diagram.

Advantages:

  • Excellent read and write performance
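  • Fast rebuild: only the affected mirror pair needs to be rebuilt, not the entire array
  • The rebuild is a straight copy from the surviving mirror with no parity computation, so it is typically much faster than a RAID 5/6 rebuild of the same capacity, although multi-TB drives still take hours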

Disadvantages:

  • 50% of total capacity goes to mirroring
  • More expensive per usable gigabyte than RAID 5/6

Ideal use: Databases, virtualization hosts, and any workload demanding both high IOPS and redundancy. The preferred RAID level for random I/O-heavy applications.

8. Hardware vs Software RAID#

| Aspect | Hardware RAID | Software RAID |
|---|---|---|
| Controller | Dedicated card with processor and battery-backed cache | CPU and memory of the host system |
| Cost | $200-$2000+ for enterprise cards | Free (included in the OS) |
| CPU overhead | Minimal (offloaded to controller) | Small, generally negligible on modern CPUs |
| Portability | Tied to controller model; array unreadable without matching card | Portable across systems (metadata on disks) |
| Boot support | Transparent to OS; can boot directly | Requires initramfs configuration |
| Battery/cache | Write-back cache with battery protection for performance | No hardware cache (can use write-intent bitmap) |
| Hot swap | Usually supported by backplane/controller | Depends on enclosure; mdadm supports it |
| Monitoring | Vendor-specific tools (MegaCLI, arcconf, storcli) | Standard Linux tools (mdadm, /proc/mdstat) |
| Reliability | Controller itself is a single point of failure | No additional hardware to fail |
| "Fake" RAID | BIOS/UEFI RAID (Intel RST, AMD) is NOT hardware RAID; avoid for Linux | Use real software RAID instead of fake RAID |

Recommendation: Software RAID (mdadm or ZFS/BTRFS) is preferred for most Linux deployments. Hardware RAID is warranted only when write-back cache performance is critical and the controller has battery protection.

9. Rebuild Failure Risk#

The most dangerous period for a RAID array is during a rebuild. While the array is degraded and rebuilding, a second disk failure causes total data loss (for RAID 5) or further degradation (for RAID 6).

Factors Affecting Rebuild Risk#

| Factor | Impact |
|---|---|
| Drive size | Larger drives take longer to rebuild. A 16 TB RAID 5 rebuild can take 24+ hours |
| Drive age | Drives from the same batch and age are likely to fail around the same time |
| URE rate | Unrecoverable Read Errors during rebuild can cause a second logical failure |
| I/O load | Production traffic during rebuild extends rebuild time and stresses surviving disks |
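
A lower bound on rebuild time can be estimated from the capacity of the replacement drive and the sustained rebuild rate. The sketch below assumes the 16 TB figure refers to the capacity of the drive being rebuilt; the throughput values are hypothetical inputs, not measurements.

```python
def rebuild_hours(drive_tb: float, mb_per_s: float) -> float:
    """Hours needed to write one replacement drive at a sustained rate.

    drive_tb uses decimal terabytes (10**12 bytes); mb_per_s uses 10**6 bytes/s.
    """
    if drive_tb <= 0 or mb_per_s <= 0:
        raise ValueError("capacity and throughput must be positive")
    seconds = drive_tb * 1e12 / (mb_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # Assumed sustained rebuild rates; production I/O pushes the rate down.
    for rate in (200, 100, 50):
        print(f"16 TB at {rate} MB/s: {rebuild_hours(16, rate):.1f} hours")
```

Even at an uncontended 200 MB/s a 16 TB drive needs about 22 hours, and halving the rate under production load doubles that, which is consistent with the 24+ hours quoted in the table.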

URE (Unrecoverable Read Error) Problem#

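Consumer drives are typically specified with a URE rate of approximately 1 per 10^14 bits read (about 12.5 TB). A RAID 5 rebuild must read every surviving drive in full, so the probability of encountering at least one URE grows with the amount of data in the array. Treating each bit as an independent trial, P = 1 - (1 - 10^-14)^bits, which gives:

  • 4 TB usable RAID 5 array (3+1 drives, 4 TB read during rebuild): ~27% chance of URE during rebuild
  • 8 TB usable RAID 5 array: ~47% chance of URE during rebuild
  • 16 TB usable RAID 5 array: ~72% chance of URE during rebuild

Manufacturers quote the URE rate as an upper bound, so these figures are pessimistic estimates rather than predictions.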
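
Estimates of this kind can be approximated with a simple model that treats every bit read as an independent trial at the specified error rate. The sketch below is one way to compute it in Python; the function name and the independence assumption are illustrative, and the results are sensitive to both, so treat the output as a rough estimate rather than a prediction.

```python
import math


def ure_probability(tb_read: float, bits_per_error: float = 1e14) -> float:
    """Chance of at least one URE while reading tb_read decimal terabytes.

    Treats every bit as an independent trial with error rate 1/bits_per_error.
    """
    if tb_read < 0 or bits_per_error <= 1:
        raise ValueError("invalid input")
    bits = tb_read * 1e12 * 8
    # 1 - (1 - p)**bits, computed with log1p/expm1 to keep precision for tiny p
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))


if __name__ == "__main__":
    for tb in (4, 8, 16):
        consumer = ure_probability(tb, 1e14)
        enterprise = ure_probability(tb, 1e15)
        print(f"{tb} TB read: {consumer:.0%} at 1 per 10^14, "
              f"{enterprise:.0%} at 1 per 10^15")
```

The second column shows why a tenfold lower specified rate matters: under the same model the probability for a given amount of data read drops by roughly an order of magnitude.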

Mitigation strategies:

  • Use RAID 6 or RAID 10 for arrays with drives larger than 2 TB
  • Use enterprise-grade drives with lower URE rates (1 per 10^15 bits)
  • Maintain hot spares to begin rebuild immediately on failure
  • Run regular scrubs to detect and correct errors before a rebuild is needed
  • Keep I/O load low during rebuild periods

10. Data Scrubbing#

Data scrubbing (also called patrol read or consistency check) reads all data and parity on the array to detect silent corruption. It is a non-destructive operation that should be scheduled regularly.

mdadm Scrub#

# Start a check (read-only verification; run as root)
echo check > /sys/block/md0/md/sync_action

# Start a repair (rewrites parity or mirror copies so they are consistent;
# md cannot tell which copy was the correct one)
echo repair > /sys/block/md0/md/sync_action

# Monitor progress
cat /proc/mdstat

# View mismatch count after check
cat /sys/block/md0/md/mismatch_cnt
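
For monitoring scripts, the same sysfs files can be read programmatically. The following Python sketch collects the current sync action and mismatch count for one array. The function names are hypothetical, the array name md0 is carried over from the commands above, and the `sysfs` parameter exists only so the path can be overridden for testing.

```python
from pathlib import Path


def read_md_attr(array: str, attr: str, sysfs: Path = Path("/sys/block")) -> str:
    """Return the stripped contents of <sysfs>/<array>/md/<attr>."""
    return (sysfs / array / "md" / attr).read_text().strip()


def scrub_status(array: str = "md0", sysfs: Path = Path("/sys/block")) -> dict:
    """Collect the current sync action and the mismatch count for one array."""
    return {
        "sync_action": read_md_attr(array, "sync_action", sysfs),
        "mismatch_cnt": int(read_md_attr(array, "mismatch_cnt", sysfs)),
    }


if __name__ == "__main__":
    try:
        print(scrub_status("md0"))
    except OSError as exc:
        # No such array on this machine, or the files are not readable.
        print(f"could not read md0 status: {exc}")
```

A non-zero mismatch count after a completed check is the signal to investigate further, as covered in the troubleshooting section.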

Scheduling#

# Monthly scrub at 02:00 on the 1st (/etc/cron.d format, which includes the user field)
0 2 1 * * root echo check > /sys/block/md0/md/sync_action

Many distributions include a /etc/cron.d/mdadm or a systemd timer that runs scrubs weekly or monthly.

ZFS and BTRFS#

These filesystems have integrated scrubbing:

# ZFS
zpool scrub mypool

# BTRFS
btrfs scrub start /mnt/data

11. Cost Analysis#

Example: 48 TB usable storage using 8 TB drives.

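| RAID Level | Drives Needed | Raw Capacity | Usable Capacity | Drive Cost (at $200/drive) | Cost per Usable TB |
|---|---|---|---|---|---|
| 0 | 6 | 48 TB | 48 TB | $1,200 | $25 |
| 1 | 12 | 96 TB | 48 TB | $2,400 | $50 |
| 5 | 7 | 56 TB | 48 TB | $1,400 | $29 |
| 6 | 8 | 64 TB | 48 TB | $1,600 | $33 |
| 10 | 12 | 96 TB | 48 TB | $2,400 | $50 |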

RAID 5 and RAID 6 offer the best cost-per-usable-TB for redundant storage. In this example RAID 10 costs about 50% more than RAID 6 and about 70% more than RAID 5 per usable TB, but provides superior performance and faster rebuilds.
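
The arithmetic behind the table can be reproduced with a short Python sketch. It assumes equal-size drives, a single array per level, and two-way mirroring for RAID 1 and RAID 10; the function names are illustrative.

```python
import math


def drives_needed(level: str, usable_tb: float, drive_tb: float) -> int:
    """Number of equal-size drives needed for the requested usable capacity.

    RAID 1 and RAID 10 are modelled as two-way mirrors (an assumption).
    """
    data_drives = math.ceil(usable_tb / drive_tb)
    extra = {"0": 0, "5": 1, "6": 2}  # parity drives' worth of overhead
    if level in extra:
        return data_drives + extra[level]
    if level in ("1", "10"):
        return 2 * data_drives
    raise ValueError(f"unsupported RAID level: {level}")


def cost_per_usable_tb(level: str, usable_tb: float, drive_tb: float,
                       drive_cost: float) -> float:
    """Total drive cost divided by usable capacity."""
    return drives_needed(level, usable_tb, drive_tb) * drive_cost / usable_tb


if __name__ == "__main__":
    for level in ("0", "1", "5", "6", "10"):
        n = drives_needed(level, 48, 8)
        per_tb = cost_per_usable_tb(level, 48, 8, 200)
        print(f"RAID {level}: {n} drives, ${n * 200:,}, ${per_tb:.0f} per usable TB")
```

Changing the drive size or price shows how the gap between parity and mirrored levels narrows for small arrays, where the one or two parity drives are a larger share of the total.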

12. RAID Is Not a Backup#

All RAID levels except RAID 0 protect against individual drive failure, but RAID does not protect against:

  • Simultaneous multi-drive failure beyond the RAID level's tolerance (power surge, fire, flood)
  • Theft of the entire system
  • Accidental deletion or user error, which is reflected immediately across all mirrors/parity
  • Ransomware or malware that encrypts or corrupts files
  • Controller or firmware bugs that corrupt the entire array
  • Natural disasters affecting the physical location

Always maintain off-site backups of critical data, independent of the RAID configuration.

13. Troubleshooting#

| Issue | Cause | Solution |
|---|---|---|
| Array not assembling on boot | Missing mdadm.conf or outdated initramfs | Run mdadm --detail --scan >> /etc/mdadm.conf (/etc/mdadm/mdadm.conf on Debian and Ubuntu) and rebuild initramfs |
| Single drive failure in RAID 5/6 | Physical disk failure | Replace drive, rebuild array; see mdadm recovery (disk failure and recovery) |
| Two drive failures in RAID 5 | Second failure during rebuild or simultaneous failure | Data is lost; restore from backup; use RAID 6 or RAID 10 in the future |
| Slow rebuild performance | I/O contention from production workload | Increase sync_speed_min and reduce application I/O; a write-intent bitmap shortens resync after an unclean shutdown or a re-added drive, but not a full rebuild onto a new drive |
| High mismatch_cnt after scrub | Silent data corruption or URE | Investigate disk health with SMART; replace suspect drives; run repair sync action |
| RAID 5 write hole | Unclean shutdown leaves data and parity within a stripe inconsistent | Check parity consistency after any unclean shutdown; a write-intent bitmap only shortens the resync and does not close the write hole; consider an md write journal or the partial parity log (PPL) consistency policy |
| Performance worse than expected | Wrong chunk size, alignment issues, or read-ahead too small | Tune chunk size at creation; set blockdev --setra 4096 /dev/md0; align partitions to chunk boundaries |

14. See Also#

15. Sources#