RAID

RAID (Redundant Array of Independent Disks) combines multiple physical drives into a logical unit to improve performance, redundancy, or both.

Table of Contents

Overview
RAID Level Comparison
RAID 0 - Striping
RAID 1 - Mirroring
RAID 5 - Striping with Parity
RAID 6 - Striping with Double Parity
RAID 10 - Stripe of Mirrors
Hardware vs Software RAID
Rebuild Failure Risk
Data Scrubbing
Cost Analysis
RAID Is Not a Backup
Troubleshooting
See Also
Sources

1. Overview

RAID technology aggregates multiple physical disks into a single logical volume. Depending on the RAID level, this can provide:

Increased performance via parallel I/O across disks (striping)
Data redundancy via mirroring or parity calculations
A combination of both performance and redundancy

RAID is implemented in two ways:

Software RAID - managed by the operating system (e.g., mdadm, LVM, ZFS, BTRFS)
Hardware RAID - managed by a dedicated controller card with its own processor and cache

2. RAID Level Comparison

RAID Level	Redundancy	Drive Utilization	Read Performance	Write Performance	Min Drives
0	No	100%	nX (best)	nX (best)	2
1	Yes	50%	Up to nX (multi-read)	1X	2
5	Yes	67%-94%	(n-1)X	(n-1)X	3
6	Yes	50%-88%	(n-2)X	(n-2)X	4
10 (far2)	Yes	50%	nX (best)	(n/2)X	2
10 (near2)	Yes	50%	Up to nX (multi-read)	(n/2)X	2

Where n is the number of disks in the array.
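
The drive-utilization column follows directly from how many drives' worth of space each level gives up to redundancy. A minimal sketch, assuming equal-size drives and two-way mirrors for RAID 1 and RAID 10 (the function name `raid_utilization` is hypothetical, not part of any tool):

```shell
# Usable share of raw capacity for n equal-size drives.
# Arguments: RAID level (0, 1, 5, 6, 10) and drive count n.
# Assumption: RAID 1 and RAID 10 use two-way mirrors, as in the table.
raid_utilization() {
    awk -v level="$1" -v n="$2" 'BEGIN {
        if (level == 0)
            used = n            # no redundancy
        else if (level == 1 || level == 10)
            used = n / 2        # every block stored twice
        else if (level == 5)
            used = n - 1        # one drive of parity
        else if (level == 6)
            used = n - 2        # two drives of parity
        else {
            print "unsupported level"
            exit 1
        }
        printf "%.0f%%\n", 100 * used / n
    }'
}

raid_utilization 5 3    # prints 67%
raid_utilization 5 16   # prints 94%
```

The two calls reproduce the ends of the RAID 5 range in the table: utilization rises with drive count because the parity overhead stays fixed at one drive.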

RAID Calculator - calculate usable capacity and performance for your disk configuration.

3. RAID 0 - Striping

Data is split into blocks and distributed across all drives. RAID 0 provides maximum performance and full capacity utilization, but offers zero redundancy.

Figure: RAID 0 diagram.
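
The block distribution can be sketched as simple modular arithmetic. This is an illustration under the assumption of a plain round-robin layout with zero-based indices; `stripe_location` is a hypothetical helper, not an mdadm command:

```shell
# Map a logical chunk number to the drive that holds it and the
# chunk's position on that drive, for a round-robin stripe.
# Arguments: logical chunk number, number of drives.
stripe_location() {
    echo "drive=$(( $1 % $2 )) offset=$(( $1 / $2 ))"
}

stripe_location 7 3   # prints: drive=1 offset=2
```

Consecutive chunks land on different drives, which is why large sequential transfers can use all drives in parallel.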

Advantages:

Best read and write performance of any RAID level
100% storage utilization, no overhead
Simple to implement

Disadvantages:

Any single drive failure destroys the entire array
Not suitable for any data that cannot be easily recreated

Ideal use: Temporary scratch storage, video-editing working files, build caches, or any workload where the data is disposable and speed matters.

4. RAID 1 - Mirroring

Data is written identically to two or more drives. If one drive fails, the other continues serving data without interruption.

Figure: RAID 1 diagram.

Advantages:

Excellent read speed; writes are comparable to a single drive
Simplest redundancy; no parity calculation overhead
Fast rebuild: data is copied directly from the surviving mirror

Disadvantages:

Only 50% of total capacity is usable with two drives (1/n with an n-way mirror)
Scales poorly on cost: each additional mirror adds redundancy but no capacity

Ideal use: Boot drives, small servers with two drives, mission-critical applications requiring the simplest possible redundancy.

5. RAID 5 - Striping with Parity

Data blocks and parity blocks are distributed across three or more drives. If one drive fails, its contents can be reconstructed from the remaining data and parity. Parity rotates across all drives so that no single drive becomes a parity bottleneck.

Figure: RAID 5 diagram.
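
Reconstruction works because parity is the XOR of the data blocks in a stripe, so XOR-ing the surviving blocks with the parity yields the missing block. A minimal sketch with one-byte "blocks" (the values are arbitrary examples):

```shell
# Three data blocks in one stripe (arbitrary example byte values).
d0=$(( 0xA5 ))
d1=$(( 0x3C ))
d2=$(( 0x0F ))

# Parity is the XOR of all data blocks in the stripe.
parity=$(( $d0 ^ $d1 ^ $d2 ))

# Simulate losing the drive that held d1: XOR the surviving
# data blocks with the parity block to recover it.
recovered=$(( $d0 ^ $d2 ^ $parity ))

printf 'parity=0x%02X recovered=0x%02X\n' "$parity" "$recovered"
# prints: parity=0x96 recovered=0x3C
```

Real arrays apply the same operation to whole chunks rather than single bytes.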

Advantages:

Good balance of performance, redundancy, and capacity
Efficient storage utilization (67%-94% depending on drive count)
Read performance scales nearly linearly with drive count

Disadvantages:

Write penalty: each small write requires reading the old data and old parity, computing new parity, then writing both (four I/Os per logical write)
Long rebuild times on large disks (hours to days for multi-TB drives)
Vulnerable during rebuild: a second drive failure during rebuild results in total data loss
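
The write penalty comes from the read-modify-write cycle: parity can be updated without reading the whole stripe, but only by first reading the old data and old parity. A sketch of that update rule, using a hypothetical helper `rmw_parity` and arbitrary byte values:

```shell
# Small-write parity update:
#   new_parity = old_parity XOR old_data XOR new_data
# Arguments: old data block, old parity block, new data block.
rmw_parity() {
    echo $(( $2 ^ $1 ^ $3 ))
}

# Stripe holds 0xA5, 0x3C, 0x0F with parity 0x96; overwrite 0x3C with 0xFF.
new_parity=$(rmw_parity 0x3C 0x96 0xFF)

# Cross-check against recomputing parity over the full updated stripe.
full=$(( 0xA5 ^ 0xFF ^ 0x0F ))
printf 'incremental=0x%02X full=0x%02X\n' "$new_parity" "$full"
# prints: incremental=0x55 full=0x55
```

The shortcut saves reading the untouched data blocks, at the price of two reads and two writes for every small logical write.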

Ideal use: General-purpose file and application servers with a moderate number of drives. Consider RAID 6 for arrays with drives larger than 2 TB.

6. RAID 6 - Striping with Double Parity

Similar to RAID 5, but each stripe carries two independent parity blocks instead of one, both distributed across the drives. This allows the array to survive two simultaneous drive failures.

Figure: RAID 6 diagram.

Advantages:

Survives two simultaneous drive failures
More resilient during long rebuild times with large disks
Read performance comparable to RAID 5

Disadvantages:

Higher write penalty than RAID 5 (two parity blocks per stripe)
Write performance roughly 20% lower than RAID 5, varying with workload and implementation
Requires a minimum of four drives

Ideal use: Large arrays with many high-capacity drives where rebuild times are long and a second failure during rebuild is a realistic risk.

7. RAID 10 - Stripe of Mirrors

Combines RAID 1 (mirroring) and RAID 0 (striping). Data is first mirrored, then the mirror pairs are striped for performance.

Figure: RAID 10 diagram.
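
The mirror-then-stripe arrangement can be sketched as follows. This models a classic nested RAID 1+0 with adjacent drives paired (0 and 1, 2 and 3, and so on); it is an assumption for illustration, and mdadm's near and far layouts place copies differently. `raid10_location` is a hypothetical helper:

```shell
# Map a logical chunk to its mirror pair and offset in a nested
# RAID 1+0 where drives (0,1), (2,3), ... form the mirror pairs.
# Arguments: logical chunk number, total number of drives (even).
raid10_location() {
    pairs=$(( $2 / 2 ))
    pair=$(( $1 % $pairs ))
    echo "drives=$(( 2 * $pair )),$(( 2 * $pair + 1 )) offset=$(( $1 / $pairs ))"
}

raid10_location 5 4   # prints: drives=2,3 offset=2
```

Each chunk lives on exactly two drives, so a failed drive is rebuilt by copying from its single partner rather than reading the whole array.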

Advantages:

Excellent read and write performance
Fast rebuild: only the mirror pair needs to be rebuilt, not the entire array
Rebuild time is bounded by copying a single drive, so it is typically far shorter than a RAID 5/6 rebuild of the same array

Disadvantages:

50% of total capacity goes to mirroring
More expensive per usable gigabyte than RAID 5/6

Ideal use: Databases, virtualization hosts, and any workload demanding both high IOPS and redundancy. It is the preferred RAID level for random-I/O-heavy applications.

8. Hardware vs Software RAID

Aspect	Hardware RAID	Software RAID
Controller	Dedicated card with processor and battery-backed cache	CPU and memory of the host system
Cost	$200-$2000+ for enterprise cards	Free (included in the OS)
CPU overhead	Minimal (offloaded to controller)	Small, generally negligible on modern CPUs
Portability	Tied to controller model; array unreadable without matching card	Portable across systems (metadata on disks)
Boot support	Transparent to OS; can boot directly	Requires initramfs configuration
Battery/cache	Write-back cache with battery protection for performance	No hardware cache (can use write-intent bitmap)
Hot swap	Usually supported by backplane/controller	Depends on enclosure; mdadm supports it
Monitoring	Vendor-specific tools (MegaCLI, arcconf, storcli)	Standard Linux tools (mdadm, /proc/mdstat)
Reliability	Controller itself is a single point of failure	No additional hardware to fail
"Fake" RAID	BIOS/UEFI RAID (Intel RST, AMD) is NOT hardware RAID; avoid for Linux	Use real software RAID instead of fake RAID

Recommendation: Software RAID (mdadm or ZFS/BTRFS) is preferred for most Linux deployments. Hardware RAID is warranted only when write-back cache performance is critical and the controller has battery protection.

9. Rebuild Failure Risk

The most dangerous period for a RAID array is during a rebuild. While the array is degraded and rebuilding, a second disk failure causes total data loss (for RAID 5) or further degradation (for RAID 6).

Factors Affecting Rebuild Risk

Factor	Impact
Drive size	Larger drives take longer to rebuild. A 16 TB RAID 5 rebuild can take 24+ hours
Drive age	Drives from the same batch and age are likely to fail around the same time
URE rate	Unrecoverable Read Errors during rebuild can cause a second logical failure
I/O load	Production traffic during rebuild extends rebuild time and stresses surviving disks
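
A lower bound on rebuild time is the replacement drive's capacity divided by the sustained rate at which it can be written. A rough sketch; the speed is an assumed input (real rebuilds slow down under production load), and `rebuild_hours` is a hypothetical helper:

```shell
# Best-case rebuild time in hours: capacity / sustained throughput.
# Arguments: drive size in TB (decimal), assumed sustained speed in MB/s.
rebuild_hours() {
    awk -v tb="$1" -v mbs="$2" 'BEGIN {
        seconds = tb * 1000000 / mbs    # 1 TB = 1,000,000 MB
        printf "%.1f\n", seconds / 3600
    }'
}

rebuild_hours 16 180   # prints 24.7
```

At an assumed 180 MB/s, a 16 TB drive needs about a day even with no competing I/O, which is consistent with the 24+ hours cited in the table.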

URE (Unrecoverable Read Error) Problem

Consumer drives have a specified URE rate of approximately 1 per 10^14 bits read (about 12.5 TB). A RAID 5 rebuild must read every surviving drive in full, which amounts to the array's usable capacity. Assuming independent errors at that rate, the probability of hitting at least one URE is 1 - e^(-b / 10^14), where b is the number of bits read:

4 TB RAID 5 array (3+1 drives): ~27% chance of URE during rebuild
8 TB RAID 5 array: ~47% chance of URE during rebuild
16 TB RAID 5 array: ~72% chance of URE during rebuild
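
The estimate can be reproduced for other sizes and URE ratings. A sketch assuming independent bit errors at the rated frequency, which is a simplification since real errors tend to cluster and drives often beat their rating; `ure_probability` is a hypothetical helper:

```shell
# Chance (percent) of at least one URE while reading a given
# amount of data, assuming independent bit errors.
# Arguments: terabytes read, URE rate exponent (default 14,
# meaning 1 error per 10^14 bits).
ure_probability() {
    awk -v tb="$1" -v e="${2:-14}" 'BEGIN {
        bits = tb * 8 * 10^12
        printf "%.0f%%\n", 100 * (1 - exp(-bits / 10^e))
    }'
}

ure_probability 4        # 4 TB read, consumer-grade rating
ure_probability 16 15    # 16 TB read, enterprise-grade rating
```

Passing 15 as the second argument shows how much an enterprise-rated drive reduces the risk for the same amount of data read.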

Mitigation strategies:

Use RAID 6 or RAID 10 for arrays with drives larger than 2 TB
Use enterprise-grade drives with lower URE rates (1 per 10^15 bits)
Maintain hot spares to begin rebuild immediately on failure
Run regular scrubs to detect and correct errors before a rebuild is needed
Keep I/O load low during rebuild periods

10. Data Scrubbing

Data scrubbing (also called patrol read or consistency check) reads all data and parity on the array to detect silent corruption. It is a non-destructive operation that should be scheduled regularly.

mdadm Scrub

# Start a check (read-only verification)
echo check > /sys/block/md0/md/sync_action

# Start a repair (rewrites parity or mirror copies so the array is consistent; md cannot tell which copy was correct)
echo repair > /sys/block/md0/md/sync_action

# Monitor progress
cat /proc/mdstat

# View mismatch count after check
cat /sys/block/md0/md/mismatch_cnt
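
The two sysfs files above can be combined into a small status check, for example at the end of a scheduled scrub. A sketch only: `scrub_report` is a hypothetical helper, and it takes the sysfs directory as an argument so it can be pointed at any array.

```shell
# Summarize scrub state for an md array from its sysfs md directory.
# Argument: sysfs directory (defaults to /sys/block/md0/md).
scrub_report() {
    md_dir=${1:-/sys/block/md0/md}
    action=$(cat "$md_dir/sync_action")
    mismatches=$(cat "$md_dir/mismatch_cnt")

    if [ "$action" != "idle" ]; then
        # sync_action reads "idle" only when no check/repair/resync runs
        echo "in progress: $action"
    elif [ "$mismatches" -gt 0 ]; then
        echo "done: $mismatches mismatches, investigate"
    else
        echo "done: clean"
    fi
}

# Example call (requires an existing md0 array):
# scrub_report /sys/block/md0/md
```

A non-zero mismatch count is not always corruption; it is a prompt to check drive health before deciding whether to run a repair.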

Scheduling

# Monthly scrub via cron at 02:00 on the 1st (/etc/cron.d format, which includes the user field)
0 2 1 * * root echo check > /sys/block/md0/md/sync_action

Many distributions include a /etc/cron.d/mdadm or a systemd timer that runs scrubs weekly or monthly.

ZFS and BTRFS

These filesystems have integrated scrubbing:

# ZFS
zpool scrub mypool

# BTRFS
btrfs scrub start /mnt/data

11. Cost Analysis

Example: 48 TB usable storage using 8 TB drives.

RAID Level	Drives Needed	Raw Capacity	Usable Capacity	Drive Cost (at $200/drive)	Cost per Usable TB
0	6	48 TB	48 TB	$1,200	$25
1	12	96 TB	48 TB	$2,400	$50
5	7	56 TB	48 TB	$1,400	$29
6	8	64 TB	48 TB	$1,600	$33
10	12	96 TB	48 TB	$2,400	$50

RAID 5 and RAID 6 offer the best cost-per-usable-TB for redundant storage. In this example RAID 10 costs 50% to 70% more per usable TB ($50 versus $33 and $29) but provides superior performance and faster rebuilds.
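
The table can be recomputed for other capacity targets or drive prices. A minimal sketch, assuming equal-size drives and two-way mirrors; the function names `drives_needed` and `cost_per_tb` are hypothetical:

```shell
# Drives needed to reach a usable-capacity target.
# Arguments: RAID level, usable TB target, drive size in TB.
drives_needed() {
    data=$(( ($2 + $3 - 1) / $3 ))      # data drives, rounded up
    case "$1" in
        0)    echo "$data" ;;
        1|10) echo $(( $data * 2 )) ;;  # every data drive is mirrored
        5)    echo $(( $data + 1 )) ;;  # one drive of parity
        6)    echo $(( $data + 2 )) ;;  # two drives of parity
        *)    echo "unsupported level: $1" >&2; return 1 ;;
    esac
}

# Cost per usable TB, rounded to whole dollars.
# Arguments: drive count, price per drive, usable TB.
cost_per_tb() {
    awk -v n="$1" -v p="$2" -v u="$3" 'BEGIN { printf "%.0f\n", n * p / u }'
}

# Reproduce the example: 48 TB usable, 8 TB drives, $200 per drive.
for level in 0 1 5 6 10; do
    n=$(drives_needed "$level" 48 8)
    echo "RAID $level: $n drives, \$$(cost_per_tb "$n" 200 48)/TB"
done
```

Only drive cost is modeled here; controllers, enclosures, and power are left out, as in the table.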

12. RAID Is Not a Backup

All RAID levels except RAID 0 protect against individual drive failure, but RAID does not protect against:

Simultaneous multi-drive failure beyond the RAID level's tolerance (power surge, fire, flood)
Theft of the entire system
Accidental deletion or user error, which is reflected immediately across all mirrors/parity
Ransomware or malware that encrypts or corrupts files
Controller or firmware bugs that corrupt the entire array
Natural disasters affecting the physical location

Always maintain off-site backups of critical data, independent of the RAID configuration.

Troubleshooting

Issue	Cause	Solution
Array not assembling on boot	Missing mdadm.conf or outdated initramfs	Run `mdadm --detail --scan >> /etc/mdadm.conf` (`/etc/mdadm/mdadm.conf` on Debian/Ubuntu) and rebuild the initramfs
Single drive failure in RAID 5/6	Physical disk failure	Replace drive, rebuild array; see mdadm recovery (disk failure and recovery)
Two drive failures in RAID 5	Second failure during rebuild or simultaneous failure	Data is lost; restore from backup; use RAID 6 or RAID 10 in the future
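Slow rebuild performance	I/O contention from production workload	Increase `sync_speed_min` and reduce application I/O; a write-intent bitmap shortens resyncs after an unclean shutdown or a re-added disk, but not a full rebuild onto a new disk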
High `mismatch_cnt` after scrub	Silent data corruption or URE	Investigate disk health with SMART; replace suspect drives; run `repair` sync action
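RAID 5 write hole	Unclean shutdown during a stripe write leaves data and parity inconsistent	Use a write journal (`mdadm --write-journal`) or partial parity log (`--consistency-policy=ppl`) to close the write hole; a write-intent bitmap only shortens the resync afterward; check parity consistency after any unclean shutdown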
Performance worse than expected	Wrong chunk size, alignment issues, or read-ahead too small	Tune chunk size at creation; set `blockdev --setra 4096 /dev/md0`; align partitions to chunk boundaries
