RAID (Redundant Array of Independent Disks) combines multiple physical drives into a logical unit to improve performance, redundancy, or both.
Table of Contents#
- Overview
- RAID Level Comparison
- RAID 0 - Striping
- RAID 1 - Mirroring
- RAID 5 - Striping with Parity
- RAID 6 - Striping with Double Parity
- RAID 10 - Stripe of Mirrors
- Hardware vs Software RAID
- Rebuild Failure Risk
- Data Scrubbing
- Cost Analysis
- RAID Is Not a Backup
- Troubleshooting
- See Also
- Sources
1. Overview#
RAID technology aggregates multiple physical disks into a single logical volume. Depending on the RAID level, this can provide:
- Increased performance via parallel I/O across disks (striping)
- Data redundancy via mirroring or parity calculations
- A combination of both performance and redundancy
RAID is implemented in two ways:
- Software RAID - managed by the operating system (e.g., mdadm, LVM, ZFS, BTRFS)
- Hardware RAID - managed by a dedicated controller card with its own processor and cache
2. RAID Level Comparison#
| RAID Level | Redundancy | Drive Utilization | Read Performance | Write Performance | Min Drives |
|---|---|---|---|---|---|
| 0 | No | 100% | nX (best) | nX (best) | 2 |
| 1 | Yes | 50% | Up to nX (multi-read) | 1X | 2 |
| 5 | Yes | 67%-94% | (n-1)X | (n-1)X | 3 |
| 6 | Yes | 50%-88% | (n-2)X | (n-2)X | 4 |
| 10 (far2) | Yes | 50% | nX (best) | (n/2)X | 2 |
| 10 (near2) | Yes | 50% | Up to nX (multi-read) | (n/2)X | 2 |
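Where n is the number of disks in the array and X is the throughput of a single disk. The figures are theoretical upper bounds; real results depend on workload and I/O size.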
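The drive utilization column follows from how many drives' worth of space goes to redundancy. A minimal sketch, assuming equal-sized drives; `drive_utilization` is a hypothetical helper name, not part of any RAID tool:

```python
def drive_utilization(level: str, n: int) -> float:
    """Return the usable fraction of raw capacity for n equal-sized drives."""
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 2}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if n < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} drives")
    if level == "0":
        return 1.0            # no redundancy, all space is usable
    if level == "1":
        return 1.0 / n        # every drive holds a full copy
    if level == "5":
        return (n - 1) / n    # one drive's worth of parity
    if level == "6":
        return (n - 2) / n    # two drives' worth of parity
    return 0.5                # RAID 10, assuming two copies of each block


for n in (3, 16):
    print(f"RAID 5 with {n} drives: {drive_utilization('5', n):.0%}")
```

Under this formula the 67%-94% range for RAID 5 corresponds to arrays of 3 to 16 drives, and the 50%-88% range for RAID 6 to arrays of 4 to 16 drives.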
3. RAID 0 - Striping#
Data is split into blocks and distributed across all drives. RAID 0 provides maximum performance and full capacity utilization, but offers zero redundancy.
Figure: RAID 0 diagram.
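The striping described above can be sketched as a mapping from a logical block number to a drive and an offset on that drive. This is a simplified illustration that assumes a chunk size of exactly one block; real implementations stripe in larger chunks, and `locate_block` is a hypothetical name:

```python
def locate_block(logical_block: int, num_drives: int) -> tuple[int, int]:
    """Map a logical block to (drive index, block offset on that drive)."""
    if num_drives < 2:
        raise ValueError("RAID 0 needs at least two drives")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    # Consecutive blocks go to consecutive drives, wrapping around.
    return logical_block % num_drives, logical_block // num_drives


for block in range(6):
    drive, offset = locate_block(block, num_drives=3)
    print(f"logical block {block} -> drive {drive}, offset {offset}")
```

Because consecutive blocks land on different drives, a large sequential transfer keeps all drives busy at once, which is where the performance gain comes from.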
Advantages:
- Best read and write performance of any RAID level
- 100% storage utilization, no overhead
- Simple to implement
Disadvantages:
- Any single drive failure destroys the entire array
- Not suitable for any data that cannot be easily recreated
Ideal use: Temporary scratch storage, video editing, build caches, or any workload where data is disposable and speed matters.
4. RAID 1 - Mirroring#
Data is written identically to two or more drives. If one drive fails, the other continues serving data without interruption.
Figure: RAID 1 diagram.
Advantages:
- Excellent read speed; writes are comparable to a single drive
- Simplest redundancy; no parity calculation overhead
- Fast rebuild: data is copied directly from the surviving mirror
Disadvantages:
- Only 50% of total capacity is usable with a two-drive mirror
- Adding mirrors beyond two drives increases redundancy but not usable capacity, so it is rarely cost-efficient
Ideal use: Boot drives, small servers with two drives, mission-critical applications requiring the simplest possible redundancy.
5. RAID 5 - Striping with Parity#
Data blocks and parity checksums are distributed across three or more drives. If one drive fails, the data can be reconstructed from the remaining drives and parity. The parity rotates across all drives to distribute the I/O load.
Figure: RAID 5 diagram.
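Reconstruction from parity can be illustrated with XOR, the operation RAID 5 parity is based on. The sketch below models a single stripe with three data blocks and one parity block; the block contents and the `xor_blocks` helper are made up for illustration:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the drive that held data[1]:
# XOR of the surviving data blocks and the parity yields the missing block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'BBBB'
```

The same property explains why a rebuild must read every surviving drive in full: each missing block depends on all other blocks in its stripe.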
Advantages:
- Good balance of performance, redundancy, and capacity
- Efficient storage utilization (67%-94% depending on drive count)
- Read performance scales nearly linearly with drive count
Disadvantages:
- Write penalty: each write requires reading old data and parity, computing new parity, then writing both
- Long rebuild times on large disks (hours to days for multi-TB drives)
- Vulnerable during rebuild: a second drive failure during rebuild results in total data loss
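The write penalty listed above comes from the read-modify-write cycle: a small write reads the old data and old parity, then writes the new data and new parity, for four I/O operations in total. A minimal sketch of the parity update, with hypothetical helper names:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Compute new parity without reading the other data blocks in the stripe.

    XOR-ing the old data out of the parity and the new data in gives the
    same result as recomputing parity over the whole stripe.
    """
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0a\x0b"
parity = xor_bytes(xor_bytes(d0, d1), d2)

new_d1 = b"\xff\x00"
new_parity = updated_parity(d1, new_d1, parity)
print(new_parity == xor_bytes(xor_bytes(d0, new_d1), d2))  # True
```

Full-stripe writes avoid the two reads because parity can be computed from the new data alone.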
Ideal use: General-purpose file and application servers with a moderate number of drives. Consider RAID 6 for arrays with drives larger than 2 TB.
6. RAID 6 - Striping with Double Parity#
Similar to RAID 5, but each stripe carries two independent parity blocks instead of one, rotated across the drives. This allows the array to survive two simultaneous drive failures.
Figure: RAID 6 diagram.
Advantages:
- Survives two simultaneous drive failures
- More resilient during long rebuild times with large disks
- Read performance comparable to RAID 5
Disadvantages:
- Higher write penalty than RAID 5 (two parity blocks per stripe)
- Write performance roughly 20% lower than RAID 5 as a rule of thumb; the actual gap depends on workload and implementation
- Requires a minimum of four drives
Ideal use: Large arrays with many high-capacity drives where rebuild times are long and a second failure during rebuild is a realistic risk.
7. RAID 10 - Stripe of Mirrors#
Combines RAID 1 (mirroring) and RAID 0 (striping). Data is first mirrored, then the mirror pairs are striped for performance.
Figure: RAID 10 diagram.
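Whether a RAID 10 array survives a set of failures depends on which drives fail, not only on how many. The sketch below assumes the classic nested layout in which drives (0, 1), (2, 3), and so on form mirror pairs; mdadm's near and far layouts place copies differently, and `survives` is a hypothetical helper:

```python
def survives(num_drives: int, failed: set[int]) -> bool:
    """Return True if every mirror pair still has at least one working drive."""
    if num_drives < 2 or num_drives % 2:
        raise ValueError("this model needs an even number of drives (>= 2)")
    if any(drive < 0 or drive >= num_drives for drive in failed):
        raise ValueError("failed drive index out of range")
    for first in range(0, num_drives, 2):
        # The array is lost as soon as both halves of one pair are gone.
        if first in failed and first + 1 in failed:
            return False
    return True


print(survives(4, {0, 2}))  # True: one drive lost from each pair
print(survives(4, {0, 1}))  # False: both drives of the first pair lost
```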
Advantages:
- Excellent read and write performance
- Fast rebuild: only the mirror pair needs to be rebuilt, not the entire array
- Rebuild time scales with the size of one drive rather than the whole array, though copying a multi-TB drive still takes hours
Disadvantages:
- 50% of total capacity goes to mirroring
- More expensive per usable gigabyte than RAID 5/6
Ideal use: Databases, virtualization hosts, and any workload demanding both high IOPS and redundancy. The preferred RAID level for random I/O-heavy applications.
8. Hardware vs Software RAID#
| Aspect | Hardware RAID | Software RAID |
|---|---|---|
| Controller | Dedicated card with processor and battery-backed cache | CPU and memory of the host system |
| Cost | $200-$2000+ for enterprise cards | Free (included in the OS) |
| CPU overhead | Minimal (offloaded to controller) | Small, generally negligible on modern CPUs |
| Portability | Tied to controller model; array unreadable without matching card | Portable across systems (metadata on disks) |
| Boot support | Transparent to OS; can boot directly | Requires initramfs configuration |
| Battery/cache | Write-back cache with battery protection for performance | No hardware cache (can use write-intent bitmap) |
| Hot swap | Usually supported by backplane/controller | Depends on enclosure; mdadm supports it |
| Monitoring | Vendor-specific tools (MegaCLI, arcconf, storcli) | Standard Linux tools (mdadm, /proc/mdstat) |
| Reliability | Controller itself is a single point of failure | No additional hardware to fail |
| "Fake" RAID | BIOS/UEFI RAID (Intel RST, AMD) is NOT hardware RAID; avoid for Linux | Use real software RAID instead of fake RAID |
Recommendation: Software RAID (mdadm or ZFS/BTRFS) is preferred for most Linux deployments. Hardware RAID is warranted only when write-back cache performance is critical and the controller has battery protection.
9. Rebuild Failure Risk#
The most dangerous period for a RAID array is during a rebuild. While the array is degraded and rebuilding, a second disk failure causes total data loss (for RAID 5) or further degradation (for RAID 6).
Factors Affecting Rebuild Risk#
| Factor | Impact |
|---|---|
| Drive size | Larger drives take longer to rebuild. A 16 TB RAID 5 rebuild can take 24+ hours |
| Drive age | Drives from the same batch and age are likely to fail around the same time |
| URE rate | Unrecoverable Read Errors during rebuild can cause a second logical failure |
| I/O load | Production traffic during rebuild extends rebuild time and stresses surviving disks |
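A rough lower bound on rebuild time is the drive capacity divided by the sustained rate at which the replacement drive can be written. The sketch below is a back-of-the-envelope estimate; the 180 MB/s rate is an assumed value for illustration, not a measurement, and `rebuild_hours` is a hypothetical helper:

```python
def rebuild_hours(drive_tb: float, mb_per_second: float) -> float:
    """Estimate the hours needed to rewrite one drive at a sustained rate."""
    if drive_tb <= 0 or mb_per_second <= 0:
        raise ValueError("capacity and rate must be positive")
    total_bytes = drive_tb * 1e12               # decimal terabytes
    seconds = total_bytes / (mb_per_second * 1e6)
    return seconds / 3600


# Assumed sustained rate of 180 MB/s with no competing I/O.
print(f"{rebuild_hours(16, 180):.1f} hours")
```

Production traffic lowers the effective rate, so halving the rate in this model doubles the estimate.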
URE (Unrecoverable Read Error) Problem#
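Consumer drives are typically specified with a URE rate of at most 1 per 10^14 bits read (about 12.5 TB). During a RAID 5 rebuild every surviving drive must be read in full, so the probability of encountering a URE is significant. Treating the datasheet rate as an independent per-bit error probability (a pessimistic assumption, since the datasheet value is an upper bound), the chance of at least one URE is roughly:
- 4 TB usable RAID 5 array (3+1 drives, about 4 TB read from the three survivors): ~27% chance of URE during rebuild
- 8 TB usable RAID 5 array (about 8 TB read): ~47% chance of URE during rebuild
- 16 TB usable RAID 5 array (about 16 TB read): ~72% chance of URE during rebuild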
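Estimates of this kind can be approximated with a simple model in which each bit read fails independently at the datasheet rate. The sketch below is illustrative only: `ure_probability` is a hypothetical helper, and the amount of data read during the rebuild (all surviving drives in full) is an input you supply. Results depend on how much data is counted as read, so they will not match every published figure.

```python
import math


def ure_probability(tb_read: float, bits_per_error: float = 1e14) -> float:
    """Probability of at least one URE while reading tb_read terabytes.

    Assumes each bit fails independently with probability 1 / bits_per_error.
    """
    if tb_read < 0 or bits_per_error <= 1:
        raise ValueError("tb_read must be >= 0 and bits_per_error > 1")
    bits_read = tb_read * 1e12 * 8
    # log1p/expm1 keep precision for the very small per-bit probability.
    log_no_error_per_bit = math.log1p(-1.0 / bits_per_error)
    return -math.expm1(bits_read * log_no_error_per_bit)


for tb in (4, 8, 16):
    consumer = ure_probability(tb)
    enterprise = ure_probability(tb, bits_per_error=1e15)
    print(f"{tb} TB read: consumer {consumer:.0%}, enterprise {enterprise:.0%}")
```

The second column shows why the lower URE rate of enterprise drives matters: a tenfold better rate cuts the estimated risk for the same amount of data read by much more than half.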
Mitigation strategies:
- Use RAID 6 or RAID 10 for arrays with drives larger than 2 TB
- Use enterprise-grade drives with lower URE rates (1 per 10^15 bits)
- Maintain hot spares to begin rebuild immediately on failure
- Run regular scrubs to detect and correct errors before a rebuild is needed
- Keep I/O load low during rebuild periods
10. Data Scrubbing#
Data scrubbing (also called patrol read or consistency check) reads all data and parity on the array to detect silent corruption. It is a non-destructive operation that should be scheduled regularly.
mdadm Scrub#
# Start a check (read-only verification)
echo check > /sys/block/md0/md/sync_action
# Start a repair (rewrites parity, or mirror copies, so that mismatched stripes are consistent again)
echo repair > /sys/block/md0/md/sync_action
# Monitor progress
cat /proc/mdstat
# View mismatch count after check
cat /sys/block/md0/md/mismatch_cnt
Scheduling#
# Monthly scrub via cron
0 2 1 * * root echo check > /sys/block/md0/md/sync_action
Many distributions include a /etc/cron.d/mdadm or a systemd timer that runs scrubs weekly or monthly.
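The mismatch count that the mdadm commands above read from sysfs can also be collected by a monitoring script after each scheduled check. A minimal sketch, assuming the sysfs layout used in the commands above; the function names and the `sysfs_root` parameter (included so the code can be pointed at a test directory) are hypothetical:

```python
from pathlib import Path


def read_mismatch_count(array: str, sysfs_root: str = "/sys/block") -> int:
    """Return the mismatch_cnt value for an md array such as 'md0'."""
    path = Path(sysfs_root) / array / "md" / "mismatch_cnt"
    return int(path.read_text().strip())


def needs_attention(array: str, threshold: int = 0,
                    sysfs_root: str = "/sys/block") -> bool:
    """Return True if the last check reported more mismatches than threshold."""
    return read_mismatch_count(array, sysfs_root) > threshold
```

A cron job or timer could call `needs_attention("md0")` once the check has finished and raise an alert when it returns True.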
ZFS and BTRFS#
These filesystems have integrated scrubbing:
# ZFS
zpool scrub mypool
# BTRFS
btrfs scrub start /mnt/data
11. Cost Analysis#
Example: 48 TB usable storage using 8 TB drives.
| RAID Level | Drives Needed | Raw Capacity | Usable Capacity | Drive Cost (at $200/drive) | Cost per Usable TB |
|---|---|---|---|---|---|
| 0 | 6 | 48 TB | 48 TB | $1,200 | $25 |
| 1 | 12 (six 2-drive mirrors) | 96 TB | 48 TB | $2,400 | $50 |
| 5 | 7 | 56 TB | 48 TB | $1,400 | $29 |
| 6 | 8 | 64 TB | 48 TB | $1,600 | $33 |
| 10 | 12 | 96 TB | 48 TB | $2,400 | $50 |
RAID 5 and RAID 6 offer the best cost-per-usable-TB for redundant storage. RAID 10 costs roughly 50-70% more per usable TB ($50 versus $29-$33) but provides superior performance and faster rebuilds.
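The table can be recomputed for other capacities and prices with a few lines of arithmetic. This is a sketch under simplifying assumptions: drives are equal-sized, mirroring keeps two copies, RAID 1 is treated as a set of two-drive mirrors, and cost covers drives only. The function names are hypothetical:

```python
import math


def drives_needed(level: str, usable_tb: float, drive_tb: float) -> int:
    """Return the number of drives needed to reach the usable capacity."""
    data_drives = math.ceil(usable_tb / drive_tb)
    if level == "0":
        return max(data_drives, 2)
    if level == "5":
        return max(data_drives + 1, 3)   # one drive's worth of parity
    if level == "6":
        return max(data_drives + 2, 4)   # two drives' worth of parity
    if level in ("1", "10"):
        return data_drives * 2           # two copies of every block
    raise ValueError(f"unsupported RAID level: {level}")


def cost_per_usable_tb(level: str, usable_tb: float, drive_tb: float,
                       drive_cost: float) -> float:
    """Return the drive cost divided by the requested usable capacity."""
    return drives_needed(level, usable_tb, drive_tb) * drive_cost / usable_tb


for level in ("0", "1", "5", "6", "10"):
    count = drives_needed(level, 48, 8)
    cost = cost_per_usable_tb(level, 48, 8, 200)
    print(f"RAID {level}: {count} drives, ${cost:.0f} per usable TB")
```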
12. RAID Is Not a Backup#
All RAID levels except RAID 0 protect against individual drive failure, but RAID does not protect against:
- Simultaneous multi-drive failure beyond the RAID level's tolerance (power surge, fire, flood)
- Theft of the entire system
- Accidental deletion or user error, which is reflected immediately across all mirrors/parity
- Ransomware or malware that encrypts or corrupts files
- Controller or firmware bugs that corrupt the entire array
- Natural disasters affecting the physical location
Always maintain off-site backups of critical data, independent of the RAID configuration.
Troubleshooting#
| Issue | Cause | Solution |
|---|---|---|
| Array not assembling on boot | Missing mdadm.conf or outdated initramfs | Run mdadm --detail --scan >> /etc/mdadm.conf (/etc/mdadm/mdadm.conf on Debian and Ubuntu) and rebuild the initramfs |
| Single drive failure in RAID 5/6 | Physical disk failure | Replace drive, rebuild array; see mdadm recovery (disk failure and recovery) |
| Two drive failures in RAID 5 | Second failure during rebuild or simultaneous failure | Data is lost; restore from backup; use RAID 6 or RAID 10 in the future |
| Slow rebuild performance | I/O contention from production workload | Increase sync_speed_min, reduce application I/O, add write-intent bitmap |
| High mismatch_cnt after scrub | Silent data corruption or URE | Investigate disk health with SMART; replace suspect drives; run repair sync action |
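| RAID 5 write hole | Unclean shutdown while a stripe update is in flight, leaving data and parity inconsistent | Check parity consistency after the crash; a write-intent bitmap shortens the resync, while closing the write hole itself needs a write journal (mdadm --write-journal) or PPL (--consistency-policy=ppl) |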
| Performance worse than expected | Wrong chunk size, alignment issues, or read-ahead too small | Tune chunk size at creation; set blockdev --setra 4096 /dev/md0; align partitions to chunk boundaries |
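Several of the issues above first show up as a degraded array in /proc/mdstat, where each member is marked U (up) or _ (missing). The sketch below parses that status text; the sample content is made up for illustration and the regular expressions cover only the common line layout, so treat it as a starting point rather than a complete parser:

```python
import re

# Matches the member summary at the end of a status line, e.g. "[2/1] [U_]".
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# Matches the line that starts an array entry, e.g. "md0 : active raid5 ...".
NAME_RE = re.compile(r"^(md\w+)\s*:")

# Hypothetical sample of /proc/mdstat content.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sdc1[3] sdb1[1] sda1[0]
      1953260544 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

md1 : active raid1 sdd1[0]
      976630464 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text: str) -> dict[str, str]:
    """Return {array name: member map} for arrays with a missing member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        name = NAME_RE.match(line)
        if name:
            current = name.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current and "_" in status.group(3):
            degraded[current] = status.group(3)
    return degraded


print(degraded_arrays(SAMPLE_MDSTAT))  # {'md1': 'U_'}
```

On a live system the text would come from reading /proc/mdstat instead of the sample string.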