ZFS is a combined filesystem and volume manager originally developed by Sun Microsystems, providing pooled storage, copy-on-write, checksumming, snapshots, RAID-Z, and built-in replication.
Table of Contents#
- Overview
- License Considerations
- Installation
- Pool Management
- Dataset Hierarchy
- Snapshots and Clones
- Send/Receive Replication
- RAID-Z and Redundancy
- Pool Import and Recovery
- ARC Cache Tuning
- Scrub and Resilver
- Properties and Tuning
- Troubleshooting
- See Also
- Sources
1. Overview#
ZFS is a fundamentally different approach to storage management. It combines the roles of a filesystem, volume manager, and RAID controller into a single integrated system. Originally created by Sun Microsystems for Solaris, it is now maintained as the open-source OpenZFS project across Linux, FreeBSD, and other platforms.
Key features:
- Pooled storage - aggregates disks into a storage pool; filesystems (datasets) draw from the shared pool automatically
- Copy-on-Write (CoW) - data is never overwritten in place, ensuring consistency even after power loss
- End-to-end checksumming - every block is checksummed (fletcher4 by default, SHA-256 optional), detecting silent data corruption and repairing it when a redundant copy exists
- Snapshots and clones - instant, space-efficient point-in-time copies
- RAID-Z - integrated software RAID (RAID-Z1, Z2, Z3) without the RAID-5 write hole
- Send/receive - efficient incremental replication between pools or systems
- Compression - transparent compression (LZ4, zstd, gzip, lzjb)
- Deduplication - block-level deduplication (memory-intensive; use with caution)
- Encryption - native dataset encryption (OpenZFS 0.8+)
2. License Considerations#
ZFS is licensed under the CDDL (Common Development and Distribution License), which is incompatible with the GPL (Linux kernel license). This means:
- ZFS cannot be merged into the mainline Linux kernel, so on Linux it always ships as an out-of-tree module
- It is distributed either as a DKMS module (compiled against your kernel) or as pre-built kernel module packages
- Ubuntu ships pre-built ZFS modules in its repositories (via a legal interpretation that kernel modules loaded at runtime do not constitute a derivative work)
- Debian ships ZFS as a DKMS package in its contrib section; Fedora and Arch rely on third-party repositories or the AUR
- FreeBSD includes ZFS natively with no licensing conflict
Practical impact: After kernel updates, the ZFS DKMS module must be rebuilt. This occasionally causes delays when a new kernel is released before ZFS is updated to support it.
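The rebuild requirement above can be checked after a kernel update. The following is a minimal sketch, not an official tool: zfs_dkms_ready is a hypothetical helper, and the dkms status line format it parses is an assumption that may differ between dkms releases.

```shell
# zfs_dkms_ready is a hypothetical helper. It reads `dkms status` output on
# stdin and succeeds if a zfs module is installed for the given kernel.
# Assumed line format: "zfs/<ver>, <kernel>, <arch>: installed"
# (older dkms releases print "zfs, <ver>, <kernel>, <arch>: installed").
zfs_dkms_ready() {
    kernel=$1
    awk -v k="$kernel" '
        $0 ~ "^zfs[/,]" && index($0, ", " k ",") && /: installed/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Illustrative input only, not captured from a real system
if printf 'zfs/2.2.2, 6.5.0-14-generic, x86_64: installed\n' |
        zfs_dkms_ready 6.5.0-14-generic; then
    echo "zfs module is built for this kernel"
fi
```

On a real system the input would come from dkms itself, for example: dkms status | zfs_dkms_ready "$(uname -r)".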
3. Installation#
Ubuntu/Debian#
sudo apt install -y zfsutils-linux
RHEL/CentOS/Rocky#
sudo dnf install -y https://zfsonlinux.org/epel/zfs-release-2-3.el9.noarch.rpm
sudo dnf install -y zfs
sudo modprobe zfs
Arch Linux#
# From the AUR (DKMS version)
yay -S zfs-dkms
# Load the module
sudo modprobe zfs
Verify Installation#
zfs version
zpool version
4. Pool Management#
Creating Pools#
# Simple pool (no redundancy, like RAID 0)
zpool create tank /dev/sda /dev/sdb
# Mirror (RAID 1)
zpool create tank mirror /dev/sda /dev/sdb
# RAID-Z1 (single parity, like RAID 5)
zpool create tank raidz /dev/sda /dev/sdb /dev/sdc
# RAID-Z2 (double parity, like RAID 6)
zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd
# RAID-Z3 (triple parity)
zpool create tank raidz3 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde
# Striped mirrors (RAID 10 equivalent)
zpool create tank \
  mirror /dev/sda /dev/sdb \
  mirror /dev/sdc /dev/sdd
Using Disk IDs (Recommended)#
Use /dev/disk/by-id/ paths, which stay stable across reboots, instead of /dev/sdX names, which can change:
zpool create tank mirror \
  /dev/disk/by-id/ata-WDC_WD40EFRX_abc \
  /dev/disk/by-id/ata-WDC_WD40EFRX_xyz
Special vdevs#
# Add a dedicated log device (SLOG) for synchronous writes
zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1
# Add a cache device (L2ARC) for read caching
zpool add tank cache /dev/nvme2n1
# Add a special allocation class for metadata and small blocks
zpool add tank special mirror /dev/nvme3n1 /dev/nvme4n1
Pool Status and Information#
# Pool status with health and scrub info
zpool status tank
# I/O statistics
zpool iostat tank 5
# Pool space usage
zpool list
# Detailed pool properties
zpool get all tank
Destroying a Pool#
# Permanently destroys the pool and all datasets in it
zpool destroy tank
5. Dataset Hierarchy#
ZFS datasets are organized in a hierarchical tree, similar to a directory structure. Each dataset is an independent filesystem that inherits properties from its parent.
tank (pool root dataset)
tank/home (home directories)
tank/home/user1 (user1's home)
tank/home/user2 (user2's home)
tank/data (general data)
tank/vms (virtual machines)
tank/backup (backup targets)
Creating Datasets#
# Create datasets
zfs create tank/home
zfs create tank/home/user1
zfs create tank/data
zfs create tank/vms
# Create with specific properties
zfs create -o compression=zstd -o quota=100G tank/home/user2
Datasets are automatically mounted at a path derived from their name (e.g., tank/home/user1 at /tank/home/user1). Override with the mountpoint property:
zfs set mountpoint=/home/user1 tank/home/user1
Listing Datasets#
# List all datasets
zfs list
# List with specific properties
zfs list -o name,used,avail,refer,compression,compressratio
# List recursively under a parent
zfs list -r tank/home
Property Inheritance#
Properties set on a parent dataset are inherited by children:
# Set compression on the parent; all children inherit it
zfs set compression=zstd tank
# Override on a specific child
zfs set compression=off tank/vms
# Check where a property value comes from
zfs get compression tank/home/user1
# SOURCE column shows "inherited from tank" or "local"
Destroying Datasets#
# Destroy a dataset (must have no children or snapshots unless -r is used)
zfs destroy tank/data
# Recursive destroy (destroys all children and snapshots)
zfs destroy -r tank/old
6. Snapshots and Clones#
Snapshots#
Snapshots are read-only, point-in-time copies. They are created instantly and consume no additional space until data changes.
# Create a snapshot
zfs snapshot tank/data@2026-03-22
# Create recursive snapshots (all child datasets)
zfs snapshot -r tank/home@daily-2026-03-22
# List snapshots
zfs list -t snapshot
# List snapshots for a specific dataset
zfs list -t snapshot -r tank/data
# Check snapshot space usage
zfs list -o name,used,refer -t snapshot tank/data
Accessing Snapshot Data#
Snapshots are accessible read-only through the hidden .zfs/snapshot directory at the root of the dataset's mount point:
ls /tank/data/.zfs/snapshot/2026-03-22/
Rolling Back#
# Rollback to a snapshot (destroys all changes since the snapshot)
zfs rollback tank/data@2026-03-22
# Rollback past intermediate snapshots (destroys them)
zfs rollback -r tank/data@2026-03-22
Clones#
Clones are writable copies of snapshots. A clone depends on its origin snapshot, which cannot be destroyed while the clone exists:
# Create a clone
zfs clone tank/data@2026-03-22 tank/data-test
# Promote the clone so it no longer depends on its origin (the original dataset becomes the dependent one)
zfs promote tank/data-test
Destroying Snapshots#
# Destroy a single snapshot
zfs destroy tank/data@2026-03-22
# Destroy a range of snapshots
zfs destroy tank/data@2026-03-01%2026-03-22
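# Destroy all snapshots whose name starts with "daily-" (zfs destroy has no wildcard matching; % only expresses ranges)
zfs list -H -o name -t snapshot -d 1 tank/data | grep '@daily-' | xargs -r -n 1 zfs destroy
7. Send/Receive Replication#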
ZFS send/receive enables efficient replication of datasets and snapshots between pools, systems, or even to files.
Full Send#
# Send a snapshot to another pool on the same system
zfs send tank/data@snap1 | zfs receive backup/data
# Send to a remote system via SSH
zfs send tank/data@snap1 | ssh remote zfs receive backup/data
Incremental Send#
# Send only the changes between two snapshots
zfs send -i tank/data@snap1 tank/data@snap2 | zfs receive backup/data
# Send snap5 together with every intermediate snapshot taken since snap1 (-I)
zfs send -I tank/data@snap1 tank/data@snap5 | zfs receive backup/data
Resumable Send#
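# A resume token exists only if the interrupted receive was started with -s
# After an interruption, read the token on the receiving side
zfs get -H -o value receive_resume_token backup/data
# Resume the send
zfs send -t <resume-token> | zfs receive -s backup/data
Replication Workflow#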
A typical backup workflow:
# Initial full replication
zfs snapshot -r tank@baseline
zfs send -R tank@baseline | ssh backup-server zfs receive -F backuppool
# Daily incremental (date -d is GNU date syntax)
zfs snapshot -r tank@daily-$(date +%Y%m%d)
zfs send -R -i tank@daily-$(date -d yesterday +%Y%m%d) tank@daily-$(date +%Y%m%d) | \
  ssh backup-server zfs receive -F backuppool
Raw Encrypted Send#
# Send encrypted datasets without decrypting
zfs send --raw tank/encrypted@snap1 | ssh remote zfs receive backup/encrypted
8. RAID-Z and Redundancy#
RAID-Z Levels#
| Level | Parity | Drive Failures Tolerated | Min Drives | Usable Capacity (approx.) |
|---|---|---|---|---|
| RAID-Z1 | Single | 1 | 3 | (N-1) x disk |
| RAID-Z2 | Double | 2 | 4 | (N-2) x disk |
| RAID-Z3 | Triple | 3 | 5 | (N-3) x disk |
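The capacity column can be worked through with a little shell arithmetic. This is a rough sketch only: raidz_usable is a hypothetical helper, sizes are whole terabytes, and real pools lose some additional space to padding, metadata, and reserved space.

```shell
# Estimate usable capacity of a single RAID-Z vdev: (N - parity) x disk size.
# raidz_usable is a hypothetical helper; disk_tb is the size of one disk in TB.
raidz_usable() {
    disks=$1 parity=$2 disk_tb=$3
    min=$((parity + 2))   # minimum drive counts from the table: 3, 4, 5
    if [ "$disks" -lt "$min" ]; then
        echo "need at least $min disks for RAID-Z$parity" >&2
        return 1
    fi
    echo $(( (disks - parity) * disk_tb ))
}

raidz_usable 6 2 4    # six 4 TB disks in RAID-Z2 -> 16
```

The same six disks as RAID-Z1 would give 20 TB, and as RAID-Z3 12 TB, which makes the space cost of each extra parity level explicit.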
RAID-Z vs Traditional RAID 5#
ZFS RAID-Z eliminates the RAID-5 write hole because it uses variable-width stripes and CoW. Data and parity are always consistent, even after a power failure, without needing a battery-backed cache.
Recommended Configurations#
| Drives | Configuration | Rationale |
|---|---|---|
| 2 | Mirror | Simple redundancy |
| 3-4 | RAID-Z1 | Good balance of space and protection |
| 4-8 | RAID-Z2 | Recommended for large drives (>2 TB) |
| 8+ | Striped RAID-Z2 (multiple vdevs) | Balance performance and redundancy |
| 6+ (critical) | RAID-Z3 | Maximum protection for enterprise |
Adding a Vdev to Expand a Pool#
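# Add another RAID-Z2 vdev (should match the existing vdev's type and width; zpool add refuses a mismatched replication level unless -f is given)
zpool add tank raidz2 /dev/sde /dev/sdf /dev/sdg /dev/sdh
Note: Before OpenZFS 2.3, you cannot add individual disks to an existing RAID-Z vdev; the only way to grow the pool is to add entire new vdevs.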
Expansion (OpenZFS 2.3+)#
OpenZFS 2.3 introduced RAID-Z expansion, which attaches a single disk to an existing RAID-Z vdev. Use the vdev name that zpool status reports (for example raidz2-0):
zpool attach tank raidz2-0 /dev/sdnew
9. Pool Import and Recovery#
Importing Pools#
When moving disks between systems or after a reboot where auto-import is not configured:
# Scan for importable pools
zpool import
# Import a specific pool
zpool import tank
# Import with a different name
zpool import tank newtank
# Search a specific directory for the pool's devices
zpool import -d /dev/disk/by-id tank
# Force import (pool was not cleanly exported)
zpool import -f tank
# Import read-only (for recovery)
zpool import -o readonly=on tank
Exporting Pools#
Always export before moving disks:
zpool export tank
Recovery from Failed Import#
# Import with missing log device (data loss for in-flight sync writes)
zpool import -m tank
# Clear persistent errors after fixing the underlying issue
zpool clear tank
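# Recovery mode: discard the last few transactions to make the pool importable (last resort)
zpool import -F tank
# Rewind to a specific transaction group (expert use; can lose data)
zpool import -T <txg> tank
Repairing a Degraded Pool#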
# Check pool status to identify the failed device
zpool status tank
# Replace the failed device
zpool replace tank /dev/old_device /dev/new_device
# If the device was removed, refer to it by its GUID (shown by zpool status -g)
zpool replace tank <device-guid> /dev/new_device
10. ARC Cache Tuning#
The ARC (Adaptive Replacement Cache) is ZFS's primary read cache, stored in RAM. By default, ZFS uses up to 50% of system RAM for the ARC.
Checking ARC Usage#
# Summary
arc_summary
# Detailed statistics
arcstat
Setting ARC Size#
# Set maximum ARC size (bytes) - runtime
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max # 8 GiB
# Persistent via modprobe configuration
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
# Set minimum ARC size
echo "options zfs zfs_arc_min=2147483648" >> /etc/modprobe.d/zfs.conf # 2 GiB
ARC Tuning Guidelines#
| System Role | Recommended ARC Max | Rationale |
|---|---|---|
| Dedicated file server | 75% of RAM | Maximize cache hit rate |
| Hypervisor with VMs | 25-50% of RAM | Leave RAM for VM memory |
| Database server | 25% of RAM | Database has its own cache |
| Desktop/workstation | 50% of RAM (default) | Balance with application needs |
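The percentages in the table have to be converted to bytes before they can be written to zfs_arc_max. A minimal sketch of that conversion follows; arc_max_bytes is a hypothetical helper, and it takes the RAM size in whole GiB as an argument rather than reading it from the system.

```shell
# arc_max_bytes is a hypothetical helper: RAM size in GiB and a percentage in,
# a byte count suitable for zfs_arc_max out.
arc_max_bytes() {
    ram_gib=$1 percent=$2
    echo $(( ram_gib * 1024 * 1024 * 1024 * percent / 100 ))
}

arc_max_bytes 32 25   # database server with 32 GiB RAM -> 8589934592 (8 GiB)
arc_max_bytes 64 75   # dedicated file server with 64 GiB RAM -> 51539607552 (48 GiB)
```

The printed value can be used in place of the literal byte count in the zfs_arc_max commands under Setting ARC Size.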
L2ARC (Level 2 ARC)#
L2ARC extends the read cache to a fast SSD:
# Add L2ARC device
zpool add tank cache /dev/nvme0n1
# Remove L2ARC device
zpool remove tank /dev/nvme0n1
L2ARC is most effective when the working set exceeds ARC (RAM) but fits on the SSD. Each L2ARC entry consumes approximately 70 bytes of ARC RAM for the index.
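The index overhead can be estimated from the figure of roughly 70 bytes per entry. The sketch below assumes, for simplicity, that the whole cache device is filled with records of one size; l2arc_header_mib is a hypothetical helper and the result is only an order-of-magnitude guide.

```shell
# l2arc_header_mib: estimate the ARC RAM (in MiB) needed to index an L2ARC device.
# Assumes ~70 bytes per cached record and a single uniform record size.
l2arc_header_mib() {
    l2arc_gib=$1 record_kib=$2
    records=$(( l2arc_gib * 1024 * 1024 / record_kib ))
    echo $(( records * 70 / 1024 / 1024 ))
}

l2arc_header_mib 512 128   # 512 GiB L2ARC, 128K records -> 280
l2arc_header_mib 512 16    # same device, 16K records    -> 2240
```

Small records multiply the number of entries, so a large L2ARC in front of a 16K-record database can consume several GiB of ARC for the index alone.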
SLOG (Separate Intent Log)#
SLOG accelerates synchronous writes (e.g., NFS, databases):
# Add a mirrored SLOG
zpool add tank log mirror /dev/nvme1n1 /dev/nvme2n1
# Remove the SLOG (reverts to the in-pool ZIL); a mirrored log is removed by its vdev name from zpool status (mirror-1 is an example)
zpool remove tank mirror-1
Use a high-endurance, low-latency NVMe device with power-loss protection for SLOG.
11. Scrub and Resilver#
Scrub#
A scrub reads all data and metadata, verifies checksums, and repairs corruption from redundant copies:
# Start a scrub
zpool scrub tank
# Cancel a scrub
zpool scrub -s tank
# Check scrub status and results
zpool status tank
Scheduling Scrubs#
# Systemd timer for monthly scrub
# /etc/systemd/system/zfs-scrub@.timer
[Unit]
Description=Monthly ZFS scrub on %i
[Timer]
OnCalendar=monthly
Persistent=true
RandomizedDelaySec=1w
[Install]
WantedBy=timers.target
# /etc/systemd/system/zfs-scrub@.service
[Unit]
Description=ZFS scrub on %i
[Service]
Type=oneshot
ExecStart=/usr/sbin/zpool scrub %i
sudo systemctl enable --now zfs-scrub@tank.timer
Recommendation: Scrub production pools at least monthly; weekly for critical data.
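A monitoring job can enforce this recommendation by comparing the age of the last scrub with the chosen interval. The following sketch covers only the comparison; scrub_due is a hypothetical helper, and extracting the last scrub time from zpool status is left out because it depends on the output format.

```shell
# scrub_due is a hypothetical helper: it succeeds when the last scrub finished
# at least max_age_days ago. Timestamps are Unix epoch seconds.
scrub_due() {
    last_scrub=$1 now=$2 max_age_days=$3
    age_days=$(( (now - last_scrub) / 86400 ))
    [ "$age_days" -ge "$max_age_days" ]
}

# Example: a scrub that finished 40 days ago, checked against a 30-day policy
now=$(date +%s)
if scrub_due $(( now - 40 * 86400 )) "$now" 30; then
    echo "scrub overdue"
fi
```

For critical data the third argument would be 7 instead of 30.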
Resilver#
Resilvering is the process of rebuilding a replaced or missing device. ZFS resilvers only the blocks that are actually used, unlike traditional RAID which rebuilds the entire disk.
# Replace a device (triggers automatic resilver)
zpool replace tank /dev/old /dev/new
# Monitor resilver progress
zpool status tank
Resilver Priority#
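# zfs_resilver_delay was removed in OpenZFS 0.8; resilver priority is now governed mainly by zfs_resilver_min_time_ms
# Increase resilver priority: spend at least 5000 ms per transaction group on resilver I/O (default 3000)
echo 5000 > /sys/module/zfs/parameters/zfs_resilver_min_time_ms
# Restore the default to reduce impact on production I/O
echo 3000 > /sys/module/zfs/parameters/zfs_resilver_min_time_ms
# Persistent
echo "options zfs zfs_resilver_min_time_ms=5000" >> /etc/modprobe.d/zfs.conf
Sequential Resilver (OpenZFS 2.0+)#
OpenZFS 2.0 introduced sequential resilver (also called sequential rebuild), which reconstructs a mirror or dRAID device in disk-offset order rather than block-pointer order, significantly reducing the time needed to restore redundancy. Checksums are not verified during the rebuild, so ZFS starts a scrub when it completes. RAID-Z vdevs do not support it.
# Request a sequential rebuild when replacing a device
zpool replace -s tank /dev/old /dev/new
12. Properties and Tuning#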
Common Properties#
# Set properties
zfs set compression=zstd tank/data
zfs set atime=off tank
zfs set quota=500G tank/home/user1
zfs set reservation=100G tank/data
zfs set recordsize=1M tank/media # Large files
zfs set recordsize=16K tank/database # Database workloads
# Get properties
zfs get all tank/data
zfs get compression,compressratio tank/data
Key Properties#
| Property | Description | Recommended Value |
|---|---|---|
| compression | Transparent compression | zstd (best ratio/speed) or lz4 (fastest) |
| atime | Update access time on read | off (reduces writes) |
| relatime | Update atime only if it is older than mtime/ctime or more than 24 hours old | on (compromise; only takes effect with atime=on) |
| recordsize | Maximum block size | 128K (default), 1M for media, 16K for databases |
| quota | Maximum space for dataset and children | Set per-user/per-project |
| reservation | Guaranteed minimum space | Use sparingly |
| dedup | Block-level deduplication | off (requires ~5 GiB RAM per TB of data) |
| encryption | Dataset encryption | aes-256-gcm |
| xattr | Extended attribute storage | sa (store in system attribute, faster) |
| dnodesize | Dnode size | auto (allows larger xattrs/metadata) |
| sync | Synchronous write behavior | standard (default); disabled only for non-critical data |
| copies | Number of data copies | 1 (default); 2 for extra protection without RAID |
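The recommended values can be turned into commands per workload. This is a sketch under stated assumptions: tuning_commands is a hypothetical helper, the workload names are invented for illustration, and it only prints the commands so they can be reviewed before anything is changed.

```shell
# tuning_commands is a hypothetical helper that prints (does not run) zfs set
# commands for a dataset, using the recommended values from the table above.
tuning_commands() {
    dataset=$1 workload=$2
    case $workload in
        media)    recordsize=1M ;;
        database) recordsize=16K ;;
        *)        recordsize=128K ;;   # ZFS default
    esac
    echo "zfs set recordsize=$recordsize $dataset"
    echo "zfs set compression=zstd $dataset"
    echo "zfs set atime=off $dataset"
}

tuning_commands tank/media media
```

Once the output looks right it can be piped to sh as root. Note that recordsize only affects blocks written after the change; existing files keep their old block size.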
Encryption#
# Create an encrypted dataset (prompts for a passphrase)
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/secret
# Lock: unmount the dataset, then unload its key
zfs unmount tank/secret
zfs unload-key tank/secret
# Unlock: load the key, then mount
zfs load-key tank/secret
zfs mount tank/secret
13. Troubleshooting#
| Issue | Cause | Solution |
|---|---|---|
| Pool is degraded | One or more devices failed or were removed | Run zpool status to identify the failed device; replace with zpool replace tank <old> <new> |
| CKSUM errors in zpool status | Silent data corruption on disk | Scrub the pool: zpool scrub tank; ZFS auto-repairs from redundant copies if available |
| Cannot import pool | Pool was not exported, or disks moved | Use zpool import -f tank; for missing log, use zpool import -m tank |
| ARC consuming too much RAM | Default ARC max too high for workload | Set zfs_arc_max in /etc/modprobe.d/zfs.conf |
| No space left on device but zfs list shows free space | Snapshot space, metadata reservation, or fragmentation | Delete old snapshots; check zfs list -t snapshot -o name,used -s used; verify zpool list -v |
| Very slow resilver | Large pool with high fragmentation or I/O contention | Raise zfs_resilver_min_time_ms; reduce application I/O; on mirrors, use sequential rebuild (zpool replace -s) |
| zfs module not loaded after kernel update | DKMS rebuild failed for new kernel | Run sudo dkms autoinstall; check dkms status; ensure kernel headers are installed |
| Pool will not mount at boot | zfs-mount.service not enabled or pool not cached | Run systemctl enable zfs-mount.service zfs-import-cache.service; run zpool set cachefile=/etc/zfs/zpool.cache tank |
| Dedup using excessive RAM | DDT (dedup table) stored in ARC | Disable dedup: zfs set dedup=off tank/data (existing deduped blocks remain until overwritten); add more RAM |
| Encrypted dataset won't unlock | Wrong passphrase or keyformat mismatch | Verify key location: zfs get keylocation tank/secret; try zfs load-key -L prompt tank/secret |
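Several of the issues above start with noticing that a pool is no longer healthy. A small filter over the pool list can surface that early. This is a sketch, assuming tab-separated input in the form produced by zpool list -H -o name,health; unhealthy_pools is a hypothetical helper and the sample input is made up for illustration.

```shell
# unhealthy_pools reads "name<TAB>health" lines on stdin, the format assumed
# to come from: zpool list -H -o name,health
# It prints every pool whose state is not ONLINE and returns 1 if any exist.
unhealthy_pools() {
    awk -F '\t' '
        $2 != "ONLINE" { print $1 ": " $2; bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Sample input for illustration only (not real output)
printf 'tank\tONLINE\nbackup\tDEGRADED\n' | unhealthy_pools ||
    echo "at least one pool needs attention"
```

In a cron job or monitoring check the input would come from the live system: zpool list -H -o name,health | unhealthy_pools.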