ZFS is a combined filesystem and volume manager originally developed by Sun Microsystems, providing pooled storage, copy-on-write, checksumming, snapshots, RAID-Z, and built-in replication.

Table of Contents#

  1. Overview
  2. License Considerations
  3. Installation
  4. Pool Management
  5. Dataset Hierarchy
  6. Snapshots and Clones
  7. Send/Receive Replication
  8. RAID-Z and Redundancy
  9. Pool Import and Recovery
  10. ARC Cache Tuning
  11. Scrub and Resilver
  12. Properties and Tuning
  13. Troubleshooting
  14. See Also
  15. Sources

1. Overview#

ZFS takes a fundamentally different approach to storage management: it combines the roles of a filesystem, volume manager, and RAID controller into a single integrated system. Originally created by Sun Microsystems for Solaris, it is now maintained as the open-source OpenZFS project across Linux, FreeBSD, and other platforms.

Key features:

  • Pooled storage - aggregates disks into a storage pool; filesystems (datasets) draw from the shared pool automatically
  • Copy-on-Write (CoW) - data is never overwritten in place, ensuring consistency even after power loss
  • End-to-end checksumming - every block is checksummed (fletcher4 by default, SHA-256 optionally), so silent data corruption is detected on read and repaired automatically when a redundant copy exists
  • Snapshots and clones - instant, space-efficient point-in-time copies
  • RAID-Z - integrated software RAID (RAID-Z1, Z2, Z3) without the RAID-5 write hole
  • Send/receive - efficient incremental replication between pools or systems
  • Compression - transparent compression (LZ4, zstd, gzip, lzjb)
  • Deduplication - block-level deduplication (memory-intensive; use with caution)
  • Encryption - native dataset encryption (OpenZFS 0.8+)

2. License Considerations#

ZFS is licensed under the CDDL (Common Development and Distribution License), which is widely regarded as incompatible with the GPLv2 (the license of the Linux kernel). This means:

  • ZFS cannot be merged into the mainline Linux kernel, and most distributions do not ship it as a prebuilt in-kernel module
  • It is distributed as a separate DKMS module (compiled against your kernel) or via pre-built packages
  • Ubuntu ships ZFS in its repositories (via a legal interpretation that kernel modules loaded at runtime do not constitute a derivative work)
  • Debian packages ZFS in its contrib section as a DKMS module; Fedora and Arch Linux users install it from the OpenZFS repository or the AUR, respectively
  • FreeBSD includes ZFS natively with no licensing conflict

Practical impact: After kernel updates, the ZFS DKMS module must be rebuilt. This occasionally causes delays when a new kernel is released before ZFS is updated to support it.

3. Installation#

Ubuntu/Debian#

sudo apt install -y zfsutils-linux
# Debian: enable the contrib component first and also install zfs-dkms

RHEL/CentOS/Rocky#

sudo dnf install -y https://zfsonlinux.org/epel/zfs-release-2-3.el9.noarch.rpm
sudo dnf install -y zfs
sudo modprobe zfs

Arch Linux#

# From the AUR (DKMS version)
yay -S zfs-dkms

# Load the module
sudo modprobe zfs

Verify Installation#

zfs version
zpool version

4. Pool Management#

Creating Pools#

# Simple pool (no redundancy, like RAID 0)
zpool create tank /dev/sda /dev/sdb

# Mirror (RAID 1)
zpool create tank mirror /dev/sda /dev/sdb

# RAID-Z1 (single parity, like RAID 5)
zpool create tank raidz /dev/sda /dev/sdb /dev/sdc

# RAID-Z2 (double parity, like RAID 6)
zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd

# RAID-Z3 (triple parity)
zpool create tank raidz3 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Striped mirrors (RAID 10 equivalent)
zpool create tank \
  mirror /dev/sda /dev/sdb \
  mirror /dev/sdc /dev/sdd

Use /dev/disk/by-id/ paths so the pool does not depend on /dev/sdX names, which can change across reboots:

zpool create tank mirror \
  /dev/disk/by-id/ata-WDC_WD40EFRX_abc \
  /dev/disk/by-id/ata-WDC_WD40EFRX_xyz

Special vdevs#

# Add a dedicated log device (SLOG) for synchronous writes
zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1

# Add a cache device (L2ARC) for read caching
zpool add tank cache /dev/nvme2n1

# Add a special allocation class for metadata and small blocks
zpool add tank special mirror /dev/nvme3n1 /dev/nvme4n1

Pool Status and Information#

# Pool status with health and scrub info
zpool status tank

# I/O statistics
zpool iostat tank 5

# Pool space usage
zpool list

# Detailed pool properties
zpool get all tank
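
The status commands above are easy to wrap in a monitoring check. The following is a minimal sketch, assuming the tab-separated output of zpool list -H -o name,health; the function name unhealthy_pools is hypothetical, and the sample input stands in for a live system.

```shell
#!/bin/sh
# unhealthy_pools (hypothetical helper): read "name<TAB>health" lines, the
# format produced by `zpool list -H -o name,health`, and print every pool
# whose health is anything other than ONLINE.
unhealthy_pools() {
    awk -F '\t' '$2 != "ONLINE" { print $1 ": " $2 }'
}

# Sample input standing in for real zpool output.
printf 'tank\tONLINE\nbackup\tDEGRADED\n' | unhealthy_pools

# On a live system:
#   zpool list -H -o name,health | unhealthy_pools
```

An empty result means every pool reports ONLINE, which makes the function convenient to call from cron or a monitoring agent.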

Destroying a Pool#

zpool destroy tank

5. Dataset Hierarchy#

ZFS datasets are organized in a hierarchical tree, similar to a directory structure. Each dataset is an independent filesystem that inherits properties from its parent.

tank                    (pool root dataset)
  tank/home             (home directories)
    tank/home/user1     (user1's home)
    tank/home/user2     (user2's home)
  tank/data             (general data)
  tank/vms              (virtual machines)
  tank/backup           (backup targets)

Creating Datasets#

# Create datasets
zfs create tank/home
zfs create tank/home/user1
zfs create tank/data
zfs create tank/vms

# Create with specific properties (the dataset must not already exist)
zfs create -o compression=zstd -o quota=100G tank/home/user2

Datasets are automatically mounted at a path matching their name (e.g., tank/home/user1 at /tank/home/user1). Override with the mountpoint property:

zfs set mountpoint=/home/user1 tank/home/user1

Listing Datasets#

# List all datasets
zfs list

# List with specific properties
zfs list -o name,used,avail,refer,compression,compressratio

# List recursively under a parent
zfs list -r tank/home

Property Inheritance#

Properties set on a parent dataset are inherited by children:

# Set compression on the parent; all children inherit it
zfs set compression=zstd tank

# Override on a specific child
zfs set compression=off tank/vms

# Check where a property value comes from
zfs get compression tank/home/user1
# SOURCE column shows "inherited from tank" or "local"
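
To audit which datasets override an inherited value, the SOURCE column can be filtered in a script. This is a rough sketch, assuming the tab-separated output of zfs get -H -r -o name,value,source; the helper name local_overrides and the sample data are illustrative only.

```shell
#!/bin/sh
# local_overrides (hypothetical helper): read "name<TAB>value<TAB>source"
# lines and print only the datasets where the property is set locally.
local_overrides() {
    awk -F '\t' '$3 == "local" { print $1 "=" $2 }'
}

# Sample input standing in for:
#   zfs get -H -r -o name,value,source compression tank
printf 'tank\tzstd\tlocal\ntank/home\tzstd\tinherited from tank\ntank/vms\toff\tlocal\n' |
    local_overrides
```

With the sample data this prints tank=zstd and tank/vms=off, the two places where compression was set explicitly.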

Destroying Datasets#

# Destroy a dataset (must have no children or snapshots unless -r is used)
zfs destroy tank/data

# Recursive destroy (destroys all children and snapshots)
zfs destroy -r tank/old

6. Snapshots and Clones#

Snapshots#

Snapshots are read-only, point-in-time copies. They are created instantly and consume no additional space until data changes.

# Create a snapshot
zfs snapshot tank/data@2026-03-22

# Create recursive snapshots (all child datasets)
zfs snapshot -r tank/home@daily-2026-03-22

# List snapshots
zfs list -t snapshot

# List snapshots for a specific dataset
zfs list -t snapshot -r tank/data

# Check snapshot space usage
zfs list -o name,used,refer -t snapshot tank/data

Accessing Snapshot Data#

Snapshots are accessible via the .zfs/snapshot hidden directory:

ls /tank/data/.zfs/snapshot/2026-03-22/

Rolling Back#

# Rollback to a snapshot (destroys all changes since the snapshot)
zfs rollback tank/data@2026-03-22

# Rollback past intermediate snapshots (destroys them)
zfs rollback -r tank/data@2026-03-22

Clones#

Clones are writable copies of snapshots:

# Create a clone
zfs clone tank/data@2026-03-22 tank/data-test

# Promote a clone so it no longer depends on its origin (the original dataset becomes the dependent clone)
zfs promote tank/data-test

Destroying Snapshots#

# Destroy a single snapshot
zfs destroy tank/data@2026-03-22

# Destroy a range of snapshots
zfs destroy tank/data@2026-03-01%2026-03-22

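# Destroy all snapshots matching a name prefix
# (zfs destroy has no wildcard support; % selects a range, not a pattern)
zfs list -H -o name -t snapshot -r tank/data | grep '^tank/data@daily-' | xargs -n1 zfs destroy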
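
Because zfs destroy takes literal snapshot names (the % operator selects a range rather than a glob), bulk deletion by prefix is usually scripted. Below is a small dry-run sketch; the helper name snapshots_with_prefix is hypothetical, and it only prints the commands it would run so the selection can be reviewed first.

```shell
#!/bin/sh
# snapshots_with_prefix DATASET PREFIX (hypothetical helper): read snapshot
# names on stdin and print those that belong to DATASET itself and whose
# snapshot name starts with PREFIX.
snapshots_with_prefix() {
    awk -v want="$1@$2" 'index($0, want) == 1'
}

# Sample input standing in for: zfs list -H -o name -t snapshot -r tank/data
printf '%s\n' \
    'tank/data@daily-2026-03-21' \
    'tank/data@daily-2026-03-22' \
    'tank/data@manual-1' \
    'tank/data/sub@daily-2026-03-22' |
    snapshots_with_prefix tank/data daily- |
    while read -r snap; do
        # Dry run: print the command instead of executing it.
        echo "zfs destroy $snap"
    done
```

Only the two tank/data@daily-* snapshots are selected; the manual snapshot and the child dataset's snapshot are left alone. Replacing echo with the real command turns the dry run into a deletion.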

7. Send/Receive Replication#

ZFS send/receive enables efficient replication of datasets and snapshots between pools, systems, or even to files.

Full Send#

# Send a snapshot to another pool on the same system
zfs send tank/data@snap1 | zfs receive backup/data

# Send to a remote system via SSH
zfs send tank/data@snap1 | ssh remote zfs receive backup/data

Incremental Send#

# Send only the changes between two snapshots
zfs send -i tank/data@snap1 tank/data@snap2 | zfs receive backup/data

# Send every intermediate snapshot between snap1 and snap5 (snap2, snap3, snap4, snap5)
zfs send -I tank/data@snap1 tank/data@snap5 | zfs receive backup/data

Resumable Send#

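# Receive with -s so an interrupted transfer leaves resumable state behind
zfs send tank/data@snap1 | zfs receive -s backup/data

# If the transfer is interrupted, read the resume token on the receiving side
zfs get -H -o value receive_resume_token backup/data

# Resume the send from that token
zfs send -t <resume-token> | zfs receive -s backup/data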

Replication Workflow#

A typical backup workflow:

# Initial full replication
zfs snapshot -r tank@baseline
zfs send -R tank@baseline | ssh backup-server zfs receive -F backuppool

# Daily incremental (date -d is GNU date syntax; assumes yesterday's snapshot exists on both sides)
zfs snapshot -r tank@daily-$(date +%Y%m%d)
zfs send -R -i tank@daily-$(date -d yesterday +%Y%m%d) tank@daily-$(date +%Y%m%d) | \
  ssh backup-server zfs receive -F backuppool
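
The daily incremental above depends on yesterday's snapshot being present on both sides. A more defensive variant picks the newest snapshot the source and target have in common as the incremental base. The sketch below assumes two files listing snapshot names (the part after @) oldest first; the helper name latest_common and the sample names are hypothetical.

```shell
#!/bin/sh
# latest_common SRC_FILE DST_FILE (hypothetical helper): each file lists
# snapshot names, oldest first, one per line. Prints the newest name that
# appears in both files, or nothing if there is no common snapshot.
latest_common() {
    grep -Fx -f "$2" "$1" | tail -n 1
}

# Sample lists. On live systems they could come from:
#   zfs list -H -d 1 -t snapshot -o name -s creation tank | sed 's/.*@//'
src=$(mktemp)
dst=$(mktemp)
printf 'baseline\ndaily-20260320\ndaily-20260321\ndaily-20260322\n' > "$src"
printf 'baseline\ndaily-20260320\ndaily-20260321\n' > "$dst"

base=$(latest_common "$src" "$dst")
# Print the send command that would bring the target up to date.
echo "zfs send -R -i tank@$base tank@daily-20260322"
rm -f "$src" "$dst"
```

If a nightly run is skipped, the next run still finds a valid base instead of failing on a missing snapshot name.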

Raw Encrypted Send#

# Send encrypted datasets without decrypting
zfs send --raw tank/encrypted@snap1 | ssh remote zfs receive backup/encrypted

8. RAID-Z and Redundancy#

RAID-Z Levels#

Level   | Parity | Drive Failures Tolerated | Min Drives | Usable Capacity
RAID-Z1 | Single | 1                        | 3          | (N-1) x disk
RAID-Z2 | Double | 2                        | 4          | (N-2) x disk
RAID-Z3 | Triple | 3                        | 5          | (N-3) x disk
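
The usable-capacity column can be worked through as simple arithmetic. This is a rough estimate only: the function name raidz_usable is hypothetical, and the result ignores padding, metadata, and reserved space, so real pools report somewhat less.

```shell
#!/bin/sh
# raidz_usable DISKS PARITY DISK_SIZE (hypothetical helper): approximate
# usable capacity of one RAID-Z vdev as (N - parity) x disk size.
raidz_usable() {
    echo $(( ($1 - $2) * $3 ))
}

# Six 4 TB disks in RAID-Z2: (6 - 2) x 4
raidz_usable 6 2 4   # prints 16
```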

RAID-Z vs Traditional RAID 5#

ZFS RAID-Z eliminates the RAID-5 write hole because it uses variable-width stripes and CoW. Data and parity are always consistent, even after a power failure, without needing a battery-backed cache.

Recommended layouts by drive count:

Drives        | Configuration                    | Rationale
2             | Mirror                           | Simple redundancy
3-4           | RAID-Z1                          | Good balance of space and protection
4-8           | RAID-Z2                          | Recommended for large drives (>2 TB)
8+            | Striped RAID-Z2 (multiple vdevs) | Balance performance and redundancy
6+ (critical) | RAID-Z3                          | Maximum protection for enterprise

Adding a Vdev to Expand a Pool#

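# Add another RAID-Z2 vdev (it should match the existing vdevs' width and parity;
# zpool add refuses a mismatched replication level unless forced with -f)
zpool add tank raidz2 /dev/sde /dev/sdf /dev/sdg /dev/sdh

Note: Before OpenZFS 2.3, individual disks could not be added to an existing RAID-Z vdev; the only way to grow a pool was to add entire new vdevs.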

Expansion (OpenZFS 2.3+)#

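OpenZFS 2.3 introduced RAID-Z expansion, which allows a single disk to be attached to an existing RAID-Z vdev. Refer to the vdev by the name shown in zpool status (for example, raidz2-0). Existing blocks keep their original data-to-parity ratio until they are rewritten.

zpool attach tank raidz2-0 /dev/sdnew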

9. Pool Import and Recovery#

Importing Pools#

When moving disks between systems or after a reboot where auto-import is not configured:

# Scan for importable pools
zpool import

# Import a specific pool
zpool import tank

# Import with a different name
zpool import tank newtank

# Import a pool, searching for its devices in a specific directory
zpool import -d /dev/disk/by-id tank

# Force import (pool was not cleanly exported)
zpool import -f tank

# Import read-only (for recovery)
zpool import -o readonly=on tank

Exporting Pools#

Always export before moving disks:

zpool export tank

Recovery from Failed Import#

# Import with missing log device (data loss for in-flight sync writes)
zpool import -m tank

# Clear persistent errors after fixing the underlying issue
zpool clear tank

# Revert to a previous transaction group (last resort; everything written after that txg is discarded)
zpool import -T <txg> tank

Repairing a Degraded Pool#

# Check pool status to identify the failed device
zpool status tank

# Replace the failed device
zpool replace tank /dev/old_device /dev/new_device

# If the device was removed, use the device ID
zpool replace tank <device-guid> /dev/new_device

10. ARC Cache Tuning#

The ARC (Adaptive Replacement Cache) is ZFS's primary read cache, stored in RAM. On Linux, the ARC has traditionally been capped at 50% of system RAM by default; the limit is controlled by the zfs_arc_max module parameter.

Checking ARC Usage#

# Summary
arc_summary

# Detailed statistics
arcstat

Setting ARC Size#

# Set maximum ARC size (bytes) - runtime
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max  # 8 GiB

# Persistent via modprobe configuration
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf

# Set minimum ARC size
echo "options zfs zfs_arc_min=2147483648" >> /etc/modprobe.d/zfs.conf  # 2 GiB
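
The byte values above are easier to get right when derived rather than typed by hand. A minimal sketch follows, with hypothetical helper names gib_to_bytes and arc_options, that renders the modprobe options line from sizes given in GiB.

```shell
#!/bin/sh
# gib_to_bytes N (hypothetical helper): convert GiB to bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# arc_options MAX_GIB MIN_GIB (hypothetical helper): print a modprobe.d
# options line that sets both ARC limits.
arc_options() {
    printf 'options zfs zfs_arc_max=%s zfs_arc_min=%s\n' \
        "$(gib_to_bytes "$1")" "$(gib_to_bytes "$2")"
}

arc_options 8 2
# prints: options zfs zfs_arc_max=8589934592 zfs_arc_min=2147483648
```

The output can be redirected into /etc/modprobe.d/zfs.conf once it has been reviewed.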

ARC Tuning Guidelines#

System Role           | Recommended ARC Max  | Rationale
Dedicated file server | 75% of RAM           | Maximize cache hit rate
Hypervisor with VMs   | 25-50% of RAM        | Leave RAM for VM memory
Database server       | 25% of RAM           | Database has its own cache
Desktop/workstation   | 50% of RAM (default) | Balance with application needs
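
The percentages in these guidelines can be turned into a zfs_arc_max value from the machine's total memory. The sketch below assumes input in the /proc/meminfo format (MemTotal in kB); the function name arc_max_for_percent is hypothetical.

```shell
#!/bin/sh
# arc_max_for_percent PCT (hypothetical helper): read /proc/meminfo-style
# text on stdin and print PCT percent of MemTotal, in bytes.
arc_max_for_percent() {
    awk -v pct="$1" '$1 == "MemTotal:" { printf "%.0f\n", $2 * 1024 * pct / 100 }'
}

# Sample input: a machine with 16 GiB of RAM, ARC capped at 50%.
printf 'MemTotal:       16777216 kB\n' | arc_max_for_percent 50
# prints: 8589934592

# On a live Linux system:
#   arc_max_for_percent 25 < /proc/meminfo
```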

L2ARC (Level 2 ARC)#

L2ARC extends the read cache to a fast SSD:

# Add L2ARC device
zpool add tank cache /dev/nvme0n1

# Remove L2ARC device
zpool remove tank /dev/nvme0n1

L2ARC is most effective when the working set exceeds ARC (RAM) but fits on the SSD. Each L2ARC entry consumes approximately 70 bytes of ARC RAM for the index.
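
The per-entry overhead makes it possible to estimate how much ARC a given L2ARC device will consume. This is a back-of-the-envelope sketch using the approximate 70-byte figure above; the function name l2arc_index_bytes is hypothetical, and the real header size varies between OpenZFS versions.

```shell
#!/bin/sh
# l2arc_index_bytes L2ARC_BYTES BLOCK_BYTES (hypothetical helper): estimate
# the ARC memory used to index a full L2ARC device, at ~70 bytes per block.
l2arc_index_bytes() {
    echo $(( $1 / $2 * 70 ))
}

# A 1 TiB L2ARC holding 128 KiB blocks.
l2arc_index_bytes $(( 1024 * 1024 * 1024 * 1024 )) $(( 128 * 1024 ))
# prints: 587202560 (about 560 MiB)
```

Smaller blocks raise the overhead proportionally, so an L2ARC in front of a 16K-recordsize database dataset costs roughly eight times as much RAM as the same device caching 128K blocks.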

SLOG (Separate Intent Log)#

SLOG accelerates synchronous writes (e.g., NFS, databases):

# Add a mirrored SLOG
zpool add tank log mirror /dev/nvme1n1 /dev/nvme2n1

# Remove SLOG (reverts to the in-pool ZIL); a mirrored log is removed by its
# vdev name as shown in zpool status (mirror-1 here is an example)
zpool remove tank mirror-1

Use a high-endurance, low-latency NVMe device with power-loss protection for SLOG.

11. Scrub and Resilver#

Scrub#

A scrub reads all data and metadata, verifies checksums, and repairs corruption from redundant copies:

# Start a scrub
zpool scrub tank

# Cancel a scrub
zpool scrub -s tank

# Check scrub status and results
zpool status tank

Scheduling Scrubs#

# Systemd timer for monthly scrub
# /etc/systemd/system/zfs-scrub@.timer
[Unit]
Description=Monthly ZFS scrub on %i

[Timer]
OnCalendar=monthly
Persistent=true
RandomizedDelaySec=1w

[Install]
WantedBy=timers.target

# /etc/systemd/system/zfs-scrub@.service
[Unit]
Description=ZFS scrub on %i

[Service]
Type=oneshot
ExecStart=/usr/sbin/zpool scrub %i

# Enable the timer for the pool named tank
sudo systemctl enable --now zfs-scrub@tank.timer

Recommendation: Scrub production pools at least monthly; weekly for critical data.

Resilver#

Resilvering is the process of rebuilding a replaced or missing device. ZFS resilvers only the blocks that are actually used, unlike traditional RAID which rebuilds the entire disk.

# Replace a device (triggers automatic resilver)
zpool replace tank /dev/old /dev/new

# Monitor resilver progress
zpool status tank

Resilver Priority#

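# zfs_resilver_delay was removed in OpenZFS 0.8. On current releases, resilver
# priority is governed by zfs_resilver_min_time_ms, the minimum time in
# milliseconds spent on resilver I/O per transaction group (default 3000).

# Increase resilver speed (higher priority)
echo 5000 > /sys/module/zfs/parameters/zfs_resilver_min_time_ms

# Restore the default to reduce impact on production I/O
echo 3000 > /sys/module/zfs/parameters/zfs_resilver_min_time_ms

# Persistent
echo "options zfs zfs_resilver_min_time_ms=5000" >> /etc/modprobe.d/zfs.conf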

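Sequential Resilver (OpenZFS 2.0+)#

OpenZFS 2.0 introduced sequential reconstruction for mirror and dRAID vdevs, which rebuilds data in disk order rather than by traversing the block tree, significantly reducing the time needed to restore redundancy. It is not supported for RAID-Z vdevs. Checksums are not verified during a sequential rebuild, so a scrub starts automatically once it completes.

# Request a sequential rebuild when replacing a device
zpool replace -s tank /dev/old /dev/new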

12. Properties and Tuning#

Common Properties#

# Set properties
zfs set compression=zstd tank/data
zfs set atime=off tank
zfs set quota=500G tank/home/user1
zfs set reservation=100G tank/data
zfs set recordsize=1M tank/media       # Large files
zfs set recordsize=16K tank/database   # Database workloads

# Get properties
zfs get all tank/data
zfs get compression,compressratio tank/data

Key Properties#

Property | Description | Recommended Value
compression | Transparent compression | zstd (best ratio/speed) or lz4 (fastest)
atime | Update access time on read | off (reduces writes)
relatime | Update atime only if it is older than mtime/ctime or more than 24 hours old | on (compromise)
recordsize | Maximum block size | 128K (default), 1M for media, 16K for databases
quota | Maximum space for dataset and children | Set per-user/per-project
reservation | Guaranteed minimum space | Use sparingly
dedup | Block-level deduplication | off (requires ~5 GiB RAM per TB of data)
encryption | Dataset encryption | aes-256-gcm
xattr | Extended attribute storage | sa (store in system attribute, faster)
dnodesize | Dnode size | auto (allows larger xattrs/metadata)
sync | Synchronous write behavior | standard (default); disabled only for non-critical data
copies | Number of data copies | 1 (default); 2 for extra protection without RAID

Encryption#

# Create an encrypted dataset
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/secret

# Lock/unlock
zfs unload-key tank/secret    # Lock
zfs load-key tank/secret      # Unlock
zfs mount tank/secret

13. Troubleshooting#

Issue | Cause | Solution
pool is degraded | One or more devices failed or removed | Run zpool status to identify the failed device; replace with zpool replace tank <old> <new>
CKSUM errors in zpool status | Silent data corruption on disk | Scrub the pool: zpool scrub tank; ZFS auto-repairs from redundant copies if available
Cannot import pool | Pool was not exported, or disks moved | Use zpool import -f tank; for missing log, use zpool import -m tank
ARC consuming too much RAM | Default ARC max too high for workload | Set zfs_arc_max in /etc/modprobe.d/zfs.conf
No space left on device but zfs list shows free space | Snapshot space, metadata reservation, or fragmentation | Delete old snapshots; check zfs list -t snapshot -o name,used -s used; verify zpool list -v
Very slow resilver | Large pool with high fragmentation or I/O contention | Raise zfs_resilver_min_time_ms; reduce application I/O; on mirror or dRAID vdevs, use a sequential rebuild (zpool replace -s)
zfs module not loaded after kernel update | DKMS rebuild failed for new kernel | Run sudo dkms autoinstall; check dkms status; ensure kernel headers are installed
Pool will not mount at boot | zfs-mount.service not enabled or pool not cached | Run systemctl enable zfs-mount.service zfs-import-cache.service; run zpool set cachefile=/etc/zfs/zpool.cache tank
Dedup using excessive RAM | DDT (dedup table) stored in ARC | Disable dedup: zfs set dedup=off tank/data (existing deduped blocks remain until overwritten); add more RAM
Encrypted dataset won't unlock | Wrong passphrase or keyformat mismatch | Verify key location: zfs get keylocation tank/secret; try zfs load-key -L prompt tank/secret
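
For the out-of-space case, it often helps to rank snapshots by the space they hold. The following is a small sketch, assuming tab-separated "name, used bytes" lines as produced by zfs list -Hp -t snapshot -o name,used; the helper name top_snapshots is hypothetical.

```shell
#!/bin/sh
# top_snapshots N (hypothetical helper): read "name<TAB>used_bytes" lines on
# stdin and print the N snapshots holding the most space, largest first.
top_snapshots() {
    sort -t "$(printf '\t')" -k2,2nr | head -n "$1"
}

# Sample input standing in for: zfs list -Hp -t snapshot -o name,used
printf 'tank/data@a\t1048576\ntank/data@b\t734003200\ntank/vms@c\t52428800\n' |
    top_snapshots 2
```

With the sample data the two largest entries, tank/data@b and tank/vms@c, are listed first, which narrows down the snapshots worth deleting.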

See Also#

Sources#