ZFS · ArchWorks

ZFS is a combined filesystem and volume manager originally developed by Sun Microsystems, providing pooled storage, copy-on-write, checksumming, snapshots, RAID-Z, and built-in replication.

Table of Contents#

Overview
License Considerations
Installation
Pool Management
Dataset Hierarchy
Snapshots and Clones
Send/Receive Replication
RAID-Z and Redundancy
Pool Import and Recovery
ARC Cache Tuning
Scrub and Resilver
Properties and Tuning
Troubleshooting
See Also
Sources

1. Overview#

ZFS integrates the roles of filesystem, volume manager, and RAID controller in a single system instead of layering them as separate tools. Originally created by Sun Microsystems for Solaris, it is now maintained as the open-source OpenZFS project across Linux, FreeBSD, and other platforms.

Key features:

Pooled storage - aggregates disks into a storage pool; filesystems (datasets) draw from the shared pool automatically
Copy-on-Write (CoW) - data is never overwritten in place, ensuring consistency even after power loss
End-to-end checksumming - every block is checksummed (fletcher4 by default, SHA-256 optional), detecting silent data corruption and repairing it when a redundant copy exists
Snapshots and clones - instant, space-efficient point-in-time copies
RAID-Z - integrated software RAID (RAID-Z1, Z2, Z3) without the RAID-5 write hole
Send/receive - efficient incremental replication between pools or systems
Compression - transparent compression (LZ4, zstd, gzip, lzjb)
Deduplication - block-level deduplication (memory-intensive; use with caution)
Encryption - native dataset encryption (OpenZFS 0.8+)

2. License Considerations#

ZFS is licensed under the CDDL (Common Development and Distribution License), which is incompatible with the GPL (Linux kernel license). This means:

ZFS cannot be distributed as a built-in kernel module by Linux distributions
It is distributed as a separate DKMS module (compiled against your kernel) or via pre-built packages
Ubuntu ships ZFS in its repositories (via a legal interpretation that kernel modules loaded at runtime do not constitute a derivative work)
Debian packages ZFS as a DKMS module in its contrib section; Fedora and Arch rely on third-party repositories or the AUR
FreeBSD includes ZFS natively with no licensing conflict

Practical impact: After kernel updates, the ZFS DKMS module must be rebuilt. This occasionally causes delays when a new kernel is released before ZFS is updated to support it.
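One way to catch this before rebooting is to check whether DKMS reports a ZFS module for the target kernel. The sketch below is a hypothetical helper (`zfs_dkms_ready` is not part of any package) that reads `dkms status` output on stdin. The exact output format differs between DKMS versions, so the matching is deliberately loose and should be treated as an assumption.

```shell
# Hypothetical helper: succeed if `dkms status` output (on stdin) lists a
# zfs module as installed for the kernel release given in $1.
zfs_dkms_ready() {
    awk -v k="$1" '
        /^zfs/ && index($0, k) > 0 && /: installed/ { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage (assumes dkms is installed; nothing here runs automatically):
#   dkms status | zfs_dkms_ready "$(uname -r)" || echo "ZFS not built for this kernel"
```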

3. Installation#

Ubuntu/Debian#

# Debian: enable the contrib section first and also install zfs-dkms
sudo apt install -y zfsutils-linux

RHEL/CentOS/Rocky#

sudo dnf install -y https://zfsonlinux.org/epel/zfs-release-2-3.el9.noarch.rpm
sudo dnf install -y zfs
sudo modprobe zfs

Arch Linux#

# From the AUR (DKMS version)
yay -S zfs-dkms

# Load the module
sudo modprobe zfs

Verify Installation#

zfs version
zpool version

4. Pool Management#

Creating Pools#

# Simple pool (no redundancy, like RAID 0)
zpool create tank /dev/sda /dev/sdb

# Mirror (RAID 1)
zpool create tank mirror /dev/sda /dev/sdb

# RAID-Z1 (single parity, like RAID 5)
zpool create tank raidz /dev/sda /dev/sdb /dev/sdc

# RAID-Z2 (double parity, like RAID 6)
zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd

# RAID-Z3 (triple parity)
zpool create tank raidz3 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Striped mirrors (RAID 10 equivalent)
zpool create tank \
  mirror /dev/sda /dev/sdb \
  mirror /dev/sdc /dev/sdd

Using Disk IDs (Recommended)#

Use /dev/disk/by-id/ paths so the pool does not depend on /dev/sdX names, which can change across reboots:

zpool create tank mirror \
  /dev/disk/by-id/ata-WDC_WD40EFRX_abc \
  /dev/disk/by-id/ata-WDC_WD40EFRX_xyz
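To find the stable name for a disk you currently know only as /dev/sdX, you can resolve the symlinks in the by-id directory. This is a minimal sketch: `by_id_for` is a hypothetical helper name, and it assumes a `readlink` that supports `-f` (GNU coreutils or BusyBox).

```shell
# by_id_for DEVICE [DIR]: print every symlink in DIR (default
# /dev/disk/by-id) that resolves to the same target as DEVICE.
by_id_for() {
    target=$(readlink -f "$1") || return 1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -e "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# Usage:
#   by_id_for /dev/sda
```

A disk often has several by-id links (for example an ata- name and a wwn- name), so the helper may print more than one line; any of them can be passed to zpool create.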

Special vdevs#

# Add a dedicated log device (SLOG) for synchronous writes
zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1

# Add a cache device (L2ARC) for read caching
zpool add tank cache /dev/nvme2n1

# Add a special allocation class for metadata and small blocks
zpool add tank special mirror /dev/nvme3n1 /dev/nvme4n1

Pool Status and Information#

# Pool status with health and scrub info
zpool status tank

# I/O statistics
zpool iostat tank 5

# Pool space usage
zpool list

# Detailed pool properties
zpool get all tank

Destroying a Pool#

zpool destroy tank

5. Dataset Hierarchy#

ZFS datasets are organized in a hierarchical tree, similar to a directory structure. Each dataset is an independent filesystem that inherits properties from its parent.

tank                    (pool root dataset)
  tank/home             (home directories)
    tank/home/user1     (user1's home)
    tank/home/user2     (user2's home)
  tank/data             (general data)
  tank/vms              (virtual machines)
  tank/backup           (backup targets)

Creating Datasets#

# Create datasets
zfs create tank/home
zfs create tank/home/user1
zfs create tank/data
zfs create tank/vms

# Create with specific properties
zfs create -o compression=zstd -o quota=100G tank/home/user1

Datasets are automatically mounted at a path matching their name (e.g., tank/home/user1 at /tank/home/user1). Override with the mountpoint property:

zfs set mountpoint=/home/user1 tank/home/user1

Listing Datasets#

# List all datasets
zfs list

# List with specific properties
zfs list -o name,used,avail,refer,compression,compressratio

# List recursively under a parent
zfs list -r tank/home

Property Inheritance#

Properties set on a parent dataset are inherited by children:

# Set compression on the parent; all children inherit it
zfs set compression=zstd tank

# Override on a specific child
zfs set compression=off tank/vms

# Check where a property value comes from
zfs get compression tank/home/user1
# SOURCE column shows "inherited from tank" or "local"
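When auditing a larger tree, it can help to list only the datasets that override an inherited value. The filter below is a sketch that assumes tab-separated input from `zfs get -H -o name,property,value,source`; `local_overrides` is a hypothetical name. `zfs get -s local` offers similar filtering natively.

```shell
# Hypothetical filter: read `zfs get -H -o name,property,value,source`
# output (tab-separated) on stdin and print rows whose source is "local".
local_overrides() {
    awk -F '\t' '$4 == "local" { print $1 "\t" $2 "=" $3 }'
}

# Usage:
#   zfs get -r -H -o name,property,value,source compression tank | local_overrides
```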

Destroying Datasets#

# Destroy a dataset (must have no children or snapshots unless -r is used)
zfs destroy tank/data

# Recursive destroy (destroys all children and snapshots)
zfs destroy -r tank/old

6. Snapshots and Clones#

Snapshots#

Snapshots are read-only, point-in-time copies. They are created instantly and consume no additional space until data changes.

# Create a snapshot
zfs snapshot tank/data@2026-03-22

# Create recursive snapshots (all child datasets)
zfs snapshot -r tank/home@daily-2026-03-22

# List snapshots
zfs list -t snapshot

# List snapshots for a specific dataset
zfs list -t snapshot -r tank/data

# Check snapshot space usage
zfs list -o name,used,refer -t snapshot tank/data

Accessing Snapshot Data#

Snapshots are accessible via the .zfs/snapshot hidden directory:

ls /tank/data/.zfs/snapshot/2026-03-22/

Rolling Back#

# Rollback to a snapshot (destroys all changes since the snapshot)
zfs rollback tank/data@2026-03-22

# Rollback past intermediate snapshots (destroys them)
zfs rollback -r tank/data@2026-03-22

Clones#

Clones are writable copies of snapshots:

# Create a clone
zfs clone tank/data@2026-03-22 tank/data-test

# Promote a clone (swaps the parent-child dependency so the original dataset can be destroyed)
zfs promote tank/data-test

Destroying Snapshots#

# Destroy a single snapshot
zfs destroy tank/data@2026-03-22

# Destroy a range of snapshots
zfs destroy tank/data@2026-03-01%2026-03-22

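# Destroy all snapshots matching a pattern (zfs has no wildcard syntax; % only denotes ranges)
zfs list -H -o name -t snapshot -d 1 tank/data | grep '@daily-' | xargs -r -n1 zfs destroy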
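A common follow-on is retention: keep the newest N snapshots and destroy the rest. The sketch below only selects candidates and never calls zfs destroy itself; `prune_candidates` is a hypothetical helper name, and it assumes the input is sorted oldest first (for example with `-s creation`).

```shell
# Hypothetical helper: read snapshot names (oldest first) on stdin and print
# all but the newest $1, i.e. the candidates for destruction.
prune_candidates() {
    awk -v keep="$1" '
        { line[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print line[i] }
    '
}

# Usage (prints only, destroys nothing):
#   zfs list -H -o name -t snapshot -s creation -d 1 tank/data \
#     | grep '@daily-' | prune_candidates 7
```

Review the printed list before piping it to `xargs -n1 zfs destroy`.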

7. Send/Receive Replication#

ZFS send/receive enables efficient replication of datasets and snapshots between pools, systems, or even to files.

Full Send#

# Send a snapshot to another pool on the same system
zfs send tank/data@snap1 | zfs receive backup/data

# Send to a remote system via SSH
zfs send tank/data@snap1 | ssh remote zfs receive backup/data

Incremental Send#

# Send only the changes between two snapshots
zfs send -i tank/data@snap1 tank/data@snap2 | zfs receive backup/data

# Incremental from snap1 to snap5, including every intermediate snapshot
zfs send -I tank/data@snap1 tank/data@snap5 | zfs receive backup/data
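An incremental send needs a source snapshot that already exists on the destination. One way to pick it is to compare the snapshot lists of both sides. This is a sketch under stated assumptions: `latest_common_snapshot` is a hypothetical helper, both inputs are files containing only the part of the name after "@", and matching by name is taken as good enough (comparing the `guid` property would be more robust).

```shell
# Hypothetical helper: print the newest snapshot name present in both lists.
# $1 = file of source snapshot names, oldest first
# $2 = file of destination snapshot names, any order
latest_common_snapshot() {
    # Keep source lines that exactly match a destination line, then take
    # the last one (the newest, given the source ordering).
    grep -Fx -f "$2" "$1" | tail -n 1
}

# Usage:
#   zfs list -H -o name -t snapshot -s creation -d 1 tank/data | sed 's/.*@//' > src.txt
#   ssh remote zfs list -H -o name -t snapshot -d 1 backup/data | sed 's/.*@//' > dst.txt
#   base=$(latest_common_snapshot src.txt dst.txt)
```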

Resumable Send#

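# The interrupted receive must have been started with -s to save partial state
# After an interruption, read the resume token on the receiving side
zfs get -H -o value receive_resume_token backup/data

# Resume the send
zfs send -t <resume-token> | zfs receive -s backup/data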

Replication Workflow#

A typical backup workflow:

# Initial full replication
zfs snapshot -r tank@baseline
zfs send -R tank@baseline | ssh backup-server zfs receive -F backuppool

# Daily incremental (on the first day, use tank@baseline as the source snapshot instead)
zfs snapshot -r tank@daily-$(date +%Y%m%d)
zfs send -R -i tank@daily-$(date -d yesterday +%Y%m%d) tank@daily-$(date +%Y%m%d) | \
  ssh backup-server zfs receive -F backuppool

Raw Encrypted Send#

# Send encrypted datasets without decrypting
zfs send --raw tank/encrypted@snap1 | ssh remote zfs receive backup/encrypted

8. RAID-Z and Redundancy#

RAID-Z Levels#

Level	Parity	Drive Failures Tolerated	Min Drives	Usable Capacity (N drives)
RAID-Z1	Single	1	3	(N-1) x disk
RAID-Z2	Double	2	4	(N-2) x disk
RAID-Z3	Triple	3	5	(N-3) x disk
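The usable-capacity column can be worked through with a few lines of shell arithmetic. This is a rough sketch: `raidz_usable` is a hypothetical helper, and the result ignores padding, metadata, and reserved space, so real pools report somewhat less.

```shell
# raidz_usable DISKS PARITY DISK_SIZE: approximate usable capacity, in the
# same unit as DISK_SIZE, using the (N - parity) x disk rule from the table.
raidz_usable() {
    disks=$1 parity=$2 size=$3
    if [ "$disks" -le "$parity" ]; then
        echo "need more than $parity disks" >&2
        return 1
    fi
    echo $(( (disks - parity) * size ))
}

# Usage: six 4 TB disks in RAID-Z2
#   raidz_usable 6 2 4
```

For six 4 TB disks in RAID-Z2 the estimate is (6 - 2) x 4 = 16 TB.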

RAID-Z vs Traditional RAID 5#

ZFS RAID-Z eliminates the RAID-5 write hole because it uses variable-width stripes and CoW. Data and parity are always consistent, even after a power failure, without needing a battery-backed cache.

Recommended Configurations#

Drives	Configuration	Rationale
2	Mirror	Simple redundancy
3-4	RAID-Z1	Good balance of space and protection
4-8	RAID-Z2	Recommended for large drives (>2 TB)
8+	Striped RAID-Z2 (multiple vdevs)	Balance performance and redundancy
6+ (critical)	RAID-Z3	Maximum protection for enterprise

Adding a Vdev to Expand a Pool#

# Add another RAID-Z2 vdev (should match the existing vdev's layout; zpool add refuses a mismatched replication level unless forced with -f)
zpool add tank raidz2 /dev/sde /dev/sdf /dev/sdg /dev/sdh

Note: Before OpenZFS 2.3, you cannot add individual disks to an existing RAID-Z vdev; you must add entire new vdevs.

Expansion (OpenZFS 2.3+)#

OpenZFS 2.3 introduces RAID-Z expansion, allowing a single disk to be added to an existing RAID-Z vdev. Use the vdev name shown by zpool status (for example raidz2-0):

zpool attach tank raidz2-0 /dev/sdnew

9. Pool Import and Recovery#

Importing Pools#

When moving disks between systems or after a reboot where auto-import is not configured:

# Scan for importable pools
zpool import

# Import a specific pool
zpool import tank

# Import with a different name
zpool import tank newtank

# Import a pool from a specific directory
zpool import -d /dev/disk/by-id tank

# Force import (pool was not cleanly exported)
zpool import -f tank

# Import read-only (for recovery)
zpool import -o readonly=on tank
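When scripting imports, the scan output has to be reduced to pool names first. The filter below is a sketch that assumes the usual `pool: NAME` lines in `zpool import` output; `importable_pools` is a hypothetical name.

```shell
# Hypothetical filter: read `zpool import` scan output on stdin and print
# one importable pool name per line.
importable_pools() {
    awk '$1 == "pool:" { print $2 }'
}

# Usage:
#   zpool import | importable_pools
```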

Exporting Pools#

Always export before moving disks:

zpool export tank

Recovery from Failed Import#

# Import with missing log device (data loss for in-flight sync writes)
zpool import -m tank

# Clear persistent errors after fixing the underlying issue
zpool clear tank

# Rewind recovery: discard the last few transactions (add -n for a dry run)
zpool import -F tank

# Revert to a specific transaction group (last resort)
zpool import -T <txg> tank

Repairing a Degraded Pool#

# Check pool status to identify the failed device
zpool status tank

# Replace the failed device
zpool replace tank /dev/old_device /dev/new_device

# If the device was removed, use the device ID
zpool replace tank <device-guid> /dev/new_device
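Identifying the failed device by eye is error-prone on wide pools. The filter below is a sketch that pulls every non-ONLINE row out of the config table in `zpool status` output; `unhealthy_rows` is a hypothetical name, and the column layout (name, state, then three error counters) is assumed from typical output.

```shell
# Hypothetical filter: read `zpool status` output on stdin and print the
# name and state of every config row that is not ONLINE. Pool and vdev rows
# are included along with leaf devices.
unhealthy_rows() {
    awk 'NF >= 5 && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ { print $1, $2 }'
}

# Usage:
#   zpool status tank | unhealthy_rows
```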

10. ARC Cache Tuning#

The ARC (Adaptive Replacement Cache) is ZFS's primary read cache, stored in RAM. On Linux, the default ARC limit has traditionally been 50% of system RAM; check the zfs_arc_max documentation for your OpenZFS version.

Checking ARC Usage#

# Summary
arc_summary

# Detailed statistics
arcstat

Setting ARC Size#

# Set maximum ARC size (bytes) - runtime
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max  # 8 GiB

# Persistent via modprobe configuration
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf

# Set minimum ARC size
echo "options zfs zfs_arc_min=2147483648" >> /etc/modprobe.d/zfs.conf  # 2 GiB

ARC Tuning Guidelines#

System Role	Recommended ARC Max	Rationale
Dedicated file server	75% of RAM	Maximize cache hit rate
Hypervisor with VMs	25-50% of RAM	Leave RAM for VM memory
Database server	25% of RAM	Database has its own cache
Desktop/workstation	50% of RAM (default)	Balance with application needs
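The percentages in the table have to be converted to bytes before they can be written to zfs_arc_max. A minimal sketch of that arithmetic follows; `arc_max_bytes` is a hypothetical helper, and reading MemTotal from /proc/meminfo assumes Linux.

```shell
# arc_max_bytes PERCENT [MEM_KIB]: print PERCENT of total RAM in bytes.
# MEM_KIB defaults to MemTotal from /proc/meminfo (Linux only).
arc_max_bytes() {
    pct=$1
    mem_kib=${2:-$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)}
    echo $(( mem_kib * 1024 * pct / 100 ))
}

# Usage: generate a modprobe line for a hypervisor (25% of RAM)
#   echo "options zfs zfs_arc_max=$(arc_max_bytes 25)"
```

For a machine with 16 GiB of RAM, 50% works out to 8589934592 bytes (8 GiB).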

L2ARC (Level 2 ARC)#

L2ARC extends the read cache to a fast SSD:

# Add L2ARC device
zpool add tank cache /dev/nvme0n1

# Remove L2ARC device
zpool remove tank /dev/nvme0n1

L2ARC is most effective when the working set exceeds ARC (RAM) but fits on the SSD. Each L2ARC entry consumes approximately 70 bytes of ARC RAM for the index.
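The per-entry overhead makes it possible to estimate how much ARC a given L2ARC device will cost. The sketch below assumes every cached entry is one full record and uses the 70-byte figure above as a default; `l2arc_header_bytes` is a hypothetical helper.

```shell
# l2arc_header_bytes L2ARC_BYTES RECORD_BYTES [HEADER_BYTES]
# Estimate the ARC RAM consumed by L2ARC headers, assuming one header per
# full-size record. HEADER_BYTES defaults to 70.
l2arc_header_bytes() {
    l2=$1 rec=$2 hdr=${3:-70}
    echo $(( l2 / rec * hdr ))
}

# Usage: 1 TiB of L2ARC with 128 KiB records
#   l2arc_header_bytes 1099511627776 131072
```

With 128 KiB records, 1 TiB of L2ARC needs about 560 MiB of ARC for headers; with 16 KiB records the same device needs about 4.4 GiB, which is why small-record workloads benefit less.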

SLOG (Separate Intent Log)#

SLOG accelerates synchronous writes (e.g., NFS, databases):

# Add a mirrored SLOG
zpool add tank log mirror /dev/nvme1n1 /dev/nvme2n1

# Remove SLOG (reverts to on-disk ZIL); a mirrored log is removed by its vdev name from zpool status
zpool remove tank mirror-1

Use a high-endurance, low-latency NVMe device with power-loss protection for SLOG.

11. Scrub and Resilver#

Scrub#

A scrub reads all data and metadata, verifies checksums, and repairs corruption from redundant copies:

# Start a scrub
zpool scrub tank

# Cancel a scrub
zpool scrub -s tank

# Check scrub status and results
zpool status tank

Scheduling Scrubs#

# Systemd timer for monthly scrub
# /etc/systemd/system/zfs-scrub@.timer
[Unit]
Description=Monthly ZFS scrub on %i

[Timer]
OnCalendar=monthly
Persistent=true
RandomizedDelaySec=1w

[Install]
WantedBy=timers.target

# /etc/systemd/system/zfs-scrub@.service
[Unit]
Description=ZFS scrub on %i

[Service]
Type=oneshot
ExecStart=/usr/sbin/zpool scrub %i

sudo systemctl enable --now zfs-scrub@tank.timer

Recommendation: Scrub production pools at least monthly; weekly for critical data.
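A scheduled scrub only helps if someone notices the result. One lightweight option is to test the output of `zpool status -x`, which prints a single summary line when nothing is wrong. This is a sketch: `pools_healthy` is a hypothetical helper, and the alerting command is left to the reader.

```shell
# Hypothetical helper: succeed only if `zpool status -x` output (on stdin)
# is the single line reported when every pool is healthy.
pools_healthy() {
    grep -qx 'all pools are healthy'
}

# Usage (for example from cron or a systemd timer):
#   zpool status -x | pools_healthy || echo "ZFS pool needs attention"
```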

Resilver#

Resilvering is the process of rebuilding a replaced or missing device. ZFS resilvers only the blocks that are actually used, unlike traditional RAID which rebuilds the entire disk.

# Replace a device (triggers automatic resilver)
zpool replace tank /dev/old /dev/new

# Monitor resilver progress
zpool status tank

Resilver Priority#

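# OpenZFS 0.8+: give resilver I/O more time per transaction group (default 3000 ms)
echo 5000 > /sys/module/zfs/parameters/zfs_resilver_min_time_ms

# Return to the default to reduce impact on production I/O
echo 3000 > /sys/module/zfs/parameters/zfs_resilver_min_time_ms

# Persistent
echo "options zfs zfs_resilver_min_time_ms=5000" >> /etc/modprobe.d/zfs.conf

# ZFS on Linux 0.7 and earlier used zfs_resilver_delay instead (0 = fastest); it was removed in 0.8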

Sequential Resilver (OpenZFS 2.0+)#

OpenZFS 2.0 introduced sequential reconstruction, which rebuilds a device in disk order rather than by walking the pool's block tree, often reducing rebuild time significantly. It applies to mirror and dRAID vdevs (not RAID-Z) and is followed by a scrub to verify checksums:

# Request a sequential rebuild when replacing a device
zpool replace -s tank /dev/old /dev/new

12. Properties and Tuning#

Common Properties#

# Set properties
zfs set compression=zstd tank/data
zfs set atime=off tank
zfs set quota=500G tank/home/user1
zfs set reservation=100G tank/data
zfs set recordsize=1M tank/media       # Large files
zfs set recordsize=16K tank/database   # Database workloads

# Get properties
zfs get all tank/data
zfs get compression,compressratio tank/data

Key Properties#

Property	Description	Recommended Value
`compression`	Transparent compression	`zstd` (best ratio/speed) or `lz4` (fastest)
`atime`	Update access time on read	`off` (reduces writes)
`relatime`	Update atime only if it is older than mtime/ctime or more than 24 hours old (requires `atime=on`)	`on` (compromise)
`recordsize`	Maximum block size	`128K` (default), `1M` for media, `16K` for databases
`quota`	Maximum space for dataset and children	Set per-user/per-project
`reservation`	Guaranteed minimum space	Use sparingly
`dedup`	Block-level deduplication	`off` (requires ~5 GiB RAM per TB of data)
`encryption`	Dataset encryption	`aes-256-gcm`
`xattr`	Extended attribute storage	`sa` (store in system attribute, faster)
`dnodesize`	Dnode size	`auto` (allows larger xattrs/metadata)
`sync`	Synchronous write behavior	`standard` (default); `disabled` only for non-critical data
`copies`	Number of data copies	`1` (default); `2` guards against bad sectors on a single disk, not whole-disk failure

Encryption#

# Create an encrypted dataset
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/secret

# Lock/unlock
zfs unmount tank/secret       # Must be unmounted before the key can be unloaded
zfs unload-key tank/secret    # Lock
zfs load-key tank/secret      # Unlock
zfs mount tank/secret

13. Troubleshooting#

Issue	Cause	Solution
`pool is degraded`	One or more devices failed or removed	Run `zpool status` to identify the failed device; replace with `zpool replace tank <old> <new>`
`CKSUM` errors in `zpool status`	Silent data corruption from a failing disk, cable, controller, or RAM	Scrub the pool: `zpool scrub tank`; ZFS auto-repairs from redundant copies if available
Cannot import pool	Pool was not exported, or disks moved	Use `zpool import -f tank`; for missing log, use `zpool import -m tank`
ARC consuming too much RAM	Default ARC max too high for workload	Set `zfs_arc_max` in `/etc/modprobe.d/zfs.conf`
`No space left on device` but `zfs list` shows free space	Snapshot space, metadata reservation, or fragmentation	Delete old snapshots; check `zfs list -t snapshot -o name,used -s used`; verify `zpool list -v`
Very slow resilver	Large pool with high fragmentation or I/O contention	Raise `zfs_resilver_min_time_ms`; reduce application I/O; on mirror or dRAID vdevs, use `zpool replace -s` for a sequential rebuild
`zfs module not loaded` after kernel update	DKMS rebuild failed for new kernel	Run `sudo dkms autoinstall`; check `dkms status`; ensure kernel headers are installed
Pool will not mount at boot	`zfs-mount.service` not enabled or pool not cached	Run `systemctl enable zfs-mount.service zfs-import-cache.service`; run `zpool set cachefile=/etc/zfs/zpool.cache tank`
Dedup using excessive RAM	DDT (dedup table) stored in ARC	Disable dedup: `zfs set dedup=off tank/data` (existing deduped blocks remain until overwritten); add more RAM
Encrypted dataset won't unlock	Wrong passphrase or keyformat mismatch	Verify key location: `zfs get keylocation tank/secret`; try `zfs load-key -L prompt tank/secret`