etcd · ArchWorks

Distributed, strongly consistent key-value store used as the backbone for Kubernetes cluster state, service discovery, and distributed configuration management.

Addresses below are RFC 5737 documentation ranges or placeholders - swap in your own.

Table of Contents#

Overview
Architecture and Concepts
Installation
Single-Node Setup
Three-Node Cluster Setup
TLS Configuration
Authentication and RBAC
Basic Operations
Watch Mechanism
Backup and Restore
Performance Tuning
Monitoring
Troubleshooting
See Also
Sources

1. Overview#

etcd is a distributed key-value store built on the Raft consensus algorithm. Its strong consistency guarantees make it the default cluster state store for Kubernetes. etcd holds cluster configuration, secrets, and service discovery data, and provides coordination primitives (leases, locks, elections). All of it lives in a flat key space that is organized into a hierarchy by convention, using / separated prefixes.

Key characteristics:

Strong consistency - linearizable reads and writes via Raft
Watch support - clients subscribe to key changes in real time
Lease mechanism - TTL-based key expiration for ephemeral data
Transaction support - atomic compare-and-swap operations
Recommended cluster size - 3 or 5 nodes (odd numbers for quorum)

2. Architecture and Concepts#

2.1 Raft Consensus#

etcd uses Raft to replicate a log of changes across all cluster members; each member persists its copy in a write-ahead log (WAL). One node is elected leader. All writes go through the leader and are committed once a majority (quorum) of members acknowledges them.

3-node cluster - tolerates 1 failure (quorum = 2)
5-node cluster - tolerates 2 failures (quorum = 3)
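
The quorum figures above follow from simple integer arithmetic. The sketch below is only an illustration; the function names `quorum` and `fault_tolerance` are hypothetical helpers, not etcd commands.

```shell
# quorum N: smallest majority of an N-member cluster.
quorum() { echo $(( $1 / 2 + 1 )); }

# fault_tolerance N: members that can fail while a quorum survives.
fault_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
  printf '%d members: quorum=%d, tolerates=%d\n' \
    "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

A 4-member cluster tolerates one failure, the same as a 3-member cluster, which is why odd sizes are preferred.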

2.2 Key-Value Store#

Data is organized in a flat key space with byte-string keys. The / convention creates a logical hierarchy (for example, /app/config/db_host). Each key has a creation revision, modification revision, and version counter.

2.3 Leases#

A lease is a time-to-live (TTL) grant. Keys attached to a lease are automatically deleted when the lease expires. Clients send periodic keep-alive requests to renew leases. This is the foundation for leader election and service registration.

2.4 Transactions#

etcd supports multi-key atomic transactions using an if/then/else model:

etcdctl txn --interactive
# compares: value("key1") = "val1"
# success:  put key2 "val2"
# failure:  put key2 "failed"
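
For scripting, `etcdctl txn` can also read the same three sections from standard input, separated by blank lines. The sketch below only builds that input; `txn_body` is a hypothetical helper name, and the exact input format should be checked against the etcdctl documentation for your version.

```shell
# txn_body COMPARE SUCCESS_OP FAILURE_OP
# Prints the three blank-line-separated sections that `etcdctl txn` reads
# from standard input: compares, success requests, failure requests.
txn_body() {
  printf '%s\n\n%s\n\n%s\n\n' "$1" "$2" "$3"
}

txn_body 'value("key1") = "val1"' 'put key2 "val2"' 'put key2 "failed"'
```

Against a running cluster, the output would be piped straight into the client: `txn_body ... | etcdctl txn`.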

3. Installation#

3.1 Debian/Ubuntu#

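sudo apt update
# Some releases split the package into etcd-server and etcd-client;
# install those instead if "etcd" is not found
sudo apt install -y etcd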

3.2 RHEL/Rocky/Alma#

sudo dnf install -y etcd

3.3 Binary Install (Any Distribution)#

ETCD_VER="v3.5.17"
DOWNLOAD_URL="https://github.com/etcd-io/etcd/releases/download"

curl -fsSL "${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz" \
  | sudo tar -xz -C /usr/local/bin --strip-components=1 \
    etcd-${ETCD_VER}-linux-amd64/etcd \
    etcd-${ETCD_VER}-linux-amd64/etcdctl \
    etcd-${ETCD_VER}-linux-amd64/etcdutl

etcd --version
etcdctl version

4. Single-Node Setup#

Useful for development and testing:

# Start a single member with an explicit data directory
# (without --data-dir, etcd writes to ./default.etcd in the current directory)
etcd --data-dir /var/lib/etcd

# Verify
etcdctl put hello world
etcdctl get hello

For a production single-node deployment, create a systemd unit:

cat > /etc/systemd/system/etcd.service <<'EOF'
[Unit]
Description=etcd key-value store
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
User=etcd
ExecStart=/usr/local/bin/etcd \
  --data-dir=/var/lib/etcd \
  --listen-client-urls=http://127.0.0.1:2379 \
  --advertise-client-urls=http://127.0.0.1:2379
Restart=always
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

useradd -r -s /sbin/nologin etcd
mkdir -p /var/lib/etcd && chown etcd:etcd /var/lib/etcd
systemctl daemon-reload
systemctl enable --now etcd

5. Three-Node Cluster Setup#

5.1 Environment Variables#

Set these on each node, adjusting CURRENT_NODE and CURRENT_IP:

export CURRENT_NODE="etcd1"
export CURRENT_IP="192.0.2.11"

export IP_ETCD1="192.0.2.11"
export IP_ETCD2="192.0.2.12"
export IP_ETCD3="192.0.2.13"

export CLUSTER_TOKEN="my-etcd-cluster"

5.2 Firewall#

# Client communication
firewall-cmd --permanent --add-port=2379/tcp
# Peer communication
firewall-cmd --permanent --add-port=2380/tcp
firewall-cmd --reload

5.3 Configuration#

mkdir -p /etc/etcd
cat > /etc/etcd/etcd.conf <<EOF
# Environment file (KEY="value" lines). The etcd systemd unit must load it,
# for example with EnvironmentFile=/etc/etcd/etcd.conf.
# Section labels are comments only.

# [member]
ETCD_NAME="${CURRENT_NODE}"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="http://${CURRENT_IP}:2380"
ETCD_LISTEN_CLIENT_URLS="http://${CURRENT_IP}:2379,http://127.0.0.1:2379"

# [cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${CURRENT_IP}:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://${CURRENT_IP}:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://${IP_ETCD1}:2380,etcd2=http://${IP_ETCD2}:2380,etcd3=http://${IP_ETCD3}:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="${CLUSTER_TOKEN}"
EOF
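
The ETCD_INITIAL_CLUSTER value must be identical on every node, so it can help to generate it rather than type it. This is a minimal sketch; `initial_cluster` is a hypothetical helper, and it assumes the peer port 2380 used throughout this page.

```shell
# initial_cluster SCHEME NAME=IP...
# Joins NAME=IP pairs into the comma-separated value expected by
# ETCD_INITIAL_CLUSTER, using peer port 2380.
initial_cluster() (
  scheme="$1"; shift
  out=""
  for pair in "$@"; do
    name="${pair%%=*}"
    ip="${pair#*=}"
    out="${out:+$out,}${name}=${scheme}://${ip}:2380"
  done
  printf '%s\n' "$out"
)

initial_cluster http etcd1=192.0.2.11 etcd2=192.0.2.12 etcd3=192.0.2.13
```

Passing `https` as the scheme produces the value needed for a TLS-enabled cluster.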

5.4 Start and Verify#

Start etcd on all three nodes, then verify:

systemctl enable --now etcd

# Check cluster health
etcdctl --endpoints=http://${IP_ETCD1}:2379,http://${IP_ETCD2}:2379,http://${IP_ETCD3}:2379 \
  endpoint health

# Check member list
etcdctl member list --write-out=table

6. TLS Configuration#

6.1 Generate Certificates#

Use cfssl or openssl to create a CA and per-node certificates. Each node needs:

ca.pem - CA certificate
<node>.pem - server certificate (SAN must include node IP and 127.0.0.1)
<node>-key.pem - server private key
client.pem / client-key.pem - client certificate for etcdctl
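
One possible way to produce the CA and a node certificate with openssl is sketched below. The scratch directory, subject names, RSA 2048-bit keys, and 365-day validity are illustrative assumptions, not recommendations; the file names follow the list above. A client certificate can be issued the same way with its own subject.

```shell
set -eu
PKI_DIR="$(mktemp -d)"   # scratch directory for the demo; /etc/etcd/pki in practice
NODE="etcd1"
NODE_IP="192.0.2.11"

# 1. Self-signed CA (key and certificate)
openssl req -x509 -newkey rsa:2048 -nodes -days 365 -subj "/CN=etcd-ca" \
  -keyout "$PKI_DIR/ca-key.pem" -out "$PKI_DIR/ca.pem" 2>/dev/null

# 2. Node private key and certificate signing request
openssl req -newkey rsa:2048 -nodes -subj "/CN=$NODE" \
  -keyout "$PKI_DIR/$NODE-key.pem" -out "$PKI_DIR/$NODE.csr" 2>/dev/null

# 3. Sign the request, adding the node IP and 127.0.0.1 as SANs.
#    Both usages are set because the same certificate serves peers and clients.
printf 'subjectAltName=IP:%s,IP:127.0.0.1\nextendedKeyUsage=serverAuth,clientAuth\n' \
  "$NODE_IP" > "$PKI_DIR/$NODE.ext"
openssl x509 -req -in "$PKI_DIR/$NODE.csr" -days 365 \
  -CA "$PKI_DIR/ca.pem" -CAkey "$PKI_DIR/ca-key.pem" -CAcreateserial \
  -extfile "$PKI_DIR/$NODE.ext" -out "$PKI_DIR/$NODE.pem"

# 4. Confirm the certificate chains to the CA
openssl verify -CAfile "$PKI_DIR/ca.pem" "$PKI_DIR/$NODE.pem"
```

Keep ca-key.pem off the etcd nodes; only ca.pem and the node's own certificate and key need to be deployed.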

6.2 etcd Configuration with TLS#

cat > /etc/etcd/etcd.conf <<EOF
# Environment file (KEY="value" lines); section labels are comments only.

# [member]
ETCD_NAME="${CURRENT_NODE}"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="https://${CURRENT_IP}:2380"
ETCD_LISTEN_CLIENT_URLS="https://${CURRENT_IP}:2379,https://127.0.0.1:2379"

# [cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://${CURRENT_IP}:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://${CURRENT_IP}:2379"
ETCD_INITIAL_CLUSTER="etcd1=https://${IP_ETCD1}:2380,etcd2=https://${IP_ETCD2}:2380,etcd3=https://${IP_ETCD3}:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="${CLUSTER_TOKEN}"

# [security]
ETCD_CERT_FILE="/etc/etcd/pki/${CURRENT_NODE}.pem"
ETCD_KEY_FILE="/etc/etcd/pki/${CURRENT_NODE}-key.pem"
ETCD_TRUSTED_CA_FILE="/etc/etcd/pki/ca.pem"
ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CERT_FILE="/etc/etcd/pki/${CURRENT_NODE}.pem"
ETCD_PEER_KEY_FILE="/etc/etcd/pki/${CURRENT_NODE}-key.pem"
ETCD_PEER_TRUSTED_CA_FILE="/etc/etcd/pki/ca.pem"
ETCD_PEER_CLIENT_CERT_AUTH="true"
EOF

6.3 etcdctl with TLS#

export ETCDCTL_CACERT="/etc/etcd/pki/ca.pem"
export ETCDCTL_CERT="/etc/etcd/pki/client.pem"
export ETCDCTL_KEY="/etc/etcd/pki/client-key.pem"
export ETCDCTL_ENDPOINTS="https://${IP_ETCD1}:2379,https://${IP_ETCD2}:2379,https://${IP_ETCD3}:2379"

etcdctl endpoint health

7. Authentication and RBAC#

7.1 Enable Authentication#

# Create the root user (required before enabling auth)
etcdctl user add root
# Enter password when prompted

# Enable authentication
etcdctl auth enable

7.2 Create Roles and Users#

# Create a read-only role for a specific prefix
etcdctl role add app-reader
etcdctl role grant-permission app-reader read /app/ --prefix

# Create a read-write role
etcdctl role add app-writer
etcdctl role grant-permission app-writer readwrite /app/ --prefix

# Create a user and assign a role
etcdctl user add appuser
etcdctl user grant-role appuser app-writer

# Verify
etcdctl user get appuser
etcdctl role get app-writer

7.3 Authenticate with etcdctl#

# Password-based
etcdctl --user=appuser --password=<password> put /app/key value

# Or set environment variables
export ETCDCTL_USER="appuser:<password>"
etcdctl put /app/key value

8. Basic Operations#

8.1 Key-Value CRUD#

# Put a key
etcdctl put /app/config/db_host "192.0.2.50"

# Get a key
etcdctl get /app/config/db_host

# Get all keys with a prefix
etcdctl get /app/config/ --prefix

# Get only values (no keys)
etcdctl get /app/config/ --prefix --print-value-only

# Delete a key
etcdctl del /app/config/db_host

# Delete all keys with a prefix
etcdctl del /app/config/ --prefix

8.2 Leases#

# Grant a lease with 60-second TTL
etcdctl lease grant 60
# Returns: lease 694d8257011ccb0a granted with TTL(60s)

# Attach a key to a lease
etcdctl put /services/web/node1 "alive" --lease=694d8257011ccb0a

# Keep-alive (renew indefinitely)
etcdctl lease keep-alive 694d8257011ccb0a

# Revoke a lease (deletes all attached keys)
etcdctl lease revoke 694d8257011ccb0a

# List active leases
etcdctl lease list
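
In scripts the lease ID has to be captured from the `lease grant` output rather than copied by hand. A minimal sketch, assuming the output format shown above; `parse_lease_id` is a hypothetical helper name.

```shell
# parse_lease_id: reads `etcdctl lease grant` output on stdin
# and prints the lease ID (the second field).
parse_lease_id() {
  awk '$1 == "lease" && $3 == "granted" { print $2 }'
}

# Demonstration using the sample output from this section.
echo 'lease 694d8257011ccb0a granted with TTL(60s)' | parse_lease_id

# Against a running cluster this would become:
#   LEASE_ID="$(etcdctl lease grant 60 | parse_lease_id)"
#   etcdctl put /services/web/node1 "alive" --lease="$LEASE_ID"
```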

8.3 Compaction and Defragmentation#

# Get current revision
etcdctl endpoint status --write-out=json | python3 -c "import sys,json; print(json.load(sys.stdin)[0]['Status']['header']['revision'])"

# Compact history up to a revision (frees storage)
etcdctl compact <revision>

# Defragment to reclaim disk space. Defrag acts on one member at a time:
# repeat per endpoint, or pass --cluster to cover every member
etcdctl defrag --endpoints=http://${IP_ETCD1}:2379
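
Compaction frees space logically, while defragmentation returns it to the filesystem, so the gap between allocated and in-use size shows how much a defrag could recover. The sketch below assumes the two numbers come from the dbSize and dbSizeInUse fields of `etcdctl endpoint status --write-out=json`; verify those field names against your version. `reclaimable_pct` is a hypothetical helper.

```shell
# reclaimable_pct DB_SIZE DB_SIZE_IN_USE
# Prints the whole-number percentage of the backend file that is allocated
# but not in use, i.e. what a defragmentation could return.
reclaimable_pct() {
  awk -v total="$1" -v used="$2" 'BEGIN {
    if (total <= 0) { print 0; exit }
    printf "%d\n", (total - used) * 100 / total
  }'
}

# Example: 1,000,000 bytes allocated, 250,000 bytes in use
reclaimable_pct 1000000 250000
```

The threshold at which a defrag is worth scheduling is a local decision, since defragmentation blocks the member while it runs.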

9. Watch Mechanism#

The watch API allows clients to receive real-time notifications when keys change.

9.1 Basic Watch#

# Watch a single key (blocks until a change occurs)
etcdctl watch /app/config/db_host

# In another terminal, trigger the watch:
etcdctl put /app/config/db_host "192.0.2.51"

9.2 Prefix Watch#

# Watch all keys under /app/config/
etcdctl watch /app/config/ --prefix

9.3 Watch from a Specific Revision#

# Replay all changes since revision 42
etcdctl watch /app/config/ --prefix --rev=42

9.4 Watch with Filters#

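# Event filters belong to the client API: in Go, clientv3.WithFilterDelete()
# drops DELETE events (leaving only PUTs) and clientv3.WithFilterPut() drops
# PUT events (leaving only DELETEs).
# Run `etcdctl watch --help` to confirm which flags your etcdctl version
# supports before relying on --filter-put or --filter-delete.

# Without a filter, etcdctl prints the event type (PUT or DELETE) before each key
etcdctl watch /app/config/ --prefix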

9.5 Programmatic Watch (Go Example)#

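// Assumes cli is a connected *clientv3.Client (go.etcd.io/etcd/client/v3)
// and that context, fmt, and log are imported.
watcher := clientv3.NewWatcher(cli)
defer watcher.Close()
watchChan := watcher.Watch(context.Background(), "/app/config/", clientv3.WithPrefix())

for resp := range watchChan {
    // A canceled or compacted watch reports its error on the response.
    if err := resp.Err(); err != nil {
        log.Printf("watch error: %v", err)
        break
    }
    for _, event := range resp.Events {
        fmt.Printf("Type: %s, Key: %s, Value: %s\n",
            event.Type, event.Kv.Key, event.Kv.Value)
    }
}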

10. Backup and Restore#

10.1 Snapshot Backup#

# Take a snapshot from a healthy member
etcdctl snapshot save /backup/etcd-$(date +%Y%m%d-%H%M%S).snap

# Verify the snapshot (in v3.5+, etcdutl replaces the deprecated etcdctl snapshot status)
etcdutl snapshot status /backup/etcd-20260322-120000.snap --write-out=table

10.2 Automated Backup Script#

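#!/bin/bash
# /usr/local/bin/etcd-backup.sh
# Saves a timestamped snapshot, then prunes snapshots older than RETENTION_DAYS.
set -u

BACKUP_DIR="/backup/etcd"
RETENTION_DAYS=7
SNAP="${BACKUP_DIR}/etcd-$(date +%Y%m%d-%H%M%S).snap"

mkdir -p "${BACKUP_DIR}"

# Prune old snapshots only after the new one has been written successfully
if etcdctl snapshot save "${SNAP}"; then
    echo "Backup successful: ${SNAP}"
    find "${BACKUP_DIR}" -name "*.snap" -mtime +"${RETENTION_DAYS}" -delete
else
    echo "Backup FAILED" >&2
    exit 1
fi

# Not part of the script: run once as root to install it and schedule it daily at 02:00
chmod +x /usr/local/bin/etcd-backup.sh
echo "0 2 * * * root /usr/local/bin/etcd-backup.sh" > /etc/cron.d/etcd-backup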
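
The script above prunes by age. An alternative is to keep a fixed number of snapshots, which bounds disk usage regardless of how often backups run. The sketch below is one way to do that; `prune_keep_newest` is a hypothetical helper, and it relies on the timestamped file names sorting in chronological order.

```shell
# prune_keep_newest DIR KEEP
# Deletes all but the KEEP newest etcd-*.snap files in DIR. The
# YYYYmmdd-HHMMSS stamp in each name sorts lexicographically by time.
prune_keep_newest() (
  dir="$1"; keep="$2"
  ls -1 "$dir"/etcd-*.snap 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r snap; do
      rm -f -- "$snap"
    done
)

# Demonstration in a scratch directory with empty placeholder files.
demo_dir="$(mktemp -d)"
for stamp in 20260318-020000 20260319-020000 20260320-020000 \
             20260321-020000 20260322-020000; do
  : > "$demo_dir/etcd-$stamp.snap"
done
prune_keep_newest "$demo_dir" 2
ls -1 "$demo_dir"
```

After the call only the two most recent placeholder files remain in the scratch directory.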

10.3 Restore from Snapshot#

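Restore must be performed on every cluster member, using the same snapshot file on each. etcdutl writes a new data directory and refuses to overwrite an existing one, so move the old directory aside first.

# Stop etcd on all nodes
systemctl stop etcd

# On each node, move the old data directory aside, then restore into a fresh one
mv /var/lib/etcd "/var/lib/etcd.bak-$(date +%Y%m%d-%H%M%S)"
etcdutl snapshot restore /backup/etcd-20260322-120000.snap \
  --name="${CURRENT_NODE}" \
  --initial-cluster="etcd1=https://${IP_ETCD1}:2380,etcd2=https://${IP_ETCD2}:2380,etcd3=https://${IP_ETCD3}:2380" \
  --initial-cluster-token="${CLUSTER_TOKEN}" \
  --initial-advertise-peer-urls="https://${CURRENT_IP}:2380" \
  --data-dir=/var/lib/etcd

# The restore ran as root; hand the directory back to the service user
chown -R etcd:etcd /var/lib/etcd

# Start etcd on all nodes
systemctl start etcd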

11. Performance Tuning#

11.1 Storage#

Parameter	Default	Recommended	Notes
`--quota-backend-bytes`	2 GB	Up to 8 GB	Raise if the `NOSPACE` alarm fires
`--auto-compaction-mode`	`periodic`	`periodic`	`periodic` keeps a time window of history; `revision` keeps a number of revisions
`--auto-compaction-retention`	`0` (disabled)	`1h` (periodic) or `1000` (revision)	Amount of history to keep; a non-zero value enables auto-compaction and prevents unbounded history growth
`--snapshot-count`	100000	10000-100000	Lower = more frequent snapshots

11.2 Disk I/O#

etcd is extremely sensitive to disk latency. The WAL and snapshot storage must be on fast disks.

# Use a dedicated SSD for the data directory
# Measure fdatasync latency on that disk (99th percentile should be < 10 ms);
# fio writes a temporary test file into the target directory
fio --name=test --rw=write --fdatasync=1 --size=22m --bs=2300 --directory=/var/lib/etcd

Use ionice -c2 -n0 for the etcd process if competing for I/O
Avoid co-locating etcd with other disk-heavy workloads
Consider separate disks for --wal-dir and --data-dir

11.3 Network#

# Increase heartbeat interval for high-latency networks (values in milliseconds)
--heartbeat-interval=250     # default 100
--election-timeout=2500      # default 1000; must be at least 5x the heartbeat (the defaults use 10x)
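
Before rolling out new timing values, the ratio between the two settings can be checked mechanically. This sketch tests only the 5x lower bound noted above; `check_raft_timing` is a hypothetical helper, not an etcd tool.

```shell
# check_raft_timing HEARTBEAT_MS ELECTION_MS
# Succeeds when the election timeout is at least 5x the heartbeat interval.
check_raft_timing() {
  if [ "$2" -ge $(( $1 * 5 )) ]; then
    echo "ok: election-timeout=$2 is $(( $2 / $1 ))x heartbeat-interval=$1"
  else
    echo "too low: election-timeout=$2 < 5 x heartbeat-interval=$1" >&2
    return 1
  fi
}

check_raft_timing 250 2500
```

Use the same values on every member; mixed timing settings across a cluster make elections harder to reason about.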

11.4 File Descriptors#

# In the systemd unit
LimitNOFILE=65536

12. Monitoring#

12.1 Key Metrics#

# Endpoint status (leader, DB size, raft index)
etcdctl endpoint status --write-out=table

# Endpoint health
etcdctl endpoint health --write-out=table

# Member list
etcdctl member list --write-out=table

12.2 Prometheus Metrics#

etcd exposes Prometheus metrics on the client URL at /metrics:

curl -s http://${CURRENT_IP}:2379/metrics | grep etcd_server

Key metrics to monitor:

Metric	Purpose
`etcd_server_has_leader`	1 if member sees a leader
`etcd_server_leader_changes_seen_total`	Frequent changes indicate instability
`etcd_disk_wal_fsync_duration_seconds`	WAL fsync latency (p99 < 10ms)
`etcd_disk_backend_commit_duration_seconds`	Backend commit latency
`etcd_network_peer_round_trip_time_seconds`	Peer RTT
`etcd_mvcc_db_total_size_in_bytes`	Database size on disk
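
For a quick check without a full Prometheus setup, a metric can be read straight from the /metrics text. The sketch below tests etcd_server_has_leader; `has_leader` is a hypothetical helper, and the sample input is a trimmed illustration rather than real output.

```shell
# has_leader: reads Prometheus text-format metrics on stdin and succeeds
# only if etcd_server_has_leader is present with value 1.
has_leader() {
  awk '$1 == "etcd_server_has_leader" { seen = 1; value = $2 }
       END { exit !(seen && value == 1) }'
}

# Trimmed, illustrative sample of /metrics output.
sample='# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1'

if printf '%s\n' "$sample" | has_leader; then
  echo "leader present"
else
  echo "NO LEADER" >&2
fi
```

Against a live member the input would come from curl instead, for example `curl -s http://${CURRENT_IP}:2379/metrics | has_leader`.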

12.3 Alarms#

# Check for active alarms (e.g., NOSPACE)
etcdctl alarm list

# Resolve after compaction + defrag
etcdctl alarm disarm
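
Clearing a NOSPACE alarm starts with finding the current revision to compact to. The sketch below extracts it from the endpoint status JSON using only grep; `extract_revision` is a hypothetical helper and the sample JSON is a trimmed illustration of the real output.

```shell
# extract_revision: reads `etcdctl endpoint status --write-out=json` on stdin
# and prints the first "revision" value it finds.
extract_revision() {
  grep -o '"revision":[0-9]*' | head -n 1 | grep -o '[0-9][0-9]*'
}

# Trimmed, illustrative sample of the JSON; real output has more fields.
sample='[{"Endpoint":"127.0.0.1:2379","Status":{"header":{"revision":1516,"raft_term":2}}}]'
rev="$(printf '%s\n' "$sample" | extract_revision)"
echo "would compact up to revision ${rev}"

# Against a live cluster the recovery sequence would then be:
#   etcdctl compact "$rev"
#   etcdctl defrag
#   etcdctl alarm disarm
```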

13. Troubleshooting#

Issue	Cause	Solution
`NOSPACE` alarm fires	Database exceeded `--quota-backend-bytes`	Compact old revisions, defrag, then `etcdctl alarm disarm`; raise quota if needed
Leader election loops	Disk too slow for WAL fsync	Move data dir to SSD; increase `--heartbeat-interval` and `--election-timeout`
Cluster cannot form	Mismatched `--initial-cluster` or token	Ensure identical `--initial-cluster` on all nodes; wipe data dir and bootstrap fresh
`context deadline exceeded` on operations	Network partition or overloaded leader	Check network connectivity; verify leader health with `endpoint status`
Member shows unhealthy	Member fell behind on Raft log	Restart the member; if data is corrupt, remove and re-add it
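Authentication locked out	Forgot root password	`--auth-token` only selects the token type (`simple` or `jwt`) and does not bypass auth. If `--client-cert-auth` is enabled, issue a client certificate with CN `root` from the cluster CA and use it to run `etcdctl user passwd root`; otherwise the usual fallback is restoring a snapshot taken before auth was enabled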
High memory usage	Too many watchers or large key space	Monitor watcher count; compact + defrag; consider splitting into multiple clusters
Snapshot restore fails	Version mismatch or corrupt snapshot	Use `etcdutl` (not `etcdctl`) for v3.5+; verify snapshot with `snapshot status`