Distributed, strongly consistent key-value store used as the backbone for Kubernetes cluster state, service discovery, and distributed configuration management.
Addresses below are RFC 5737 documentation ranges or placeholders - swap in your own.
Table of Contents#
- Overview
- Architecture and Concepts
- Installation
- Single-Node Setup
- Three-Node Cluster Setup
- TLS Configuration
- Authentication and RBAC
- Basic Operations
- Watch Mechanism
- Backup and Restore
- Performance Tuning
- Monitoring
- Troubleshooting
- See Also
- Sources
1. Overview#
etcd is a distributed key-value store built on the Raft consensus algorithm. Its strong consistency guarantees make it the default cluster state store for Kubernetes, which keeps all cluster configuration, secrets, and service discovery data in it. etcd also provides coordination primitives (leases, locks, elections). Keys live in a flat key space that is conventionally organized into a hierarchy with / separators.
Key characteristics:
- Strong consistency - linearizable reads and writes via Raft
- Watch support - clients subscribe to key changes in real time
- Lease mechanism - TTL-based key expiration for ephemeral data
- Transaction support - atomic compare-and-swap operations
- Recommended cluster size - 3 or 5 nodes (odd numbers for quorum)
2. Architecture and Concepts#
2.1 Raft Consensus#
etcd uses Raft to replicate a write-ahead log (WAL) across all cluster members. One node is elected leader; all writes go through the leader and are committed once a majority (quorum) acknowledges.
- 3-node cluster - tolerates 1 failure (quorum = 2)
- 5-node cluster - tolerates 2 failures (quorum = 3)
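The quorum figures above follow from simple majority arithmetic. A minimal sketch, assuming POSIX sh; the function names quorum_size and failures_tolerated are illustrative and not part of etcd:

```shell
#!/bin/sh
# Quorum is a strict majority: floor(n / 2) + 1 members.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Failures tolerated is whatever is left once a quorum is set aside.
failures_tolerated() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print a small table for clusters of 1 to 7 members.
for n in 1 2 3 4 5 6 7; do
  printf '%s members: quorum=%s, tolerates=%s\n' \
    "$n" "$(quorum_size "$n")" "$(failures_tolerated "$n")"
done
```

A 4-member cluster tolerates the same single failure as a 3-member one, which is why odd sizes are recommended.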
2.2 Key-Value Store#
Data is organized in a flat key space with byte-string keys. The / convention creates a logical hierarchy (for example, /app/config/db_host). Each key has a creation revision, modification revision, and version counter.
2.3 Leases#
A lease is a time-to-live (TTL) grant. Keys attached to a lease are automatically deleted when the lease expires. Clients send periodic keep-alive requests to renew leases. This is the foundation for leader election and service registration.
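Service registration with a lease might be sketched as follows. This is an illustration, not a reference implementation: register_service is a hypothetical helper, and it assumes etcdctl prints the grant result in the form "lease <id> granted with TTL(<n>s)".

```shell
#!/bin/sh
# Sketch of lease-based service registration.
# Usage: register_service KEY VALUE TTL_SECONDS  (prints the lease ID on success)
register_service() {
  key=$1 value=$2 ttl=$3
  # Assumed output format: lease <id> granted with TTL(<n>s)
  lease_id=$(etcdctl lease grant "$ttl" | awk '$1 == "lease" {print $2}')
  if [ -z "$lease_id" ]; then
    echo "could not obtain a lease" >&2
    return 1
  fi
  # The key is deleted automatically if the lease is not renewed in time.
  etcdctl put "$key" "$value" --lease="$lease_id" >/dev/null || return 1
  echo "$lease_id"
}

# Typical use (needs a reachable etcd, so it is left commented out):
#   id=$(register_service /services/web/node1 alive 60) || exit 1
#   etcdctl lease keep-alive "$id"   # blocks and renews until interrupted
```

If the registering process dies, the keep-alive stops and the key vanishes after the TTL, so consumers never see a stale entry for long.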
2.4 Transactions#
etcd supports multi-key atomic transactions using an if/then/else model:
etcdctl txn --interactive
# compares: value("key1") = "val1"
# success: put key2 "val2"
# failure: put key2 "failed"
3. Installation#
3.1 Debian/Ubuntu#
sudo apt update
sudo apt install -y etcd
3.2 RHEL/Rocky/Alma#
sudo dnf install -y etcd
3.3 Binary Install (Any Distribution)#
ETCD_VER="v3.5.17"
DOWNLOAD_URL="https://github.com/etcd-io/etcd/releases/download"
curl -fsSL "${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz" \
| sudo tar -xz -C /usr/local/bin --strip-components=1 \
etcd-${ETCD_VER}-linux-amd64/etcd \
etcd-${ETCD_VER}-linux-amd64/etcdctl \
etcd-${ETCD_VER}-linux-amd64/etcdutl
etcd --version
etcdctl version
4. Single-Node Setup#
Useful for development and testing:
# Start with an explicit data directory (without --data-dir, etcd writes to ./default.etcd)
etcd --data-dir /var/lib/etcd
# Verify from a second terminal
etcdctl put hello world
etcdctl get hello
For a production single-node deployment, create a systemd unit:
cat > /etc/systemd/system/etcd.service <<'EOF'
[Unit]
Description=etcd key-value store
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
User=etcd
ExecStart=/usr/local/bin/etcd \
--data-dir=/var/lib/etcd \
--listen-client-urls=http://127.0.0.1:2379 \
--advertise-client-urls=http://127.0.0.1:2379
Restart=always
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
useradd -r -s /sbin/nologin etcd
mkdir -p /var/lib/etcd && chown etcd:etcd /var/lib/etcd
systemctl daemon-reload
systemctl enable --now etcd
5. Three-Node Cluster Setup#
5.1 Environment Variables#
Set these on each node, adjusting CURRENT_NODE and CURRENT_IP:
export CURRENT_NODE="etcd1"
export CURRENT_IP="192.0.2.11"
export IP_ETCD1="192.0.2.11"
export IP_ETCD2="192.0.2.12"
export IP_ETCD3="192.0.2.13"
export CLUSTER_TOKEN="my-etcd-cluster"
5.2 Firewall#
# Client communication
firewall-cmd --permanent --add-port=2379/tcp
# Peer communication
firewall-cmd --permanent --add-port=2380/tcp
firewall-cmd --reload
5.3 Configuration#
cat > /etc/etcd/etcd.conf <<EOF
[member]
ETCD_NAME="${CURRENT_NODE}"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="http://${CURRENT_IP}:2380"
ETCD_LISTEN_CLIENT_URLS="http://${CURRENT_IP}:2379,http://127.0.0.1:2379"
[cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${CURRENT_IP}:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://${CURRENT_IP}:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://${IP_ETCD1}:2380,etcd2=http://${IP_ETCD2}:2380,etcd3=http://${IP_ETCD3}:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="${CLUSTER_TOKEN}"
EOF
5.4 Start and Verify#
Start etcd on all three nodes, then verify:
systemctl enable --now etcd
# Check cluster health
etcdctl --endpoints=http://${IP_ETCD1}:2379,http://${IP_ETCD2}:2379,http://${IP_ETCD3}:2379 \
endpoint health
# Check member list
etcdctl member list --write-out=table
6. TLS Configuration#
6.1 Generate Certificates#
Use cfssl or openssl to create a CA and per-node certificates. Each node needs:
- ca.pem - CA certificate
- <node>.pem - server certificate (SAN must include node IP and 127.0.0.1)
- <node>-key.pem - server private key
- client.pem / client-key.pem - client certificate for etcdctl
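The SAN requirement is the part most often missed. As a hedged sketch for the openssl route, the helper below (node_csr_config, a hypothetical name) prints a request config carrying the node IP and 127.0.0.1. The key usage lines are an assumption that the same certificate serves both client and peer traffic, as in the configuration in the next section.

```shell
#!/bin/sh
# Print an OpenSSL request config for one etcd node.
# Usage: node_csr_config NODE_NAME NODE_IP
node_csr_config() {
  cat <<EOF
[req]
distinguished_name = dn
req_extensions = v3_req
prompt = no

[dn]
CN = $1

[v3_req]
keyUsage = critical, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth, clientAuth
subjectAltName = @alt_names

[alt_names]
IP.1 = $2
IP.2 = 127.0.0.1
EOF
}

node_csr_config etcd1 192.0.2.11

# Assumed follow-up (file names are illustrative; ca-key.pem is the CA private key):
#   node_csr_config etcd1 192.0.2.11 > etcd1.cnf
#   openssl req -new -newkey rsa:2048 -nodes -keyout etcd1-key.pem \
#     -out etcd1.csr -config etcd1.cnf
#   openssl x509 -req -in etcd1.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial \
#     -out etcd1.pem -days 365 -extensions v3_req -extfile etcd1.cnf
```

Passing -extfile and -extensions when signing matters: without them openssl x509 drops the SAN entries from the issued certificate.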
6.2 etcd Configuration with TLS#
cat > /etc/etcd/etcd.conf <<EOF
[member]
ETCD_NAME="${CURRENT_NODE}"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="https://${CURRENT_IP}:2380"
ETCD_LISTEN_CLIENT_URLS="https://${CURRENT_IP}:2379,https://127.0.0.1:2379"
[cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://${CURRENT_IP}:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://${CURRENT_IP}:2379"
ETCD_INITIAL_CLUSTER="etcd1=https://${IP_ETCD1}:2380,etcd2=https://${IP_ETCD2}:2380,etcd3=https://${IP_ETCD3}:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="${CLUSTER_TOKEN}"
[security]
ETCD_CERT_FILE="/etc/etcd/pki/${CURRENT_NODE}.pem"
ETCD_KEY_FILE="/etc/etcd/pki/${CURRENT_NODE}-key.pem"
ETCD_TRUSTED_CA_FILE="/etc/etcd/pki/ca.pem"
ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CERT_FILE="/etc/etcd/pki/${CURRENT_NODE}.pem"
ETCD_PEER_KEY_FILE="/etc/etcd/pki/${CURRENT_NODE}-key.pem"
ETCD_PEER_TRUSTED_CA_FILE="/etc/etcd/pki/ca.pem"
ETCD_PEER_CLIENT_CERT_AUTH="true"
EOF
6.3 etcdctl with TLS#
export ETCDCTL_CACERT="/etc/etcd/pki/ca.pem"
export ETCDCTL_CERT="/etc/etcd/pki/client.pem"
export ETCDCTL_KEY="/etc/etcd/pki/client-key.pem"
export ETCDCTL_ENDPOINTS="https://${IP_ETCD1}:2379,https://${IP_ETCD2}:2379,https://${IP_ETCD3}:2379"
etcdctl endpoint health
7. Authentication and RBAC#
7.1 Enable Authentication#
# Create the root user (required before enabling auth)
etcdctl user add root
# Enter password when prompted
# Enable authentication
etcdctl auth enable
7.2 Create Roles and Users#
# Create a read-only role for a specific prefix
etcdctl role add app-reader
etcdctl role grant-permission app-reader read /app/ --prefix
# Create a read-write role
etcdctl role add app-writer
etcdctl role grant-permission app-writer readwrite /app/ --prefix
# Create a user and assign a role
etcdctl user add appuser
etcdctl user grant-role appuser app-writer
# Verify
etcdctl user get appuser
etcdctl role get app-writer
7.3 Authenticate with etcdctl#
# Password-based (quote the password so the shell does not interpret special characters)
etcdctl --user=appuser --password='<password>' put /app/key value
# Or set an environment variable
export ETCDCTL_USER="appuser:<password>"
etcdctl put /app/key value
8. Basic Operations#
8.1 Key-Value CRUD#
# Put a key
etcdctl put /app/config/db_host "192.0.2.50"
# Get a key
etcdctl get /app/config/db_host
# Get all keys with a prefix
etcdctl get /app/config/ --prefix
# Get only values (no keys)
etcdctl get /app/config/ --prefix --print-value-only
# Delete a key
etcdctl del /app/config/db_host
# Delete all keys with a prefix
etcdctl del /app/config/ --prefix
8.2 Leases#
# Grant a lease with 60-second TTL
etcdctl lease grant 60
# Returns: lease 694d8257011ccb0a granted with TTL(60s)
# Attach a key to a lease
etcdctl put /services/web/node1 "alive" --lease=694d8257011ccb0a
# Keep-alive (renew indefinitely)
etcdctl lease keep-alive 694d8257011ccb0a
# Revoke a lease (deletes all attached keys)
etcdctl lease revoke 694d8257011ccb0a
# List active leases
etcdctl lease list
8.3 Compaction and Defragmentation#
# Get the current revision
REV=$(etcdctl endpoint status --write-out=json | python3 -c "import sys,json; print(json.load(sys.stdin)[0]['Status']['header']['revision'])")
# Compact history up to that revision (older revisions can no longer be read)
etcdctl compact "${REV}"
# Defragment to reclaim disk space; repeat for each member, one at a time
etcdctl defrag --endpoints=http://${IP_ETCD1}:2379
9. Watch Mechanism#
The watch API allows clients to receive real-time notifications when keys change.
9.1 Basic Watch#
# Watch a single key (blocks until a change occurs)
etcdctl watch /app/config/db_host
# In another terminal, trigger the watch:
etcdctl put /app/config/db_host "192.0.2.51"
9.2 Prefix Watch#
# Watch all keys under /app/config/
etcdctl watch /app/config/ --prefix
9.3 Watch from a Specific Revision#
# Replay all changes since revision 42
etcdctl watch /app/config/ --prefix --rev=42
9.4 Watch with Filters#
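# Event filters belong to the watch API and are applied server-side. Each filter names the event type to drop, not the one to keep.
# etcdctl watch in v3.5 does not list filter flags in its help output; confirm with: etcdctl watch --help
# In the Go client the filters are watch options:
#   clientv3.WithFilterDelete()   delivers only PUT events
#   clientv3.WithFilterPut()      delivers only DELETE events
9.5 Programmatic Watch (Go Example)#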
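// Assumes cli is a connected *clientv3.Client (go.etcd.io/etcd/client/v3)
// and that the context, fmt, and log packages are imported.
watcher := clientv3.NewWatcher(cli)
defer watcher.Close()
watchChan := watcher.Watch(context.Background(), "/app/config/", clientv3.WithPrefix())
for resp := range watchChan {
	// Err reports a canceled watch or a compacted start revision
	if err := resp.Err(); err != nil {
		log.Fatal(err)
	}
	for _, event := range resp.Events {
		fmt.Printf("Type: %s, Key: %s, Value: %s\n",
			event.Type, event.Kv.Key, event.Kv.Value)
	}
}
10. Backup and Restore#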
10.1 Snapshot Backup#
# Take a snapshot from a healthy member
etcdctl snapshot save /backup/etcd-$(date +%Y%m%d-%H%M%S).snap
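# Verify the snapshot (substitute the file name just written; in v3.5+ etcdutl replaces the deprecated etcdctl snapshot status)
etcdutl snapshot status /backup/etcd-20260322-120000.snap --write-out=table
10.2 Automated Backup Script#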
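#!/bin/bash
# /usr/local/bin/etcd-backup.sh
# Cron runs with a minimal environment: export ETCDCTL_ENDPOINTS and the TLS variables here if the cluster needs them.
BACKUP_DIR="/backup/etcd"
RETENTION_DAYS=7
SNAP="${BACKUP_DIR}/etcd-$(date +%Y%m%d-%H%M%S).snap"
mkdir -p "${BACKUP_DIR}"
if etcdctl snapshot save "${SNAP}"; then
  echo "Backup successful: ${SNAP}"
  # Prune snapshots older than the retention window
  find "${BACKUP_DIR}" -name "*.snap" -mtime +"${RETENTION_DAYS}" -delete
else
  echo "Backup FAILED" >&2
  exit 1
fi
# Cron: daily at 02:00 (run these once as root; they are not part of the script)
chmod +x /usr/local/bin/etcd-backup.sh
echo "0 2 * * * root /usr/local/bin/etcd-backup.sh" > /etc/cron.d/etcd-backup
10.3 Restore from Snapshot#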
Restore every cluster member from the same snapshot file. etcdutl refuses to write into a non-empty data directory, so move the existing one aside before restoring.
# Stop etcd on all nodes
systemctl stop etcd
# On each node, move the old data directory out of the way
mv /var/lib/etcd "/var/lib/etcd.bak-$(date +%Y%m%d-%H%M%S)"
# On each node, restore into a new data directory
etcdutl snapshot restore /backup/etcd-20260322-120000.snap \
  --name="${CURRENT_NODE}" \
  --initial-cluster="etcd1=https://${IP_ETCD1}:2380,etcd2=https://${IP_ETCD2}:2380,etcd3=https://${IP_ETCD3}:2380" \
  --initial-cluster-token="${CLUSTER_TOKEN}" \
  --initial-advertise-peer-urls="https://${CURRENT_IP}:2380" \
  --data-dir=/var/lib/etcd
# Return ownership to the etcd service user
chown -R etcd:etcd /var/lib/etcd
# Start etcd on all nodes
systemctl start etcd
11. Performance Tuning#
11.1 Storage#
| Parameter | Default | Recommended | Notes |
|---|---|---|---|
| --quota-backend-bytes | 2 GB | 8 GB max | Raise if the NOSPACE alarm triggers |
| --auto-compaction-mode | periodic | periodic | periodic or revision; has no effect until a retention is set |
| --auto-compaction-retention | 0 (disabled) | 1h (periodic) or 1000 (revision) | Time window or revision count of history to keep |
| --snapshot-count | 100000 | 10000-100000 | Lower = more frequent snapshots |
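--quota-backend-bytes takes a plain byte count, so the sizes in the table need converting. A small sketch, assuming the table's GB figures mean GiB (2 GB = 2147483648 bytes); gib_to_bytes and storage_flags are illustrative helper names:

```shell
#!/bin/sh
# Convert a size in GiB to bytes for --quota-backend-bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print storage flags for a quota (in GiB) and a periodic retention window.
storage_flags() {
  printf '%s\n' \
    "--quota-backend-bytes=$(gib_to_bytes "$1")" \
    "--auto-compaction-mode=periodic" \
    "--auto-compaction-retention=$2"
}

storage_flags 8 1h
```

The printed lines can be appended to the ExecStart command of the systemd unit or translated into the matching ETCD_ environment variables.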
11.2 Disk I/O#
etcd is extremely sensitive to disk latency. The WAL and snapshot storage must be on fast disks.
# Use a dedicated SSD for the data directory
# Measure disk fsync latency (should be < 10ms)
fio --name=test --rw=write --fdatasync=1 --size=22m --bs=2300 --directory=/var/lib/etcd
- Use ionice -c2 -n0 for the etcd process if competing for I/O
- Avoid co-locating etcd with other disk-heavy workloads
- Consider separate disks for --wal-dir and --data-dir
11.3 Network#
# Increase heartbeat interval for high-latency networks
--heartbeat-interval=250 # default 100ms
--election-timeout=2500 # default 1000ms; must be 5-10x heartbeat
11.4 File Descriptors#
# In the systemd unit
LimitNOFILE=65536
12. Monitoring#
12.1 Key Metrics#
# Endpoint status (leader, DB size, raft index)
etcdctl endpoint status --write-out=table
# Endpoint health
etcdctl endpoint health --write-out=table
# Member list
etcdctl member list --write-out=table
12.2 Prometheus Metrics#
etcd exposes Prometheus metrics on the client URL at /metrics:
curl -s http://${CURRENT_IP}:2379/metrics | grep etcd_server
Key metrics to monitor:
| Metric | Purpose |
|---|---|
| etcd_server_has_leader | 1 if member sees a leader |
| etcd_server_leader_changes_seen_total | Frequent changes indicate instability |
| etcd_disk_wal_fsync_duration_seconds | WAL fsync latency (p99 < 10ms) |
| etcd_disk_backend_commit_duration_seconds | Backend commit latency |
| etcd_network_peer_round_trip_time_seconds | Peer RTT |
| etcd_mvcc_db_total_size_in_bytes | Database size on disk |
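For a quick check without a Prometheus server, the text output can be parsed directly. The sketch below is illustrative only: metric_value and has_leader are hypothetical helpers, and the sample input is canned rather than captured from a real member.

```shell
#!/bin/sh
# Read Prometheus text-format metrics on stdin and print the value of one
# unlabelled metric (for example etcd_server_has_leader).
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Succeeds only if the scraped member currently sees a leader.
has_leader() {
  [ "$(metric_value etcd_server_has_leader)" = "1" ]
}

# Canned input for illustration; in practice pipe in the curl output instead:
#   curl -s "http://${CURRENT_IP}:2379/metrics" | has_leader
sample='# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
etcd_server_leader_changes_seen_total 3'

if printf '%s\n' "$sample" | has_leader; then
  echo "leader present"
else
  echo "no leader"
fi
```

Histogram metrics such as etcd_disk_wal_fsync_duration_seconds carry labels and buckets, so they need a real query engine rather than this kind of exact-name match.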
12.3 Alarms#
# Check for active alarms (e.g., NOSPACE)
etcdctl alarm list
# Resolve after compaction + defrag
etcdctl alarm disarm
Troubleshooting#
| Issue | Cause | Solution |
|---|---|---|
| NOSPACE alarm fires | Database exceeded --quota-backend-bytes | Compact old revisions, defrag, then etcdctl alarm disarm; raise quota if needed |
| Leader election loops | Disk too slow for WAL fsync | Move data dir to SSD; increase --heartbeat-interval and --election-timeout |
| Cluster cannot form | Mismatched --initial-cluster or token | Ensure identical --initial-cluster on all nodes; wipe data dir and bootstrap fresh |
| context deadline exceeded on operations | Network partition or overloaded leader | Check network connectivity; verify leader health with endpoint status |
| Member shows unhealthy | Member fell behind on Raft log | Restart the member; if data is corrupt, remove and re-add it |
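| Authentication locked out | Forgot root password | If --client-cert-auth is enabled, connect with a client certificate whose CN is root and run etcdctl user passwd root. Note that --auth-token only selects the token type (simple or jwt) and does not disable auth |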
| High memory usage | Too many watchers or large key space | Monitor watcher count; compact + defrag; consider splitting into multiple clusters |
| Snapshot restore fails | Version mismatch or corrupt snapshot | Use etcdutl (not etcdctl) for v3.5+; verify snapshot with snapshot status |
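The NOSPACE recovery from the first row of the table can be sketched as one function. This is an outline under stated assumptions, not a definitive procedure: current_revision and recover_nospace are hypothetical helpers, ETCDCTL_ENDPOINTS and any TLS variables are assumed to be exported already, and the JSON status output is assumed to contain a compact "revision":<n> field.

```shell
#!/bin/sh
# Sketch of the NOSPACE recovery sequence: compact, defragment, disarm.
current_revision() {
  # Pull the first "revision" field out of the JSON status output.
  etcdctl endpoint status --write-out=json \
    | grep -o '"revision":[0-9]*' | head -n 1 | cut -d: -f2
}

recover_nospace() {
  rev=$(current_revision)
  if [ -z "$rev" ]; then
    echo "could not read current revision" >&2
    return 1
  fi
  etcdctl compact "$rev" || return 1   # discard history older than rev
  etcdctl defrag || return 1           # hand freed space back to the filesystem
  etcdctl alarm disarm || return 1     # clear the alarm so writes resume
  etcdctl alarm list                   # should report no active alarms
}

# Run manually once the cause of the growth is understood:
#   recover_nospace
```

Defragmentation blocks the member while it runs, so on a busy cluster it may be preferable to call etcdctl defrag against one endpoint at a time instead of the whole endpoint list.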