Docker Swarm

Native Docker clustering and orchestration tool that turns a pool of Docker hosts into a single virtual host with service scheduling, scaling, and rolling updates.

Deprecated: Docker Swarm is in maintenance-only mode and receives no new features. For new container orchestration deployments, use Kubernetes or Docker Compose (for single-host setups). Existing Swarm clusters continue to function, but migration planning is recommended.

Table of Contents

1. Overview
2. Installation
3. Firewall Configuration
4. Cluster Setup
- 4.1 Initializing the Swarm
- 4.2 Joining Nodes
- 4.3 Draining Manager Nodes
5. Cheat Sheet
- 5.1 Swarm Commands
- 5.2 Node Commands
- 5.3 Service Commands
- 5.4 Stack Commands
- 5.5 Secret and Config Commands
6. Troubleshooting

1. Overview

Docker Swarm mode is built into the Docker Engine and provides native orchestration capabilities:

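- Declarative service model: Define the desired state of a service and Swarm continuously reconciles the cluster to match it
- Scaling: Change the number of replicas with a single command
- Rolling updates: Replace service tasks incrementally, with optional automatic rollback on failure
- Service discovery: Built-in DNS-based service discovery and load balancing
- Mutual TLS: Nodes authenticate each other and encrypt their communication with automatically issued certificates
- Secrets management: Encrypted storage and distribution of sensitive data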

Node roles:

Role	Description
Manager	Orchestrates the cluster, schedules tasks, maintains the Raft consensus state. Odd numbers recommended (3 or 5).
Worker	Executes container workloads. No access to cluster management.
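
The recommendation for odd manager counts follows from Raft's majority rule: a swarm with N managers needs floor(N/2) + 1 of them reachable, so it tolerates floor((N-1)/2) manager failures. The sketch below only does that arithmetic; the function names are illustrative and are not Docker commands.

```shell
# Raft quorum arithmetic for a swarm with N manager nodes.
# quorum          = floor(N / 2) + 1   (managers that must stay reachable)
# fault tolerance = floor((N - 1) / 2) (managers that may fail)
quorum() {
  echo $(( $1 / 2 + 1 ))
}

fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

# Print a small table for common manager counts.
for n in 1 2 3 4 5 7; do
  printf '%s managers: quorum %s, tolerates %s failure(s)\n' \
    "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

Going from 3 to 4 managers raises the quorum from 2 to 3 without tolerating any additional failure, which is why even counts are avoided.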

2. Installation

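Docker Swarm mode is included with Docker Engine, so installing Docker is sufficient. On Arch Linux the `docker` package comes from the distribution repositories; the `docker-ce` packages for Debian, Ubuntu, RHEL, and Fedora come from Docker's official package repository, which must be configured first:

# Arch Linux
sudo pacman -S docker

# Debian / Ubuntu (requires Docker's apt repository)
sudo apt install docker-ce docker-ce-cli containerd.io

# RHEL / Fedora (requires Docker's dnf repository)
sudo dnf install docker-ce docker-ce-cli containerd.io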

Enable and start Docker:

sudo systemctl enable --now docker

3. Firewall Configuration

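Swarm requires the following ports to be open between nodes. Port 2377 only needs to accept connections on manager nodes; the other ports are needed on every node: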

Port	Protocol	Purpose
2377	TCP	Cluster management and Raft consensus
7946	TCP + UDP	Node-to-node communication (gossip)
4789	UDP	VXLAN overlay network traffic (default data path port)

iptables

# Manager nodes only
sudo iptables -A INPUT -p tcp --dport 2377 -j ACCEPT

# All nodes (managers and workers)
sudo iptables -A INPUT -p tcp --dport 7946 -j ACCEPT
sudo iptables -A INPUT -p udp --dport 7946 -j ACCEPT
sudo iptables -A INPUT -p udp --dport 4789 -j ACCEPT

# Persist rules (Debian/Ubuntu)
sudo apt install iptables-persistent
sudo netfilter-persistent save

# Persist rules (RHEL/Fedora)
sudo dnf install iptables-services
sudo service iptables save

firewalld

sudo firewall-cmd --permanent --zone=public --add-port=2377/tcp
sudo firewall-cmd --permanent --zone=public --add-port=7946/tcp
sudo firewall-cmd --permanent --zone=public --add-port=7946/udp
sudo firewall-cmd --permanent --zone=public --add-port=4789/udp
sudo firewall-cmd --reload

nftables

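# Merge these rules into the input chain of your existing ruleset;
# the chain's hook and policy declarations are omitted here.
table inet filter {
  chain input {
    tcp dport 2377 accept comment "Swarm management"
    tcp dport 7946 accept comment "Swarm gossip TCP"
    udp dport 7946 accept comment "Swarm gossip UDP"
    udp dport 4789 accept comment "Swarm VXLAN overlay"
  }
}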

ufw

sudo ufw allow 2377/tcp
sudo ufw allow 7946/tcp
sudo ufw allow 7946/udp
sudo ufw allow 4789/udp

Note: If you run in a VMware NSX environment, use a custom data path port (e.g., --data-path-port 14789) to avoid conflicts with NSX's own VXLAN traffic on port 4789. Open that custom port instead.
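
When a custom data path port is in use, only the last rule in each firewall example changes. As a minimal sketch, the helper below prints the port list for a given data path port and the matching ufw commands for review; `swarm_ports` and `print_ufw_rules` are hypothetical names, and nothing is applied to the firewall.

```shell
# Print the port/protocol pairs Swarm needs, one per line.
# $1: data path (VXLAN) port; defaults to 4789 when omitted.
swarm_ports() {
  data_path_port="${1:-4789}"
  printf '%s\n' "2377/tcp" "7946/tcp" "7946/udp" "${data_path_port}/udp"
}

# Print (but do not run) the matching ufw commands.
print_ufw_rules() {
  swarm_ports "$1" | while read -r port; do
    echo "sudo ufw allow ${port}"
  done
}
```

For example, `print_ufw_rules 14789` lists the three fixed rules followed by `sudo ufw allow 14789/udp`.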

4. Cluster Setup

4.1 Initializing the Swarm

On the first manager node:

# Basic initialization
docker swarm init

# Specify the advertise address (required on multi-NIC hosts)
docker swarm init --advertise-addr <manager-ip>

# Custom data path port (for VMware NSX or port conflict scenarios)
docker swarm init --data-path-port 14789
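
In provisioning scripts it can help to make initialization safe to re-run. The sketch below checks the node's swarm state with `docker info` before calling `docker swarm init`; the function name `init_swarm_once` and the example address are assumptions made for illustration.

```shell
# Initialize the swarm only if this node is not already part of one.
# $1: advertise address of this manager (for example 192.0.2.10)
init_swarm_once() {
  # LocalNodeState is "inactive" on a node that has not joined a swarm.
  state="$(docker info --format '{{.Swarm.LocalNodeState}}')"
  if [ "$state" = "inactive" ]; then
    docker swarm init --advertise-addr "$1"
  else
    echo "Swarm state is '${state}'; skipping init"
  fi
}
```

Calling `init_swarm_once 192.0.2.10` on a fresh host initializes the swarm; on a node that already belongs to one, it only reports the current state.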

4.2 Joining Nodes

After initialization, get the join tokens:

# Get the worker join token
docker swarm join-token worker

# Get the manager join token
docker swarm join-token manager

On each node to join:

# Join as a worker
docker swarm join --token <worker-token> <manager-ip>:2377

# Join as a manager
docker swarm join --token <manager-token> <manager-ip>:2377
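
For scripted joins, `docker swarm join-token -q` prints only the token, which makes it easy to assemble the join command on a manager. This is a sketch under that assumption; `join_command` is a hypothetical helper and the address is a placeholder.

```shell
# Build the join command for a role. Run this on a manager node.
# $1: role (worker or manager)
# $2: manager address that the joining node can reach
join_command() {
  token="$(docker swarm join-token -q "$1")"
  echo "docker swarm join --token ${token} ${2}:2377"
}
```

`join_command worker 192.0.2.10` prints a command that can be run on the new node. Join tokens grant access to the cluster, so treat the output as a secret.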

4.3 Draining Manager Nodes

To prevent manager nodes from running application workloads, set their availability to drain:

docker node update --availability drain <node_name>

This ensures managers focus on cluster management. Existing tasks on the node are rescheduled to active workers.
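
With several managers, the same setting can be applied in a loop. A minimal sketch, assuming it runs on a manager node; `drain_managers` is a hypothetical helper name.

```shell
# Set every manager node to drain availability.
drain_managers() {
  # -q prints node IDs only; the filter keeps manager nodes.
  docker node ls --filter role=manager -q |
    while read -r node; do
      # Redirect stdin so the command cannot consume the ID list.
      docker node update --availability drain "$node" </dev/null
    done
}
```

Afterwards, `docker node ls` should show `Drain` in the availability column for each manager.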

5. Cheat Sheet

5.1 Swarm Commands

Command	Description
`docker swarm init`	Initialize a new swarm
`docker swarm init --advertise-addr <ip>`	Initialize with a specific advertise address
`docker swarm join --token <token> <ip>:2377`	Join a node to the swarm
`docker swarm leave`	Leave the swarm (worker)
`docker swarm leave --force`	Force leave (manager)
`docker swarm join-token worker`	Display the worker join token
`docker swarm join-token manager`	Display the manager join token
`docker swarm join-token --rotate worker`	Rotate the worker join token
`docker swarm unlock`	Unlock a locked swarm after restart
`docker swarm unlock-key`	Display the unlock key
`docker swarm unlock-key --rotate`	Rotate the unlock key
`docker swarm update --autolock=true`	Enable autolock on the swarm
`docker swarm ca`	Display the root CA certificate (add `--rotate` to rotate it)

5.2 Node Commands

Command	Description
`docker node ls`	List all nodes in the swarm
`docker node inspect <node>`	Show detailed node information
`docker node ps <node>`	List tasks running on a node
`docker node promote <node>`	Promote a worker to manager
`docker node demote <node>`	Demote a manager to worker
`docker node rm <node>`	Remove a node from the swarm
`docker node update --availability active <node>`	Set node to accept tasks
`docker node update --availability pause <node>`	Prevent new tasks, keep existing
`docker node update --availability drain <node>`	Reschedule all tasks off the node
`docker node update --label-add <key>=<value> <node>`	Add a label to a node
`docker node update --label-rm <key> <node>`	Remove a label from a node

5.3 Service Commands

Command	Description
`docker service create --name <name> <image>`	Create a new service
`docker service ls`	List all services
`docker service ps <service>`	List tasks (containers) for a service
`docker service inspect <service>`	Show detailed service information
`docker service logs <service>`	Show service logs
`docker service scale <service>=<n>`	Scale to N replicas
`docker service update --image <image>:<tag> <service>`	Update the service image (rolling update)
`docker service update --force <service>`	Force redeployment of all tasks
`docker service rollback <service>`	Roll back to the previous version
`docker service rm <service>`	Remove a service
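
Rolling update behavior is configured when the service is created or updated. The sketch below creates a replicated service that updates one task at a time and rolls back if an update fails; the service name `web`, the image, and the published port are placeholders.

```shell
# Create a service whose updates roll out one task at a time.
create_web_service() {
  docker service create \
    --name web \
    --replicas 3 \
    --update-parallelism 1 \
    --update-delay 10s \
    --update-failure-action rollback \
    --publish published=8080,target=80 \
    nginx:alpine
}
```

A later `docker service update --image <image>:<tag> web` would then replace the three tasks one by one, waiting 10 seconds between them.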

5.4 Stack Commands

Command	Description
`docker stack deploy -c compose.yaml <stack>`	Deploy a stack from a Compose file
`docker stack ls`	List all stacks
`docker stack ps <stack>`	List tasks in a stack
`docker stack services <stack>`	List services in a stack
`docker stack rm <stack>`	Remove a stack
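
A stack is described by a Compose file whose `deploy` section carries the Swarm-specific settings. As an illustration, the helper below writes a minimal file for a hypothetical `demo` stack; the service name, image, and values are assumptions, and the `version` key is included for older Docker CLI releases that expect it.

```shell
# Write a minimal Compose file for a hypothetical "demo" stack.
# $1: target directory (defaults to the current directory)
write_demo_stack() {
  cat > "${1:-.}/compose.yaml" <<'EOF'
version: "3.8"
services:
  web:
    image: nginx:alpine
    ports:
      - "8080:80"
    deploy:
      replicas: 2
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
EOF
}
```

After `write_demo_stack .`, the stack can be deployed with `docker stack deploy -c compose.yaml demo` and inspected with `docker stack services demo`.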

5.5 Secret and Config Commands

Command	Description
`echo "secret" | docker secret create <name> -`	Create a secret from stdin
`docker secret create <name> <file>`	Create a secret from a file
`docker secret ls`	List all secrets
`docker secret inspect <name>`	Inspect a secret (metadata only)
`docker secret rm <name>`	Remove a secret
`docker config create <name> <file>`	Create a config from a file
`docker config ls`	List all configs
`docker config rm <name>`	Remove a config
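
Note that `echo` appends a newline, which becomes part of the secret. A minimal sketch that avoids this with `printf`; `create_secret` is a hypothetical helper name.

```shell
# Create a secret from a value without adding a trailing newline.
# $1: secret name
# $2: secret value
create_secret() {
  printf '%s' "$2" | docker secret create "$1" -
}
```

A service started with `--secret <name>` can read the value from `/run/secrets/<name>` inside its containers.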

6. Troubleshooting

Issue	Cause	Solution
`Error response from daemon: This node is not a swarm manager`	Command run on a worker node	SSH to a manager node or promote this node
Node shows as `Down` in `docker node ls`	Network issue or Docker stopped on the node	Verify connectivity on ports 2377 and 7946; restart Docker on the node
`Swarm is encrypted and needs to be unlocked before it can be used` after a manager restart	Swarm autolock enabled	Run `docker swarm unlock` and enter the unlock key
Service stuck in `Pending` state	No node meets constraints or resources exhausted	Check `docker service ps <service>` for error messages; verify node labels and resource availability
Overlay network unreachable between nodes	Firewall blocking UDP 4789 (or custom data path port)	Open the VXLAN port on all nodes
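`rpc error: transport is closing`	Raft quorum lost (a majority of managers is unreachable)	Bring the failed managers back online if possible; otherwise run `docker swarm init --force-new-cluster` on a surviving manager, then add managers again to restore fault tolerance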
Rolling update fails	New image crashes or health check fails	Check `docker service ps <service>` for task errors; `docker service rollback <service>` to revert
`network not found` during stack deploy	Network from a previous deployment was not cleaned up	Remove the old stack first: `docker stack rm <stack>`, wait, then redeploy
Tasks rescheduled after node drain	Expected behavior	Drain triggers task migration; set node to `active` when maintenance is complete
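
Several rows above rely on `docker service ps` output, which truncates error text by default. The sketch below prints the full task state and error for a service; `task_errors` is a hypothetical helper name.

```shell
# Show task state and untruncated error messages for a service.
# $1: service name
task_errors() {
  docker service ps --no-trunc \
    --format 'table {{.Name}}\t{{.Node}}\t{{.CurrentState}}\t{{.Error}}' \
    "$1"
}
```

For a service stuck in `Pending`, the error column typically names the unmet constraint or the missing resource.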