The full writeup behind Three-layer GitOps on K3s, in production. Internal specifics are scrubbed; everything here is generic enough to run on your own gear. Replace example.com, the placeholder IPs, and <placeholders> with your own values.
Build a 6-node K3s HA cluster (3 control-plane + 3 worker, embedded etcd) and drive every workload from a single git repository. Calico for CNI, MetalLB for load balancers, Traefik for ingress, Longhorn for storage, Vault for secrets, ArgoCD for reconciliation. Nothing is deployed by hand: a kubectl apply of one root manifest brings the whole platform up in dependency order.
Address note: all 10.x and 192.0.2.x addresses below are placeholders (192.0.2.0/24 is an RFC 5737 documentation range; 10.x is RFC 1918 private space). Swap in your own LAN subnet.
Table of Contents#
- Overview
- Prerequisites
- Host Preparation
- Firewall
- Install K3s in HA Mode
- Install Tooling: Helm, kubectl plugins, k9s
- CNI: Calico
- Load Balancer: MetalLB
- Storage: Longhorn
- The GitOps Repository
- ArgoCD and the App-of-Apps Pattern
- Three Helm Deployment Patterns
- Secret Management with Vault + AVP
- Networking and Network Policies
- Databases: PostgreSQL and MariaDB
- Platform Services: Harbor, Monitoring
- Namespace Convention
- Workload Distribution
- Dependency Updates with Renovate
- Day-to-Day Operations
- Troubleshooting
- Sources
1. Overview#
The target is a K3s HA cluster, 6 nodes, split 3 control-plane and 3 worker. K3s runs in HA mode with embedded etcd (--cluster-init). Calico provides the CNI, MetalLB hands out LoadBalancer IPs, Traefik terminates ingress, Longhorn provides replicated block storage, Vault holds secrets, and ArgoCD reconciles every workload from a single git repository.
The defining property is that nothing in the cluster is deployed by kubectl apply by hand. Everything is deployed by a git push. ArgoCD watches the repo and the cluster; if they diverge, it converges the cluster back to the repo. A fresh cluster comes up with one kubectl apply of a root manifest and then nothing else.
Example node layout (replace hostnames and IPs with your own):
| Node | IP | Role | Purpose |
|---|---|---|---|
| k3s-control-01 | 192.0.2.51 | Control | K3s server, etcd, platform services |
| k3s-control-02 | 192.0.2.52 | Control | K3s server, etcd, platform services |
| k3s-control-03 | 192.0.2.53 | Control | K3s server, etcd, platform services |
| k3s-worker-01 | 192.0.2.54 | Worker | Application workloads |
| k3s-worker-02 | 192.0.2.55 | Worker | Application workloads |
| k3s-worker-03 | 192.0.2.56 | Worker | Application workloads |
- Virtual IP: 192.0.2.50, handed out by MetalLB to the Traefik LoadBalancer service. All external DNS records point here.
- Node labels: example.com/role=control on control nodes, example.com/role=worker on workers. Helm charts use nodeSelector to place workloads on the right tier.
2. Prerequisites#
| Requirement | Specification |
|---|---|
| OS | RHEL 9+, Debian 11+, or compatible |
| Access | Root privileges on every node |
| Network | Outbound internet access |
Minimum per node#
| Resource | Specification |
|---|---|
| CPU | 2 cores (more for control nodes running etcd + platform) |
| Memory | 2 GB minimum, 8 GB+ realistic for a platform node |
| Storage | 10 GB free for the OS; Longhorn needs its own disk/partition per node |
| Kernel | 5.4 or later |
Required software on each host: curl. The host firewall is disabled and Kubernetes manages the packet rules instead (see section 4).
3. Host Preparation#
Run on every node before installing K3s.
Update packages:
dnf update -y # RHEL family; apt on Debian
Kernel networking settings. Add to /etc/sysctl.d/k8s.conf:
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
Apply:
sysctl --system
Disable swap. Kubernetes requires it off:
swapoff -a # then comment out the swap line in /etc/fstab to persist
SELinux (RHEL family). Permissive at minimum:
setenforce 0 # permissive until the next reboot
grubby --update-kernel ALL --args selinux=0 # disables SELinux entirely on subsequent boots
Install required tools:
dnf install -y curl
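One detail the sysctl step depends on: the net.bridge.* keys only exist once the br_netfilter kernel module is loaded, so sysctl --system can reject them on a freshly installed host. The following is a minimal sketch that persists both the module and the settings. The function name write_k8s_host_config and its optional root-prefix argument are illustrative, not part of the original setup; the prefix exists only so the function can be dry-run against a scratch directory.

```shell
#!/bin/sh
# Sketch: persist the kernel module and sysctl settings from this section.
# write_k8s_host_config is a hypothetical helper; pass a scratch directory
# as $1 to dry-run, or nothing to write under / (requires root).
write_k8s_host_config() {
  root="${1:-}"
  mkdir -p "$root/etc/modules-load.d" "$root/etc/sysctl.d"

  # br_netfilter must be loaded or the net.bridge.* keys do not exist.
  printf '%s\n' br_netfilter > "$root/etc/modules-load.d/k8s.conf"

  cat > "$root/etc/sysctl.d/k8s.conf" <<'EOF'
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
}

# Dry run against a scratch directory. On a real node, run the function as
# root with no argument, then: modprobe br_netfilter && sysctl --system
scratch="$(mktemp -d)"
write_k8s_host_config "$scratch"
cat "$scratch/etc/sysctl.d/k8s.conf"
```

Writing the module name to /etc/modules-load.d keeps the setting across reboots; modprobe only covers the current boot.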
4. Firewall#
Disable the host firewall on cluster nodes and let Kubernetes manage the packet rules.
K3s programs its own iptables/nftables chains through kube-proxy and the CNI (Calico here) for pod, service, and NodePort traffic. A host firewall like firewalld or ufw runs its own chains on top of that, and the two fight: a firewalld reload flushes or reorders chains the CNI installed, masquerade rules collide, and pod-to-pod or pod-to-service traffic starts dropping intermittently in ways that are miserable to debug. The K3s docs call this out directly - firewalld is known to conflict, and the supported answer on a trusted network is to turn it off.
# RHEL family
systemctl disable --now firewalld
# Debian / Ubuntu
systemctl disable --now ufw
Security comes from the right layers instead of the host firewall:
- A perimeter firewall (router, OPNsense, or a cloud security group) controls what reaches the nodes from outside - normally just 80/443 to the ingress, and 6443 to the API from trusted admin networks.
- Kubernetes NetworkPolicy controls pod-to-pod traffic inside the cluster, enforced by Calico.
If you cannot disable the host firewall (shared L2, compliance), do not let it filter pod traffic. Put the CNI interface and the cluster CIDRs in a trusted zone, then open the node ports:
firewall-cmd --permanent --zone=trusted --add-interface=cni0 # flannel's bridge; with Calico in IPIP mode the tunnel interface is tunl0
firewall-cmd --permanent --zone=trusted --add-source=<pod-cidr> # e.g. K3s default 10.42.0.0/16
firewall-cmd --permanent --zone=trusted --add-source=<service-cidr> # e.g. K3s default 10.43.0.0/16
for p in 53/tcp 53/udp 80/tcp 443/tcp 2049/tcp 2379-2380/tcp 3260/tcp 4789/udp 6443/tcp 10250/tcp; do
  firewall-cmd --permanent --add-port="$p"
done
firewall-cmd --reload
5. Install K3s in HA Mode#
Disable the bundled Traefik and the bundled flannel CNI; Calico and a GitOps-managed Traefik replace them. The aggressive failover flags are deliberate: K3s defaults are roughly twelve minutes from node death to pod eviction, which is not high availability for a small cluster where every node matters.
First control node (--cluster-init starts a new etcd cluster):
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--cluster-init --disable traefik --flannel-backend=none \
--kube-controller-manager-arg=node-monitor-period=2s \
--kube-controller-manager-arg=node-monitor-grace-period=16s \
--kube-apiserver-arg=default-not-ready-toleration-seconds=30 \
--kube-apiserver-arg=default-unreachable-toleration-seconds=30" sh -
mkdir -p "$HOME/.kube"
cp /etc/rancher/k3s/k3s.yaml "$HOME/.kube/config"
| Flag | Effect |
|---|---|
| node-monitor-period=2s | How often the node controller checks node status (default 5s) |
| node-monitor-grace-period=16s | Time before a node is marked NotReady (default 40s) |
| default-not-ready-toleration-seconds=30 | Pod eviction delay after NotReady (default 300s) |
| default-unreachable-toleration-seconds=30 | Pod eviction delay after Unreachable (default 300s) |
Worst-case node-death detection lands around 46 seconds; add pod startup and you are back in service within ~90 seconds of a hard failure. Larger clusters can afford to be more conservative.
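The 46-second figure is the sum of two values from the table above: the grace period before a node is marked NotReady, plus the toleration that delays eviction afterwards. A small sketch of that arithmetic follows; eviction_delay is an illustrative helper name, and the result ignores pod startup and volume reattachment time.

```shell
#!/bin/sh
# Worst-case seconds from node death to pod eviction:
# time until the node is marked NotReady, plus the pod's toleration.
eviction_delay() {
  grace_period="$1"   # node-monitor-grace-period, in seconds
  toleration="$2"     # default-not-ready/unreachable-toleration-seconds
  echo $(( grace_period + toleration ))
}

echo "tuned worst case: $(eviction_delay 16 30)s"
```

With the flags used here this prints 46 seconds, which is where the estimate of being back in service within ~90 seconds comes from once pod startup is added.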
Additional control nodes join the existing etcd cluster with --server pointing at the first node and the shared node token (/var/lib/rancher/k3s/server/node-token on the first node):
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--server https://192.0.2.51:6443 --disable traefik --flannel-backend=none \
--kube-controller-manager-arg=node-monitor-period=2s \
--kube-controller-manager-arg=node-monitor-grace-period=16s \
--kube-apiserver-arg=default-not-ready-toleration-seconds=30 \
--kube-apiserver-arg=default-unreachable-toleration-seconds=30" \
K3S_TOKEN="<node-token>" sh -
Worker nodes join as agents:
curl -sfL https://get.k3s.io | K3S_URL="https://192.0.2.51:6443" K3S_TOKEN="<node-token>" sh -
Paths after install: binary /usr/local/bin/k3s, kubeconfig /etc/rancher/k3s/k3s.yaml, systemd unit /etc/systemd/system/k3s.service (k3s-agent.service on workers). The installer enables and starts the service automatically.
etcd tuning. Longhorn writes a lot of CRD state to etcd (per-volume), and the default election timeout (1000 ms) occasionally fires under that load and triggers a spurious leader election. Raise heartbeat to 500 ms and election timeout to 5000 ms - the lowest values at which spurious elections stop. Pass via K3s etcd args or the embedded-etcd config.
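One way to pass those values is through the K3s configuration file rather than the install command line. The sketch below assumes a K3s version that reads drop-in files from /etc/rancher/k3s/config.yaml.d/ and accepts the etcd-arg key (the file form of the --etcd-arg server flag). The helper name write_etcd_tuning and the file name etcd-tuning.yaml are illustrative.

```shell
#!/bin/sh
# Sketch: write a K3s config drop-in that passes the etcd timing flags.
# write_etcd_tuning is a hypothetical helper; $1 is the K3s config directory
# (/etc/rancher/k3s on a real server node).
write_etcd_tuning() {
  conf_dir="$1"
  mkdir -p "$conf_dir/config.yaml.d"
  cat > "$conf_dir/config.yaml.d/etcd-tuning.yaml" <<'EOF'
# Values are milliseconds; etcd defaults are 100 (heartbeat) and 1000 (election).
etcd-arg:
  - heartbeat-interval=500
  - election-timeout=5000
EOF
}

# Dry run into a scratch directory. On each control node, target
# /etc/rancher/k3s instead and restart the k3s service afterwards.
scratch="$(mktemp -d)"
write_etcd_tuning "$scratch"
cat "$scratch/config.yaml.d/etcd-tuning.yaml"
```

Apply the same values on all three control nodes; etcd expects the heartbeat and election settings to match across members.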
Verify:
systemctl status k3s
kubectl get nodes # all 6 should appear (Ready once the CNI is up)
Label the nodes so charts can target a tier:
kubectl label node k3s-control-01 k3s-control-02 k3s-control-03 example.com/role=control
kubectl label node k3s-worker-01 k3s-worker-02 k3s-worker-03 example.com/role=worker
6. Install Tooling: Helm, kubectl plugins, k9s#
# Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version
# k9s (check the latest tag on the releases page)
curl -L https://github.com/derailed/k9s/releases/download/v0.32.7/k9s_Linux_amd64.tar.gz -o k9s.tar.gz
tar -xzf k9s.tar.gz -C /usr/local/bin k9s # extract only the binary, not the bundled LICENSE/README
k9s version
The remaining components (Calico, MetalLB, Longhorn, Traefik) can be installed by hand once to bring the cluster to life, then folded under ArgoCD so the repo owns them. The manual install commands below are the bootstrap; the GitOps form is in section 11.
7. CNI: Calico#
K3s was started with --flannel-backend=none, so install Calico explicitly.
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.29.1/manifests/calico.yaml
Set a custom IP pool. default-ipv4-ippool.yml:
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 172.31.0.0/16
  blockSize: 26
  ipipMode: Always
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never
kubectl apply -f default-ipv4-ippool.yml
kubectl get pods -n kube-system # calico-node pods should go Running
Optional kubectl-calico plugin for IPAM inspection:
curl -L https://github.com/projectcalico/calico/releases/download/v3.29.1/calicoctl-linux-amd64 -o /usr/local/bin/kubectl-calico
chmod +x /usr/local/bin/kubectl-calico
kubectl calico ipam show
CIDR mismatch caveat. If you let K3s default its pod CIDR to 10.42.0.0/16 but configure Calico's IP pool as 172.31.0.0/16, cross-node traffic is SNAT'd to the IPIP tunnel interface (tunl0) addresses. That breaks NetworkPolicy podSelector matching for cross-node traffic. Either keep the two CIDRs aligned, or add the tunl0 IPs as ipBlock entries in any NetworkPolicy that needs cross-node communication. Aligning the CIDRs from the start is the cleaner fix.
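The alignment can be checked before the first install. The sketch below compares the pod CIDR you intend to give K3s against the cidr field of the IPPool file. pool_cidr and cidrs_aligned are illustrative helper names, and the awk lookup naively assumes the file has a single cidr: key. The --cluster-cidr option it mentions is the K3s server flag for the pod network; it has to be set when the cluster is first created.

```shell
#!/bin/sh
# Sketch: compare the K3s pod CIDR with the cidr field of a Calico IPPool file.
# pool_cidr and cidrs_aligned are hypothetical helper names.
pool_cidr() {
  # Print the value of the first "cidr:" key found in the manifest.
  awk '$1 == "cidr:" { print $2; exit }' "$1"
}

cidrs_aligned() {
  k3s_cidr="$1"
  pool_file="$2"
  [ "$k3s_cidr" = "$(pool_cidr "$pool_file")" ]
}

# Example: the K3s default pod CIDR against the pool from this section.
pool_file="$(mktemp)"
cat > "$pool_file" <<'EOF'
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 172.31.0.0/16
EOF

if cidrs_aligned "10.42.0.0/16" "$pool_file"; then
  echo "aligned"
else
  echo "mismatch: start K3s servers with --cluster-cidr=$(pool_cidr "$pool_file") or change the pool"
fi
```

Run against the values in this section, the check reports a mismatch, which is the situation the caveat describes.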
8. Load Balancer: MetalLB#
helm repo add metallb https://metallb.github.io/metallb
helm repo update
helm install metallb metallb/metallb --namespace metallb-system --create-namespace
Layer 2 address pool. metal.yaml:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: ingress-vip-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.0.2.50/32
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: ingress-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - ingress-vip-pool
kubectl apply -f metal.yaml
kubectl get pods -n metallb-system
kubectl get services -A # Traefik's LoadBalancer should pick up the VIP
That VIP (192.0.2.50) is the single entry point. Point your wildcard DNS record (*.example.com) at it.
9. Storage: Longhorn#
Longhorn provides replicated block storage. In a 3-control-node layout, run replicas across the three control nodes (or across whichever nodes carry dedicated storage disks).
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace
kubectl get pods -n longhorn-system
- StorageClass: longhorn (set it default).
- Replication: 3 replicas per volume, one per storage node.
- RWX volumes: served via NFS share-manager pods for ReadWriteMany access (this is why port 2049 is open).
Expose the UI through Traefik (or NGINX if that is your ingress). Example NGINX Ingress:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: longhorn-ingress
  namespace: longhorn-system
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: 'false'
    nginx.ingress.kubernetes.io/proxy-body-size: 10000m
spec:
  ingressClassName: nginx
  rules:
    - host: longhorn.example.com
      http:
        paths:
          - pathType: Prefix
            path: "/"
            backend:
              service:
                name: longhorn-frontend
                port:
                  number: 80
kubectl apply -f longhorn-ingress.yml
Stale volume attachments. If a pod cannot mount a Longhorn volume (MountDevice timeout):
- Delete the stale VolumeAttachment (find it with kubectl get volumeattachment | grep <pvc-id>).
- Clear the volume's nodeID: kubectl patch volumes.longhorn.io <pvc-id> -n longhorn-system --type merge -p '{"spec":{"nodeID":""}}'
- For RWX volumes, delete the share-manager pod to force the NFS server to restart.
10. The GitOps Repository#
One repository holds the entire desired state. Point ArgoCD at it; never kubectl apply by hand.
kubernetes/
  argo/                      # ArgoCD Application manifests
    bootstrap/               # Phase 0: infra (Traefik, MetalLB, Longhorn, cert-manager, operators)
    platform/                # Phase 1: shared services (Vault, Postgres, registry, monitoring)
    apps/                    # Phase 2: user applications
    root-bootstrap.yaml      # App-of-apps for the bootstrap phase
    root-platform.yaml       # App-of-apps for the platform phase
    root-apps.yaml           # App-of-apps for the apps phase
  apps/
    helm/                    # Helm values and umbrella charts per application
      traefik/values.yaml    # Values override for the Traefik chart
      registry/Chart.yaml    # Umbrella chart with the registry chart as a dependency
      registry/values.yaml
      ...
    manifests/               # Raw Kubernetes YAML per namespace
      traefik/               # NetworkPolicies, quotas, extra resources
      vault-ha/              # Namespace config, network policies
      ...
  infra/                     # Node-level setup scripts (k3s install, Vault init)
  .gitlab-ci.yml             # CI pipeline (yamllint, kubeconform, helm validate, Renovate)
  .gitlab/renovate.json      # Renovate dependency update config
How changes are made#
- Edit files in the repo (values.yaml, manifests, ArgoCD Applications).
- Commit and push to master.
- ArgoCD detects the change and syncs within ~3 minutes (or instantly via a webhook).
- ArgoCD applies the new state to the cluster.
Never kubectl apply directly - ArgoCD reverts it on the next sync. The repo is the only way in.
Commit convention#
Single-line commits: feat:, fix:, docs:, refactor:, chore:. Add a scope for the component, e.g. fix(traefik): increase proxy timeout.
CI pipeline#
Every push validates:
- yamllint - YAML syntax.
- manifest-validate - kubeconform against the target Kubernetes schema.
- helm-validate - helm template on every umbrella chart.
- argocd-validate - ArgoCD Application YAML structure.
11. ArgoCD and the App-of-Apps Pattern#
ArgoCD lives in its own namespace and manages every other Application through the app-of-apps pattern: one Application that creates more Application resources. Three roots, one per layer.
argo/
  root-bootstrap.yaml    sync-wave 0
  root-platform.yaml     sync-wave 1
  root-apps.yaml         sync-wave 2
  bootstrap/             Layer 0 Applications
  platform/              Layer 1 Applications
  apps/                  Layer 2 Applications
Each root is an Application pointing at the directory below it; each directory holds child Application resources. Sync waves enforce ordering: Layer 1 does not start until Layer 0 is healthy, Layer 2 not until Layer 1 is healthy. A fresh cluster comes up in the right order from one command:
kubectl apply -f argo/root-bootstrap.yaml
Forty minutes later you have the whole platform. The three layers:
Bootstrap (Phase 0) - infrastructure. Operators and CRDs that depend on nothing else.
| App | Chart source | Namespace |
|---|---|---|
| traefik | traefik.github.io/charts | traefik-system |
| metallb | metallb.github.io/metallb | metallb-system |
| longhorn | charts.longhorn.io | longhorn-system |
| cert-manager | charts.jetstack.io | cert-core |
| gatekeeper | open-policy-agent.github.io | gatekeeper-system |
| cnpg | cloudnative-pg.github.io | cnpg-core |
| mariadb-operator | helm.mariadb.com | mariadb-core |
| reflector | emberstack.github.io | argocd-system |
| argocd | git (manifests) | argocd-system |
| cluster-config | git (manifests) | various |
| volume-snapshot-crds | git (manifests) | kube-system |
Platform (Phase 1) - shared dependencies. Services that applications rely on.
| App | Description | Namespace |
|---|---|---|
| vault-transit | Vault auto-unseal backend | vault-core |
| vault-ha | 3-node Vault HA with Raft | vault-core |
| valkey | Redis-compatible cache (Sentinel HA) | valkey-core |
| postgres | CNPG 3-node PostgreSQL cluster | cnpg-core |
| registry | Container registry with proxy cache | registry-core |
| object-store | Distributed object storage | objectstore-core |
| kube-prometheus-stack | Prometheus + Grafana + Alertmanager | monitoring-core |
| loki | Log aggregation | monitoring-core |
| promtail | Log shipping DaemonSet | monitoring-core |
Apps (Phase 2) - user applications. They use the platform but do not provide it.
| App | Description | Namespace |
|---|---|---|
| awx | Ansible automation platform | awx |
| netbox | IP address management | netbox |
| asset-mgmt | IT asset management | asset-mgmt |
| filebrowser | Web file manager | filebrowser |
| ci-runner | GitLab CI runner (Kubernetes executor) | ci-runner |
Reconciliation is the point#
"I can deploy with git" is not the win - a shell script does that. The win is reconciliation. With selfHeal: true and prune: true on every Application, manual changes to the cluster are reverted within ~180 seconds. Three consequences:
- The repo is the truth. To know what is running, read master, not kubectl get all -A.
- No manual hotfixes survive. Edit a Deployment directly and ArgoCD reverts it.
- Rebuild is one command. kubectl apply -f argo/root-bootstrap.yaml on a fresh K3s install, and the same cluster comes back in under an hour.
When something is genuinely on fire, self-heal works against you - every fix you try gets reverted. Disable auto-sync on the affected app, fix by hand, commit the fix, re-enable.
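The selfHeal and prune settings discussed above live in each Application's syncPolicy.automated block. The sketch below renders a minimal child Application with both enabled. render_application is an illustrative helper, the repository URL is a placeholder, and the apps/manifests/<app> path assumes the repository layout from section 10. The author's real manifests are not shown in this writeup and may differ.

```shell
#!/bin/sh
# Sketch: render a child Application with automated sync, self-heal and prune.
# render_application is a hypothetical helper; the repo URL and path layout
# are placeholders following the repository structure in section 10.
render_application() {
  app="$1"; namespace="$2"; repo_url="$3"
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ${app}
  namespace: argocd-system
spec:
  project: default
  source:
    repoURL: ${repo_url}
    targetRevision: master
    path: apps/manifests/${app}
  destination:
    server: https://kubernetes.default.svc
    namespace: ${namespace}
  syncPolicy:
    automated:
      selfHeal: true   # revert manual changes to the live cluster
      prune: true      # delete resources that were removed from git
EOF
}

render_application filebrowser filebrowser https://git.example.com/platform/kubernetes.git
```

Removing the automated block is what disabling auto-sync means in practice. If a root app-of-apps manages the Application, make that change in git as well, otherwise the root may restore it.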
Access ArgoCD#
# Admin password
kubectl -n argocd-system get secret argocd-initial-admin-secret \
-o jsonpath='{.data.password}' | base64 -d
# Port-forward if DNS is not yet pointed at the VIP
kubectl port-forward svc/argocd-server -n argocd-system 8080:443
# then open https://localhost:8080
12. Three Helm Deployment Patterns#
ArgoCD can render manifests more than one way. The repo uses three, picked per app based on whether the app needs secrets and whether the chart exposes the knobs the app needs.
Pattern 1: Multi-source Helm (no secrets)#
Pull the chart from the upstream Helm repo, merge it with a values.yaml from the git repo, optionally add a third source for plain manifests (network policies, quotas). Used for most bootstrap and platform apps.
# argo/bootstrap/traefik-app.yaml
sources:
  - repoURL: https://traefik.github.io/charts
    chart: traefik
    targetRevision: 39.0.1
    helm:
      releaseName: traefik
      valueFiles:
        - $values/apps/helm/traefik/values.yaml
  - repoURL: <this repo>
    targetRevision: master
    ref: values
  - repoURL: <this repo>
    targetRevision: master
    path: apps/manifests/traefik
To bump the chart version: edit targetRevision, commit, push.
Pattern 2: Umbrella chart + AVP (apps needing secrets)#
A small local Chart.yaml declares the upstream chart as a dependency. A custom ArgoCD management plugin (avp-helm) renders it through a pipeline that injects Vault secrets at sync time.
# apps/helm/registry/Chart.yaml
dependencies:
  - name: harbor
    version: 1.18.2
    repository: https://helm.goharbor.io
# argo/platform/registry-app.yaml
sources:
  - repoURL: <this repo>
    path: apps/helm/registry
    plugin:
      name: avp-helm
To bump: change version in Chart.yaml, commit, push. ArgoCD runs helm dependency update + AVP automatically.
The render pipeline:
helm dependency update
  -> helm template
  -> sed (URL-decode AVP placeholders)
  -> argocd-vault-plugin generate
The sed step matters: Helm URL-encodes the <path:...> placeholders that argocd-vault-plugin looks for. Helm sees <path:kv/data/foo#bar> and serialises it as %3Cpath%3A.../%3E; AVP cannot find encoded placeholders. One regex decodes them before AVP runs, the rendered manifests get the real values, and the placeholders never touch the cluster.
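The plugin's actual regex is not shown here, so the following is only a sketch of what such a decode step could look like. It rewrites the percent-escapes for <, >, :, / and # on lines that contain an encoded <path: marker and leaves every other line untouched. The function name decode_avp_placeholders is illustrative, and the sketch handles uppercase hex escapes only.

```shell
#!/bin/sh
# Sketch: decode URL-encoded AVP placeholders in a rendered manifest stream.
# Only lines containing an encoded "<path:" are touched; everything else
# passes through unchanged. Handles uppercase hex escapes only.
decode_avp_placeholders() {
  sed -e '/%3Cpath%3A/!b' \
      -e 's|%3C|<|g' \
      -e 's|%3E|>|g' \
      -e 's|%3A|:|g' \
      -e 's|%2F|/|g' \
      -e 's|%23|#|g'
}

printf '%s\n' \
  'password: %3Cpath%3Akv%2Fdata%2Ffoo%23bar%3E' \
  'homepage: https%3A%2F%2Fexample.com' \
  | decode_avp_placeholders
```

The first line comes out as password: <path:kv/data/foo#bar>; the second is left encoded because it carries no placeholder. Restricting the rewrite to placeholder lines keeps legitimately encoded values elsewhere in the manifest intact.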
Pattern 3: Kustomize + Helm post-render (when the chart has no injection point)#
Some charts hardcode a field you need and expose no value to override it - a missing volume, a sidecar, a cleanupPolicy. Forking the chart is one answer. The other is to render the chart through kustomize and patch the output, so you change container configuration without building a custom image or maintaining a fork.
A kustomization.yaml in apps/helm/<app>/ references the upstream chart in a helmCharts: block and applies patches: on top. The avp-helm plugin auto-detects it (a directory holds either a Chart.yaml or a kustomization.yaml, never both) and renders with kustomize build --enable-helm instead of helm template. The Vault secret-injection step runs on the kustomize output the same way.
# apps/helm/<app>/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# NEVER set a top-level "namespace:" here. It rewrites the namespace of every
# resource, including cross-namespace CRs the chart emits (e.g. an operator
# Database CR that must target another namespace). That mistake caused a
# production data-loss incident.
helmCharts:
  - name: <chart>
    repo: <upstream-helm-repo>
    version: <pinned-version>
    valuesFile: values.yaml
    releaseName: <release>
    namespace: <destination-namespace>
resources:
  - extra-config.yaml
patches:
  - target: { kind: StatefulSet, name: <release> }
    patch: |-
      <strategic merge patch>
values.yaml here is passed straight to the chart, with no umbrella sub-chart wrapping. Reach for this only when the chart genuinely cannot express the change through values.
The warning that makes this dangerous#
kustomize build --enable-helm inside the ArgoCD repo-server has been seen to render a SHORTER manifest than the exact same command on a workstation - same chart, same values. The cause is not fully isolated (most likely helm/kustomize version drift between the workstation and the plugin container). When it happens, ArgoCD diffs the partial render against the live cluster, decides the missing resources are extraneous, and prunes them. With the wrong defaults that means data loss:
| Foot-gun | What pruning does |
|---|---|
| An operator Database CR with cleanupPolicy: Delete (a common chart default) | The operator drops the real database. Tables and rows gone. |
| A StatefulSet with volumeClaimTemplates | Pods deleted. The PVCs survive, since a StatefulSet does not cascade-delete them, but the workload is down until the StatefulSet is restored under the same VCT name. |
| A managed-Postgres Cluster CR with a delete reclaim policy | Same as the database case, data dropped. |
Before adopting this pattern for any app:
- Render the kustomization.yaml locally with the same helm and kustomize versions as the repo-server, and diff against the pure helm template output. Resource set, kinds, names, and namespaces must match. A single missing Database/Cluster CR is a stop-the-PR finding.
- Patch every stateful CR the chart emits to a retain policy (cleanupPolicy: Skip, or the operator's equivalent) before the first sync, so a future accidental prune cannot drop data.
- Set prune: false on the Application for the first cutover. Check status.operationState.syncResult for any pruned stateful resource before re-enabling prune.
- Take a fresh database dump immediately before, kept off the resource being migrated.
- Do it in a maintenance window. Treat it as risk-equivalent to a database engine upgrade.
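The first check in the list above, comparing the resource sets of the two renders, can be scripted. The sketch below reduces each rendered manifest to sorted Kind/namespace/name lines and diffs the results. resource_ids and compare_renders are illustrative helper names. The awk parser is deliberately naive and assumes block-style YAML with two-space indentation; treat it as a first-pass filter, not a replacement for reading the full diff.

```shell
#!/bin/sh
# Sketch: reduce a rendered multi-document manifest to sorted
# "Kind/namespace/name" lines, then diff two renders.
# resource_ids and compare_renders are hypothetical helper names.
resource_ids() {
  awk '
    function emit() {
      if (kind != "") print kind "/" (ns == "" ? "-" : ns) "/" name
      kind = ""; name = ""; ns = ""; in_meta = 0
    }
    /^---/       { emit(); next }          # document boundary
    /^kind:/     { kind = $2; next }
    /^metadata:/ { in_meta = 1; next }
    /^[^ #]/     { in_meta = 0 }           # any other top-level key ends metadata
    in_meta && /^  name:/      { name = $2 }
    in_meta && /^  namespace:/ { ns = $2 }
    END { emit() }
  ' "$1" | sort
}

compare_renders() {
  helm_ids="$(mktemp)"; kustomize_ids="$(mktemp)"
  resource_ids "$1" > "$helm_ids"
  resource_ids "$2" > "$kustomize_ids"
  diff "$helm_ids" "$kustomize_ids"
  status=$?
  rm -f "$helm_ids" "$kustomize_ids"
  return "$status"
}

# Usage, with both renders produced by the same tool versions as the repo-server:
#   helm template <release> <chart> -f values.yaml > helm.yaml
#   kustomize build --enable-helm . > kustomize.yaml
#   compare_renders helm.yaml kustomize.yaml || echo "resource sets differ: stop the PR"
```

Intentional patches that add or rename resources will also show up in the diff. The point of the check is that nothing disappears unexpectedly, in particular no Database or Cluster CR.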
13. Secret Management with Vault + AVP#
HashiCorp Vault (3-node HA with Raft) holds every secret. The ArgoCD Vault Plugin injects them at deploy time. The repo never contains a real secret - only <path:...> references.
Bootstrapping Vault HA#
Vault HA is hard to bootstrap, because it needs an unseal mechanism that exists before Vault HA does. You cannot unseal Vault with secrets stored in Vault.
The escape hatch is a second, smaller Vault running in Transit mode. Transit is an encryption API: it does not store the secrets you want, it encrypts and decrypts an unseal token on demand. The HA Vault auto-unseals against the Transit Vault.
Both live in the same namespace. Sync-wave 0 brings up the Transit Vault; sync-wave 1 brings up the HA Vault, which auto-unseals against Transit and is immediately ready. The Transit Vault holds only the HA Vault's unseal key - no application secrets - and is fenced off by namespace network policies and Vault auth policy. Different blast radii: losing Transit loses the ability to cold-start a fresh HA Vault; losing HA loses application secrets.
Vault access#
- Internal: http://vault-ha.vault-core.svc.cluster.local:8200
- External: https://vault.example.com
- Tokens: keep the root token sealed away; use a scoped admin token for daily work.
Secret paths#
| Path pattern | Purpose |
|---|---|
| kv/data/argocd/platform/<app>#<key> | Platform service secrets |
| kv/data/argocd/apps/helm/<app>#<key> | User application secrets |
Using secrets in values#
In an AVP-processed values.yaml:
config:
  database:
    password: <path:kv/data/argocd/apps/helm/netbox#db_password>
AVP replaces the placeholder with the real value from Vault at sync time. Vault is the source of truth; the repo never sees the real value. Used for database credentials, OIDC client secrets, SMTP passwords, registry credentials.
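When a placeholder does not resolve, it helps to look the same key up by hand. The sketch below turns a placeholder into the matching vault kv get command. It assumes the kv mount is KV version 2, where the API path used in placeholders contains a data/ segment that the vault kv CLI path omits. avp_to_vault_cmd is an illustrative helper name.

```shell
#!/bin/sh
# Sketch: turn an AVP placeholder into the matching Vault CLI lookup.
# avp_to_vault_cmd is a hypothetical helper. It assumes a KV v2 mount, where
# the API path contains "data/" but the "vault kv" CLI path does not.
avp_to_vault_cmd() {
  ref=${1#"<path:"}              # strip the leading <path:
  ref=${ref%">"}                 # strip the trailing >
  api_path=${ref%%#*}            # kv/data/argocd/apps/helm/netbox
  key=${ref#*#}                  # db_password
  mount=${api_path%%/data/*}     # kv
  secret=${api_path#*/data/}     # argocd/apps/helm/netbox
  echo "vault kv get -field=${key} ${mount}/${secret}"
}

avp_to_vault_cmd '<path:kv/data/argocd/apps/helm/netbox#db_password>'
```

For the placeholder above this prints vault kv get -field=db_password kv/argocd/apps/helm/netbox. Run the printed command with a token that has read access to that path.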
14. Networking and Network Policies#
CNI: Calico#
- Pod CIDR: keep K3s and Calico aligned (see the section 7 caveat).
- Service CIDR: 10.43.0.0/16 (K3s default).
- Tunneling: IPIP (tunl0 interfaces on each node).
Load balancer and ingress#
MetalLB in Layer 2 mode hands 192.0.2.50 to Traefik's LoadBalancer service. All HTTP/HTTPS enters through Traefik on port 443. TLS terminates on a wildcard cert for *.example.com, replicated to every namespace by Reflector. Routing uses Traefik IngressRoute CRDs.
Default-deny network policies#
Every namespace gets a default-deny policy for both ingress and egress; allowed traffic is enumerated explicitly.
| Policy | Purpose |
|---|---|
| default-deny-ingress/egress | Block everything by default |
| allow-same-namespace | Intra-namespace traffic |
| allow-dns-egress | DNS resolution (kube-system:53) |
| allow-monitoring | Prometheus scraping from monitoring-core |
| allow-traefik | Ingress from traefik-system |
| allow-internet-egress | External traffic (excludes pod/service CIDRs) |
| allow-cluster-services-egress | Access to ClusterIP services |
App-specific policies add egress to the databases, Vault, SMTP, and so on.
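As an illustration of the first and third rows of the table, the sketch below renders a default-deny policy and a DNS egress exception for one namespace. It is a guess at the shape of these policies, not a copy of the repository's manifests: render_baseline_policies is an illustrative helper, the sketch folds ingress and egress deny into a single default-deny policy, and the DNS rule selects kube-system by its automatic kubernetes.io/metadata.name label.

```shell
#!/bin/sh
# Sketch: render a default-deny policy plus a DNS egress exception for one
# namespace. render_baseline_policies is a hypothetical helper.
render_baseline_policies() {
  ns="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: ${ns}
spec:
  podSelector: {}          # every pod in the namespace
  policyTypes: [Ingress, Egress]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: ${ns}
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
EOF
}

render_baseline_policies netbox
```

In the repository layout from section 10, the output would be committed as apps/manifests/<app>/network-policy.yaml, not applied by hand.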
15. Databases: PostgreSQL and MariaDB#
PostgreSQL (CloudNativePG)#
The CNPG operator manages a 3-node PostgreSQL cluster. Install the operator (bootstrap layer) then declare a Cluster:
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm install cnpg --namespace cnpg-core --create-namespace cnpg/cloudnative-pg
pg-cluster.yml:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres
  namespace: cnpg-core
spec:
  instances: 3
  storage:
    size: 10Gi
  bootstrap:
    initdb:
      options:
        - --encoding=UTF8
        - --locale=en_US.UTF-8
kubectl apply -f pg-cluster.yml
kubectl get pods -n cnpg-core # postgres-1/-2/-3 should run
CNPG publishes discovery services automatically: postgres-rw.cnpg-core.svc (read-write, follows the primary) and postgres-ro.cnpg-core.svc (read-only replicas). Back up with a daily VolumeSnapshot.
MariaDB (MariaDB Operator)#
Single instance in mariadb-core, used by apps that need MySQL/MariaDB. Back up with a daily CronJob and a retention window (e.g. 30 days).
16. Platform Services: Harbor, Monitoring#
Container registry (Harbor)#
Harbor runs as an umbrella-chart + AVP app (it needs DB and admin secrets) and uses the external CNPG PostgreSQL rather than its bundled database.
The external-DB secret (rendered by AVP, never committed plain):
apiVersion: v1
kind: Secret
metadata:
  name: postgres-harbor
  namespace: registry-core
  labels:
    cnpg.io/reload: "true"
stringData:
  host: postgres-rw.cnpg-core.svc.cluster.local
  port: "5432"
  coreDatabase: harbor
  username: harbor
  password: <path:kv/data/argocd/platform/registry#db_password>
type: Opaque
Key values.yaml fragments - run multiple replicas of every component, point the database at CNPG, and enable the proxy-cache so the registry doubles as a pull-through cache for upstream images:
externalURL: https://registry.example.com
expose:
  ingress:
    hosts:
      core: registry.example.com
cache:
  enabled: true
  expireHours: 87600
core: { replicas: 3 }
jobservice: { replicas: 3 }
registry: { replicas: 3 }
trivy: { replicas: 3 }
portal: { replicas: 3 }
database:
  type: external
  external:
    existingSecret: postgres-harbor
    host: postgres-rw.cnpg-core.svc.cluster.local
    port: 5432
    coreDatabase: harbor
    username: harbor
The default admin login is admin with a password set at install; retrieve it from the core secret:
kubectl get secret harbor-core -n registry-core \
  -o jsonpath="{.data.HARBOR_ADMIN_PASSWORD}" | base64 --decode; echo
Monitoring#
monitoring-core runs the kube-prometheus-stack (Prometheus, Grafana, Alertmanager), Loki for logs, and Promtail as a DaemonSet shipping logs from every node. This is cluster-internal monitoring; it is not built to watch external infrastructure. Alertmanager routes alerts by email to an ops address via your SMTP relay (smtp.example.com:25).
17. Namespace Convention#
The suffix tells you what tier something is in. RBAC policies attach to suffix patterns and apply automatically to new namespaces in the same tier.
| Suffix | Purpose | Examples |
|---|---|---|
| -system | Cluster infrastructure operators | longhorn-system, metallb-system, traefik-system |
| -core | Shared platform dependencies | cnpg-core, vault-core, registry-core, monitoring-core |
| (none) | Application namespaces | awx, netbox, filebrowser |
One app per namespace, always. Mixing two unrelated workloads in one namespace is how one app ends up able to exfiltrate the other's secrets via a misapplied ServiceAccount.
Pair the convention with topology spread constraints on every multi-replica workload (maxSkew: 1, whenUnsatisfiable: DoNotSchedule, on kubernetes.io/hostname) so a single node failure removes at most one replica of anything.
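The constraint described above is a short fragment of the pod spec. The sketch below prints it for a given app. render_spread_constraint is an illustrative helper, and the app.kubernetes.io/name selector label is an assumption: use whichever label your chart actually puts on its pods.

```shell
#!/bin/sh
# Sketch: print the pod-spec fragment for the spread constraint described above.
# render_spread_constraint is a hypothetical helper; the app.kubernetes.io/name
# label is an assumption about how the chart labels its pods.
render_spread_constraint() {
  app="$1"
  cat <<EOF
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: ${app}
EOF
}

render_spread_constraint netbox
```

Most charts expose this as a topologySpreadConstraints value, so the fragment usually goes into values.yaml, not into a patched manifest.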
18. Workload Distribution#
- Control nodes run platform state: Vault, PostgreSQL, MariaDB, the registry, the cache, object storage, the CNPG/MariaDB operators, cert-manager, the policy controller, Reflector, ArgoCD.
- Worker nodes run user apps and the observability stack: AWX, Netbox, asset management, file browser, CI runner, Grafana, Prometheus, Alertmanager, Loki.
- DaemonSets run on all nodes: Traefik, Promtail, the Longhorn CSI, the MetalLB speaker, Calico, node-exporter.
Charts target a tier with nodeSelector against the example.com/role label from section 5.
19. Dependency Updates with Renovate#
Renovate runs on a weekly schedule (weekends) via CI and opens merge requests for:
- Helm chart versions in ArgoCD Application manifests (argocd manager).
- Umbrella chart dependencies in Chart.yaml (auto-detected).
- Container image tags in values.yaml (helm-values manager).
- Container images in raw manifests (custom regex).
- CI tool versions in the pipeline file.
Config in .gitlab/renovate.json. Rules worth copying:
- 3-day stability window for Helm chart updates.
- Auto-merge patch updates for CI tools only.
- Major updates never auto-merged.
- Vault and the monitoring stack grouped into single MRs so related bumps land together.
20. Day-to-Day Operations#
Deploy a new application#
- Create apps/helm/<app>/values.yaml (and Chart.yaml if it needs AVP).
- Create apps/manifests/<app>/ with namespace.yaml, network-policy.yaml, quota.yaml.
- Create argo/apps/<app>-app.yaml (the ArgoCD Application).
- Add the app to argo/root-apps.yaml.
- If it needs secrets, add them to Vault under kv/argocd/apps/helm/<app>.
- Commit and push.
Update a Helm chart version#
- Multi-source apps: edit targetRevision in argo/*/...-app.yaml.
- Umbrella apps: edit version in apps/helm/<app>/Chart.yaml.
Check cluster health#
kubectl get nodes # all 6 Ready
kubectl get pods -A | grep -v Running # only the header line (plus any Completed pods)
kubectl get app -n argocd-system # all Synced + Healthy
Force an ArgoCD resync#
kubectl annotate app <app-name> -n argocd-system \
  argocd.argoproj.io/refresh=hard --overwrite
Everyday kubectl#
| Command | Description |
|---|---|
| kubectl cluster-info | Cluster endpoints |
| kubectl get nodes / kubectl describe node <node> | Node state |
| kubectl get pods -n <ns> / kubectl describe pod <pod> -n <ns> | Pod state and events |
| kubectl logs <pod> -n <ns> --tail=50 | Recent logs (-c <container> for a sidecar) |
| kubectl get events -n <ns> --sort-by=.lastTimestamp \| tail -20 | Recent events |
| kubectl exec <pod> -n <ns> -- <command> | Run a command in a pod |
| kubectl port-forward svc/<svc> -n <ns> <local>:<remote> | Tunnel a service locally |
| kubectl get crd / kubectl describe crd <name> | Custom resource definitions |
21. Troubleshooting#
| Symptom | First move |
|---|---|
| Cannot reach the cluster | kubectl config view - verify the kubeconfig and API endpoint |
| Calico pods stuck | Check the VXLAN/IPIP config in the manifest; kubectl logs -n kube-system calico-node-<id> |
| Ingress not working | Confirm DNS points at the VIP; kubectl describe ingress <name> |
| K3s service issues | journalctl -u k3s (or k3s-agent on workers) |
| Pod will not mount a Longhorn volume | Clear stale VolumeAttachment + the volume nodeID (see section 9) |
| Cross-node NetworkPolicy not matching | CIDR mismatch / tunl0 SNAT (see section 7) |
| External PostgreSQL connection fails | Verify the CNPG secret and pod-to-DB connectivity from the namespace |
| Pod stuck not starting | kubectl describe pod <pod> -n <ns> and read the events |
GitOps does not solve debugging - kubectl logs/describe/get events is still the toolset. ArgoCD watches what is deployed, not what is happening at runtime. It is also the wrong tool for fast iteration: every change is a commit, a push, and a sync, which takes 5-30 seconds end to end with the webhook and up to ~180 seconds when it falls back to polling. That is fine for production-shaped changes and painful for tweaking a chart. Use a throwaway dev cluster and helm upgrade directly for that.
22. Sources#
K3s / Kubernetes:
- K3s documentation
- K3s HA with embedded etcd
- kubectl reference
- NetworkPolicy reference
- Topology spread constraints