Documentation for my homelab.

OPNsense

Static 10.0.8.0/24 Route

To route traffic to the 10.0.8.0/24 subnet advertised by MetalLB, a static route needs to be added to the OPNsense router. There is no good way to do this within the web interface; instead, SSH to the box and add the following to /usr/local/etc/rc.syshook.d/start/96-k8s-static-route, then make sure it has execute permissions (chmod +x 96-k8s-static-route).

#!/bin/sh

# Route the MetalLB service subnet out the Kubernetes VLAN interface
route add -net 10.0.8.0/24 -interface vlan09
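
To sanity-check the hook without rebooting, you can run it once by hand and confirm the route exists (a quick sketch; run it over the same SSH session):

# Make the hook executable, run it once, and confirm the route was added
chmod +x /usr/local/etc/rc.syshook.d/start/96-k8s-static-route
/usr/local/etc/rc.syshook.d/start/96-k8s-static-route
netstat -rn | grep '^10.0.8'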

Uptime Kuma

uptime.kgb33.dev

An off-site uptime monitoring solution hosted on AWS ECS.

Scripts to deploy to both AWS and Fly.io exist in the repo; however, due to cost, Uptime Kuma is only deployed to Fly.io. The AWS documentation and scripts are kept to demonstrate AWS experience on a resume.

Cloudflare Rules

Cloudflare (occasionally) tries to block this bot. To prevent this, add a new "Configuration Rule" with a custom filter expression where the IP source matches the Fly.io IPv4 or IPv6 address assigned to the machine. This rule turns off the Browser Integrity Check and sets the Security Level to "Essentially Off".
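
The filter expression ends up looking something like this (the addresses below are placeholders; substitute whatever Fly.io actually assigned):

(ip.src eq 203.0.113.10) or (ip.src eq 2001:db8::10)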

Fly.io Deployment

From flyio/uptime_kuma, just run the following. It'll deploy Uptime Kuma to Fly.io, validate the DNS challenge for SSL certificates, and add A/AAAA records; calling down instead of up does the reverse. Don't worry about running the commands multiple times, both functions are idempotent.

dagger call \
    --fly-api-token=FLY_API_TOKEN \
    --fly-toml=fly.toml \
    --pulumi-access-token=PULUMI_ACCESS_TOKEN \
    --cloudflare-token=CLOUDFLARE_API_TOKEN \
    up
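
Tearing everything back down is the same invocation with down:

dagger call \
    --fly-api-token=FLY_API_TOKEN \
    --fly-toml=fly.toml \
    --pulumi-access-token=PULUMI_ACCESS_TOKEN \
    --cloudflare-token=CLOUDFLARE_API_TOKEN \
    down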

AWS Deployment (Deprecated)

Secrets required:

  • Cloudflare token (with write access to kgb33.dev) as CLOUDFLARE_API_TOKEN
  • Allow Pulumi access to AWS (see here)

AWS Permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "acm:DeleteCertificate",
                "acm:DescribeCertificate",
                "acm:ListTagsForCertificate",
                "acm:RequestCertificate",
                "ec2:AuthorizeSecurityGroupEgress",
                "ec2:CreateTags",
                "ec2:DeleteSecurityGroup",
                "ec2:RevokeSecurityGroupEgress",
                "ec2:RevokeSecurityGroupIngress",
                "iam:AttachRolePolicy",
                "iam:CreateRole",
                "iam:DeleteRole",
                "iam:DetachRolePolicy",
                "iam:GetRole",
                "iam:ListInstanceProfilesForRole",
                "iam:ListRolePolicies",
                "logs:DeleteLogGroup",
                "logs:ListTagsLogGroup"
            ],
            "Resource": "*"
        }
    ]
}

Then run pulumi up and navigate to uptime.kgb33.dev.
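
A rough sketch of that last step, assuming AWS credentials are already available in your environment and the Pulumi stack lives in the AWS directory of the repo:

# The Cloudflare token needs write access to kgb33.dev
export CLOUDFLARE_API_TOKEN="<cloudflare api token>"
pulumi up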

Proxmox Install

For the base install, use the latest ISO and follow the instructions. Installation values:

  • IP/Hostname:
    • 10.0.9.101/24 / glint.pve.kgb33.dev
    • 10.0.9.102/24 / targe.pve.kgb33.dev
    • 10.0.9.103/24 / sundance.pve.kgb33.dev
  • Email: pve@kgb33.dev

Afterward, implement the following post-install steps.

VLAN Tagging

Unfortunately, there is no way to configure VLAN tagging during the installation. Instead, open a shell on the device and edit /etc/network/interfaces.

 auto lo
 iface lo inet loopback
 
 iface enp7s0f1 inet manual
 
-auto vmbr0
-iface vmbr0 inet static
+auto vmbr0.9
+iface vmbr0.9 inet static
         address 10.0.9.102/24
         gateway 10.0.9.1

+auto vmbr0
+iface vmbr0 inet manual
         bridge-ports enp7s0f1
         bridge-stp off
         bridge-fd 0
+        bridge-vlan-aware yes
+        bridge-vids 2-4094

 iface wlp0s20f3 inet manual
 
 
 source /etc/network/interfaces.d/*

Save and reload with ifreload -a.

Clustering

Make sure to cluster the machines using Proxmox's built-in clustering system.

On one machine (I prefer targe), create the cluster, then join the other machines through their web UIs.
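
Roughly the same thing can be done from a shell, if you prefer (a sketch; the cluster name is arbitrary and targe's address comes from the install values above):

# On targe: create the cluster
pvecm create homelab
# On glint and sundance: join by pointing at targe
pvecm add 10.0.9.102
# From any node: verify quorum
pvecm status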

Post-Install Ansible Playbooks

Create my user account and pull ssh keys:

cd ansible
ansible-playbook --limit=pve playbooks/audd/audd.yaml -k -u root

Then, run the following playbook to set the DNS servers and enable closing the lid without shutting down the machine.

ansible-playbook --limit=pve playbooks/pve/system-services.yaml -k -u root

TLS Certificates

ACME Accounts

Create two ACME 'accounts' using the Web UI (Datacenter → ACME) or by SSHing to one of the Proxmox machines (make sure to su into root).

# Select option 1: "Let's Encrypt V2 Staging"
pvenode acme account register homelab-staging pve@kgb33.dev

# Select option 0: "Let's Encrypt V2"
pvenode acme account register homelab-prod pve@kgb33.dev

dns-01 Challenge

In the Web UI, create a new Challenge Plugin (Datacenter → ACME) with the following values (all others are blank):

  • Plugin ID: homelab-cloudflare
  • DNS API: Cloudflare Managed DNS
  • CF_TOKEN: <CLOUDFLARE API TOKEN>
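
The rough CLI equivalent, if you'd rather not use the Web UI (hedged: the data file path is arbitrary, and I'm assuming the key name expected by acme.sh's Cloudflare plugin):

# /root/cf-api-token contains a single line: CF_Token=<CLOUDFLARE API TOKEN>
pvenode acme plugin add dns homelab-cloudflare --api cf --data /root/cf-api-token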

Add Certificate

On each node, navigate to System → Certificates and Add a domain under ACME.

  • Challenge Type: DNS
  • Plugin: homelab-cloudflare (The one made above)
  • Domain: <NODE>.pve.kgb33.dev

Set the "Using Account", then click "Order Certificates Now".

Proxmox Docs

Proxmox VMs

Download the latest Talos ISO onto all the Proxmox nodes from https://github.com/siderolabs/talos/releases/latest/download/metal-amd64.iso, and make sure to save it as talos-metal-amd64.iso.
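
For example, on each node (assuming the default "local" ISO storage path):

wget -O /var/lib/vz/template/iso/talos-metal-amd64.iso \
  https://github.com/siderolabs/talos/releases/latest/download/metal-amd64.iso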

Create a Terraform User

Follow the instructions on the Telmate/proxmox docs.

Or SSH to a node and run the following commands:

# Create Role
pveum role add TerraformProv -privs "Datastore.AllocateSpace Datastore.Audit Pool.Allocate Sys.Audit Sys.Console Sys.Modify VM.Allocate VM.Audit VM.Clone VM.Config.CDROM VM.Config.Cloudinit VM.Config.CPU VM.Config.Disk VM.Config.HWType VM.Config.Memory VM.Config.Network VM.Config.Options VM.Migrate VM.Monitor VM.PowerMgmt SDN.Use"
# Create User (No password)
pveum user add terraform-prov@pve
# Add Role to User
pveum aclmod / -user terraform-prov@pve -role TerraformProv

Create Proxmox API Token

Then, open the Web UI to generate the API Key.

Go to Datacenter → Permissions → API Tokens; then Add a token. Expose the Token ID (public) and Secret (duh) as environment variables:

# Examples from Telmate Docs
export PM_API_TOKEN_ID="terraform-prov@pve!mytoken"
export PM_API_TOKEN_SECRET="afcd8f45-acc1-4d0f-bb12-a70b0777ec11"

Build VMs

cd tf
tofu apply
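
On a fresh checkout it's worth running an init and a plan first (standard OpenTofu workflow, nothing specific to this repo):

tofu init   # first run only: downloads the Telmate/proxmox provider
tofu plan   # review the VMs that will be created before applying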

Kubernetes

Starting from Scratch

First, make sure to create the Talos VMs as described here, then, cd into the talos directory.

From here, you can use Dagger to automatically provision the nodes. Each step is also detailed in the sub-chapters if you would prefer a manual approach.

Note: If you haven't already, generate the cluster info using:

talosctl gen config homelab https://10.0.9.25:6443 -o _out

$ dagger functions
Name        Description
argocd      Step 4: Start ArgoCD.
base-img    Builds an Alpine image with talosctl installed and ready to go.
bootstrap   Step 2: Bootstrap etcd.
cilium      Step 3: Apply Cilium.
provision   Step 1: Provision the nodes.

Step 1: Provision the Nodes

After the brand new Talos VMs load up and the STAGE is Maintenance, run:

dagger call \
  --raw-template=./templates/talos.yaml.j2 \
  --talos-dir=_out \
  provision
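
While the nodes install and reboot, you can watch the STAGE column from the Talos dashboard instead of the Proxmox console (optional; assumes talosctl is installed locally and uses the control-plane address from above):

talosctl --talosconfig _out/talosconfig dashboard --nodes 10.0.9.25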

Step 2: Bootstrap Etcd

After all the nodes have rebooted (~1 min), bootstrap Etcd. The STAGE on teemo will change from Installing to Booting when it's ready to be bootstrapped.

dagger call \
  --raw-template=./templates/talos.yaml.j2 \
  --talos-dir=_out \
  bootstrap

Step 3: Apply Cilium

Once Etcd has started, apply cilium:

dagger call \
  --raw-template=./templates/talos.yaml.j2 \
  --talos-dir=_out \
  cilium

Step 4: Start ArgoCD

Once the Cilium step has completed (it'll show a nice status dashboard), start ArgoCD.

dagger call \
  --raw-template=./templates/talos.yaml.j2 \
  --talos-dir=_out \
  argocd

Importantly, this step ends by printing out the default ArgoCD password. You still need to manually change the password and sync the apps-of-apps; see here.

Step 5: Grab the Kubeconfig

talosctl --talosconfig _out/talosconfig kubeconfig --nodes 10.0.9.25

Talos

First, cd into the talos directory.

Generating the Config files

Use the following commands to create talosconfig, controlplane.yaml, and worker.yaml:

mkdir _out
pushd _out
talosctl gen config \
    home https://10.0.9.25:6443 \
    --config-patch '[{"op": "add", "path": "/cluster/proxy", "value": {"disabled": true}}, {"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]'

talosctl --talosconfig talosconfig config endpoint 10.0.9.25
talosctl --talosconfig talosconfig config node 10.0.9.25
popd

Start Nodes

Create a Python virtual environment and install dagger-io.

python -m venv .venv
source .venv/bin/activate
pip install dagger-io 

Then run the pipeline:

python pipeline.py

TODO: Convert this to a Zenith style module.

Bootstrap etcd

Next, run

talosctl --talosconfig _out/talosconfig bootstrap

Then grab the kubeconfig, overwriting if needed:

talosctl --talosconfig _out/talosconfig kubeconfig

Note: The nodes won't be healthy until the cilium config is applied in the next step!

Cilium

Add the Helm repo:

helm repo add cilium https://helm.cilium.io/

Generate the cilium config:

helm template cilium cilium/cilium \
    --version 1.15.1 --namespace kube-system \
    --set ipam.mode=kubernetes \
    --set kubeProxyReplacement=strict \
    --set k8sServiceHost="10.0.9.25" \
    --set k8sServicePort="6443" \
    --set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
    --set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
    --set=cgroup.autoMount.enabled=false \
    --set=cgroup.hostRoot=/sys/fs/cgroup \
    --set hubble.listenAddress=":4244" \
    --set hubble.relay.enabled=true \
    --set hubble.ui.enabled=true > cilium.yaml

And apply via,

kubectl apply -f cilium.yaml && cilium status --wait

ArgoCD

Argo Login

Grab the initial secret:

kubectl get secrets -n argocd \
  argocd-initial-admin-secret \
  -o jsonpath='{.data.password}' | base64 --decode 

Port-forward the Argo dashboard, then log in with the username admin.

kubectl port-forward -n argocd services/argocd-server 8080:80

Note: This also forwards the Web GUI to localhost:8080

argocd login localhost:8080
argocd account update-password

Once the password has been changed, delete the initial secret:

kubectl delete secret -n argocd argocd-initial-admin-secret

Apps-of-Apps

Apply the meta definition:

kubectl apply -f k8s-apps/meta.yaml

And sync them:

argocd app sync argocd-meta
argocd app sync --project default

Note: on a fresh cluster, all the secrets will need to be rolled.

Kubernetes Secrets

Application secrets are managed using Sealed Secrets and are stored with the application deployment config in k8s-apps/<APPLICATION>/<SEALED_SECRET>.yaml.

Creating/Rotating Secrets

I use the following zsh function to regenerate a sealed secret when rotating it. Importantly, editing the plaintext values in place seems to cause decryption to fail within the cluster, so recreating the secret from scratch is the most consistent approach.

function sealSecret() {
    if [[ $# -ne 3 ]]; then
        echo "Usage: sealSecret secretName secretValue namespace"
        return 1
    fi
    # Build a plain Secret client-side (--dry-run=client), then encrypt it with kubeseal
    echo -n "$2" | \
    kubectl create secret generic "$1" --dry-run=client --from-file="$1"=/dev/stdin -o yaml -n "$3" | \
    kubeseal -o yaml
}
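
A hypothetical invocation (the secret name and namespace are guesses based on the pihole manifest listed below; match them to the real deployment):

sealSecret pihole-admin-password 'correct-horse-battery' pihole \
    > k8s-apps/pihole/pihole-admin-password.yaml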

Listing Application Secrets

Currently, there are three sealed secrets:

  • k8s-apps/traefik/CloudflareSecret.yaml
  • k8s-apps/roboshpee/SealedToken.yaml
  • k8s-apps/pihole/pihole-admin-password.yaml

To get a current list of secrets in-repo:

rg -l '^kind: SealedSecret' k8s-apps

Or in-cluster:

kubectl get -A sealedsecrets.bitnami.com