Documentation for my homelab.
OPNsense
Static 10.0.8.0/24 Route
To successfully route traffic to the 10.0.8.0/24 subnet advertised by MetalLB, a static route needs to be added to the OPNsense router. There is no good way to do this within the web interface; instead, SSH to the box and add the following to /usr/local/etc/rc.syshook.d/start/96-k8s-static-route. Also, make sure it has execute permissions (chmod +x 96-k8s-static-route).
#!/bin/sh
route add -net 10.0.8.0/24 -interface vlan09
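To apply the route immediately (without rebooting) and confirm it exists, a quick sanity check; the grep pattern is just a convenience:
sh /usr/local/etc/rc.syshook.d/start/96-k8s-static-route
netstat -rn | grep 10.0.8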
Uptime Kuma
uptime.kgb33.dev
An off-site uptime monitoring solution hosted on Fly.io.
Scripts to deploy to both AWS and Fly.io exist in the repo; however, due to cost, Uptime Kuma is only deployed to Fly.io. The AWS documentation and scripts are kept to demonstrate AWS experience on a resume.
Cloudflare Rules
Cloudflare (occasionally) tries to block the Uptime Kuma bot. To prevent this, add a new "Configuration Rule" with a custom filter expression where the IP source matches the Fly.io IPv4 or IPv6 address assigned to the machine. This rule turns off the Browser Integrity Check and sets the Security Level to "Essentially Off".
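A sketch of the filter expression; the addresses below are placeholders for the IPs Fly.io actually assigns:
(ip.src eq 203.0.113.10) or (ip.src eq 2001:db8::1)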
Fly.io Deployment
From flyio/uptime_kuma, just run the following. It'll deploy Uptime Kuma to Fly.io, validate the DNS challenge for SSL certificates, and add A/AAAA records. If you use down instead of up, it'll do the reverse. Don't worry about running the commands multiple times; they're both idempotent.
dagger call \
--fly-api-token=FLY_API_TOKEN \
--fly-toml=fly.toml \
--pulumi-access-token=PULUMI_ACCESS_TOKEN \
--cloudflare-token=CLOUDFLARE_API_TOKEN \
up
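For teardown, the invocation is identical with down swapped in:
dagger call \
--fly-api-token=FLY_API_TOKEN \
--fly-toml=fly.toml \
--pulumi-access-token=PULUMI_ACCESS_TOKEN \
--cloudflare-token=CLOUDFLARE_API_TOKEN \
down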
AWS Deployment (Deprecated)
Secrets required:
- Cloudflare token (with write access to kgb33.dev) as CLOUDFLARE_API_TOKEN
- Allow Pulumi access to AWS (see here)
AWS Permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"acm:DeleteCertificate",
"acm:DescribeCertificate",
"acm:ListTagsForCertificate",
"acm:RequestCertificate",
"ec2:AuthorizeSecurityGroupEgress",
"ec2:CreateTags",
"ec2:DeleteSecurityGroup",
"ec2:RevokeSecurityGroupEgress",
"ec2:RevokeSecurityGroupIngress",
"iam:AttachRolePolicy",
"iam:CreateRole",
"iam:DeleteRole",
"iam:DetachRolePolicy",
"iam:GetRole",
"iam:ListInstanceProfilesForRole",
"iam:ListRolePolicies",
"logs:DeleteLogGroup",
"logs:ListTagsLogGroup"
],
"Resource": "*"
}
]
}
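Before running Pulumi, the secrets above can be supplied as environment variables. A sketch; the AWS variable names assume the standard credential variables read by the Pulumi AWS provider:
export CLOUDFLARE_API_TOKEN="<token with write access to kgb33.dev>"
export AWS_ACCESS_KEY_ID="<access key id>"
export AWS_SECRET_ACCESS_KEY="<secret access key>"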
Then just run pulumi up and navigate to uptime.kgb33.dev.
Proxmox Install
For the base install, use the latest ISO and follow the instructions. Installation values:
- IP/Hostname:
  - 10.0.9.101/24 / glint.pve.kgb33.dev
  - 10.0.9.102/24 / targe.pve.kgb33.dev
  - 10.0.9.103/24 / sundance.pve.kgb33.dev
- Email: pve@kgb33.dev
Afterward, implement the following post-install steps.
VLAN Tagging
Unfortunately, there is no way to add VLAN tagging in the installation.
Instead, open a shell on the device and edit /etc/network/interfaces.
auto lo
iface lo inet loopback
iface enp7s0f1 inet manual
-auto vmbr0
-iface vmbr0 inet static
+auto vmbr0.9
+iface vmbr0.9 inet static
address 10.0.9.102/24
gateway 10.0.9.1
+auto vmbr0
+iface vmbr0 inet manual
bridge-ports enp7s0f1
bridge-stp off
bridge-fd 0
+ bridge-vlan-aware yes
+ bridge-vids 2-4094
iface wlp0s20f3 inet manual
source /etc/network/interfaces.d/*
Save and reload with ifreload -a.
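To confirm the tagged bridge came up with the expected address, something like the following should work:
ip -d link show vmbr0.9
ip addr show vmbr0.9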
Clustering
Make sure to cluster the machines using Proxmox's built-in clustering system.
On one machine (I prefer targe) create the cluster, then join the other machines using their web UIs.
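If you'd prefer the CLI to the web UI, a rough sketch (the cluster name is a placeholder):
# On targe (10.0.9.102), create the cluster
pvecm create <CLUSTER_NAME>
# On glint and sundance, join it
pvecm add 10.0.9.102
# Verify quorum from any node
pvecm status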
Post-Install Ansible Playbooks
Create my user account and pull ssh keys:
cd ansible
ansible-playbook --limit=pve playbooks/audd/audd.yaml -k -u root
Then, run the following playbook to set the DNS servers and enable closing the lid without shutting down the machine.
ansible-playbook --limit=pve playbooks/pve/system-services.yaml -k -u root
TLS Certificates
ACME Accounts
Create two ACME 'accounts' using the Web UI (Datacenter → ACME) or by SSHing into one of the Proxmox machines (make sure to su into root).
# Select option 1: "Let's Encrypt V2 Staging"
pvenode acme account register homelab-staging pve@kgb33.dev
# Select option 0: "Let's Encrypt V2"
pvenode acme account register homelab-prod pve@kgb33.dev
dns-01 Challenge
In the Web UI, create a new Challenge Plugin (Datacenter → ACME) with the following values (all others are blank):
- Plugin ID: homelab-cloudflare
- DNS API: Cloudflare Managed DNS
- CF_TOKEN: <CLOUDFLARE API TOKEN>
Add Certificate
On each node, navigate to System → Certificates and Add a domain under ACME.
- Challenge Type: DNS
- Plugin: homelab-cloudflare (the one made above)
- Domain: <NODE>.pve.kgb33.dev
Set the "Using Account", then click "Order Certificates Now".
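The same can roughly be done from a node's shell; this is a sketch only, so double-check the flags against the Proxmox ACME docs:
# Attach the domain and plugin to this node, select the account, then order
pvenode config set --acmedomain0 <NODE>.pve.kgb33.dev,plugin=homelab-cloudflare
pvenode config set --acme account=homelab-prod
pvenode acme cert order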
Proxmox Docs
Proxmox VMs
Download the latest Talos ISO onto all the Proxmox nodes
https://github.com/siderolabs/talos/releases/latest/download/metal-amd64.iso
Make sure to save it as talos-metal-amd64.iso.
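For example, on each node (the destination path assumes the default local ISO storage):
curl -L -o /var/lib/vz/template/iso/talos-metal-amd64.iso \
  https://github.com/siderolabs/talos/releases/latest/download/metal-amd64.iso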
Create a Terraform User
Follow the instructions in the Telmate/proxmox docs, or SSH to a node and run the following commands:
# Create Role
pveum role add TerraformProv -privs "Datastore.AllocateSpace Datastore.Audit Pool.Allocate Sys.Audit Sys.Console Sys.Modify VM.Allocate VM.Audit VM.Clone VM.Config.CDROM VM.Config.Cloudinit VM.Config.CPU VM.Config.Disk VM.Config.HWType VM.Config.Memory VM.Config.Network VM.Config.Options VM.Migrate VM.Monitor VM.PowerMgmt SDN.Use"
# Create User (No password)
pveum user add terraform-prov@pve
# Add Role to User
pveum aclmod / -user terraform-prov@pve -role TerraformProv
Create Proxmox API Token
Then, open the Web UI to generate the API Key.
Go to Datacenter → Permissions → API Tokens; then Add a token. Expose the Token ID (public) and Secret (duh) as environment variables:
# Examples from Telmate Docs
export PM_API_TOKEN_ID="terraform-prov@pve!mytoken"
export PM_API_TOKEN_SECRET="afcd8f45-acc1-4d0f-bb12-a70b0777ec11"
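Alternatively, the token can be created from a node's shell; a sketch, with privilege separation disabled so the token inherits the user's permissions:
pveum user token add terraform-prov@pve mytoken -privsep 0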
Build VMs
cd tf
tofu apply
Kubernetes
Starting from Scratch
First, make sure to create the Talos VMs as described here, then cd into the talos directory.
From here, you can use Dagger to automatically provision the nodes. Each step is also detailed in the sub-chapters if you would prefer a manual approach.
Note: If you haven't already, generate the cluster info using
talosctl gen config homelab https://10.0.9.25:6443 -o _out
$ dagger functions
Name        Description
argocd      Step 4: Start ArgoCD.
base-img    Builds an Alpine image with talosctl installed and ready to go.
bootstrap   Step 2: Bootstrap etcd.
cilium      Step 3: Apply Cilium.
provision   Step 1: Provision the nodes.
Step 1: Provision the Nodes
After the brand new Talos VMs load up and the STAGE is Maintenance, run:
dagger call \
--raw-template=./templates/talos.yaml.j2 \
--talos-dir=_out \
provision
Step 2: Bootstrap Etcd
After all the nodes have rebooted (~1min), bootstrap Etcd. The STAGE on teemo will change from Installing to Booting when it's ready to be bootstrapped.
dagger call \
--raw-template=./templates/talos.yaml.j2 \
--talos-dir=_out \
bootstrap
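One way to confirm Etcd is running before moving on (using the same _out config as above):
talosctl --talosconfig _out/talosconfig service etcd --nodes 10.0.9.25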
Step 3: Apply Cilium
Once Etcd has started, apply Cilium:
dagger call \
--raw-template=./templates/talos.yaml.j2 \
--talos-dir=_out \
cilium
Step 4: Start ArgoCD
Once the Cilium step has completed (it'll show a nice status dashboard), start ArgoCD.
dagger call \
--raw-template=./templates/talos.yaml.j2 \
--talos-dir=_out \
argocd
Importantly, this step ends by printing out the default ArgoCD password. You still need to manually change the password and sync the apps-of-apps; see here.
Step 5: Grab the Kubeconfig
talosctl --talosconfig _out/talosconfig kubeconfig --nodes 10.0.9.25
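A quick check that the kubeconfig works:
kubectl get nodes -o wide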
Talos
First, cd into the talos directory.
Generating the Config files
Use the following commands to create talosconfig, controlplane.yaml, and worker.yaml:
mkdir _out
pushd _out
talosctl gen config \
home https://10.0.9.25:6443 \
--config-patch '[{"op": "add", "path": "/cluster/proxy", "value": {"disabled": true}}, {"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]'
talosctl --talosconfig talosconfig config endpoint 10.0.9.25
talosctl --talosconfig talosconfig config node 10.0.9.25
popd
Start Nodes
Create a Python virtual environment and install dagger-io.
python -m venv .venv
source .venv/bin/activate
pip install dagger-io
Then run the playbook:
python pipeline.py
TODO: Convert this to a Zenith style module.
Bootstrap etcd
Next, run
talosctl --talosconfig _out/talosconfig bootstrap
Then grab the kubeconfig, overwriting if needed:
talosctl --talosconfig _out/talosconfig kubeconfig
Note: The nodes won't be healthy until the cilium config is applied in the next step!
Cilium
Add the Helm repo:
helm repo add cilium https://helm.cilium.io/
Generate the cilium config:
helm template cilium cilium/cilium \
--version 1.15.1 --namespace kube-system \
--set ipam.mode=kubernetes \
--set kubeProxyReplacement=strict \
--set k8sServiceHost="10.0.9.25" \
--set k8sServicePort="6443" \
--set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
--set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
--set=cgroup.autoMount.enabled=false \
--set=cgroup.hostRoot=/sys/fs/cgroup \
--set hubble.listenAddress=":4244" \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true > cilium.yaml
And apply via:
kubectl apply -f cilium.yaml && cilium status --wait
ArgoCD
Argo Login
Grab the initial secret:
kubectl get secrets -n argocd \
argocd-initial-admin-secret \
-o jsonpath='{.data.password}' | base64 --decode
Port-forward the Argo dashboard, then log in with the username admin.
kubectl port-forward -n argocd services/argocd-server 8080:80
Note: This also forwards the Web GUI to localhost:8080
argocd login localhost:8080
argocd account update-password
Once the password has been changed, delete the initial secret:
kubectl delete secret -n argocd argocd-initial-admin-secret
Apps-of-Apps
Apply the meta definition:
kubectl apply -f k8s-apps/meta.yaml
And sync them:
argocd app sync argocd-meta
argocd app sync --project default
Note: on a fresh cluster, all the secrets will need to be rolled.
Kubernetes Secrets
Application secrets are managed using Sealed Secrets
and are stored with the application deployment config in k8s-apps/<APPLICATION>/<SEALED_SECRET>.yaml.
Creating/Rotating Secrets
I use the following zsh function to regenerate a sealed secret when rotating it. Importantly, editing the plaintext values in place seems to cause decryption to fail within the cluster, so recreating the secret from scratch is the most reliable approach.
# Usage: sealSecret <secretName> <secretValue> <namespace>
function sealSecret() {
    if [[ $# -ne 3 ]]; then
        echo "Usage: sealSecret secretName secretValue namespace"
        return 1
    fi
    echo -n "$2" | \
        kubectl create secret generic "$1" --dry-run=client --from-file="$1"=/dev/stdin -o yaml -n "$3" | \
        kubeseal -o yaml
}
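For example, to regenerate the Pi-hole admin password secret (the pihole namespace is an assumption; check the app's manifests):
sealSecret pihole-admin-password 'hunter2' pihole > k8s-apps/pihole/pihole-admin-password.yaml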
Listing Application Secrets
Currently, there are three sealed secrets:
k8s-apps/traefik/CloudflareSecret.yaml
k8s-apps/roboshpee/SealedToken.yaml
k8s-apps/pihole/pihole-admin-password.yaml
To get a current list of secrets in-repo:
rg -l '^kind: SealedSecret' k8s-apps
Or in-cluster:
kubectl get -A sealedsecrets.bitnami.com