Documentation for my homelab.
OPNsense
Static 10.0.8.0/24 Route
To successfully route traffic to the 10.0.8.0/24 subnet advertised by MetalLB, a static route needs to be added to the OPNsense router. There is no good way to do this within the web interface; instead, SSH to the box and add the following to /usr/local/etc/rc.syshook.d/start/96-k8s-static-route. Also, make sure it has execute permissions (chmod +x 96-k8s-static-route).
#!/bin/sh
route add -net 10.0.8.0/24 -interface vlan09
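To apply the route immediately (without rebooting) and confirm it exists, a quick sanity check; the grep pattern is just a convenience:
sh /usr/local/etc/rc.syshook.d/start/96-k8s-static-route
netstat -rn | grep 10.0.8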
Uptime Kuma
uptime.kgb33.dev
An off-site uptime monitoring solution hosted on Fly.io.
Scripts to deploy to both AWS and Fly.io exist in the repo; however, due to cost, Uptime Kuma is only deployed to Fly.io. The AWS documentation and scripts are kept to demonstrate AWS experience on a resume.
Cloudflare Rules
Cloudflare (occasionally) tries to block the Uptime Kuma bot. To prevent this, add a new "Configuration Rule" with a custom filter expression where the IP source matches the Fly.io IPv4 or IPv6 address assigned to the machine. This rule turns off the Browser Integrity Check and sets the Security Level to "Essentially Off".
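A sketch of the filter expression; the addresses below are placeholders for the IPs Fly.io actually assigns:
(ip.src eq 203.0.113.10) or (ip.src eq 2001:db8::1)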
Fly.io Deployment
From flyio/uptime_kuma, just run the following. It'll deploy Uptime Kuma to Fly.io, validate the DNS challenge for SSL certificates, and add A/AAAA records. If you use down instead of up, it'll do the reverse. Don't worry about running the commands multiple times; they're both idempotent.
dagger call \
--fly-api-token=FLY_API_TOKEN \
--fly-toml=fly.toml \
--pulumi-access-token=PULUMI_ACCESS_TOKEN \
--cloudflare-token=CLOUDFLARE_API_TOKEN \
up
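For teardown, the invocation is identical with down swapped in:
dagger call \
--fly-api-token=FLY_API_TOKEN \
--fly-toml=fly.toml \
--pulumi-access-token=PULUMI_ACCESS_TOKEN \
--cloudflare-token=CLOUDFLARE_API_TOKEN \
down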
AWS Deployment (Deprecated)
Secrets required:
- Cloudflare token (with write access to kgb33.dev) as CLOUDFLARE_API_TOKEN
- Allow Pulumi access to AWS (see here)
AWS Permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"acm:DeleteCertificate",
"acm:DescribeCertificate",
"acm:ListTagsForCertificate",
"acm:RequestCertificate",
"ec2:AuthorizeSecurityGroupEgress",
"ec2:CreateTags",
"ec2:DeleteSecurityGroup",
"ec2:RevokeSecurityGroupEgress",
"ec2:RevokeSecurityGroupIngress",
"iam:AttachRolePolicy",
"iam:CreateRole",
"iam:DeleteRole",
"iam:DetachRolePolicy",
"iam:GetRole",
"iam:ListInstanceProfilesForRole",
"iam:ListRolePolicies",
"logs:DeleteLogGroup",
"logs:ListTagsLogGroup"
],
"Resource": "*"
}
]
}
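Before running Pulumi, the secrets above can be supplied as environment variables. A sketch; the AWS variable names assume the standard credential variables read by the Pulumi AWS provider:
export CLOUDFLARE_API_TOKEN="<token with write access to kgb33.dev>"
export AWS_ACCESS_KEY_ID="<access key id>"
export AWS_SECRET_ACCESS_KEY="<secret access key>"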
Then just run pulumi up and navigate to uptime.kgb33.dev.
Proxmox Install
For the base install, use the latest ISO and follow the instructions. Installation values:
- IP/Hostname:
  - 10.0.9.101/24 / glint.pve.kgb33.dev
  - 10.0.9.102/24 / targe.pve.kgb33.dev
  - 10.0.9.103/24 / sundance.pve.kgb33.dev
- Email: pve@kgb33.dev
Afterward, implement the following post-install steps.
VLAN Tagging
Unfortunately, there is no way to add VLAN tagging in the installation.
Instead, open a shell on the device and edit /etc/network/interfaces.
auto lo
iface lo inet loopback
iface enp7s0f1 inet manual
-auto vmbr0
-iface vmbr0 inet static
+auto vmbr0.9
+iface vmbr0.9 inet static
address 10.0.9.102/24
gateway 10.0.9.1
+auto vmbr0
+iface vmbr0 inet manual
bridge-ports enp7s0f1
bridge-stp off
bridge-fd 0
+ bridge-vlan-aware yes
+ bridge-vids 2-4094
iface wlp0s20f3 inet manual
source /etc/network/interfaces.d/*
Save and reload with ifreload -a.
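To confirm the tagged bridge came up with the expected address, something like the following should work:
ip -d link show vmbr0.9
ip addr show vmbr0.9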
Clustering
Make sure to cluster the machines using Proxmox's built-in clustering system.
On one machine (I prefer targe) create the cluster, then join the other machines using their web UIs.
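If you'd prefer the CLI to the web UI, a rough sketch (the cluster name is a placeholder):
# On targe (10.0.9.102), create the cluster
pvecm create <CLUSTER_NAME>
# On glint and sundance, join it
pvecm add 10.0.9.102
# Verify quorum from any node
pvecm status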
Post-Install Ansible Playbooks
Create my user account and pull ssh keys:
cd ansible
ansible-playbook --limit=pve playbooks/audd/audd.yaml -k -u root
Then, run the following playbook to set the DNS servers and enable closing the lid without shutting down the machine.
ansible-playbook --limit=pve playbooks/pve/system-services.yaml -k -u root
TLS Certificates
ACME Accounts
Create two ACME 'accounts' using the Web UI (Datacenter → ACME) or by SSHing into one of the Proxmox machines (make sure to su into root).
# Select option 1: "Let's Encrypt V2 Staging"
pvenode acme account register homelab-staging pve@kgb33.dev
# Select option 0: "Let's Encrypt V2"
pvenode acme account register homelab-prod pve@kgb33.dev
dns-01 Challenge
In the Web UI, create a new Challenge Plugin (Datacenter → ACME) with the following values (all others are blank):
- Plugin ID: homelab-cloudflare
- DNS API: Cloudflare Managed DNS
- CF_TOKEN: <CLOUDFLARE API TOKEN>
Add Certificate
On each node, navigate to System → Certificates and Add a domain under ACME.
- Challenge Type: DNS
- Plugin: homelab-cloudflare (the one made above)
- Domain: <NODE>.pve.kgb33.dev
Set the "Using Account", then click "Order Certificates Now".
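The same can roughly be done from a node's shell; this is a sketch only, so double-check the flags against the Proxmox ACME docs:
# Attach the domain and plugin to this node, select the account, then order
pvenode config set --acmedomain0 <NODE>.pve.kgb33.dev,plugin=homelab-cloudflare
pvenode config set --acme account=homelab-prod
pvenode acme cert order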
Proxmox Docs
Proxmox VMs
Download the latest Talos ISO onto all the Proxmox nodes
https://github.com/siderolabs/talos/releases/latest/download/metal-amd64.iso
Make sure to save it as talos-metal-amd64.iso.
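For example, on each node (the destination path assumes the default local ISO storage):
curl -L -o /var/lib/vz/template/iso/talos-metal-amd64.iso \
  https://github.com/siderolabs/talos/releases/latest/download/metal-amd64.iso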
Create a Terraform User
Follow the instructions in the Telmate/proxmox docs, or SSH to a node and run the following commands:
# Create Role
pveum role add TerraformProv -privs "Datastore.AllocateSpace Datastore.Audit Pool.Allocate Sys.Audit Sys.Console Sys.Modify VM.Allocate VM.Audit VM.Clone VM.Config.CDROM VM.Config.Cloudinit VM.Config.CPU VM.Config.Disk VM.Config.HWType VM.Config.Memory VM.Config.Network VM.Config.Options VM.Migrate VM.Monitor VM.PowerMgmt SDN.Use"
# Create User (No password)
pveum user add terraform-prov@pve
# Add Role to User
pveum aclmod / -user terraform-prov@pve -role TerraformProv
Create Proxmox API Token
Then, open the Web UI to generate the API Key.
Go to Datacenter → Permissions → API Tokens; then Add a token. Expose the Token ID (public) and Secret (duh) as environment variables:
# Examples from Telmate Docs
export PM_API_TOKEN_ID="terraform-prov@pve!mytoken"
export PM_API_TOKEN_SECRET="afcd8f45-acc1-4d0f-bb12-a70b0777ec11"
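Alternatively, the token can be created from a node's shell; a sketch, with privilege separation disabled so the token inherits the user's permissions:
pveum user token add terraform-prov@pve mytoken -privsep 0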
Build VMs
cd tf
tofu apply
Kubernetes
Starting from Scratch
First, make sure to create the Talos VMs as described here, then cd into the talos directory.
From here, you can use Dagger to automatically provision the nodes. Each step is also detailed in the sub-chapters if you would prefer a manual approach.
Note: If you haven't already, generate the cluster info using
talosctl gen config homelab https://10.0.9.25:6443 -o _out
$ dagger functions
Name        Description
argocd      Step 4: Start ArgoCD.
base-img    Builds an Alpine image with talosctl installed and ready to go.
bootstrap   Step 2: Bootstrap etcd.
cilium      Step 3: Apply Cilium.
provision   Step 1: Provision the nodes.
Step 1: Provision the Nodes
After the brand new Talos VMs load up and the STAGE is Maintenance, run:
dagger call \
--raw-template=./templates/talos.yaml.j2 \
--talos-dir=_out \
provision
Step 2: Bootstrap Etcd
After all the nodes have rebooted (~1min), bootstrap Etcd. The STAGE on teemo will change from Installing to Booting when it's ready to be bootstrapped.
dagger call \
--raw-template=./templates/talos.yaml.j2 \
--talos-dir=_out \
bootstrap
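One way to confirm Etcd is running before moving on (using the same _out config as above):
talosctl --talosconfig _out/talosconfig service etcd --nodes 10.0.9.25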
Step 3: Apply Cilium
Once Etcd has started, apply Cilium:
dagger call \
--raw-template=./templates/talos.yaml.j2 \
--talos-dir=_out \
cilium
Step 4: Start ArgoCD
Once the Cilium step has completed (it'll show a nice status dashboard), start ArgoCD.
dagger call \
--raw-template=./templates/talos.yaml.j2 \
--talos-dir=_out \
argocd
Importantly, this step ends by printing out the default ArgoCD password. You still need to manually change the password and sync the apps-of-apps; see here.
Step 5: Grab the Kubeconfig
talosctl --talosconfig _out/talosconfig kubeconfig --nodes 10.0.9.25
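A quick check that the kubeconfig works:
kubectl get nodes -o wide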
Talos
First, cd into the talos directory.
Generating the Config files
Use the following commands to create talosconfig, controlplane.yaml, and worker.yaml:
mkdir _out
pushd _out
talosctl gen config \
home https://10.0.9.25:6443 \
--config-patch '[{"op": "add", "path": "/cluster/proxy", "value": {"disabled": true}}, {"op":"add", "path": "/cluster/network/cni", "value": {"name": "none"}}]'
talosctl --talosconfig talosconfig config endpoint 10.0.9.25
talosctl --talosconfig talosconfig config node 10.0.9.25
popd
Start Nodes
Create a Python virtual environment and install dagger-io.
python -m venv .venv
source .venv/bin/activate
pip install dagger-io
Then run the playbook:
python pipeline.py
TODO: Convert this to a Zenith style module.
Bootstrap etcd
Next, run
talosctl --talosconfig _out/talosconfig bootstrap
Then grab the kubeconfig, overwriting if needed:
talosctl --talosconfig _out/talosconfig kubeconfig
Note: The nodes won't be healthy until the cilium config is applied in the next step!
Cilium
Add the Helm repo:
helm repo add cilium https://helm.cilium.io/
Generate the cilium config:
helm template cilium cilium/cilium \
--version 1.15.1 --namespace kube-system \
--set ipam.mode=kubernetes \
--set kubeProxyReplacement=strict \
--set k8sServiceHost="10.0.9.25" \
--set k8sServicePort="6443" \
--set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
--set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
--set=cgroup.autoMount.enabled=false \
--set=cgroup.hostRoot=/sys/fs/cgroup \
--set hubble.listenAddress=":4244" \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true > cilium.yaml
And apply via:
kubectl apply -f cilium.yaml && cilium status --wait
ArgoCD
Argo Login
Grab the initial secret:
kubectl get secrets -n argocd \
argocd-initial-admin-secret \
-o jsonpath='{.data.password}' | base64 --decode
Port-forward the Argo dashboard, then log in with the username admin.
kubectl port-forward -n argocd services/argocd-server 8080:80
Note: This also forwards the Web GUI to localhost:8080
argocd login localhost:8080
argocd account update-password
Once the password has been changed, delete the initial secret:
kubectl delete secret -n argocd argocd-initial-admin-secret
Apps-of-Apps
Apply the meta definition:
kubectl apply -f k8s-apps/meta.yaml
And sync them:
argocd app sync argocd-meta
argocd app sync --project default
Note: on a fresh cluster, all the secrets will need to be rolled.
Kubernetes Secrets
Application secrets are managed using Sealed Secrets
and are stored with the application deployment config in k8s-apps/<APPLICATION>/<SEALED_SECRET>.yaml.
Creating/Rotating Secrets
I use the following zsh function to regenerate a sealed secret when rotating it. Importantly, editing the plaintext values in place seems to cause decryption to fail within the cluster, so recreating the secret from scratch is the most reliable approach.
# Usage: sealSecret <secretName> <secretValue> <namespace>
function sealSecret() {
    if [[ $# -ne 3 ]]; then
        echo "Usage: sealSecret secretName secretValue namespace"
        return 1
    fi
    echo -n "$2" | \
        kubectl create secret generic "$1" --dry-run=client --from-file="$1"=/dev/stdin -o yaml -n "$3" | \
        kubeseal -o yaml
}
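For example, to regenerate the Pi-hole admin password secret (the pihole namespace is an assumption; check the app's manifests):
sealSecret pihole-admin-password 'hunter2' pihole > k8s-apps/pihole/pihole-admin-password.yaml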
Listing Application Secrets
Currently, there are three sealed secrets:
k8s-apps/traefik/CloudflareSecret.yaml
k8s-apps/roboshpee/SealedToken.yaml
k8s-apps/pihole/pihole-admin-password.yaml
To get a current list of secrets in-repo:
rg -l '^kind: SealedSecret' k8s-apps
Or in-cluster:
kubectl get -A sealedsecrets.bitnami.com