Template for deploying a Talos Linux Kubernetes cluster backed by Flux and SOPS
Welcome to my template designed for deploying a single Kubernetes cluster. Whether you’re setting up a cluster at home on bare-metal or virtual machines (VMs), this project aims to simplify the process and make Kubernetes more accessible. This template is inspired by my personal home-ops repository, providing a practical starting point for anyone interested in managing their own Kubernetes environment.
At its core, this project leverages makejinja, a powerful tool for rendering templates. By reading configuration files such as cluster.yaml and nodes.yaml, makejinja generates the necessary configurations to deploy a Kubernetes cluster with the features described below.
With this approach, you’ll gain a solid foundation to build and manage your Kubernetes cluster efficiently.
A Kubernetes cluster deployed with Talos Linux and an opinionated implementation of Flux using GitHub as the Git provider, SOPS to manage secrets, and cloudflared to make applications accessible from outside your local network.
Other features include HelmRelease and Kustomization diffs w/ flux-local.

Does this sound cool to you? If so, read on! 👇
There are 5 stages outlined below for completing this project; make sure you follow the stages in order.
[!IMPORTANT]
If you have 3 or more nodes, it is recommended to make 3 of them controller nodes for a highly available control plane. This project configures all nodes to run workloads, so worker nodes are optional.

Minimum system requirements:
| Role | Cores | Memory | System Disk |
|----------------|-------|--------|----------------|
| Control/Worker | 4 | 16GB | 256GB SSD/NVMe |
Head over to the Talos Linux Image Factory and follow the instructions. Be sure to choose only the bare-minimum system extensions, as some might require additional configuration and prevent Talos from booting without it. You can always add system extensions after Talos is installed and working.
This will eventually lead you to download a Talos Linux ISO image (or a RAW image for SBCs). Make sure to note the schematic ID; you will need it later on.
Flash the Talos ISO or RAW image to a USB drive and boot from it on your nodes.
Verify with nmap that your nodes are available on the network by scanning for the Talos API port (TCP 50000). Replace 192.168.1.0/24 with the network your nodes are on.
nmap -Pn -n -p 50000 192.168.1.0/24 -vv | grep 'Discovered'
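On a larger subnet the verbose nmap output gets noisy. The sketch below reduces it to a sorted list of node IPs. It assumes nmap's verbose line format "Discovered open port 50000/tcp on <ip>", and talos_nodes_from_nmap is a hypothetical helper name, not part of this template.

```shell
# Print the unique IPs that nmap reported with the Talos API port open.
# Assumes nmap's verbose format: "Discovered open port 50000/tcp on <ip>".
talos_nodes_from_nmap() {
  grep 'Discovered open port 50000/tcp on' | awk '{print $NF}' | sort -u
}

# Usage:
# nmap -Pn -n -p 50000 192.168.1.0/24 -vv | talos_nodes_from_nmap
```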
[!TIP]
It is recommended to set the visibility of your repository to Public so you can easily request help if you get stuck.
Create a new repository by clicking the green Use this template button at the top of this page, then clone the new repo you just created and cd into it. Alternatively, you can use the GitHub CLI …
export REPONAME="home-ops"
gh repo create "$REPONAME" --template onedr0p/cluster-template --disable-wiki --public --clone && cd "$REPONAME"
Install the Mise CLI on your workstation.
Activate Mise in your shell by following the activation guide.
Use mise to install the required CLI tools:
mise trust
pip install pipx
mise install
📍 Having trouble installing the tools? Try unsetting the GITHUB_TOKEN env var and then run these commands again.
📍 Having trouble compiling Python? Try running mise settings python.compile=0 and then run these commands again.
Log out of GitHub Container Registry (GHCR), as being logged in may cause authorization problems when using the public registry:
docker logout ghcr.io
helm registry logout ghcr.io
[!WARNING]
If any of the commands fail with command not found or unknown command, it means mise is either not installed or configured incorrectly.
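To spot a missing tool before running the later steps, a small PATH check can help. This is a sketch: check_tools is a hypothetical helper, and the tool names in the usage comment are CLIs this guide calls directly, not necessarily the full list that mise installs.

```shell
# Report which CLI tools are missing from PATH; returns non-zero if any are.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (example tool list, adjust to your setup):
# check_tools task kubectl flux cilium cloudflared sops
```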
Create a Cloudflare API token for use with cloudflared and external-dns by reviewing the official documentation and following the instructions below.
1. Click the Use template button for the Edit zone DNS template.
2. Name your token kubernetes.
3. Under Permissions, click + Add More and add permissions Zone - DNS - Edit and Account - Cloudflare Tunnel - Read.
4. Click Continue to Summary and then Create Token.

Create the Cloudflare Tunnel:
cloudflared tunnel login
cloudflared tunnel create --credentials-file cloudflare-tunnel.json kubernetes
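As a quick sanity check before moving on, you can confirm that the credentials file was written and contains a tunnel ID. This sketch assumes cloudflare-tunnel.json is the JSON credentials file written by cloudflared tunnel create, with a "TunnelID" field; tunnel_id_from_creds is a hypothetical helper that uses sed instead of a JSON parser to avoid extra dependencies.

```shell
# Print the TunnelID from a cloudflared credentials file, or fail if absent.
# Assumes a JSON object containing "TunnelID": "<uuid>".
tunnel_id_from_creds() {
  id=$(sed -n 's/.*"TunnelID"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" 2>/dev/null)
  [ -n "$id" ] || { echo "no TunnelID found in $1" >&2; return 1; }
  printf '%s\n' "$id"
}

# Usage: tunnel_id_from_creds cloudflare-tunnel.json
# The ID should match the kubernetes tunnel shown by: cloudflared tunnel list
```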
Generate the config files from the sample files:
task init
Fill out the cluster.yaml and nodes.yaml configuration files, using the comments in those files as a guide.
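Several cluster.yaml values come up again in the verification steps later on. The sketch below is a rough, hypothetical pre-flight check that such keys are present and non-empty. It assumes the placeholder names used later in this guide (for example cloudflare_domain) match flat, unindented key: value lines in cluster.yaml; it is not a YAML parser, and task configure remains the real validation.

```shell
# Fail if any given key is missing or empty in a simple YAML file.
# Rough text check: assumes flat "key: value" lines and strips a trailing
# "# comment" before testing the value.
require_keys() {
  file=$1
  shift
  status=0
  for key in "$@"; do
    value=$(sed -n "s/^${key}:[[:space:]]*//p" "$file" | head -n 1 | sed 's/[[:space:]]*#.*$//')
    case $value in
      "" | '""' | "''")
        echo "$file: $key is missing or empty" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}

# Usage (key names assumed from the placeholders used later in this guide):
# require_keys cluster.yaml cloudflare_domain cluster_gateway_addr \
#   cloudflare_gateway_addr cluster_dns_gateway_addr
```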
Template out the Kubernetes and Talos configuration files. If any issues come up, be sure to read the error and adjust your config files accordingly.
task configure
Push your changes to git:
📍 Verify all the ./kubernetes/**/*.sops.* files are encrypted with SOPS.
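One way to do that verification from the shell is a heuristic check for the ENC[AES256_GCM,...] strings that SOPS writes in place of encrypted values. The helper name find_unencrypted_sops is hypothetical, and a file that passes is only likely encrypted; running sops --decrypt on the file is the authoritative test.

```shell
# List *.sops.* files that contain no SOPS-encrypted value.
# Heuristic (assumption): SOPS replaces encrypted values with strings of the
# form ENC[AES256_GCM,data:...], so a file without that marker is most likely
# still plaintext. Returns non-zero if any such file is found.
find_unencrypted_sops() {
  unencrypted=$(find "${1:-./kubernetes}" -type f -name '*.sops.*' \
    ! -exec grep -q 'ENC\[AES256_GCM' {} \; -print)
  if [ -n "$unencrypted" ]; then
    printf '%s\n' "$unencrypted" | sed 's/^/NOT encrypted: /'
    return 1
  fi
  echo "all *.sops.* files contain encrypted values"
}

# Usage: find_unencrypted_sops ./kubernetes
```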
git add -A
git commit -m "chore: initial commit"
git push
[!TIP]
Using a private repository? Make sure to paste the public key from github-deploy.key.pub into the deploy keys section of your GitHub repository settings. This will make sure Flux has read/write access to your repository.
[!WARNING]
It might take a while for the cluster to be set up (10+ minutes is normal). During this time you will see a variety of error messages like “couldn’t get current server API group list” and “error: no matching resources found”. The node status ‘Ready’ will remain “False” because no CNI is deployed yet. This is normal. If this step gets interrupted, e.g. by pressing Ctrl + C, you will likely need to reset the cluster before trying again.
Install Talos:
task bootstrap:talos
Push your changes to git:
git add -A
git commit -m "chore: add talhelper encrypted secret"
git push
Install cilium, coredns, spegel, flux and sync the cluster to the repository state:
task bootstrap:apps
Watch the rollout of your cluster happen:
kubectl get pods --all-namespaces --watch
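Instead of eyeballing the watch output, you can filter it down to the pods that still need attention. This sketch parses the default columns of kubectl get pods --all-namespaces --no-headers (NAMESPACE, NAME, READY, STATUS, ...); pods_not_ready is a hypothetical helper, and the column positions are an assumption tied to that default output.

```shell
# Print pods that are not Completed and are either not Running or not fully
# ready (READY column "x/y" with x != y). Reads kubectl output on stdin.
pods_not_ready() {
  awk '
    { split($3, ready, "/") }
    $4 == "Completed" { next }
    $4 != "Running" || ready[1] != ready[2] { print $1 "/" $2 " " $3 " " $4 }
  '
}

# Usage: kubectl get pods --all-namespaces --no-headers | pods_not_ready
```

An empty result means every pod is either Completed or Running with all of its containers ready.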
Check the status of Cilium:
cilium status
Check the status of Flux and whether the Flux resources are up to date and in a ready state:
📍 Run task reconcile to force Flux to sync your Git repository state.
flux check
flux get sources git flux-system
flux get ks -A
flux get hr -A
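The flux CLI tables are meant for humans. For a scriptable view of what is not yet ready, the Flux custom resources can be queried with plain kubectl and a JSONPath template, as in this sketch. flux_not_ready is a hypothetical helper, and it assumes the standard Flux CRD names kustomizations.kustomize.toolkit.fluxcd.io and helmreleases.helm.toolkit.fluxcd.io.

```shell
# List Flux Kustomizations and HelmReleases whose Ready condition is not "True".
# Uses kubectl + JSONPath so the result does not depend on the flux CLI's
# table layout. Prints nothing when everything is ready.
flux_not_ready() {
  for kind in kustomizations.kustomize.toolkit.fluxcd.io helmreleases.helm.toolkit.fluxcd.io; do
    kubectl get "$kind" --all-namespaces \
      -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' |
      awk -F '\t' -v kind="${kind%%.*}" '$2 != "True" { print kind " " $1 " Ready=" ($2 == "" ? "<none>" : $2) }'
  done
}

# Usage: flux_not_ready
```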
Check TCP connectivity to both the internal and external gateways:
📍 The variables are only placeholders; replace them with your actual values.
nmap -Pn -n -p 443 ${cluster_gateway_addr} ${cloudflare_gateway_addr} -vv
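If nmap is not available, bash's built-in /dev/tcp redirection gives a rough equivalent. This sketch assumes bash and the coreutils timeout command are installed; tcp_open is a hypothetical helper, and the variables in the usage comment are the same placeholders as above.

```shell
# Succeed if a TCP connection to HOST PORT opens within 3 seconds.
# Relies on bash's /dev/tcp redirection (a bash feature, not a real file)
# and the coreutils timeout command.
tcp_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage (placeholders, replace with your actual values):
# for addr in "$cluster_gateway_addr" "$cloudflare_gateway_addr"; do
#   if tcp_open "$addr" 443; then echo "$addr:443 open"; else echo "$addr:443 closed"; fi
# done
```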
Check that you can resolve DNS for echo; this should resolve to ${cloudflare_gateway_addr}:
📍 The variables are only placeholders; replace them with your actual values.
dig @${cluster_dns_gateway_addr} echo.${cloudflare_domain}
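To turn the dig check into a pass/fail test, compare the answer against the expected address. A sketch with a hypothetical resolves_to helper: it compares only the last line of dig +short output, which is the final A record when the name is a CNAME, but is an arbitrary pick if the name has several A records.

```shell
# Succeed if NAME resolves to EXPECTED_IP when querying DNS server SERVER.
resolves_to() {
  server=$1 name=$2 expected=$3
  # dig +short prints only answer data; keep the last line (final A record).
  answer=$(dig +short "@$server" "$name" | tail -n 1)
  if [ "$answer" = "$expected" ]; then
    echo "ok: $name -> $answer"
  else
    echo "FAIL: $name -> ${answer:-<no answer>} (expected $expected)" >&2
    return 1
  fi
}

# Usage (placeholders, replace with your actual values):
# resolves_to "$cluster_dns_gateway_addr" "echo.$cloudflare_domain" "$cloudflare_gateway_addr"
```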
Check the status of your wildcard Certificate:
kubectl -n kube-system describe certificates
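For a condensed yes/no answer instead of the full describe output, the Ready condition of each Certificate can be printed directly. This sketch assumes the Certificates are cert-manager resources (which carry a Ready condition) and defaults to the kube-system namespace used above; certificates_ready is a hypothetical helper.

```shell
# Print each Certificate with its Ready status; fail if any is not Ready
# or if no Certificates exist in the namespace.
certificates_ready() {
  kubectl -n "${1:-kube-system}" get certificates \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' |
    awk -F '\t' '{ print } $2 != "True" { bad = 1 } END { exit (NR == 0 || bad) }'
}

# Usage: certificates_ready kube-system
```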
[!TIP]
Use the external gateway on HTTPRoutes to make applications public to the internet.
The external-dns application created in the network namespace will handle creating public DNS records. By default, echo and the flux-webhook are the only subdomains reachable from the public internet. To make additional applications public, you must set the correct gateway, as in the HelmRelease for echo.
[!TIP]
Use the internal gateway on HTTPRoutes to make applications private to your network. If you’re having trouble with internal DNS resolution, check out this GitHub discussion.
k8s_gateway will provide DNS resolution to external Kubernetes resources (i.e. points of entry to the cluster) from any device that uses your home DNS server. For this to work, your home DNS server must be configured to forward DNS queries for ${cloudflare_domain} to ${cluster_dns_gateway_addr} instead of the upstream DNS server(s) it normally uses. This is a form of split DNS (aka split-horizon DNS / conditional forwarding).
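As an illustration of conditional forwarding, this is what the rule looks like for dnsmasq, which uses the server=/domain/address syntax to send queries for a domain and all its subdomains to a specific server. The sketch only prints the line; dnsmasq_forward_rule is a hypothetical helper, and other DNS servers (Unbound, AdGuard Home, router firmware) configure forwarding differently. Where the line goes depends on your setup.

```shell
# Print a dnsmasq conditional-forwarding rule: queries for DOMAIN (and its
# subdomains) go to the k8s_gateway address instead of the usual upstream.
dnsmasq_forward_rule() {
  printf 'server=/%s/%s\n' "$1" "$2"
}

# Usage (placeholders, replace with your actual values):
# dnsmasq_forward_rule "$cloudflare_domain" "$cluster_dns_gateway_addr"
```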
… Nothing working? That is expected; this is DNS after all!
By default, Flux will periodically check your Git repository for changes. In order to have Flux reconcile on git push, you must configure GitHub to send push events to Flux.
Obtain the webhook path:
📍 Hook id and path should look like /hook/12ebd1e363c641dc3c2e430ecf3cee2b3c7a5ac9e1234506f6f5f3ce1230e123
kubectl -n flux-system get receiver github-webhook --output=jsonpath='{.status.webhookPath}'
Piece together the full URL with the webhook path appended:
https://flux-webhook.${cloudflare_domain}/hook/12ebd1e363c641dc3c2e430ecf3cee2b3c7a5ac9e1234506f6f5f3ce1230e123
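The two steps above can be combined into one helper that fetches the webhook path and prints the full URL. A sketch, assuming the public hostname is flux-webhook.${cloudflare_domain} as shown; flux_webhook_url is a hypothetical name.

```shell
# Print the full GitHub webhook URL for the Flux github-webhook Receiver.
flux_webhook_url() {
  domain=$1
  path=$(kubectl -n flux-system get receiver github-webhook --output=jsonpath='{.status.webhookPath}')
  # An empty path usually means the Receiver is not ready yet.
  [ -n "$path" ] || { echo "receiver has no webhookPath yet" >&2; return 1; }
  printf 'https://flux-webhook.%s%s\n' "$domain" "$path"
}

# Usage (placeholder, replace with your actual value):
# flux_webhook_url "$cloudflare_domain"
```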
Navigate to the settings of your repository on GitHub and, under “Settings/Webhooks”, press the “Add webhook” button. Fill in the webhook URL, put your token from github-push-token.txt in the Secret field, set Content type to application/json, under Events choose Just the push event, and save.
[!CAUTION]
Resetting the cluster multiple times in a short period of time could lead to being rate limited by Docker Hub or Let’s Encrypt.
There might be a situation where you want to destroy your Kubernetes cluster. The following command will reset your nodes back to maintenance mode.
task talos:reset
[!TIP]
Ensure you have updated talconfig.yaml and any patches with your updated configuration. In some cases you need not only to apply the configuration but also to upgrade Talos for the new configuration to take effect.
# (Re)generate the Talos config
task talos:generate-config
# Apply the config to the node
task talos:apply-node IP=? MODE=?
# e.g. task talos:apply-node IP=10.10.10.10 MODE=auto
[!TIP]
Ensure the talosVersion and kubernetesVersion in talenv.yaml are up to date with the version you wish to upgrade to.
# Upgrade node to a newer Talos version
task talos:upgrade-node IP=?
# e.g. task talos:upgrade-node IP=10.10.10.10
# Upgrade cluster to a newer Kubernetes version
task talos:upgrade-k8s
# e.g. task talos:upgrade-k8s
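With several nodes, it is safer to upgrade Talos one node at a time and stop at the first failure. The loop below is a sketch that wraps the task shown above; upgrade_nodes and the IPs in the usage comment are hypothetical, and it assumes task talos:upgrade-node exits non-zero on failure and only returns once the node is back.

```shell
# Upgrade Talos on each node in turn, stopping at the first failure so a bad
# upgrade affects at most one node.
upgrade_nodes() {
  for ip in "$@"; do
    echo "upgrading $ip"
    task talos:upgrade-node IP="$ip" || { echo "upgrade failed on $ip" >&2; return 1; }
  done
}

# Usage (hypothetical IPs): upgrade_nodes 10.10.10.10 10.10.10.11 10.10.10.12
```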
Renovate is a tool that automates dependency management. It is designed to scan your repository around the clock and open PRs for out-of-date dependencies it finds. Common dependencies it can discover are Helm charts, container images, GitHub Actions and more! In most cases merging a PR will cause Flux to apply the update to your cluster.
To enable Renovate, click the ‘Configure’ button over at their GitHub app page and select your repository. Renovate creates a “Dependency Dashboard” as an issue in your repository, giving an overview of the status of all updates. The dashboard has interactive checkboxes that let you do things like advance scheduling or reattempt update PRs you closed without merging.
The base Renovate configuration in your repository can be viewed at .renovaterc.json5. By default, it is scheduled to open PRs every weekend, but you can change the schedule to anything you want, or remove it if you want Renovate to open PRs immediately.
Below is a general guide on trying to debug an issue with a resource or application, for example when a workload/resource is not showing up, or a pod has started but is stuck in a CrashLoopBackOff or Pending state. These steps do not include a way to fix the problem, as the problem could be one of many different things.
Check if the Flux resources are up-to-date and in a ready state:
📍 Run task reconcile to force Flux to sync your Git repository state.
flux get sources git -A
flux get ks -A
flux get hr -A
Check whether the pod of the workload you are debugging is listed:
kubectl -n <namespace> get pods -o wide
Check the logs of the pod if it’s there:
kubectl -n <namespace> logs <pod-name> -f
If a resource exists try to describe it to see what problems it might have:
kubectl -n <namespace> describe <resource> <name>
Check the namespace events:
kubectl -n <namespace> get events --sort-by='.metadata.creationTimestamp'
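The checks above can be bundled into one function that prints a single report for a namespace, which is handy when asking for help. A sketch: debug_namespace is a hypothetical helper, the pod argument is optional, and kubectl logs --previous shows the last terminated container, which is often the useful one for a CrashLoopBackOff.

```shell
# Print the troubleshooting steps above as one report for a namespace.
# NAMESPACE is required; POD is optional and adds describe output and logs.
debug_namespace() {
  ns=$1 pod=$2
  echo "== pods in $ns"
  kubectl -n "$ns" get pods -o wide
  echo "== events in $ns"
  kubectl -n "$ns" get events --sort-by='.metadata.creationTimestamp'
  if [ -n "$pod" ]; then
    echo "== describe pod $pod"
    kubectl -n "$ns" describe pod "$pod"
    # --previous errors when there is no earlier container, so hide that error.
    echo "== previous logs of $pod"
    kubectl -n "$ns" logs "$pod" --previous --tail=50 2>/dev/null
    echo "== current logs of $pod"
    kubectl -n "$ns" logs "$pod" --tail=50
  fi
}

# Usage (hypothetical pod name): debug_namespace network echo-abc123 > debug.txt
```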
Resolving problems could take some tweaking of your YAML manifests; other times the cause is an external factor, such as permissions on an NFS server. If you are unable to figure out your problem, see the support sections below.
Once your cluster is fully configured and you no longer need to run task configure, it’s a good idea to clean up the repository by removing the templates directory and any files related to the templating process. This will help eliminate unnecessary clutter from the upstream template repository and resolve any “duplicate registry” warnings from Renovate.
Tidy up your repository:
task template:tidy
Push your changes to git:
git add -A
git commit -m "chore: tidy up"
git push
There’s a lot to absorb here, especially if you’re new to these tools. Take some time to familiarize yourself with the tooling and understand how all the components interconnect. Dive into the documentation of the various tools included; they are a valuable resource. This shouldn’t be a production environment yet, so embrace the freedom to experiment. Move fast, break things intentionally, and challenge yourself to fix them.
Below are some optional considerations you may want to explore.
The template uses k8s_gateway to provide DNS for your applications; consider exploring external-dns as an alternative.
External-DNS offers broad support for various DNS providers.
This flexibility allows you to integrate seamlessly with a range of DNS solutions to suit your environment and offload DNS from your cluster to your router or another external device.
SOPS is an excellent tool for managing secrets in a GitOps workflow. However, it can become cumbersome when rotating secrets or maintaining a single source of truth for secret items.
For a more streamlined approach to those issues, consider External Secrets. This tool allows you to move away from SOPS and leverage an external provider for managing your secrets. External Secrets supports a wide range of providers, from cloud-based solutions to self-hosted options.
If your workloads require persistent storage with features like replication or connectivity to NFS, SMB, or iSCSI servers, there are several projects worth exploring:
These tools offer a variety of solutions to meet your persistent storage needs, whether you’re using cloud-native or self-hosted infrastructures.
Community member @whazor created Kubesearch to allow searching Flux HelmReleases across GitHub and GitLab repositories with the kubesearch topic.
Community support is available in the #support or #cluster-template channels in the Home Operations Discord server.

If you’re having difficulty with this project, can’t find the answers you need through the community support options above, or simply want to show your appreciation while gaining deeper insights, I’m offering one-on-one paid support through GitHub Sponsors for a limited time. Payment and scheduling will be coordinated through GitHub Sponsors.
If this repo is too hot to handle or too cold to hold, check out the following projects.
Big shout out to all the contributors, sponsors and everyone else who has helped on this project.