Using Terraform with Kubernetes
Terraform is an extremely flexible framework that allows for intent-based resource management. It's commonly used to provision various types of infrastructure as well as services and applications.
Terraform also has a number of official and unofficial providers that can be used to manage workloads inside Kubernetes clusters. This post captures my experience with Terraform in that scenario.
At first glance it appears that Kubernetes and Terraform follow exactly the same intent-based logic, so it should be relatively easy to describe the desired state in Terraform and then make Kubernetes implement it. That approach definitely works in some cases, but not every time.
There are a number of providers that Terraform can use to manipulate the state of a cluster. They all seem to have their strong and weak points, and each works well only in some circumstances.
kubernetes provider
This official provider offers the ability to manipulate a lot of the standard Kubernetes objects like Deployments, ReplicaSets or RoleBindings. It works relatively well if you create the code from scratch. I have also found an interesting project on GitHub, k2tf, that converts YAML to HCL2 for consumption by Terraform.
The following things have to be taken into consideration when using this provider:
- Kubernetes doesn't have a prescribed amount of time for requirements to be satisfied. If you use kubectl to create a Deployment that needs a particular StorageClass, but that class is not yet available, that's not a problem: once the class becomes available, the whole Deployment is created. The Terraform provider, however, waits with a timeout, and if the resource doesn't get created in that prescribed time Terraform reports an error. To make things worse, that tends to leave a resource inside the Kubernetes cluster but not in the Terraform state file, so running apply again causes an error. There's a way around this for some of the resources: they have a wait_for_rollout property that, when disabled, makes Terraform treat the resource as created the moment the request is acknowledged by Kubernetes. Obviously, in that case you might end up with a non-working setup even though Terraform thinks everything is fine, as some resources actually end up not functional.
- Some of the Terraform resources seem to only support older APIs. For example, Ingress doesn't allow you to specify IngressClassName. This particular problem can be worked around using an annotation, but for many others there's no workaround. At this stage it doesn't look like there's an official mapping between Terraform provider versions and Kubernetes versions.
- Finally, if a resource is not supported directly by this provider (for example when using an operator), there's no way to work around that using this provider.
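The first two workarounds from the list above might look roughly like the sketch below. It assumes a provider version that still ships the kubernetes_ingress resource without an ingress class field, as described in this post. All names, labels, the image and the host are hypothetical placeholders.

```hcl
# Sketch only: every name, label, image and host below is a placeholder.
resource "kubernetes_deployment" "example" {
  metadata {
    name = "example"
  }

  spec {
    replicas = 1

    selector {
      match_labels = {
        app = "example"
      }
    }

    template {
      metadata {
        labels = {
          app = "example"
        }
      }

      spec {
        container {
          name  = "example"
          image = "nginx:1.21"
        }
      }
    }
  }

  # Do not wait for the rollout to complete. Terraform records the
  # Deployment as created once the API server accepts it, so it will
  # not notice if the pods never become ready.
  wait_for_rollout = false
}

resource "kubernetes_ingress" "example" {
  metadata {
    name = "example"

    # Workaround for the missing ingress class field: select the
    # ingress controller through the legacy annotation instead.
    annotations = {
      "kubernetes.io/ingress.class" = "nginx"
    }
  }

  spec {
    rule {
      host = "example.com"

      http {
        path {
          path = "/"

          backend {
            service_name = "example"
            service_port = 80
          }
        }
      }
    }
  }
}
```

With wait_for_rollout disabled, a successful apply says nothing about the health of the workload, so it has to be checked separately, for example with kubectl.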
kubectl provider
This is a very good third-party provider. It uses native Kubernetes YAML files that can be applied by Terraform. It also offers a data source that can split a YAML file with multiple resource definitions into a list that can be used by the actual resource. This approach works for creating complete applications directly from resources provided by their creators.
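A minimal sketch of that pattern, assuming the provider in question is gavinbunney/kubectl and using its kubectl_file_documents data source together with the kubectl_manifest resource. The file path is a hypothetical placeholder.

```hcl
terraform {
  required_providers {
    # Assumption: this is the third-party kubectl provider meant here.
    kubectl = {
      source = "gavinbunney/kubectl"
    }
  }
}

# Split a multi-document YAML file into a list of single documents.
# The path is hypothetical.
data "kubectl_file_documents" "app" {
  content = file("${path.module}/manifests/app.yaml")
}

# Create one Terraform resource per YAML document, addressed by its
# position in the file.
resource "kubectl_manifest" "app" {
  count     = length(data.kubectl_file_documents.app.documents)
  yaml_body = element(data.kubectl_file_documents.app.documents, count.index)
}
```

Each document ends up in the state under an address like kubectl_manifest.app[3], which is the sequence number that shows up in plans and error messages.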
The following things have to be taken into consideration when using this provider:
- Splitting a long YAML file into individual resources doesn't create any dependencies between them. If the first resource happens to be a Namespace, Terraform will attempt to create it in parallel with all the other ones, which fails, as the Namespace must be created first. The whole big YAML file can be manually split into smaller chunks to work around this, but it's not always trivial and definitely time-consuming, particularly if you need to import a larger project.
- Any changes to the YAML file that affect the order of resources (for example commenting one out or adding a new one in the middle of the file) force the whole list of resources to be renumbered. That in turn makes Terraform remove them and then recreate them, which can lead to the problem from the point above.
- This provider by default waits for the resource to be created. That means that if there's some sort of dependency in the long list of resources and Terraform tries to create an object before its requirements can be satisfied (because the necessary resources are later in the file), the whole process times out. It can be really hard to determine which particular resource is causing the problem, because all that's available is a sequence number from the file. This particular feature (waiting for resources to be created) can be turned off, but that means that no other resources in Terraform should depend on resources created by this provider.
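One way to soften these problems is sketched below. It again assumes the gavinbunney/kubectl provider, and additionally assumes a version whose kubectl_file_documents data source exposes a manifests map (keyed by object identity rather than position) and whose kubectl_manifest resource accepts wait_for_rollout. The file paths are hypothetical, and the Namespace is assumed to have been moved by hand into its own file.

```hcl
# The Namespace lives in its own file so that everything else can
# depend on it explicitly.
resource "kubectl_manifest" "namespace" {
  yaml_body = file("${path.module}/manifests/namespace.yaml")
}

data "kubectl_file_documents" "app" {
  content = file("${path.module}/manifests/app.yaml")
}

resource "kubectl_manifest" "app" {
  # Assumption: the manifests attribute is a map of documents keyed by
  # the identity of each object. Iterating over a map means that
  # reordering the file or adding a document does not renumber the
  # existing resources.
  for_each  = data.kubectl_file_documents.app.manifests
  yaml_body = each.value

  # Optional: do not block on rollouts. Other Terraform resources
  # should then not rely on these objects being ready.
  wait_for_rollout = false

  # Force the Namespace to exist before anything in app.yaml is applied.
  depends_on = [kubectl_manifest.namespace]
}
```

This still does not order the documents inside app.yaml relative to each other, so dependencies within that file can still cause timeouts when waiting is left enabled.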
helm provider
This is another official provider. It basically replaces the helm command-line tool. It takes mostly the same parameters and allows you to customise values.yaml in two ways: via a number of set properties that specify individual parameters, or by importing a complete values.yaml file.
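Both ways of passing values can be combined in a single helm_release resource. The sketch below assumes the block syntax for set used by the 2.x versions of the provider (other major versions may expect a different syntax). The repository URL, chart name, namespace and value names are hypothetical.

```hcl
resource "helm_release" "example" {
  name       = "example"
  repository = "https://charts.example.com" # placeholder repository
  chart      = "example-chart"              # placeholder chart
  namespace  = "example"

  # Import a complete values.yaml file.
  values = [
    file("${path.module}/values.yaml")
  ]

  # Override an individual parameter, like --set on the command line.
  set {
    name  = "service.type"
    value = "ClusterIP"
  }
}
```

Individual set entries take precedence over the same keys in the imported file, mirroring the behaviour of the helm command line.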
The following things have to be taken into consideration:
- The releases have to be manually versioned (the same as with the CLI).
- There are a number of flags that allow you to force recreation of various components or reuse of values. They have to be used with caution to ensure that resources are not recreated unintentionally.
- When something goes wrong, the easiest way to debug it is to use the CLI version of helm to inspect the failed state.
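As an illustration of the first two points, a release can pin its chart version and spell out the riskier flags explicitly. This is a sketch that assumes the 2.x helm provider; the chart, repository and version are hypothetical, and the flags are shown at what I believe are their default values.

```hcl
resource "helm_release" "pinned" {
  name       = "example"
  repository = "https://charts.example.com" # placeholder repository
  chart      = "example-chart"              # placeholder chart

  # Pin the chart version, so that upgrades happen only when this
  # value is changed on purpose.
  version = "1.2.3"

  # Flags that can cause components to be recreated or values to be
  # carried over between upgrades. Enable them only deliberately.
  force_update  = false # update through delete and recreate if needed
  recreate_pods = false # restart pods during upgrade or rollback
  reuse_values  = false # reuse the values of the previous release
}
```

Keeping these flags visible in the code makes it easier to spot in review when someone turns one of them on.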
Summary
Terraform can be used to manage Kubernetes, but the numerous caveats make it really hard to do it reliably, particularly for larger projects.
Credits
Photo by Mike Hindle on Unsplash