Using Terraform with Kubernetes

Terraform is an extremely flexible framework that allows for intent-based resource management. It's commonly used to provision various types of infrastructure as well as services and applications.

Terraform also has a number of official and unofficial providers that can be used to manage workloads inside Kubernetes clusters. This post captures my experience with Terraform in that scenario.

At first glance it appears that Kubernetes and Terraform follow exactly the same intent-based logic, so it should be relatively easy to describe the desired state in Terraform and then make Kubernetes implement it. That approach definitely works in some cases, but not every time.

There are a number of providers that Terraform can use to manipulate the state of a cluster. They all have their strong and weak points, and each works well only in some circumstances.

kubernetes provider

This official provider offers the ability to manipulate many of the standard Kubernetes objects, such as Deployments, ReplicaSets or RoleBindings. It works relatively well if you create the code from scratch. I have also found an interesting project on GitHub, k2tf, that converts YAML to HCL2 for consumption by Terraform.

The following things have to be taken into consideration when using this provider:

  1. Kubernetes doesn't prescribe how long it may take for requirements to be satisfied. If you use kubectl to create a Deployment that needs a particular StorageClass and that class is not yet available, that's not a problem: once the class becomes available, the Deployment comes up. The Terraform provider, however, waits only for a prescribed time, and if the resource isn't ready by then Terraform reports a timeout error. To make things worse, that tends to leave the resource inside the Kubernetes cluster but not in the Terraform state file, so running apply again causes an error. There's a way around this for some resources: they have a wait_for_rollout property which, when set to false, makes Terraform treat the resource as created the moment the request is acknowledged by Kubernetes. Obviously, in that case you might end up with a non-working setup even though Terraform thinks everything is fine, because some resources are not actually functional.
  2. Some of the Terraform resources seem to support only older APIs. For example, Ingress doesn't allow you to specify IngressClassName. This particular problem can be worked around using an annotation, but for many others there's no workaround. At this stage there doesn't appear to be an official mapping between Terraform provider versions and Kubernetes versions.
  3. Finally, if a resource is not supported directly by this provider (for example, when using an operator), there's no way to work around that using this provider.
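The first two points can be sketched in HCL. This is a minimal illustration rather than a recommended setup: it assumes a 2.x release of the kubernetes provider (where the kubernetes_ingress resource is available), and the names, image, port and ingress class value are all hypothetical.

```hcl
# Deployment that Terraform treats as created as soon as the API server
# accepts it. Nothing here checks that the pods ever become ready.
resource "kubernetes_deployment" "example" {
  metadata {
    name   = "example" # hypothetical name
    labels = { app = "example" }
  }

  # Skip waiting for the rollout, so a missing dependency no longer
  # turns into a Terraform timeout.
  wait_for_rollout = false

  spec {
    replicas = 1

    selector {
      match_labels = { app = "example" }
    }

    template {
      metadata {
        labels = { app = "example" }
      }

      spec {
        container {
          name  = "example"
          image = "nginx:1.21" # hypothetical image
        }
      }
    }
  }
}

# Ingress class selected through the legacy annotation, because this
# resource has no field for IngressClassName.
resource "kubernetes_ingress" "example" {
  metadata {
    name = "example"
    annotations = {
      "kubernetes.io/ingress.class" = "nginx" # hypothetical class
    }
  }

  spec {
    rule {
      http {
        path {
          path = "/"
          backend {
            service_name = "example" # hypothetical Service
            service_port = 80
          }
        }
      }
    }
  }
}
```

With wait_for_rollout disabled, a successful apply only tells you that the objects were accepted, so the health of the Deployment has to be verified separately.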

kubectl provider

This is a very good third-party provider. It takes native Kubernetes YAML files and lets Terraform apply them. It also offers a data source that can split a YAML file with multiple resource definitions into a list that can be used by the actual resource. This approach works for creating complete applications directly from manifests provided by their creators.
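A minimal sketch of that pattern follows. It assumes the provider in question is gavinbunney/kubectl (the post doesn't name the source) and that a multi-document file called app.yaml sits next to the module; both the file name and the version constraint are hypothetical.

```hcl
terraform {
  required_providers {
    kubectl = {
      source  = "gavinbunney/kubectl" # assumed provider source
      version = ">= 1.7.0"            # hypothetical constraint
    }
  }
}

# Split a multi-document YAML file into a list of single documents.
data "kubectl_file_documents" "app" {
  content = file("${path.module}/app.yaml") # hypothetical file
}

# One Terraform resource per YAML document, addressed by its position
# in the file (kubectl_manifest.app[0], kubectl_manifest.app[1], ...).
resource "kubectl_manifest" "app" {
  count     = length(data.kubectl_file_documents.app.documents)
  yaml_body = element(data.kubectl_file_documents.app.documents, count.index)
}
```

Because each resource is identified only by its index, the resource addresses carry no information about what each document actually contains.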

The following things have to be taken into consideration when using this provider:

  1. Splitting a long YAML file into individual resources doesn't create any dependencies between them. If the first resource happens to be a Namespace, Terraform will attempt to create it in parallel with all the other ones, which fails because the Namespace must be created first. The big YAML file can be manually split into smaller chunks to work around this, but that's not always trivial and is definitely time-consuming, particularly when importing a larger project.
  2. Any change to the YAML file that affects the order of resources (for example, commenting one out or adding a new one in the middle of the file) forces the whole list of resources to be renumbered. That in turn makes Terraform remove them and then recreate them, which can lead to the problem from the point above.
  3. This provider by default waits for the resource to be created. This means that if there's some sort of dependency in the long list of resources and Terraform tries to create an object before its requirements can be satisfied (because the necessary resources are later in the file), the whole process times out. It can be really hard to determine which particular resource is causing the problem, because all that's available is a sequence number from the file. This particular feature (waiting for resources to be created) can be turned off, but that means no other resources in Terraform should depend on resources created by this provider.
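The manual split from the first point, combined with turning off waiting from the third, might look roughly like the sketch below. It assumes the gavinbunney/kubectl provider, and it assumes that wait_for_rollout is the attribute that controls the waiting described above. The file names namespace.yaml and rest.yaml are hypothetical.

```hcl
terraform {
  required_providers {
    kubectl = {
      source = "gavinbunney/kubectl" # assumed provider source
    }
  }
}

# The Namespace lives in its own file so that it can be ordered explicitly.
resource "kubectl_manifest" "namespace" {
  yaml_body = file("${path.module}/namespace.yaml") # hypothetical file
}

# Everything else stays in one multi-document file.
data "kubectl_file_documents" "rest" {
  content = file("${path.module}/rest.yaml") # hypothetical file
}

resource "kubectl_manifest" "rest" {
  count     = length(data.kubectl_file_documents.rest.documents)
  yaml_body = element(data.kubectl_file_documents.rest.documents, count.index)

  # Assumption: this is the switch that stops Terraform from waiting,
  # so ordering problems inside rest.yaml no longer cause a timeout.
  wait_for_rollout = false

  # Explicit dependency: none of these documents is applied before
  # the Namespace exists.
  depends_on = [kubectl_manifest.namespace]
}
```

The trade-off is the one described in the list: once waiting is off, nothing else in the Terraform configuration can safely assume that these objects are ready.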

helm provider

This is another official provider. It essentially replaces the helm command-line tool. It takes mostly the same parameters and allows you to customise values.yaml in two ways: via a number of set properties that specify individual parameters, or by importing a complete values.yaml file.
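Both ways of supplying values can be combined in a single release. The sketch below assumes the block-style set syntax of the 2.x helm provider; the repository URL, chart name, namespace and the replicaCount key are hypothetical.

```hcl
resource "helm_release" "example" {
  name       = "example"                    # hypothetical release name
  repository = "https://charts.example.com" # hypothetical repository
  chart      = "example-chart"              # hypothetical chart
  namespace  = "example"

  # Way 1: import a complete values file.
  values = [file("${path.module}/values.yaml")]

  # Way 2: override an individual parameter, like --set on the CLI.
  set {
    name  = "replicaCount" # hypothetical chart value
    value = "2"
  }
}
```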

The following things have to be taken into consideration:

  1. The releases have to be manually versioned (the same as with the CLI).
  2. There are a number of flags that force recreation of various components or reuse of values. They have to be used with caution to ensure that resources are not recreated unintentionally.
  3. When something goes wrong, the easiest way to debug it is to use the CLI version of helm to inspect the failed state.
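As a rough illustration of the first two points, a release can pin its chart version and spell out the riskier flags explicitly. The attribute names below come from the helm_release resource of the 2.x provider as I understand it; the chart coordinates and the version number are hypothetical.

```hcl
resource "helm_release" "pinned" {
  name       = "example"                    # hypothetical release name
  repository = "https://charts.example.com" # hypothetical repository
  chart      = "example-chart"              # hypothetical chart

  # Pin the chart version by hand; upgrades happen only when this changes.
  version = "1.2.3" # hypothetical version

  # Flags that can recreate components or carry values over between
  # upgrades, left disabled here on purpose.
  force_update  = false
  recreate_pods = false
  reuse_values  = false
}
```

Writing the flags out, even at what should be their default values, makes it obvious in code review when somebody switches one of them on.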

Summary

Terraform can be used to manage Kubernetes, but the numerous caveats make it really hard to do it reliably, particularly for larger projects.

Credits

Photo by Mike Hindle on Unsplash