Technical

Feb 28, 2024

Prodvana Architecture Part 4: Runtime Interface & Overall Conclusion

Naphat Sanguansin

In our previous posts, we explored how the Prodvana Compiler builds a desired state and how the Prodvana Convergence Engine uses the desired state to decide what changes to make. This post examines how changes are applied across the numerous backends that must be supported for brownfield applications.

Kubernetes, while dominant, is not the only type of workload runner. This is evident from the continued success of AWS ECS and serverless solutions such as Google Cloud Run and AWS Lambda.

Deployment systems that are backend-specific end up creating technical and cultural silos, which in turn lead to organizational inefficiency. We built the Prodvana Runtime Interface to ensure that Prodvana is backend-agnostic and minimizes migration costs for various Runtime types.

Each Runtime needs to be able to satisfy at least two interfaces:

  • fetch - return the current state of a Service in the Runtime.

  • apply - run the command(s) to bring the Service to a desired state (see the Go sketch after this list).
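
A minimal sketch of this contract in Go (the type and method names here are illustrative, not Prodvana's actual API):

package runtime

import "context"

// DesiredState and ObservedState stand in for whatever the Compiler and
// Convergence Engine actually exchange; the fields are illustrative.
type DesiredState struct {
    ServiceVersion string
    Manifest       []byte
}

type ObservedState struct {
    ServiceVersion string
    Converged      bool
}

// Runtime is a hypothetical rendering of the two-method contract:
// Fetch reports what is currently running, Apply moves the Service
// toward the desired state.
type Runtime interface {
    Fetch(ctx context.Context, service string) (ObservedState, error)
    Apply(ctx context.Context, service string, desired DesiredState) error
}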

Additionally, because the Prodvana Runtime Interface gives us access to user environments, it must also be designed with a trustworthy security model.

The Kubernetes Runtime Interface

Prodvana Kubernetes Agent - Establishing a Secure Connection

To implement support for Kubernetes, we first have to establish a connection. We do so by having users run the Prodvana Agent inside their Kubernetes cluster. The Prodvana Agent securely connects to the Prodvana API and, after a mutual handshake, establishes a secure connection between Prodvana and the Kubernetes API server.

The Agent architecture ensures our connection to the user clusters is secure and trustworthy.

  • No credentials are exchanged or stored on Prodvana.

  • The Prodvana APIs the Agent communicates with are behind a user-specific IP address and can be allowlisted as needed.

  • By deleting the Prodvana Agent, users can terminate all operations from Prodvana. 

Satisfying the Runtime Interface

Once the connection is established, we must implement the Runtime Interface.

fetch

fetch is implemented via the Kubernetes API client. Because the Prodvana Convergence Engine continuously polls the Runtime about the Services running in it, we must avoid overloading the Kubernetes API server. We do this with watchers.
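
As a rough sketch of the idea (using client-go informers; the Agent's actual code may differ), repeated fetch calls can be answered from a local, watch-backed cache instead of hitting the API server each time:

package main

import (
    "fmt"
    "time"

    "k8s.io/apimachinery/pkg/labels"
    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func main() {
    // Running inside the cluster (as the Agent does), use the in-cluster config.
    cfg, err := rest.InClusterConfig()
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)

    // A shared informer keeps a local cache in sync through a single watch,
    // so answering repeated polls never hits the API server directly.
    factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
    deployments := factory.Apps().V1().Deployments().Lister()

    stop := make(chan struct{})
    defer close(stop)
    factory.Start(stop)
    factory.WaitForCacheSync(stop)

    // "fetch": serve the Convergence Engine's poll from the cache.
    all, err := deployments.Deployments("default").List(labels.Everything())
    if err != nil {
        panic(err)
    }
    for _, d := range all {
        fmt.Printf("%s: %d replicas ready\n", d.Name, d.Status.ReadyReplicas)
    }
}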

apply

apply is implemented by calling out to kubectl apply. This ensures that apply actions taken by Prodvana match exactly what users would do on their own and can be replicated for debugging purposes. Additionally, kubectl apply wraps various Kubernetes API calls in a non-trivial way that would be fragile to replicate.
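
In sketch form (hypothetical plumbing; the helper name and namespace handling are ours, not Prodvana's):

package kubeapply

import (
    "bytes"
    "fmt"
    "os/exec"
)

// applyManifests pipes rendered manifests to `kubectl apply -f -`, so the
// change made here is exactly the change a user could reproduce by hand.
func applyManifests(namespace string, manifests []byte) error {
    cmd := exec.Command("kubectl", "apply", "-n", namespace, "-f", "-")
    cmd.Stdin = bytes.NewReader(manifests)
    out, err := cmd.CombinedOutput()
    if err != nil {
        return fmt.Errorf("kubectl apply failed: %v\n%s", err, out)
    }
    fmt.Printf("%s", out)
    return nil
}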

Custom Runtimes

Kubernetes satisfies many user workloads - but not all of them. We found ourselves with many Runtime types to implement, each with unique challenges, different security models, and diminishing user base sizes. Additionally, we expect to need to support users with entirely in-house Runtime implementations. 

These two requirements led us to a key insight: make it possible for Runtimes to be built outside the Prodvana codebase and use that interface to implement first-class Runtimes as we detect commonalities between users.

We call this class of Runtimes “Custom Runtimes” to denote that they are built outside the Prodvana codebase.

Kubernetes Jobs as a Building Block

Kubernetes gives us a solid foundation to build on: the Job resource. With a Kubernetes Job, we can enable anyone to implement a Runtime as long as they can create a Docker image. We can even provide an optional configuration interface to abstract away the complexity of Kubernetes jobs for simple commands.

runtime:
  name: my-custom-runtime
  apply:
    taskConfig:
      program:
        image: my-image
        cmd: [./do-apply]

Building on Kubernetes Jobs means that users will need a Kubernetes cluster, even if they only have non-Kubernetes workloads. This is a tradeoff we accept today based on conversations with users, but one we can remove by adding new in-codebase Runtime implementations that can serve as job runners.
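
For illustration, a taskConfig like the one above could be translated into a one-shot Kubernetes Job roughly like this (a sketch using client-go; the namespace and naming scheme are assumptions, not Prodvana's actual implementation):

package customruntime

import (
    "context"

    batchv1 "k8s.io/api/batch/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// runApplyJob wraps an apply taskConfig (image + command) in a Kubernetes Job.
func runApplyJob(ctx context.Context, client kubernetes.Interface, image string, cmd []string) error {
    backoff := int32(0) // single attempt; the Convergence Engine decides whether to retry
    job := &batchv1.Job{
        ObjectMeta: metav1.ObjectMeta{GenerateName: "pvn-apply-"},
        Spec: batchv1.JobSpec{
            BackoffLimit: &backoff,
            Template: corev1.PodTemplateSpec{
                Spec: corev1.PodSpec{
                    RestartPolicy: corev1.RestartPolicyNever,
                    Containers: []corev1.Container{{
                        Name:    "apply",
                        Image:   image,
                        Command: cmd,
                    }},
                },
            },
        },
    }
    // "prodvana-jobs" is an illustrative namespace.
    _, err := client.BatchV1().Jobs("prodvana-jobs").Create(ctx, job, metav1.CreateOptions{})
    return err
}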

Security Model

Because Custom Runtimes run as Kubernetes Jobs, they can access the same secrets that users already use for application-level code. For example, users can use Kubernetes Secrets or a third-party secret vendor and grant access via a Kubernetes Service Account. No credentials are exchanged or stored on Prodvana.

For Custom Runtimes we implement, we further ensure the use of the best-in-class secrets model for that Runtime. For example, we use role-based credentials instead of service-account-based ones for ECS.

Minimizing Migration Cost: an Incremental Approach

To minimize migration costs onto Prodvana for users with Custom Runtimes, we need to make it simple to define them. To that end, we take a tiered, incremental approach.

apply-only

Users with non-Kubernetes workloads usually already have commands to update the workloads. These commands function exactly like apply in the Runtime Interface, so we make it possible to define a Custom Runtime with just an apply command.

When a Custom Runtime only defines apply, Prodvana will run the apply command once and mark the Service as converged when the command succeeds. Recall that this is precisely the behavior of the Prodvana Convergence Engine when an entity defines an apply and not a fetch.

apply and simple fetch

Many Runtimes can detect if apply would do any work before apply runs. For example, Terraform has a plan command that can determine if there are any changes to be made. We use this ability for the simple fetch interface: run a command that exits 0 or 2, where 0 indicates no work to be done and 2 indicates a drift. The choice of exit code 2 is intentional here, as 1 is commonly used as an unexpected error by various CLIs.
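
For example, terraform plan's -detailed-exitcode flag already follows this convention (0 means no changes, 1 means error, 2 means changes pending), so a simple fetch can be little more than exit-code plumbing. A sketch:

package main

import (
    "fmt"
    "os"
    "os/exec"
)

// Runs `terraform plan -detailed-exitcode` and maps its exit code onto the
// Custom Runtime convention: 0 = no drift, 2 = drift, anything else = error.
func main() {
    cmd := exec.Command("terraform", "plan", "-detailed-exitcode", "-input=false")
    cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr

    err := cmd.Run()
    code := 0
    if exitErr, ok := err.(*exec.ExitError); ok {
        code = exitErr.ExitCode()
    } else if err != nil {
        fmt.Fprintln(os.Stderr, "terraform plan could not be run:", err)
        os.Exit(1)
    }

    switch code {
    case 0:
        os.Exit(0) // no work to be done
    case 2:
        os.Exit(2) // drift detected; apply should run
    default:
        os.Exit(1) // unexpected error
    }
}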

When a Custom Runtime defines both apply and fetch, apply will only run if fetch indicates a drift. This can save expensive, unnecessary work and allow Prodvana to skip Release Channels in the convergence of a Service.

apply and structured fetch output

Lastly, some Runtimes, like Kubernetes, allow workloads to be annotated. For these Runtimes, we allow fetch to return a JSON document explaining exactly what is running at which version.

{
  "objects": [
    {
      "name": "my-service",
      "objectType": "my-type",
      "versions": [
        {
          "version": "svc-1",
          "active": true
        }
      ],
      "status": "SUCCEEDED"
    }
  ]
}

In this mode, Custom Runtimes function like any natively implemented Runtimes, with the output of fetch being compared to the desired state to determine if apply should run.

In the above example, if the desired state is for version svc-1, then there is no work to be done, and apply would not run. If the desired state is for version svc-2, then apply will run.
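
A sketch of how this comparison might look in code (the struct fields mirror the JSON above; the convergence check itself is illustrative):

package main

import (
    "encoding/json"
    "fmt"
)

// These structs mirror the structured fetch output shown above.
type FetchOutput struct {
    Objects []RuntimeObject `json:"objects"`
}

type RuntimeObject struct {
    Name       string    `json:"name"`
    ObjectType string    `json:"objectType"`
    Versions   []Version `json:"versions"`
    Status     string    `json:"status"`
}

type Version struct {
    Version string `json:"version"`
    Active  bool   `json:"active"`
}

// needsApply returns true unless the desired version is already active and
// the object has converged successfully.
func needsApply(raw []byte, desiredVersion string) (bool, error) {
    var out FetchOutput
    if err := json.Unmarshal(raw, &out); err != nil {
        return false, err
    }
    for _, obj := range out.Objects {
        for _, v := range obj.Versions {
            if v.Version == desiredVersion && v.Active && obj.Status == "SUCCEEDED" {
                return false, nil // already converged; skip apply
            }
        }
    }
    return true, nil
}

func main() {
    raw := []byte(`{"objects": [{"name": "my-service", "objectType": "my-type",
        "versions": [{"version": "svc-1", "active": true}], "status": "SUCCEEDED"}]}`)
    drift, _ := needsApply(raw, "svc-2")
    fmt.Println("needs apply:", drift) // true: svc-2 is not running yet
}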

A Tiered Approach

Notice that each tier of Custom Runtimes is increasingly complex to implement. We expect most users to implement only apply, some to implement simple fetch, and very few to implement structured fetch output.

However, by ensuring that the Custom Runtime interface is sufficiently robust, we can implement first-party Runtimes as Custom Runtimes while still providing a first-class experience to our users.

Additionally, because each tier is incrementally built upon the previous tier, users can start simple and “upgrade” by investing in the Custom Runtime as they see fit.

Parameterizing Custom Runtimes

Without a way to parameterize Custom Runtime jobs, a Custom Runtime would not be able to distinguish between Services.

Custom Runtimes accept parameters just like Services do:

runtime:
  name: my-custom-runtime
  apply:
    taskConfig:
      program:
        image: my-image
        cmd: [./do-apply, "--service={{.Params.service}}"]
  parameters:
  - name: service
    required: true
    string

Parameters are then passed in from the Service Configuration when using Custom Runtimes:

service:
  name: my-service-on-custom-runtime
  custom:
    parameterValues:
    - name: service
      string

Additionally, a default set of environment variables is injected into both apply and fetch with Service-level information (a short usage sketch follows the list):

  • PVN_SERVICE

  • PVN_SERVICE_ID

  • PVN_APPLICATION

  • PVN_APPLICATION_ID

  • PVN_RELEASE_CHANNEL

  • PVN_RELEASE_CHANNEL_ID

  • PVN_SERVICE_VERSION
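
A Custom Runtime's apply or fetch program can read these directly; a hypothetical apply entrypoint might start like this:

package main

import (
    "fmt"
    "os"
)

// A Custom Runtime apply program can pick up Service context from the
// injected environment rather than requiring explicit parameters.
func main() {
    service := os.Getenv("PVN_SERVICE")
    releaseChannel := os.Getenv("PVN_RELEASE_CHANNEL")
    version := os.Getenv("PVN_SERVICE_VERSION")

    fmt.Printf("applying %s@%s to release channel %s\n", service, version, releaseChannel)
    // ... run the actual deployment commands for this Runtime here ...
}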

First-Class Custom Runtime Implementations

We have built the following Custom Runtimes as first-class in Prodvana.

Terraform and Pulumi Runners

Terraform Runner is a Custom Runtime that executes Terraform modules.

  • simple fetch - Run terraform plan and report drift if the plan indicates there is work to be done.

  • apply - Run terraform apply.

The Pulumi Runner is implemented similarly to the Terraform Runner.

Source Code: Terraform, Pulumi

ECS

The ECS Runtime allows users to use Prodvana to manage services on ECS.

  • structured fetch output - Use the AWS CLI to determine the number of running replicas and their versions based on AWS tags. Return a single Runtime object of type ECSService (a simplified fetch sketch follows this list).

  • apply - Use the AWS CLI to create or reuse a task definition with tags for the Prodvana Service ID and version, create the ECS service if it does not exist, and update its task definition.
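
As a simplified sketch of the fetch side (the real runner is in the linked source; the cluster and tag handling here are illustrative), the structured output can be assembled from a couple of AWS CLI calls:

package main

import (
    "encoding/json"
    "fmt"
    "os/exec"
)

// describeECSService shells out to the AWS CLI and returns the running count
// and task definition ARN for a service.
func describeECSService(cluster, service string) (int, string, error) {
    out, err := exec.Command("aws", "ecs", "describe-services",
        "--cluster", cluster, "--services", service).Output()
    if err != nil {
        return 0, "", err
    }
    var resp struct {
        Services []struct {
            RunningCount   int    `json:"runningCount"`
            TaskDefinition string `json:"taskDefinition"`
        } `json:"services"`
    }
    if err := json.Unmarshal(out, &resp); err != nil {
        return 0, "", err
    }
    if len(resp.Services) == 0 {
        return 0, "", fmt.Errorf("service %s not found in cluster %s", service, cluster)
    }
    return resp.Services[0].RunningCount, resp.Services[0].TaskDefinition, nil
}

func main() {
    running, taskDef, err := describeECSService("my-cluster", "my-service")
    if err != nil {
        panic(err)
    }
    // The Prodvana version would then be read from the task definition's tags
    // (for example via `aws ecs list-tags-for-resource`) to build the fetch output.
    fmt.Printf("%d replicas running task definition %s\n", running, taskDef)
}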

Source Code: ECS Runtime

Google Cloud Run

The Google Cloud Run Runtime allows users to use Prodvana to manage services on Google Cloud Run. 

  • structured fetch output - Use the gcloud CLI to determine the number of running replicas and their versions based on annotations. Return a single Runtime object of type CloudRun.

  • apply - Use the gcloud CLI to apply the Cloud Run config with added annotations for the Prodvana Service ID and version.

Source Code: Google Cloud Run

Learnings

  • The initial Kubernetes Runtime implementation required users to create and store credentials on Prodvana. This did not meet our bar for security, as it relied on static credentials (or we would have to implement cloud-provider-specific credentials rotation) and would not support clusters with private IPs. As a result, we rewrote the implementation to be agent-based. Additionally, requiring credentials was an unintuitive operation that often failed during onboarding. Agent-based connection meant that users only had to ensure they had kubectl authenticated with permission to create resources, which most already do.

  • Engineering organizations always have more than one runtime and usually have multiple runtime types.

Results

  • Our Runtime implementation has allowed users to manage cloud-native and legacy compute workloads in one system. 

  • Teams have used Custom Runtimes to orchestrate non-compute workloads, such as build systems and static content pushing.

  • For ourselves, we have been able to implement new types of Runtimes, such as ECS, in hours, not days or weeks.

Overall Conclusion

Prodvana’s Dynamic Delivery addresses the challenge of coordinating applications and infrastructure for platform teams looking to unify workflows and support sophisticated architectures without needing large migrations.

Prodvana offers a powerful solution for a wide range of architectures by embracing intent-based requirements, adaptability, and real-world changes. Users have seen 50% greater deployment frequency, a more than 20% increase in issues caught before reaching production, and higher overall satisfaction.

If you’ve found this deep dive interesting and see similar challenges in your organization, please contact me! We love feedback and learning about other ways platform teams build abstractions.
