May 9, 2023

Unleashing the Power of Prodvana's Convergence-based Deployment Engine

Naphat Sanguansin

In this post, we'll dive into the powerful capabilities and flexibility of Prodvana's convergence engine. This engine is a critical component of a broader system that brings together configurations from multiple inputs (like services, applications, and environments) to make deployment decisions.

Simple, Independent Deploys

We'll begin with the most straightforward deployment process: staging to production. 

Here is an example of what the input looks like:

The algorithm for this stage is simple:

The power of convergence becomes evident: each release channel operates independently and can be configured separately. The system continuously works toward the declared desired state of each channel.
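As a rough illustration of independent convergence, here is a minimal Python sketch. It is not Prodvana's implementation: the `ReleaseChannel` class, its fields, and the `converge_step` and `converge_all` functions are hypothetical names, and the "deployment" is a stand-in that always succeeds immediately.

```python
from dataclasses import dataclass


@dataclass
class ReleaseChannel:
    """Hypothetical model of one release channel (names are illustrative)."""
    name: str
    desired_version: str
    current_version: str


def converge_step(channel: ReleaseChannel) -> bool:
    """Move one channel toward its desired state.

    Returns True if the channel changed, False if it had already converged.
    """
    if channel.current_version == channel.desired_version:
        return False
    # Stand-in for a real deployment: here it always succeeds immediately.
    channel.current_version = channel.desired_version
    return True


def converge_all(channels: list[ReleaseChannel]) -> list[str]:
    """Run one step per channel; no channel reads another channel's state."""
    return [c.name for c in channels if converge_step(c)]


channels = [
    ReleaseChannel("staging", desired_version="v2", current_version="v1"),
    ReleaseChannel("production", desired_version="v1", current_version="v1"),
]
changed = converge_all(channels)  # only staging differs from its desired state
```

Because each step looks only at its own channel, changing the desired version for staging has no effect on production in this sketch.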

Pre-Deployment Protections

Fully independent deployments, while simple, do not reflect reality. Most deployments are interconnected in some way. The most common connection is deployment order.

Additionally, many deployments have an implicit set of pre-flight checks, e.g., the engineer checks that no outages are occurring. 

To add first-class support for these pre-flight checks, we introduce pre-deployment Protections. These are programs that exit 0 or 1, giving the convergence engine additional inputs.

The algorithm now incorporates pre-deployment Protections.

Protections with a pre-deployment lifecycle run before a deployment starts.

In the example above, we have two Protections validating the staging deployment. These are called alerts-not-firing and db-migrated.

The alerts-not-firing Protection ensures that there are no active alerts. The db-migrated Protection checks that all necessary database migrations have been completed. 

The deployment is not started until all Protections pass during the pre-deployment lifecycle. This ensures that the environments are in a good state before the deployment begins and no known issues have been introduced.
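The gating rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Prodvana's code: each Protection is modeled as a plain callable that returns a process-style exit code instead of a real program, exit code 0 is assumed to mean "pass", and the check bodies and the migration name are made up.

```python
from typing import Callable, Dict

# A Protection is modeled as a callable returning a process-style exit code.
# Assumption: 0 means the check passed, 1 means it failed.
Protection = Callable[[], int]


def can_start_deployment(protections: Dict[str, Protection]) -> bool:
    """Return True only if every pre-deployment Protection exits 0."""
    return all(check() == 0 for check in protections.values())


def alerts_not_firing() -> int:
    active_alerts: list = []  # stand-in for a query to an alerting system
    return 0 if not active_alerts else 1


def db_migrated() -> int:
    pending_migrations = ["0042_add_index"]  # stand-in for a migration check
    return 0 if not pending_migrations else 1


pre_deployment = {
    "alerts-not-firing": alerts_not_firing,
    "db-migrated": db_migrated,
}
# False here: the hypothetical migration above is still pending.
ready = can_start_deployment(pre_deployment)
```

In this sketch the deployment stays blocked until the pending migration is applied, at which point the same call returns True.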

Post-Deployment Protections

In addition to controlling when the deployment starts, another standard human operation is to validate the deployment. We codify this step via the post-deployment lifecycle for Protections.

After a new version of a service has been deployed, the post-deployment Protections are checked to confirm that the deployment was successful and the service is operating correctly.

In the example above, the alerts-not-firing Protection is rechecked post-deployment. If any alerts are triggered after the new version is deployed, this indicates a problem with the deployment, so the service is automatically rolled back to the previous version.

Once enough time has passed with no alerts firing, the deployment is finalized. If other deployments have a pre-deployment release-channel-stable Protection that depends on this deployment, that Protection will now succeed, allowing those deployments to begin.
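The post-deployment flow described above can be sketched as follows. This is a simplified model with assumed names: `bake` and `release_channel_stable` are hypothetical functions, the bake period is represented as a sequence of recorded exit codes rather than real elapsed time, and exit code 0 is assumed to mean that no alerts are firing.

```python
from typing import Iterable


def bake(
    previous_version: str,
    new_version: str,
    alert_checks: Iterable[int],
) -> tuple[str, str]:
    """Evaluate a post-deployment Protection over a bake period.

    alert_checks yields one exit code per re-check of the Protection
    (assumption: 0 means no alerts firing). Returns (status, live_version).
    """
    for exit_code in alert_checks:
        if exit_code != 0:
            # An alert fired after the deploy: roll back to the previous version.
            return ("rolled_back", previous_version)
    # Every check during the bake period passed: finalize the deployment.
    return ("stable", new_version)


def release_channel_stable(upstream_status: str) -> int:
    """Pre-deployment Protection for a downstream channel, as an exit code."""
    return 0 if upstream_status == "stable" else 1


healthy = bake("v1", "v2", [0, 0, 0])    # all checks pass during the bake period
unhealthy = bake("v1", "v2", [0, 1, 0])  # an alert fires on the second check
```

Only the healthy result lets a downstream `release_channel_stable` check pass; the unhealthy one leaves the service on the previous version and keeps downstream deployments blocked.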

Modeling Complex Use Cases

The use case modeled previously demonstrates a safe deployment from staging to production. Convergence can also model more complex use cases that are fragile and extremely hard to express in traditional pipelines.

In this example, we’ll look at a sophisticated SaaS deployment process. It involves one staging cluster, one shared production cluster, and two single-tenant clusters. The first single-tenant cluster requires customer approval before deployment and follows a quarterly major release cadence. The second single-tenant cluster mirrors the shared production cluster. All deployments, except for staging, require manual approval.

Here's the configuration for this use case:

The convergence algorithm remains unchanged.
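To make the "unchanged algorithm" point concrete, here is a hypothetical Python stand-in for such a setup. It is not Prodvana's configuration format: the channel names, the Protection names other than release-channel-stable, and the choice to model "mirrors production" as "waits for production to be stable" are all assumptions made for illustration.

```python
# Hypothetical, simplified stand-in for the configuration; not Prodvana's format.
# Each channel maps to the pre-deployment Protections it requires.
CHANNELS: dict[str, list[str]] = {
    "staging": [],
    "production": ["manual-approval", "release-channel-stable:staging"],
    "tenant-a": ["manual-approval", "customer-approval", "quarterly-major-release"],
    "tenant-b": ["manual-approval", "release-channel-stable:production"],
}


def startable(channels: dict[str, list[str]], passing: set[str]) -> list[str]:
    """Channels whose pre-deployment Protections are all currently passing."""
    return [name for name, required in channels.items() if set(required) <= passing]


# Nothing approved yet: only staging, which requires no Protections, can start.
initial = startable(CHANNELS, set())

# Staging is stable and a human approved: production becomes startable too.
after_staging = startable(
    CHANNELS, {"manual-approval", "release-channel-stable:staging"}
)
```

The gating function is the same one a two-channel setup would use; only the data describing each channel's Protections grows.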

We are working on extending the convergence engine further to support more complex use cases, such as release trains. This would allow us to support concurrent deployments across multiple release stages.

This is particularly beneficial for lengthy deployment processes that span weeks, because it makes it possible to test the upcoming release before the final stages of the previous release are completed. It's also useful for teams that want to validate individual release commits in production environments rather than batching them into the latest version.

The Power of Convergence

To recap the power of the convergence engine:

  • Isolation: Each entity is responsible for achieving its own desired state. This decentralizes control and reduces the complexity of managing multiple services. Internally, there is a dedicated actor for each entity responsible for ensuring the entity safely reaches the desired state.

    • Isolation unlocks many powerful use cases, including but not limited to localized rollbacks and failures, version pinning, and infinite parallelism.

  • Protections: Codify safety requirements for deployments and have the convergence engine monitor them automatically. It is also easy to skip Protections when needed, which can be helpful in situations where you need to deploy a hot fix or bypass a non-critical check quickly.

    • Protections unlock the ability to codify invariants for your environments, including but not limited to situations described above in this blog post.

  • State Awareness: The current state of the environment is taken as an input alongside the defined desired state, allowing the engine to take the most appropriate action to converge your services.

    • State awareness unlocks many powerful use cases, including but not limited to resuming interrupted deployments, the self-healing of services across environments, and predictable outcomes.
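As a small illustration of state awareness, the sketch below picks an action from the observed state and the desired state. The `Observed` class, its fields, and the action names are hypothetical and chosen only to show the idea; a real engine would track far more state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observed:
    """Hypothetical snapshot of what is actually running in one environment."""
    live_version: str
    in_progress_version: Optional[str] = None  # a rollout that was interrupted


def next_action(observed: Observed, desired_version: str) -> str:
    """Choose an action from the observed state and the desired state."""
    if observed.live_version == desired_version:
        return "noop"  # already converged
    if observed.in_progress_version == desired_version:
        return "resume"  # pick up the interrupted rollout
    return "deploy"  # drifted or outdated: start a rollout to the desired version


converged = next_action(Observed("v2"), "v2")
interrupted = next_action(Observed("v1", in_progress_version="v2"), "v2")
drifted = next_action(Observed("v1"), "v2")
```

Because the decision depends only on observed and desired state, running it again after a crash or restart yields the same action, which is what makes resuming and self-healing possible in this model.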

The convergence engine with fully declarative inputs provides a powerful and flexible way to manage the deployment of services. Whether you're managing a handful of services, a complex microservices architecture, or a multi-cluster single-tenant environment, Prodvana's convergence engine will make your life easier.