No-Nonsense Self-Service Capability
As discussed in our earlier blog, we have simplified our infrastructure management by leveraging Terraform and GitHub Actions. As the leading supply chain sustainability diligence platform for the apparel industry, we continuously add new microservices to meet emerging market requirements and client needs. Now, our feature developers always want the infrastructure for the new microservice ready in a few minutes.
They are always collaborative and understanding, of course, but our organization values efficiency and speed in getting things done.
Our Daily Challenge
Every time we need to provision a microservice, someone from our platform engineering team has to:
- Add new modules to our GitHub and Cloudflare Terraform configurations and create a pull request (PR) in GitHub.
- Get the PR reviewed and merged to the main branch, and let the workflow run.
- Clone the repo from the boilerplate and copy secrets into Doppler from the boilerplate Doppler project.
- Create users for the new service in the staging and production databases.
We are done.
Oh, wait. I forgot to:
- Update Doppler with the new database passwords.
Now we are done. Oh no! Just one more thing:
- The new service name has to be added to the release notes database so our daily releaser can update the release notes.
All good, but why does the new pod go into CrashLoopBackOff in staging? 🤦♂️ Ah, I forgot to replace an environment variable name in the Kubernetes manifest.
Every time a new microservice was provisioned, an engineer had to dedicate 3-4 hours to the task, and on top of that ran into failures caused by minor manual errors. On many occasions, our daily release failed because we missed a step in our microservices process.
Checklists helped somewhat but didn’t completely resolve our issues. Additionally, as a small team managing infrastructure operations, our developers were often blocked until someone from the platform engineering team had time to provision the new service. This is a classic example of “toil”, and we wanted to eliminate it.
Our Solution
We want developers to be able to provision a new service without having to wait for someone else. As always, we want to keep it simple.
The solution ➡️ A homegrown GitHub Actions workflow to handle all the steps listed above.
Here’s How We Did It:
We already have Terraform modules to provision a GitHub repository and a Doppler project, complete with configurations for environments, secrets and access. Our Kubernetes deployments are managed through a single manifest file for each microservice.
Now, we need a way to generate Terraform code blocks for a GitHub Module and Cloudflare DNS records, and then create Kubernetes manifests from templates, replacing values for service name, port number, and description. We have created a Node.js script that generates Terraform code and Kubernetes (K8s) manifests for new microservices, adhering to the KISS (Keep It Simple, Stupid) strategy.
Let me break down some important steps in this process:
1. We created templates of our GitHub and Cloudflare modules.
Example of GitHub module template:
module "PROJECT_NAME-service-repository" {
  source                 = "./modules/repositories"
  name                   = "PROJECT_NAME-service"
  required_status_checks = PROJECT_REQUIRED_CHECKS
  description            = "PROJECT_DESCRIPTION"
  production_deployment_approval_user_ids = var.production_approvers
}
2. We developed a simple JavaScript program to replace PROJECT_NAME with github.event.inputs.name.
Here’s the code snippet that accomplishes this:
const textReplacer = (source, {name, description, requiredChecks, port, version = 'v1.0.0'}) =>
  source
    .replace(/PROJECT_NAME/g, name)
    .replace(/PROJECT_DESCRIPTION/g, description)
    .replace(/PROJECT_VERSION/g, version)
    .replace(/PROJECT_PORT/g, port)
    .replace(/PROJECT_REQUIRED_CHECKS/g, requiredChecks)
The source is the template of the module that we would like to generate.
const githubModule = fs.readFileSync('github-module.tf', 'utf8')
const githubModuleContent = textReplacer(githubModule, {name, description, requiredChecks, port, version})
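For illustration, here is textReplacer applied to a tiny hypothetical one-line template (the template string and values below are made up for the example, not our real module file):

```javascript
// Same textReplacer as above, repeated so the example is self-contained.
const textReplacer = (source, {name, description, requiredChecks, port, version = 'v1.0.0'}) =>
  source
    .replace(/PROJECT_NAME/g, name)
    .replace(/PROJECT_DESCRIPTION/g, description)
    .replace(/PROJECT_VERSION/g, version)
    .replace(/PROJECT_PORT/g, port)
    .replace(/PROJECT_REQUIRED_CHECKS/g, requiredChecks)

// A hypothetical one-line template, just to show the substitution.
const result = textReplacer('name = "PROJECT_NAME-service" # PROJECT_DESCRIPTION', {
  name: 'billing',
  description: 'Handles invoices',
  requiredChecks: '["lint"]',
  port: 8080,
})
console.log(result) // name = "billing-service" # Handles invoices
```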
3. Once the module snippet is generated, it is appended to the existing Terraform configuration after verifying that the module doesn't already exist.
Code block example:
const githubRepositoriesServicesContent = fs.readFileSync('./github-repositories.tf', 'utf8')
if (githubRepositoriesServicesContent.includes(githubModuleContent)) {
  console.log('Module ' + name + ' already exists')
} else {
  fs.appendFileSync('./github-repositories.tf', githubModuleContent)
}
The Beauty of Self-Service
To summarize, our self-service workflow accomplishes the following:
1. Takes the service name, description and port number as inputs.
2. The Pre-check job checks existing repositories and ends the workflow if a repository with the requested name already exists.
3. The Create new service job executes a Node.js script that uses templates to generate new modules and appends them to the existing Terraform files, including:
- Updating Terraform configuration
- Creating K8s manifests for staging and production
- Applying the new Terraform configuration
- Creating a PR and merging the changes into our devops repository
4. Clones our boilerplate repository and checks in that code as the first commit to the new service's repository.
5. Populates the Doppler project with the secrets from our boilerplate Doppler project.
6. Creates users for the new service in staging and production databases and updates them in Doppler.
7. Adds the new service information to the GitHub Actions library.
8. Updates the release notes in Notion.
9. Applies the K8s manifests in staging and production.
10. Success!
➡️ A complete new microservice is fully deployed in 8-10 minutes with no human intervention and (close to) zero errors.
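For illustration, the guard in the pre-check job can be sketched as plain input validation. The name pattern and the hardcoded repository list below are assumptions for the sketch; the real job looks up existing repositories via the GitHub API:

```javascript
// Stand-in for the repository names the real job fetches from GitHub.
const existingRepos = ['orders-service', 'billing-service']

// Validate the requested name and fail fast on duplicates.
const preCheck = (name) => {
  if (!/^[a-z][a-z0-9-]*$/.test(name)) {
    return {ok: false, reason: 'name must be kebab-case'}
  }
  if (existingRepos.includes(`${name}-service`)) {
    return {ok: false, reason: 'repository already exists'}
  }
  return {ok: true}
}

console.log(preCheck('billing'))   // duplicate → not ok
console.log(preCheck('invoicing')) // new name → ok
```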
Using this as a foundation, we've developed our Developers Self-Service Hub, enabling developers to create not only full-blown microservices but also simple GitHub repositories or even Azure AI services. We continuously gather feedback from our developers and add more capabilities to the self-service hub.