Article: How Retraced Manages Infrastructure With a Small Team

by Kiran Baddi

At Retraced, Oracle Cloud Infrastructure (OCI) is the heart of our infrastructure: our compute, databases, and files all run there. While Oracle offers excellent managed services, particularly the Autonomous Transaction Processing database, we would not want to miss Cloudflare's expertise in edge management, whether it's DNS management, automated TLS/SSL management, WAF, or DDoS protection. Similarly, Entra ID (formerly Azure Active Directory) excels as a centralized IAM tool, not just for setting up SSO across different platforms but also for mobile device management.

We embraced a poly cloud approach to leverage the best solutions for each use case. However, this approach brings its own set of complexities in provisioning, maintaining, and upgrading infrastructure and software components.

Here’s a breakdown of the various clouds we’ve adopted:

| Provider | Use cases |
| --- | --- |
| Cloudflare | Networking & Edge: DNS, rate limiting |
| | Security: Zero Trust, WAF & DDoS protection |
| | Workers for UI/frontend, caching, and some event-driven services |
| Oracle Cloud Infrastructure | Compute: Oracle Kubernetes Engine |
| | Database: Autonomous Transaction Processing, Autonomous Data Warehouse |
| | Object storage: file management |
| | Container registry: Oracle Container Image Registry |
| Azure | IAM: Entra ID |
| | Artificial Intelligence: Azure OpenAI Service |
| AWS | Message queue: RabbitMQ |
| | Email services: Simple Email Service |

Yes, we are one of the few companies using Oracle Cloud Infrastructure. We chose it for its exceptional Autonomous Transaction Processing databases, and we run our compute and object storage there as well so that they sit as close to the database as possible.

In addition to these, we utilize various SaaS providers, including development and automation tools such as HashiCorp Terraform Cloud, GitHub Enterprise, and Doppler.

[Figure: High-level overview of our architecture]

Managing our poly cloud infrastructure with a small team dedicated to infrastructure operations relies on several key factors:

🦾 Managed Services

We strongly believe in the principle of buy before build. We avoid reinventing the wheel unless absolutely necessary. All the services we have adopted are fully managed or, in the best case, autonomous, which minimizes maintenance overhead and provides peace of mind regarding availability. Downtime for maintenance is rare.

📜 Infrastructure as Code

We adopted Terraform to provision, deprovision and upgrade our infrastructure. All modules and workflows are stored in a dedicated repository and orchestrated through GitHub Actions. Terraform Cloud manages our state, ensuring efficient state locking.

The following demonstrates the workflow of how a new microservice is provisioned:

📂 A new branch is created → 📝 New module/resource added in the code → 🔃 Pull request is created → ⚙️ Pull request triggers the plan workflow → 👀 Reviewer reviews both the changes and the plan → 🚀 Merge triggers the Terraform deployment pipeline
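The plan-on-pull-request, apply-on-merge split above can be sketched as a small decision function. This is only an illustration, not our actual GitHub Actions workflow: the `Event` type, the `terraform_command` helper, the branch name `main`, and the exact Terraform flags are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified view of a GitHub event."""
    kind: str           # "pull_request" or "push"
    target_branch: str  # branch the PR targets, or the branch pushed to


def terraform_command(event: Event, main_branch: str = "main") -> Optional[List[str]]:
    """Return the Terraform command a pipeline might run for this event.

    Assumption: pull requests against the main branch only produce a plan
    for review; a push to the main branch (a merge) applies the change.
    Anything else triggers nothing.
    """
    if event.target_branch != main_branch:
        return None
    if event.kind == "pull_request":
        # Read-only: the reviewer inspects this output next to the diff.
        return ["terraform", "plan", "-input=false", "-no-color"]
    if event.kind == "push":
        # The merge is the approval, so the apply runs non-interactively.
        return ["terraform", "apply", "-input=false", "-auto-approve"]
    return None


if __name__ == "__main__":
    print(terraform_command(Event("pull_request", "main")))
    print(terraform_command(Event("push", "main")))
```

The point of the split is that nothing reaches the infrastructure until a reviewer has seen both the code change and the resulting plan.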

We've taken this a step further by enabling developer self-service. Developers can create a microservice through GitHub, which sets up everything from a repository and Doppler project to the Kubernetes deployment. They can also provision an Azure OpenAI service without logging into the Azure portal. (We will discuss the self-service workflows in another article.)

🙌🏾 Keeping it Simple

Counterintuitive as it may seem, simplicity is key to managing complex infrastructure. We focus on implementing the simplest solutions that meet our needs, even if it means not using the latest technology or tools. For example, we use a simple kubectl patch command for releases instead of more complex tools like Helm or Argo CD, given our straightforward microservices structure. This strategy helps minimize technical debt and keeps our system manageable.
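To give a feel for what a patch-based release can look like, here is a minimal sketch that builds a `kubectl patch` command to point a deployment's container at a new image. It is not our release script: the function name, the deployment, container, namespace, and image values are all hypothetical, and we assume a strategic merge patch, which matches containers by name.

```python
import json
from typing import List


def build_patch_command(deployment: str, container: str, image: str,
                        namespace: str = "default") -> List[str]:
    """Build a `kubectl patch` invocation that updates one container image.

    With a strategic merge patch, the entry in `containers` is merged into
    the existing container with the same name, so only the image changes.
    """
    patch = {
        "spec": {
            "template": {
                "spec": {
                    "containers": [{"name": container, "image": image}]
                }
            }
        }
    }
    return [
        "kubectl", "patch", "deployment", deployment,
        "--namespace", namespace,
        "--type", "strategic",
        "--patch", json.dumps(patch),
    ]


if __name__ == "__main__":
    # Hypothetical values, printed rather than executed.
    cmd = build_patch_command(
        deployment="example-service",
        container="app",
        image="registry.example.com/example-service:1.2.3",
        namespace="example",
    )
    print(" ".join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it against whichever cluster the current kubeconfig points to. Changing the pod template this way triggers a normal rolling update of the deployment.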

📝 Documentation and Communication

Documenting all changes is crucial for effective infrastructure management. We primarily document within the code, such as with Terraform. When code documentation isn’t possible, we use Notion and track all changes in our Significant Changes database. This practice ensures a clear record of actions taken, aiding in troubleshooting when issues arise. We communicate all changes to our technology department and the entire organization as necessary, fostering transparency and collaboration.

These strategies not only help us run our operations smoothly but also keep our infrastructure and software components up-to-date.
