Oddball modernized a federal platform using AWS cloud infrastructure and DevOps tools for Continuous Integration and Continuous Delivery (CI/CD), including Infrastructure as Code (IaC), to meet complex requirements and deliver secure, real-time, multichannel communication to Veterans.
Problem Statement/Definition
The legacy platform was built on 100+ on-premises servers and could not sustainably manage the rapidly increasing workload of critical Veteran notifications. Oddball was tasked with migrating these on-premises servers to a highly scalable, automated AWS infrastructure and modernizing the platform with current DevOps tools.
Proposed Solution & Architecture
As part of our multi-year modernization effort, Oddball implemented a mature DevOps culture and an architecture based on core cloud-native and automation principles.
Continuous Integration and Continuous Delivery: We established a fully automated delivery pipeline so that every code change was automatically tested and deployed, reducing the risk of human error and shortening development cycles. We used GitHub Actions to automate our CI/CD workflows, from code commits and pull requests through deployment and updates. To provide a scalable, serverless container environment, we used Amazon Elastic Container Registry (ECR) to store, manage, and deploy Docker container images and OCI artifacts, and Amazon Elastic Container Service (ECS) on AWS Fargate to run the containerized applications. In addition, Oddball added a new Python-based AWS workflow to the pipeline to send SMS notifications to Veterans and enhanced the existing Flask application to send notifications during a defined timeframe, both with minimal disruption to the delivery process.
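As an illustration of sending notifications during a defined timeframe, the following is a minimal sketch of a send-window check. The window hours, the function names, and the use of a fixed UTC offset are assumptions made for this example; the case study does not describe the actual window or how the Flask application enforces it.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

# Hypothetical send window (local time); the real hours are not stated in this case study.
SEND_WINDOW_START = time(8, 0)
SEND_WINDOW_END = time(20, 0)


def in_send_window(now: datetime, local_tz: tzinfo) -> bool:
    """Return True when the timezone-aware `now` falls inside the window in `local_tz`."""
    local = now.astimezone(local_tz)
    return SEND_WINDOW_START <= local.time() < SEND_WINDOW_END


def next_send_time(now: datetime, local_tz: tzinfo) -> datetime:
    """Return `now` (in local time) if sending is allowed, else the next window opening."""
    local = now.astimezone(local_tz)
    if in_send_window(now, local_tz):
        return local
    opening = local.replace(
        hour=SEND_WINDOW_START.hour,
        minute=SEND_WINDOW_START.minute,
        second=0,
        microsecond=0,
    )
    # After the window has closed, the next opening is tomorrow morning.
    if local.time() >= SEND_WINDOW_END:
        opening += timedelta(days=1)
    return opening


if __name__ == "__main__":
    eastern_standard = timezone(timedelta(hours=-5))  # fixed offset, for illustration only
    late_night_utc = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
    print(in_send_window(late_night_utc, eastern_standard))
    print(next_send_time(late_night_utc, eastern_standard).isoformat())
```

A scheduler could call `next_send_time` to defer a queued notification instead of dropping it. A production version would likely use named time zones so that daylight saving time is handled correctly.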
Infrastructure as Code (IaC) and Cloud-Native Services: Our architecture and cloud-native services were implemented using an IaC approach, which allowed us to manage AWS resources in a programmable and predictable manner. We leveraged AWS Lambda for serverless, event-driven processing (e.g., translating opt-in data via integrations and consuming payment events) and Amazon DynamoDB, a NoSQL database, to support that processing and track eligible notifications. We also used Amazon SQS to pass messages between components with no data loss, and AWS Application Load Balancers (ALB) to ensure high performance and minimal latency.
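To make the event-driven pattern concrete, here is a minimal sketch of a Lambda handler that consumes an SQS batch and reports only the failed messages, so that they are retried rather than lost. The `translate_opt_in` function and the payload fields (`id`, `opt_in`) are hypothetical; the case study does not describe the actual message format. The partial-batch response shown here only takes effect when `ReportBatchItemFailures` is enabled on the event source mapping.

```python
import json
from typing import Any


def translate_opt_in(payload: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical translation step: normalize an upstream opt-in record."""
    return {
        "recipient_id": str(payload["id"]),
        "sms_opt_in": bool(payload["opt_in"]),
    }


def handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Process each SQS record; return the IDs of records that could not be processed."""
    failures = []
    for record in event.get("Records", []):
        try:
            translated = translate_opt_in(json.loads(record["body"]))
            # A real handler would persist `translated` here (for example, to DynamoDB).
            print(json.dumps(translated))
        except (KeyError, TypeError, ValueError):
            # Malformed or incomplete messages are reported back to SQS for retry.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Returning only the failed message IDs lets SQS delete the successful messages and redeliver the rest. Pairing the queue with a dead-letter queue would keep repeatedly failing messages from being retried indefinitely.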
Observability and Feedback Loops: We used Amazon CloudWatch to provide deep observability across the entire technology stack, with real-time logging, metrics, and custom alarms that alert our team to changes in the system’s health. Amazon Pinpoint, Amazon Simple Email Service (SES), and Amazon Simple Notification Service (SNS) were used not just for email and SMS notification delivery, but also to track user engagement and delivery status, providing Veteran usage data back to development teams for continuous iteration.
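As a sketch of how delivery-status tracking can feed a feedback loop, the snippet below aggregates a list of delivery events into per-status counts and a delivery rate. The event shape and the status names (`delivered`, `failed`, `pending`) are assumptions for illustration; they are not the actual Pinpoint, SES, or SNS event schemas.

```python
from collections import Counter
from typing import Any


def summarize_delivery(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Count events per status and compute the share that were delivered."""
    counts = Counter(event["status"] for event in events)
    total = sum(counts.values())
    delivered = counts.get("delivered", 0)
    return {
        "total": total,
        "counts": dict(counts),
        # Guard against division by zero when no events were recorded.
        "delivery_rate": delivered / total if total else 0.0,
    }


if __name__ == "__main__":
    sample_events = [  # illustrative data, not project data
        {"channel": "sms", "status": "delivered"},
        {"channel": "sms", "status": "delivered"},
        {"channel": "email", "status": "failed"},
        {"channel": "sms", "status": "pending"},
    ]
    print(summarize_delivery(sample_events))
```

A summary like this could be published as a custom metric so that an alarm fires when the delivery rate drops below an agreed threshold.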
DevSecOps and Security Automation: We integrated security practices directly into the automated delivery pipeline, using AWS Identity and Access Management (IAM) to securely control access to the system and adhering to the principle of least privilege by managing users, groups, roles, and permissions across the entire automated environment. In addition, deploying into AWS GovCloud DocDB simplified security and compliance management, including our Authority to Operate (ATO), and allowed the team to focus on secure configuration and automation efforts.
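The principle of least privilege can be illustrated with a small sketch that builds an IAM policy document scoped to specific actions and resources, plus a check that could run as a pipeline gate. The statement IDs, resource names, and account ID below are placeholders, and the choice of actions is an assumption; the case study does not list the actual policies.

```python
import json
from typing import Any


def least_privilege_policy(queue_arn: str, table_arn: str) -> dict[str, Any]:
    """Build a policy that allows only queue consumption and item writes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ConsumeQueue",
                "Effect": "Allow",
                "Action": [
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes",
                ],
                "Resource": queue_arn,
            },
            {
                "Sid": "WriteNotifications",
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem"],
                "Resource": table_arn,
            },
        ],
    }


def _as_list(value: Any) -> list[Any]:
    return value if isinstance(value, list) else [value]


def has_wildcards(policy: dict[str, Any]) -> bool:
    """Flag any action containing '*' or any resource that is exactly '*'."""
    for statement in policy["Statement"]:
        actions = _as_list(statement["Action"])
        resources = _as_list(statement["Resource"])
        if any("*" in action for action in actions) or any(r == "*" for r in resources):
            return True
    return False


if __name__ == "__main__":
    # Placeholder ARNs in the GovCloud partition; names and account ID are hypothetical.
    policy = least_privilege_policy(
        "arn:aws-us-gov:sqs:us-gov-west-1:123456789012:example-queue",
        "arn:aws-us-gov:dynamodb:us-gov-west-1:123456789012:table/example-table",
    )
    print(json.dumps(policy, indent=2))
    print(has_wildcards(policy))
```

Running a check like `has_wildcards` in CI is one way to keep overly broad permissions from reaching a deployed environment.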
We iteratively rolled out each modernization in close communication with our stakeholders and clients to avoid disruption and minimize deployment issues. Ultimately, our DevOps culture allowed us to use AWS to build, operate, and maintain performant, reliable software that powers millions of notifications per year. This includes crucial Veteran outreach for appointment reminders, vaccine outreach, surgery notifications, prescription shipment tracking, emergency events, and beyond.
Outcomes & Success Metrics
- Sustained system error rates below 0.02%, validating the effectiveness of the automated testing and CI/CD quality gates.
- Automation and the CI/CD pipeline have enabled more frequent and reliable code changes.
- Achieved P95 latency under 0.16 s for notifications, ensuring a performant application and reliable UX.
- On average, we securely send 14 million SMS notifications each month.
- A DevOps culture has allowed our modernization efforts to increase Veteran outreach success: 214 million texts and 27 million emails sent, and counting.
- Proactive monitoring via CloudWatch and IaC allows teams to rapidly identify, resolve, and roll back issues, minimizing downtime and improving resiliency of the system.
- The modernized Flask app processes 6 million monthly events overnight.
- Administrators can rapidly configure important healthcare notifications.
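For readers less familiar with these metrics, the sketch below shows one common way to compute a P95 latency (nearest-rank method), an error rate as a percentage, and the average per-second throughput implied by a monthly volume. The method choice, the sample data, and the 30-day month are assumptions for illustration; the case study does not state how its figures were measured.

```python
import math


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Return the pct-th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def error_rate_percent(errors: int, total: int) -> float:
    """Return errors as a percentage of total requests."""
    if total <= 0:
        raise ValueError("total must be positive")
    return errors / total * 100


def average_per_second(monthly_volume: int, days_in_month: int = 30) -> float:
    """Average messages per second for a monthly volume (30-day month assumed)."""
    return monthly_volume / (days_in_month * 24 * 60 * 60)


if __name__ == "__main__":
    latencies = [0.05, 0.07, 0.09, 0.11, 0.12, 0.15, 0.08, 0.06, 0.10, 0.30]  # illustrative
    print(percentile_nearest_rank(latencies, 95))
    print(error_rate_percent(1, 10_000))
    print(round(average_per_second(14_000_000), 2))
```

The throughput figure is only an average. Notification traffic is typically bursty, so peak rates would be considerably higher than this number suggests.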
Total Cost of Ownership Analysis Performed
Throughout the modernization, Oddball reduced total cost of ownership by replacing fixed-capacity systems with managed, pay-as-you-go cloud services and by utilizing DevOps practices. By building on managed and serverless primitives (e.g., ECS Fargate, Lambda, SQS, SES/SNS) and pairing those with Infrastructure as Code, automated CI/CD pipelines, and robust observability practices, we converted hard infrastructure costs and maintenance labor into more measurable operating expenses. This architecture enabled high-throughput, low-latency delivery at scale while minimizing re-platforming costs, accelerating feature delivery, and ensuring sustainable operations for millions of Veteran healthcare notifications annually.
Lessons Learned
- Continuous Feedback Loops: We leveraged Amazon Pinpoint and Amazon CloudWatch to establish comprehensive monitoring and logging. This accelerated business innovation through continuous feedback loops between Veterans and developers, allowing the development team to quickly iterate on services based on real-time Veteran usage and system performance data.
- Single Sign-On (SSO) Integration: Integrating SSO into a serverless ECS environment highlighted the importance of a true DevOps culture, requiring close collaboration between our development, operations, and external SSO teams to successfully adapt legacy services to modern architectures.
- Containerization in Cloud Environments: When deciding on a container orchestration solution, we found that using AWS services like ECS with Fargate for serverless container management simplified scaling and resource allocation, allowing teams to focus on development instead of day-to-day infrastructure work.
- Serverless Infrastructure: By choosing a fully serverless architecture for all AWS resources, we streamlined ATO management and significantly reduced long-term maintenance labor and infrastructure risk, while keeping costs aligned with fluctuating notification volumes.