Migrating a Monolith to AWS: Lift-and-Shift vs Re-architect
Choosing the right migration path depends on constraints: time-to-migrate, risk tolerance, and target operating model.
Paths
- Rehost (Lift-and-Shift): fastest, minimal code change; great for deadlines
- Replatform: adopt managed services (RDS, ElastiCache, MQ) to reduce ops load
- Re-architect: break into services, event-driven, domain-aligned teams
Decision Drivers
- Compliance and downtime windows
- Peak load patterns and scaling needs
- Team skills in containers vs serverless
- Observability maturity and SLOs
Recommended Minimal Viable Migration
Start with replatforming primitives even for rehost:
- Move database to Amazon RDS
- Externalize config/secrets
- Add centralized logging and metrics
- Introduce a message broker for async work
This builds a runway for future re-architecture without stalling migration.
Step-by-Step Migration Playbook
- Readiness and Inventory (1–2 weeks)
- Catalogue services, dependencies, data stores, batch jobs, and cron
- Identify compliance constraints, maintenance windows, and peak usage
- Define success metrics: SLOs, error budgets, cost baseline
- Networking and Foundations (1 week)
- Create VPC, subnets (public/private), NAT, route tables
- Establish IAM roles, account boundaries, and SSO
- Set CloudWatch log groups, alarms, dashboards
- Database Path (1–2 weeks)
- Choose RDS engine and size; enable Multi-AZ and backups
- Plan migration method: DMS (CDC) vs downtime window dump/restore
- Validate character sets, time zones, and sequences/auto-increment
- App Compute Strategy (1–2 weeks)
- Rehost: EC2 AMIs with ASG + ALB
- Replatform: ECS Fargate service with health checks and autoscaling
- Re-architect: break modules by domain (start with the noisiest hotspot)
- Observability and Ops (ongoing)
- Structured JSON logs with correlation IDs
- Metrics: p95 latency, errors, saturation, cost per request
- Traces for top 5 slowest endpoints
- Cutover Plan (1–2 days)
- Freeze releases, switch DNS TTL to low value
- Blue/green deploy; smoke tests; ramp traffic gradually (10% → 50% → 100%)
- Rollback plan tested and documented
- Post-Migration Hardening (1–2 weeks)
- Cost review (rightsizing, savings plans)
- Reliability review (retry policies, timeouts, circuit breakers)
- Security review (least-privilege IAM, secret rotation)
Decision Matrix (Quick Reference)
- Deadline ≤ 4 weeks: Rehost or minimal replatform
- Team new to containers: Beanstalk/EC2 before ECS/EKS
- Volatile traffic: ECS Fargate + ALB autoscaling
- High ops overhead: Managed DBs, managed cache, managed MQ