How to become a Site Reliability Engineer
Overview
Run production services that don't go down — apply software engineering to operations so reliability is a product, not a hope.
As products become always-on infrastructure, the SRE is the person who turns availability into a measurable target and an engineering discipline. BLS projects 15% growth (2024–34) for Software Developers, the closest tracked occupation, and WEF lists technology roles among the fastest-growing. AI is good at surfacing anomalies and drafting runbooks; the SRE still owns the SLOs, the postmortems, and the call under pressure.
What AI changes
What AI accelerates
Anomaly detection, first-pass runbooks, log summarisation, postmortem drafting, and capacity-planning tables.
What stays human
SLO design, incident command, on-call judgement, error-budget trade-offs, and postmortem culture.
AI surfaces anomalies, drafts runbooks, and proposes remediations, but the SRE's value lies in setting the SLOs, designing error budgets, sustaining the postmortem culture, and making the call under pressure during an incident. That judgement compounds with experience: as the routine parts get faster, the core reliability skills become more valuable.
Day to day
Set and defend SLOs, lead incident response, run blameless postmortems, automate toil, review capacity plans, and partner with product on reliability trade-offs.
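To make "setting and defending an SLO" concrete, here is a minimal sketch of the error-budget arithmetic behind it. It assumes a simple time-based availability SLO over a fixed window; the `ErrorBudget` class, its field names, and the 99.9% / 30-day figures are hypothetical and chosen for illustration only, not taken from any particular team's practice.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical helper: error budget for a time-based availability SLO."""

    slo_target: float    # fraction of the window the service must be up, e.g. 0.999
    window_minutes: int  # length of the SLO window in minutes

    def __post_init__(self) -> None:
        # A target of exactly 1.0 would leave no budget at all, so reject it.
        if not 0.0 < self.slo_target < 1.0:
            raise ValueError("slo_target must be strictly between 0 and 1")
        if self.window_minutes <= 0:
            raise ValueError("window_minutes must be positive")

    @property
    def allowed_downtime_minutes(self) -> float:
        # The budget is whatever share of the window the SLO does not cover.
        return (1.0 - self.slo_target) * self.window_minutes

    def remaining_fraction(self, downtime_minutes: float) -> float:
        # Share of the budget still unspent; negative means the SLO is breached.
        return 1.0 - downtime_minutes / self.allowed_downtime_minutes


# Assumed example: a 99.9% target over a 30-day window.
budget = ErrorBudget(slo_target=0.999, window_minutes=30 * 24 * 60)
print(round(budget.allowed_downtime_minutes, 1))  # 43.2
print(round(budget.remaining_fraction(10.8), 2))  # 0.75
```

Under these assumptions, a 99.9% target leaves about 43 minutes of downtime per 30 days, and the remaining fraction is the kind of number an SRE might bring to a conversation with product about whether to ship risky changes or pause for reliability work.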
Core skills
- Observability (metrics, logs, traces) and SLO/SLI design
- Incident response and on-call discipline
- Infrastructure as code and Kubernetes
- One or more production languages (e.g. Go, Python)
- Postmortem and error-budget culture
Tools
- Prometheus, Grafana, OpenTelemetry
- Kubernetes
- Terraform / Pulumi
- Go or Python
- PagerDuty / Opsgenie
How to get in
Entry routes
- From a DevOps or backend engineering role with on-call experience
- From a systems administration role with strong coding upskilling
- From an SRE-adjacent on-call rotation with self-study
- From a CS degree with strong systems/internship work
Certifications
- AWS Certified DevOps Engineer
- Certified Kubernetes Administrator (CKA)
- Google Cloud Professional Cloud Architect
Seniority ladder
| Level | Title | Experience | Focus | Salary |
|---|---|---|---|---|
| Entry | Junior SRE | 0–2 yrs | On-call rotation, automation, learning the platform | Lower end of the US band, below the role median |
| Mid | Site Reliability Engineer | 2–5 yrs | Owning SLOs for a service area, leading incidents | Around the role median |
| Senior/Lead | Senior SRE | 5–8 yrs | Multi-service reliability, error-budget governance, mentoring | Upper end of the US band |
| Principal/Staff | Staff / Principal SRE | 8+ yrs | Cross-team reliability strategy, multi-region architecture, standards | Above the senior band, with a technical-leadership premium |
Where it can lead
Progresses to
- Senior SRE
- Staff SRE
- DevOps Engineer
- Engineering Manager
Pivots to
- DevOps Engineer
- Cloud Engineer
- Security Engineer
- Software Engineer
Pay (US)
- Lower end of band: USD 120,000
- Median: USD 133,080
- Upper end of band: USD 205,000
Outlook
US Software Developers employment is projected to grow 15% (2024–34), well above the 3% all-occupation average; SRE demand is structurally strong as more products become always-on services.
Prove it
CI/CD Demo on a Tiny App
Incident Runbook + Game-Day Exercise
Terraform/IaC Mini-Project
Capacity Planning Model (Spreadsheet)
Threat Model of a Small App