Site Reliability Engineer - SaaSOps

Valgenesis | Hyderabad | April 15, 2026 | Full Time | Lever
About the Role:

Responsibilities:
    • Define and embed SRE best practices across the SaaS platform, ensuring reliability is built into the system from the ground up.
    • Establish and maintain meaningful SLAs, SLIs, SLOs, and error budgets to protect the customer experience and guide engineering priorities.
    • Design and continuously improve high-availability and disaster recovery strategies.
    • Automate manual processes, manage incident response, and optimize performance against defined SLIs and SLOs.
    • Bridge the gap between development and IT operations.
    • Ensure strong tenant isolation and consistent performance within a DB-per-tenant architecture.
    • Strengthen system resiliency across both Azure and on-prem deployments in our hybrid environment.
    • Lead incident response efforts with structured troubleshooting and clear communication.
    • Drive thorough root cause analysis (RCA) and conduct blameless postmortems focused on long-term improvements.
    • Translate incidents into systemic fixes rather than temporary patches.
    • Develop and maintain operational runbooks to standardize responses.
    • Design and maintain a comprehensive observability framework for both cloud and on-prem environments.
Requirements:
    • Minimum of 3 years of hands-on experience in Site Reliability Engineering (SRE), supporting production-grade, cloud-native enterprise software platforms and applications.
    • Prior experience as a DevOps engineer, cloud system administrator, or software developer.
    • Strong proficiency in scripting languages such as Python and PowerShell.
    • Deep hands-on experience working with Microsoft Azure in production environments.
    • Solid understanding of Terraform and Ansible, and of Kubernetes internals, including networking, scheduling, scaling, and resource management.
    • Proven experience in PostgreSQL performance tuning and optimization in production systems.
    • Hands-on experience with Azure Monitor, Application Insights, and Log Analytics for cloud-based observability.
    • Experience implementing and managing Prometheus and Grafana for Kubernetes and on-prem monitoring.
    • Ability to turn metrics, logs, and traces into actionable insights that improve reliability and performance.
    • Ability to troubleshoot and improve CI/CD pipelines to ensure stable and predictable releases.
