Apply for Site Reliability Engineer Job

Full time

Work From Office

This Position is Currently Open

Department / Category:

Engineer

Listed on Dec 19, 2023

Work Location:

Pune

Hyderabad

Chennai

Job Description of Site Reliability Engineer

10+ Years of Relevant Experience

We are seeking an experienced Site Reliability Engineer (SRE) to ensure the stability, scalability, and reliability of our production systems. The ideal candidate will focus on automation, incident management, performance optimization, and proactive monitoring, while collaborating closely with development and operations teams to build resilient infrastructure.

Key Responsibilities:

System Reliability: Collaborate with production support teams to build scalable, maintainable systems and continuously improve infrastructure and application architecture.
Toil Reduction & Automation: Develop and maintain tools, scripts, and automation for deployments, monitoring, incident response, and repetitive operational tasks to minimize manual effort and human error.
Incident Management: Participate in on-call rotations, respond to incidents and outages, investigate issues, and drive problem management through root cause analysis and preventive measures.
Monitoring & Alerting: Implement and maintain proactive monitoring systems and alerts to detect and address issues before they impact users.
Capacity Planning & Performance Optimization: Monitor performance metrics, identify bottlenecks, collaborate with engineering teams on optimization, and plan for future scalability.
Error Budgeting & Chaos Engineering: Conduct resiliency tests, mock drills, and stability assessments to improve system fault tolerance.
Documentation: Create and maintain detailed documentation for system configurations, operational processes, and troubleshooting guidelines.

Required Skills & Experience:

Strong understanding of cloud platforms (AWS, Google Cloud, or Azure).
Experience with containerization technologies (Docker, Kubernetes).
Proficiency with infrastructure-as-code tools (Terraform, Ansible).
Solid grasp of incident management processes and production operations.

Desirable Skills:

Software development experience in Python or Java.
Familiarity with monitoring and logging tools (Splunk Cloud, ThousandEyes).
Strong networking fundamentals.
Ability to work in fast-paced, cross-functional environments with strong problem-solving skills.

Required Skills for Site Reliability Engineer Job

AWS
Google Cloud
Azure
Docker
Kubernetes
Terraform
Ansible

Our Hiring Process

Screening (HR Round)
Technical Round 1
Technical Round 2
Final HR Round
