6 to 8 Years of Relevant Experience
We are seeking a proactive and detail-oriented Production Support Engineer to support critical applications and ensure high availability, performance, and reliability in production environments. The ideal candidate will have experience in incident management, ITIL best practices, and cross-functional collaboration to resolve issues, meet SLA requirements, and stabilize application platforms.
Key Responsibilities:
- Troubleshoot issues in production environments, resolving incidents with minimal downtime.
- Implement ITIL best practices to improve the quality and speed of service delivery.
- Ensure client SLAs are met by managing critical application deliverables and collaborating with business stakeholders.
- Handle Major Incidents, engaging the necessary teams for resolution, creating post-mortem reports, and ensuring proper closure.
- Respond to user issues and coordinate with development teams or external vendors for timely resolution.
- Perform ITIL Problem Management activities to identify root causes and remediate chronic issues.
- Maintain and contribute to the support knowledge base.
- Conduct system health checks and proactively monitor for potential issues.
- Collaborate with development and infrastructure teams to define checklists and validate steps for routine maintenance activities.
- Monitor production applications and environments, ensuring stability and availability.
- Create and maintain installation and operations documentation.
- Participate in deployment and configuration management processes.
- Coach and manage support teams when applicable.
- Investigate complex issues related to applications, data, and databases to ensure SLA compliance.
- Work closely with Dev/L3 teams on incident triage, change/release reviews, and platform enhancement opportunities.
- Participate in 24x7 on-call support coverage as needed.
Required Skills & Qualifications:
- Proven experience in production support, application monitoring, and incident management.
- Strong understanding of ITIL processes, including Incident, Problem, and Change Management.
- Familiarity with monitoring tools, system health check practices, and alert management.
- Good analytical and problem-solving skills with the ability to troubleshoot across application and infrastructure layers.
- Strong communication skills, both written and verbal.
- Ability to work under pressure, manage multiple tasks, and engage cross-functional teams.
- Experience working in 24x7 support environments is a plus.