Platform Monitoring Engineer

Software Developer

Bengaluru

January 16, 2026

Full Time

Job Description

Databricks helps data teams solve some of the world’s biggest problems—such as building new transportation systems and speeding up medical research. We do this by creating one of the best data and AI platforms in the world.

Founded by engineers and driven by customer needs, Databricks works on challenging technical problems every day—from building modern user interfaces to running large-scale infrastructure across millions of virtual machines. We are growing fast and continuously improving our platform.

About the Role

We are looking for an experienced Senior Platform Monitoring Engineer to join our Platform Monitoring Team. This is a critical role focused on platform reliability, monitoring, incident response, and customer experience.

You will act as a key first responder when platform issues occur, investigate problems deeply, improve monitoring systems, and help prevent future incidents. Your work will directly impact how reliable and stable the Databricks platform is for customers.

Key Responsibilities

Incident Management & Reliability

Act as a lead responder during platform incidents to reduce customer impact
Coordinate with multiple engineering and infrastructure teams to quickly identify and resolve issues
Ensure problems are detected early and handled efficiently

Root Cause Analysis

Perform detailed post-incident investigations
Identify the underlying root cause of failures across infrastructure, services, and cloud platforms
Look for recurring patterns and propose long-term fixes to avoid repeat incidents

Monitoring & Observability

Design and improve monitoring, alerting, and observability systems
Build customer-focused alerting pipelines to detect issues faster
Correlate metrics, logs, and traces to improve system visibility
Reduce mean time to detect (MTTD) and mean time to resolve (MTTR)

Automation & System Improvements

Develop automation tools to reduce manual work and improve reliability
Create reusable monitoring patterns and best practices
Continuously improve platform stability and customer experience

Required Skills & Experience

5+ years of experience in one of the following roles:
- Site Reliability Engineer (SRE)
- DevOps Engineer
- Production Engineer
- Or a similar role
Strong experience working in production environments
Hands-on experience with at least one cloud provider:
- AWS
- Azure
- Google Cloud Platform (GCP)
Experience with containers and orchestration tools:
- Docker
- Kubernetes
Strong knowledge of monitoring and alerting tools, such as:
- Prometheus
- Grafana
- ELK stack
- PagerDuty
Ability to design monitoring systems using metrics, logs, and traces
Strong programming skills in Python (or similar languages)
Experience building automation tools used in real production systems
Deep understanding of the full incident lifecycle:
- Detection
- Mitigation
- Resolution
- Post-incident review

Education

Bachelor’s, Master’s, or PhD in:
- Computer Science
- Computer Engineering
- Or a related engineering field

Key Skills

Platform Monitoring, Root Cause Analysis, Incident Management, Python, Automation Tools, Cloud Platforms (AWS/Azure/GCP), Kubernetes, Docker, Observability, Customer Experience, Computer Science

Date Posted

January 16, 2026

Location

Bengaluru

Expiration Date

February 15, 2026

Experience

5+ years

Qualification

Bachelor’s degree
