DevOps Engineer

May 2, 2026

Job Description

We are looking for a Site Reliability / DevOps Engineer to manage and maintain mission-critical cloud infrastructure for global customers. This role focuses on ensuring the high availability, reliability, and performance of production systems. The successful candidate will monitor systems, automate infrastructure, improve scalability, and collaborate with development teams to improve overall system efficiency.

Responsibilities

Production & System Reliability

  • Monitor system availability and maintain overall system health
  • Ensure smooth functioning of production environments
  • Provide operational support for large-scale distributed systems

Performance & Optimization

  • Analyze system metrics and optimize performance
  • Identify trends and early warning signs in metrics to prevent system failures
  • Improve system reliability and scalability

Automation & Infrastructure

  • Build tools and automation for infrastructure management
  • Develop systems to manage cloud and on-premises environments
  • Improve deployment processes and reduce manual effort

Collaboration & Engineering Support

  • Work with development teams to improve system quality and release processes
  • Participate in system design, capacity planning, and architecture discussions
  • Support testing and deployment processes

Compliance & Process

  • Follow organizational policies and quality standards
  • Participate in risk assessment and system governance processes

Requirements

  • 2+ years of experience in Site Reliability Engineering / DevOps
  • Strong experience managing cloud infrastructure and production systems
  • Experience in monitoring, troubleshooting, and performance tuning
  • Ability to analyze system and application metrics
  • Knowledge of automation and infrastructure tools
  • Experience working with distributed systems
  • Strong problem-solving and analytical skills

Skills

  • Cloud Platforms (AWS, Azure, GCP)
  • DevOps Tools (Jenkins, Git, CI/CD pipelines)
  • Containerization (Docker, Kubernetes)
  • Infrastructure as Code and Configuration Management (Terraform, Ansible, Puppet)
  • Programming/Scripting (Python, Shell)
  • Monitoring Tools (Nagios, etc.)
  • Linux / Operating Systems
  • System Automation & Performance Optimization

Good to Have Skills

  • Experience with Citrix Cloud or CloudStack
  • Data center or ISP experience
  • Knowledge of GPU systems and virtualization
  • Experience supporting AI/ML workloads