Cloud Ops Engineer

May 18, 2026

Job Description

A Cloud Ops Engineer is responsible for maintaining, monitoring, troubleshooting, and supporting cloud-based applications and infrastructure.

This role focuses on:

  • Production support
  • Cloud operations
  • Incident management
  • Infrastructure monitoring
  • Automation
  • System reliability

The engineer ensures applications and services remain:

  • Stable
  • Secure
  • Highly available
  • Scalable

This role also involves:

  • DevOps practices
  • Kubernetes environments
  • Cloud deployments
  • Application release support
  • 24×7 operational support

Responsibilities

Incident & Production Support

  • Troubleshoot production and non-production issues
  • Resolve incidents within SLA timelines
  • Analyze logs and debug system issues
  • Perform Root Cause Analysis (RCA)
  • Minimize customer impact during outages
  • Communicate incident updates to stakeholders
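
As an illustration of the log analysis mentioned above, here is a minimal Python sketch that counts ERROR entries per service to help narrow down where an incident started. The log line format, the service names, and the function name `count_errors_by_service` are assumptions made for this example and are not tied to any specific system used in the role.

```python
import re
from collections import Counter

# Assumed (hypothetical) line format: "<date> <time> <LEVEL> <service>: <message>"
LINE_RE = re.compile(
    r"^\S+ \S+ (?P<level>[A-Z]+) (?P<service>[\w-]+): (?P<message>.*)$"
)


def count_errors_by_service(lines):
    """Return a Counter of ERROR-level entries keyed by service name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Lines that do not fit the assumed format are skipped, not treated as errors.
        if match and match.group("level") == "ERROR":
            counts[match.group("service")] += 1
    return counts


# Made-up sample data, for illustration only.
sample = [
    "2026-05-18 10:00:01 INFO checkout: request served",
    "2026-05-18 10:00:02 ERROR checkout: upstream timeout",
    "2026-05-18 10:00:03 ERROR auth: token validation failed",
    "2026-05-18 10:00:04 ERROR checkout: upstream timeout",
    "malformed line",
]

print(count_errors_by_service(sample).most_common())
# → [('checkout', 2), ('auth', 1)]
```

In practice the same question would often be answered with a query in a log platform such as the ELK Stack; the script only shows the kind of reasoning involved.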

Cloud Operations

  • Monitor cloud infrastructure and applications
  • Support deployments across multiple environments
  • Maintain high availability and system reliability
  • Support cloud platforms such as AWS and OCI

Automation & DevOps

  • Automate repetitive operational tasks
  • Improve operational efficiency using scripts and DevOps tools
  • Support CI/CD pipelines and infrastructure automation
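
One example of a repetitive operational task that is commonly automated is a disk usage check. The sketch below uses only the Python standard library; the function names and the 90% default threshold are assumptions chosen for illustration, not requirements of the role.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return the percentage of disk space used on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def check_disk(path="/", threshold_percent=90.0):
    """Return ("ALERT" or "OK", percent_used) for the given path.

    The default threshold is a hypothetical value; real thresholds depend
    on the environment and its alerting policy.
    """
    percent = disk_usage_percent(path)
    status = "ALERT" if percent >= threshold_percent else "OK"
    return status, round(percent, 1)


if __name__ == "__main__":
    status, percent = check_disk(".")
    print(f"{status}: {percent}% used")
```

A script like this could be run from a scheduler or a CI/CD job, with the result forwarded to whichever alerting tool the team already uses.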

Monitoring & Observability

Work with monitoring tools such as:

  • Prometheus
  • Grafana
  • Zabbix
  • ELK Stack

Use these tools to track:

  • System health
  • Performance
  • Logs
  • Alerts

Infrastructure & Middleware Support

Support technologies such as:

  • Linux servers
  • Windows servers
  • JBoss
  • Apache
  • Tomcat
  • NGINX

Container & Kubernetes Support

  • Manage containerized applications
  • Support Kubernetes clusters
  • Work with Helm Charts and Argo CD

Collaboration & Documentation

  • Work with DevOps, developers, and infrastructure teams
  • Use Jira and Confluence for tracking and documentation
  • Participate in Agile/Scrum processes

Required Skills

Operating Systems

  • Linux (Ubuntu, CentOS, Amazon Linux, Oracle Enterprise Linux (OEL))
  • Windows Server

Cloud & DevOps Skills

  • AWS
  • OCI
  • Jenkins
  • Terraform
  • Git
  • Ansible

Scripting Skills

  • Bash/Shell Scripting
  • Python
  • Perl
  • Groovy

Container & Orchestration Skills

  • Docker
  • Kubernetes
  • Helm Charts
  • Argo CD

Networking & Systems Skills

  • DNS
  • Firewalls
  • LDAP
  • SFTP
  • File Systems

Monitoring Tools

  • Prometheus
  • Grafana
  • ELK Stack
  • Zabbix

Middleware Technologies

  • JBoss
  • Apache
  • Tomcat
  • NGINX

Preferred Skills

  • Agile/Scrum experience
  • Oracle/RDBMS knowledge
  • Hadoop/Spark exposure
  • Elasticsearch/Kibana
  • Informatica
  • Looker
  • WebSphere/WebLogic
  • AWS Certification