Job Description

Roles & Responsibilities

  • Architect and build Big Data platforms (Data Lake, Data Warehouse, Lakehouse) using Databricks integrated with AWS cloud services.
  • Develop and support Data Engineering (ETL/ELT) and Machine Learning (ML) solutions using Python, Spark, Scala, or R.
  • Build and optimize distributed Spark workloads, ensuring performance and scalability.
  • Implement batch and streaming pipelines using Databricks Jobs, DLT, and Spark Streaming.
  • Design and maintain data models, databases, and tables across multiple subject areas.
  • Build, test, and maintain medium- to large-scale data pipelines ingesting from multiple source systems.
  • Implement data quality checks, validations, and reusable pipeline frameworks.
  • Use Infrastructure as Code (IaC) and CI/CD pipelines to automate deployment of data platforms.
  • Collaborate on architecture design, documentation, and best practices.
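
To make the data-quality bullet above concrete, here is a minimal sketch in plain Python of the kind of reusable validation framework the role describes. All names (`CheckResult`, `not_null`, `run_checks`, etc.) are illustrative, not from an existing library; in a real pipeline these checks would typically run as PySpark or DLT expectations rather than row-by-row Python.

```python
# Hedged sketch of a reusable data-quality check framework.
# All names here are illustrative, not from an existing library.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    failed_rows: int


def not_null(column: str) -> Callable[[dict], bool]:
    """Row-level rule: the given column must be present and non-null."""
    return lambda row: row.get(column) is not None


def in_range(column: str, lo: float, hi: float) -> Callable[[dict], bool]:
    """Row-level rule: the column value must fall within [lo, hi]."""
    return lambda row: row.get(column) is not None and lo <= row[column] <= hi


def run_checks(rows: List[dict], checks: Dict[str, Callable[[dict], bool]]) -> List[CheckResult]:
    """Apply each named rule to every row and report pass/fail counts."""
    results = []
    for name, rule in checks.items():
        failures = sum(1 for row in rows if not rule(row))
        results.append(CheckResult(name, failures == 0, failures))
    return results


# Usage: validate a small batch before loading it downstream.
batch = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": -5.0},
    {"id": None, "amount": 3.0},
]
results = run_checks(batch, {
    "id_not_null": not_null("id"),
    "amount_non_negative": in_range("amount", 0.0, 1e9),
})
for r in results:
    print(r.name, r.passed, r.failed_rows)
```

The same shape (named rules, per-rule failure counts) ports directly to Spark, where each rule becomes a filter and `sum` becomes a `count()` over the failing rows.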

Preferred Candidate Profile

Big Data & Databricks

  • Strong experience with Databricks, Spark, Hadoop, EMR, and Hortonworks.
  • Hands-on expertise with Databricks components:
    • Notebooks, Jobs, DLT
    • Interactive & Job Clusters
    • SQL Warehouses
    • Unity Catalog, MLflow
    • DBFS, Secrets, Policies
    • Hive & Glue Metastore

Programming & Querying

  • Strong proficiency in:
    • Python
    • PySpark / Spark SQL
    • SQL
    • Hive, Presto
    • Spark Streaming

AWS Cloud Services

  • Experience with:
    • S3, EC2, VPC, IAM
    • Lambda, API Gateway
    • Glue, Redshift, Spectrum
    • Athena, Kinesis
    • Cognito, ALB

DevOps, CI/CD & Automation

  • Source control: Git, Bitbucket, AWS CodeCommit
  • CI/CD tools: Jenkins, GitHub Actions, AWS CodeBuild & CodeDeploy
  • Infrastructure automation using Terraform and Databricks APIs
  • Experience in MLOps pipelines
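
As one illustration of the Terraform bullet above, a hedged sketch of provisioning a scheduled Databricks job as code. Resource and block names follow the public `databricks/databricks` Terraform provider; all values (job name, runtime version, node type, notebook path, schedule) are hypothetical.

```hcl
# Hedged sketch: a scheduled Databricks job managed with Terraform.
# Resource/block names follow the databricks/databricks provider;
# all concrete values below are illustrative.
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}

resource "databricks_job" "nightly_etl" {
  name = "nightly-etl" # hypothetical job name

  job_cluster {
    job_cluster_key = "etl-cluster"
    new_cluster {
      spark_version = "14.3.x-scala2.12" # example LTS runtime
      node_type_id  = "i3.xlarge"        # example AWS node type
      num_workers   = 4
    }
  }

  task {
    task_key        = "run-etl"
    job_cluster_key = "etl-cluster"
    notebook_task {
      notebook_path = "/Repos/data/etl/nightly" # hypothetical path
    }
  }

  schedule {
    quartz_cron_expression = "0 0 2 * * ?" # 02:00 daily
    timezone_id            = "UTC"
  }
}
```

Keeping jobs and clusters in Terraform rather than the UI is what lets the CI/CD tools listed above (Jenkins, GitHub Actions, CodeBuild/CodeDeploy) promote the same platform definition across environments.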

Documentation & Delivery

  • Create:
    • Architecture & design documents
    • Low-level designs (LLD)
    • Test cases & traceability matrix
  • Build reference architectures, demos, and how-to guides
  • Willingness to pursue cloud and Databricks certifications

Education

  • B.Tech / B.E. in any specialization

Key Skills

PySpark, Databricks, Spark, SQL, AWS, Data Lake, Data Warehousing, ETL/ELT, Machine Learning, CI/CD, Terraform, Python