Azure AI Engineer

May 7, 2026

Job Description

We are seeking a Data Scientist / Data Engineer to support a large-scale engineering modernization initiative that will turn 10–15 years of legacy engineering data into an intelligent, searchable platform.

The role involves handling messy, semi-structured historical datasets, building scalable ETL/data pipelines, and preparing the foundation for future AI/ML-driven similarity matching systems.

This is a hands-on data engineering role, with exposure to AI and machine learning applications as the platform matures.


Project Overview

The project involves:

  • Processing 2,100+ Excel files containing 10–15 years of engineering configurations
  • Automating a currently manual engineering search process
  • Building a scalable backend system to support an internal UI
  • Enabling future AI-powered recommendation and similarity matching features

Engineers will eventually input parameters into an internal application, which will surface the most relevant historical configurations automatically.


Responsibilities

Data Engineering & ETL

  • Design and maintain scalable ETL/data pipelines using Python and SQL
  • Process large volumes of legacy engineering data
  • Standardize, normalize, and clean inconsistent datasets

Data Processing

  • Parse and transform Excel-based datasets
  • Handle semi-structured and unstructured historical data
  • Improve data quality and consistency across systems

Platform & Cloud

  • Work with Azure data platforms and Databricks
  • Build scalable processing workflows for large datasets

AI/ML Foundation

  • Prepare datasets for future use in:
    • similarity matching
    • ML models
    • AI-powered search systems
  • Support future integrations involving NLP, LangChain, and LangGraph

Requirements

  • 3+ years of experience in Data Engineering or Data Science
  • Strong Python programming skills
  • Experience building ETL/data pipelines
  • Strong SQL skills (PostgreSQL preferred)
  • Experience working with:
    • Databricks
    • Azure data platforms
  • Hands-on experience processing Excel datasets
  • Strong experience with:
    • data cleaning
    • normalization
    • preprocessing
  • Experience handling legacy or inconsistent datasets

Skills

  • Python
  • SQL
  • PostgreSQL
  • Data Engineering
  • ETL Pipelines
  • Data Cleaning & Normalization
  • Azure Databricks
  • Data Preprocessing
  • Machine Learning Foundations
  • NLP
  • LangChain
  • LangGraph
  • Legacy Data Processing
  • Excel Data Parsing