Description
We are seeking a Data Engineer with hands-on AI/ML project experience in Databricks to join the Databricks Solution Team within the IRS Advanced Analytics Program (AAP). This role is responsible for building, optimizing, and maintaining data pipelines and feature engineering workflows that directly support model training, deployment, and monitoring for IRS mission teams.
As part of the AAP common services mission, the Data Engineer will deliver scalable, reusable, and compliant data engineering solutions using Databricks and AWS. The ideal candidate brings a strong background in data engineering for AI/ML use cases and ensures data readiness and accessibility across the entire AI/ML lifecycle.
Key Responsibilities
- Design, build, and maintain data pipelines in Databricks (Spark, Delta Lake, MLflow) specifically tailored for AI/ML and GenAI use cases.
- Implement data ingestion, transformation, and feature engineering workflows that feed model training and inference processes.
- Collaborate with mission data scientists to ensure datasets are optimized for model development and experimentation.
- Integrate pipelines into CI/CD workflows for automated, repeatable, and compliant model operations.
- Optimize data workflows for performance, scalability, and cost-efficiency across multi-tenant workloads.
- Apply governance and security controls (Unity Catalog, IAM, audit logging) to protect sensitive IRS data.
- Support data validation, schema enforcement, and quality checks to ensure reliable model outcomes.
- Partner with the Product Manager and Chief Architect to align data engineering capabilities with roadmap priorities and platform evolution.
Qualifications
Required Qualifications
- Bachelor's degree in Computer Science, Data Engineering, or a related field with 14 or more years of experience, or a Master's degree with 12 or more years of experience.
- Must be a U.S. Citizen with the ability to obtain and maintain a Public Trust security clearance.
- 5+ years of data engineering experience with AI/ML-focused projects.
- Hands-on expertise with Databricks, Spark, Delta Lake, and MLflow in the context of AI/ML pipelines.
- Proficiency in Python, SQL, and data transformation frameworks.
- Experience delivering feature engineering and data prep for model development and operationalization.
- Familiarity with ETL orchestration tools (Airflow, Databricks Workflows, or similar).
- Knowledge of CI/CD integration for data pipelines (Terraform, Git-based workflows).
- Awareness of AI/ML lifecycle data needs (training, validation, inference, retraining).
Desired Skills
- Certifications: Databricks Certified Data Engineer Associate/Professional.
- Experience in federal or regulated data environments (FedRAMP, NIST 800-53).
- Familiarity with AWS data services (S3, Glue, Lambda, Redshift) integrated with Databricks.
- Exposure to Trustworthy AI practices (bias monitoring, lineage, explainability).
- Strong problem-solving and collaboration skills with architects, MLOps engineers, and mission data scientists.
Target salary range: $120,001 - $160,000. The estimate displayed represents the typical salary range for this position based on experience and other factors.