About Me

Dynamic and results-driven Data Engineering Specialist with over 10 years of experience designing, building, and optimizing cloud-native data ecosystems across healthcare, finance, and enterprise domains. Expert in modern lakehouse architectures, scalable ETL/ELT pipelines, and real-time streaming frameworks using Databricks, Snowflake, Spark, and Kafka. Experienced across Azure, AWS, and GCP, with deep expertise in data modeling, orchestration, and automation through Airflow, dbt, Terraform, and CI/CD workflows. Recognized for driving modernization, compliance (HIPAA, GDPR, SOC2), and cost optimization while collaborating across global teams. Passionate about enabling AI-driven analytics, governance, and data democratization through robust engineering and scalable architecture.

Technical Skills

Programming & Scripting

  • Python, SQL, Scala, Java, Bash, JSON, YAML
  • ETL automation, PySpark, APIs, validation

Data Engineering

  • Apache Spark, Databricks, Flink, Kafka
  • Streaming, Beam, Airflow, Luigi, Glue, ADF

ETL / ELT

  • dbt, Talend, NiFi, Informatica, SSIS
  • Matillion, DataStage, Synapse, Glue Jobs

Architecture & Modeling

  • Lakehouse, Data Vault, Kimball, Inmon
  • Star/Snowflake Schemas, DDD, Bronze-Silver-Gold

Warehousing & Storage

  • Snowflake, Delta Lake, Redshift, BigQuery
  • Synapse, Hive, Presto, Parquet, ORC, Avro

Governance & Security

  • Apache Atlas, Collibra, Alation
  • RBAC/ABAC, PII Masking, HIPAA, GDPR, SOC2

Cloud Platforms

  • Azure, AWS, GCP
  • ADF, Synapse, Databricks, S3, Glue, EMR, BigQuery

Automation & DevOps

  • Terraform, Jenkins, Azure DevOps, GitHub Actions
  • Docker, Kubernetes, IaC, Prometheus, Grafana

Analytics & ML

  • MLflow, TensorFlow, Power BI, Tableau, Looker
  • Feature Store Design, AI/ML Integration

Key Projects

Healthcare Data Lakehouse Modernization

Description: Architected and migrated on-premises ETL workflows into a Databricks + Snowflake lakehouse using Delta Lake and dbt. Structured Bronze–Silver–Gold layers, integrated Apache Atlas, and enforced HIPAA compliance, enabling predictive analytics and self-service BI.
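The Bronze–Silver–Gold layering above can be illustrated with a minimal, framework-free sketch: plain Python stands in for PySpark/Delta Lake, and the field names (`patient_id`, `visit_date`, `charge`) are hypothetical examples, not taken from the actual project.

```python
# Simplified medallion-architecture sketch: raw (bronze) records are
# cleaned into silver, then aggregated into a gold reporting table.
# Field names are hypothetical; a real pipeline would use Delta tables.
from collections import defaultdict

bronze = [  # raw ingested records, as-landed
    {"patient_id": "P1", "visit_date": "2023-01-05", "charge": "120.50"},
    {"patient_id": "P2", "visit_date": "2023-01-06", "charge": "bad"},
    {"patient_id": "P1", "visit_date": "2023-01-07", "charge": "80.00"},
]

def to_silver(records):
    """Validate and type-cast bronze rows; drop rows that fail checks."""
    silver = []
    for row in records:
        try:
            silver.append({
                "patient_id": row["patient_id"],
                "visit_date": row["visit_date"],
                "charge": float(row["charge"]),  # reject non-numeric charges
            })
        except (KeyError, ValueError):
            continue  # in practice, quarantine to an error table instead
    return silver

def to_gold(silver):
    """Aggregate silver rows into per-patient totals for BI consumption."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["patient_id"]] += row["charge"]
    return dict(totals)

silver = to_silver(bronze)
gold = to_gold(silver)
# gold == {"P1": 200.5} — the malformed P2 row was dropped at the silver layer
```

Each layer only ever reads from the one below it, which is what makes the pattern auditable: bronze is immutable history, silver is the validated source of truth, gold is consumption-ready.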

Cloud ETL Pipeline Modernization

Description: Re-engineered legacy batch jobs into modular, cloud-native ETL pipelines using Talend, Python, and Airflow. Integrated Great Expectations-based data validation, reducing recovery time by 50% and improving resilience.
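The validation step can be sketched without the Great Expectations dependency. The hand-rolled `expect_*` helpers below only illustrate the expectation-suite idea; the names, signatures, and return shapes are hypothetical and differ from the real Great Expectations API.

```python
# Hand-rolled sketch of expectation-style validation, illustrating the
# pattern Great Expectations formalizes. All names here are illustrative.
def expect_column_values_not_null(rows, column):
    failures = [i for i, r in enumerate(rows) if r.get(column) is None]
    return {"success": not failures, "failed_rows": failures}

def expect_column_values_between(rows, column, low, high):
    failures = [i for i, r in enumerate(rows)
                if r.get(column) is None or not (low <= r[column] <= high)]
    return {"success": not failures, "failed_rows": failures}

rows = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 9999.0},
]

suite = [
    expect_column_values_not_null(rows, "amount"),
    expect_column_values_between(rows, "amount", 0, 1000),
]
# An orchestrated task would fail fast if any expectation fails,
# which is what cuts recovery time: bad data never reaches downstream jobs.
pipeline_ok = all(r["success"] for r in suite)
# pipeline_ok == False (row 1 is null; rows 1 and 3 are out of range)
```

Running checks as a first-class pipeline task (rather than ad hoc SQL) is what makes failures observable and rerunnable from the orchestrator.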

Real-Time Data Streaming Platform

Description: Designed streaming pipelines using Kafka, Spark Streaming, and AWS Lambda integrated with S3 and Snowflake for sub-second fraud detection and analytics.
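The core detection logic can be illustrated with a simple in-memory sliding window. Plain Python stands in for Kafka/Spark Streaming here; the velocity heuristic, window size, and threshold are hypothetical, not details of the actual platform.

```python
# Sliding-window fraud heuristic: flag a card if it makes more than
# `max_events` transactions within `window_s` seconds. Thresholds and
# field names are illustrative only.
from collections import defaultdict, deque

class VelocityDetector:
    def __init__(self, window_s=10.0, max_events=3):
        self.window_s = window_s
        self.max_events = max_events
        self.events = defaultdict(deque)  # card_id -> recent timestamps

    def process(self, card_id, ts):
        """Return True if this event pushes the card over the threshold."""
        q = self.events[card_id]
        q.append(ts)
        # Evict timestamps that have fallen outside the window.
        while q and ts - q[0] > self.window_s:
            q.popleft()
        return len(q) > self.max_events

det = VelocityDetector(window_s=10.0, max_events=3)
flags = [det.process("card-42", t) for t in [0.0, 1.0, 2.0, 3.0, 30.0]]
# flags == [False, False, False, True, False]
# The 4th event lands inside the 10 s window and trips the threshold;
# by t=30 the window has emptied and the card is clean again.
```

In a real deployment this state lives in the stream processor (e.g. Spark Structured Streaming watermarked windows), with flagged events written to S3/Snowflake for analytics.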

Professional Experience

Lead Data Engineer
Contour Software
03/2022 – Present
  • Architected multi-cloud data platforms (Azure, GCP, Databricks) for healthcare clients.
  • Implemented Delta Lake + Snowflake lakehouse supporting AI, BI, and ML workloads.
  • Automated 100+ pipelines using dbt and Airflow, improving reliability by 40%.
  • Governed data with Apache Atlas, ensuring HIPAA compliance.
  • Reduced compute/storage costs by 35% via autoscaling.
Senior Data Engineer
VentureDive
08/2018 – 02/2022
  • Migrated ETL workflows to Azure Data Factory & Databricks.
  • Built hybrid batch/stream pipelines using Delta Lake.
  • Integrated Snowflake to reduce query latency by 55%.
  • Developed CI/CD with Terraform & Azure DevOps.
  • Enhanced data quality with Great Expectations.
Big Data Specialist
North Bay Solutions
01/2016 – 07/2018
  • Built large-scale Spark/Hive pipelines for healthcare data.
  • Implemented Kafka ingestion for IoT and patient streams.
  • Improved job performance by 45% via optimization.
  • Unified EHR, lab & insurance data in the Hadoop ecosystem.
Data Engineer
CodeNinja
07/2014 – 12/2015
  • Automated data ingestion with Talend, SQL Server, and Python.
  • Standardized healthcare records and created data marts.
  • Reduced manual workload by 60% through scheduling.