Data engineering professional with 10+ years of experience building cloud-native platforms, lakehouse architectures, and large-scale data pipelines across healthcare, finance, and enterprise domains. I design and optimize end-to-end ETL processes, real-time streaming solutions, and analytics-ready data models using Databricks, Snowflake, Spark, Kafka, and transformation frameworks. Working across Azure, AWS, and GCP, I focus on performance tuning, compliance, and data reliability, delivering ML-ready datasets that enable advanced analytics and automating operations through governance, CI/CD, and infrastructure-as-code practices.
Company: Contour Software
Description: Migrated on-premises ETL workflows to a Databricks- and Snowflake-based lakehouse using Delta Lake and dbt, organized into Bronze, Silver, and Gold layers. Enforced governance with Apache Atlas and role-based security to ensure HIPAA compliance, while enabling real-time patient data insights and predictive analytics.
Technologies: Databricks, Snowflake, Delta Lake, dbt, Apache Atlas, Azure
Company: Contour Software
Description: Developed centralized data marts and semantic layers in Snowflake using dbt, and automated ingestion pipelines with Airflow. Delivered KPIs through Power BI and Looker dashboards, improving self-service analytics and reducing reporting cycles by 40%.
Technologies: Snowflake, dbt, Airflow, Power BI, Looker
Company: VentureDive
Description: Built streaming pipelines with Kafka, Spark Streaming, and AWS Lambda, integrated with S3-based alerting. Reduced detection latency by 60% and automated fraud risk workflows, giving compliance teams near-real-time monitoring and the ability to intervene proactively.
Technologies: Kafka, Spark Streaming, AWS Lambda, S3
Company: NorthBay Solutions
Description: Engineered ML-ready time-series datasets using Spark, Databricks, and Delta Lake for patient outcome prediction. Collaborated with data scientists using scikit-learn and MLflow to deploy models that improved readmission risk identification and early intervention planning.
Technologies: Spark, Databricks, Delta Lake, Scikit-learn, MLflow