
About Me

Seasoned data engineering professional with 10+ years of experience building cloud-native platforms, modern lakehouse architectures, and large-scale data pipelines across the healthcare, finance, and enterprise domains. I specialize in designing and optimizing end-to-end ETL processes, real-time streaming solutions, and analytics-ready data models using Databricks, Snowflake, Spark, Kafka, and advanced transformation frameworks. Deeply experienced across Azure, AWS, and GCP, I focus on performance tuning, compliance, and data reliability, delivering ML-ready datasets that enable advanced analytics and driving automation through governance, CI/CD, and infrastructure-as-code practices.

Technical Skills

Programming & Scripting

  • Python (ETL, automation, validation)
  • Advanced SQL (CTEs, window functions, query tuning)
  • Scala, Java, Bash, Shell scripting

Data Engineering & Processing

  • Apache Spark, PySpark, Databricks
  • Apache Flink, Kafka, Spark Streaming
  • Structured Streaming, Apache Beam, Airflow, Luigi

ETL & Data Transformation

  • Talend, Apache NiFi, Informatica, SSIS
  • Azure Data Factory, dbt, Matillion, DataStage

Data Warehousing & Lakehouse

  • Snowflake, Amazon Redshift, Google BigQuery
  • Azure Synapse, Databricks Delta Lake
  • Hive, Presto, Trino

Data Modeling & Architecture

  • Star Schema, Snowflake Schema, Data Vault
  • Kimball & Inmon methodologies
  • Bronze-Silver-Gold layering, lakehouse architecture

Data Governance & Security

  • Apache Atlas, Alation, Collibra
  • Data lineage, PII masking, RBAC/ABAC access control
  • HIPAA, GDPR, SOC2 compliance

Cloud Platforms & Services

  • Azure: ADF, Synapse, HDInsight, Blob Storage, Databricks
  • AWS: S3, Glue, EMR, Lambda, Redshift, Athena, Kinesis
  • GCP: BigQuery, Dataflow, Dataproc, Pub/Sub

Databases & Storage

  • PostgreSQL, Oracle, SQL Server, MySQL, DB2
  • MongoDB, Cassandra, DynamoDB, Neo4j
  • InfluxDB, HBase, Elasticsearch, Parquet, ORC, Avro

Analytics & ML Enablement

  • BI-ready data marts, feature engineering pipelines
  • Scikit-learn, TensorFlow, PyTorch, MLflow
  • Power BI, Tableau, Looker, Qlik Sense

Key Projects

Healthcare Data Lakehouse Modernization

Company: Contour Software

Description: Migrated on-premises ETL workflows to a Databricks- and Snowflake-based lakehouse built on Delta Lake and dbt, structured in Bronze-Silver-Gold layers. Enforced governance with Apache Atlas and role-based security to ensure HIPAA compliance while enabling real-time patient data insights and predictive analytics.

Technologies: Databricks, Snowflake, Delta Lake, dbt, Apache Atlas, Azure
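
Below is a minimal PySpark sketch of the Bronze-Silver-Gold flow described above, assuming hypothetical mount paths, event fields, and a hash-based masking rule; it illustrates the layering pattern rather than the production code.

    # Minimal sketch of the medallion flow; paths and columns are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("lakehouse-medallion").getOrCreate()

    # Bronze: land raw extracts as-is, stamped with ingestion metadata.
    raw = (
        spark.read.json("/mnt/raw/patient_events/")
        .withColumn("_ingested_at", F.current_timestamp())
    )
    raw.write.format("delta").mode("append").save("/mnt/bronze/patient_events")

    # Silver: deduplicate, conform types, and mask direct identifiers (PII).
    bronze = spark.read.format("delta").load("/mnt/bronze/patient_events")
    silver = (
        bronze.dropDuplicates(["patient_id", "event_id"])
        .withColumn("event_date", F.to_date("event_ts"))
        .withColumn("patient_id", F.sha2(F.col("patient_id").cast("string"), 256))
    )
    silver.write.format("delta").mode("overwrite").save("/mnt/silver/patient_events")

    # Gold: analytics-ready aggregate consumed by BI and ML workloads.
    gold = silver.groupBy("event_date", "event_type").agg(
        F.countDistinct("patient_id").alias("patients"),
        F.count("event_id").alias("events"),
    )
    gold.write.format("delta").mode("overwrite").save("/mnt/gold/daily_patient_events")

Each layer is written as a Delta table, so downstream consumers get ACID guarantees and time travel without extra tooling.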

Enterprise BI & Analytics Platform

Company: Contour Software

Description: Developed centralized data marts and semantic layers in Snowflake using dbt and automated ingestion pipelines with Airflow. Delivered KPIs through Power BI and Looker dashboards, improving self-service analytics and reducing reporting cycles by 40%.

Technologies: Snowflake, dbt, Airflow, Power BI, Looker
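
An illustrative Airflow DAG for the ingest-then-transform pattern behind these marts; the DAG id, script path, and dbt selectors are assumptions for the sketch, not the actual pipeline definitions.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="snowflake_marts_daily",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        # Stage source extracts into Snowflake.
        ingest = BashOperator(
            task_id="ingest_raw",
            bash_command="python /opt/pipelines/load_raw_to_snowflake.py",
        )

        # Rebuild the dbt models that feed the Power BI / Looker semantic layer.
        dbt_run = BashOperator(
            task_id="dbt_run_marts",
            bash_command="cd /opt/dbt && dbt run --select marts",
        )

        # Validate outputs before dashboards refresh.
        dbt_test = BashOperator(
            task_id="dbt_test_marts",
            bash_command="cd /opt/dbt && dbt test --select marts",
        )

        ingest >> dbt_run >> dbt_test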

Real-Time Financial Fraud Detection

Company: VentureDive

Description: Built streaming pipelines with Kafka and Spark Streaming, integrated with AWS Lambda and S3-based alerting. Reduced detection latency by 60% and automated fraud risk workflows, empowering compliance teams with near real-time monitoring and proactive interventions.

Technologies: Kafka, Spark Streaming, AWS Lambda, S3
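
A sketch of the detection flow, shown here with Spark Structured Streaming; the broker address, topic name, and the 5-minute spend threshold are stand-ins for the real scoring logic, and the Lambda notification step is assumed to trigger off the S3 output.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("fraud-stream").getOrCreate()

    schema = (
        StructType()
        .add("txn_id", StringType())
        .add("account_id", StringType())
        .add("amount", DoubleType())
        .add("event_ts", TimestampType())
    )

    # Consume transaction events from Kafka and parse the JSON payload.
    txns = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "transactions")
        .load()
        .select(F.from_json(F.col("value").cast("string"), schema).alias("t"))
        .select("t.*")
    )

    # Flag accounts whose 5-minute spend exceeds a threshold (placeholder rule).
    alerts = (
        txns.withWatermark("event_ts", "10 minutes")
        .groupBy(F.window("event_ts", "5 minutes"), "account_id")
        .agg(F.sum("amount").alias("spend"))
        .where(F.col("spend") > 10000)
    )

    # Land alerts in S3, where the Lambda-based alerting picks them up.
    query = (
        alerts.writeStream.outputMode("append")
        .format("parquet")
        .option("path", "s3a://fraud-alerts/current/")
        .option("checkpointLocation", "s3a://fraud-alerts/_checkpoints/")
        .start()
    )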

Predictive Patient Readmission Modeling

Company: NorthBay Solutions

Description: Engineered ML-ready time-series datasets using Spark, Databricks, and Delta Lake for patient outcome prediction. Collaborated with data scientists leveraging Scikit-learn and MLflow to deploy models that improved readmission risk identification and early intervention planning.

Technologies: Spark, Databricks, Delta Lake, Scikit-learn, MLflow
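
A hypothetical sketch of that feature engineering: rolling 90-day utilization features per patient, computed with Spark window functions and written to Delta for the MLflow-tracked training jobs. Table and column names are placeholders.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.appName("readmission-features").getOrCreate()

    admissions = spark.read.format("delta").load("/mnt/silver/admissions")

    # Rolling 90-day window per patient, ordered by admission timestamp.
    w_90d = (
        Window.partitionBy("patient_id")
        .orderBy(F.col("admit_ts").cast("long"))
        .rangeBetween(-90 * 24 * 3600, -1)
    )
    w_seq = Window.partitionBy("patient_id").orderBy("admit_ts")

    features = (
        admissions
        .withColumn("prior_admits_90d", F.count("admission_id").over(w_90d))
        .withColumn("avg_los_90d", F.avg("length_of_stay").over(w_90d))
        .withColumn(
            "days_since_last_discharge",
            F.datediff("admit_ts", F.lag("discharge_ts").over(w_seq)),
        )
    )

    features.write.format("delta").mode("overwrite").save("/mnt/gold/readmission_features")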


Professional Experience

Data Engineer Lead
Contour Software
03/2022 – Present
  • Architected enterprise-grade multi-cloud platforms leveraging Azure, GCP, and Databricks for healthcare clients
  • Orchestrated full-scale lakehouse ecosystem integrating Delta Lake with Snowflake to support BI, AI, and application workloads
  • Operationalized dbt for transformation lifecycle management and standardized Airflow-based orchestration across 100+ production pipelines
  • Enforced governance frameworks with Apache Atlas to ensure data lineage, regulatory compliance, and controlled access for HIPAA-driven use cases
  • Optimized compute and storage infrastructure through dynamic scaling, achieving cost reductions exceeding 35%
  • Mentored a team of 8 engineers across data engineering, DevOps, and MLOps
Cloud Data Engineer
VentureDive
08/2018 – 02/2022
  • Transitioned legacy ETL into Azure Data Factory pipelines, enabling scalable, event-driven workflows across multiple domains
  • Engineered Azure Data Lake Gen2 and Databricks pipelines supporting hybrid batch and streaming for patient data
  • Structured Delta Lake layers (Bronze, Silver, Gold) to deliver curated, governed, and reusable datasets
  • Automated deployments with Terraform and Azure DevOps, embedding CI/CD and validated Spark applications
  • Accelerated BI adoption by integrating Snowflake, reducing query latency by 55% and enhancing analytical efficiency
Big Data Specialist
NorthBay Solutions
01/2016 – 07/2018
  • Built scalable batch pipelines with Apache Spark and Hive to process structured and unstructured healthcare data
  • Implemented real-time ingestion using Kafka for patient monitoring sensors and medical device streams
  • Improved job runtimes by 45% through optimized joins, partitioning, and caching strategies
  • Consolidated datasets from EHR, lab systems, and insurance claims into a centralized Hadoop-based platform
  • Produced engineered time-series datasets enabling predictive models for clinical readmission risk
Data Engineer
CodeNinja
07/2014 – 12/2015
  • Automated ingestion pipelines using Talend, SQL Server, and Python to consolidate patient data across hospital systems
  • Standardized clinical records and diagnosis datasets into unified models supporting compliance and reporting
  • Reduced manual refresh effort by 60% through scheduled ETL workflows and automated validations
  • Implemented rigorous quality checks ensuring the integrity and reliability of downstream analytics (a sketch follows this list)
  • Designed dimensional data marts supporting KPIs and executive dashboards for hospital management
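
A simple sketch of that style of validation, assuming a hypothetical staged extract and column names; the real checks ran inside the scheduled Talend/Python ETL workflows.

    import pandas as pd

    # Load a staged patient extract (path and columns are placeholders).
    patients = pd.read_csv("/data/staging/patients.csv", parse_dates=["date_of_birth"])

    checks = {
        "null_patient_id": patients["patient_id"].isna().sum(),
        "duplicate_patient_id": patients["patient_id"].duplicated().sum(),
        "future_dob": (patients["date_of_birth"] > pd.Timestamp.today()).sum(),
    }

    failures = {name: int(n) for name, n in checks.items() if n > 0}
    if failures:
        # Halt the scheduled run so bad records never reach reporting marts.
        raise ValueError(f"Data quality checks failed: {failures}")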