Sowmya M. — Senior Data Engineer from United States

Sowmya M.

Senior Data Engineer

United States · 6+ years
Open to offers · New to Platform

About

Sowmya M. is a Senior Data Engineer with over 7 years of experience designing and building scalable data platforms across telecom, healthcare, banking, and IT. She specializes in enterprise-grade data solutions on Databricks and Unity Catalog, works primarily in AWS and Azure, and has implemented ETL/ELT pipelines along with data governance and data quality frameworks. At Arvest Bank, she architected a secure, scalable AWS-based data lake on S3, EMR, and Redshift to support fraud detection analytics. At Flex Care, she managed data ingestion systems processing over 5 million patient records daily. She uses PySpark, Spark SQL, and Kafka for real-time and batch analytics, and Airflow and Lambda to optimize workflows. She holds a Master’s degree in Advanced Data Analytics from the University of North Texas.

Experience

  • Senior AWS Data Engineer

    Arvest Bank · 2024 — Present
    Architected and developed a scalable AWS-based data lake using S3, EMR, Redshift, Glue, Athena, and DynamoDB to support fraud detection analytics. Created batch and real-time data pipelines with PySpark, Spark SQL, and Kafka on Databricks. Established data governance and quality frameworks through Unity Catalog to ensure secure access, lineage, and dependable analytics. Designed dimensional data models with SCD Type 1 and Type 2, optimizing data retrieval via partitioning and query tuning. Created workflows for the AI/ML lifecycle using MLflow, Docker, and ECR for model tracking and deployment. Optimized workflows with Airflow and Lambda and tuned Spark jobs, reducing job runtime by 35% and cutting costs by approximately $15K per month.
  • AWS Data Engineer

    Flex Care · 2021 — 2023
    Built real-time and batch data pipelines using PySpark, Spark SQL, and Kafka on AWS Databricks, processing over 5 million patient records daily with under two hours of latency. Automated data ingestion and ETL workflows through Glue, Lambda, Sqoop, and Airflow, improving pipeline reliability and operational efficiency. Established data governance and compliance frameworks using Unity Catalog, adhering to HIPAA and GDPR standards. Developed performance-optimized data storage solutions using HDFS, Hive, HBase, Redshift, and Druid. Designed distributed data processing systems with Spark (Scala/PySpark), MapReduce, and Flume to improve scalability and processing speed. Containerized and deployed data applications with Docker and Kubernetes, supporting scalable CI/CD workflows.
  • Azure Data Engineer

    Lumen · 2020 — 2021
    Engineered scalable data pipelines utilizing Azure Data Factory and Databricks for batch and real-time processing across diverse data sources. Developed streaming solutions with Azure Stream Analytics, Event Hub, and Service Bus for continuous data ingestion. Enhanced ETL/ELT workflows using PySpark and Spark SQL to improve data processing efficiency. Implemented data governance and integration via Unity Catalog to ensure secure, compliant, and reliable data flows. Automated workflows and deployments utilizing Airflow and CI/CD pipelines, increasing reliability and minimizing manual interventions. Built analytics and machine learning solutions using MLflow, Power BI, and Azure services for better data-driven insights.
  • Software Engineer

    Electronic Arts · 2019 — 2019
    Wrote and tuned advanced SQL queries, significantly improving query speed through indexing and partitioning. Automated ETL pipelines using Talend to migrate data efficiently to AWS S3. Performed data cleansing, transformation, and mapping to ensure high data quality and accurate integration across systems. Developed Power BI dashboards and semantic reporting models for executive-level insights. Integrated data from multiple sources, including SQL Server and Oracle, standardizing formats for consistent analytics.

Education

  • M.P.S. in Advanced Data Analytics
    University of North Texas, Denton County · 2025