Resume
Experience
2023 - Present
Data Engineer
University of North Texas
Dallas, USA

SEER-Medicare-Health Data Integration and Visualization Project
• Orchestrated the end-to-end integration of SEER cancer registry data, Medicare claims, and state-level public health datasets using ZIP and FIPS codes to unify disparate sources into a cohesive patient trajectory model from diagnosis through outcomes.
• Cleaned and preprocessed the data using SQL (PostgreSQL) and Python, standardizing formats and resolving inconsistencies across datasets.
• Designed and implemented robust ETL pipelines using Apache NiFi, automating ingestion and transformation processes.
• Stored the cleaned data in Google Cloud Storage and loaded it into Google BigQuery, which served as the central data warehouse (a minimal loading sketch follows this project's bullet list).
• Built interactive dashboards and visual reports using Metabase, enabling stakeholders to explore trends, disparities, and outcomes across demographic and geographic dimensions; this solution facilitated evidence-based decision-making for healthcare researchers and policy planners.
• Enhanced Data Integrity: Addressed data quality issues by correcting missing values, eliminating duplicates, and standardizing column names to ensure a reliable dataset for analyzing breast cancer stages and survival times.
• Data Visualization Mastery: Utilized Matplotlib to produce scatter plots, bar charts, and heatmaps, revealing patterns and trends in the data to facilitate deeper insights into breast cancer research.
• In-depth Data Analysis: Employed SQL, Tableau, Excel, and machine learning tools to interpret visualized results, providing critical insights for research, reporting, and strategic oncology decision-making.
• Machine Learning Implementation: Applied machine learning algorithms such as Gradient Boosting Classifier and Ordinary Least Squares regression to assess and correct biases in large-scale datasets, promoting model fairness (an illustrative sketch follows this role's bullet list).
• Predictive Model Development: Created predictive models using advanced machine learning methodologies to forecast cancer outcomes and responses to treatment.
• Transparency through Explainable AI/ML: Implemented explainable AI/ML approaches to ensure clarity and transparency in model predictions, aiding clinicians and researchers in decision-making.
• Rigorous Data Validation: Conducted thorough quality assurance and data validation to maintain the accuracy and reliability of the data utilized in analyses and modeling.
• Advanced Data Collection Techniques: Executed sophisticated data scraping methods to acquire extensive datasets from SEER, Medicaid, AHD, USDA, and CMS, vital for informed decision-making in health research.
• Contributions to Scientific Research: Played a key role in drafting research papers and communicating complex data analyses to the scientific community, enhancing understanding of public health impacts.
• Leadership in Cross-functional Collaboration: Worked closely with cross-functional teams to align data initiatives with broader research objectives, ensuring successful project execution and impactful results.
• Expanded Scope (Data Governance, Epidemiology, and Public Health Integration): In addition to its technical accomplishments, this project placed a strong emphasis on data governance, epidemiological analysis, and public health impact.
• Data Governance Excellence: Developed and enforced policies and procedures for managing data privacy, access control, and compliance with HIPAA and other regulatory frameworks. Led the creation of a public health data governance framework that defined clear roles, responsibilities, and processes for ensuring data accuracy, consistency, and security.
• Epidemiological Analysis: Collaborated closely with epidemiologists to define cohort selection criteria, perform population-level health trend analysis, and calculate survival rates, disease incidence, and treatment response across various demographic and geographic groups.
• Public Health Impact: The project supported strategic decision-making for public health interventions by linking health data with geospatial and socioeconomic indicators. These insights were used to identify disparities in healthcare access and outcomes, guiding policy recommendations.
• Stakeholder Collaboration: Acted as a liaison between technical teams and public health officials, translating data requirements into scalable solutions and ensuring that analytical outputs aligned with real-world healthcare priorities.
• Transparency and Reproducibility: Maintained comprehensive documentation of data transformation processes, pipelines, and modeling decisions to support reproducibility and knowledge transfer.
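
Illustrative only: a minimal Python sketch of the Google Cloud Storage to BigQuery load step referenced above, using the google-cloud-bigquery client. The bucket, dataset, and table names are hypothetical placeholders, not the project's actual resources.

# Minimal, illustrative sketch of the GCS -> BigQuery load step.
# Bucket, dataset, and table names are hypothetical placeholders.
from google.cloud import bigquery

def load_cleaned_extract(uri: str, table_id: str) -> None:
    """Load a cleaned CSV extract from Google Cloud Storage into BigQuery."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,          # header row in the cleaned extract
        autodetect=True,              # infer schema from the cleaned file
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()                 # wait for the load job to finish
    table = client.get_table(table_id)
    print(f"Loaded {table.num_rows} rows into {table_id}")

if __name__ == "__main__":
    # Hypothetical object path and destination table
    load_cleaned_extract(
        "gs://seer-medicare-staging/cleaned/patient_trajectories.csv",
        "health-analytics.seer_medicare.patient_trajectories",
    )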
Additional Contributions and Innovations
• Proactive Problem-Solving: Applied expert problem-solving skills to address and resolve technical and data-related issues as they arose, ensuring minimal downtime and maintaining high standards of data quality and system performance.
• Stakeholder Engagement and Communication: Regularly engaged with stakeholders through meetings and presentations, providing updates on project status, explaining complex technical details, and gathering feedback to refine data solutions. Developed comprehensive documentation to assist users and technical teams in understanding and utilizing the data systems.
• Innovative Technology Integration: Explored and integrated new technologies and methodologies to enhance the capabilities of the data infrastructure, including cloud computing solutions for scalable storage and processing as well as advanced analytical techniques for predictive modeling and machine learning.
• Ethical Data Governance: Ensured that all data handling practices complied with ethical standards and legal requirements, particularly concerning data security and patient confidentiality. Implemented robust data governance frameworks to oversee the proper management of sensitive health data.
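
Illustrative only: a minimal sketch of a gradient-boosting classifier paired with permutation importance as one simple transparency check, in the spirit of the modeling and explainability work described above. The data here is synthetic; the actual SEER-Medicare features are not reproduced.

# Illustrative sketch: gradient boosting plus a permutation-importance check
# as a simple form of model transparency. Data is synthetic, not SEER-Medicare.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=8, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)
print(f"Hold-out accuracy: {model.score(X_test, y_test):.3f}")

# Permutation importance highlights which inputs drive predictions,
# supporting explainability reviews with clinicians and researchers.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
for idx in result.importances_mean.argsort()[::-1]:
    print(f"feature_{idx}: {result.importances_mean[idx]:.4f}")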
2022 - 2023
Data Specialist/SQL Developer
HCLTech (IT outsourcing engagement; client: Toyota Financial Services)
Chennai, India

• Strategic Data Management: Orchestrated the automation of data extraction, transformation, and loading processes using SSIS, enhancing operational efficiency across Toyota’s global data systems.
• Robust Data Integration and Quality Assurance: Leveraged Azure Data Factory and Python scripting to cleanse and integrate large-scale datasets from diverse sources, ensuring high data quality and reliability for critical decision-making processes (an illustrative cleansing sketch follows this role's tool list).
• Advanced Analytics and Reporting: Utilized Power BI and SQL to develop and administer insightful analytical reports and dashboards, directly supporting Toyota's strategic objectives by providing actionable insights into operational data.
• Cloud Solutions Engineering: Managed and optimized Azure cloud services, including BLOB and Data Lake storage, ensuring scalable and secure data solutions that support Toyota's outsourcing and IT strategies.
• Automated Monitoring and Response Systems: Implemented Azure Automation to monitor IT resources and configured alarms for proactive issue resolution, safeguarding system stability and performance.
• Security and Compliance Protocols: Conducted rigorous security assessments and compliance checks, aligning with Toyota’s stringent data security requirements to protect sensitive information and system integrity.
• Process Optimization and Agile Project Management: Directed Agile project teams through daily scrums and sprint planning, driving the timely execution of IT projects that enhance Toyota’s operational efficiencies.
• Visualization and Decision Support Tools: Crafted advanced visualizations using Power BI, SSRS, and Excel, facilitating strategic decisions with real-time data insights tailored to specific business needs.
• Innovative Problem Solving and System Support: Engaged in troubleshooting and resolving complex IT issues, improving system functionality and user satisfaction while providing continuous post-production support.
• Agile Development & Collaboration: Led cross-functional Agile teams, conducted daily scrums, sprint planning, and retrospective meetings to drive iterative delivery of data engineering tasks and ensure alignment with Toyota’s IT roadmap.
• Analytics Enablement & Visualization: Built data marts and served curated datasets to BI teams; collaborated on Power BI, SSRS, and Excel-based dashboards to provide timely, actionable insights into supply chain, production, and financial KPIs.
• Post-Production Support & Optimization: Troubleshot complex data issues across environments, performed root cause analysis, and implemented automated data validation to reduce error rates and improve pipeline stability.
• Infrastructure as Code & Deployment: Utilized ARM templates for deployment of ADF pipelines, linked services, and datasets across dev, test, and prod environments, supporting scalable and repeatable infrastructure provisioning.
Environment & Tools: Azure Data Factory (ADF), Synapse Analytics, Azure Data Lake Storage, Blob Storage, SSIS, Power BI, SQL Server, T-SQL, Python, Azure Monitor, Azure Key Vault, ARM Templates, JIRA, SSRS, Excel, Git
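
Illustrative only: a hypothetical sketch of the kind of Python cleansing and validation step referenced in this role. Column names, rules, and file paths are placeholders and do not reflect Toyota Financial Services' actual schemas; Parquet output assumes pyarrow is installed.

# Hypothetical cleansing/validation sketch; schema and rules are placeholders.
import pandas as pd

def cleanse_contracts(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize types, drop duplicates, and remove unrepairable rows."""
    df = df.drop_duplicates(subset=["contract_id"])
    df["contract_id"] = df["contract_id"].astype(str).str.strip()
    df["start_date"] = pd.to_datetime(df["start_date"], errors="coerce")
    df["monthly_payment"] = pd.to_numeric(df["monthly_payment"], errors="coerce")
    return df.dropna(subset=["contract_id", "start_date"])

def validate(df: pd.DataFrame) -> None:
    """Simple quality gates before the data is published downstream."""
    assert df["contract_id"].is_unique, "duplicate contract ids after cleansing"
    assert (df["monthly_payment"].fillna(0) >= 0).all(), "negative payment amounts"

if __name__ == "__main__":
    raw = pd.read_csv("contracts_extract.csv")   # hypothetical staged extract
    clean = cleanse_contracts(raw)
    validate(clean)
    clean.to_parquet("contracts_clean.parquet", index=False)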
2020 - 2022
Data Engineer
Edgerock Software Solutions Private Limited
Hyderabad, Remote

• Cleaned, transformed, and loaded large datasets into cloud-based environments for analytics and reporting using Azure Data Factory and SQL Server
• Created automated validation workflows using Azure Functions and Python to clean CSV data, convert XML to JSON, and optimize file storage using the Parquet format (an illustrative sketch follows this role's tool list)
• Built ETL pipelines using SSIS to process bulk Excel data, apply business rules, and log records into SQL Server for traceability and compliance
• Designed logging frameworks to track task-level and package-level ETL activities, ensuring high data quality and visibility
• Pre-aggregated data using Power BI Dataflows and Power Query to streamline dashboard generation and reduce report latency
• Utilized MS Excel (including VLOOKUP, pivot tables, and conditional formatting) and MS Access for ad-hoc analysis, error tracking, and offline reporting workflows
• Developed dashboards and visual reports in Power BI and Tableau to communicate KPIs and performance trends to stakeholders
• Migrated data integration workflows from Azure Data Factory V1 to V2, managing environment promotion using ARM templates and Azure Key Vault
• Managed Azure services such as Synapse, Blob Storage, and Data Lake for scalable data storage and transformation
• Wrote and optimized complex T-SQL queries, stored procedures, and views to support reporting, analytics, and automation
• Implemented row-level security and enforced data governance standards for access control across reporting layers
• Automated routine data ingestion tasks using shell scripts and integrated them into production schedules
• Partnered with QA and UAT teams to validate pipeline outputs and troubleshoot data issues in production
• Collaborated in Agile sprints and daily stand-ups to prioritize reporting deliverables and improve data processes
Key Tools & Technologies: MS Excel, MS Access, Power BI, Tableau, SSIS, SQL Server, T-SQL, Azure Data Factory, Azure Synapse, Azure Blob Storage, Azure Data Lake, Python, Azure Monitor, ARM Templates, Git, JIRA, SSRS
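
Illustrative only: a minimal sketch of the validation and conversion logic described above (CSV cleanup, XML to JSON, Parquet output), written as a plain Python module that an Azure Function could invoke. File paths and names are placeholders; Parquet output assumes pyarrow is installed.

# Placeholder file paths; logic mirrors the CSV-clean / XML-to-JSON / Parquet steps.
import json
import xml.etree.ElementTree as ET
import pandas as pd

def clean_csv(path: str) -> pd.DataFrame:
    """Normalize column names and drop duplicate or fully empty rows."""
    df = pd.read_csv(path)
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df.drop_duplicates().dropna(how="all")

def xml_to_json(xml_path: str, json_path: str) -> None:
    """Flatten simple record-style XML into a JSON array of objects."""
    root = ET.parse(xml_path).getroot()
    records = [{child.tag: child.text for child in rec} for rec in root]
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)

if __name__ == "__main__":
    clean_csv("staging/orders.csv").to_parquet("curated/orders.parquet", index=False)
    xml_to_json("staging/orders.xml", "curated/orders.json")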
Summer 2019
Intern (Mobile App Developer)
One Eye Technology
Chennai, India

• Assisted in collecting, cleaning, and transforming raw datasets from multiple sources using SQL, Excel, and Python, supporting data-driven decision-making processes.
• Developed basic ETL workflows under supervision, contributing to automated data pipelines using tools like SSIS or Azure Data Factory.
• Performed exploratory data analysis (EDA) and created summary reports using Power BI and Excel dashboards, highlighting key trends and metrics (a brief EDA sketch follows this role's bullet list).
• Wrote and optimized SQL queries for data extraction, reporting, and performance testing in staging and production databases.
• Collaborated with senior data professionals to understand data requirements, ensuring accurate data modeling and validation.
• Documented data definitions, data flows, and transformation logic to support transparency and knowledge transfer within the team.
• Gained hands-on experience in cloud platforms (Azure/AWS), including basic use of Data Lake Storage, Blob Storage, and Synapse Analytics.
• Participated in Agile ceremonies (stand-ups, sprint planning) and learned how to work within a structured software development lifecycle (SDLC).
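
Illustrative only: a short sketch of a basic exploratory-data-analysis pass of the kind described in this internship. The file and column names are placeholders.

# Basic EDA pass over a placeholder dataset.
import pandas as pd

df = pd.read_csv("sample_dataset.csv")

df.info()                                                        # column types and non-null counts
print(df.describe(include="all"))                                # summary statistics
print(df.isna().mean().sort_values(ascending=False).head(10))    # highest missing-value rates

# Simple grouped metric to surface a key trend for the summary report
if {"region", "revenue"}.issubset(df.columns):
    print(df.groupby("region")["revenue"].agg(["count", "mean", "sum"]))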
Education
May 2025
University of North Texas
Master of Science in Business Analytics
Dallas, USA

At UNT, I am gaining advanced knowledge in data engineering, machine learning, statistical modeling, and business intelligence. My coursework and projects focus on data-driven decision-making, predictive analytics, and real-world problem-solving using tools like Python, SQL, Power BI, and Tableau. I’m actively involved in research on healthcare and supply chain data, applying AI/ML to extract insights and improve operational efficiency.
May 2022
Anna University
Bachelor of Science in Electronics and Communication Engineering
Chennai, India

Completed a rigorous curriculum in electronics and communication engineering with a strong foundation in mathematics, programming, and signal processing. Gained practical experience in database management, system design, and analytics through coursework and academic projects. My interest in data and automation began here, which led to my transition into data engineering and analytics.