My Journey
A timeline of my professional roles, research contributions, and key achievements.
Work Experience
Data Engineer
Bell Canada (Data Engineering & Artificial Intelligence Team)
June 2025 – Present | Toronto, Ontario, Canada (Remote)
Primary owner of the Network Ticket Service (NTS) data pipeline — a mission-critical ETL system integrating 4 enterprise data sources (SmartPath API, Maximo, IPACT, LDAP) into unified analytical reporting for Bell Business Markets. Expanded scope to include enterprise analytics platform delivery, cross-domain investigations, and production system ownership.
Built and productionized the mission-critical Network Ticket Service (NTS) data pipeline on Teradata using a three-tier ETL and ELT architecture: staging, warehouse, and analysis layers. Integrated four operational systems including REST API event streams, legacy enterprise resource planning (ERP), billing, and directory services (LDAP) using Python and SAS Data Integration, enforcing data contracts and Kimball-style dimensional modeling patterns transferable to Snowflake, BigQuery, and Amazon Redshift.
Built a stateful sessionization algorithm in Python to fix event sequencing defects, refactoring a flawed sequential method into a robust two-pass group-by propagation model. Achieved deterministic mapping across distributed agent sessions by identifying anchor events and backfilling request identifiers to preceding and succeeding events.
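The two-pass approach described above can be sketched in plain Python. This is a minimal illustration, not the production code: field names (`session_id`, `ts`, `request_id`) are hypothetical, and the real pipeline operated on event streams rather than in-memory lists.

```python
from itertools import groupby

def backfill_request_ids(events):
    """Two-pass group-by propagation sketch.

    Pass 1: within each session, find the anchor event that carries
    a request identifier. Pass 2: backfill that identifier to the
    preceding and succeeding events in the same session.
    """
    events = sorted(events, key=lambda e: (e["session_id"], e["ts"]))
    filled = []
    for _, group in groupby(events, key=lambda e: e["session_id"]):
        group = list(group)
        # Pass 1: locate the anchor event carrying a request_id.
        anchor_id = next(
            (e["request_id"] for e in group if e.get("request_id")), None
        )
        # Pass 2: propagate the anchor's id across the whole session.
        for e in group:
            filled.append({**e, "request_id": e.get("request_id") or anchor_id})
    return filled
```

Because the mapping depends only on the sorted event order and the anchor event, the output is deterministic regardless of how sessions arrive from distributed agents.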
Reduced query latency by 83% (12 minutes to 2 minutes) on a join over 23 million rows by replacing dynamic runtime computation with a materialized pre-aggregation layer. Eliminated production timeouts and stabilized nightly SLA compliance through query optimization and static reference architecture.
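The idea behind the pre-aggregation layer is simple: compute the heavy aggregates once (nightly), then let every downstream query join against the small materialized result instead of scanning millions of rows at request time. A toy sketch with hypothetical names (`ticket_id`, `duration`):

```python
def build_preaggregate(events):
    """Materialize per-ticket aggregates once, ahead of query time."""
    agg = {}
    for e in events:
        a = agg.setdefault(e["ticket_id"], {"event_count": 0, "total_duration": 0})
        a["event_count"] += 1
        a["total_duration"] += e["duration"]
    return agg

def join_tickets(tickets, agg):
    # Join against the materialized layer: one constant-time lookup
    # per ticket instead of recomputing aggregates on every query.
    empty = {"event_count": 0, "total_duration": 0}
    return [{**t, **agg.get(t["ticket_id"], empty)} for t in tickets]
```

In the warehouse the same pattern is a scheduled aggregate table (or materialized view) replacing a dynamic `GROUP BY` inside the reporting query.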
Expanded analytical coverage by 800% (from 1 month to 9+ months) by running a full root cause analysis (RCA) on a hardcoded 30-day lookback filter that was causing systemic data drift. Executed a historical recovery program recasting record sets of 28,000+ and 50,000+ rows, raising ticket match accuracy to its highest level since system inception.
Fixed historical attribution defects by implementing slowly changing dimension Type 2 (SCD Type 2) temporal joins on creation date to resolve employee hierarchy changes against directory services data. Replaced volatile login identifiers with a stable natural key (agent email) to preserve data lineage integrity for historical reporting.
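An SCD Type 2 temporal join resolves each ticket against the hierarchy row that was valid on the ticket's creation date. A minimal sketch, with hypothetical field names and half-open `[valid_from, valid_to)` effective ranges keyed by the stable natural key (agent email):

```python
from datetime import date

def scd2_lookup(history, agent_email, as_of):
    """Return the hierarchy row valid on `as_of` for the given agent.

    `history` maps the stable natural key (agent email) to SCD Type 2
    rows, so historical tickets attribute to the hierarchy as it stood
    at creation time rather than to the agent's current team.
    """
    for row in history.get(agent_email, []):
        if row["valid_from"] <= as_of < row["valid_to"]:
            return row
    return None
```

In SQL this is the usual `BETWEEN`-style range join on the creation date against the dimension's effective-date columns.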
Prevented duplicate data during retries by enforcing idempotency and atomic writes via composite upsert keys (request, ticket, and configuration item identifiers). Applied COALESCE, UPPER, and TRIM sanitization for all join conditions and enforced pre-aggregation patterns, preserving model granularity across development, quality assurance, and production environments.
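The idempotency pattern above can be illustrated in a few lines: sanitize every key component the way `COALESCE`/`TRIM`/`UPPER` would, build a composite key, and let retried loads overwrite rather than insert. Names here (`request_id`, `ticket_id`, `ci_id`) are illustrative stand-ins for the real identifiers.

```python
def norm(value):
    """COALESCE + TRIM + UPPER style sanitization for join keys."""
    return (value or "").strip().upper()

def upsert_key(row):
    # Composite idempotency key (request, ticket, configuration item):
    # a retried load regenerates the same key, so the write replaces
    # the existing row instead of creating a duplicate.
    return (norm(row.get("request_id")),
            norm(row.get("ticket_id")),
            norm(row.get("ci_id")))

def upsert(store, rows):
    for row in rows:
        store[upsert_key(row)] = row  # last write wins per key
    return store
```

Running the same batch twice leaves the store unchanged, which is exactly the property that makes retries safe.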
Refactored monolithic SQL into modular clean and calculate transformation stages, a pattern analogous to dbt staging and marts. Created a unified view abstraction layer merging legacy and modern structures, enabling zero-downtime migration for downstream business intelligence consumers and accelerating peer reviews through clear separation of transformation and analytics code.
Built a Python and Apache Airflow observability module with configuration-driven validation checks orchestrated via DAG modules across 12+ pipelines. Reduced debugging time by 60% and prevented 15+ monthly data quality incidents through automated schema validation, anomaly detection, and threshold alerting for critical SLAs.
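A configuration-driven validation module boils down to declaring checks as data and evaluating them generically, so adding a check to a new pipeline is a config change rather than new code. A stripped-down sketch (check names, tables, and thresholds are hypothetical; the real module ran inside Airflow DAG tasks):

```python
CHECKS = [
    {"check": "row_count_min", "table": "nts_staging", "min_rows": 3},
    {"check": "null_rate_max", "table": "nts_staging",
     "column": "ticket_id", "max_null_rate": 0.25},
]

def run_checks(tables, checks):
    """Evaluate each configured check against its table; return failures."""
    failures = []
    for c in checks:
        rows = tables.get(c["table"], [])
        if c["check"] == "row_count_min" and len(rows) < c["min_rows"]:
            failures.append(c["check"])
        elif c["check"] == "null_rate_max" and rows:
            null_rate = sum(1 for r in rows if r.get(c["column"]) is None) / len(rows)
            if null_rate > c["max_null_rate"]:
                failures.append(c["check"])
    return failures
```

In production each failing check would page or block the downstream DAG task; here the function simply returns the list of violations.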
Promoted to Technical Gatekeeper within 3 months, governing the domain by enforcing defensive coding standards. Authored end-to-end validation documents including entity relationship diagrams (ERDs), data flow diagrams, and count-by-stage proofs, establishing the team standard for peer review and knowledge transfer.
Led requirements gathering and technical feasibility assessments for new data pipeline initiatives. Translated business needs into technical architectures and implementation roadmaps by facilitating cross-functional alignment sessions with stakeholders to define scope, deliverables, and success metrics.
Built visualizations and presented pipeline performance and data quality metrics to directors, team leads, and business intelligence analysts. Drove buy-in for platform modernization initiatives through data-driven executive reporting.
Expanded Scope: Platform Delivery & Cross-Domain Ownership
Shipped a 78-attribute analytics platform in MicroStrategy (business intelligence platform) integrating four operational systems: operations API (SmartPath), asset management (Maximo), billing (IPACT), and directory services (LDAP). Built derived metrics, conditional formatting, and cross-filter interactivity; deployed to production with director sign-off via a structured Dev to Pre-Prod to Prod migration pipeline.
Reduced build cost and risk through strategic architectural decision-making by choosing a SQL view over a physical fact table for joining event and request data with calculated duration metrics. Blocked out-of-scope integration requests and expanded the design from 13 columns to 78 attributes to eliminate recurring ad hoc request cycles.
Maintained 12 dependent data pipelines as backup owner for the production operations API (SmartPath) during a company-wide code embargo. Diagnosed and resolved critical ETL failures in error staging tables and business intelligence metric misconfigurations under time pressure, with no senior engineers available.
Ramped from zero domain knowledge to full system comprehension within 2 weeks for a five-system data pipeline spanning Salesforce to billing to operations (CS Attack pipeline). Built a repeatable metadata discovery method using the Teradata system catalog (DBC.Columns) to locate target fields, validate source table population rates, and confirm join feasibility with evidence.
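The metadata discovery method leaned on the Teradata system catalog: query `DBC.Columns` for candidate field names, then verify population rates and join feasibility on the tables it surfaces. A simplified query builder (the real investigation also filtered by database and sampled the matched tables):

```python
def column_discovery_sql(search_term):
    """Build a Teradata catalog query against DBC.Columns to locate
    candidate source columns by name. Illustrative sketch only; in
    practice the term would be parameterized, not interpolated.
    """
    term = search_term.upper()
    return (
        "SELECT DatabaseName, TableName, ColumnName\n"
        "FROM DBC.Columns\n"
        f"WHERE UPPER(ColumnName) LIKE '%{term}%'\n"
        "ORDER BY DatabaseName, TableName, ColumnName;"
    )
```

Starting from the catalog rather than tribal knowledge is what makes the method repeatable across unfamiliar domains.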
Co-authored the team engineering workflow standard, formalizing separation of Investigation (feasibility analysis, validation) and Implementation (code, test, deploy) phases. Established tracking protocols, stakeholder scope boundaries, and a mandatory front door ticket process for budget accountability.
Became the canonical source of institutional knowledge across three domains (NTS, SmartPath, CS Attack) during a leadership transition. Refactored team documentation into an Executive Summary plus Technical Appendix format and established a Visual-First methodology for director-level reporting.
Supporting Q1 2026 migration to Google Cloud Platform (GCP) and BigQuery, contributing to the architectural pivot from legacy Teradata wide tables to a snowflake schema design pattern. Preparing a pipeline refactoring roadmap from SAS Data Integration Studio (legacy ETL tool) to cloud-native orchestration using directed acyclic graphs (DAGs) on Cloud Composer.
Bioinformatics Software Development Research Assistant
Johns Hopkins University
September 2022 – Present | Baltimore, Maryland, United States (Remote)
Cross-institutional oncology research integrating 750+ TB of multi-omics data across 3 cancer types. Built and maintain an open-source bioinformatics platform used by 100+ global researchers. This role grew from my foundational work at the University of Toronto.
Reduced analysis load times by 83% through optimized caching on a full-stack bioinformatics platform supporting more than 100 global researchers. Built the platform using Python, R, JavaScript, and C with microservices architecture, SOLID principles, and Docker containerization.
Engineered scalable ETL pipelines processing over 750 terabytes of multi-omics data on high-performance computing (HPC) clusters. Accelerated biomarker discovery and cut analysis time by 40% using Python, R, SQL, and machine learning models including Support Vector Machine Recursive Feature Elimination (SVM-RFE) and Random Forest.
Improved data integrity by 30% by implementing automated data quality checks and anomaly detection using unsupervised machine learning (K-Means, DBSCAN) with TensorFlow within continuous integration and continuous deployment (CI/CD) pipelines. Validated biomarker analysis software using TensorFlow, Keras, and Scikit-learn.
Built interactive data visualization dashboards for molecular modeling and educational use using Shiny, React, and D3.js. Improved usability and accessibility for researchers working with complex genomic datasets.
Developed and optimized REST and GraphQL APIs to support real-time data access and model simulations across research modules. Enabled seamless integration between data processing pipelines and frontend applications.
Configured AWS environments including Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3) and automated testing and deployment workflows with GitHub Actions. Improved reliability and collaboration across development teams through CI/CD automation.
Applied secure data management and governance practices to ensure compliance with institutional privacy and research ethics standards. Maintained data lineage documentation for audit trails.
Collaborated with cross-functional experts including oncologists and statisticians to align computational workflows with research goals. Mentored peers on high-performance computing, reproducible software practices, and batch processing patterns.
Authored a 35+ page research manuscript featuring interactive R Shiny and D3.js data visualizations, published on GitHub and Zenodo to underscore reproducibility and transparency in data science projects.
Software Development Research Assistant
University of Toronto
September 2019 – April 2024 | Toronto, Ontario, Canada (Hybrid)
Where my research career began. Built full-stack bioinformatics applications automating workflows across 7 wet lab research teams. This foundational work led to the cross-institutional collaboration with Johns Hopkins University.
Reduced analysis effort by more than 30 hours per week across 7 research teams by engineering full-stack bioinformatics platforms. Built automation using Python, R, C, and Java with object-oriented programming patterns to streamline lab workflows.
Owned the full software development life cycle (SDLC) including requirements, architecture, implementation, testing, deployment, and maintenance. Translated multidisciplinary research requirements into production-grade software solutions.
Cut setup and configuration time by 50% by implementing Docker-based DevOps workflows to eliminate environment drift. Enabled reproducible and scalable computation across research environments.
Improved user interface render times by 45% for large genomic datasets by optimizing data visualization performance in Next.js and Tailwind CSS. Enhanced research usability through frontend performance tuning.
Led Agile Scrum adoption and mentored a team of 5 junior developers. Increased throughput and strengthened cross-team collaboration through structured sprint planning and retrospectives.
Awards & Achievements
Plenary Speaker
National Collegiate Research Conference (NCRC) — Harvard University
Selected as 1 of only 12 plenary speakers from over 5,000 national applicants. Delivered keynote presentation on applying machine learning techniques to integrate transcriptomics and proteomics data for glioblastoma research.
Best Detailed Oral Presentation Award
Annual Biomedical Research Conference for Minoritized Scientists (ABRCMS) — Computational and Systems Biology Division
Awarded top presenter in the Computational and Systems Biology division, selected from 80 oral presenters at a conference with over 3,500 attendees. Recognized with a $2,500 award for travel and accommodation.
Best Poster Presentation Award
Annual Biomedical Research Conference for Minoritized Scientists (ABRCMS) — Graduate Division
Received top honors for graduate-level poster presentation, competing among 150+ graduate presenters. Presented research on advancing open-source bioinformatics platforms. Recognized with a $2,500 award for travel and accommodation.
Poster Presentation
National Collegiate Research Conference (NCRC) — Harvard University
Presented research poster detailing computational approaches for cancer biomarker identification using integrated multi-omics analysis and machine learning methodologies.
Friends of Arts and Science Award
University of Toronto — Faculty of Arts & Science
Awarded for academic excellence in Computer Sciences and Physical & Life Sciences disciplines during the 2023-2024 academic year.
Friends of Arts and Science Award
University of Toronto — Faculty of Arts & Science
Awarded for academic excellence in Computer Sciences and Physical & Life Sciences disciplines during the 2022-2023 academic year.
Friends of Arts and Science Award
University of Toronto — Faculty of Arts & Science
Awarded for academic excellence in Computer Sciences and Physical & Life Sciences disciplines during the 2021-2022 academic year.
Education
University of Toronto
St. George Campus
Bachelor of Science (Honours)
Major GPA: 3.96 / 4.0
Relevant Coursework
Computer Science
- CSC108H1 — Introduction to Computer Programming
- CSC148H1 — Introduction to Computer Science
- CSC165H1 — Mathematical Expression and Reasoning for Computer Science
- CSC207H1 — Software Design
- CSC209H1 — Software Tools and Systems Programming
- CSC236H1 — Introduction to the Theory of Computation
- CSC263H1 — Data Structures and Analysis
- CSC373H1 — Algorithm Design and Analysis
- CSC384H1 — Introduction to Artificial Intelligence
Bioinformatics & Computational Biology
- BCH441H1 — Bioinformatics
- BCB410H1 — Applied Bioinformatics
- BCB420H1 — Computational Systems Biology
- BCB330Y1 — Bioinformatics Research Project
- BCB430Y1 — Advanced Bioinformatics Research Project
- CSB352H1 — Bioinformatic Methods
Statistics & Probability
- STA247H1 — Probability with Computer Applications
- STA237H1 — Probability, Statistics and Data Analysis I
Mathematics
- MAT135H1 — Calculus I (A)
- MAT136H1 — Calculus I (B)
Biochemistry
- BCH210H1 — Biochemistry I: Proteins, Lipids and Metabolism
- BCH311H1 — Biochemistry II: Nucleic Acids and Biological Information Flow
Immunology (Minor)
- IMM250H1 — The Immune System and Infectious Disease
- IMM340H1 — Fundamental Immunology
- IMM350H1 — The Immune System in Action
Interested in Working Together?
I'm actively seeking opportunities in data platform engineering, data engineering, and software development.