My Journey
A timeline of my professional roles, research contributions, and key achievements.
Work Experience
Data Engineer
Bell Canada (Data Engineering & Artificial Intelligence Team)
June 2025 – Present | Toronto, Ontario, Canada (Remote)
Primary owner of the Network Ticket Service (NTS) data pipeline — a mission-critical ETL system integrating 4 enterprise data sources (SmartPath API, Maximo, IPACT, LDAP) into unified analytical reporting for Bell Business Markets. Expanded scope to include enterprise analytics platform delivery, cross-domain investigations, and production system ownership.
Built and productionized the mission-critical Network Ticket Service (NTS) data pipeline on Teradata using a three-tier ETL and ELT architecture: staging, warehouse, and analysis layers. Integrated four operational systems including REST API event streams, legacy enterprise resource planning (ERP), billing, and directory services (LDAP) using Python and SAS Data Integration, enforcing data contracts and Kimball-style dimensional modeling patterns transferable to Snowflake, BigQuery, and Amazon Redshift.
Built a stateful sessionization algorithm in Python to fix event sequencing defects, refactoring a flawed sequential method into a robust two-pass group-by propagation model. Achieved deterministic mapping across distributed agent sessions by identifying anchor events and backfilling request identifiers to preceding and succeeding events.
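The two-pass approach described above can be sketched in plain Python. This is a minimal illustration, not the production code: field names (`session_id`, `ts`, `request_id`) are hypothetical, and the real pipeline operated on event streams rather than in-memory lists.

```python
from itertools import groupby

def backfill_request_ids(events):
    """Two-pass group-by propagation sketch.

    Pass 1: within each session, find the anchor event that carries
    a request identifier. Pass 2: backfill that identifier to the
    preceding and succeeding events in the same session.
    """
    events = sorted(events, key=lambda e: (e["session_id"], e["ts"]))
    filled = []
    for _, group in groupby(events, key=lambda e: e["session_id"]):
        group = list(group)
        # Pass 1: locate the anchor event carrying a request_id.
        anchor_id = next(
            (e["request_id"] for e in group if e.get("request_id")), None
        )
        # Pass 2: propagate the anchor's id across the whole session.
        for e in group:
            filled.append({**e, "request_id": e.get("request_id") or anchor_id})
    return filled
```

Because the mapping depends only on the sorted event order and the anchor event, the output is deterministic regardless of how sessions arrive from distributed agents.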
Reduced query latency by 83% (12 minutes to 2 minutes) on a join over 23 million rows by replacing dynamic runtime computation with a materialized pre-aggregation layer. Eliminated production timeouts and stabilized nightly SLA compliance through query optimization and static reference architecture.
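The idea behind the pre-aggregation layer is simple: compute the heavy aggregates once (nightly), then let every downstream query join against the small materialized result instead of scanning millions of rows at request time. A toy sketch with hypothetical names (`ticket_id`, `duration`):

```python
def build_preaggregate(events):
    """Materialize per-ticket aggregates once, ahead of query time."""
    agg = {}
    for e in events:
        a = agg.setdefault(e["ticket_id"], {"event_count": 0, "total_duration": 0})
        a["event_count"] += 1
        a["total_duration"] += e["duration"]
    return agg

def join_tickets(tickets, agg):
    # Join against the materialized layer: one constant-time lookup
    # per ticket instead of recomputing aggregates on every query.
    empty = {"event_count": 0, "total_duration": 0}
    return [{**t, **agg.get(t["ticket_id"], empty)} for t in tickets]
```

In the warehouse the same pattern is a scheduled aggregate table (or materialized view) replacing a dynamic `GROUP BY` inside the reporting query.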
Expanded analytical coverage by 800% (from 1 month to 9+ months) by running a full root cause analysis (RCA) on a hardcoded 30-day lookback filter that was causing systemic data drift. Executed a historical recovery program recasting record sets of 28,000+ and 50,000+ rows, raising ticket match accuracy to its highest level since system inception.
Fixed historical attribution defects by implementing slowly changing dimension Type 2 (SCD Type 2) temporal joins on creation date to resolve employee hierarchy changes against directory services data. Replaced volatile login identifiers with a stable natural key (agent email) to preserve data lineage integrity for historical reporting.
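An SCD Type 2 temporal join resolves each ticket against the hierarchy row that was valid on the ticket's creation date. A minimal sketch, with hypothetical field names and half-open `[valid_from, valid_to)` effective ranges keyed by the stable natural key (agent email):

```python
from datetime import date

def scd2_lookup(history, agent_email, as_of):
    """Return the hierarchy row valid on `as_of` for the given agent.

    `history` maps the stable natural key (agent email) to SCD Type 2
    rows, so historical tickets attribute to the hierarchy as it stood
    at creation time rather than to the agent's current team.
    """
    for row in history.get(agent_email, []):
        if row["valid_from"] <= as_of < row["valid_to"]:
            return row
    return None
```

In SQL this is the usual `BETWEEN`-style range join on the creation date against the dimension's effective-date columns.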
Prevented duplicate data during retries by enforcing idempotency and atomic writes via composite upsert keys (request, ticket, and configuration item identifiers). Applied COALESCE, UPPER, and TRIM sanitization for all join conditions and enforced pre-aggregation patterns, preserving model granularity across development, quality assurance, and production environments.
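The idempotency pattern above can be illustrated in a few lines: sanitize every key component the way `COALESCE`/`TRIM`/`UPPER` would, build a composite key, and let retried loads overwrite rather than insert. Names here (`request_id`, `ticket_id`, `ci_id`) are illustrative stand-ins for the real identifiers.

```python
def norm(value):
    """COALESCE + TRIM + UPPER style sanitization for join keys."""
    return (value or "").strip().upper()

def upsert_key(row):
    # Composite idempotency key (request, ticket, configuration item):
    # a retried load regenerates the same key, so the write replaces
    # the existing row instead of creating a duplicate.
    return (norm(row.get("request_id")),
            norm(row.get("ticket_id")),
            norm(row.get("ci_id")))

def upsert(store, rows):
    for row in rows:
        store[upsert_key(row)] = row  # last write wins per key
    return store
```

Running the same batch twice leaves the store unchanged, which is exactly the property that makes retries safe.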
Refactored monolithic SQL into modular clean and calculate transformation stages, a pattern analogous to dbt staging and marts. Created a unified view abstraction layer merging legacy and modern structures, enabling zero-downtime migration for downstream business intelligence consumers and accelerating peer reviews through clear separation of transformation and analytics code.
Built a Python and Apache Airflow observability module with configuration-driven validation checks orchestrated via DAG modules across 12+ pipelines. Reduced debugging time by 60% and prevented 15+ monthly data quality incidents through automated schema validation, anomaly detection, and threshold alerting for critical SLAs.
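A configuration-driven validation module boils down to declaring checks as data and evaluating them generically, so adding a check to a new pipeline is a config change rather than new code. A stripped-down sketch (check names, tables, and thresholds are hypothetical; the real module ran inside Airflow DAG tasks):

```python
CHECKS = [
    {"check": "row_count_min", "table": "nts_staging", "min_rows": 3},
    {"check": "null_rate_max", "table": "nts_staging",
     "column": "ticket_id", "max_null_rate": 0.25},
]

def run_checks(tables, checks):
    """Evaluate each configured check against its table; return failures."""
    failures = []
    for c in checks:
        rows = tables.get(c["table"], [])
        if c["check"] == "row_count_min" and len(rows) < c["min_rows"]:
            failures.append(c["check"])
        elif c["check"] == "null_rate_max" and rows:
            null_rate = sum(1 for r in rows if r.get(c["column"]) is None) / len(rows)
            if null_rate > c["max_null_rate"]:
                failures.append(c["check"])
    return failures
```

In production each failing check would page or block the downstream DAG task; here the function simply returns the list of violations.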
Promoted to Technical Gatekeeper within 3 months, governing the domain by enforcing defensive coding standards. Authored end-to-end validation documents including entity relationship diagrams (ERDs), data flow diagrams, and count-by-stage proofs, establishing the team standard for peer review and knowledge transfer.
Led requirements gathering and technical feasibility assessments for new data pipeline initiatives. Translated business needs into technical architectures and implementation roadmaps by facilitating cross-functional alignment sessions with stakeholders to define scope, deliverables, and success metrics.
Built visualizations and presented pipeline performance and data quality metrics to directors, team leads, and business intelligence analysts. Drove buy-in for platform modernization initiatives through data-driven executive reporting.
Expanded Scope: Platform Delivery & Cross-Domain Ownership
Shipped a 78-attribute analytics platform in MicroStrategy (business intelligence platform) integrating four operational systems: operations API (SmartPath), asset management (Maximo), billing (IPACT), and directory services (LDAP). Built derived metrics, conditional formatting, and cross-filter interactivity; deployed to production with director sign-off via a structured Dev to Pre-Prod to Prod migration pipeline.
Reduced build cost and risk through strategic architectural decision-making by choosing a SQL view over a physical fact table for joining event and request data with calculated duration metrics. Blocked out-of-scope integration requests and expanded the design from 13 columns to 78 attributes to eliminate recurring ad hoc request cycles.
Maintained 12 dependent data pipelines as backup owner for the production operations API (SmartPath) during a company-wide code embargo. Diagnosed and resolved critical ETL failures in error staging tables and business intelligence metric misconfigurations under time pressure, with no senior engineers available.
Ramped from zero domain knowledge to full system comprehension within 2 weeks for a five-system data pipeline spanning Salesforce to billing to operations (CS Attack pipeline). Built a repeatable metadata discovery method using the Teradata system catalog (DBC.Columns) to locate target fields, validate source table population rates, and confirm join feasibility with evidence.
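The metadata discovery method leaned on the Teradata system catalog: query `DBC.Columns` for candidate field names, then verify population rates and join feasibility on the tables it surfaces. A simplified query builder (the real investigation also filtered by database and sampled the matched tables):

```python
def column_discovery_sql(search_term):
    """Build a Teradata catalog query against DBC.Columns to locate
    candidate source columns by name. Illustrative sketch only; in
    practice the term would be parameterized, not interpolated.
    """
    term = search_term.upper()
    return (
        "SELECT DatabaseName, TableName, ColumnName\n"
        "FROM DBC.Columns\n"
        f"WHERE UPPER(ColumnName) LIKE '%{term}%'\n"
        "ORDER BY DatabaseName, TableName, ColumnName;"
    )
```

Starting from the catalog rather than tribal knowledge is what makes the method repeatable across unfamiliar domains.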
Co-authored the team engineering workflow standard, formalizing separation of Investigation (feasibility analysis, validation) and Implementation (code, test, deploy) phases. Established tracking protocols, stakeholder scope boundaries, and a mandatory front door ticket process for budget accountability.
Became the canonical source of institutional knowledge across three domains (NTS, SmartPath, CS Attack) during a leadership transition. Refactored team documentation into an Executive Summary plus Technical Appendix format and established a Visual-First methodology for director-level reporting.
Supporting Q1 2026 migration to Google Cloud Platform (GCP) and BigQuery, contributing to the architectural pivot from legacy Teradata wide tables to a snowflake schema design pattern. Preparing a pipeline refactoring roadmap from SAS Data Integration Studio (legacy ETL tool) to cloud-native orchestration using directed acyclic graphs (DAGs) on Cloud Composer.
Bioinformatics Software Development Research Assistant
Johns Hopkins University
September 2022 – Present | Baltimore, Maryland, United States (Remote)
Cross-institutional oncology research integrating 750+ TB of multi-omics data across 3 cancer types. Built and maintain an open-source bioinformatics platform used by 100+ global researchers. This role grew from my foundational work at the University of Toronto.
Reduced analysis load times by 83% through optimized caching on a full-stack bioinformatics platform supporting more than 100 global researchers. Built the platform using Python, R, JavaScript, and C with microservices architecture, SOLID principles, and Docker containerization.
Engineered scalable ETL pipelines processing over 750 terabytes of multi-omics data on high-performance computing (HPC) clusters. Accelerated biomarker discovery and cut analysis time by 40% using Python, R, SQL, and machine learning models including Support Vector Machine Recursive Feature Elimination (SVM-RFE) and Random Forest.
Improved data integrity by 30% by implementing automated data quality checks and anomaly detection using unsupervised machine learning (K-Means, DBSCAN) with TensorFlow within continuous integration and continuous deployment (CI/CD) pipelines. Validated biomarker analysis software using TensorFlow, Keras, and Scikit-learn.
Built interactive data visualization dashboards for molecular modeling and educational use using Shiny, React, and D3.js. Improved usability and accessibility for researchers working with complex genomic datasets.
Developed and optimized REST and GraphQL APIs to support real-time data access and model simulations across research modules. Enabled seamless integration between data processing pipelines and frontend applications.
Configured AWS environments including Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3) and automated testing and deployment workflows with GitHub Actions. Improved reliability and collaboration across development teams through CI/CD automation.
Applied secure data management and governance practices to ensure compliance with institutional privacy and research ethics standards. Maintained data lineage documentation for audit trails.
Collaborated with cross-functional experts including oncologists and statisticians to align computational workflows with research goals. Mentored peers on high-performance computing, reproducible software practices, and batch processing patterns.
Authored a 35+ page research manuscript featuring interactive R Shiny and D3.js data visualizations, published on GitHub and Zenodo to underscore reproducibility and transparency in data science projects.
Software Development Research Assistant
University of Toronto
September 2019 – April 2024 | Toronto, Ontario, Canada (Hybrid)
Where my research career began. Built full-stack bioinformatics applications automating workflows across 7 wet lab research teams. This foundational work led to the cross-institutional collaboration with Johns Hopkins University.
Reduced analysis effort by more than 30 hours per week across 7 research teams by engineering full-stack bioinformatics platforms. Built automation using Python, R, C, and Java with object-oriented programming patterns to streamline lab workflows.
Owned the full software development life cycle (SDLC) including requirements, architecture, implementation, testing, deployment, and maintenance. Translated multidisciplinary research requirements into production-grade software solutions.
Cut setup and configuration time by 50% by implementing Docker-based DevOps workflows to eliminate environment drift. Enabled reproducible and scalable computation across research environments.
Improved user interface render times by 45% for large genomic datasets by optimizing data visualization performance in Next.js and Tailwind CSS. Enhanced research usability through frontend performance tuning.
Led Agile Scrum adoption and mentored a team of 5 junior developers. Increased throughput and strengthened cross-team collaboration through structured sprint planning and retrospectives.
Awards & Achievements
Plenary Speaker
National Collegiate Research Conference (NCRC) — Harvard University
Selected as 1 of only 12 plenary speakers from over 5,000 national applicants. Delivered keynote presentation on applying machine learning techniques to integrate transcriptomics and proteomics data for glioblastoma research.
Best Detailed Oral Presentation Award
Annual Biomedical Research Conference for Minoritized Scientists (ABRCMS) — Computational and Systems Biology Division
Awarded top presenter in the Computational and Systems Biology division, selected from 80 oral presenters at a conference with over 3,500 attendees. Recognized with a $2,500 award for travel and accommodation.
Best Poster Presentation Award
Annual Biomedical Research Conference for Minoritized Scientists (ABRCMS) — Graduate Division
Received top honors for graduate-level poster presentation, competing among 150+ graduate presenters. Presented research on advancing open-source bioinformatics platforms. Recognized with a $2,500 award for travel and accommodation.
Poster Presentation
National Collegiate Research Conference (NCRC) — Harvard University
Presented research poster detailing computational approaches for cancer biomarker identification using integrated multi-omics analysis and machine learning methodologies.
Friends of Arts and Science Award
University of Toronto — Faculty of Arts & Science
Awarded for academic excellence in Computer Sciences and Physical & Life Sciences disciplines during the 2023-2024 academic year.
Friends of Arts and Science Award
University of Toronto — Faculty of Arts & Science
Awarded for academic excellence in Computer Sciences and Physical & Life Sciences disciplines during the 2022-2023 academic year.
Friends of Arts and Science Award
University of Toronto — Faculty of Arts & Science
Awarded for academic excellence in Computer Sciences and Physical & Life Sciences disciplines during the 2021-2022 academic year.
Education
University of Toronto
St. George Campus
Bachelor of Science (Honours)
Major GPA: 3.96 / 4.0
Relevant Coursework
Computer Science
- CSC108H1 — Introduction to Computer Programming
- CSC148H1 — Introduction to Computer Science
- CSC165H1 — Mathematical Expression and Reasoning for Computer Science
- CSC207H1 — Software Design
- CSC209H1 — Software Tools and Systems Programming
- CSC236H1 — Introduction to the Theory of Computation
- CSC263H1 — Data Structures and Analysis
- CSC373H1 — Algorithm Design and Analysis
- CSC384H1 — Introduction to Artificial Intelligence
Bioinformatics & Computational Biology
- BCH441H1 — Bioinformatics
- BCB410H1 — Applied Bioinformatics
- BCB420H1 — Computational Systems Biology
- BCB330Y1 — Bioinformatics Research Project
- BCB430Y1 — Advanced Bioinformatics Research Project
- CSB352H1 — Bioinformatic Methods
Statistics & Probability
- STA247H1 — Probability with Computer Applications
- STA237H1 — Probability, Statistics and Data Analysis I
Mathematics
- MAT135H1 — Calculus I (A)
- MAT136H1 — Calculus I (B)
Biochemistry
- BCH210H1 — Biochemistry I: Proteins, Lipids and Metabolism
- BCH311H1 — Biochemistry II: Nucleic Acids and Biological Information Flow
Immunology (Minor)
- IMM250H1 — The Immune System and Infectious Disease
- IMM340H1 — Fundamental Immunology
- IMM350H1 — The Immune System in Action
Interested in Working Together?
I'm actively seeking opportunities in data platform engineering, data engineering, and software development.