Projects

A collection of production systems, data pipelines, and open-source tools I've built.


Enterprise & Research Projects

Production systems built at Bell Canada, Johns Hopkins, and other organizations

🔮

NTS/MS Archway Pipeline @ Bell Canada

End-to-end ETL pipeline integrating four enterprise data sources (SmartPath API, Maximo, IPACT, LDAP) into unified Control Plan reporting. A three-tier architecture (Staging → Jarvis → Analysis) processes 150,000+ records in about 20 minutes, with schema-versioned loads across DEV, QA, and PROD environments.

Python · SQL · SAS DI · Teradata · ETL · Data Warehousing
⏱️

Duration Calculation Engine @ Bell Canada

Stateful Python algorithm that computes three distinct duration metrics (d1, d2, d3) for agent work cycles. Replaced a flawed sequential method with a robust two-pass group-by propagation model built on pandas shift operations and ffill/bfill. Achieved zero calculation defects in QA validation.

Python · Pandas · Algorithm Design · Data Engineering
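A minimal sketch of a two-pass group-by propagation in pandas. The page does not define d1, d2, or d3, so this uses a hypothetical event log (ticket_id, ts, and a status of START, WORK, or END) and one illustrative metric: the duration of each work cycle from its START to its END. Column names, status codes, and sample rows are all assumptions.

```python
import pandas as pd


def add_cycle_durations(events: pd.DataFrame) -> pd.DataFrame:
    """Attach cycle start, end, and duration to every event row (hypothetical schema)."""
    df = events.sort_values(["ticket_id", "ts"]).reset_index(drop=True)

    # Per-event gap: time until the next event on the same ticket (shift within group).
    df["step_gap"] = df.groupby("ticket_id")["ts"].shift(-1) - df["ts"]

    # Number work cycles: every START opens a new cycle on that ticket.
    is_start = (df["status"] == "START").astype(int)
    df["cycle_id"] = is_start.groupby(df["ticket_id"]).cumsum()
    keys = ["ticket_id", "cycle_id"]

    # Pass 1: forward-fill the START timestamp to every row of its cycle.
    df["cycle_start"] = df["ts"].where(df["status"] == "START")
    df["cycle_start"] = df.groupby(keys)["cycle_start"].ffill()

    # Pass 2: back-fill the END timestamp to every row of its cycle.
    df["cycle_end"] = df["ts"].where(df["status"] == "END")
    df["cycle_end"] = df.groupby(keys)["cycle_end"].bfill()

    # Open cycles (no END yet) stay NaT.
    df["cycle_duration"] = df["cycle_end"] - df["cycle_start"]
    return df


# Illustrative sample data (not real tickets).
events = pd.DataFrame({
    "ticket_id": [1, 1, 1, 1, 2, 2, 2],
    "ts": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 09:30", "2024-01-01 10:00", "2024-01-01 11:00",
        "2024-01-02 08:00", "2024-01-02 08:45", "2024-01-02 09:15",
    ]),
    "status": ["START", "WORK", "END", "START", "START", "WORK", "END"],
})
result = add_cycle_durations(events)
print(result[["ticket_id", "status", "step_gap", "cycle_duration"]])
```

Numbering cycles before filling keeps an open cycle (a START with no END yet) at NaT, instead of letting bfill borrow a later cycle's END.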
🔍

Data Quality Recovery System @ Bell Canada

Full-stack root cause analysis (RCA) that traced systemic data-integrity drift to a misaligned filtration invariant. Executed three staged historical recasts that corrected 78,000+ edge-case records, expanded analytical coverage from 1 month to 9+ months (+800%), and raised ticket match accuracy to its highest level since the system's inception.

SQL · Data Quality · Root Cause Analysis · ETL

Query Optimization Framework @ Bell Canada

Re-architected a critical performance anti-pattern by replacing a direct join to a 23-million-row live table with a pre-aggregated static reference table. Cut query runtime from 12 minutes to 2 minutes (an 83% reduction) and stabilized nightly SLA compliance.

SQL · Performance Tuning · Data Modeling · Optimization
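A small SQLite sketch of the pattern. It assumes hypothetical tables: live_events stands in for the 23-million-row live table, and report_tickets is the reporting side. The sketch shows only the shape of the change, not the 12-to-2-minute result. Instead of aggregating raw events on every run, the report joins a summary table that is rebuilt on a schedule.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Tiny in-memory stand-in for the live table and the reporting table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE live_events (ticket_id INTEGER, event_ts TEXT, status TEXT);
        CREATE TABLE report_tickets (ticket_id INTEGER PRIMARY KEY, region TEXT);
        INSERT INTO live_events VALUES
            (1, '2024-01-01', 'OPEN'), (1, '2024-01-02', 'CLOSED'), (2, '2024-01-03', 'OPEN');
        INSERT INTO report_tickets VALUES (1, 'East'), (2, 'West'), (3, 'North');
    """)
    return conn


# Anti-pattern: join and aggregate the large, constantly changing table on every report run.
DIRECT_JOIN = """
    SELECT r.ticket_id, r.region, COUNT(e.ticket_id) AS n_events, MAX(e.event_ts) AS last_event
    FROM report_tickets r
    LEFT JOIN live_events e ON e.ticket_id = r.ticket_id
    GROUP BY r.ticket_id, r.region
    ORDER BY r.ticket_id
"""


def refresh_reference(conn: sqlite3.Connection) -> None:
    """Rebuild the small static reference table (e.g. once, before the nightly reports)."""
    conn.executescript("""
        DROP TABLE IF EXISTS ref_ticket_summary;
        CREATE TABLE ref_ticket_summary AS
            SELECT ticket_id, COUNT(*) AS n_events, MAX(event_ts) AS last_event
            FROM live_events
            GROUP BY ticket_id;
        CREATE UNIQUE INDEX idx_ref_ticket ON ref_ticket_summary (ticket_id);
    """)


# Report query joins one pre-aggregated row per ticket instead of the raw event rows.
PRE_AGGREGATED = """
    SELECT r.ticket_id, r.region, COALESCE(s.n_events, 0) AS n_events, s.last_event
    FROM report_tickets r
    LEFT JOIN ref_ticket_summary s ON s.ticket_id = r.ticket_id
    ORDER BY r.ticket_id
"""

conn = build_demo_db()
refresh_reference(conn)
print(conn.execute(PRE_AGGREGATED).fetchall())
```

The trade-off is freshness. The reference table is a snapshot, so events that arrive after refresh_reference runs stay invisible until the next rebuild.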
🔗

Data Consolidation Layer @ Bell Canada

Built transformation logic that consolidates and standardizes values from four source systems using COALESCE waterfall patterns. Fixed 80,000 product discrepancies by reading from the correct source tables, and created unified customer name, product, and ticket number fields.

SQL · Data Integration · ETL · Business Logic
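A sketch of a COALESCE waterfall in SQLite. It assumes four hypothetical sources, source_a through source_d, listed in descending priority, with each source holding at most one row per ticket. NULLIF(TRIM(...), '') turns blank strings into NULL so the waterfall falls through to the next source, and UPPER standardizes the name that survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets  (ticket_id INTEGER PRIMARY KEY);
    CREATE TABLE source_a (ticket_id INTEGER, customer_name TEXT, product TEXT);
    CREATE TABLE source_b (ticket_id INTEGER, customer_name TEXT, product TEXT);
    CREATE TABLE source_c (ticket_id INTEGER, customer_name TEXT, product TEXT);
    CREATE TABLE source_d (ticket_id INTEGER, customer_name TEXT, product TEXT);
    INSERT INTO tickets  VALUES (1), (2), (3);
    INSERT INTO source_a VALUES (1, '  acme corp ', NULL), (2, '   ', NULL);
    INSERT INTO source_b VALUES (1, 'ACME', 'Internet 500'), (2, NULL, 'TV');
    INSERT INTO source_c VALUES (2, 'Globex', NULL);
    INSERT INTO source_d VALUES (3, 'Initech', 'Mobility');
""")

CONSOLIDATE = """
    SELECT
        t.ticket_id,
        -- Waterfall: first non-blank name in priority order A > B > C > D, then standardized.
        UPPER(COALESCE(
            NULLIF(TRIM(a.customer_name), ''),
            NULLIF(TRIM(b.customer_name), ''),
            NULLIF(TRIM(c.customer_name), ''),
            NULLIF(TRIM(d.customer_name), '')
        )) AS customer_name,
        -- Same waterfall for product, without the text clean-up.
        COALESCE(a.product, b.product, c.product, d.product) AS product
    FROM tickets t
    LEFT JOIN source_a a ON a.ticket_id = t.ticket_id
    LEFT JOIN source_b b ON b.ticket_id = t.ticket_id
    LEFT JOIN source_c c ON c.ticket_id = t.ticket_id
    LEFT JOIN source_d d ON d.ticket_id = t.ticket_id
    ORDER BY t.ticket_id
"""

for row in conn.execute(CONSOLIDATE):
    print(row)
```

If a source can hold several rows per ticket, deduplicate it first. Otherwise the LEFT JOINs multiply the output rows.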
🧬

Bioinformatics Platform @ Johns Hopkins

Open-source, full-stack bioinformatics platform built on SOLID principles and a microservices architecture. Features interactive D3.js visualizations, R Shiny dashboards, and real-time WebSocket data streaming. Reduced genomic analysis load times by 83% and grew adoption to 100+ researchers worldwide. Private deployment at Johns Hopkins.

React · D3.js · R Shiny · Python · Docker · Microservices
🔬

Multi-Omics Data Pipeline @ Johns Hopkins

Scalable data processing pipeline integrating 750+ TB of multi-omics data from DISQOVER, ENCODE, PCAWG, PRIDE, and TCGA. Implemented ML models (SVM-RFE, random forests) on HPC infrastructure, contributing to the discovery of 8 novel biomarkers and accelerating validation timelines by 40%. Private deployment at Johns Hopkins.

Python · R · Spark · AWS · Machine Learning · HPC

Personal & Open Source Projects

Side projects, tools, and contributions to the developer community

🦠

Microbiome Explorer

Interactive visualization platform for exploring microbial community data, with taxonomic profiling, diversity analysis, and comparative metagenomics tools. Built with a React frontend and a Python backend that processes 16S rRNA sequencing data.

React · D3.js · Python · Bioinformatics · Data Visualization
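The page does not say which diversity metrics the Explorer reports. As an assumption, here are three standard alpha-diversity measures computed from one sample's taxon read counts, such as one column of a 16S OTU or ASV table: observed richness, the Shannon index, and the Gini-Simpson index.

```python
import math
from typing import List, Sequence


def _proportions(counts: Sequence[int]) -> List[float]:
    """Relative abundance of each taxon with a nonzero count."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive read")
    return [c / total for c in counts if c > 0]


def observed_richness(counts: Sequence[int]) -> int:
    """Number of taxa actually detected in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i), natural log."""
    return -sum(p * math.log(p) for p in _proportions(counts))


def gini_simpson_index(counts: Sequence[int]) -> float:
    """1 - sum(p_i^2): chance two reads drawn with replacement belong to different taxa."""
    return 1.0 - sum(p * p for p in _proportions(counts))


sample = [120, 80, 0, 40]  # illustrative read counts for four taxa
print(observed_richness(sample), shannon_index(sample), gini_simpson_index(sample))
```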
🎯

Anomaly Detection System

Automated data quality and anomaly detection pipelines that combine unsupervised ML (K-Means, DBSCAN) with rule-based heuristics. Embedded in ETL and CI/CD workflows to flag anomalies in real time, improving data integrity by 30% across large datasets.

Python · scikit-learn · TensorFlow · CI/CD
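A dependency-free sketch of the DBSCAN half of the idea, combined with a rule-based check. The project lists scikit-learn, whose sklearn.cluster.DBSCAN also labels noise as -1. This version is written out in pure Python only to show the algorithm, and K-Means is not shown. The readings, eps, min_samples, and rules are illustrative assumptions.

```python
import math
from typing import Callable, List, Optional, Sequence, Set, Tuple

Point = Tuple[float, ...]
NOISE = -1


def _neighbors(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_samples: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = _neighbors(points, i, eps)
        if len(seeds) < min_samples:
            labels[i] = NOISE              # may later be claimed as a border point
            continue
        labels[i] = cluster                # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point: joins but does not expand
                continue
            if labels[j] is not None:
                continue                   # already assigned
            labels[j] = cluster
            j_seeds = _neighbors(points, j, eps)
            if len(j_seeds) >= min_samples:
                queue.extend(j_seeds)      # j is also core: keep growing the cluster
        cluster += 1
    return [NOISE if lab is None else lab for lab in labels]


def flag_anomalies(points: Sequence[Point], eps: float, min_samples: int,
                   rule: Callable[[Point], bool]) -> Set[int]:
    """Union of DBSCAN noise points and rows that break a rule-based check."""
    labels = dbscan(points, eps, min_samples)
    return {i for i, p in enumerate(points) if labels[i] == NOISE or rule(p)}


# Two dense groups of readings plus one far-away outlier (illustrative data).
readings = [(1.0, 1.0), (1.1, 1.0), (0.9, 1.1), (1.0, 0.9),
            (5.0, 5.0), (5.1, 5.0), (4.9, 5.1), (5.0, 4.9),
            (20.0, 20.0)]
print(dbscan(readings, eps=0.5, min_samples=3))
print(flag_anomalies(readings, eps=0.5, min_samples=3, rule=lambda p: min(p) < 0))
```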
🚀

Portfolio Website

This cyberpunk-inspired portfolio, built with Next.js, features animated canvas backgrounds with floating symbols, a custom cursor smoothed with lerp (linear interpolation), a glassmorphism design system, and a responsive layout. It showcases my projects, skills, and experience.

Next.js · React · CSS · Canvas API · Animation
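The site itself is built with Next.js, so the real cursor code would be JavaScript or TypeScript. This Python sketch only illustrates the per-frame lerp math, using a hypothetical smoothing factor t = 0.15. On each frame, the drawn cursor moves a fixed fraction of the remaining distance toward the pointer, which produces the eased, trailing motion.

```python
from typing import Tuple

Vec2 = Tuple[float, float]


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation: t = 0 gives a, t = 1 gives b."""
    return a + (b - a) * t


def step_cursor(cursor: Vec2, target: Vec2, t: float = 0.15) -> Vec2:
    """Move the drawn cursor a fraction t of the way toward the pointer (one frame)."""
    return (lerp(cursor[0], target[0], t), lerp(cursor[1], target[1], t))


# Simulate five animation frames with the pointer parked at (100, 0).
pos: Vec2 = (0.0, 0.0)
for frame in range(5):
    pos = step_cursor(pos, (100.0, 0.0))
    print(frame, round(pos[0], 2))
```

Because t is applied once per frame, the easing speed depends on the frame rate. A common adjustment is to use 1 - (1 - t) ** (dt * 60) as the factor, where dt is the frame time in seconds, so the motion keeps its 60 fps feel at other rates.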
📊

Algorithm Visualizer

W.I.P. Interactive web application for visualizing sorting algorithms, pathfinding algorithms, and data structures. Features step-by-step execution, speed controls, and custom input generation for educational purposes.

React · TypeScript · Algorithms · Education · Canvas
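Step-by-step execution is commonly built on a generator that yields one event per comparison or swap. The UI then consumes one step per timer tick, and the tick interval acts as the speed control. The sketch below assumes that design for bubble sort. The event tuple shape is hypothetical, and the page does not describe how the visualizer is actually implemented.

```python
from typing import Iterator, List, Sequence, Tuple

# (action, index_i, index_j, snapshot of the array after the action)
Step = Tuple[str, int, int, List[int]]


def bubble_sort_steps(values: Sequence[int]) -> Iterator[Step]:
    """Yield every comparison and swap of bubble sort without mutating the input."""
    arr = list(values)
    for end in range(len(arr) - 1, 0, -1):
        swapped = False
        for i in range(end):
            yield ("compare", i, i + 1, arr.copy())
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
                swapped = True
                yield ("swap", i, i + 1, arr.copy())
        if not swapped:
            break  # already sorted: stop early
    yield ("done", -1, -1, arr.copy())


# A renderer would call next() once per tick; here we just print every step.
for step in bubble_sort_steps([3, 1, 2]):
    print(step)
```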
🛠️

SQL Query Optimizer Tool

W.I.P. Command-line tool for analyzing SQL query execution plans and suggesting optimizations. Parses EXPLAIN output, identifies common anti-patterns, and recommends index strategies.

Python · SQL · CLI · Performance
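A sketch of the plan-parsing idea that uses SQLite's EXPLAIN QUERY PLAN as a stand-in, since the page does not say which database engines the tool targets. It flags one common anti-pattern, a full table scan, by looking for SCAN steps that do not mention an index. The helper names and warning text are hypothetical. The exact detail strings vary slightly across SQLite versions (for example, "SCAN orders" vs. "SCAN TABLE orders"), and the prefix check tolerates both.

```python
import sqlite3
from typing import List, Sequence


def plan_details(conn: sqlite3.Connection, sql: str, params: Sequence = ()) -> List[str]:
    """Run EXPLAIN QUERY PLAN and return the human-readable 'detail' strings."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [str(row[-1]) for row in rows]  # 'detail' is the last column


def full_scan_warnings(details: List[str]) -> List[str]:
    """Flag SCAN steps that no index satisfies, with a simple index suggestion."""
    return [
        f"full table scan ({d}); consider an index on the filtered column(s)"
        for d in details
        if d.startswith("SCAN") and "INDEX" not in d
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
query = "SELECT total FROM orders WHERE customer_id = ?"

print(full_scan_warnings(plan_details(conn, query, (42,))))  # no usable index yet
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
print(full_scan_warnings(plan_details(conn, query, (42,))))  # plan can now use the index
```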

Want to Collaborate?

I'm always interested in working on challenging data engineering and platform projects.
