My Journey & Experience
A timeline of my professional roles, research contributions, key achievements, and education.
Work Experience
Junior Full Stack Software Engineer @ Johns Hopkins University (Sept 2022 – Present | Remote)
Spearhead large-scale oncology research projects by integrating 750+ TB of multi-omics data from sources such as DISQOVER, ENCODE, PCAWG, PRIDE, and TCGA. Develop data-driven pipelines in Python, R, and C, contributing to the discovery of 8 novel biomarkers and accelerating validation timelines by 40%.
Architect and maintain an open-source full-stack bioinformatics platform following SOLID principles and a microservices architecture, using Python, R, JavaScript, and C with containerization (Docker, Kubernetes). Design optimized caching strategies, reducing genomic and proteomic analysis load times by 83% and increasing platform adoption to 100+ researchers worldwide.
Engineer scalable data processing pipelines using advanced data structures and algorithms, integrating SQL-based ETL workflows with machine learning models (SVM-RFE, Random Forests) on HPC infrastructure (Rockfish). Leverage scientific computing libraries (MaxQuant, Bioconductor, NumPy, SciPy) to cut genomic analysis time by 40% and boost prediction accuracy by 20%.
Devise automated data quality and anomaly detection pipelines by combining unsupervised ML (K-Means, DBSCAN) with rule-based heuristics using Python, scikit-learn, and TensorFlow. Integrate these into ETL and CI/CD workflows to flag anomalies in real time, boosting data integrity by 30% across large datasets.
Develop interactive data visualization portals using React, Next.js, TypeScript, D3.js, and R Shiny, improving data accessibility and user engagement by 56% through optimized rendering, virtual DOM updates, and real-time WebSocket-based data streaming.
Implement frontend performance optimizations using Next.js SSR, code splitting, and lazy loading, improving page load speeds by 40% and enhancing large-scale data visualization rendering.
Enhance API performance by implementing GraphQL and RESTful API optimizations, introducing API Gateway, query batching, caching, and load balancing, reducing backend request latency by 35%.
Develop and deploy fault-tolerant microservices-based infrastructures integrating Python, C, and Java, unifying HPC workloads with AWS (S3, EC2, Lambda, DynamoDB) for high-throughput sequencing.
Integrate OAuth/Auth0 authentication for secure role-based access control (RBAC) across multi-institutional collaborations, ensuring strict data security compliance.
Build and validate glioblastoma (GBM) biomarker analysis software using TensorFlow, Keras, scikit-learn, and pandas, applying unit testing and model evaluation metrics to support precision medicine strategies, epidemiological studies, and drug discovery workflows.
Apply Test-Driven Development (TDD) principles, maintaining 95%+ test coverage and automating CI/CD pipelines (Git, Jenkins, Docker), increasing deployment reliability and reducing release cycles.
Establish robust version control practices using Git and GitHub, enforcing branch protection and automated code reviews. Integrate these workflows with CI/CD pipelines to reduce merge conflicts by 25% and streamline collaborative development across multi-institutional projects.
Author a 35+ page research manuscript featuring interactive R Shiny and D3.js data visualizations, published on GitHub and Zenodo to underscore reproducibility and transparency in data science projects.
Ensure compliance with IRB and data privacy protocols when handling sensitive patient data, establishing rigorous data governance and secure data-access policies for multi-lab collaborations.
Collaborate with cross-functional domain experts (oncologists, statisticians, and data scientists) to prioritize research goals and shape pipeline development, ensuring alignment with clinical needs.
Mentor 15+ undergraduate and junior researchers across multiple research groups through hands-on workshops on HPC workflows, containerization, integrations, API development, and bioinformatics software engineering, fostering continuous learning and collaboration.
Present research findings at prestigious conferences, including ABRCMS and the National Collegiate Research Conference (NCRC) at Harvard University, earning multiple awards such as Graduate Oral Presenter, Plenary Speaker, and Travel Award.
Backend Software Developer Intern @ Outlier (Mar 2024 – Nov 2024 | Remote)
Developed and optimized AI-generated code using Python, Java, and C, enhancing model efficiency by 5% and reducing code errors by 10% through advanced data structures and algorithmic optimizations.
Designed and deployed serverless computing functions using AWS Lambda and API Gateway, contributing to infrastructure cost reductions of 30% and optimized execution time for on-demand tasks.
Built and refactored GraphQL and RESTful APIs with Python (FastAPI, Flask), Java (Spring Boot), and Node.js, implementing query batching, caching, and API Gateway integration. Refactored monolithic apps into microservices to enhance performance, scalability, and maintainability.
Collaborated on training generative AI models in software engineering, data analysis, and machine learning; implemented unit and integration tests to ensure model reliability and validity.
Assisted in debugging and optimizing production code under senior engineers' guidance, identifying bottlenecks and improving performance by 20%.
Conducted rigorous evaluations and enhancements of AI-generated code, increasing AI model performance by 3% while ensuring strict adherence to industry standards and best practices, significantly reducing code inefficiencies.
Used Agile (Scrum) methodologies to streamline development processes, ensuring iterative delivery and fostering cross-team collaboration for timely project completion.
Software Development Research Assistant @ University of Toronto (Sept 2019 – Apr 2024 | Hybrid)
Engineered multiple full-stack bioinformatics applications using Python, R, C, C++, and Java, automating workflows that saved 30+ hours weekly and enhancing lab efficiency across 7 research teams. Applied object-oriented programming (OOP) methodologies to improve oncology, genomics, and protein structure-function analysis.
Designed and integrated a microservices architecture with GraphQL and RESTful APIs, adhering to SOLID design principles, and incorporating SQL (PostgreSQL, MySQL) databases to manage structured bioinformatics data from public repositories (EBI, NCBI). Optimized query performance through batch processing workflows, reducing API response times by 20% and improving data retrieval efficiency by 25%.
Established and maintained DevOps infrastructure using Docker for containerization and Kubernetes for orchestration, reducing environment setup time by 50% and enabling seamless deployment across high-performance computing (HPC) clusters for resource-intensive computations.
Developed frontend optimization strategies using Next.js, Tailwind CSS, and WebAssembly, reducing UI render times by 45% and improving real-time data visualization responsiveness.
Specialized in molecular modeling and 3D structural analysis using UCSF ChimeraX, Python, C++, and Shell scripting to automate biomolecular structure processing. Implemented parallelized computations, improving protein structure analysis accuracy by 20%.
Applied Agile methodologies (Scrum/Kanban) to enhance team collaboration, ensuring scalable system architecture, reliable deployments, and continuous integration/delivery (CI/CD) practices.
Conducted advanced bioinformatics analyses, including gene set enrichment analysis, mutation impact studies, and predictive modeling using Python and R, contributing to cutting-edge oncology research.
Mentored undergraduate students and junior researchers, providing guidance on bioinformatics tools, scalable software architecture, HPC, and DevOps best practices, fostering a culture of continuous learning.
Awards & Achievements
- Plenary Speaker, National Collegiate Research Conference (NCRC) - Harvard University (2024): Selected as 1 of only 12 plenary speakers from over 5,000 national applicants. Delivered a keynote on applying machine learning to integrate transcriptomics and proteomics for glioblastoma research, focusing on immune cell composition, biomarker discovery, and personalized treatment strategies.
- Best Detailed Oral Presentation - ABRCMS Conference (2023): Awarded top presenter in the Computational and Systems Biology division (selected from 80 oral presenters; 3,500+ attendees). Showcased computational models for cancer biomarker identification using high-throughput sequencing data. Recognized with $2,500 for travel and accommodation.
- Best Poster Presentation - ABRCMS Conference (2024): Received top honors for graduate-level poster presentation (competed among 150+ graduate presenters). Presented research on advancing open-source bioinformatics platforms and computational approaches for scalable genomic data analysis. Recognized with $2,500 for travel and accommodation.
- Poster Presentation - National Collegiate Research Conference (NCRC) - Harvard University (2024): Presented research poster detailing computational approaches for cancer biomarker identification, focusing on integrating omics data and ML techniques for predictive oncology.
- Friends of Arts And Science Awards - University of Toronto (2022, 2023, 2024): Received multiple awards recognizing academic excellence in both Computer Sciences and Physical & Life Sciences disciplines throughout undergraduate studies.
Education
University of Toronto
St. George Campus
Bachelor of Science (Honours)
Graduated June 2024
Specialist: Computer Science, Bioinformatics & Computational Biology
Minor: Immunology
Major GPA: 3.96 / 4.0
Relevant Coursework: Data Structures & Analysis (CSC263), Software Design (CSC207), Systems Programming (CSC209), Algorithm Design & Analysis (CSC373), Computability & Complexity (Theory), Operating Systems, Database Systems, Machine Learning Principles, Distributed Systems Design, Cloud Computing Concepts, Computer Networks, Mathematical Reasoning for CS (CSC165), Applied Bioinformatics (BCB410), Systems Biology (BCB420), Core Bioinformatics (BCH441/BCB410), Calculus, Statistics & Probability, Advanced Project Courses (BCB330Y1/BCB430Y1 - Omics Integration & ML for Protein Interaction).