Education

University of Virginia

Master of Science • August 2022 — December 2023

Computer Science

  • Relevant Coursework - Machine Learning, Cloud Computing, Geometry of Data.

Indian Institute of Technology, Roorkee

Bachelor of Technology • July 2013 — May 2017

Major in Mechanical Engineering, Minor in Computer Science

  • Relevant Coursework - Design and Analysis of Algorithms, Data Structures, Numerical Methods, Product and Process Optimization, Database Management Systems, Discrete Structures, Computer Architecture and Microprocessors.
  • Online Courses - Deep Learning Specialization (Coursera), CS109 (Harvard University), CS50x (Harvard University), CS231n (Stanford University), NLP (HSE, Russia), Machine Learning (Stanford University), Probabilistic Graphical Models (Coursera).

Publications

MaNi - Maximizing Mutual Information for Nuclei Cross-Domain Unsupervised Segmentation

Yash Sharma, Sana Syed, Donald Brown.
Published in MICCAI 2022. arXiv:2206.14437

Weakly Supervised Deep Instance Nuclei Detection using Points Annotation in 3D Cardiovascular Immunofluorescent Images

Nazanin Moradinasab, Yash Sharma, Laura S. Shankman, Gary K. Owens, Donald E. Brown.
Published in MLHC 2022. arXiv:2208.00098

Cluster-to-Conquer - A Framework for End-to-End Multi-Instance Learning for Whole Slide Image Classification

Yash Sharma, Aman Shrivastava, Lubaina Ehsan, Christopher A. Moskaluk, Sana Syed, Donald Brown.
Published in MIDL 2021. arXiv:2103.10626

HistoTransfer - Understanding Transfer Learning for Histopathology

Yash Sharma, Lubaina Ehsan, Sana Syed, Donald Brown.
Published in IEEE BHI-BSN 2021. arXiv:2106.07068

Encoding Cardiopulmonary Exercise Testing Time Series as Images for Classification using Convolutional Neural Network

Yash Sharma, Nick Coronato, Donald Brown.
Accepted at NeurIPS 2021 - MLPH Workshop and IEEE EMBC 2022.

Self-Attentive Adversarial Stain Normalization

Aman Shrivastava, Will Adorno, Yash Sharma, Lubaina Ehsan, S. Asad Ali, Sean R. Moore, Beatrice Amadi, Paul Kelly, Sana Syed, Donald Brown.
Published in International Conference on Pattern Recognition (ICPR) 2021. arXiv:1909.01963

Experience

Scale AI

Machine Learning Research Engineer Intern • May 2023 — August 2023

  • Worked on a computer vision foundation model.

Gastroenterology Data Science Lab, University of Virginia

Machine Learning Researcher • July 2020 — April 2023

  • Whole Slide Image Modeling - Developed a semi-supervised segmentation and classification pipeline for gigapixel-sized histology images, leading to publications in multiple conferences.
  • Transcriptomics Data - Integrated flux balance analysis with machine learning to identify upregulated and downregulated genes in patients with gastrointestinal disease.

Freshworks

Data Scientist • March 2020 — June 2020

  • Sales Journey Forecasting - Worked as part of the feature engineering and ML modeling team to develop features and learn the temporal trend in the sales journey for forecasting deal closure. Used XGBoost with dynamic sample weighting to tackle class imbalance in the data. Used a Bayesian optimization framework and MLflow to tune hyperparameters, validating performance across multiple months, and achieved precision >= 0.5 and recall >= 0.4 for all accounts.

ZS Associates

Data Science Associate Consultant • July 2017 — March 2020

  • Pathway Clustering - Designed a semi-parametric Hawkes process for interpretable modeling and learning of event intensity in the patient journey. Used the event intensity vectors to cluster patients. Scaled the implementation using the PyTorch framework.
  • Key Influencer Mapping - Developed a semi-supervised hierarchical clustering approach for author disambiguation in a medical archive database, achieving an F1 score of 0.91 on manually annotated data. Also developed a future-influencer identification approach using the social network analysis (SNA) algorithm PageRank with exponential influence decay. The approach was accepted and presented at the 22nd Merck Technology Symposium.
  • Patient Journey Line Prediction - Led the development of a disease-area-agnostic pipeline for line-of-treatment identification in patient-level sequence data, using a genetic algorithm, treatment-level domain information encoded in a graph, and mean-shift clustering.
  • Competitive Intelligence - Predicted launch dates of competitor drugs using a GLM trained on event-impact and trial features to aid the client in strategic decision making. Implemented a multi-state Markov model to learn trial-level and global event impacts on a drug's journey. Achieved an MSE of 10 months for phase 1, 6 months for phase 2, and 3 months for phase 3 trial drugs.
  • Information Extractor - Built a text pre-processing pipeline and implemented "Self-Taught CNN for Short Text Clustering" to extract signals and identify their topics from news articles and conference papers pertaining to combination products.
  • Safety Alert Tool - Used LDA and a DBSCAN-based tf-idf clustering algorithm to analyze site safety logs. Trained an XGBoost model on the cluster matrix to alert workers about potential incidents at pharmaceutical sites. The client presented the solution at the Health AI Summit, showcasing a novel use of data science in quality operations.
  • Clinical Trial Design Optimization - Created a data mining pipeline comprising an XML parser, CoreNLP and spaCy modules, MeSH ontologies, CRFs, etc. for entity extraction and entity linking on clinicaltrials.gov and PubMed data to expedite knowledge discovery for trial researchers.
  • Response Analyzer - Used word vectors incrementally trained on a medical corpus, together with Word Mover's Distance (WMD) semantic matching, to identify whether managers comply with the client's coaching standards for medical representative training.
  • Pharmacovigilance - Developed an application that uses a CNN model to identify tweets indicating adverse effects of the drug of interest.

Busigence

Data Science Intern • May 2016 — July 2016

  • Statistical Testing Module - Designed a hypothesis testing module to aid in feature engineering, feature transformation, and outlier detection.
  • Ensemble Modeling Module - Built a module to select an appropriate bagging/boosting model based on a performance metric and tune its hyperparameters using Hyperopt.

Projects

Under the guidance of Prof. Siladitya Pal • July 2016 — May 2017

  • Developed a data-driven algorithmic framework to aid in surrogate modeling of composites based on simulation data.
  • The ML-powered framework reduced computation time from 2 days to 10 minutes while matching the accuracy of traditional methods.

Project • October 2019

  • Designed a CycleGAN-inspired generator-discriminator architecture for recovering degraded celebrity images.
  • Used a weighted sum of pixel-wise L2 loss and VGGNet perceptual loss, optimized with Adam, for high-quality output.

Project • July 2018

  • Built a dialogue chatbot to assist with Stack Overflow queries. Hosted it on Telegram and AWS.
  • Created an intent classifier to classify queries as pertaining to Stack Overflow or not. Used ChatterBot with incremental training for natural language generation and StarSpace-embedding-based semantic matching for natural language queries.

Project • December 2017

  • Built an AI-based question-answering tool to aid investigators in researching medical corpora such as PubMed.
  • Used a Bi-Directional Attention Flow (BiDAF) model for the machine comprehension task. Won the runner-up position in the hackathon.

Project • July 2016

  • Developed an ML-based pipeline consisting of data pre-processing, feature engineering, model selection, and hyperparameter tuning steps.
  • Tested the pipeline on a range of supervised learning hackathons and ranked in the top 1% in several of them.

Extracurricular and Community Engagement

Assistant Capstone Advisor, School of Data Science, University of Virginia

As part of the research lab, mentoring 4 students in the University of Virginia MS in Data Science program on their year-long capstone project.

Paper Reviewing

Reviewed papers for MIDL 2022, MIDL 2021, MDPI 2020, Biomedical and Health Informatics 2020.

Co-founder, Data Science Group

Started a student-run group aimed at improving the machine learning, analytics, and deep learning culture on campus. Worked on projects tackling college-level issues and maintained a Medium blog.

Joint Secretary, Photography Section

Organized 4 major exhibitions, photo walks, and workshops for students and professors.

Goalkeeper, Football

Earned gold medals at the inter-year and inter-hostel tournaments.

Core Member, Team KNOx

Designed and fabricated the suspension system of an All Terrain Vehicle for the Baja SAE tournament.

Core Member, Quizzing

Represented IIT Roorkee at the inter-IIT-IIM quizzing contest, finishing 5th among 25 institutes.