Pranav Vyas

KovaDx Internship

Explore the details of my software engineering internship, which focused on classifying hematological diseases such as sickle cell disease and spherocytosis using machine learning models.

Project Overview

During my Software Engineering Internship at KovaDx, a Yale InnovateHealth accelerated startup, I designed and developed machine learning models for the classification of sickle cell disease and spherocytosis. The objective was to build a set of models that could predict, from a blood sample, whether a patient was at risk of a vaso-occlusive crisis. The project involved analyzing 900,000+ cellular phase images to determine disease stages using classification techniques and dimensionality reduction methods such as t-distributed Stochastic Neighbor Embedding (t-SNE).

Technologies Used

  • Python
  • tensorflow/keras
  • scikit-learn
  • matplotlib
  • scipy
  • numpy
  • pandas
  • jupyter

Challenges & Solutions

The main challenge in this project was the scale of the data (hundreds of thousands of images), which required optimizing the training process. To overcome this, I used distributed computing techniques and model parallelism. I also addressed class imbalance with oversampling techniques such as the Synthetic Minority Over-sampling Technique (SMOTE), which balanced the data and improved model accuracy.
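To illustrate the idea behind SMOTE, here is a minimal NumPy sketch. It is not the code used in the project: the function name smote_oversample, the toy 2-D data, and the parameter values are all assumptions made for illustration. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbors.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration; expects at least two minority rows.
    """
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    k = min(k, n - 1)  # cannot have more neighbors than other samples

    # Pairwise squared Euclidean distances within the minority class.
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    # Indices of the k nearest minority neighbors of each minority sample.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbors for each new point.
    base = rng.integers(0, n, size=n_new)
    nbr = neighbors[base, rng.integers(0, k, size=n_new)]

    # Interpolate: base + gap * (neighbor - base), with gap in [0, 1).
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


# Toy imbalanced data (assumed for the demo): 200 majority vs. 20 minority.
rng = np.random.default_rng(42)
X_major = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_minor = rng.normal(loc=3.0, scale=0.5, size=(20, 2))

# Generate enough synthetic minority samples to match the majority count.
synthetic = smote_oversample(X_minor, n_new=len(X_major) - len(X_minor))

X_balanced = np.vstack([X_major, X_minor, synthetic])
y_balanced = np.concatenate([
    np.zeros(len(X_major), dtype=int),
    np.ones(len(X_minor) + len(synthetic), dtype=int),
])
print(np.bincount(y_balanced))
```

In practice, a maintained implementation such as the SMOTE class in the imbalanced-learn package is usually a better choice than hand-written code. Oversampling should be applied to the training split only, so that synthetic points do not leak into evaluation data.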

Results

The machine learning models I developed classified the disease stages with high clustering accuracy, which could significantly help a medical team with early disease detection. The t-SNE embeddings gave clear visual representations of the data clusters, making the patterns and results easier to interpret.

Explore t-SNE with Handwritten Numbers

Since the healthcare data used in this project is private, I have created a Google Colab script that demonstrates the power of t-SNE on a public dataset. This example uses the MNIST dataset of handwritten digits and applies t-SNE to visualize how it can effectively reduce dimensionality and group similar data points based on their features.
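For readers who want a quick local version, the sketch below shows the general shape of such a demonstration. It is not the Colab script itself. As an assumption made so the example runs offline, it uses scikit-learn's small bundled digits dataset (8x8 images) as a stand-in for full MNIST, and the subset size and perplexity are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Load the bundled 8x8 handwritten digits (stand-in for MNIST, assumed here).
digits = load_digits()

# Use a 500-image subset to keep the run time short.
X, y = digits.data[:500], digits.target[:500]

# Reduce the 64-dimensional pixel vectors to 2 dimensions.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(X)

print(embedding.shape)

# To visualize with matplotlib, color each point by its digit label:
#   import matplotlib.pyplot as plt
#   plt.scatter(embedding[:, 0], embedding[:, 1], c=y, cmap="tab10", s=8)
#   plt.colorbar(label="digit")
#   plt.show()
```

Points with the same digit label tend to land near each other in the 2-D embedding, even though t-SNE never sees the labels. Distances between well-separated clusters should not be over-interpreted, since t-SNE mainly preserves local neighborhoods.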
