Open to DS / AI-ML roles

Data Scientist & AI/ML Engineer.

I turn messy, real-world data into models that make decisions. At BlackBox I cut document retrieval time 98% across 50,000+ documents; at PwC I hit 92% forecast accuracy on $20M/month in petrochemical trades. My focus is statistical rigor and pipelines built to be reproducible, not just to work once.

B.Sc. Computer Science, UMass Amherst ('26) · 3.82 GPA · Azure DP-100 & AWS AI Practitioner certified


Tech Stack

Tools I reach for across both sides of the stack.

Languages

Python · SQL · Java · JavaScript · TypeScript

Frameworks / Libraries

PyTorch · scikit-learn · XGBoost · pandas / NumPy · statsmodels / SciPy · Flask / FastAPI · React / Node.js

Data, Cloud & Analytics

AWS (SageMaker, Lambda, ECS) · Azure · Docker / Kubernetes · PostgreSQL / Snowflake · Spark · Tableau

How I Build Models

From raw data to a monitored, production endpoint.

Ingest

Raw data from APIs, warehouses & streams

Clean & Engineer

Handle nulls, build features

Train & Validate

XGBoost, Random Forest, cross-validation

Evaluate

Score against holdout & business metrics

Deploy & Monitor

Serve predictions, track drift
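
One way the "track drift" step could look in practice is a Population Stability Index (PSI) check that compares a feature's live distribution against its training baseline. This is a minimal sketch under stated assumptions: the page does not say which drift metric is used, so PSI is a stand-in, and the function name, bin count, and epsilon are all hypothetical choices.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline: Sequence[float],
    live: Sequence[float],
    n_bins: int = 10,      # assumed bin count, not taken from the page
    eps: float = 1e-6,     # assumed floor for empty bins
) -> float:
    """PSI between a baseline sample and a live sample of one numeric feature."""
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline has no spread; PSI is undefined")
    width = (hi - lo) / n_bins

    def bin_shares(values: Sequence[float]) -> list[float]:
        counts = [0] * n_bins
        for v in values:
            # Equal-width bins over the baseline range; values outside
            # that range are clamped into the first or last bin.
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # The eps floor keeps empty bins from producing log(0).
        return [max(c / len(values), eps) for c in counts]

    expected = bin_shares(baseline)
    actual = bin_shares(live)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

Identical samples score 0, and the score grows as the live distribution moves away from the baseline. A commonly cited rule of thumb treats values above roughly 0.25 as a shift worth investigating, though the right threshold depends on the feature and the model.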

Projects


EquiSight

Data Science

Challenge: ran an A/B test across 50+ equities (ARIMA vs. gradient-boosted forecasting), cutting RMSE by 18% (p < 0.05), then served the forecasts through a Flask/FastAPI + PostgreSQL backend and a React/Plotly dashboard, cutting API response time by 57% (2.3s to 1s) at 99.9% uptime.

Flask / FastAPI · PostgreSQL · React / Plotly

GitHub Live Demo

COVID-19 Literature Clustering

AI/ML

Challenge: built an NLP pipeline (TF-IDF, t-SNE, K-means, LDA) to cluster 32,000+ CORD-19 research papers into navigable topic groups while preserving 95% of the variance. (Lumiere Education Research)

Python · TF-IDF · t-SNE · K-means / LDA

GitHub Notebook

Document Retrieval & Fitment Scoring

AI/ML

Challenge: cut document retrieval time by 98% (30 min to <1 min) across 50,000+ documents with a statistical ranking pipeline, and automated resume fitment scoring for ~500 applications/cycle with a TF-IDF-based NLP model, reducing screening time by 60%.

Python · NLP · TF-IDF · Statistical Scoring

Proprietary — built during internship at BlackBox
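
For readers unfamiliar with TF-IDF scoring, the general idea behind this kind of ranking can be sketched in a few lines of standard-library Python. This is not the BlackBox implementation, which is proprietary and not shown here; the function names, the smoothed IDF formula, and the cosine-similarity ranking are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Return one sparse TF-IDF vector per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant): terms found in every document
    # still keep a small positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query: str, docs: list[str]) -> list[tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the corpus are dropped.
    q_vec = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scores = [(i, cosine(q_vec, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

In a fitment-scoring setting, the query would be a job description and the documents would be resumes; a production system would likely add stop-word handling, stemming, and a sparse-matrix library on top of this.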

Petrochemical Price Forecasting

Data Science

Challenge: hit 92% forecast accuracy on $20M/month in petrochemical (paraxylene) trades, cutting forecast error from 45% to 28% with XGBoost + Random Forest, then containerized the model with Docker and deployed it on AWS SageMaker, scaling real-time inference to 10,000+ records/day.

XGBoost · scikit-learn · Docker · AWS SageMaker

Proprietary — built during internship at PwC

Contact

Based in New York, NY — open to Data Science / AI-ML roles.

[email protected] · (413) 404-6243


github.com/VedaantAgrawal · linkedin.com/in/vedaant-agrawal