Publications
NLPExplorer: Exploring the Universe of NLP Papers (ECIR 2020)
Webapp: nlpexplorer.org
- Developed a system using shell scripting and Python to periodically mine research article metadata and PDFs from the ACL Anthology, apply OCR, index papers and derive statistics such as paper topics, citation graphs and similar papers. Stored retrieved data in MongoDB and Elasticsearch
- Developed a full-stack web application and REST API (4000+ monthly users post-publication) using Flask to visualise derived statistics and open-source our data, with the aim of making research more accessible
Lessons from Large Scale Campus Deployment (DATA 2020)
- Engineered a system to track water consumption, electricity usage, solar generation and user occupancy using 66 sensors and existing WiFi infrastructure, collecting ∼190 MB of data daily with the aim of reducing water and electricity wastage
Projects
Data Deduplication in Dirty Tabular Data
- Performed feature engineering and applied machine learning models (regression, trees and neural networks) to classify whether two categorical data points are duplicates. Achieved a best accuracy of 96% using a Random Forest classifier trained on string similarity features over bigrams and trigrams
Unix Shell | github.com/sohampachpande/unix-shell
- Developed a command-line interpreter in C that implements commands for modifying files, reading/editing files, and system monitoring
Agribot | 3rd Best Paper Award at ICSTEM-Vibrant Gujarat 2019
- Built an end-to-end, AI-driven chatbot to answer agricultural queries from Indian farmers, trained on textual data from farmer hotline call logs. Contributed to the data mining, data cleaning and feature extraction processes