My Blog

In search of optimality

CV

Life in two pages

My Code

Python, Js and more to come


I am an Engineering Manager/Tech Lead at Instabase. Instabase is a platform that lets the largest financial institutions in the world automate their document processing workflows. I lead an incredible team called Flow Execution, and we manage Instabase's document-processing backend called Flow. Flow is a dataflow system built for documents and resembles a hybrid of Spark and Airflow. It is responsible for processing millions of documents every day.

Previously, I completed my Ph.D. in the Database Group at MIT. My thesis was "Interactive data analytics using GPUs", and my advisor was the amazing Prof. Sam Madden.
Modern GPUs provide an order of magnitude more FLOPs and memory bandwidth than CPUs. In the thesis, I presented Crystal, a system that runs data processing (SQL) queries directly on GPUs. The system can process O(TB) of data in milliseconds, enabling interactive data analytics. I also presented theoretical models that explain where the performance gains of GPUs over CPUs come from for data analytics workloads.

I previously worked with Srikanth Kandula at Microsoft Research – Redmond on approximate query processing techniques for big data queries, and with Prof. Sudarshan on improving join enumeration in transformation-based query optimizers. I completed my B.Tech. in CS at IIT Bombay.

In my free time, I like to hack (in a good way :)). I run Dictanote and Voice In. Voice In is a Chrome extension that lets you use voice to type on over 10,000 websites in 50+ languages. Voice In transcribes your speech to text in real time. You can use it to type emails in Gmail, enter data into Teladoc, write blogs in WordPress, etc. You can find more details at dictanote.co/voicein/. What started as a hobby project now has over 400,000 users.

Publications

SIGMOD 2022 Tile-based Lightweight Integer Compression in GPU
Anil Shanbhag, Bobbi Yogatama, Xiangyao Yu, Samuel Madden (code)

DAMON 2020 Large-Scale In-Memory Analytics on Intel Optane DC Persistent Memory
Anil Shanbhag, Nesime Tatbul, David Cohen, Samuel Madden (code)

SIGMOD 2020 A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics
Anil Shanbhag, Samuel Madden, Xiangyao Yu (code)

SIGMOD 2018 Efficient Top-K Query Processing on Massively Parallel Hardware
Anil Shanbhag, Holger Pirk, Samuel Madden (slides) (code)

VLDB 2018 Evaluating End-to-End Optimization for Data Analytics Applications in Weld
Shoumik Palkar, James Thomas, Deepak Narayanan, Pratiksha Thaker, Rahul Palamuttam, Parimajan Negi, Anil Shanbhag, Malte Schwarzkopf, Holger Pirk, Saman Amarasinghe, Samuel Madden, Matei Zaharia

SoCC 2017 A robust partitioning scheme for ad-hoc query workloads
Anil Shanbhag, Alekh Jindal, Samuel Madden, Jorge Quiane, Aaron J. Elmore (slides) (code)

VLDB 2017 AdaptDB: adaptive partitioning for distributed joins
Yi Lu, Anil Shanbhag, Alekh Jindal, Samuel Madden

CIDR 2017 Weld: A common runtime for high performance data analytics
Shoumik Palkar, James J Thomas, Anil Shanbhag, Deepak Narayanan, Holger Pirk, Malte Schwarzkopf, Saman Amarasinghe, Matei Zaharia

VLDB 2016 Amoeba: A Shape changing Storage System for Big Data (Demo)
Anil Shanbhag, Alekh Jindal, Yi Lu, Samuel Madden (poster) (code)

ADMS/IMDM 2016 Locality-Adaptive Parallel Hash Joins using Hardware Transactional Memory
Anil Shanbhag, Holger Pirk, Samuel Madden (slides) (code)

SIGMOD 2016 Quickr: Lazily Approximating Complex Ad-Hoc Queries in Big-Data Clusters
Srikanth Kandula, Anil Shanbhag, Aleksandar Vitorovic, Matthaios Olma, Robert Grandl, Surajit Chaudhuri, Bolin Ding

VLDB 2014 Optimizing join enumeration in transformation-based query optimizers
Anil Shanbhag, S Sudarshan (slides)

Projects / Awards / Internships

You can have a look at my Curriculum Vitae.

My older CV has a longer list of awards/projects from pre-PhD days.

Contact Me

The best way to contact me is to send mail to [email protected]
Other: Facebook, GitHub