As researchers, we know how time-consuming it can be to design effective data visualizations. I used to spend hours searching through papers just to find inspiration for layouts and plot types. But what if I told you there’s a way to make this process more efficient?
I developed Plottie.art, a searchable, browser-based library of over 100,000 plots curated from scientific literature. The machine learning pipeline behind it combines a specialized computer vision model with a large language model (LLM) in a way that I think you’ll find interesting.
Here’s how it works: I started with a large collection of figure images sourced from open-access papers. The goal was to make each individual plot within these figures searchable. To do this, I trained a custom YOLOv12 model to segment complex, multi-panel figures into their constituent parts. This required manually annotating a dataset of 1,000 figures to teach the model to reliably locate the boundaries of individual subplots and their captions.
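For anyone curious what that step looks like in code, here's a rough sketch using the Ultralytics API. The weight file and dataset config names are placeholders rather than my exact setup, and I'm showing the bounding-box variant for brevity:

```python
from ultralytics import YOLO

# Start from pretrained YOLOv12 weights; "yolo12n.pt" is the nano variant.
model = YOLO("yolo12n.pt")

# "figures.yaml" (placeholder name) points at the ~1,000 manually
# annotated figure images, with classes like "subplot" and "caption".
model.train(data="figures.yaml", epochs=100, imgsz=640)

# At inference time, each detected box gives the pixel boundary of one
# subplot or caption, which gets cropped out and passed to stage two.
results = model.predict("figure.png")
for box in results[0].boxes:
    cls_name = results[0].names[int(box.cls)]
    x1, y1, x2, y2 = box.xyxy[0].tolist()
```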
Next, I used the Google Gemini API to classify each image by plot type (e.g., heatmap, UMAP) and extract relevant keywords for search. This approach was not only fast to implement but also yielded high-quality, structured metadata.
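The classification step boils down to a single structured-output call per cropped subplot. Here's a simplified sketch using the google-genai SDK; the schema fields, prompt, and model name are illustrative rather than my exact production values:

```python
from google import genai
from PIL import Image
from pydantic import BaseModel

class PlotMetadata(BaseModel):
    plot_type: str        # e.g. "heatmap", "UMAP", "box plot"
    keywords: list[str]   # search terms describing the plot's content

client = genai.Client()  # reads GEMINI_API_KEY from the environment

crop = Image.open("subplot_crop.png")  # one subplot from stage one
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[crop, "Classify this plot and extract search keywords."],
    config={
        "response_mime_type": "application/json",
        "response_schema": PlotMetadata,
    },
)
metadata = response.parsed  # a validated PlotMetadata instance
```

Getting back JSON that already validates against a schema is a big part of why this stage was so fast to build.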
The result is a two-stage pipeline that makes the content on Plottie.art easy to search and explore. The tool is free, requires no login, and runs in the browser. I’d love to hear your feedback on the project and the technical stack. Do you have thoughts on combining specialized vision models with general-purpose LLMs for this kind of application, or suggestions for improving the pipeline?