Have you ever tried to convert medical prescriptions into a digital format? It’s not as easy as it sounds. The sheer variety of formats and structures can make it a daunting task. I recently faced this challenge and thought I’d share my experience with you.
I first tried OCR (Optical Character Recognition) to extract the text from medical prescriptions. The text itself came through, but the structure and semantic meaning of the original document were lost: table cells, headers, and field positions were flattened into a stream of words. To preserve the structure, I decided to convert the prescription to an ASCII rendering instead (plain text that mimics the original layout), but that presented its own set of challenges.
I used Gemini 2.5 Pro to convert the prescription to ASCII, but the output was not ideal. The table structure was distorted, and some elements were positioned incorrectly. You can see the output I got [here](https://limewire.com/d/JGqOt#o7boivJrZv).
So, my question is: how can I improve this process? Is there an open-source VLM (Vision-Language Model) that can understand the structure of medical prescriptions and convert them to ASCII accurately? And how could I fine-tune such a model to produce better results?
Ideally, I want a solution that preserves the original structure of the prescription and uses ASCII characters to represent tables and other elements. If there are no tables in the original document, I don’t want the output to include them.
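To make the target format concrete, here is a minimal Python sketch of the kind of output I have in mind. It assumes the prescription content has already been extracted into an ordered list of blocks; that block schema and the names `render_table` and `render_blocks` are hypothetical, made up for illustration, and the drug names are placeholders rather than real data. It is not the output of any model, only a sketch of the rendering rule: draw a table when one exists, and emit plain text otherwise.

```python
def render_table(rows):
    """Render a list of equal-length rows as an ASCII grid table."""
    # Each column is as wide as its longest cell.
    widths = [max(len(str(cell)) for cell in col) for col in zip(*rows)]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    lines = [border]
    for row in rows:
        cells = " | ".join(str(cell).ljust(w) for cell, w in zip(row, widths))
        lines.append("| " + cells + " |")
        lines.append(border)
    return "\n".join(lines)


def render_blocks(blocks):
    """Render blocks in reading order; a table is drawn only for table blocks."""
    out = []
    for block in blocks:
        if block["type"] == "table" and block["rows"]:
            out.append(render_table(block["rows"]))
        elif block["type"] == "text":
            out.append(block["text"])
    return "\n".join(out)


# Hypothetical input: placeholder values, not taken from a real prescription.
blocks = [
    {"type": "text", "text": "Rx"},
    {"type": "table", "rows": [["Drug", "Dose"], ["DrugA", "5 mg"]]},
]
print(render_blocks(blocks))
```

If the block list contains only text blocks, no table characters appear in the output at all, which is the behavior I want for prescriptions without tables.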
If you have any experience with OCR, VLM, or ASCII generation, I’d love to hear your thoughts and suggestions.