Teaching an LLM Pintora: A Journey into Niche Diagramming


Teaching an LLM a Niche Diagramming Language

Text-to-diagram generation with LLMs seems largely solved for popular languages like Mermaid or PlantUML. There are, however, several niche diagramming languages such as D2, Structurizr, or Pintora. This article explores the endeavor of teaching an LLM one of these less popular languages, focusing on Pintora. The goal is to enable the model to generate diagrams from scratch and to edit existing ones.

The base model is Qwen2.5-Coder-7B, chosen for its strength as a coding model. Training is split into two phases: Continued Pretraining (CPT), in which the model learns Pintora syntax and structure (a small example of the syntax is shown below), followed by Instruction Fine-tuning (IFT), which trains the model on concrete diagram-generation and diagram-editing tasks.

Dataset preparation involves creating training data for both phases and requires diverse examples across the different diagram types. Since creating this data by hand was deemed unproductive, AI was used to generate the training data. The model was trained on a 48 GB A40 GPU to accommodate the substantial VRAM demands. In evaluation, the model produced syntactically correct Pintora diagrams 86% of the time.

The project yielded valuable learnings and insights for future experiments, possibly exploring RL-based training and other niche languages like Strudel. Links to the model, the datasets, and the evaluation results are provided for further exploration.

Enjoy AI-assisted diagramming with ChatUML! You can benefit from a 60% discount using the code PINTORA.
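
For readers unfamiliar with Pintora, the snippet below sketches roughly what the fine-tuned model is expected to produce. It follows Pintora's Mermaid-like sequence diagram syntax; the scenario and participant names are made up for illustration and are not taken from the article's training data.

```
sequenceDiagram
  User->>ChatUML: describe the desired diagram
  ChatUML->>LLM: prompt with instructions and context
  LLM-->>ChatUML: Pintora diagram source
  ChatUML-->>User: rendered diagram
```

Pintora supports several other diagram types besides sequence diagrams, such as entity-relationship and component diagrams, which is why the training data needs to cover a diverse mix of them.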
