| --- |
| license: apache-2.0 |
| language: |
| - en |
| base_model: |
| - Qwen/Qwen2-VL-7B-Instruct |
| pipeline_tag: visual-question-answering |
| datasets: |
| - DiagramAgent/DiagramGenBenchmark |
| --- |
| |
| [📑paper link](https://arxiv.org/abs/2411.11916) |
|
| ## Model Card: DiagramAgent/Diagram_to_Code_Agent |
| |
| ### 1. Model Overview |
| |
| - **Name**: DiagramAgent/Diagram_to_Code_Agent |
| - **Description**: |
|   Converts a diagram (an image) into its corresponding structured code. |
| |
| ### 2. Intended Use |
|
|
| - **Primary Tasks**: |
|   - Convert existing diagrams into structured code representations. |
|   - Support diagram editing workflows by providing a reliable code basis for modifications. |
|   - Capture and preserve the implicit logical structure and visual details of diagrams. |
| - **Application Scenarios**: |
|   - Automated diagram editing: transforming a diagram into code so that it can be modified afterwards. |
|   - Reverse engineering visual diagrams for analysis and reuse. |
|   - Enhancing data visualization tools with code-based diagram representations. |
|
|
| ### 3. Architecture and Training Details |
|
|
| - **Base Model**: Built on `Qwen/Qwen2-VL-7B-Instruct`, a vision-language model. |
| - **Training Process**: |
|   - Trained on diverse diagram samples from the DiagramGenBenchmark dataset. |
|   - Trained to generate code that closely matches the reference code, so that all diagram elements are captured accurately. |
|   - Uses a specialized loss function to reduce the edit distance between the generated code and the reference code. |
| - **Module Interaction**: |
|   Works with the Check Agent, which validates the generated code and provides feedback for further refinement. |
| |
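The edit-distance objective mentioned above can be made concrete with a small sketch. The card does not specify the loss function, so the code below is not the training loss; it only illustrates the underlying metric (character-level Levenshtein distance) between a generated and a reference code string. The function names and the two sample snippets are hypothetical and chosen for illustration.

```python
def edit_distance(generated: str, reference: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    # previous[j] = distance between the processed prefix of `generated`
    # and the first j characters of `reference`.
    previous = list(range(len(reference) + 1))
    for i, g_char in enumerate(generated, start=1):
        current = [i]
        for j, r_char in enumerate(reference, start=1):
            substitution = previous[j - 1] + (g_char != r_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def normalized_edit_distance(generated: str, reference: str) -> float:
    """Edit distance scaled to [0, 1] by the length of the longer string."""
    longest = max(len(generated), len(reference))
    return edit_distance(generated, reference) / longest if longest else 0.0


# Hypothetical code snippets, for illustration only
reference_code = "digraph G { A -> B; }"
generated_code = "digraph G { A -> C; }"
print(edit_distance(generated_code, reference_code))  # 1
```

A lower value means the generated code is closer to the reference; `normalized_edit_distance` makes scores comparable across diagrams of different sizes.
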
| ### 4. Usage Examples |
|
|
| ```py |
| from transformers import Qwen2VLForConditionalGeneration, AutoProcessor |
| from qwen_vl_utils import process_vision_info |
| |
| # Load the model on the available device(s) |
| model = Qwen2VLForConditionalGeneration.from_pretrained( |
|     "DiagramAgent/Diagram_to_Code_Agent", torch_dtype="auto", device_map="auto" |
| ) |
| |
| # Load the default processor |
| processor = AutoProcessor.from_pretrained("DiagramAgent/Diagram_to_Code_Agent") |
| |
| messages = [ |
|     { |
|         "role": "user", |
|         "content": [ |
|             { |
|                 "type": "image", |
|                 "image": "image path",  # placeholder: path or URL of the input diagram |
|             }, |
|             {"type": "text", "text": "your input"},  # placeholder: your instruction |
|         ], |
|     } |
| ] |
| |
| # Prepare the inputs for inference |
| text = processor.apply_chat_template( |
|     messages, tokenize=False, add_generation_prompt=True |
| ) |
| image_inputs, video_inputs = process_vision_info(messages) |
| inputs = processor( |
|     text=[text], |
|     images=image_inputs, |
|     videos=video_inputs, |
|     padding=True, |
|     return_tensors="pt", |
| ) |
| # Move the inputs to the device the model was loaded on |
| inputs = inputs.to(model.device) |
| |
| # Inference: generate the output, then drop the prompt tokens from each sequence |
| generated_ids = model.generate(**inputs, max_new_tokens=8192) |
| generated_ids_trimmed = [ |
|     out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids) |
| ] |
| output_text = processor.batch_decode( |
|     generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False |
| ) |
| print(output_text) |
| ``` |
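`output_text` is a list with one decoded string per input. As a minimal sketch, assuming the model may wrap its answer in a Markdown code fence (this card does not document the output format), a small helper can strip such a fence before the code is saved or passed on. `extract_code` and the sample string are hypothetical names and data used only for illustration.

```python
import re

FENCE = "`" * 3  # Markdown code-fence delimiter
FENCE_PATTERN = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(generated_text: str) -> str:
    """Return the body of the first fenced block, or the stripped text if there is none."""
    match = FENCE_PATTERN.search(generated_text)
    return (match.group(1) if match else generated_text).strip()


# Hypothetical decoded output standing in for output_text[0]
sample_output = FENCE + "dot\ndigraph G { A -> B; }\n" + FENCE
print(extract_code(sample_output))  # digraph G { A -> B; }
```

If the output contains no fence, the helper returns the text unchanged apart from trimming surrounding whitespace.
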
|
|
| ### 5. Citation |
|
|
| If you find our work helpful, please consider citing it: |
|
| ```bibtex |
| @inproceedings{wei2024wordsstructuredvisualsbenchmark, |
|   title={From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing}, |
|   author={Jingxuan Wei and Cheng Tan and Qi Chen and Gaowei Wu and Siyuan Li and Zhangyang Gao and Linzhuang Sun and Bihui Yu and Ruifeng Guo}, |
|   booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, |
|   year={2025} |
| } |
| ``` |
|
|