	---
	base_model: Qwen/Qwen2.5-Coder-32B-Instruct
	tags:
	- Rust
	- Hyperswitch
	- LoRA
	- CPT
	- Fine-Tuned
	- Causal-LM
	pipeline_tag: text-generation
	language:
	- en
	---

	# Qwen2.5-Coder-32B-Instruct-CPT-LoRA-Adapter-HyperSwitch

	A LoRA adapter for Qwen/Qwen2.5-Coder-32B-Instruct, fine-tuned on the [Hyperswitch](https://github.com/juspay/hyperswitch) Rust codebase. It is specialized for payment processing patterns, Hyperswitch architecture, and Rust development practices.

	## 🎯 Model Description

	This LoRA adapter was trained on 16,731 samples extracted from the Hyperswitch codebase to enhance code understanding, explanation, and generation within the payment processing domain.

	- Base Model: Qwen/Qwen2.5-Coder-32B-Instruct
	- Training Type: Causal Language Modeling (CLM) with LoRA
	- Domain: Payment Processing, Rust Development
	- Specialization: Hyperswitch codebase patterns and architecture

	## 📊 Training Details

	### Dataset Composition
	- Total Samples: 16,731
	- File-level samples: 2,120 complete files
	- Granular samples: 14,611 extracted components
	  - Functions: 4,121
	  - Structs: 5,710
	  - Traits: 223
	  - Implementations: 4,296
	  - Modules: 261
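
The five component counts add up to the granular total, and the granular and file-level samples together give the overall total. A quick check of that arithmetic, using only the numbers listed above:

```python
# Sample counts copied from the dataset composition list.
file_level = 2_120
granular = {
    "functions": 4_121,
    "structs": 5_710,
    "traits": 223,
    "implementations": 4_296,
    "modules": 261,
}

granular_total = sum(granular.values())
total = file_level + granular_total

print(granular_total)  # 14611
print(total)           # 16731

# Share of each component type among the granular samples.
for name, count in granular.items():
    print(f"{name:>15}: {count / granular_total:6.1%}")
```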

	### LoRA Configuration
	```yaml
	r: 64 # LoRA rank
	alpha: 128 # LoRA alpha (2*r)
	dropout: 0.05 # LoRA dropout
	target_modules: # Applied to all linear layers
	- q_proj
	- k_proj
	- v_proj
	- o_proj
	- gate_proj
	- up_proj
	- down_proj
	```
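
To get a feel for what this configuration implies, the sketch below estimates the number of trainable LoRA parameters. Each adapted linear layer gains two matrices, `A` (r x in_features) and `B` (out_features x r), so it adds `r * (in_features + out_features)` parameters. The base-model dimensions are not stated in this card; the values used here are assumptions and should be checked against the base model's `config.json`.

```python
# LoRA settings from the configuration above.
R, ALPHA = 64, 128

# ASSUMED base-model dimensions (not stated in this card); verify in config.json.
HIDDEN = 5120          # hidden_size
INTERMEDIATE = 27648   # intermediate_size of the MLP
KV_DIM = 1024          # num_key_value_heads * head_dim (grouped-query attention)
NUM_LAYERS = 64        # num_hidden_layers

# (in_features, out_features) for each targeted projection.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int = R) -> int:
    """Parameters added by one LoRA pair: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in MODULE_SHAPES.values())
total = per_layer * NUM_LAYERS

print(f"scaling factor alpha / r: {ALPHA / R}")
print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
```

Under these assumed dimensions the adapter has roughly 0.54 billion trainable parameters, a small fraction of the 32B base model. The `alpha / r` ratio of 2 is the factor by which standard LoRA scales the low-rank update.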

	### Training Hyperparameters
	- Epochs: 5
	- Batch Size: 2 per device (16 effective with gradient accumulation)
	- Learning Rate: 5e-5 (cosine schedule)
	- Max Context: 8,192 tokens
	- Hardware: 2x NVIDIA H200 (80GB each)
	- Training Time: ~4 hours (2,355 steps)
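
The batch-size and step figures above can be related with a little arithmetic. This is a sketch that assumes both GPUs ran as data-parallel replicas, which the card does not state; the variable names are illustrative.

```python
per_device_batch = 2   # from the card
effective_batch = 16   # from the card
total_steps = 2_355    # from the card
epochs = 5             # from the card
num_devices = 2        # ASSUMPTION: both GPUs act as data-parallel replicas

# effective batch = per-device batch * devices * gradient accumulation steps
grad_accum_steps = effective_batch // (per_device_batch * num_devices)

steps_per_epoch = total_steps // epochs
sequences_processed = total_steps * effective_batch

print(grad_accum_steps)     # 4
print(steps_per_epoch)      # 471
print(sequences_processed)  # 37680
```

If the model was instead sharded across the two GPUs as a single replica, the same effective batch would correspond to 8 accumulation steps.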

	### Training Results
	```
	Final Loss: 0.48 (from 1.63)
	Perplexity: 1.59 (from 5.12)
	Accuracy: 89% (from 61%)
	```
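
For a causal language model, perplexity is conventionally the exponential of the mean cross-entropy loss (natural log). The sketch below applies that relation to the rounded loss values above. The results land close to, but not exactly on, the reported perplexities, which is expected if the reported figures come from unrounded losses or a different evaluation point; the card does not say which.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity for a mean per-token cross-entropy loss measured in nats."""
    return math.exp(mean_cross_entropy)


initial = perplexity(1.63)
final = perplexity(0.48)

print(f"initial: {initial:.2f}")  # 5.10
print(f"final:   {final:.2f}")    # 1.62
```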

	## 🚀 Usage

	### Quick Start
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel
	import torch

	# Load base model
	base_model = AutoModelForCausalLM.from_pretrained(
	    "Qwen/Qwen2.5-Coder-32B-Instruct",
	    dtype=torch.bfloat16,  # named torch_dtype in older transformers releases
	    device_map="auto",
	)

	# Load tokenizer
	tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")

	# Load LoRA adapter
	model = PeftModel.from_pretrained(base_model, "juspay/Qwen2.5-Coder-32B-Instruct-CPT-LoRA-Adapter-HyperSwitch")

	# Generate code
	prompt = """// Hyperswitch payment processing
	pub fn validate_payment_method("""

	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	outputs = model.generate(
	    **inputs,
	    max_new_tokens=200,
	    temperature=0.2,  # Lower temperature for code generation
	    do_sample=True,
	    pad_token_id=tokenizer.eos_token_id,
	)

	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	### Recommended Settings
	- Temperature: 0.2-0.3 for code generation
	- Temperature: 0.5-0.7 for explanations and documentation
	- Max tokens: 512-1024 for most tasks
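
One way to keep these settings in one place is a small lookup that returns keyword arguments for `generate`. This is a minimal sketch: the helper name, the task labels, and the specific values picked from within the recommended ranges are all illustrative choices, not part of the model's API.

```python
# Hypothetical presets; values are picked from within the ranges listed above.
SETTINGS = {
    "code": {"temperature": 0.2, "max_new_tokens": 512},
    "explanation": {"temperature": 0.6, "max_new_tokens": 1024},
}


def generation_kwargs(task: str) -> dict:
    """Return sampling arguments for a task label ("code" or "explanation")."""
    if task not in SETTINGS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(SETTINGS)}")
    # Sampling must be enabled for temperature to have any effect.
    return {"do_sample": True, **SETTINGS[task]}


print(generation_kwargs("code"))
```

With the objects from the Quick Start, this could be used as `model.generate(**inputs, **generation_kwargs("code"), pad_token_id=tokenizer.eos_token_id)`.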

	## 🛠️ Technical Specifications

	- Context Window: 8,192 tokens
	- Precision: bfloat16
	- Memory Usage: ~78GB VRAM (32B base model)
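	- Inference Speed: Flash Attention 2 supported (pass `attn_implementation="flash_attention_2"` when loading the base model, if the `flash-attn` package is installed)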
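
As a rough sanity check on the memory figure, the weights alone can be estimated as parameter count times bytes per parameter. The sketch assumes a nominal 32e9 parameters for the base model (the exact count is not given in this card) and ignores the adapter weights, KV cache, activations, and framework overhead, which presumably account for the rest of the reported usage.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Approximate memory for model weights only, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# ASSUMPTION: nominal parameter count of the "32B" base model.
base_gb = weight_memory_gb(32e9, "bfloat16")
print(f"base weights in bfloat16: ~{base_gb:.0f} GB")  # ~64 GB
```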



	## 🙏 Acknowledgments

	- Qwen Team for the excellent Qwen2.5-Coder base model
	- Hyperswitch Team for the open-source payment processing platform
	- Hugging Face for the transformers and PEFT libraries

	## 📞 Citation

	```bibtex
	@misc{hyperswitch-qwen-lora-2024,
	title={Qwen2.5-Coder-32B-Instruct-CPT-LoRA-Adapter-HyperSwitch},
	author={Juspay},
	year={2024},
	publisher={Hugging Face},
	url={https://huggingface.co/juspay/Qwen2.5-Coder-32B-Instruct-CPT-LoRA-Adapter-HyperSwitch}
	}
	```