Perception-moondream2
Perception-moondream2 is a Vision-Language Model (VLM) fine-tuned to understand dense urban traffic scenes. It is built on the compact moondream2 architecture. It analyzes frames from CCTV and traffic-camera feeds and generates detailed textual descriptions of traffic conditions.
Model Details
- Base Model: vikhyatk/moondream2 (Revision: 2024-08-26)
- Architecture: Vision Encoder + Phi-1.5 Text Decoder
- Task: Dense Image Captioning & Visual Question Answering (VQA)
- Language: English
Training Data
The model was fine-tuned on the Subh775/Traffic-Perception-VL dataset. This dataset consists of complex, real-world urban traffic scenes (such as bustling streets in Bengaluru, India).
The training focused on teaching the model to accurately perceive and describe:
- Vehicle Types & Colors: Identifying auto-rickshaws, scooters, motorcycles, and cars.
- Traffic Density & Flow: Estimating congestion levels and movement.
- Pedestrian Activity: Tracking people walking on sidewalks or crossing streets.
- Infrastructure: Recognizing road layouts, lanes, shops, signage, and greenery.
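Because the model also supports VQA, one way to cover these four focus areas is to ask one question per area against a single encoded image. The sketch below assumes an `ask` callable that wraps the `model.answer_question(enc_image, prompt, tokenizer)` call from the Usage section. The prompt wording in `ASPECT_PROMPTS` is hypothetical and is not taken from the training data. A stand-in function replaces the model so the sketch runs without downloading any weights.

```python
from typing import Callable, Dict

# Hypothetical per-aspect questions mirroring the four training focus areas.
# These prompts are illustrative only; they were not taken from the dataset.
ASPECT_PROMPTS: Dict[str, str] = {
    "vehicles": "List the vehicle types and their colors visible in this scene.",
    "density": "How congested is the traffic, and is it moving or stopped?",
    "pedestrians": "Describe any pedestrians on sidewalks or crossing the street.",
    "infrastructure": "Describe the road layout, lanes, shops, signage, and greenery.",
}


def describe_aspects(ask: Callable[[str], str]) -> Dict[str, str]:
    """Ask one question per aspect and collect the stripped answers.

    `ask` maps a prompt to an answer string, for example
    lambda p: model.answer_question(enc_image, p, tokenizer)
    """
    return {aspect: ask(prompt).strip() for aspect, prompt in ASPECT_PROMPTS.items()}


if __name__ == "__main__":
    # Stand-in for the model so the sketch runs without the weights.
    fake_ask = lambda prompt: f"  answer to: {prompt}  "
    for aspect, answer in describe_aspects(fake_ask).items():
        print(f"{aspect}: {answer}")
```

The image is encoded once, and only the question changes between calls. This avoids re-running the vision encoder for each aspect.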
Intended Use Cases
- Smart City Analytics: Automated monitoring of CCTV feeds to detect congestion or accidents.
- Traffic Management: Generating real-time text logs of intersection activity.
- Autonomous Driving Context: Providing dense contextual descriptions for self-driving datasets.
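For the traffic-management case, each generated description can be wrapped in a timestamped record and appended to a JSON Lines log. The sketch below is a minimal illustration under several assumptions. The record schema, the `camera_id` field, and the keyword-based congestion flag are hypothetical. The flag is a crude text heuristic, not a validated detector.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical keyword list for a crude congestion flag; not validated against the model's outputs.
CONGESTION_KEYWORDS = ("congested", "congestion", "traffic jam", "heavy traffic", "gridlock")


@dataclass
class TrafficLogEntry:
    camera_id: str         # hypothetical identifier for the CCTV feed
    timestamp: str         # ISO 8601 timestamp (UTC unless another tz-aware time is passed)
    description: str       # text generated by the model for that frame
    congestion_flag: bool  # True if any congestion keyword appears in the description


def make_entry(camera_id: str, description: str, when: Optional[datetime] = None) -> TrafficLogEntry:
    """Wrap one generated description in a timestamped log record."""
    when = when or datetime.now(timezone.utc)
    text = description.lower()
    flagged = any(keyword in text for keyword in CONGESTION_KEYWORDS)
    return TrafficLogEntry(camera_id, when.isoformat(), description, flagged)


def to_jsonl(entry: TrafficLogEntry) -> str:
    """Serialize a record as one JSON Lines row."""
    return json.dumps(asdict(entry), ensure_ascii=False)


if __name__ == "__main__":
    entry = make_entry(
        "cam-01",
        "Heavy traffic at the junction; auto-rickshaws and scooters are queued.",
        datetime(2024, 1, 1, 8, 30, tzinfo=timezone.utc),
    )
    print(to_jsonl(entry))
```

JSON Lines keeps each frame's record independent. A log can therefore be appended to continuously and processed line by line.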
Usage
The model uses moondream2's custom modeling code, which is loaded from the model repository rather than from the transformers library. You must therefore pass trust_remote_code=True when loading it with transformers.
Prerequisites
Make sure the required libraries are installed. The leading ! runs the command from a Jupyter or Colab cell; omit it in a terminal.
!pip install transformers==4.44.2 "huggingface_hub<1.0" accelerate pillow einops
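Because the install line pins exact versions, a quick check before loading the model can catch a mismatched environment early. The pins presumably reflect the versions the base model's remote code was tested against. The sketch below uses only the standard library's `importlib.metadata`. The `check_environment` helper is hypothetical. It checks only the major version for the `<1.0` huggingface_hub pin and only presence for the unpinned packages.

```python
from importlib import metadata
from typing import Callable, List, Optional


def check_environment(get_version: Callable[[str], str] = metadata.version) -> List[str]:
    """Return human-readable problems with the installed packages (empty list if none)."""
    problems: List[str] = []

    def installed(name: str) -> Optional[str]:
        # Look up the installed version, recording a problem if the package is missing
        try:
            return get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            return None

    tf_version = installed("transformers")
    if tf_version is not None and tf_version != "4.44.2":
        problems.append(f"transformers {tf_version} found, 4.44.2 expected")

    hub_version = installed("huggingface_hub")
    # "<1.0" pin: only the major version number is checked here
    if hub_version is not None and int(hub_version.split(".")[0]) >= 1:
        problems.append(f"huggingface_hub {hub_version} found, <1.0 expected")

    # The remaining packages are unpinned, so only their presence is checked
    for name in ("accelerate", "pillow", "einops"):
        installed(name)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

The version lookup is passed in as a parameter so the check can be exercised with a fake lookup table instead of the real environment.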
Load Tokenizer & Model
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
model_id = "Subh775/Perception-moondream2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    # device_map="auto" is intentionally not used; the model is moved to the GPU explicitly below
)
# Move the model to the GPU (the float16 weights are intended for a CUDA device)
model = model.to("cuda")
model.eval()
Inference
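# Example path from a Colab session; point this at your own traffic-camera frame
image_path = "/content/100130.jpg"
image = Image.open(image_path).convert("RGB")
# Encode the image once; the encoding can be reused for several questions
enc_image = model.encode_image(image)
# The training scenes include streets in Bengaluru, India, so the model may name
# the city unprompted. Ask explicitly for a location-free description.
prompt = (
    "Describe this traffic scene in detail. Focus strictly on the vehicles, "
    "pedestrians, infrastructure, and traffic density. Do not mention Bengaluru, "
    "India, or any specific geographic locations."
)
answer = model.answer_question(enc_image, prompt, tokenizer)
# Post-filter any location mentions the prompt did not suppress.
# More specific phrases are listed before the shorter ones they contain.
banned_phrases = ["in Bengaluru, India", "in Bengaluru", "Bengaluru, India,", "Bengaluru,"]
for banned in banned_phrases:
    answer = answer.replace(banned, "")
# Collapse the extra spaces left behind by the removals
answer = " ".join(answer.split())
print(answer)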
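The str.replace loop is case-sensitive and matches only the exact listed strings. For example, it misses "bengaluru" or "Bengaluru" followed by a space or a full stop. The sketch below uses a single case-insensitive regular expression instead. The `scrub_location` helper and its pattern are illustrative assumptions, not part of the model's code. The pattern removes an optional leading "in", the city name, an optional ", India", and a trailing comma. It then tidies the spacing around punctuation.

```python
import re

# Hypothetical pattern: optional "in", the city name, optional ", India", optional trailing comma.
# Standalone mentions of "India" are deliberately left alone.
_LOCATION = re.compile(r"\s*\b(?:in\s+)?Bengaluru(?:,\s*India)?,?", re.IGNORECASE)


def scrub_location(text: str) -> str:
    """Remove Bengaluru location mentions and tidy the spacing they leave behind."""
    text = _LOCATION.sub("", text)
    text = re.sub(r"\s+([,.;:])", r"\1", text)  # drop spaces left before punctuation
    text = re.sub(r"\s{2,}", " ", text)         # collapse runs of whitespace
    return text.strip()


if __name__ == "__main__":
    print(scrub_location("A busy street in Bengaluru, India, with many scooters."))
```

With the model loaded as above, this can replace the loop: print(scrub_location(answer)).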