---
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
library_name: transformers
license: mit
pipeline_tag: video-text-to-text
---
# VideoChat2-TPO

This model is based on the paper [Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment](https://huggingface.co/papers/2412.19326).

## 🏃 Installation

```
pip install -r requirements.txt
python app.py
```
## 🔧 Usage

```
from transformers import AutoModel, AutoTokenizer
# MultimodalLlamaTokenizer is defined in tokenizer.py in this repository
from tokenizer import MultimodalLlamaTokenizer

model_path = "OpenGVLab/VideoChat-TPO"
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True,
    use_fast=False,
)
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,
    _tokenizer=tokenizer,
).eval()
```
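
Video chat models in this family typically consume a fixed number of uniformly sampled frames rather than the full clip. The exact preprocessing is defined by the repository's remote code, so the following is only a minimal sketch of uniform temporal sampling; the function name and the 16-frame default are illustrative assumptions, not part of this model's documented API.

```python
def sample_frame_indices(total_frames, num_frames=16):
    """Pick `num_frames` evenly spaced frame indices from a clip
    of `total_frames` frames (illustrative sketch, not the model's
    own preprocessing code)."""
    if total_frames <= num_frames:
        # Clip is shorter than the frame budget: keep every frame.
        return list(range(total_frames))
    # Evenly spaced positions from the first to the last frame.
    step = (total_frames - 1) / (num_frames - 1)
    return [round(i * step) for i in range(num_frames)]
```

For example, `sample_frame_indices(100, num_frames=8)` returns `[0, 14, 28, 42, 57, 71, 85, 99]`; the selected frames would then be decoded and passed through the model's image processor before inference.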