Instructions to use Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2")# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2") model = AutoModelForCausalLM.from_pretrained("Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2 with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2
- SGLang
How to use Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2 with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2 with Docker Model Runner:
docker model run hf.co/Trelis/CodeLlama-34b-Instruct-hf-function-calling-v2
Is there a way to deploy using a Hugging Face inference endpoint?
Looking for an easy way to spool up briefly for testing! Could also be another deployment method.
Howdy @iyodev :
A) I've just enabled hosted inference.
- Will only work with an access token (and if you've paid and gotten access - which you have, I believe!)
- The free inference won't work because the model is larger than 10 GB in size.
B) Deploying on runpod is probably an easy option. You can try this template. Make sure to read the ReadMe and use an access token.
Thanks Ronan! Do you happen to have an example of how to use the hosted inference for this large of a model? Or do you mean I should create an inference endpoint?
Yeah the hosted inference won't work for models bigger than 10 GB . I think it maybe works if you have a HuggingFace paid plan, but I'm not sure.
Yes, create an inference endpoint, or deploy using runpod.
Thank you, the inference endpoint is working for me. Do you happen to know if any frameworks like langchain, llamaindex, litellm, etc already have the necessary formatting for function calling baked in?
unfortunately, I don't know @iyodev , but appreciate you keeping me posted if you gain any insights. Any learnings can inform a v3 for function calling.