Is there a way to deploy using a Hugging Face inference endpoint?

by iyodev - opened Nov 3, 2023

Discussion

iyodev

Nov 3, 2023

Looking for an easy way to spin the model up briefly for testing! Another deployment method would work too.

RonanMcGovern

Trelis org Nov 4, 2023

Howdy @iyodev:

A) I've just enabled hosted inference.

It will only work with an access token (and only if you've paid and gotten access, which you have, I believe!).
The free inference won't work because the model is larger than 10 GB.

B) Deploying on RunPod is probably an easy option. You can try this template. Make sure to read the README and use an access token.

iyodev

Nov 6, 2023 (edited Nov 6, 2023)

Thanks, Ronan! Do you happen to have an example of how to use the hosted inference for a model this large? Or do you mean I should create an inference endpoint?

RonanMcGovern

Trelis org Nov 7, 2023

Yeah, the hosted inference won't work for models bigger than 10 GB. I think it may work if you have a Hugging Face paid plan, but I'm not sure.

Yes, create an inference endpoint, or deploy using RunPod.
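
For readers who find this thread later, here is a minimal sketch of calling a dedicated inference endpoint once it is running. It assumes the endpoint serves the model through a text-generation container that accepts a JSON body with `inputs` and `parameters`; the endpoint URL and token are placeholders, and `build_request` and `generate` are hypothetical helper names, not part of any library.

```python
import json
import urllib.request


def build_request(endpoint_url: str, token: str, prompt: str,
                  max_new_tokens: int = 256) -> urllib.request.Request:
    """Build an authenticated POST request for a text-generation endpoint."""
    # Assumed payload shape: a prompt under "inputs" plus generation "parameters".
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The access token is sent as a bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(endpoint_url: str, token: str, prompt: str):
    """Send the request and return the decoded JSON response."""
    request = build_request(endpoint_url, token, prompt)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (placeholders, replace with your own endpoint URL and token):
# result = generate("https://<your-endpoint-url>", "<your-access-token>", "Hello")
```

The request is built separately from the network call so the headers and payload can be inspected before anything is sent. The exact response shape depends on the container behind the endpoint, so check it before relying on a particular key.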

iyodev

Nov 9, 2023

Thank you, the inference endpoint is working for me. Do you happen to know whether any frameworks like LangChain, LlamaIndex, LiteLLM, etc. already have the necessary formatting for function calling baked in?

RonanMcGovern

Trelis org Nov 9, 2023

Unfortunately, I don't know, @iyodev, but I'd appreciate you keeping me posted if you gain any insights. Any learnings can inform a v3 for function calling.
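
Without confirmed framework support, the function-call handling can be done by hand. The following is a rough sketch only, under the assumption (not confirmed in this thread) that the model emits its function call as a JSON object somewhere in the generated text. The key names `function` and `arguments`, the helpers `extract_function_call` and `dispatch`, and the `get_weather` example are all hypothetical; the model card is the place to check the real format.

```python
import json


def extract_function_call(generated_text: str):
    """Return the first JSON object found in the text as a dict, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(generated_text):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at the given index.
            obj, _end = decoder.raw_decode(generated_text, start)
        except json.JSONDecodeError:
            continue  # Not valid JSON from this brace; try the next one.
        if isinstance(obj, dict):
            return obj
    return None


def dispatch(call: dict, registry: dict):
    """Look up the named function and call it with the supplied arguments.

    Assumes hypothetical keys "function" and "arguments" in the call dict.
    """
    func = registry[call["function"]]
    return func(**call.get("arguments", {}))


# Example with a made-up model reply and a made-up function:
# reply = 'Sure.\n{"function": "get_weather", "arguments": {"city": "Dublin"}}'
# call = extract_function_call(reply)
# dispatch(call, {"get_weather": lambda city: f"weather for {city}"})
```

Scanning for the first parseable object tolerates extra text before or after the call, which is common in raw completions.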

RonanMcGovern changed discussion status to closed Nov 16, 2023
