Cedille AI
Cedille is a project to bring large language models to non-English languages.
de-anna
Anna is a 6B parameter autoregressive language model based on the GPT-J architecture and trained using the mesh-transformer-jax codebase.
Anna was trained on German text with a methodology similar to that of Boris, our French model. We started training from GPT-J, which was trained on The Pile. As a consequence, the model still performs well in English. Anna uses the unmodified GPT-2 tokenizer.
How to run
Loading the model
Base (requires 48+ GB of RAM)
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
model = AutoModelForCausalLM.from_pretrained("Cedille/de-anna")
Lower memory usage
Loading a model with Hugging Face Transformers holds two copies of the weights in memory by default, so GPT-J models in float32 precision need 48+ GB of RAM. A first trick is to pass the argument below, which loads only one copy of the weights.
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Cedille/de-anna")
model = AutoModelForCausalLM.from_pretrained("Cedille/de-anna", low_cpu_mem_usage=True)
We are planning to add an fp16 branch soon. Combined with the lower-memory loading above, the model could then be loaded in 12.1 GB of RAM.
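The memory figures above follow from simple arithmetic on the weight count. The sketch below reproduces them under one assumption that is not stated in this card: the parameter count `N_PARAMS` is the figure published for GPT-J-6B, which Anna is based on. It counts weights only, so activations and framework overhead come on top. The helper name `weight_gb` is illustrative.

```python
# Rough weight-memory estimate for a GPT-J-sized model (weights only).
N_PARAMS = 6_053_381_344  # assumption: parameter count published for GPT-J-6B

BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_gb(n_params, dtype, copies=1):
    """Memory needed for `copies` full sets of weights, in GB (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] * copies / 1e9


# Default loading (two float32 copies), single float32 copy, single fp16 copy.
for dtype, copies in [("float32", 2), ("float32", 1), ("float16", 1)]:
    print(f"{dtype} x{copies}: {weight_gb(N_PARAMS, dtype, copies):.1f} GB")
```

Two float32 copies come to roughly 48 GB, and a single fp16 copy to roughly 12.1 GB, which matches the numbers quoted in this section.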
Generation example
model.eval()
input_sentence = "Wo hast du unsere Sprache gelernt?"
input_ids = tokenizer.encode(input_sentence, return_tensors='pt')
# Sample one continuation using top-k and top-p (nucleus) sampling
sample_outputs = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    num_return_sequences=1
)
print(tokenizer.decode(sample_outputs[0], skip_special_tokens=True))
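To give some intuition for the `top_k=50` and `top_p=0.95` arguments in the generation example, here is a simplified, dependency-free sketch of how the two filters narrow the set of candidate next tokens before sampling. It is an illustration only, not the Transformers implementation, and the function name `top_k_top_p_filter` and the toy logits are hypothetical.

```python
import math


def top_k_top_p_filter(logits, top_k=50, top_p=0.95):
    """Return {token_index: probability} for the tokens that survive both filters."""
    # Softmax over all logits (shifted by the max for numerical stability).
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep the k most probable tokens and renormalize among them.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    # Renormalize the survivors so they form a distribution to sample from.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


# Toy vocabulary of four tokens with probabilities 0.5, 0.3, 0.15 and 0.05.
toy_logits = [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)]
print(top_k_top_p_filter(toy_logits, top_k=3, top_p=0.8))
```

With these toy values, top-k drops the least likely token and top-p then keeps only the two most likely ones. Lower `top_k` or `top_p` values make the output more conservative, and higher values allow more varied text.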
Contact us
For any custom development please contact us at hello@cedille.ai.