Weights not used when initializing the model
Started getting this error today after some changes were made to the phi model. The model does not use all the weights from the checkpoint.
In [2]: model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", torch_dtype="auto", device_map="cuda", trust_remote_code=True)
pytorch_model.bin: 100%|██████████| 2.84G/2.84G [00:24<00:00, 116MB/s]
Some weights of the model checkpoint at microsoft/phi-1_5 were not used when initializing PhiForCausalLM: ['model.layers.11.self_attn.q_proj.bias', 'model.layers.22.self_attn.k_proj.bias', 'model.layers.12.self_attn.q_proj.bias', 'model.layers.9.self_attn.k_proj.bias', 'model.layers.10.self_attn.q_proj.weight', 'model.layers.22.self_attn.q_proj.bias', 'model.layers.16.self_attn.v_proj.weight', 'model.layers.2.self_attn.k_proj.bias', 'model.layers.22.self_attn.v_proj.bias', 'model.layers.15.self_attn.v_proj.bias', 'model.layers.9.self_attn.k_proj.weight', 'model.layers.5.self_attn.q_proj.weight', 'model.layers.11.self_attn.v_proj.weight', 'model.layers.9.self_attn.q_proj.bias', 'model.layers.20.self_attn.k_proj.weight', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.3.self_attn.k_proj.bias', 'model.layers.8.self_attn.q_proj.bias', 'model.layers.8.self_attn.k_proj.weight', 'model.layers.9.self_attn.v_proj.bias', 'model.layers.21.self_attn.q_proj.weight', 'model.layers.1.self_attn.k_proj.weight', 'model.layers.3.self_attn.k_proj.weight', 'model.layers.12.self_attn.k_proj.bias', 'model.layers.8.self_attn.q_proj.weight', 'model.layers.17.self_attn.k_proj.bias', 'model.layers.7.self_attn.v_proj.bias', 'model.layers.13.self_attn.v_proj.bias', 'model.layers.20.self_attn.q_proj.bias', 'model.layers.9.self_attn.v_proj.weight', 'model.layers.4.self_attn.v_proj.bias', 'model.layers.7.self_attn.q_proj.weight', 'model.layers.17.self_attn.q_proj.bias', 'model.layers.16.self_attn.q_proj.bias', 'model.layers.19.self_attn.v_proj.bias', 'model.layers.21.self_attn.v_proj.weight', 'model.layers.5.self_attn.q_proj.bias', 'model.layers.15.self_attn.k_proj.weight', 'model.layers.6.self_attn.k_proj.bias', 'model.layers.15.self_attn.q_proj.weight', 'model.layers.7.self_attn.q_proj.bias', 'model.layers.6.self_attn.q_proj.weight', 'model.layers.17.self_attn.v_proj.weight', 'model.layers.21.self_attn.k_proj.bias', 'model.layers.3.self_attn.q_proj.weight', 
'model.layers.10.self_attn.v_proj.weight', 'model.layers.10.self_attn.v_proj.bias', 'model.layers.16.self_attn.k_proj.bias', 'model.layers.2.self_attn.k_proj.weight', 'model.layers.12.self_attn.q_proj.weight', 'model.layers.14.self_attn.k_proj.weight', 'model.layers.14.self_attn.q_proj.weight', 'model.layers.19.self_attn.q_proj.bias', 'model.layers.4.self_attn.v_proj.weight', 'model.layers.18.self_attn.q_proj.bias', 'model.layers.7.self_attn.v_proj.weight', 'model.layers.19.self_attn.q_proj.weight', 'model.layers.3.self_attn.v_proj.weight', 'model.layers.11.self_attn.v_proj.bias', 'model.layers.6.self_attn.q_proj.bias', 'model.layers.18.self_attn.v_proj.bias', 'model.layers.16.self_attn.q_proj.weight', 'model.layers.2.self_attn.v_proj.weight', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.4.self_attn.k_proj.weight', 'model.layers.17.self_attn.k_proj.weight', 'model.layers.5.self_attn.v_proj.bias', 'model.layers.14.self_attn.k_proj.bias', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.20.self_attn.v_proj.bias', 'model.layers.23.self_attn.v_proj.weight', 'model.layers.20.self_attn.v_proj.weight', 'model.layers.0.self_attn.k_proj.weight', 'model.layers.8.self_attn.v_proj.bias', 'model.layers.14.self_attn.v_proj.weight', 'model.layers.23.self_attn.q_proj.weight', 'model.layers.17.self_attn.q_proj.weight', 'model.layers.7.self_attn.k_proj.weight', 'model.layers.13.self_attn.q_proj.bias', 'model.layers.15.self_attn.k_proj.bias', 'model.layers.20.self_attn.q_proj.weight', 'model.layers.6.self_attn.v_proj.weight', 'model.layers.19.self_attn.k_proj.bias', 'model.layers.12.self_attn.v_proj.weight', 'model.layers.0.self_attn.k_proj.bias', 'model.layers.18.self_attn.q_proj.weight', 'model.layers.12.self_attn.v_proj.bias', 'model.layers.16.self_attn.v_proj.bias', 'model.layers.1.self_attn.q_proj.weight', 'model.layers.17.self_attn.v_proj.bias', 'model.layers.21.self_attn.q_proj.bias', 'model.layers.22.self_attn.k_proj.weight', 
'model.layers.3.self_attn.q_proj.bias', 'model.layers.11.self_attn.k_proj.weight', 'model.layers.0.self_attn.q_proj.weight', 'model.layers.23.self_attn.k_proj.weight', 'model.layers.10.self_attn.q_proj.bias', 'model.layers.18.self_attn.v_proj.weight', 'model.layers.22.self_attn.v_proj.weight', 'model.layers.23.self_attn.q_proj.bias', 'model.layers.22.self_attn.q_proj.weight', 'model.layers.8.self_attn.k_proj.bias', 'model.layers.11.self_attn.q_proj.weight', 'model.layers.8.self_attn.v_proj.weight', 'model.layers.18.self_attn.k_proj.weight', 'model.layers.2.self_attn.q_proj.bias', 'model.layers.1.self_attn.v_proj.weight', 'model.layers.11.self_attn.k_proj.bias', 'model.layers.13.self_attn.v_proj.weight', 'model.layers.0.self_attn.v_proj.weight', 'model.layers.5.self_attn.k_proj.weight', 'model.layers.9.self_attn.q_proj.weight', 'model.layers.18.self_attn.k_proj.bias', 'model.layers.14.self_attn.v_proj.bias', 'model.layers.15.self_attn.q_proj.bias', 'model.layers.21.self_attn.k_proj.weight', 'model.layers.12.self_attn.k_proj.weight', 'model.layers.15.self_attn.v_proj.weight', 'model.layers.13.self_attn.k_proj.bias', 'model.layers.7.self_attn.k_proj.bias', 'model.layers.13.self_attn.q_proj.weight', 'model.layers.21.self_attn.v_proj.bias', 'model.layers.4.self_attn.q_proj.weight', 'model.layers.16.self_attn.k_proj.weight', 'model.layers.23.self_attn.k_proj.bias', 'model.layers.6.self_attn.v_proj.bias', 'model.layers.19.self_attn.v_proj.weight', 'model.layers.20.self_attn.k_proj.bias', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.5.self_attn.v_proj.weight', 'model.layers.23.self_attn.v_proj.bias', 'model.layers.2.self_attn.q_proj.weight', 'model.layers.13.self_attn.k_proj.weight', 'model.layers.2.self_attn.v_proj.bias', 'model.layers.14.self_attn.q_proj.bias', 'model.layers.10.self_attn.k_proj.weight', 'model.layers.3.self_attn.v_proj.bias', 'model.layers.4.self_attn.q_proj.bias', 
'model.layers.19.self_attn.k_proj.weight', 'model.layers.4.self_attn.k_proj.bias', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.6.self_attn.k_proj.weight', 'model.layers.5.self_attn.k_proj.bias']
- This IS expected if you are initializing PhiForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing PhiForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of PhiForCausalLM were not initialized from the model checkpoint at microsoft/phi-1_5 and are newly initialized: ['model.layers.12.self_attn.query_key_value.weight', 'model.layers.7.self_attn.query_key_value.weight', 'model.layers.15.self_attn.query_key_value.bias', 'model.layers.18.self_attn.query_key_value.bias', 'model.layers.21.self_attn.query_key_value.weight', 'model.layers.6.self_attn.query_key_value.bias', 'model.layers.18.self_attn.query_key_value.weight', 'model.layers.17.self_attn.query_key_value.weight', 'model.layers.4.self_attn.query_key_value.bias', 'model.layers.4.self_attn.query_key_value.weight', 'model.layers.8.self_attn.query_key_value.weight', 'model.layers.16.self_attn.query_key_value.bias', 'model.layers.19.self_attn.query_key_value.weight', 'model.layers.21.self_attn.query_key_value.bias', 'model.layers.7.self_attn.query_key_value.bias', 'model.layers.3.self_attn.query_key_value.weight', 'model.layers.2.self_attn.query_key_value.bias', 'model.layers.17.self_attn.query_key_value.bias', 'model.layers.9.self_attn.query_key_value.weight', 'model.layers.13.self_attn.query_key_value.weight', 'model.layers.6.self_attn.query_key_value.weight', 'model.layers.1.self_attn.query_key_value.weight', 'model.layers.22.self_attn.query_key_value.weight', 'model.layers.2.self_attn.query_key_value.weight', 'model.layers.23.self_attn.query_key_value.bias', 'model.layers.0.self_attn.query_key_value.bias', 'model.layers.15.self_attn.query_key_value.weight', 'model.layers.10.self_attn.query_key_value.weight', 'model.layers.23.self_attn.query_key_value.weight', 'model.layers.0.self_attn.query_key_value.weight', 'model.layers.5.self_attn.query_key_value.weight', 'model.layers.5.self_attn.query_key_value.bias', 'model.layers.22.self_attn.query_key_value.bias', 'model.layers.11.self_attn.query_key_value.bias', 'model.layers.10.self_attn.query_key_value.bias', 'model.layers.19.self_attn.query_key_value.bias', 
'model.layers.14.self_attn.query_key_value.weight', 'model.layers.8.self_attn.query_key_value.bias', 'model.layers.20.self_attn.query_key_value.bias', 'model.layers.9.self_attn.query_key_value.bias', 'model.layers.16.self_attn.query_key_value.weight', 'model.layers.14.self_attn.query_key_value.bias', 'model.layers.12.self_attn.query_key_value.bias', 'model.layers.20.self_attn.query_key_value.weight', 'model.layers.13.self_attn.query_key_value.bias', 'model.layers.3.self_attn.query_key_value.bias', 'model.layers.11.self_attn.query_key_value.weight', 'model.layers.1.self_attn.query_key_value.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
generation_config.json: 100%|██████████| 74.0/74.0 [00:00<00:00, 70.9kB/s]
The Phi implementation in the transformers library has a different "architecture": for one, instead of separate q_proj/k_proj/v_proj layers it has a single query_key_value layer.
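The mismatch in the warning can be reproduced on paper by comparing parameter names. This is only a minimal sketch: `diff_state_dict_keys` is a hypothetical helper, and the key lists are toy stand-ins for one layer, not the real state dicts.

```python
def diff_state_dict_keys(checkpoint_keys, model_keys):
    """Return (names only in the checkpoint, names only in the model)."""
    checkpoint_keys = set(checkpoint_keys)
    model_keys = set(model_keys)
    unused = sorted(checkpoint_keys - model_keys)             # "were not used"
    newly_initialized = sorted(model_keys - checkpoint_keys)  # "newly initialized"
    return unused, newly_initialized


# Toy parameter names for a single layer (illustrative only).
checkpoint_keys = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.k_proj.weight",
    "model.layers.0.self_attn.v_proj.weight",
    "model.layers.0.self_attn.dense.weight",
]
model_keys = [
    "model.layers.0.self_attn.query_key_value.weight",
    "model.layers.0.self_attn.dense.weight",
]

unused, newly_initialized = diff_state_dict_keys(checkpoint_keys, model_keys)
print("not used:", unused)
print("newly initialized:", newly_initialized)
```

With a real model, the same comparison could presumably be run on the keys of the downloaded checkpoint and `model.state_dict().keys()`.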
The simplest solution is to change the config so that the files provided in the repository are used.
This worked for me; some steps might be redundant:
- I copied modeling_phi.py to modeling_phi_1_5.py and configuration_phi.py to configuration_phi_1_5.py, to avoid a filename collision with transformers in case it checks for one
- Added this into config.json:
"auto_map": {
"AutoConfig": "configuration_phi_1_5.PhiConfig",
"AutoModelForCausalLM": "modeling_phi_1_5.PhiForCausalLM"
},
- Changed model_type to "model_type": "phi_1_5" (I think without this change transformers didn't try to load the custom code)
- Changed architectures to "PhiForCausalLM_1_5" (I didn't change the .py file beyond renaming)
After these changes the model loaded successfully.
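The config edits listed above could be applied with a small script like the one below. This is a sketch, not a tested recipe: `patch_config_for_remote_code` is a hypothetical helper, the starting dict is a stand-in for the real config.json, and it assumes `architectures` is stored as a list, as is usual in config.json files.

```python
import copy


def patch_config_for_remote_code(config):
    """Return a copy of a config dict with the edits described in this post."""
    patched = copy.deepcopy(config)
    # Point the Auto classes at the renamed files in the local model folder.
    patched["auto_map"] = {
        "AutoConfig": "configuration_phi_1_5.PhiConfig",
        "AutoModelForCausalLM": "modeling_phi_1_5.PhiForCausalLM",
    }
    # Rename the model type and architecture as described in the steps above.
    patched["model_type"] = "phi_1_5"
    patched["architectures"] = ["PhiForCausalLM_1_5"]
    return patched


# Stand-in for the contents of config.json (illustrative values only).
original = {"model_type": "phi", "architectures": ["PhiForCausalLM"]}
patched = patch_config_for_remote_code(original)
print(patched)
```

To use it on a local copy of the model, the dict would be read from and written back to config.json with `json.load` and `json.dump`.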
Write a detailed analogy between mathematics and a lighthouse.
Answer: Mathematics is like a lighthouse, guiding us through the complex world of numbers and calculations. It illumin<MAX_NEW_TOKENS_REACHED>
(Interestingly, even with do_sample=False I get a different result from the one on the model card: Mathematics is like a lighthouse, guiding us through the vast ocean of numbers and calculations. Just as a lighthouse illuminates....)
Hello @nihalnayak !
We just pushed a fix to the config.json and it should work now. The auto_map key was missing and hence it was not properly using the files on this repository when trust_remote_code=True.
Best regards,
Gustavo.
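One way to confirm that the repository's files are actually being used is to look at the module of the loaded class. The sketch below assumes that transformers places dynamically loaded code under a top-level `transformers_modules` package; `loaded_from_remote_code` is a hypothetical helper, and the two classes are stand-ins built only for the demonstration.

```python
def loaded_from_remote_code(obj):
    """Guess whether obj's class was loaded from repository code.

    Assumption: code loaded via trust_remote_code=True lives under the
    top-level package "transformers_modules".
    """
    return type(obj).__module__.split(".")[0] == "transformers_modules"


# Stand-in classes with hand-set module paths (illustrative only).
RemotePhi = type(
    "PhiForCausalLM", (), {"__module__": "transformers_modules.example_repo.modeling_phi"}
)
BuiltinPhi = type(
    "PhiForCausalLM", (), {"__module__": "transformers.models.phi.modeling_phi"}
)

remote_result = loaded_from_remote_code(RemotePhi())
builtin_result = loaded_from_remote_code(BuiltinPhi())
print(remote_result, builtin_result)
```

With a real model, the same check would be called on the object returned by `AutoModelForCausalLM.from_pretrained(..., trust_remote_code=True)`.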