tool calling failed with claude code
#30
by weisunding - opened
The suggested tool calling with vLLM failed, specified the shipped chat template seems work.
--enable-auto-tool-choice \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder \
--chat-template /chat/qwen-3.6.jinja
Streaming failed, falling back to non-streaming: API error: Streaming API error 400 Bad Request: {"error":{"message":"Can only get item pairs from a mapping.","type":"BadRequestError","param":null,"code":400}}
â error: API error: Non-streaming API error 400 Bad Request: {"error":{"message":"Can only get item pairs from a mapping.","type":"BadRequestError","param":null,"code":400}}
update, using latest vllm seems worked!
sudo docker pull vllm/vllm-openai:latest
vllm serve Qwen/Qwen3.6-35B-A3B \
--served-model-name beast \
--api-key secret \
--host 0.0.0.0 \
--port 8000 \
--tensor-parallel-size 2 \
--kv-cache-dtype fp8 \
--max-num-seqs 32 \
--max-model-len 204800 \
--gpu-memory-utilization 0.85 \
--max-num-batched-tokens 8192 \
--disable-custom-all-reduce \
--enable-prefix-caching \
--enable-chunked-prefill \
--trust-remote-code \
--speculative-config '{"method":"mtp","num_speculative_tokens":2}' \
--enable-auto-tool-choice \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder