tool calling failed with claude code

#30
by weisunding - opened

The suggested tool calling with vLLM failed, specified the shipped chat template seems work.

    --enable-auto-tool-choice \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen3_coder \
    --chat-template /chat/qwen-3.6.jinja

Streaming failed, falling back to non-streaming: API error: Streaming API error 400 Bad Request: {"error":{"message":"Can only get item pairs from a mapping.","type":"BadRequestError","param":null,"code":400}}

✗ error: API error: Non-streaming API error 400 Bad Request: {"error":{"message":"Can only get item pairs from a mapping.","type":"BadRequestError","param":null,"code":400}}

update, using latest vllm seems worked!

sudo docker pull vllm/vllm-openai:latest

vllm serve Qwen/Qwen3.6-35B-A3B \
    --served-model-name beast \
    --api-key secret \
    --host 0.0.0.0 \
    --port 8000 \
    --tensor-parallel-size 2 \
    --kv-cache-dtype fp8 \
    --max-num-seqs 32 \
    --max-model-len 204800 \
    --gpu-memory-utilization 0.85 \
    --max-num-batched-tokens 8192 \
    --disable-custom-all-reduce \
    --enable-prefix-caching \
    --enable-chunked-prefill \
    --trust-remote-code \
    --speculative-config '{"method":"mtp","num_speculative_tokens":2}' \
    --enable-auto-tool-choice \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen3_coder

Sign up or log in to comment