Chinese Domestic LLM Performance Comparison
好的模型,现在就是需要 2 到七八秒的返回。
wxside gateway: model latency sample (successful requests)
Against https://api.wxside.com OpenAI-compatible API, one POST /v1/chat/completions per model; prompt: “Reply with exactly one English word: ok”, max_tokens=128. A single keep-alive TCP/TLS connection; requests run sequentially.
Timing: End-to-end is measured on the client; gateway is the X-Response-Time-Ms header (includes upstream wait); rest ≈ end-to-end minus gateway (local stack and network—indicative only).
Environment (from headers): gateway 0.56.2-rc1; sample time around 2026-04-21 22:56 CST. One-off sample, not a load test.
| Model | E2E (ms) | Gateway (ms) | Rest ≈ (ms) |
|---|---|---|---|
| qwen-turbo | 297 | 281 | 16 |
| qwen3-coder-480b-a35b-instruct | 339 | 316 | 23 |
| doubao-seed-1.6-flash | 591 | 566 | 25 |
| doubao-1.5-pro-32k | 603 | 579 | 24 |
| deepseek/deepseek-v3.2-251201 | 1207 | 1185 | 22 |
| qwen3-max | 1916 | 1892 | 24 |
| minimax/minimax-m2.7 | 2318 | 2297 | 21 |
| moonshotai/kimi-k2.5 | 2774 | 2752 | 22 |
| doubao-seed-1.6 | 3437 | 3399 | 38 |
| doubao-seed-2.0-code | 4097 | 3985 | 112 |
| z-ai/glm-5 | 5389 | 5367 | 22 |
| z-ai/glm-5.1 | 5834 | 5815 | 19 |
| doubao-seed-2.0-pro | 6362 | 6320 | 42 |
| moonshotai/kimi-k2.6 | 9759 | 9736 | 23 |
Single sample; latency depends on load and output length. Some models return reasoning traces, so latency can exceed what a one-token reply might suggest.