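Data source: SuperCLUE, which is based on CLUE, the Chinese Language Understanding Evaluation benchmark. The rankings in this AI large-model leaderboard are for reference only.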
Organization & Model | Open/Closed Source | Total | Math Reasoning | Science Reasoning | Code Generation | Agent | Instruction Following | Hallucination Control |
---|---|---|---|---|---|---|---|---|
OpenAI:GPT-5(high) | Closed | 75.34 | 76.86 | 54.2 | 87.92 | 83.21 | 66.77 | 83.08 |
OpenAI:o3(high) | Closed | 73.78 | 81.6 | 59.7 | 83.76 | 76.12 | 64.09 | 77.41 |
OpenAI:o4-mini(high) | Closed | 73.32 | 76.34 | 55.56 | 86.14 | 86.36 | 59.64 | 75.9 |
DeepSeek:DeepSeek-V3.1(Thinking) | Open | 69.56 | 70.83 | 47.41 | 83.56 | 86.19 | 47.18 | 82.19 |
Google:Gemini-2.5-Pro | Closed | 68.98 | 74.05 | 52.59 | 82.38 | 82.09 | 39.76 | 83 |
ByteDance:Doubao-Seed-1.6-thinking-250715 | Closed | 68.04 | 70.77 | 40.49 | 84.36 | 90.67 | 32.34 | 89.59 |
Anthropic:Claude-Opus-4-Reasoning | Closed | 67.02 | 60.16 | 45.93 | 80.4 | 80.6 | 44.81 | 90.24 |
DeepSeek:DeepSeek-R1-0528 | Open | 66.15 | 75 | 48.15 | 74.06 | 82.09 | 35.31 | 82.28 |
Google:Gemini-2.5-Flash | Closed | 64.96 | 65.89 | 44.44 | 73.86 | 82.09 | 34.12 | 89.36 |
X.AI:grok-4 | Closed | 64.84 | 69.11 | 44.44 | 82.18 | 83.71 | 29.97 | 79.65 |
Alibaba:Qwen3-235B-A22B-Thinking-2507 | Open | 64.34 | 68.7 | 42.22 | 81.78 | 74.25 | 43.62 | 75.44 |
Tencent:Hunyuan-T1-20250711 | Closed | 63.73 | 68.7 | 37.78 | 73.27 | 76.49 | 42.73 | 83.39 |
Zhipu AI:GLM-4.5 | Open | 63.25 | 66.67 | 44.62 | 79.01 | 83.58 | 27.22 | 78.42 |
Alibaba:Qwen3-235B-A22B-Instruct-2507 | Open | 60.79 | 64.57 | 41.48 | 78.42 | 80.97 | 24.93 | 74.4 |
SenseTime:SenseNova V6 Reasoner | Closed | 60.73 | 63.36 | 40 | 80.2 | 83.58 | 21.66 | 75.57 |
OpenAI:gpt-oss-120b | Open | 59.35 | 67.18 | 47.41 | 72.28 | 72.76 | 41.25 | 55.23 |
Baidu:ERNIE-X1-Turbo-32K-Preview | Closed | 58.84 | 55.12 | 35.61 | 73.66 | 72.01 | 31.16 | 85.51 |
Alibaba:Qwen3-235B-A22B(Thinking) | Open | 58.01 | 63.36 | 40 | 77.23 | 83.21 | 17.51 | 66.74 |
DeepSeek:DeepSeek-V3-0324 | Open | 57.46 | 52.67 | 36.67 | 72.48 | 81.34 | 19.88 | 81.71 |
Moonshot AI:kimi-k2-0711-preview | Open | 56.9 | 53.54 | 37.97 | 80 | 76.87 | 19.29 | 73.7 |
360:360zhinao2-o1.5 | Closed | 55.44 | 46.56 | 31.11 | 76.63 | 75.75 | 28.49 | 74.11 |
Alibaba:Qwen3-32B(Thinking) | Open | 55.28 | 57.25 | 34.07 | 76.24 | 78.36 | 18.99 | 66.78 |
OpenAI:gpt-oss-20b | Open | 54.37 | 62.6 | 38.52 | 72.67 | 68.28 | 35.31 | 48.86 |
Mistral AI:Magistral-medium-2506 | Closed | 54.31 | 55.77 | 35.11 | 75.45 | 61.94 | 23.8 | 73.78 |
Alibaba:Qwen3-14B(Thinking) | Open | 53.94 | 57.25 | 29.63 | 72.48 | 77.24 | 18.99 | 68.07 |
OpenAI:ChatGPT-4o-latest | Closed | 52.46 | 29.77 | 26.67 | 78.81 | 81.72 | 19.58 | 78.18 |
MiniMax:MiniMax-M1 | Open | 51.07 | 53.72 | 34.96 | 75.84 | 54.37 | 16.67 | 70.84 |
Alibaba:Qwen3-8B(Thinking) | Open | 48.38 | 47.33 | 18.52 | 65.54 | 77.24 | 14.84 | 66.78 |
Alibaba:Qwen3-30B-A3B(Thinking) | Open | 47.51 | 55.73 | 28.15 | 71.49 | 45.52 | 16.91 | 67.26 |
Meta:Llama-4-Maverick-17B-128E-Instruct-FP8 | Open | 46.37 | 18.32 | 20 | 72.48 | 70.15 | 21.96 | 75.29 |
iFLYTEK:Spark X1 | Closed | 45.83 | 46.56 | 16.3 | 65.54 | 60.07 | 14.24 | 72.29 |
StepFun:Step-2-16k | Closed | 43.45 | 7.63 | 11.11 | 65.94 | 78.73 | 14.24 | 83.03 |
Alibaba:Qwen3-4B(Thinking) | Open | 39.21 | 49.62 | 15.93 | 56.04 | 33.21 | 14.24 | 66.23 |
Meta:Llama-3.3-70B-Instruct | Open | 38.84 | 7.63 | 10.37 | 64.16 | 60.82 | 18.99 | 71.07 |
iFLYTEK:Spark4.0 Ultra | Closed | 36.7 | 19.38 | 8.89 | 56.04 | 46.64 | 12.46 | 76.81 |
Google:Gemma-3-27b-it | Open | 34.33 | 19.08 | 7.41 | 58.37 | 36.57 | 12.46 | 72.06 |
Zhipu AI:GLM-Z1-9B-0414 | Open | 32.93 | 54.2 | 20 | 28.51 | 46.64 | 10.9 | 37.32 |
Google:Gemma-3-12b-it | Open | 32.47 | 11.45 | 5.93 | 53.47 | 44.03 | 11.28 | 68.69 |
Microsoft:Phi-4 | Open | 31.73 | 16.79 | 13.33 | 59.01 | 36.94 | 6.82 | 57.51 |
ModelBest:MiniCPM4-8B | Open | 26.74 | 7.63 | 4.07 | 42.77 | 39.55 | 5.34 | 61.05 |
Mistral AI:Ministral-8B-latest | Open | 25.01 | 6.87 | 2.96 | 39.41 | 30.6 | 2.99 | 67.22 |
Meta:Llama-3.1-8B-Instruct | Open | 21.81 | 1.53 | 1.48 | 37.03 | 25.37 | 5.04 | 60.39 |
Alibaba:Qwen3-1.7B(Thinking) | Open | 21.8 | 30.53 | 4.44 | 34.46 | 0 | 6.53 | 54.82 |
Google:Gemma-3-4b-it | Open | 20.21 | 6.87 | 0.74 | 38.42 | 5.06 | 8.04 | 62.12 |
Google:Gemma-3n-E4B-it | Open | 19.7 | 6.87 | 1.48 | 36.24 | 1.19 | 10.09 | 62.3 |
01.AI:Yi-1.5-9B-Chat-16K | Open | 19.55 | 3.82 | 0.74 | 22.57 | 24.63 | 4.45 | 61.08 |
Google:Gemma-3n-E2B-it | Open | 15.85 | 6.11 | 0.74 | 22.77 | 0 | 8.01 | 57.48 |
Alibaba:Qwen3-0.6B(Thinking) | Open | 12.4 | 9.16 | 1.48 | 11.49 | 0.75 | 2.67 | 48.88 |
Google:Gemma-3-1b-it | Open | 10.36 | 0 | 0 | 11.49 | 0 | 3.28 | 47.37 |
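
For readers who want to sanity-check the table, the total score appears to match the unweighted mean of the six dimension scores, at least for the rows sampled below. This is an observation drawn from the numbers, not a formula documented in the source. A minimal Python sketch, in which the helper names `mean_score` and `matches_total` and the 0.01 tolerance are illustrative assumptions:

```python
# Rows copied from the table: (model, total, six dimension scores in column order).
rows = [
    ("OpenAI:GPT-5(high)", 75.34, [76.86, 54.2, 87.92, 83.21, 66.77, 83.08]),
    ("OpenAI:o3(high)", 73.78, [81.6, 59.7, 83.76, 76.12, 64.09, 77.41]),
    ("OpenAI:o4-mini(high)", 73.32, [76.34, 55.56, 86.14, 86.36, 59.64, 75.9]),
    ("Google:Gemma-3-1b-it", 10.36, [0, 0, 11.49, 0, 3.28, 47.37]),
]


def mean_score(scores):
    """Unweighted arithmetic mean of the dimension scores."""
    return sum(scores) / len(scores)


def matches_total(total, scores, tol=0.01):
    """True if the published total agrees with the mean within `tol`.

    The tolerance (an assumption) absorbs rounding to two decimals.
    """
    return abs(mean_score(scores) - total) <= tol


for model, total, scores in rows:
    status = "consistent" if matches_total(total, scores) else "differs"
    print(f"{model}: published {total}, mean {mean_score(scores):.2f} ({status})")
```

If a row were to report "differs", that would suggest either a transcription error in that row or a weighting scheme other than a plain average.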