- {
- "headers": [
- "Model",
- "Average ⬇️",
- "Precision",
- "# Parameters",
- "Multilingual",
- "Model Type",
- "Submission",
- "Submission Date",
- "Cultural Knowledge",
- "Classical NLP",
- "Reading Comprehension",
- "Generation",
- "Incomplete"
- ],
- "data": [
- [
- "<a target="_blank" href="https://platform.openai.com/docs/models" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">gpt-4o-2024-08-06</a>",
- 74.27,
- "?",
- -1,
- "🟢 Multilingual",
- "Unknown",
- false,
- "",
- 73.29,
- 89.03,
- 80.12,
- 54.65,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8</a>",
- 67.67,
- "?",
- 400,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 76.75,
- 87.28,
- 72.99,
- 33.67,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">meta-llama/Llama-4-Scout-17B-16E-Instruct</a>",
- 63.2,
- "?",
- 109,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 74.31,
- 87.88,
- 70.86,
- 19.75,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen2.5-72B-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen2.5-72B-Instruct</a>",
- 63.08,
- "?",
- 72,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 73.11,
- 88.6,
- 75.62,
- 14.98,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Tower-Babel/Babel-83B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Tower-Babel/Babel-83B</a>",
- 61.13,
- "?",
- 83,
- "🟢 Multilingual",
- "Base",
- false,
- "",
- 74.96,
- 89.2,
- 64.25,
- 16.11,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/aisingapore/Llama-SEA-LION-v3-70B-IT" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">aisingapore/Llama-SEA-LION-v3-70B-IT</a>",
- 61.07,
- "?",
- 70,
- "🔵 SEA-Focused",
- "SFT",
- false,
- "",
- 76.78,
- 89.99,
- 53.56,
- 23.95,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Tower-Babel/Babel-83B-Chat" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Tower-Babel/Babel-83B-Chat</a>",
- 60.85,
- "?",
- 83,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 75.21,
- 88.81,
- 64.85,
- 14.53,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">meta-llama/Llama-3.1-70B-Instruct</a>",
- 59.66,
- "bfloat16",
- 70,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 72.16,
- 90.27,
- 52.17,
- 24.03,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/sail/Sailor2-20B-Chat" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">sail/Sailor2-20B-Chat</a>",
- 58.61,
- "bfloat16",
- 20,
- "🔵 SEA-Focused",
- "Preference-aligned",
- false,
- "",
- 66.43,
- 89.03,
- 63.03,
- 15.95,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen2.5-32B-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen2.5-32B-Instruct</a>",
- 57.88,
- "bfloat16",
- 32,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 66.83,
- 89.32,
- 70.59,
- 4.79,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen3-32B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen3-32B</a>",
- 57.65,
- "?",
- 32,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 71.29,
- 87.26,
- 65.48,
- 6.58,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/aisingapore/gemma2-9b-cpt-sea-lionv3-instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">aisingapore/gemma2-9b-cpt-sea-lionv3-instruct</a>",
- 56.14,
- "bfloat16",
- 9,
- "🔵 SEA-Focused",
- "SFT",
- false,
- "",
- 64.44,
- 88.55,
- 54.46,
- 17.1,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/aisingapore/Llama-SEA-LION-v3.5-70B-R" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">aisingapore/Llama-SEA-LION-v3.5-70B-R</a>",
- 55.86,
- "?",
- 70,
- "🔵 SEA-Focused",
- "SFT",
- false,
- "",
- 70.4,
- 83.75,
- 54.41,
- 14.88,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/google/gemma-2-27b-it" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">google/gemma-2-27b-it</a>",
- 55.22,
- "?",
- 27,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 68.76,
- 87.99,
- 48.77,
- 15.38,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/google/gemma-3-27b-it" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">google/gemma-3-27b-it</a>",
- 55.17,
- "?",
- 27,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 71.41,
- 88.61,
- 53.23,
- 7.42,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">mistralai/Mixtral-8x22B-Instruct-v0.1</a>",
- 54.28,
- "?",
- 141,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 54.47,
- 87.19,
- 64.78,
- 10.7,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/google/gemma-3-12b-it" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">google/gemma-3-12b-it</a>",
- 54.04,
- "?",
- 12,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 67.34,
- 88.56,
- 54.93,
- 5.34,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/google/gemma-2-9b-it" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">google/gemma-2-9b-it</a>",
- 53.33,
- "?",
- 9,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 63.69,
- 87.47,
- 50.65,
- 11.51,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen3-14B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen3-14B</a>",
- 53.23,
- "?",
- 14,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 59.28,
- 83.45,
- 66.11,
- 4.07,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen3-8B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen3-8B</a>",
- 52.93,
- "?",
- 8,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 59.82,
- 79.76,
- 67.3,
- 4.83,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/CohereLabs/c4ai-command-a-03-2025" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">CohereLabs/c4ai-command-a-03-2025</a>",
- 52.82,
- "?",
- 111,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 69.95,
- 88.97,
- 46.82,
- 5.55,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Tower-Babel/Babel-9B-Chat" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Tower-Babel/Babel-9B-Chat</a>",
- 52.75,
- "bfloat16",
- 9,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 60.06,
- 87.67,
- 56.49,
- 6.79,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/sail/Sailor2-8B-Chat" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">sail/Sailor2-8B-Chat</a>",
- 52.49,
- "?",
- 8,
- "🔵 SEA-Focused",
- "Preference-aligned",
- false,
- "",
- 58.94,
- 86.03,
- 50.69,
- 14.29,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen2.5-14B-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen2.5-14B-Instruct</a>",
- 52.41,
- "bfloat16",
- 14,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 59.27,
- 86.27,
- 59.95,
- 4.14,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen2.5-7B-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen2.5-7B-Instruct</a>",
- 50.46,
- "bfloat16",
- 7,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 51.61,
- 85.58,
- 60.47,
- 4.19,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/aisingapore/llama3.1-8b-cpt-sea-lionv3-instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">aisingapore/llama3.1-8b-cpt-sea-lionv3-instruct</a>",
- 50.32,
- "bfloat16",
- 8,
- "🔵 SEA-Focused",
- "SFT",
- false,
- "",
- 59.89,
- 83.33,
- 47.47,
- 10.6,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">mistralai/Mixtral-8x7B-Instruct-v0.1</a>",
- 50.26,
- "?",
- 47,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 49.88,
- 84.19,
- 60.95,
- 6.02,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/aisingapore/Llama-SEA-LION-v3.5-8B-R" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">aisingapore/Llama-SEA-LION-v3.5-8B-R</a>",
- 49.43,
- "?",
- 8,
- "🔵 SEA-Focused",
- "SFT",
- false,
- "",
- 59.43,
- 82.02,
- 47.38,
- 8.89,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/SeaLLMs/SeaLLMs-v3-7B-Chat" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">SeaLLMs/SeaLLMs-v3-7B-Chat</a>",
- 49.06,
- "bfloat16",
- 7,
- "🔵 SEA-Focused",
- "SFT",
- false,
- "",
- 52.04,
- 79.68,
- 62.47,
- 2.08,
- true
- ],
- [
- "<a target="_blank" href="https://platform.openai.com/docs/models" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">gpt-4o-mini</a>",
- 48.47,
- "?",
- -1,
- "🟢 Multilingual",
- "Unknown",
- false,
- "",
- 25.09,
- 73.12,
- 47.78,
- 47.88,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen3-4B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen3-4B</a>",
- 48.42,
- "?",
- 4,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 55.21,
- 83.53,
- 51.85,
- 3.08,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/CohereForAI/aya-expanse-32b" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">CohereForAI/aya-expanse-32b</a>",
- 47.84,
- "bfloat16",
- 32,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 53.22,
- 87.47,
- 46.09,
- 4.58,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">meta-llama/Llama-3.1-8B-Instruct</a>",
- 47.38,
- "bfloat16",
- 8,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 52.08,
- 86.61,
- 46.42,
- 4.42,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/mistralai/Ministral-8B-Instruct-2410" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">mistralai/Ministral-8B-Instruct-2410</a>",
- 47.33,
- "bfloat16",
- 8,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 42.02,
- 77.95,
- 62.33,
- 7,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/neulab/Pangea-7B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">neulab/Pangea-7B</a>",
- 43.98,
- "bfloat16",
- 7,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 46.23,
- 78.8,
- 47.74,
- 3.15,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/SeaLLMs/SeaLLMs-v3-1.5B-Chat" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">SeaLLMs/SeaLLMs-v3-1.5B-Chat</a>",
- 43.2,
- "bfloat16",
- 1.5,
- "🔵 SEA-Focused",
- "SFT",
- false,
- "",
- 37.14,
- 75.17,
- 56.85,
- 3.62,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/CohereLabs/c4ai-command-r7b-12-2024" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">CohereLabs/c4ai-command-r7b-12-2024</a>",
- 41.66,
- "?",
- 7,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 43.83,
- 74.97,
- 45.71,
- 2.14,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">deepseek-ai/DeepSeek-R1-Distill-Qwen-14B</a>",
- 21.77,
- "?",
- -1,
- "Unknown",
- "Unknown",
- false,
- "",
- 23.69,
- 14.01,
- 45.64,
- 3.74,
- false
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen2-72B-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen2-72B-Instruct</a>",
- 20.78,
- "bfloat16",
- -1,
- "Unknown",
- "Unknown",
- false,
- "",
- 3.8,
- 12.64,
- 54.56,
- 12.11,
- false
- ],
- [
- "<a target="_blank" href="https://huggingface.co/HuggingFaceTB/SmolLM-1.7B-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">HuggingFaceTB/SmolLM-1.7B-Instruct</a>",
- 20.69,
- "?",
- 1.7,
- "Monolingual",
- "SFT",
- false,
- "",
- 23.82,
- 13.53,
- 45.41,
- 0,
- false
- ],
- [
- "<a target="_blank" href="https://huggingface.co/deepseek-ai/DeepSeek-V3" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">deepseek-ai/DeepSeek-V3</a>",
- 6.88,
- "?",
- -1,
- "Unknown",
- "Unknown",
- false,
- "",
- 0,
- 0,
- 0,
- 27.53,
- false
- ],
- ],
- "metadata": null
- },
- {
- "headers": [
- "Model",
- "Average ⬇️",
- "Precision",
- "# Parameters",
- "Multilingual",
- "Model Type",
- "Submission",
- "Submission Date",
- "BalitaNLP",
- "Belebele (ceb)",
- "Belebele (fil)",
- "CebuaNER",
- "Dengue",
- "FiReCS",
- "Global-MMLU",
- "INCLUDE",
- "KALAHI",
- "NewsPH NLI",
- "NTREX-128",
- "Readability (ceb)",
- "SIB-200 (ceb)",
- "SIB-200 (tgl)",
- "StingrayBench",
- "Tatoeba (ceb)",
- "Tatoeba (tgl)",
- "TICO-19",
- "TLUnified NER",
- "Universal NER (ceb)",
- "Universal NER (tgl)",
- "Incomplete"
- ],
- "data": [
- [
- "<a target="_blank" href="https://platform.openai.com/docs/models" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">gpt-4o-2024-08-06</a>",
- 74.27,
- "?",
- -1,
- "🟢 Multilingual",
- "Unknown",
- false,
- "",
- 93,
- 89.44,
- 92.11,
- 94.35,
- 78.48,
- 73,
- 72.85,
- 85.99,
- 81.33,
- 80,
- 56.25,
- 56,
- 85.86,
- 86.87,
- 60,
- 42.92,
- 52.85,
- 60.53,
- 97.09,
- 100,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8</a>",
- 67.67,
- "?",
- 400,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 92.16,
- 86.11,
- 88.78,
- 91.22,
- 79.27,
- 65.9,
- 76.8,
- 81.92,
- 88.67,
- 72.76,
- 28.19,
- 56.29,
- 84.85,
- 84.85,
- 25,
- 28.32,
- 35.38,
- 42.64,
- 94.68,
- 100,
- 96.43,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">meta-llama/Llama-4-Scout-17B-16E-Instruct</a>",
- 63.2,
- "?",
- 109,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 91.93,
- 88.22,
- 89.89,
- 90.99,
- 81.03,
- 69.92,
- 74.29,
- 81.59,
- 88,
- 70.6,
- 14.64,
- 42,
- 83.84,
- 83.84,
- 21,
- 38.91,
- 19.37,
- 23.82,
- 95.82,
- 100,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen2.5-72B-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen2.5-72B-Instruct</a>",
- 63.08,
- "?",
- 72,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 92.16,
- 78.33,
- 89.78,
- 90.46,
- 87.3,
- 70.27,
- 72.98,
- 80.79,
- 88.67,
- 75.5,
- 19.1,
- 64.29,
- 82.83,
- 85.86,
- 30,
- 8.29,
- 7.62,
- 28.04,
- 96.2,
- 100,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Tower-Babel/Babel-83B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Tower-Babel/Babel-83B</a>",
- 61.13,
- "?",
- 83,
- "🟢 Multilingual",
- "Base",
- false,
- "",
- 93.39,
- 76.67,
- 88.56,
- 85.73,
- 83.93,
- 71.65,
- 74.84,
- 81.3,
- 84.67,
- 63.93,
- 19.06,
- 54,
- 78.79,
- 83.84,
- 46,
- 4.05,
- 11.63,
- 26.26,
- 94.24,
- 97.96,
- 96.43,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/aisingapore/Llama-SEA-LION-v3-70B-IT" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">aisingapore/Llama-SEA-LION-v3-70B-IT</a>",
- 61.07,
- "?",
- 70,
- "🔵 SEA-Focused",
- "SFT",
- false,
- "",
- 93.84,
- 85.56,
- 91.67,
- 92.37,
- 86.02,
- 71.78,
- 76.62,
- 83.28,
- 90,
- 52.87,
- 32.98,
- 49.43,
- 85.86,
- 86.87,
- 48,
- 9.37,
- 12.3,
- 41.06,
- 96.71,
- 100,
- 98.21,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Tower-Babel/Babel-83B-Chat" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Tower-Babel/Babel-83B-Chat</a>",
- 60.85,
- "?",
- 83,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 93.56,
- 77,
- 88.33,
- 85.34,
- 84.08,
- 68.08,
- 75.14,
- 80.94,
- 84,
- 64.52,
- 16.39,
- 57.14,
- 79.8,
- 85.86,
- 43,
- 4.48,
- 11.14,
- 23.35,
- 94.68,
- 97.96,
- 96.43,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">meta-llama/Llama-3.1-70B-Instruct</a>",
- 59.66,
- "bfloat16",
- 70,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 94.2,
- 65,
- 100,
- 88.85,
- 88.51,
- 71.9,
- 71.95,
- 82.46,
- 89.33,
- 51.54,
- 32.03,
- 58.57,
- 75.76,
- 83.84,
- 25,
- 10.65,
- 10.89,
- 46.61,
- 94.74,
- 97.96,
- 96.43,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/sail/Sailor2-20B-Chat" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">sail/Sailor2-20B-Chat</a>",
- 58.61,
- "bfloat16",
- 20,
- "🔵 SEA-Focused",
- "Preference-aligned",
- false,
- "",
- 93.62,
- 85,
- 86.78,
- 88.63,
- 73.62,
- 73.68,
- 66.15,
- 81.52,
- 82,
- 62.61,
- 17.13,
- 55,
- 84.85,
- 86.87,
- 7,
- 13.29,
- 14.08,
- 19.38,
- 97.21,
- 97.96,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen2.5-32B-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen2.5-32B-Instruct</a>",
- 57.88,
- "bfloat16",
- 32,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 93,
- 72,
- 87,
- 88.17,
- 83.7,
- 73.35,
- 66.62,
- 70.83,
- 87.33,
- 70.44,
- 6.33,
- 64,
- 85.86,
- 83.84,
- 46,
- 8.13,
- 2.06,
- 7.3,
- 96.77,
- 100,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen3-32B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen3-32B</a>",
- 57.65,
- "?",
- 32,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 93.17,
- 84.67,
- 89,
- 83.97,
- 79.18,
- 61.62,
- 71.48,
- 77.4,
- 80,
- 65.1,
- 7.46,
- 53.43,
- 87.88,
- 85.86,
- 1,
- 9.25,
- 2.47,
- 14.31,
- 97.34,
- 100,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/aisingapore/gemma2-9b-cpt-sea-lionv3-instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">aisingapore/gemma2-9b-cpt-sea-lionv3-instruct</a>",
- 56.14,
- "bfloat16",
- 9,
- "🔵 SEA-Focused",
- "SFT",
- false,
- "",
- 92.12,
- 79.67,
- 88.22,
- 90.76,
- 86.58,
- 70.44,
- 64.02,
- 78.97,
- 84.67,
- 53.85,
- 14.29,
- 58.86,
- 83.84,
- 85.86,
- 20,
- 35.57,
- 11.62,
- 29.82,
- 96.39,
- 100,
- 98.21,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/aisingapore/Llama-SEA-LION-v3.5-70B-R" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">aisingapore/Llama-SEA-LION-v3.5-70B-R</a>",
- 55.86,
- "?",
- 70,
- "🔵 SEA-Focused",
- "SFT",
- false,
- "",
- 92.36,
- 80.33,
- 90,
- 90.61,
- 26.34,
- 69.82,
- 70.07,
- 77.95,
- 90.67,
- 53.8,
- 18.44,
- 53.43,
- 81.82,
- 86.87,
- 48,
- 8.82,
- 7.15,
- 29.8,
- 96.01,
- 97.96,
- 98.21,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/google/gemma-2-27b-it" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">google/gemma-2-27b-it</a>",
- 55.22,
- "?",
- 27,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 91.98,
- 82.67,
- 90.22,
- 92.37,
- 73.73,
- 73.92,
- 68.36,
- 80.22,
- 88.67,
- 47.93,
- 16.7,
- 69.14,
- 87.88,
- 85.86,
- 38,
- 10.91,
- 6.64,
- 36.87,
- 96.64,
- 100,
- 98.21,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/google/gemma-3-27b-it" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">google/gemma-3-27b-it</a>",
- 55.17,
- "?",
- 27,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 92.44,
- 86.67,
- 90.22,
- 91.45,
- 86.97,
- 68.64,
- 71.16,
- 84.83,
- 87.33,
- 52.48,
- 8.91,
- 63.14,
- 86.87,
- 88.89,
- 16,
- 13.17,
- 3.91,
- 11.14,
- 97.47,
- 100,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">mistralai/Mixtral-8x22B-Instruct-v0.1</a>",
- 54.28,
- "?",
- 141,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 90.81,
- 62.56,
- 78,
- 92.29,
- 84.25,
- 68.53,
- 53.96,
- 67.86,
- 82.67,
- 64.78,
- 12.42,
- 35.71,
- 75.76,
- 78.79,
- 16,
- 9.33,
- 6.98,
- 17.3,
- 96.9,
- 100,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/google/gemma-3-12b-it" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">google/gemma-3-12b-it</a>",
- 54.04,
- "?",
- 12,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 92.73,
- 82.33,
- 87.33,
- 87.18,
- 84.49,
- 69.59,
- 66.89,
- 78.63,
- 81.33,
- 54.35,
- 7.86,
- 50.57,
- 86.87,
- 89.9,
- 53,
- 9.85,
- 1.24,
- 8.95,
- 94.43,
- 100,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/google/gemma-2-9b-it" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">google/gemma-2-9b-it</a>",
- 53.33,
- "?",
- 9,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 92.06,
- 77.44,
- 87.56,
- 91.6,
- 77.67,
- 68.08,
- 63.19,
- 78.86,
- 82.67,
- 50.02,
- 12.58,
- 49.43,
- 82.83,
- 86.87,
- 29,
- 13.74,
- 4.4,
- 26.75,
- 96.52,
- 100,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen3-14B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen3-14B</a>",
- 53.23,
- "?",
- 14,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 90.91,
- 79.11,
- 84.56,
- 87.48,
- 32.1,
- 71.93,
- 59.14,
- 60.88,
- 71.33,
- 65.86,
- 6.05,
- 48.57,
- 85.86,
- 88.89,
- 52,
- 8.66,
- 1.07,
- 5.91,
- 96.39,
- 100,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen3-8B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen3-8B</a>",
- 52.93,
- "?",
- 8,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 88.62,
- 70.56,
- 77.33,
- 82.75,
- 22.13,
- 64.6,
- 59.53,
- 67.14,
- 72.67,
- 67.29,
- 6.4,
- 34.86,
- 81.82,
- 85.86,
- 45,
- 9.03,
- 0.9,
- 10.08,
- 95.12,
- 100,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/CohereLabs/c4ai-command-a-03-2025" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">CohereLabs/c4ai-command-a-03-2025</a>",
- 52.82,
- "?",
- 111,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 93.17,
- 77.78,
- 88.78,
- 91.15,
- 83.77,
- 69.36,
- 69.57,
- 79.15,
- 86,
- 46.01,
- 3.42,
- 68.29,
- 82.83,
- 88.89,
- 53,
- 7.48,
- 4.85,
- 10.95,
- 97.85,
- 100,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Tower-Babel/Babel-9B-Chat" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Tower-Babel/Babel-9B-Chat</a>",
- 52.75,
- "bfloat16",
- 9,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 94,
- 50.44,
- 78.78,
- 74.73,
- 67.41,
- 69.97,
- 59.59,
- 72.94,
- 82.67,
- 56.33,
- 8.4,
- 55.71,
- 81.82,
- 81.82,
- 28,
- 6.65,
- 2.85,
- 13.68,
- 91.2,
- 95.92,
- 98.21,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/sail/Sailor2-8B-Chat" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">sail/Sailor2-8B-Chat</a>",
- 52.49,
- "?",
- 8,
- "🔵 SEA-Focused",
- "Preference-aligned",
- false,
- "",
- 92.35,
- 78.11,
- 80.56,
- 85.34,
- 61.46,
- 67.56,
- 58.28,
- 76.52,
- 80,
- 50.1,
- 18.71,
- 55.71,
- 82.83,
- 82.83,
- 31,
- 7.33,
- 6.58,
- 27.77,
- 93.73,
- 97.96,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen2.5-14B-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen2.5-14B-Instruct</a>",
- 52.41,
- "bfloat16",
- 14,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 90,
- 71.67,
- 83.78,
- 89.01,
- 78.4,
- 70.33,
- 58.74,
- 70.19,
- 82.67,
- 59.59,
- 6.32,
- 61.71,
- 78.79,
- 84.85,
- 44,
- 8.92,
- 0.97,
- 5.97,
- 94.74,
- 100,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen2.5-7B-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen2.5-7B-Instruct</a>",
- 50.46,
- "bfloat16",
- 7,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 90.18,
- 53.67,
- 68.11,
- 78.63,
- 78.49,
- 66.84,
- 51.36,
- 60.34,
- 69.33,
- 60.46,
- 6.99,
- 60.86,
- 77.78,
- 77.78,
- 16,
- 6.6,
- 0.72,
- 6.43,
- 94.3,
- 97.96,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/aisingapore/llama3.1-8b-cpt-sea-lionv3-instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">aisingapore/llama3.1-8b-cpt-sea-lionv3-instruct</a>",
- 50.32,
- "bfloat16",
- 8,
- "🔵 SEA-Focused",
- "SFT",
- false,
- "",
- 90.33,
- 70.67,
- 81.11,
- 88.09,
- 36.06,
- 72.14,
- 59.66,
- 68.29,
- 78,
- 46.86,
- 14.06,
- 56,
- 80.81,
- 82.83,
- 23,
- 7.11,
- 2.25,
- 26.33,
- 94.87,
- 95.92,
- 98.21,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">mistralai/Mixtral-8x7B-Instruct-v0.1</a>",
- 50.26,
- "?",
- 47,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 89.88,
- 50.33,
- 70.78,
- 73.74,
- 69.58,
- 64.74,
- 49.19,
- 62.81,
- 74,
- 61.01,
- 8.9,
- 46.86,
- 74.75,
- 77.78,
- 46,
- 8.19,
- 1.19,
- 11.69,
- 93.79,
- 95.92,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/aisingapore/Llama-SEA-LION-v3.5-8B-R" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">aisingapore/Llama-SEA-LION-v3.5-8B-R</a>",
- 49.43,
- "?",
- 8,
- "🔵 SEA-Focused",
- "SFT",
- false,
- "",
- 90.34,
- 70.78,
- 81,
- 87.25,
- 24.85,
- 69.63,
- 59.19,
- 69.63,
- 80,
- 46.79,
- 11.45,
- 52.57,
- 78.79,
- 80.81,
- 11,
- 7.63,
- 1.45,
- 23.27,
- 94.49,
- 97.96,
- 98.21,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/SeaLLMs/SeaLLMs-v3-7B-Chat" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">SeaLLMs/SeaLLMs-v3-7B-Chat</a>",
- 49.06,
- "bfloat16",
- 7,
- "🔵 SEA-Focused",
- "SFT",
- false,
- "",
- 88.15,
- 45.44,
- 69.11,
- 73.44,
- 22.01,
- 69.54,
- 51.27,
- 67.01,
- 70,
- 62.64,
- 2.77,
- 43.43,
- 77.78,
- 79.8,
- 58,
- 4.05,
- 0.57,
- 3.75,
- 88.85,
- 97.96,
- 94.64,
- true
- ],
- [
- "<a target="_blank" href="https://platform.openai.com/docs/models" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">gpt-4o-mini</a>",
- 48.47,
- "?",
- -1,
- "🟢 Multilingual",
- "Unknown",
- false,
- "",
- 78,
- 48.44,
- 64.89,
- 40,
- 78.48,
- 58.54,
- 24.47,
- 37.13,
- 26.67,
- 47.6,
- 52.32,
- 48,
- 9.09,
- 13.13,
- 50,
- 38.47,
- 40.78,
- 60.67,
- 54.46,
- 85.71,
- 41.07,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen3-4B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen3-4B</a>",
- 48.42,
- "?",
- 4,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 87.84,
- 66,
- 74.56,
- 83.44,
- 78.89,
- 63.01,
- 54.88,
- 61.89,
- 66.67,
- 51.53,
- 4.55,
- 40.57,
- 82.83,
- 84.85,
- 51,
- 5.8,
- 0.49,
- 5.7,
- 93.86,
- 100,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/CohereForAI/aya-expanse-32b" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">CohereForAI/aya-expanse-32b</a>",
- 47.84,
- "bfloat16",
- 32,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 96,
- 49.89,
- 73.89,
- 79.77,
- 54.85,
- 64.39,
- 52.74,
- 68.03,
- 76.67,
- 45.83,
- 6.72,
- 33,
- 63.64,
- 81.82,
- 11,
- 8.31,
- 0.8,
- 8.48,
- 95.06,
- 97.96,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">meta-llama/Llama-3.1-8B-Instruct</a>",
- 47.38,
- "bfloat16",
- 8,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 94,
- 52,
- 72.67,
- 83.97,
- 47.37,
- 72.15,
- 51.43,
- 64.68,
- 72.67,
- 46.06,
- 5.22,
- 58.29,
- 77.78,
- 84.85,
- 50,
- 13.09,
- 1.93,
- 5.8,
- 91.13,
- 97.96,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/mistralai/Ministral-8B-Instruct-2410" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">mistralai/Ministral-8B-Instruct-2410</a>",
- 47.33,
- "bfloat16",
- 8,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 87,
- 39.56,
- 60.89,
- 82.6,
- 21.52,
- 61.7,
- 41.58,
- 58.21,
- 56,
- 62.71,
- 11.34,
- 27.43,
- 22.22,
- 77.78,
- 3,
- 13.23,
- 1.46,
- 9.94,
- 93.54,
- 97.96,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/neulab/Pangea-7B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">neulab/Pangea-7B</a>",
- 43.98,
- "bfloat16",
- 7,
- "🟢 Multilingual",
- "SFT",
- false,
- "",
- 88.34,
- 44.33,
- 61.56,
- 73.74,
- 21.52,
- 62.23,
- 45.43,
- 61.12,
- 62,
- 47.64,
- 4.59,
- 47.71,
- 72.73,
- 78.79,
- 61,
- 7.06,
- 0.53,
- 5.4,
- 91.89,
- 97.96,
- 100,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/SeaLLMs/SeaLLMs-v3-1.5B-Chat" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">SeaLLMs/SeaLLMs-v3-1.5B-Chat</a>",
- 43.2,
- "bfloat16",
- 1.5,
- "🔵 SEA-Focused",
- "SFT",
- false,
- "",
- 80.24,
- 26.67,
- 43.11,
- 39.08,
- 78.4,
- 57.15,
- 36.63,
- 46.29,
- 64,
- 57.38,
- 5.59,
- 33.43,
- 11.11,
- 51.52,
- 23,
- 4.14,
- 0.58,
- 7.21,
- 72.07,
- 85.71,
- 96.43,
- true
- ],
- [
- "<a target="_blank" href="https://huggingface.co/CohereLabs/c4ai-command-r7b-12-2024" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">CohereLabs/c4ai-command-r7b-12-2024</a>",
- 41.66,
- "?",
- 7,
- "🟢 Multilingual",
- "Preference-aligned",
- false,
- "",
- 83.31,
- 29.33,
- 51.11,
- 74.89,
- 22.59,
- 62.37,
- 43.6,
- 54.39,
- 58.67,
- 45.8,
- 2.91,
- 49.43,
- 26.26,
- 81.82,
- 2,
- 3.95,
- 0.66,
- 3.67,
- 82.08,
- 95.92,
- 100,
- true
- [
- "<a target="_blank" href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">deepseek-ai/DeepSeek-R1-Distill-Qwen-14B</a>",
- 21.77,
- "?",
- -1,
- "β Unknown",
- "β Unknown",
- false,
- "",
- null,
- 22.89,
- 22.89,
- 40,
- 78.48,
- 32.44,
- 23.12,
- 34.27,
- 24.67,
- 46.08,
- 5.32,
- 48,
- 8.08,
- 8.08,
- 50,
- 5.55,
- 1.26,
- 6.15,
- 52.75,
- 85.71,
- 37.5,
- false
- [
- "<a target="_blank" href="https://huggingface.co/Qwen/Qwen2-72B-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen/Qwen2-72B-Instruct</a>",
- 20.78,
- "bfloat16",
- -1,
- "β Unknown",
- "β Unknown",
- false,
- "",
- null,
- null,
- null,
- 90.23,
- 86.82,
- null,
- null,
- 79.54,
- 84,
- 55.66,
- 10.36,
- 50.57,
- null,
- null,
- 38,
- 9.52,
- 5.94,
- 32.61,
- 95.88,
- 97.96,
- 98.21,
- false
- [
- "<a target="_blank" href="https://huggingface.co/HuggingFaceTB/SmolLM-1.7B-Instruct" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">HuggingFaceTB/SmolLM-1.7B-Instruct</a>",
- 20.69,
- "?",
- 1.7,
- "π Monolingual",
- "β SFT",
- false,
- "",
- null,
- 23.56,
- 21.89,
- 38.4,
- 73.04,
- 32.41,
- 23.92,
- 32.79,
- null,
- 46.04,
- null,
- null,
- 8.08,
- 8.08,
- 1,
- null,
- null,
- null,
- 52.75,
- 85.71,
- 37.5,
- false
- [
- "<a target="_blank" href="https://huggingface.co/deepseek-ai/DeepSeek-V3" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">deepseek-ai/DeepSeek-V3</a>",
- 6.88,
- "?",
- -1,
- "β Unknown",
- "β Unknown",
- false,
- "",
- null,
- null,
- null,
- null,
- null,
- null,
- null,
- null,
- null,
- null,
- 34.33,
- null,
- null,
- null,
- null,
- 16.26,
- 19.69,
- 38.08,
- null,
- null,
- null,
- false
- [
- "metadata": null
Parameter-Efficiency Plot
Model performance on FilBench with respect to parameter size. For mixture-of-experts models, we plot the full parameter count. In general, we find that model size and performance are positively correlated.
Cost-Efficiency Plot
Model performance on FilBench with respect to per-token output cost ($/1M tokens). We use the token pricing published on OpenRouter. For models not listed on OpenRouter, we either exclude them from the chart or use the cost of the base model they were fine-tuned from.
FilBench is a comprehensive evaluation benchmark for Filipino. We curate 12 sub-tasks across four major categories (Cultural Knowledge, Classical NLP, Reading Comprehension, and Generation) and evaluate several models to understand their Filipino-centric capabilities.
Overview
We average four core sections (weighted by the number of instances):
- Cultural Knowledge: Contains instances measuring the cultural understanding capabilities of LLMs.
- Classical NLP: Contains questions on standard NLP tasks such as text classification and named-entity recognition.
- Reading Comprehension: Contains focused natural language understanding (NLU) tasks and questions from readability benchmarks.
- Generation: Contains natural language generation (NLG) instances, primarily focused on translation.
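The instance-weighted average above can be sketched as follows; the section scores and instance counts in this snippet are illustrative placeholders, not FilBench's actual data:

```python
# Sketch of the weighted overall score. The scores and instance counts
# below are illustrative placeholders, not FilBench's real numbers.
sections = {
    # section: (score, number of instances)
    "Cultural Knowledge": (73.29, 3000),
    "Classical NLP": (89.03, 2500),
    "Reading Comprehension": (80.12, 2000),
    "Generation": (54.65, 1500),
}

# Each section contributes proportionally to its number of instances.
total_instances = sum(n for _, n in sections.values())
average = sum(score * n for score, n in sections.values()) / total_instances
print(round(average, 2))
```

Weighting by instance count means larger sections influence the overall score more than a plain mean of the four section scores would.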
Evaluation Runner
We use our own fork of lighteval to perform evaluations. We highly recommend using the vLLM backend for faster inference. Run sequentially, a full FilBench evaluation takes approximately 4.93 hours on 2 NVIDIA H100 GPUs. However, the evaluation suite can be parallelized per benchmark: the longest-running task takes approximately 1 hour and 28 minutes, and the shortest only 5.86 minutes.
To evaluate your model on FilBench and have it appear on the leaderboard, please follow the steps in our GitHub repository.
Contact
This work was done by Lj V. Miranda (@ljvmiranda921), Elyanah Aco (@elyanah-aco), Conner Manuel (@connermanuel), Blaise Cruz (@jcblaisecruz02), and Joseph Imperial (@imperialite). For any questions, please reach out to us via filbench-eval@googlegroups.com or through our GitHub Issues.
Acknowledgements
We would like to thank Cohere Labs for providing credits through the Cohere Research Grant to run the Aya model series, and Together AI for additional computational credits for running several open models. We also acknowledge the Hugging Face team, particularly the OpenEvals team (Clémentine Fourrier @clefourrier and Nathan Habib @NathanHB) and Daniel van Strien @davanstrien, for their support in publishing the FilBench blog post.