New community provides an open-source framework to assess large language models for capability, energy efficiency and safety
25 February 2025, London – The GSMA Foundry, the GSMA’s innovation hub, today announced the launch of GSMA Open-Telco LLM Benchmarks, an open-source community aimed at improving the performance of large language models (LLMs) for telecom-specific applications. The community provides an industry-first framework for evaluating AI models in real-world telecom use cases and is supported at launch by Hugging Face, Khalifa University, The Linux Foundation and a host of leading mobile network operators and vendors.
As AI adoption in telecoms accelerates, LLMs have demonstrated significant shortcomings in handling technical telecom knowledge, regulatory compliance and network troubleshooting. In recent tests, GPT-4¹ scored less than 75% on TeleQnA² ³, a comprehensive dataset tailored to assess the knowledge of LLMs in the field of telecommunications, and less than 40% on 3GPP TDocs Classification⁴, a dataset based on 3GPP standards documentation. Microsoft’s Phi-2⁵, a much smaller model, scored only 10% on MATH500⁶ ⁷, a benchmark of 500 general maths questions.
These results highlight the current limitations of AI models in addressing telecom-specific queries. GSMA Open-Telco LLM Benchmarks will address these gaps by providing transparent, open evaluations of AI models across capabilities, energy efficiency and safety.
“Today’s AI models struggle with telecom-specific queries, often producing inaccurate, misleading or impractical recommendations,” said Louis Powell, Head of AI Initiatives, GSMA. “By creating an industry-wide set of benchmarks, we’re not only improving model performance but also ensuring AI in telecoms is safe, reliable and aligned with real-world operational needs.”
The mobile network operators supporting the launch of GSMA Open-Telco LLM Benchmarks include Deutsche Telekom, LG Uplus, SK Telecom and Turkcell, together with technology vendor Huawei.
The GSMA Open-Telco LLM Benchmarks community enables mobile network operators, AI researchers and developers to submit use cases, datasets and models for evaluation. A standardised benchmarking framework ensures that all AI models are evaluated against real-world challenges in areas such as telecoms domain knowledge, mathematical reasoning, energy consumption and safety. The resulting benchmarks will be hosted on Hugging Face to ensure transparency and encourage community engagement.
Mobile network operators, vendors, startups and researchers are now encouraged to contribute by submitting their interest and telco LLM use cases to [email protected]. For more information, visit www.gsma.com/get-involved/gsma-foundry/gsma-open-telco-llm-benchmarks.