GSMA Open-Telco LLM Benchmarks ranks frontier AI models for Telco AI, revealing a critical gap in network automation readiness

GSMA’s Louis Powell highlights the findings from the second phase of the GSMA Open-Telco LLM Benchmarks, expanding the focus from general domain knowledge to real-world challenges in network management, configuration, and troubleshooting – concluding that hybrid architectures are the way to move the industry forward 

We kicked off the GSMA Open-Telco LLM Benchmarks initiative in Barcelona this year to bring together a passionate community determined to tackle the unique challenges of telco AI. The community’s early work – under GSMA Open-Telco LLM Benchmarks 1.0 – quickly showed us that ‘out-of-the-box’ AI models just aren’t up to the task for telco-specific needs. The results spoke for themselves. 

Now, as we move to GSMA Open-Telco LLM Benchmarks 2.0, the initiative has grown from a benchmarking exercise into a thriving, community-driven hub of open-source innovation. And the first findings lead us to the conclusion that a hybrid architecture strategy is the most effective path forward to unlock both scale and precision in telco AI: combining the broad reasoning of foundation models with the precision of specialised components.

Why? Because the results reveal two critical realities: first, targeted fine-tuning delivers operational accuracy. And second, automation remains a shared challenge as current LLMs struggle to translate natural-language intents into schema-compliant configurations for zero-touch orchestration.  
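To make the second point concrete, "schema-compliant" means an orchestrator can mechanically reject any configuration whose fields or types drift from the expected contract. The sketch below illustrates the idea with a minimal, hand-rolled check; the field names, types, and the sample LLM output are hypothetical and not drawn from the benchmark itself.

```python
# Hypothetical sketch: gating an LLM-generated network-slice configuration
# before it reaches an orchestrator. Schema and field names are illustrative.

REQUIRED_FIELDS = {
    "slice_id": str,
    "latency_ms": int,
    "bandwidth_mbps": int,
}

def validate_config(config: dict) -> list[str]:
    """Return a list of schema violations (an empty list means compliant)."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in config:
            errors.append(f"missing field: {field}")
        elif not isinstance(config[field], expected_type):
            errors.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return errors

# A plausible failure mode: the model emits a number as a string.
llm_output = {"slice_id": "urllc-01", "latency_ms": "5", "bandwidth_mbps": 100}
print(validate_config(llm_output))  # the string "5" fails the int check
```

In a production pipeline this role is typically played by a formal schema language (for example JSON Schema or YANG), but the principle is the same: the natural-language model's output must survive a strict, machine-checked contract, and that is precisely where current models fall short.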

Delivering the Data and Core Findings: The Hybrid Architecture Imperative 

The collective input of our community led to the construction of five high-stakes, operationally focused evaluation datasets to look at a range of scenarios: 

  • TeleYAML: Evaluates Intent-to-Configuration – translating human requests into precise, structured configuration outputs. 
  • TeleLogs: Benchmarks Root-Cause Analysis (RCA) and network troubleshooting on 5G data. 
  • TeleMath: Tests quantitative engineering reasoning and problem-solving. 
  • TeleQnA: Multiple-choice questions generated from research, standards, and other technical sources. 
  • 3GPP-TSG: Curated samples from official 3GPP documents. Each excerpt is accurately labelled with the responsible working group. 

The results provide a comprehensive view of model readiness, confirming that while advanced frontier models still lead in overall performance, targeted fine-tuning is crucial for critical operational accuracy. The TeleYAML scores alone tell us clearly that the industry must prioritise bridging the gap between language proficiency and structured configuration-logic reasoning to truly unlock the promise of network automation. 

Our key findings under GSMA Open-Telco LLM Benchmarks 2.0 are: 

  • Targeted Precision is Key: The results validate domain-specific adaptation, with AT&T’s fine-tuned Gemma model achieving the highest score on the crucial TeleLogs troubleshooting benchmark, narrowing the gap with larger, generalist LLMs. 
  • The Automation Bottleneck: Performance across all models on TeleYAML (Intent-to-Configuration) remains low, with top models scoring under 30. This highlights a critical and shared challenge: current LLMs still struggle to translate natural-language intents into the valid, schema-compliant configurations required for closed-loop automation and zero-touch orchestration. 
  • Strategic Takeaway: The GSMA Open-Telco LLM Benchmarks community asserts that the most effective path forward is a Hybrid Architecture strategy, integrating the broad reasoning of foundation models with the precision and domain-awareness of specialised components. 
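One common way to realise a hybrid architecture is to route each request to the component best suited to answer it. The sketch below shows the simplest possible router, using keyword matching; the model names, keyword list, and routing rule are all illustrative assumptions, not a design endorsed by the benchmark community.

```python
# Illustrative sketch of a hybrid routing pattern: a lightweight classifier
# decides whether a request goes to a broad foundation model or to a
# domain-tuned specialist. All names and keywords here are hypothetical.

SPECIALIST_KEYWORDS = {"alarm", "root cause", "log", "kpi", "config", "yaml"}

def route(query: str) -> str:
    """Pick a backend for the query (backend names are placeholders)."""
    q = query.lower()
    if any(keyword in q for keyword in SPECIALIST_KEYWORDS):
        return "domain-tuned-model"   # e.g. a fine-tuned troubleshooting model
    return "foundation-model"         # broad reasoning fallback

print(route("Summarise the main themes of Release 18"))  # foundation-model
print(route("Find the root cause of these 5G logs"))     # domain-tuned-model
```

Real deployments would replace the keyword check with a learned classifier or an agentic planner, but the division of labour is the same: broad reasoning where coverage matters, specialised components where operational precision matters.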

Beyond setting a clear direction for how the industry can use AI, what’s really exciting is the unprecedented level of collaboration we’re seeing across the industry. We’re building a shared ecosystem where rigorous experimentation, cross-operator validation and collaborative learning set the stage for how these models will be evaluated moving forward. 

GSMA members including AT&T, China Telecom, Deutsche Telekom, du, KDDI, KPN, Liberty Global, Orange, Telefónica, Turkcell, Swisscom, and Vodafone are all pitching in, along with our brilliant technology and research partners Adaptive-AI, Datumo, Huawei GTS, Hugging Face, The Linux Foundation, Khalifa University, NetoAI, Universitat Pompeu Fabra (UPF), University of Texas at Dallas and Queen’s University. 

The GSMA Open-Telco LLM Benchmarks community will broaden its scope to include holistic evaluation – such as energy efficiency, Time to First Token, and task latency – and to assess both agents and models. GSMA encourages telecom operators, AI researchers, and technology providers to participate in this open industry initiative. 

Together, we’re shaping the future of telco AI in a way that’s open, hands-on, and genuinely community-led. If you would like to join us, please get in touch.