Closing the AI Accuracy Gap in Telecoms: The Critical Role of Domain-Specific Language Models

GSMA’s Head of AI Initiatives, Louis Powell, and Ravi Kumar Palepu, CEO of NetoAI – a technology partner to the GSMA Open-Telco LLM Benchmarks initiative – look at the gap between the performance of general AI models and the needs of the telecoms industry.

The potential for Artificial Intelligence to transform the telecoms industry is undeniable. From optimising network functions to enhancing customer interactions, AI promises significant advancements. Yet, realising this potential fully has been hampered by a fundamental challenge: AI’s struggle to comprehend the specialised language of telecoms.

Our industry operates with a unique lexicon – a dense landscape of acronyms, technical standards and operational jargon. Standard AI models, typically trained on broad internet data, lack the fluency needed to navigate this complexity accurately. This isn’t just a minor inconvenience; it translates directly into performance limitations for AI applications deployed in real-world telecoms environments.

The Invisible Ceiling: Why General AI Falls Short in Telecoms

Many organisations exploring AI, especially using Retrieval-Augmented Generation (RAG) for knowledge extraction from technical documents, encounter a frustrating performance plateau. Accuracy often stalls around 75%, even with sophisticated system design. For an industry where precision is paramount – whether in network diagnostics or customer support – a 25% error rate is unacceptable. It undermines trust, introduces operational risks, and ultimately limits the return on AI investments.

This accuracy ceiling exists because generic models lack the deep semantic understanding required. They might recognise keywords but fail to grasp the nuanced context and relationships specific to telecoms technologies and operations.

A Dedicated Approach: Engineering Semantic Fluency

Addressing this semantic gap requires moving beyond general-purpose tools and embracing domain specialisation. Recent research efforts from NetoAI have focused on creating embedding models specifically trained to understand the intricacies of telecoms language. This involves a dedicated, multi-faceted approach, with three focus areas:

  • Deep Model Adaptation: Starting with powerful foundation models (like gte-Qwen2-1.5B-instruct), the process involves extensive fine-tuning. This goes beyond surface adjustments, modifying weights across hundreds of internal layers to embed domain knowledge deeply within the model’s architecture.
  • High-Quality, Specialised Data: Success hinges on training data that accurately reflects the target domain. This necessitates meticulous curation of large-scale datasets covering the breadth of telecom concepts, terminology and semantic relationships, often requiring significant manual effort by domain experts.
  • Domain-Specific Tokenisation: Standard methods for breaking text into processable units (tokenisation) often fragment critical telecoms jargon. Developing a tokeniser trained specifically on telecoms vocabulary is crucial for preserving meaning and ensuring the model receives accurate input.
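
To make the tokenisation point concrete, here is a minimal sketch of how a vocabulary without telecoms terms fragments jargon. It uses a toy greedy longest-match splitter rather than the actual open-source tokeniser, and the function name, both vocabularies and the sample phrase are illustrative assumptions, not taken from the NetoAI work.

```python
def greedy_tokenise(text, vocab):
    """Split each word into the longest pieces found in vocab.

    Toy stand-in for a subword tokeniser: characters that match nothing
    in the vocabulary fall back to single-character tokens.
    """
    tokens = []
    for word in text.split():
        start = 0
        while start < len(word):
            # Try the longest remaining substring first, then shrink.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in vocab or end == start + 1:
                    tokens.append(piece)
                    start = end
                    break
    return tokens


# Hypothetical vocabularies: the second simply adds three telecoms terms.
GENERIC_VOCAB = {"hand", "over", "fail", "ure", "g", "node", "b"}
TELECOM_VOCAB = GENERIC_VOCAB | {"gnodeb", "handover", "rsrp"}

phrase = "gnodeb handover failure rsrp"
generic_tokens = greedy_tokenise(phrase, GENERIC_VOCAB)
telecom_tokens = greedy_tokenise(phrase, TELECOM_VOCAB)

print(generic_tokens)  # 11 fragments, e.g. 'g', 'node', 'b' and 'r', 's', 'r', 'p'
print(telecom_tokens)  # 5 tokens, with 'gnodeb', 'handover' and 'rsrp' kept whole
```

With the generic vocabulary the model never sees "gnodeb" or "rsrp" as a unit, so it has to reassemble their meaning from fragments. With the domain vocabulary each term arrives as a single token.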

Breaking the Barrier: Achieving Transformative Accuracy

The outcomes of such focused research are encouraging. Specialised embedding models born from this process, like the recently developed T-VEC (Telecom Vectorisation Model), demonstrate a dramatic improvement in understanding telecoms language. Where generic models falter, these specialised models excel:

  • On rigorous benchmarks designed to test nuanced telecom understanding, specialised models can achieve accuracy scores exceeding 0.93, compared to scores often below 0.07 for general models.
  • Crucially, this translates to downstream applications. RAG systems leveraging these specialised embeddings consistently reach accuracy levels between 88% and 93% on complex telecom tasks – exceeding the previous ~75% ceiling.
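
For readers less familiar with how embeddings feed a RAG system, the sketch below shows the retrieval step in its simplest form: passages and queries are vectors, the closest passage by cosine similarity is retrieved, and top-1 accuracy is the share of queries that fetch the expected passage. The three-dimensional vectors, passage names and helper functions are invented purely to exercise the code. They are not T-VEC outputs and say nothing about real benchmark results.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def retrieve(query_vec, index, k=1):
    """Return the ids of the k passages most similar to the query."""
    ranked = sorted(index, key=lambda pid: cosine(query_vec, index[pid]), reverse=True)
    return ranked[:k]


def top1_accuracy(queries, index):
    """Fraction of queries whose best match is the expected passage."""
    hits = sum(1 for vec, expected in queries if retrieve(vec, index)[0] == expected)
    return hits / len(queries)


# Hypothetical passage embeddings (made-up numbers).
INDEX = {
    "handover_procedure": [0.9, 0.1, 0.0],
    "rsrp_thresholds": [0.1, 0.9, 0.1],
    "billing_faq": [0.0, 0.2, 0.9],
}

# Hypothetical query embeddings paired with the passage each should retrieve.
# The last query sits closer to the wrong passage, as a poor embedding might.
QUERIES = [
    ([0.8, 0.2, 0.1], "handover_procedure"),
    ([0.1, 0.8, 0.2], "rsrp_thresholds"),
    ([0.1, 0.1, 0.9], "billing_faq"),
    ([0.5, 0.6, 0.1], "handover_procedure"),
]

accuracy = top1_accuracy(QUERIES, INDEX)
print(accuracy)  # 0.75: three of the four toy queries retrieve the expected passage
```

The design point is that retrieval quality depends entirely on where the embedding model places each query. If a telecoms question lands nearer the wrong passage, no amount of downstream prompt engineering recovers the right context.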

This leap in accuracy is transformative. It makes reliable AI possible for critical functions such as technical support bots that provide correct first-time resolutions, analysis tools that interpret network logs for faster diagnostics, and dependable systems for navigating complex compliance and standards documents.

Fostering Industry Innovation Through Open Collaboration

Advancing AI in telecoms is a collective effort. To accelerate progress and enable wider adoption of more accurate AI, foundational tools emerging from this research, including the T-VEC model and the first open-source telecom-specific tokeniser, have been made available to the community under a permissive MIT license.

The goal is to empower engineers, researchers and innovators across the industry to build upon this work. The technical community is encouraged to:

  • Explore the Methodology: Understand the research process and detailed results presented in the technical paper.
  • Experiment with the Tools: Access the open-source model and tokeniser on Hugging Face and evaluate their performance on relevant telecom tasks.
  • Contribute to Advancement: Share findings, provide feedback and collaborate on further improvements to push the boundaries of domain-specific AI.

Developing AI that truly speaks the language of telecoms is essential for unlocking its full potential. Domain-specific models represent a vital step forward, enabling the accuracy and reliability required for meaningful digital transformation in our industry. Continued research and community collaboration in this area will be key to building the next generation of intelligent telecom solutions.

Speak to Louis or Ravi to share your work and expertise in domain-specific AI or to discuss future collaboration. You can also learn more about the application of AI in the telecoms industry through the GSMA AI Use Case Library.