Bridging the Language Gap: How KAZ LLM is Transforming AI for Kazakhstan and Beyond

In an era where Large Language Models (LLMs) are reshaping industries, a quiet revolution is unfolding, not in Silicon Valley, but in Kazakhstan.

Oleksii Sharavar of QazCode recently joined the GSMA Foundry team to share the story of KAZ LLM, an ambitious project to develop a Kazakh-language AI model tailored for local communities, businesses, and public services. What started as a small internal initiative has become a model of digital inclusion and international collaboration.

[Image: screenshot of a virtual meeting with Faisal Zia and Rich Cockle of GSMA Foundry and Oleksii Sharavar, CEO of QazCode.]

Why KAZ LLM? A Mission Born from a Language Gap

Kazakhstan’s tech landscape is evolving rapidly, but one major challenge stood in the way: mainstream LLMs like ChatGPT lacked support for the Kazakh language.

“We had over a million monthly requests from customers, and we couldn’t respond to them fluently in Kazakh. There was no LLM support for our language. That’s when we knew we had to act,” Oleksii shared.

Originally part of Beeline Kazakhstan’s IT team, the QazCode unit realised that local language communities were being left behind by global AI tools. The problem was not limited to Kazakh: many underrepresented languages face the same challenge.

From an Internal Tool to a National Asset

Three years ago, the team began collecting voice data from call centres, transcribing and polishing it to build the first dataset of its kind. The result? Kaz-RoBERTa, a 2-billion-token Kazakh language model published on Hugging Face.

“We forgot about it after publishing. But then we checked back: 3,000 downloads. That was our turning point. Clearly, others needed this too.”

This discovery spurred a broader vision: a large-scale, national AI project that would require deep collaboration. That’s when partnerships with the Digital Ministry of Kazakhstan, Nazarbayev University, and the Barcelona Supercomputing Center came into play.

Building a KAZ LLM: A Global Effort with Local Roots

With a clear goal and the right partners, KAZ LLM took shape. Over nine months, the team collected massive datasets, tackled technical roadblocks, and trained a 70-billion-parameter multilingual model covering Kazakh, Russian, Turkish, and English.

“In countries like Kazakhstan, people often switch languages mid-sentence. Our model needed to understand that kind of linguistic fluidity,” Oleksii noted.

Access to academic literature, not just web forums or user-generated content, was crucial. The team worked closely with universities and government bodies to integrate high-quality, local data.

AI with Impact: Education, SMEs, and Public Services

KAZ LLM’s first real-world use case launched under the name AI Tutor within the Janymda app, an all-in-one lifestyle platform used by millions in Kazakhstan.

For Students:

  • Interactive lessons in Kazakh literature and history.
  • Instant answers, explanations, and vocabulary tools.

For Teachers:

  • Custom quiz generation from local content (e.g., works of national poet Abai).
  • Automation of repetitive educational tasks in seconds.

Daily active users have grown from 1,000 to over 20,000 in just a few months, and KAZ LLM’s educational reach continues to expand. New subjects such as math, biology, and chemistry are on the roadmap.

Democratising Enterprise AI

Beyond education, KAZ LLM is entering the B2B landscape through a new partnership with Seekr, focusing on secure, ethical, and scalable AI solutions for small and medium enterprises (SMEs).

“SMEs don’t need just a language model. They need reliable agents to do specific jobs – legal, financial, marketing, and more – all in their native language and within their country’s data regulations.”

By enabling local AI agents, KAZ LLM is empowering SMEs to offer richer digital experiences without outsourcing their data or compromising privacy.

From Kazakhstan to the World

The project is already sparking international interest. Governments and startups in Ukraine, Bangladesh, and Uzbekistan are exploring similar models, using the KAZ LLM playbook as a foundation.

“We’re helping teams build LLMs for languages like Bengali, Urdu, and Uzbek. Local startups and linguists are essential to make these tools truly representative.”

What We Learned: Language is Everything

Oleksii reflects that the greatest surprise wasn’t technical; it was cultural.

“Linguists became the rockstars of our project. Without their work on tokenisation, bias, dialects, and cultural nuance, the model would have failed.”

KAZ LLM’s open-source release on Hugging Face means others can now build on this foundation, a testament to the team’s commitment to inclusive AI.

What’s Next for KAZ LLM?

  1. Expanding AI Tutor across subjects and reaching millions more students and teachers.
  2. Improving latency and UX, making the platform more accessible nationwide.
  3. Replicating the model in other emerging markets and underrepresented languages.

“The model is just the beginning. People don’t want an LLM, they want agents that do things. Legal agents, educational agents, government bots. That’s our next mission.”

Final Thoughts: Advice for Founders in Emerging Markets

To others trying to build AI in underrepresented languages, Oleksii’s advice is clear:

“Don’t go it alone. You need partners – governments, universities, infrastructure providers. And don’t overlook your linguists. They’re the key to cultural and technical success.”

A New Era of Local AI

KAZ LLM is a powerful reminder that AI’s true potential lies not just in scale, but in specificity. Language is more than words: it is identity, culture, and access.

Thanks to Oleksii, the team at QazCode, and their partners, millions of Kazakh speakers now have a model that speaks their language and understands their world.

Stay tuned for further developments from the GSMA Foundry. If you have any questions or want to join this exciting journey, feel free to reach out.