Addressing the AI-Language Gap with BSC’s AINA Challenge

The rapid evolution of artificial intelligence (AI) and its applications in language technologies has highlighted a significant challenge: the AI language gap. This gap primarily affects speakers of non-global or “low-resource” languages, which are underrepresented in the digital sphere. This blog explores the nature of this gap and how initiatives like the Barcelona Supercomputing Center’s (BSC) Project AINA and the AINA Alliance are addressing it, including through the launch of the €1M AINA Challenge, to ensure no language is left behind in the digital age.

Barcelona Supercomputing Center: A Beacon of Innovation

Before delving into Project AINA, it’s worth understanding the role of the Barcelona Supercomputing Center (BSC) and why it’s considered a world leader in its field. BSC is renowned for its pioneering work in high-performance computing (HPC), data management, and computational science. Its flagship supercomputer, MareNostrum, provides a powerful tool for scientific research across disciplines including life sciences, earth sciences, and engineering. The centre’s commitment to advancing computational capabilities, coupled with its collaborative approach to tackling global challenges, positions BSC at the forefront of technological innovation: MareNostrum is the world’s eighth-largest supercomputer, the third-largest in Europe, and one of the greenest globally.

Understanding the AI-Language Gap

The AI language gap refers to the disparity in developing and applying AI technologies across different languages. This gap is most evident in natural language processing (NLP), where most systems are designed and tested in “high-resource” languages like English. With over 7,000 languages spoken worldwide, only about 20 are considered “high-resource,” meaning they have sufficient data available to train AI models effectively [1]. This disparity leads to several critical issues:

  • Limited Access to Digital Services: Speakers of under-resourced languages have restricted access to digital services, contributing to a smaller digital footprint and less representation in AI applications.
  • Digital Divide: The language gap exacerbates the digital divide, as generative AI systems and large language models (LLMs) are primarily trained on data from a few hundred languages, with English dominating online content.
  • Siloed Innovation: Where innovation in telecoms and AI does occur for these languages, it often remains siloed, fragmented, and duplicated, further widening the gap.


Bridging the Gap: Project AINA and the AINA Alliance

Home to one of the largest supercomputers in the world, the Barcelona Supercomputing Center (BSC) initiated Project AINA and the AINA Alliance, concerted efforts to address the AI language gap by fostering the development of AI technologies for non-global languages. GSMA Foundry is supporting these efforts. Here’s how they are making a difference:

Project AINA

  • Digital and Linguistic Resources: AINA focuses on generating the digital and linguistic resources necessary to develop AI technologies in underrepresented languages. For instance, the project aims to ensure the survival of the Catalan language in the digital age by facilitating the development of voice assistants, automatic translators, and conversational agents in Catalan [2].
  • Open-Source Models: By creating open-source language data and models, AINA seeks to provide a more inclusive alternative to proprietary datasets, which often lack diverse representation [1][2].

The AINA Alliance

  • Global Collaboration: The AINA Alliance, formed through a collaboration between the Government of Catalonia and the GSMA, aims to promote the equal treatment of Catalan and other non-global languages in the digital world. It is open to languages from all over the world, with the goal of enabling speakers of every language to participate fully in the digital world [6].
  • Sharing Benefits: The alliance focuses on sharing the benefits of the digital preservation of language and culture with both the public and private sectors. It encourages the global promotion of the activities and tools developed by the AINA project, raising awareness of the benefits of these techniques for digital citizens [3].
  • Call to Action: Operators have been at the forefront of enabling voice communication for the last 30 years. Now there is a new voice at the table, AI, and operators could once again play a crucial role in giving it a place in every language. While the project initially focuses on Catalan, we have the blueprint for what is needed to improve the availability of data for training LLMs, whether through AINA or other initiatives.

The AINA Challenge at MWC 2024

In line with the spirit of innovation and inclusivity, GSMA Foundry supports the launch of BSC’s AINA Challenge at Mobile World Congress (MWC) 2024. The challenge is an acceleration programme for using and developing AI tools in Catalan, with up to 24 winning projects sharing a total of €1M.

The programme addresses three challenges:

  1. Develop applications and services using AI and language technologies in Catalan: solutions that incorporate Catalan into new and existing AI and language-technology services or applications.
  2. Develop monitoring, control and alignment tools for AI applications, models and language technologies incorporating Catalan.
  3. Contribute to building an open resources ecosystem to help scale, adapt and make AI and language tech more robust in Catalan.

While the project initially focuses on Catalan, GSMA Foundry will work with the BSC to scale the project globally and build an ecosystem to help address the AI language gap.

Further details can be found at https://www.gsma.com/get-involved/gsma-foundry/gsma-foundry-challenges/

Conclusion

The AI language gap poses a significant challenge to global inclusivity in the digital age. However, initiatives like Project AINA and the AINA Alliance are pioneering efforts to bridge this gap by developing and promoting AI technologies for non-global languages. By focusing on creating digital and linguistic resources, fostering global collaboration, and advocating for open-source models, these initiatives aim to ensure that no language is left behind as the world advances into an increasingly AI-driven future.

Citations:

[1] https://www.brookings.edu/articles/how-language-gaps-constrain-generative-ai-development/

[2] https://www.bsc.es/news/bsc-news/aina-born-the-project-will-guarantee-the-survival-the-catalan-language-the-digital-age

[3] https://catalangovernment.eu/catalangovernment/news/544322/aina-project-goes-international-to-protect-and-promote-catalan-in-global-digital-market