
Does AI need a language lesson in minority languages?

Minority languages are disappearing, and our AI usage could be part of the cause. Grace Golby investigates.

Where language is poetic and precise, artificial intelligence is algorithmic and approximate. Yet, convenience often prevails, bringing many Bristol students to the same place: half-asleep with a strong cup of coffee in one hand and the mighty keyboard in the other.

Every year, many students are drawn to the seemingly endless prompt to ‘ask anything’. Although debate around the ethics of AI has been widespread, I do not wish to simply regurgitate it in this article. Rather, the languages student in me is intrigued by the claim that ‘an estimated 90 per cent of the training data for current generative AI systems stems from English’. Whilst this may be useful for us English speakers, it raises an important question: does AI’s uneven language coverage threaten the survival of minority languages?

As speakers of English, which I assume many of you are in reading this article (if not, very impressive), it is sometimes too easy to perceive the world through the ease of our language. Whether in hunting for subtitles for a movie, finding a translation of an important article or listening to an English adaptation of a popular song, the language rarely disappoints in enabling wide accessibility online. After all, according to the British Council website, ‘around the world, 2.3 billion people speak English’. Yet we should also remember that our planet’s population has climbed to over 8 billion people, making English speakers a much smaller proportion of the world than it first seems.

For those who do not speak English, for those interested in preserving cultural heritage through language and for those trying to promote language equity in an age of digitalisation, AI poses a problem for minority languages.

These languages simply lack the depth of online resources and, consequently, the vast datasets required to train AI models to recognise their patterns accurately. You may especially see this in clumsy interpretations of idioms, slang and cultural nuance. As a result, AI struggles to represent these languages in the digital world. This accelerates an international dependence on English, causes cultural misunderstandings and triggers lengthy Reddit debates.

Image and illustration by Epigram / Jemima Choi

And the issue is compounded because AI is everywhere. It powers our online search engines, fuels our chatbots, answers questions never before asked by humanity, and is utilised in science, healthcare, banking and security. In defaulting to English, we are often too preoccupied with the question at hand and miss the problem. We may believe that if English is so well utilised by AI and is so widely spoken, it must facilitate effective engagement with AI for a large proportion of the globe. That is somewhat true. But what is sometimes forgotten is that the English language has evolved around English-speaking populations. It has grown and adapted over time to meet the needs and reflect the culture of its users. There are even words that have no direct equivalent in English, with examples being ‘Schadenfreude’ or ‘Почемучка’. AI usage is global – the English language alone is not.

A good example would be a Welsh speaker switching to English when interacting with AI for ease of communication and better results. This may seem inconsequential on an individual level, but when repeated on a global scale it undermines efforts to preserve the influence of minority languages.

The solution? Representation and active promotion of minority languages with AI. After all, AI only disadvantages them due to the dominance of English (and other widely spoken languages). 


A conscious effort is required to embed minority languages in AI’s design to avoid this digital marginalisation and lack of representation. To some extent, this has been taking place. One project utilising Google AI is ‘Woolaroo’, which helps users learn indigenous words in context through scanning photos and producing translations of pictured objects. Languages included range from Māori to Sicilian to the Australian Aboriginal language Yugambeh. Another initiative is NüshuRescue. ‘Nüshu is considered to be the world’s only writing system that is created and used exclusively by women’, and it is derived from Chinese characters. Women of the Xiao River valley passed it from mother to daughter and ‘it was used by women in a feudal society who lacked access to education in reading and writing’. NüshuRescue responds to threats of the language’s decline as ‘an AI-driven framework designed to train large language models (LLMs) on endangered languages with minimal data’, helping to keep the language alive and represented in AI usage.

Yet, despite these initiatives, it is clear that not enough is being done to target the dominance of English when it comes to AI. Although English is widespread and serves as a key lingua franca (a language used to communicate between native speakers of different languages) in many contexts, we cannot overlook the position in which it leaves minority languages. If we would like AI to reflect the vibrant diversity of our world, it must reflect the diversity of its languages. I encourage you to engage with AI in different languages, whether you are a heritage speaker, a curious linguist or a monolingual speaker with some spare time between lectures. You may find something that you could not in English.

Featured Image: Epigram / Corin Hadley

