AI in Our Language: Why Southeast Asia Must Not Be Left Behind

In the global race to develop large language models (LLMs), linguistic dominance has often shaped technological progress. English and Chinese, with their wide-reaching economic and academic influence, continue to drive the most advanced AI models today. But what does this mean for Southeast Asia—a region teeming with diversity, yet where many languages are not globally dominant?

Travelling through Southeast Asia and witnessing its rapid growth is certainly reassuring, with sights that blend Eastern and Western influences.

The Language Gap in AI Development

The current trajectory of AI advancement risks marginalizing regions where local languages lack large digital corpora. Countries like Laos, Myanmar, Cambodia, and even Bahasa-speaking Indonesia face the challenge of not just digitizing their cultures, but making them machine-readable, interpretable, and context-aware.

Enter SEA-LION (Southeast Asian Languages In One Network), an open-source family of large language models trained specifically on Southeast Asian languages. Developed by AI Singapore, SEA-LION is a major step toward bridging the linguistic gap. By focusing on languages such as Thai, Vietnamese, Burmese, Khmer, and Bahasa Indonesia, which are often neglected in global AI development, it empowers local communities, developers, and governments to build culturally relevant AI applications.

This move is not only inclusive, but vital for equitable digital transformation in the region.

Cognizant and Closing the Gap

Meanwhile, the developers of English-dominant LLMs, including OpenAI, Anthropic, and Meta, are increasingly cognizant of these limitations. With the help of reinforcement learning and multilingual training data, their models are learning to better serve non-native English speakers. However, challenges remain, particularly in accent recognition and contextual fluency across the diverse English variants used in ASEAN.

The member countries of the Association of Southeast Asian Nations (ASEAN). (Source: easychinapprov.com)

The disparity is noticeable: NLP models are significantly more adept at parsing American or British English than they are at understanding the Southeast Asian English lexicon or pronunciation styles—be it the clipped cadence of Singaporean English, or the rising tones of Filipino-accented English.

Messaging Behaviours Are Telling

The use of messaging platforms also reflects these linguistic gaps. In Myanmar and Cambodia, voice messaging is heavily favoured over typing, largely due to low literacy rates and typing constraints in non-Roman scripts. WhatsApp and Telegram groups in these markets are vibrant ecosystems of audio snippets—something current AI transcription services still struggle to parse effectively unless trained on specific regional accents.

These behavioural nuances further complicate NLP efforts—text-heavy models are ill-equipped for audio-first communities unless explicitly designed for them.

CRM: Where AI Still Falls Short

Even in sectors where AI has made strides, such as customer relationship management (CRM), its limitations are apparent. While AI can process vast amounts of data and automate replies, it often falters in genuine human engagement—the ability to sense frustration, read emotional undercurrents, or pivot tone based on subtle cues. Humans intuitively adjust their approach based on mood, context, and relational memory. AI, no matter how sophisticated, still struggles to replicate the empathy and nuance essential to building long-term customer trust, especially in diverse cultural settings like Southeast Asia.

Yesterday’s Concerns Are Today’s Relics

It’s worth noting, however, how quickly things change. Just a year ago, many worried that Southeast Asia might be left behind in the AI boom. But with the rise of regional initiatives like SEA-LION, and the global push to develop multilingual and multi-modal AI systems, those concerns are rapidly becoming outdated.

The pace of development is so fast that worries from just six months ago now feel antiquated. AI models are no longer just text-based; they are voice-responsive, culturally aware, and increasingly location-sensitive. And as the technology learns to speak our languages, Southeast Asia’s digital future no longer needs to be written in someone else’s tongue.


Let’s ensure our voices—spoken or typed—shape the next generation of intelligence.

#AI #SoutheastAsia #SEA_LION #NaturalLanguageProcessing #DigitalInclusion #VoiceTech #CRM #LinkedInArticle

This article is also published on LinkedIn. More interesting stories and perspectives can be found on marvinfoo.com’s blog section.
