The Learning Curve, Part 1: Why Teaching AI New Languages Begins with Data
This visit to Samsung Research in Indonesia is part of a series about the people and innovations behind the democratization of mobile AI
5/15/2024
As Samsung continues to pioneer premium mobile AI experiences, we visit Samsung Research centers around the world to learn how Galaxy AI is enabling more users to maximize their potential. Galaxy AI now supports 16 languages, so more people can expand their language capabilities, even when offline, thanks to on-device translation in features such as Live Translate, Interpreter, Note Assist and Browsing Assist. But what does AI language development involve? This series examines the challenges of working with mobile AI and how we overcame them. First up, we head to Indonesia to learn where one begins teaching AI to speak a new language.
The first step is establishing targets, according to the team at Samsung R&D Institute Indonesia (SRIN). “Great AI begins with good-quality, relevant data. Each language demands a different way to process this, so we dive deep to understand the linguistic needs and the unique conditions of our country,” says Junaidillah Fadlil, head of AI at SRIN, whose team recently added Bahasa Indonesia (Indonesian language) support to Galaxy AI. “Local language development has to be led by insight and science, so every process for adding languages to Galaxy AI starts with us planning what information we need and can legally and ethically obtain.”
Galaxy AI features such as Live Translate perform three core processes: automatic speech recognition (ASR), machine translation (MT) and text-to-speech (TTS). Each process needs a distinct set of information.
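The three stages described above run in sequence. The sketch below is a toy illustration of that chaining only; the stage implementations (a lowercase pass for ASR, a two-word lookup table for MT, a UTF-8 encode for TTS) are invented stand-ins, not Samsung's actual models or APIs.

```python
def automatic_speech_recognition(audio: str) -> str:
    # A real ASR model maps recorded audio to a transcript; for this
    # sketch we pretend the "audio" is already spoken text.
    return audio.lower().strip()

def machine_translation(text: str) -> str:
    # Toy word-for-word lookup standing in for a trained MT model
    # (Bahasa Indonesia -> English, two illustrative entries only).
    toy_dict = {"selamat": "good", "pagi": "morning"}
    return " ".join(toy_dict.get(word, word) for word in text.split())

def text_to_speech(text: str) -> bytes:
    # A real TTS model synthesizes waveform audio; placeholder bytes here.
    return text.encode("utf-8")

def live_translate(audio: str) -> bytes:
    transcript = automatic_speech_recognition(audio)  # stage 1: ASR
    translated = machine_translation(transcript)      # stage 2: MT
    return text_to_speech(translated)                 # stage 3: TTS

print(live_translate("Selamat pagi"))  # b'good morning'
```

The point of the structure is that each stage needs its own training data, which is why the team describes distinct collection efforts for ASR, MT and TTS below.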
ASR, for instance, needs extensive recordings of speech in numerous environments, each paired with an accurate text transcription. Varying background noise levels help account for different environments. “It’s not enough just to add traffic noises to recordings,” explains Muchlisin Adi Saputra, the team’s ASR lead. “We must go out into traffic or to a mall where we can authentically capture unique sounds at street level, like people calling out or hammering from a construction site.”
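One common way to vary background noise levels in ASR training data is to mix recorded ambient noise into clean speech at a chosen signal-to-noise ratio (SNR). The snippet below is a minimal sketch of that augmentation idea using plain Python lists as stand-in audio samples; it is not SRIN's actual tooling.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR, then add it."""
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Bring noise power to speech_power / 10^(snr_db / 10).
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(speech, noise)]

clean = [0.5, -0.5, 0.5, -0.5]    # placeholder "speech" samples
street = [0.1, 0.2, -0.1, -0.2]   # placeholder "street noise" samples
noisy = mix_at_snr(clean, street, snr_db=10)
```

Repeating this at several SNR levels and with different recorded environments (traffic, malls, construction) yields the varied training conditions the team describes.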
Sources of data must also be considered. Saputra adds: “We need to keep up to date with the latest slang and how it is used, and mostly we find it on social media!”
Next, MT requires translation training data. “Translating Bahasa Indonesia is challenging,” says Muhamad Faisal, the team’s MT lead. “Its extensive use of contextual and implicit meanings relies on social and situational cues, so we need numerous translated texts that the AI can reference for new words, foreign words, proper nouns, and idioms – any information that helps AI understand the context and rules of communication.”
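The translated texts Faisal describes are typically organized as a parallel corpus: aligned source and target sentences, often with notes that help resolve idioms or proper nouns. The entries and field names below are invented for illustration; only the general shape of such data is the point.

```python
# Hypothetical parallel-corpus entries (Bahasa Indonesia -> English).
# "buah tangan" is an Indonesian idiom for a gift brought back from a
# trip; the annotation shows why literal translation would mislead.
parallel_corpus = [
    {"source": "Selamat pagi", "target": "Good morning",
     "note": "common greeting"},
    {"source": "buah tangan", "target": "souvenir",
     "note": "idiom: literally 'fruit of the hand'"},
]

def training_pairs(corpus):
    """Extract the (source, target) sentence pairs a model trains on."""
    return [(entry["source"], entry["target"]) for entry in corpus]
```

The annotations themselves are not fed to the model verbatim; they guide data curation so the corpus covers the contextual cases the quote mentions.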
TTS then requires recordings that cover a range of voices and tones, with additional context on how parts of words sound in different circumstances. “Good voice recordings could do half the job and cover all the required phonemes (units of sound in speech) for the AI model,” adds Harits Abdurrohman, TTS lead. “If a voice actor did a great job in the earlier phase, the focus shifts to refining the AI model to clearly pronounce specific words.”
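Checking that voice recordings “cover all the required phonemes” can be automated with a pronunciation lexicon: map each recorded prompt to its phonemes and report any required phonemes still missing. The mini-lexicon and phoneme set below are invented for the example, not a real Indonesian inventory.

```python
# Toy pronunciation lookup (invented entries, schwa written as "ə").
TOY_LEXICON = {
    "selamat": ["s", "ə", "l", "a", "m", "a", "t"],
    "pagi":    ["p", "a", "g", "i"],
    "malam":   ["m", "a", "l", "a", "m"],
}

def missing_phonemes(prompts, required):
    """Return the required phonemes not covered by the recorded prompts."""
    covered = set()
    for word in prompts:
        covered.update(TOY_LEXICON.get(word, []))
    return required - covered

required = {"s", "ə", "l", "a", "m", "t", "p", "g", "i", "u"}
gaps = missing_phonemes(["selamat", "pagi"], required)
print(sorted(gaps))  # ['u'] -- a prompt containing /u/ is still needed
```

A report like this tells the team whether the voice actor's existing sessions suffice or whether specific words must be added to the script, matching Abdurrohman's point that good early recordings let later work focus on refining the model.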