Overcome the language barrier in NLP

Natural language processing is a transformative technology that has generated a lot of buzz in recent years for its large-scale impact. But most of the research, and most of the models built so far, focus on mechanisms that work for English. Where models have been built for other languages, they have mostly covered the popular ones.

There are around 7,000 languages spoken in the world, with Asia accounting for the largest share of them. If we do not address this immense array of languages, much of the world will miss out on the benefits of technological advancement. There is a need to develop speech recognition models for other languages to make the technology more inclusive.

Difficult to build

Although researchers and tech companies have realized that bringing NLP to other languages would be very useful from a business and societal perspective, building models in those languages is quite difficult because accurate and sufficient data is hard to come by. Training and testing an NLP model requires a large dataset, and even when a language is spoken by a large population, obtaining such a dataset can still be difficult.

If only a small dataset is available, a separate approach is needed to build the model. Language data also needs to be cleaned. Many languages use symbols and other characters that not every computer system recognizes without appropriate modification, and adapting the data to such systems can be time-consuming and costly. If a company develops a model for other languages, it should open it up, because this is still an emerging field and others can learn from it and be inspired by it.
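The cleaning step described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete pipeline: the function name `clean_text` and the set of characters it strips are assumptions made for this example. It uses only the standard library's `unicodedata.normalize`.

```python
import unicodedata

# Invisible characters that often leak into scraped text.
# This set is an illustrative assumption, not an exhaustive list.
# U+200C and U+200D (zero-width non-joiner / joiner) are deliberately
# NOT included, because they carry meaning in scripts such as Persian
# and several Indic scripts.
INVISIBLE = {"\u200b", "\u2060", "\ufeff"}


def clean_text(text: str) -> str:
    """Normalize Unicode, drop invisible characters, collapse whitespace."""
    # NFC gives letters with combining marks one canonical representation,
    # so the same word is not stored under two different byte sequences.
    text = unicodedata.normalize("NFC", text)
    # Remove the invisible characters listed above.
    text = "".join(ch for ch in text if ch not in INVISIBLE)
    # Collapse runs of whitespace into single spaces and trim the ends.
    return " ".join(text.split())
```

For example, `clean_text("languages \u200b\u200bspoken")` returns `"languages spoken"`. Which characters are safe to drop depends on the language, so a list like this would likely need review by someone who reads the script in question.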


Progress has been made in recent years in building models for a wide range of languages.

  • In 2020, Meta introduced M2M-100, a multilingual machine translation (MMT) model that translates between any pair of 100 languages without relying on English data. Meta stated that M2M-100 was trained on a total of 2,200 language directions. The goal of building such a model is to improve the quality of translation around the world, especially for people who speak low-resource languages, Meta said.
  • In September, IIT Bombay launched the Udaan project, which helps translate textbooks and other study materials in engineering and other streams from English into Hindi and other Indian languages. It is a donation-based translation ecosystem built on artificial intelligence.
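To put the M2M-100 figures above in perspective, the short calculation below counts how many translation directions exist among 100 languages. It is simple counting under the assumption that each ordered (source, target) pair is one direction. The helper name `count_directions` is made up for this example, and the resulting total is derived here rather than quoted from Meta.

```python
def count_directions(n_languages: int) -> int:
    """Count ordered (source, target) pairs among n distinct languages."""
    # Each language can be translated into every other language,
    # and A -> B is counted separately from B -> A.
    return n_languages * (n_languages - 1)


total_directions = count_directions(100)
trained_directions = 2200  # figure reported for M2M-100 in the text above
share_trained = trained_directions / total_directions

print(f"{total_directions} possible directions")
print(f"{share_trained:.1%} of them covered by the reported training directions")
```

Under that assumption, the 2,200 reported training directions are a fraction of all possible directions among 100 languages.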

How will this help?

Natural language processing is used in a wide range of areas, such as summarization, question answering, sentence similarity, translation, token classification and many more. If it can reach less popular languages, it will be immensely beneficial for:

  • Understanding and analyzing sentiment in comments on social media platforms and ecommerce websites, where a large portion of people write in their native language rather than in English. This can give businesses valuable feedback for improvement.
  • Better customer service and engagement, as customers mostly prefer to talk to chatbots or virtual assistants in their native language.
  • Improving the results and accuracy of the technology by extending it to more categories.
  • Making the right content available to users in their native language, based on their choices and past habits.
  • Benefiting society as technology reaches less popular languages.

For society to move forward, we need to make sure that the benefits of technology are available to everyone. A good start would be to take new-age technologies beyond borders.


Sreejani Bhattacharyya

Sreejani Bhattacharyya is a journalist with a postgraduate degree in economics. When not writing, she finds herself reading about geopolitics, economics and philosophy. She can be reached at [email protected]
