January 25, 2024

AI4Bharat Releases ‘Airavata’, a Custom LLM to Improve Hindi Language Support in AI Models

In Brief

India’s AI4Bharat announced the release of “Airavata”, an LLM built by fine-tuning OpenHathi to improve Hindi language support in AI models.

AI4Bharat, the AI research lab at Indian higher education institute IIT Madras, has released Airavata, an instruction-tuned model for Hindi. According to the announcement, the model was built by fine-tuning Sarvam AI’s OpenHathi on diverse Hindi datasets to make it better suited for assistive tasks.

Hindi is the most widely spoken language in India, with native speakers making up over 43% of the population.

“Currently, Airavata supports Hindi, but we plan to expand this to all 22 scheduled Indic languages soon,” the AI lab said in a LinkedIn post. The performance of large language models (LLMs) depends on high-quality instruction-tuning datasets, but diverse datasets of this kind are scarce for Hindi.

Major progress has been made in developing datasets for pre-training, such as RedPajama; for instruction tuning, such as Alpaca, UltraChat, Dolly, OpenAssistant and LMSYS-Chat; and in evaluation benchmarks such as AlpacaEval and MT-Bench. However, most of these advances have centered on the English language.

“There is some limited support for Indian languages, which can be attributed to the incidental inclusion of some Indian language data that slipped through the data filters during the pre-training of these language models. However, the representation of data, the efficacy of tokenizers, and task performance for Indian languages are considerably behind that of English,” AI4Bharat Labs said in its statement.

“The performance in Indian languages, even on closed-source models such as ChatGPT, GPT-4 and others, is inferior compared to English,” it added.

AI4Bharat Releases Instruction Tuning Datasets

The AI4Bharat team also released the instruction-tuning datasets used for the model to enable further research for IndicLLMs.

Airavata relies on human-curated, license-friendly datasets for instruction tuning. The team specifically avoided using data generated by proprietary models like GPT-4, because doing so would increase costs and, due to licensing restrictions, limit the free use of the resulting models in other applications.

Instead, the team believes human-curated datasets are a more sustainable approach to building models for most Indic languages.

However, Airavata, like other LLMs, faces typical challenges. It can hallucinate, producing fabricated information, and it may struggle with accuracy on complex or specialized topics. There is also a risk of it producing objectionable or biased content.

The team clarified that the model is for research purposes and is not recommended for any production use cases.

Previously, the AI4Bharat lab launched an open-source video transcreation platform – Chitralekha – which includes a workforce management system facilitating the complete transcreation process of a video from one language to another, covering transcription, translation and voice-over for the translated language.

It was created in collaboration with EkStep, a not-for-profit foundation and the team that was instrumental in developing India’s Aadhaar project.

Additionally, AI4Bharat has initiated the recruitment process for its AI resident and associate program for the 2024-25 term. This year-long pre-doctoral program emphasizes intensive work in natural language processing (NLP), speech, and vision projects.

About The Author

Kumar Gandharv

Kumar is an experienced Tech Journalist with a specialization in the dynamic intersections of AI/ML, marketing technology, and emerging fields such as crypto, blockchain, and NFTs. With over 3 years of experience in the industry, Kumar has established a proven track record in crafting compelling narratives, conducting insightful interviews, and delivering comprehensive insights. Kumar's expertise lies in producing high-impact content, including articles, reports, and research publications for prominent industry platforms. With a unique skill set that combines technical knowledge and storytelling, Kumar excels at communicating complex technological concepts to diverse audiences in a clear and engaging manner.