AI4Bharat Collecting 10 Tn Tokens To Build Next Generation Of AI Services

Posted On: 2025-03-09 Posted By: Team Inc42


India, March 9 -- The IIT Madras-incubated artificial intelligence (AI) lab AI4Bharat is reportedly collecting 10 trillion (Tn) tokens of language data to build the "next generation of AI services".

For context, tokens are the basic units of input and output for large language models (LLMs). A token is a unit of text that can be a word, a character, or a subword.
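
To make the three granularities concrete, here is a minimal Python sketch. It is an illustration only, not how AI4Bharat or any production LLM tokenizes text: the function names and the tiny `TOY_VOCAB` vocabulary are hypothetical, and real subword tokenizers learn their vocabularies from data.

```python
def word_tokens(text):
    """Split on whitespace: one token per word."""
    return text.split()


def char_tokens(text):
    """One token per character."""
    return list(text)


def subword_tokens(word, vocab):
    """Greedy longest-match split of a single word into subword pieces.

    Any character not covered by the vocabulary becomes its own token.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining slice first, shrinking until a match.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


# Hypothetical vocabulary, chosen by hand for this example.
TOY_VOCAB = {"token", "iz", "ation", "s"}

print(word_tokens("tokens are units"))
print(char_tokens("token"))
print(subword_tokens("tokenization", TOY_VOCAB))
```

The same text yields very different token counts depending on the granularity, which is why a figure such as 10 Tn tokens only has a precise meaning relative to a specific tokenizer.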

According to The Economic Times, AI4Bharat cofounder Mitesh Khapra claimed that the platform has "gone to almost every district in the country" and "tried to cover almost all the 22 official languages" over the past three years.

AI4Bharat claims to have sourced the data from voice samples of users across several demographics and professions.

Noting that the platform has built the tools required for data collect...
