India, March 9 -- AI4Bharat, an artificial intelligence (AI) lab incubated at IIT Madras, is reportedly collecting 10 trillion tokens of language data to build the "next generation of AI services".
For context, tokens are the basic units of text that large language models (LLMs) take as input and produce as output. A token can be a whole word, a single character, or a subword.
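To make the three kinds of token concrete, here is a minimal Python sketch. It is an illustration only and not how AI4Bharat or any particular LLM tokenizes text: the function names and the tiny `TOY_VOCAB` vocabulary are hypothetical, and the greedy longest-match subword split stands in for the learned tokenizers that production models use.

```python
def word_tokens(text):
    """Split text into word-level tokens on whitespace."""
    return text.split()


def char_tokens(text):
    """Split text into character-level tokens."""
    return list(text)


def subword_tokens(word, vocab):
    """Greedy longest-match split of one word into subword pieces.

    Falls back to a single character when no vocabulary entry matches,
    so the loop always makes progress.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking one character at a time.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


# Hypothetical toy vocabulary, chosen only for this example.
TOY_VOCAB = {"token", "iz", "ation"}

print(word_tokens("language data"))
print(char_tokens("AI"))
print(subword_tokens("tokenization", TOY_VOCAB))
```

The same sentence yields very different token counts depending on the scheme, which is why a figure such as 10 trillion tokens only describes a dataset's size relative to the tokenizer used to count it.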
According to The Economic Times, AI4Bharat cofounder Mitesh Khapra claimed that the platform has "gone to almost every district in the country" and "tried to cover almost all the 22 official languages" over the past three years.
AI4Bharat claims to have sourced the data from voice samples contributed by users across a range of demographics and professions.
Noting that the platform has built the tools required for data collect...