Sri Lanka, Aug. 28 -- Researchers at the University of Moratuwa have developed and launched "SinLlama," the country's first large-scale Sinhala-only large language model (LLM), marking a breakthrough in local-language computing.

According to the research team, SinLlama was built by continually pre-training the Llama-3-8B model on a corpus of nearly 10 million Sinhala sentences. The project, led by the university's Department of Computer Science and Engineering, has produced the largest Sinhala LLM to date, and the model has already outperformed the base Llama-3-8B model on Sinhala text classification benchmarks.

The research team has made both the SinLlama model and its accompanying dataset of nearly 10 million sentences freely available...