Global Estonian | Estonia gives Meta access to 4 billion words for large language model development

Location: Estonia
News Category: Business

Estonia will make almost 4 billion words available to technology giant Meta, which owns Facebook and Instagram, for training large language models, the Ministry of Justice said on Thursday. Minister of Justice and Digital Affairs Lisa Pakosta (Eesti 200) said the survival of the Estonian language depends on such deals and on giving the data away for free.

The ministry said Meta is the second developer of large language models to be given access to the open data of the Estonian language corpus, which contains nearly 4 billion words.

"Sharing Estonian-language data creates the precondition for large language models to understand the cultural context of Estonia and become more proficient in using the Estonian language," the press release said.

Sharing the data also enables better services for Estonian-speaking users in AI-based applications such as chatbots, translation systems and other language technology solutions, the ministry added.

Pakosta said it is "crucial" that large language models take the Estonian language and culture into account.

Estonia is open to cooperation and ready to share its language data with other large language model developers, the ministry said.

It urges both the public and private sectors to publish data in order to increase the volume of high-quality Estonian-language data.

The website is managed by the Integration Foundation (Integratsiooni Sihtasutus).
The foundation was established by the Republic of Estonia, on whose behalf the founder's rights are exercised by the Ministry of Culture.