Building a Dataset of Translated Sentences

Facebook has released CCMatrix, a dataset of 4.5 billion parallel sentences, each pairing a sentence in one language with its translation in another. The dataset covers more than 500 language pairs. CCMatrix can help advance the development of translation systems, particularly for languages with relatively little digitized material.

Source: https://www.datainnovation.org/2020/02/building-a-dataset-of-translated-sentences/

Date: February 14, 2020 at 11:06 PM

Tag(s): #DATA ENG

