Building a Dataset of Translated Sentences

Facebook has released CCMatrix, a dataset that contains 4.5 billion parallel sentences—sentences in one language and their corresponding translations in other languages. The dataset comprises parallel sentences for more than 500 language pairs. CCMatrix can help advance the development of translation systems, particularly for languages for which there is relatively little digitized material. 

Source: https://www.datainnovation.org/2020/02/building-a-dataset-of-translated-sentences/

Date: February 14, 2020 at 11:06PM

Tag(s): #DATA ENG