Dataset | Uthayasanker Thayasivam

The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

In this work, we introduce the FLORES-101 evaluation benchmark, consisting of 3001 sentences extracted from English Wikipedia and covering a variety of different topics and domains. These sentences have been translated into 101 languages by professional translators through a carefully controlled process.

Language - Tamil

Reference - https://ai.facebook.com/blog/the-flores-101-data-set-helping-build-better-translation-systems-around-the-world, https://github.com/facebookresearch/flores

License - Creative Commons Attribution Share Alike 4.0 International

Citation -

@inproceedings{,
title={The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation},
author={Goyal, Naman and Gao, Cynthia and Chaudhary, Vishrav and Chen, Peng-Jen and Wenzek, Guillaume and Ju, Da and Krishnan, Sanjana and Ranzato, Marc'Aurelio and Guzm\'{a}n, Francisco and Fan, Angela},
year={2021}
}
@inproceedings{,
title={Two New Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English},
author={Guzm\'{a}n, Francisco and Chen, Peng-Jen and Ott, Myle and Pino, Juan and Lample, Guillaume and Koehn, Philipp and Chaudhary, Vishrav and Ranzato, Marc'Aurelio},
journal={arXiv preprint arXiv:1902.01382},
year={2019}
}