Data from: EnTam: An English-Tamil Parallel Corpus (EnTam v2.0)


EnTam is a sentence-aligned English-Tamil bilingual corpus that we collected from publicly available websites for NLP research involving Tamil. Standard preprocessing was applied to the raw web data to produce a sentence-aligned English-Tamil parallel corpus suitable for various NLP tasks. The corpus includes texts from the Bible, cinema, and news domains.
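As a minimal sketch of how a sentence-aligned corpus like this is typically consumed, the snippet below pairs up lines from two plain-text files. It assumes one sentence per line, with line N of the English file aligned to line N of the Tamil file; that layout and the function name `read_parallel` are assumptions for illustration, not something this description specifies, so check the files in the distribution before relying on it.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_parallel(en_path: str, ta_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (english, tamil) sentence pairs from two line-aligned files.

    Assumption: each file holds one sentence per line, and line N of one
    file is the translation of line N of the other.
    """
    with open(en_path, encoding="utf-8") as en_file, \
         open(ta_path, encoding="utf-8") as ta_file:
        pairs = zip_longest(en_file, ta_file)
        for line_no, (en, ta) in enumerate(pairs, start=1):
            # zip_longest pads the shorter file with None, which signals
            # that the two sides are not aligned line for line.
            if en is None or ta is None:
                raise ValueError(f"files differ in length at line {line_no}")
            yield en.rstrip("\n"), ta.rstrip("\n")
```

Calling `list(read_parallel("train.en", "train.ta"))` would then return the aligned pairs; both file names here are hypothetical placeholders. Raising on a length mismatch rather than silently truncating makes a broken alignment visible early.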

License - CC BY-NC-SA 3.0

Authors - Ramasamy, Loganathan; Bojar, Ondřej; Žabokrtský, Zdeněk

Languages - English, Tamil

Reference - https://datasetsearch.research.google.com/search?src=2&query=Tamil%20NLP&docid=L2cvMTFqbnl2cXd5dg%3D%3D

https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1454