Useful Resources

Task 1: Developing cleaned corpora of (Unicode) Sinhala

Aya Dataset: Please find the link here

C4 Dataset: Please find the link here

National Languages Processing Centre Datasets: Please find some more datasets here

Sinhala GPT2 Dataset: Please find the link here
