Thai Natural Language Processing @lemmy.ml

mC4: A multilingual, colossal, cleaned version of Common Crawl's web crawl corpus.

huggingface.co

legacy-datasets/mc4 · Datasets at Hugging Face

A multilingual, colossal, cleaned version of Common Crawl's web crawl corpus. Based on the Common Crawl dataset: https://commoncrawl.org/
