mC4: A colossal, cleaned, multilingual version of Common Crawl's web crawl corpus.

legacy-datasets/mc4 · Datasets at Hugging Face

A colossal, cleaned, multilingual version of Common Crawl's web crawl corpus, based on the Common Crawl dataset (https://commoncrawl.org/).