• Flaky@iusearchlinux.fyi
    1 year ago

    FWIW, Common Crawl, a free, openly available dataset of crawled web pages, was used by OpenAI for GPT-3 and, via the Pile, for EleutherAI’s GPT-NeoX. GPT-2 was trained on OpenAI’s own WebText scrape rather than Common Crawl. Maybe GPT-3.5/ChatGPT used it as well, but they’ve been hush-hush about that.