What’s stopping us from using the API to post all of reddit here in a massive one-time merger?

Obviously the Lemmy devs would have to do it, but would there be legal issues? I think it would solve most of the problems with Lemmy, really.

  • 7heo@lemmy.ml
    1 year ago

    Let’s just do some quick math.

    From the list here, I counted 1985 subreddits. That’s from 2018. That was five (5!) years ago (yeah. I know).

    Let’s assume those subreddits have on average only 10,000 posts each (which is much, much lower than reality, trust me).

    Now, let’s assume that each post takes up around 1 MB of storage (that’s a lot of text, but not a lot of rich media, and there’s a whoooole lot of rich media on reddit, so that’s a very conservative estimate).

    Even with those absolutely lower-than-reality numbers, it would still give 20TB. That very easily fits on a single hard drive, right?
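The storage estimate above can be checked with a quick sketch. The three inputs are the comment's own low-ball figures, and `archive_size_bytes` is just an illustrative helper name; 1 MB is taken as a decimal megabyte (10^6 bytes), which is an assumption.

```python
# Figures from the comment above; all are deliberately low estimates.
SUBREDDITS = 1985              # counted from the 2018 list
POSTS_PER_SUBREDDIT = 10_000   # assumed average
BYTES_PER_POST = 1_000_000     # 1 MB per post, decimal megabyte (assumption)


def archive_size_bytes(subreddits, posts_each, bytes_per_post):
    """Total size in bytes if every post takes the same amount of storage."""
    return subreddits * posts_each * bytes_per_post


total = archive_size_bytes(SUBREDDITS, POSTS_PER_SUBREDDIT, BYTES_PER_POST)
print(f"{total / 1e12:.2f} TB")  # 19.85 TB, i.e. roughly 20 TB
```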

    Except… Unless you work at reddit, you’re going to have to copy this over the network. 20 TB is 160,000 gigabits, so even with 1 Gbps peering to reddit (which isn’t gonna happen, they just won’t let you), it would still take 160,000 seconds. Which is about two days of uninterrupted reddit scraping. At a full, constant 1 Gbps.

    Realistically, you can expect to be scraping at around 100 Mbps at best, and with interruptions. That already pushes the time required to about 18 days, so two and a half weeks at the very best. And that’s not considering the, again, absolutely ridiculously low numbers I chose.
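For anyone who wants to redo the transfer-time math with other link speeds, here is a minimal sketch. It assumes a perfectly saturated link with no protocol overhead, rate limiting, or interruptions, so the results are lower bounds; `transfer_days` is an illustrative name.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(total_bytes, link_bits_per_second):
    """Days needed to move total_bytes over a link running at full speed."""
    seconds = total_bytes * 8 / link_bits_per_second
    return seconds / SECONDS_PER_DAY


ARCHIVE_BYTES = 20e12  # the ~20 TB estimate from above

for label, bps in (("1 Gbps", 1e9), ("100 Mbps", 100e6)):
    print(f"{label}: {transfer_days(ARCHIVE_BYTES, bps):.1f} days")
```

With these inputs it comes out to roughly 1.9 days at 1 Gbps and 18.5 days at 100 Mbps.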

    Ah, and the list I linked? It’s without the NSFW material. Which can easily be 50 MB+ a video. Of which there are tens of thousands.