Digital bits, one zero
I’m !RoundSparrow@Lemmy.ml and RocketDerp on Github. I’m trying to test and identify Lemmy platform scalability and performance issues.

  • 5 Posts
  • 22 Comments
Joined 1 year ago
Cake day: June 2nd, 2023



  • I think this is bullshit.

    I think it is exactly how people are behaving. I can even recall witnessing firsthand many people who flip a newspaper straight to the sports section, never learning anything about science news or medical news unless it’s some kind of social column about a diet.

    People wanting to cut out and block things they don’t want to read in a newspaper is what I consider the “default behavior” of most of humanity. No surprise they do not care about the news their friends share. An intelligent computer system that filters out beforehand (based on topic/content study) what they don’t want to see is always going to be popular with such people.

    “One of the effects of living with electric information is that we live habitually in a state of information overload. There’s always more than you can cope with.” — Marshall McLuhan.








  • While yes, we should be able to delete our content if we want, but it’s a bit naive to think there could be true privacy in any decentralised social media platform.

    Especially in email or “reddit”-style threaded conversation systems, where quoting of messages is routine. Here I am, quoting you.

    You are putting a billboard up in public, on a bulletin board in the center of the Internet; the assumption should be that anyone can photograph it.


  • Given the beta status of Lemmy, I don’t even think it’s a great idea to give the appearance of privacy. I think the core purpose of a webapp like Lemmy is public messages.

    I think it’s a can of worms for server operators to get into the business of thinking they can safely hold private messages between users/strangers. None of the Lemmy instances I’ve joined have had a “terms of service” or anything like that on Sign Up. I really think the message should be sent far and wide that Lemmy is about posting IN PUBLIC and that messages are being FEDERATED to peers; even people that you don’t know could be collecting the data for a search engine.

    With small-time server operators opening up hundreds of Lemmy instances, without giving away their experience or human identity, how can you have any confidence that someone is properly securing a server they update and operate only part-time? Major corporations are having their databases stolen: Valve, Sony, Nintendo, health care companies, mobile network companies (AT&T)… you think a low-budget shoestring server run by a hobbyist running Lemmy should be held to the same standards as a corporation that has an entire team and services to defend its data?






  • something like Apache Kafka

    Not that I see. A database like PostgreSQL can work, but you have to be really careful how new data flows into the database, as writing to the database involves record locking and invalidates the cache for output.

    Or changing to something that can be scaled, like cockroach db or neondb?

    Taking the bulk data, comments and postings, outside PostgreSQL would help. Especially since what most people are reading on a Reddit-like website is content from the last 48 hours… and your caching potential drops way down as people move on to the newer content.

    The comments alone are the primary problem: there are a lot of them on each posting and they are bulky data. Also, comments are unique data.
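To illustrate the point above about writes invalidating the cache, here is a minimal sketch in Python. It is not Lemmy code: `PostCache`, `load_comments`, and the in-memory `store` dict are hypothetical stand-ins for a page cache and the database behind it.

```python
class PostCache:
    """Read-through cache of comment lists, keyed by post id (hypothetical)."""

    def __init__(self, load_comments):
        self._load = load_comments   # stand-in for a database query
        self._pages = {}
        self.db_reads = 0            # counts how often we fall through to the "database"

    def get(self, post_id):
        if post_id not in self._pages:
            self.db_reads += 1
            self._pages[post_id] = list(self._load(post_id))
        return self._pages[post_id]

    def invalidate(self, post_id):
        # Any new comment or vote makes the cached page stale.
        self._pages.pop(post_id, None)


store = {1: ["first comment"]}       # stand-in for the comment table
cache = PostCache(lambda post_id: store.get(post_id, []))

cache.get(1)
cache.get(1)                         # second read is served from the cache
reads_before_write = cache.db_reads

store[1].append("second comment")    # a write arrives...
cache.invalidate(1)                  # ...and the cached page has to go
latest = cache.get(1)
reads_after_write = cache.db_reads
```

On a busy posting every new comment triggers that invalidation, so the cache only pays off for content that has stopped changing.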


  • I doubt it is anything that level. The problem is the data itself, in the database.

    A reddit-like website is like email: every load from the database has unique content. You really have to be very careful when designing for scalability when almost all the data is unique.

    As opposed to a site like Amazon where the listing for a toothbrush is not unique on every page load. There aren’t new comments and new votes altering the toothbrush listing every time a user refreshes the page. And people aren’t switching brands of toothbrush every 24 hours like the front page of Reddit abandons old data and starts with fresh data.


  • The problems I see with Lemmy performance all point to SQL being poorly optimized. In particular, federation is doing database inserts of new content from other servers - and many servers can be sending their new postings, comments, and votes at the same time. Priority is not given to interactive webapp/API users.

    Using a SQL database for a backend of a website with unique data all over the place is very tricky. You really have to program the app to avoid touching the database and create queues and such when you can. Reddit (at least 9 years ago when they open sourced it) is also based on PostgreSQL - and you will see they do not do live inserts into comments like Lemmy does - they queue them using something other than the main database then insert them in batch.

    Email MTA apps I’ve seen do the same thing: they queue files to disk before putting them into the main database.

    I don’t think nginx is the problem; the bottleneck is the backend of the backend, PostgreSQL doing all that I/O and record locking.
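A rough sketch of the queue-then-batch idea described above, in Python. This is not how Lemmy or Reddit actually implement it: the `comment` table, the `flush` helper, and the batch size of 500 are assumptions for illustration, and an in-memory SQLite database stands in for PostgreSQL so the example runs on its own.

```python
import queue
import sqlite3


def drain_batch(pending, max_items):
    """Pull up to max_items queued rows without blocking."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break
    return batch


def flush(conn, pending, max_items=500):
    """Write one batch inside a single transaction; return rows written."""
    batch = drain_batch(pending, max_items)
    if batch:
        with conn:  # commits once per batch instead of once per comment
            conn.executemany(
                "INSERT INTO comment (post_id, body) VALUES (?, ?)", batch
            )
    return len(batch)


conn = sqlite3.connect(":memory:")   # stand-in for the main database
conn.execute("CREATE TABLE comment (post_id INTEGER, body TEXT)")

pending = queue.Queue()              # incoming federated comments land here first
for i in range(1200):
    pending.put((i % 10, f"federated comment {i}"))

written = []
while not pending.empty():
    written.append(flush(conn, pending))

total = conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
```

Here 1200 queued comments reach the database in three transactions rather than 1200, which is the kind of reduction in locking and I/O the batch approach is after.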