[Technical] Why not Fanout via static files or CDNs in the Fediverse?

matcha_addict@lemy.lol · edited 1 year ago


django@discuss.tchncs.de · 1 year ago

How do you expect your feed to be updated?

matcha_addict@lemy.lol · 1 year ago

I write a post, and send a request to the server to publish it
The server takes the post and prepends it to the file housing all my posts
Now, when someone requests my posts, they will see my new one
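
A minimal sketch of the publish step above, assuming the feed is a single JSON array on disk with the newest post first. The function name `publish_post`, the file layout, and the post fields are hypothetical and only for illustration:

```python
import json
import os
import tempfile
from pathlib import Path


def publish_post(feed_path: Path, post: dict) -> None:
    """Prepend `post` to the JSON array stored at `feed_path` (assumed layout)."""
    posts = json.loads(feed_path.read_text()) if feed_path.exists() else []
    posts.insert(0, post)  # newest post goes first

    # Write to a temporary file, then rename it over the feed, so a reader
    # never fetches a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=str(feed_path.parent))
    with os.fdopen(fd, "w") as tmp:
        json.dump(posts, tmp)
    os.replace(tmp_name, str(feed_path))
```

The write-then-rename step is a design choice of this sketch, not something the steps above require; it just keeps the static file readable while it is being updated.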

If a CDN is involved, we would have to handle cache invalidation properly. We could run a batch process to update the CDN files so that we are not doing it too often; even running it every minute or so is still plenty fast for social media use cases.

tofu · 1 year ago

So I have to constantly check all files from everyone I follow for new entries in order to have a working timeline?

matcha_addict@lemy.lol · 1 year ago

Yes, precisely. The existing implementation in the Fediverse does the opposite: everyone you follow has to insert their posts into the feed of everyone that follows them, which has its own issues.

tofu · 1 year ago

But only once. If an account doesn’t post/interact for a year, it doesn’t cause any traffic. With your approach, I constantly need to pull that account’s profile to see if something new showed up.

matcha_addict@lemy.lol · 1 year ago

Sure, but constantly having to do it is not really a bad thing, given it is automated and those reads are quite inexpensive compared to a database query. It’s a lot easier to handle heavy loads when serving static files.

tofu · 1 year ago

I’m really not sure about that being inexpensive. The files will grow and the list of people to follow usually grows as well. This just doesn’t scale well.

I follow 700 people on Mastodon. That’s 700 requests every interval. With 100-10000 posts or possibly millions of interactions in each file.

Of course you can do stuff like pagination or something like that. But some people follow 10000 accounts and want to have their timeline updated at short intervals.

Pulling like this is usually used when the author can’t send you something directly, and it works for RSS feeds. But most people don’t follow hundreds of RSS feeds. Which reminds me that every Mastodon profile offers an RSS feed - you can already do what you described with an RSS reader.

matcha_addict@lemy.lol · 1 year ago

Bringing up RSS feeds is actually a very good point, because although you can paginate or partition your feeds, I have never seen a feed that does that, even when they have decades of history. But if needed, partitioning is an option, so you don’t have to pull all of a feed’s posts but only the recent ones, or those in a given date/time range.
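
As a rough sketch of what partitioning by date could look like, assume each feed is split into one file per month. The `posts-YYYY-MM.json` naming is hypothetical; a reader interested in a date range would then only request the files covering it:

```python
from datetime import date
from typing import List


def monthly_partitions(start: date, end: date) -> List[str]:
    """List the (hypothetical) monthly files covering start..end, newest first."""
    names = []
    year, month = end.year, end.month
    while (year, month) >= (start.year, start.month):
        names.append(f"posts-{year:04d}-{month:02d}.json")
        month -= 1
        if month == 0:  # step back across a year boundary
            year, month = year - 1, 12
    return names
```

For example, a client that only wants posts since mid-November would fetch the files for November onward and skip everything older.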

I would also respectfully disagree that people don’t subscribe to hundreds of RSS feeds. I would bet most people who consistently use RSS feed readers have more than 100 feeds, me included.

And last, even if you follow 10,000 accounts, yes, it would take a lot more time than reading from a single database, but it is still on the order of double-digit seconds at most. If you compare 10,000 static file fetches with 10,000 database writes across different instances, I think the static files would fare better. Not to mention that heavy write fanout is the more likely case than heavy read fanout (users with 100k followers are far more common than users with 100k subscriptions).

And just to emphasize, I do agree that double-digit seconds would be quite long for a user’s loading time, which is why I would expect the fetching to happen regularly in advance, so the user logs onto a pre-made news feed.

django@discuss.tchncs.de · 1 year ago

Sorry, I meant your timeline, where you see other people’s posts.

matcha_addict@lemy.lol · 1 year ago

Oh my bad, I can explain that.

Before I do, one benefit of this method is that your timeline is entirely up to your client. Your instance becomes primarily tasked with making your posts available, and clients have the freedom of implementing the reading and news feed / timeline formation.

Hence, there are a few ways to do this. The best one is probably a mix of those.

Naive approach: fetch posts and build news feed when user requests it

This is not a good approach, but I mention it first because it’ll make explaining the next one easier.

User opens the app or website, thereby requesting their timeline / news feed
Server fetches the list of the user’s subscriptions and followees
For each followee or subscription, the server fetches their content via their static file, wherever it is hosted
Server performs whatever filtering and ordering of content it wants
User sees the result

Cons: loading time for the user may be long. Depending on how many subscriptions they have, it could be several seconds; P90 may even be in the double digits.
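
The naive flow above can be sketched roughly as follows. `fetch_feed` is a stand-in for whatever actually downloads a followee’s static file, and the post shape (a dict with a numeric `ts` field) is an assumption made for the example:

```python
from typing import Callable, Iterable, List

Post = dict  # assumed shape: {"id": str, "ts": int, ...}


def build_timeline(
    followees: Iterable[str],
    fetch_feed: Callable[[str], List[Post]],
    limit: int = 50,
) -> List[Post]:
    """Fetch every followee's feed on demand and merge them, newest first."""
    posts: List[Post] = []
    for feed_url in followees:
        # One fetch per followee: this loop is where the load time goes.
        posts.extend(fetch_feed(feed_url))
    posts.sort(key=lambda post: post["ts"], reverse=True)
    return posts[:limit]
```

The sequential loop makes the cost visible: the user waits for one request per followee before seeing anything.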

Better approach: pre-build user’s timeline periodically.

Think of a periodic job (hourly, every 10 minutes, etc.) which fetches posts in a similar manner to the one described above, but does it in advance instead of when the user requests it.

Pros:

fast loading time compared to previous solution
when the job runs, if users on the same instance share a followee or subscription, we don’t have to query it twice (this benefit already exists in current Fediverse implementations)

Cons: posts aren’t real-time; they are delayed by the batch job frequency.
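
A sketch of one run of such a job, under the same assumptions as before: `fetch_feed` is a placeholder for the real download, and `subscriptions` is a hypothetical mapping from each local user to the feeds they follow. Each distinct feed is fetched once per run, no matter how many local users follow it:

```python
from typing import Callable, Dict, List

Post = dict  # assumed shape: {"id": str, "ts": int, ...}


def prebuild_timelines(
    subscriptions: Dict[str, List[str]],
    fetch_feed: Callable[[str], List[Post]],
    limit: int = 50,
) -> Dict[str, List[Post]]:
    """Build every local user's timeline in one batch run."""
    # Fetch each distinct feed once, even if several users follow it.
    distinct_feeds = {url for feeds in subscriptions.values() for url in feeds}
    cache = {url: fetch_feed(url) for url in distinct_feeds}

    timelines: Dict[str, List[Post]] = {}
    for user, feeds in subscriptions.items():
        posts = [post for url in feeds for post in cache[url]]
        posts.sort(key=lambda post: post["ts"], reverse=True)
        timelines[user] = posts[:limit]
    return timelines
```

A scheduler (cron or similar) would call this at whatever interval the instance picks and store the result, so a user’s request only reads the stored timeline.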

Best approach: hybrid

In this approach, we primarily do the second method, to achieve fast loading time. But to get more up-to-date content, we also simultaneously fetch the latest in the background, and interleave or add the latest posts as the user scrolls.

This way we get both fast initial load times and recent posts.
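
One way the merge step of the hybrid could look, assuming every post carries a unique `id` and a numeric `ts` (both are assumptions of this sketch). The pre-built timeline is shown immediately, and posts fetched in the background are folded in without duplicating anything already present:

```python
from typing import List

Post = dict  # assumed shape: {"id": str, "ts": int, ...}


def merge_latest(prebuilt: List[Post], fresh: List[Post]) -> List[Post]:
    """Combine a pre-built timeline with freshly fetched posts, newest first."""
    by_id = {post["id"]: post for post in prebuilt}
    for post in fresh:
        # Keep the pre-built copy if the post was already in the timeline.
        by_id.setdefault(post["id"], post)
    return sorted(by_id.values(), key=lambda post: post["ts"], reverse=True)
```

A client could call this each time a background fetch finishes and insert only the posts it had not yet displayed.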

Surely there are other good approaches. As I said in the beginning, clients have the freedom to implement this however they like.
