Current Fediverse Implementation
From my understanding, the prominent Fediverse implementations handle fanout by writing to other instances.
In other words, if user A on instance A makes post A, instance A writes (or syncs) post A to every instance that has followers of user A. User B on instance B then reads post A from instance B.
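To make the push model concrete, here is a minimal in-memory sketch of it. The `Instance` class and its methods are hypothetical names for illustration only. Real implementations deliver posts over HTTP, not through Python lists.

```python
from collections import defaultdict


class Instance:
    """Hypothetical model of one server in a push-based (fanout-on-write) network."""

    def __init__(self, name):
        self.name = name
        self.stored_posts = []  # every post this instance holds a copy of
        # author -> set of remote instances that host followers of that author
        self.follower_instances = defaultdict(set)

    def add_remote_follower(self, author, instance):
        self.follower_instances[author].add(instance)

    def publish(self, author, text):
        """Store the post locally, then write a copy to each follower instance."""
        post = {"author": author, "text": text}
        self.stored_posts.append(post)
        deliveries = 0
        for remote in self.follower_instances[author]:
            remote.stored_posts.append(post)  # one write per remote instance
            deliveries += 1
        return deliveries
```

The number of writes for a single post grows with the number of instances that host followers, which is the cost the question below is about.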
Why this is Done
From my understanding, this prevents the case where post A goes viral, everyone wants to read it, and instance A’s database gets overwhelmed with reads. It also serves to replicate content.
My Question: Why not rely on static files instead of database reads / writes to propagate content?
Instead of the above, if someone follows user A, they can get user A’s posts via a static file that contains all of User A’s posts. Do the same for everyone you follow.
Reading this file would be a lot less resource-intensive than a database read, and serving it through a CDN would be cheaper still.
Cons
- posts are less “real time”. Why? Because when post A is made, the static file must be updated (though the Fediverse does this already), and user B or instance B must fetch it. The post is not pushed to user B / instance B, so it arrives with a delay that depends on how frequently they fetch. But frequent fetches are okay, and heavy loads are easier to handle with static files than with database reads.
- if using a CDN for the static files, there’s another delay based on the TTL and invalidation. This should still be small, up to a couple minutes at most.
Pros
- hosting a fediverse server is more accessible and cheaper, and it could scale better.
- Federation woes of posts not federating to other instances can potentially be resolved, as the fanout architecture is less complex (it is no longer necessary to write to dozens or hundreds of instances for a single post).
- Clients can have greater freedom in implementing how they create news feeds. You don’t have to rely on your instance to do it. Instances primarily make content available, and clients can handle creating news feeds, content sorting and filtering (optional), etc.
What are your thoughts on this?
How do you expect your feed to be updated?
- I write a post, and send a request to the server to publish it
- The server takes the post and prepends it to the file housing all my posts
- Now, when someone requests my posts, they will see my new one
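The publish steps above could look roughly like this. It is only a sketch: it assumes the per-user file is a JSON array of posts, and `publish_post` is a made-up name. Writing to a temporary file and swapping it in is my own addition, so that a reader never sees a half-written file.

```python
import json
import os
import tempfile
from pathlib import Path


def publish_post(feed_path, post):
    """Prepend `post` to the JSON array stored at `feed_path` (assumed format)."""
    path = Path(feed_path)
    if path.exists():
        posts = json.loads(path.read_text(encoding="utf-8"))
    else:
        posts = []
    posts.insert(0, post)  # newest post first

    # Write to a temp file in the same directory, then atomically replace
    # the old file so concurrent readers get either the old or new version.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(posts, tmp)
    os.replace(tmp_name, path)
    return posts
```

Rewriting the whole file on every post is fine for a sketch, but it gets more expensive as the file grows.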
If a CDN is involved, we would have to take proper care of invalidations. We would run a batch process to update the CDN files so that we are not doing it too often; running it every minute or so is still plenty fast for social media use cases.
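A rough sketch of that batching idea: collect the changed file paths and purge them at most once per interval. `InvalidationBatcher` and the `purge` callback are hypothetical. The actual purge call depends on the CDN vendor, so it is passed in rather than guessed at.

```python
class InvalidationBatcher:
    """Collects changed paths and purges them at most once per interval."""

    def __init__(self, purge, interval_seconds=60.0):
        self.purge = purge  # hypothetical callback: takes a list of paths
        self.interval = interval_seconds
        self.pending = set()  # a set, so repeated edits to one file count once
        self.last_flush = None

    def mark_changed(self, path):
        self.pending.add(path)

    def flush_if_due(self, now):
        """Purge pending paths if the interval has elapsed. Returns True if purged."""
        if not self.pending:
            return False
        if self.last_flush is not None and now - self.last_flush < self.interval:
            return False
        self.purge(sorted(self.pending))
        self.pending.clear()
        self.last_flush = now
        return True
```

A user who posts five times within a minute causes one invalidation for their file, not five.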
So I have to constantly check all files from everyone I follow for new entries in order to have a working timeline?
Yes, precisely. The existing implementation in the Fediverse does the opposite: everyone you follow has to insert their posts into the feed of everyone that follows them, which has its own issues.
But only once. If an account doesn’t post/interact for a year, it doesn’t cause any traffic. With your approach, I constantly need to pull that account’s profile to see if something new showed up.
Sure, but constantly having to do it is not really a bad thing, given it is automated and those reads are quite inexpensive compared to a database query. It’s a lot easier to handle heavy loads when serving static files.
I’m really not sure about that being inexpensive. The files will grow and the list of people to follow usually grows as well. This just doesn’t scale well.
I follow 700 people on Mastodon. That’s 700 requests every interval. With 100-10000 posts or possibly millions of interactions in each file.
Of course you can do stuff like pagination or something like that. But some people follow 10000 accounts and want to have their timeline updated at short intervals.
Pulling like this is usually used when the author can’t send you something directly, and it works for RSS feeds. But most people don’t follow hundreds of RSS feeds. Which reminds me that every Mastodon profile offers an RSS feed - you can already do what you described with an RSS reader.
Bringing up RSS feeds is actually very good, because although you can paginate or partition your feeds, I have never seen a feed that does that, even when they have decades of history. But if needed, partitioning is an option, so you don’t have to pull all of a feed’s posts but only the recent ones, or those in a given date/time range.
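As an illustration of partitioning by date, here is a small sketch that splits a user's posts into one file per month and picks only the files a reader needs. The `created_at` field and the `YYYY-MM.json` naming are assumptions of mine, not an existing convention.

```python
from collections import defaultdict
from datetime import datetime


def partition_by_month(posts):
    """Group posts into per-month files keyed by a 'YYYY-MM.json' name."""
    partitions = defaultdict(list)
    for post in posts:
        created = datetime.fromisoformat(post["created_at"])
        partitions[f"{created:%Y-%m}.json"].append(post)
    return dict(partitions)


def files_since(file_names, since):
    """Return the partition files that can contain posts from `since` onward."""
    cutoff = f"{since:%Y-%m}.json"
    # Zero-padded 'YYYY-MM' names sort chronologically as plain strings.
    return sorted(name for name in file_names if name >= cutoff)
```

A reader who last fetched a week ago then needs at most two small files, no matter how many years of history the account has.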
I would also respectfully disagree that people don’t subscribe to hundreds of RSS feeds. I would bet most people who consistently use RSS feed readers have more than 100 feeds, me included.
And last, even if you follow 10,000 accounts, yes, it would take a lot more time than reading from a single database, but it is still on the order of double-digit seconds at most. If you compare 10,000 static file fetches with 10,000 database writes across different instances, I think the static files would fare better. Not to mention that heavy write fanout is more likely than heavy read fanout (users with 100k followers are far more common than users with 100k subscriptions).
And just to emphasize, I do agree that double-digit seconds would be quite long for a user’s loading time, which is why I would expect to fetch regularly so the user logs onto a pre-built news feed.
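For a feel of where "double-digit seconds" could come from, here is a back-of-envelope estimate. The 100 ms per fetch and the 50 concurrent connections are numbers I am assuming for illustration, not measurements, and the model ignores file size, retries, and slow hosts.

```python
import math


def estimated_fetch_seconds(feeds, concurrency, latency_ms):
    """Rough wall-clock time to fetch `feeds` files in rounds of `concurrency`."""
    rounds = math.ceil(feeds / concurrency)
    return rounds * latency_ms / 1000


# Assumed values: 50 parallel connections, 100 ms per static file fetch.
print(estimated_fetch_seconds(700, 50, 100))     # 1.4
print(estimated_fetch_seconds(10_000, 50, 100))  # 20.0
```

Under those assumptions, 700 followees take about a second and a half, and 10,000 take about twenty seconds, so the result depends heavily on the latency and concurrency you plug in.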
Sorry, I meant your timeline, where you see other people’s posts.
Oh my bad, I can explain that.
Before I do, one benefit of this method is that your timeline is entirely up to your client. Your instance becomes primarily tasked with making your posts available, and clients have the freedom of implementing the reading and news feed / timeline formation.
Hence, there are a few ways to do this. The best one is probably a mix of those.
Naive approach: fetch posts and build news feed when user requests it
This is not a good approach, but I mention it first because it’ll make explaining the next one easier.
- User opens app or website, thereby requesting their timeline / news feed
- server fetches list of user’s subscriptions and followees
- for each followee or subscription, server fetches their content via their static file wherever they are hosted
- server performs whatever filtering and ordering of content they want
- user sees the result
Cons: loading time for the user may be long. Depending on how many subscriptions they have, it could be several seconds, and P90 may even be in double digits.
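The naive steps above can be sketched as follows. `build_timeline` and `fetch_feed` are hypothetical names. `fetch_feed` stands in for an HTTP GET of a followee's static file and is passed in, so the sketch runs without a network. Posts are assumed to carry an ISO 8601 `created_at` string.

```python
def build_timeline(followees, fetch_feed, limit=50):
    """Fetch every followee's feed on demand and return the newest posts."""
    posts = []
    for followee in followees:
        # One fetch per followee, done while the user waits.
        posts.extend(fetch_feed(followee))
    # ISO 8601 timestamps in one consistent format sort correctly as strings.
    posts.sort(key=lambda p: p["created_at"], reverse=True)
    return posts[:limit]
```

The loop is sequential here, which is exactly why the loading time grows with the number of subscriptions.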
Better approach: pre-build user’s timeline periodically.
Think of a periodic job (hourly, every 10 minutes, etc.) that fetches posts in the same manner as described above, but in advance rather than when the user requests them.
Pros:
- fast loading time compared to previous solution
- when the job runs, if users on the same instance share a followee or subscription, we don’t have to query it twice (this benefit already exists in current Fediverse implementations)
Cons: posts aren’t real-time; they are delayed by the batch job frequency.
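Here is a sketch of one run of such a job, including the shared-followee saving. `prebuild_timelines` and `fetch_feed` are hypothetical names. Scheduling (cron, a task queue, etc.) is left out, and `fetch_feed` again stands in for the static file fetch.

```python
def prebuild_timelines(subscriptions, fetch_feed, limit=50):
    """Build a timeline for every local user, fetching each followee only once.

    `subscriptions` maps a local user to the list of accounts they follow.
    """
    # Collect the distinct followees across all local users.
    unique_followees = set()
    for followees in subscriptions.values():
        unique_followees.update(followees)

    # One fetch per distinct followee, however many local users follow them.
    feeds = {followee: fetch_feed(followee) for followee in unique_followees}

    timelines = {}
    for user, followees in subscriptions.items():
        merged = [post for followee in followees for post in feeds[followee]]
        merged.sort(key=lambda p: p["created_at"], reverse=True)
        timelines[user] = merged[:limit]
    return timelines
```

The result would be stored wherever the instance or client keeps timelines, so opening the app only reads the pre-built list.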
Best approach: hybrid
In this approach, we primarily do the second method, to achieve fast loading time. But to get more up-to-date content, we also simultaneously fetch the latest in the background, and interleave or add the latest posts as the user scrolls.
This way we get both fast initial load times and recent posts.
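The interleaving step could be as simple as the sketch below: show the pre-built timeline right away, then merge in whatever the background fetch returns, skipping posts already present. `merge_latest` and the `id` / `created_at` fields are assumptions for illustration.

```python
def merge_latest(cached, fresh):
    """Merge freshly fetched posts into a pre-built timeline, newest first."""
    seen_ids = {post["id"] for post in cached}
    # Keep only posts the pre-built timeline does not already contain.
    new_posts = [post for post in fresh if post["id"] not in seen_ids]
    merged = cached + new_posts
    merged.sort(key=lambda p: p["created_at"], reverse=True)
    return merged
```

A client could instead show the new posts behind a "load new posts" indicator so the list doesn't jump under the user. That is a presentation choice, not part of the merge.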
Surely there are other good approaches. As I said in the beginning, clients have the freedom to implement this however they like.