It’s something I’ve been thinking about lately. The older an instance is, the bigger it is, or the more aggressively it crawls, the more information it tends to hold.

And the information to feed this “engine” comes easily from several places: the posts and summaries users make, tracker bots (including the bodies of their syndicated posts), content bridged in from other protocols, dead instances whose posts can still be fetched, and who knows what else.
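Just to make the idea concrete, here is a rough sketch of pulling one of those sources into a common shape for later indexing. The record fields and the helper name are my own assumptions, not any real server’s schema, and note that many instances now require signed (“authorized”) fetches, so a plain unsigned GET like this won’t work everywhere.

```python
# Sketch: normalize posts from different sources (user posts, bot syndication,
# bridged content, still-fetchable posts from dead instances) into one record.
# Field names are illustrative assumptions, not an existing server schema.
from dataclasses import dataclass

import requests


@dataclass
class FetchedPost:
    uri: str      # canonical ActivityPub id of the object
    author: str   # the "attributedTo" actor
    body: str     # the "content" field (usually HTML)
    source: str   # e.g. "local", "bot", "bridge", "dead-instance"


def fetch_remote_note(uri: str, source: str = "dead-instance") -> FetchedPost:
    """Fetch a single public ActivityPub object and normalize it."""
    resp = requests.get(
        uri,
        headers={"Accept": "application/activity+json"},
        timeout=10,
    )
    resp.raise_for_status()
    obj = resp.json()
    return FetchedPost(
        uri=obj.get("id", uri),
        author=obj.get("attributedTo", ""),
        body=obj.get("content", ""),
        source=source,
    )
```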

While slow and often incomplete at first, any given instance tends to grow into a library, and the growth compounds over time. And if the instance runs site software that can properly index the different types of posts together, the amount of content tracked grows even more.

Build a decent search engine on top of what such an instance has already fetched and voilà: a search engine that doesn’t depend on Bing’s or Google’s APIs.
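As a minimal sketch of what “a search engine on top of what’s fetched” could look like: a full-text index over stored posts using SQLite’s FTS5 with its built-in BM25 ranking. This assumes your SQLite build has FTS5 enabled (the default in most Python installs); the table and column names are mine, not any existing software’s.

```python
# Sketch: full-text search over posts an instance has already fetched.
# Table/column names are illustrative, not a real fediverse server's schema.
import sqlite3

conn = sqlite3.connect("instance_index.db")
conn.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS post_index "
    "USING fts5(uri UNINDEXED, author, body)"
)


def index_post(uri: str, author: str, body: str) -> None:
    """Add one fetched post (local, bridged, or from a dead instance) to the index."""
    conn.execute(
        "INSERT INTO post_index (uri, author, body) VALUES (?, ?, ?)",
        (uri, author, body),
    )
    conn.commit()


def search(query: str, limit: int = 20):
    """Return the best-matching posts, ranked by FTS5's BM25 scoring."""
    return conn.execute(
        "SELECT uri, author, snippet(post_index, 2, '[', ']', '…', 10) "
        "FROM post_index WHERE post_index MATCH ? "
        "ORDER BY bm25(post_index) LIMIT ?",
        (query, limit),
    ).fetchall()


# Example usage with a made-up post
index_post("https://example.social/@alice/1", "alice", "Notes on federated search engines")
for uri, author, excerpt in search("federated search"):
    print(uri, author, excerpt)
```

Nothing about this is tied to SQLite, of course; the same idea works with any index you can keep feeding as the instance federates more content.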

Makes sense to me, but what do you guys think?