Some thoughts on how useful Anubis really is. Combined with comments I’ve read elsewhere about scrapers starting to solve the challenges, I’m afraid Anubis will soon be outdated and we’ll need something else.
Anubis sucks
However, the number of viable options is limited.
Yeah but at least Anubis is cute.
I’ll take “sucks but cute” over a dead internet and endless swarms of zergling crawlers.
New developments: just a few hours before I posted this comment, The Register published an article about AI crawler traffic. https://www.theregister.com/2025/08/21/ai_crawler_traffic/
Anubis’ developer was interviewed and they posted the responses on their website: https://xeiaso.net/notes/2025/el-reg-responses/
In particular:
Fastly’s claims that 80% of bot traffic is now AI crawlers
In some cases for open source projects, we’ve seen upwards of 95% of traffic being AI crawlers. For one, deploying Anubis almost instantly caused server load to crater by so much that it made them think they accidentally took their site offline. One of my customers had their power bills drop by a significant fraction after deploying Anubis. It’s nuts.
So, yeah. If we believe Xe, OOP’s article is complete hogwash.
Cool article, thanks for linking! Not sure that counts as a new development though; it’s just results, and we already knew it was working. The question is: what’s going to work once the scrapers adapt?
Sometimes I think: imagine if a company like Google or Facebook implemented something like Anubis, and suddenly most people’s browsers started constantly solving CPU-intensive cryptographic challenges. People would be outraged by the wasted energy. But somehow when a “cool small company” does it, it’s fine.
I don’t think the Anubis approach is sustainable if everybody uses it; it’s just too wasteful energy-wise.
What alternatives do you propose?
Captcha.
It does everything Anubis does. If a scraper wants to solve it automatically, it’s compute-intensive because they have to run AI inference, but for the user it’s just a little time-consuming.
With captchas you don’t run aggressive software unauthorized on anyone’s computer.
A solution already existed. But Anubis is “trendy”, and its makers are masters of PR within certain circles of people who always want the latest, trendiest thing.
But a good old captcha would achieve the same result as Anubis, in a more sustainable way.
Or at least give the user the option to run the challenge or leave the page, and make it clear that their hardware is about to run an intensive task. It really feels very aggressive for a webpage to run what is basically a cryptominer, unauthorized, on your computer. And for me, having a catgirl as a mascot does not excuse the rudeness of it.
“Good old captcha” is the most annoying thing ever for people and basically universally hated. Speaking of leaving the page: which do you think will cause more people to leave, a captcha that’s often broken, or something where people don’t have to do anything but wait a little?
There are some sites where Anubis won’t let me through. Like, I just get immediately bounced.
So RIP dwarf fortress forums. I liked you.
I don’t get it, I thought it allows all browsers with JavaScript enabled.
I, too, get blocked by certain sites. I think it’s a configuration thing, where it does not like my combination of uBlock/NoScript, even when I explicitly allow their scripts…
I love that domain name.
This… makes no sense to me. Almost by definition, an AI vendor will have a datacenter full of compute capacity.
Well it doesn’t fucking matter what “makes sense to you”, because it is working…
It’s being deployed by people who had their sites DDoSed to shit by crawlers, and they are very happy with the results, so what even is the point of trying to argue here?

It’s working because it’s not widely used yet. It’s sort of a “pirate seagull” theory: as long as only a few people use it, it works, because scrapers don’t really count on Anubis and so don’t implement systems to get past it.
If it were to become more common it would be really easy to implement systems that would defeat the purpose.
Right now sites are okay because scrapers just send HTTPS requests and expect a full response. Anyone who wants to bypass Anubis’ protection has to account for receiving a cryptographic challenge and solving it.
The thing is that these cryptographic challenges can be heavily optimized. They are designed to run in a very inefficient environment, the browser. But if someone took the challenge and solved it in a better environment, using CUDA or something like that, it would take a fraction of the energy, defeating the purpose of “being so costly that it’s not worth scraping”.
At this point it’s only a matter of time before we start seeing scrapers like that, especially as more and more sites start using Anubis.
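To give an idea of what I mean, here is a rough sketch in Python. I am assuming a SHA-256 “leading zero hex digits” scheme in the spirit of what Anubis does; the exact challenge format and difficulty are my assumptions, not its real protocol.

```python
import hashlib
from itertools import count

def solve_pow(challenge: str, difficulty: int = 4) -> int:
    """Find a nonce so that SHA-256(challenge + nonce) starts with
    `difficulty` zero hex digits. Assumed format, just for illustration."""
    target = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

# A tight native loop like this (let alone a GPU kernel) does the same work
# the browser does in JavaScript, only much faster and cheaper per request.
print(solve_pow("example-challenge-from-the-server"))
```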
I’m constantly unable to access Anubis sites on my primary mobile browser and have to switch over to Fennec.
The current version of Anubis was made as a quick “good enough” solution to an emergency. The article is very enthusiastic about explaining why it shouldn’t work, but completely glosses over the fact that it has worked, at least to an extent where deploying it and maybe inconveniencing some users is preferable to having the entire web server choked out by a flood of indiscriminate scraper requests.
The purpose is to reduce the flood to a manageable level, not to block every single scraper request.
The article is very enthusiastic about explaining why it shouldn’t work, but completely glosses over the fact that it has worked
This post was originally written for ycombinator “Hacker” News, which is vehemently against people hacking things together for the greater good and, more importantly, for free.
It’s more of a corporate PR release site and if you aren’t known by the “community”, calling out solutions they can’t profit off of brings all the tech-bros to the yard for engagement.
Exactly my thoughts too. Lots of theory about why it won’t work, but it ignores the fact that if people use it, maybe it does work, and when it stops working, they will stop using it.
And it was/is for sure the lesser evil compared to what most others did: put the site behind Cloudflare.
I feel people that complain about Anubis have never had their server overheat and shut down on an almost daily basis because of AI scrapers 🤦
I still think captchas are a better solution.
In order to get past them, scrapers have to run AI inference, which also comes with compute costs. But for legitimate users, you don’t run unauthorized, intensive tasks on their hardware.
They are much worse for accessibility, take longer to solve, and are more disruptive for the majority of users.
Anubis is worse for privacy, since you have to have JavaScript enabled, and worse for the environment, since the PoW cryptographic challenges are just wasted computation.
Also, reCaptcha-style captchas are not really that disruptive most of the time.
As I said, the polite thing would be to give users options. Running Anubis’ PoW just for entering a website is one of the rudest things I’ve seen software do lately. They should be more polite and give the user a choice: maybe the user could choose between solving a captcha and running the Anubis PoW, or even just keep Anubis but only run it after the user clicks a button.
I don’t think it’s good practice to run that kind of software just for entering a website. If that tendency grew, browsers would need to adapt and straight up block that behavior, for example by only allowing access to certain client resources after a user action.
Are you seriously complaining about an (entirely false) negative privacy aspect of Anubis and then suggest reCaptcha from Google is better?
Look, no one thinks Anubis is great, but often it is that or the website becoming entirely inaccessible because it is DDOSed to death by the AI scrapers.
First, I said reCaptcha types, meaning captchas in the style of reCaptcha; those could be implemented outside a Google environment. Second, I never said that type was better for privacy, I just said Anubis is bad for privacy. Traditional captchas that work without JavaScript would be the privacy-friendly way.
Third, it’s not a false proposition. Disabling JavaScript can protect your privacy a great deal. A lot of tracking is done through JavaScript.
Last, that’s just the Anubis PR slogan, not the truth. As I said, DDoS mitigation could be implemented in other ways that are more polite and/or environmentally friendly.
Are you astroturfing for Anubis? Because I really cannot understand why something as simple as a landing page with a “run PoW challenge” button would be that bad.
Is there a reason other than avoiding infrastructure centralization not to put a web server behind cloudflare?
Yes, because Cloudflare routinely blocks entire IP ranges and puts people into endless captcha loops. And it snoops on all traffic and collects a lot of metadata about all your site visitors. And if you let them terminate TLS they will even analyse the passwords that people use to log into the services you run. It’s basically a huge surveillance dragnet and probably a front for the NSA.
Cloudflare would need https keys so they could read all the content you worked so hard to encrypt. If I wanted to do bad shit I would apply at Cloudflare.
Maybe I’m misunderstanding what “behind cloudflare” means in this context, but I have a couple of my sites proxied through cloudflare, and they definitely don’t have my keys.
I wouldn’t think using a cloudflare captcha would require such a thing either.
That’s because they just terminate TLS at their end. Your DNS record is “poisoned” by the orange cloud and their infrastructure answers for you. They happen to operate a trusted CA, so they just present one of their own certificates with a SAN that matches your domain, and your browser trusts it. Bingo, TLS termination at CF servers. They have the traffic in cleartext at that point and just re-encrypt it to your origin server if you enforce TLS, but at that point it’s meaningless.
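If you want to check for yourself who actually terminates TLS for a given site, looking at the issuer of the certificate you receive is enough. A minimal sketch in Python, with example.com standing in for whatever domain you want to test:

```python
import socket
import ssl

def cert_issuer(hostname: str) -> dict:
    """Connect over TLS and return the issuer of the certificate presented."""
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    return dict(rdn[0] for rdn in cert["issuer"])

# For a site proxied through the orange cloud, the issuer is typically one of
# Cloudflare's edge CAs rather than whatever CA the origin actually uses.
print(cert_issuer("example.com"))
```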
Oh, I didn’t think about the fact that they’re a CA. That’s a good point; thanks for the info.
Hmm, I should look up how that works.
Edit: https://developers.cloudflare.com/ssl/origin-configuration/ssl-modes/#custom-ssltls
They don’t need your keys because they have their own CA. No way I’d use them.
Edit 2: And with their own DNS they could easily route any address through their own servers if they wanted to, without anyone noticing. They are entirely too powerful. Is there some way to prevent this?
Yeah, I’m just wondering what’s going to follow. I just hope everything isn’t going to need to go behind an authwall.
The developer is working on upgrades and better tools. https://xeiaso.net/blog/2025/avoiding-becoming-peg-dependency/
I’ll say the developer is also very responsive. They’re (ambiguous ‘they’, not sure of pronouns) active in a libraries-fighting-bots slack channel I’m on. Libraries have been hit hard by the bots: we have hoards of tasty archives and we don’t have money to throw resources at the problem.
The Anubis repo has an enbyware emblem fun fact :D
Yay! I won’t edit my comment (so your comment will make sense) but I checked and they also list they/them on their github profile
Cool, thanks for posting! Also the reasoning for the image is cool.
Unless you have a dirty heatsink, no amount of hammering would make the server overheat
Are you explaining my own server to me? 🙄
What CPU made after 2004 do you have that doesn’t have automatic temperature control?
I don’t think there is any, unless you somehow managed to disable it?
Even a Raspberry Pi without a heatsink won’t overheat to the point of shutdown.

You are right, it is actually worse: it usually just overloads the CPU so badly that it starts to throttle, and then I can’t even access the server via SSH anymore. But sometimes it also crashes the server so that it reboots, and yes, that can happen on modern CPUs as well.
You need to set your HTTP-serving process to a priority below the administrative processes (in the place where you start it; assuming a Linux server, that would be your init script or systemd service unit).
An actual crash causing a reboot? Do you maybe have faulty RAM? That’s really never supposed to happen from anything happening in userland. That’s not AI; your stuff might be straight up broken.
The only thing that isn’t broken that could reboot a server is a watchdog timer.
Your server shouldn’t crash, reboot, or become unreachable from the admin interface even at 100% load, and it shouldn’t overheat either; temperatures should never exceed 80°C no matter what you do. That’s supposed to be impossible with thermal management, which all processors have had for decades.
Great that this is all theoretical 🤷 My server hardware might not be the newest, but it is definitely not broken.
And besides, what good is it that you can still barely access the server through SSH, when the CPU is constantly maxed out and site visitors only get a timeout when trying to access the services?
I don’t even get what you are trying to argue here. That the AI scraper DDOS isn’t so bad because in theory it shouldn’t crash the server? Are you even reading what you are writing yourself? 🤡
The problem is that the purpose of Anubis was to make crawling more computationally expensive, and crawlers are apparently increasingly prepared to accept that additional cost. One option would be to pile more required cycles on top of what’s currently asked, but it’s a balancing act before it starts to become a real annoyance for the meat-popsicle users.
That’s why the developer is working on a better detection mechanism. https://xeiaso.net/blog/2025/avoiding-becoming-peg-dependency/
Have you tried accessing it by using Nyarch?
Yeah, well-written stuff. I think Anubis will come and go. The article beautifully demonstrates and, best of all, quantifies how negligible Anubis’ protection really is.
It’s very interesting to try to think of what would work, even conceptually. Some sort of purely client-side captcha type of thing perhaps. I keep thinking about it in half-assed ways for minutes at a time.
Maybe something that scrambles the characters of the site according to some random “offset” of some sort, e.g. randomly picking a modulus size and an offset to cycle them by, or even just a good ol’ cipher. The “captcha” consists of a slider that adjusts the offset. You as the viewer know it’s solved when the text becomes something legible, so there’s no need for the client code to store a readable key that could be used to auto-undo the scrambling. You could maybe even have some slider values deliberately produce English text, in case the scrapers got smart enough to check for legibility (not sure how to hide which slider positions are the red-herring ones, though), which might be enough to trick a scraper into picking up junk text sometimes.
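Something like this toy sketch (Python, purely illustrative; it’s really just a shift cipher with the offset as the slider position):

```python
import string

ALPHABET = string.ascii_lowercase

def scramble(text: str, offset: int) -> str:
    """Cycle every letter by `offset`; the slider would vary this value."""
    shifted = ALPHABET[offset:] + ALPHABET[:offset]
    return text.lower().translate(str.maketrans(ALPHABET, shifted))

page = "the actual article text goes here"
garbled = scramble(page, 11)            # what the server ships
readable = scramble(garbled, 26 - 11)   # what the "correct" slider position shows
print(garbled, readable, sep="\n")
```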
Anubis is more of an economic solution. It doesn’t stop bots, but it does make companies pay more to access content instead of having server operators foot the bill.
I’m sure you meant to sound more analytical than anything… but this really comes off as arrogant.
You claim that Anubis’ protection is negligible and that it will come and go, and then admit to only spending minutes at a time thinking of solutions yourself, which you then just sort of spout. It’s fun to think about solutions to this problem collectively, but can you honestly believe Anubis is negligible when it’s so clearly working, and when the author has been so extremely clear about their own perception of its pitfalls and hasty development (go read their blog, it’s a fun time)?
That kind of captcha is trivial to bypass via frequency analysis. Text that looks like language, as opposed to random noise, is very statistically recognisable.
Not to mention it relies on security through obscurity.
It wouldn’t be that hard to figure out and bypass
That type of captcha already exists. I don’t know about their specific implementation, but 4chan has it, and it is trivially bypassed by userscripts.
That’s the great thing about Anubis: it’s not client-side. Not entirely, anyway. Similar to public-key encryption schemes, it exploits the computational complexity of certain functions to solve the challenge. It can’t just say “solved, let me through”, because the client has to calculate a number, based on the parameters of the challenge, that fits certain mathematical criteria, and then present it to the server. That’s the “proof of work” component.
A challenge could be something like “find the two prime factors of the semiprime
1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139
”. This number is known as RSA-100; it was first factored in 1991, which took several days of CPU time, but checking the result is trivial since it’s just integer multiplication. A similar semiprime of 260 decimal digits still hasn’t been factored to this day. You can’t get around mathematics, no matter how advanced your AI model is.

@rtxn I don’t understand how that isn’t client-side?
Anything that is client side can be, if not spoofed, then at least delegated to a sub process, and my argument stands
Please, explain to us how you expect to spoof a math problem when you have to provide the answer to the server before proceeding.
You can math all you want on the client, but the server isn’t going to give you shit until you provide the right answer.
@Passerby6497 I really don’t understand the issue here
If there is a challenge to solve, then the server has provided that to the client
There is no way around this, is there?
You’re given the challenge to solve by the server, yes. But just because the challenge is provided to you, that doesn’t mean you can fake your way through it.
You still have to calculate the answer before you can get any farther. You can’t bullshit/spoof your way through the math problem to bypass it, because your correct answer is required to proceed.
There is no way around this, is there?
Unless the server gives you a well-known problem you already have the answer to (or one that’s easily calculated), or you find a vulnerability in something like Anubis that makes it accept a wrong answer, not really. You’re stuck at the interstitial page with a math prompt until you solve it.
Unless I’m misunderstanding your position, I’m not sure what the disconnect is. The original question was about spoofing the challenge client side, but you can’t really spoof the answer to a complicated math problem unless there’s an issue with the server side validation.
@Passerby6497 my stance is that the LLM might recognize that the best way to solve the problem is to run chromium and get the answer from there, then pass it on?
It’s not client-side because validation happens on the server side. The content won’t be displayed until and unless the server receives a valid response, and the challenge is formulated in such a way that calculating a valid answer will always take a long time. It can’t be spoofed because the server will know that the answer is bullshit. In my example, the server will know that the prime factors returned by the client are wrong because their product won’t be equal to the original semiprime. Delegating to a sub-process won’t work either, because what’s the parent process supposed to do? Move on to another piece of content that is also protected by Anubis?
The point is to waste the client’s time and thus reduce the number of requests the server has to handle, not to prevent scraping altogether.
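To make the asymmetry concrete, here is a toy sketch in Python. Anubis itself uses a hash-based proof of work rather than factoring, so this only illustrates the shape of the check: verification is a single multiplication, while producing the answer is the expensive part.

```python
def verify(p: int, q: int, semiprime: int) -> bool:
    """Server side: checking a claimed factorization is one multiplication."""
    return p > 1 and q > 1 and p * q == semiprime

# Toy semiprime so the example runs instantly; for a 100-digit semiprime like
# RSA-100, finding the factors took days of CPU time in 1991, while this check
# would still finish in microseconds.
print(verify(1009, 1013, 1009 * 1013))      # True: content gets served
print(verify(1009, 1013, 1009 * 1013 + 2))  # False: the server rejects it
```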
@rtxn validation of what?
This is a typical network thing: client asks for resource, server says here’s a challenge, client responds or doesn’t, has the correct response or not, but has the challenge regardless
THEN (and this is the part you don’t seem to understand) the client process either has to waste time solving the challenge, which is, by the way, orders of magnitude lighter on the server than serving the actual meaningful content, or has to cancel the request. If a new request is sent during that time, it will still have to waste time solving the challenge. The scraper will get through eventually, but the challenge delays the response and reduces the load on the server, because while the scrapers are busy computing, it doesn’t have to serve meaningful content to them.
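Roughly, the round trip looks like this (my own toy sketch in Python, not Anubis’ actual code; the difficulty value and the signed-cookie detail at the end are assumptions about how such gates are usually built):

```python
import hashlib
import secrets

DIFFICULTY = 4          # leading zero hex digits required (assumed parameter)
issued: set[str] = set()

def issue_challenge() -> str:
    """Handed to the client along with the interstitial page."""
    challenge = secrets.token_hex(16)
    issued.add(challenge)
    return challenge

def verify(challenge: str, nonce: int) -> bool:
    """Cheap server-side check: a single hash versus the client's brute force."""
    if challenge not in issued:
        return False
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * DIFFICULTY)

# Only when verify() passes does the server serve the real content (and, in
# practice, set a signed cookie so the same client isn't challenged again).
```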
@rtxn all right, that’s all you had to say initially, rather than try convincing me that the network client was out of the loop: it isn’t, that’s the whole point of Anubis
Yeah, it has seemed like a bit of a waste of time; once the difficulty gets scaled up and the expiration scaled down, it’s going to get annoying to use the web on phones.
I had to get my glasses to re-read this comment.
You know why anubis is in place on so many sites, right? You are literally blaming the victims for the absolute bullshit AI is foisting on us all.
Yes, I manage cloudflare for a massive site that at times gets hit with millions of unique bot visits per hour
So you know that this is the lesser of the two evils? Seems like you’re viewing it from client’s perspective only.
No one wants to burden clients with Anubis, and Anubis shouldn’t exist. We are all (server operators and users) stuck with this solution for now because there is nothing else at the moment that keeps these scrapers at bay.
Even the author of Anubis doesn’t like the way it works. We all know it’s just more wasted computing for no reason except that big tech doesn’t care about anyone.
My point is, and the author’s point is, it’s not computation that’s keeping the bots away right now. It’s the obscurity and challenge itself getting in the way.
I don’t think so. I think he’s blaming the “solution” for being a stopgap at best and painful for end users at worst. Yes, the AI crawlers caused the issue, but I’m not sure this is a great final solution.
As the article discussed, this is essentially an “expensive” math problem meant to deter AI crawlers, but in the end it isn’t really that expensive. It’s more like putting two door handles on a door and hoping the bots are too lazy to turn both of them, while also severely slowing down all the one-handed people. I’m not sure it will ever be feasible to have one bot determine whether the other end is also a bot without human interaction.
It works because it’s a bit of obscurity, not because it’s expensive. Once it’s a big enough problem, the scrapers will adapt, and then the only option is to make it more obscure/different or crank up the difficulty, which will slow down genuine users much more.