

Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.
Spent many years on Reddit before joining the Threadiverse as well.
He’s doing it in an attempt to “sabotage” AI training.
It’s also a useful flag to indicate that he doesn’t understand how AIs are trained.
What’s not what we expected?
There are some lawsuits in motion about this, and the early signs are that it is indeed legal. For example, in Kadrey et al v. Meta the judge granted summary judgment that training an AI on books was “highly transformative” and fell under fair use, and similarly in Bartz, Graeber and Johnson v. Anthropic the judge ruled that training an AI on books was fair use. I always expected this would be the case, since an AI model does not literally contain the training material it was trained on; it learns patterns from that material, which is not the same as its literal expression. Since the training material isn’t being copied, there’s nothing for copyright to restrict here.
Assuming you know which instances are the ones they’re collecting data from. It could be any instance.
Case law is still pretty young in this area, but it’s looking like training an AI on copyrighted content doesn’t actually infringe copyright. It’s not something that a license can restrict, because the trainers can simply reject the license and carry on training under what the law already allows them to do anyway.
Open source licenses only have power because they grant permissions that people normally wouldn’t have and put conditions on those permissions. If you don’t need those permissions then you don’t have to be bound by those conditions.
I don’t see why everyone’s surprised about this. The Fediverse is running on ActivityPub, an open protocol whose purpose is to broadcast the content we post here to anyone who wants it. Of course it’s being used to train AI, why wouldn’t it?
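As a concrete illustration of how open that firehose is, here’s a rough sketch of pulling an account’s public posts straight over ActivityPub (Python with the requests library; the instance and account are made up, and real servers vary a bit in paging details, with some requiring signed fetches even for public content):

```python
# Sketch: fetching a (hypothetical) Fediverse account's public posts over ActivityPub.
# Anyone who asks for ActivityStreams JSON gets the data; that's the point of the protocol.
import requests

ACTOR_URL = "https://example-instance.social/users/somebody"  # made-up actor
HEADERS = {"Accept": "application/activity+json"}

# The actor document advertises the URL of its outbox, among other things.
actor = requests.get(ACTOR_URL, headers=HEADERS, timeout=10).json()
outbox_url = actor["outbox"]

# The outbox is an OrderedCollection of the actor's public activities.
outbox = requests.get(outbox_url, headers=HEADERS, timeout=10).json()
print("Total public activities:", outbox.get("totalItems"))

# Collections are usually paged; "first" typically points at the most recent page.
first_page = requests.get(outbox["first"], headers=HEADERS, timeout=10).json()
for activity in first_page.get("orderedItems", []):
    if activity.get("type") == "Create":              # a new post
        print(activity["object"].get("content"))      # object is usually embedded here
```

Nothing in that exchange checks who is asking or what they intend to do with the posts.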
First-mover advantage is powerful.
You may know that IPv6 is ridiculously bigger, but you don’t really grasp how much bigger.
There are enough IPv6 addresses that you could give 10^17 addresses to every square millimeter of Earth’s surface. Or roughly 4×10^28 addresses to every living human being. On a more cosmic scale, you could issue 4×10^15 addresses to every star in the observable universe.
We’re not going to run out by giving them to lightbulbs.
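If you want to sanity-check those numbers, the back-of-the-envelope arithmetic is easy (the surface-area, population, and star-count figures below are rough round numbers):

```python
# Back-of-the-envelope check of the IPv6 numbers above.
total_addresses = 2 ** 128                # size of the IPv6 address space, ~3.4e38

earth_surface_mm2 = 510e6 * 1e6 * 1e6     # ~510 million km^2, converted to mm^2
world_population = 8.1e9                  # ~8.1 billion people
stars_in_universe = 1e23                  # common rough estimate (10^22 to 10^24)

print(f"{total_addresses / earth_surface_mm2:.1e} per square millimeter of Earth")
print(f"{total_addresses / world_population:.1e} per living human")
print(f"{total_addresses / stars_in_universe:.1e} per star in the observable universe")
```

With those assumptions it works out to roughly 7×10^17 per square millimeter, 4×10^28 per person, and a few times 10^15 per star.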
Modern LLMs are trained on highly curated and processed data, often synthetic data based on the original posts rather than the posts themselves. And the trainers are well aware that there are people trying to “poison” the data in various ways. At this point it’s mainly an annoyance to other humans when people try.
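To give a rough flavour of what that curation involves, here’s a toy sketch of the kind of deduplication and heuristic filtering scraped posts go through before anything gets near a training run (the thresholds and checks are made up for illustration; real pipelines are far more elaborate, with quality classifiers and synthetic rewriting on top):

```python
# Toy sketch of pre-training data cleanup: exact dedup plus cheap quality heuristics.
# All thresholds are made-up illustrations, not anyone's real pipeline.
import hashlib
import re

def quality_filter(post: str) -> bool:
    """Cheap heuristics that drop junk and most low-effort 'poison' attempts."""
    words = post.split()
    if len(words) < 5 or len(words) > 5000:      # too short or absurdly long
        return False
    if len(set(words)) / len(words) < 0.3:       # highly repetitive text
        return False
    letters = sum(c.isalpha() for c in post)
    if letters / max(len(post), 1) < 0.6:        # mostly symbols or garbage characters
        return False
    return True

def dedupe_and_filter(posts):
    seen = set()
    for post in posts:
        digest = hashlib.sha256(re.sub(r"\s+", " ", post.lower()).encode()).hexdigest()
        if digest in seen:                        # exact-duplicate removal
            continue
        seen.add(digest)
        if quality_filter(post):
            yield post

raw = [
    "A normal comment about how federation works and why it matters.",
    "A normal comment about how federation works and why it matters.",  # duplicate
    "aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa",                           # repetitive junk
    "!!!! ???? #### @@@@ $$$$ %%%% ^^^^ &&&&",                           # symbol soup
]
print(list(dedupe_and_filter(raw)))  # only the first comment survives
```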
I don’t know of any “men only” instances. What makes it niche is that it’s gender-specific at all, not which specific gender it is.
I don’t consider it something to be “fixed.” I like that the Fediverse is fully decentralized, with no authority over who gets “in” and who doesn’t. Once you’ve got some kind of authority that can decide who’s allowed on which instances, with some kind of global registry of individual users that can exclude you if the wrong people don’t like you, we’re basically back to being Reddit with some fancy extra steps.
Sure, it risks allowing assholes to continue getting new accounts. But we already have a Reddit, I’d rather try something new even if that comes with downsides.
Reddit is able to do global IP bans. The Fediverse is not able to do that because there’s no “global”.
“Asshole” is a broad term. It includes racists, abrasive personalities, people with anger-management problems, and so forth. In other words, people who have a tendency to get banned from other places. It’s not just trolls.
Being banned from Reddit is a single, site-wide action. They can’t get back into Reddit; they’re just gone. Whereas in the Fediverse you can just go to a different instance and sign up afresh each time you get banned. This is part of the Fediverse’s design. And so I am concerned that the Fediverse will accumulate the worst users.
One thing that has been concerning me lately is that the Fediverse is being treated as a refuge for people who get banned on Reddit or other social media. Sure, sometimes those bans are based on arbitrary power tripping nonsense. But people actually do get banned for being assholes, and so I’ve got some worry that this is distilling the population of the Fediverse in an unfortunate direction.
I’m not making any statements about what other people may think about it. The question was why this guy is doing this thing, and I expect it’s because he finds it fun.
I’ve seen plenty of weird bots on Reddit over the years that had no point to their existence other than, presumably, having been fun to code. This is probably one such.
Right, and this is presumably something he finds fun. You were asking why, I was explaining why.
No, it’s not the same. I was using basketball as an analogy. Someone who doesn’t enjoy basketball wouldn’t “get it”, just as you’re not “getting” the fun that can come from building and playing around with AI bots. Different people find different things to be fun.
/r/SubSimGPT2Interactive/ does this.
I actually wandered away from the SubredditSimulator successor subreddits because even with GPT-2 they were “too good”; they lost their charm. Back when SubredditSimulator was still active it used simple Markov chain based text generators, and they produced the most wonderfully bonkers nonsense. That was hilarious. Modern AIs just sound like regular people, and I get that everywhere already.
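For anyone who never saw them in action, those old Markov chain generators were about this simple (a toy word-level sketch, not SubredditSimulator’s actual code):

```python
# Toy word-level Markov chain text generator, in the spirit of the old
# SubredditSimulator bots (not its actual code).
import random
from collections import defaultdict

def build_chain(corpus: str) -> dict:
    """Map each word to the list of words that follow it in the corpus."""
    chain = defaultdict(list)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain: dict, length: int = 20) -> str:
    word = random.choice(list(chain))
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:                    # dead end: hop to a random word
            word = random.choice(list(chain))
        else:
            word = random.choice(followers)
        out.append(word)
    return " ".join(out)

corpus = (
    "the fediverse is an open protocol and the protocol is open to anyone "
    "who wants to read the posts that anyone on the fediverse can read"
)
print(generate(build_chain(corpus)))
```

With only one word of context it cheerfully wanders off into nonsense, which is exactly where the charm came from.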
I, too, started out on kbin and ended up migrating to an mbin instance. I sent Ernest some money via that Ko-fi thing he had and I don’t regret it - I hope he found the funds useful, whatever it is that happened to him in the end. He kicked off an alternative to Lemmy, and that’s super important for a distributed, decentralized system like the Fediverse; you can’t have just one implementation of it.