Reddit sues Anthropic, alleging its bots accessed Reddit more than 100,000 times since last July

Crawling without permission? That’s a lawsuit

I hope they lose this case badly.
For all the concerns I have about AI and stealing others' work, I want to see Reddit burn for pretending that they are all about community and connection, while actively harming their users’ experience on the platform and attempting to profit off their content.
Yeah, something about a company making billions of dollars off completely user-generated content and moderation just rubs me the wrong way. As much as I hate Facebook, they at least pay people to do moderation there, and regularly update their site (as shitty as it is). I don't use either anymore, and I hope they die in a pit of flames owing billions to their shareholders.
As much as I hate Facebook, they at least pay people to do moderation there, and regularly update their site
Facebook pays content creators too (https://creators.facebook.com/earn-money ), including for things other than videos (like photo/image posts). Platforms like YouTube do too, but as far as I know, Reddit doesn't.
Shareholders of these companies are likely you or me, as they are so big that they make up significant parts of the index funds purchased by retirement funds and the like.
No matter who wins, everyone loses.
I can't believe you beat me to this. Well done.
Inconceivable!
You've fallen for one of the classic blunders!
I just watched this movie last week. It's got so many good lines.
In the filing, Reddit calls Anthropic a “late-blooming artificial intelligence (‘AI’) company that bills itself as the white knight of the AI industry,” alleging that “it is anything but.”
“This case is about the two faces of Anthropic: the public face that attempts to ingratiate itself into the consumer’s consciousness with claims of righteousness and respect for boundaries and the law, and the private face that ignores any rules that interfere with its attempts to further line its pockets.”
I mean, Reddit's objection is that they want to sell the same data to Google to do the same training.
I dunno, it just reads like a reddit comment to me. 🤣
They actually wrote that in a real legal filing?
Jesus.
Did they ask /r/pettyrevenge to write that?
So who owns the data?
Well, they already sold it for 60 million and I didn't get a dime, so not me apparently.
Reddit, created by users and Russian bots
So if reddit wins, that means the content is theirs. So if the content is theirs, they are liable for any content that is illegal. Is that true?
yes to both regardless of this lawsuit
The wiggle room for large businesses is that they remove content that violates local laws when notified of it
The content's theirs whether they win or not, isn't it? It's in the EULA when you sign up.
Edit: Here's the clause.
You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:
When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. For example, this license includes the right to use Your Content to train AI and machine learning models, as further described in our Public Content Policy. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.
non-exclusive
That means we can license all our content to another company, and Reddit would be forced to allow them to fetch it, as we still own it, right?
No. I am not aware of any law that makes you liable for holding or claiming the copyright to some content. E.g., you may have to pay damages for libel, but not because you hold the copyright to the libelous statement.
Doesn't quite make sense.
You're telling me that someone can get popped for mistakenly visiting the dark side of the internet and having whatever-the-fuck horrible shit put on their machine, but owning the content and hosting it on your servers results in nothing?
This is one of those cases where I'm kind of hoping they both lose somehow. Neither party is right in this case: Reddit is trying to claim copyright over content it has no rights to, and Anthropic shouldn't be violating copyright without a licence.
But apparently you are actually allowed to violate copyright without a licence if you're an AI company, because apparently LLMs are the future? So I guess Reddit are going to lose, which will be funny.
I am squarely on the reddit should lose this side.
Anthropic may be breaking copyright, but not Reddit's copyright. Sure maybe Anthropic should be sued, but not by Reddit.
Actually this case could be a good thing. The whole question of who owns user generated content needs hashing out, because no one seems to actually know.
Obviously the logical answer would be that the people who created it own the content, but that's never been officially decided.
Judge finds that Anthropic has to pay restitution to the Reddit users. Affirms that posts belong to users.
Well, I can dream.
You mean Reddit, the company that would be very happy if Anthropic did the exact same thing, but paid Reddit first?
"Violating copyright without a licence" is a lovely turn of phrase. You must be the valedictorian of the Lemmy School of Copyright.
Is it violating copyright to browse the web?
I think it's acceptable as long as you don't learn anything.
“Reddit’s humanity is uniquely valuable in a world flattened by AI,” Lee said. “Now more than ever, people are seeking authentic human-to-human conversation. Reddit hosts nearly 20 years of rich, human discussion on virtually every topic imaginable. These conversations don’t happen anywhere else—and they’re central to training language models like Claude.”
LMAO, Reddit's days of genuine conversations between humans are long gone.
Only 100,000 times? Shit, do I need to be worried about getting sued too?
All porn subreddits are exempted
Reddit is just mad that Anthropic didn't pay them
... Yes?
Yes that is how capitalism works
The users posted for free. They didn't get paid. Their posts should be publicly available for scraping.
Long story short: they are not combatting bots on their platform. They sold training data to Google and these guys aren't paying; that's why they're suing.
Spez can forever get fucked
Suck shit reddit.
While half of Reddit is infested with propaganda bots from Russia.
Not just Russia.
Israel, US, China, North Korea, India and other countries... Nuclear Lobby, Fossil Fuel Lobby and countless other industry lobbyists... Private companies advertising their products...
But have you seen Rampart?
"We're the front page of the Internet!"
"No, not like that..."
“We’re the front page of the Internet! …as long as the front page isn’t scraped…” >:(
I hope they both choke on their own bots.
100.000 accesses isn't that much, right?
100,000 requests in 11 months? That's about 12.5 requests an hour
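For anyone who wants to check that figure, here is a rough sketch of the arithmetic. It assumes an average month of 30.44 days; the function name and that constant are mine, not from the filing or the article.

```python
# Rough check of the "requests per hour" estimate.
# Assumption: a month is treated as 30.44 days (365.25 / 12).

def requests_per_hour(total_requests: int, months: float,
                      days_per_month: float = 30.44) -> float:
    """Average hourly request rate over the given number of months."""
    hours = months * days_per_month * 24
    return total_requests / hours

rate = requests_per_hour(100_000, 11)
print(f"{rate:.1f} requests per hour")
```

That lands a little under 12.5 per hour, so "about 12.5" is a fair rounding.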
That's hardly anything. Facebook has a bot accessing my server's robots.txt multiple times a second. (My robots.txt used to say "Facebook bot go away" but now I just respond 404 to any requests from the Facebook bot. Pretend I said that all technical and stuff, it's 2 am and I ought to go to sleep.)
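A minimal sketch of what "respond 404 to the Facebook bot" could look like, written as a bare Python WSGI app. This is not the commenter's actual setup. The user-agent token `facebookexternalhit` and the placeholder robots.txt body are assumptions for illustration, and a real server would more likely do this in its web server config.

```python
# Sketch: return 404 to requests whose User-Agent matches a blocked token.
# Assumption: the crawler identifies itself with "facebookexternalhit".

BLOCKED_TOKENS = ("facebookexternalhit",)

def is_blocked(user_agent, blocked=BLOCKED_TOKENS):
    """True if the User-Agent header contains any blocked token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in blocked)

def app(environ, start_response):
    """WSGI app: 404 for blocked crawlers, a permissive robots.txt otherwise."""
    if is_blocked(environ.get("HTTP_USER_AGENT")):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"User-agent: *\nDisallow:\n"]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server` can serve `app` on a port of your choice.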
Some legitimate users probably submit more requests than that
Back in the day that's about how many times I accessed reddit a week.
Obviously Reddit isn't averse to bots scraping the site for data, just ones that aren't paying them. I'm regretting not going through and systematically deleting all my posts and comments before deleting my account, but I thought that happened automatically.
It wouldn't have mattered anyway if you left during the time most people did. Reddit rolled back mass-deleted data and manually deleted accounts during that period so that comments remained without usernames.
The fuckers!
I deleted all my posts and my account and they restored them all and then permanently banned me. They can recover anything they want to.
Basically nowhere on the internet does delete mean delete. Nearly everywhere it means archive or hide.
Do you really believe they don't have backups? Especially since it seems selling content for AI training was their plan for quite a while?
Or that they didn't make full backups a couple years ago before the protest, anticipating a lot of users would try to delete their comments?
I think the only way to truly delete anything from Reddit would be living in the EU and enforcing a GDPR request, but even in that case, I believe it would be very difficult to check that they actually comply.
I think the only way to truly delete anything from Reddit would be living in the EU and enforcing a GDPR request, but even in that case, I believe it would be very difficult to check that they actually comply.
Wouldn't work. GDPR is not copyright. Deleting the username is enough, unless you have doxed yourself in some post.
Rather, it can be argued that GDPR requires restoring comments at least in some situations. Comments may be necessary context to understand replies or even other posts.
I don't regret not deleting all my comments. For me, it's a mishmash of helpful/comedic/observational comments, and I don't care that they have sold them off for use as training data.
But, I just got shadowbanned, because of my VPN or something, so they aren't getting any more!
Bots? On Reddit!?
It's more likely than you think!
Edit: I guess great minds think alike.
Joke's on you for crawling mostly synthetic text?
hope reddit loses
Thought they signed up for that.
pay us or we sue
Isn't this just blackmail by reddit?
So Reddit serves free data but Anthropic took too much?
They give out a free license for their data, but require following their terms of service.
Don't the copyrights still belong to the original authors? What is Reddit suing for? The header and the footer? 🤔
Oof owie my hemochromatosis.
Fuck Reddit
Ehh... fuck reddit. They made the bed