The tool, called Nightshade, messes up training data in ways that could cause serious damage to image-generating AI models.
A new tool lets artists add invisible changes to the pixels in their art before they upload it online so that if it’s scraped into an AI training set, it can cause the resulting model to break in chaotic and unpredictable ways.
The tool, called Nightshade, is intended as a way to fight back against AI companies that use artists’ work to train their models without the creator’s permission.
[...]
Zhao’s team also developed Glaze, a tool that allows artists to “mask” their own personal style to prevent it from being scraped by AI companies. It works in a similar way to Nightshade: by changing the pixels of images in subtle ways that are invisible to the human eye but manipulate machine-learning models to interpret the image as something different from what it actually shows.
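To make the "invisible pixel changes" idea concrete, here is a minimal sketch of a bounded perturbation that flips a model's output. It is not the Nightshade or Glaze algorithm. The linear scorer `w`, the 8x8 image, and the `perturb` helper are all assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a model: a fixed linear scorer over an 8x8 grayscale image.
# A positive score means "class A", a negative score means "class B".
w = rng.normal(size=64)
w -= w.mean()  # zero-mean weights, so a flat gray image scores roughly 0


def score(image):
    return float(image.ravel() @ w)


def perturb(image, epsilon, direction):
    """Move each pixel by at most epsilon so the score shifts toward `direction` (+1 or -1)."""
    step = direction * epsilon * np.sign(w).reshape(image.shape)
    return np.clip(image + step, 0.0, 1.0)


image = rng.uniform(0.3, 0.7, size=(8, 8))
original = score(image)

# Smallest per-pixel budget that reaches the decision boundary, plus 50% headroom.
epsilon = 1.5 * abs(original) / np.abs(w).sum()
poisoned = perturb(image, epsilon, direction=-np.sign(original))

print(f"largest per-pixel change: {np.abs(poisoned - image).max():.4f}")
print(f"score before: {original:+.3f}, after: {score(poisoned):+.3f}")
```

Every pixel moves by a small, uniform amount, yet the toy model's decision flips. Real tools optimize the perturbation against deep feature extractors rather than a linear scorer.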
I find it very interesting that someone went in this direction to try to find a way to mitigate plagiarism. This is very akin to adversarial attacks on neural networks (you can read more in this short review: https://arxiv.org/pdf/2303.06032.pdf).
I saw some comments saying that you could just build an AI that detects poisoned images, but that wouldn't be feasible with a simple NN classifier or feature-based approaches. This technique shifts the artist's style to something the AI sees differently in the latent space, while the image still looks the same to a human. And since the shift is toward a different style the AI has already learned, it's fair to assume the result will look realistic and coherent to the model. Although maaaaaaaybe you could detect poisoned images with some dark magic: get the targeted AI, then analyze the latent space to see if the image has been tampered with.
On the other hand, I think if you build more robust features and just scale the data, these problems might go away with more regularization in the network. Plus, it assumes you are targeting one specific AI generation tool. There are a dozen of these, and if someone trains with a few more images in a cluster, that's it: you've shifted the features and the poisoned images no longer work.
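A rough sketch of that "analyze the latent space" idea, under heavy assumptions: the embeddings are synthetic stand-ins for the target model's encoder output, the concept centers are placed far apart by hand, and `fake_embeddings` and `looks_tampered` are hypothetical helpers. It only shows the outlier test, not a working detector.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 16

# Hypothetical latent-space centers for two concepts, placed far apart on purpose.
center_dog = np.zeros(DIM)
center_dog[0] = 5.0
center_cat = np.zeros(DIM)
center_cat[0] = -5.0


def fake_embeddings(center, n):
    """Stand-in for running the target model's encoder on n images."""
    return center + 0.1 * rng.normal(size=(n, DIM))


# Trusted reference images labelled "dog" define what a normal "dog" embedding looks like.
reference = fake_embeddings(center_dog, 500)
centroid = reference.mean(axis=0)
ref_dist = np.linalg.norm(reference - centroid, axis=1)
threshold = ref_dist.mean() + 3.0 * ref_dist.std()


def looks_tampered(embeddings):
    """Flag embeddings that sit unusually far from the centroid of their claimed label."""
    return np.linalg.norm(embeddings - centroid, axis=1) > threshold


clean = fake_embeddings(center_dog, 200)    # labelled "dog", embeds like a dog
poisoned = fake_embeddings(center_cat, 20)  # labelled "dog", embeds like a cat

print("fraction of clean images flagged:   ", looks_tampered(clean).mean())
print("fraction of poisoned images flagged:", looks_tampered(poisoned).mean())
```

The catch is the one already mentioned: this needs access to the targeted model's encoder and a trusted reference set, and a real poison may not land this far from the clean cluster.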
Lol... I just read the paper, and Dr Zhao actually just wrote a research paper on why it's actually legally OK to use images to train AI. Hear me out...
He changes the 'style' of input images to corrupt the ability of image generators to mimic them, and even shows that the vast majority of artists can't even tell when this happens with his program, Glaze... Style is explicitly not copyrightable in US case law, so he just provided evidence that the way OpenAI and others use this data to generate images is transformative, which would legally mean it falls under fair use.
No idea if this would actually get argued in court, but it certainly doesn't support the idea that these image generators are stealing actual artwork.
This is already a concept in the AI world and is often used while a model is being trained specifically to make it better. I believe it's called adversarial training or something like that.
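For anyone curious what adversarial training looks like in practice, here is a minimal sketch on a toy logistic-regression model, using FGSM-style perturbations (one step along the sign of the loss gradient with respect to the input). The data, the `epsilon` value, and the helper names are assumptions for illustration. This is not how any particular image generator is trained.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: two well-separated 2-D blobs with labels 0 and 1.
n = 200
X = np.vstack([rng.normal(-2.0, 0.5, size=(n, 2)), rng.normal(2.0, 0.5, size=(n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])


def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def fgsm(X, y, w, b, epsilon):
    """Shift each feature by epsilon in the direction that increases the loss."""
    grad_x = (predict_proba(X, w, b) - y)[:, None] * w  # d(loss)/d(input)
    return X + epsilon * np.sign(grad_x)


def train(X, y, epsilon, steps=300, lr=0.1):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        X_adv = fgsm(X, y, w, b, epsilon)     # attack the current model...
        err = predict_proba(X_adv, w, b) - y  # ...then learn from the attacked inputs
        w -= lr * X_adv.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


def accuracy(X, y, w, b):
    return float(((predict_proba(X, w, b) > 0.5) == y).mean())


w, b = train(X, y, epsilon=0.3)
clean_acc = accuracy(X, y, w, b)
attacked_acc = accuracy(fgsm(X, y, w, b, epsilon=0.3), y, w, b)
print(f"accuracy on clean inputs: {clean_acc:.3f}, on attacked inputs: {attacked_acc:.3f}")
```

The point is the loop structure: each step first perturbs the batch against the current weights, then updates the weights on the perturbed batch.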
I remember in the early 2010s reading an article like this one on openai.com talking about the dangers of using AI for image search engines to moderate against unwanted content. At the time the concern was CSAM salted to prevent its detection (along with other content salted with CSAM to generate false positives).
My guess is that, since we're still training AI with pools of data-entry people who tag pictures with what they appear to be, the AI ends up reading more into images than its human trainers do (the proverbial man inside the Mechanical Turk).
This is going to be an interesting technology war.
I generally don't believe in intellectual property. I think it creates artificial scarcity and limits creativity. Of course the real tragedies in this field have to do with medicine and other serious business. But still, artists claiming ownership of their style of painting is fundamentally no different. Why can't I paint in your style? Do you really own it? Are you suggesting you didn't base your idea mostly on the work of others, and that no one in turn can take your idea, be inspired by it and do with it as they please? Do my means have to be a pencil? Why can't my means be a computer, why not an algorithm? Limitations, limitations, limitations. We need to reform our system and make the public domain the standard for ideas (in all their forms). Society doesn't treat artists properly, I am well aware of that. Creative minds are often troubled because they fall outside norms. There are many tragic examples. Also, money-wise, many artists don't get enough credit for their contributions to society, but making every idea a restricted area is not the solution. People should support the artists they like on a voluntary basis. Pirate the album but go to concerts, pirate the artwork but donate to the artist. And if that doesn't make you enough money, that's very unfortunate. But make no mistake: that's how almost all artists live. Only the top 0.something% actually make enough money by selling their work, and that is usually the percentile that's best at marketing its art, in other words: it's usually the industry. The others already depend upon donations or other sources of income. We can surely keep art alive while still removing all these artificial limitations. Copying is not, was not, and never will be in any way similar to stealing. Let freedom rule. Join your local pirate party.
Obviously this is using some bug and/or weakness in the existing training process, so couldn't they just patch the mechanism being exploited?
Or at the very least you could take a bunch of images, purposely poison them, and now you have a set of poisoned images and their non-poisoned counterparts allowing you to train another model to undo it.
Sure, you've set up a speed bump, but this is hardly a solution.
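The paired-data idea above could be sketched like this. The big assumption is that the poisoning step is a fixed affine distortion, which makes it exactly recoverable by least squares. Real poisoning is optimized per image, so this only illustrates the supervised setup. `poison`, `fit_restorer`, and `restore` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(3)
DIM = 16  # flattened 4x4 "images"

# Hypothetical poisoning step, assumed here to be a fixed affine distortion.
A = np.eye(DIM) + 0.05 * rng.normal(size=(DIM, DIM))
shift = 0.02 * rng.normal(size=DIM)


def poison(images):
    return images @ A + shift


def fit_restorer(poisoned, clean):
    """Least-squares affine map from poisoned images back to their clean counterparts."""
    design = np.hstack([poisoned, np.ones((len(poisoned), 1))])
    coeffs, *_ = np.linalg.lstsq(design, clean, rcond=None)
    return coeffs


def restore(poisoned, coeffs):
    design = np.hstack([poisoned, np.ones((len(poisoned), 1))])
    return design @ coeffs


# Poison our own images so we have (poisoned, clean) training pairs.
train_clean = rng.uniform(size=(500, DIM))
coeffs = fit_restorer(poison(train_clean), train_clean)

# Check on images the restorer has never seen.
test_clean = rng.uniform(size=(100, DIM))
test_poisoned = poison(test_clean)
error_before = np.abs(test_poisoned - test_clean).max()
error_after = np.abs(restore(test_poisoned, coeffs) - test_clean).max()
print(f"max pixel error before: {error_before:.4f}, after: {error_after:.2e}")
```

Whether anything like this carries over to real poisoned images depends on how regular the perturbation is, which the source doesn't tell us.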
What a dumb solution to a problem that doesn't need a solution. The problem isn't AI, it's the lack of understanding of the tech that has people thinking AI is theft.