We’ve filed a lawsuit challenging Stable Diffusion (stablediffusionlitigation.com)
154 points by zacwest 11 hours ago | 351 comments





They've got a copy of a figure from the original diffusion paper, showing a diffusion process on a spiral dataset. They seem to completely misunderstand it. The figure does not show image diffusion, rather it shows a diffusion process in which each data item is a 2D point. The figure is showing diffusion on an entire dataset and demonstrating that it can approximately reconstruct the spiral-shaped distribution.
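To make that concrete, here is a rough sketch (my own toy code, not the paper's) of what the forward process in that figure operates on: each data item is just a 2D point, and the noise gradually destroys the spiral-shaped distribution.

    import numpy as np

    # Toy "dataset": each data item is a single 2D point sampled from a spiral.
    t = np.random.rand(1000) * 3 * np.pi
    spiral = np.stack([t * np.cos(t), t * np.sin(t)], axis=1)  # shape (1000, 2)

    # Forward diffusion: repeatedly mix the point cloud with Gaussian noise.
    # After enough steps the spiral shape is destroyed; a trained model learns
    # to run this in reverse and recover the *distribution*, not a stored image.
    betas = np.linspace(1e-4, 0.05, 100)
    x = spiral.copy()
    for beta in betas:
        x = np.sqrt(1 - beta) * x + np.sqrt(beta) * np.random.randn(*x.shape)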

I'm surprised they couldn't find someone with even a rudimentary understanding of diffusion models to review this.


Wow, not only do they get this wrong, it’s the core example they use to demonstrate copying.

Yeah, and their next figure isn't any better. They show a latent space interpolation figure from DDPM, and they seem to think this is how Diffusion models produce a "collage" (as they describe the process). Of course, this figure has nothing to do with how image generation is actually performed. It's just an experiment for the purpose of the paper to demonstrate that the latent space is structured.

In fact, this only works because the source images are given as input to the forward process - thus, the details being interpolated are from the inputs not from the model. If you look at Appendix Figure 9 from the same paper (https://arxiv.org/pdf/2006.11239.pdf) it is clear what's going on. Only when you take a smaller number of diffusing (q) steps can you successfully interpolate. When you take a large number of diffusing steps (top row of figure 9), all of the information from the input images is lost, and the "interpolations" are now just novel samples.
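For the curious, the interpolation experiment behind Figure 8 amounts to something like the sketch below (simplified; reverse_process is a stand-in for the trained denoiser, which I'm not reproducing here). The key point is that the two source images are supplied as inputs to the forward process:

    import numpy as np

    betas = np.linspace(1e-4, 0.02, 1000)
    alpha_bar = np.cumprod(1 - betas)

    def forward_diffuse(x0, t):
        """q(x_t | x_0): mix the *given source image* with Gaussian noise."""
        noise = np.random.randn(*x0.shape)
        return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * noise

    def slerp(a, b, lam):
        """Spherical interpolation between two noised latents."""
        a_f, b_f = a.ravel(), b.ravel()
        cos = np.dot(a_f, b_f) / (np.linalg.norm(a_f) * np.linalg.norm(b_f))
        omega = np.arccos(np.clip(cos, -1, 1))
        out = (np.sin((1 - lam) * omega) * a_f + np.sin(lam * omega) * b_f) / np.sin(omega)
        return out.reshape(a.shape)

    # Stand-ins for the two source images given as input.
    x0_a = np.random.rand(64, 64)
    x0_b = np.random.rand(64, 64)
    t = 500  # a middling number of forward steps: much of the inputs survives
    x_t = slerp(forward_diffuse(x0_a, t), forward_diffuse(x0_b, t), 0.5)
    # image = reverse_process(x_t, from_step=t)   # hypothetical trained denoiser
    # With t near the final step, x_t is essentially pure noise and the decoded
    # results are novel samples rather than blends of the inputs.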

It's very hard for me to find a reason to include Figure 8 but not Figure 9 in their lawsuit that isn't either a complete lack of understanding, or intentional deception.


“Stable Diffusion contains unauthorized copies of millions—and possibly billions—of copyrighted images.”

That’s going to be hard to argue. Where are the copies?

“Having copied the five billion images—without the consent of the original artists—Stable Diffusion relies on a mathematical process called diffusion to store compressed copies of these training images, which in turn are recombined to derive other images. It is, in short, a 21st-century collage tool.”

“Diffusion is a way for an AI program to figure out how to reconstruct a copy of the training data through denoising. Because this is so, in copyright terms it’s no different from an MP3 or JPEG—a way of storing a compressed copy of certain digital data.”

The examples of training diffusion (e.g., reconstructing a picture out of noise) will be core to their argument in court. Certainly during training the goal is to reconstruct original images out of noise. But do they exist in SD as copies? Idk


> That’s going to be hard to argue. Where are the copies?

In fairness, Diffusion is arguably a very complex entropy coding similar to Arithmetic/Huffman coding.

Given that copyright is protectable even on compressed/encrypted files, it seems fair that the “container of compressed bytes” (in this case the Diffusion model) does “contain” the original images no differently than a compressed folder of images contains the original images.

A lawyer/researcher would likely win this case if they re-create 90%ish of a single input image from the diffusion model with text input.


There's a key difference. A compression algorithm is made to be reversible. The point of compressing an MP3 is to be able to decompress as much of the original audio signal as possible.

Stable Diffusion is not made to decompress the original and actually has no direct mechanism for decompressing any originals. The originals are not present. The only thing present is an embedding of key components of the original in a multi-dimensional latent space that also includes text.

This doesn't mean that the outputs of Stable Diffusion cannot be in violation of a copyright, it just means that the operator is going to have to direct the model towards a part of that text/image latent space that violates copyright in some manner... and that the operator of the model, when given an output that is in violation of copyright, is liable for publishing the image. Remember, it is not a violation of copyright to photocopy an image in your house... it's a violation when you publish that image!


Storing copies of training data is pretty much the definition of overfitting, right?

The data must be encoded with various levels of feature abstraction for this stuff to work at all. Much like humans learning art, though devoid of the input that makes human art interesting (life experience).

I think a more promising avenue for litigating AI plagiarism is to identify that the model understands some narrow slice of the solution space that contains copyrighted works, but is much weaker when you try to deviate from it. Then you could argue that the model has probably used that distinct work rather than learned a style or a category.


Even that approach seems highly vulnerable to fair use. If the model does not recreate a copyrighted work with enough fidelity to be recognized as such, then how can it be said to be in violation of copyright?

> 90%ish of a single input image

Oh, one image is enough to apply copyright as if it were a patent, to ban a process that makes original works most of the time?

The article authors say it works as a "collage tool", downplaying the composition and layout of the image as unimportant elements. At the same time they forget that SD is changing textures as well, so it's a collage minus textures and composition?

Is there anything left to complain about? Unless, by luck of the draw, both layout and textures end up very similar to a training image. But ensuring no close duplications are allowed should suffice.

Copyright should apply one by one, not in bulk. Each work they complain about should be judged on its own merits.


> Oh, one image is enough to apply copyright as if it were a patent, to ban a process that makes original works most of the time?

The software itself is not at issue here. If they had trained the network on public domain images then there’d be no lawsuit. The legal question to settle is whether it’s allowable to train (and use) a model on copyrighted images without permission from the artists.

They may actually be successful at arguing that the outputs are either copies or derived works which would require paying the original artist for licenses.


Then I think any work of art or media inspired by past sources would fall into this category. It's a very grey area, and I haven't seen anyone or any case law put it into proper terms yet.

> Oh, one image is enough to apply copyright as if it were a patent, to ban a process that makes original works most of the time?

The law can do whatever its writers want. The law is mutable, so the answer to your question is “maybe”.

Maybe SD will get outlawed for copyright reasons on a single image. The law and the courts have done sillier things.


All the handwringing about generative AI brings to mind the aphorism about genies returning to bottles. There can be lawsuits and laws--and there may even be cases where an output by chance or by tickling the input sufficiently looks very close to something in the training set. But anyone who thinks this technology will be banned in some manner is... mistaken.

So as a code author I am pretty upset about Copilot specifically, and it seems like SD is similar (hadn't heard before about DeviantArt doing the same as what GitHub did). But I agree with this take: the tech is here, it's going to be used, and it's not going to be shut down by a lawsuit. Nor should it, frankly.

What I object to is not the AI itself, or even that my code has been used to train it. It's the copyright for me but not for thee way that it's been deployed. Does GitHub/Microsoft's assertion that training sidesteps licensing apply to GitHub/Microsoft's own code? Do they want to allow (a hypothetical) FSFPilot to be trained on their proprietary source? Have they actually trained Copilot on their own source? If not, why not?

I published my source subject to a license, and the force of that license is provided by my copyright. I'm happy to find other ways of doing things, but it has to be equitable. I'm not simply ceding my authorship to the latest commercial content grab.


> Have they actually trained Copilot on their own source? If not, why not?

People have posted illegal Windows source code leaks to GitHub. Microsoft doesn't seem to care that much, because these repos stay up for months or even years at a time without Microsoft DMCAing them; if you go looking you'll find some right now. I think it is entirely possible, even likely, that some of those repos were included in Copilot's training data set. So Copilot actually was trained on (some of) Microsoft's proprietary source code, and Microsoft doesn't seem to care.


I doubt Microsoft sees fragments of Windows source code as a particular crown jewel these days. That said, some of it is decades-old code that was never intended for the public to see (unlike, presumably, anything in a public GitHub repository). And some of it is presumably third-party code licensed to Microsoft that was likewise never intended for public viewing. So, while it would be a good gesture on the part of Microsoft to scan their own code--if they haven't done so--I could see why it might be problematic. (Just as training on private GitHub repos would be.)

tl;dr I think there's a distinction between training on copyrighted but public content and private content.


Private third-party GitHub repos is another good example. If licenses don't apply to training data, as GitHub has asserted, why not use those too? Do they think they'll get in trouble over it? Why doesn't the same trouble apply to my publicly-readable GPL-licensed code?


But they are not original works, they are wholly derived works of the training data set. Take that data set away and the algorithm is unable to produce a single original pixel.

The fact that the derivation involves millions of works as opposed to a single one is immaterial for the copyright issue.


If I take a million copyrighted images from magazines, cut them with scissors, and make a single collage, I would expect the resulting image to be fair use. Fair use is an affirmative defense, like self defense, where you justify your infringement.

People are treating this like it's a binary technical decision. Either it is or isn't a violation. Reality is that things are spectrums and judges judge. SD will likely be treated like a remix that sampled copyrighted work, but just a tiny bit of each work, and sufficiently transformed it to create a new work.


> If I take a million copyrighted images from magazines, cut them with scissors, and make a single collage, I would expect the resulting image to be fair use.

That’s not how it works. Your collage would be fine if it was the only one since you used magazines you bought. Where you’d get into trouble is if you started printing copies of your collage and distributing them. In that case you’d be producing derived works and be on the hook for paying for licenses from the original authors.


If I make software that randomly draws pixels on the screen then we can say for a fact that no copyrighted images were used.

If that software happens to output an image that is in violation of copyright then it is not the fault of the model. Also, if you ran this software in your home and did nothing with the image, then there's no violation of copyright either. It only becomes an issue when you choose to publish the image.

The key part of copyright is when someone publishes an image as their own. That they copy an image doesn't matter at all. It's what they DO with the image that matters!

The courts will most likely make a similar distinction between the model, the outputs of the model, and when an individual publishes the outputs of the model. This would be that the copyright violation occurs when an individual publishes an image.

Now, if tools like Stable Diffusion are constantly putting users at risk of unknowingly violating copyrights then this tool becomes less appealing. In this case it would make commercial sense to help users know when they are in violation of copyright. It would also make sense to update our copyright catalogues to facilitate these kinds of fingerprints.


The training data set is indeed mandatory, but that doesn't make the resulting model a derivative in itself. In fact the training is specifically designed to avoid producing derivatives.

Go to stablediffusionweb.com and enter "a person like biden" into the box. You will see a picture exactly like President Biden. That picture will have been derived from the trained images of Joe Biden. That cannot be in dispute.

You've made some errors in reasoning.

First, there is a legal definition of a "derivative work" and there is an artistic notion of a "derivative work". If the two of us both draw a picture of the Statue of Liberty, artistically we have both derived the drawing from the original statue. However, neither drawing is legally considered a derivative work, either of the original sculpture or of the other drawing.

Let's think about a cartoonish caricature of Joe Biden. What "makes up" Joe Biden?

https://www.youtube.com/watch?v=QRu0lUxxVF4

To what extent are these "constituent parts" present in every image of Joe Biden? All of them? Is the latent space not something that is instead hidden in all images of Joe Biden? Can an image of Joe Biden be made by anyone that is not derived from these "high order" characteristics of what is recognizable as Joe Biden across a number of different renderings from disparate individuals?


Just because it generates you an image like Biden still does not make it a derivative either.

You can draw Biden yourself if you're talented and it's not considered a derivative of anything.


The difference is that computers create perfect copies of images by default, people don't.

If a person creates a perfect copy of something it shows they have put thousands of hours of practice into training their skills and maybe dozens or even hundreds of hours into the replica.

When a computer generates a replica of something it's what it was designed to do. AI art is trying to replicate the human process, but it will always have the stink of "the computer could do this perfectly but we are telling it not to right now"

Take Chess as an example. We have Chess engines that can beat even the best human Chess players very consistently.

But we also have Chess engines designed to play against beginners, or at all levels of Chess play really.

We still have Human-only tournaments. Why? Why not allow a Chess Engine set to perform like a Grandmaster to compete in tournaments?

Because there would always be the suspicion that if it wins, it's because it cheated by playing above its level when it needed to. Because that's always an option for a computer, to behave like a computer does.


You’re acting like the “computer” has a will of its own. Generating a perfect copy of an image would be a completely separate task from training a model for image generation.

There are no models I know of with the ability to generate an exact copy of an image from its training set unless it was solely trained on that image to the point it could. In that case I could argue the model’s purpose was to copy that image rather than learn concepts from a broad variety of images to the point it would be almost impossible to generate an exact copy.

I think a lot of the arguments revolving around AI image generators could benefit from the constituent parties reading up on how transformers work. It would at least make the criticisms more pointed and relevant, unlike the criticisms drawn in the linked article.


There is no need for rhetorical games. The actual issue is that Stable Diffusion does create derivatives of copyrighted works. In some cases the produced images contain pixel level details from the originals. [1]

[1] https://arxiv.org/pdf/2212.03860.pdf


> The actual issue is that Stable Diffusion does create derivatives of copyrighted works.

Nothing points to that; in fact even on this website they had to lie about how Stable Diffusion actually works, maybe a sign that their argument isn't really solid enough.

> [1] https://arxiv.org/pdf/2212.03860.pdf

You realize those are considered defects of the model right? Sure, this model isn't perfect and will be improved.


> You realize those are considered defects of the model right? Sure, this model isn't perfect.

You can call copying of the input a defect, but why are you simultaneously arguing that it doesn't occur?


I don't call these defects copying either, but overfitting characteristics. Usually they are there because there is a massive amount of near-identical images.

It's both undesirable and not relevant to this kind of lawsuit.


Correction: if you draw a copy of Biden and it happens to overlap enough with someone’s copyright of a drawing or image of Biden, you did create a derivative (whether you knew it or not).

is that really how copyright law works? Drawing something similar independently is considered a derivative even if there's no links to it?

It's bad news for art websites themselves if that's the case...


No, that’s not how it works… at least in many countries. Unlike patents, “parallel creation” is allowed. This was fought out in case law over photography decades ago: photographers would take images of the same subject, then someone else would, and they might incidentally capture a similar image for lots of reasons. Before ubiquitous photography in our pockets, when you needed expensive equipment or a carefully lit portraiture studio to get great results, it happened, people sued, as those with money to spare for lawyers are wont to do, and thus precedent has been established for much of this. You don’t see it a lot outside photography, but it’s not a new thing for art copyright law, and I think the necessity of the user to provide their own input and get different outcomes outside of extremely sophisticated prompt editing will be a significant fact in their favour.

If I were to take the first word from a thousand books and use it to write my own would I be guilty of copyright violations?

Words have a special carve out in copyright law / precedent. So much so that a whole other category of Intellectual Property exists called Trademarks to protect special words.

But back to your point “if you were to take the first sentence from a thousand books and use it in your own book”, then yes based on my understanding (I am not a lawyer) of copyright you would be in violation of IP laws.


I doubt it would be a violation.

Specifically fair use #3 "the amount and substantiality of the portion used in relation to the copyrighted work as a whole."

A sentence being a copyright violation would make every book review in the world illegal.


lol thinking about this more:

I understand people’s livelihoods are potentially at stake, but what a shame it would be if we find AGI, even consciousness but have to shut it down because of a copyright dispute.


The real tragedy is being marketed to so heavily that we conflate enforcing copyright on LLM/diffusion companies with shutting down an AGI. I blame companies like OpenAI for purposefully marketing themselves misleadingly, since nobody is going to enforce false advertising laws on something they don't understand.

That could be a funny movie.

Special agents from the MPAA are sent to assassinate an android who can spew out high-quality art.


Didn't someone proclaim yesterday that generative models can't destroy anything worth protecting? It was about ChatGPT but the principle is the same.

I think the result will be image sharing websites where you have to agree to have your image read into the model.

I think it is likely github will do the same with copilot.


Image sharing sites routinely steal artwork from the web. My business has a unique logo with the business name in it. It has repeatedly shown up on such sites, despite repeated DMCA takedown requests.

Simply appearing on a shared hosting site should not be enough.


maybe a fair price to pay for free repo hosting. wouldn't want my private repos being used for training though

what if they shut us down because of a copyright dispute? :-)

Seriously!!!

I didn’t say it cuz I didn’t think it would resonate, but it’s a whole new world we are quickly entering.


> In fairness, Diffusion is arguably a very complex entropy coding similar to Arithmetic/Huffman coding.

so, digits of pi anyone?


In that vein, surely MD5 hashes should also be copyrighted, as they are derived from a work.

Not really, since one of the major characteristics is being able to recover the copyrighted work from the encoded version.

Since md5 hashes don't share this property, they're not "in that vein".


And how is that different from gzip or base64, which can re-create the original image when given the appropriate input?

That’s my point, Diffusion[1] does seem to be “just like” gzip or base64.

And it would be illegal for me to sell or distribute zipped copies of images without the copyright holder’s consent. Similarly there might be an argument for why Diffusion[1] specifically can’t be built with copyrighted images.

[1] which is just one part of something like Stable Diffusion


A lossy compressor isn't just like a lossless compressor. Especially not one that has ~2 bytes for each input image.

I agree with you. My intuition is also that SD itself is not a violation of copyright.

That said it can sometimes be in violation of copyright if it creates a specific image that is “too close to another original” (just like a human would be in violation even if they never previously saw that image).

But the above is just my intuition (and possibly yours) that doesn’t mean a lawyer couldn’t make the argument that it’s a ”good enough lossy compression - just like jpeg but smaller” and therefore “contains the images in just 2 bytes”.

That lawyer may fail to win the argument, but there is a chance that they do win the argument! Especially as researchers keep making Diffusion and SD models better and better at being compression algos (which is a topic people are actively working on).


So it's fine to distribute copyrighted works, as long as they're jpeg(lossy) encoded? I don't think the law would agree with you.

If I compress a copyrighted work down to two bytes and publish that, I think that judges would declare it legal. If it can't be uncompressed to resemble the copyrighted work in any sense, no judge is going to declare it illegal.

How many bytes make it an original work vs a compressed copy?

Usually judges would care more about whether the bytes came from than how many of them there are.

Since SD is trained by gradient updating against several different images at the same time, it of course never copies any image bits straight into it. Since it's a latent-diffusion model, actual "image"ness is limited to the image encoder (VAE), so any fractional bits would be in there if you want to look.

The text encoder (LAION OpenCLIP) does have bits from elsewhere copied straight into it to build the tokens list.

https://huggingface.co/stabilityai/stable-diffusion-2-1/raw/...


“any fractional bits would be in there if you want to look.”

What do you mean by this in the context of generating images via prompt? “Fractional bits” don’t make sense and it’s more misleading if anything. Regardless, a model violating criteria for being within fair use will always be judged by the outputs it generates rather than its composing bytes (which can be independent)


The important distinction then is using another program or device to analyze the bits but without copying them, that takes its own new impression? Like using a camera?

Well, theoretically more like a vague memory of it or taking notes on it.

well I guess it wouldn't be different, only there aren't any companies zipping up millions of images and then offering people the chance to get those images by putting in the text prompt that recreates them without paying any fees to the artists whose images were used.

Search engines do that.

good point, but didn't Google Image search lose some case and have to change their behavior?

Great. Now the defence shows an artist that can recreate an image. Cool, now people who look at images get copyright suits filed against them for encoding those images in their heads.

I don't think Stable Diffusion can reproduce any single image it's trained on, no matter what prompts you use.

It does have the Mona Lisa because of overfitting. But that's because there is too much Mona Lisa on the internet.

These artists taking part in the suit won't be able to recreate any of their work.


Just because I look at an image does not mean that I can recreate it. Storing it in the training data means the AI can recreate it.

There's a world of difference that you are just writing off.


If you spent a decade trying to draw it, wouldn't your brain have the right "weights" to execute it pretty exactly going forward?

Except with computers, they don't need to eat or sleep, converse or attend stand-ups.

And once you're able to draw that one picture, you could probably draw similar ones. Your own style may emerge too.

Just thinking. Copyists, students, and scribes used to copy stuff verbatim, sometimes just to "learn" it.

The product of that study could be published works, a synthesis of ideas from elsewhere, and so on. We would say it belonged to the executor, though.

So the AI learned, and what it has created belongs to it. Maybe.

Or, once we acknowledge AI can "see" images, precedent opens the way to citizenship (humanship?)


> storing it in the training data means the AI can recreate it.

No it doesn't, it means that abstract facts related to this image might be stored.


The pedantry gets tiring. If the AI can't recreate it exactly, it can recreate a likeness that is compelling enough that the average person would think it was the same. If it can't now, it will as it gets better. That's the point of using the training data.

That is not the point of using the training data. It's specifically trained to not do that.

See https://openai.com/blog/dall-e-2-pre-training-mitigations/ "Preventing Image Regurgitation".


That's probably a very relevant point. (I'm guessing.) If I ask for an image of a red dragon in the style of $ARTIST, and the algorithm goes off and says "Oh, I've got the perfect one already in my data"--or even "I've got a few like that, I'll just paste them together"--that's a problem.

Why does this argument apply to an Artificial Intelligence, but not a human one? A human is not breaking copyright just by being able to recreate a copyrighted work they've studied.

It depends to what degree it's literal copying. See e.g. the Obama "Hope" poster. [1] Though that case is muddied by the fact that the artist lied about the source of his inspiration. Had it in fact been an older photo of JFK in a similar pose, there probably wouldn't have been a controversy.

[1] https://en.wikipedia.org/wiki/Barack_Obama_%22Hope%22_poster


> If the AI can't recreate it exactly, it can recreate a likeness that is compelling enough that the average person would think it was the same

That's the opposite goal of this image model. Sure you might find other types of research models which are meant to do that but that's not stablediffusion and the likes.


This just sounds like really fancy, really lossy compression to me.

Compression that returns something different from the original most of the time, but still could return the original.


No, it means there is a 512 bit number you can combine with the training data to reproduce a reasonable though not exact likeness (attempts to use SD and others as compression algorithms show they're pretty bad at it, because while they can get "similar" they'll outright confabulate details in a plausible looking way - i.e. redrawing the streets of San Francisco in images of the golden gate bridge).

Which of course then arrives at the problem: the original data plainly isn't stored in a byte-exact form, and you can only recover it by providing an astoundingly specific input string (the 512 bit latent space vector). But that's not data which is contained within Stable Diffusion. It's equivalent to trying to sue a compression codec because a specific archive contains a copyrighted image.


> It's equivalent to trying to sue a compression codec because a specific archive contains a copyrighted image.

This is the most salient point in this whole HN thread!

You can’t sue Stable Diffusion or the creators of it! That just seems silly.

But (I don’t know I’m not a lawyer) there might be an argument to sue an instance of Stable Diffusion and the creators of it.

I haven’t picked a side of this debate yet, but it has already become a fun debate to watch.


Exactly. The quarrel here is between the users of Stable Diffusion, some of whom are deliberately, legally speaking with intent (prompt crafting to get a specific output demonstrates clear intent), trying to use Stable Diffusion to produce images that are highly derivative of, and may or may not be declared legally infringing works of, another artist, and the artists whose works are potentially being infringed upon.

You can’t sue Canon for helping a user take better infringing copies of a painting, nor can you sue Apple or Nikon or Sony or Samsung… you can sue the user making an infringing image, not the tools they used to make the infringing image… the tools have no mens rea.


You can't (successfully) sue the creators of Stable Diffusion because they're an academic group in Germany, a country that has an explicit allowance in copyright law for training non-commercial models.

> It's equivalent to trying to sue a compression codec because a specific archive contains a copyrighted image.

That's plainly untrue, as Stable Diffusion is not just the algorithm, but the trained model—trained on millions of copyrighted images.


But in fairness, even a human could know how to violate copyright but cannot be sued until they do violate it.

SD might know how to violate copyright but is that enough to sue it? Or can you only sue violations it helps create?


I would assert (with no legal backing, since this is the first suit that actually attempts to address the issue either way) that the trained model is a copyright infringement in itself. It is a novel kind of copyright infringement, to be sure, but I believe that use of copyrighted material in a neural net's training set without the creator's permission should be considered copyright infringement without any further act required to make it so.


This is a gross misunderstanding of copyright. The model doesn't need to "contain" the images any more than a Harry Potter knockoff needs to "contain" huge chunks of text from the originals.

It depends. If names are different, character and plot details differ, etc., a book about students at a school for wizards battling great evil may be a not particularly imaginative rip-off and may even invite litigation if it's too close, but I'm guessing it wouldn't win in court. See also The Sword of Shannara and Tolkien. https://en.wikipedia.org/wiki/The_Sword_of_Shannara

Creators mimic styles and elements of others' works all the time. Unless an ML algorithm crosses some literal copying threshold, I fail to see it as doing anything substantially different from what people routinely do.


It doesn't matter if they exist as exact copies in my opinion.

The law doesn't recognize a mathematical computer transformation as creating a new work with original copyright.

If you give me an image, and I encrypt it with a randomly generated password, and then don't write down the password anywhere, the resulting file will be indistinguishable from random noise. No one can possibly derive the original image from it. But it's still copyrighted by the original artist as long as they can show "This started as my image, and a machine made a rote mathematical transformation to it", because machines making rote mathematical transformations cannot create new copyright.

The argument for stable diffusion would be that even if you cannot point to any image, since only algorithmic changes happened to the inputs, without any human creativity, the output is a derived work which does not have its own unique copyright.


This surely can't be the case, right? If it was, then what's stopping me from taking any possible byte sequence and applying my copyright to it?

I could always show that there exists some function f that produces said byte sequence when applied to my copyrighted material.

Can I sue Microsoft because the entire Windows 11 codebase is just one "rote mathematical transformation" away from the essay I wrote in elementary school?


The law doesn't care about technical tricks. It cares about how you got the bytes and what humans think of them.

Sure, the windows 11 codebase is in pi somewhere if you go far enough. Sure, pi is a non-copyrightable fact of nature. That doesn't mean the windows codebase is _actually_ in pi legally, just that it technically is.

The law does not care about weird gotchas like you describe.

I recommended reading this to a sibling comment, and I'll recommend it to you too: https://ansuz.sooke.bc.ca/entry/23

Yes, copyright law has obviously irrational results if you start trying to look at it only from a technical "but information is just 1s and 0s, you can't copyright 1s and 0s" perspective. The law does not care.

Which is why we have to think about the high level legal process that stable diffusion does, not so much the actual small technical details like "can you recover images from the neural net" or such.


Some years ago I had an idea to have a method of file sharing with strong plausible deniability from the sharer.

The idea, in stage one, was to split a file into chunks and xor those with other random chunks (equivalent to a one-time pad), those chunks as well as the created random chunks then got shared around the networks, with nobody hosting both parts of a pair.

The next stage is that future files inserted into the network would not create new random chunks but randomly use existing chunks already in the network. The result is a distributed store of chunks each of which is provably capable of generating any other chunk given the right pair. The correlations are then stored in a separate manifest.

It feels like such a system is some kind of entropy coding system. In the limit the manifest becomes the same size as the original data. At the same time though, you can prove that any given chunk contains no information. I love thinking about how the philosophy of information theory interacts with the law.
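A minimal sketch of the stage-one splitting, just to make the mechanics concrete (the helper names are mine):

    import os

    def split(chunk):
        """Split a file chunk into two shares, each indistinguishable from random."""
        pad = os.urandom(len(chunk))                      # freshly generated random chunk
        share = bytes(a ^ b for a, b in zip(chunk, pad))  # chunk XOR pad
        return share, pad                                 # hosted on different nodes

    def recombine(share, pad):
        """XOR the pair back together to recover the original chunk."""
        return bytes(a ^ b for a, b in zip(share, pad))

    chunk = b"some copyrighted bytes"
    a, b = split(chunk)
    assert recombine(a, b) == chunk
    # On its own, each share is uniformly random noise; only the manifest that
    # pairs them carries the information needed to reconstruct the original.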


I think this touches on the core mismatch between the legal perspective and technical perspective.

Yes, on a technical level, those chunks are random data. On the legal side, however, those chunks are illegal copyright infringement because that is their intent, and there is a process that allows the intent to happen.

I can't really say it better than this post does, so I highly recommend reading it: https://ansuz.sooke.bc.ca/entry/23


Except you've a heckin' problem with Stable Diffusion because you have to argue that the intent is to steal the copyright by copying already existing artworks.

But that's not what people use Stable Diffusion for: people use Stable Diffusion to create new works which don't previously exist as that combination of colors/bytes/etc.

Artists don't have copyright on their artistic style, process, technique or subject matter - only on the actual artwork they output or reasonable similarities. But "reasonable similarity" covers exactly that intent - an intent to simply recreate the original.

People keep talking about copyright, but no one's trying to rip off actual existing work. They're doing things like "Pixar style, ultra detailed gundam in a flower garden". So you're rocking up in court saying "the intent is to steal my clients work" - but where is the clients line of gundam horticultural representations? It doesn't exist.

You can't copyright artistic style, only actual output. Artists are fearful that the ability to emulate style means commissions will dry up (this is true) but you've never had copyright protection over style, and it's not even remotely clear how that would work (and, IMO, it would be catastrophic if it was - there's exactly one group of megacorps who would now be in a position to sue everyone because try defining "style" in a legal sense).


> because you have to argue that the intent is to steal the copyright by copying already existing artworks

Copyright infringement can happen without intending to infringe copyright.

Various music copyright cases start with "Artist X sampled some music from artist Y, thinking it was transformative and fair use". The courts, in some of these cases, have found something the artist _intended_ to be transformative to in fact be copyright infringement.

> You can't copyright artistic style, only actual output

You copyright outputs, and then works that are derived from those outputs are potentially copyrighted. Stable Diffusion's outputs are clearly derived from the training set, basically by definition of what neural networks are.

It's less clear they're definitely copyright-infringing derivative works, but it's far less clearcut than how you're phrasing it.


> But, it's still copyrighted by the original artist as long as they can show "This started as my image, and a machine made a rote mathematical transformation to it" because machine's making rote mathematical transformations cannot create new copyright.

Do you have evidence that this is actually what the courts have decided with respect to NNs?


So what happens if you put a painting into a mechanical grinder? Is the shapeless pile of dust still copyrighted work? I don’t think so.

Maybe?

If you take a bad paper shredder that, say, shreds a photo into large re-usable chunks, run the photo through that, and tape the large re-usable chunks back together, you have a photo with the same copyright as before.

If you tape them together in a new creative arrangement, you might apply enough human creativity to create a new copyrighted work.

If you grind the original to dust, and then have a mechanical process somehow mechanically re-arrange the pieces back into an image without applying creativity, then the new mechanically created arrangement would, I suspect, be a derived work.

Of course, such a process doesn't really exist, so for the "shapeless dust" question, it's pretty pointless to think about. However, Stable Diffusion is grinding images down into neural networks, and then, without a significant amount of human creativity involved, creating images reconstituted from that dust.

Perhaps the prompt counts as human creativity, but that seems fairly unlikely. After all, you can give it a prompt of 'dog' and get reconstituted dust, that hardly seems like it clears a bar.

Perhaps the training process somehow injected human creativity, but that also seems difficult to argue, it's an algorithm.


The owner of that Banksy painting certainly thinks so.

The painting that has several cuts in about 25% of the surface area? I don't think that constitutes a shapeless pile of dust.

So what % does?

The only problem is that computers (I.e. most computers) cannot really generate random numbers.

Can I not then sue all random noise sources, since they are all unprovably encrypted versions of my copyrighted works?


No. Humans decided to include artwork that they did not have any right to use as part of a training data set. This is about holding humans accountable for their actions.

“Did they have a right to use publicly posted images” is up for the courts to decide

Pretty sure that’s already decided. Publicly played movies and music are not available to be used. Why would the same not apply to posted images?

What court case set the precedent that you can't train a neural network on publicly posted movies and audio?

I'd assume the precedent would be about sharing encoder data, which would be covered in bittorrent cases.

"Training a neural network" is an implementation detail. These companies accessed millions of copyrighted works, encoded them such that the copyright was unenforcable, then sell the output of that transformation.


Not being able to reproduce the inputs (each image is contributing single bytes to the neural network) is relevant. Torrent files are a means to exactly reproduce their inputs. Diffusion models are trained to not reproduce their inputs, nor do they have the means to.

If you post a song on your website and I listen to it am I violating your copyright?

If my parrot recites your song after hearing my alleged infringement, and I record its performance and post it on YouTube, is that infringement?

Last one, if I use the song from your website to train an song recognition AI is that infringement?


If I host a song I don't have license to on my website I'm violating copyright by distributing it to you when you listen on my site.

If my parrot recites your song after hearing it, and I record that and upload it to YouTube, I've violated your copyright.

If a big company does the same(runs the song through a non-human process, then sells the output) I believe they're blatantly infringing copyright.


Big Company is not distributing the input images by distributing the neural network. There is no way to extract even a single input image out of a diffusion model.

"right" in the informal sense or in some legal sense?

Legal

Can you clarify? My understanding is that it's very unclear whether there are any legal issues (in most jurisdictions) in scraping for training.

Obviously some fairly reputable organisations and individuals are moderately confident that there isn't, otherwise they wouldn't have done it.


"It's very unclear" in legal cases is synonymous with "it hasn't been challenged in court yet". You say they're moderately confident because they're fairly reputable, but remember that Madoff was a "reputable business man" for the 20 years he ran a ponzi scheme. They don't have to be confident in the legality to do it, they just had to be confident in the potential profit. With openai being values at $10B by Microsoft, I'd say they've successfully muddied the legal waters long enough to cash out.

I don't think you have to reproduce an entire original work to demonstrate copyright violation. Think about sampling in hip hop for example. A 2 second sample, distorted, re-pitched, etc. can be grounds for a copyright violation.

The difference here is that the images aren't stored, but rather an extremely abstract description of the image was used to very slightly adjust a network of millions of nodes in a tiny direction. No semblance of the original image even remotely exists in the model.

This is very much a 'color of your bits' topic, but I'm not sure why the internal representation matters. It's pretty trivial to recreate famous works like the Mona Lisa or Starry Night or Monet's Water Lily Pond. Obviously some representation of the originals exist inside the model+prompt. Why wouldn't that apply to other images in the training sets?

>It's pretty trivial to recreate famous works like the Mona Lisa or Starry Night or Monet's Water Lily Pond.

A recreation of a piece of art does not mean a copy, I've personally seen hundreds of recreations of Edvard Munch's 'The Scream', all of them perfectly legal.

Even in a massively overtrained model, it is practically impossible to create a 1:1 copy of a piece of art the model was trained upon.

And of course that would be a pointless exercise to begin with, why would anyone want to generate 1:1 copies (or anything near that) of existing images ?

The whole 'magic' of Stable Diffusion is that you can create new works of art in the combined styles of art, photography etc that it has been trained on.


It applies to these specific images because there were thousands and thousands of copies in the training set. That’s not true for newer works.

That's not true. As an example of a more recent copyright-protected work that Stability AI consistently reproduces fairly faithfully, I invite you to try out the prompt "bloodborne box art".

Longer term, by analogy, it will then of course turn into a "what color is your neural net" topic.

Which runs into some very interesting historical precedents.

((I wonder if there's a split between people who think AI emancipation might happen this century versus people who think that such a thing is silly to contemplate))


Because you're silently invoking additional data (the prompt + noise seed), which is not present in the training weights. You have the prompt + noise seed for any given output.

An MPEG codec doesn't contain every movie in the world just because it could represent them if given the right file.

The white light coming off a blank canvas also doesn't contain a copy of the Mona Lisa which will be revealed once someone obscures some of the light.


OK so let me encrypt a movie and distribute that. Then you tell people they need to invoke additional data to watch the movie. Also give some hints (try the movie title lol).

If you distribute a random byte stream, and someone uses that as a one time pad to encrypt a movie, then are you distributing the movie?

The answer is of course not, and the same principle applies if someone uses Stable Diffusion to find a latent space encoding for a copyrighted image (the 231 byte number - had to go double check what the grid size actually is).


I think it boils down to one question: can you prompt the model to show mostly unchanged pictures from artists? Then it's definitely problematic. If not, then I don't have enough knowledge of the topic to give a strong opinion. (My previous answer was just a use case that fits your argument.)

> No semblance of the original image even remotely exists in the model

What does this mean? It doesn't mean you can't recreate the original, because that's been done. It doesn't mean that literally the bits for the image aren't present in the encoded data, because that's true for any compression algorithm.


Do you have any examples of recreating an image with these models? Something other than the Mona Lisa or other famous artworks, because they have caused overfitting.

Not to mention that it works by inverting noise. Different noise, different result. Let's recognise the important contribution of noise here.

There are some artists with very strong, recognizable styles. If you provide one of these artists' names in your prompt and get a result back that employs their strong, recognizable style, I think that demonstrates that the network has a latent representation of the artist's work stored inside of it.

So, what you are saying is that it is illegal to paint in the style of another artist? I'm no lawyer, but I'm pretty sure that is completely legal as long as you don't claim your paintings ARE from the other artist.

I was with you right up until the final sentence.

How did "style" become "work"?


Because in some cases, adding a style prompt gives almost the original image: https://www.reddit.com/r/StableDiffusion/comments/wby0ob/it_...

And yet nobody has managed to demonstrate reconstruction of a large enough section of a work that is still under copyright to prove the point.

The only thing so far discovered is either a) older public domain works nearly fully reproduced b) small fragments of newer works or c) "likenesses"


That’s the key question of the lawsuit IMO!

Or it means their style is so easy to recognize that you can see it even when it doesn't exist.

The most common example of this (Greg Rutkowski) is not in StableDiffusion's training set.


No it doesn't; it demonstrates that the model has abstract features and characteristics of this artist stored in it, not their work.

You can't bring back the training images no matter how hard you try.


No, it only demonstrates that it has a representation of their style, not any individual images. You could probably describe that style in a human readable text form and ask a human artist to reproduce it without ever seeing or communicating the original images.


Perhaps different media has different rules? You can’t necessarily apply music sampling rules to text, for example. Eg I don’t think incorporating a phrase from someone else’s poem into my poem would be grounds for a copyright violation.

"Copyright currently protects poetry just like it protects any other kind of writing or work of authorship. Poetry, therefore, is subject to the same minimal standards for originality that are used for other written works, and the same tests determine whether copyright infringement has occurred." [1]

[1] https://scholarship.law.vanderbilt.edu/vlr/vol58/iss3/13/


Can it? I thought up to 15s you can copy it verbatim without violation.

> That’s going to be hard to argue. Where are the copies?

Discovery will show exactly what the base images for training are. You can view that the outputs are derivative works.

I don't think the mechanism is going to shield the violation, and frankly it shouldn't.

License your source material for the purpose and do it right. Doesn't everyone know it's wrong to steal?


People in art school also practice by studying existing art and images.

I think what’s clear is that this is an unprecedented type of use. I’m really interested in seeing how the courts rule on this one as it has wide implications for the AI era.

Because this use is unprecedented, as you say, it's clear that the law wasn't written with this use case in mind. The more interesting question in my mind is what we think the new law should be, rather than what the courts happen to make of the existing law. I.e., I think the answer should come from politicians not from judges.

If not done extremely carefully, it also has wide implications for human artists

You could make the same argument that as long as you are using lossy compression you are unable to infringe on copyright.

That's a huge understatement. 5 billion images to a model of 5GB. 1 byte per image. Let's see if one byte per image would constitute a copyright violation in other fields than neural networks.

You took the images, encoded them in a computer process, and the result is able to reproduce some of those images. I fail to see why the size of the training set in bytes and the size of the model in bytes matters. Especially if, as other commenters have noted, much of the training data is repeated (mentions of thousands of Mona Lisas), so a straight division (training size / parameter size) says nothing about the bytes per copyrighted work.

It will be interesting to see how they legally define the moment where compression stops being compression and starts being an original work.

If I train on one image I can get it right back out. Even two, maybe even a thousand? Not sure what the line would be where it becomes ok vs not but there will have to be some answer.


There only needs to be an answer if it's determined that some number isn't copyright infringement. The easy answer would be to say that the process is what prevents the works from being transformative (and thus copyrightable) and not the size of the training set.

Another thing worth referencing in this context might be hashing. If a few bytes per image are copyright infringement, then likely so is publishing checksums.
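(For concreteness, a sketch of just how little a checksum retains, using sha256 as the example:)

    import hashlib

    image_bytes = b"\x89PNG..." * 1_000_000          # stand-in for a multi-megabyte image file
    digest = hashlib.sha256(image_bytes).digest()

    print(len(image_bytes), len(digest))             # millions of bytes in, 32 bytes out
    # The digest depends on every byte of the input, yet there is no procedure
    # for reconstructing the image (or anything resembling it) from those bytes.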

Once you start recreating copyrighted works from hashes this analogy becomes relevant, until then how can you compare the two when the distinguishing feature is its ability to reproduce the training data.

What is a 1080p MP4 video of a film if not simply a highly detailed, irreversible but guaranteed unique checksum of that original content?

I think this is overstretching it. That would be a checksum that can be parsed by humans and contains artistic value that serves as the basis for claims to copyright. An actual checksum no longer has artistic value in itself and can't reproduce the original work.

Which is why this is framed as compression: it implies that fundamentally SD makes copies instead of (re)creating art. Leaving aside the issue of recreating forgeries of existing works, using the training data for the creation of new pieces should be well covered inside the bounds of appropriation. Demanding anything more than filtering the output of SD for 1:1 reproductions of the training data is really pushing it.

edit: Checksums aren't necessarily unique btw. See "hash collisions".


Overfitting seems like a fuzzy area here. I could train a model on one image that could consistently produce an output no human could tell apart from the original. And of course, shades of gray from there.

Regarding your edit, what are the chances of a "hash collision" where the hash is two MP4 files for two different movies? Seems wildly astronomical.. impossible even? That's why this hash method is so special, plus the built in preview feature you can use to validate your hash against the source material, even without access to the original.


Once you are down to one picture, collisions become feasible given the right environment and resolution of the image.

Pretty sure this is nitpicking about an overused analogy though.


The distribution of the bytes matters a bit here. In theory the model could be over trained against one copyrighted work such that it is almost perfectly preserved within the model.

You can see this with the Mona Lisa. You can get pretty close reproductions back by asking for it (or at least you could in one of the iterations). Likely it overfit due to it being such a ubiquitous image.

if it's sufficiently lossy, yeah. don't know where you draw the line tho. maybe similar to fair use video clips.

Citing fair use is putting the cart before the horse here. The debate is around whether or not the Stable Diffusion training and generation processes can be considered transforming the works to create a new one in the same way we do for humans, which allows for the fair use of video clips. To say that it would be similar to fair use is assuming the outcome as evidence, aka begging the question.

> It is, in short, a 21st-century collage tool.

Interesting that they mention collages. IANAL but it was my impression that collages are derivative work if they incorporate many different pieces and only small parts of the original. Their compression argument seems more convincing.


Compression down to two bytes per image?

You run into the pigeonhole argument. That level of compression can only work if there are less than seventy thousand different images in existence, total.

Certainly there’s a deep theoretical equivalent between intelligence and compression, but this scenario isn’t what anyone means by “compression” normally.


When gzip turns my 10k character ASCII text file into a 2kb archive, has it "compressed each character down to a fifth of a byte"? No, that's a misunderstanding of compression.

Just like gzip, training Stable Diffusion certainly removes a lot of data, but without understanding the effect of that transformation on the entropy of the data it's meaningless to say things like "two bytes per image", because (like gzip) you need the whole encoded dataset to recover the image.

It's compressing many images into 10GB of data, not a single image into two bytes. This is directly analogous to what people usually mean by "compression"
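To make the "whole archive" point concrete, a quick sketch:

    import gzip

    text = ("all work and no play makes jack a dull boy\n" * 250).encode()  # ~10k ASCII characters
    archive = gzip.compress(text)
    print(len(text), len(archive))   # e.g. 10750 bytes in, a few hundred bytes out

    # It makes no sense to say any individual character was "compressed to a
    # fraction of a byte": no single character can be recovered without decoding
    # the archive as a whole. What shrinks is the redundancy across the dataset.
    assert gzip.decompress(archive) == text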


One idea I had was to try to recreate the original using a prompt. If you succeed, it should be obvious that the original was in the training set?

The LAION-5B dataset is public, so you can check directly whether a picture is in there or not. StabilityAI only takes a very limited amount of information from each individual picture, so for Stable Diffusion to closely reproduce a picture it would need to appear quite frequently in the dataset. There are examples of this, such as old famous paintings, "bloodborne box art" and probably many others, though I haven't looked deeply into it.

No, the "original" is in the (likely detailed) prompt you give it.

This is quite easy to do, but the results can be off in funny ways. For example, try putting this into SD with Euler a sampling and a cfg scale of 10:

"The Night Watch, a painting made by Rembrandt in 1642"

It generates a convincing low-res imitation about half the time, but it also has a tendency to make the triband flag into an American flag, or put an old ship in the background, or replace the dark city arch with a sunset...

If you keep refining the prompt, you can get closer, but at that point you're just describing what the painting should look like, rather than asking the model to recall an original work.
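For anyone who wants to try the experiment, here's a rough sketch using the Hugging Face diffusers library (the library, model id, and step count are my assumptions; the parent only specified the prompt, the Euler a sampler, and a cfg scale of 10):

    import torch
    from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    # "Euler a" corresponds to the ancestral Euler scheduler in diffusers.
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

    image = pipe(
        "The Night Watch, a painting made by Rembrandt in 1642",
        guidance_scale=10,        # the cfg scale mentioned above
        num_inference_steps=30,
    ).images[0]
    image.save("night_watch_attempt.png")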


I would agree that we're acting like hypocrites here. But unlike Stable Diffusion, GitHub didn't release their model, so it's extremely hard to know what's going on inside Copilot. On the other hand, we have the Stable Diffusion model and we can see whether or not it has memorized copyrighted images.

You seem to be under the impression that SD can only generate original art. However, it will literally recreate existing paintings for you if you just prompt it with the title. Identical composition and everything.

I’m curious. Can you give an example of that happening for a painting that’s still in copyright?

> That’s going to be hard to argue. Where are the copies?

If you take that tack, I'll go one step further back in time and ask "Where is your agreement from the original author who owns the copyright that you could use this image in the way you did?"

The fact that there is suddenly a new way to "use an image" (input to a computer algorithm) doesn't mean that copyright magically doesn't also apply to that usage.

A canonical example is the fact that television programs like "WKRP in Cincinnati" can't use the music licenses from the television broadcast if they want to distribute a DVD or streaming version--the music has to be re-licensed.


My assumption would be 'fair use'. Artists themselves make use of this extremely often, like when doing paintovers on copyrighted images (VERY common) or fan art where they paint trademarked characters (also VERY common). These are often done for commission as well.

AFAIK, downloading and learning from images, even copyrighted images, falls under fair use; this is how practically every artist today learns how to draw.

Stable Diffusion does not create 1:1 copies of artwork it has been trained on, and its purpose is quite the opposite. There may be cases where the transformative aspect of a generated image could be argued as not being transformative enough, but so far I've only seen one such reproducible image, which would be the 'bloodborne box art' prompt, which was also mentioned in this discussion.


> when doing paintovers on copyrighted images (VERY common)

What are you talking about? I've been doing drawing and digital painting as a hobby for a long time and tracing is absolutely not "VERY common". I don't know anybody who has ever done this.

> fan art where they paint trademarked characters (also VERY common)

This is true in the sense that many artists do it (besides confusing trademark law and copyright law: the character designs are copyright-protected, trademarks protect brand names and logos). However, it is not fair use (as far as I'm aware at least, I'm not a lawyer). A rightholder can request for fanart to be removed and the artist would have to remove it. Rightsholders almost never do, because fanart doesn't hurt them.

There are also more examples of it reproducing copyright-protected images; I pulled the "bloodborne box art" prompt from this article: https://arxiv.org/pdf/2212.03860.pdf But I agree with you that reproducing images is very much not the intention of Stable Diffusion, and it's already very rare. The way I see it, the cases of Stable Diffusion reproducing images too closely are just a gotcha for establishing a court case.


It's going to be very hard for them to argue against Stable Diffusion and not reach the conclusion that people looking at art are doing exactly what training the AI did.

You looked at my art, now I can use copyright against the copies in your brain.


By forcing the AI community to develop technology to avoid replication of training examples, they might end up disclosing every bit of human copying as well. Whatever can detect copyright violations in AI output can be applied to human works too.

I’m afraid we won’t like the outcome.

This feels like the argument of a money launderer.

So what is the end goal of this? For copyright to transfer at every step? Say that precedent gets set, then what? Licensing schemes get set up, and any piece of media that is put into these systems will result in the artist getting some kind of payment in return. Cool, that sounds great. Except... who's paying?

The conglomerates who already have a bunch of IP they can feed into those systems, who can afford to purchase new works or create them through brute force (i.e. cheaply employed artists) using their already massive amounts of capital. These are the entities that will have complete and total control over the best versions of the tools that artists say will bring about their doom. You better fucking believe they'll have their own licensing system too.

Copyright is a prison built for artists by big business, successfully marketed to artists as being a home.


Mostly I agree, but:

> Copyright is a prison built for artists by big business, successfully marketed to artists as being a home.

I think (continuing this analogy) that copyright is indeed a home, but very few artists can afford to buy their own home, so they rent from corporate landlords, and the bigger ones are the worst ones to be tenants of.


> So what is the end goal of this?

For lawyers to make money. That is the goal of much litigation.


“Hav­ing copied the five bil­lion images—with­out the con­sent of the orig­i­nal artists—Sta­ble Dif­fu­sion relies on a math­e­mat­i­cal process called dif­fu­sion to store com­pressed copies of these train­ing images, which in turn are recom­bined to derive other images.”

This seems like it’s not an accurate description of what diffusion is doing. A diffusion model is not the same as compression. They're implying that Stable Diffusion is taking the entire dataset, making it smaller, and then storing it. Instead, it's just learning patterns about the art and replicating those patterns.

The “compression” they’re referring to is the latent space representation which is how Stable Diffusion avoids having to manipulate large images during computation. I mean you could call it a form of compression, but the actual training images aren’t stored using that latent space in the final model afaik. So it's not compressing every single image and storing it in the model.

This page says there were 5 billion images in the Stable Diffusion training dataset (although that may not be true, as I see online it's closer to the 2 billion mark). A Stable Diffusion model is about 5 GB. 5 GB / 5 billion is 1 byte per image. It's impossible to fit an image in 1 byte. Obviously the claim about it storing compressed copies of the training data is not true. The size of the file comes from the weights in it, not because it's storing "compressed copies". In general, it seems this lawsuit is misrepresenting how Stable Diffusion works on a technical level.


If you can put a bunch of large things together into a small file and then later (lossily) extract the large things out of that smaller file, I'd argue that's compression, yeah. It doesn't really matter whether it was intended to be set up as a compression algorithm or not, in my opinion. If anything, this approach can be considered a revolution in lossy image compression, even though there's no real market for that at the moment.

If someone finds a way to reverse a hash, I'd also argue that hashing has now become a form of compression.

I think in 5 billion images there are more than enough common image areas to allow for average compression to become lower than a single byte. This is a lossy process, it does not need a complete copy of the source data, similar to how an MP3 doesn't contain most of the audio data fed into it.

I think the argument that SD revolves around lossy compression is quite an interesting one, even if the original code authors didn't realise that's what they were doing. It's the first good technical argument I've heard, at least.

All of those could've been prevented if the model was trained on public domain images instead of random people's copyrighted work. Even if this lawsuit succeeds, I don't think image generation algorithms will be banned. Some AI companies will just have spent a shitton of cash failing to get away with copyright violation, but the technology can still work for art that's either unlicensed or licensed in such a way that AI models can be trained based on it.


“It is a par­a­site that, if allowed to pro­lif­er­ate, will make artists extinct.”

This is the fundamentally flawed and misguided argument that can literally be applied to any technological progress to curtail advancement.

Imagine if the medical tricorder (a device from Star Trek that does maybe 99% of what modern doctors do) is suddenly invented today. Doctors could use this argument to defend their livelihoods, but they lose sight of the fact that doctors don’t exist because society needs to employ them. They exist because we have a problem of people getting sick… if more sick people can be helped then great! That is an advancement for society because more lives are saved (as opposed to more doctors being employed), and then simply the standard for what doctors are expected to do is raised to a higher level, since someone is still expected to operate tricorders.

Similarly, artists exist for their output for society. If these AI models can truly fulfill the needs of society that artists currently output (that is debatable), then that simply raises the bar for what artists are expected to output. But it doesn’t change the fact that we only care about the output for society (which can never be truly harmed by advancements such as this because if someone can not outperform the AI then they are redundant), not the fact that artists exist.

Put another way, many current artists who fear this are simply doing the generative work of AI already… manually. The AI is democratizing art so that the lowest hanging fruit of art is now accessible to more people. The bar for art has now been raised so that the expected quality of newer work is to be much higher. Just like how after computer aided design was invented the quality of movie effects, digital game art, etc, all jumped. Progress means those doing current “levels” of art will need to add this tool to their repertoire to build more impressive things. Rent seeking and staying in place (from an artistic advancement point of view) is not the answer.

As someone else put it in a comment here, looking at other works of art and learning how to make art and creating new art from this influence is literally how humans have been doing it for eons. Everyone is standing on the shoulders of giants. This AI merely makes it explicit so I guess it brings out the rent seeking feeling since people must feel it’s now possible to quantify the amount their own work contributed to something. I guess if you don’t want anyone to be influenced by it—AI included—the traditional way is to not show it to anyone.


> This is the fundamentally flawed and misguided argument that can literally be applied to any technological progress to curtail advancement

Let's stop for a moment and define advancement (or "progress", as it's sometimes called). It's always tacit, and never explicitly defined, and I think it bears examination.

By advancement/progress, I'm taking the argument to mean "betterment". i.e. When we say "advances in science", we're usually referring to things getting better, as a result of more science.

However, science/technology are not good in themselves. They're just tooling. You need to stop and ask which direction you've taken this advancement in the tooling, because whether you meant it or not, both advancement and progress have a direction.

> Similarly, artists exist for their output for society. If these AI models can truly fulfill the needs of society that artists currently output (that is debatable), then that simply raises the bar for what artists are expected to output

I somewhat agree, and would say this is very much like when the camera was invented. Artists lamented that they no longer had a purpose, until they invented one for themselves with surrealism. Art shifted from visual reproduction to meaning and feeling.

Surrealism is then a good example of the direction that the advancement in science (of the creation of the camera) took. What is the direction that AI generated art is taking?


Maybe AI is the visual shortcut of an Excel Pivot Table: people use it, slice and dice, and get further insights to some purpose.

It's a tool. Folks get excited about statistics, massive datasets, and computer science is hip again.

Would we not want a push for folks to experience the exacting caress of an unforgiving compiler?

I thought this stuff would be easy!

Hopefully what doesn't happen is a fragmentation of folks into content caverns, where they may gaze into a mirror and see exactly what they wish, day after day. A literal instantiation of Plato's Cave, where scientific progress is frozen and forgotten.


I tend to use the argument, "if we stopped developing technology because it threatened some people's livelihoods, a 'calculator' would still refer to a person."

Seconded - you might even say a 'computer': https://en.wikipedia.org/wiki/Computer_(occupation)

If something is useful, it will be used and developed. I certainly can't see where all this is going, but I doubt the resistance will be more than a speed bump.

> This is the fundamentally flawed and misguided argument that can literally be applied to any technological progress to curtail advancement.

No, this is the only fundamentally correct way to view this. Before the existence of the printing press, we didn't need copyright law. Yet all that the printing press did was make transcribing books by hand faster.

Quantitative changes enabled by technology are qualitative changes. And not every form that a qualitative change takes is one that leaves the world better off than we found it.


Artists do not have an inalienable right to be paid to do art. There are many reasons why you might argue that the tech is harmful, but that it "will make artists extinct" is not a good one.

And authors don't have any natural right to prevent people from copying their books.

And yet, we have decided that society is better off when authors can make money off their work.


That's exactly OP's point: the output to society is what matters, not the simple existence of a career. It can definitely be argued that society will be the worse because SD replaces artists, but we shouldn't assume a priori that eliminating a specific job is a bad thing.

name one qualitative change resulting from quantitative change that did not benefit the world as a whole?

Atom bombs.

Coal mining (if coal hadn't ever been mined, we'd probably have gotten large scale wind- and hydroelectric power much sooner).

Whitney's cotton gin, albeit due to it coming before automated cotton picking.


All of the good changes have also come with new laws that forbid many of the bad uses thereof. Not every form of use of a technological invention is a net positive, and laws reflect that, by forbidding the negative uses.

The automobile revolutionized transportation, but also came with licensing requirements. (And more recently, we are finding it to be responsible for a health and climate catastrophe, necessitating new restrictions on fuel economy, leaded gasoline, ICEs, etc.) You didn't need a license to walk, ride a bicycle, or ride a horse, but when we started putting people behind thousands of pounds of steel, all of a sudden we needed to come up with a myriad of new rules and restrictions on how automobiles could be used.

The printing press came with copyright laws. New and more destructive weapons and tools and chemicals came with more restrictions regarding their possession and expected use. The telephone and the computer combined allow robo calling and spam on an industrial level, and those particular uses of those new technologies are forbidden. Radio revolutionized communication, but we don't just let any random asshole blast static into the spectrum. We have narrowly curtailed, permitted and forbidden uses of it.

It would be far easier to name the technologies that net-benefited society, and did not need new rules around them, to prevent their destructive and damaging uses.

This one isn't looking to be one of them.


But this is not correct. The AI exists because it stole the work of the original artists (and combined it somehow). Your argument would be correct if the AI could come up with an original work that would compete with an artist. But the AI's work is not original.

What exactly is "original" work? Is Fortnite "original" or is it a clone of PUBG? Are all portraits of women just derivatives of The Mona Lisa?

>artists exist for their output for society

People will still be creative and make art even without society consuming their output. But society can create incentives to reward people for making art.


But incentives to maximize output are different from incentives to preserve current artists’ livelihoods.

We can create incentives so more people become doctors, but the purpose of those incentives, at the end of the day, must be to maximize life saving, whether it's done by an AI or a doctor using an AI.


It seems to me the communal voice of HN varies widely on copyright issues depending on who is getting sued and who is getting potentially hurt by violations.

People who generally make less money than programmers - writers, artists, musicians - should stop their whining and their unfair uses of copyright to control their creative output.

Programmers who are getting shafted by big corporations using their code to build machine learning models need urgent protection and fairness.

It probably doesn't help that almost every copyright argument here gets made with the understanding of copyright prevalent in the American model - that is to say that copyright exists to promote the progress of science and the useful arts - whereas in many countries copyright is understood as existing because the creator of something has a moral right to control and ownership of whatever they made.

Obviously this lawsuit is in the U.S but I suppose even if it loses here other lawsuits in other countries, with a different understanding of copyright, might succeed.


I'm personally for a massive reduction of copyright terms to align them with patents, but that's beside the point: even within the current copyright system, that argument against machine learning makes no sense.

The proof of that is that they even had to lie about how Stable Diffusion works on this website to make it convincing; that's a clear sign that they are in the wrong.

Even they seem to have concluded that the truth won't get them very far.

> Obviously this lawsuit is in the U.S but I suppose even if it loses here other lawsuits in other countries, with a different understanding of copyright, might succeed.

The cat is just out of the bag anyways, if machine learning is outlawed in the US, it will flourish somewhere else instead.


I meant it more that if the suit fails in the U.S., similar suits might work in other countries.

It is a multi-front battle they face.


I think the pushback from programmers has a different motivation.

Programmers love to share code, but they don't want to share it with corporations who don't give back. We invented copyleft as a way to (ab)use the legal system to open up everything. We hate copyright and "love" "copyleft" as a means to weaken copyright.

It would be like if artists gave away all of their art, except not to corporations who hog their copyrights.


I'd argue that many artists are also fine with their art being reused and reposted elsewhere, as long as it's done with the necessary attribution.

For example, I don't think Sarah Andersen, one of the plaintiffs here, would've reached the popularity she has now if it weren't for her comics being shared on meme sites and social media, and I don't see any "do not repost" watermarks on her recent work, unlike other artists who do object to resharing on other platforms.

I think many artists and copyleft programmers have the same positions. The biggest difference is that drawings are considered "art" and code generally is not, despite the fact that a technical drawing and business logic are both hardly artistic and mostly an expression of skill, whereas the demo scene, the indie gaming scene, and many online artists are very much about expressing themselves within a given set of boundaries.

When Microsoft steals code licensed to GitHub, people consider that to be a license dispute more than a copyright dispute. I'd argue there is no difference at all between artists and programmers when it comes to their work being absorbed and then reproduced by an AI company.


Copyright does not exist in every country. You have intellectual rights and commercial rights but those are different from the US concept of copyright.

In France, for instance, an artist cannot transfer the moral rights over their art, but can transfer the commercial rights, and there is no copyright concept as such (which makes it funny when sites copy the US and have a copyright mention at the bottom).


That's also true, but copyright does exist in a lot of countries - often with moral rights as the basis for the concept.

An artist can look at images for reference, and draw something new inspired by them. Why does it matter if a software tool can do this much faster?

If the artist makes the image very similar to one of the reference photos, it may be a copyright violation. It doesn't matter if the artist used a pencil or software to create the new work.

Current AI image generation does, however, make it easy to unknowingly violate copyright. If it generates an image similar to something else out there you wouldn't know.

I don't know much about copyright law though, am I wrong?


IIRC it can be a problem copyright-wise if I paint from a photo reference, because it can infringe the photographer's IP.

I don't think anyone can be sued for making a drawing of some photo.

You can claim intellectual property for comic characters etc, but not photos.

You could get someone to go take a similar photo at the same place, and use tools to enhance it.


There is a somewhat popular lawsuit right now which argues exactly that (and is reported to go into the next instance): https://petapixel.com/2022/12/08/photographer-loses-plagaris...

I don't know if you're right or wrong, but it seems plausible that we could create a database of copyrighted images to check against.

Every original image is copyrighted. You're suggesting making a digital copy of every image there is to check that AI isn't generating digital copies of every image there is.

Not a copy, a hash or fingerprint. Just enough data to measure if it's substantially similar.

But yes, it may be infeasible to index and compare against every image ever uploaded.
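One way such a fingerprint could work is perceptual hashing, for example with the imagehash package (a sketch only; the file names are placeholders):

    from PIL import Image
    import imagehash

    known = imagehash.phash(Image.open("copyrighted_work.png"))
    candidate = imagehash.phash(Image.open("generated_output.png"))
    # Subtracting two hashes gives a Hamming distance; a small distance means the
    # images are perceptually similar even if the underlying bytes differ.
    print(known - candidate)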


Couldn't I just add a few nonsense bytes into my images to change the hash/fingerprint?

If I understand correctly, wouldn't a hash database of <just the training set> be larger than the actual model? (in fact by 1 or 2 orders of magnitude?)

Approximately, yes.

Hashing only the training set should suffice.


Well, there was a copyright case in Europe recently where an artist had taken a photograph, flipped it horizontally, and painted it.

It was deemed an original work by the court.

I can’t see how, with such a precedent, they could rule that SD doesn’t produce original works.

https://www.rangefinderonline.com/news-features/industry-new...


> It was deemed an original work by the court.

The resolution is much weirder than that: the court argued that the pose isn't original enough for the photo to deserve copyright at all, independently of what the plagiarist did with it.


> Sta­ble Dif­fu­sion relies on a math­e­mat­i­cal process called dif­fu­sion to store com­pressed copies of these train­ing images, which in turn are recom­bined to derive other images. It is, in short, a 21st-cen­tury col­lage tool.

Just no, that's not how any of that works.

I guess that lie is convenient to legitimate the lawsuit.


It's a pretty funny assertion. The whole point of ML models is to take training data and learn something general from it, the common threads, such that it can identify/generate more things like the training examples. If the model were, as they assert, just compressing and reproducing/collaging training images then that would just indicate that the engineers of the model failed to prevent overfitting. So basically they're calling StabilityAI's engineers bad at their job.

As a side discussion, is there any research model which tries to do what they describe? Like overfitting to the maximum possible to create a way to compress data. It might be useful in different ways.

Yes, look at NeRF (neural radiance fields) and SIREN (sinusoidal representation networks, from the paper "Implicit Neural Representations with Periodic Activation Functions").

The papers I'm finding on those look truly amazing! Thanks a lot for the insights
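For a flavour of the SIREN idea, a minimal sketch (my own, heavily simplified; the real paper also uses a careful initialisation scheme): overfit a tiny MLP with sine activations so that (x, y) coordinates map to (r, g, b) values, and the network weights become an implicit, lossy representation of that one image.

    import torch
    import torch.nn as nn

    class SirenLayer(nn.Module):
        def __init__(self, in_features, out_features, w0=30.0):
            super().__init__()
            self.linear = nn.Linear(in_features, out_features)
            self.w0 = w0

        def forward(self, x):
            return torch.sin(self.w0 * self.linear(x))

    # (x, y) -> (r, g, b); training (omitted here) minimises MSE against pixel values.
    model = nn.Sequential(SirenLayer(2, 256), SirenLayer(256, 256), nn.Linear(256, 3))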

That's a lie, sure, but if they had instead claimed:

The output of stable diffusion isn't possible without first examining millions of copyrighted images

Then the suit looks a little more solid, because (as you pointed out) it isn't possible for the Stable Diffusion owner to know which of those copyrighted images had clauses that prevent Stable Diffusion training and similar usage.

The whole problem goes away once artists and photographers start using a license that explicitly forbids any use of the work as training data for automated training.


> The whole problem goes away once artists and photographers starting using a license that explicitly removes any use of the work as training data for any automated training.

A license which should be opt-in, not opt-out.

Of course, it’s opt-out because they know, fundamentally, that most artists would not want to opt-in.


> A license which should be opt-in, not opt-out.

I dunno if the opt-in even needs to happen at the legislation level.

After all, once Creative Commons adds that clause to their most popular license, it's game over for training things like Stable Diffusion.

I'm thinking that maybe the most popular software licenses can be extended with a single clause like "usage as training data not allowed".

Of course, we cannot retroactively apply these licenses so the current model will still be able to generate images/code; they just won't be able to easily use any new ones without getting into trouble.


How does it work then? :)

The opposite way: the training images are there to help the model generalize features.

Reproducing parts of existing images in the dataset is called overfitting and is considered a failure of the model.


how do you measure success?

i wrote an OCR program in college. we split the data set in half. you train it on one half then test it against the other half.

you can train stable diffusion on half the images, but then what? you use the image descriptions of the other half and measure how similar they are? in essence, attempting to reproduce exact replicas. but i guess even then it wouldn't be copyright infringement if those images weren't used in the model. more like me describing something vividly to you and asking you to paint it, and then getting angry at you because it's too accurate


You would not need half of the images to perform that test. No more than a handful of images is needed to prove that the text representation will not produce an image identical to a given image that has been described.

They don't even produce the same image twice from the same description and a different random seed.


FID score is a measure of success.

Instead of aiming to reproduce exact replicas, you use a classifier and retrieve the input of the last layer. Do it for both generated and original inputs, and then measure the differences in the statistics.

Wikipedia has a good article on this.
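A hedged sketch of the FID computation on two sets of feature vectors (the features themselves would come from an Inception-style classifier, computed elsewhere):

    import numpy as np
    from scipy import linalg

    def fid(real_feats, fake_feats):
        mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
        cov_r = np.cov(real_feats, rowvar=False)
        cov_f = np.cov(fake_feats, rowvar=False)
        covmean = linalg.sqrtm(cov_r @ cov_f)
        if np.iscomplexobj(covmean):
            covmean = covmean.real   # numerical noise can leave tiny imaginary parts
        return float(np.sum((mu_r - mu_f) ** 2) + np.trace(cov_r + cov_f - 2 * covmean))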


Computerphile has friendly introductions to just about everything: https://youtu.be/1CIpzeNxIhU

Diffusion models learn a transformation operator. The parameters are adjusted such that the operator maximises the evidence lower bound, or in other words, increases the likelihood of observing a slightly less noisy version of the input.

The guidance component is a vector representation of the text that changes where we are in the sample space. A change in the sample space changes likelihood so for the different prompts the likelihood of the same output image for the same input image will be different.

Since the model is trained to maximise the ELBO, it will produce a change closer to the prompt.

A good way to think about it is this: given a classifier, I can select a target class and compute the derivative of the input with respect to the target class, and apply the derivative to the input. This puts it closer to my target class.

From the perspective of some models (score models), they produce a derivative of the density (of the samples), so it’s a bit similar to computing a derivative via classifier.
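A rough PyTorch sketch of that "nudge the input toward a target class" idea (the classifier and input are placeholders; this is the intuition, not how Stable Diffusion itself is implemented):

    import torch

    def step_toward_class(classifier, x, target_class, step_size=0.01):
        x = x.clone().requires_grad_(True)
        score = classifier(x)[:, target_class].sum()
        score.backward()
        # Move the input slightly in the direction that raises the target-class score.
        return (x + step_size * x.grad).detach()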

The above was concerned with what the NN was doing.

The algorithm applies the operator for a number of steps, and progressively improves the image. In some probabilistic models, you can think of this as an inverse of a stochastic gradient descent procedure (meaning a series of steps) that, with some stochasticity, reaches a high-value region of the density.

However, it turns out that learning this operation doesn’t have to be grounded in probability theory and graphical models.

As long as the NN learns a sufficiently good recovery operator, diffusion will construct something based on the properties of the dataset that has been used.

At no point, however, are there condensed representations of images, since the NN is not learning to produce an image from zero in one step. It merely learns to recover some operation applied to the input.
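To make that last point concrete, here is a toy sketch of a single DDPM-style training step (shapes and the noise schedule are simplified, and `model` is any placeholder network taking a noisy image and a timestep): the network is only asked to predict the noise that was added, not to emit any stored image.

    import torch
    import torch.nn.functional as F

    def training_step(model, x0, num_timesteps=1000):
        t = torch.randint(0, num_timesteps, (x0.shape[0],))
        # Toy cosine schedule for the cumulative signal fraction alpha_bar(t).
        alpha_bar = torch.cos(t.float() / num_timesteps * torch.pi / 2) ** 2
        alpha_bar = alpha_bar.view(-1, 1, 1, 1)
        noise = torch.randn_like(x0)
        x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * noise  # corrupt the input
        return F.mse_loss(model(x_t, t), noise)                       # learn to undo it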

For the probabilistic view, read Denoising Diffusion Probabilistic Models and its references, in particular on Langevin dynamics. It includes citations to score models as well.

For the non probabilistic component, read Cold diffusion.

For using the classifier gradient to update an image towards another class, read about adversarial generation via input gradients.


> A good way to think about it is this: given a classifier, I can select a target class and compute the derivative of the input with respect to the target class, and apply the derivative to the input. This puts it closer to my target class.

excellent description, thanks


Sometimes I have to wonder about the hypocrisy you can see on HN threads. When it's software development, many here seem to understand the merits of a similar lawsuit against Copilot[1], but as soon as it's a different group, such as artists, then it's "no, that's not how a NN works" or "the NN model works just the same way as a human would understand art and style."

[1] https://news.ycombinator.com/item?id=34274326


It's not hypocrisy, it's diversity.

HN is not a person, it's a forum with lots of people with different opinions. Depending on dozens of factors (time of day, title of the article, who gets in first) different opinions will dominate.

I've seen threads on Copilot that overwhelmingly come down in favor of Microsoft and threads on Stable Diffusion that come down hard against it. Also, even in a thread that has a lot of one opinion, there are always those who express the opposite view.


Funny thing - the forum works like a language model. It doesn't have one set personality, but it can generate from a distribution of people. The language model can generate from a distribution of prompts, which might be persona descriptions.

> Out of One, Many: Using Language Models to Simulate Human Samples

https://arxiv.org/abs/2209.06899


Aren't they different?

Stable Diffusion is about closed to open.

Copilot is about open to closed.

The Stable Diffusion version of Copilot would be something like

"Give me a cart checkout algorithm in the style of Carmack, secure C style."

And that would be fine, if the destination code's license were just as open as--or even less restrictive than--the source code's license (relicensing rules permitting).

What could be the issue is if the generated source becomes even more closed or proprietary, which defeats the original source's intent.

Is that right?


I suspect it's different people. There's a kind of bias where it seems like everyone else on a forum is all one person who behaves super inconsistently, I've thought it as well.

At the very least, Stable Diffusion is much different than Copilot in terms of the model license. I, you, and all the artists have irrevocable access to the model (in practical terms; I'm not interested in discussing whether they can somehow legally strong-arm people out of using the model).

We only have limited access to Copilot, and it is impractical for almost anyone else on earth to train a similar model, while we are 100% sure it is possible to obtain the dataset or redo the training of SD. Just from a pure utilitarian point of view, it's much easier to support fighting against Copilot than SD.


disregarding the access part, i say copilot also does not violate copyright, in so far as it only reproduces insubstantial portions of existing works.

If you asked copilot to reproduce an existing work, then surely that violates copyright - in the same way you can ask SD to reproduce one of the training data (which would violate copyright in the same way).

But both the training and the usage of these ML models do not violate copyright. Only when someone produces a copyrighted work from it does that particular _usage_ instance violate copyright, and it does not invalidate any other usages.


How do you know it's the same people making those comments?

I believe Copilot was giving exact copies of large parts of open source projects, without the license. Are image generators giving exact (or very similar) copies of existing works?

I feel like this is the main distinction.


Not large parts of open source projects. It was one function that was pretty well known and replicated. The author prompted with a part of the code, and the model finished the rest including the original comments.

There are two issues here

- the model needs to be carefully prompted (goaded) into copyright violation, so it is instigated to do it by excessive quoting from the original

- the replicated codes are usually boilerplate, common approaches or "famous" examples from books; in other words they are examples that appear in multiple places in the training set as opposed to just once

Do generic code, boilerplate, and API calls deserve protection? Maybe the famous examples do, but not every replicated piece of code does.


Copilot didn't just spit out the fast inverse square root, it spat out someone's entire "about" page in HTML, name and all. This was just some guy's blog, not a commonly replicated algorithm from a book.

Furthermore, copyright infringement doesn't stop being copyright infringement if you do it based on someone else's copyright infringement. Just because someone else decided to rip the contents of a CD and upload it to a website doesn't mean I'm now allowed to download it from that website.

Copyright law does include an originality floor: you can't copyright a letter or a shape (unless you're a billion-dollar startup), in the same way you can't copyright fizzbuzz or hello world. I don't think that's relevant for many of the algorithms Copilot will generate for you, though.

If simple work doesn't deserve protection, the pop music industry with their generic lyrics and simple tunes may be in big trouble. Disney as well, with their simplistic cartoon characters like Donald Duck and Mickey Mouse.

Personally, I think copyright laws are extremely damaging in their duration and restrictions. IP law in a small number of countries actually allows for patenting algorithms, which is equally silly. International IP law currently gets in the way of society, in my opinion.

However, without short term copyright neither programmers nor artists will be happy and I don't think anyone but knock-off companies will be happy with such an arrangement. Five or ten years is long enough for copyright in my book, but within those five or ten years copyright must remain protected.


> Are image generators giving exact (or very similar) copies of existing works?

um, yes.[1][2] What else would they be trained on?

According to the model card:

[1] https://github.com/CompVis/stable-diffusion/blob/main/Stable...

it was trained on this data set (which has hyperlinks to images, so feel free to peruse):

[2] https://huggingface.co/datasets/laion/laion2B-en


> What else would they be trained on?

why does it matter how it was trained? The question is, does the generative AI _output_ copyrighted images?

Training is not a right that the copyright holder owns exclusively. Reproducing the works _is_, but if the AI only reproduces a style, but not a copy, then it isn't breaking any copyright.


Yes, because real artists are also allowed to learn from other paintings. No problem there, unless they recreate the exact work of others.

Banning AI from training on copyrighted works is also problematic because copyright doesn't protect ideas, it only protects expression. So the model has legitimate right to learn ideas (minus expression) from any source.

For example, facts in the phone book are not copyrighted; the authors have to mix in fake data to be able to claim copyright infringement. Maybe the models could finally learn how many fingers to draw on a hand.


These models produce a lot of “in the style of” content, which is different from an exact copy. Is that different enough? I guess that’s what this lawsuit is going to be about.

Yeah what's considered a copy or not is a grey area. Here's a good example of that: https://news.ycombinator.com/item?id=34378300

But artists have been making "in the style of" works for probably millennia. Fan art is a common example.

I suppose the advent of software that makes it easy to make "in the style of" works will force us to get much more clear on what is and isn't a copy. How exciting.

However, I don't see how the software tool is directly at fault, just the person using it.


I've seen some overtrained models. they keep showing the same face over and over again. surely from the training data. i don't think you can argue against stable diffusion as a whole, but maybe specific models that haven't muddled the data enough to become something unique

It's a small industry to fine-tune a model on your photos to generate fantasy images of yourself / to see yourself in a different way.

It's only hypocrisy if you think that HN is made up of a single person commenting under multiple accounts and not a diverse group of people with varying opinions.

Indeed

Instead of improving the world and creating better tools, they want to sue each other. I thought those times were over.

Or maybe it is just bias against Microsoft

