OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling’s Harry Potter series::A new research paper laid out ways in which AI developers should try and avoid showing LLMs have been trained on copyrighted material.
And does this apply equally to all artists who have seen any of my work? Can I start charging all artists born after 1990, for training their neural networks on my work?
Learning is not and has never been considered a financial transaction.
Actually, it has. The whole consept of copyright is relatively new, and corporations absolutely tried to have people who learned proprietary copyrighted information not be able to use it in other places.
It’s just that labor movements got such non-compete agreements thrown out of our society, or at least severely restricted on humanitarian grounds. The argument is that a human being has the right to seek happiness by learning and using the proprietary information they learned to better their station. By the way, this needed a lot of violent convincing that we have this.
So yes, knowledge and information learned is absolutely withing the scope of copyright as it stands, it’s only that the fundamental rights that humans have override copyright. LLMs (and companies for that matter) do not have such fundamental rights.
Copyright by the way is stupid in its current implementation, but OpenAI and ChatGPT does not get to get out of it IMO just because it’s “learning”. We humans ourselves are only getting out of copyright because of our special legal status.
You kind of do. Fair use protects reverse engineering, indexing for search engines, and other forms of analysis that create new knowledge about works or bodies of works. These models are meant to be used to create new works which is where the “generative” part of generative models comes in, and the fact that the models consist only of original analysis of the training data in comparison with one another means as your tool, they are protected.
https://en.wikipedia.org/wiki/Fair_use
Fair use only works if what you create is to reflect on the original and not to supercede it. For example if ChatGPT gobbled up a work on the reproduction of firefies, if you ask it a question about the topic and it just answers, that’s not fair use since you made the original material redundant. If it did what a search engine would do and just tell you that “here’s where you can find it, you might have to pay for it”, that’s fair use. This is of course US law, so it may be different everywhere, and US law is weird so the courts may say anything.
That’s the gist of it, fair use is fine as long as you are only creating new information and only use the copyrighted old work as is absolutely necessary for your new information to make sense, and even then, you can’t use so much of the copyrighted work that it takes away from the value of it.
Otherwise if I pirated a movie and put subtitles on it, I could argue it’s fair use since it’s new information and transformative. If I released the subtitles separately, that would be a strong argument for fair use. If I included a 10 sec clip in it to show my customers what the thing is like in action, then that may be argued. If it’s the pivotal 10 seconds that spoils the whole movie, that’s not fair use, since I took away from the value of the original.
ChatGPT ate up all of these authors’ works and for some, it may take away from the value they have created. It’s telling that OpenAI is trying to be shifty about it as well. If they had a strong argument, they’d want to settle it as soon as possibe as this is a big stormcloud on their company IP value. And yeah it sucks that people created something that may turn out to not be legal because some people have a right to profit from some pieces of capital assets, but that’s the story of the world the past 50 years.
Ehh, “learning” is doing a lot of lifting. These models “learn” in a way that is foreign to most artists. And that’s ignoring the fact the humans are not capital. When we learn we aren’t building a form a capital; when models learn they are only building a form of capital.
Artists, construction workers, administrative clerks, police and video game developers all develop their neural networks in the same way, a method simulated by ANNs.
This is not, “foreign to most artists,” it’s just that most artists have no idea what the mechanism of learning is.
The method by which you provide input to the network for training isn’t the same thing as learning.
Do we know enough about how our brain functions and how neural networks functions to make this statement?
Yes, we do. Take a university level course on ML if you want the long answer.
My friends who took computer science told me that we don’t totally understand how machine learning algorithms work. Though this conversation was a few years ago in college. Will have to ask them again
ANNs are not the same as synapses, analogous yes, but different mathematically even when simulated.
This is orthogonal to the topic at hand. How does the chemistry of biological synapses alone result in a different type of learned model that therefore requires different types of legal treatment?
The overarching (and relevant) similarity between biological and artificial nets is the concept of connectionist distributed representations, and the projection of data onto lower dimensional manifolds. Whether the network achieves its final connectome through backpropagation or a more biologically plausible method is beside the point.
What do you think education is? I went to university to acquire knowledge and train my skills so that I could later be paid for those skills. That was literally building my own human capital.