"Once, GitHub Copilot suggested starting an empty file with something it had even seen more than a whopping 700,000 different times during training -- that was the GNU General Public License."
@fribbledom It trains from the code, not strictly copying it. Even if it looks the same, it technically still belongs to GitHub's AI
You would think that, but if it manages to copy an entire GPL license under the right circumstances, what else does it happen to just copy, no matter how seldom that is?
@fribbledom the Copilot model is clearly derived from works licensed under the GPL. If any of yours were used, you should have every right to sue them. And since they haven't disclosed the full list of what they used, or even given attribution to any projects besides direct dependencies (I assume - haven't gotten their stuff yet), basically every developer whose work under a common open source license was used as training data has a right to sue them.
@fribbledom It certainly has a shocking usage implication.
I was thinking about ANY AI trained on models the other day when a company that keeps trying to farm copywriter applications included a firm online portfolio requirement. It made me think they weren't hiring, they just wanted training materials for their own project; do they have a right to train skill and style from a professional's material? Is that plagiarism in a loose sense, and how do you prove it?
@fribbledom if it IS a license violation to launder code into projects through GPT-3 refactoring of the idea, how do you prove it? How do you un-train the model if you CAN prove it was used in violation? How would you audit every project that inadvertently used the loosely-plagiarized code?
@fribbledom imagine a website that uses AI to generate licenses in the style of thispersondoesnotexist.com
Truthfully, copyright is only one of the concerns people should be having here.
The GPL 3 includes an explicit patent grant covering the licensed code.
This isn't the case with many other licenses.
So it isn't just an issue of Copilot violating copyright, but of Copilot potentially infringing patents.
Software patents shouldn't even really exist, but since they do ...
I think this is a case where a judge will have to decide. FSF will argue that it's basically copy/pasting snippets of code while MS will argue that it is the magic of learning and these things are wholly new constructs.
I think MS is going to have more to prove here, given AI has no rights, it's not paid a salary, and it actually constitutes property. A defining characteristic of creativity is that it's hard.
In any case, it's gonna be interesting to see...
@fribbledom I mean if it's a verbatim copy I don't see how it couldn't be infringement, and if it's already copied the license text as-is, it's not a ludicrous notion that it would do the same with actual code.
If it's similar I think the line of when it is and isn't derivative is gonna be a hard one and I'm 90% certain they're going to draw it in the wrong place, but I don't know which direction yet.
@fribbledom Funny thought: the "Monkey Picture Trial" might have bearing here. The court found that the primate who pressed the shutter button could not hold the copyright in the photo. This would appear to set a precedent that a nonhuman agent cannot be a "creator", which leaves open whether someone else in the chain must be.
@fribbledom It appears to be similar to when a subcontractor provides me with stolen code. Regardless of whether I know it's stolen - it won't make me immune to the original author's rights.
@fribbledom It sounds to me like GitHub would be responsible for GPL violations.
It likely isn’t even legal for them to have sampled all this GPL code in a proprietary system.
I've read several statements that recommend avoiding GitHub.
Maybe withdrawing all my own code is a good idea...
@fribbledom That is the very same issue human workers also face, for example at my employer: when your daily work is to write open source, you might unknowingly write code which resembles things in proprietary code you have seen. For that reason we are asked to stay away from proprietary code, simple.