"Once, GitHub Copilot suggested starting an empty file with something it had seen more than 700,000 times during training: the GNU General Public License."

That makes me wonder: if an AI copies my code, does the license still apply?

Will we run into a situation where GitHub Copilot keeps "unknowingly" laundering open-source code into commercial projects now?

But since this AI has been trained with data from various licensing models, what happens if the AI combines code from licenses that contradict each other?

This may be the most intriguing experiment in software licensing since the GPL 😄

@fribbledom You mean the part where they started hiding mechanical turks in their AI to do exactly that?

How do we know the plan wasn't about just that from the beginning?

@fribbledom It trains on the code rather than strictly copying it. Even if the output looks the same, it technically still belongs to GitHub's AI


You would think so, but if it manages to copy an entire GPL license under the right circumstances, what else might it copy verbatim, however seldom that happens?

@fribbledom Or when it leaks secret proprietary code into an open source project.

@penguin42 @fribbledom Looking forward to random private keys directly embedded in the code to be leaked through their AI

@phel @fribbledom What does it suggest when prompted with -----BEGIN RSA PRIVATE KEY----- ?

@fribbledom If it causes someone to accidentally open source their code, I'm gonna laugh so hard.

@fribbledom People already noticed it was trained on GPL code, so it shows MSFT thinks this is a way to launder source code into new licenses. They had some statement to the effect of 'it changes the code just enough to not count'.

Anyone using this is going to have security and legal nightmares later.

@fribbledom If someone uses a computer to launch a nuke at a country, is that not an act of war because they used a computer? Of course the license applies if one uses a computer to copy code.

@fribbledom the Copilot model is clearly derived from works licensed under the GPL. If any of yours were used, you should have every right to sue them. And since they haven't disclosed the full list of what they used, or even given attribution to any projects besides direct dependencies (I assume -- I haven't gotten their stuff yet), basically every developer whose work under a common open-source license was used as training data has a right to sue them.

@fribbledom It certainly has a shocking usage implication.

I was thinking about ANY AI trained on human work the other day, when a company that keeps trying to farm copywriter applications included a firm online-portfolio requirement. It made me think they weren't hiring; they just wanted training materials for their own project. Do they have a right to train skill and style from a professional's material? Is that plagiarism in a loose sense, and how do you prove it?

@fribbledom if it IS a license violation to launder code into projects through gpt-3 refactoring of the idea, how do you prove it? How do you un-train the model if you CAN prove it was used in violation? How would you audit every project that inadvertently used the loosely-plagiarized code?

@fribbledom imagine a website that uses ai to generate licenses in the style of

@fribbledom I hadn't thought of that, yeah this is a violation of many licenses.


Truthfully, copyright is only one of the concerns people should be having here.

GPLv3 automatically grants users a license to any patent encumbrances in the code.

This isn't the case with the other licenses.

So it isn't just an issue of Copilot violating copyright, but of Copilot violating patents as well.

Software patents shouldn't even really exist, but since they do ...

@fribbledom licenses don't work after all, do they? :(

I think this is a case where a judge will have to decide: the FSF will argue that it's basically copy-pasting snippets of code, while MS will argue that it is the magic of learning and these things are wholly new constructs.

I think MS is going to have more to prove here, given that the AI has no rights, isn't paid a salary, and actually constitutes property. A defining characteristic of creativity is that it's hard.

In any case, it's gonna be interesting to see...

@fribbledom I mean, if it's a verbatim copy, I don't see how it couldn't be infringement, and if it's already copied the license text as-is, it's not a ludicrous notion that it will do the same with actual code.

If it's similar I think the line of when it is and isn't derivative is gonna be a hard one and I'm 90% certain they're going to draw it in the wrong place, but I don't know which direction yet.

@fribbledom Funny thought: the "monkey selfie" case might have bearing here. The court found that the copyright of a photo taken by a primate belonged to the owner of the camera, not the primate who pressed the shutter button. This would appear to set a precedent that a nonhuman agent cannot be a "creator" and someone else in the chain must be.

@fribbledom It appears to be similar to when a subcontractor provides me with stolen code. Regardless of whether I know it's stolen - it won't make me immune to the original author's rights.

@fribbledom It sounds to me like GitHub would be responsible for GPL violations.

It likely isn’t even legal for them to have sampled all this GPL code in a proprietary system.

I've read several statements that recommend avoiding GitHub.
Maybe withdrawing all of one's own code is a good idea...

@fribbledom looting the commons and disrespecting all licenses not proprietary

@fribbledom That is the very same issue human workers face, for example at my employer: when your daily work is to write open source, you might unknowingly write code which resembles things in proprietary code you have seen. For that reason we are asked to stay away from proprietary code. Simple.
