This might be a dumb question, but I can't seem to find the answer. What is the license of code generated by Copilot? Is it owned by GitHub, or the user? For example: Bison is GPL, but the parsers it generates aren't (and it specifically says so). Would be nice if Copilot was specific about ownership, but I can't find any info
The reason I'm curious about this is because I'd like to know whether or not it's OK to accept OSS contributions from people that used Copilot. Does the author actually have permission to give me the code they sent? (Also I wish I didn't have to think about this)
@tenderlove what I expect to be the official response from Copilot:
@tenderlove you won't find any because Copilot has been repeatedly shown to generate code based on multiple licenses without crediting them. sometimes it just copies one and applies the wrong license. sometimes it mushes several together.
it's legally dangerous to use anything that it generates in anything other than a private personal project because there's a non-trivial chance you're violating 1 or more licenses which you are unaware of.
@tenderlove I have also wondered a lot about this topic. The fact that they removed copyright and author info (while embarrassingly containing some earlier on) makes me really wonder if this is a legal minefield.
@enebo seems like there must be some kind of fair use involved depending on the code, but idk. I'm not a lawyer and honestly I don't even want to think about this problem
@tenderlove If you are generating rspec tests from a prompt, I doubt it would lead to being accused of infringement. But with something like "Give me a dtoa implementation in Java", I would be very worried.
@tenderlove For fair use there is actually a page on it, fuzzy as it is: https://www.copyright.gov/fair-use/
After reading this I am none the wiser on whether you can accept a PR from copilot.
@enebo @tenderlove as someone who translated a lot of code and was still super careful about noting provenance and getting author consent for license changes, even for a translation of the original code, Copilot is mind-boggling to me.
Basically “code laundering” in a way that abstracts away the original copyright
@tenderlove I'm pretty sure this issue was resolved in Season 7 of Star Trek Voyager https://tvtropes.org/pmwiki/pmwiki.php/Recap/StarTrekVoyagerS7E18AuthorAuthor
@tenderlove I think it's all undefined until it's tested in court.
@jordan just accept patches until I end up in court
@tenderlove It'd be ironic if a GitHub employee used their Hyatt Legal Plan lawyer to sue a 3rd party for incorporating their code that Copilot regurgitated
@tenderlove it’s whatever you license it as, according to the faqs (https://github.com/features/copilot):
“GitHub does not own the suggestions GitHub Copilot provides to you. You are responsible for the code you write with GitHub Copilot’s help.”
obviously that leaves out all the potential legal trouble/concerns if the code used for training the model included GPL, etc. - but i guess courts will have to decide those issues.
@srecnig seems like as a maintainer it's probably safe to merge someone's code if they use Copilot? At least, it seems like I wouldn't be held responsible (I think??)
@tenderlove i’m not even close to being a lawyer, so i will just not say anything
@tenderlove haha, maybe i should’ve only quoted the faqs in my first reply, and not add any interpretation
@tenderlove I can definitely see how there would be concern, especially given that Copilot has sometimes reproduced lines of code verbatim from its training material.
But how safe is it really to accept *any* contributions? Humans are definitely capable of copying code verbatim, taking a snippet and adapting it, or writing code that's structurally similar to things we've seen.
@tenderlove »The code, functions, and other output returned to you by GitHub Copilot are called “Suggestions.” GitHub does not claim any rights in Suggestions, and you retain ownership of and responsibility for Your Code, including Suggestions you include in Your Code.«
@lumaxis excellent, thank you!
@lumaxis @tenderlove They can say that, just like a book can say that lending or re-sale are forbidden. Doesn’t make it legally valid. US Copyright Office guidance: https://www.govinfo.gov/content/pkg/FR-2023-03-16/pdf/2023-05321.pdf
@josephholsten What is your point?
@lumaxis That considering the US Copyright Office published “Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence” on 2023-03-16, doc about the functioning of the legal code may not be in sync with the current implementation ;-)
@josephholsten I’m not a lawyer either, but as I understand it, it reiterates and clarifies practices and interpretations that have already existed in similar fashion, especially “only humans can produce copyrightable material”.
And I'm still not sure how that relates to the previous discussion. If anything, that doc would reinforce the statement in GitHub's documentation?
@lumaxis also, crap I had a typo. I also wish I had thought of a better example of “people say things about copyright that aren’t exactly true” than this https://www.reddit.com/r/writers/comments/11wnhrj/when_you_publish_a_paperback_book_dont_do_this/
@tenderlove We're just not accepting any significant AI-authored code at @bridgetown.
https://github.com/bridgetownrb/bridgetown/blob/main/CONTRIBUTING.md#ai-generated-code-policy
(This has largely been lifted from Shoelace's policy by Cory LaViska.)
@jaredwhite this is really great.
@dataKnightmare #FYI This applies to Copilot but also to the whole merry GPT family: under which license should the generated code be considered?
@olistik
it's a mess, since the whole gang hoovered up everything it could without giving a damn about licenses. Now they tell you the rights aren't theirs, which amounts to saying that if you claim they're yours, you're taking on responsibility for the fact that they may have violated licenses left and right.
not to mention that, since these are LLMs, there's no guarantee about the quality of the code.
in my opinion that's more than enough reason to avoid this garbage like the plague.
@dataKnightmare not only are the rights not theirs, they can't even tell you what the licenses are.
They could easily have violated a whole slew of licenses.
@tenderlove The answer is a moving target. For example the US only ruled last week that code generated by an AI cannot be copyrighted.
@tenderlove I've been asking myself this ever since GH started sending me unsolicited PRs with dependency updates...
@petko @tenderlove Dependabot isn't AI generated though, it's all programmatically generated without any sort of training model
@BobbyMcWho, yes, you would think it would be clear for such trivial programmatically generated contributions... Do I assume the contribution is licensed under my repo's license? Who do I add as a contributor holding copyright over the contribution? GitHub? Microsoft? The authors of Dependabot?
And these are questions for the trivial case, let alone for the ML model that rips off all open source code on GH... // @tenderlove
@tenderlove this is all pending a bunch of court cases, and it is unclear what the rulings will be. eg: https://petapixel.com/2023/02/07/getty-images-are-suing-stable-diffusion-for-a-staggering-1-8-trillion/ once some precedent in the legal system is set, the industry will have to adapt. If you lift 21% of the code you wrote off a method in a GPL-sourced repo, where do you stand? If you only lift the concept? It's a brave new world.
@tenderlove Given there's a lawsuit about whether Copilot-produced code is GPL if it's trained on GPL code, I suspect the silence is deliberate until these sorts of things are worked out…