A post by Julia Reda: "GitHub Copilot is not infringing your copyright"
@fribbledom I think some of the comments on that blog post have a point; the article does not address the cases where Copilot reproduces large-ish chunks of its training code.
@fribbledom she hasn't paid attention. People are largely complaining about the fact that Copilot can and does reproduce large pieces of original GPL-licensed code verbatim, not just small line fragments.
Secondly, if we take the "machine generation is not derivative work and is public domain" argument, it would set a precedent for laundering original copyrighted work through ML models such as this.
Copilot itself might not be infringing copyright, but _you_ are by using it.
@fribbledom it also leaks secrets, which is a second serious complaint
Specifically, *compilers* take a corpus of source code and generate new works, namely binaries. Under a naive reading of “the output of a machine simply does not qualify for copyright protection”, *no* existing software is eligible for copyright protection; it is *all* the output of a machine.
@RAOF @fribbledom Copilot does generate genuinely unique permutations or irreducible algorithm stuff (you can't copyright pure algorithms). The problem isn't with those; the problem is with the verbatim copies of original code it makes. That's an actual legal problem: it doesn't matter who gave you the code if it's word-for-word someone else's.
@evolbug @fribbledom no, applied to music your statement would mean that a sine wave of a certain length could be copyrightable, which is ridiculous (brb, trying to do that as an art performance). On the other hand, a very memorable tune a few seconds long is, or at least can constitute, a protected work. Also, originality is a boolean, not a float. A given piece of work is either original or not.
@evolbug @fribbledom no, it’s not a strawman; you’re basing your arguments on an understanding of protectable work that is, best case, a very rough approximation of “the law”. I know where you're coming from: I debated for half a year with a professor until I understood most of the intricacies of German copyright law (which complies with the revised Berne Convention on copyright and so shouldn’t differ materially across the developed world).
@evolbug @fribbledom You can’t generally say that every function is copyrightable. Some are, some aren't, depending on originality, which is ultimately something a judge decides. If we both write a function for flooring a float, chances are we will end up with similar results. Something similar is happening here. It should be trivial (in the computer science sense) to show that the model can’t have just remembered everything verbatim, so it can’t copy.
Finally a differentiated analysis of the matter!
I have such an aversion to people who shout polarizing content into the world just to get attention.
Sometimes I feel people have forgotten how to THINK. It's actually sad.
Thank you for sharing this.❤️
@fribbledom "some commentators accuse GitHub of copyright infringement, because Copilot itself is not released under a copyleft licence" Aren't they in fact saying the code produced by Copilot is a derivative work, and thus should be released under copyleft?
"On the other hand, the argument that the outputs of GitHub Copilot are derivative works of the training data is based on the assumption that a machine can produce works. This assumption is wrong and counterproductive. Copyright law has only ever applied to intellectual creations – where there is no creator, there is no work. This means that machine-generated code like that of GitHub Copilot is not a work under copyright law at all, so it is not a derivative work either."
@fribbledom The question that remains is: if someone doesn't want their work used to train a machine, how can that person prevent it? Is it possible for someone to ask GitHub to remove their code from training? Or is it not optional? If not, someone who does not want their work used would need to move their repositories to another platform that would block it.
@bekopharm I am trying to understand the situation, and I'm leaving it up to people what they want to do with their code.
@robby a sure way to run a project into the void, without a pragmatic way to check the source or submit patches or PRs. Nobody wants to register on yet another website just to do stuff.
And the bots will do it anyway.
I can tell. Self hosting my projects for a decade now.
License infringement will happen and you only get the official ways to deal with this. As usual. A "technical" solution will not help or rescue anything. As usual. Especially GPL history is full of this.
@robby I'd happily use more self-hosted systems if they'd let me log in e.g. with IndieAuth or similar. Just not keen on creating yet another account.
Sure, do your thing. I do. No worries. Going self-hosted and walling off because of some bot reading your repo is absurd tho.
Oh and on jumping through hoops: just yesterday we found an issue with the generic HID joystick driver in the kernel. It will go unreported because none of the parties involved are going to waste a day finding the proper reporting channel.
Scraping protection might be an idea, but there are also legit use cases where one would want to download a lot of FLOSS code, and you can't prevent GitHub from obtaining a copy. They could also send interns to clone all the code in Internet cafés ... as long as copies exist, they are able to get one. The question is whether they would bother data-mining other platforms when they can just go through their own repo storage ...
in the mood for a rant
@fribbledom interesting that julia reda now also can do the glorious word twisting for which other politicians are known.
copilot is not "ok". machine "learning" is just a mathematical transformation, not magically non-copyrighted because "a machine did it!!!111". i could as well just save windows source code in EBCDIC and say it isn't copyrighted anymore. OR ENCODE FUCKING MUSIC IN $CODEC AND IT ISN'T COPYRIGHTED ANYMORE! 🖕
our whole legal system is broken, everything is bullshit. have the money, buy the law.
maybe one shouldn't have based it on the roman law system.
NERO, GET THE TORCHES.
@fribbledom Strong agree on copilot not infringing by virtue of not being open source, but I think her second point (which responds to the only criticism I’ve seen online) is questionable. Unclear how “a machine cannot produce work” fits with compilers producing work which has always gotten copyright protection. Her point here feels like motivated reasoning based on her (laudable, and stated up front) general desire to see ever-weakened copyright.
So can I take proprietary code, train my own ML on it and then use the resulting suggestions in an open source project?