OpenAI accusing DeepSeek, another AI company, of stealing their outputs to train its own model is the funniest thing I’ve read this year.
« This thing is stealing our work that we built by stealing other people’s work and paraphrasing it incorrectly ». Peak 2025.
@thelinuxEXP Information wants to be free.
You can't own it.
@Nobody Sure, because AI only uses « information ». No films, videos, images, scientific papers…
Sorry, but that’s the worst take.
@thelinuxEXP @Nobody But then, how do you train AI? (I’m not saying stealing is a good thing.) Perhaps I’m not aware enough, but I don’t think an AI reading a book without paying causes harm to the author. OpenAI’s models are not open source, but others are, and I think they use the same methods.
@darklogel@mastodon.social @thelinuxEXP@mastodon.social @Nobody@social.freetalklive.com
« But then, how do you train AI? » If you take this just a bit further, you might begin to understand why people are mad at the very existence of generative AI.
@darklogel @thelinuxEXP @Nobody Piracy of creative works in any form does not directly harm anyone in a physical violence sense, no.
Unfortunately the only system we have for making sure people are compensated for their work is the ancient copyright system, which corporations readily wield to bonk people on the head with piracy allegations.
When people then want to bonk corporations on the head for doing the same thing, relevant laws are frequently revealed to not be equal for all.
@darklogel @thelinuxEXP @Nobody It has been obvious for ages that the current system is broken and needs replacing.
Corporate dragons sitting on mountains of IP, benefitting from the status quo, have of course been fighting that forever - Disney famously pushing hard here.
So while far from ideal, them's the rules.
@darklogel @thelinuxEXP @Nobody OpenAI is well aware of it, which is why when other corporations threaten to bonk OpenAI on the head for breaking the rules, OpenAI opt to pay said corporation monies for access to the material they want to use.
That is the direct answer to your question on how you train an AI: You use free material or you pay the rights holder. Whether that is a record label / a movie publisher / a book publisher / a news organisation / whatever.
@AngryAnt @thelinuxEXP @Nobody I don't think that's possible, even for corporations as big as OpenAI; it would cost too much. As a scientist I think knowledge should be free, so why not consider AI a new kind of Wikipedia? We give it all the information humans know once, and then it can recommend your movie or book to everyone. Isn't that a good deal?
@darklogel @AngryAnt @Nobody It really doesn’t work this way though. First, Wikipedia is open, not privatized, and doesn’t cost you a thing unless you choose to donate.
AI will definitely be private, and costs money (at least for the latest models).
Second, AI pre-digests things where Wikipedia lets you make your own mind.
Finally, AI misinterprets a LOT, where Wikipedia is generally very reliable. In short: AI is terrible tech we shouldn’t use :)
@thelinuxEXP @AngryAnt @Nobody I disagree: the free GPT, Mistral and Gemini models are not the most recent and powerful ones, but almost. If you're looking for free AI models, a lot of them are partly or totally open source. AI doesn't impose anything on you; like on Wikipedia, you can choose what to trust. And as for accuracy, it will improve as scientists work on AI, but one problem is that a lot of things depend on people's point of view, and neither Wikipedia nor AI can give them all.
@darklogel @AngryAnt @Nobody No AI is truly open source: only the model is, not the training dataset, so its results can’t be replicated -> the Open Source Initiative has a clear definition that no AI currently matches.
Apart from that, yes, the AI totally imposes its interpretation of facts: it summarizes things for you. That’s its only purpose :)
@thelinuxEXP @AngryAnt @Nobody If they gave everybody all the books and movies they pirated off the internet to train their models... I don't think that's a good idea (maybe a better one is to list all the resources that were used, and even then, there will be people complaining they didn't pay for everything).
And I think what you meant is that you can't redo the whole model training process, because if you just want the result, you can copy the trained one.
@thelinuxEXP @AngryAnt @Nobody But even if you had the dataset, I don't think you could train it yourself the way they did; it's like wanting to build a 5 GW nuclear plant in your garden. You can't get all the computing power needed to train big AI models.
As for AI use: like Wikipedia, it summarizes info, and if you want it to, it can go way deeper into a topic and even give different viewpoints. Reducing AI to just summarizing things is like saying a laptop's only use is to go on the internet.
@darklogel @AngryAnt @Nobody AI doesn’t give you the sources. Wikipedia also tries very hard to avoid bias that isn’t justified by a source. AI doesn’t.
Really, comparing AI to Wikipedia is comparing your friend’s knowledge of a topic to a scientific encyclopedia. One is half accurate and biased, and the other is a list of sources that have been properly summarized and peer reviewed. They are not comparable in any way that makes logical sense ;)
@thelinuxEXP @AngryAnt @Nobody Giving sources to verify information is a feature that is being implemented in more and more models lately. And AI is just 1000 times more powerful than Wikipedia (don't worry, I like Wikipedia): you can't ask Wikipedia to explain more, or in a different way, or to try to find its own bias, or to summarize something, or to compare products, or to do your maths calculations, or to...
AI is more than an encyclopedia; it just lacks 1% accuracy.
@darklogel @AngryAnt @Nobody And is thus unsuitable for the general public. Because no one will ask it to rephrase or to state its biases; we all know how people work: they use the result as is. If that result is inaccurate, it shouldn’t be put forward as a replacement for doing actual research. You can argue for all the AI power you want; if it’s inaccurate, even 1% (and it’s likely a lot more than 1%), then it’s not fit for purpose.
@thelinuxEXP @AngryAnt @Nobody So if my maths teacher makes more than one mistake per class, I should replace her? And if I make more... wow, I should go to hell then XD.
Everything in the world is biased: you, me, Wikipedia, history, this social network (to the left), and thus AI. Dumb people making dumb use of AI is inevitable; if you give people knives, someone will stab someone else, but people still use knives.
@darklogel @AngryAnt @Nobody As for the maths calculations, that’s yet another disadvantage of AI: it does a worse job than humans and uses more resources. As long as AI isn’t 100% trustworthy in how it summarizes things, it is unsuitable for its intended purpose. Any energy it uses to be 80% or even 98% accurate is thus wasted: you still need another information source to confirm, so you might as well not use AI at all.
@thelinuxEXP @AngryAnt @Nobody I don't get it. I'm not 100% accurate, nor are you, so are we trash? At 98% accuracy, AI is already more accurate than 98% of people. If it were so inefficient, people would not use it. And as time passes, it will continue to improve.
@darklogel @AngryAnt @Nobody LOL. People will use anything to avoid doing it themselves. I’m not accurate, but a computer is supposed to be. Expected to be, even.
The sad truth is: AI is only useful if you’re not skilled enough, or too lazy, to do the thing yourself. No one with the talent to create, diagnose, or read wants to use an AI to do it for them. It’s overused because people are lazy and will take a shortcut even if it’s a bad one.
@thelinuxEXP @AngryAnt @Nobody But you can't be skilled in everything; that's where it's useful. It can also enable you to gain skills (depending on how you use it), and one day, if not already, it will be more skilled than you in the area where you're most skilled.
@darklogel @AngryAnt @Nobody Finally, as for other uses of AI, I’ve yet to see a single one with accurate and reproducible results. Sure, it’s able to find cancer before a doctor. IF you did an X-ray, and agree to hand that personal data to a private company. And provided the AI can reproduce this accurately instead of it being a fluke. All use cases have been one-offs that AI devs couldn’t explain how to reproduce. Not confidence-inspiring!
@thelinuxEXP @AngryAnt @Nobody I'm sorry to disagree: there are so many wonderful uses of AI (and some bad ones too), and being able to summarize, to explain a mathematical reasoning, or to weigh the perks of buying this instead of that is already cool in itself. And again, data scientists and AI specialists are getting better and better at training their AI to do what they want.
@darklogel @AngryAnt @Nobody Again, better done by websites and real humans than by AI. For free. In videos, articles, anything you want. AI doesn’t do it better, just faster.
It’s still mediocre tech for people who like shortcuts.
@thelinuxEXP @AngryAnt @Nobody I do agree some websites and real people can do it, sometimes better than AI, but they are biased (just joking XD).
More seriously, AI does it faster; sometimes it's difficult to find what you want on the internet, and your neighbour isn't always skilled enough.
@darklogel @AngryAnt @Nobody Basically, we’ll never agree on this. The way AI works is as a glorified autocorrect. It can’t be accurate to a point where it’s usable, because it doesn’t understand the concept of truth or fact. It doesn’t understand what it learns; it learns patterns. It just predicts what is likely, and that’s very often wrong. Until that changes, my stance will be: AI is mediocre tech for lazy people.
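To make the "predicts what is likely" point concrete, here is a deliberately tiny toy sketch in Python (nowhere near a real LLM; the training text and every name are made up purely for illustration). It just counts which word follows which and always emits the statistically most common continuation, with no notion of truth anywhere:

from collections import Counter, defaultdict

# Made-up "training data"; a real model would ingest billions of words.
training_text = "the cat sat on the mat the cat ate the fish"
words = training_text.split()

# Count which word follows which (bigram frequencies).
following = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    following[current][nxt] += 1

def predict_next(word):
    # Return the most frequent continuation seen in training, or None.
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

# Generate text by repeatedly picking the likeliest next word.
word, output = "the", ["the"]
for _ in range(5):
    word = predict_next(word)
    if word is None:
        break
    output.append(word)

print(" ".join(output))  # prints "the cat sat on the cat"

The output reads like a sentence, but the system has no idea whether cats sit on cats; it only knows what tends to follow what, which is the point being made above.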
@thelinuxEXP @AngryAnt @Nobody Indeed, AI isn't suitable for everything; perhaps you're asking things that are too difficult. Sometimes I ask it to correct an exercise and it just spouts nonsense, and that's OK, because less than 1% (perhaps 0.1% or even 0.01%) of people could answer it.
@thelinuxEXP @AngryAnt @Nobody But yes, basically we'll never agree. I'm a tech enthusiast; I hope I've managed to give you new perspectives on AI, and that perhaps you will look at the question differently in the future. As for me, I think your points were interesting and raise important questions about tech and science.
PS: micro-texting apps are the worst for debating.
@darklogel @AngryAnt @Nobody And if they can’t pay for all the content they use, then their product isn’t viable. As simple as that. If I wanted to start a streaming service but I couldn’t pay for the rights to movies, no one would give me a pass. I don’t know why people think this is a good argument for AI :)
@thelinuxEXP @AngryAnt @Nobody There are a lot of examples where it doesn't work that way: the health care system (in Europe), the army, schools, independent newspapers, Wikipedia, and of course scientific research.
@darklogel @AngryAnt @Nobody Scientific publications are mostly paywalled. Army tech is 100% proprietary and not shared openly. Health is the most proprietary thing out there, with formulas for medicines being kept for years before they’re freely available. Papers in a lot of cases ask readers to pay to read them. Not sure what the point was?
@thelinuxEXP @AngryAnt @Nobody I never said it was free or open; I only said these are activities needing way more money than they earn. I don't think paywalls cover scientists' wages; they need government subsidies or people's donations. I wasn't talking about big pharma corps or arms companies; I was talking about hospitals and national armies.
@darklogel @AngryAnt @Nobody I don’t think we can compare public services to private knowledge bases, though.
@thelinuxEXP @AngryAnt @Nobody Why a private knowledge base? Didn't we just say that most of the models are free to use and open source? The real problem is when there is a monopoly, but so far, that's not the case. As you said, training datasets are private, but it's easy to understand why, like I explained in my last message.
@darklogel @AngryAnt @Nobody The model is free, the content isn’t; that’s not open! And no, your arguments don’t work. Just because something is hard to do, or not feasible, doesn’t mean we should let these companies just steal stuff.
If they can’t open these contents, or a list of them, and provide the licenses they bought, they shouldn’t be allowed to operate, period. Anything else is corporate justification for not paying.
@thelinuxEXP @AngryAnt @Nobody Let's try to see it from another angle, from the company's perspective: buying every book is too expensive (between $100 and $500 million just for the books according to GPT, and $1 to 2 billion for DeepSeek), but as an author, one book more or one book less isn't that different; I would do it for the science.
@darklogel @thelinuxEXP @Nobody I agree. The copyright rules are dumb. I thought I had made that pretty clear? As a scientist I'm sure you were taught to not extract data from its context? :)
People should be fairly compensated for their work and then that should be the end of it. IP treasure troves belong in the dark ages and digital piracy is an oxymoron.
Obviously until we fix that all should be equal before the law - so long as we choose to live in a society based on laws.
@AngryAnt @darklogel @Nobody This is the best way of saying it I’ve seen!
@AngryAnt @thelinuxEXP @Nobody I'm sorry, i don't understand your point (i'm not a native english speaker).
@darklogel @thelinuxEXP @Nobody That's ok - neither am I.
@darklogel @thelinuxEXP @Nobody I don't think you've engaged with this topic enough to have a strong opinion. Using AI tools, one could, for example, feed AI models with the soundtracks a composer has made. One could then generate music that is 'close enough' for practical use and avoid hiring *any* composer, including the one whose work enabled the model.
I don't know about you. But I see that as direct harm, and there are already services facilitating this. Likewise for books.
@rusozoll @thelinuxEXP @Nobody I already know that, and all I can say is that, like in every scientific revolution, there are losers and winners. Some jobs will disappear, and others will appear. But I'm not sure creative jobs are really at risk; you like the artist as much as their art. AI music isn't that successful, and I don't think it will ever be.