Nvidia just replaced video codecs with a neural network.

This is pretty mind-blowing: youtube.com/watch?v=NqmMnjJ6GE

I don't even wanna think about further implications like faked recordings, online impersonation or the impact on Hollywood productions.

@fribbledom So this is how we'll get holograms and VR movies etc

@fribbledom That's awesome! I'll always have perfect, Hollywood teeth on videocalls! ;).

@fribbledom

The GAN image looks pretty rubbery and this feels like a good excuse for providers to decide they can charge more for bandwidth.

@Mainebot @fribbledom It's already here though; this technique is (essentially) already used to create realistic facial interactions on some games and VR chat platforms. I think this just applies the concepts to things like Zoom meetings.

@fribbledom this is basically how the vtuber stuff works right?

I can't wait to fake that original frame and be a big titty anime girl in all my meetings.

@fribbledom That's amazing! Though it's gonna suck if the audio is crap while the video looks perfect.

@fribbledom #tw This could be a game changer for supporting tele-maintenance in remote environments with limited bandwidth. Next step could be to create neural networks for objects (engine parts etc).

@fribbledom Sort of FaceRig with realistic imagery, ne? Impressive stuff!

And certainly, with that tech, pretending to be someone else (visually, at least) will be as simple as animoji.

Could be fantastic for helping with gender dysphoria, too, if you were able to begin with a mildly tweaked version of your facial model. (Voice is another matter, though)

@fribbledom This is wild. So easy to see the good and the bad that could come from it.

@fribbledom Imagine if the same was done for the voice. In combination this would be a great tool for impersonating people in video calls. And as the video / audio would be fake even in legitimate calls you could plausibly deny having been the one calling. What a world to live in.

@baldo @fribbledom They cannot fake much when you use trusted (verified) encryption like #Linphone does.

@fribbledom years back when I was learning about the trendy new field of neural networks, the researcher described them to me as, conceptually, a form of data compression. "AI upscaling" like #NVIDIA has been doing with their SHIELD and GPUs is kind of like a form of decompression, so it follows that by pairing that with a matched compression network we can see applications like video conferencing. What a time to be alive!

@fribbledom I just kinda laughed that for once it was the white lady it didn't work 100% for. (Probably because the corners of her lips were pinched shut in the key pic.)

@fribbledom damn... this is a pretty amazing advancement in the end-quality and bandwidth use, but yeah... I immediately thought about prank face-swaps, and then... fraudulent impersonations!

@fribbledom I was already thinking about fictional character "puppetry" for video streams, when they showed it as an actual example :)

I can see this becoming a thing among RPG actual play video streamers, for example, showing themselves as their fictional character (but also videogame streamers, and others, assuming an "official character image").

@renatoram
That, plus beauty filters on video calls with people you want to impress, followed by that gotcha moment when you're about to actually meet them...

This method is effectively deepfaking the video, but with an option to use your real face ... although a decent implementation could apply a correction every n frames to adjust output to input.

On the plus side: video calls in your undies, and no one will know! (except when something goes wrong, and then it'll be hilarious)
@fribbledom

@fribbledom I think I'd already assumed all of that stuff was in the works anyway, I'm just glad my video-conferences might be less terrible sometime soon

@fribbledom Interesting, but it would need a lot more testing in particular when faces move like in real life, or people walk around in their offices.

@fribbledom

camera user: what's all this insanity? this is supposed to be a video of my dog fetching a stick!

neural network codec: that's not how i imagined it.

@fribbledom A problem is it only works with data it has been trained with. Showing anything else besides faces won't work here (which could be solved with "smart" codecs, e.g. use AI for faces and normal video codecs for everything else).

I don't think it'll be used for video calls (also because the generated faces sometimes look odd), but it seems good enough to create virtual avatars (e.g. for video games or larger virtual conferences where bandwidth is also a problem).

@fribbledom wow, it looks nice. And especially with low bandwidth

@fribbledom The biggest problem I have with NN approaches is that they turn open source into an empty shell. The code is no longer the important part: if you do not have the data and the computational power, the code is completely useless.

And of course... only the big companies and the government can have those.

@fribbledom Yet more ammunition for those engaged in disinformation?

@fribbledom @esopriester I wonder why they compare that to a now rather inefficient video codec like H.264 and not to something more modern like VP9 or AV1?

@chrismarquardt @esopriester

Good point, yeah. That said tho, I wouldn't expect any of the state-of-the-art video codecs to perform much better at ~100 bytes / frame.

I'm actually surprised how *well* they still behaved.
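
For a sense of scale, here's a quick back-of-the-envelope conversion of that ~100 bytes / frame figure into a bitrate. The 30 fps frame rate is an assumption on my part (it isn't stated in the thread), and `bitrate_kbps` is just a hypothetical helper name for the arithmetic.

```python
def bitrate_kbps(bytes_per_frame: float, fps: float) -> float:
    """Convert a per-frame payload size into kilobits per second."""
    bits_per_second = bytes_per_frame * 8 * fps
    return bits_per_second / 1000


# ~100 bytes / frame comes from the post above; 30 fps is assumed.
print(bitrate_kbps(100, 30))  # 24.0
```

So under that assumption the whole video stream would fit in roughly 24 kbit/s, which is why comparing any conventional codec at that budget is a tough ask.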
