Does anyone know if ffmpeg is deterministic? If you convert the same input with the same settings, will the final hashsum of the file be the same?
@kepstin Thanks! Hm. I want to work on some deduplication for Masto attachments, but it looks like if the same GIF is uploaded multiple times, the results will not count as the same file. So perhaps I will simply store the hashsum of the original file instead of the end result.
@gargron yes, for this sort of usage, I'd always recommend hashing the original file, then generating derivative files indexed by the hash of the original.
@gargron Among other things - hashing the original file means that you can simply re-use the already encoded derivative media, rather than having to do the encode then realizing "oh, wait, I already have this" and throwing it away.
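A minimal sketch of that idea in Python, assuming an in-memory dict stands in for real storage and that `encode` is whatever transcoding step you already have (both `DerivativeStore` and `encode` are hypothetical names, not part of Mastodon or Paperclip). The original bytes are hashed first, and the encoder only runs when that hash has not been seen before:

```python
import hashlib
from typing import Callable, Dict, Tuple


class DerivativeStore:
    """Caches encoded derivatives, keyed by the SHA-256 of the ORIGINAL upload."""

    def __init__(self, encode: Callable[[bytes], bytes]) -> None:
        self._encode = encode                 # hypothetical transcoding step
        self._by_hash: Dict[str, bytes] = {}  # original hash -> derivative
        self.encode_calls = 0                 # counts how often we really encoded

    def get_or_encode(self, original: bytes) -> Tuple[str, bytes]:
        key = hashlib.sha256(original).hexdigest()
        if key not in self._by_hash:
            # First time this exact upload is seen: encode once and remember it.
            self.encode_calls += 1
            self._by_hash[key] = self._encode(original)
        return key, self._by_hash[key]
```

Because the lookup happens before the encode, a repeated upload costs one hash and one lookup, and it does not matter whether the encoder itself is deterministic.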
For this purpose, what's better: md5 or sha256? https://github.com/thoughtbot/paperclip#checksum--fingerprint
@gargron For new code, I wouldn't recommend any hash algorithm older than sha256.
There might be some security issues here: if someone knows the md5 hash of a piece of media, they could send a different file with the same hash, and it would show the old media (which might have been private?) instead of the new. That requires knowing the hash in advance, though, and if you know the hash/filename you could probably see it anyway?
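For reference, fingerprinting an upload with SHA-256 could look roughly like this in Python. This is only a sketch: `file_sha256` and its `chunk_size` default are made-up names and values, and it uses nothing beyond the standard `hashlib` module. Reading in chunks keeps memory use flat for large videos:

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in chunk_size pieces."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```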
@Gargron @kepstin Maybe take a look at video fingerprinting? https://cyber.sci-hub.io/MTAuMTEwOS90Y3N2dC4yMDEwLjIwNDYwNTY=/sarkar2010.pdf
@Gargron I think it depends which codecs you are using, some may be deterministic, others not
@gargron I'd be really concerned if it weren't.
@fluffy Be concerned then
@gargron huh. Does it put in a time stamp or something?
@fluffy seems to be codec and system dependent, but there is also the multithreading thing
@gargron ah, interesting. Didn't think that the data could be valid in different orderings.
@Gargron When using the same ffmpeg binary on the same architecture, it should be the same.
If you change the ffmpeg version, recompile it, or change the architecture (x86 vs ARM), maybe not.
@Gargron it is generally not
@Gargron short answer: no.
Longer answer: it's complicated and depends on the threading, codecs and parameters in play. Hardware differences could be a factor too.
@Gargron I think the question is more "Is transcoding deterministic?" and the answer to that seems to be "No".
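If you want to check this for your own setup rather than take anyone's word for it, one rough approach is to run the same encode several times and compare output hashes. The sketch below assumes `encode` is a hypothetical callable wrapping your actual transcoding command (not shown here); `output_hashes` and `is_deterministic` are illustrative names:

```python
import hashlib
from typing import Callable, Set


def output_hashes(encode: Callable[[bytes], bytes], data: bytes, runs: int = 3) -> Set[str]:
    """Encode the same input `runs` times and collect the distinct output hashes."""
    return {hashlib.sha256(encode(data)).hexdigest() for _ in range(runs)}


def is_deterministic(encode: Callable[[bytes], bytes], data: bytes, runs: int = 3) -> bool:
    """True if every run produced byte-identical output."""
    return len(output_hashes(encode, data, runs)) == 1
```

A `True` result only covers the binary, settings and machine you tested on. As noted in the thread, a different ffmpeg version, a recompile or another architecture may still give different bytes.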