I wanted to find out if using IPFS as a storage backend would give file deduplication "for free", but unfortunately it looks like ImageMagick operations on the same input file are not deterministic, so you still end up with different hashes when the same file is uploaded more than once.
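To illustrate the point, here is a minimal sketch of content-addressed storage, assuming plain SHA-256 as a stand-in for IPFS content identifiers (real IPFS CIDs are computed differently). The `ContentStore` class and the byte strings are hypothetical, chosen only to show that identical bytes collapse to one key while outputs differing in a single metadata field do not.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is the SHA-256 of the bytes.

    Hypothetical stand-in for an IPFS-like backend, not the IPFS API.
    """

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Storing the same bytes twice keeps a single copy.
        self.blobs.setdefault(key, data)
        return key


store = ContentStore()

# The same original uploaded twice maps to the same key.
original = b"same upload bytes"
key_a = store.put(original)
key_b = store.put(original)

# Two "thumbnails" with identical pixels but a different embedded timestamp
# (made-up byte layout) map to different keys.
thumb_1 = b"pixels" + b"|created=2018-06-01T10:00:00"
thumb_2 = b"pixels" + b"|created=2018-06-01T10:00:05"
key_t1 = store.put(thumb_1)
key_t2 = store.put(thumb_2)

print(key_a == key_b, key_t1 == key_t2, len(store.blobs))
```

Deduplication here depends entirely on byte-for-byte identical output, which is why a non-deterministic conversion step defeats it.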


Of course it sounds nice in theory if your server didn't need to download a remote file and could simply save a given hash and put it into an IPFS gateway URL; however, I believe in practice that's not sufficient. There's no guarantee that the software that creates a post uses the same thumbnail dimensions that Mastodon needs, so the file still needs to be downloaded to the server, converted, then re-uploaded to IPFS, and if that process is non-deterministic, it's not worth it...

Yeah, seems like it's down to timestamps in the metadata, but ImageMagick refuses to accept any options that are supposed to unset those. Plus it looks like all IPFS-related libraries in Ruby are both incomplete and unmaintained, so...
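One possible workaround, sketched under assumptions: if the differing bytes really are timestamp metadata, they could be dropped after conversion and before hashing. The example below assumes PNG output and assumes the timestamps live in textual or `tIME` chunks (for instance a `date:create` text entry); it builds two tiny synthetic PNGs rather than calling ImageMagick, and the helper names are hypothetical.

```python
import hashlib
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Ancillary chunk types assumed to carry timestamps or other volatile text.
VOLATILE_CHUNKS = {b"tIME", b"tEXt", b"zTXt", b"iTXt"}


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def make_png(timestamp: str) -> bytes:
    """Build a 1x1 grayscale PNG with a 'date:create' text chunk (synthetic)."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    text = b"date:create\x00" + timestamp.encode("latin-1")
    return (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"tEXt", text)
        + make_chunk(b"IDAT", idat)
        + make_chunk(b"IEND", b"")
    )


def strip_volatile_chunks(png: bytes) -> bytes:
    """Return the PNG with all chunks listed in VOLATILE_CHUNKS removed."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    out = [PNG_SIGNATURE]
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        end = pos + 12 + length  # 4 length + 4 type + data + 4 CRC
        if chunk_type not in VOLATILE_CHUNKS:
            out.append(png[pos:end])
        pos = end
    return b"".join(out)


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


first = make_png("2018-06-01T10:00:00+00:00")
second = make_png("2018-06-01T10:00:05+00:00")

print(digest(first) == digest(second))
print(digest(strip_volatile_chunks(first)) == digest(strip_volatile_chunks(second)))
```

This only helps if the pixel data and compression are otherwise identical between runs; it does not address JPEG output or any other source of non-determinism.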

@Gargron IPFS should surely give you deduplication for free. Do you strip image metadata before calling ImageMagick? You might want to use something like MAT2 for cleaning uploaded files: 0xacab.org/jvoisin/mat2
(I observed that Twitter removes metadata from JPEG uploads.)

@chpietsch If I upload the same identical file twice, the thumbnails for it (same settings, dimensions) get different hashes.

@Gargron Maybe it's time to trash ImageMagick then. I am surprised that you use it for heavy lifting. It also has a bad security record.

@Gargron Even if thumbnails are not deduplicated, original files are.

@val Only if they are smaller than the threshold where they are downsized.

@Gargron yo IPFS is like Node-or-fuck-it and it's _so_ annoying

@Gargron If both tools are broken [in Ruby], maybe it's time to look at alternatives. It would be nice to have a mature IPFS client in Ruby, but it's a lot of work (maybe?), and ImageMagick does not have the best reputation either. 😩

@Gargron Have you ever heard of exiftool? I think it uses Perl. Privacy-conscious people generally use it to wipe EXIF data from photos and PDFs before uploading to sites.

@TheOuterLinux @Gargron So apparently this is not already happening by default? jeezus...


@DJ_Pure_Applesauce @Gargron I haven't actually checked how Mastodon handles EXIF data in images, but a lot of sites will convert images to JPEG in order to embed their own data, which can aid in tracking the images and PDFs (ones that use JPEG). And most conversions to formats like PNG, done either to save data or because it's a standard (JPEG is debatably not a free format), still retain that metadata. I just assume the worst and wipe before uploading, regardless of the site. Exiftool has many uses.
