Detect and remove duplicated images

Enonic version: 6.15.5
OS: Windows 10

I am importing content from another CMS, and it includes many duplicated images. Does Enonic XP have a way to detect and remove these duplicates, either during the import or after they have been imported?

So the images have been duplicated inside the CMS you're importing from, meaning the image binary content may be identical but the rest of the metadata is different? If so, I don't think there is any built-in functionality in Enonic XP that would help you detect this.

You would need to write code that does this after the import. For instance, you could go through all image attachments one at a time, generate a checksum for each of them, and build a list of checksums and their content IDs, which you can then use to delete the duplicates.
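A minimal sketch of the checksum approach, assuming you have already fetched each image's content ID and binary from Enonic XP (how you do that is not shown and is left as an assumption). The function name `find_duplicate_images` is hypothetical. It groups content IDs by the SHA-256 checksum of their binary, keeps the first ID seen for each checksum, and returns the rest as deletion candidates.

```python
import hashlib
from collections import defaultdict


def find_duplicate_images(images):
    """Return content IDs whose image binary duplicates an earlier one.

    `images` is an iterable of (content_id, binary_bytes) pairs.
    Fetching these from Enonic XP is assumed and not part of this sketch.
    """
    # Map each checksum to every content ID whose binary has that checksum,
    # preserving the order in which the images were seen.
    ids_by_checksum = defaultdict(list)
    for content_id, data in images:
        checksum = hashlib.sha256(data).hexdigest()
        ids_by_checksum[checksum].append(content_id)

    duplicates = []
    for ids in ids_by_checksum.values():
        # Keep the first occurrence; everything after it is a duplicate.
        duplicates.extend(ids[1:])
    return duplicates


images = [("id-1", b"cat"), ("id-2", b"dog"), ("id-3", b"cat")]
print(find_duplicate_images(images))  # ['id-3']
```

Keeping the first occurrence is an arbitrary choice here; in practice you might prefer the copy with the best metadata. Any content that references a removed duplicate would also need to be pointed at the kept ID before deleting it, which this sketch does not handle.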

Depending on the format of the export from the other CMS, it might be easier to remove duplicates before the content is imported into Enonic XP.
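If the export happens to be a directory of image files (an assumption; the actual export format is unknown here), duplicates could be filtered out before importing with something like the sketch below. The function `unique_files` is hypothetical. It hashes each file in chunks with SHA-256 and keeps only the first file for each distinct checksum, walking files in sorted order so the result is deterministic.

```python
import hashlib
from pathlib import Path


def unique_files(export_dir, pattern="*"):
    """Return files under export_dir whose contents are not byte-identical
    to a file already seen; these are the ones you would import."""
    seen = set()
    keep = []
    for path in sorted(Path(export_dir).rglob(pattern)):
        if not path.is_file():
            continue
        # Hash in 64 KiB chunks so large images are not read into memory at once.
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
        checksum = digest.hexdigest()
        if checksum not in seen:
            seen.add(checksum)
            keep.append(path)
    return keep
```

For example, `unique_files("export/images", "*.jpg")` would list the JPEG files to import from a hypothetical `export/images` folder. Note that this only catches byte-identical files; the same picture re-encoded or resized by the source CMS would produce a different checksum.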


Sounds like a good plan, many thanks!

FYI: under the hood, at the storage level, a single binary (an image in this case) is stored only once, even if you import it multiple times. Only the metadata is duplicated.


Thanks @tsi, this is very good to know! We were mostly concerned about storage, so this solves our problem straight away. We will still try bjh's suggestion first, so that we can avoid duplicated metadata as well.