I have learned that people can save a lot of hard-drive space thanks to deduplication, but the problem is that it is resource-intensive (RAM/CPU).
Why not allow applications to give hints about which files could be related?
For example:
I copy one image to another and save the other image. GIMP could hint: the first image is related to the second.
A package manager could have CVS integration and look at which packages have similar commits, or look for similar package names or for some metadata (PROVIDES?). It may not be great for rpm, but for GNU Guix it could be great. Simply put: we can install many versions of the same package (Guix allows this), and the package manager gives a hint: files placed inside konsole-branding-upstream could be related to those placed inside konsole-branding-opensuse.
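To make the idea a bit more concrete, here is a minimal Python sketch of what a deduplicator could do with such hints. Everything in it is an assumption for illustration: the function name `confirm_hints` and the idea that hints arrive as a plain list of path pairs are hypothetical, not an existing GIMP, rpm or Guix interface. The point is only that the tool compares the hinted pairs instead of scanning and hashing the whole disk.

```python
import filecmp
import os


def confirm_hints(hints):
    """Return the hinted (a, b) path pairs whose contents are byte-identical.

    `hints` is a hypothetical list of (path_a, path_b) tuples supplied by
    applications; only these pairs are examined.
    """
    confirmed = []
    for a, b in hints:
        # Cheap check first: files of different size cannot be identical.
        if os.path.getsize(a) != os.path.getsize(b):
            continue
        # shallow=False forces a real content comparison, not just a stat() check.
        if filecmp.cmp(a, b, shallow=False):
            confirmed.append((a, b))
    return confirmed
```

With hints, the cost grows with the number of hinted pairs rather than with the number of files on the disk, which is where the RAM/CPU saving would come from.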
In other words, one attempts to remove duplicate files from a file-system …
But symbolic, relative, and physical links are also “duplicate files” … >:)
Yes, this is a Data Centre issue – (real) duplicate files can consume file-system space, and there are projects making use of databases which are attempting to address this issue –
Simply search the Internet for “Linux deduplication” …
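As a rough illustration of the difference between links and real duplicates, here is a minimal Python sketch. It assumes a POSIX file-system, and the function name `find_real_duplicates` is made up for this example. It skips symbolic links, counts each inode only once so that hard links are not reported, and groups the remaining regular files by a SHA-256 hash of their content.

```python
import hashlib
import os
import stat
from collections import defaultdict


def find_real_duplicates(root):
    """Return lists of paths under `root` holding identical, separately stored data."""
    seen_inodes = set()
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symbolic links
            if not stat.S_ISREG(st.st_mode):
                continue  # symbolic links, sockets, etc. are not real copies
            key = (st.st_dev, st.st_ino)
            if key in seen_inodes:
                continue  # hard link to data we have already counted
            seen_inodes.add(key)
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB pieces so large files do not fill the RAM.
                for piece in iter(lambda: f.read(1 << 20), b""):
                    digest.update(piece)
            by_hash[digest.hexdigest()].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Only the groups returned here actually cost extra file-system space; the links do not.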
Maybe it is good for data centers, but I think it is good for the desktop too.
Imagine we edit a video and so keep many versions. Deduplicating parts of this video could give better results than compression. Another thing is savegames.
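Here is a small Python sketch of what I mean by deduplicating parts of a file. It is only an illustration under simple assumptions: the function name `chunk_store_size` is made up, and it cuts the data into fixed-size chunks, which only helps when the edit overwrites data in place. If an edit inserts or removes bytes, everything after it shifts and fixed chunks no longer match, which is why real tools tend to use content-defined chunking instead.

```python
import hashlib


def chunk_store_size(versions, chunk_size=4096):
    """Return the bytes needed to store all versions when equal chunks are kept once.

    `versions` is an iterable of bytes objects, one per file version.
    """
    store = {}
    for data in versions:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            # Chunks with the same hash are stored only once.
            store.setdefault(hashlib.sha256(chunk).digest(), chunk)
    return sum(len(chunk) for chunk in store.values())


# Two "video" versions of 10 chunks each; only one chunk differs.
v1 = b"".join(bytes([i]) * 4096 for i in range(10))
v2 = v1[:3 * 4096] + bytes([200]) * 4096 + v1[4 * 4096:]
print(len(v1) + len(v2))           # 81920 bytes stored as two plain files
print(chunk_store_size([v1, v2]))  # 45056 bytes with chunk deduplication
```

So the second version costs one extra chunk instead of a full copy.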
The file versions created by Multi-Media editing sessions usually have new file name extensions (V1/2/3) – and are, therefore, not duplicates …
Which is why, traditionally, UNIX® doesn’t save a new version of a file when a file is saved after modification …
Other operating systems – some current, many not (for example, DEC operating systems such as VAX/VMS and RSX-11 when it used Files-11) – used to save a limited number of new versions of files after the file was changed.
It’s normally a user’s responsibility to purge the no longer needed file versions …
In a strict company environment, scripts executed by administrators can enforce a company’s policy related to multiple file versions …
Individual users can, of course, also write scripts to do this sort of thing – if they want to …
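A minimal sketch of such a purge script, in Python. The naming scheme is an assumption made for this example: versions are taken to be files ending in `.V<number>` (for example `film.V1`, `film.V2`), and the function name `purge_old_versions` is hypothetical. It keeps the newest `keep` versions of each file and, by default, only reports what it would delete.

```python
import os
import re
from collections import defaultdict

# Assumed naming scheme: "<base>.V<number>", e.g. "film.V3".
VERSION_RE = re.compile(r"^(?P<base>.+)\.V(?P<num>\d+)$")


def purge_old_versions(directory, keep=2, dry_run=True):
    """Remove all but the `keep` highest-numbered versions of each file.

    Returns the sorted list of paths removed (or, with dry_run=True,
    the paths that would be removed).
    """
    versions = defaultdict(list)
    for name in os.listdir(directory):
        match = VERSION_RE.match(name)
        if match:
            versions[match.group("base")].append((int(match.group("num")), name))
    removed = []
    for items in versions.values():
        items.sort()  # oldest (lowest version number) first
        old = items[:-keep] if keep > 0 else items
        for _num, name in old:
            path = os.path.join(directory, name)
            if not dry_run:
                os.remove(path)
            removed.append(path)
    return sorted(removed)
```

Running it first with `dry_run=True` and checking the returned list before deleting anything is probably the safer habit.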