How to find duplicate files BY CONTENT!!
Today in IRC, suseROCKS needed to find all duplicate files in a directory by their content, not by their file name, so we whipped up this fancy little one-liner bash script to do the trick:
find . -type f -exec md5sum '{}' \; | sort | awk 'dup[$1]++{print $2}'
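How it works: md5sum prints a line of the form &lt;hash&gt;  &lt;path&gt; for each file, sort groups identical hashes next to each other, and the awk filter prints the path every time it sees a hash for the second (or later) time. A quick sanity check, run in an empty directory with hypothetical file names:
$ echo same > a.txt; echo same > b.txt; echo other > c.txt
$ find . -type f -exec md5sum '{}' \; | sort | awk 'dup[$1]++{print $2}'
./b.txt
Only ./b.txt is printed, because its content hashes to the same value as ./a.txt, while c.txt is unique.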
EDIT:
As Andreas suggested, using xargs instead of -exec is much faster; here is the updated command:
find . -type f -print0 | xargs -0 md5sum | sort | awk 'dup[$1]++{print $2}'
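One caveat with both versions: awk splits fields on whitespace, so {print $2} cuts off any path that contains a space. A rough variant that keeps the whole path by stripping the leading hash field from the line, rather than printing field 2, might look like this:
find . -type f -print0 | xargs -0 md5sum | sort | awk 'dup[$1]++ { sub(/^[^ ]+ +/, ""); print }'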