How to find duplicate files BY CONTENT!!

Today in IRC, suseROCKS needed to find all duplicate files in a directory by their content, not by their file name, so we whipped up this fancy little one-liner bash script to do the trick:

 find . -type f -exec md5sum '{}' \; | sort | awk 'dup[$1]++{print $2}'
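
One caveat: md5sum prints "checksum  name", and awk's $2 only captures the first whitespace-separated word of the name, so file names containing spaces come out truncated. A sketch of a space-safe variant (same idea, just stripping the checksum field from the whole line instead of relying on $2):

 find . -type f -exec md5sum '{}' \; | sort | awk 'dup[$1]++ { sub(/^[^ ]+ +/, ""); print }'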

EDIT:

As Andreas suggested, using xargs instead of -exec is much faster, since it batches many files into each md5sum invocation instead of spawning one process per file. Here is the updated command:

 find . -type f -print0 | xargs -0 md5sum | sort | awk 'dup[$1]++{print $2}'
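
A quick sanity check (hypothetical files: a.txt and b.txt have identical content, c.txt differs; the command lists the second and later copies of each duplicate):

 $ printf 'hello\n' > a.txt; printf 'hello\n' > b.txt; printf 'world\n' > c.txt
 $ find . -type f -print0 | xargs -0 md5sum | sort | awk 'dup[$1]++{print $2}'
 ./b.txt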