Unix shell script to find duplicate files in a directory including all subdirectories.
Solution:
This is a basic approach to finding duplicate files in a directory, including its sub-directories, on a Linux system. More efficient solutions exist in the form of dedicated utilities such as fslint, fdupes, and rdfind, but those have to be installed as extra packages on the system.
The approach below instead uses a native shell script, so it needs nothing beyond the standard tools.
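For comparison, if installing a package is acceptable, fdupes does the whole job with a single command. The example below assumes a Debian or Ubuntu system; the package name and install command may differ on other distributions:

sudo apt-get install fdupes        # one-time installation
fdupes -r /path/to/directory       # -r recurses into sub-directories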
#!/bin/bash
# Usage: duplicate.sh directory_path

directory_path=$1

# List every file with its size padded to 10 characters, sort numerically,
# and keep only lines whose size field repeats: only files that share a
# size can possibly be duplicates.
find "$directory_path" -type f -printf "%10s\t%p\n" | sort -n | uniq -D --check-chars=10 > "$directory_path/test.dat"

# Replace the tab separator with a comma for easier parsing with cut.
sed 's/\t/,/g' "$directory_path/test.dat" > "$directory_path/test1.dat"

rm -f "$directory_path/diff.dat"
size2=0
echo "======Regular duplicate files=========================" > "$directory_path/diff.dat"

# Outer loop: walk the size-sorted candidate list.
while read line
do
    size1=$(echo $line | cut -d',' -f1)
    # Start a new comparison group only when the size changes.
    if [[ "$size2" != "$size1" ]]
    then
        size2=$size1
        file1=$(echo $line | cut -d',' -f2)
        # Collect every file that has the same size as file1.
        grep -w "$size1," "$directory_path/test1.dat" > "$directory_path/test2.dat"
        # Inner loop: byte-compare each same-size file against file1.
        while read line2
        do
            file2=$(echo $line2 | cut -d',' -f2)
            # diff -q prints nothing when the two files are identical.
            q=$(diff -q "$file1" "$file2")
            if [[ -z $q ]]; then
                echo $file2 >> "$directory_path/diff.dat"
            fi
        done < "$directory_path/test2.dat"
    fi
done < "$directory_path/test1.dat"

echo "Output is stored in $directory_path/diff.dat file"

# Clean up the temporary files.
rm -f "$directory_path/test.dat"
rm -f "$directory_path/test1.dat"
rm -f "$directory_path/test2.dat"
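Assuming the script above is saved as duplicate.sh, a typical run looks like this (the directory path is illustrative):

chmod +x duplicate.sh
./duplicate.sh /home/user/documents
cat /home/user/documents/diff.dat    # view the list of duplicates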
Alternatively, we can use the command below to find duplicates in a directory. It computes an MD5 checksum for every file, sorts on the 32-character hash so that identical files end up on adjacent lines, and prints each group of files with matching hashes, separated by a blank line:

find . -type f -print0 | xargs -0 -n1 md5sum | sort --key=1,32 | uniq -w 32 -d --all-repeated=separate
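The checksum idea can also be combined with the size pre-filter used by the script above, so hashes are computed only for files whose size occurs more than once. This is a minimal sketch, assuming GNU find and coreutils; sha256sum is substituted for md5sum (its 64-character hash makes accidental collisions far less likely):

# Hash only files whose size occurs more than once, then group by hash.
find . -type f -printf "%s\n" | sort -n | uniq -d |
while read size
do
    # -size with a "c" suffix matches an exact size in bytes.
    find . -type f -size "${size}c" -print0
done |
xargs -0 sha256sum | sort | uniq -w 64 --all-repeated=separate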