Shell Script · Unix

Unix shell script to find duplicate files in a directory including all subdirectories.

Solution:

This is a basic approach to finding duplicate files in a directory, including its sub-directories, on a Linux system. A more efficient solution is possible with utilities such as fslint, fdupes, or rdfind, but those require installing extra packages on the system.
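For reference, if one of those packages is installed, finding duplicates is a single command. The directory path below is only an example:

# Both commands assume the respective package is installed; the path is hypothetical.
fdupes -r /path/to/dir     # -r recurses into subdirectories
rdfind /path/to/dir        # writes its report to results.txt by default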

This approach, however, uses only a native shell script and standard tools to find the duplicate files.

#!/bin/bash
# Usage: ./duplicate.sh <directory_path>

directory_path=$1

# List every file with its size padded to 10 characters, sort numerically,
# and keep only lines whose size field repeats: these are same-size candidates.
find "$directory_path" -type f -printf "%10s\t%p\n" | sort --numeric-sort | uniq -D --check-chars=10 > "$directory_path/test.dat"

# Convert the tab-separated "size<TAB>path" lines into "size,path".
sed 's/\t/,/g' "$directory_path/test.dat" > "$directory_path/test1.dat"

rm -f "$directory_path/diff.dat"
size2=0
echo "======Regular duplicate files=========================" > "$directory_path/diff.dat"

while read line
do
    size1=`echo $line | cut -d',' -f1`

    # Compare each distinct size group only once.
    if [[ "$size2" != "$size1" ]]
    then
        size2=$size1
        file1=`echo $line | cut -d',' -f2`
        # Collect every file that shares this size.
        grep -w "$size1," "$directory_path/test1.dat" > "$directory_path/test2.dat"
        while read line2
        do
            file2=`echo $line2 | cut -d',' -f2`
            # diff -q prints nothing when the two files are identical.
            q=`diff -q "$file1" "$file2"`
            if [[ -z $q ]]
            then
                echo "$file2" >> "$directory_path/diff.dat"
            fi
        done < "$directory_path/test2.dat"
        echo "======================================" >> "$directory_path/diff.dat"
    fi
done < "$directory_path/test1.dat"

echo "Output is stored in $directory_path/diff.dat file"
rm -f "$directory_path/test.dat" "$directory_path/test1.dat" "$directory_path/test2.dat"
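To try the script, save it as duplicate.sh, make it executable, and pass a directory; the path below is only an example:

chmod +x duplicate.sh
./duplicate.sh /home/user/photos
cat /home/user/photos/diff.dat     # grouped duplicate paths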

Alternatively, we can use the command below to find duplicates by content: it hashes every file, sorts the output by hash, and prints each group of files whose hash repeats.

find . -type f -print0 | xargs -0 -n1 md5sum | sort --key=1,32 | uniq -w 32 -d --all-repeated=separate
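MD5 is fast but collisions are possible. A sketch of the same pipeline with sha256sum: the only change needed is that uniq must compare the first 64 characters (the length of a SHA-256 hex digest) instead of 32.

# Sketch: sha256 variant of the same pipeline; digests are 64 hex chars wide.
find . -type f -print0 | xargs -0 sha256sum | sort | uniq -w 64 --all-repeated=separate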