Find duplicate lines in a sorted file

After sorting a file you will often find some duplicate data, or you may be given various lists that need deduping. sort and uniq will quickly and easily remove duplicates, list only the duplicates, or list only the unique data.

Remove the duplicate lines:

sort myfile.txt | uniq
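A quick sketch with a small made-up sample file (the fruit names are just placeholder data):

```shell
# Create a sample file containing duplicate lines (contents are made up).
printf 'apple\nbanana\napple\ncherry\nbanana\n' > myfile.txt

# sort groups identical lines together; uniq then collapses adjacent duplicates.
sort myfile.txt | uniq
# apple
# banana
# cherry
```

Note that uniq only compares adjacent lines, which is why the sort step comes first.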

List only the unique lines:

sort myfile.txt | uniq -u

List only the duplicate lines:

sort myfile.txt | uniq -d
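A sketch of how the two options split a small made-up file:

```shell
# Sample file with some repeated lines (contents are made up).
printf 'apple\nbanana\napple\ncherry\nbanana\n' > myfile.txt

# -u keeps only the lines that appear exactly once in the sorted input.
sort myfile.txt | uniq -u
# cherry

# -d keeps one copy of each line that appears more than once.
sort myfile.txt | uniq -d
# apple
# banana
```

Together, -u and -d partition the data: every line ends up in exactly one of the two outputs.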

Get a count of the number of occurrences of each line by adding the -c option, which prefixes every output line with its count:

sort myfile.txt | uniq -uc

sort myfile.txt | uniq -dc
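A sketch of the -dc combination on a made-up file; the awk step is just there to normalize the leading padding, which varies between uniq implementations:

```shell
# Sample file with some repeated lines (contents are made up).
printf 'apple\nbanana\napple\ncherry\nbanana\n' > myfile.txt

# Count each duplicated line; awk strips the implementation-dependent padding.
sort myfile.txt | uniq -dc | awk '{print $1, $2}'
# 2 apple
# 2 banana
```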

Skip fields. This will skip the first 3 whitespace-separated fields when comparing lines, which can be useful with log files to ignore the timestamp data:

uniq -f 3 mylogfile
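A sketch with two made-up log lines whose first three fields (month, day, time) hold the timestamp:

```shell
# Two log entries that differ only in their timestamp fields (made-up data).
printf 'Jan 01 12:00 ERROR disk full\nJan 01 12:05 ERROR disk full\n' > mylogfile

# Skip the first 3 fields before comparing, so the lines count as duplicates.
uniq -f 3 mylogfile
# Jan 01 12:00 ERROR disk full
```

The full first line is printed, timestamp included; -f only changes what is compared, not what is output.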

Skip characters. This will skip the first 30 characters of each line when comparing:

uniq -s 30 myfile.txt

Compare characters. This will compare no more than the first 30 characters of each line:

uniq -w 30 myfile.txt
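A sketch of both character options, using shorter widths than 30 so the made-up sample lines stay readable (note that -w is a GNU coreutils option and may be missing from other uniq implementations):

```shell
# Lines identical except for a fixed-width timestamp prefix (made-up data).
printf '2011-10-16 08:00:01 session opened\n2011-10-16 08:00:05 session opened\n' > myfile.txt

# -s skips the first 20 characters (timestamp plus trailing space) before comparing.
uniq -s 20 myfile.txt
# 2011-10-16 08:00:01 session opened

# -w is the opposite: compare no more than the first N characters.
printf 'user42 logged in\nuser42 logged out\n' > myfile.txt
uniq -w 6 myfile.txt
# user42 logged in
```

With -s the differing prefixes are ignored, so the second line is suppressed; with -w only the matching prefixes are compared, so again only the first line survives.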

Last modified: 16/10/2011

This website is a personal resource. Nothing here is guaranteed correct or complete, so use at your own risk and try not to delete the Internet. -Stephan
