Find duplicate lines in a sorted file

After sorting a file you will often find some duplicate data, or you may be given various lists that need deduping. sort and uniq will quickly and easily remove duplicates, list only the duplicates, or list only the unique data.

Remove the duplicate lines:

sort myfile.txt | uniq
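A quick sketch with a small made-up sample file (the fruit names are just placeholder data):

```shell
# Create a sample file containing duplicate lines (contents are made up).
printf 'apple\nbanana\napple\ncherry\nbanana\n' > myfile.txt

# sort groups identical lines together; uniq then collapses adjacent duplicates.
sort myfile.txt | uniq
# apple
# banana
# cherry
```

Note that uniq only compares adjacent lines, which is why the sort step comes first.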

List only the unique lines:

sort myfile.txt | uniq -u

List only the duplicate lines:

sort myfile.txt | uniq -d
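A sketch of how the two options split a small made-up file:

```shell
# Sample file with some repeated lines (contents are made up).
printf 'apple\nbanana\napple\ncherry\nbanana\n' > myfile.txt

# -u keeps only the lines that appear exactly once in the sorted input.
sort myfile.txt | uniq -u
# cherry

# -d keeps one copy of each line that appears more than once.
sort myfile.txt | uniq -d
# apple
# banana
```

Together, -u and -d partition the data: every line ends up in exactly one of the two outputs.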

Get a count of the number of occurrences of each line by adding the -c option, which prefixes every output line with its count:

sort myfile.txt | uniq -uc

sort myfile.txt | uniq -dc
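A sketch of the -dc combination on a made-up file; the awk step is just there to normalize the leading padding, which varies between uniq implementations:

```shell
# Sample file with some repeated lines (contents are made up).
printf 'apple\nbanana\napple\ncherry\nbanana\n' > myfile.txt

# Count each duplicated line; awk strips the implementation-dependent padding.
sort myfile.txt | uniq -dc | awk '{print $1, $2}'
# 2 apple
# 2 banana
```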

Skip fields. This will skip the first 3 whitespace-separated fields when comparing lines, which can be useful with log files to ignore the timestamp data:

uniq -f 3 mylogfile
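A sketch with two made-up log lines whose first three fields (month, day, time) hold the timestamp:

```shell
# Two log entries that differ only in their timestamp fields (made-up data).
printf 'Jan 01 12:00 ERROR disk full\nJan 01 12:05 ERROR disk full\n' > mylogfile

# Skip the first 3 fields before comparing, so the lines count as duplicates.
uniq -f 3 mylogfile
# Jan 01 12:00 ERROR disk full
```

The full first line is printed, timestamp included; -f only changes what is compared, not what is output.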

Skip characters. This will skip the first 30 characters of each line when comparing:

uniq -s 30 myfile.txt

Compare characters. This will compare no more than the first 30 characters of each line:

uniq -w 30 myfile.txt
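A sketch of both character options, using shorter widths than 30 so the made-up sample lines stay readable (note that -w is a GNU coreutils option and may be missing from other uniq implementations):

```shell
# Lines identical except for a fixed-width timestamp prefix (made-up data).
printf '2011-10-16 08:00:01 session opened\n2011-10-16 08:00:05 session opened\n' > myfile.txt

# -s skips the first 20 characters (timestamp plus trailing space) before comparing.
uniq -s 20 myfile.txt
# 2011-10-16 08:00:01 session opened

# -w is the opposite: compare no more than the first N characters.
printf 'user42 logged in\nuser42 logged out\n' > myfile.txt
uniq -w 6 myfile.txt
# user42 logged in
```

With -s the differing prefixes are ignored, so the second line is suppressed; with -w only the matching prefixes are compared, so again only the first line survives.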

Last modified: 16/10/2011

This website is a personal resource. Nothing here is guaranteed correct or complete, so use at your own risk and try not to delete the Internet. -Stephan
