Wednesday, April 18, 2012

Remove duplicate lines in a text file with uniq

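After sorting a file you will often find that it contains some duplicate data, or you may be given various lists that need de-duplicating. uniq only compares adjacent lines, so the input has to be sorted first. Together, sort and uniq will quickly and easily remove duplicates, or list only the duplicated lines or only the lines that are not repeated: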

sort myfile.txt | uniq

List only the unique lines (those that appear exactly once): sort myfile.txt | uniq -u
List only the duplicate lines (one copy of each line that appears more than once): sort myfile.txt | uniq -d
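
To make the difference between the three forms concrete, here is a small sketch. The fruit names and the temporary file are made up for illustration and are not from any real data set.

```shell
# Hypothetical sample data written to a temporary file.
tmpfile=$(mktemp)
printf '%s\n' apple banana apple cherry banana apple > "$tmpfile"

# Every distinct line, printed once.
all=$(sort "$tmpfile" | uniq)

# Only the lines that occur exactly once.
only_unique=$(sort "$tmpfile" | uniq -u)

# Only the lines that occur more than once (one copy of each).
only_dups=$(sort "$tmpfile" | uniq -d)

printf 'uniq:\n%s\n\nuniq -u:\n%s\n\nuniq -d:\n%s\n' "$all" "$only_unique" "$only_dups"

rm -f "$tmpfile"
```

With this input, plain uniq should print apple, banana and cherry, uniq -u should print only cherry, and uniq -d should print apple and banana.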

Prefix each output line with the number of times it occurs by adding the -c option:
sort myfile.txt | uniq -uc
sort myfile.txt | uniq -dc
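
As a rough illustration of -c, the sketch below counts the lines of a made-up sample file. The width of the padding in front of each count differs between uniq implementations, so the awk step (my addition, not part of uniq) normalises each line to "count value"; it assumes the lines contain no spaces.

```shell
# Hypothetical sample data written to a temporary file.
tmpfile=$(mktemp)
printf '%s\n' apple banana apple cherry banana apple > "$tmpfile"

# Count every distinct line; awk strips the leading padding.
counts=$(sort "$tmpfile" | uniq -c | awk '{print $1, $2}')

# Count only the lines that occur more than once.
dup_counts=$(sort "$tmpfile" | uniq -dc | awk '{print $1, $2}')

printf 'uniq -c:\n%s\n\nuniq -dc:\n%s\n' "$counts" "$dup_counts"

rm -f "$tmpfile"
```

Note that with -uc every count is 1 by definition, so -c is mostly informative on its own or together with -d.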
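Skip fields: uniq -f 3 mylogfile ignores the first 3 whitespace-separated fields of each line when comparing. This can be useful with log files to skip the time stamp data (a syslog-style "Apr 18 10:00:01" time stamp takes up three fields).
Skip characters: uniq -s 30 myfile.txt ignores the first 30 characters of each line when comparing.
Compare characters: uniq -w 30 myfile.txt compares no more than the first 30 characters of each line.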
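
The three comparison options can be tried out on a tiny invented log. The host names, messages and time stamps below are placeholders; the time stamp here is a single field of 8 characters plus a space, so the sketch uses -f 1 and -s 9 rather than the values above. As far as I know -w is a GNU extension and may be missing from other uniq implementations.

```shell
# Hypothetical log lines: one time stamp field, then a message.
logfile=$(mktemp)
printf '%s\n' \
  '10:00:01 host1 disk full' \
  '10:00:05 host1 disk full' \
  '10:01:10 host2 link down' > "$logfile"

# Ignore the first field (the time stamp) when comparing lines.
by_field=$(uniq -f 1 "$logfile")

# Ignore the first 9 characters (time stamp plus the following space).
by_skip=$(uniq -s 9 "$logfile")

# Compare only the first 5 characters (hour and minute).
by_width=$(uniq -w 5 "$logfile")

printf '%s\n' "$by_field"

rm -f "$logfile"
```

In all three cases the second line should be treated as a repeat of the first, leaving the 10:00:01 and 10:01:10 lines. The file is not sorted here because the repeated messages are already adjacent, which is typically the case in a chronological log.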
