Thursday, October 11, 2012

Joining files with empty fields

An easy way to join two text files on a common column is to use the "join" command in Linux.  Here is an example:

$ cat A.txt B.txt
John    A       1
Linda   B       2
Rares   C       3
1       A
2       B
3       C
$ join -1 2 -2 2 A.txt B.txt
A John 1 1
B Linda 2 2
C Rares 3 3
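
The example above works because both files are already ordered on column 2. As a minimal sketch of what to do when they are not, the snippet below builds deliberately shuffled copies of A.txt and B.txt (the shuffled contents, the temporary directory and the ".sorted" names are assumptions made for illustration), sorts each on its join column, and then runs the same join:

```shell
tmp=$(mktemp -d)

# Hypothetical, deliberately unsorted versions of A.txt and B.txt (tab-separated)
printf 'Rares\tC\t3\nJohn\tA\t1\nLinda\tB\t2\n' > "$tmp/A.txt"
printf '2\tB\n3\tC\n1\tA\n' > "$tmp/B.txt"

# Sort each file on its join column (field 2 in both files).
# LC_ALL=C keeps sort and join agreeing on the collation order.
LC_ALL=C sort -k2,2 "$tmp/A.txt" > "$tmp/A.sorted"
LC_ALL=C sort -k2,2 "$tmp/B.txt" > "$tmp/B.sorted"

# Same join as above: field 2 of file 1 against field 2 of file 2
joined=$(LC_ALL=C join -1 2 -2 2 "$tmp/A.sorted" "$tmp/B.sorted")
printf '%s\n' "$joined"

rm -rf "$tmp"
```

This should print the same three lines as the join above, with the join field first and the remaining fields separated by single spaces.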

The trick is that when a field is missing or empty in one of the files, the "-e" option only takes effect when "-o" is also set. Here is the full story:

"Important: FILE1 and FILE2 must be sorted on the join fields." (from this online manpage).
This is problem #1. Problem #2 is worse: the -e option is badly documented. It only works in conjunction with -o, so for example:
$ join -a 1 -a 2 -e'-' -o '0,1.2,2.2' sfile1.txt sfile2.txt 
bar 2 -   
boo - z 
foo 1 x 
qux 3 y 
where the "s" prefix in the file names indicates files that I've sorted beforehand.
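
To see the difference that -o makes, here is a rough sketch that runs the same join with and without it. The contents of the two input files are not shown in the quote, so the ones below are assumptions chosen to reproduce the output above; the unsorted file1.txt and file2.txt and the temporary directory are likewise made up for illustration.

```shell
tmp=$(mktemp -d)

# Hypothetical unsorted inputs, chosen to reproduce the output quoted above
printf 'foo 1\nbar 2\nqux 3\n' > "$tmp/file1.txt"
printf 'qux y\nfoo x\nboo z\n' > "$tmp/file2.txt"

# join needs both inputs sorted on the join field (field 1 here)
LC_ALL=C sort -k1,1 "$tmp/file1.txt" > "$tmp/sfile1.txt"
LC_ALL=C sort -k1,1 "$tmp/file2.txt" > "$tmp/sfile2.txt"

# Without -o, -e has no field list to fill in:
# unpaired lines simply come out one field short
without_o=$(LC_ALL=C join -a 1 -a 2 -e '-' "$tmp/sfile1.txt" "$tmp/sfile2.txt")

# With -o, every output line has the three listed fields,
# and fields missing from an unpaired line are replaced by "-"
with_o=$(LC_ALL=C join -a 1 -a 2 -e '-' -o '0,1.2,2.2' "$tmp/sfile1.txt" "$tmp/sfile2.txt")

echo "without -o:"; printf '%s\n' "$without_o"
echo "with -o:";    printf '%s\n' "$with_o"

rm -rf "$tmp"
```

Without -o, the unpaired lines should appear as "bar 2" and "boo z", so there is no way to tell which column the surviving value belongs to. With -o they become "bar 2 -" and "boo - z", which keeps the columns aligned.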

