## Wednesday, September 10, 2008

### AWK learning note

1. AWK updates its built-in variables each time it reads a record:
a. When AWK reads a record (by default, one line) from the data file, it stores the whole record in the built-in variable \$0.
b. AWK then immediately re-splits \$0 into fields and stores them in \$1, \$2, ...

For example, after AWK reads the first record "A125 Jenny 100 210" from the data file emp.dat, the program sees:
\$0 is "A125 Jenny 100 210"
\$1 is "A125"
\$2 is "Jenny"
\$3 is 100
\$4 is 210
NF is 4
\$NF is 210
NR is 1
FILENAME is "emp.dat"

where
NF: the number of fields in the current \$0
NR: the number of records read so far
FILENAME: the name of the data file currently being processed

2. 'PATTERN{ACTION}' or -f script.awk
If the script { print \$2, \$3 * \$4 } is saved in a file named pay1.awk, the following two commands are equivalent:
\$awk -f pay1.awk emp.dat
\$awk ' { print \$2, \$3 * \$4 } ' emp.dat

The -f option can be given more than once, so the main AWK program can call functions defined in another file that contains only AWK functions. The syntax is:
awk -f main_program_file -f function_file data_file
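The built-in variables from note 1 and the two invocation styles from note 2 can be tried together. This is a minimal sketch: the first record of emp.dat is the one quoted above, while the second record and the file name show_vars.awk are made up for illustration.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current directory is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# emp.dat: the first record is from the note above; the second is a made-up extra.
cat > emp.dat <<'EOF'
A125 Jenny 100 210
A341 Dan 110 215
EOF

# show_vars.awk (hypothetical name): print the built-in variables for each record.
cat > show_vars.awk <<'EOF'
{ printf("NR=%d NF=%d $1=%s $NF=%s FILENAME=%s\n", NR, NF, $1, $NF, FILENAME) }
EOF
vars_out=$(awk -f show_vars.awk emp.dat)
echo "$vars_out"

# The pay program run inline and run from a file should print the same lines.
echo '{ print $2, $3 * $4 }' > pay1.awk
inline_out=$(awk '{ print $2, $3 * $4 }' emp.dat)
file_out=$(awk -f pay1.awk emp.dat)
echo "$file_out"
```

NR grows by one per record, while NF and the \$1..\$NF fields are recomputed for every record.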
3. BEGIN/END and arrays in AWK
For example, given a data file like:
Mary O.S. Arch. Discrete
Steve D.S. Algorithm Arch.
Wang Discrete Graphics O.S.
Lisa Graphics A.I.
Lily Discrete Algorithm
`---------------------------------------`
{ for( i=2; i<=NF; i++) Number[\$i]++ }
END{
for(course in Number)
printf("%-10s %d\n", course, Number[course] )
}

`---------------------------------------`
comment:
a. NF is the number of fields in the current line (4 for the first line, Mary's), not a line number.
b. END is a reserved word in AWK and, like BEGIN, is a special kind of pattern. The END action runs once, after all input lines have been processed; the BEGIN action runs once, before any input is read.
c. \$i refers to the i-th field of the current line, where i is an ordinary variable. This differs from Perl, where \$i is itself a variable name; in AWK a variable name cannot begin with \$.
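The course-counting idea above can be sketched as a runnable script. It assumes that the loop body increments Number[\$i] for fields 2 through NF, that Lisa and Lily are separate records, and it uses reg.dat as a hypothetical file name. The output is piped through sort because the order of a for-in loop over an AWK array is unspecified.

```shell
#!/bin/sh
dir=$(mktemp -d)

# reg.dat (hypothetical name): one student per line, followed by the courses taken.
cat > "$dir/reg.dat" <<'EOF'
Mary O.S. Arch. Discrete
Steve D.S. Algorithm Arch.
Wang Discrete Graphics O.S.
Lisa Graphics A.I.
Lily Discrete Algorithm
EOF

# Fields 2..NF are course names; use each name as an array index and count it.
course_counts=$(awk '
{ for (i = 2; i <= NF; i++) Number[$i]++ }
END {
    for (course in Number)
        printf("%-10s %d\n", course, Number[course])
}' "$dir/reg.dat" | sort)
echo "$course_counts"
```

Each distinct course name becomes one array index, so the report has one line per course.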

4. Shell command and awk command
for example:
`---------------------------------------`
BEGIN {
while ( "who" | getline ) n++
print n
}
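`---------------------------------------`
Here who is a system command run by the shell, and getline is the AWK input command. "who" | getline reads the output of who one line at a time, so n ends up as the number of lines that who printed.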
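A slightly more defensive version of the same loop is sketched below. It swaps who for a fixed echo command (an assumption made only so the result is predictable), tests the return value of getline explicitly, and closes the pipe afterwards.

```shell
#!/bin/sh
# Count the lines produced by a shell command from inside AWK.
line_count=$(awk 'BEGIN {
    cmd = "echo one; echo two; echo three"   # stand-in for "who"
    while ((cmd | getline line) > 0)         # 0 means end of output, -1 means error
        n++
    close(cmd)                               # release the pipe once it is drained
    print n
}')
echo "$line_count"
```

Comparing against 0 matters because getline returns -1 on error, and a bare `while (cmd | getline)` would treat that as true and loop forever.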

5. A file name used inside the script must be enclosed in double quotes,
for example,
`---------------------------------------`
BEGIN {
print " ID Number Arrival Time" > "today_rpt1"
print "===========================" > "today_rpt1"
}

{ printf(" %s %s\n", \$1,\$2 ) > "today_rpt1" }
`---------------------------------------`
\$awk -f reformat1.awk arr.dat

Note:
a. If today_rpt1 is not enclosed in double quotes, AWK treats it as a variable name. An unset variable has the value 0 or the null string, so the output does not go to a file named today_rpt1.
b. Within the script, use '>' every time, not '>>', even for the second and later writes. The two differ only when the file is first opened: with '>', AWK creates the file (or empties an existing one) on the first write and then appends every later write in the same run; with '>>', AWK keeps the existing contents and appends from the first write on. This is slightly different from the Unix shell, where each '>' truncates the file again.
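The difference between '>' and '>>' in note b can be checked with a small experiment. This is a sketch only: the input records r1 and r2 and the second file name today_rpt2 are made up for the comparison.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1

# '>' inside AWK: the old contents are dropped once, then both records are kept.
echo "old contents" > today_rpt1
printf 'r1\nr2\n' | awk '{ print $1 > "today_rpt1" }'
truncated=$(cat today_rpt1)

# '>>' inside AWK: the old contents survive and the records are added after them.
echo "old contents" > today_rpt2
printf 'r1\nr2\n' | awk '{ print $1 >> "today_rpt2" }'
appended=$(cat today_rpt2)

echo "$truncated"
echo "---"
echo "$appended"
```

The second record does not overwrite the first in either case, because AWK keeps the file open between writes.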

6. Input and output command in Awk
AWK input command: getline
AWK output command: print, printf

7. three ways to run awk
a. \$awk '{print}' file1.txt file2.txt
Give the program directly on the command line.
b. \$awk -f myscript.awk file1.txt file2.txt
Save {print} in a file (myscript.awk) first.
c. \$myshell file1.txt file2.txt
Save awk '{print}' \$* in a shell script named myshell. Here \$* stands for all the arguments given after the script name; \$1 is the first argument and \$2 the second. Writing "\$@" instead of \$* keeps file names that contain spaces intact.
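The three ways can be compared side by side. In this sketch the contents of file1.txt and file2.txt are made up, and the wrapper is started with sh so that it runs even without execute permission or a PATH entry.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a\nb\n' > file1.txt
printf 'c\n' > file2.txt

# a. program given on the command line
out_a=$(awk '{print}' file1.txt file2.txt)

# b. program stored in a file and loaded with -f
echo '{print}' > myscript.awk
out_b=$(awk -f myscript.awk file1.txt file2.txt)

# c. program wrapped in a shell script; "$@" forwards every argument unchanged
cat > myshell <<'EOF'
#!/bin/sh
awk '{print}' "$@"
EOF
out_c=$(sh ./myshell file1.txt file2.txt)

echo "$out_c"
```

All three produce the same output, so the choice is mostly about how long the program is and how often it is reused.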

8. FS(Field Separator) and RS(Record Separator)
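By default, FS is whitespace (any run of spaces and tabs) and RS is the newline character '\n'. Both can be changed. The script below sets RS to the empty string, so records are separated by blank lines, and FS to "\n", so each line of a record is one field: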

`--------------------------------------- make_report.awk -------------------------`
BEGIN {
FS = "\n"
RS = ""
split( "I. II. III. IV. V. VI. VII. VIII. IX.", C_Number, " " )
}
{
printf("\n%s Presenter : %s \n",C_Number[NR],\$1)
for( i=2; i<= NF; i++)
printf(" %d. %s\n", i-1, \$i)
}
`--------------------------------------- week.rpt ------------------------------`
張長弓
Introduction to GNUPLOT

吳國強
A Brief Introduction to Latex
VAST-2 User Manual
Introduction to mathematica

李小華
AWK Tutorial Guide Regular Expression
`--------------------------------------- Output ------------------------`
[xianjund@douglasgran data]\$ awk -f make_report week.rpt

I. Presenter : 張長弓
 1. Introduction to GNUPLOT

II. Presenter : 吳國強
 1. A Brief Introduction to Latex
 2. VAST-2 User Manual
 3. Introduction to mathematica

III. Presenter : 李小華
 1. AWK Tutorial Guide Regular Expression
`---------------------------------------`

9. ARGC and ARGV[]
These work as in C, but:
a. ARGC does not count the options -v and -f or their arguments. For example, in either
\$awk -vx=36 -f program1 data1 data2
or
\$awk '{ print \$1 ,\$2 }' data1 data2
the values are:
ARGC=3
ARGV[0]= "awk"
ARGV[1]="data1"
ARGV[2]="data2"
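The values above can be confirmed by printing ARGC and ARGV from a BEGIN block. This is a small sketch: because the program has only a BEGIN block, AWK never tries to open data1 or data2, so they do not need to exist. ARGV[0] is skipped in the loop because its exact text depends on how the awk binary was invoked.

```shell
#!/bin/sh
# -v x=36 is consumed by awk itself and does not appear in ARGV.
argv_out=$(awk -v x=36 'BEGIN {
    print "ARGC=" ARGC
    for (i = 1; i < ARGC; i++)       # ARGV[0] is the awk command name itself
        print "ARGV[" i "]=" ARGV[i]
}' data1 data2)
echo "$argv_out"
```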