Wednesday, September 10, 2008

AWK learning note

1. AWK first updates its built-in variables each time it reads a line:
  1. When AWK reads a record (a line) from the data file, it stores the whole line in the built-in variable $0.
  2. AWK then immediately splits $0 into fields and stores them in $1, $2, ...

For example, after AWK reads the first record
"A125 Jenny 100 210" from the data file emp.dat, inside the program:
$0 is "A125 Jenny 100 210"
$1 is "A125", $2 is "Jenny"
$3 is 100, $4 is 210
NF is 4, $NF is 210
NR is 1, FILENAME is "emp.dat"
where NF: number of fields in the current $0
NR: number of records read so far
FILENAME: name of the input file currently being processed
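A minimal sketch to see these values for yourself, assuming a POSIX-compatible awk and a shell with mktemp. It writes the sample record to a scratch copy of emp.dat and prints each built-in variable:

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# The sample record from the notes.
printf 'A125 Jenny 100 210\n' > emp.dat

# Print the built-in variables AWK sets for each record it reads.
awk '{
    print "$0       = " $0
    print "$1, $2   = " $1 ", " $2
    print "$3, $4   = " $3 ", " $4
    print "NF, $NF  = " NF ", " $NF
    print "NR       = " NR
    print "FILENAME = " FILENAME
}' emp.dat
```

With more lines in emp.dat, NR would count up by one per record while FILENAME stays "emp.dat".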

2. Inline 'PATTERN { ACTION }' program or -f script.awk
The following two commands are equivalent:

$awk -f pay1.awk emp.dat
$awk ' { print $2, $3 * $4 } ' emp.dat

(The first form assumes the program { print $2, $3 * $4 } has been saved in a file named pay1.awk.)
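To check the equivalence, the sketch below writes pay1.awk and runs both forms on the same input. The second record in emp.dat is made up for illustration, and the notes do not say what the two numeric columns mean:

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# emp.dat: the record from the notes plus one made-up record.
printf 'A125 Jenny 100 210\nA341 Dan 110 200\n' > emp.dat

# Save the program text in pay1.awk (quoted heredoc: the shell leaves $2 etc. alone).
cat > pay1.awk <<'EOF'
{ print $2, $3 * $4 }
EOF

# Both commands print the same lines.
awk -f pay1.awk emp.dat
awk ' { print $2, $3 * $4 } ' emp.dat
```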
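The -f option can also be given more than once, so the main program can use functions kept in a separate file that contains only AWK functions:
awk -f main_program_file -f function_file data_file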
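A small sketch of splitting a program across two -f files. The file names main.awk and funcs.awk and the function mul() are hypothetical:

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'A125 Jenny 100 210\n' > emp.dat

# funcs.awk (hypothetical): contains only function definitions.
cat > funcs.awk <<'EOF'
function mul(a, b) {
    return a * b
}
EOF

# main.awk (hypothetical): the main program, which calls mul() from funcs.awk.
cat > main.awk <<'EOF'
{ print $2, mul($3, $4) }
EOF

# awk reads all -f files as one program, so main.awk may call a
# function that is defined in a file listed after it.
awk -f main.awk -f funcs.awk emp.dat
```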
3. BEGIN/END and arrays in AWK
For example, suppose we have a data file in which each line holds a student's name followed by the courses that student takes:
Mary O.S. Arch. Discrete
Steve D.S. Algorithm Arch.
Wang Discrete Graphics O.S.
Lisa Graphics A.I.
Lily Discrete Algorithm
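The following program counts how many students take each course:
{ for (i = 2; i <= NF; i++) Number[$i]++ }
END {
    for (course in Number)
        printf("%-10s %d\n", course, Number[course])
}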

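a. NF is the number of fields in the current line (4 for Mary's line), not a line number; the line number is NR. The loop starts at i=2 to skip the student's name in $1.
b. END is an AWK reserved word and, like BEGIN, a special kind of pattern. The difference is that the END action runs once after all input lines have been processed, while the BEGIN action runs once before any input is read.
c. $i means the i-th field of the current line: $ is AWK's field operator and i supplies the field number. This differs from Perl, where $i is simply a scalar variable named i; in AWK, variable names cannot begin with $.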
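The course-count program can be run as below. course.dat and count.awk are hypothetical file names, and the data is the sample above with one student per line. Because for (course in Number) visits the keys in an unspecified order, the output is piped through sort:

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

cat > course.dat <<'EOF'
Mary O.S. Arch. Discrete
Steve D.S. Algorithm Arch.
Wang Discrete Graphics O.S.
Lisa Graphics A.I.
Lily Discrete Algorithm
EOF

cat > count.awk <<'EOF'
# For every line, count each course column ($2 .. $NF).
{ for (i = 2; i <= NF; i++) Number[$i]++ }
# After all lines are read, print one count per course.
END {
    for (course in Number)
        printf("%-10s %d\n", course, Number[course])
}
EOF

awk -f count.awk course.dat | sort
```

Discrete is the most popular course here, taken by Mary, Wang and Lily.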

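4. Shell commands and AWK commands
For example:
BEGIN {
    while (("who" | getline) > 0) n++
    print n
}
Here who is a shell command that prints one line per login session, and getline is AWK's input command. Each call of "who" | getline reads one line of who's output into $0, so n ends up as the number of sessions. getline returns 1 for each line read, 0 at end of input, and -1 on error, which is why the loop tests for > 0.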
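A sketch of the same pattern wrapped in a reusable shell function. count_cmd_lines is a hypothetical helper name, and a fixed echo command stands in for who so that the result is predictable. The close() call lets the same command be run again later in a program:

```shell
# count_cmd_lines CMD: print how many lines the shell command CMD writes.
# (count_cmd_lines is a hypothetical helper name.)
count_cmd_lines() {
    awk -v cmd="$1" 'BEGIN {
        n = 0
        # getline returns 1 per line, 0 at end of output, -1 on error;
        # testing > 0 avoids an endless loop when -1 is returned.
        while ((cmd | getline) > 0)
            n++
        close(cmd)
        print n
    }'
}

# A fixed command stands in for "who" so the count is predictable.
count_cmd_lines 'echo alice; echo bob; echo carol'
```

On a real system, count_cmd_lines who prints the number of login sessions, as the snippet above does.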

5. File names in an AWK script must be enclosed in double quotes. For example, save the following program as reformat1.awk:
BEGIN {
    print " ID Number Arrival Time" > "today_rpt1"
    print "===========================" > "today_rpt1"
}
{ printf(" %s %s\n", $1, $2) > "today_rpt1" }
and run it with
$awk -f reformat1.awk arr.dat

a. If today_rpt1 is not enclosed in double quotes, AWK treats it as a variable name. An uninitialized variable has the value 0 or the empty string, so the output would not go to a file named today_rpt1.
b. Use '>' rather than '>>' here, even though every line should end up appended to the same file. The two differ only when the file is first opened: '>>' appends to the existing file, while '>' truncates it (creating it if needed) on the first write. After that the file stays open until the program ends or close() is called, so every further '>' to the same name appends, just like '>>'. This differs from the Unix shell, where every '>' truncates the file.
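The sketch below runs the report program twice on a made-up arr.dat, then appends one line with '>>' from a separate awk run. The first '>' in each run truncates today_rpt1, so after two runs the file holds one copy of the report, not two; the '>>' run then adds a line at the end:

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Made-up arrival data: an ID and an arrival time per line.
printf 'A125 08:30\nB012 09:05\n' > arr.dat

cat > reformat1.awk <<'EOF'
BEGIN {
    print " ID Number Arrival Time" > "today_rpt1"
    print "===========================" > "today_rpt1"
}
{ printf(" %s %s\n", $1, $2) > "today_rpt1" }
EOF

# Run twice: each run's first '>' truncates today_rpt1, later '>' writes append.
awk -f reformat1.awk arr.dat
awk -f reformat1.awk arr.dat

# '>>' appends even on the first write, keeping the existing contents.
awk 'BEGIN { print "-- end of report --" >> "today_rpt1" }'

cat today_rpt1
```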

6. Input and output commands in AWK
AWK input command: getline
AWK output command: print, printf
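getline can also read from a named file with getline var < file, and printf formats output much like C's printf. A minimal sketch; number_lines and notes.txt are hypothetical names:

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'first\nsecond\n' > notes.txt   # made-up input file

# number_lines FILE (hypothetical helper): read FILE with getline, write with printf/print.
number_lines() {
    awk -v file="$1" 'BEGIN {
        n = 0
        while ((getline line < file) > 0)   # input: getline var < file
            printf("%d: %s\n", ++n, line)   # output: printf
        close(file)
        print "lines read:", n              # output: print
    }'
}

number_lines notes.txt
```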

7. Three ways to run AWK
a. $awk '{print}' file1.txt file2.txt
b. $awk -f myscript.awk file1.txt file2.txt
(first save the program {print} in a file named myscript.awk)
c. $myshell file1.txt file2.txt
(first save the line awk '{print}' $* in a shell script named myshell, make it executable with chmod +x myshell, and put it in a directory on your PATH. Here $* stands for all the arguments given to the script; $1 is the first argument, $2 the second, and so on.)
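A sketch of method c. It uses "$@" instead of $* in myshell; both pass all arguments, but "$@" keeps a file name that contains spaces as a single argument. The sample files are made up:

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'line from file1\n' > file1.txt
printf 'line from file2\n' > file2.txt

# myshell: a two-line wrapper that hands all of its arguments to awk.
cat > myshell <<'EOF'
#!/bin/sh
awk '{print}' "$@"
EOF
chmod +x myshell

# ./ is needed unless the directory holding myshell is on PATH.
./myshell file1.txt file2.txt
```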

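8. FS (Field Separator) and RS (Record Separator)
By default FS is a single space, which makes AWK split fields on runs of blanks (spaces and tabs), and RS is the newline "\n". Both can be changed. In the report program below, RS = "" puts AWK in paragraph mode, where records are separated by blank lines, and FS = "\n" makes each line of a paragraph one field. (In the program and its output, 報告人 means "presenter", and 一., 二., 三. are the Chinese numerals one, two, three.)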

--------------------------------------- make_report.awk -------------------------
BEGIN {
    FS = "\n"
    RS = ""
    split( "一. 二. 三. 四. 五. 六. 七. 八. 九.", C_Number, " " )
}
{
    printf("\n%s 報告人 : %s \n", C_Number[NR], $1)
    for (i = 2; i <= NF; i++)
        printf(" %d. %s\n", i-1, $i)
}
--------------------------------------- week.rpt ------------------------------
張長弓

吳國強
Latex 簡介
VAST-2 使用手冊
mathematica 入門

李小華
AWK Tutorial Guide Regular Expression
--------------------------------------- Output ------------------------
[xianjund@douglasgran data]$ awk -f make_report.awk week.rpt

一. 報告人 : 張長弓

二. 報告人 : 吳國強
1. Latex 簡介
2. VAST-2 使用手冊
3. mathematica 入門

三. 報告人 : 李小華
1. AWK Tutorial Guide Regular Expression
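A smaller, ASCII-only sketch of the same paragraph-mode setup (RS = "" with FS = "\n"); the file week.txt and its contents are made up for illustration:

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

cat > week.txt <<'EOF'
Alice
Intro to LaTeX

Bob
AWK basics
Regular expressions
EOF

# RS = "" : records are separated by one or more blank lines (paragraph mode).
# FS = "\n": each line of a record is one field, so $1 is the name.
awk 'BEGIN { FS = "\n"; RS = "" }
{ printf("%d. %s: %d topic(s)\n", NR, $1, NF - 1) }' week.txt
```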
9. ARGC and ARGV[]

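These work like argc and argv in C, but:
a. ARGC and ARGV do not count the -v and -f options or their arguments, nor the program text itself. For example, in both
$awk -vx=36 -f program1 data1 data2
$awk '{ print $1 ,$2 }' data1 data2
ARGC is 3, and
ARGV[0] = "awk", ARGV[1] = "data1", ARGV[2] = "data2"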
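A sketch to check these values, written with the POSIX spelling -v x=36 (the notes use -vx=36, which many awks also accept). A program with only a BEGIN rule does not read its input files, so data1 and data2 need not exist. ARGV[0] is left out so the output does not depend on how awk was invoked:

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

cat > program1 <<'EOF'
BEGIN {
    print "ARGC = " ARGC
    for (i = 1; i < ARGC; i++)
        print "ARGV[" i "] = " ARGV[i]
    print "x = " x
}
EOF

# Only a BEGIN rule: awk never opens data1/data2, so they need not exist.
awk -v x=36 -f program1 data1 data2

# Inline program text is not counted in ARGC either.
awk 'BEGIN { print "ARGC = " ARGC }' data1 data2
```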

