AWK language course
This AWK language course aims to explain AWK in 15 minutes so that you find an awesome tool friend in it despite its given name. The correct pronunciation is [auk], after the smaller seabirds Parakeet auklets.
AWK language (is):
* … C and shell programming languages
* … gawk and mawk
* … sed, grep, wc, tr, cut, printf, tail, head, cat, tac, bc, column, …

AWK language use-cases are:
AWK language capabilities:
main()
Every AWK execution consists of the following three phases:
1. BEGIN{ ... } are actions performed at the beginning, before the first text character is read.
2. [condition]{ ... } are actions performed on every AWK record (by default a text line) for which the optional condition holds. Each AWK record is automatically split into AWK fields (by default words).
3. END{ ... } are actions performed at the end of the execution, after the last text character is read.
```
$ echo -e "AWK is still useful\ntext-processing technology!" | \
> awk 'BEGIN{wcnt=0;print "lineno/#words/3rd-word:individual words\n"}
> {printf("% 6d/% 6d/% 8s:%s\n",NR,NF,$3,$0);wcnt+=NF}
> END{print "\nSummary:", NR, "lines/records,", wcnt, "words/fields"}'
lineno/#words/3rd-word:individual words

     1/     4/   still:AWK is still useful
     2/     2/        :text-processing technology!

Summary: 2 lines/records, 6 words/fields
```
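The example above has no [condition], so its middle block runs for every record. Below is a minimal sketch of conditional rules. The sample text and the helper names are hypothetical, and any POSIX awk should behave the same way:

```shell
#!/bin/sh
# Hypothetical sample input: three records.
sample='AWK is still useful
text-processing technology!
awk has patterns'

# Regular-expression condition: the action runs only for records matching /awk/.
# Matching is case-sensitive, so "AWK" in the first record does not match.
match_awk() { awk '/awk/ { print "regex match:", $0 }'; }

# Expression condition: NF > 3 selects records with more than three fields.
long_records() { awk 'NF > 3 { print "long record:", NR }'; }

printf '%s\n' "$sample" | match_awk      # prints: regex match: awk has patterns
printf '%s\n' "$sample" | long_records   # prints: long record: 1
```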
Passing text data to AWK:
* cat input-data.txt | awk <app>
* awk <app> input-data.txt
AWK application execution styles (-f):
* awk '{ ... }' input-data.txt
* awk -f myapp.awk input-data.txt
* specifying an AWK variable on the command line: -v var=val
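The following minimal sketch combines the -f style with -v and the -F field-separator switch. The program lives in its own file, and the output separator is passed in from the command line. The file names, the sep variable and the passwd-like sample data are hypothetical. Values given with -v are assigned before the BEGIN block runs.

```shell
#!/bin/sh
# Hypothetical working directory for the program and its input (remove it when done).
workdir=$(mktemp -d)

# The AWK program; the quoted 'EOF' keeps the shell from expanding $1 and $7.
cat > "$workdir/myapp.awk" <<'EOF'
# sep is expected to be supplied on the command line via -v sep=...
BEGIN { OFS = sep }
# Print the login name (1st field) and the shell (7th field).
{ print $1, $7 }
EOF

printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' > "$workdir/input-data.txt"

# -F: sets the input field separator FS, -v sets sep, -f reads the program file.
awk -F: -v sep=' -> ' -f "$workdir/myapp.awk" "$workdir/input-data.txt"
# root -> /bin/bash
# daemon -> /usr/sbin/nologin
```

Keeping the program in a file also avoids shell quoting issues once a one-liner grows.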
The AWK field separator is set via the FS variable or the -F <FS> switch.

Global variables are documented here; the most common ones are:
* $0: value of the current AWK record (the whole line without the line break)
* $1, $2, … $NF: values of the first, second, … last AWK field (word)
* FS: the input AWK field separator, i.e. how AWK breaks an input record into fields (default: whitespace)
* RS: the input AWK record separator, i.e. how AWK breaks the input stream into records (default: a newline)
* OFS: the output field separator, i.e. what print writes between its comma-separated arguments (default: a single space)
* ORS: the output record separator, i.e. what print writes after each output record (default: a newline)
* FILENAME: the name of the current input file read by awk (informational; treat it as read-only)

Built-in AWK functions are documented as well; the most important ones are:
* print, printf() and sprintf()
* length()
* substr()
* split()
* index()
* sub() and gsub()
* the ~ operator and match()
* tolower() and toupper()
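The short sketch below exercises most of the functions listed above on one made-up record. The demo_functions helper and the input string are hypothetical. The commented output assumes plain ASCII input.

```shell
#!/bin/sh
# Hypothetical helper: applies several built-in functions to each input record.
demo_functions() {
  awk '{
    n = split($0, parts, "-")      # parts[1] .. parts[n], split on "-"
    printf("length=%d parts=%d first=%s\n", length($0), n, parts[1])
    printf("substr=%s index=%d\n", substr($0, 1, 4), index($0, "proc"))
    s = $0
    cnt = gsub(/-/, "_", s)        # gsub replaces every match and returns the count
    sub(/_/, " ", s)               # sub replaces only the first match
    printf("gsub=%d s=%s\n", cnt, s)
    if (match($0, /[0-9]+/))       # match sets RSTART and RLENGTH
      printf("match at %d, length %d\n", RSTART, RLENGTH)
    if ($0 ~ /tool$/)              # ~ tests a string against a regular expression
      print "ends with tool"
    print sprintf("%s/%s", toupper(parts[1]), tolower("MIXED"))
  }'
}

echo 'text-processing-42-tool' | demo_functions
# length=23 parts=4 first=text
# substr=text index=6
# gsub=3 s=text processing_42_tool
# match at 17, length 2
# ends with tool
# TEXT/mixed
```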
Prefer general awk over a specific AWK implementation:
* awk for portable programs
* gawk
As a general rule of thumb, write the AWK program as a *.awk file if the equivalent one-liner is hard to read.

If you have trouble understanding a one-line AWK program, feel free to use GNU AWK's profiling functionality, i.e. the -p option, to get pretty-printed AWK code (written to awkprof.out).
A bare length > 80 should rather be written as 'length($0) > 80 { print }' or 'length($0) > 80 { print $0 }'.
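This quick sketch checks that the spellings above select the same records. The threshold is lowered from 80 to 5 only to keep the hypothetical sample short. The last form relies on the rule that a condition without an action prints $0.

```shell
#!/bin/sh
# Hypothetical two-record sample: "short" has exactly 5 characters.
lines='short
a longer line'

# All three programs print only records longer than 5 characters.
printf '%s\n' "$lines" | awk 'length($0) > 5 { print }'
printf '%s\n' "$lines" | awk 'length($0) > 5 { print $0 }'
printf '%s\n' "$lines" | awk 'length($0) > 5'
# each prints: a longer line
```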
Use ' (single) quotes when writing AWK one-line applications to avoid shell expansion (for instance of $1): awk "{print $1}" should be awk '{print $1}'.
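The quoting pitfall can be reproduced as shown below. The sketch clears the shell positional parameters with set -- so that $1 is empty, as is typical in an interactive shell. The -v variant at the end is one common way to hand a shell value to AWK without double quotes.

```shell
#!/bin/sh
# Clear positional parameters so the shell's $1 is empty.
set --

# Double quotes: the shell expands $1 first, so awk receives {print } and prints the whole line.
echo 'first second' | awk "{print $1}"   # prints: first second

# Single quotes: awk receives {print $1} and prints the first field.
echo 'first second' | awk '{print $1}'   # prints: first

# Passing a shell value into the program with -v instead of quoting tricks.
n=2
echo 'first second' | awk -v n="$n" '{print $n}'   # prints: second
```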
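… (awk or nawk)

AWK indexes start from 1, not 0 (index(), split(), $i, …).

… [g]sub() can lead to unwanted behavior unless you force gawk to drop such (multibyte character) support by exporting the environment variable LC_ALL=C. Compare how implementations handle a non-ASCII (UTF-8) word: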
```
$ echo "Zřetelně" | gawk '{print toupper($0)}'
ZŘETELNĚ
$ echo "Zřetelně" | mawk '{print toupper($0)}'
ZřETELNě
$ echo "Zřetelně" | busybox awk '{print toupper($0)}'
ZřETELNě
```
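Here is a minimal sketch of the 1-based indexing noted above. The one_based helper and the sample words are hypothetical. The input is plain ASCII, so the locale issue shown above does not affect it.

```shell
#!/bin/sh
# Hypothetical helper showing that AWK positions and indices start at 1, not 0.
one_based() {
  awk '{
    n = split($0, w, " ")                         # w[1] is the first word, w[n] the last
    print "first field:", $1                      # $0 is the whole record, $1 the first field
    print "first array element:", w[1]
    print "index of alpha:", index($0, "alpha")   # 1 means the first position
    print "index of zeta:", index($0, "zeta")     # 0 means not found
    print "first char:", substr($0, 1, 1)
    print "last word:", w[n]
  }'
}

echo 'alpha beta gamma' | one_based
```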
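Interval expressions in regular expressions (e.g. {1,1}) are not supported by every implementation. gawk supports them (older gawk versions need them explicitly enabled with --re-interval), while the mawk below matches nothing: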
```
$ ps auxwww | gawk '{if($2~/^[0-9]{1,1}$/){print}}'
root 1 0.0 0.0 197064 4196 ? Ss Oct31 2:21 /usr/lib/systemd/systemd --switched-root --system --deserialize 24
root 4 0.0 0.0 0 0 ? S< Oct31 0:00 [kworker/0:0H]
$ ps auxwww | gawk --re-interval '{if($2~/^[0-9]{1,1}$/){print}}'
root 1 0.0 0.0 197064 4196 ? Ss Oct31 2:21 /usr/lib/systemd/systemd --switched-root --system --deserialize 24
root 4 0.0 0.0 0 0 ? S< Oct31 0:00 [kworker/0:0H]
$ ps auxwww | mawk '{if($2~/^[0-9]{1,1}$/){print}}'
$
```
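Where interval support is uncertain, the same selections can usually be spelled out without braces. In the sketch below, /^[0-9]$/ is equivalent to /^[0-9]{1,1}$/, and [0-9]{2,3} is written as [0-9][0-9][0-9]?. The ps-like sample lines are made up, not real ps output.

```shell
#!/bin/sh
# Made-up stand-in for ps output: USER PID CMD.
sample='USER PID CMD
root 1 init
root 42 sshd
root 4 kworker'

# Same selection as $2 ~ /^[0-9]{1,1}$/ : exactly one digit.
single_digit_pid() { awk '$2 ~ /^[0-9]$/'; }
# Same selection as $2 ~ /^[0-9]{2,3}$/ : two or three digits.
two_or_three_digit_pid() { awk '$2 ~ /^[0-9][0-9][0-9]?$/'; }

printf '%s\n' "$sample" | single_digit_pid         # root 1 init, root 4 kworker
printf '%s\n' "$sample" | two_or_three_digit_pid   # root 42 sshd
```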