Filter to delete all characters
from the input which are not letters.
tr -cd "A-Za-z"
-c == complement, -d == delete
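A minimal sketch of the filter in use. The function name letters_only is an assumption made for illustration; only the tr command comes from the notes.

```shell
# letters_only is a hypothetical helper name wrapping the tr filter above.
letters_only() {
    tr -cd 'A-Za-z'
}

# Digits, spaces, punctuation and the trailing newline are all deleted,
# so echo is used afterwards to end the output line.
printf 'Hello, World 123!\n' | letters_only
echo
```

Because the newline is not a letter it is removed as well, which is why the whole input comes out as one unterminated line.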
AWK: a utility described as a 'pattern scanning and processing language'. It is a complete programming language with a syntax resembling that of C.
Invoking awk
Running awk involves a script containing the commands which awk uses. The script can either be a string which is an argument to awk, or can be contained in a file.
The data is either read from STDIN (for example through a pipe), or is contained in one or more files given as arguments.
Data is divided into records
each of which is subdivided into fields
Cringle Chris 14 75 33
Smith Sam 56 59 45
pattern { action }
awk -f awkfile or awk '{ }'
Naming the fields
Shell script: $1, $2 name the arguments
AWK script : $1, $2 fields, $0 is record
print == echo
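A small sketch of how one record is split into fields, using a line of the sample data above. The helper name show_fields is an assumption for illustration.

```shell
# show_fields is a hypothetical helper: it prints the whole record ($0)
# and then its first two fields ($1 and $2).
show_fields() {
    awk '{ print "record: " $0
           print "first: " $1
           print "second: " $2 }'
}

printf 'Cringle Chris 14 75 33\n' | show_fields
```

The $1 inside the single quotes is never seen by the shell; it is awk that interprets it as the first field.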
Q) Script which will copy STDIN to STDOUT
A) { print $0 }
Q) Write a shell script which
runs awk to print out the first field of each line of standard input.
A) # This shell script prints the first field of each line of STDIN
awk '{ print $1 }'
Q) Input: hello there
Chris
Command: { print "(" $0 ")" }
A) (hello there)
(Chris)
Formatted Output
printf as shell utility
$ printf "Hello %s!\n" $LOGNAME
Hello Chris!
printf in awk
{ printf "The first field is %s\n", $1 }
araja:~/tmp$ awk '{ printf "The first field is %s\n", $1 }' < xyz
The first field is Cringle
The first field is Smith
Q) Write an awk script which, when given input in two columns representing a person's first name followed by their family name, will reverse the order of the names and separate them with a comma.
A) awk '{ printf "%s, %s\n", $2, $1 }'
or
awk '{ print $2 ", " $1 }'
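DATA (file vegetables: name, price per lb, lbs bought; the file itself holds just the three whitespace-separated columns, without the header row):
| Vegetable | Price per lb | lbs |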
| potatoes | 0.50 | 5 |
| carrots | 0.80 | 2.5 |
| peas | 2.20 | 1 |
| beans | 2.10 | 2 |
| artichokes | 8.50 | 0.5 |
| sweetcorn | 0.9 | 3 |
Q) $ awk '{ printf "%s %.2f %.1f\n", $1, $2, $3 }' <vegetables
A) potatoes 0.50 5.0
carrots 0.80 2.5
peas 2.20 1.0
-------
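The format specifiers can also carry a field width, which lines the columns up. A hedged sketch follows; the name veg_table and the widths chosen are assumptions, not part of the notes.

```shell
# veg_table is a hypothetical helper.
# %-12s : string, left-justified in 12 columns
# %6.2f : number with 2 decimals, right-justified in 6 columns
# %5.1f : number with 1 decimal, right-justified in 5 columns
veg_table() {
    awk '{ printf "%-12s %6.2f %5.1f\n", $1, $2, $3 }'
}

printf 'potatoes 0.50 5\npeas 2.20 1\n' | veg_table
```

Every output line is 25 characters wide (12 + 1 + 6 + 1 + 5), whatever the length of the vegetable name, as long as the name fits in 12 columns.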
Q) Evaluate the number of seconds in a day
A) awk '{ print 24*60*60 }'   # prints 86400 once for each line of input
Q) Write an awk script which will reformat the data in vegetables in the following format:
I bought 5.0 lbs of potatoes at 50c per lb.
A) { printf "I bought %.1f lbs of %s at %dc per lb.\n", $3, $1, 100*$2 }
Q) Write an awk script which uses the data in vegetables to calculate the total amount of money spent on each vegetable, printing it in the following format:
potatoes cost 2.50
A) { printf "%s cost %.2f\n", $1, $2*$3 }
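Q) Display the current year
A) $ date | awk '{ print $6 }'
or
$ date | cut -d ' ' -f6
or
$ date +"%Y"
(The first two assume the year is the sixth space-separated field of date's default output, which varies between systems; date +"%Y" does not depend on that layout.)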
Patterns
In the previous examples, we have performed actions on every line of input, by using a null pattern.
Two simple patterns: BEGIN, END
a) Copy STDIN to STDOUT and print messages.
BEGIN { print "Start of file" }   # done at the start
{ print $0 }                      # for each line of input
END { print "End of file" }       # done at the end
b) Print the cost per pound of every vegetable whose name commences with a vowel
/^[aeiou]/ { printf "%s costs %.2f per lb\n", $1, $2 }
Here the ERE (extended regular expression) is matched against the entire record.
c) Look for a match only in the first field (field number, then tilde, then the ERE).
$1 ~ /^[aeiou]/ { printf "%s costs %.2f per lb\n", $1, $2 }
d) These two commands do the same job:
grep -E 'ERE'
or
awk '/ERE/ { print $0 }'
Patterns can be expressions which evaluate to TRUE or FALSE
e) Print out the cost per pound of all expensive (>$1) vegetables
$2 > 1.00 { printf "%s costs %.2f per lb\n", $1, $2 }
Q) Print out the total costs
for vegetables only if that cost is at least 2.50.
A)
$2*$3 >= 2.50 { printf "%s cost %.2f\n", $1, $2*$3 }
Q) Print out the names of each vegetable purchased which either cost at most $1 per lb or for which less than 1 lb was purchased.
A)
$2 <= 1 || $3 < 1 { printf "%s\n", $1 }
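The kinds of pattern seen so far (BEGIN, END, an ERE match on a field, and a comparison) can sit side by side in one script; every pattern is tested against every record. A sketch under stated assumptions: the names veg_data and classify are hypothetical, and veg_data simply reproduces the vegetables data on STDOUT.

```shell
# veg_data is a hypothetical helper which prints the vegetables data.
veg_data() {
    cat <<'EOF'
potatoes 0.50 5
carrots 0.80 2.5
peas 2.20 1
beans 2.10 2
artichokes 8.50 0.5
sweetcorn 0.9 3
EOF
}

# classify is a hypothetical helper: a record matching two patterns
# triggers both actions.
classify() {
    awk '
    BEGIN           { print "checking vegetables" }
    $1 ~ /^[aeiou]/ { print $1, "starts with a vowel" }
    $2 > 1.00       { print $1, "costs over 1.00 per lb" }
    END             { print NR, "records read" }
    '
}

veg_data | classify
```

artichokes matches both middle patterns, so it is reported twice; potatoes, carrots and sweetcorn match neither and produce no output.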
Variables
The shell has variables, which we give names to and whose values we access by placing a dollar sign before their names.
In awk we also have variables and can assign values to them, but no dollar sign is needed.
Q) Require a total grocery bill
A) BEGIN { total = 0 }
{ total = total + $2*$3 }
END { printf "Total cost is %.2f\n", total }
Q) Average price
A) BEGIN { totalcost = 0; totalweight = 0 }
{ totalcost = totalcost + $2*$3 }      # cost; can use +=
{ totalweight = totalweight + $3 }     # weight
END { printf "Average cost is $%.2f per lb\n", totalcost/totalweight }
# The opening brace must be on the same line as END (and BEGIN).
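The same running total can be written more compactly with +=. In awk an unset variable counts as 0 in arithmetic, so the BEGIN initialisation is optional. A minimal sketch; the name total_bill is an assumption for illustration.

```shell
# total_bill is a hypothetical helper: it sums price * weight over all
# records. total starts out unset, which awk treats as 0.
total_bill() {
    awk '{ total += $2 * $3 }
         END { printf "Total cost is %.2f\n", total }'
}

printf 'potatoes 0.50 5\npeas 2.20 1\n' | total_bill
```

Keeping the explicit BEGIN { total = 0 } is still a reasonable choice when the script should document its variables.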
Special Variables
FILENAME   The pathname of the current input file
FS         Input field separator, usually SPACE
NF         Number of fields in the current record
NR         Number of the current record from the start of input
FNR        Number of the current record from the start of the current input file
OFS        Output field separator used by print, usually SPACE
ORS        Output record separator used by print, usually NEWLINE
Records are assumed to be single lines.
Q) Prepend each input line with its line number (like cat -n).
A) { printf "%6d %s\n", NR, $0 }
Q) Using awk, select the first three lines of standard input, in the manner of head -3
A) NR <= 3 { print $0 }
NR == 3 { exit }
The variable NR starts off with the value 1 on the first line of input, and continues counting across however many files are given as arguments. FNR is reset to 1 each time a new file is read. FILENAME contains the name of the current input file.
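The difference between NR and FNR only shows up with more than one input file. A sketch using two small temporary files; the file names one and two and the helper name count_demo are assumptions for illustration.

```shell
# count_demo is a hypothetical helper: for every record it prints the
# current file name, the overall record number and the per-file number.
count_demo() {
    awk '{ printf "%s NR=%d FNR=%d %s\n", FILENAME, NR, FNR, $0 }' "$@"
}

tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one"
printf 'c\nd\n' > "$tmpdir/two"
(cd "$tmpdir" && count_demo one two)
rm -rf "$tmpdir"
```

On the first line of the second file NR is 3 while FNR is back to 1.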
Q) Write an awk script firstlines which will read from a number of files and print out the first line of each file, preceded by the message "The first line of filename is:", in the following manner
$ awk -f firstlines vegetables /usr/dict/words
A) Use FNR to form the pattern to find the first line of each input file.
FNR == 1 { printf "The first line of %s is:\n%s\n", FILENAME, $0 }
$ awk '{ print NF }'
hello there
2
A B C D E
5
(blank line)
0
Control-D
Q) If some data in vegetables has been mistyped, there might be lines in the file which contain either fewer than or more than three fields. Such lines cannot be processed correctly by the previous awk scripts. Write an awk script which will read a file and print out a list of which lines contain a number of fields different from three.
A)
NF != 3 { printf "Line %d has %d fields\n", NR, NF }
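Since NF holds the number of fields, $NF names the last field of the record, however many fields there are. A short sketch; the helper name last_field is an assumption for illustration.

```shell
# last_field is a hypothetical helper: NF is the field count, so $NF is
# the value of the last field.
last_field() {
    awk '{ print NR ": " $NF " (" NF " fields)" }'
}

printf 'hello there\nA B C D E\n' | last_field
```

Note the difference between NF (a number) and $NF (the contents of the field with that number).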
Arguments to 'awk' scripts
Q) Shell script called price which would take one argument, representing a vegetable name, and interrogate the file vegetables as before to print out the total price paid for that vegetable.
A) awk '{ printf "%s %.2f\n", $1, $2*$3 }' vegetables | grep $1
# Evaluates the total cost for all vegetables and uses grep to filter out the single line of output from awk. In grep $1, $1 refers to the first argument of the shell script.
awk '/$1/ { printf "%s %.2f\n", $1, $2*$3 }' vegetables
won't work. Why? The single quotes stop the shell from substituting $1, so awk itself sees the ERE /$1/ instead of the vegetable name.
Alternative:
awk '{ if (veg == $1)    # here $1 is the first field of the record
printf "%s %.2f\n", $1, $2*$3 }' veg=$1 vegetables
# In veg=$1, $1 is the first argument to the shell script. The assignment sets veg before awk starts reading vegetables (it is not yet set inside a BEGIN action).
or
awk 'veg == $1 { printf "%s %.2f\n", $1, $2*$3 }' veg=$1 vegetables
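Another way to hand a shell value to awk is the -v option, which assigns the variable before the script starts, so it is also visible in BEGIN. This is an alternative to the veg=$1 operand used in the notes, not a replacement for it. A sketch, written as a shell function named price so that it can be tried without creating a script file; the function reads the data from STDIN, which is an assumption made to keep the example self-contained.

```shell
# price: prints the total paid for the vegetable named by the first
# argument. Outside the single quotes, "$1" is the argument of the
# function; inside them, $1 is the first field seen by awk.
price() {
    awk -v veg="$1" '$1 == veg { printf "%s %.2f\n", $1, $2 * $3 }'
}

printf 'potatoes 0.50 5\npeas 2.20 1\n' | price peas
```

Unlike the grep version, the comparison $1 == veg only matches the whole name, so asking for pea would print nothing.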
Q) Write a shell script which will take a single argument, representing a cost in cents, and print out the names of all vegetables listed in the file vegetables which cost more than that number of cents per lb.
A) if [ $# -ne 1 ]
then echo "One argument needed"
     exit 1
fi
awk '{ if ($2 * 100 > cost)
         printf "%s\n", $1 }' cost=$1 vegetables
exit 0
Arrays:
An associative array is a collection of variables which share a name; each variable in the array is selected by an index, which can be any string.
Q) Write an awk script which
will read as input a sequence of lines each containing the name of a month.
Output should be the name of the month read in followed by the number of
days in it.
A) BEGIN {
    daysin["January"] = 31; daysin["February"] = 28
    # ... and so on for the other months
    daysin["December"] = 31
}
{ printf "%s has %d days\n", $1, daysin[$1] }
Q) Calculate the average cost per pound for each vegetable
A) { costs[$1] += $2*$3; weights[$1] += $3 }
END { for (veg in costs)
        printf "%s: %.2f per pound\n", veg, costs[veg]/weights[veg] }
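A further sketch of an associative array indexed by strings: counting how often each word occurs. The helper name count_words is an assumption. The order in which for (w in seen) visits the indexes is unspecified, so the output is piped through sort.

```shell
# count_words is a hypothetical helper: seen[word] is created with the
# value 0 the first time a word is met, then incremented.
count_words() {
    awk '{ for (i = 1; i <= NF; i++) seen[$i]++ }
         END { for (w in seen) print w, seen[w] }' | sort
}

printf 'peas beans peas\nbeans peas\n' | count_words
```

No declaration or size is needed for the array; elements come into existence when they are first referenced.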
Q) To print the value of your path
A) BEGIN { printf "%s\n", ENVIRON["PATH"] }
Fields & Record Separators
FS= field separator
RS= record separator
/etc/passwd
chris:hyrwer9834j2k3409wer:1623:103:Chris Cringle:/cs/ugrad/chris:/bin/sh
sam.....
To display the passwd file when using NIS:
ypcat passwd
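The effect of FS can be seen by counting the fields of the sample passwd line twice, once with the default separator and once with a colon. The helper names fields_default and fields_colon are assumptions for illustration.

```shell
# Hypothetical helpers: both print NF, the number of fields per record.
fields_default() {
    awk '{ print NF }'                        # fields split on whitespace
}
fields_colon() {
    awk 'BEGIN { FS = ":" } { print NF }'     # fields split on colons
}

line='chris:hyrwer9834j2k3409wer:1623:103:Chris Cringle:/cs/ugrad/chris:/bin/sh'
printf '%s\n' "$line" | fields_default
printf '%s\n' "$line" | fields_colon
```

With the default separator the only split is at the space inside "Chris Cringle", giving 2 fields; with FS set to a colon there are 7, and the full name is $5.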
Q) Using awk and /etc/passwd, write a shell script findname which takes an argument, which is a usercode, and displays the name of the user who owns that usercode.
A) if [ $# -ne 1 ]
then echo "findname requires one argument"
     exit 1
fi
# Run awk with usercode set to the value of the first argument of the
# shell script, and read the data from /etc/passwd
awk '
BEGIN { FS=":" }
{ if ($1 == usercode)
    printf "%s\n", $5 }
' usercode=$1 </etc/passwd
Q) Write an awk script which
will read standard input containing a list of company names and phone numbers,
together with other information. All companies in the input which have
the keyword Anytown as part of their data should be printed out. The data
for each company should be separated by a single line containing a single
% symbol.
A) BEGIN { RS="%" }
/Anytown/ { print $0 }
print can be tailored to individual requirements by use of the output field and output record separators, OFS (default SPACE) and ORS (default NEWLINE).
Q) Write an awk script which
will read in the password file and output users names and home directories,
in the format
ABC has home directory /cs/ugrad/abc
A) awk 'BEGIN { FS=":"
             OFS=" has home directory "
             ORS=".\n" }
     { print $5, $6 }' </etc/passwd
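OFS is only inserted where the arguments of print are separated by commas; arguments written next to each other are simply concatenated. A sketch of the difference; the helper name join_demo and the separator strings are assumptions for illustration.

```shell
# join_demo is a hypothetical helper showing when OFS and ORS are used.
join_demo() {
    awk 'BEGIN { OFS = " -> "; ORS = ";\n" }
         { print $1, $2     # comma: the two values are joined by OFS
           print $1 $2 }    # no comma: plain concatenation, OFS is not used
        '
}

printf 'a b\n' | join_demo
```

Both print statements end their output with ORS, so each line here finishes with a semicolon.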
Functions:
sin(x), cos(x), rand(),
toupper(S), split(....), match(s, ERE) etc
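A sketch combining two of these functions: split breaks a string into an array and returns the number of pieces, and toupper converts to capitals. The helper name initials is an assumption, and substr (a standard awk function not listed above) is used to take the first character of each piece.

```shell
# initials is a hypothetical helper: it prints the capitalised first
# letter of every word on the line.
initials() {
    awk '{ n = split($0, part, " ")      # n pieces in part[1] .. part[n]
           out = ""
           for (i = 1; i <= n; i++)
               out = out toupper(substr(part[i], 1, 1))
           print out }'
}

printf 'chris cringle\n' | initials
```

Unlike C arrays, the array filled in by split is indexed from 1.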
Q) Write a script which will "roll a die". Each time a line of input is entered the script will print out a number between 1 and 6 to mimic someone throwing a die.
A) awk '{ printf "%d\n", int(rand()*6 + 1) }'
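In many awk implementations rand() produces the same sequence on every run unless the generator is seeded first. Calling srand() with no argument seeds it from the time of day. A sketch of a single roll; the helper name roll is an assumption for illustration.

```shell
# roll is a hypothetical helper: one throw of a die, done in a BEGIN
# action so that no input is needed.
roll() {
    awk 'BEGIN { srand()                     # seed from the time of day
                 print int(rand() * 6 + 1) } # rand() is >= 0 and < 1'
}

roll
```

Because the seed comes from the clock, two rolls made within the same second are likely to give the same number.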
Q) Write a script which will
read the password file and display each user's name in capitals
A) awk 'BEGIN { FS=":" }
{ print toupper($5) }' </etc/passwd
Q) Electrical retail company: free delivery for addresses whose zipcode starts with B followed by at least 1 digit. Outside this area, a flat rate is charged.
Customer data: invoice#, customer, road, town, zipcode
The company requires a document to instruct the delivery man which customers to visit and which to charge a delivery fee.
A) BEGIN { FS="," }
{ if (match($5, "^B[0-9]") > 0)
      fee = "no fee"
  else
      fee = "standard fee"
  printf "%s, %s, %s, %s: %s\n", $2, $3, $4, $5, fee
}