Lecture 5

Clarifications:
./myscript   versus exec myscript

Filter to delete all characters from the input which are not letters:
tr -cd "A-Za-z"   -c == complement the set, -d == delete
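
As a quick check of the filter above, the sketch below feeds it a short made-up string. The helper name letters_only is hypothetical and only used here for illustration. Note that the newline is deleted as well, since it is not a letter.

```shell
# letters_only is a hypothetical wrapper around the filter from the notes.
letters_only() {
    tr -cd 'A-Za-z'    # -c: complement the set, -d: delete what matches
}

printf 'Hello, World 42!\n' | letters_only
echo    # tr removed the newline too, so add one back for the terminal
```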

AWK: a utility described as a 'pattern scanning and processing language'. It is a complete programming language with a syntax resembling that of C.

Invoking awk

Running awk involves a script containing the commands which awk carries out. The script can either be a string which is an argument to awk, or can be contained in a file.

The data is either piped from STDIN, or is contained in one or more files given as args

Data is divided into records each of which is subdivided into fields
Cringle Chris 14 75 33
Smith Sam 56 59 45

pattern { action }

awk -f awkfile or awk '{ }'

Naming the fields

Shell script: $1, $2  name the arguments
AWK script : $1, $2 fields,   $0 is record

print in awk plays the role of echo in the shell.

Q) Script which will copy STDIN to STDOUT
A) { print $0 }
 

Q) Write a shell script which runs awk to print out the first field of each line of standard input.
A) #This shell script prints the first field of each line of STDIN
          awk '{ print $1 }'
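
To tie together the two ways of invoking awk and the field names, here is a minimal sketch using the two sample records from earlier in these notes. The function name sample and the temporary script file created with mktemp are assumptions made for the example.

```shell
# Two sample records from the notes, written to standard output.
sample() {
    printf '%s\n' 'Cringle Chris 14 75 33' 'Smith Sam 56 59 45'
}

# 1. The awk script given as a string argument.
inline_out=$(sample | awk '{ print $2, $1 }')

# 2. The same script stored in a file and loaded with -f.
awkfile=$(mktemp)
printf '%s\n' '{ print $2, $1 }' > "$awkfile"
file_out=$(sample | awk -f "$awkfile")
rm -f "$awkfile"

# Both invocations print the second field followed by the first field.
printf '%s\n' "$inline_out"
printf '%s\n' "$file_out"
```

Both runs should print Chris Cringle and then Sam Smith, since $2 is the second field and $1 the first.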

Q) Input:   hello there
            Chris
   Command: { print "(" $0 ")" }
A) (hello there)
   (Chris)
 

Formatted Output

printf as shell utility
$ printf "Hello %s!\n" $LOGNAME
Hello Chris!

printf as awk
{ printf "The first field is %s\n", $1 }

araja:~/tmp$ awk '{ printf "The first field is %s\n", $1 }' < xyz
The first field is Cringle
The first field is Smith
 

Q)Write an awk script which when given input in two columns, representing a person's first name followed by their family name will reverse the order of the names, and separate them with a comma.
A) awk '{ printf "%s, %s\n", $2, $1 }'
      or
      awk '{ print $2 ", " $1 }'

DATA:

Veges Price lbs
potatoes 0.50 5
carrots 0.80 2.5
peas 2.20 1
beans  2.10 2
artichokes 8.50 0.5
sweetcorn 0.9 3

Q) $ awk '{ printf "%s %.2f %.1f\n", $1, $2, $3 }' <vegetables
A) potatoes 0.50 5.0
   carrots 0.80 2.5
   peas 2.20 1.0
   (and so on for the remaining lines)

Q) Evaluate number of secs in a day
A) awk 'BEGIN { print 24*60*60 }'     # BEGIN lets awk print without waiting for input

Q) Write an awk script which will reformat the data in vegetables in the following format:
I bought 5.0 lbs of potatoes at 50c per lb.
A) { printf "I bought %.1f lbs of %s at %dc per lb.\n", $3, $1, 100*$2 }

Q) Write an awk script which uses the data in vegetables to calculate the total amount of money spent on each vegetable, printing it in a following format:
potatoes cost 2.50
A) { printf "%s cost %.2f\n", $1, $2*$3 }
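
The cost calculation can be tried out as follows. This is a sketch that assumes the file vegetables holds only the six data lines (no header line); here the data is produced by a hypothetical function vegetables instead of a file, so the example is self-contained.

```shell
# Data lines copied from the notes; the header line is left out on purpose.
vegetables() {
    cat <<'EOF'
potatoes 0.50 5
carrots 0.80 2.5
peas 2.20 1
beans 2.10 2
artichokes 8.50 0.5
sweetcorn 0.9 3
EOF
}

# Price per lb ($2) times weight in lbs ($3), printed to two decimal places.
vegetables | awk '{ printf "%s cost %.2f\n", $1, $2*$3 }'
```

The first line printed should be potatoes cost 2.50, matching the format asked for in the question.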

Q) Display the current year
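A) $ date | awk '{ print $6 }'
      or
      $ date | cut -d ' ' -f6
      $ date +"%Y"
      (The first two assume the year is the sixth field of date's output. cut also counts the empty field created when a single-digit day is padded with two spaces, so date +"%Y" is the most reliable.)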

Patterns

In the previous examples, we have performed actions on every line of input, by using a null pattern.

2 simple patterns: BEGIN,  END
 

a) Copy STDIN to STDOUT and print messages.
BEGIN { print "Start of file" } # Done at the start
{ print $0 }                                  # for each line of input
END { print "End of file" }        # done at the end

b) Print the cost per pound of every vegetable whose name commences with a vowel
/^[aeiou]/ { printf "%s  costs %.2f per lb\n", $1, $2 }
Here the ERE is matched against the entire record ($0).

c) Look for a match only in the first field, using the match operator ~ (tilde) after the field name.
$1 ~ /^[aeiou]/ { printf "%s costs %.2f per lb\n", $1, $2 }
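
The difference between matching the whole record and matching one field shows up when a line has leading spaces. The sketch below uses made-up data (the onions line, with two leading spaces, is not from the vegetables file) and the hypothetical function name data.

```shell
# Made-up input: the second line starts with two spaces.
data() {
    printf '%s\n' 'artichokes 8.50 0.5' '  onions 0.60 2' 'peas 2.20 1'
}

# Matched against the whole record: the leading spaces stop ^ from matching.
record_match=$(data | awk '/^[aeiou]/ { print $1 }')

# Matched against the first field only: leading spaces are not part of $1.
field_match=$(data | awk '$1 ~ /^[aeiou]/ { print $1 }')

echo "record match:" $record_match
echo "field match:" $field_match
```

With the default field separator, leading whitespace is skipped when the record is split, so only the field test finds onions.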

d)
grep -E 'ERE'
or
awk '/ERE/ {print $0 }'
 

Patterns can be expressions which evaluate to TRUE or FALSE
e) Print out the cost per pound of all expensive (>$1) vegetables
$2 > 1.00 { printf "%s costs %.2f per lb\n", $1, $2 }
 

Q) Print out the total costs for vegetables only if that cost is at least 2.50.
A)
      $2*$3 >= 2.50 { printf "%s cost %.2f\n", $1, $2*$3 }

Q) Print out the names of each vege purchased which  either cost at most $1 per lb or for which less than 1 lb was purchased.
A)
 $2 <= 1 || $3 < 1 { printf "%s\n", $1 }
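
A pattern built from two comparisons joined by || can be checked against the sample data like this. It is a sketch only: the function names vegetables and cheap_or_small are hypothetical, and the data is assumed to have no header line.

```shell
# Data lines copied from the notes, without the header line.
vegetables() {
    cat <<'EOF'
potatoes 0.50 5
carrots 0.80 2.5
peas 2.20 1
beans 2.10 2
artichokes 8.50 0.5
sweetcorn 0.9 3
EOF
}

# Select lines where the price is at most 1 per lb OR the weight is under 1 lb.
cheap_or_small() {
    awk '$2 <= 1 || $3 < 1 { printf "%s\n", $1 }'
}

vegetables | cheap_or_small
```

peas and beans are left out: both cost more than 1 per lb and neither weighs less than 1 lb.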

Variables

The shell has variables, which we give names to, and we access their values by placing a dollar sign before their names.
In awk we also have variables and can assign values to them, but no dollar sign is needed to access them.

Q) Require a total grocery bill
A) BEGIN { total = 0 }
      { total = total + $2*$3 }
      END { printf "Total cost is %.2f\n", total}

Q) Average price
A) BEGIN { totalcost = 0
                       totalweight = 0 }
      { totalcost = totalcost + $2*$3 }            #Cost, can use +=
       { totalweight = totalweight + $3 }        #Weight
    END {printf "Average cost is $%.2f per lb\n" ,totalcost/totalweight }
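
The same calculation can be written with += and without the BEGIN block, because awk variables start out as zero. The sketch below also guards against dividing by zero on empty input, which is an addition not in the notes. The function name average_cost is hypothetical.

```shell
# Reads "name price weight" lines on standard input.
average_cost() {
    awk '
        { totalcost += $2 * $3; totalweight += $3 }
        END {
            if (totalweight > 0)
                printf "Average cost is $%.2f per lb\n", totalcost / totalweight
            else
                print "No data"
        }'
}

# Two lines from the sample data: total cost 4.50 over 7.5 lbs.
printf '%s\n' 'potatoes 0.50 5' 'carrots 0.80 2.5' | average_cost
```

For these two lines the result is 4.50 / 7.5, so the script prints an average of $0.60 per lb.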

Special Variables

FILENAME  The pathname of the current input file
FS    Input field separator, usually SPACE
NF    Number of fields in current record.
NR   Number of the current record, counted from the start of input
FNR  Number of the current record, counted from the start of the current input file
OFS  Output field separator used by print, usually SPACE
ORS  Output record separator used by print, usually NEWLINE

Records are assumed to be single lines.
Q) Prepend each input line with its line number (like cat -n).
A) { printf "%6d %s\n", NR, $0 }

Q) Using awk, select the first three lines of standard input, in the manner of head -3
A) NR <= 3 { print $0 }
     NR == 3 { exit }
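
A small sketch comparing this script with head itself, on made-up input. The function name first_three is hypothetical.

```shell
# Print the first three records, then stop reading input.
first_three() {
    awk '
        NR <= 3 { print $0 }
        NR == 3 { exit }'
}

printf '%s\n' one two three four five | first_three
```

The exit on the third record is what stops awk from reading the rest of the input, so the script also finishes quickly on long files.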

The variable NR has the value 1 on the first line of input, and continues counting through however many files are given as args. FNR is reset to 1 each time a new file is read. FILENAME contains the name of the current input file.

Q) Write an awk script firstlines which will read from a number of files and print out the first line of each file preceded by the message The first line of filename is: in the following manner
$ awk -f firstlines vegetables /usr/dict/words
A)  Use FNR to form the pattern to find the first line of each input file.
FNR == 1 {printf "The first line of %s is:\n%s\n", FILENAME, $0 }
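
To see NR, FNR and FILENAME side by side, the sketch below builds two small files in a temporary directory. The file contents are made up and the second file is named words as a stand-in for /usr/dict/words, which may not exist on every system.

```shell
# Create two small input files in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'potatoes 0.50 5' 'carrots 0.80 2.5' > "$tmpdir/vegetables"
printf '%s\n' 'aardvark' 'abacus' > "$tmpdir/words"

# FNR == 1 is true once per file, on its first line.
first_lines=$(cd "$tmpdir" && awk 'FNR == 1 { printf "The first line of %s is:\n%s\n", FILENAME, $0 }' vegetables words)

# NR keeps counting across files, FNR restarts at 1 for each file.
counters=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR }' vegetables words)

printf '%s\n' "$first_lines"
printf '%s\n' "$counters"
rm -rf "$tmpdir"
```

In the second listing the first line of words shows NR as 3 but FNR as 1.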

$ awk '{ print NF }'
hello there
2
A B C D E
5
(blank line)
0
Control-D

Q) If some data in vegetables has been mistyped, there might be lines in the file which contain either fewer than or more than three fields. Such lines cannot be processed correctly by the previous awk scripts. Write an awk script which will read a file and print out a list of which lines contain a number of fields other than three.
A)     NF != 3 { printf "Line %d has %d fields\n", NR, NF }
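
A short sketch of this check on deliberately broken input. The bad lines are made up for the example and the function name check_fields is hypothetical.

```shell
# Report every line that does not have exactly three fields.
check_fields() {
    awk 'NF != 3 { printf "Line %d has %d fields\n", NR, NF }'
}

# Line 2 is missing its weight, line 3 has an extra word.
printf '%s\n' 'potatoes 0.50 5' 'carrots 0.80' 'peas 2.20 1 extra' | check_fields
```

Correct lines produce no output, so an empty result means every line has three fields.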

Arguments to 'awk' scripts

Q) Shell script called price which would take one argument, representing a vegetable name, and interrogate the file vegetables as before to print out the total price paid for that vegetable.
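A)  awk '{ printf "%s %.2f\n", $1, $2*$3 }' vegetables | grep $1
      #awk evaluates the total cost for all veges; grep then filters out the single line wanted from awk's output. In grep $1, $1 refers to the first arg of the shell script.

      awk '/$1/ { printf "%s %.2f\n", $1, $2*$3 }' vegetables  won't work. Why? Inside single quotes the shell does not replace $1 with the script's argument, so awk sees the ERE $1 (end of line followed by the character 1), which never matches.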

Alternative:
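      awk '{ if (veg == $1)          #here $1 is the first field of the record
           printf "%s %.2f\n", $1, $2*$3 }' veg=$1 vegetables #here $1 is the first arg to the script; veg=$1 sets veg before vegetables is read (but after any BEGIN action)

or, with the test as a pattern (the { must be on the same line as the pattern):
awk ' veg == $1 { printf "%s %.2f\n", $1, $2*$3 }' veg=$1 vegetables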
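
Another way to hand a shell value to awk is the -v option, which assigns the variable before the awk program starts (so it is also visible in BEGIN). The sketch below uses -v where the notes use a trailing veg=$1. The function names price and vegetables are hypothetical and the data comes from a function instead of the file.

```shell
# Data lines copied from the notes, without the header line.
vegetables() {
    cat <<'EOF'
potatoes 0.50 5
carrots 0.80 2.5
peas 2.20 1
EOF
}

price() {
    # Outside the quotes, "$1" is this function's first argument.
    # Inside the quoted awk program, $1 is the first field of the record.
    awk -v veg="$1" 'veg == $1 { printf "%s %.2f\n", $1, $2*$3 }'
}

vegetables | price carrots

# For contrast: the quoted /$1/ is an ERE that cannot match, so nothing prints.
vegetables | awk '/$1/ { print $0 }'
```

Only the carrots line is printed by the first command. The second command prints nothing.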

Q)Write a shell script which will take a single argument, representing cost in cents, and print out the names of all vegetables listed in file vegetables which cost more than that number of cents per lb.
A) if [ $# -ne 1 ]
     then echo "One argument needed"
              exit 1
    fi
    awk '{ if ($2 * 100 > cost)
              printf "%s\n", $1 }' cost=$1 vegetables
    exit 0

Arrays  :
An associative array is a collection of variables which has a name; each variable in the array has an index, which may be a string.

Q) Write an awk script which will read as input a sequence of lines each containing the name of a month. Output should be the name of the month read in followed by the number of days in it.
A) BEGIN {
         daysin["January"] = 31; daysin["February"] = 28
         # ... and so on for the other months ...
         daysin["December"] = 31
     }
    { printf "%s has %d days\n", $1, daysin[$1] }

Q) Calculate the average cost per pound for each vegetable
A) { costs[$1] += $2*$3; weights[$1] += $3 }
     END   { for (veg in costs)
                      printf "%s: $%.2f per pound\n", veg, costs[veg]/weights[veg] }
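
Arrays indexed by vegetable name pay off when the same vegetable appears on more than one line. In the sketch below the second potatoes line is made up, and the function name per_veg_average is hypothetical. The order in which for (veg in costs) visits the indices is not defined, so the output is piped through sort to make it predictable.

```shell
# Accumulate cost and weight per vegetable name, then print one line per name.
per_veg_average() {
    awk '
        { costs[$1] += $2 * $3; weights[$1] += $3 }
        END {
            for (veg in costs)
                printf "%s: %.2f per pound\n", veg, costs[veg] / weights[veg]
        }' | sort
}

# potatoes: (2.50 + 3.50) / 10 lbs; carrots: 2.00 / 2.5 lbs
printf '%s\n' 'potatoes 0.50 5' 'carrots 0.80 2.5' 'potatoes 0.70 5' | per_veg_average
```

The two potatoes lines are combined into a single average of 0.60 per pound, and carrots come out at 0.80 per pound.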

Q)To print the value of your path
A) awk 'BEGIN { printf "%s\n", ENVIRON["PATH"] }'

Fields & Record Separators

FS= field separator
RS= record separator

/etc/passwd
chris:hyrwer9834j2k3409wer:1623:103:Chris Cringle:/cs/ugrad/chris:/bin/sh
sam.....

To display passwd file using NIS
ypcat passwd

Q) Using awk and /etc/passwd write a shell script findname which takes an argument, which is a usercode, and displays the name of the user who owns that usercode.
A)  if [ $# -ne 1 ]
          then echo "findname requires one argument"
          exit 1
      fi

      awk '
           BEGIN { FS = ":" }
           {
                if ($1 == usercode)
                    printf "%s\n", $5 }

     ' usercode=$1 </etc/passwd #Run awk with usercode set to the value of the first arg of the shell script, and read the data from /etc/passwd
 

Q) Write an awk script which will read standard input containing a list of company names and phone numbers, together with other information. All companies in the input which have the keyword Anytown as part of their data should be printed out. The data for each company should be separated by a single line containing a single % symbol.
A) BEGIN { RS="%" }
      /Anytown/ { print $0 }
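
A sketch of multi-line records in action. The company entries are invented for the example, and the function names companies and in_anytown are hypothetical.

```shell
# Made-up input: each company's data ends with a line holding a single %.
companies() {
    cat <<'EOF'
Acme Tools
12 High Street, Anytown
555-0100
%
Bolt Bros
3 Mill Lane, Othertown
555-0199
%
EOF
}

# With RS set to %, a whole company entry is one record ($0 spans several lines).
in_anytown() {
    awk '
        BEGIN { RS = "%" }
        /Anytown/ { print $0 }'
}

companies | in_anytown
```

Only the Acme Tools entry is printed. Each record keeps the newline that preceded its % line, so print adds one more blank line after the entry.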
 

print can be tailored to individual requirements by use of the output field separator OFS (default: space) and the output record separator ORS (default: \n).

Q) Write an awk script which will read in the password file and output users names and home directories, in the format
ABC has home directory /cs/ugrad/abc.
A) awk ' BEGIN { FS=":"
                                OFS=" has home directory "
                                 ORS=".\n" }
      { print $5,$6 }' </etc/passwd
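
The effect of OFS and ORS can be seen without touching the real password file. The two entries below are made up in the /etc/passwd layout (the user pat is hypothetical), and passwd_sample and home_dirs are hypothetical function names.

```shell
# Two invented lines in /etc/passwd format, fields separated by colons.
passwd_sample() {
    printf '%s\n' \
        'chris:x:1623:103:Chris Cringle:/cs/ugrad/chris:/bin/sh' \
        'pat:x:1700:103:Pat Example:/cs/ugrad/pat:/bin/sh'
}

# print puts OFS between its comma-separated arguments and ORS after them.
home_dirs() {
    awk '
        BEGIN { FS = ":"; OFS = " has home directory "; ORS = ".\n" }
        { print $5, $6 }'
}

passwd_sample | home_dirs
```

The comma in print $5, $6 is what inserts OFS. Writing print $5 $6 would join the two fields with nothing in between.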

Functions:
sin(x), cos(x), rand(), toupper(S), split(....), match(s, ERE) etc

Q) Write a script which will "roll a die". Each time a line of input is entered the script will print out a number between 1 and 6 to mimic someone throwing a die
A) awk '{ printf "%d\n", int(rand()*6 + 1) }'
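
Many awk implementations start rand() from the same seed on every run, so a script using rand() alone can repeat the same rolls each time it is started. The sketch below adds a call to srand() in BEGIN, which seeds the generator from the time of day. This is an addition to the answer in the notes, and roll is a hypothetical function name.

```shell
# Print one die roll (1 to 6) for every line of input.
roll() {
    awk '
        BEGIN { srand() }                      # seed from the current time
        { printf "%d\n", int(rand() * 6 + 1) } # rand() is >= 0 and < 1
    '
}

# Three input lines give three rolls.
printf '%s\n' a b c | roll
```

Because rand() never returns 1, int(rand()*6 + 1) is always a whole number from 1 to 6.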

Q) Write a script which will read the password file and display each user's name in capitals
A)  awk ' BEGIN {FS=":" }
              { print toupper($5) }' </etc/passwd

Q) Electrical retail company:
      Free delivery for addresses whose zipcode starts with B followed by at least 1 digit.
      Outside this area, flat rate
Customer data
invoice#, customer, road, town, zipcode
The company requires a document to instruct the delivery man which customers to visit and which to charge delivery fee
A) BEGIN { FS="," }
      { if ( match($5, "^B[0-9]") > 0)
           fee = "no fee"
         else
           fee = "standard fee"
         printf "%s, %s, %s, %s: %s\n", $2, $3, $4, $5, fee
     }
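
A sketch of the delivery list on made-up customer data. The two invoice lines are invented, with no spaces after the commas so that the fields carry no leading blanks, and delivery_list is a hypothetical function name.

```shell
# Fields: invoice#, customer, road, town, zipcode (comma separated).
delivery_list() {
    awk '
        BEGIN { FS = "," }
        {
            # match() returns the position of the match, or 0 if there is none.
            if (match($5, /^B[0-9]/) > 0)
                fee = "no fee"
            else
                fee = "standard fee"
            printf "%s, %s, %s, %s: %s\n", $2, $3, $4, $5, fee
        }'
}

printf '%s\n' \
    '1001,J Smith,4 Oak Road,Bigtown,B12' \
    '1002,A Jones,9 Elm Road,Farville,C3' | delivery_list
```

The first customer has a zipcode starting with B and a digit and so gets no fee. The second is charged the standard fee.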