awk Useful one-liners - calculating average from a CSV etc
Compute the median of values in a column from tabular data


Example

Given a file that uses ; as the column delimiter, we compute the median of the values in the second column with the following program, written for GNU awk. The input is the list of grades of a group of students:

gawk -F';' '{ sample[NR] = $2 }
 END {
   asort(sample);
   if(NR % 2 == 1) {
     print(sample[int(NR/2) + 1])
   } else {
     print(sample[NR/2])
   }
}' <<EOF
Alice;2
Victor;1
Barbara;1
Casper;4
Deborah;0
Ernest;1
Fabiola;4
Giuseppe;4
EOF

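The output of this program is 1. The sample has an even number of values (8), so the program prints the lower of the two middle values of the sorted list 0, 1, 1, 1, 2, 4, 4, 4. The conventional median, the mean of the two middle values 1 and 2, would be 1.5.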

Remember that NR holds the number of the line being processed; in the END block it therefore holds the total number of lines read.
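The behaviour of NR can be observed with a small sketch that prints NR for every line and once more in the END block. The shell function name show_nr is made up for this illustration.

```shell
# show_nr is a hypothetical wrapper name used only for this sketch.
show_nr() {
  # Print the running line number next to each line, then the final count.
  awk '{ print NR ": " $0 } END { print "total: " NR }' "$@"
}

printf 'a\nb\nc\n' | show_nr
```

This prints 1: a, 2: b and 3: c, followed by total: 3. Every line read is counted, including empty ones, so stray blank lines in the input shift the position that the median program picks.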

The asort function is a GNU awk extension. Many other implementations of awk do not have a function to sort arrays, so one needs to be defined before the code above can be used.
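One possible way to do this, shown here only as a sketch, is to define a small insertion sort in awk itself and call it in place of asort. The names isort and median_portable are made up for this illustration. The sort compares values numerically, and the program follows the same rule as the GNU awk program above, printing the lower middle value when the count is even.

```shell
# median_portable is a hypothetical wrapper name used only for this sketch.
median_portable() {
  awk -F';' '
  # isort: hypothetical helper, in-place insertion sort of arr[1..n],
  # comparing the values as numbers. i, j and tmp are local variables.
  function isort(arr, n,    i, j, tmp) {
    for (i = 2; i <= n; i++) {
      tmp = arr[i]
      # shift larger values one slot to the right
      for (j = i - 1; j >= 1 && arr[j] + 0 > tmp + 0; j--)
        arr[j + 1] = arr[j]
      arr[j + 1] = tmp
    }
  }
  { sample[NR] = $2 }
  END {
    isort(sample, NR)
    if (NR % 2 == 1) {
      print sample[int(NR / 2) + 1]
    } else {
      print sample[NR / 2]
    }
  }' "$@"
}

median_portable <<EOF
Alice;2
Victor;1
Barbara;1
Casper;4
Deborah;0
Ernest;1
Fabiola;4
Giuseppe;4
EOF
```

With the grades above this prints 1, the same result as the GNU awk program. Insertion sort takes quadratic time in the worst case, which is acceptable for small inputs such as this one; for large files a faster algorithm would be the better choice.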