Extract the status code and hour and count the unique occurrences

November 28, 2023

Content

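Let’s find all of the unique HTTP status codes in an Apache web server log file named access.log. In this log format the status code is the ninth whitespace-separated field of each line, so print field $9 with the awk command, then pipe the result through sort and uniq to remove duplicates.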

$ tail -1 access.log
18.19.20.21 - - [19/Apr/2014:19:51:20 -0400] "GET / HTTP/1.1" 200 7136 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36"
$ tail -1 access.log | awk '{print $9}'
200
$ awk '{print $9}' access.log | sort | uniq
200
301
302
404
$
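Field $9 is the status code only when the quoted request contains exactly three space-separated tokens (method, path, protocol). Here is a minimal sketch of a more defensive variant, assuming the combined log format shown above: split each line on double quotes, so the text after the quoted request always begins with the status code. The second sample line and the temporary file are invented for illustration and do not come from the original log.

```shell
# Build a two-line sample log. The second line (an empty "-" request)
# is a made-up example of a request that shifts the field positions.
log=$(mktemp)
printf '%s\n' \
  '18.19.20.21 - - [19/Apr/2014:19:51:20 -0400] "GET / HTTP/1.1" 200 7136 "-" "Mozilla/5.0"' \
  '192.0.2.7 - - [19/Apr/2014:19:52:01 -0400] "-" 408 0 "-" "-"' > "$log"

# Whitespace splitting: $9 is not the status code on the second line.
naive=$(awk '{print $9}' "$log")

# Splitting on double quotes: field 3 is " STATUS BYTES ", so its first
# whitespace-separated token is the status code on both lines.
robust=$(awk -F'"' '{split($3, part, " "); print part[1]}' "$log")

printf 'naive:\n%s\nrobust:\n%s\n' "$naive" "$robust"
rm -f "$log"
```

For a log made up only of well-formed requests, both approaches give the same result, and the shorter $9 form is fine.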

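Let’s take it a step further and count how many of each status code we have. The -c option makes uniq prefix each line with its number of occurrences, and sort -nr orders the result numerically, largest first.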

$ awk '{print $9}' access.log | sort | uniq -c | sort -nr
5641 200
207 301
86 404
18 302
2 304
$
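The same tally can also be built inside awk with an associative array, which avoids sorting the whole log before counting. This is only a sketch of an alternative, not a replacement for the pipeline above. The sample lines, addresses and temporary file are invented for illustration.

```shell
# Build a small sample log (invented lines, for illustration only).
log=$(mktemp)
printf '%s\n' \
  '192.0.2.1 - - [19/Apr/2014:10:00:01 -0400] "GET / HTTP/1.1" 200 7136 "-" "-"' \
  '192.0.2.2 - - [19/Apr/2014:10:00:02 -0400] "GET /a HTTP/1.1" 404 512 "-" "-"' \
  '192.0.2.1 - - [19/Apr/2014:10:00:03 -0400] "GET /b HTTP/1.1" 200 1024 "-" "-"' > "$log"

# count[] is keyed by status code; the END block prints "count code"
# pairs, and sort -nr puts the most frequent code first.
counts=$(awk '{count[$9]++} END {for (code in count) print count[code], code}' "$log" | sort -nr)

echo "$counts"
rm -f "$log"
```

Only the handful of distinct status codes reaches the final sort, instead of every line of the log.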

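Now let’s extract the status code and hour from the access.log file and count the occurrences of each combination, then sort them by count. Here awk prints the status code and the timestamp field, and cut keeps characters 1-4 (the status code and a space) and 18-19 (the hour). Note that uniq -c only merges adjacent identical lines, and there is no sort in front of it here, so each row counts one uninterrupted run of requests with the same status code and hour; that is why 200 06 appears on several rows. The largest runs still show the hours during which the website was most active. To get a single total per combination, add sort before uniq -c.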

$ awk '{print $9, $4}' access.log | cut -c 1-4,18-19 | uniq -c | sort -n | tail
         72 200 09
         76 200 06
         81 200 06
         82 200 06
         83 200 06
         83 200 06
         84 200 06
        109 200 20
        122 200 20
        383 200 10
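
To get one total per status code and hour, the pairs need to be sorted before uniq -c sees them. Below is a minimal sketch of that variant. It takes the hour by splitting the timestamp field on colons rather than by character position, so it does not depend on the width of the status code or the day. The sample lines, addresses and temporary file are invented for illustration.

```shell
# Build a small chronological sample log (invented lines).
log=$(mktemp)
printf '%s\n' \
  '192.0.2.1 - - [19/Apr/2014:06:10:00 -0400] "GET / HTTP/1.1" 200 7136 "-" "-"' \
  '192.0.2.2 - - [19/Apr/2014:06:11:00 -0400] "GET /x HTTP/1.1" 404 512 "-" "-"' \
  '192.0.2.3 - - [19/Apr/2014:06:12:00 -0400] "GET / HTTP/1.1" 200 7136 "-" "-"' \
  '192.0.2.1 - - [19/Apr/2014:20:30:00 -0400] "GET / HTTP/1.1" 200 7136 "-" "-"' \
  '192.0.2.4 - - [20/Apr/2014:06:01:00 -0400] "GET / HTTP/1.1" 200 7136 "-" "-"' \
  '192.0.2.5 - - [20/Apr/2014:20:05:00 -0400] "GET / HTTP/1.1" 200 7136 "-" "-"' > "$log"

# $4 looks like "[19/Apr/2014:06:10:00"; splitting it on ":" leaves the
# hour in t[2]. Sorting before uniq -c makes identical pairs adjacent,
# so each status/hour combination is counted exactly once.
totals=$(awk '{split($4, t, ":"); print $9, t[2]}' "$log" | sort | uniq -c | sort -n)

echo "$totals"
rm -f "$log"
```

Without the first sort, this sample would produce six rows with a count of 1 each, because no two neighbouring lines share the same status code and hour.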