Today I needed to take an HTTP response and extract the
etag header; a service I was using relied on the etag, and I wanted to script
an update to a resource. I was doing this in a
Makefile, so I wanted to do it without firing
up a scripting language.
It turns out this is the domain of tools like sed.
sed stands for stream
editor. It applies scripts to text streams which edit the content of the
stream. When you watch someone using
sed, the scripts look super-cryptic,
but in fact they’re not too bad. Like a regular expression, they benefit from
reading left to right; when viewed as a whole they are just a mess. In fact,
half of a
sed script is often a regular expression!
First, we’ll get the HTTP headers to work with. I found a curl option that was new to me,
-D <filename>, that will do this for you. So to get the headers for dx13.co.uk:
curl -D headers.txt https://dx13.co.uk
There are quite a lot of headers that come with a call to dx13.co.uk, so I
trimmed most of them from the end to leave something a bit shorter to work
with, which doesn’t affect the
sed commands at all. I left us with:
> cat headers.txt
HTTP/2 200
server: GitHub.com
content-type: text/html; charset=utf-8
last-modified: Tue, 06 Nov 2018 15:58:30 GMT
etag: "5be1ba26-a9dd"
access-control-allow-origin: *
expires: Fri, 22 Mar 2019 14:03:49 GMT
cache-control: max-age=600
x-github-request-id: 6F9E:2F59:86E637:B2E922:5C94E8ED
We’ll come to executing scripts in a minute. First, we’ll get familiar with what a script looks like. The basic form is:
[addr]X[options]
addr selects a set of lines to operate on. It can be a single line, a line range or a regular expression. Invert the selection by putting ! at the end of the address. If addr is omitted, the command is executed on all lines of the file.
X is a command (like d or s).
options are options to the command. For example, s has the option of a find/replace specification.
Some examples:
'14d': the address is line 14; the command d removes the line; no options are used. This removes line 14 of the input.
'/:/d': the address is the regex :; the command d removes the lines; no options are used. This will remove lines containing : from the input.
's/^.*: /foo! /': the address is all lines; the command is s; the option is the find/replace specification. We’ll see what this does later.
I found the
s command familiar – it’s just like vim’s.
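To make the address forms concrete, here is a minimal sketch run against a throwaway five-line file rather than headers.txt. The file contents and the variable names (sample, single, range, regex) are made up for illustration; 2,4d uses the start,end form of a line range.

```shell
# Build a small sample file so the example does not depend on headers.txt.
sample=$(mktemp)
printf '%s\n' 'one' 'two: b' 'three' 'four: d' 'five' > "$sample"

# Single-line address: delete line 2 only.
single=$(sed '2d' "$sample")

# Line-range address: delete lines 2 through 4.
range=$(sed '2,4d' "$sample")

# Regex address: delete every line containing a colon.
regex=$(sed '/:/d' "$sample")

printf 'single:\n%s\n\nrange:\n%s\n\nregex:\n%s\n' "$single" "$range" "$regex"

rm -f "$sample"
```

In each case the command is the same d; only the address in front of it changes which lines it touches.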
By default, sed applies its first argument as a script, treats the second as the input
file, and outputs to stdout.
A simple script is a vim-like search and replace. Here, we replace the header names with foo!:
> sed 's/^.*: /foo! /' headers.txt
HTTP/2 200
foo! GitHub.com
foo! text/html; charset=utf-8
foo! Tue, 06 Nov 2018 15:58:30 GMT
foo! "5be1ba26-a9dd"
foo! *
foo! Fri, 22 Mar 2019 14:03:49 GMT
foo! max-age=600
foo! 6F9E:2F59:86E637:B2E922:5C94E8ED
As we head straight to the
s command and don’t specify an address, the command
is executed on all lines of the file.
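For contrast, the same s command can be given an address so that it only runs on some lines. This is a small sketch, assuming a shortened copy of the headers in a temporary file; the variable names (headers, all_lines, etag_only) are only for illustration.

```shell
# A shortened copy of the headers, written to a temporary file.
headers=$(mktemp)
printf '%s\n' \
  'HTTP/2 200' \
  'server: GitHub.com' \
  'etag: "5be1ba26-a9dd"' \
  'cache-control: max-age=600' > "$headers"

# No address: the substitution is attempted on every line.
all_lines=$(sed 's/^.*: /foo! /' "$headers")

# Regex address /^etag/: the substitution only runs on lines starting with "etag".
etag_only=$(sed '/^etag/s/^.*: /foo! /' "$headers")

printf 'no address:\n%s\n\nwith /^etag/ address:\n%s\n' "$all_lines" "$etag_only"

rm -f "$headers"
```

The status line is unchanged in both runs because it contains no colon followed by a space, so the pattern never matches it.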
By using the
-e flag, multiple scripts can be chained. You can also use one
big script string with semi-colons, but I find multiple
-e flags easier to read.
Replace header names with
foo! as above, then replace foo with bar:
> sed -e 's/^.*: /foo! /' -e 's/foo/bar/' headers.txt
HTTP/2 200
bar! GitHub.com
bar! text/html; charset=utf-8
bar! Tue, 06 Nov 2018 15:58:30 GMT
bar! "5be1ba26-a9dd"
bar! *
bar! Fri, 22 Mar 2019 14:03:49 GMT
bar! max-age=600
bar! 6F9E:2F59:86E637:B2E922:5C94E8ED
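The two styles of chaining can be compared side by side. The sketch below assumes a three-line sample file; the variable names (headers, chained, joined) are made up for the example.

```shell
# A three-line sample of the headers in a temporary file.
headers=$(mktemp)
printf '%s\n' \
  'HTTP/2 200' \
  'server: GitHub.com' \
  'etag: "5be1ba26-a9dd"' > "$headers"

# Two scripts chained with separate -e flags.
chained=$(sed -e 's/^.*: /foo! /' -e 's/foo/bar/' "$headers")

# The same two commands in one script string, separated by a semi-colon.
joined=$(sed 's/^.*: /foo! /;s/foo/bar/' "$headers")

if [ "$chained" = "$joined" ]; then
  echo "both forms produce the same output:"
  printf '%s\n' "$chained"
fi

rm -f "$headers"
```

Either way, each command runs in order on every line, so the second substitution sees the output of the first.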
As mentioned in the primer, removing lines is done using a command within the
script: d. Putting ! at the end of the address, as in /:/!d, inverts the behaviour.
Remove all the lines containing a colon:
> sed '/:/d' headers.txt
HTTP/2 200
Note that we use the address
/:/ which is a regex that matches all lines
with a colon. The rest of the script executes on these lines.
Remove all the lines without a colon:
> sed '/:/!d' headers.txt
server: GitHub.com
content-type: text/html; charset=utf-8
last-modified: Tue, 06 Nov 2018 15:58:30 GMT
etag: "5be1ba26-a9dd"
access-control-allow-origin: *
expires: Fri, 22 Mar 2019 14:03:49 GMT
cache-control: max-age=600
x-github-request-id: 6F9E:2F59:86E637:B2E922:5C94E8ED
Here we use
/:/! as the address – this causes the command to be executed
on the lines that don’t match the regex.
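Deleting everything that doesn’t match is the same as keeping only what does. As an aside that goes slightly beyond the commands used in this post, sed also has a -n flag (suppress the default output) and a p command (print the line), which together give the same result as the inverted delete. The sketch below assumes a three-line sample file; the variable names are made up for illustration.

```shell
# A three-line sample of the headers in a temporary file.
headers=$(mktemp)
printf '%s\n' \
  'HTTP/2 200' \
  'server: GitHub.com' \
  'etag: "5be1ba26-a9dd"' > "$headers"

# Delete lines that contain a colon: only the status line is left.
without_colon=$(sed '/:/d' "$headers")

# Delete lines that do NOT contain a colon: only the headers are left.
with_colon=$(sed '/:/!d' "$headers")

# Alternative spelling: print nothing by default (-n), then print matches (p).
printed=$(sed -n '/:/p' "$headers")

printf 'd:\n%s\n\n!d:\n%s\n\n-n with p:\n%s\n' "$without_colon" "$with_colon" "$printed"

rm -f "$headers"
```

Which spelling to use is a matter of taste; the inverted delete keeps everything in terms of the d command already covered here.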
Finally we’re ready!
Combining the above, we can retrieve the ETag header using a chain of three scripts:
> sed -e '/etag/!d' -e 's/^etag: //' -e 's/"//g' headers.txt
5be1ba26-a9dd
The g at the end of s/"//g means global; leaving it out means that sed would replace only the first instance of " that it found on each line. Making the replacement global means that all instances on the line are replaced.
In the end, it feels like a bit of an anti-climax. However, it’s now much
clearer to me where I’d try to make use of
sed, and I feel I’ve learned
enough to be dangerous!
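One last check on the g flag in the final command: the sketch below runs the same chain with and without it on a single etag line, capturing each result in a shell variable. The variable names (line, first_only, all_quotes) are just for illustration.

```shell
# The etag line from headers.txt, fed to sed on stdin instead of from a file.
line='etag: "5be1ba26-a9dd"'

# Without g: only the first double quote on the line is removed.
first_only=$(printf '%s\n' "$line" | sed -e 's/^etag: //' -e 's/"//')

# With g: every double quote on the line is removed.
all_quotes=$(printf '%s\n' "$line" | sed -e 's/^etag: //' -e 's/"//g')

printf 'without g: %s\n' "$first_only"   # prints: without g: 5be1ba26-a9dd"
printf 'with g:    %s\n' "$all_quotes"   # prints: with g:    5be1ba26-a9dd
```

Capturing the result with $(...) like this is one way to hand the value on to a later command in a script.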