Joining Lines with sed

This week I needed to join lines in a shell script. Specifically, I had a file containing file names, one per line, and needed them colon-separated on a single line.

I could have done this in Perl or Awk, but a bit of searching turned up this solution in sed:

sed -e :a -e '$!N; s/\n/:/; ta' test.txt
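To try the command without touching real data, here is a minimal sketch. The file names alpha.txt, beta.txt and gamma.txt and the temporary directory are made up for illustration; only the sed invocation comes from above.

```shell
# Build a small sample file (hypothetical names) in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' alpha.txt beta.txt gamma.txt > "$tmpdir/test.txt"

# Join all lines with colons using the command shown above.
joined=$(sed -e :a -e '$!N; s/\n/:/; ta' "$tmpdir/test.txt")
printf '%s\n' "$joined"   # alpha.txt:beta.txt:gamma.txt

# Clean up the sample data.
rm -rf "$tmpdir"
```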

sed normally reads one line at a time into the pattern space and strips the trailing newline as it does so. A simple s/\n/:/g therefore won’t work: there is never a newline in the pattern space for it to find.
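A quick way to see this for yourself, again as a sketch with made-up sample file names: run the naive substitution and compare the output with the input.

```shell
# Build the same hypothetical sample file.
tmpdir=$(mktemp -d)
printf '%s\n' alpha.txt beta.txt gamma.txt > "$tmpdir/test.txt"

# The naive attempt: each line is processed on its own, without its
# trailing newline, so the pattern \n never matches.
naive=$(sed 's/\n/:/g' "$tmpdir/test.txt")
printf '%s\n' "$naive"

# The output still has three separate lines.
line_count=$(printf '%s\n' "$naive" | wc -l | tr -d ' ')

rm -rf "$tmpdir"
```

The file comes out exactly as it went in, one name per line.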

Reverse-engineering this command taught me a bit about sed. So how does it work? It contains four sed commands. Let’s consider them one at a time:

  1. :a sets a label for a future jump/goto operation.

  2. $!N is the N command, modified with the address $!. The N command reads the next input line into the pattern space, separated from the current pattern space contents by a newline. Simple sed scripts typically do not diverge from the standard pattern of reading the input, one line at a time, into the pattern space, where it is processed and either dropped or printed. The N command does diverge: with the current line in the pattern space, it basically says “and I want to process the next line too, as if it were part of this one”.

    The $! address is an inversion of the $ address, the last line; the result is that N is invoked for every line except the last.

  3. s/\n/:/ is a more familiar sed command. It modifies the pattern space, replacing the first occurrence of the newline character \n with :. Without the g flag only one match is replaced, but here there is only ever a single newline: the one separating the previous pattern space contents from the line read by N.

  4. ta branches back to the label a if there has been a successful substitution since the last input line was read or since the last t branch was taken. In this case, the s command has just replaced the joining newline with :, so the branch is taken and control returns to the top of the script. Once the last line is reached, N does not run and no newline is added to the pattern space, so the substitution fails and the branch is not taken. The script then ends and sed prints the joined line.
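The four commands above can also be written one per line with comments, which I find easier to read than the one-liner. This is only a sketch of the same script in a different layout; the sample file names are again made up.

```shell
# Build the hypothetical sample file.
tmpdir=$(mktemp -d)
printf '%s\n' alpha.txt beta.txt gamma.txt > "$tmpdir/test.txt"

# Same script as the one-liner, one sed command per line.
joined=$(sed '
# Label "a": the target for the branch below.
:a
# On every line except the last, append the next input line.
$!N
# Replace the first newline in the pattern space with a colon.
s/\n/:/
# If the substitution succeeded, jump back to label "a".
ta
' "$tmpdir/test.txt")
printf '%s\n' "$joined"   # alpha.txt:beta.txt:gamma.txt

rm -rf "$tmpdir"
```

Putting the label on its own line also avoids the need for a separate -e option, since the newline ends the label name.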

The result is a small program that loops until the last line has been read, replacing the joining newline with a colon at each step. In my case, it also left me with a much deeper understanding of how sed works and of the tricks that become possible once you step outside the basic commands used in common scripts.