Miscellaneous Findings III: Unix text tricks

For this round of Miscellaneous Findings, we have a bunch of ways to mess with text. They all use tools that come with Unix, so if you are working on a Unix-based OS, they should work without having to install extra junk. The last finding, using regexes in sed, I found particularly useful for converting hundreds of JS files that used CommonJS (e.g. var something = require('something')) to use ES Modules (import something from 'something').

This is a roundup of miscellaneous things that I’ve found out about (or have rediscovered). I take notes on findings regularly, and I put the findings that translate well to speech on my podcast, Small Findings. The rest (which are often technical findings), I put here. They’re not always written up for maximum comprehension as a blog post, so if anything is hard to understand, please email me for clarification.

Replacing all with sed

To replace all instances of a string in a directory tree with another string, use find to list the file types you want to target, then pipe that list to xargs, which runs sed on the files that find found.

Example:

find . -type f \( -name '*.md' -o -name '*.js' -o -name '*.json' -o -name 'Makefile' \) | xargs sed -i "s/small-findings/smallfindings/g"

Where:

  • md, js, json, and Makefile are the kinds of files in which the replacement should be made.
  • xargs is telling sed to run with:
    • A regular expression that replaces small-findings with smallfindings
    • A list of files that is whatever the find command found.
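Two caveats with the command above: xargs splits file names on whitespace, so a file called "read me.md" gets passed to sed as two arguments, and sed -i with no backup suffix is a GNU sed form that BSD/macOS sed does not accept. Here is a minimal sketch of a more defensive variant. It assumes your find and xargs support -print0 and -0 (the GNU and BSD versions do). The function name replace_in_tree and the .bak suffix are my own choices for illustration.

```shell
# replace_in_tree DIR OLD NEW
# Replaces OLD with NEW in every .md, .js, .json file and Makefile under DIR.
# OLD and NEW are dropped straight into a sed expression, so this sketch
# assumes they contain no "/" and no regex metacharacters.
replace_in_tree() {
  dir=$1 old=$2 new=$3
  # -print0 and -0 separate file names with NUL bytes instead of whitespace,
  # so names containing spaces survive the trip through xargs.
  # -i.bak edits in place and keeps the original as <name>.bak.
  find "$dir" -type f \( -name '*.md' -o -name '*.js' -o -name '*.json' -o -name 'Makefile' \) -print0 |
    xargs -0 sed -i.bak "s/$old/$new/g"
}
```

Calling replace_in_tree . small-findings smallfindings does the same replacement as the example above. Once you are happy with the result, the backups can be cleaned up with something like find . -name '*.bak' -delete.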

Shell script example.

#unix #bash #programming #text

Regex in sed

If you use sed without the -r switch, it uses basic regular expressions, in which capture groups have to be written with escaped parentheses: \( and \). With -r, it uses extended regular expressions, so capture groups are plain ( and ). That lets you do something like this to replace all instances of var something = require('somepackage') with import something from 'somepackage' in a file:

sed -r "s/var (\w+) = require\('(.*)'\)/import \1 from '\2'/g" -i myfile.js

(In the replacement clause, you use \1, \2, and so on to point to capture groups, instead of $1, $2.)
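To make the difference between the two regex flavors concrete, here is a small sketch that does the same rewrite both ways on a single line of input. The function names are made up for illustration. I use -E here, which GNU sed treats the same as -r and which BSD/macOS sed also accepts, and [A-Za-z0-9_] in place of \w, since \w is a GNU extension.

```shell
# Basic regular expressions (sed's default): groups are written \( and \),
# and a bare ( or ) matches a literal parenthesis.
to_import_basic() {
  sed "s/var \([A-Za-z0-9_]*\) = require('\(.*\)')/import \1 from '\2'/"
}

# Extended regular expressions (-E): groups are written ( and ),
# and literal parentheses need a backslash.
to_import_extended() {
  sed -E "s/var ([A-Za-z0-9_]+) = require\('(.*)'\)/import \1 from '\2'/"
}

# Both of these print: import fs from 'fs'
printf '%s\n' "var fs = require('fs')" | to_import_basic
printf '%s\n' "var fs = require('fs')" | to_import_extended
```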

If you want to run that on every JS file in a directory tree, you can pipe the output of a find command that looks for all .js files into xargs. xargs runs the sed command you give it, with the file names from find appended as extra arguments. It’s sort of like partial application (or, loosely, currying): you supply the first arguments, and xargs fills in the rest.

find . -type f \( -name '*.js' \) | xargs sed -r "s/var (\w+) = require\('(.*)'\)/import \1 from '\2'/g" -i
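Since -i rewrites files in place, it can be worth checking what would match before running the real thing. This is a rough sketch of a dry run that uses the same find-into-xargs shape but swaps sed for grep, so nothing gets modified. The function name preview_requires is hypothetical, and the pattern is my approximation of the one above, with [A-Za-z0-9_] standing in for \w.

```shell
# preview_requires [DIR]
# Prints file:line:text for every line the require-to-import rewrite would
# touch under DIR (default: the current directory). Modifies nothing.
preview_requires() {
  find "${1:-.}" -type f -name '*.js' -print0 |
    xargs -0 grep -HnE "var [A-Za-z0-9_]+ = require\('.*'\)"
}
```

Running preview_requires . prints one line per match, prefixed with the file name and line number. If the list looks right, the sed version should change exactly those lines.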

#command #shell #regex #unix

xargs

(I’ve seen xargs a lot in shell scripts I’ve used and had a hazy idea about what it did but only just now did I actually look it up.)

xargs is a command that:

  • Runs another command for you
  • Converts stuff piped to it via stdin into command-line arguments for that other command

It’s a glue tool that’s necessary because:

  • A lot of Unix commands communicate via stdin/stdout pipes
  • Some do not

In that way, it’s like apply in JavaScript, which converts an array into function arguments.

As an example, you can use it to pass the results (a bunch of filenames) of get-entries-in-date-range to cat to mash up the results into a single file:

./tools/get-entries-in-date-range.sh 2020-03-28 | xargs cat > episode-2-script.md
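To see the stdin-to-arguments conversion on its own, without any project-specific script, here is a tiny sketch that uses echo as the command being run.

```shell
# All three lines from stdin become arguments to a single echo call,
# as if you had typed: echo a b c
printf 'a\nb\nc\n' | xargs echo          # prints: a b c

# With -n 1, xargs runs the command once per argument instead,
# as if you had typed: echo a; echo b; echo c
printf 'a\nb\nc\n' | xargs -n 1 echo     # prints a, b, and c on separate lines
```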

#unix #bash

tr for replacing text

There is a Unix command called tr. You pipe in input text and give it two arguments:

  • The set of characters to replace
  • The set of replacement characters

Then, it writes the result out to stdout.
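A couple of small examples of the two-argument form, as a quick sketch. Each character in the first set is replaced by the character at the same position in the second set.

```shell
# Map each lowercase letter to the matching uppercase letter.
printf 'hello world\n' | tr 'a-z' 'A-Z'      # prints: HELLO WORLD

# Replace every newline with a space, turning three lines into one.
# The output ends with a space instead of a newline.
printf 'one\ntwo\nthree\n' | tr '\n' ' '     # prints: one two three
```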

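The nice thing is that it treats a line break like any other character, so it works across lines. sed, by default, processes its input one line at a time and never sees the line breaks between lines.

So, you can use it in combination with sed to work around sed’s line-at-a-time behavior.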

e.g.:

cat in.json | tr \\n @ | sed -e 's/\]@\[/,/g' | tr @ \\n > out.json

That line:

  • Pipes the contents of in.json
  • Replaces line breaks with @. (Thereby making it a single line. This assumes @ doesn’t appear anywhere else in the file.)
  • Runs sed to replace each ]@[ (which was originally ]\n[) with just a comma.
  • Reverses the first replacement. Replaces @ with line breaks.
  • Writes the result to out.json.

So, if in.json happened to be a bunch of concatenated JSON arrays and looked like this:

[
  "a",
  "b"
]
[
  "c",
  "d"
]
[
  "e",
  "f"
]

(Which is not valid JSON.)

The above line would put this into out.json:

[
  "a",
  "b"
,
  "c",
  "d"
,
  "e",
  "f"
]

And that is valid JSON.
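The trick depends on the placeholder character not appearing in the input. If in.json contained an @ inside a string, the last tr would turn it into a line break. Here is a minimal sketch of a guarded version that refuses to run in that case. The function name join_arrays is made up for illustration, and @ is kept as the placeholder to match the line above.

```shell
# join_arrays SRC DST
# Merges concatenated JSON arrays in SRC into one array, written to DST,
# using the tr/sed/tr trick. Fails if SRC already contains the placeholder.
join_arrays() {
  src=$1 dst=$2
  placeholder='@'   # assumption: any single character absent from SRC works
  if grep -qF "$placeholder" "$src"; then
    echo "join_arrays: $src already contains $placeholder" >&2
    return 1
  fi
  # A bare ] is a literal character in a sed pattern; [ needs the backslash.
  tr '\n' "$placeholder" < "$src" |
    sed -e "s/]$placeholder\[/,/g" |
    tr "$placeholder" '\n' > "$dst"
}
```

join_arrays in.json out.json should produce the same out.json as the one-liner, and it exits with a non-zero status, without writing anything, when the input contains an @.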

#tr #shell #unix #sed #text #bash