BASH - An Overview of Command Substitutions and Functions

I spent the last week working on a bash script to launch AWS EC2 spot instances with custom specifications and startup-scripts. It's the most complicated bash script I've ever written. Through many hours of trial, error, and waiting for responses from AWS customer support, I've learned some hard-earned lessons about bash. I hope by sharing one of these lessons, I can save you some pain.

In this post, I'll focus on two important tools: command substitutions and functions.

What is a command substitution?

A command substitution runs a command and replaces itself with that command's standard output. You write one with the syntax: $(<command-to-execute>).

Command substitutions are incredibly useful and versatile. For example, you can use them to quickly generate variables:

# read file and convert text to base64, store in variable
echo "abc123" > example.txt
converted_text="$(cat example.txt | base64)"
echo "$converted_text"
# outputs YWJjMTIzCg==

Or to insert some transformed input within a string:

echo "Unconverted text: $(cat example.txt)"
# outputs Unconverted text: abc123
echo "Converted text: $(cat example.txt | base64)"
# outputs Converted text: YWJjMTIzCg==

There are many other uses for command substitutions. The examples above are just a small sampling of their possibilities.
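
Two details are worth knowing before you lean on substitutions heavily: bash strips trailing newlines from the captured output, and the $(...) form nests without extra escaping. The sketch below is my own illustration, not part of the original script; the path /tmp/demo/example.txt is made up and doesn't need to exist.

```shell
#!/usr/bin/env bash

# Trailing newlines are stripped from whatever the command prints.
captured="$(printf 'abc123\n\n')"
echo "${#captured}"    # length of the stored string

# Substitutions can be nested; the inner one runs first.
# (Hypothetical path, used only to have something to take apart.)
parent_name="$(basename "$(dirname "/tmp/demo/example.txt")")"
echo "$parent_name"
```

The first echo prints 6, because both newlines are dropped. The second prints demo.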

What is a function?

If you're reading this post, I think it's safe to assume you already know what a function is. I'll skip the definition, but I'll cover the bash syntax.

In bash, you may use two forms to declare a function. Don't ask me why - I think it's poor form to allow more than one syntactically correct form, but here they are:

# form 1
function some_func {
    ...
}

# form 2
some_func () {
    ...
}

As in most other languages, you call a function by simply invoking its name:

function hello_world {
    echo "Hello world."
}

hello_world
# outputs "Hello world."
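
Functions also take arguments, which bash exposes as positional parameters ($1, $2, and so on) rather than as named parameters in the declaration. A minimal sketch, with greet as a made-up example function:

```shell
#!/usr/bin/env bash

# greet is a hypothetical function used only for illustration.
greet() {
    local name="$1"    # first argument passed to the function
    echo "Hello, ${name}."
}

greet "world"
```

Note that arguments are separated by spaces, not commas, and that there are no parentheses at the call site.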

Why should I know the difference?

A command substitution and a function are both ways to generate and direct the output of some code. They're similar, but they have one key difference: when they're executed.

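A command substitution is executed once: when the line that contains it runs. Its output is then stored as a plain string, so a variable set this way never refreshes on its own. A function is executed each time it's called. This is important - it means that they cannot be used interchangeably for things like conditions in while and until loops.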
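
To see the difference in isolation, here is a small sketch of my own. The helper next_value and the temp-file counter are assumptions made for the demo; the counter lives in a file because a command substitution runs in a subshell, so a plain variable incremented inside it would not survive.

```shell
#!/usr/bin/env bash

counter_file=$(mktemp)
echo 0 > "$counter_file"

# Hypothetical helper: bumps the counter on every call and prints the new value.
next_value() {
    local n
    n=$(( $(cat "$counter_file") + 1 ))
    echo "$n" > "$counter_file"
    echo "$n"
}

snapshot=$(next_value)        # the function runs here, once
echo "$snapshot $snapshot"    # the same stored string both times

first=$(next_value)           # each new substitution runs the function again
second=$(next_value)
echo "$first $second"

rm -f "$counter_file"
```

This prints 1 1 and then 2 3. Reading $snapshot twice never re-runs anything; only a fresh call does.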

Consider the example below:

spin="/-\|"
instance_id=$(get_request_state | jq -r '.SpotInstanceRequests[0].InstanceId')
while [ "$instance_id" == "null" ]; do
    # run spinner for about 3 seconds (30 x 0.1s)
    for i in $(seq 1 30); do
        j=$(( (j+1) % 4 ))
        printf "\rWaiting for request to be fulfilled...%s" "${spin:$j:1}"
        sleep .1
    done
    # BUG: reassigns the stored string; the substitution above is never re-run
    instance_id=$instance_id
done

This is a modified excerpt from my script. The while loop should check the state of a spot instance request every 3 seconds, and break when the request is fulfilled (when the requested instance has been created and has an id).

It's an infinite loop. The command substitution that sets $instance_id is always run before AWS finishes creating an instance, and thus returns null. Since it's run only once, when the variable is assigned, the value is never updated, and the loop runs until you get frustrated and ^C out of it.

However, if you wrap the query in a function and call it again on every pass through the loop, the loop works correctly:

function get_instance_id {
    get_request_state | \
    jq -r '.SpotInstanceRequests[0].InstanceId'
}

spin="/-\|"
# $(...) is required here; a bare name would assign the literal string "get_instance_id"
instance_id=$(get_instance_id)
while [ "$instance_id" == "null" ]; do
    # run spinner
    for ((i=0; i<30; i++)); do
        j=$(( (j+1) % 4 ))
        printf "\rWaiting for request to be fulfilled...%s" "${spin:$j:1}"
        sleep .1
    done
    # call the function again so instance_id reflects the current state
    instance_id=$(get_instance_id)
done
printf "\nRequest fulfilled.\n\nInstance ID:\t%s\n" "$instance_id"

Since the function get_instance_id is run every time it's called, instance_id is refreshed every 3 seconds, eventually takes on a non-null value, and the loop ends.
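
If you want to try the pattern without an AWS account, here is a self-contained sketch. Everything in it is a stand-in I made up for the demo: this get_instance_id fakes the AWS query by printing "null" for the first two polls and the placeholder ID i-example after that, and the poll count is kept in a temp file because the function runs in a subshell.

```shell
#!/usr/bin/env bash

poll_count_file=$(mktemp)
echo 0 > "$poll_count_file"

# Hypothetical stand-in for the real query: "null" twice, then a fake ID.
get_instance_id() {
    local polls
    polls=$(( $(cat "$poll_count_file") + 1 ))
    echo "$polls" > "$poll_count_file"
    if (( polls < 3 )); then
        echo "null"
    else
        echo "i-example"
    fi
}

instance_id=$(get_instance_id)
while [ "$instance_id" == "null" ]; do
    sleep 0.1
    instance_id=$(get_instance_id)    # re-run the query on every pass
done

polls=$(cat "$poll_count_file")
echo "Fulfilled after $polls polls: $instance_id"
rm -f "$poll_count_file"
```

This prints "Fulfilled after 3 polls: i-example". If you replace the assignment inside the loop with instance_id=$instance_id, the loop never ends, which is exactly the bug from earlier.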

When should I use command substitution?

As I said, command substitutions are incredibly useful. You should use them if you need to run a command once, like setting a variable that doesn't need to be updated, or inserting some complex output in a string.

When should I use a function?

Again, I assume you're familiar with functions. You probably already know that you should use a function if you need to run a command, or set of commands, more than once.

Closing Remarks

I want to close by noting that you can use functions in command substitutions, and vice-versa. This enables you to write some really interesting lines!
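
Here's a small sketch of both directions at once. The function to_base64 is a name I made up for illustration: it uses a command substitution internally, and is itself called from inside one.

```shell
#!/usr/bin/env bash

# Hypothetical helper: a function that uses a command substitution internally.
to_base64() {
    local encoded
    # printf '%s' avoids the trailing newline that echo would add to the input
    encoded=$(printf '%s' "$1" | base64)
    echo "$encoded"
}

# The function is then called inside another command substitution.
message="Encoded: $(to_base64 "abc123")"
echo "$message"
```

This prints "Encoded: YWJjMTIz". The result differs from the earlier YWJjMTIzCg== because that example encoded the file's trailing newline too.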

Also, if you've learned bash in a formal or structured manner, this may be an obvious lesson. However, it may not be so obvious if you're like me and have learned primarily through code snippets on StackExchange and GitHub. If you belong to the second camp, I hope this has helped you.
