How does SSH's ProxyCommand actually work?

I have been writing support for SSH’s ProxyCommand into a product at work over the past month or so. I find protocols fascinating, particularly those with the age and pedigree of SSH.

ProxyCommand is an interesting corner in the set of (de facto) SSH standards. It is used to establish a secure connection to a destination host that cannot be directly accessed by the local machine, most commonly when you want to tunnel through a bastion host. It specifies a command that your SSH client executes to create a proxied connection (hence the name) to the destination host through the bastion.

I’m writing about it because my experience over the last few months suggests that ProxyCommand can be slightly mind-bending. Certainly it took me a little while to wrap my head around what is going on.

So let’s take a look at how it works.

netcat

To start understanding what’s going on, we’ll first look at something else: netcat, or nc. nc can do a lot of things. Its man page begins:

The nc (or netcat) utility is used for just about anything under the sun involving TCP or UDP. It can open TCP connections, send UDP packets, listen on arbitrary TCP and UDP ports, do port scanning, and deal with both IPv4 and IPv6.

But we will use nc to send and receive data between a local terminal and a server. If we execute nc HOST PORT, nc stays running and proxies bytes between its own stdin/stdout and the remote host. In other words, it’ll send what we type in our terminal to the remote host, and print out what the host returns.

We can see this in action by using nc to make an HTTP request to the dev Hugo server that I run when building this site.

In this example, the first three lines, starting GET /, are what I type to make an HTTP request; this is about the simplest complete HTTP request you can make. nc forwards that to Hugo, and then prints the server’s response to stdout and so to my terminal.

> nc localhost 1313
GET / HTTP/1.1                                      \
Host: localhost                                     |- what I typed
                                                    /
HTTP/1.1 200 OK                                     \
Accept-Ranges: bytes                                |
Content-Length: 15825                               |
Content-Type: text/html; charset=utf-8              |
Last-Modified: Sun, 04 Aug 2024 09:30:58 GMT        |- what the server
Date: Sun, 04 Aug 2024 09:34:06 GMT                 |    sent back
                                                    |
<!DOCTYPE html>                                     |
<html>                                              |
                                                    |
<head>                                              |
                                                    |
... more page content ...                           /

What is happening in detail:

nc opens a TCP connection to localhost:1313. This could be any non-TLS host.
nc waits to receive data from stdin (ie, the terminal in this case) or from the network.
I type the first three lines, the HTTP request, terminated by a blank line.
nc receives what I type as bytes over stdin, which it inherited from the shell and which is connected to my terminal. (The terminal handles echoing the content I’m typing back to me and, by default, hands the input to nc a line at a time.)
nc sends those bytes over its TCP connection to localhost:1313 (Hugo).
Hugo receives the request and sends a response.
nc receives the response and writes the bytes to stdout.
stdout is also connected to the terminal, so the terminal shows the response.

The thing to bear in mind is that nc is proxying stdin/out over a network connection to a remote host. stdin becomes the data that is sent to the remote host, and stdout is used to return data to the process that started nc.
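To make that concrete, here is a minimal Python sketch of the part of nc we care about: open a TCP connection, copy one input stream to it, and copy whatever comes back to an output stream. The function name `relay` and its parameters are made up for illustration, and real nc handles many more cases (UDP, listening, timeouts and so on).

```python
import socket
import threading


def relay(host, port, inp, out):
    """Connect to host:port, send everything read from inp, write replies to out."""
    with socket.create_connection((host, port)) as sock:

        def pump_input():
            # stdin -> network: forward bytes until inp reaches end-of-file.
            for chunk in iter(lambda: inp.read1(4096), b""):
                sock.sendall(chunk)
            # Tell the remote end we have finished sending.
            sock.shutdown(socket.SHUT_WR)

        threading.Thread(target=pump_input, daemon=True).start()

        # network -> stdout: forward bytes until the remote end closes.
        for chunk in iter(lambda: sock.recv(4096), b""):
            out.write(chunk)
            out.flush()
```

Calling `relay("localhost", 1313, sys.stdin.buffer, sys.stdout.buffer)` (after `import sys`) should behave roughly like `nc localhost 1313` in the example above. Note that the function never looks at the bytes it forwards.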

In the example above, I use an interactive shell to start nc and so the stdin and stdout are connected to my terminal. This is why my typing is sent to the remote server (albeit that server is on localhost) and why the returned data is printed to my terminal. This works because HTTP is a text-based protocol, so we are sending and receiving data the terminal and shell can process. A binary protocol would likely send control codes that would break our shell or terminal.

But, in the general case, any program that starts an nc child process and binds its stdin and stdout can then use the nc process to send and receive any bytes to and from a remote server. Unlike the shell case above, those bytes need not be printable, so binary protocols (like SSH) can be used.

What SSH expects from `ProxyCommand`

When we use a ProxyCommand with SSH, SSH does very specific things:

SSH executes a child process using the command in ProxyCommand.
SSH binds the child process stdin and stdout.
SSH uses the child process’s stdin and stdout instead of creating its own network connection.
At this point, SSH starts the standard SSH handshake and authentication over the stdin/stdout network-like socket, assuming there is an sshd at the other end of it.

That final step should sound very familiar from that last paragraph on nc 😀
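The steps above can be sketched in Python. This is only an illustration under a few assumptions: `exchange_banners` is a made-up name, the command is passed as an argument list (real ssh runs the ProxyCommand string through the user’s shell), and the sketch stops after the very first exchange of the SSH protocol, where each side sends an identification line beginning `SSH-2.0-`.

```python
import subprocess


def exchange_banners(proxy_argv, client_banner=b"SSH-2.0-Sketch_0.1\r\n"):
    """Start the proxy command and use its stdin/stdout as the 'network'."""
    # Steps 1 and 2: execute the child and bind its stdin and stdout to pipes.
    child = subprocess.Popen(
        proxy_argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    # Steps 3 and 4: instead of writing to a socket, write to the child's stdin...
    child.stdin.write(client_banner)
    child.stdin.flush()
    # ...and instead of reading from a socket, read from the child's stdout.
    server_banner = child.stdout.readline()
    child.stdin.close()
    child.wait()
    return server_banner


# Hypothetical usage, assuming nc is installed and "destination" runs sshd:
# print(exchange_banners(["nc", "destination", "22"]))
```

Nothing in the function knows or cares what the child does with the bytes, which is exactly the position the parent ssh process is in.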

SSH assumes the child process is routing the parent SSH process’s network traffic all the way to the destination.

That means that the SSH parent process will assume that it is talking to DESTINATIONHOST when executing the following command, but in fact it will have a connection to ANOTHERHOST because ANOTHERHOST is specified in the ProxyCommand string:

ssh -o ProxyCommand='nc ANOTHERHOST 22' DESTINATIONHOST

This can happen through misconfiguration, for example.

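To counter this threat, the parent SSH process will validate the host key it receives from the server on the other end of the ProxyCommand against the key it has on record for DESTINATIONHOST (assuming it has connected to DESTINATIONHOST before). In this case, the keys will not match, so SSH will realise it is talking to the wrong server and terminate the connection.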
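A toy Python model of that check may help. It is not how OpenSSH stores or compares keys; `check_host_key`, the dictionary and the key bytes are all invented for illustration. The point it tries to show is that the lookup is keyed by the host the user asked for, not by whatever host the ProxyCommand happened to reach.

```python
def check_host_key(known_hosts, requested_host, presented_key):
    """Compare the presented key with the one on record for the requested host."""
    expected = known_hosts.get(requested_host)
    if expected is None:
        return "unknown"  # no key on record for this name
    return "match" if presented_key == expected else "mismatch"


# Made-up key material, purely for the example.
known_hosts = {"DESTINATIONHOST": b"key-of-destinationhost"}

# The ProxyCommand actually reached ANOTHERHOST, which presents its own key.
print(check_host_key(known_hosts, "DESTINATIONHOST", b"key-of-anotherhost"))
# prints: mismatch
```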

Once I’d internalised that all ProxyCommand does from SSH’s point of view is to start a child process that SSH uses as a network connection via the child’s stdin/out, things started falling into place. Specifically, I understood that the child only has to open a connection to the remote sshd, proxy raw bytes and nothing more — because the parent ssh process takes care of the actual ssh-ness of things.

The simplest ProxyCommand

ProxyCommand is usually used to tunnel into a network via a remote bastion host. But to show the simplest ProxyCommand, let’s proxy an SSH connection over a local nc child process.

These commands have the same effect, in terms of the network connections created:

ssh mike@destination

# - and - 

ssh -o ProxyCommand='nc destination 22' mike@destination

In the first case, ssh creates its own network connection to destination.

In the second, ssh starts a child nc process. nc then creates the network connection and waits for data on its stdin or from the network. ssh then uses the stdin/out of the child nc process as its network connection.

But in both cases, the machine only has one TCP connection to destination.

The nc process will exit when the destination host closes its connection, which will happen once ssh completes the SSH session with the destination host. Again, nc doesn’t have any smarts about when to exit — it just waits for the parent SSH process to finish doing what it needs to do.

The parent ssh process sends exactly the same bytes over the nc stdin/out pipes that it would send over its own network connection. As far as ssh is concerned, the stdin/out pair is just a network connection. Specifically, it expects the thing that is responding over the stdin/out pipes to be an sshd server.

ssh will execute the usual SSH handshake, authentication and so on itself, using the exact same sent and received bytes, just sending them to the nc stdin/out pipes instead of its own network socket.

There’s nothing special happening from the parent ssh process’s point of view, except this (odd) way to send its usual network traffic!

It’s important to remember that the ProxyCommand string is executed completely locally. The child process may send commands to a remote host — eg, to tell that remote host to proxy to the destination host — but the process itself is entirely executed locally. Given that ssh wants to use the process’s stdin/out as its network connection, by definition the process must be local 😬

Using remote servers in `ProxyCommand`

The main use-case for ProxyCommand isn’t using nc as a local network proxy, of course. That’s … kinda pointless. Instead, it’s using a remote bastion host to proxy to another “destination” host within the same network as the bastion.

This is where ProxyCommand starts to get confusing, because we stack a few things together:

That ssh can start and bind a remote program’s stdin/out.
That nc can be used as that remote program to create an onward connection to the host inside the bastion’s network.

Binding a remote process’s stdin/out

To start a remote program and bind its stdin/out (and stderr), we do this:

> ssh mike@destination_host ls

This command does a lot of stuff:

1. The local ssh client creates a network connection to sshd on destination.
2. The local ssh client requests that sshd run ls.
3. The remote sshd runs ls, binding the ls process’s stdin/out to itself.
4. The remote sshd receives the result of ls over the bound stdout.
5. The remote sshd sends the resulting bytes to the local ssh client.
6. The local ssh client prints them out to the local terminal.
7. The remote ls process exits.
8. The remote sshd sends the exit code over to the local ssh client.
9. The remote sshd and the local ssh close the network connection.
10. The local ssh exits with the ls exit code (ie, it mirrors the remote process’s exit code).

Note that during steps 3 and 4, the local ssh client could send data to the remote sshd, which the remote sshd would pass to the executed command (ls) via its stdin.

So that’s our first layer of stdin/out “proxying”. We can get the remote sshd to proxy the stdin/out of a command to our local ssh client.
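Seen from a program rather than a terminal, remote execution looks like running any other child process: the remote command’s output arrives on the local ssh process’s stdout, and its exit code becomes the local exit code. A small Python sketch, where `remote_argv` and `run_and_mirror` are hypothetical helper names:

```python
import subprocess


def remote_argv(target, command):
    """Build the local command line that asks sshd on target to run command."""
    return ["ssh", target, *command]


def run_and_mirror(argv):
    """Run argv with stdout bound to a pipe; return (exit code, captured stdout)."""
    result = subprocess.run(argv, stdout=subprocess.PIPE)
    return result.returncode, result.stdout


# With a reachable host, this would hand back the remote ls output and exit code:
# code, listing = run_and_mirror(remote_argv("mike@destination_host", ["ls"]))
```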

Proxying via a remote `nc` process

Let’s look at how we take remote execution and turn it into a proxy running on the bastion host.

What we do is:

Use nc as our remote program.
Embed that remote execution of nc in a ProxyCommand.

The command line looks like this:

ssh -o ProxyCommand='ssh mike@bastion nc destination 22' mike@destination

Let’s look at how this command gets to the position where the parent local SSH process is able to start a handshake with the sshd server running on destination.

The first stage is setting up the connection to the destination:

The ssh client parent process creates a child ssh client process:
- The child process is ssh mike@bastion nc destination 22, ie the exact ProxyCommand string. Note this child ssh process is connecting to bastion.
- The parent binds to the child’s stdin/out.
The child ssh client process opens an SSH connection (runs the SSH handshake, authenticates) to sshd on bastion.
The child ssh process requests that the bastion’s sshd starts nc destination 22.
The bastion sshd starts the nc process, binding the nc stdin/out to itself.
The nc process running on the bastion creates a TCP network connection to port 22 on destination.

Side note: to avoid typo errors where the wrong destination host ends up being used (as noted above), ProxyCommand allows substitution of the destination host and port using %-variables:

ssh -o ProxyCommand='ssh mike@bastion nc %h %p' mike@destination

This is especially useful in configuration files, where the exact destination host isn’t known in advance.
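The substitution itself is simple string expansion. Here is a rough Python sketch that handles only `%h`, `%p` and the literal `%%`; the function name `expand_tokens` is made up, and OpenSSH supports more tokens than this, so treat it as an illustration of the idea rather than a description of ssh’s own code.

```python
import re


def expand_tokens(template, host, port):
    """Expand %h, %p and %% in a ProxyCommand-style template string."""
    values = {"h": host, "p": str(port), "%": "%"}

    def substitute(match):
        token = match.group(1)
        if token not in values:
            raise ValueError(f"token %{token} is not handled by this sketch")
        return values[token]

    # Each '%' plus the character after it is treated as one token.
    return re.sub(r"%(.)", substitute, template)


print(expand_tokens("ssh mike@bastion nc %h %p", "destination", 22))
# prints: ssh mike@bastion nc destination 22
```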

At this point, the parent ssh client process is able to start sending bytes to destination, to start the SSH handshake and authentication process with the sshd on destination. Let’s see how those bytes make their way to destination.

The parent ssh client process sends the bytes forming the first part of the new SSH handshake over the child ssh client’s stdin.
The child ssh receives the bytes over stdin and sends them over the network to sshd on the bastion.
The bastion’s sshd receives the bytes and sends them to its child nc process over stdin.
The nc process receives the bytes over stdin and sends them over the network to destination:22.
The sshd server on destination receives the bytes.

The destination server’s response mirrors that:

The destination sshd sends bytes across the network to the bastion’s nc process.
nc outputs those bytes over stdout to its parent sshd process.
The bastion’s sshd sends the bytes over the network to the child ssh client process.
The child ssh client outputs the bytes over stdout, where they are received by the parent ssh client process.
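The two hop-by-hop lists above can be imitated locally in Python, with socket pairs standing in for the pipes and network links. Everything here is a stand-in: `start_relay` plays the role of each proxying hop, `fake_destination` pretends to be sshd on destination, and the bastion’s sshd and nc are collapsed into a single relay to keep the sketch short. No real SSH is spoken beyond an identification line.

```python
import socket
import threading


def copy_bytes(src, dst):
    """Forward bytes from src to dst unchanged until src reports end-of-stream."""
    try:
        while chunk := src.recv(4096):
            dst.sendall(chunk)
        dst.shutdown(socket.SHUT_WR)  # pass the end-of-stream along the chain
    except OSError:
        pass  # the other side has already gone away


def start_relay(left, right):
    """Relay in both directions between two connections, like one proxy hop."""
    for src, dst in ((left, right), (right, left)):
        threading.Thread(target=copy_bytes, args=(src, dst), daemon=True).start()


def read_line(conn):
    """Read from conn until a newline arrives or the stream ends."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = conn.recv(4096)
        if not chunk:
            break
        data += chunk
    return data


def fake_destination(conn):
    """Stand-in for sshd on destination: read one line, answer with a banner."""
    read_line(conn)
    conn.sendall(b"SSH-2.0-FakeDestination\r\n")
    conn.close()


def handshake_through_chain():
    parent, child_stdio = socket.socketpair()     # parent ssh <-> child ssh (stdin/out)
    child_net, bastion_net = socket.socketpair()  # child ssh <-> bastion sshd (network)
    nc_net, destination = socket.socketpair()     # bastion nc <-> destination sshd (network)
    start_relay(child_stdio, child_net)           # the child ssh client
    start_relay(bastion_net, nc_net)              # the bastion's sshd and nc, collapsed
    threading.Thread(target=fake_destination, args=(destination,), daemon=True).start()
    parent.sendall(b"SSH-2.0-FakeClient\r\n")     # the parent starts its "handshake"
    reply = read_line(parent)
    parent.close()
    return reply


print(handshake_through_chain())
# prints: b'SSH-2.0-FakeDestination\r\n'
```

The parent only ever touches its own end of the first socket pair, yet the reply it reads was written by the destination three links away.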

Now we can put this together into a data flow diagram:

Summary

Overall, it’s somewhat convoluted! But this is because ProxyCommand is designed to provide flexibility to the user by allowing the network connection to be created in any way the user wants. Other ways include:

The SSH client’s man page gives the example of using an HTTP proxy rather than using an SSH server as the proxy.
More intelligent proxies can use the SSH subsystem mechanism to provide extra functionality.

The key thing that ProxyCommand does is to abstract the routing of the network traffic to the destination — whether that’s via a local nc, a remote bastion or something completely different — from the user’s SSH session (ie, the “parent” ssh client process).

Importantly, because a new SSH session is established by the parent over the network path ProxyCommand creates, SSH is able to validate the connection to the destination using its host key and all data in the user’s SSH session is securely encrypted — so any proxies along the way cannot read it.
