SSH uses four TCP segments for each character you type


Every time we press a key in an interactive SSH session, the SSH client sends that keystroke to the SSH server in a TCP segment. Here is a Wireshark capture of me SSH-ing into my own machine and pressing a single key. I have left out the packets for the setup and teardown of the TCP and SSH connections.

You can tell the client and the server apart by the source and destination port numbers. SSH servers usually listen on port 22.

After the client has transmitted a character over a TCP segment, the server acknowledges that it has received it. Acknowledgements of data enable TCP to provide a reliable transport service to higher level protocols like HTTP, SMTP and SSH.

After this, the server processes the character by passing it to the program running in the session, which is typically bash but could be anything: sh, zsh or even emacs. The shell interprets the character and sends back the result in another TCP segment. The client then echoes the character on the screen and sends an acknowledgement back to the server.

I have depicted this in a timeline –

I tried this out and I can see only three segments per character I type

To minimize the traffic that acknowledgements cause, TCP can delay an ACK briefly and piggyback it on data that it has to send anyway. The details are implementation specific, though if you were using the socket API, you could experiment with the TCP_NODELAY option (which disables the Nagle algorithm) and the Linux-specific TCP_QUICKACK option (which disables delayed acknowledgements).
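As a rough illustration of those two socket options, here is a minimal Python sketch. The function name `make_low_latency_socket` is made up for this example, and TCP_QUICKACK is only set when the platform exposes it (it is a Linux option, so the lookup is guarded). This is not how any particular SSH implementation configures its sockets.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # TCP_NODELAY turns OFF Nagle's algorithm, so small writes
    # (such as a single keystroke) are sent without waiting to be coalesced.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    # TCP_QUICKACK asks the kernel to send ACKs immediately instead of
    # delaying them. It exists only on Linux, and the setting is not
    # permanent: the kernel may fall back to delayed ACKs later on.
    quickack = getattr(socket, "TCP_QUICKACK", None)
    if quickack is not None:
        sock.setsockopt(socket.IPPROTO_TCP, quickack, 1)

    return sock
```

With quick ACKs in force you would expect to see the separate ACK segment more often; with delayed ACKs the acknowledgement tends to ride along with the echoed character.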

Here is an example of this happening when I SSH-ed into an AWS EC2 server.

Note that the server is sending the ACK to the character it received and the response together in one TCP segment.
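To make the difference between the two captures concrete, here is a toy model in Python of the segments exchanged for one keystroke. The `Segment` class and the `keystroke_exchange` function are invented for this sketch, and the payload is shown as the plain character even though on the wire it is an encrypted SSH packet.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    sender: str   # "client" or "server"
    flags: str    # TCP flags as Wireshark would display them
    payload: str  # empty string for a pure ACK


def keystroke_exchange(key: str, piggyback_ack: bool = False) -> List[Segment]:
    """Model the TCP segments exchanged for a single keystroke."""
    # The client sends the keystroke.
    segments = [Segment("client", "PSH,ACK", key)]

    if piggyback_ack:
        # The server's ACK rides along with the echoed character.
        segments.append(Segment("server", "PSH,ACK", key))
    else:
        # The server acknowledges first, then sends the echo separately.
        segments.append(Segment("server", "ACK", ""))
        segments.append(Segment("server", "PSH,ACK", key))

    # The client acknowledges the echoed character.
    segments.append(Segment("client", "ACK", ""))
    return segments
```

Calling `keystroke_exchange("a")` gives the four segments from the first capture, while `keystroke_exchange("a", piggyback_ack=True)` gives the three segments seen against the EC2 server.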

Why not send whole commands?

One may wonder why SSH does not transmit whole commands or whole lines rather than individual characters. The answer lies in the fact that the program on the server may have commands that are only a character long and do not require a newline. Think ESC in vi, or M-x in emacs, or SPACE in more. Even while sending commands to bash, there could be readline-specific keystrokes like C-e or C-a or even TAB that need to be sent as they are pressed instead of waiting for a newline. So, SSH chooses not to try to understand what sequence of characters constitutes a command and simply sends characters across as they are typed. In fact, it doesn’t even assume whether or how the character pressed will echo on the screen; it finds that out from the server program.
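The same idea can be seen locally with a terminal in raw mode, where a program receives each key as soon as it is pressed rather than after a newline. Below is a small Python sketch that assumes a Unix-like system with pseudo-terminals; the function name `read_key_in_raw_mode` is made up for this example, and it uses a pty pair so that it does not depend on a real keyboard. It only illustrates character-at-a-time input and is not taken from any SSH implementation.

```python
import os
import pty
import tty


def read_key_in_raw_mode(key: bytes = b"\x1b") -> bytes:
    """Deliver `key` through a raw-mode pseudo-terminal without a newline.

    Hypothetical helper: the master side plays the keyboard and the
    slave side plays the program reading from its terminal.
    """
    master_fd, slave_fd = pty.openpty()
    try:
        # Raw mode switches off line buffering and local echo, so bytes
        # become readable immediately instead of after Enter is pressed.
        tty.setraw(slave_fd)

        # "Press" the key (ESC by default) on the keyboard side.
        os.write(master_fd, key)

        # The program side receives it right away, with no newline needed.
        return os.read(slave_fd, len(key))
    finally:
        os.close(master_fd)
        os.close(slave_fd)
```

A single ESC or SPACE arrives on its own here, which is exactly the behaviour that vi or more rely on at the far end of an SSH session.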

What happens when I use SSH with other software like git?

One of the most frequent ways that I use SSH is for remote git operations like pull, push, or clone. When bulk data is being sent, the SSH software understands that it isn’t an interactive invocation and TCP uses the full capacity of each segment to help SSH send all that data.

Deciding the sizes of segments is left to TCP and would warrant a blog post by itself, but here is a quick primer: TCP calculates the Maximum Segment Size (MSS) so that there will be no IP layer fragmentation of segments. In other words, TCP chooses an MSS such that a segment plus its TCP and IP headers fits within the Path MTU (Maximum Transmission Unit). The Don’t Fragment (DF) flag is also set in the IP header to ensure that segments don’t get fragmented at the IP layer.

Here is a capture of Wireshark sniffing data when I cloned a repository from GitHub over SSH.

Note how the data in each segment plus the TCP and IP headers adds up to exactly the MTU of my Ethernet interface: 1500 bytes.
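The arithmetic behind that 1500 can be sketched in a few lines of Python. The header sizes below assume IPv4 and TCP headers without options (20 bytes each), and the 12-byte figure is the usual size of the TCP timestamp option with padding. The function names are made up for this example, and the numbers are not read from the capture above.

```python
IPV4_HEADER = 20        # bytes, IPv4 header without options
TCP_HEADER = 20         # bytes, TCP header without options
TIMESTAMP_OPTION = 12   # bytes, TCP timestamps (10) plus padding (2)


def mss_for_mtu(mtu: int) -> int:
    """MSS that keeps a segment within the MTU (IPv4, no options)."""
    return mtu - IPV4_HEADER - TCP_HEADER


def payload_per_segment(mtu: int, tcp_options: int = 0) -> int:
    """Application bytes that fit in one segment once TCP options are counted."""
    return mss_for_mtu(mtu) - tcp_options


def ip_packet_size(payload: int, tcp_options: int = 0) -> int:
    """Total IP packet size for a given payload."""
    return IPV4_HEADER + TCP_HEADER + tcp_options + payload


# For a 1500-byte Ethernet MTU:
#   mss_for_mtu(1500)                            -> 1460
#   payload_per_segment(1500, TIMESTAMP_OPTION)  -> 1448
#   ip_packet_size(1448, TIMESTAMP_OPTION)       -> 1500
```

So a full-sized segment on a 1500-byte MTU carries 1460 bytes of data, or 1448 when TCP timestamps are in use.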


Update

There is also a discussion about this on Hacker News


Nikhil Mungel writes blogs on networking, ruby and GNU/Linux. If you’d like to see more, follow him on twitter.com/hyfather
