You can differentiate between the client and server from the source and destination port numbers. SSH servers usually listen on port 22.
After the client has transmitted a character over a TCP segment, the server acknowledges that it has received it. Acknowledgements of data enable TCP to provide a reliable transport service to higher level protocols like HTTP, SMTP and SSH.
After this, the server actually processes the character by sending it to the program, which is typically bash, but could be anything: sh, zsh or even emacs. The shell will interpret the character and send back the result over another TCP segment. This enables the client to echo the character on the screen and send back an acknowledgement to the server.
I have depicted this in a timeline –
To minimize the traffic that acknowledgements cause, TCP piggybacks ACKs on data that it has to send anyway. This is implementation specific, though if you were using the socket API, you’d play around with the TCP_NODELAY option (which disables the Nagle Algorithm) and the Linux-specific TCP_QUICKACK option (which reduces or disables delayed acknowledgments).
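To make the two socket options concrete, here is a minimal sketch in Python. The function name make_interactive_socket is made up for illustration, and this is not what any particular SSH implementation does. TCP_QUICKACK only exists on Linux (and the kernel may reset it later), so the sketch sets it only when the constant is available.

```python
import socket


def make_interactive_socket():
    """Create a TCP socket tuned for small, latency-sensitive writes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY disables Nagle's algorithm, so small writes go out at once.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # TCP_QUICKACK is Linux-only: only set it where the constant exists.
    quickack = getattr(socket, "TCP_QUICKACK", None)
    if quickack is not None:
        sock.setsockopt(socket.IPPROTO_TCP, quickack, 1)
    return sock
```

The socket is returned unconnected; a caller would connect() it and close it as usual.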
Here is an example of this happening when I SSH-ed into an AWS EC2 server.
Note that the server is sending the ACK for the character it received and the response together in one TCP segment.
One may wonder why SSH does not transmit every command or every line rather than sending individual characters. The answer lies in the fact that the program on the server may have commands that are only a character long, without requiring a newline character. Think ESC in vi, or M-x in emacs, or SPACE in more. Even while sending commands to bash, there could be readline-specific keystrokes like C-e or C-a or even TAB that need to be sent as they are pressed instead of waiting for a newline.
So, SSH chooses not to try to understand what sequence of characters constitutes a command and simply sends across characters as they are typed. In fact, it doesn’t even assume whether or how the character pressed will echo on the screen; it finds that out from the server program.
One of the most frequent ways that I use SSH is when I do any remote git operations like pull, push, or clone. When actual data is being sent, the SSH software understands that it isn’t an interactive invocation and TCP utilizes all the available capacity of each segment to help SSH send all that data.
Deciding the sizes of segments is left to TCP and would warrant a blog post by itself, but here is a quick primer: the Maximum Segment Size (MSS) that TCP calculates is such that there will be no IP layer fragmentation of segments. In other words, TCP sets its MSS so that the segment data plus the TCP and IP headers is less than or equal to the Path MTU (Maximum Transmission Unit). TCP also sets the Don’t Fragment (DF) flag to ensure that segments don’t get fragmented at the IP layer.
Here is a capture of Wireshark sniffing data when I cloned a repository from GitHub over SSH.
Note how the data in each segment plus the TCP and IP headers equals the exact MTU of my Ethernet interface: 1500 bytes.
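The arithmetic behind that observation can be sketched in a few lines of Python. The function name and the header constants are illustrative; the 20 byte figures assume IPv4 and TCP headers without options, and options such as TCP timestamps shrink the payload further.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def max_segment_size(mtu, ip_header=IPV4_HEADER_BYTES, tcp_header=TCP_HEADER_BYTES):
    """Largest TCP payload that fits in one unfragmented IP packet of size mtu."""
    payload = mtu - ip_header - tcp_header
    if payload <= 0:
        raise ValueError("MTU is too small to carry any TCP payload")
    return payload


print(max_segment_size(1500))                 # 1460
print(max_segment_size(1500, tcp_header=32))  # 1448, with 12 bytes of TCP options
```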
There is also a discussion about this on Hacker News
Nikhil Mungel writes blogs on networking, ruby and GNU/Linux. If you’d like to see more, follow him on twitter.com/hyfather
ifconfig, since other invocations are well documented in the real man page.
ifconfig is a systems administration utility for UNIX-like systems that allows for diagnosing and configuring network interfaces. Although some claim that it is being replaced with iproute2 (or simply the ip command), I have seen it being used abundantly.
You can use ifconfig to bring up interfaces, turn them off, and configure the protocols and identifiers they use.
ifconfig prints out a wealth of information if invoked without any parameters and options. I simply could not find the definitions of most of these things and what follows is my attempt at documenting these exhaustively.
This is what we see when we invoke GNU ifconfig on a virtual host running Ubuntu. Note the absence of a wifi interface, as is the case with most servers.
$ ifconfig
eth0      Link encap:Ethernet  HWaddr 08:00:27:0c:49:47
          inet addr:192.168.0.121  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe0c:4947/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3461 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3686 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1778710 (1.7 MB)  TX bytes:821363 (821.3 KB)
          Interrupt:10 Base address:0xd020

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:12 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:720 (720.0 B)  TX bytes:720 (720.0 B)
eth0 Link encap:Ethernet HWaddr 08:00:27:0c:49:47
Application data is progressively encapsulated as it descends through the layers of the TCP/IP Stack. Link encap:Ethernet means that IP Datagrams coming from the Internet layer will be wrapped in an Ethernet Frame before leaving this interface.
HWaddr 08:00:27:0c:49:47 is the 48 bit Media Access Control (MAC) address. It uniquely identifies this network interface on the hardware layer. This address will be sent in ARP (Address Resolution Protocol) response packets when other devices want to send Ethernet Frames to this interface.
eth0 inet addr:192.168.0.121 Bcast:192.168.0.255 Mask:255.255.255.0
inet addr:192.168.0.121 needs no introduction: it is the 32 bit IPv4 address that this interface is using. Wanting to know this address is also probably the most common reason for invoking ifconfig.
Modern networking relies on slicing networks into smaller portions using subnetting and Classless Inter-Domain Routing (CIDR).
For subnetting to work, we need to understand what part of an IP address is the Network ID and what part is the Host ID. This information is carried in the Network Mask, Mask:255.255.255.0.
Bcast:192.168.0.255 is the broadcast address of the subnetwork the interface is on. Packets sent to this address will be received by all interfaces on this subnet.
We get the broadcast address by OR-ing the IP address with the bitwise complement of the network mask Mask:255.255.255.0, like this:
Network Mask:         255 . 255 . 255 .   0
Complement all bits:    0 .   0 .   0 . 255
Original IP address:  192 . 168 .   0 . 121
                      _____________________
OR them bitwise:      192 . 168 .   0 . 255
Which is the Broadcast Address.
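The same complement-and-OR calculation can be written as a short Python sketch. The function name broadcast_address is made up for illustration; the standard library ipaddress module is used only to convert between dotted-quad strings and integers.

```python
import ipaddress


def broadcast_address(ip, netmask):
    """OR the address with the bitwise complement of the netmask."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(netmask))
    host_bits = ~mask_int & 0xFFFFFFFF  # complement, kept to 32 bits
    return str(ipaddress.IPv4Address(ip_int | host_bits))


print(broadcast_address("192.168.0.121", "255.255.255.0"))  # 192.168.0.255
```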
I never paid much attention to IPv6 addresses in the past. However, it isn’t too complicated to get to the bottom of it. Your local IPv6 addresses are essentially based on the MAC address of the interface.
eth0 inet6 addr: fe80::a00:27ff:fe0c:4947/64 Scope:Link
fe80::a00:27ff:fe0c:4947/64 is the 128 bit link-local IPv6 address for the interface. We understand that it is a link-local address because of the Scope:Link field. Link-local IPv6 addresses are for communicating with the directly attached network, and not globally.
This is how all link-local addresses are laid out:
10 bits      | 54 bits    | 64 bits
1111 1110 10 | All Zeroes | Interface Identifier
Let's see whether our IPv6 address conforms to this pattern:
fe80::a00:27ff:fe0c:4947
(we replace :: with multiple all-zero double-octets)
fe80:0000:0000:0000     : 0a00:27ff:fe0c:4947
PREFIX                  | INTERFACE IDENTIFIER
All these zeroes make a | This looks a lot similar
link-local IPv6 address | to the MAC address which
non-routable            | is '08:00:27:0c:49:47'
The Interface Identifier is in fact usually made up using the MAC address. This is called EUI-64, or Extended Unique Identifier, by the IEEE.
08:00:27:0c:49:47        # Start with the MAC address
08:00:27:ff:fe:0c:49:47  # Insert ff:fe in the center
0a:00:27:ff:fe:0c:49:47  # Invert the 7th bit from the left of the first octet (the universal/local bit)
0a00:27ff:fe0c:4947      # Group it into double octets!
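The steps above translate directly into code. This is a small sketch with made-up function names; it assumes a colon-separated 48 bit MAC address and flips the universal/local bit (value 0x02) of the first octet.

```python
import ipaddress


def eui64_interface_id(mac):
    """Derive the 64 bit interface identifier from a 48 bit MAC address."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 48 bit MAC address")
    octets = octets[:3] + [0xFF, 0xFE] + octets[3:]  # insert ff:fe in the center
    octets[0] ^= 0x02                                # invert the universal/local bit
    groups = [f"{octets[i]:02x}{octets[i + 1]:02x}" for i in range(0, 8, 2)]
    return ":".join(groups)


def link_local_address(mac):
    """Prefix the interface identifier with fe80::/64."""
    return ipaddress.IPv6Address("fe80::" + eui64_interface_id(mac))


print(eui64_interface_id("08:00:27:0c:49:47"))  # 0a00:27ff:fe0c:4947
print(link_local_address("08:00:27:0c:49:47"))  # fe80::a00:27ff:fe0c:4947
```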
eth0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
UP means that the network interface is activated (with address and routing tables) and is accessible to the IP layer.
BROADCAST means that the interface supports broadcasting (and can hence obtain an IP address using DHCP).
RUNNING signifies that the network driver has been loaded and has initialized the interface.
MULTICAST tells us that multicasting support is enabled on this interface.
Since we didn’t invoke ifconfig with the --all flag, it will only print out interfaces that are currently UP.
MTU:1500 shows that the current Maximum Transmission Unit is set to 1500 bytes, the largest allowed over standard Ethernet. Any IP datagrams larger than 1500 bytes will be fragmented into multiple Ethernet Frames, if allowed by the routers and hosts in between. Otherwise, for instance when the Don’t Fragment flag is set, we’ll just get an ICMP Destination Unreachable response with Code 4 (fragmentation needed).
And finally, Metric:1 is the cost associated with routing frames over this interface. Normally, Linux kernels don’t build routing tables based on metrics. This value is only present for compatibility. If you do try to change the metric, it may not work. [1]
$ sudo ifconfig eth0 metric 2
SIOCSIFMETRIC: Operation not supported
eth0      RX packets:3461 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3686 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1778710 (1.7 MB)  TX bytes:821363 (821.3 KB)
RX stands for received and TX stands for transmitted.
Documentation for the fields that follow is sparse and only long-deserted ghost-town forums popped up in my searches. I downloaded the source code for GNU inetutils 1.9.1 and here are my findings after a few recursive greps:
RX packets: total number of packets received.
RX errors: an aggregation of the total number of packets received with errors. This includes too-long-frame errors, ring-buffer overflow errors, CRC errors, frame alignment errors, FIFO overruns, and missed packets.
The ring-buffer refers to a buffer that the NIC transfers frames to before raising an IRQ with the kernel.
The RX overruns field displays FIFO overruns, which are caused by the ring-buffer filling up faster than the kernel is able to handle the IO.
RX frame accounts for the incoming frames that were misaligned.
TX packets indicates the total number of transmitted packets.
TX errors presents a summation of errors encountered while transmitting packets. This list includes errors due to the transmission being aborted, errors due to the carrier, FIFO errors, heartbeat errors, and window errors. This particular struct in the source code isn’t commented.
We also have itemized error counts for dropped, overruns, and carrier.
collisions is the number of transmissions terminated due to CSMA/CD (Carrier Sense Multiple Access with Collision Detection).
The final line is merely all successfully received and transmitted data, in bytes and in a human readable format.
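If you want to pull these counters out of the output programmatically, a rough sketch follows. It assumes the colon-separated "name:value" layout shown in this post (newer ifconfig versions print a different format), and the function name parse_counters is made up for illustration.

```python
import re


def parse_counters(line):
    """Split 'RX packets:3461 errors:0 ...' into ('RX', {'packets': 3461, ...})."""
    direction, rest = line.split(None, 1)
    counters = {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", rest)}
    return direction, counters


direction, counters = parse_counters("RX packets:3461 errors:0 dropped:0 overruns:0 frame:0")
print(direction, counters["packets"], counters["errors"])  # RX 3461 0
```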
Since this isn’t a statistic, it gets its own heading.
The txqueuelen field displays the current Transmit Queue Length. This queue limits the number of frames in the interface’s device driver that are queued for transmission. The value of the txqueuelen can also be set by the ifconfig command.
eth0 Interrupt:10 Base address:0xd020
Interrupt:10 corresponds to the IRQ number against which to look up the eth0 device in /proc/interrupts, where the interrupts are counted.
$ cat /proc/interrupts
CPU0
0: 115 XT-PIC-XT-PIC timer
1: 3402 XT-PIC-XT-PIC i8042
2: 0 XT-PIC-XT-PIC cascade
5: 1 XT-PIC-XT-PIC snd_intel8x0
8: 0 XT-PIC-XT-PIC rtc0
9: 0 XT-PIC-XT-PIC acpi
=> 10: 53981 XT-PIC-XT-PIC eth0 <=
11: 1535 XT-PIC-XT-PIC ohci_hcd:usb1
12: 146 XT-PIC-XT-PIC i8042
14: 16923 XT-PIC-XT-PIC ata_piix
15: 10416 XT-PIC-XT-PIC ata_piix
53981 is the number of times the eth0 device has interrupted CPU0.
The third column tells the name of the programmable interrupt controller, and XT-PIC-XT-PIC may be something that my VirtualBox is doing.
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:12 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:720 (720.0 B)  TX bytes:720 (720.0 B)
The loopback interface isn’t connected to the NIC (or any hardware) and frames relayed over the loopback don’t exit the host on any layer. It is fully implemented in software. This also means that IP Datagrams sent over this interface are not encapsulated in an Ethernet frame, as can be seen by Link encap:Local Loopback.
lo inet addr:127.0.0.1 Mask:255.0.0.0
We have a large address space as set by the liberal subnet mask, Mask:255.0.0.0.
The loopback device can be configured with an IP address on the 127.0.0.0/8 subnetwork, which can be any address between 127.0.0.1 and 127.255.255.254. The loopback address on my machine is 127.0.0.1, which is usually the default.
lo inet6 addr: ::1/128 Scope:Host
Unlike IPv4, only one address is reserved for the loopback interface in the IPv6 address space: 0:0:0:0:0:0:0:1. It is represented more succinctly as ::1/128 since we can replace consecutive groups of 0 by a ::.
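Python's standard ipaddress module is a handy way to check the compression rule and the loopback ranges for yourself. This is only an illustration, not something ifconfig does internally.

```python
import ipaddress

loopback = ipaddress.IPv6Address("0:0:0:0:0:0:0:1")
print(loopback.compressed)   # ::1
print(loopback.exploded)     # 0000:0000:0000:0000:0000:0000:0000:0001
print(loopback.is_loopback)  # True

# IPv4, in contrast, reserves the whole 127.0.0.0/8 block for loopback.
print(ipaddress.IPv4Address("127.255.255.254").is_loopback)  # True
```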
The loopback address ::1/128 is treated as having link-local scope in RFC 3513. The terminology Scope:Host or Scope:Node is also used to further emphasize that the packet will never exit the host (or node). Unlike other link-local addresses, if a packet addressed to ::1/128 is received on an Ethernet interface, it is promptly dropped.
lo UP LOOPBACK RUNNING MTU:16436 Metric:1
The eponymous LOOPBACK flag in the flags string isn’t as interesting as the MTU:16436. Since the loopback interface isn’t bounded by the physical limitations of Ethernet or FDDI, its MTU is set to more than 16KiB.
We can send a 16 x 1024 = 16384 byte data packet, with an additional 52 bytes, without fragmenting it. 52 bytes are usually sufficient for TCP and IP headers (both are 20 bytes long without options).
The concept of Metric is the same as it was for the Ethernet interface above.
lo        RX packets:12 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
The fields for loopback statistics are printed out by the same function and retain the same definitions from the Ethernet piece above. However, errors and collisions have little chance of making an appearance here, since there isn’t a physical medium present.
The txqueuelen is set to 0 by default. It can be changed for the lo device, but I doubt that it would have any effect.
Don’t like GNU ifconfig or don’t have it? No problem, there are a few other ways of querying a system for similar information. netstat -ai and ifconfig also work on Mac OS X, but the output is slightly different since both tools originate from the BSD userland.
With iproute2:
$ ip --statistics link list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    RX: bytes  packets  errors  dropped overrun mcast
    67710      812      0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    67710      812      0       0       0       0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 1000
    link/ether 08:00:27:89:cf:84 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    10372230   53359    9       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    206555     1826     0       0       0       0
Or with netstat, on which the ifconfig output is actually based:
$ netstat --all --interfaces
Kernel Interface table
Iface   MTU Met   RX-OK RX-ERR RX-DRP RX-OVR   TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0   56092     10      0      0    3095      0      0      0 BMRU
lo    16436   0     858      0      0      0     858      0      0      0 LRU
The Flg field above shows us the status of the interfaces. BMRU stands for Broadcast, Multicast, Running, and Up. LRU stands for Loopback, Running, and Up.
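Decoding the Flg column is a simple letter lookup. The sketch below only knows the five letters discussed in this post; netstat may print others, which are reported as unknown. The names FLAG_NAMES and decode_flags are made up for illustration.

```python
# Only the letters that appear in the output above.
FLAG_NAMES = {
    "B": "Broadcast",
    "L": "Loopback",
    "M": "Multicast",
    "R": "Running",
    "U": "Up",
}


def decode_flags(flg):
    """Expand a netstat Flg string such as 'BMRU' into readable names."""
    return [FLAG_NAMES.get(letter, f"Unknown({letter})") for letter in flg]


print(decode_flags("BMRU"))  # ['Broadcast', 'Multicast', 'Running', 'Up']
print(decode_flags("LRU"))   # ['Loopback', 'Running', 'Up']
```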
And then this week, I got a chance to speak about the same topic at the Ruby Lightning Talk series organized by the great folks at ApartmentList and Thoughtbot in the heart of SoMa in San Francisco. I condensed the full-format presentation from earlier this year into this crisp 10 minute talk that I’ve embedded here –
.tgz) from a Continuous Integration server to an RPM repository server.
We already have an existing RPM repository server that uses Apache, and once my tarball was in the correct location, it would be available over HTTP for all to consume.
Cutting to the chase – What is the simplest way by which I could automatically transfer a ~20 MiB file from one CentOS host to another? I didn’t want to install an FTP server or any extra Apache module on the existing RPM host that would then support multi-part file uploads.
The quickest solution, it seemed, was an scp or an rsync.
So, how would this CI host be authorized to open an SSH tunnel to the web-server? Where would the identity key reside? There is no elaborate keyserver in this ecosystem.
I decided to transfer the responsibility of protecting the system from the identity key to the remote host’s operating system.
I created a new user called tarballs on the RPM repository host, with its HOME set to /var/www/html/tarballs and its SHELL set to rbash.
When bash is started with the name rbash (ln -s /bin/bash /bin/rbash) or by invoking bash like this: bash -r, it starts up in a restricted way, which is handy while setting up more controlled environments. I know of it thanks to Saurabh ‘Rob’ Mookherjee, a sysadmin whom I work with.
When in bash’s restricted mode, one cannot change directories or use commands with a / in them, nor can one change the PATH or the SHELL variables. A more comprehensive list of constraints can be found in the manpage for bash.
So, all is good except the tarballs user still has access to all executables that exist in the PATH that the system assigns by default.
A quick hack in /etc/profile.d to unset the PATH for the tarballs user and there is hardly anything the tarballs user can do once logged in.
The only required executable binary was /usr/bin/ln, to make a symlink called ‘latest’ to the most recent tarball that was SCP’ed over. I copied this binary to tarballs’ HOME. A kludge, I admit.
Now, from my Continuous Integration agent, I can script these two commands to be run every time a new build artifact is to be uploaded to the repository.
scp -v -i tarball_identity tmp/build7f3cd88.tar.gz tarballs@repo.host.com:
ssh -v -i tarball_identity tarballs@repo.host.com "ln -sf build7f3cd88.tar.gz latest"
For reference, here is what I have on the repository host:
[root@repo.host.com ~]# cat /etc/profile.d/tarballs.sh
if [ "$(whoami)" = 'tarballs' ]; then unset PATH; fi
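The two commands that the CI agent runs could also be generated per build by a small helper. This is a hypothetical sketch, not what my CI actually uses: the function name upload_commands is made up, and the identity file and remote are simply the values from the commands above. It only builds the argument lists; a caller would pass each one to subprocess.run(cmd, check=True).

```python
import os
import shlex


def upload_commands(artifact_path, identity="tarball_identity",
                    remote="tarballs@repo.host.com"):
    """Build the scp and ssh argument lists for uploading one build artifact."""
    name = os.path.basename(artifact_path)
    # Copy the tarball into the remote user's HOME.
    scp = ["scp", "-i", identity, artifact_path, f"{remote}:"]
    # Repoint the 'latest' symlink; the file name is quoted for the remote shell.
    link = ["ssh", "-i", identity, remote, f"ln -sf {shlex.quote(name)} latest"]
    return scp, link


scp_cmd, link_cmd = upload_commands("tmp/build7f3cd88.tar.gz")
print(" ".join(scp_cmd))
print(link_cmd[-1])  # ln -sf build7f3cd88.tar.gz latest
```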
In larger, more complicated systems that support different products and web-apps, I have seen the occasional file that is rsync’ed to another host, or a larger script residing remotely being invoked over SSH. While such things usually happen inside of a VPN or a DMZ, it is still a risky proposition to have an identity file being checked into the codebase or lying on an arbitrary host. While having a more robust security solution should certainly be on the list, creating a separate user on the remote host that has only enough privileges to perform the said task is a great idea. Once such a user exists, we have effectively moved that responsibility from the SSH identity keyfile to the remote host’s operating system.
Bear in mind that this infrastructure lies in a secure corporate datacenter with access to the machines restricted to trusted co-workers. Also, while the RPM repository host is important, all the data it holds can be easily mirrored and reproduced.
Solely relying on rbash is by no means a solution for any mission-critical host that is directly exposed to the internet or any untrusted zone.
I have observed people use different strategies and workflows with bundler and RVM, since there is at least one overlap in what they do: manage collections of rubygems. Bundler calls them bundles and RVM calls them gemsets.
Broadly, here are two patterns.
When a new project is cloned or initialized, an RVM gemset is created with the project’s name. Then, every time one wishes to work on that project, they switch to that gemset using the .rvmrc.
When cloning or initializing a new project on the system, no gemsets are explicitly created in RVM. Instead, bundler is used to manage all gemmy things across the system and across projects.
bundle install --path .gems
This --path helps keep the global RVM gem-space always empty (except for bundler.gem, rake.gem and rails.gem, to initialize new projects).
This also results in one’s .bundle/config file now containing the entry
BUNDLE_PATH: .gems
And of course, .gems should be ignored by your SCM.
echo ".gems" >> .gitignore # if you use git.
protip: I also include .gems when I create my tagfile. This helps me to quickly jump into the gem’s codebase using my editor.
Bundler is closer to the project while RVM is closer to the system. I like to have my project’s gems in my project’s directory managed with a tool built for the job.
Bundler stays with you in production.env while RVM might not, depending on the sysadmin and the situation at hand. I uniformly work on a philosophy of keeping my development as close to production with regards to the toolchain. No, I don’t run RHEL on my laptop.
One additional step encountered is prefixing every command that needs to run in the project context with bundle exec (I have it aliased to bx). Which is only fair, since that is how every command would run on production.env.
e.g. bundle exec rake db:migrate or bx rails dbconsole
protip: Forgot to prefix bx to the previous command? Run bx !!.
UPDATE: @tdinkar pointed me to passing the --binstubs flag to bundle install, which gets rid of having to use bundle exec for every executable command to be run in the bundled context.
Here is Yehuda’s blog post delving into more detail about the --binstubs flag and the reason for its existence.
It wasn’t all that sporadic; a pattern was noticed soon enough: a harmless C-x C-s issued when inside vi.
Unfortunately, most distros do not come bundled with emacs and I have to resort to botching up and fumbling with vi (or vim) to edit a few configurations now and then.
Every time I’d tweak an LXC configuration with vim and hit C-x C-s on the unsuspecting editor, things would freeze up. It’d refuse to respond to even the widely respected un-interceptability of the ^Z.
Since I use a multiplexed SSH control-master, I’d waste no time opening another SSH to the obstinate host in a different tab and mind my business.
Till I learned about flow control.
Software Flow Control enables communication links to be started or stopped using the primary communication channel, which in this case was the ssh tunnel. XON and XOFF dictate the status of the data-link transmission to the tty (X stands for transmission).
XOFF is mapped to C-s by default, which pauses output to the terminal, so the SSH session appears to freeze. It can be promptly remedied by an XON (mapped to C-q by default) to resume the transmission.
Emacs by default intercepts all C- sequences and hence does not exhibit this behaviour.
TL;DR
SSH freezes up, ‘hangs’ or stops responding when you hit C-s?
Use C-q to resume it.
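If you would rather not have C-s pause the terminal at all, software flow control can be switched off for the tty, which is what stty -ixon does. The post itself only uses C-q; the Python sketch below is an optional extra with a made-up function name, and it assumes a POSIX system and a file descriptor that refers to a terminal.

```python
import termios


def disable_software_flow_control(fd):
    """Clear IXON on the terminal so C-s and C-q reach the program instead."""
    attrs = termios.tcgetattr(fd)
    attrs[0] &= ~termios.IXON  # attrs[0] is the input-mode flag word (iflag)
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
```

From an interactive session this would be called as disable_software_flow_control(sys.stdin.fileno()) after importing sys; it raises termios.error if the descriptor is not a terminal.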
We start right from the basics, from the lifecycle that a line of code typically follows to how different environments should be configured and managed.
Since we talked to a Ruby audience, we talk specifically about release practices and workflows of Ruby on Rails applications centered around Continuous Integration that have enabled us to deploy extremely fast.
Chris Breeze is a colleague who works out of the Chicago office; he is also the man behind Chicago Carp.
The ones at the beginning are ideal for smaller scale applications that do not typically need to scale fast. We also go on to talk about more advanced patterns that use system-level packaging tools, which can enable an application to scale very rapidly if you use Chef or Puppet.
We presented this at DevOpsDays Bangalore 2011.
Adding these to ~/.ssh/config or /etc/ssh_config will allow you to multiplex one SSH connection to open multiple terminals, multiple scp and git push without having to authenticate over keys or passwords.
Host *
    ControlMaster auto
    ControlPath /tmp/%r@%h:%p
Add this to ~/.bashrc
shopt -s histappend
Remember to use pushd and popd when working deep within many directories.
grizzly:~$ pushd /var/log/apache2/
/var/log/apache2 ~
grizzly:apache2$ # do something
grizzly:apache2$ pushd /etc
/etc /var/log/apache2 ~
grizzly:etc$ # do something else
grizzly:etc$ popd
/var/log/apache2 ~
grizzly:apache2$ popd
~
grizzly:~$
In fact, you can effectively use pushd instead of cd.
tail -f httpd.log &
Then start the REPL, or continue working on the shell (which is also a REPL).
Note that these work with the emacs-readline, which is the default configuration in most distributions (including OS X).