TCP Observability and Instrumentation - Part 1
We’ve all been in this scenario. It’s another quiet day and an email comes in or someone pops into your Slack channel…

> My app is experiencing slowness. Is something going on with the network?
This is a loaded question that needs a lot of clarification, but regardless of the details it prompts another question: how can we better understand what’s happening on our network? A typical monitoring setup will capture device- and interface-level metrics for throughput/errors/drops/etc, and for more widespread problems this should be plenty to help you isolate hotspots that could be the source of the problem. You may even have probers set up around your network that use ICMP to test reachability and response times, but these won’t necessarily give reliable measurements of how well the network performs with real application traffic. What if we want visibility into TCP per host, or per socket?
`ss` will give us information about open sockets. If you include the `-i` flag, you’ll get a ton of TCP diagnostic info along with it.
This is a lot of great information, but it’s not presented in a useful way, at least for trying to track behavior over time. The output isn’t friendly to text manipulation either. Ideally we want this information as structured data gathered on a polling interval we define, or as part of a metrics agent that would scrape and emit this information over time. If we’re creating a CLI tool or an agent, we want something portable that can be distributed as a single binary without needing to worry about dependencies. Go or Rust would fit the bill here, but we are going to use Go since it’s a bit easier to work with.
Let’s ask the kernel ourselves
We have a few angles we can approach from here; let’s start with what the standard library gives us. As of Linux kernel 2.6, a `getsockopt` syscall with the `TCP_INFO` option will give us a `TCPInfo` struct. In order to grab this information, we need the file descriptor of the socket. Go’s `net.Dialer` type allows a custom `Control` function to be called after socket creation, which passes in a `syscall.RawConn`. `RawConn` in turn allows a control function to be defined which gives you access to the file descriptor of the socket created. Using the FD passed to our control function, we can set up an anonymous goroutine with a ticker to continually poll information for that socket.
And if we run this, we will see some TCP state changes along with the associated `TCPInfo` struct from our syscall.
This is a decent start. We can do something as simple as an HTTP transfer, or set up our own TCP server/client and instrument our dialers. This has some downsides though.
- We can’t guarantee the file descriptor will always reference the same socket. This is actually noted in the docstring for `RawConn`’s `Control` method. File descriptors can be reused after they’re closed, and a new one is always assigned the lowest available integer. Since file descriptors are unique per PID, though, if our process only creates a single socket this may end up not being a problem.
- The struct being returned doesn’t contain as many fields as the kernel’s full `tcp_info` struct.
We can come back to this approach at a later point; it could be useful for setting up distributed agents that transfer dummy files between each other and keep track of each session’s info.
Using Netlink to query socket statistics
Netlink is a mechanism for user space <-> kernel communication via the normal sockets API. Of particular interest to us is the `sock_diag` netlink subsystem. When we use the `SOCK_DIAG_BY_FAMILY` message type, we can extract TCP statistics for all sockets. How this works is well documented at this link. This is also flexible in that the `NLM_F_DUMP` flag returns a list of sockets, and the sockets returned can also be filtered by certain attributes. Luckily enough, there is a netlink module which already implements this functionality. At the time of writing, a PR for it is in motion but not merged yet. For now, we’ll fork the repo and update our `go.mod` config to replace references until it’s merged.
For the sake of brevity, we’ll only display the first element in the array just to see what data we get back.
There’s a lot of useful data here. When looking at TCP performance/troubleshooting we’ll probably want to look at:
- Total Retransmissions: We can assume a retransmission is a lost packet. The retransmit counter in the `tcp_info` struct is deceiving: it’s not a total, only the count at the time of the snapshot. Total retransmissions can also be used to calculate percentage lost, since we also have total segments in/out.
- Connection State: `ca_state` gives us info about the congestion control mechanism’s state machine.
- Segments In/Out: Useful for calculating runtime data such as loss percentage and throughput per second.
- Window Size: Also useful for spotting trends
We may also want to grab some data about TCP parameters in the kernel, but for now this is a good start.
Where do we go from here?
Next time we’ll look at how we can turn this data into some useful tooling. Examples can be found here: https://github.com/crutcha/blog-examples/tree/master/tcpinfo.