nf_conntrack: table full - how the absence of rules can lead to unexpected behaviour
I recently observed the dreaded:
nf_conntrack: table full, dropping packet
message on a host that formed part of the external tier of an infrastructure, where we expected, managed and throttled many connections. The odd thing was that the hosts should not have been doing anything iptables-wise that would track connections or otherwise generate this message. On both the well-behaved and the misbehaving hosts, `iptables -L` showed nothing but a bunch of empty chains. Odd.
However, a few leaps of logic later led to the following being discovered on the well-behaved hosts:
# lsmod | egrep 'ip_tables|conntrack'
ip_tables               9899  1 iptable_filter
x_tables               14175  1 ip_tables
and, curiously, this on the misbehaving hosts:
# lsmod | egrep 'ip_tables|conntrack'
nf_conntrack_ipv4      10346  3 iptable_nat,nf_nat
nf_conntrack           60975  4 ipt_MASQUERADE,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4          1073  1 nf_conntrack_ipv4
ip_tables               9899  2 iptable_nat,iptable_filter
x_tables               14175  3 ipt_MASQUERADE,iptable_nat,ip_tables
Sure enough, we can see why nf_conntrack is now involved in the network stack and why we might be filling up its connection table, but it doesn’t explain the disparity between the hosts.
In retrospect the explanation is at once blindingly obvious, craftily subtle and provable to boot. In short, when a rule is added to the ‘nat’ iptables table, the kernel modules required to support it are loaded dynamically. They stay loaded, and therefore stay in the packet path, even after the rules are flushed. In practice this means that, for a running kernel, once you have defined a nat iptables rule you are at the mercy of the conntrack table size and its other constraints for the lifetime of that kernel run. Put more simply, creating and then flushing nat rules does not leave you in the same state as never having created them.
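A quick way to tell which kind of host you are on is to look for these modules in /proc/modules. The sketch below is a minimal, hypothetical helper: the `conntrack_modules` name and the name pattern are assumptions based only on the modules seen in the lsmod output above, and it simply reports which of them are currently loaded.

```ruby
#!/usr/bin/env ruby
# Hypothetical helper: report connection-tracking/NAT modules that are loaded.
# The pattern covers only the module names seen in this post (an assumption).
CONNTRACK_PATTERN = /\A(nf_conntrack|nf_defrag|nf_nat|iptable_nat|ipt_MASQUERADE)/

# /proc/modules has one module per line; the first field is the module name.
def conntrack_modules(proc_modules_text)
  proc_modules_text.each_line.map { |line| line.split.first }.compact.grep(CONNTRACK_PATTERN)
end

if File.readable?('/proc/modules')
  loaded = conntrack_modules(File.read('/proc/modules'))
  if loaded.empty?
    puts 'No conntrack/NAT modules loaded'
  else
    puts "Loaded: #{loaded.join(', ')}"
  end
end
```

On a well-behaved host from the example above this should report nothing; on a misbehaving one it lists the nf_conntrack family.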
We can prove this in a rather ham-fisted way.
We’ll create a small dummy client and server in Ruby for the purpose of opening many concurrent connections, and we’ll turn some of the limits down so that we can reproduce the error without requiring massive live scale. The following scripts are best run under Ruby 1.9 so that we can make use of native threads.
#!/usr/bin/env ruby1.9
#
# Accept many connections
#
require 'socket'

server = TCPServer.open(7777)
loop {
  Thread.start(server.accept) do |client|
    loop {
      sleep 60 # do nothing, just hold the connection open
    }
  end
}
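#!/usr/bin/env ruby1.9
#
# Connect many times
#
require 'socket'

host = 'localhost'
port = 7777

threads = 19998.times.map do
  Thread.start do
    # Keep a reference to the socket so it is not garbage collected (and closed)
    socket = TCPSocket.open(host, port)
    loop {
      sleep 60 # do nothing
    }
  end
end

# Wait on the threads; otherwise the script exits, closing every connection, immediately
threads.each(&:join)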
As root, we’ll run the following to open up our ability to create connections:
ulimit -n 20000
echo 20000 > /proc/sys/kernel/threads-max
echo 0 > /proc/sys/net/ipv4/tcp_syncookies
iptables -L # forces the 'ip_tables' kernel modules to be loaded with empty tables and chains
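Because these limits are easy to lose when opening a new shell, a small pre-flight check can confirm them before running the client. This is a hypothetical sketch: the `preflight` name and the target of 20,000 are assumptions mirroring the commands above, and it only reads the values, it never changes them.

```ruby
#!/usr/bin/env ruby
# Hypothetical pre-flight check for the reproduction above (reads limits only).
NEEDED = 20_000 # mirrors `ulimit -n 20000` and threads-max above

# Returns a list of human-readable problems; empty means the limits look fine.
def preflight(needed, nofile_soft, threads_max, syncookies)
  problems = []
  problems << "open files soft limit is #{nofile_soft}" if nofile_soft < needed
  problems << "kernel.threads-max is #{threads_max}" if threads_max < needed
  problems << 'tcp_syncookies is still enabled' unless syncookies.zero?
  problems
end

read_int = ->(path) { Integer(File.read(path).strip) }
if File.readable?('/proc/sys/kernel/threads-max')
  soft, _hard = Process.getrlimit(Process::RLIMIT_NOFILE)
  issues = preflight(NEEDED, soft,
                     read_int.('/proc/sys/kernel/threads-max'),
                     read_int.('/proc/sys/net/ipv4/tcp_syncookies'))
  puts(issues.empty? ? 'Limits look fine' : issues)
end
```

Run it from the same shell as the client, since `ulimit -n` only applies to that shell and its children.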
If you then run ./server.rb followed by ./client.rb and `watch -n 2 "dmesg | tail -10"`, you’ll see, well, not much going on. However, if we add a nat table ruleset and then flush and delete it, we’ll see both the modules loaded and the tests producing the expected error in the kernel ring buffer output:
iptables --table nat --append POSTROUTING --out-interface eth0 -j MASQUERADE
iptables --flush
iptables --table nat --flush
iptables --delete-chain
iptables --table nat --delete-chain
...
# lsmod | egrep 'ip_tables|conn'
nf_conntrack_ipv4      10346  3 iptable_nat,nf_nat
nf_conntrack           60975  4 ipt_MASQUERADE,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4          1073  1 nf_conntrack_ipv4
ip_tables               9899  2 iptable_nat,iptable_filter
x_tables               14175  3 ipt_MASQUERADE,iptable_nat,ip_tables
...
sysctl net.netfilter.nf_conntrack_max=100
If we run the same tests again with this artificially low limit and monitor the kernel ring buffer with `watch -n 2 "dmesg | tail -10"` once again, you’ll quickly see the “nf_conntrack: table full, dropping packet” message.
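Rather than waiting for the kernel message, you can also watch the table fill up. The sketch below assumes the counters are exposed as nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter (the directory behind the `net.netfilter.nf_conntrack_max` sysctl used above); those files only exist while nf_conntrack is loaded, which is rather the point. The `conntrack_usage` helper is hypothetical.

```ruby
#!/usr/bin/env ruby
# Hypothetical helper: report how full the conntrack table is.
NETFILTER_DIR = '/proc/sys/net/netfilter'

# Returns [count, max, percent_used], or nil when nf_conntrack is not loaded.
def conntrack_usage(dir = NETFILTER_DIR)
  count_file = File.join(dir, 'nf_conntrack_count')
  max_file   = File.join(dir, 'nf_conntrack_max')
  return nil unless File.exist?(count_file) && File.exist?(max_file)
  count = Integer(File.read(count_file).strip)
  max   = Integer(File.read(max_file).strip)
  [count, max, (100.0 * count / max).round(1)]
end

usage = conntrack_usage
if usage
  puts format('conntrack: %d/%d entries (%.1f%%)', *usage)
else
  puts 'nf_conntrack is not loaded'
end
```

Wrapped in `watch -n 2`, this should show the count climbing towards 100 as the client opens connections.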
So, what have we learnt here? In short, manipulating nat tables under iptables on a running kernel will change the behaviour of your network stack, and clearing down those nat tables will not return the stack to its previous state. To do that you’ll have to run:
rmmod iptable_nat
rmmod ipt_MASQUERADE
rmmod nf_nat
rmmod nf_conntrack_ipv4
rmmod nf_conntrack
rmmod nf_defrag_ipv4
...
# lsmod | egrep 'ip_tables|conn'
ip_tables               9899  1 iptable_filter
x_tables               14175  1 ip_tables
to return things to their previous state, at least for this example. Whether that is preferable to a reboot or a recommission on a production, world-facing system is open to debate.
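Working out a safe rmmod order by hand means reading the "Used by" column: a module can only be removed once nothing that uses it is still loaded. The sketch below derives that order from lsmod output. `parse_lsmod` and `unload_order` are hypothetical names, and the sketch assumes that any user module missing from its input has already been removed, so in practice you would feed it the full, unfiltered lsmod output.

```ruby
#!/usr/bin/env ruby
# Hypothetical helper: derive an rmmod order from lsmod output.

# Parse lsmod lines ("name size refcount [users]") into { name => [users] }.
def parse_lsmod(text)
  text.each_line.each_with_object({}) do |line, deps|
    fields = line.split
    next if fields.empty? || fields.first == 'Module' # skip blanks and the header
    deps[fields[0]] = fields[3] ? fields[3].split(',') : []
  end
end

# Repeatedly remove every module that no remaining module still uses.
# Users absent from the input are assumed to be gone already.
def unload_order(deps)
  remaining = deps.keys
  order = []
  until remaining.empty?
    ready = remaining.select { |mod| (deps[mod] & remaining).empty? }
    raise ArgumentError, "circular use among #{remaining.join(', ')}" if ready.empty?
    order.concat(ready)
    remaining -= ready
  end
  order
end

sample = <<-LSMOD
nf_conntrack_ipv4      10346  3 iptable_nat,nf_nat
nf_conntrack           60975  4 ipt_MASQUERADE,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4          1073  1 nf_conntrack_ipv4
LSMOD

unload_order(parse_lsmod(sample)).each { |mod| puts "rmmod #{mod}" }
```

Fed the filtered output from the misbehaving host, it prints `rmmod nf_conntrack_ipv4`, `rmmod nf_conntrack` and `rmmod nf_defrag_ipv4`, the same order as the last three commands above.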
EDIT:
Some further testing on my part has determined that even listing the nat tables with `iptables -t nat -L` will cause the conntrack modules to be probed into the kernel. For very busy world-facing hosts the only solution that I can see is to add the various conntrack modules to /etc/modprobe.d/blacklist.conf to ensure that they are never loaded.
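A sketch of generating that configuration follows. One caveat worth checking against the modprobe.d(5) man page on your distribution: a `blacklist` line only stops a module's aliases from being used, so an explicit load by name (which is how iptables asks for iptable_nat) can still succeed. The sketch therefore also emits an `install <module> /bin/false` line for each module, which makes such an explicit load fail. The module list is taken from the lsmod output above and may differ on other kernels; `modprobe_config` and the script name in the comment are hypothetical.

```ruby
#!/usr/bin/env ruby
# Hypothetical generator for a modprobe.d snippet that keeps conntrack out.
# Module names come from the lsmod output in this post; check your own kernel's.
MODULES = %w[iptable_nat ipt_MASQUERADE nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4]

def modprobe_config(modules)
  lines = modules.flat_map do |mod|
    ["blacklist #{mod}",          # ignore the module's aliases
     "install #{mod} /bin/false"] # make an explicit load by name fail
  end
  lines.join("\n") + "\n"
end

# e.g. (as root, script name hypothetical): ./gen_blacklist.rb >> /etc/modprobe.d/blacklist.conf
print modprobe_config(MODULES)
```

With this in place, `iptables -t nat -L` should fail loudly rather than quietly loading conntrack, which is arguably what you want on these hosts.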
I’ve seen this on CentOS/RHEL and Ubuntu, right up to the current server release of the latter.
Installing Nagios on Ubuntu or Debian without Postfix
If you install the default ‘nagios3’ package from the repositories on a Debian-based distribution, you wind up with a full copy of postfix installed. This is fine if you’re simply trying to get the thing to work, but as part of a wider infrastructure you most likely do not want a full-fledged MTA arbitrarily popping up on your Nagios host - an MTA that you have to administer, monitor (!), patch and most importantly secure.
The dependency chain that causes postfix to be installed is:
nagios3 → nagios3-core → nagios3-common → bsd-mailx → default-mta | mail-transport-agent.
Why the package maintainers made bsd-mailx depend on a fully-fledged MTA I will never know. Perhaps they wanted to ensure things “just worked”? It still seems a bit heavy-handed to me, especially when one can configure .mailrc to point to a mailhost and be done with it.
To install nagios3 from the repositories and satisfy those dependencies without pulling in postfix, install the ‘lsb-invalid-mta’ package first. It provides ‘mail-transport-agent’, which satisfies the dependency chain above in place of postfix. The package’s sendmail binary does nothing but exit with a non-zero status, so you’ll never accidentally send mail from the local system, but you will have to configure your system to use a suitable MTA host instead.
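To see why the alternative matters, here is a toy model of that chain. It is not how apt actually resolves dependencies, and it ignores nagios3’s other dependencies; the `DEPENDS`/`PROVIDERS` tables and `resolve` are hypothetical, and which real packages provide `default-mta` and `mail-transport-agent` is assumed here (postfix for both) and worth checking with `apt-cache showpkg`. An `a | b` dependency counts as satisfied if anything providing either alternative is already present; otherwise the first alternative wins.

```ruby
#!/usr/bin/env ruby
# Toy model of the nagios3 dependency chain (not apt's real resolver).
# Each dependency is a list of alternatives, i.e. "a | b".
DEPENDS = {
  'nagios3'        => [%w[nagios3-core]],
  'nagios3-core'   => [%w[nagios3-common]],
  'nagios3-common' => [%w[bsd-mailx]],
  'bsd-mailx'      => [%w[default-mta mail-transport-agent]],
}

# Virtual packages and the real packages assumed to provide them.
PROVIDERS = {
  'default-mta'          => %w[postfix],
  'mail-transport-agent' => %w[postfix lsb-invalid-mta],
}

# Returns the packages that would be newly installed for pkg.
def resolve(pkg, installed, plan = [])
  return plan if installed.include?(pkg) || plan.include?(pkg)
  plan << pkg
  DEPENDS.fetch(pkg, []).each do |alternatives|
    present = lambda { |name| installed.include?(name) || plan.include?(name) }
    # Satisfied if any alternative, or anything providing it, is already present.
    next if alternatives.any? { |alt| ([alt] + PROVIDERS.fetch(alt, [])).any?(&present) }
    # Otherwise take the first alternative (its first provider if it is virtual).
    first = alternatives.first
    resolve(PROVIDERS.fetch(first, [first]).first, installed, plan)
  end
  plan
end

p resolve('nagios3', [])                  # ends with "postfix"
p resolve('nagios3', ['lsb-invalid-mta']) # stops at "bsd-mailx"
```

The second call is the situation the Puppet below sets up: lsb-invalid-mta is already installed, so the `mail-transport-agent` alternative is satisfied and postfix is never chosen.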
Here is some puppet to install nagios3 without postfix:
# /etc/puppet/modules/nagios-server/manifests/init.pp
#
# Class: nagios-server
#
# This class maintains a Nagios server.
#
# Parameters:
# None
#
# Requires:
# nagios-server::install
#
class nagios-server {
  include nagios-server::install

  service { 'apache2':
    ensure  => running,
    enable  => true,
    require => Class['nagios-server::install'],
  }

  service { 'nagios3':
    ensure  => running,
    enable  => true,
    require => Class['nagios-server::install'],
  }
}
# /etc/puppet/modules/nagios-server/manifests/install.pp
#
# Class: nagios-server::install
#
# This class will install a Nagios server from the repo packages
#
# Parameters:
# None
#
# Requires:
# Nothing
#
class nagios-server::install {
  # Prevent the nagios3-common -> bsd-mailx dependency from pulling in an MTA.
  package { 'lsb-invalid-mta':
    ensure => present,
  }

  $packages = ['nagios3', 'nagios-images', 'nagios-plugins', 'nagios3-doc',]

  package { $packages:
    ensure  => present,
    require => Package['lsb-invalid-mta'],
  }
}
Code On The Road
As I write this I am sitting on a wall in the Croatian town of Bol, on the isle of Brac. I am currently touring Europe on a motorbike (www.biketouraroundeurope.com, if you’re interested) and all the while I have been maintaining an application for a client.
Without sounding smug (although having said that, this will inevitably come across as exactly that), it has been satisfying to push code updates from the inside of a tent, receive SMS notifications of important events, and otherwise keep things ticking over from nothing more than a £70 netbook from eBay and the odd bit of wireless Internet access obtained with the unfortunate buying of a beer.
Whilst a bit extreme, this just goes to show what a good, automated deployment system can do for you. If I had to hand-ball this stuff onto the multiple boxes that make up the production and staging servers, I wouldn’t have been able to keep the app running.
Now, me on my wall is small-fry stuff, I admit (the app maintains just over 1 million subscribers for a number of national newspapers, and is in no way mission critical). But if you’ve a big farm of boxes and a decently sized development team, you’ll be wanting to get new code into live quite often. New code is change, and change is risk, and the more you can automate the risk out of the process, the quieter your alerting systems and the happier your clients.
I’ve been keeping a close eye on CloudFoundry (www.cloudfoundry.org, but the real stuff goes on over on GitHub) recently. Even within the annexed datacenter walls, products of this kind are most definitely the future of system administration. SCM + Capistrano (or Ant, or whatever else your world works best with) should certainly be a minimum, we all know this by now, but taking the abstraction a level higher as these tools do, and thinking only in terms of “compute units” and request routing, is, I am sure, the way forward. Leave one team to worry about the gear, and another to deploy things to it in an automated, repeatable, reconfigurable fashion.
Starting a Rails instance automatically on boot on Ubuntu Server
Audience
The following steps will allow you to start a Rails instance automatically on boot on an Ubuntu server in the simplest possible way. If you know about alternative Rails servers, or deployment architectures beyond “webserver + mod_proxy + Mongrel” and the like, then you are not the intended audience for this post.
Below is the simplest possible thing you can do to start a Rails instance automatically when you reboot your Ubuntu server, in keeping with the startup scripts that are already there. I hope that this is useful for someone starting out.
Steps
- save the following script in a file called ‘rails’ within the /tmp directory
- edit the USER, PORT, RAILS_ROOT variables to suit your application
- run the following commands:
sudo cp /tmp/rails /etc/init.d
sudo chmod +x /etc/init.d/rails
sudo update-rc.d rails defaults
Reboot and test.
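Once the machine is back up, a quick way to test is to check that something is accepting connections on the port. This is a minimal sketch; `listening?` is a hypothetical helper, and the host and port (127.0.0.1:3000) are assumptions that should match PORT in the init script.

```ruby
#!/usr/bin/env ruby
require 'socket'
require 'timeout'

# Hypothetical check: does anything accept TCP connections on host:port?
def listening?(host = '127.0.0.1', port = 3000, seconds = 2)
  Timeout.timeout(seconds) { TCPSocket.new(host, port).close }
  true
rescue SystemCallError, Timeout::Error
  false # connection refused, unreachable, or no answer in time
end

puts(listening? ? 'Rails instance is up' : 'Nothing listening on port 3000')
```

This only proves that the port is open; fetching a page from the app is the fuller test.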
Script
#! /bin/sh
### BEGIN INIT INFO
# Provides:          rails
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start a Rails instance
# Description:       Do the simplest thing possible in keeping with
#                    upstart to spin up a single Rails instance.
### END INIT INFO

# Author: Sam Pointer
#
# Do NOT "set -e"

# PATH should only include /usr/* if it runs after the mountnfs.sh script
PATH=/sbin:/usr/sbin:/bin:/usr/bin
USER="myappuser"
PORT=3000
RAILS_ROOT="/home/myapp/current"
COMMAND="ruby script/server -e production -p $PORT -d"
DESCRIPTION="Rails instance"

# Load the VERBOSE setting and other rcS variables
. /lib/init/vars.sh

# Define LSB log_* functions.
# Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
. /lib/lsb/init-functions

#
# Function that starts the daemon/service
#
do_start()
{
    # Return
    #   0 if daemon has been started
    #   1 if daemon was already running
    #   2 if daemon could not be started
    su -c "cd $RAILS_ROOT && $COMMAND" $USER
}

case "$1" in
  start)
    [ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESCRIPTION"
    do_start
    case "$?" in
        0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
        2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
    esac
    ;;
esac

