On Wed, 25 Feb 2004, Ron Peterson wrote:
> On Tue, 24 Feb 2004, Ron Peterson wrote:
>
> > I've also added some graphs of 'must' monitoring 'mist' (which is the machine
> > I actually care about the most right now). Mist ping latencies are
> > predictably on the upswing again. I'll likely be rebooting soon.
> >
> > http://depot.mtholyoke.edu:8080/tmp/must-mist/2002-02-24_8:40/
>
> I've had to turn my attention to some other responsibilities, so I haven't
> done any kernel profiling yet. However, I can report that I rebooted
> 'mist' into 2.4.20 yesterday, and I have seen rock solid .15 ms response
> times for more than 24 hours. Host 'must' is likewise now stable, running
> 2.4.20 for two days now. I have graphs, logs, etc. if anyone cares to see
> them.
These machines remain very stable at 2.4.20.
I don't know where things currently stand vis-a-vis knowing what's
causing this network/system load creep problem, but I thought I'd report
that I installed 2.4.21 on a single-processor machine about a week ago (1GHz PIII,
500MB, Intel 82820 (ICH2) Chipset w/ eepro100 module), and am seeing the
same bad behaviour. I have very clear graphs, if that's useful, but
haven't been logging system stats as aggressively as on some other
machines.
So something changed between 2.4.20 and 2.4.21, I think. I wish I could be
more helpful.
_________________________
Ron Peterson
Network & Systems Manager
Mount Holyoke College
You're not providing any new information until you work on those kernel
profiles we asked for the other week.
On Thu, 4 Mar 2004, David S. Miller wrote:
> You're not providing any new information until you work on those kernel
> profiles we asked for the other week.
I didn't hear back when I wrote and asked if you still wanted those. I'm
rebooting with profiling turned on. It takes a few days for things to get
out of control, but I'll provide data when that happens.
_________________________
Ron Peterson
Network & Systems Manager
Mount Holyoke College
On Thu, 4 Mar 2004, Ron Peterson wrote:
> ... thought I'd report that I installed 2.4.21 on a single processor
> about a week ago (1GHz PIII, 500MB, Intel 82820 (ICH2) Chipset w/
> eepro100 module), and am seeing the same bad behaviour.
I've booted with kernel profiling turned on. I've posted some preliminary
results. I don't have profile data yet, but you can see in the following
that when I turn off my iptables rules, the ping latency graph flattens
out.
http://depot.mtholyoke.edu:8080/tmp/tap-sam/2004-03-06_9:30/sam_last_108000.png
http://depot.mtholyoke.edu:8080/tmp/tap-sam/
My understanding is that the kernel profile information will become
interesting when the machine starts thrashing. If it would be useful for
me to dump anything before then, let me know.
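
A rough way to put numbers on the comparison above (iptables rules loaded vs. turned off) is to log the average round-trip time at intervals. The sketch below is an assumption, not the tooling behind the graphs: avg_rtt is a hypothetical helper, and it relies only on the "min/avg/max" summary line that ping prints.

```shell
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print the average
# round-trip time in ms, taken from the "min/avg/max" summary line.
avg_rtt() {
    awk '/min\/avg\/max/ {
        split($0, halves, "= ")      # the text after "= " holds the numbers
        split(halves[2], v, "/")     # v[1]=min, v[2]=avg, v[3]=max
        print v[2]
    }'
}

# Example use (not run here; "mist" is one of the hosts in this thread,
# and the log file name is a placeholder):
#   echo "$(date '+%Y-%m-%d %H:%M:%S') $(ping -c 10 -q mist | avg_rtt)" >> ping-latency.log
```

Logging one series with the rules loaded and another after turning them off gives two sets of numbers to compare directly.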
_________________________
Ron Peterson
Network & Systems Manager
Mount Holyoke College
On Sat, 6 Mar 2004, Ron Peterson wrote:
> My understanding is that the kernel profile information will become
> interesting when the machine starts thrashing. If it would be useful for
> me to dump anything before then, let me know.
On a related note...
What kind of performance hit do you take for booting with kernel profiling
turned on? If not much, I would consider always booting this way, so that
if a machine starts sinking, I could maybe capture some useful
information. Is that wise?
_________________________
Ron Peterson
Network & Systems Manager
Mount Holyoke College
On Sat, 6 Mar 2004 09:55:09 -0500 (EST)
Ron Peterson <[email protected]> wrote:
> My understanding is that the kernel profile information will become
> interesting when the machine starts thrashing.
Yes, now please, pretty please, get us the profiles...
On Mon, 8 Mar 2004, David S. Miller wrote:
> Date: Mon, 8 Mar 2004 23:34:31 -0800
> From: David S. Miller <[email protected]>
> To: Ron Peterson <[email protected]>
> Cc: [email protected]
> Subject: Re: network / performance problems
>
> On Sat, 6 Mar 2004 09:55:09 -0500 (EST)
> Ron Peterson <[email protected]> wrote:
>
> > My understanding is that the kernel profile information will become
> > interesting when the machine starts thrashing.
>
> Yes, now please, pretty please, get us the profiles...
http://depot.mtholyoke.edu:8080/tmp/tap-sam/2004-03-09_09:30/
The machine is not really thrashing yet, but I'd expect that in another couple
of days, if experience holds, it will be gonzo. I'd like to revert back
to 2.4.20 before then, as this is a production machine. I'll leave it
going as is for a short while, however, in case anyone has any suggestions
about things I should look at while it's misbehaving.
_________________________
Ron Peterson
Network & Systems Manager
Mount Holyoke College
On Tue, 9 Mar 2004, Ron Peterson wrote:
>
> The machine is not really thrashing yet, but I'd expect in another couple
> days, if experience holds, that it will be gonzo. I'd like to revert back
> to 2.4.20 before then, as this is a production machine. I'll leave it
> going as is for a short while, however, in case anyone has any suggestions
> about things I should look at while it's misbehaving.
I'm now dumping profile information from sam to the following location
every fifteen minutes:
http://depot.mtholyoke.edu:8080/tmp/sam-profile/
I'm thinking I'll reboot sam to 2.4.20 tomorrow morning, unless someone
says they'd like some more data.
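
For reference, a periodic dump like the one described above could be driven from cron roughly as follows. This is a sketch under assumptions: the kernel was booted with profile=2, readprofile (from util-linux) is installed, and the System.map path, output directory and function names are placeholders rather than what is actually running on sam.

```shell
#!/bin/sh
# Assumed locations; adjust for the machine being profiled.
SYSTEM_MAP=${SYSTEM_MAP:-/boot/System.map}
OUTDIR=${OUTDIR:-/var/tmp/sam-profile}

# Build the snapshot file name from a directory and a timestamp.
snapshot_name() {
    printf '%s/profile-%s.txt\n' "$1" "$2"
}

# Write the 40 busiest kernel symbols (by tick count) to a new snapshot.
dump_profile() {
    mkdir -p "$OUTDIR" || return 1
    out=$(snapshot_name "$OUTDIR" "$(date +%Y-%m-%d_%H%M)")
    readprofile -m "$SYSTEM_MAP" | sort -nr | head -n 40 > "$out"
}

# Only dump when asked to, so the file can also be sourced for its helpers.
if [ "${1:-}" = "run" ]; then
    dump_profile
fi

# Example crontab entry for a fifteen-minute interval (script path is hypothetical):
#   */15 * * * * /usr/local/sbin/dump-profile.sh run
```

The counters are cumulative since boot (or since the last readprofile -r), so differences between consecutive snapshots are more telling than any single file.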
Best.
_________________________
Ron Peterson
Network & Systems Manager
Mount Holyoke College
On Thu, 4 Mar 2004, Ron Peterson wrote:
> These machines remain very stable at 2.4.20.
>
> I don't know where things currently stand vis-a-vis knowing what's
> causing this network/system load creep problem, but I thought I'd report
> that I installed 2.4.21 on a single processor about a week ago (1GHz PIII,
> 500MB, Intel 82820 (ICH2) Chipset w/ eepro100 module), and am seeing the
> same bad behaviour.
I still don't know the root cause of my ever increasing ping
latencies. However, I can report that if I compile all the netfilter
helpers as modules, rather than statically linking them, everything
runs fine.
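
To check which way a given kernel was built, something like the following could list the netfilter options that are compiled in rather than built as modules. It is a sketch only: netfilter_builtin is a hypothetical helper, and treating every CONFIG_IP_NF_* option as relevant is an assumption, since the exact set of options involved has not been pinned down.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel .config on stdin and print the
# IPv4 netfilter options that are statically linked (=y). Options built
# as modules (=m) or not set are skipped.
netfilter_builtin() {
    grep '^CONFIG_IP_NF_[A-Z0-9_]*=y$'
}

# Example use (not run here; the source tree path is an assumption):
#   netfilter_builtin < /usr/src/linux/.config
```

No output means nothing under CONFIG_IP_NF_* is built in, which corresponds to the all-modules setup that runs fine here.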
This has solved my immediate problem, so I've turned my attention to other
things. As far as I know, though, there's still something amiss.
I have another machine that's not in production yet running 2.6.5. I've
adopted the habit of compiling netfilter stuff as modules, but I'll
statically link everything and run it that way to see what I can see.
_________________________
Ron Peterson
Network & Systems Manager
Mount Holyoke College
On Mon, 12 Apr 2004, Ron Peterson wrote:
> I have another machine that's not in production yet running 2.6.5. I've
> adopted the habit of compiling netfilter stuff as modules, but I'll
> statically link everything and run it that way to see what I can see.
Results here:
http://depot.mtholyoke.edu:8080/tmp/tap-stow/2004-04-14/
The problem persists. To the best of my knowledge, starting with kernel
version 2.4.21, and including 2.6 series kernels, if you statically link
netfilter code, and use iptables to set up connection tracking rules (as
below), ksoftirqd will consume increasing cpu%, and ping latencies
will grow. Eventually the machine will be unusable.
#! /bin/sh
IPTABLES=/usr/local/sbin/iptables
IFPUB=eth0
IFPRIV=eth1
PUBIP=...
PUBNET=...
PRIVIP=...
PRIVNET=...
# The default policy for each chain is to DROP the packet.
$IPTABLES -P INPUT DROP
$IPTABLES -P OUTPUT DROP
$IPTABLES -P FORWARD DROP
# Flush existing rules for all chains.
$IPTABLES -F
$IPTABLES -t nat -F
$IPTABLES -X
# Allow this host to establish new connections. Otherwise only accept
# established connections.
$IPTABLES -A OUTPUT --match state --state NEW,ESTABLISHED,RELATED -j ACCEPT
$IPTABLES -A INPUT --match state --state ESTABLISHED,RELATED -j ACCEPT
# Allow ping from on-campus
$IPTABLES -A INPUT -i $IFPUB -s $PUBNET --protocol icmp --icmp-type echo-request -j ACCEPT
$IPTABLES -A INPUT -i $IFPRIV -s $PRIVNET --protocol icmp --icmp-type echo-request -j ACCEPT
# Allow incoming ssh connections.
$IPTABLES -A INPUT --protocol tcp --destination-port 22 -j ACCEPT
# Allow incoming https connections.
# $IPTABLES -A INPUT --protocol tcp --destination-port 443 -j ACCEPT
# Allow Samba/SMB/NetBIOS
$IPTABLES -A INPUT --protocol tcp --destination-port 137:139 -j ACCEPT
$IPTABLES -A INPUT --protocol tcp --destination-port 445 -j ACCEPT
# Allow CUPS
$IPTABLES -A INPUT --protocol tcp --destination-port 631 -j ACCEPT
# Allow this host to talk to itself.
$IPTABLES -A INPUT -d 127.0.0.1 -i lo -j ACCEPT
$IPTABLES -A INPUT -s $PUBIP -d $PUBIP -j ACCEPT
$IPTABLES -A INPUT -s $PRIVIP -d $PRIVIP -j ACCEPT
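
For anyone trying to reproduce this with the rules above loaded, the ksoftirqd side of the symptom could be tracked with something like the sketch below. The helper name is hypothetical, and it assumes a ps that accepts "-eo comm,pcpu" (procps does). The thread is named ksoftirqd_CPU0 on 2.4 and ksoftirqd/0 on 2.6, so the match is on the common prefix.

```shell
#!/bin/sh
# Hypothetical helper: read "COMMAND %CPU" lines on stdin (as printed by
# `ps -eo comm,pcpu`) and print the summed %CPU of all ksoftirqd threads.
ksoftirqd_cpu() {
    awk '$1 ~ /^ksoftirqd/ { sum += $2 } END { printf "%.1f\n", sum }'
}

# Example use (not run here; the log file name is a placeholder):
#   echo "$(date '+%Y-%m-%d %H:%M:%S') $(ps -eo comm,pcpu | ksoftirqd_cpu)" >> ksoftirqd-cpu.log
```

Note that the pcpu figure from ps is averaged over the lifetime of the process, so for a kernel thread it rises slowly. A tool that samples over short intervals, such as top, would show the creep sooner.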
_________________________
Ron Peterson
Network & Systems Manager
Mount Holyoke College