2001-04-15 00:54:26

by George Bonser

Subject: 2.4 stable when?

I have a web server farm that right now has about 125 apache processes
running per machine. If I try to use 2.4.3 or even 2.4.3-ac6, the process
count goes to about 400 (meaning it is slow in clearing connections) and the
load average climbs until it gets close to 100, at which point the machine
stops responding. It runs ok for about the first 5 minutes of its life and
then sinks deeper and deeper into the mire until it disappears. No processes
are shown stuck in D state.

2.4.4pre3 works, sorta, but is very "pumpy". The load avg will go up to
about 60, then drop, then climb again, then drop. It will vary from very
sluggish performance to snappy and back again to sluggish.

With 2.2 kernels I see something like this:

00:48:00 up 4:51, 1 user, load average: 0.00, 0.02, 0.06

141 processes: 139 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: 2.8% user, 3.2% system, 0.0% nice, 94.1% idle

and that is with about 120 remote users connected to the box via apache.


Is there any information that would be helpful to the kernel developers that
I might be able to provide or is this a known issue that is currently being
worked out?




2001-04-15 14:54:12

by Rik van Riel

Subject: Re: 2.4 stable when?

On Sat, 14 Apr 2001, George Bonser wrote:

> 2.4.4pre3 works, sorta, but is very "pumpy". The load avg will go up to
> about 60, then drop, then climb again, then drop. It will vary from very
> sluggish performance to snappy and back again to sluggish.

So it's stable ;))

> With 2.2 kernels I see something like this:
>
> 00:48:00 up 4:51, 1 user, load average: 0.00, 0.02, 0.06

*nod*

> Is there any information that would be helpful to the kernel
> developers that I might be able to provide or is this a known issue
> that is currently being worked out?

I never heard about this problem. What would be helpful is to
send a few minutes' (a full 'load cycle'?) worth of output from
'vmstat 5' and some information about the configuration of the
machine.

It's possible I'll want more information later, but just the
vmstat output would be a good start.
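
One way to capture the 'vmstat 5' output requested above is sketched
below. This is only an illustration: the helper names, the log file name
and the choice of 36 samples (about three minutes at a 5-second interval)
are assumptions, not something specified in this thread. It relies on the
standard `vmstat [delay [count]]` invocation.

```python
import subprocess


def vmstat_command(interval=5, samples=36):
    """Build the argument list for `vmstat <delay> <count>`.

    36 samples at 5 seconds each is roughly three minutes of data;
    both defaults are illustrative assumptions.
    """
    return ["vmstat", str(interval), str(samples)]


def capture_vmstat(path, interval=5, samples=36):
    """Run vmstat and write its output to `path` (hypothetical helper).

    Blocks until vmstat has printed all samples, then returns.
    """
    with open(path, "w") as log:
        subprocess.run(vmstat_command(interval, samples), stdout=log, check=True)
```

Calling `capture_vmstat("vmstat.log")` while the load is cycling would
leave a text file that can be attached to a reply. The sample count may
need raising if one load cycle lasts longer than three minutes.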

If the data isn't too big, I'd appreciate it if you could also
CC [email protected].

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-04-15 15:37:14

by George Bonser

Subject: RE: 2.4 stable when?

>
> > Is there any information that would be helpful to the kernel
> > developers that I might be able to provide or is this a known issue
> > that is currently being worked out?
>
> I never heard about this problem. What would be helpful is to
> send a few minutes' (a full 'load cycle'?) worth of output from
> 'vmstat 5' and some information about the configuration of the
> machine.
>
> It's possible I'll want more information later, but just the
> vmstat output would be a good start.
>
> If the data isn't too big, I'd appreciate it if you could also
> CC [email protected].
>
> regards,

Sounds good. I think I can do this. Also, it appears that the problem is
related to how busy the farm is. The machines are load balanced in a "least
connections" mode. There are 5 servers in the farm. Suppose I have 300
connections to each machine and reboot one to load the new kernel.

When that server comes back up it is handed 300 connections all at once. It
seems (and this is subjective ... does it handle things differently with
more than 256 processes?) that when I give the machine much more than 200
connections, it is very slow to clear them. It seems to have trouble at
that point clearing connections as fast as it is getting them. If I have
less than 200 connections initially, it seems to handle things OK.

I tried to collect some data last night but it appeared to work ok. I will
wait for the load to come up later today and try it during its peak time.
While I could put the balancer into a "slow start" mode, 2.2 always seemed
to handle the burst of new connections just fine so I didn't bother.
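
The burst described above follows from how a "least connections" policy
picks a target. The toy model below is a sketch under stated assumptions:
it is not the actual balancer's algorithm, the function name is made up,
and it assumes no connection finishes during the burst. It only shows why
a freshly rebooted server with zero connections receives every new
connection until it catches up with its peers.

```python
def assign_least_connections(active, new_connections):
    """Send each new connection to the server with the fewest active ones.

    `active` holds the current connection count of each server.
    Returns how many of the new connections each server received.
    Toy model: no connection completes while the burst is assigned.
    """
    active = list(active)  # work on a copy, leave the caller's list alone
    received = [0] * len(active)
    for _ in range(new_connections):
        # On a tie, min() picks the lowest-numbered server.
        target = min(range(len(active)), key=active.__getitem__)
        active[target] += 1
        received[target] += 1
    return received


# Four busy servers at 300 connections and one that has just rebooted.
farm = [300, 300, 300, 300, 0]
print(assign_least_connections(farm, 300))  # [0, 0, 0, 0, 300]
```

A "slow start" mode would instead cap how quickly the rebooted server's
share may grow, which is why it is the usual workaround for this pattern.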

The machine is a UP Pentium-III 800MHz with 512MB of RAM running Debian
Woody. It is a SuperMicro 6010L 1U unit with the SuperMicro 370DLR
motherboard. This uses the ServerWorks ServerSet III LE chipset and Adaptec
AIC-7892 Ultra160 disk controller and on-board dual Intel NIC (only using
eth0).

I have cut the configuration pretty much to the bone, no NetFilter support,
no QoS, no Audio/Video. Tried to get it as plain vanilla as possible (my
first step when having problems).

I was able to run 2.4.0-test12 in this application and did for quite some
time. I don't recall trying 2.4.1 but I know I had severe problems with
2.4.2 and 2.4.3. Now that I think about it, I am not sure the farm was as
busy back when I put 2.4.0 on it or that I ever rebooted during my peak
period. This might have been a problem all along but I just never saw it. It
seems to have to do with handing the machine a large number of connections
at once and then a stream of them at a pretty good clip. It is getting about
40 connections/second right this minute but that should come up a bit in the
next couple of hours.

To be quite honest, I could run this on 2.2 forever, it is just a webserver.
My only reason for using 2.4 would be to see if I can go SMP on these things
when my load gets higher and I get some benefit of the finer granularity of
2.4 in SMP to serve a higher load with fewer machines than would be possible
with 2.2. That, and just to beat on a 2.4 kernel and report any problems to
you guys.


2001-04-15 15:55:39

by jeff millar

Subject: v2.4.3 networking problem other than tcp_ecn?

Several web sites have stopped working recently about the time I upgraded to
2.4.2 - 2.4.3. Some testing at one site showed it doesn't respond to pings
except for an occasional reply reported as "admin prohibited filter" by
tcpdump or as "packet filtered" by ping. The kernel doesn't have tcp_ecn
compiled in, access is via ppp and dialup, everything possible compiled as
modules, iptables firewall setup. This problem applies to machines on the
local net and to the firewall itself.

ideas?

jeff

2001-04-20 07:33:44

by George Bonser

Subject: RE: 2.4 stable when?

Just to follow up on this ...

I am now running 2.4.4pre4 and it seems to be stable. If I reboot the
machine (or simply stop and restart apache) the load avg does go much higher
than I am used to seeing (near 50 for about 5 minutes or so), but it does not
hang as previous kernels did. I have the vmstat and top -i info if anyone
is curious. It does not touch swap, though.




