2006-01-08 06:16:38

by Grant Coady

[permalink] [raw]
Subject: Why is 2.4.32 four times faster than 2.6.14.6??

Hi there,

Recently I started testing 2.6 stable on my firewall box here, I
work via ssh terminals to firewall and have a good feel for the
CLI responsiveness.

The test? Just a simple display the apache access log, but it is
very slow with 2.6 kernels (2.6.14.5, 2.6.14.6, 2.6.15):

2.4.32-hf32.1:
grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log |cut -c-96
[...]
2006-01-08 14:38:52 +1100: bugsplatter.mine.nu 207.46.98.39 "GET /test/linux-2.6/sempro/ HTTP/1.

real 0m1.562s
user 0m0.600s
sys 0m0.310s

2.6.14.6:
grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log |cut -c-96
[...]
2006-01-08 14:38:52 +1100: bugsplatter.mine.nu 207.46.98.39 "GET /test/linux-2.6/sempro/ HTTP/1.

real 0m6.318s
user 0m0.690s
sys 0m1.140s

grant@deltree:~$ /sbin/lspci
00:00.0 Host bridge: Intel Corporation 430FX - 82437FX TSC [Triton I] (rev 02)
00:07.0 ISA bridge: Intel Corporation 82371FB PIIX ISA [Triton I] (rev 02)
00:07.1 IDE interface: Intel Corporation 82371FB PIIX IDE [Triton I] (rev 02)
00:0d.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
00:0e.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
00:10.0 VGA compatible controller: S3 Inc. 86c764/765 [Trio32/64/64V+] (rev 44)

dmesg + .config on: http://bugsplatter.mine.nu/test/boxen/deltree/

--
Thanks,
Grant.
http://bugsplatter.mine.nu/


2006-01-08 06:58:13

by Markus Rechberger

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

Hi,

what does hdparm show up?
Were there any other processes running during the test?
what does "vmstat 1" show up during the test?

Markus

On 1/8/06, Grant Coady <[email protected]> wrote:
> Hi there,
>
> Recently I started testing 2.6 stable on my firewall box here, I
> work via ssh terminals to firewall and have a good feel for the
> CLI responsiveness.
>
> The test? Just a simple display the apache access log, but it is
> very slow with 2.6 kernels (2.6.14.5, 2.6.14.6, 2.6.15):
>
> 2.4.32-hf32.1:
> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log |cut -c-96
> [...]
> 2006-01-08 14:38:52 +1100: bugsplatter.mine.nu 207.46.98.39 "GET /test/linux-2.6/sempro/ HTTP/1.
>
> real 0m1.562s
> user 0m0.600s
> sys 0m0.310s
>
> 2.6.14.6:
> grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log |cut -c-96
> [...]
> 2006-01-08 14:38:52 +1100: bugsplatter.mine.nu 207.46.98.39 "GET /test/linux-2.6/sempro/ HTTP/1.
>
> real 0m6.318s
> user 0m0.690s
> sys 0m1.140s
>
> grant@deltree:~$ /sbin/lspci
> 00:00.0 Host bridge: Intel Corporation 430FX - 82437FX TSC [Triton I] (rev 02)
> 00:07.0 ISA bridge: Intel Corporation 82371FB PIIX ISA [Triton I] (rev 02)
> 00:07.1 IDE interface: Intel Corporation 82371FB PIIX IDE [Triton I] (rev 02)
> 00:0d.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
> 00:0e.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 08)
> 00:10.0 VGA compatible controller: S3 Inc. 86c764/765 [Trio32/64/64V+] (rev 44)
>
> dmesg + .config on: http://bugsplatter.mine.nu/test/boxen/deltree/
>
> --
> Thanks,
> Grant.
> http://bugsplatter.mine.nu/
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


--
Markus Rechberger

2006-01-08 07:18:43

by be-news06

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

Markus Rechberger <[email protected]> wrote:
> Hi,
>
> what does hdparm show up?
> Were there any other processes running during the test?
> what does "vmstat 1" show up during the test?

also also retry with redirection to /dev/null, this could be a console
problem since there is 5 minutes wait time.

Gruss
Bernd

2006-01-08 07:29:00

by Grant Coady

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On Sun, 8 Jan 2006 07:58:09 +0100, Markus Rechberger <[email protected]> wrote:

>Were there any other processes running during the test?
box runs same config both kernels: the usual light load ~100% idle ;)
>what does "vmstat 1" show up during the test?

grant@deltree:~$ uname -r
2.6.14.6a
grant@deltree:~$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
[...]
0 0 0 63800 11520 32352 0 0 0 0 110 18 0 0 100 0
0 0 0 63800 11520 32352 0 0 0 0 106 17 1 0 99 0
3 0 0 63560 11520 32352 0 0 0 0 346 502 22 9 69 0
1 0 0 63560 11520 32352 0 0 0 0 1057 1987 59 41 0 0
1 0 0 63560 11520 32352 0 0 0 0 1062 2011 59 41 0 0
1 0 0 63560 11520 32352 0 0 0 0 1053 2001 50 50 0 0
1 0 0 63500 11596 32352 0 0 0 136 1054 1974 61 39 0 0
1 0 0 63500 11596 32352 0 0 0 0 1040 1978 50 50 0 0
0 0 0 63620 11596 32352 0 0 0 0 799 1425 45 27 29 0
0 0 0 63620 11596 32352 0 0 0 0 104 12 0 0 100 0
0 0 0 63620 11596 32352 0 0 0 0 103 10 0 0 100 0

grant@deltree:~$ uname -r
2.4.32-hf32.1
grant@deltree:~$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
[...]
0 0 0 83192 6532 21404 0 0 0 0 104 12 0 1 99 0
0 0 0 83152 6572 21404 0 0 0 80 116 24 0 1 99 0
1 0 0 82952 6572 21404 0 0 0 0 168 130 6 5 89 0
2 0 0 82952 6572 21404 0 0 0 0 667 1019 65 35 0 0
0 0 0 83152 6572 21404 0 0 0 0 297 378 41 10 49 0
0 0 0 83152 6572 21404 0 0 0 0 104 9 0 1 99 0
0 0 0 83064 6656 21404 0 0 0 304 169 121 0 1 99 0
0 0 0 83064 6656 21404 0 0 0 0 137 42 0 2 98 0

--
Thanks,
Grant.
http://bugsplatter.mine.nu/

2006-01-08 07:42:29

by Grant Coady

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On Sun, 08 Jan 2006 08:18:40 +0100, [email protected] (Bernd Eckenfels) wrote:

>Markus Rechberger <[email protected]> wrote:
>> Hi,
>>
>> what does hdparm show up?
>> Were there any other processes running during the test?
>> what does "vmstat 1" show up during the test?
>
>also also retry with redirection to /dev/null, this could be a console
>problem since there is 5 minutes wait time.

Excuse me? there is no 5 minutes wait time ;)

grant@deltree:~$ uname -r
2.4.32-hf32.1
grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log |cut -c-96

real 0m1.671s
user 0m0.550s
sys 0m0.300s
grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log |cut -c-96 >/dev/null

real 0m0.510s
user 0m0.420s
sys 0m0.080s

grant@deltree:~$ uname -r
2.6.14.6a
grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log |cut -c-96 >/dev/null
real 0m6.463s
user 0m0.770s
sys 0m0.990s
grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log |cut -c-96 >/dev/null

real 0m0.524s
user 0m0.400s
sys 0m0.130s

Yes, the delay is in the console, that's what I'm talking about ;)
A very perceivable, measurable sluggishness in 2.6 CLI over ssh.

Same thing on 2.6.14.6 screen console is (copy by hand):
real 0m1.374s
user 0m0.510s
sys 0m0.960s

--
Thanks,
Grant.
http://bugsplatter.mine.nu/

2006-01-08 08:00:51

by Bernd Eckenfels

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On Sun, Jan 08, 2006 at 06:42:25PM +1100, Grant Coady wrote:
> Excuse me? there is no 5 minutes wait time ;)

sure there is, you see the real time is 6mins vs 1min. since user and system
time are nearly the same, the delay is introduced by sleeping io. And if it
is not the disk, it is the terminal, as proofen by the redirection.

No the question is, if this is in the pty or tcp/networkstack code.

Gruss
Bernd
--
(OO) -- Bernd_Eckenfels@M?rscher_Strasse_8.76185Karlsruhe.de --
( .. ) ecki@{inka.de,linux.de,debian.org} http://www.eckes.org/
o--o 1024D/E383CD7E eckes@IRCNet v:+497211603874 f:+49721151516129
(O____O) When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl!

2006-01-08 08:11:41

by Con Kolivas

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On Sunday 08 January 2006 19:00, Bernd Eckenfels wrote:
> On Sun, Jan 08, 2006 at 06:42:25PM +1100, Grant Coady wrote:
> > Excuse me? there is no 5 minutes wait time ;)
>
> sure there is, you see the real time is 6mins vs 1min. since user and
> system time are nearly the same, the delay is introduced by sleeping io.

He was trying to say it was seconds, not minutes.

Cheers,
Con

2006-01-08 09:12:32

by be-news06

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

Con Kolivas <[email protected]> wrote:
> He was trying to say it was seconds, not minutes.

Oh yes, sure ... surry me bad.

Gruss
Bernd

2006-01-08 09:57:47

by Willy Tarreau

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

Hi Grant,

On Sun, Jan 08, 2006 at 06:28:53PM +1100, Grant Coady wrote:
> On Sun, 8 Jan 2006 07:58:09 +0100, Markus Rechberger <[email protected]> wrote:
>
> >Were there any other processes running during the test?
> box runs same config both kernels: the usual light load ~100% idle ;)
> >what does "vmstat 1" show up during the test?
>
> grant@deltree:~$ uname -r
> 2.6.14.6a
> grant@deltree:~$ vmstat 1
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> [...]
> 0 0 0 63800 11520 32352 0 0 0 0 110 18 0 0 100 0
> 0 0 0 63800 11520 32352 0 0 0 0 106 17 1 0 99 0
> 3 0 0 63560 11520 32352 0 0 0 0 346 502 22 9 69 0
> 1 0 0 63560 11520 32352 0 0 0 0 1057 1987 59 41 0 0
> 1 0 0 63560 11520 32352 0 0 0 0 1062 2011 59 41 0 0
> 1 0 0 63560 11520 32352 0 0 0 0 1053 2001 50 50 0 0
> 1 0 0 63500 11596 32352 0 0 0 136 1054 1974 61 39 0 0
> 1 0 0 63500 11596 32352 0 0 0 0 1040 1978 50 50 0 0
> 0 0 0 63620 11596 32352 0 0 0 0 799 1425 45 27 29 0
> 0 0 0 63620 11596 32352 0 0 0 0 104 12 0 0 100 0
> 0 0 0 63620 11596 32352 0 0 0 0 103 10 0 0 100 0
>
> grant@deltree:~$ uname -r
> 2.4.32-hf32.1
> grant@deltree:~$ vmstat 1
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> [...]
> 0 0 0 83192 6532 21404 0 0 0 0 104 12 0 1 99 0
> 0 0 0 83152 6572 21404 0 0 0 80 116 24 0 1 99 0
> 1 0 0 82952 6572 21404 0 0 0 0 168 130 6 5 89 0
> 2 0 0 82952 6572 21404 0 0 0 0 667 1019 65 35 0 0
> 0 0 0 83152 6572 21404 0 0 0 0 297 378 41 10 49 0
> 0 0 0 83152 6572 21404 0 0 0 0 104 9 0 1 99 0
> 0 0 0 83064 6656 21404 0 0 0 304 169 121 0 1 99 0
> 0 0 0 83064 6656 21404 0 0 0 0 137 42 0 2 98 0

It's rather strange that 2.6 *eats* CPU apparently doing nothing ! At first
I thought about a PIO/DMA problemn but we can clearly see that there's no
IO in on both vmstat. Could you please retest :
- without the pipe (remove '| cut ...') to avoid inter-process
communications
- with cat instead of grep to ensure you don't spend time processing
anything

You should be able to find one simple pattern which makes the problem
appear/disappear on 2.6. At least, 'cat x.log >/dev/null' should not
take time or that time should be spent in I/O. I remember an old test
I did a long time ago which behaved badly on 2.6, it consisted in lots
of pipes (eg: dd bs=1 | dd bs=1 |...). May be you're on a simplified
form of this.

> --
> Thanks,
> Grant.
> http://bugsplatter.mine.nu/

Cheers,
Willy

2006-01-08 10:23:39

by be-news06

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

Willy Tarreau <[email protected]> wrote:
> It's rather strange that 2.6 *eats* CPU apparently doing nothing !

it eats it in high interrupt load. And it is caused by the pty-ssh-tcp
output, so most likely those are eepro100 interrupts.

Gruss
Bernd

2006-01-08 10:54:09

by Willy Tarreau

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On Sun, Jan 08, 2006 at 11:23:37AM +0100, Bernd Eckenfels wrote:
> Willy Tarreau <[email protected]> wrote:
> > It's rather strange that 2.6 *eats* CPU apparently doing nothing !
>
> it eats it in high interrupt load.

*high* ? he never goes far beyond 1000/s !

> And it is caused by the pty-ssh-tcp output,

quite possibly, but I'd rather think it's more precisely related
to the ping-pong in the scheduler between grep, cut and ssh. I
had the same symptom with 'ls' in xterm with lots of files. It
took tens of seconds to list 2000 files while 'ls |cat' gave
the same result instantly.

I also have another example (2.6.15-rc5, dual athlon, logged in
via SSH) :
willy@pcw:willy$ time ls -l

real 0m0.150s
user 0m0.016s
sys 0m0.024s

Now if I start 4 processes in background :
willy@pcw:willy$ time ls -l

real 0m4.432s
user 0m0.028s
sys 0m0.008s

With 8 processes in background :
willy@pcw:willy$ time ls -l

real 0m49.817s
user 0m0.020s
sys 0m0.008s

willy@pcw:willy$ time ls -l | wc -l
1259

real 0m18.917s
user 0m0.016s
sys 0m0.012s

I think my case with 4 processes on a dual CPU ressembles Grant's case
with 2 processes on single CPU. The background processes are only ones
which eat CPU half of their time, which might sometimes match an I/O
bound process such as grep from a disk.

> so most likely those are eepro100 interrupts.

I don't think so.

> Gruss
> Bernd

Regards,
Willy

PS: please don't remove people in CC:

2006-01-08 11:05:48

by Grant Coady

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On Sun, 8 Jan 2006 10:57:41 +0100, Willy Tarreau <[email protected]> wrote:

> Could you please retest :
> - without the pipe (remove '| cut ...') to avoid inter-process
> communications

I thought it made a difference, then delay back again, I'll try
again tomorrow when I'm more awake.

>You should be able to find one simple pattern which makes the problem
>appear/disappear on 2.6. At least, 'cat x.log >/dev/null' should not
>take time or that time should be spent in I/O.

Yes, done that and the time went down by ~five seconds.

More tomorrow my time ;)

--
Thanks,
Grant.
http://bugsplatter.mine.nu/

2006-01-08 11:09:22

by Bernd Eckenfels

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On Sun, Jan 08, 2006 at 11:54:01AM +0100, Willy Tarreau wrote:
> > it eats it in high interrupt load.
> *high* ? he never goes far beyond 1000/s !

it is 10 times higher on 2.6 than 2.4 (I dont think this can be explained
by the timer, only.)

> quite possibly, but I'd rather think it's more precisely related
> to the ping-pong in the scheduler between grep, cut and ssh.

Could be, that would also send smaller buffers to tcp...

Gruss
Bernd
--
(OO) -- Bernd_Eckenfels@M?rscher_Strasse_8.76185Karlsruhe.de --
( .. ) ecki@{inka.de,linux.de,debian.org} http://www.eckes.org/
o--o 1024D/E383CD7E eckes@IRCNet v:+497211603874 f:+49721151516129
(O____O) When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl!

2006-01-08 11:16:44

by Willy Tarreau

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On Sun, Jan 08, 2006 at 12:09:19PM +0100, Bernd Eckenfels wrote:
> On Sun, Jan 08, 2006 at 11:54:01AM +0100, Willy Tarreau wrote:
> > > it eats it in high interrupt load.
> > *high* ? he never goes far beyond 1000/s !
>
> it is 10 times higher on 2.6 than 2.4 (I dont think this can be explained
> by the timer, only.)
>
> > quite possibly, but I'd rather think it's more precisely related
> > to the ping-pong in the scheduler between grep, cut and ssh.
>
> Could be, that would also send smaller buffers to tcp...

that's true, and that could explain the higher interrupt rate.

> Gruss
> Bernd

Cheers,
Willy

2006-01-08 11:18:09

by Grant Coady

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On Sun, 08 Jan 2006 11:23:37 +0100, [email protected] (Bernd Eckenfels) wrote:

>Willy Tarreau <[email protected]> wrote:
>> It's rather strange that 2.6 *eats* CPU apparently doing nothing !
>
>it eats it in high interrupt load. And it is caused by the pty-ssh-tcp
>output, so most likely those are eepro100 interrupts.

That would be true for either 2.4 or 2.6, no? Also it runs e100
driver, but...

2.4 dmesg:
Intel(R) PRO/100 Network Driver - version 2.3.43-k1
Copyright (c) 2004 Intel Corporation

e100: selftest OK.
e100: eth0: Intel(R) PRO/100 Network Connection
Hardware receive checksums enabled
cpu cycle saver enabled

2.6 dmesg:
[ 31.977945] e100: Intel(R) PRO/100 Network Driver, 3.4.14-k2-NAPI
[ 31.978007] e100: Copyright(c) 1999-2005 Intel Corporation
[ 32.002928] e100: eth0: e100_probe: addr 0xfd201000, irq 11, MAC addr 00:90:27:42:AA:77
[ 32.026992] e100: eth1: e100_probe: addr 0xfd200000, irq 12, MAC addr 00:90:27:58:32:D4
[ 32.186941] e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex

Are rx checksums not turned on in 2.6' e100 driver?
CPU is only pentium/mmx 233

--
Thanks,
Grant.
http://bugsplatter.mine.nu/

2006-01-08 12:04:30

by Jan Engelhardt

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??


>grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log |cut -c-96
>
>real 0m1.671s
>user 0m0.550s
>sys 0m0.300s
>grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log |cut -c-96 >/dev/null
>
>real 0m0.510s
>user 0m0.420s
>sys 0m0.080s

Given that the first command is the first one accessing access_log at
all, then: the second time, access_log is already cached and
therefore can be accessed faster.


Jan Engelhardt
--
| Alphagate Systems, http://alphagate.hopto.org/
| jengelh's site, http://jengelh.hopto.org/

2006-01-08 18:21:59

by Octavio Alvarez Piza

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On Sun, 08 Jan 2006 22:05:43 +1100
Grant Coady <[email protected]> wrote:

> On Sun, 8 Jan 2006 10:57:41 +0100, Willy Tarreau <[email protected]> wrote:
>
> > Could you please retest :
> > - without the pipe (remove '| cut ...') to avoid inter-process
> > communications
>
> I thought it made a difference, then delay back again, I'll try
> again tomorrow when I'm more awake.
>
> >You should be able to find one simple pattern which makes the problem
> >appear/disappear on 2.6. At least, 'cat x.log >/dev/null' should not
> >take time or that time should be spent in I/O.
>
> Yes, done that and the time went down by ~five seconds.

Just make sure you first read all the file with cat (I'd retry all from
the initial tests) so you don't add hd-read time to the first command.

Octavio.

2006-01-08 19:20:47

by Grant Coady

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On Sun, 8 Jan 2006 13:04:22 +0100 (MET), Jan Engelhardt <[email protected]> wrote:

>
>>grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log |cut -c-96
>>
>>real 0m1.671s
>>user 0m0.550s
>>sys 0m0.300s
>>grant@deltree:~$ time grep -v 192\.168\. /var/log/apache/access_log |cut -c-96 >/dev/null
>>
>>real 0m0.510s
>>user 0m0.420s
>>sys 0m0.080s
>
>Given that the first command is the first one accessing access_log at
>all, then: the second time, access_log is already cached and
>therefore can be accessed faster.

I did repeat measurements to check for variation due to caching,
and that is not what is happening, we comparing going out over
ssh terminal to dumping output locally.

Grant.

2006-01-08 19:27:57

by Grant Coady

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On Sun, 8 Jan 2006 10:21:01 -0800, Octavio Alvarez Piza <[email protected]> wrote:

>On Sun, 08 Jan 2006 22:05:43 +1100
>Grant Coady <[email protected]> wrote:
>
>> On Sun, 8 Jan 2006 10:57:41 +0100, Willy Tarreau <[email protected]> wrote:
>>
>> > Could you please retest :
>> > - without the pipe (remove '| cut ...') to avoid inter-process
>> > communications
>>
>> I thought it made a difference, then delay back again, I'll try
>> again tomorrow when I'm more awake.
>>
>> >You should be able to find one simple pattern which makes the problem
>> >appear/disappear on 2.6. At least, 'cat x.log >/dev/null' should not
>> >take time or that time should be spent in I/O.
>>
>> Yes, done that and the time went down by ~five seconds.
>
>Just make sure you first read all the file with cat (I'd retry all from
>the initial tests) so you don't add hd-read time to the first command.

I do notice occasional pauses (just a slight jerkiness) in output
from log file, perhaps when it is appended to. As I wrote earlier,
this one liner is something I do fairly often to see what is hitting
the web server, the 'cut -c -96' is because I run 96 character
wide ssh terminals.

Grant.

2006-01-09 02:37:53

by Jesse Brandeburg

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On 1/8/06, Grant Coady <[email protected]> wrote:
> On Sun, 08 Jan 2006 11:23:37 +0100, [email protected] (Bernd Eckenfels) wrote:
>
> >Willy Tarreau <[email protected]> wrote:
> >> It's rather strange that 2.6 *eats* CPU apparently doing nothing !
> >
> >it eats it in high interrupt load. And it is caused by the pty-ssh-tcp
> >output, so most likely those are eepro100 interrupts.
>
> That would be true for either 2.4 or 2.6, no? Also it runs e100
> driver, but...
>
> 2.4 dmesg:
> Intel(R) PRO/100 Network Driver - version 2.3.43-k1
> Copyright (c) 2004 Intel Corporation
>
> e100: selftest OK.
> e100: eth0: Intel(R) PRO/100 Network Connection
> Hardware receive checksums enabled
> cpu cycle saver enabled
>
> 2.6 dmesg:
> [ 31.977945] e100: Intel(R) PRO/100 Network Driver, 3.4.14-k2-NAPI
> [ 31.978007] e100: Copyright(c) 1999-2005 Intel Corporation
> [ 32.002928] e100: eth0: e100_probe: addr 0xfd201000, irq 11, MAC addr 00:90:27:42:AA:77
> [ 32.026992] e100: eth1: e100_probe: addr 0xfd200000, irq 12, MAC addr 00:90:27:58:32:D4
> [ 32.186941] e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
>
> Are rx checksums not turned on in 2.6' e100 driver?
> CPU is only pentium/mmx 233

Hey Grant, to answer your question, checksums are not offloaded with
the current e100 driver but that really shouldn't make that much of a
difference. I'm actually going to go with interrupt load due to e100
being at least related to the problem.

BTW I get access denied when hitting
http://bugsplatter.mine.nu/test/boxen/deltree/

The netdev-2.6 git tree currently has a driver that supports microcode
loading for your rev 8 PRO/100 and that microcode may help your
interrupt load due to e100. however, it may already be loading.
Also, what do you have HZ set to? (250 is default in 2.6, 1000 in 2.4)
so you could try running your 2.6 kernel with HZ=1000

while you're running your test you could try (if you have sysstat)
sar -I <e100 interrupt> 1 10

or a simpler version, 10 loops of cat /proc/interrupts; sleep 1;

Lets see if its e100,
Jesse

2006-01-09 02:46:11

by Lee Revell

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On Sun, 2006-01-08 at 18:37 -0800, Jesse Brandeburg wrote:
> Also, what do you have HZ set to? (250 is default in 2.6, 1000 in 2.4)
> so you could try running your 2.6 kernel with HZ=1000

s/1000/100/

2006-01-09 02:58:53

by Grant Coady

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On Sun, 8 Jan 2006 18:37:52 -0800, Jesse Brandeburg <[email protected]> wrote:

>On 1/8/06, Grant Coady <[email protected]> wrote:
>> On Sun, 08 Jan 2006 11:23:37 +0100, [email protected] (Bernd Eckenfels) wrote:
>>
>> >Willy Tarreau <[email protected]> wrote:
>> >> It's rather strange that 2.6 *eats* CPU apparently doing nothing !
>> >
>> >it eats it in high interrupt load. And it is caused by the pty-ssh-tcp
>> >output, so most likely those are eepro100 interrupts.
>>
>> That would be true for either 2.4 or 2.6, no? Also it runs e100
>> driver, but...
>>
>> 2.4 dmesg:
>> Intel(R) PRO/100 Network Driver - version 2.3.43-k1
>> Copyright (c) 2004 Intel Corporation
>>
>> e100: selftest OK.
>> e100: eth0: Intel(R) PRO/100 Network Connection
>> Hardware receive checksums enabled
>> cpu cycle saver enabled
>>
>> 2.6 dmesg:
>> [ 31.977945] e100: Intel(R) PRO/100 Network Driver, 3.4.14-k2-NAPI
>> [ 31.978007] e100: Copyright(c) 1999-2005 Intel Corporation
>> [ 32.002928] e100: eth0: e100_probe: addr 0xfd201000, irq 11, MAC addr 00:90:27:42:AA:77
>> [ 32.026992] e100: eth1: e100_probe: addr 0xfd200000, irq 12, MAC addr 00:90:27:58:32:D4
>> [ 32.186941] e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
>>
>> Are rx checksums not turned on in 2.6' e100 driver?
>> CPU is only pentium/mmx 233
>
>Hey Grant, to answer your question, checksums are not offloaded with
>the current e100 driver but that really shouldn't make that much of a
>difference. I'm actually going to go with interrupt load due to e100
>being at least related to the problem.

Okay, that may make a difference with a slow CPU.

>BTW I get access denied when hitting
>http://bugsplatter.mine.nu/test/boxen/deltree/

OMG! Not the red screen? Hmmm, collateral damage ;) Just turned it
off, unless you're masquerading as a web crawler :p

>The netdev-2.6 git tree currently has a driver that supports microcode
>loading for your rev 8 PRO/100 and that microcode may help your
>interrupt load due to e100. however, it may already be loading.

>Also, what do you have HZ set to? (250 is default in 2.6, 1000 in 2.4)
>so you could try running your 2.6 kernel with HZ=1000

Running it with 100Hz, isn't 2.4 == 100Hz? I can try 1000Hz,
but not for some hours now, other stuff on.

>while you're running your test you could try (if you have sysstat)
>sar -I <e100 interrupt> 1 10
>
>or a simpler version, 10 loops of cat /proc/interrupts; sleep 1;
>
>Lets see if its e100,

Yes, lets. More later.

Cheers,
Grant.

2006-01-09 06:56:47

by Grant Coady

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On Sun, 8 Jan 2006 18:37:52 -0800, Jesse Brandeburg <[email protected]> wrote:

Added Willy to Cc: -- I see a problem with 2 e100 NICs' accounting
under 2.4.32... See near and of this report.

>> Are rx checksums not turned on in 2.6' e100 driver?
>> CPU is only pentium/mmx 233
>
>Hey Grant, to answer your question, checksums are not offloaded with
>the current e100 driver but that really shouldn't make that much of a
>difference. I'm actually going to go with interrupt load due to e100
>being at least related to the problem.
>
>The netdev-2.6 git tree currently has a driver that supports microcode
>loading for your rev 8 PRO/100 and that microcode may help your
>interrupt load due to e100. however, it may already be loading.
>Also, what do you have HZ set to? (250 is default in 2.6, 1000 in 2.4)
>so you could try running your 2.6 kernel with HZ=1000
>
>while you're running your test you could try (if you have sysstat)
>sar -I <e100 interrupt> 1 10
>
>or a simpler version, 10 loops of cat /proc/interrupts; sleep 1;

No sar, now I'm running a separate link from the other e100 eth1
from deltree to another box so measurement and test traffic are
separated. I do everything via ssh so I can copy/paste from
terminals ;)

This run is 2.6.15a, with 100Hz and voluntary preempt:

grant@deltree:~$ cat /proc/interrupts
CPU0
0: 106221 XT-PIC timer
1: 8 XT-PIC i8042
2: 0 XT-PIC cascade
3: 11443 XT-PIC eth2 <<== ADSL modem
8: 1 XT-PIC rtc
11: 20402 XT-PIC eth0 <<== localnet
12: 21860 XT-PIC eth1 <<== spare -> test
14: 3260 XT-PIC ide0
NMI: 0
ERR: 0
grant@deltree:~$

grant@deltree:~$ while true; do grep eth1 /proc/interrupts; sleep 1; done
12: 26357 XT-PIC eth1
12: 26357 XT-PIC eth1 \
12: 26573 XT-PIC eth1 |
12: 27039 XT-PIC eth1 > time grep -v 192\.168\. \
12: 27514 XT-PIC eth1 | /var/log/apache/access_log \
12: 28320 XT-PIC eth1 | | cut -c-96
12: 29090 XT-PIC eth1 | real 0m6.205s
12: 30017 XT-PIC eth1 | user 0m0.620s
12: 30434 XT-PIC eth1 / sys 0m0.730s
12: 30434 XT-PIC eth1

grant@deltree:~$ while true; do grep eth1 /proc/interrupts; sleep 1; done
12: 30650 XT-PIC eth1
12: 30651 XT-PIC eth1 \
12: 30657 XT-PIC eth1 |
12: 30661 XT-PIC eth1 > time cat /var/log/apache/access_log
12: 30936 XT-PIC eth1 | real 0m2.383s
12: 31343 XT-PIC eth1 | user 0m0.010s
12: 31593 XT-PIC eth1 / sys 0m0.480s
12: 31593 XT-PIC eth1

This run is 2.6.15b, with 1000Hz and voluntary preempt:

grant@deltree:~$ while true; do grep eth1 /proc/interrupts; sleep 1; done
12: 4386 XT-PIC eth1
12: 4386 XT-PIC eth1 \
12: 4427 XT-PIC eth1 |
12: 4904 XT-PIC eth1 > time grep -v 192\.168\. \
12: 5350 XT-PIC eth1 | /var/log/apache/access_log \
12: 6065 XT-PIC eth1 | | cut -c-96
12: 6906 XT-PIC eth1 | real 0m6.649s
12: 7693 XT-PIC eth1 | user 0m0.841s
12: 8450 XT-PIC eth1 / sys 0m1.047s
12: 8548 XT-PIC eth1
12: 8548 XT-PIC eth1

ran above a few times to gauge repeatability, variation ~200ms in real.

grant@deltree:~$ while true; do grep eth1 /proc/interrupts; sleep 1; done
12: 18347 XT-PIC eth1
12: 18348 XT-PIC eth1 \
12: 18417 XT-PIC eth1 |
12: 18794 XT-PIC eth1 > time cat /var/log/apache/access_log
12: 19181 XT-PIC eth1 | real 0m2.573s
12: 19283 XT-PIC eth1 / user 0m0.005s
12: 19284 XT-PIC eth1 sys 0m0.547s

No joy with 1000Hz, turn off preempt?

This run is 2.6.15c, with 1000Hz and no preempt:

grant@deltree:~$ while true; do grep eth1 /proc/interrupts; sleep 1; done
12: 4400 XT-PIC eth1
12: 4400 XT-PIC eth1 \
12: 4614 XT-PIC eth1 |
12: 5053 XT-PIC eth1 > time grep -v 192\.168\. \
12: 5495 XT-PIC eth1 | /var/log/apache/access_log \
12: 6686 XT-PIC eth1 | | cut -c-96
12: 7394 XT-PIC eth1 | real 0m6.696s
12: 8258 XT-PIC eth1 | user 0m0.841s
12: 8456 XT-PIC eth1 / sys 0m0.994s
12: 8457 XT-PIC eth1
12: 8457 XT-PIC eth1

grant@deltree:~$ while true; do grep eth1 /proc/interrupts; sleep 1; done
12: 8544 XT-PIC eth1 \
12: 8814 XT-PIC eth1 | time cat /var/log/apache/access_log
12: 9485 XT-PIC eth1 > real 0m2.511s
12: 9486 XT-PIC eth1 | user 0m0.004s
12: 9486 XT-PIC eth1 / sys 0m0.529s

Still no joy?

Confirm 2.4 timings, this is with 2.4.32-hf32.1:

grant@deltree:~$ while true; do egrep 'eth0|eth1' /proc/interrupts; sleep 1; done
11: 6744 XT-PIC eth0
12: 4 XT-PIC eth1
11: 6746 XT-PIC eth0 \
12: 4 XT-PIC eth1 |
11: 7178 XT-PIC eth0 > time grep -v 192\.168\. \
12: 4 XT-PIC eth1 | /var/log/apache/access_log \
11: 7552 XT-PIC eth0 | | cut -c-96
12: 4 XT-PIC eth1 / real 0m1.565s
11: 7554 XT-PIC eth0 user 0m0.510s
12: 4 XT-PIC eth1 sys 0m0.340s

grant@deltree:~$ while true; do egrep 'eth0|eth1' /proc/interrupts; sleep 1; done
11: 9136 XT-PIC eth0 \
12: 4 XT-PIC eth1 |
11: 9516 XT-PIC eth0 > time cat /var/log/apache/access_log
12: 4 XT-PIC eth1 | real 0m1.946s
11: 10146 XT-PIC eth0 | user 0m0.000s
12: 4 XT-PIC eth1 / sys 0m0.200s
11: 10321 XT-PIC eth0
12: 4 XT-PIC eth1

Odd, with 2.4, the two e100 NICs are not being accounted properly:

root@deltree:~# ifconfig
eth0 Link encap:Ethernet HWaddr 00:90:27:42:AA:77
inet addr:192.168.1.1 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4840 errors:0 dropped:0 overruns:0 frame:0
TX packets:8825 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:341812 (333.8 Kb) TX bytes:9931009 (9.4 Mb)
Interrupt:11 Base address:0xdcc0 Memory:fd201000-fd201038

eth1 Link encap:Ethernet HWaddr 00:90:27:58:32:D4
inet addr:192.168.2.1 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:12 Base address:0xdc80 Memory:fd200000-fd200038

dmesg says:
Intel(R) PRO/100 Network Driver - version 2.3.43-k1
Copyright (c) 2004 Intel Corporation

e100: selftest OK.
e100: eth0: Intel(R) PRO/100 Network Connection
Hardware receive checksums enabled
cpu cycle saver enabled

e100: selftest OK.
e100: eth1: Intel(R) PRO/100 Network Connection
Hardware receive checksums enabled
cpu cycle saver enabled

smc-ultra.c:v2.02 2/3/98 Donald Becker ([email protected])
eth2: SMC Ultra at 0x280, 00 00 C0 5D 46 B5,assigned IRQ 3 memory 0xd0000-0xd3fff.
e100: eth0 NIC Link is Up 100 Mbps Full duplex
e100: eth1 NIC Link is Up 100 Mbps Full duplex

Cheers,
Grant.

2006-02-22 19:27:45

by Enrico Weigelt

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??


BTW: I had a similar problem after switching from 2.6.8.1
to 2.6.15 ... the whole machine (athlon-xp) behaves extremly
slow and not even playing mp3 worked without hangs.

So I switched back to old 2.6.8.1 for now ...


cu
--
---------------------------------------------------------------------
Enrico Weigelt == metux IT service

phone: +49 36207 519931 www: http://www.metux.de/
fax: +49 36207 519932 email: [email protected]
cellphone: +49 174 7066481
---------------------------------------------------------------------
-- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops --
---------------------------------------------------------------------

2006-02-22 23:18:01

by Adrian Bunk

[permalink] [raw]
Subject: Re: Why is 2.4.32 four times faster than 2.6.14.6??

On Wed, Feb 22, 2006 at 08:27:07PM +0100, Enrico Weigelt wrote:
>
> BTW: I had a similar problem after switching from 2.6.8.1
> to 2.6.15 ... the whole machine (athlon-xp) behaves extremly
> slow and not even playing mp3 worked without hangs.
>
> So I switched back to old 2.6.8.1 for now ...

A better solution would be if it could be determined what your problem
is.

Could you try 2.6.16-rc4?

If the problem is still present, please open a bug report at the kernel
Bugzilla [1] with an explanation of your problem, your .config with
2.6.16-rc4 and the output of "dmesg -s 1000000" in both 2.6.8.1 and
2.6.16-rc4.

We should fix regressions like yours, but this requires bug reports
notifying us about problems.

> cu

TIA
Adrian

[1] http://bugzilla.kernel.org/

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed