2002-09-18 14:41:34

by Con Kolivas

Subject: [BENCHMARK] contest results for 2.5.36

Here are the latest results with 2.5.36 compared with 2.5.34

No Load:
Kernel            Time     CPU
2.4.19            68.14    99%
2.4.20-pre7       68.11    99%
2.5.34            69.88    99%
2.4.19-ck7        68.40    98%
2.4.19-ck7-rmap   68.73    99%
2.4.19-cc         68.37    99%
2.5.36            69.58    99%

Process Load:
Kernel            Time     CPU
2.4.19            81.10    80%
2.4.20-pre7       81.92    80%
2.5.34            71.39    94%
2.5.36            71.80    94%

Mem Load:
Kernel            Time     CPU
2.4.19            92.49    77%
2.4.20-pre7       92.25    77%
2.5.34           138.05    54%
2.5.36           132.45    56%

IO Halfmem Load:
Kernel            Time     CPU
2.4.19            99.41    70%
2.4.20-pre7       99.42    71%
2.5.34            74.31    93%
2.5.36            94.82    76%

IO Fullmem Load:
Kernel            Time     CPU
2.4.19           173.00    41%
2.4.20-pre7      146.38    48%
2.5.34            74.00    94%
2.5.36            87.57    81%


The full log for 2.5.34 is:

noload Time: 69.88 CPU: 99% Major Faults: 247874 Minor Faults: 295941
process_load Time: 71.39 CPU: 94% Major Faults: 204811 Minor Faults: 256001
io_halfmem Time: 74.31 CPU: 93% Major Faults: 204019 Minor Faults: 255284
Was writing number 4 of a 112Mb sized io_load file after 76 seconds
io_fullmem Time: 74.00 CPU: 94% Major Faults: 204019 Minor Faults: 255289
Was writing number 2 of a 224Mb sized io_load file after 98 seconds
mem_load Time: 138.05 CPU: 54% Major Faults: 204107 Minor Faults: 255695


and for 2.5.36 is:

noload Time: 69.58 CPU: 99% Major Faults: 242825 Minor Faults: 292307
process_load Time: 71.80 CPU: 94% Major Faults: 205009 Minor Faults: 256150
io_halfmem Time: 94.82 CPU: 76% Major Faults: 204019 Minor Faults: 255214
Was writing number 6 of a 112Mb sized io_load file after 104 seconds
io_fullmem Time: 87.57 CPU: 81% Major Faults: 204019 Minor Faults: 255312
Was writing number 3 of a 224Mb sized io_load file after 119 seconds
mem_load Time: 132.45 CPU: 56% Major Faults: 204115 Minor Faults: 255234


As you can see, going from 2.5.34 to 2.5.36 brings a minor improvement in
response under memory load, but a drop in response under IO load. The log
shows that the IO load managed to write more during the 2.5.36 benchmark. The
values differ from the original 2.5.34 results I posted because there was a
problem with loads potentially overlapping, and running the memory load before
the others led to heavy swapping in the tests that followed.
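
For anyone who hasn't looked at contest itself: each figure above is simply
the wall time of a fixed kernel compile run while one background load runs,
plus the CPU share the compile received. A minimal sketch of that measurement
loop, in the same spirit but with made-up details (the dd-based writer and the
accounting are stand-ins; the real contest does considerably more
bookkeeping), might look like:

/* Sketch of a contest-style run: time make -j4 while a background load
 * runs, then report wall time and the compile's CPU share. */
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/resource.h>

int main(void)
{
        struct timeval start, end;
        struct rusage ru;
        pid_t load, compile;
        int status;
        double wall, cpu;

        load = fork();
        if (load == 0) {
                setpgid(0, 0);  /* own process group so the whole load dies later */
                /* background load: endlessly rewrite a large file, io_load style */
                execlp("sh", "sh", "-c",
                       "while :; do dd if=/dev/zero of=loadfile bs=1M count=112; done",
                       (char *)NULL);
                _exit(1);
        }

        gettimeofday(&start, NULL);
        compile = fork();
        if (compile == 0) {
                execlp("make", "make", "-j4", "bzImage", (char *)NULL);
                _exit(1);
        }
        wait4(compile, &status, 0, &ru);        /* rusage covers make and its children */
        gettimeofday(&end, NULL);

        kill(-load, SIGKILL);                   /* stop the background load */
        waitpid(load, NULL, 0);

        wall = (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec) / 1e6;
        cpu = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6 +
              ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
        printf("Time: %.2f CPU: %.0f%%\n", wall, 100.0 * cpu / wall);
        return 0;
}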

contest has been upgraded to v0.34 with numerous small changes and a few fixes.
It can be downloaded here:

http://contest.kolivas.net

Comments?
Con.


2002-09-18 16:37:46

by Andrew Morton

Subject: Re: [BENCHMARK] contest results for 2.5.36

Con Kolivas wrote:
>
> Here are the latest results with 2.5.36 compared with 2.5.34
>
> No Load:
> Kernel Time CPU
> 2.4.19 68.14 99%
> 2.4.20-pre7 68.11 99%
> 2.5.34 69.88 99%
> 2.4.19-ck7 68.40 98%
> 2.4.19-ck7-rmap 68.73 99%
> 2.4.19-cc 68.37 99%
> 2.5.36 69.58 99%

page_add/remove_rmap. Be interesting to test an Alan kernel too.

> Process Load:
> Kernel Time CPU
> 2.4.19 81.10 80%
> 2.4.20-pre7 81.92 80%
> 2.5.34 71.39 94%
> 2.5.36 71.80 94%

Ingo ;)

> Mem Load:
> Kernel Time CPU
> 2.4.19 92.49 77%
> 2.4.20-pre7 92.25 77%
> 2.5.34 138.05 54%
> 2.5.36 132.45 56%

The swapping fix in -mm1 may help here.

> IO Halfmem Load:
> Kernel Time CPU
> 2.4.19 99.41 70%
> 2.4.20-pre7 99.42 71%
> 2.5.34 74.31 93%
> 2.5.36 94.82 76%

Don't know. Was this with IO load against the same disk as
the one on which the kernel was being compiled, or a different
one? This is a very important factor - one way we're testing the
VM and the other way we're testing the IO scheduler.
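
The reason the disk placement matters is that the loads do real file writes.
A toy writer in the io_load mould (illustrative only, not contest's source;
the 112Mb file size just matches the log lines above) only needs a target
directory, so the same load can be aimed either at the compile's disk or at a
second one:

/* Toy io_load-style writer: keeps rewriting a large file under the given
 * directory until killed. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define CHUNK   (1 << 20)               /* 1MB writes */
#define FILE_MB 112                     /* half of a 224MB machine */

int main(int argc, char **argv)
{
        static char buf[CHUNK];
        char path[4096];
        int pass = 1;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <target-directory>\n", argv[0]);
                return 1;
        }
        snprintf(path, sizeof(path), "%s/io_loadfile", argv[1]);

        for (;;) {
                int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
                int i;

                if (fd < 0) {
                        perror(path);
                        return 1;
                }
                for (i = 0; i < FILE_MB; i++)
                        if (write(fd, buf, CHUNK) != CHUNK)
                                break;
                close(fd);
                fprintf(stderr, "finished pass %d of a %dMb file\n", pass++, FILE_MB);
        }
}

Aimed at the compile's disk this mostly exercises the IO scheduler (the
writes compete with the compiler's reads); aimed at a second disk it becomes
much more a VM/writeback test, which is the distinction Andrew is drawing.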

> IO Fullmem Load:
> Kernel Time CPU
> 2.4.19 173.00 41%
> 2.4.20-pre7 146.38 48%
> 2.5.34 74.00 94%
> 2.5.36 87.57 81%

If the IO load was against the same disk 2.5 _should_ have sucked,
due to the writes-starves-reads problem which we're working on. So
I assume it was against a different disk. In which case 2.5 should not
have shown these improvements, because all the fixes for this are still
in -mm. hm. Helpful, aren't I?

2002-09-18 16:48:09

by Rik van Riel

Subject: Re: [BENCHMARK] contest results for 2.5.36

On Wed, 18 Sep 2002, Andrew Morton wrote:

> > No Load:
> > Kernel Time CPU
> > 2.4.19 68.14 99%
> > 2.4.20-pre7 68.11 99%
> > 2.5.34 69.88 99%
> > 2.4.19-ck7 68.40 98%
> > 2.4.19-ck7-rmap 68.73 99%
> > 2.4.19-cc 68.37 99%
> > 2.5.36 69.58 99%
>
> page_add/remove_rmap. Be interesting to test an Alan kernel too.

Yes, but why are page_add/remove_rmap slower in 2.5 than in
Con's -rmap kernel ? ;)

> > Process Load:
> > Kernel Time CPU
> > 2.4.19 81.10 80%
> > 2.4.20-pre7 81.92 80%
> > 2.5.34 71.39 94%
> > 2.5.36 71.80 94%
>
> Ingo ;)

Looks like an unfair sched_yield; the process load is supposed
to get 20% of the CPU (one process in process_load vs. make -j4).
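
To make the 20% concrete: process_load is one background process against the
four compile jobs of make -j4, so its fair share is 1/(1+4) = 20% and the
compile's is 80%, which is just what 2.4 delivers. A sched_yield()-style load
that can be short-changed by the scheduler (a sketch of the kind of thing Rik
means, not necessarily what contest's process_load actually does) is simply:

/* Illustration only: a CPU-bound task that yields on every iteration.
 * Its fair share against make -j4 is about 1/5, but what it actually
 * gets depends on how the scheduler treats yielding tasks - which is
 * what the jump from 80% to 94% compile CPU above suggests changed
 * between 2.4 and 2.5. */
#include <sched.h>

int main(void)
{
        volatile unsigned long n = 0;

        for (;;) {
                n++;                    /* token amount of work */
                sched_yield();          /* give the CPU back if anyone wants it */
        }
}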

For the other results I agree with you, further VM improvements in
2.5 will probably fix those.

cheers,

Rik
--
Spamtrap of the month: [email protected]

http://www.surriel.com/ http://distro.conectiva.com/

2002-09-18 21:12:50

by Con Kolivas

Subject: Re: [BENCHMARK] contest results for 2.5.36

Quoting Andrew Morton <[email protected]>:

[snip..]

> page_add/remove_rmap. Be interesting to test an Alan kernel too.

Just tell me which ones to test and I'll happily throw them in too.

> > Process Load:
> > Kernel Time CPU
> > 2.4.19 81.10 80%
> > 2.4.20-pre7 81.92 80%
> > 2.5.34 71.39 94%
> > 2.5.36 71.80 94%
>
> Ingo ;)
>
> > Mem Load:
> > Kernel Time CPU
> > 2.4.19 92.49 77%
> > 2.4.20-pre7 92.25 77%
> > 2.5.34 138.05 54%
> > 2.5.36 132.45 56%
>
> The swapping fix in -mm1 may help here.
>
> > IO Halfmem Load:
> > Kernel Time CPU
> > 2.4.19 99.41 70%
> > 2.4.20-pre7 99.42 71%
> > 2.5.34 74.31 93%
> > 2.5.36 94.82 76%
>
> Don't know. Was this with IO load against the same disk as
> the one on which the kernel was being compiled, or a different
> one? This is a very important factor - one way we're testing the
> VM and the other way we're testing the IO scheduler.

My laptop, which does all the testing, has only one hard disk.

> > IO Fullmem Load:
> > Kernel Time CPU
> > 2.4.19 173.00 41%
> > 2.4.20-pre7 146.38 48%
> > 2.5.34 74.00 94%
> > 2.5.36 87.57 81%
>
> If the IO load was against the same disk 2.5 _should_ have sucked,
> due to the writes-starves-reads problem which we're working on. So
> I assume it was against a different disk. In which case 2.5 should not
> have shown these improvements, because all the fixes for this are still
> in -mm. hm. Helpful, aren't I?

It's the same disk. These are the correct values after I've fixed all known
problems in contest. I need someone else to try contest with a different disk.
Helpful? This is all new, so it will take a while for _anyone_ to understand
exactly what's going on, I'm sure. Since we haven't had incremental benchmarks
until now, who knows what it was that made IO fullmem drop from 146 to 74 in the
first place?

Con.

2002-09-18 21:35:44

by Andrew Morton

Subject: Re: [BENCHMARK] contest results for 2.5.36

Con Kolivas wrote:
>
> Quoting Andrew Morton <[email protected]>:
>
> [snip..]
>
> > page_add/remove_rmap. Be interesting to test an Alan kernel too.
>
> Just tell me which ones to test and I'll happily throw them in too.

That's OK - you're already doing -rmap. I just can't read.

> ...
> > If the IO load was against the same disk 2.5 _should_ have sucked,
> > due to the writes-starves-reads problem which we're working on. So
> > I assume it was against a different disk. In which case 2.5 should not
> > have shown these improvements, because all the fixes for this are still
> > in -mm. hm. Helpful, aren't I?
>
> It's the same disk. These are the correct values after I've fixed all known
> problems in contest. I need someone else to try contest with a different disk.
> Helpful? This is all new so it will take a while for _anyone_ to understand
> exactly what's going on I'm sure. Since we haven't had incremental benchmarks
> till now, who knows what it was that made IO full mem drop from 146 to 74 in the
> first place?

Strange. 2.5 should have done badly. There are many variables...

We're on top of the VM latency problems now. Jens is moving ahead
with the deadline scheduler which appears to provide nice linear
tunability of the read-versus-write policy. Once we sort out the
accidental writes-starve-read problem we'll do well at this.
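
For those not following the IO scheduler work: the deadline scheduler's
central trick is to give every request an expiry time, with reads expiring
much sooner than writes, so a stream of writes can only delay a read for a
bounded time, and the read-versus-write balance becomes a pair of tunable
numbers. A toy model of the dispatch rule (purely illustrative, nothing like
Jens's actual block-layer code):

/* Toy deadline-style dispatch: per-direction FIFOs, reads expire long
 * before writes, expired requests are served first. */
#include <stdio.h>

#define NREQ            16
#define READ_EXPIRE     5       /* ticks a read may wait */
#define WRITE_EXPIRE    50      /* ticks a write may wait */

struct request {
        int is_write;
        int issued;             /* tick the request arrived */
        int done;
};

static struct request *pick(struct request *q, int n, int now)
{
        struct request *rd = NULL, *wr = NULL;
        int i;

        /* FIFO heads: requests are stored in arrival order */
        for (i = 0; i < n; i++) {
                if (q[i].done)
                        continue;
                if (q[i].is_write) {
                        if (!wr)
                                wr = &q[i];
                } else {
                        if (!rd)
                                rd = &q[i];
                }
        }
        /* expired requests first: reads expire long before writes do */
        if (rd && now - rd->issued >= READ_EXPIRE)
                return rd;
        if (wr && now - wr->issued >= WRITE_EXPIRE)
                return wr;
        /* otherwise prefer reads - they are what the compile is blocked on */
        return rd ? rd : wr;
}

int main(void)
{
        struct request q[NREQ];
        int i, now;

        /* a burst of writes with a single read arriving in the middle */
        for (i = 0; i < NREQ; i++) {
                q[i].is_write = (i != 8);
                q[i].issued = i;
                q[i].done = 0;
        }
        for (now = NREQ; ; now++) {
                struct request *r = pick(q, NREQ, now);

                if (!r)
                        break;
                printf("tick %2d: serve %-5s issued at %2d (waited %2d)\n",
                       now, r->is_write ? "write" : "read",
                       r->issued, now - r->issued);
                r->done = 1;
        }
        return 0;
}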

2002-09-18 23:50:21

by Jonathan Lundell

[permalink] [raw]
Subject: NMI watchdog stability

Back in March 2001, Keith Owens wrote and Andrew Morton replied:
>> Am I the only person who is annoyed that nmi watchdog is now off by
>> default and the only way to activate it is by a boot parameter? You
>> cannot even patch the kernel to build a version that has nmi watchdog
>> on because the startup code runs out of the __setup routine, no boot
>> parameter, no watchdog.
>
>It was causing SMP boxes to crash mysteriously after
>several hours or days. Quite a lot of them. Nobody
>was able to explain why, so it was turned off.

This was in the context of 2.4.2-ac21. More of the thread, with no
conclusive result, can be found at
http://www.uwsg.iu.edu/hypermail/linux/kernel/0103.2/0906.html

Was there any resolution? Was the problem real, did it get fixed, and
is it safe to turn on the local-APIC-based NMI ticker on a 2.4.9 SMP
system? (I'm stuck with 2.4.9, actually Red Hat's 2.4.9-31, for
external reasons.) What was the nature of the mysterious crashes?
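
(For reference, the activation path Keith describes is the usual 2.4
__setup() boot hook, so enabling the watchdog means passing something like
nmi_watchdog=<n> on the kernel command line. A stripped-down sketch of that
mechanism, with hypothetical names rather than the real nmi code:)

/* The enable decision happens while the command line is parsed at boot,
 * which is why there is no config option or runtime switch to flip. */
#include <linux/init.h>
#include <linux/kernel.h>

static int nmi_watchdog_enabled;        /* stays off unless the parameter is given */

static int __init my_nmi_watchdog_setup(char *str)
{
        nmi_watchdog_enabled = simple_strtoul(str, NULL, 0);
        return 1;
}

__setup("nmi_watchdog=", my_nmi_watchdog_setup);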

Thanks.
--
/Jonathan Lundell.

2002-09-19 08:00:36

by Daniel Phillips

Subject: Re: [BENCHMARK] contest results for 2.5.36

On Wednesday 18 September 2002 18:50, Rik van Riel wrote:
> On Wed, 18 Sep 2002, Andrew Morton wrote:
>
> > > No Load:
> > > Kernel Time CPU
> > > 2.4.19 68.14 99%
> > > 2.4.20-pre7 68.11 99%
> > > 2.5.34 69.88 99%
> > > 2.4.19-ck7 68.40 98%
> > > 2.4.19-ck7-rmap 68.73 99%
> > > 2.4.19-cc 68.37 99%
> > > 2.5.36 69.58 99%
> >
> > page_add/remove_rmap. Be interesting to test an Alan kernel too.
>
> Yes, but why are page_add/remove_rmap slower in 2.5 than in
> Con's -rmap kernel ? ;)

I don't know what you guys are going on about, these differences are
getting close to statistically insignificant.

--
Daniel

2002-09-19 08:09:18

by Con Kolivas

Subject: Re: [BENCHMARK] contest results for 2.5.36

Quoting Daniel Phillips <[email protected]>:

> On Wednesday 18 September 2002 18:50, Rik van Riel wrote:
> > On Wed, 18 Sep 2002, Andrew Morton wrote:
> >
> > > > No Load:
> > > > Kernel Time CPU
> > > > 2.4.19 68.14 99%
> > > > 2.4.20-pre7 68.11 99%
> > > > 2.5.34 69.88 99%
> > > > 2.4.19-ck7 68.40 98%
> > > > 2.4.19-ck7-rmap 68.73 99%
> > > > 2.4.19-cc 68.37 99%
> > > > 2.5.36 69.58 99%
> > >
> > > page_add/remove_rmap. Be interesting to test an Alan kernel too.
> >
> > Yes, but why are page_add/remove_rmap slower in 2.5 than in
> > Con's -rmap kernel ? ;)
>
> I don't know what you guys are going on about, these differences are
> getting close to statistically insignificant.

These ones definitely are insignificant. I've found the limit with repeat
measurements to be about +/- 1%.
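
(A quick way to check where that noise floor sits is to repeat the same
no-load run a few times and look at the spread of the reported times; a
trivial helper for that, purely as an illustration:)

/* Prints the mean of the supplied times and the largest deviation from it
 * as a percentage.  Usage: ./spread <time> <time> [<time> ...] */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(int argc, char **argv)
{
        double sum = 0.0, mean, worst = 0.0;
        int i;

        if (argc < 3) {
                fprintf(stderr, "usage: %s time time [time ...]\n", argv[0]);
                return 1;
        }
        for (i = 1; i < argc; i++)
                sum += atof(argv[i]);
        mean = sum / (argc - 1);
        for (i = 1; i < argc; i++) {
                double dev = fabs(atof(argv[i]) - mean) / mean * 100.0;

                if (dev > worst)
                        worst = dev;
        }
        printf("mean %.2f, worst deviation %.2f%%\n", mean, worst);
        return 0;
}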

Con.

2002-09-19 12:02:38

by John Levon

Subject: Re: NMI watchdog stability

On Wed, Sep 18, 2002 at 04:55:13PM -0700, Jonathan Lundell wrote:

> >It was causing SMP boxes to crash mysteriously after
> >several hours or days. Quite a lot of them. Nobody
> >was able to explain why, so it was turned off.
>
> This was in the context of 2.4.2-ac21. More of the thread, with no
> conclusive result, can be found at
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0103.2/0906.html
>
> Was there any resolution? Was the problem real, did it get fixed, and

Some machines corrupt %ecx on the way back from an NMI. Perhaps that was
the factor all the people with problems saw.

regards
john

--
"Please crack down on the Chinaman's friends and Hitler's commander. Mother is
the best bet and don't let Satan draw you too fast. A boy has never wept ...
nor dashed a thousand kim. Did you hear me?"
- Dutch Schultz

2002-09-19 13:13:25

by Richard B. Johnson

Subject: Re: NMI watchdog stability

On Thu, 19 Sep 2002, John Levon wrote:

> On Wed, Sep 18, 2002 at 04:55:13PM -0700, Jonathan Lundell wrote:
>
> > >It was causing SMP boxes to crash mysteriously after
> > >several hours or days. Quite a lot of them. Nobody
> > >was able to explain why, so it was turned off.
> >
> > This was in the context of 2.4.2-ac21. More of the thread, with no
> > conclusive result, can be found at
> > http://www.uwsg.iu.edu/hypermail/linux/kernel/0103.2/0906.html
> >
> > Was there any resolution? Was the problem real, did it get fixed, and
>
> Some machines corrupt %ecx on the way back from an NMI. Perhaps that was
> the factor all the people with problems saw.
>
> regards
> john
>

How is this possible? The handler saves and restores the register values. The
fact that some interrupt occurred has no effect upon the contents of the
general registers, only on the selectors (segments), EIP, ESP, and the return
address on the stack. If ECX was being destroyed, it was software that did it,
not some "machine". What kernel version destroys ECX upon an NMI?


Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.