2008-01-25 12:00:26

by Asbjørn Sannes

Subject: Unpredictable performance

Hi,

I am experiencing unpredictable results with the following test
without other processes running (exception is udev, I believe):
cd /usr/src/test
tar -jxf ../linux-2.6.22.12
cp ../working-config linux-2.6.22.12/.config
cd linux-2.6.22.12
make oldconfig
time make -j3 > /dev/null # This is what I note down as a "test" result
cd /usr/src ; umount /usr/src/test ; mkfs.ext3 /dev/cc/test
and then reboot

The kernel is booted with the parameter mem=81920000

For 2.6.23.14 the results vary from (real time) 33m30.551s to 45m32.703s
(30 runs)
For 2.6.23.14 with the noop i/o scheduler, from 29m8.827s to 55m36.744s (24 runs)
For 2.6.22.14 it also varied a lot.. but I lost the results :(
For 2.6.20.21 it varied only from 34m32.054s to 38m1.928s (10 runs)

Any idea of what can cause this? I have tried to make the runs as equal
as possible, rebooting between each run.. i/o scheduler is cfq as default.
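
(For reference, the scheduler can also be switched per device at run time,
no reboot needed; "sda" below is just an assumed device name:)

# Show the available schedulers; the active one is in brackets:
cat /sys/block/sda/queue/scheduler
# Switch to noop for the next run:
echo noop > /sys/block/sda/queue/scheduler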

sys and user time only varies a couple of seconds.. and the order of
when it is "fast" and when it is "slow" is completly random, but it
seems that the results are mostly concentrated around the mean.

--
Asbjørn Sannes


2008-01-25 14:00:49

by Nick Piggin

Subject: Re: Unpredictable performance

On Friday 25 January 2008 22:32, Asbjorn Sannes wrote:
> Hi,
>
> I am experiencing unpredictable results with the following test
> without other processes running (exception is udev, I believe):
> cd /usr/src/test
> tar -jxf ../linux-2.6.22.12
> cp ../working-config linux-2.6.22.12/.config
> cd linux-2.6.22.12
> make oldconfig
> time make -j3 > /dev/null # This is what I note down as a "test" result
> cd /usr/src ; umount /usr/src/test ; mkfs.ext3 /dev/cc/test
> and then reboot
>
> The kernel is booted with the parameter mem=81920000
>
> For 2.6.23.14 the results vary from (real time) 33m30.551s to 45m32.703s
> (30 runs)
> For 2.6.23.14 with the noop i/o scheduler, from 29m8.827s to 55m36.744s (24 runs)
> For 2.6.22.14 it also varied a lot.. but I lost the results :(
> For 2.6.20.21 it varied only from 34m32.054s to 38m1.928s (10 runs)
>
> Any idea of what can cause this? I have tried to make the runs as equal
> as possible, rebooting between each run.. i/o scheduler is cfq as default.
>
> sys and user time only varies a couple of seconds.. and the order of
> when it is "fast" and when it is "slow" is completly random, but it
> seems that the results are mostly concentrated around the mean.

Hmm, lots of things could cause it. With such big variations in
elapsed time, and small variations on CPU time, I guess the fs/IO
layers are the prime suspects, although it could also involve the
VM if you are doing a fair amount of page reclaim.

Can you boot with enough memory such that it never enters page
reclaim? `grep scan /proc/vmstat` to check.

Otherwise you could mount the working directory as tmpfs to
eliminate IO.
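
For instance (a sketch; the exact counter names under /proc/vmstat vary a
little between kernel versions, and the tmpfs size is just an example):

# If the before/after deltas of the scan counters are zero, the build
# never entered page reclaim:
grep scan /proc/vmstat > /tmp/vmstat.before
time make -j3 > /dev/null
grep scan /proc/vmstat > /tmp/vmstat.after
diff /tmp/vmstat.before /tmp/vmstat.after

# Or take the disk out of the picture entirely:
mount -t tmpfs -o size=1g tmpfs /usr/src/test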

Bisecting it down to a single patch would be really helpful if you
can spare the time.

Thanks,
Nick

2008-01-25 14:58:34

by Asbjørn Sannes

Subject: Re: Unpredictable performance

Nick Piggin wrote:
> On Friday 25 January 2008 22:32, Asbjorn Sannes wrote:
>
>> Hi,
>>
>> I am experiencing unpredictable results with the following test
>> without other processes running (exception is udev, I believe):
>> cd /usr/src/test
>> tar -jxf ../linux-2.6.22.12
>> cp ../working-config linux-2.6.22.12/.config
>> cd linux-2.6.22.12
>> make oldconfig
>> time make -j3 > /dev/null # This is what I note down as a "test" result
>> cd /usr/src ; umount /usr/src/test ; mkfs.ext3 /dev/cc/test
>> and then reboot
>>
>> The kernel is booted with the parameter mem=81920000
>>
>> For 2.6.23.14 the results vary from (real time) 33m30.551s to 45m32.703s
>> (30 runs)
>> For 2.6.23.14 with the noop i/o scheduler, from 29m8.827s to 55m36.744s (24 runs)
>> For 2.6.22.14 it also varied a lot.. but I lost the results :(
>> For 2.6.20.21 it varied only from 34m32.054s to 38m1.928s (10 runs)
>>
>> Any idea of what can cause this? I have tried to make the runs as equal
>> as possible, rebooting between each run.. i/o scheduler is cfq as default.
>>
>> sys and user time only varies a couple of seconds.. and the order of
>> when it is "fast" and when it is "slow" is completly random, but it
>> seems that the results are mostly concentrated around the mean.
>>
>
> Hmm, lots of things could cause it. With such big variations in
> elapsed time, and small variations on CPU time, I guess the fs/IO
> layers are the prime suspects, although it could also involve the
> VM if you are doing a fair amount of page reclaim.
>
> Can you boot with enough memory such that it never enters page
> reclaim? `grep scan /proc/vmstat` to check.
>
> Otherwise you could mount the working directory as tmpfs to
> eliminate IO.
>
> Bisecting it down to a single patch would be really helpful if you
> can spare the time.
>
I'm going to run some tests without limiting the memory to 80 megabytes
(so that it is 2 gigabytes) and see how much it varies then, but if I
recall correctly it did not vary much. I'll reply to this e-mail with
the results.

I can do some bisecting next week and see if I find anything, but it will
probably take a lot of time considering that I need to do enough runs..
how much should this vary anyway? The kernel is compiled as a UP
kernel and there is nothing running in parallel with it.. it is
basically a .sh script running on boot that appends the output of time to a
file, reformats, and reboots.
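
(Roughly like the following; a hypothetical reconstruction, with the
tarball name and log path assumed:)

#!/bin/sh
cd /usr/src/test
tar -jxf ../linux-2.6.22.12.tar.bz2        # tarball name assumed
cp ../working-config linux-2.6.22.12/.config
cd linux-2.6.22.12
make oldconfig > /dev/null
# Append this run's timing to the log (assumes a shell whose time
# builtin writes to stderr, e.g. bash):
{ time make -j3 > /dev/null ; } 2>> /usr/src/results.log
cd /usr/src
umount /usr/src/test
mkfs.ext3 /dev/cc/test                     # fresh fs for the next run
reboot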

--
Asbjørn Sannes

2008-01-25 15:02:51

by Asbjørn Sannes

Subject: Re: Unpredictable performance

Asbjørn Sannes wrote:
> Nick Piggin wrote:
>
>> On Friday 25 January 2008 22:32, Asbjorn Sannes wrote:
>>
>>
>>> Hi,
>>>
>>> I am experiencing unpredictable results with the following test
>>> without other processes running (exception is udev, I believe):
>>> cd /usr/src/test
>>> tar -jxf ../linux-2.6.22.12
>>> cp ../working-config linux-2.6.22.12/.config
>>> cd linux-2.6.22.12
>>> make oldconfig
>>> time make -j3 > /dev/null # This is what I note down as a "test" result
>>> cd /usr/src ; umount /usr/src/test ; mkfs.ext3 /dev/cc/test
>>> and then reboot
>>>
>>> The kernel is booted with the parameter mem=81920000
>>>
>>> For 2.6.23.14 the results vary from (real time) 33m30.551s to 45m32.703s
>>> (30 runs)
>>> For 2.6.23.14 with the noop i/o scheduler, from 29m8.827s to 55m36.744s (24 runs)
>>> For 2.6.22.14 it also varied a lot.. but I lost the results :(
>>> For 2.6.20.21 it varied only from 34m32.054s to 38m1.928s (10 runs)
>>>
>>> Any idea of what can cause this? I have tried to make the runs as equal
>>> as possible, rebooting between each run.. i/o scheduler is cfq as default.
>>>
>>> sys and user time only varies a couple of seconds.. and the order of
>>> when it is "fast" and when it is "slow" is completly random, but it
>>> seems that the results are mostly concentrated around the mean.
>>>
>>>
>> Hmm, lots of things could cause it. With such big variations in
>> elapsed time, and small variations on CPU time, I guess the fs/IO
>> layers are the prime suspects, although it could also involve the
>> VM if you are doing a fair amount of page reclaim.
>>
>> Can you boot with enough memory such that it never enters page
>> reclaim? `grep scan /proc/vmstat` to check.
>>
>> Otherwise you could mount the working directory as tmpfs to
>> eliminate IO.
>>
>> Bisecting it down to a single patch would be really helpful if you
>> can spare the time.
>>
>>
> I'm going to run some tests without limiting the memory to 80 megabytes
> (so that it is 2 gigabytes) and see how much it varies then, but if I
> recall correctly it did not vary much. I'll reply to this e-mail with
> the results.
>
5 runs gives me:
real 5m58.626s
real 5m57.280s
real 5m56.584s
real 5m57.565s
real 5m56.613s

Should I test with tmpfs as well?

--
Asbjorn Sannes

2008-01-25 17:17:12

by Ray Lee

Subject: Re: Unpredictable performance

On Jan 25, 2008 3:32 AM, Asbjorn Sannes <[email protected]> wrote:
> Hi,
>
> I am experiencing unpredictable results with the following test
> without other processes running (exception is udev, I believe):
> cd /usr/src/test
> tar -jxf ../linux-2.6.22.12
> cp ../working-config linux-2.6.22.12/.config
> cd linux-2.6.22.12
> make oldconfig
> time make -j3 > /dev/null # This is what I note down as a "test" result
> cd /usr/src ; umount /usr/src/test ; mkfs.ext3 /dev/cc/test
> and then reboot
>
> The kernel is booted with the parameter mem=81920000
>
> For 2.6.23.14 the results vary from (real time) 33m30.551s to 45m32.703s
> (30 runs)
> For 2.6.23.14 with the noop i/o scheduler, from 29m8.827s to 55m36.744s (24 runs)
> For 2.6.22.14 it also varied a lot.. but I lost the results :(
> For 2.6.20.21 it varied only from 34m32.054s to 38m1.928s (10 runs)
>
> Any idea of what can cause this? I have tried to make the runs as equal
> as possible, rebooting between each run.. i/o scheduler is cfq as default.
>
> sys and user time only varies a couple of seconds.. and the order of
> when it is "fast" and when it is "slow" is completly random, but it
> seems that the results are mostly concentrated around the mean.

First off, not all tests are good tests. In particular, small timing
differences can get magnified horrendously by heading into swap.

That said, do you have the means and standard deviations of those
runs? That's a good way to tell whether the tests are converging or
not, and whether your results are telling you anything.
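
(For instance, with one "real" value per line in a hypothetical times.txt,
something along these lines; the sd computed here is the population one:)

# "33m30.551s" -> seconds, then accumulate mean and standard deviation.
sed 's/m/ /; s/s$//' times.txt |
awk '{ t = $1 * 60 + $2; n++; sum += t; sumsq += t * t }
     END { mean = sum / n; sd = sqrt(sumsq / n - mean * mean);
           printf "n=%d mean=%.1fs sd=%.1fs\n", n, mean, sd }'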

Also as you're on a uniprocessor system, make -j2 is probably going to
be faster than make -j3. Perhaps immaterial to whatever you're trying
to test, but there you go.

2008-01-25 20:50:30

by Asbjørn Sannes

Subject: Re: Unpredictable performance

Ray Lee wrote:
> On Jan 25, 2008 3:32 AM, Asbjorn Sannes <[email protected]> wrote:
>
>> Hi,
>>
>> I am experiencing unpredictable results with the following test
>> without other processes running (exception is udev, I believe):
>> cd /usr/src/test
>> tar -jxf ../linux-2.6.22.12
>> cp ../working-config linux-2.6.22.12/.config
>> cd linux-2.6.22.12
>> make oldconfig
>> time make -j3 > /dev/null # This is what I note down as a "test" result
>> cd /usr/src ; umount /usr/src/test ; mkfs.ext3 /dev/cc/test
>> and then reboot
>>
>> The kernel is booted with the parameter mem=81920000
>>
>> For 2.6.23.14 the results vary from (real time) 33m30.551s to 45m32.703s
>> (30 runs)
>> For 2.6.23.14 with the noop i/o scheduler, from 29m8.827s to 55m36.744s (24 runs)
>> For 2.6.22.14 it also varied a lot.. but I lost the results :(
>> For 2.6.20.21 it varied only from 34m32.054s to 38m1.928s (10 runs)
>>
>> Any idea of what can cause this? I have tried to make the runs as equal
>> as possible, rebooting between each run.. i/o scheduler is cfq as default.
>>
>> sys and user time only varies a couple of seconds.. and the order of
>> when it is "fast" and when it is "slow" is completly random, but it
>> seems that the results are mostly concentrated around the mean.
>>
.. I may have jumped the gun a "little" saying that it is mostly
concentrated around the mean; grepping from memory is not always .. hm,
accurate :P
>
> First off, not all tests are good tests. In particular, small timing
> differences can get magnified horrendously by heading into swap.
>
>
So, what you are saying is that it is expected to vary this much under
memory pressure? That I cannot do anything about this on real hardware?
> That said, do you have the means and standard deviations of those
> runs? That's a good way to tell whether the tests are converging or
> not, and whether your results are telling you anything.
>
>
I have all the numbers, I was just hoping that there was a way to
benchmark a small change without a lot of runs. It seems to me to be quite
randomly distributed .. from the 2.6.23.14 runs:
43m10.022s, 34m31.104s, 43m47.221s, 41m17.840s, 34m15.454s,
37m54.327s, 35m6.193s, 38m16.909s, 37m45.411s, 40m13.169s
38m17.414s, 34m37.561s, 43m18.181s, 35m46.233s, 34m44.414s,
39m55.257s, 35m28.477s, 33m30.551s, 41m36.394s, 43m6.359s,
42m42.396s, 37m44.293s, 41m6.615s, 35m43.084s, 39m25.846s,
34m23.753s, 36m0.556s, 41m38.095s, 45m32.703s, 36m18.325s,
42m4.840s, 43m53.759s, 35m51.138s, 40m19.001s

Say I made a histogram of this (tilt your head :P) with 1 minute intervals:
33 *
34 *****
35 *****
36 **
37 ***
38 **
39 **
40 **
41 ****
42 **
43 *****
44
45 *
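
(That picture can be generated straight from a hypothetical times.txt with
one "real" value per line; bins with no runs, like 44, are simply omitted:)

# Truncate each time to whole minutes, count duplicates, draw stars.
sed 's/m.*//' times.txt | sort -n | uniq -c |
awk '{ printf "%s ", $2; for (i = 0; i < $1; i++) printf "*"; print "" }'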

I don't really know what to make of that.. Going to see what happens
with less memory and make -j1, perhaps it will be more stable.
> Also as you're on a uniprocessor system, make -j2 is probably going to
> be faster than make -j3. Perhaps immaterial to whatever you're trying
> to test, but there you go.
Yes, I was hoping to have a more deterministic test to get a higher
confidence in fewer runs when testing changes. Especially under memory
pressure. And I truly was not expecting this much fluctuation, which is
why I tested several kernel versions to see if this influenced it and
mailed lkml. The computer is actually a dual-core AMD processor, but I
compiled the kernel without SMP to see if that helped with the dispersion.

--
Asbjorn Sannes


2008-01-26 00:38:54

by Nick Piggin

Subject: Re: Unpredictable performance

On Saturday 26 January 2008 02:03, Asbjørn Sannes wrote:
> Asbjørn Sannes wrote:
> > Nick Piggin wrote:
> >> On Friday 25 January 2008 22:32, Asbjorn Sannes wrote:
> >>> Hi,
> >>>
> >>> I am experiencing unpredictable results with the following test
> >>> without other processes running (exception is udev, I believe):
> >>> cd /usr/src/test
> >>> tar -jxf ../linux-2.6.22.12
> >>> cp ../working-config linux-2.6.22.12/.config
> >>> cd linux-2.6.22.12
> >>> make oldconfig
> >>> time make -j3 > /dev/null # This is what I note down as a "test" result
> >>> cd /usr/src ; umount /usr/src/test ; mkfs.ext3 /dev/cc/test
> >>> and then reboot
> >>>
> >>> The kernel is booted with the parameter mem=81920000
> >>>
> >>> For 2.6.23.14 the results vary from (real time) 33m30.551s to
> >>> 45m32.703s (30 runs)
> >>> For 2.6.23.14 with the noop i/o scheduler, from 29m8.827s to 55m36.744s (24
> >>> runs) For 2.6.22.14 it also varied a lot.. but I lost the results :(
> >>> For 2.6.20.21 it varied only from 34m32.054s to 38m1.928s (10 runs)
> >>>
> >>> Any idea of what can cause this? I have tried to make the runs as equal
> >>> as possible, rebooting between each run.. i/o scheduler is cfq as
> >>> default.
> >>>
> >>> sys and user time only varies a couple of seconds.. and the order of
> >>> when it is "fast" and when it is "slow" is completly random, but it
> >>> seems that the results are mostly concentrated around the mean.
> >>
> >> Hmm, lots of things could cause it. With such big variations in
> >> elapsed time, and small variations on CPU time, I guess the fs/IO
> >> layers are the prime suspects, although it could also involve the
> >> VM if you are doing a fair amount of page reclaim.
> >>
> >> Can you boot with enough memory such that it never enters page
> >> reclaim? `grep scan /proc/vmstat` to check.
> >>
> >> Otherwise you could mount the working directory as tmpfs to
> >> eliminate IO.
> >>
> >> Bisecting it down to a single patch would be really helpful if you
> >> can spare the time.
> >
> > I'm going to run some tests without limiting the memory to 80 megabytes
> > (so that it is 2 gigabytes) and see how much it varies then, but if I
> > recall correctly it did not vary much. I'll reply to this e-mail with
> > the results.
>
> 5 runs gives me:
> real 5m58.626s
> real 5m57.280s
> real 5m56.584s
> real 5m57.565s
> real 5m56.613s
>
> Should I test with tmpfs as well?

I wouldn't worry about it. It seems like it might be due to page reclaim
(fs / IO can't be ruled out completely, though). Hmm, I haven't been following
reclaim so closely lately; you say it started going bad around 2.6.22?
Maybe the lumpy reclaim patches?

2008-01-28 09:01:33

by Asbjørn Sannes

Subject: Re: Unpredictable performance

Ray Lee wrote:
> On Jan 25, 2008 12:49 PM, Asbjørn Sannes <[email protected]> wrote:
>
>> Ray Lee wrote:
>>
>>> On Jan 25, 2008 3:32 AM, Asbjorn Sannes <[email protected]> wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> I am experiencing unpredictable results with the following test
>>>> without other processes running (exception is udev, I believe):
>>>> cd /usr/src/test
>>>> tar -jxf ../linux-2.6.22.12
>>>> cp ../working-config linux-2.6.22.12/.config
>>>> cd linux-2.6.22.12
>>>> make oldconfig
>>>> time make -j3 > /dev/null # This is what I note down as a "test" result
>>>> cd /usr/src ; umount /usr/src/test ; mkfs.ext3 /dev/cc/test
>>>> and then reboot
>>>>
>>>> The kernel is booted with the parameter mem=81920000
>>>>
>>>> For 2.6.23.14 the results vary from (real time) 33m30.551s to 45m32.703s
>>>> (30 runs)
>>>> For 2.6.23.14 with the noop i/o scheduler, from 29m8.827s to 55m36.744s (24 runs)
>>>> For 2.6.22.14 it also varied a lot.. but I lost the results :(
>>>> For 2.6.20.21 it varied only from 34m32.054s to 38m1.928s (10 runs)
>>>>
>>>> Any idea of what can cause this? I have tried to make the runs as equal
>>>> as possible, rebooting between each run.. i/o scheduler is cfq as default.
>>>>
>>>> sys and user time only varies a couple of seconds.. and the order of
>>>> when it is "fast" and when it is "slow" is completly random, but it
>>>> seems that the results are mostly concentrated around the mean.
>>>>
>>>>
>> .. I may have jumped the gun a "little" saying that it is mostly
>> concentrated around the mean; grepping from memory is not always .. hm,
>> accurate :P
>>
>
> For you (or anyone!) to have any faith in your conclusions at all, you
> need to generate the mean and the standard deviation of each of your
> runs.
>
>
>>> First off, not all tests are good tests. In particular, small timing
>>> differences can get magnified horrendously by heading into swap.
>>>
>>>
>>>
>> So, what you are saying is that it is expected to vary this much under
>> memory pressure? That I cannot do anything about this on real hardware?
>>
>
> No, I'm saying exactly what I wrote.
>
> What you're testing is basically a bunch of processes competing for
> the CPU scheduler. Who wins that competition is essentially random.
> Whoever wins then places pressure on the IO subsystem. If you then go
> into swap, you're then placing even *more* random pressure on the IO
> system. The reason is that the order of the requests you're asking it
> to do vary *wildly* between each of your 'tests', and disk drives have
> a horrible time seeking between tracks. That's how minute differences
> in the kernel's behavior can get magnified thousands of times if you
> start hitting swap, or run a test that won't all fit into cache.
>
>
Yes, I get this. Just to make it clear: the test is supposed to throw
things out of the page cache, because this is in part what I want to
test. I was under the impression (not anymore, though) that a kernel
compile would be pretty deterministic in how it pressures the page
cache and how much I/O results from that.. so I'm going to try to
change the test.
> So whatever you're trying to measure, you need to be aware that you're
> basically throwing a random number generator into the mix.
>
>
>>> That said, do you have the means and standard deviations of those
>>> runs? That's a good way to tell whether the tests are converging or
>>> not, and whether your results are telling you anything.
>>>
>>>
>>>
>> I have all the numbers, I was just hoping that there was a way to
>> benchmark a small change without a lot of runs. It seems to me to be quite
>> randomly distributed .. from the 2.6.23.14 runs:
>>
>
> Sure, you just keep running the tests until your standard deviation
> converges to a significant enough range, where significant is whatever
> you like it to be (+- one minute, say, or 10 seconds, or whatever).
> But beware, if your test is essentially random, then it may never
> converge. That in itself is interesting, too.
>
>
>> It seems to me to be quite
>> randomly distributed .. from the 2.6.23.14 runs:
>> 43m10.022s, 34m31.104s, 43m47.221s, 41m17.840s, 34m15.454s,
>> 37m54.327s, 35m6.193s, 38m16.909s, 37m45.411s, 40m13.169s
>> 38m17.414s, 34m37.561s, 43m18.181s, 35m46.233s, 34m44.414s,
>> 39m55.257s, 35m28.477s, 33m30.551s, 41m36.394s, 43m6.359s,
>> 42m42.396s, 37m44.293s, 41m6.615s, 35m43.084s, 39m25.846s,
>> 34m23.753s, 36m0.556s, 41m38.095s, 45m32.703s, 36m18.325s,
>> 42m4.840s, 43m53.759s, 35m51.138s, 40m19.001s
>>
>> Say I made a histogram of this (tilt your head :P) with 1 minute intervals:
>> 33 *
>> 34 *****
>> 35 *****
>> 36 **
>> 37 ***
>> 38 **
>> 39 **
>> 40 **
>> 41 ****
>> 42 **
>> 43 *****
>> 44
>> 45 *
>>
>
>
The mean is 2328s and the standard deviation is 210s.
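
(So the mean itself is only pinned down to roughly +-36s, since the
standard error of a mean is sd over the square root of the run count:)

# 34 runs with sd = 210s:
awk 'BEGIN { printf "sem = %.1fs\n", 210 / sqrt(34) }'
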
> Just eyeballing that, I can tell you your standard deviation is large,
> and you would therefore need to run more tests.
>
> However, let me just underscore that unless you're planning on setting
> up a compile server that you *want* to push into swap all the time,
> then this is a pretty silly test for getting good numbers, and instead
> should try testing something closer to what you're actually concerned
> about.
>
> As an example -- there's a reason kernel developers have stopped
> trying to get good numbers from dbench: it's horribly sensitive to
> timing issues.
>
>
>> I don't really know what to make of that.. Going to see what happens
>> with less memory and make -j1, perhaps it will be more stable.
>>
>>> Also as you're on a uniprocessor system, make -j2 is probably going to
>>> be faster than make -j3. Perhaps immaterial to whatever you're trying
>>> to test, but there you go.
>>>
>> Yes, I was hoping to have a more deterministic test to get a higher
>> confidence in fewer runs when testing changes. Especially under memory
>> pressure. And I truly was not expecting this much fluctuation, which is
>> why I tested several kernel versions to see if this influenced it and
>> mailed lkml. The computer is actually a dual-core AMD processor, but I
>> compiled the kernel without SMP to see if that helped with the dispersion.
>>
>
> Well, if you really feel something is weird between the different
> kernel versions, then you should try bisecting various kernels between
> 2.6.20 and 2.6.22 and see if it leads you to anything conclusive. You
> only took 10 data points on 2.6.20, though, so I'd retest it as much
> as you have the other versions just to make sure the results are
> stable.
>
>
Just going to put this here: I did more tests on 2.6.20.21 (56 runs) and
here is the outcome of that:
33: **
34: *********
35: ************
36: *****************
37: **********
38: *****
39: *

The mean is 2175s and the standard deviation is 77s.

> The kernel really *does* change behavior between each version,
> sometimes intentional, sometimes not. But good tests are one of the
> few starting points for discovering that.
>
> In particular, though, you need to be aware that Ingo Molnar's
> "Completely Fair Scheduler" went into 2.6.23. What that means is that
> the processes get scheduled on the CPU more fairly. It also means that
> the disk drive access in your test is going to be much more random and
> seeky than before, as the processes are now interleaving their
> requests together in a much more fine grained way. This is why I keep
> saying (sorry!) that I don't think you're really testing whatever it
> is that you actually care about.
>
What I want to test is how compressed caching (compressing the page
cache) improves performance under memory pressure. Now, if I just
compared it to the vanilla kernel and I only had one version of
compressed caching to test with, then that would be fine. But I want to
test some small changes to heuristics that may only change the outcome
by 2-3%. A good starting point for me would be to have a vanilla kernel
and a test that gives me consistent enough results, before starting to
compare it.
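
(A rough two-sample power calculation gives an idea of how many runs a
2-3% effect needs; the 2.8 is approximately the combined z-value for 5%
significance and 80% power, and the numbers are the 2.6.20.21 ones above:)

# Runs per kernel needed to detect a 2.5% shift of a 2175s mean
# when the standard deviation is 77s:
awk 'BEGIN { z = 2.8; sd = 77; d = 0.025 * 2175
             printf "about %.0f runs per kernel\n", 2 * (z * sd / d)^2 }'

# With the 2.6.23.14 spread (sd = 210s, mean = 2328s) the same effect
# would need on the order of 200 runs per kernel.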

I'm going to try a kernel compile with no parallel processes; the
randomness should decrease, I suppose :)


--
Asbjorn Sannes

2008-01-28 09:12:11

by Asbjørn Sannes

Subject: Re: Unpredictable performance

Nick Piggin wrote:
> On Saturday 26 January 2008 02:03, Asbjørn Sannes wrote:
>
>> Asbjørn Sannes wrote:
>>
>>> Nick Piggin wrote:
>>>
>>>> On Friday 25 January 2008 22:32, Asbjorn Sannes wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am experiencing unpredictable results with the following test
>>>>> without other processes running (exception is udev, I believe):
>>>>> cd /usr/src/test
>>>>> tar -jxf ../linux-2.6.22.12
>>>>> cp ../working-config linux-2.6.22.12/.config
>>>>> cd linux-2.6.22.12
>>>>> make oldconfig
>>>>> time make -j3 > /dev/null # This is what I note down as a "test" result
>>>>> cd /usr/src ; umount /usr/src/test ; mkfs.ext3 /dev/cc/test
>>>>> and then reboot
>>>>>
>>>>> The kernel is booted with the parameter mem=81920000
>>>>>
>>>>> For 2.6.23.14 the results vary from (real time) 33m30.551s to
>>>>> 45m32.703s (30 runs)
> >>>>> For 2.6.23.14 with the noop i/o scheduler, from 29m8.827s to 55m36.744s (24
> >>>>> runs) For 2.6.22.14 it also varied a lot.. but I lost the results :(
> >>>>> For 2.6.20.21 it varied only from 34m32.054s to 38m1.928s (10 runs)
>>>>>
>>>>> Any idea of what can cause this? I have tried to make the runs as equal
>>>>> as possible, rebooting between each run.. i/o scheduler is cfq as
>>>>> default.
>>>>>
>>>>> sys and user time only varies a couple of seconds.. and the order of
>>>>> when it is "fast" and when it is "slow" is completly random, but it
>>>>> seems that the results are mostly concentrated around the mean.
>>>>>
>>>> Hmm, lots of things could cause it. With such big variations in
>>>> elapsed time, and small variations on CPU time, I guess the fs/IO
>>>> layers are the prime suspects, although it could also involve the
>>>> VM if you are doing a fair amount of page reclaim.
>>>>
>>>> Can you boot with enough memory such that it never enters page
>>>> reclaim? `grep scan /proc/vmstat` to check.
>>>>
>>>> Otherwise you could mount the working directory as tmpfs to
>>>> eliminate IO.
>>>>
>>>> Bisecting it down to a single patch would be really helpful if you
>>>> can spare the time.
>>>>
>>> I'm going to run some tests without limiting the memory to 80 megabytes
>>> (so that it is 2 gigabytes) and see how much it varies then, but if I
>>> recall correctly it did not vary much. I'll reply to this e-mail with
>>> the results.
>>>
>> 5 runs gives me:
>> real 5m58.626s
>> real 5m57.280s
>> real 5m56.584s
>> real 5m57.565s
>> real 5m56.613s
>>
>> Should I test with tmpfs as well?
>>
>
> I wouldn't worry about it. It seems like it might be due to page reclaim
> (fs / IO can't be ruled out completely, though). Hmm, I haven't been following
> reclaim so closely lately; you say it started going bad around 2.6.22?
> Maybe the lumpy reclaim patches?
>
Going to bisect it soon, but I suspect it will take some time
(considering how many runs I need to make any sense of the results).
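
(The mechanics would presumably look something like this; the marking is
manual, since each data point needs its own series of reboots:)

cd linux-2.6
git bisect start
git bisect bad v2.6.22       # first series with the large spread
git bisect good v2.6.20      # last series with the tight spread
# Build and boot the suggested commit, collect enough timed runs to
# judge the spread, then mark it and let git pick the next commit:
git bisect good              # or: git bisect bad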

--
Asbjorn Sannes