2009-09-25 22:09:30

by Aneurin Price

Subject: Regression: kernels since 2.6.26 are unusably slow

Hi,
Every kernel since 2.6.26 has been unusably slow for me - to the extent that I
initially thought they were hanging on boot. Exactly how bad it is seems to
vary, but it could take several minutes to boot, and then another minute or two
to log in (with bash pegging the CPU as it loads). Everything seems to need
vastly more CPU time than usual - it feels a little like I'm using a 486.

I've bisected the problem down to commit
42651f15824d003e8357693ab72c4dbb3e280836 (x86: fix trimming e820 with MTRR
holes). Having basically no idea what that means, I thought I'd try building a
kernel with MTRRs disabled, to see if that would make any difference, but no
joy.

Can anyone give me some idea of where to go next, or let me know what further
information I should provide?

Thanks,
Nye


2009-09-25 22:51:32

by Frans Pop

Subject: Re: Regression: kernels since 2.6.26 are unusably slow

Adding maintainers involved with the commit in CC.

Aneurin Price wrote:
> Hi,
> Every kernel since 2.6.26 has been unusably slow for me - to the extent
> that I initially thought they were hanging on boot. Exactly how bad it is
> seems to vary, but it could take several minutes to boot, and then
> another minute or two to log in (with bash pegging the CPU as it loads).
> Everything seems to need vastly more CPU time than usual - it feels a
> little like I'm using a 486.
>
> I've bisected the problem down to commit
> 42651f15824d003e8357693ab72c4dbb3e280836 (x86: fix trimming e820 with
> MTRR holes). Having basically no idea what that means, I thought I'd try
> building a kernel with MTRRs disabled, to see if that would make any
> difference, but no joy.
>
> Can anyone give me some idea of where to go next, or let me know what
> further information I should provide?

Let's start with some basic info. Can you please send your kernel config
and the output of 'dmesg' after a boot with a "slow" kernel?

Also, what distribution do you use (and what is its release/version)?

Cheers,
FJP

2009-09-26 01:02:08

by Aneurin Price

Subject: Re: Regression: kernels since 2.6.26 are unusably slow

On Fri, Sep 25, 2009 at 23:51, Frans Pop <[email protected]> wrote:
> Adding maintainers involved with the commit in CC.
>
> Aneurin Price wrote:
>> Hi,
>> Every kernel since 2.6.26 has been unusably slow for me - to the extent
>> that I initially thought they were hanging on boot. Exactly how bad it is
>> seems to vary, but it could take several minutes to boot, and then
>> another minute or two to log in (with bash pegging the CPU as it loads).
>> Everything seems to need vastly more CPU time than usual - it feels a
>> little like I'm using a 486.
>>
>> I've bisected the problem down to commit
>> 42651f15824d003e8357693ab72c4dbb3e280836 (x86: fix trimming e820 with
>> MTRR holes). Having basically no idea what that means, I thought I'd try
>> building a kernel with MTRRs disabled, to see if that would make any
>> difference, but no joy.
>>
>> Can anyone give me some idea of where to go next, or let me know what
>> further information I should provide?
>
> Let's start with some basic info. Can you please send your kernel config
> and the output of 'dmesg' after a boot with a "slow" kernel?
>

Attached. I'm not entirely convinced about the sanity of that kernel config, so
if it might help I could repeat with a Debian packaged kernel, on the basis that
it's presumably working without error for a large number of people.

> Also, what distribution do you use (and what is its release/version)?
>

I'm using Debian unstable.


Hmm. Looking at dmesg for a 'working' kernel includes an interesting part which
I foolishly forgot to save, saying that the last 512MB of memory is
inaccessible (I was wondering where that last half-gig had gone :-)). I can go
and check exactly what it says, but I don't have the energy left for any more
reboots today. Anyway, that prompted me to try booting with mem=7168 - and that
makes the problem go away. Perhaps that will shed some light on things.

Thanks,
Nye
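
A rough sketch of why limiting memory could hide the problem. It assumes the parameter was actually given as mem=7168M (a bare "mem=7168" would be read as bytes) and that, on x86, mem= drops usable RAM above that physical address. The usable ranges are copied from the BIOS-e820 lines in the attached dmesg, quoted later in this thread; the helper name clip_usable is made up for illustration.

```python
MIB = 1 << 20

# Usable e820 regions from the dmesg (end addresses are exclusive).
USABLE = [
    (0x000000000, 0x00009e800),
    (0x000100000, 0x0dfee0000),
    (0x100000000, 0x220000000),
]

def clip_usable(usable, limit):
    """Drop usable RAM at or above `limit` (assumed effect of mem= on x86)."""
    return [(start, min(end, limit)) for start, end in usable if start < limit]

LIMIT = 7168 * MIB                      # assumption: mem=7168M
clipped = clip_usable(USABLE, LIMIT)
top = max(end for _, end in clipped)    # highest address still in use

# The "last 512MB" is everything from (e820 top - 512 MiB) upwards.
last_512_start = 0x220000000 - 512 * MIB
print(hex(top), hex(last_512_start), top <= last_512_start)
```

Under those assumptions the highest address the kernel still uses lies below the start of the last 512MB, so that region is never touched.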


Attachments:
dmesg.2.6.31-09061-g6d7f18f-dirty (43.34 kB)
config-2.6.31-09061-g6d7f18f-dirty (60.42 kB)

2009-09-26 01:47:58

by Yinghai Lu

Subject: Re: Regression: kernels since 2.6.26 are unusably slow

On Fri, Sep 25, 2009 at 6:02 PM, Aneurin Price <[email protected]> wrote:

> Hmm. Looking at dmesg for a 'working' kernel includes an interesting part which
> I foolishly forgot to save, saying that the last 512MB of memory is
> inaccessible (I was wondering where that last half-gig had gone :-)). I can go
> and check exactly what it says, but I don't have the energy left for any more
> reboots today. Anyway, that prompted me to try booting with mem=7168 - and that
> makes the problem go away. Perhaps that will shed some light on things.

[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
[    0.000000]  BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000dfee0000 (usable)
[    0.000000]  BIOS-e820: 00000000dfee0000 - 00000000dfee3000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000dfee3000 - 00000000dfef0000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000dfef0000 - 00000000dff00000 (reserved)
[    0.000000]  BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 0000000220000000 (usable)
[    0.000000] DMI 2.4 present.
[    0.000000] last_pfn = 0x220000 max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: uncachable
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-CDFFF write-protect
[    0.000000]   CE000-EFFFF uncachable
[    0.000000]   F0000-FFFFF write-through
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 100000000 mask FE0000000 write-back
[    0.000000]   1 base 100000000 mask F00000000 write-back
[    0.000000]   2 base 000000000 mask F00000000 write-back
[    0.000000]   3 base 0E0000000 mask FE0000000 uncachable
[    0.000000]   4 base 0DFF00000 mask FFFF00000 write-through
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled

It looks like your MTRRs have a problem: with that write-through entry
(variable range 4) there, the e820 trimming will not happen.

YH
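
To make the listing above easier to read, here is a minimal sketch that decodes the variable ranges and looks for usable RAM that no write-back range covers. It assumes 36-bit physical addresses (inferred from the 9-hex-digit masks) and contiguous masks; the function names decode and uncovered are illustrative and are not kernel code. It only checks write-back coverage and ignores overlaps with the uncachable and write-through entries, which do not intersect the usable regions here.

```python
PHYS_BITS = 36  # assumption: inferred from the 9-hex-digit masks in the dmesg

def decode(base, mask, phys_bits=PHYS_BITS):
    """Return (start, end) of a variable MTRR, end exclusive."""
    size = (1 << phys_bits) - mask
    # A contiguous mask yields a power-of-two size and a size-aligned base.
    assert size & (size - 1) == 0 and base % size == 0
    return base, base + size

# Variable ranges from the dmesg above (pre-BIOS-update).
OLD_MTRRS = [
    (0x100000000, 0xFE0000000, "write-back"),
    (0x100000000, 0xF00000000, "write-back"),
    (0x000000000, 0xF00000000, "write-back"),
    (0x0E0000000, 0xFE0000000, "uncachable"),
    (0x0DFF00000, 0xFFFF00000, "write-through"),
]

# Usable e820 regions from the dmesg above (end exclusive).
USABLE = [
    (0x000000000, 0x00009e800),
    (0x000100000, 0x0dfee0000),
    (0x100000000, 0x220000000),
]

def uncovered(usable, mtrrs):
    """Usable RAM outside every write-back range (falls to the default type)."""
    wb = sorted(decode(b, m) for b, m, t in mtrrs if t == "write-back")
    holes = []
    for start, end in usable:
        cur = start
        for lo, hi in wb:
            if hi <= cur or lo >= end:
                continue            # no overlap with what is left to cover
            if lo > cur:
                holes.append((cur, lo))
            cur = max(cur, hi)
        if cur < end:
            holes.append((cur, end))
    return holes

for lo, hi in uncovered(USABLE, OLD_MTRRS):
    print(f"{lo:#011x}-{hi:#011x}: {(hi - lo) >> 20} MiB not write-back")
```

With these tables the only hole is 0x200000000-0x220000000, 512 MiB at the top of RAM, which falls back to the "uncachable" default type. Range 0 merely duplicates the first 512 MiB of range 1.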

2009-09-26 02:49:10

by Robert Hancock

Subject: Re: Regression: kernels since 2.6.26 are unusably slow

On 09/25/2009 07:47 PM, Yinghai Lu wrote:
> On Fri, Sep 25, 2009 at 6:02 PM, Aneurin Price<[email protected]> wrote:
>
>> Hmm. Looking at dmesg for a 'working' kernel includes an interesting part which
>> I foolishly forgot to save, saying that the last 512MB of memory is
>> inaccessible (I was wondering where that last half-gig had gone :-)). I can go
>> and check exactly what it says, but I don't have the energy left for any more
>> reboots today. Anyway, that prompted me to try booting with mem=7168 - and that
>> makes the problem go away. Perhaps that will shed some light on things.
>
> [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
> [ 0.000000] BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
> [ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
> [ 0.000000] BIOS-e820: 0000000000100000 - 00000000dfee0000 (usable)
> [ 0.000000] BIOS-e820: 00000000dfee0000 - 00000000dfee3000 (ACPI NVS)
> [ 0.000000] BIOS-e820: 00000000dfee3000 - 00000000dfef0000 (ACPI data)
> [ 0.000000] BIOS-e820: 00000000dfef0000 - 00000000dff00000 (reserved)
> [ 0.000000] BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved)
> [ 0.000000] BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> [ 0.000000] BIOS-e820: 0000000100000000 - 0000000220000000 (usable)
> [ 0.000000] DMI 2.4 present.
> [ 0.000000] last_pfn = 0x220000 max_arch_pfn = 0x400000000
> [ 0.000000] MTRR default type: uncachable
> [ 0.000000] MTRR fixed ranges enabled:
> [ 0.000000] 00000-9FFFF write-back
> [ 0.000000] A0000-BFFFF uncachable
> [ 0.000000] C0000-CDFFF write-protect
> [ 0.000000] CE000-EFFFF uncachable
> [ 0.000000] F0000-FFFFF write-through
> [ 0.000000] MTRR variable ranges enabled:
> [ 0.000000] 0 base 100000000 mask FE0000000 write-back
> [ 0.000000] 1 base 100000000 mask F00000000 write-back
> [ 0.000000] 2 base 000000000 mask F00000000 write-back
> [ 0.000000] 3 base 0E0000000 mask FE0000000 uncachable
> [ 0.000000] 4 base 0DFF00000 mask FFFF00000 write-through
> [ 0.000000] 5 disabled
> [ 0.000000] 6 disabled
> [ 0.000000] 7 disabled
>
> looks like your MTRR has some problem, and with that WRITE-THROUGH
> there, the trimming e820 will not happen

You might want to see if there's a BIOS update available for that
system/motherboard..

2009-09-26 07:19:40

by Arjan van de Ven

Subject: Re: Regression: kernels since 2.6.26 are unusably slow

On Fri, 25 Sep 2009 23:09:32 +0100
Aneurin Price <[email protected]> wrote:

> Hi,
> Every kernel since 2.6.26 has been unusably slow for me - to the
> extent that I initially thought they were hanging on boot. Exactly
> how bad it is seems to vary, but it could take several minutes to
> boot, and then another minute or two to log in (with bash pegging the
> CPU as it loads). Everything seems to need vastly more CPU time than
> usual - it feels a little like I'm using a 486.

That means part of your memory is uncached.

>
> I've bisected the problem down to commit
> 42651f15824d003e8357693ab72c4dbb3e280836 (x86: fix trimming e820 with
> MTRR holes). Having basically no idea what that means, I thought I'd
> try building a kernel with MTRRs disabled, to see if that would make
> any difference, but no joy.

b0rked MTRRs will do that to you ;-)

>
> Can anyone give me some idea of where to go next, or let me know what
> further information I should provide?
>

One thing to try is enabling the CONFIG_MTRR_SANITIZER config option;
that lets Linux rewrite all the MTRRs to see if it can puzzle together a
combination that covers all your memory...

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2009-09-26 22:51:48

by Aneurin Price

Subject: Re: Regression: kernels since 2.6.26 are unusably slow

On Sat, Sep 26, 2009 at 03:49, Robert Hancock <[email protected]> wrote:
> On 09/25/2009 07:47 PM, Yinghai Lu wrote:
>>
>> On Fri, Sep 25, 2009 at 6:02 PM, Aneurin Price<[email protected]>
>>  wrote:
>>
>>> Hmm. Looking at dmesg for a 'working' kernel includes an interesting part
>>> which
>>> I foolishly forgot to save, saying that the last 512MB of memory is
>>> inaccessible (I was wondering where that last half-gig had gone :-)). I
>>> can go
>>> and check exactly what it says, but I don't have the energy left for any
>>> more
>>> reboots today. Anyway, that prompted me to try booting with mem=7168 -
>>> and that
>>> makes the problem go away. Perhaps that will shed some light on things.
>>
>> [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
>> [    0.000000]  BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
>> [    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
>> [    0.000000]  BIOS-e820: 0000000000100000 - 00000000dfee0000 (usable)
>> [    0.000000]  BIOS-e820: 00000000dfee0000 - 00000000dfee3000 (ACPI NVS)
>> [    0.000000]  BIOS-e820: 00000000dfee3000 - 00000000dfef0000 (ACPI data)
>> [    0.000000]  BIOS-e820: 00000000dfef0000 - 00000000dff00000 (reserved)
>> [    0.000000]  BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved)
>> [    0.000000]  BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
>> [    0.000000]  BIOS-e820: 0000000100000000 - 0000000220000000 (usable)
>> [    0.000000] DMI 2.4 present.
>> [    0.000000] last_pfn = 0x220000 max_arch_pfn = 0x400000000
>> [    0.000000] MTRR default type: uncachable
>> [    0.000000] MTRR fixed ranges enabled:
>> [    0.000000]   00000-9FFFF write-back
>> [    0.000000]   A0000-BFFFF uncachable
>> [    0.000000]   C0000-CDFFF write-protect
>> [    0.000000]   CE000-EFFFF uncachable
>> [    0.000000]   F0000-FFFFF write-through
>> [    0.000000] MTRR variable ranges enabled:
>> [    0.000000]   0 base 100000000 mask FE0000000 write-back
>> [    0.000000]   1 base 100000000 mask F00000000 write-back
>> [    0.000000]   2 base 000000000 mask F00000000 write-back
>> [    0.000000]   3 base 0E0000000 mask FE0000000 uncachable
>> [    0.000000]   4 base 0DFF00000 mask FFFF00000 write-through
>> [    0.000000]   5 disabled
>> [    0.000000]   6 disabled
>> [    0.000000]   7 disabled
>>
>> looks like your MTRR has some problem, and with that WRITE-THROUGH
>> there, the trimming e820 will not happen
>
> You might want to see if there's a BIOS update available for that
> system/motherboard..
>

Slightly surprisingly to me, after spending two and a half hours figuring out a
way to update the BIOS, it actually worked:

[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009dc00 (usable)
[    0.000000]  BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000dfee0000 (usable)
[    0.000000]  BIOS-e820: 00000000dfee0000 - 00000000dfee3000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000dfee3000 - 00000000dfef0000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000dfef0000 - 00000000dff00000 (reserved)
[    0.000000]  BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 0000000220000000 (usable)
[    0.000000] DMI 2.4 present.
[    0.000000] last_pfn = 0x220000 max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: uncachable
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-CDFFF write-protect
[    0.000000]   CE000-EFFFF uncachable
[    0.000000]   F0000-FFFFF write-through
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 000000000 mask F00000000 write-back
[    0.000000]   1 base 0E0000000 mask FE0000000 uncachable
[    0.000000]   2 base 100000000 mask F00000000 write-back
[    0.000000]   3 base 200000000 mask FE0000000 write-back
[    0.000000]   4 base 0DFF00000 mask FFFF00000 uncachable
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled

I do wish I understood this better, but I suppose that'll have to be a topic for
future research.
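
For anyone else trying to read these tables, a small sketch comparing the variable MTRRs before and after the BIOS update. It applies the usual x86 overlap rules (any uncachable match wins; write-through beats write-back; no match means the default type) and ignores the fixed ranges below 1MB. The function name effective_type and the sample addresses are chosen here for illustration only.

```python
# (base, mask, type) triples copied from the two dmesg listings in this thread.
OLD_MTRRS = [
    (0x100000000, 0xFE0000000, "write-back"),
    (0x100000000, 0xF00000000, "write-back"),
    (0x000000000, 0xF00000000, "write-back"),
    (0x0E0000000, 0xFE0000000, "uncachable"),
    (0x0DFF00000, 0xFFFF00000, "write-through"),
]
NEW_MTRRS = [
    (0x000000000, 0xF00000000, "write-back"),
    (0x0E0000000, 0xFE0000000, "uncachable"),
    (0x100000000, 0xF00000000, "write-back"),
    (0x200000000, 0xFE0000000, "write-back"),
    (0x0DFF00000, 0xFFFF00000, "uncachable"),
]

def effective_type(addr, mtrrs, default="uncachable"):
    """Memory type for a physical address above 1MB (variable ranges only)."""
    # An address matches a range when it agrees with the base on all mask bits.
    hits = {t for base, mask, t in mtrrs if (addr & mask) == (base & mask)}
    if not hits:
        return default                  # "MTRR default type" from the dmesg
    if "uncachable" in hits:
        return "uncachable"             # UC overrides any overlapping type
    if hits == {"write-back", "write-through"}:
        return "write-through"          # WT overrides WB
    if len(hits) == 1:
        return next(iter(hits))
    return "undefined"                  # other overlaps are not architecturally defined

for addr in (0x100000000, 0x0DFF00000, 0x210000000):
    print(f"{addr:#011x}: old={effective_type(addr, OLD_MTRRS)}, "
          f"new={effective_type(addr, NEW_MTRRS)}")
```

The address inside the last 512MB (0x210000000) matches no range in the old table and so gets the uncachable default, while the new table maps it write-back via range 3.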

Given that I'm no longer having the problem, this isn't of pressing importance
to me so is purely informational:

CONFIG_MTRR_SANITIZER didn't help[0], though I didn't try changing the value of
MTRR_SANITIZER_SPARE_REG_NR_DEFAULT. I also tried mtrr-uncover which I learned
about from http://kerneltrap.org/mailarchive/linux-kernel/2008/9/30/3454074, and
that didn't help either. However, the thing that makes me bother mentioning all
this is that I tried booting Windows XP (64bit) before updating the BIOS, and
that worked fine with the full 8GB of RAM. This indicates to me that whatever
the problem, it wasn't insurmountable, so in principle Linux could do better.

If it doesn't seem worth looking into further then that's perfectly
understandable, but if anyone is interested and would like any more information,
let me know.

Thank you all for your suggestions,
Nye



[0] Google indicated that passing the boot param 'enable_mtrr_cleanup' would be
appropriate; I don't know if that's correct, and I don't really understand the
wording on MTRR_SANITIZER_ENABLE_DEFAULT. I got the impression that the optional
0 or 1 value just means 'enable MTRR cleanup or not', and is overridden by the
boot parameter. Maybe that option would be worth clarifying?