2005-02-21 13:46:31

by Martin Mokrejs

Subject: memory management weirdness

Hi,
I have received no answer to my former question
(see http://marc.theaimsgroup.com/?l=linux-kernel&m=110827143716215&w=2).
I've spent some more time on that problem and have more or less confirmed
it's because of a buggy BIOS. However, the Linux kernel doesn't handle such
a case properly. I've tested the 2.4.30-pre1 kernel and the latest 2.6.11-rc4 kernel.
The conclusion is that once the machine has 4x1GB DDR400 DIMMs physically
installed (the BIOS detects only 3556MB or less, as some buffers
are allocated by the Intel 875P chipset and AGP card), Linux 2.6.11*
runs up to 18x slower than when only 2x1GB + 2x512MB DDR memory is installed.

Although I've not re-tested this today again, it used to help a bit to specify
mem=3548M to decrease memory used by linux (tested with AGP card plugged in, when
bios reported 3556MB RAM only).

I found that removing the AGP-based video card and using an old PCI-based
video card results in the BIOS detecting 4072MB of RAM. But still, the machine was
slow. I've tried to "cat >| /proc/mtrr" to alter the memory settings, but the
result was only a partial speedup.

I'm not sure how to convince linux kernel to run fast again.
I suspect either the memory mapping or the interrupts are the cause. Disabling
acpi did not help me initially, so I've conducted most of my tests with
acpi enabled.

I've put dmesg, iomem, interrupts, lspci and time(1) results of my tests
on the web: http://www.natur.cuni.cz/~mmokrejs/tmp/. The differences can be seen easily
by diffing the files. All tests in http://www.natur.cuni.cz/~mmokrejs/tmp/4MB/
were carried out with the AGP aperture size set to 4MB, although the video card has
128MB RAM.

Later I reverted the AGP aperture back to 128MB and tested again:
http://www.natur.cuni.cz/~mmokrejs/tmp/128MB/.

Finally, I put back two 512MB memory modules to have only 3GB RAM physically,
and the result is at http://www.natur.cuni.cz/~mmokrejs/tmp/128MB/only_phys_3GB/.

About a week ago I tried to contact ASUS through some web robot, but have had
no answer so far from their technical support:
http://vip.asus.com/eservice/techmailstatus.aspx?ID=WTM200502111723398547
I do not recommend their "greatest" and real "flag-ship" P4C800-E-Deluxe
motherboard for use with memory sizes above 3GB (although they claim 4GB
is possible). BIOS is the latest release 1.19, although 1.20.001 was tested
as well.



My questions to LKML people are:

1) Could someone tell me what the differences are between the
2.4.30-pre1 kernel's dmesg and the 2.6.11-rc4* dmesg outputs? For example, memory
areas "reserved twice" reported by 2.4.30-pre1. Also, /proc/mtrr differs
between the two kernels.

2) How about the /proc/interrupts outputs? Aren't they too high? How about the
level/edge interrupt mappings? Would they help?

Please Cc: me in replies. Many thanks for any response, I have wasted seemingly
a lot of money on 2GB RAM. :(
martin

P.S.:

1GB DDR400 modules are Micron CL2.5 2bank 512M chip modules (64Mx8),
non-ecc, unbuffered

512MB DDR 500 modules (yes, PC4000, not PC3200 as is the max supported
by the motherboard) are Kingston HyperX modules, non-ecc, unbuffered.


2005-02-21 14:20:23

by Parag Warudkar

Subject: Re: memory management weirdness

> Hi,
> I have received no answer to my former question
> (see http://marc.theaimsgroup.com/?l=linux-kernel&m=110827143716215&w=2).
> I've spent some more time on that problem and have more or less confirmed
> it's because of buggy bios. However, the linux kernel doesn't handle properly
> such case. I've tested 2.4.30-pre1 kernel and latest 2.6.11-rc4 kernel.
> The conclusion is, that once the machine has physically installed 4x1GB
> DDR400 DIMM's (bios detects only 3556 or less memory as some buffers
> are allocated by the Intel 875P chipset and AGP card), the linux 2.6.11*
> runs up-to 18x slower than when only 2x1GB + 2x 512MB DDR memory is installed.
>
Can you enable profiling and then post the profile info for various cases - slow and fast? Check out Documentation/basic_profiling.txt in the kernel source for understanding how to do this. This might help narrow down the issue.
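Once a profile exists for both the slow and the fast case, the comparison can be automated. The following is a minimal sketch, assuming readprofile-style lines of the form "ticks symbol normalized-load" with a final "total" line. The function names (`parse_profile`, `share`, `biggest_shifts`) are hypothetical and the sample numbers are invented for illustration; they are not taken from the posted profiles.

```python
# Sketch: compare two readprofile-style listings (fast boot vs. slow boot).
# Assumed line format: "<ticks> <symbol> <normalized load>".
# The sample data below is invented, not measured.

def parse_profile(text):
    """Return {symbol: ticks}, skipping the summary 'total' line."""
    ticks = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        count, symbol = int(fields[0]), fields[1]
        if symbol != "total":
            ticks[symbol] = ticks.get(symbol, 0) + count
    return ticks


def share(ticks):
    """Convert tick counts to fractions of all ticks."""
    total = sum(ticks.values())
    return {sym: n / total for sym, n in ticks.items()} if total else {}


def biggest_shifts(fast_text, slow_text, top=5):
    """Symbols whose share of ticks changed most between the two runs."""
    fast = share(parse_profile(fast_text))
    slow = share(parse_profile(slow_text))
    delta = {s: slow.get(s, 0.0) - fast.get(s, 0.0) for s in set(fast) | set(slow)}
    return sorted(delta.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]


FAST = """\
   900 default_idle      18.7500
    60 do_page_fault      0.0500
    40 copy_page          0.3000
  1000 total              0.0004
"""
SLOW = """\
   100 default_idle       2.0833
   500 do_page_fault      0.4167
   400 copy_page          3.0000
  1000 total              0.0004
"""

for symbol, change in biggest_shifts(FAST, SLOW):
    print(f"{symbol:20s} {change:+.1%}")
```

Comparing shares rather than raw tick counts keeps the two runs comparable even if they were sampled for different lengths of time.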

Parag



2005-02-21 23:47:29

by Andi Kleen

Subject: Re: memory management weirdness

Martin MOKREJŠ writes:

> Hi,
> I have received no answer to my former question
> (see http://marc.theaimsgroup.com/?l=linux-kernel&m=110827143716215&w=2).

That's because it's a BIOS problem.

There are limits on how much Linux can work around BIOS breakage.


> Although I've not re-tested this today again, it used to help a bit to specify
> mem=3548M to decrease memory used by linux (tested with AGP card plugged in, when
> bios reported 3556MB RAM only).
>
> I found that removing the AGP based video card and using an old PCI based
> video card results in bios detecting 4072MB of RAM. But still, the machine was
> slow. I've tried to "cat >| /proc/mtrr" to alter the memory settings, but the
> result was only a partial speedup.
>
> I'm not sure how to convince linux kernel to run fast again.

It's most likely an MTRR problem. Play more with them.


> Finally, I put back two 512MB memory modules to have only 3GB RAM physically,
> and the result is at http://www.natur.cuni.cz/~mmokrejs/tmp/128MB/only_phys_3GB/.


The cheaper Intel chipsets don't support >4GB at all, and you always
need some space below 4GB for PCI memory mappings/AGP aperture etc.


> About a week ago I tried to contact ASUS, but no answer so far from their
> technical support through some web robot.
> http://vip.asus.com/eservice/techmailstatus.aspx?ID=WTM200502111723398547
> I do not recommend their "greatest" and real "flag-ship" P4C800-E-Deluxe
> motherboard for use with memory sizes above 3GB (although they claim 4GB
> is possible). BIOS is the latest release 1.19, although 1.20.001 was tested
> as well.

In general, non-server boards tend to be poorly tested, or not tested at
all, with a lot of memory ("a lot" is defined as >2GB for higher-end
desktop boards, or >1GB on very cheap desktop boards). That is a
common problem on other motherboards too; Asus is not alone in this.

-Andi

2005-02-22 08:04:58

by Ingo Molnar

Subject: Re: memory management weirdness


* Andi Kleen wrote:

> > Although I've not re-tested this today again, it used to help a bit to specify
> > mem=3548M to decrease memory used by linux (tested with AGP card plugged in, when
> > bios reported 3556MB RAM only).
> >
> > I found that removing the AGP based video card and using an old PCI based
> > video card results in bios detecting 4072MB of RAM. But still, the machine was
> > slow. I've tried to "cat >| /proc/mtrr" to alter the memory settings, but the
> > result was only a partial speedup.
> >
> > I'm not sure how to convince linux kernel to run fast again.
>
> It's most likely a MTRR problem. Play more with them.

in particular, try to create two small tables in the same format: one
showing the e820 memory map as reported in your kernel log, and one
showing the mtrr areas. If there is any e820 area that is not write-back
cached via the mtrr mappings then that's the problem. You can also use
"mem=exactmap,..." to fix up the memory map that the BIOS provides to
Linux. Slowdowns are very often such MTRR problems. (perhaps the kernel
should report RAM areas that are not covered by MTRR write-back?)
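If a RAM area does turn out to lack write-back coverage, adding MTRR entries for it is constrained by the hardware: a variable-range MTRR describes a block whose size is a power of two and whose base is aligned to that size. The following is a minimal sketch of splitting a gap into such blocks. The function name `mtrr_blocks` is hypothetical, the range used is only an example, and the printed `echo` lines assume the write syntax described in the kernel's MTRR documentation.

```python
# Sketch: split a byte range into blocks a variable-range MTRR can describe
# (power-of-two size, base aligned to that size) and print /proc/mtrr lines.
# The example range is an assumption chosen for illustration.

def mtrr_blocks(start, end, min_size=0x1000):
    """Greedily cover [start, end) with aligned power-of-two blocks."""
    if start % min_size or end % min_size:
        raise ValueError("range must be aligned to min_size")
    blocks = []
    while start < end:
        # Largest power of two dividing `start`; unbounded when start is 0.
        align = start & -start if start else 1 << end.bit_length()
        size = align
        # Shrink until the block fits in what is left of the range.
        while size > end - start:
            size >>= 1
        blocks.append((start, size))
        start += size
    return blocks


for base, size in mtrr_blocks(0xde000000, 0xde330000):
    print(f'echo "base={base:#x} size={size:#x} type=write-back" >| /proc/mtrr')
```

Each block consumes one MTRR register, and only a few are available, so a fragmented gap may not be coverable this way; limiting the memory the kernel uses may then be the simpler option.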

Ingo

2005-02-22 09:58:47

by Martin Mokrejs

Subject: Re: memory management weirdness

Parag Warudkar wrote:
>>Hi,
>> I have received no answer to my former question
>>(see http://marc.theaimsgroup.com/?l=linux-kernel&m=110827143716215&w=2).
>>I've spent some more time on that problem and have more or less confirmed
>>it's because of buggy bios. However, the linux kernel doesn't handle properly
>>such case. I've tested 2.4.30-pre1 kernel and latest 2.6.11-rc4 kernel.
>>The conclusion is, that once the machine has physically installed 4x1GB
>>DDR400 DIMM's (bios detects only 3556 or less memory as some buffers
>>are allocated by the Intel 875P chipset and AGP card), the linux 2.6.11*
>>runs up-to 18x slower than when only 2x1GB + 2x 512MB DDR memory is installed.
>>
>
> Can you enable profiling and then post the profile info for various cases
> - slow and fast? Check out Documentation/basic_profiling.txt in the kernel
> source for understanding how to do this. This might help narrow down the issue.

http://www.natur.cuni.cz/~mmokrejs/tmp/profile-2.6.11-rc4-bk7-(3|4)GB.txt

The file labeled 3GB corresponds to the fast case; the 4GB one is the ugly slow case.
What can you gather from those files? I've used readprofile, but oprofile
was also enabled in the kernel. I've also left /proc/profile snapshots on the web,
along with the System.map file. Maybe oprofile can be used later to extract info from them.
Many thanks for help!
Martin

2005-02-22 10:14:59

by Martin Mokrejs

Subject: Re: memory management weirdness

Ingo Molnar wrote:
> * Andi Kleen wrote:
>
>
>>> Although I've not re-tested this today again, it used to help a bit to specify
>>>mem=3548M to decrease memory used by linux (tested with AGP card plugged in, when
>>>bios reported 3556MB RAM only).
>>>
>>> I found that removing the AGP based video card and using an old PCI based
>>>video card results in bios detecting 4072MB of RAM. But still, the machine was
>>>slow. I've tried to "cat >| /proc/mtrr" to alter the memory settings, but the
>>>result was only a partial speedup.
>>>
>>> I'm not sure how to convince linux kernel to run fast again.
>>
>>It's most likely a MTRR problem. Play more with them.
>
>
> in particular, try to create two small tables in the same format: one
> showing the e820 memory map as reported in your kernel log, and one
> showing the mtrr areas. If there is any e820 area that is not write-back
> cached via the mtrr mappings then that's the problem. You can also use
> "mem=exactmap,..." to fix up the memory map that the BIOS provides to
> Linux. Slowdowns are very often such MTRR problems. (perhaps the kernel
> should report RAM areas that are not covered by MTRR write-back?)

I've just extracted the requested info from the files I've put on web.
Here it is:

                                                  2.4.30-pre1-bk5  2.6.11-rc4-bk7
0000000000000000 - 000000000009fc00 (usable)             +                +
000000000009fc00 - 00000000000a0000 (reserved)           +                +
00000000000e8000 - 0000000000100000 (reserved)           +                +
0000000000100000 - 00000000de330000 (usable)             +                +
00000000de330000 - 00000000de340000 (ACPI data)          +                +
00000000de340000 - 00000000de3f0000 (ACPI NVS)           +                +
00000000de3f0000 - 00000000de400000 (reserved)           +                +
00000000ffb80000 - 0000000100000000 (reserved)           +                +
found SMP MP-table at 000ff780                           +                +
hm, page 000ff000 reserved twice.                        +                -   ???
hm, page 00100000 reserved twice.                        +                -   ???
hm, page 000f1000 reserved twice.                        +                -   ???
hm, page 000f2000 reserved twice.                        +                -   ???




                                                                        2.4.30-pre1-bk5  2.6.11-rc4-bk7
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1              +                +
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1              +                +
reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1              +                +
reg03: base=0xd0000000 (3328MB), size= 128MB: write-back, count=1              +                +
reg04: base=0xd8000000 (3456MB), size=  64MB: write-back, count=1              +                +
reg05: base=0xdc000000 (3520MB), size=  32MB: write-back, count=1              +                +
reg06: base=0xfe800000 (4072MB), size=   4MB: write-combining, count=1         +                -   !!!
reg06: base=0xf0000000 (3840MB), size= 128MB: write-combining, count=1         +                +
The 4MB area should be AGP aperture, as it was set in BIOS to 4MB only
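
The comparison of the two tables can also be done mechanically. The following is a minimal sketch, assuming the e820 "usable" ranges and the write-back MTRR entries are copied by hand from the tables above; the helper names (`merge`, `uncovered`) are hypothetical.

```python
# Sketch: find e820 "usable" RAM that no write-back MTRR covers.
# Ranges are half-open [start, end) byte addresses copied from the tables above.

MB = 1 << 20

# e820 "usable" ranges from the kernel log
E820_USABLE = [
    (0x0000000000000000, 0x000000000009fc00),
    (0x0000000000100000, 0x00000000de330000),
]

# (base, size in MB) of the write-back entries in /proc/mtrr
MTRR_WRITE_BACK = [
    (0x00000000, 2048),
    (0x80000000, 1024),
    (0xc0000000, 256),
    (0xd0000000, 128),
    (0xd8000000, 64),
    (0xdc000000, 32),
]


def merge(ranges):
    """Merge overlapping or adjacent [start, end) ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered(ram, cover):
    """Return the parts of the `ram` ranges that no `cover` range contains."""
    cover = merge(cover)
    gaps = []
    for start, end in ram:
        pos = start
        for c_start, c_end in cover:
            if c_end <= pos:
                continue
            if c_start >= end:
                break
            if c_start > pos:
                gaps.append((pos, c_start))
            pos = max(pos, c_end)
        if pos < end:
            gaps.append((pos, end))
    return gaps


write_back = [(base, base + size_mb * MB) for base, size_mb in MTRR_WRITE_BACK]
gaps = uncovered(E820_USABLE, write_back)
for start, end in gaps:
    print(f"usable but not write-back: {start:#010x}-{end:#010x} "
          f"({(end - start) // 1024} KB)")
```

With the values as posted, the sketch flags 0xde000000-0xde330000 (3264 KB), the tail of the large usable range above the 3552MB end of the last write-back register. Whether that gap explains the slowdown is not established here.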

The files on the web contain concatenated info from dmesg, iomem, interrupts, mtrr and lspci:
http://www.natur.cuni.cz/~mmokrejs/tmp/4MB/2.4.30-pre1-bk5
http://www.natur.cuni.cz/~mmokrejs/tmp/4MB/2.6.11-rc4-bk7


So, the 2.6 kernel does not see the AGP aperture area. What to do next? ;)
Martin

2005-02-24 02:24:35

by Parag Warudkar

Subject: Re: memory management weirdness

On Tuesday 22 February 2005 04:57 am, Martin MOKREJŠ wrote:
> The 3GB labeled file corresponds to fast case, 4GB is ugly slow.
> What can you gather from those files?
I did take a look but didn't analyze it further, since Andi mentioned it is a
known BIOS bug.
Sorry about the trouble - didn't imagine it might be BIOS related. Generally
speaking it helps to have profile available when things are going slow.

Parag