2007-06-21 17:42:46

by Peter Rabbitson

Subject: Terrible IO performance when using 4GB of RAM on a 32 bit machine

Hello everyone,

I hope this question is not too basic for the intended audience. I have
a server with an Intel SE7210TP1-E motherboard[1] and a single 3.4GHz P4
CPU[2]. I am currently running a vanilla 2.6.21.5 kernel with SMP/HT.
Two patches are applied: one is a SATA driver[3] and the other is a
stable kernel bugfix[4]. When I upgraded my memory to 4GB I experienced
abysmal performance at the IO layer across all block devices. CPU and
network performance seem to be unaffected. Booting the system with
mem=3900M fixes the issue, but I wanted to get to the root of it.

I have a 4 drive raid array with LVM on top, which after tuning
consistently delivers 210MB/s sequential read performance. If I omit the
mem option and let the kernel autodetect all 4GB of memory, performance
drops to about 4MB/s. A drop to 1MB/s is also observed on direct disk
access (dd if=/dev/sdX). Two of the disks are connected to the on-board
SATA ports and two to a Highpoint RocketRaid 1820A Card. A backup hard
drive connected to the on-board IDE is suffering the same problems, so
it should be something more fundamental.

What I have tried so far in various combinations:

o a kernel without SMP/HT support
o a kernel with HIGHPTE=y
o different timer frequencies (100, 1000)
o older kernel version (2.6.18)

I have captured dmesg output without mem[5], with mem=3900M[6] and
mem=2048M[7].

Overall, the system has not exhibited any problems in the 2 years it has
been operating, and it seems to run fine with mem=3900M, which has been
in effect for about a month now. I would appreciate any suggestions on
how to troubleshoot this further, or requests for additional information
about the system.

TIA

Peter

[1]
http://download.intel.com/support/motherboards/server/se7210tp1-e/sb/tpsse7210tp1e20.pdf
[2] http://rabbit.us/pool/4g/cpuinfo.txt
[3]
http://www.highpoint-tech.com/BIOS_Driver/rr1820a/Linux/rr18xx-opensource-v1.17-0123.tgz
[4] http://www.mail-archive.com/[email protected]/msg08186.html
[5] http://rabbit.us/pool/4g/dmesg_nomem.txt
[6] http://rabbit.us/pool/4g/dmesg_3900.txt
[7] http://rabbit.us/pool/4g/dmesg_2048.txt

A diff of [6] and [5] follows:

--- dmesg_3900.txt 2007-06-21 19:04:33.000000000 +0200
+++ dmesg_nomem.txt 2007-06-21 19:04:36.000000000 +0200
@@ -20,41 +20,24 @@
BIOS-e820: 00000000fbeff000 - 00000000fbf00000 (ACPI NVS)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
- limit_regions start: 0000000000000000 - 000000000009dc00 (usable)
- limit_regions start: 000000000009dc00 - 00000000000a0000 (reserved)
- limit_regions start: 00000000000e4000 - 0000000000100000 (reserved)
- limit_regions start: 0000000000100000 - 00000000fbef0000 (usable)
- limit_regions start: 00000000fbef0000 - 00000000fbeff000 (ACPI data)
- limit_regions start: 00000000fbeff000 - 00000000fbf00000 (ACPI NVS)
- limit_regions start: 00000000fee00000 - 00000000fee01000 (reserved)
- limit_regions start: 00000000ffb00000 - 0000000100000000 (reserved)
- limit_regions endfor: 0000000000000000 - 000000000009dc00 (usable)
- limit_regions endfor: 000000000009dc00 - 00000000000a0000 (reserved)
- limit_regions endfor: 00000000000e4000 - 0000000000100000 (reserved)
- limit_regions endfor: 0000000000100000 - 00000000f3c00000 (usable)
-user-defined physical RAM map:
- user: 0000000000000000 - 000000000009dc00 (usable)
- user: 000000000009dc00 - 00000000000a0000 (reserved)
- user: 00000000000e4000 - 0000000000100000 (reserved)
- user: 0000000000100000 - 00000000f3c00000 (usable)
-3004MB HIGHMEM available.
+3134MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000ff780
-Entering add_active_range(0, 0, 998400) 0 entries of 256 used
+Entering add_active_range(0, 0, 1031920) 0 entries of 256 used
Zone PFN ranges:
DMA 0 -> 4096
Normal 4096 -> 229376
- HighMem 229376 -> 998400
+ HighMem 229376 -> 1031920
early_node_map[1] active PFN ranges
- 0: 0 -> 998400
-On node 0 totalpages: 998400
+ 0: 0 -> 1031920
+On node 0 totalpages: 1031920
DMA zone: 32 pages used for memmap
DMA zone: 0 pages reserved
DMA zone: 4064 pages, LIFO batch:0
Normal zone: 1760 pages used for memmap
Normal zone: 223520 pages, LIFO batch:31
- HighMem zone: 6008 pages used for memmap
- HighMem zone: 763016 pages, LIFO batch:31
+ HighMem zone: 6269 pages used for memmap
+ HighMem zone: 796275 pages, LIFO batch:31
DMI 2.3 present.
ACPI: RSDP 000F87E0, 0014 (r0 ACPIAM)
ACPI: RSDT FBEF0000, 0030 (r1 A M I OEMRSDT 7000529 MSFT 97)
@@ -83,9 +66,9 @@
ACPI: IRQ10 used by override.
Enabling APIC mode: Flat. Using 2 I/O APICs
Using ACPI (MADT) for SMP configuration information
-Allocating PCI resources starting at f4000000 (gap: f3c00000:0c400000)
-Built 1 zonelists. Total pages: 990600
-Kernel command line: root=/dev/md0 ro vga=307 mem=3900M
+Allocating PCI resources starting at fc000000 (gap: fbf00000:02f00000)
+Built 1 zonelists. Total pages: 1023859
+Kernel command line: root=/dev/md0 ro vga=307
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
mapped IOAPIC to ffffb000 (fec10000)
@@ -93,11 +76,11 @@
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 16384 bytes)
-Detected 3391.614 MHz processor.
+Detected 3391.760 MHz processor.
Console: colour VGA+ 132x44
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
-Memory: 3955644k/3993600k available (2894k kernel code, 36912k
reserved, 930k data, 252k init, 3076096k highmem)
+Memory: 4088536k/4127680k available (2894k kernel code, 37968k
reserved, 930k data, 252k init, 3210176k highmem)
virtual kernel memory layout:
fixmap : 0xfff83000 - 0xfffff000 ( 496 kB)
pkmap : 0xff800000 - 0xffc00000 (4096 kB)
@@ -107,7 +90,7 @@
.data : 0xc03d3a6f - 0xc04bc34c ( 930 kB)
.text : 0xc0100000 - 0xc03d3a6f (2894 kB)
Checking if this processor honours the WP bit even in supervisor
mode... Ok.
-Calibrating delay using timer specific routine.. 6785.46 BogoMIPS
(lpj=3392734)
+Calibrating delay using timer specific routine.. 6785.48 BogoMIPS
(lpj=3392744)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000
0000441d 00000000 00000000
monitor/mwait feature present.
@@ -126,7 +109,7 @@
CPU0: Intel(R) Pentium(R) 4 CPU 3.40GHz stepping 04
Booting processor 1/1 eip 2000
Initializing CPU#1
-Calibrating delay using timer specific routine.. 6782.08 BogoMIPS
(lpj=3391040)
+Calibrating delay using timer specific routine.. 6782.06 BogoMIPS
(lpj=3391032)
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000
0000441d 00000000 00000000
monitor/mwait feature present.
CPU: Trace cache: 12K uops, L1 D cache: 16K
@@ -138,12 +121,14 @@
CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
CPU1: Thermal monitoring enabled
CPU1: Intel(R) Pentium(R) 4 CPU 3.40GHz stepping 04
-Total of 2 processors activated (13567.54 BogoMIPS).
+Total of 2 processors activated (13567.55 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
+APIC calibration not consistent with PM Timer: 118ms instead of 100ms
+APIC delta adjusted to PM-Timer: 1246875 (1483551)
checking TSC synchronization [CPU#0 -> CPU#1]: passed.
Brought up 2 CPUs
-migration_cost=13
+migration_cost=312
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xf0031, last bus=3
@@ -180,13 +165,13 @@
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post
a report
pnp: 00:0a: ioport range 0x680-0x6ff has been reserved
-Time: tsc clocksource has been installed.
pnp: 00:0a: ioport range 0x295-0x296 has been reserved
pnp: 00:0b: iomem range 0xfec10000-0xfec1ffff has been reserved
pnp: 00:0b: iomem range 0xfed20000-0xfed8ffff has been reserved
-pnp: 00:0b: iomem range 0xffb00000-0xffbfffff has been reserved
+pnp: 00:0b: iomem range 0xffb00000-0xffbfffff could not be reserved
pnp: 00:0c: iomem range 0xfec00000-0xfec00fff has been reserved
-pnp: 00:0c: iomem range 0xfee00000-0xfee00fff has been reserved
+pnp: 00:0c: iomem range 0xfee00000-0xfee00fff could not be reserved
+Time: tsc clocksource has been installed.
pnp: 00:0d: iomem range 0x0-0x9ffff could not be reserved
pnp: 00:0d: iomem range 0xc0000-0xdffff could not be reserved
pnp: 00:0d: iomem range 0xe0000-0xfffff could not be reserved
@@ -202,7 +187,7 @@
PCI: Bridge: 0000:00:1e.0
IO window: c000-cfff
MEM window: fc600000-fe6fffff
- PREFETCH window: f4000000-f40fffff
+ PREFETCH window: fc000000-fc0fffff
PCI: Setting latency timer of device 0000:00:1e.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
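
As a cross-check of the figures in the diff above, the short sketch below converts the mem=3900M limit into an end address and a page frame number, and does the same for the end of the last usable e820 region (0xfbef0000). The 4 kB page size is an assumption that holds for this 32-bit x86 configuration; the helper name is made up for illustration.

```python
PAGE_SIZE = 4096  # assumed: 4 kB pages, as on 32-bit x86 without large pages


def mem_limit_to_pfn(limit_mb):
    """Return (end_address, end_pfn) for a hypothetical mem=<limit_mb>M option."""
    end_addr = limit_mb * 1024 * 1024
    return end_addr, end_addr // PAGE_SIZE


# mem=3900M: matches "limit_regions endfor ... f3c00000" and PFN 998400
end_addr, end_pfn = mem_limit_to_pfn(3900)
print(hex(end_addr), end_pfn)

# No mem= option: the last usable e820 region ends at 0xfbef0000
full_pfn = 0xfbef0000 // PAGE_SIZE
print(full_pfn)

# RAM hidden from the kernel by mem=3900M, in kB
hidden_kb = (full_pfn - end_pfn) * PAGE_SIZE // 1024
print(hidden_kb)
```

The two PFN values agree with the add_active_range() lines in the two dmesg captures, and the difference agrees with the gap between the two "Memory:" totals (4127680k versus 3993600k).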


2007-06-21 19:27:16

by H. Peter Anvin

Subject: Re: Terrible IO performance when using 4GB of RAM on a 32 bit machine

Peter Rabbitson wrote:
>
> I have captured dmesg output without mem[5], with mem=3900M[6] and
> mem=2048M[7].
>

What does /proc/mtrr look like in the two cases?

-hpa

2007-06-21 23:02:26

by Peter Rabbitson

Subject: Re: Terrible IO performance when using 4GB of RAM on a 32 bit machine

H. Peter Anvin wrote:
> Peter Rabbitson wrote:
>> I have captured dmesg output without mem[5], with mem=3900M[6] and
>> mem=2048M[7].
>>
>
> What does /proc/mtrr look like in the two cases?
>

Identical for mem=3900 and without it.

reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size= 512MB: write-back, count=1
reg03: base=0xe0000000 (3584MB), size= 256MB: write-back, count=1
reg04: base=0xf0000000 (3840MB), size= 128MB: write-back, count=1
reg05: base=0xf8000000 (3968MB), size= 32MB: write-back, count=1


Peter

2007-06-22 00:02:21

by Robert Hancock

Subject: Re: Terrible IO performance when using 4GB of RAM on a 32 bit machine

Peter Rabbitson wrote:
> H. Peter Anvin wrote:
>> Peter Rabbitson wrote:
>>> I have captured dmesg output without mem[5], with mem=3900M[6] and
>>> mem=2048M[7].
>>>
>>
>> What does /proc/mtrr look like in the two cases?
>>
>
> Identical for mem=3900 and without it.
>
> reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
> reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
> reg02: base=0xc0000000 (3072MB), size= 512MB: write-back, count=1
> reg03: base=0xe0000000 (3584MB), size= 256MB: write-back, count=1
> reg04: base=0xf0000000 (3840MB), size= 128MB: write-back, count=1
> reg05: base=0xf8000000 (3968MB), size= 32MB: write-back, count=1

Looks like another case of bad MTRRs on an Intel motherboard? The BIOS
is marking only memory up to 4000MB as cacheable, but the actual memory
extends up to about 4031MB. Therefore anything that accesses the top
31MB of memory will run very slow.
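
The arithmetic behind this can be sketched as follows. The script is only an illustration: it parses the /proc/mtrr text quoted above, finds where the contiguous write-back coverage starting at address 0 ends, and compares that with the end of the last usable e820 region from the posted dmesg (0xfbef0000). The function name and the regular expression are assumptions made for this example, not part of any kernel tool.

```python
import re

# /proc/mtrr contents as posted in this thread
MTRR_TEXT = """\
reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size= 512MB: write-back, count=1
reg03: base=0xe0000000 (3584MB), size= 256MB: write-back, count=1
reg04: base=0xf0000000 (3840MB), size= 128MB: write-back, count=1
reg05: base=0xf8000000 (3968MB), size= 32MB: write-back, count=1
"""

LINE_RE = re.compile(
    r"base=0x([0-9a-fA-F]+)\s*\(\s*\d+MB\),\s*size=\s*(\d+)MB:\s*([\w-]+)"
)


def writeback_top_mb(text):
    """End (in MB) of the contiguous write-back range that starts at 0."""
    ranges = []
    for m in LINE_RE.finditer(text):
        if m.group(3) == "write-back":
            base_mb = int(m.group(1), 16) >> 20
            ranges.append((base_mb, int(m.group(2))))
    top = 0
    for base_mb, size_mb in sorted(ranges):
        if base_mb > top:
            break  # gap in coverage: stop at the first hole
        top = max(top, base_mb + size_mb)
    return top


E820_USABLE_END = 0xfbef0000  # end of last usable region in the posted dmesg

covered_mb = writeback_top_mb(MTRR_TEXT)
ram_top_mb = E820_USABLE_END / 2**20
uncached_mb = ram_top_mb - covered_mb

print(covered_mb)             # 4000
print(round(ram_top_mb))      # 4031
print(round(uncached_mb, 1))  # roughly 31 MB of RAM outside write-back coverage
```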

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-06-22 08:29:21

by Peter Rabbitson

Subject: Re: Terrible IO performance when using 4GB of RAM on a 32 bit machine

Robert Hancock wrote:
> Peter Rabbitson wrote:
>> H. Peter Anvin wrote:
>>>
>>> What does /proc/mtrr look like in the two cases?
>>>
>>
>> Identical for mem=3900 and without it.
>>
>> reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
>> reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
>> reg02: base=0xc0000000 (3072MB), size= 512MB: write-back, count=1
>> reg03: base=0xe0000000 (3584MB), size= 256MB: write-back, count=1
>> reg04: base=0xf0000000 (3840MB), size= 128MB: write-back, count=1
>> reg05: base=0xf8000000 (3968MB), size= 32MB: write-back, count=1
>
> Looks like another case of bad MTRRs on an Intel motherboard? The BIOS
> is marking only memory up to 4000MB as cacheable, but the actual memory
> extends up to about 4031MB. Therefore anything that accesses the top
> 31MB of memory will run very slow.
>

Ah, it all makes sense now. In this case I assume mem=4000 is perfectly
safe and usable for the time being. In the beginning I tried with
mem=4g, which obviously did not work. If anyone is interested in adding
an exception/workaround for this particular motherboard, I'd be happy to
help with testing. I have added more information about the system:
current kernel config [1], output of `lspci -vv`[2], dmesg with mem=4000[3].

Thank you!

Peter

[1] http://rabbit.us/pool/4g/config-2.6.21.5.arzamas.6.txt
[2] http://rabbit.us/pool/4g/lspci_4000.txt
[3] http://rabbit.us/pool/4g/dmesg_4000.txt

2007-06-22 14:38:33

by Robert Hancock

Subject: Re: Terrible IO performance when using 4GB of RAM on a 32 bit machine

Peter Rabbitson wrote:
> Robert Hancock wrote:
>> Peter Rabbitson wrote:
>>> H. Peter Anvin wrote:
>>>>
>>>> What does /proc/mtrr look like in the two cases?
>>>>
>>>
>>> Identical for mem=3900 and without it.
>>>
>>> reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
>>> reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
>>> reg02: base=0xc0000000 (3072MB), size= 512MB: write-back, count=1
>>> reg03: base=0xe0000000 (3584MB), size= 256MB: write-back, count=1
>>> reg04: base=0xf0000000 (3840MB), size= 128MB: write-back, count=1
>>> reg05: base=0xf8000000 (3968MB), size= 32MB: write-back, count=1
>>
>> Looks like another case of bad MTRRs on an Intel motherboard? The BIOS
>> is marking only memory up to 4000MB as cacheable, but the actual
>> memory extends up to about 4031MB. Therefore anything that accesses
>> the top 31MB of memory will run very slow.
>>
>
> Ah, it all makes sense now. In this case I assume mem=4000 is perfectly
> safe and usable for the time being. In the beginning I tried with
> mem=4g, which obviously did not work. If anyone is interested in adding
> an exception/workaround for this particular motherboard, I'd be happy to
> help with testing. I have added more information about the system:
> current kernel config [1], output of `lspci -vv`[2], dmesg with
> mem=4000[3].
>
> Thank you!
>
> Peter

There was a patch floating around recently to detect the case where the
MTRRs don't map all of RAM as write-back, automatically cap the memory
used by the kernel to what is mapped, and print some loud warnings.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-06-23 06:56:53

by Peter Rabbitson

Subject: Re: Terrible IO performance when using 4GB of RAM on a 32 bit machine [solved]

Robert Hancock wrote:
> Peter Rabbitson wrote:
>>
>> reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
>> reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
>> reg02: base=0xc0000000 (3072MB), size= 512MB: write-back, count=1
>> reg03: base=0xe0000000 (3584MB), size= 256MB: write-back, count=1
>> reg04: base=0xf0000000 (3840MB), size= 128MB: write-back, count=1
>> reg05: base=0xf8000000 (3968MB), size= 32MB: write-back, count=1
>
> Looks like another case of bad MTRRs on an Intel motherboard? The BIOS
> is marking only memory up to 4000MB as cacheable, but the actual memory
> extends up to about 4031MB. Therefore anything that accesses the top
> 31MB of memory will run very slow.

I sincerely apologize for not paying enough attention. Intel has fixed
this issue 2 BIOS revisions ago[1], kudos to Suren Karapetyan for
pointing this out. I just upgraded the BIOS and it indeed solves the
problem. The mtrr still seems not to be going over 4000MB, but
everything works without any visible slowdowns at all. Here is what the
mtrr and the relevant dmesg line look like now:

reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size= 512MB: write-back, count=1
reg03: base=0xe0000000 (3584MB), size= 256MB: write-back, count=1
reg04: base=0xf0000000 (3840MB), size= 128MB: write-back, count=1
(there is no reg05)
...
Memory: 4024568k/4063168k available (2894k kernel code, 37456k reserved,
930k data, 252k init, 3145664k highmem)
...
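
A quick check of these numbers, as a sketch only: it sums the five remaining MTRR sizes and compares the result with the total from the "Memory:" line. It assumes that this total approximates the top of usable RAM, which held for the earlier captures (4127680k equals 1031920 pages of 4 kB), but is not guaranteed in general.

```python
def covered_and_total_mb(mtrr_sizes_mb, dmesg_total_kb):
    """Return (MB covered by write-back MTRRs, MB of RAM reported by dmesg)."""
    return sum(mtrr_sizes_mb), dmesg_total_kb / 1024


# reg00..reg04 after the BIOS update, and the total from the "Memory:" line
covered_mb, total_mb = covered_and_total_mb([2048, 1024, 512, 256, 128], 4063168)

print(covered_mb, total_mb)    # 3968 3967.9375
print(total_mb <= covered_mb)  # True: reported RAM fits inside the coverage
```

Under that assumption, the new BIOS appears to report slightly less RAM (just under 3968MB) rather than extending the MTRRs, so no usable memory is left outside the write-back ranges.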

Thank you for the help!

Peter

[1]
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=N&Inst=Yes&ProductID=1640&DwnldID=11261&strOSs=All&OSFullName=All%20Operating%20Systems&lang=eng