2007-11-03 12:52:22

by Bo Branten

Subject: x86_64 ten times slower than i386


Hello,

I tried different Linux distributions on a computer with an Intel Core 2 Quad
and noticed that the 64-bit versions were at least 10 times slower than the
32-bit versions: booting the system took over 20 minutes in 64-bit mode, and
even scrolling text at the command prompt felt slow. Vista 64, however,
boots as fast as Vista 32 on the same computer. So I would like to ask whether
this is a known problem or whether there is some simple mistake I may have made.

I used live CDs from Gentoo and Ubuntu, and at least Ubuntu has a rather new
kernel. dmesg doesn't show anything strange either; for example, the bogomips
figure is the same as when booting in 32-bit mode.


2007-11-03 16:27:16

by Matt Mackall

Subject: Re: x86_64 ten times slower than i386

On Sat, Nov 03, 2007 at 01:31:49PM +0100, Bo Brantén wrote:
>
> Hello,
>
> I tried different Linux distributions on a computer with an Intel Core 2
> Quad and noticed that the 64-bit versions were at least 10 times slower
> than the 32-bit versions: booting the system took over 20 minutes in 64-bit
> mode, and even scrolling text at the command prompt felt slow. Vista 64,
> however, boots as fast as Vista 32 on the same computer. So I would like to
> ask whether this is a known problem or whether there is some simple mistake
> I may have made.
>
> I used live CDs from Gentoo and Ubuntu, and at least Ubuntu has a rather
> new kernel. dmesg doesn't show anything strange either; for example, the
> bogomips figure is the same as when booting in 32-bit mode.

This is typically due to a problem with the setup of your MTRRs. Try
booting with mem=nnnM where nnn is some number smaller than your
actual amount of memory.

--
Mathematics is the supreme nostalgia of our time.

2007-11-03 22:38:43

by Bo Branten

Subject: Re: x86_64 ten times slower than i386

On Sat, 3 Nov 2007, Matt Mackall wrote:

> This is typically due to a problem with the setup of your MTRRs. Try
> booting with mem=nnnM where nnn is some number smaller than your
> actual amount of memory.

Thank you for that advice. The system has 4GB, and if I boot with mem=3072M
it runs at normal speed, while without the mem option it runs 10 times
slower. However, if I use a figure like mem=3500M the kernel panics. Is
there any way to determine the highest usable figure without trial and
error?

2007-11-03 22:54:53

by Matt Mackall

Subject: Re: x86_64 ten times slower than i386

On Sat, Nov 03, 2007 at 11:38:24PM +0100, Bo Brantén wrote:
> On Sat, 3 Nov 2007, Matt Mackall wrote:
>
> >This is typically due to a problem with the setup of your MTRRs. Try
> >booting with mem=nnnM where nnn is some number smaller than your
> >actual amount of memory.
>
> Thank you for that advice. The system has 4GB, and if I boot with mem=3072M
> it runs at normal speed, while without the mem option it runs 10 times
> slower. However, if I use a figure like mem=3500M the kernel panics. Is
> there any way to determine the highest usable figure without trial and
> error?

This is not really my area, but I suspect if you send us your dmesg
output, someone here will be able to tell you how to optimize things.
How much memory does the system report at normal boot? It's not
uncommon for BIOSes to do the wrong thing with memory approaching 4G,
even on supposedly 64-bit boxes.

Also, please send us your panic message (take a digital photo if you
need to), as that shouldn't happen either.

--
Mathematics is the supreme nostalgia of our time.

2007-11-03 23:30:43

by H. Peter Anvin

Subject: Re: x86_64 ten times slower than i386

Bo Brantén wrote:
> On Sat, 3 Nov 2007, Matt Mackall wrote:
>
>> This is typically due to a problem with the setup of your MTRRs. Try
>> booting with mem=nnnM where nnn is some number smaller than your
>> actual amount of memory.
>
> Thank you for that advice. The system has 4GB, and if I boot with
> mem=3072M it runs at normal speed, while without the mem option it runs
> 10 times slower. However, if I use a figure like mem=3500M the kernel
> panics. Is there any way to determine the highest usable figure without
> trial and error?

Yes, look at how your MTRRs are set up (cat /proc/mtrr).

-hpa

2007-11-05 08:06:42

by Joseph Fannin

Subject: Re: x86_64 ten times slower than i386

On Sat, Nov 03, 2007 at 11:38:24PM +0100, Bo Brantén wrote:
> On Sat, 3 Nov 2007, Matt Mackall wrote:
>
>> This is typically due to a problem with the setup of your MTRRs. Try
>> booting with mem=nnnM where nnn is some number smaller than your
>> actual amount of memory.
>
> Thank you for that advice, the system has 4GB and if I boot with mem=3072M
> it will run as fast as normal

Also, check if a BIOS upgrade is available -- it's possible that a
newer BIOS will have fixed this.

--
Joseph Fannin

2007-11-05 10:15:26

by Bo Branten

Subject: Re: x86_64 ten times slower than i386


> On Sat, 3 Nov 2007, Matt Mackall wrote:
>
>> This is typically due to a problem with the setup of your MTRRs. Try

This is the output from cat /proc/mtrr

reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1
reg03: base=0xcf800000 (3320MB), size= 8MB: uncachable, count=1
reg04: base=0xcf700000 (3319MB), size= 1MB: uncachable, count=1

2007-11-05 16:00:27

by Bo Branten

Subject: Re: x86_64 ten times slower than i386


After I upgraded the BIOS, the MTRRs look like below. It now works if
I boot with mem=4736M, so I can use all memory, but it still doesn't work
without the mem parameter: then it runs as slowly as before.

reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1
reg03: base=0xcf800000 (3320MB), size= 8MB: uncachable, count=1
reg04: base=0xcf700000 (3319MB), size= 1MB: uncachable, count=1
reg05: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg06: base=0x120000000 (4608MB), size= 128MB: write-back, count=1

2007-11-05 17:27:38

by H. Peter Anvin

Subject: Re: x86_64 ten times slower than i386

Bo Brantén wrote:
>
>> On Sat, 3 Nov 2007, Matt Mackall wrote:
>>
>>> This is typically due to a problem with the setup of your MTRRs. Try
>
> This is the output from cat /proc/mtrr
>
> reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
> reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
> reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1
> reg03: base=0xcf800000 (3320MB), size= 8MB: uncachable, count=1
> reg04: base=0xcf700000 (3319MB), size= 1MB: uncachable, count=1

Confusing! Your maximum would be 3072+256M = 3328M, except that because
of the uncachable MTRRs (presumably stolen memory for video), the
maximum is 3319M.
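
The arithmetic above can be sketched in Python. This is only an
illustration, not a kernel or tool interface: the names parse_mtrr and
cached_limit are made up here, the regex assumes the /proc/mtrr line format
quoted in this thread (sizes in MB only), and the rule applied is the usual
MTRR precedence, where an uncachable entry overrides an overlapping
write-back one.

```python
import re

# Matches lines in the /proc/mtrr format quoted in this thread, e.g.
# "reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1"
MTRR_LINE = re.compile(
    r"reg\d+: base=0x([0-9a-fA-F]+) \(\s*\d+MB\), "
    r"size=\s*(\d+)MB: ([\w-]+), count=\d+"
)

MB = 1 << 20

# The five registers posted before the BIOS upgrade.
SAMPLE = """\
reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1
reg03: base=0xcf800000 (3320MB), size= 8MB: uncachable, count=1
reg04: base=0xcf700000 (3319MB), size= 1MB: uncachable, count=1
"""


def parse_mtrr(text):
    """Return a list of (start, end, type) tuples, addresses in bytes."""
    regions = []
    for m in MTRR_LINE.finditer(text):
        base = int(m.group(1), 16)
        size = int(m.group(2)) * MB
        regions.append((base, base + size, m.group(3)))
    return regions


def cached_limit(regions):
    """Highest address reachable from 0 through contiguous write-back
    coverage, cut short by the lowest uncachable region inside it."""
    end = 0
    # Extend the write-back run upward from address 0.
    for start, stop, kind in sorted(regions):
        if kind == "write-back" and start <= end:
            end = max(end, stop)
    # Uncachable entries override write-back, so they lower the limit.
    for start, stop, kind in regions:
        if kind == "uncachable" and start < end:
            end = min(end, start)
    return end
```

With the five registers above, cached_limit(parse_mtrr(SAMPLE)) // MB
gives 3319, the same figure as in the message. The sketch only follows the
first contiguous run from address 0, so it says nothing about write-back
ranges above 4G.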

-hpa

2007-11-05 17:27:55

by H. Peter Anvin

Subject: Re: x86_64 ten times slower than i386

Bo Brantén wrote:
>
> After I upgraded the BIOS, the MTRRs look like below. It now works
> if I boot with mem=4736M, so I can use all memory, but it still doesn't
> work without the mem parameter: then it runs as slowly as before.
>
> reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
> reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
> reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1
> reg03: base=0xcf800000 (3320MB), size= 8MB: uncachable, count=1
> reg04: base=0xcf700000 (3319MB), size= 1MB: uncachable, count=1
> reg05: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
> reg06: base=0x120000000 (4608MB), size= 128MB: write-back, count=1

What does your e820 map look like?

-hpa

2007-11-05 18:46:38

by Bo Branten

Subject: Re: x86_64 ten times slower than i386

On Mon, 5 Nov 2007, H. Peter Anvin wrote:

>> reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
>> reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
>> reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1
>> reg03: base=0xcf800000 (3320MB), size= 8MB: uncachable, count=1
>> reg04: base=0xcf700000 (3319MB), size= 1MB: uncachable, count=1
>> reg05: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
>> reg06: base=0x120000000 (4608MB), size= 128MB: write-back, count=1
>
> What does your e820 map look like?

BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000008f000 (usable)
BIOS-e820: 000000000008f000 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000cf561000 (usable)
BIOS-e820: 00000000cf561000 - 00000000cf56e000 (reserved)
BIOS-e820: 00000000cf56e000 - 00000000cf637000 (usable)
BIOS-e820: 00000000cf637000 - 00000000cf6e9000 (ACPI NVS)
BIOS-e820: 00000000cf6e9000 - 00000000cf6ed000 (usable)
BIOS-e820: 00000000cf6ed000 - 00000000cf6f2000 (ACPI data)
BIOS-e820: 00000000cf6f2000 - 00000000cf6f3000 (usable)
BIOS-e820: 00000000cf6f3000 - 00000000cf6ff000 (ACPI data)
BIOS-e820: 00000000cf6ff000 - 00000000cf700000 (usable)
BIOS-e820: 00000000cf700000 - 00000000d0000000 (reserved)
BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 000000012c000000 (usable)

2007-11-05 19:11:38

by H. Peter Anvin

Subject: Re: x86_64 ten times slower than i386

Bo Brantén wrote:
> On Mon, 5 Nov 2007, H. Peter Anvin wrote:
>
>>> reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
>>> reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
>>> reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1
>>> reg03: base=0xcf800000 (3320MB), size= 8MB: uncachable, count=1
>>> reg04: base=0xcf700000 (3319MB), size= 1MB: uncachable, count=1
>>> reg05: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
>>> reg06: base=0x120000000 (4608MB), size= 128MB: write-back, count=1
>>
>> What does your e820 map look like?
>
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 000000000008f000 (usable)
> BIOS-e820: 000000000008f000 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 00000000cf561000 (usable)
> BIOS-e820: 00000000cf561000 - 00000000cf56e000 (reserved)
> BIOS-e820: 00000000cf56e000 - 00000000cf637000 (usable)
> BIOS-e820: 00000000cf637000 - 00000000cf6e9000 (ACPI NVS)
> BIOS-e820: 00000000cf6e9000 - 00000000cf6ed000 (usable)
> BIOS-e820: 00000000cf6ed000 - 00000000cf6f2000 (ACPI data)
> BIOS-e820: 00000000cf6f2000 - 00000000cf6f3000 (usable)
> BIOS-e820: 00000000cf6f3000 - 00000000cf6ff000 (ACPI data)
> BIOS-e820: 00000000cf6ff000 - 00000000cf700000 (usable)
> BIOS-e820: 00000000cf700000 - 00000000d0000000 (reserved)
> BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
> BIOS-e820: 0000000100000000 - 000000012c000000 (usable)
>

Okay, the bug is that the range 4736MB to 4800MB is marked USABLE in the
map, but isn't covered by any MTRR. Your BIOS is still buggy.

echo 'base=0x128000000 size=0x4000000 type=write-back' > /proc/mtrr

... should fix it since you still have an unused MTRR. It might make X
unhappy, though, since it may want an MTRR to mark the framebuffer
write-combining.
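
The comparison between the two tables can be sketched in Python. The
ranges are copied by hand from the e820 map and /proc/mtrr output quoted
above; the names uncovered and mtrr_write_line are hypothetical helpers for
this illustration, and CACHED is the effective write-back coverage after
subtracting the two uncachable entries.

```python
# Usable ranges from the BIOS-e820 map quoted above (start, end), in bytes.
USABLE = [
    (0x0, 0x8f000),
    (0x100000, 0xcf561000),
    (0xcf56e000, 0xcf637000),
    (0xcf6e9000, 0xcf6ed000),
    (0xcf6f2000, 0xcf6f3000),
    (0xcf6ff000, 0xcf700000),
    (0x100000000, 0x12c000000),
]

# Effective write-back coverage from reg00..reg06: 0..3319MB after the
# uncachable overlays, plus 4096MB..4736MB from reg05 and reg06.
CACHED = [
    (0x0, 0xcf700000),
    (0x100000000, 0x128000000),
]


def uncovered(usable, cached):
    """Return the parts of the usable ranges not inside any cached range."""
    gaps = []
    for start, end in usable:
        pos = start
        for c_start, c_end in sorted(cached):
            if c_end <= pos or c_start >= end:
                continue  # this cached range does not touch [pos, end)
            if c_start > pos:
                gaps.append((pos, c_start))
            pos = max(pos, c_end)
        if pos < end:
            gaps.append((pos, end))
    return gaps


def mtrr_write_line(start, end):
    """Format a range as the text line written to /proc/mtrr above."""
    return f"base={start:#x} size={end - start:#x} type=write-back"
```

For these tables, uncovered(USABLE, CACHED) returns the single range
0x128000000 to 0x12c000000, which is 4736MB to 4800MB, and
mtrr_write_line() on that range reproduces the echo argument above. A
range found this way still has to be a power-of-two size aligned to its
own size before it fits in one register; this one happens to qualify.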

-hpa

2007-11-06 00:26:58

by Andi Kleen

Subject: Re: x86_64 ten times slower than i386

On Mon, Nov 05, 2007 at 08:32:24AM -0800, Ray Lee wrote:
> (Don't trim cc:s.)
>
> On Nov 5, 2007 8:00 AM, Bo Brantén wrote:
>
> >> Intel Core 2 Quad
> >> and I noticed that the 64-bit versions was at least 10 times slower than the
> >> 32-bit versions,
>
> >
> > After I upgraded the BIOS, the MTRRs look like below. It now works if
> > I boot with mem=4736M, so I can use all memory, but it still doesn't work
> > without the mem parameter: then it runs as slowly as before.

Then the BIOS is still broken. Complain to your motherboard vendor.
> >
> > reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
> > reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
> > reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1
> > reg03: base=0xcf800000 (3320MB), size= 8MB: uncachable, count=1
> > reg04: base=0xcf700000 (3319MB), size= 1MB: uncachable, count=1
> > reg05: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
> > reg06: base=0x120000000 (4608MB), size= 128MB: write-back, count=1
>
> Jesse Barnes (cc:d) wrote a patch to address this, I think (x86: trim
> memory not covered by WB MTRRs), but as far as I can tell it hasn't
> been merged yet. System is Intel, 4gb of RAM.

It wasn't merged because it broke booting on some systems.
Besides the memory would be still lost -- all it did was to automate
the "mem=XXXX" line.

-Andi

2007-11-06 01:20:21

by H. Peter Anvin

Subject: Re: x86_64 ten times slower than i386

Andi Kleen wrote:
>> Jesse Barnes (cc:d) wrote a patch to address this, I think (x86: trim
>> memory not covered by WB MTRRs), but as far as I can tell it hasn't
>> been merged yet. System is Intel, 4gb of RAM.
>
> It wasn't merged because it broke booting on some systems.
> Besides the memory would be still lost -- all it did was to automate
> the "mem=XXXX" line.

There really are only two ways to deal with this -- drop the memory
(which should be automated, and a warning printed) or adjust the MTRRs.
The problem is that at some point we run out of MTRRs, partially
because they're masks instead of base/limit.
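
To illustrate the mask limitation: a variable MTRR describes a block whose
size is a power of two and whose base is aligned to that size, so an
arbitrary range has to be cut into such blocks. The Python sketch below
does that greedily. The function name mtrr_blocks is made up for this
illustration, and the split is not claimed to be what any BIOS or kernel
actually computes.

```python
MB = 1 << 20


def mtrr_blocks(start, end):
    """Split [start, end) into power-of-two blocks aligned to their size.

    Returns a list of (base, size) pairs, one per register needed when
    using write-back entries only.
    """
    blocks = []
    pos = start
    while pos < end:
        remaining = end - pos
        # Largest power of two that still fits in what is left.
        size = 1 << (remaining.bit_length() - 1)
        if pos:
            # The base must be aligned to the block size, so the size
            # cannot exceed the lowest set bit of the current position.
            size = min(size, pos & -pos)
        blocks.append((pos, size))
        pos += size
    return blocks
```

For the range above 4G in this thread, mtrr_blocks(4096 * MB, 4800 * MB)
yields blocks of 512MB, 128MB and 64MB: the first two correspond to reg05
and reg06, and the third is the piece the BIOS left out. Covering 0 to
3319MB with write-back entries alone takes nine blocks, more than the
eight variable MTRRs commonly available on CPUs of this era, which is
presumably why the BIOS uses three write-back entries plus two uncachable
overlays instead.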

Even use of PAT doesn't trivially resolve this issue, short of doing
MTRR emulation via PAT (setting the default MTRR type to WB); however,
that is bound to cause trouble with SMM.

-hpa

2007-11-06 19:40:57

by Willy Tarreau

Subject: Re: x86_64 ten times slower than i386

On Mon, Nov 05, 2007 at 05:19:44PM -0800, H. Peter Anvin wrote:
> Andi Kleen wrote:
> >>Jesse Barnes (cc:d) wrote a patch to address this, I think (x86: trim
> >>memory not covered by WB MTRRs), but as far as I can tell it hasn't
> >>been merged yet. System is Intel, 4gb of RAM.
> >
> >It wasn't merged because it broke booting on some systems.
> >Besides the memory would be still lost -- all it did was to automate
> >the "mem=XXXX" line.
>
> There really are only two ways to deal with this -- drop the memory
> (which should be automated, and a warning printed) or adjust the MTRRs.
> The problem is that at some point we run out of MTRRs, partially
> because they're masks instead of base/limit.

Just out of curiosity, what would be the problem if the MTRRs covered more
than the memory size ? For instance, instead of having 512 MB at 4G, why
not have 1G at 4G ?

regards,
Willy

2007-11-06 19:50:53

by H. Peter Anvin

Subject: Re: x86_64 ten times slower than i386

Willy Tarreau wrote:
> On Mon, Nov 05, 2007 at 05:19:44PM -0800, H. Peter Anvin wrote:
>> Andi Kleen wrote:
>>>> Jesse Barnes (cc:d) wrote a patch to address this, I think (x86: trim
>>>> memory not covered by WB MTRRs), but as far as I can tell it hasn't
>>>> been merged yet. System is Intel, 4gb of RAM.
>>> It wasn't merged because it broke booting on some systems.
>>> Besides the memory would be still lost -- all it did was to automate
>>> the "mem=XXXX" line.
>> There really are only two ways to deal with this -- drop the memory
>> (which should be automated, and a warning printed) or adjust the MTRRs.
>> The problem is that at some point we run out of MTRRs, partially
>> because they're masks instead of base/limit.
>
> Just out of curiosity, what would be the problem if the MTRRs covered more
> than the memory size ? For instance, instead of having 512 MB at 4G, why
> not have 1G at 4G ?

That's fine, *as long as* you don't have any I/O devices there,
including things like UMA graphics devices or memory areas used by SMM.
In theory, those should be marked reserved in e820. In practice...

That's really the fundamental problem with the Intel MTRR design. It
works really well for banks of memory and really poorly as soon as
something wants to "steal" memory or address space.
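
As a rough illustration of the "in theory" half, the Python sketch below
checks a candidate write-back range against the entries marked (reserved)
in the e820 map posted earlier in the thread. The helper name conflicts is
hypothetical, and the check can only see what the BIOS chose to report.

```python
GB = 1 << 30

# Entries marked "(reserved)" in the BIOS-e820 map quoted in this thread.
RESERVED = [
    (0x8f000, 0xa0000),
    (0xe0000, 0x100000),
    (0xcf561000, 0xcf56e000),
    (0xcf700000, 0xd0000000),
    (0xfff00000, 0x100000000),
]


def conflicts(start, end, reserved=RESERVED):
    """Return the reserved e820 ranges that overlap [start, end)."""
    return [(s, e) for s, e in reserved if s < end and start < e]
```

For 1G at 4G, conflicts(4 * GB, 5 * GB) is empty, so the map raises no
objection. Widening the entry at 3G to a full 1G is different:
conflicts(3 * GB, 4 * GB) reports three reserved ranges. Note also that
the map lists nothing at all between 0xd0000000 and 0xfff00000, so a check
like this cannot tell whether devices live there, which is the "in
practice" half of the problem.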

-hpa

2007-11-06 20:02:22

by Willy Tarreau

Subject: Re: x86_64 ten times slower than i386

On Tue, Nov 06, 2007 at 11:50:13AM -0800, H. Peter Anvin wrote:
> Willy Tarreau wrote:
> >Just out of curiosity, what would be the problem if the MTRRs covered more
> >than the memory size ? For instance, instead of having 512 MB at 4G, why
> >not have 1G at 4G ?
>
> That's fine, *as long as* you don't have any I/O devices there,
> including things like UMA graphics devices or memory areas used by SMM.
> In theory, those should be marked reserved in e820. In practice...

OK, thanks Peter for the explanation.

Cheers,
Willy

2007-11-07 18:43:17

by Jesse Barnes

Subject: Re: x86_64 ten times slower than i386

On Monday, November 05, 2007 4:26 Andi Kleen wrote:
> On Mon, Nov 05, 2007 at 08:32:24AM -0800, Ray Lee wrote:
> > (Don't trim cc:s.)
> >
> > On Nov 5, 2007 8:00 AM, Bo Brantén wrote:
> > >> Intel Core 2 Quad
> > >> and I noticed that the 64-bit versions was at least 10 times
> > >> slower than the 32-bit versions,
> > >
> > > After I upgraded the BIOS, the MTRRs look like below. It now
> > > works if I boot with mem=4736M, so I can use all memory, but it
> > > still doesn't work without the mem parameter: then it runs as
> > > slowly as before.
>
> Then the BIOS is still broken. Complain to your motherboard vendor.
>
> > > reg00: base=0x00000000 ( 0MB), size=2048MB: write-back, count=1
> > > reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
> > > reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1
> > > reg03: base=0xcf800000 (3320MB), size= 8MB: uncachable, count=1
> > > reg04: base=0xcf700000 (3319MB), size= 1MB: uncachable, count=1
> > > reg05: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
> > > reg06: base=0x120000000 (4608MB), size= 128MB: write-back, count=1
> >
> > Jesse Barnes (cc:d) wrote a patch to address this, I think (x86:
> > trim memory not covered by WB MTRRs), but as far as I can tell it
> > hasn't been merged yet. System is Intel, 4gb of RAM.
>
> It wasn't merged because it broke booting on some systems.
> Besides the memory would be still lost -- all it did was to automate
> the "mem=XXXX" line.

Andi, do you have any details on which system broke and how? I haven't
heard back from you on my last message on the subject... the patch was
in -mm for a while with no complaints.

Ultimately, this is a broken BIOS issue, but still, it would be nice if
the kernel handled it better.

Thanks,
Jesse

2007-11-10 13:41:36

by Bo Branten

Subject: Re: x86_64 ten times slower than i386

On Mon, 5 Nov 2007, Bo Brantén wrote:

>
> After I upgraded the BIOS, the MTRRs look like below. It now works if I
> boot with mem=4736M, so I can use all memory, but it still doesn't work
> without the mem parameter: then it runs as slowly as before.

I noticed that after I upgraded the BIOS it works automatically without
the mem parameter with kernel 2.6.22-14, while with kernel 2.6.19 the mem
parameter is still needed, so some fix must have been added that takes
care of this problem.