2008-02-07 07:32:13

by Balaji Rao

[permalink] [raw]
Subject: [PATCH][Regression] x86, 32-bit: trim memory not covered by wb mtrrs - FIX

Hello,

The following commit caused my X server to stop working.

commit 99fc8d424bc5d803fe92cad56c068fe64e73747a
Author: Jesse Barnes <[email protected]>
Date: Wed Jan 30 13:33:18 2008 +0100

x86, 32-bit: trim memory not covered by wb mtrrs

This patch fixes the improper handling of addresses > 4G by mtrr_trim_uncached_memory.
This, now brings up X on my system.

Signed-off-by: Balaji Rao <[email protected]>

diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 1e27b69..b0f1d48 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -659,7 +659,7 @@ static __init int amd_special_default_mtrr(void)
*/
int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
{
- unsigned long i, base, size, highest_addr = 0, def, dummy;
+ unsigned long i, base, size, highest_pfn = 0, def, dummy;
mtrr_type type;
u64 trim_start, trim_size;

@@ -682,30 +682,27 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
mtrr_if->get(i, &base, &size, &type);
if (type != MTRR_TYPE_WRBACK)
continue;
- base <<= PAGE_SHIFT;
- size <<= PAGE_SHIFT;
- if (highest_addr < base + size)
- highest_addr = base + size;
+ if (highest_pfn < base + size)
+ highest_pfn = base + size;
}

/* kvm/qemu doesn't have mtrr set right, don't trim them all */
- if (!highest_addr) {
+ if (!highest_pfn) {
printk(KERN_WARNING "WARNING: strange, CPU MTRRs all blank?\n");
WARN_ON(1);
return 0;
}

- if ((highest_addr >> PAGE_SHIFT) < end_pfn) {
+ if (highest_pfn < end_pfn) {
printk(KERN_WARNING "WARNING: BIOS bug: CPU MTRRs don't cover"
" all of memory, losing %LdMB of RAM.\n",
- (((u64)end_pfn << PAGE_SHIFT) - highest_addr) >> 20);
+ ((u64)(end_pfn - highest_pfn) << PAGE_SHIFT) >> 20);

WARN_ON(1);

printk(KERN_INFO "update e820 for mtrr\n");
- trim_start = highest_addr;
- trim_size = end_pfn;
- trim_size <<= PAGE_SHIFT;
+ trim_start = highest_pfn << PAGE_SHIFT;
+ trim_size = end_pfn << PAGE_SHIFT;
trim_size -= trim_start;
add_memory_region(trim_start, trim_size, E820_RESERVED);
update_e820();


2008-02-07 08:03:17

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH][Regression] x86, 32-bit: trim memory not covered by wb mtrrs - FIX


* Balaji Rao <[email protected]> wrote:

> Hello,
>
> The following commit caused my X server to stop working.
>
> commit 99fc8d424bc5d803fe92cad56c068fe64e73747a
> Author: Jesse Barnes <[email protected]>
> Date: Wed Jan 30 13:33:18 2008 +0100
>
> x86, 32-bit: trim memory not covered by wb mtrrs
>
> This patch fixes the improper handling of addresses > 4G by
> mtrr_trim_uncached_memory. This, now brings up X on my system.

thanks. Incidentally this same bug was reported and fixed yesterday, and
that fix is upstream already. Could you please compare your solution to
Yinghai Lu's fix below, and send us a patch for any further improvements
(or cleanups) you might notice in that code? It seems to be almost the
same fix as yours. (and hopefully it fixes your X problem too)

Ingo

------------------>
commit 20651af9ac60fd6e31360688ad44861a7d05256a
Author: Yinghai Lu <[email protected]>
Date: Wed Feb 6 22:39:45 2008 +0100

x86: fix mttr trimming

Pavel Emelyanov reported that his networking card did not work
and bisected it down to:

"
The commit

093af8d7f0ba3c6be1485973508584ef081e9f93
x86_32: trim memory by updating e820

broke my e1000 card: on loading driver says that

e1000: probe of 0000:04:03.0 failed with error -5

and the interface doesn't appear.
"

on a 32-bit kernel, base will overflow when try to do PAGE_SHIFT,
and highest_addr will always less 4G.

So use pfn instead of address to avoid the overflow when more than
4g RAM is installed on a 32-bit kernel.

Many thanks to Pavel Emelyanov for reporting and testing it.

Bisected-by: Pavel Emelyanov <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Tested-by: Pavel Emelyanov <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 1e27b69..b6e136f 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -659,7 +659,7 @@ static __init int amd_special_default_mtrr(void)
*/
int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
{
- unsigned long i, base, size, highest_addr = 0, def, dummy;
+ unsigned long i, base, size, highest_pfn = 0, def, dummy;
mtrr_type type;
u64 trim_start, trim_size;

@@ -682,28 +682,27 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
mtrr_if->get(i, &base, &size, &type);
if (type != MTRR_TYPE_WRBACK)
continue;
- base <<= PAGE_SHIFT;
- size <<= PAGE_SHIFT;
- if (highest_addr < base + size)
- highest_addr = base + size;
+ if (highest_pfn < base + size)
+ highest_pfn = base + size;
}

/* kvm/qemu doesn't have mtrr set right, don't trim them all */
- if (!highest_addr) {
+ if (!highest_pfn) {
printk(KERN_WARNING "WARNING: strange, CPU MTRRs all blank?\n");
WARN_ON(1);
return 0;
}

- if ((highest_addr >> PAGE_SHIFT) < end_pfn) {
+ if (highest_pfn < end_pfn) {
printk(KERN_WARNING "WARNING: BIOS bug: CPU MTRRs don't cover"
- " all of memory, losing %LdMB of RAM.\n",
- (((u64)end_pfn << PAGE_SHIFT) - highest_addr) >> 20);
+ " all of memory, losing %luMB of RAM.\n",
+ (end_pfn - highest_pfn) >> (20 - PAGE_SHIFT));

WARN_ON(1);

printk(KERN_INFO "update e820 for mtrr\n");
- trim_start = highest_addr;
+ trim_start = highest_pfn;
+ trim_start <<= PAGE_SHIFT;
trim_size = end_pfn;
trim_size <<= PAGE_SHIFT;
trim_size -= trim_start;

2008-02-07 08:25:18

by Balaji Rao

[permalink] [raw]
Subject: Re: [PATCH][Regression] x86, 32-bit: trim memory not covered by wb mtrrs - FIX

On Thursday 07 February 2008 01:32:45 pm Ingo Molnar wrote:
> * Balaji Rao <[email protected]> wrote:
> > Hello,
> >
> > The following commit caused my X server to stop working.
> >
> > commit 99fc8d424bc5d803fe92cad56c068fe64e73747a
> > Author: Jesse Barnes <[email protected]>
> > Date: Wed Jan 30 13:33:18 2008 +0100
> >
> > x86, 32-bit: trim memory not covered by wb mtrrs
> >
> > This patch fixes the improper handling of addresses > 4G by
> > mtrr_trim_uncached_memory. This, now brings up X on my system.
>
> thanks. Incidentally this same bug was reported and fixed yesterday, and
> that fix is upstream already. Could you please compare your solution to
> Yinghai Lu's fix below, and send us a patch for any further improvements
> (or cleanups) you might notice in that code? It seems to be almost the
> same fix as yours. (and hopefully it fixes your X problem too)

Cool! Yes, Yinghai Lu's patch indeed is the same as mine and its really
surprising that we've used the same variable name too! :)

regards,
balaji rao

2008-02-07 08:50:46

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH][Regression] x86, 32-bit: trim memory not covered by wb mtrrs - FIX

On Feb 7, 2008 12:21 AM, Balaji Rao <[email protected]> wrote:
> On Thursday 07 February 2008 01:32:45 pm Ingo Molnar wrote:
> > * Balaji Rao <[email protected]> wrote:
> > > Hello,
> > >
> > > The following commit caused my X server to stop working.
> > >
> > > commit 99fc8d424bc5d803fe92cad56c068fe64e73747a
> > > Author: Jesse Barnes <[email protected]>
> > > Date: Wed Jan 30 13:33:18 2008 +0100
> > >
> > > x86, 32-bit: trim memory not covered by wb mtrrs
> > >
> > > This patch fixes the improper handling of addresses > 4G by
> > > mtrr_trim_uncached_memory. This, now brings up X on my system.
> >
> > thanks. Incidentally this same bug was reported and fixed yesterday, and
> > that fix is upstream already. Could you please compare your solution to
> > Yinghai Lu's fix below, and send us a patch for any further improvements
> > (or cleanups) you might notice in that code? It seems to be almost the
> > same fix as yours. (and hopefully it fixes your X problem too)
>
> Cool! Yes, Yinghai Lu's patch indeed is the same as mine and its really
> surprising that we've used the same variable name too! :)

minor difference
+ trim_start = highest_pfn << PAGE_SHIFT;
+ trim_size = end_pfn << PAGE_SHIFT;

could cause some problem with 32 bit kernel when mem > 4g.
becase highest_pfn and end_pfn is unsigned long aka 32 bit ...could overflow.

so need to assign thtem to trim_start/trim_end at first
or
+ trim_start = (u64)highest_pfn << PAGE_SHIFT;
+ trim_size = (u64)end_pfn << PAGE_SHIFT;

YH

2008-02-07 08:56:20

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH][Regression] x86, 32-bit: trim memory not covered by wb mtrrs - FIX

On Feb 7, 2008 12:02 AM, Ingo Molnar <[email protected]> wrote:
>
> * Balaji Rao <[email protected]> wrote:
>
> > Hello,
> >
> > The following commit caused my X server to stop working.
> >
> > commit 99fc8d424bc5d803fe92cad56c068fe64e73747a
> > Author: Jesse Barnes <[email protected]>
> > Date: Wed Jan 30 13:33:18 2008 +0100
> >
> > x86, 32-bit: trim memory not covered by wb mtrrs
> >
> > This patch fixes the improper handling of addresses > 4G by
> > mtrr_trim_uncached_memory. This, now brings up X on my system.
>
> thanks. Incidentally this same bug was reported and fixed yesterday, and
> that fix is upstream already. Could you please compare your solution to
> Yinghai Lu's fix below, and send us a patch for any further improvements
> (or cleanups) you might notice in that code? It seems to be almost the
> same fix as yours. (and hopefully it fixes your X problem too)

then title for patch is not right
> > x86, 32-bit: trim memory not covered by wb mtrrs

should be

x86_64: trim memory not covered by wb mtrrs

anyway, we can not modify git log in linux-2.6.git. can we?

YH

2008-02-07 09:00:50

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH][Regression] x86, 32-bit: trim memory not covered by wb mtrrs - FIX


* Yinghai Lu <[email protected]> wrote:

> then title for patch is not right
> > > x86, 32-bit: trim memory not covered by wb mtrrs
>
> should be
>
> x86_64: trim memory not covered by wb mtrrs

indeed :) Although further reading into the commit clearly tells us that
this is 64-bit only for the time being.

> anyway, we can not modify git log in linux-2.6.git. can we?

yeah, linux-2.6.git is append-only.

Ingo

2008-02-07 09:05:45

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH][Regression] x86, 32-bit: trim memory not covered by wb mtrrs - FIX


* Yinghai Lu <[email protected]> wrote:

> minor difference
> + trim_start = highest_pfn << PAGE_SHIFT;
> + trim_size = end_pfn << PAGE_SHIFT;
>
> could cause some problem with 32 bit kernel when mem > 4g. becase
> highest_pfn and end_pfn is unsigned long aka 32 bit ...could overflow.
>
> so need to assign thtem to trim_start/trim_end at first
> or
> + trim_start = (u64)highest_pfn << PAGE_SHIFT;
> + trim_size = (u64)end_pfn << PAGE_SHIFT;

indeed ...

i think the 64-bit behavior of gcc is inherently dangerous, we had
numerous subtle bugs in that area.

i think perhaps Sparse should be extended to warn about this. I think
any case where on _32-bit_ we have an 'unsigned long' that is shifted to
the left by any significant amount is clearly in danger of overflowing.
_Especially_ when the lvalue is 64-bit!

or in other words, on any such construct:

64-bit lvalue = ... 32-bit value

we should enforce _explicit_ (u64) conversions.

Ingo

2008-02-07 09:11:36

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH][Regression] x86, 32-bit: trim memory not covered by wb mtrrs - FIX

On Feb 7, 2008 1:04 AM, Ingo Molnar <[email protected]> wrote:
>
> * Yinghai Lu <[email protected]> wrote:
>
> > minor difference
> > + trim_start = highest_pfn << PAGE_SHIFT;
> > + trim_size = end_pfn << PAGE_SHIFT;
> >
> > could cause some problem with 32 bit kernel when mem > 4g. becase
> > highest_pfn and end_pfn is unsigned long aka 32 bit ...could overflow.
> >
> > so need to assign thtem to tr, 32-bitim_start/trim_end at first
> > or
> > + trim_start = (u64)highest_pfn << PAGE_SHIFT;
> > + trim_size = (u64)end_pfn << PAGE_SHIFT;
>
> indeed ...
>
> i think the 64-bit behavior of gcc is inherently dangerous, we had
> numerous subtle bugs in that area.
>
> i think perhaps Sparse should be extended to warn about this. I think
> any case where on _32-bit_ we have an 'unsigned long' that is shifted to
> the left by any significant amount is clearly in danger of overflowing.
> _Especially_ when the lvalue is 64-bit!
>
> or in other words, on any such construct:
>
> 64-bit lvalue = ... 32-bit value
>
> we should enforce _explicit_ (u64) conversions.

so you mean gcc will do some optimization to make

+ trim_start = highest_pfn;
+ trim_start <<= PAGE_SHIFT;

to be

+ trim_start = highest_pfn << PAGE_SHIFT;

looks scary...

then gcc need to be fixed.

YH

2008-02-07 10:17:18

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH][Regression] x86, 32-bit: trim memory not covered by wb mtrrs - FIX


* Yinghai Lu <[email protected]> wrote:

> On Feb 7, 2008 1:04 AM, Ingo Molnar <[email protected]> wrote:
> >
> > * Yinghai Lu <[email protected]> wrote:
> >
> > > minor difference
> > > + trim_start = highest_pfn << PAGE_SHIFT;
> > > + trim_size = end_pfn << PAGE_SHIFT;
> > >
> > > could cause some problem with 32 bit kernel when mem > 4g. becase
> > > highest_pfn and end_pfn is unsigned long aka 32 bit ...could overflow.
> > >
> > > so need to assign thtem to tr, 32-bitim_start/trim_end at first
> > > or
> > > + trim_start = (u64)highest_pfn << PAGE_SHIFT;
> > > + trim_size = (u64)end_pfn << PAGE_SHIFT;
> >
> > indeed ...
> >
> > i think the 64-bit behavior of gcc is inherently dangerous, we had
> > numerous subtle bugs in that area.
> >
> > i think perhaps Sparse should be extended to warn about this. I think
> > any case where on _32-bit_ we have an 'unsigned long' that is shifted to
> > the left by any significant amount is clearly in danger of overflowing.
> > _Especially_ when the lvalue is 64-bit!
> >
> > or in other words, on any such construct:
> >
> > 64-bit lvalue = ... 32-bit value
> >
> > we should enforce _explicit_ (u64) conversions.
>
> so you mean gcc will do some optimization to make
>
> + trim_start = highest_pfn;
> + trim_start <<= PAGE_SHIFT;
>
> to be
>
> + trim_start = highest_pfn << PAGE_SHIFT;
>
> looks scary...

no, that would not be valid.

I mean the simple example you offered:

+ trim_start = highest_pfn << PAGE_SHIFT;

it _looks_ good but is inherently unsafe if 'highest_pfn' is 32-bit and
PAGE_SHIFT is 32-bit (which they are).

Or another, user-space example, on a 32-bit box:

int main (void)
{
unsigned long long a;
unsigned long b = 0x80000000, c = 2;

a = b << c;

printf("%Ld\n", a);

return 0;
}

gcc will print "0" and emits no warning - so we silently lost
information and truncated bits. I'm sure this is somewhere specified in
some language standard, but it's causing bugs left and right when we use
u64.

So if there's no helpful gcc warning that already exists, i think we
should extend the Sparse static checker to flag all such instances of:

u64 = u32 << u32;

arithmetics, because in 100% of the cases i've seen so far they cause
nasty bugs. We've had such bugs in the scheduler, and we've had them in
arch/x86 as well. It's a royal PITA.

Ingo

2008-02-07 11:40:12

by Balaji Rao

[permalink] [raw]
Subject: Re: [PATCH][Regression] x86, 32-bit: trim memory not covered by wb mtrrs - FIX

On Thursday 07 February 2008 02:20:35 pm Yinghai Lu wrote:
<snip>
> > Cool! Yes, Yinghai Lu's patch indeed is the same as mine and its really
> > surprising that we've used the same variable name too! :)
>
> minor difference
> + trim_start = highest_pfn << PAGE_SHIFT;
> + trim_size = end_pfn << PAGE_SHIFT;
>
> could cause some problem with 32 bit kernel when mem > 4g.
> becase highest_pfn and end_pfn is unsigned long aka 32 bit ...could
> overflow.
>
> so need to assign thtem to trim_start/trim_end at first
> or
> + trim_start = (u64)highest_pfn << PAGE_SHIFT;
> + trim_size = (u64)end_pfn << PAGE_SHIFT;
>

Yea right.. Thanks for pointing that out.

--
regards,
balaji rao