2013-10-17 18:57:56

by Olof Johansson

Subject: Corrupted low memory in v3.9+

Hi Peter,

When booting a newer kernel on a Chromebox (Samsung SNB system,
coreboot firmware, verified boot and not using SeaBIOS), I get reports
of:

[ 1.727520] Corrupted low memory at ffff880000002a90 (2a90 phys) = 100000000
[ 1.734565] Corrupted low memory at ffff880000002a98 (2a98 phys) = 50000000000
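
For reference, the virtual and physical addresses in these two messages differ only by a fixed offset. A minimal Python sketch of that arithmetic, assuming the x86-64 direct-map base was 0xffff880000000000 on these kernels (the constant and the helper name are inferred from the log lines themselves, not copied from kernel source):

```python
# Assumed direct-map base, inferred from "ffff880000002a90 (2a90 phys)".
DIRECT_MAP_BASE = 0xFFFF880000000000

# First 64K of physical memory, the window of interest here.
LOW_LIMIT = 64 * 1024


def virt_to_phys(vaddr: int) -> int:
    """Hypothetical helper: translate a direct-map virtual address."""
    return vaddr - DIRECT_MAP_BASE


for vaddr in (0xFFFF880000002A90, 0xFFFF880000002A98):
    phys = virt_to_phys(vaddr)
    print(f"{vaddr:#x} -> phys {phys:#x}, below 64K: {phys < LOW_LIMIT}")
```

Both reported words sit in the second 4K page, well inside the first 64K.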

I bisected it down to:

commit 95c9608478d639dcffc14ea47b31bff021a99ed1
Author: H. Peter Anvin <[email protected]>
Date: Thu Feb 14 14:02:52 2013 -0800

x86, mm: Move reserving low memory later in initialization

Move the reservation of low memory, except for the 4K which actually
does belong to the BIOS, later in the initialization; in particular,
after we have already reserved the trampoline.

The current code locates the trampoline as high as possible, so by
deferring the allocation we will still be able to reserve as much
memory as is possible. This allows us to run with reservelow=640k
without getting a crash on system startup.


My config has:

CONFIG_X86_RESERVE_LOW=64

...but /proc/iomem says:

00000000-00000fff : reserved
00001000-0009ffff : System RAM
000a0000-000fffff : reserved
[...]


While before the above patch it said:

00000000-00000fff : reserved
00001000-0000ffff : reserved
00010000-0009ffff : System RAM
[...]

And the low memory checker never even ran before, since it had nothing
to check. Earlier the lower reserved region would be included in the
e820-reserved area if I read the code correctly, and now it's just
marked reserved by the memblock code.
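
As a rough illustration of why the checker now has something to look at: my understanding is that it takes whatever low RAM is still unreserved when it is set up, below a size limit, and later complains about any non-zero word it finds there. The Python sketch below models only that idea. The 64K limit, the helper names and the free ranges (taken from the System RAM lines above) are assumptions for illustration, not the kernel's code.

```python
PAGE_SIZE = 0x1000
CHECK_LIMIT = 0x10000  # assumed 64K scan limit


def scan_areas(free_ranges, limit=CHECK_LIMIT):
    """Clamp free [start, end) ranges to [PAGE_SIZE, limit)."""
    areas = []
    for start, end in free_ranges:
        start = max(start, PAGE_SIZE)
        end = min(end, limit)
        if start < end:
            areas.append((start, end))
    return areas


def find_corruption(memory: bytes, areas, word=8):
    """Return (phys, value) for every non-zero word in the scan areas."""
    hits = []
    for start, end in areas:
        for addr in range(start, end, word):
            value = int.from_bytes(memory[addr:addr + word], "little")
            if value:
                hits.append((addr, value))
    return hits


# Before the commit (assumed): low RAM only starts at 64K, nothing to scan.
before = scan_areas([(0x10000, 0xA0000)])
# After the commit (assumed): RAM from 4K upward is still free at setup time.
after = scan_areas([(0x1000, 0xA0000)])

# Fake low memory holding the two values from the report.
mem = bytearray(0x10000)
mem[0x2A90:0x2A98] = (0x100000000).to_bytes(8, "little")
mem[0x2A98:0x2AA0] = (0x50000000000).to_bytes(8, "little")

print(before)
print([(hex(a), hex(b)) for a, b in after])
for addr, value in find_corruption(bytes(mem), after):
    print(f"Corrupted low memory at {addr:#x} phys = {value:x}")
```

With the reservation in place early, the scan list is empty and the checker stays silent; with it deferred, the 4K-64K window becomes a scan area and any stray write there is reported.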

I guess it could be argued either way whether this is a regression or
not; but at the end of the day we now have systems where this warning
pops when it didn't use to. :(


-Olof


2013-10-17 19:39:06

by H. Peter Anvin

Subject: Re: Corrupted low memory in v3.9+

On 10/17/2013 11:57 AM, Olof Johansson wrote:
>
> And the low memory checker never even ran before, since it had nothing
> to check. Earlier the lower reserved region would be included in the
> e820-reserved area if I read the code correctly, and now it's just
> marked reserved by the memblock code.
>
> I guess it could be argued either way whether this is a regression or
> not; but at the end of the day we now have systems where this warning
> pops when it didn't use to. :(
>

I'm wondering if this is a problem with the low memory checker (the
residual value of which I have to admit to being skeptical of) or
something else.

Could you boot the box with "debug memblock=debug" and earlyprintk
turned on and send the boot output?

-hpa

2013-10-17 20:39:22

by Olof Johansson

Subject: Re: Corrupted low memory in v3.9+

On Thu, Oct 17, 2013 at 12:39 PM, H. Peter Anvin <[email protected]> wrote:
> On 10/17/2013 11:57 AM, Olof Johansson wrote:
>>
>> And the low memory checker never even ran before, since it had nothing
>> to check. Earlier the lower reserved region would be included in the
>> e820-reserved area if I read the code correctly, and now it's just
>> marked reserved by the memblock code.
>>
>> I guess it could be argued either way whether this is a regression or
>> not; but at the end of the day we now have systems where this warning
>> pops when it didn't use to. :(
>>
>
> I'm wondering if this is a problem with the low memory checker (the
> residual value of which I have to admit to being skeptical of) or
> something else.

There's a chance that it's a valid trip of the low-memory checker,
i.e. that we do have a BIOS (or more likely SMM) that stomps on that
memory -- it was never checked for in the past and definitely not
warned about. I'm not sure whether leaving this area unchecked was
intentional; I lack history on the topic.

> Could you boot the box with "debug memblock=debug" and earlyprintk
> turned on and send the boot output?

Ah, yes, I did verify that the first 64K were indeed set aside as
reserved by doing just that:

[ 0.000000] MEMBLOCK configuration:
[ 0.000000] memory size = 0x7c750000 reserved size = 0xb05000
[ 0.000000] memory.cnt = 0x6
[ 0.000000] memory[0x0] [0x00000000010000-0x0000000009ffff], 0x90000 bytes
[ 0.000000] memory[0x1] [0x00000000100000-0x00000000efffff], 0xe00000 bytes
[ 0.000000] memory[0x2] [0x00000001000000-0x0000001fffffff], 0x1f000000 bytes
[ 0.000000] memory[0x3] [0x00000020200000-0x0000003fffffff], 0x1fe00000 bytes
[ 0.000000] memory[0x4] [0x00000040200000-0x0000007c6bffff], 0x3c4c0000 bytes
[ 0.000000] memory[0x5] [0x00000100000000-0x000001005fffff], 0x600000 bytes
[ 0.000000] reserved.cnt = 0x2
[ 0.000000] reserved[0x0] [0x0000000009f000-0x000000000fffff], 0x61000 bytes
[ 0.000000] reserved[0x1] [0x00000001000000-0x00000001aa3fff], 0xaa4000 bytes
[ 0.000000] memblock_reserve: [0x00000000099000-0x0000000009f000] reserve_real_mode+0x61/0x87
[ 0.000000] Base memory trampoline at [ffff880000099000] 99000 size 24576
[ 0.000000] reserving inaccessible SNB gfx pages
[ 0.000000] memblock_reserve: [0x00000000000000-0x00000000100000] setup_arch+0xa2d/0xa41
[...]

Unfortunately x86 doesn't keep the memblock structures around, so
there's no way to verify after booting in debugfs, but based on the
above it should have been reserved properly.
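
The ranges in that output can also be checked mechanically. A small Python sketch, with the relevant log lines copied from the output above. The regular expression and helper names are made up for this example, and it assumes the end address in the memblock_reserve lines is exclusive, which is at least consistent with the trampoline's "size 24576":

```python
import re

LOG = """\
memblock_reserve: [0x00000000099000-0x0000000009f000] reserve_real_mode+0x61/0x87
Base memory trampoline at [ffff880000099000] 99000 size 24576
memblock_reserve: [0x00000000000000-0x00000000100000] setup_arch+0xa2d/0xa41
"""

RESERVE_RE = re.compile(r"memblock_reserve: \[0x([0-9a-f]+)-0x([0-9a-f]+)\]")


def reserved_ranges(log: str):
    """Return (start, end) pairs, treating the end as exclusive (assumption)."""
    return [(int(a, 16), int(b, 16)) for a, b in RESERVE_RE.findall(log)]


def covers(ranges, start, end):
    """True if a single reserved range contains [start, end)."""
    return any(lo <= start and end <= hi for lo, hi in ranges)


ranges = reserved_ranges(LOG)
# The trampoline range should match the "size 24576" line.
print(ranges[0][1] - ranges[0][0] == 24576)
# The first 64K (CONFIG_X86_RESERVE_LOW=64) should be covered.
print(covers(ranges, 0, 64 * 1024))
```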


-Olof