2012-10-21 00:06:35

by Tom Rini

Subject: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

Hello all,

I grabbed 3.7-rc2 and found the following on boot:
PANIC: early exception 08 rip 246:10 error 81441d7f cr2 0

A git bisect says that this problem came from:
1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a is the first bad commit
commit 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a
Author: Jacob Shin <[email protected]>
Date: Thu Oct 20 16:15:26 2011 -0500

x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.

On systems with very large memory (1 TB in our case), BIOS may report a
reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
these from the direct mapping.


The box in question is an Asus motherboard with AMD Phenom(tm) II X6
1100T and 16GB memory. Happy to provide any other information required.

--
Tom


2012-10-21 00:18:57

by Tom Rini

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On 10/20/12 17:11, Shin, Jacob wrote:
> Hi could you please attach the dmesg output? Before rc2 is fine as well.
> I would like to see the E820 table. Thank you,

dmesg is quite long so I've put it on pastebin: http://pastebin.com/4eSPEAvB

--
Tom

2012-10-21 04:01:46

by Yinghai Lu

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On Sat, Oct 20, 2012 at 5:17 PM, Tom Rini <[email protected]> wrote:
> On 10/20/12 17:11, Shin, Jacob wrote:
>> Hi could you please attach the dmesg output? Before rc2 is fine as well.
>> I would like to see the E820 table. Thank you,
>
> dmesg is quite long so I've put it on pastebin: http://pastebin.com/4eSPEAvB
>
> --

[ 0.000000] BIOS-e820: [mem 0x0000000100001000-0x000000042fffffff] usable

The pre-calculated page table size is too small, so it crashes.

can you please try

git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
for-x86-mm

and post bootlog?

Thanks

Yinghai

2012-10-21 04:20:17

by Jacob Shin

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On Sat, Oct 20, 2012 at 09:01:43PM -0700, Yinghai Lu wrote:
> On Sat, Oct 20, 2012 at 5:17 PM, Tom Rini <[email protected]> wrote:
> > On 10/20/12 17:11, Shin, Jacob wrote:
> >> Hi could you please attach the dmesg output? Before rc2 is fine as well.
> >> I would like to see the E820 table. Thank you,
> >
> > dmesg is quite long so I've put it on pastebin: http://pastebin.com/4eSPEAvB
> >
> > --
>
> [ 0.000000] BIOS-e820: [mem 0x0000000100001000-0x000000042fffffff] usable
>
> pre-calculate table size is too small, so it crashes.

Right,

I think just this one patch 3/6 on top of -rc2 should work:

https://lkml.org/lkml/2012/8/29/223

That would be a simpler path for 3.7,

Thanks!

>
> can you please try
>
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> for-x86-mm
>
> and post bootlog?
>
> Thanks
>
> Yinghai
>

2012-10-21 17:51:42

by Tom Rini

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On 10/20/12 21:18, Jacob Shin wrote:
> On Sat, Oct 20, 2012 at 09:01:43PM -0700, Yinghai Lu wrote:
>> On Sat, Oct 20, 2012 at 5:17 PM, Tom Rini <[email protected]> wrote:
>>> On 10/20/12 17:11, Shin, Jacob wrote:
>>>> Hi could you please attach the dmesg output? Before rc2 is fine as well.
>>>> I would like to see the E820 table. Thank you,
>>>
>>> dmesg is quite long so I've put it on pastebin: http://pastebin.com/4eSPEAvB
>>>
>>> --
>>
>> [ 0.000000] BIOS-e820: [mem 0x0000000100001000-0x000000042fffffff] usable
>>
>> pre-calculate table size is too small, so it crashes.
>
> Right,
>
> I think just this one patch 3/6 on top of -rc2 should work:
>
> https://lkml.org/lkml/2012/8/29/223
>
> That would be a simpler path for 3.7,

It doesn't apply easily (for me) on top of 3.7-rc2 however. Happy to
test a patch on top of 3.7-rc2 when you're able to.

--
Tom

2012-10-21 17:52:40

by Tom Rini

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On 10/20/12 21:01, Yinghai Lu wrote:
> On Sat, Oct 20, 2012 at 5:17 PM, Tom Rini <[email protected]> wrote:
>> On 10/20/12 17:11, Shin, Jacob wrote:
>>> Hi could you please attach the dmesg output? Before rc2 is fine as well.
>>> I would like to see the E820 table. Thank you,
>>
>> dmesg is quite long so I've put it on pastebin: http://pastebin.com/4eSPEAvB
>>
>> --
>
> [ 0.000000] BIOS-e820: [mem 0x0000000100001000-0x000000042fffffff] usable
>
> pre-calculate table size is too small, so it crashes.
>
> can you please try
>
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> for-x86-mm
>
> and post bootlog?

This boots but I'm bisecting another failure later on and can't post the
boot log (just finished bisecting that issue now).

--
Tom

2012-10-21 21:06:41

by Jacob Shin

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On Sun, Oct 21, 2012 at 10:51:35AM -0700, Tom Rini wrote:
> On 10/20/12 21:18, Jacob Shin wrote:
> > On Sat, Oct 20, 2012 at 09:01:43PM -0700, Yinghai Lu wrote:
> >> On Sat, Oct 20, 2012 at 5:17 PM, Tom Rini <[email protected]> wrote:
> >>> On 10/20/12 17:11, Shin, Jacob wrote:
> >>>> Hi could you please attach the dmesg output? Before rc2 is fine as well.
> >>>> I would like to see the E820 table. Thank you,
> >>>
> >>> dmesg is quite long so I've put it on pastebin: http://pastebin.com/4eSPEAvB
> >>>
> >>> --
> >>
> >> [ 0.000000] BIOS-e820: [mem 0x0000000100001000-0x000000042fffffff] usable
> >>
> >> pre-calculate table size is too small, so it crashes.
> >
> > Right,
> >
> > I think just this one patch 3/6 on top of -rc2 should work:
> >
> > https://lkml.org/lkml/2012/8/29/223
> >
> > That would be a simpler path for 3.7,
>
> It doesn't apply easily (for me) on top of 3.7-rc2 however. Happy to
> test a patch on top of 3.7-rc2 when you're able to.

Ah, sorry, this one should apply on top of 3.7-rc2:

https://lkml.org/lkml/2012/8/24/469

Could you try that? Just that single patch, not the whole patchset.

Thanks!

-Jacob

>
> --
> Tom
>

2012-10-21 21:24:06

by Tom Rini

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On 10/21/12 14:06, Jacob Shin wrote:
> On Sun, Oct 21, 2012 at 10:51:35AM -0700, Tom Rini wrote:
>> On 10/20/12 21:18, Jacob Shin wrote:
>>> On Sat, Oct 20, 2012 at 09:01:43PM -0700, Yinghai Lu wrote:
>>>> On Sat, Oct 20, 2012 at 5:17 PM, Tom Rini <[email protected]> wrote:
>>>>> On 10/20/12 17:11, Shin, Jacob wrote:
>>>>>> Hi could you please attach the dmesg output? Before rc2 is fine as well.
>>>>>> I would like to see the E820 table. Thank you,
>>>>>
>>>>> dmesg is quite long so I've put it on pastebin: http://pastebin.com/4eSPEAvB
>>>>>
>>>>> --
>>>>
>>>> [ 0.000000] BIOS-e820: [mem 0x0000000100001000-0x000000042fffffff] usable
>>>>
>>>> pre-calculate table size is too small, so it crashes.
>>>
>>> Right,
>>>
>>> I think just this one patch 3/6 on top of -rc2 should work:
>>>
>>> https://lkml.org/lkml/2012/8/29/223
>>>
>>> That would be a simpler path for 3.7,
>>
>> It doesn't apply easily (for me) on top of 3.7-rc2 however. Happy to
>> test a patch on top of 3.7-rc2 when you're able to.
>
> Ah, sorry, this one should apply on top of 3.7-rc2:
>
> https://lkml.org/lkml/2012/8/24/469
>
> Could you try that? Just that single patch, not the whole patchset.

That fixes it, replied with a note and Tested-by, thanks!

--
Tom

2012-10-22 14:40:30

by Jacob Shin

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On Sun, Oct 21, 2012 at 02:23:58PM -0700, Tom Rini wrote:
> On 10/21/12 14:06, Jacob Shin wrote:
> > Ah, sorry, this one should apply on top of 3.7-rc2:
> >
> > https://lkml.org/lkml/2012/8/24/469
> >
> > Could you try that? Just that single patch, not the whole patchset.
>
> That fixes it, replied with a note and Tested-by, thanks!

Thanks for testing!

hpa, so sorry, but it looks like we need one more patch [PATCH 2/5] x86:
find_early_table_space based on memory ranges that are being mapped:

https://lkml.org/lkml/2012/8/24/469

on top of this, because find_early_table_space calculation does not come out
correctly for this particular E820 table that Tom has:

http://pastebin.com/4eSPEAvB

The reason we hit this now, and never hit it before, is that previously the
start was hard-coded to 1UL<<32.

Thanks,

-Jacob

>
> --
> Tom
>
>
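
As a rough illustration of why the start address matters (this is not code
from the thread; head_4k_pages() and round_up_pow2() are names made up for
this sketch), the snippet below counts how many 4K pages lie between a range's
start and the next 2M boundary. A start hard-coded to 1UL<<32 is already 2M
aligned, whereas the usable range in Tom's E820 table starts at 0x100001000,
so the head of the range has to be mapped with small pages, and those need
PTE space that a calculation based only on the end address does not count.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define PMD_SIZE   (1ULL << PMD_SHIFT)

/* Round x up to the next multiple of a power-of-two alignment. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: number of 4K pages between a page-aligned start
 * and the next 2M boundary, capped at end. These pages cannot be covered
 * by a 2M mapping, so each one needs its own PTE.
 */
static uint64_t head_4k_pages(uint64_t start, uint64_t end)
{
	uint64_t boundary = round_up_pow2(start, PMD_SIZE);

	if (boundary > end)
		boundary = end;
	return (boundary - start) >> PAGE_SHIFT;
}
```

Calling head_4k_pages() with the old hard-coded start and with the start from
Tom's table shows the difference between an aligned and an unaligned head.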

2012-10-22 18:05:34

by Yinghai Lu

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On Mon, Oct 22, 2012 at 7:40 AM, Jacob Shin <[email protected]> wrote:
> On Sun, Oct 21, 2012 at 02:23:58PM -0700, Tom Rini wrote:
>> On 10/21/12 14:06, Jacob Shin wrote:
>> > Ah, sorry, this one should apply on top of 3.7-rc2:
>> >
>> > https://lkml.org/lkml/2012/8/24/469
>> >
>> > Could you try that? Just that single patch, not the whole patchset.
>>
>> That fixes it, replied with a note and Tested-by, thanks!
>
> Thanks for testing!
>
> hpa, so sorry, but it looks like we need one more patch [PATCH 2/5] x86:
> find_early_table_space based on memory ranges that are being mapped:
>
> https://lkml.org/lkml/2012/8/24/469
>
> on top of this, because find_early_table_space calculation does not come out
> correctly for this particular E820 table that Tom has:
>
> http://pastebin.com/4eSPEAvB
>
> The reason why we hit this now, and never hit it before is because before the
> start was hard coded to 1UL<<32.
>

I'm afraid that we may need to add more patches to make v3.7 really
handle every corner case.

During testing, I found more problems:
1. E820_RAM and E820_RESERVED_KERN
EFI changes some E820_RAM to E820_RESERVED_KERN to cover
efi setup_data, and that is passed via e820_saved to the next kexec-ed kernel.
So we cannot loop over E820_RAM alone; we should still treat E820_RAM and
E820_RESERVED_KERN as combined.
Otherwise we will build the page table with small pages, or some partial
ranges will not be covered.
So I changed to for_each_mem_pfn_range(): we fill memblock with
E820_RAM and E820_RESERVED_KERN, and memblock will merge the
ranges together, so the mapping can still use big page sizes.

2. partial page:
E820 or the user could pass a memmap that is not page aligned.
The old code was guarded by max_low_pfn and max_pfn, so the end partial
page was trimmed off, and memblock could not use it.
A middle partial page would still get covered by the direct mapping, and
memblock could still use it.
Now we will not map a middle partial page, but memblock will still try to use it,
so we could get a panic when accessing those pages.

So I would suggest just reverting that temporary patch at this time,
and later coming up with one complete patch for stable kernels.

Thanks

Yinghai
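
The merging argument in point 1 can be sketched in userspace C. This is only
an illustration under stated assumptions: struct range and merge_ranges() are
hypothetical names, not memblock's API, and the input is assumed to be sorted
by start address. Once neighbouring E820_RAM and E820_RESERVED_KERN pieces are
folded into one range, the mapping code sees a single large span instead of
several small ones and can keep using large pages.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory region; end is exclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Merge touching or overlapping ranges in place. The input must be
 * sorted by start address. Returns the number of ranges left.
 */
static size_t merge_ranges(struct range *r, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return 0;

	for (size_t i = 1; i < n; i++) {
		if (r[i].start <= r[out].end) {
			/* Adjacent or overlapping: extend the current range. */
			if (r[i].end > r[out].end)
				r[out].end = r[i].end;
		} else {
			/* Real gap: start a new output range. */
			r[++out] = r[i];
		}
	}
	return out + 1;
}
```

With a RAM range, a small reserved-kernel range and another RAM range placed
back to back, the three collapse into one; a range separated by a real hole
stays separate.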

2012-10-22 18:39:15

by Jacob Shin

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On Mon, Oct 22, 2012 at 11:05:29AM -0700, Yinghai Lu wrote:
> On Mon, Oct 22, 2012 at 7:40 AM, Jacob Shin <[email protected]> wrote:
> > On Sun, Oct 21, 2012 at 02:23:58PM -0700, Tom Rini wrote:
> >> On 10/21/12 14:06, Jacob Shin wrote:
> >> > Ah, sorry, this one should apply on top of 3.7-rc2:
> >> >
> >> > https://lkml.org/lkml/2012/8/24/469
> >> >
> >> > Could you try that? Just that single patch, not the whole patchset.
> >>
> >> That fixes it, replied with a note and Tested-by, thanks!
> >
> > Thanks for testing!
> >
> > hpa, so sorry, but it looks like we need one more patch [PATCH 2/5] x86:
> > find_early_table_space based on memory ranges that are being mapped:
> >
> > https://lkml.org/lkml/2012/8/24/469
> >
> > on top of this, because find_early_table_space calculation does not come out
> > correctly for this particular E820 table that Tom has:
> >
> > http://pastebin.com/4eSPEAvB
> >
> > The reason why we hit this now, and never hit it before is because before the
> > start was hard coded to 1UL<<32.
> >
>
> I'm afraid that we may need add more patches to make v3.7 really
> handle every corner case.
>
> During testing, I found more problem:
> 1. E820_RAM and E820_RESEVED_KERN
> EFI change some E820_RAM to E820_RESREVED_KERN to cover
> efi setup_data. and will pass to e820_saved, to next kexec-ed kernel.
> So we can use E820_RAM to loop it, and should still E820_RAM and
> E820_RESERVED_KERN combined.
> otherwise will render page table with small pages, or every some partial
> is not covered.
> So i change to for_each_mem_pfn_range(), we fill the memblock with
> E820_RAM and E820_RESERVED_KERN, and memblock will merge
> range together, that will make mapping still use big page size.

Does EFI do this for memory above 4G? All the EFI BIOSes we have in house
appear to touch only memory under 4G.

>
> 2. partial page:
> E820 or user could pass memmap that is not page aligned.
> old cold will guarded by max_low_pfn and max_pfn. so the end partial
> page will be trimmed down, and memblock can one use it.
> middle partial page will still get covered by directly mapping, and
> memblock still can use them.
> Now we will not map middle partial page and memblock still try to use it
> we could get panic when accessing those pages.
>
> So I would suggest to just revert that temporary patch at this time,
> and later come out one complete patch for stable kernels.

Hm okay, I was hoping not, but if it has to be ..

>
> Thanks
>
> Yinghai
>

2012-10-22 19:46:39

by Yinghai Lu

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On Mon, Oct 22, 2012 at 11:38 AM, Jacob Shin <[email protected]> wrote:
>
> Does EFI do this on above 4G memory? All the EFI BIOSes we have in house looked
> to be only touching under 4G.

I have no idea about it.

>
>>
>> 2. partial page:
>> E820 or user could pass memmap that is not page aligned.
>> old cold will guarded by max_low_pfn and max_pfn. so the end partial
>> page will be trimmed down, and memblock can one use it.
>> middle partial page will still get covered by directly mapping, and
>> memblock still can use them.
>> Now we will not map middle partial page and memblock still try to use it
>> we could get panic when accessing those pages.
>>
>> So I would suggest to just revert that temporary patch at this time,
>> and later come out one complete patch for stable kernels.
>
> Hm okay, I was hoping not, but if it has to be ..

It's hpa's call.

2012-10-22 20:26:25

by H. Peter Anvin

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On 10/22/2012 12:46 PM, Yinghai Lu wrote:
> On Mon, Oct 22, 2012 at 11:38 AM, Jacob Shin <[email protected]> wrote:
>>
>> Does EFI do this on above 4G memory? All the EFI BIOSes we have in house looked
>> to be only touching under 4G.
>
> I have no idea about it.
>

I don't think we can rely on what is happening right now anyway.

>>> 2. partial page:
>>> E820 or user could pass memmap that is not page aligned.
>>> old cold will guarded by max_low_pfn and max_pfn. so the end partial
>>> page will be trimmed down, and memblock can one use it.
>>> middle partial page will still get covered by directly mapping, and
>>> memblock still can use them.
>>> Now we will not map middle partial page and memblock still try to use it
>>> we could get panic when accessing those pages.
>>>
>>> So I would suggest to just revert that temporary patch at this time,
>>> and later come out one complete patch for stable kernels.
>>
>> Hm okay, I was hoping not, but if it has to be ..
>
> It's hpa's call.

So the issue is that two E820 RAM ranges (or ACPI, or kernel-reserved)
are immediately adjacent on a non-page-aligned address? Or is there a
gap in between and memblock is still expecting to use it?

We should not map a partial page at the end of RAM; it is functionally
lost. Two immediately adjacent pages could be coalesced, but not a
partial page that abuts I/O space (and yes, such abortions can happen in
the real world.)

However, the issue obviously is that what we can realistically put in
3.7 or stable is limited at this point.

-hpa


2012-10-22 20:50:41

by Yinghai Lu

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On Mon, Oct 22, 2012 at 1:26 PM, H. Peter Anvin <[email protected]> wrote:
>>>> 2. partial page:
>>>> E820 or user could pass memmap that is not page aligned.
>>>> old cold will guarded by max_low_pfn and max_pfn. so the end partial
>>>> page will be trimmed down, and memblock can one use it.
>>>> middle partial page will still get covered by directly mapping, and
>>>> memblock still can use them.
>>>> Now we will not map middle partial page and memblock still try to use it
>>>> we could get panic when accessing those pages.
>>>>
>>>> So I would suggest to just revert that temporary patch at this time,
>>>> and later come out one complete patch for stable kernels.
>>>
>>> Hm okay, I was hoping not, but if it has to be ..
>>
>> It's hpa's call.
>
> So the issue is that two E820 RAM ranges (or ACPI, or kernel-reserved)
> are immediately adjacent on a non-page-aligned address?

Yes, or the user takes out a range that is not page aligned.

> Or is there a
> gap in between and memblock is still expecting to use it?

Yes, the current implementation does, and init_memory_mapping maps those
partial pages and holes.

>
> We should not map a partial page at the end of RAM; it is functionally
> lost.

Right now we do not; we have max_low_pfn and max_pfn to cap off the end partial page.

> Two immediately adjacent pages could be coalesced, but not a
> partial page that abuts I/O space (and yes, such abortions can happen in
> the real world.)
>
> However, the issue obviously is that what we can realistically put in
> 3.7 or stable is limited at this point.

OK, let's see whether we can hit this extreme corner case other than when
the user specifies a non-page-aligned "memmap=".

Thanks

Yinghai

2012-10-22 21:00:53

by H. Peter Anvin

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On 10/22/2012 01:50 PM, Yinghai Lu wrote:
>>
>> We should not map a partial page at the end of RAM; it is functionally
>> lost.
>
> Now we did not, we have max_low_pfn, and max_pfn to cap out end partial page.
>

Well, it is not just end of RAM, which is where the entire current
implementation falls apart, obviously.

-hpa

2012-10-22 21:06:56

by Yinghai Lu

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On Mon, Oct 22, 2012 at 2:00 PM, H. Peter Anvin <[email protected]> wrote:
> On 10/22/2012 01:50 PM, Yinghai Lu wrote:
>>>
>>> We should not map a partial page at the end of RAM; it is functionally
>>> lost.
>>
>> Now we did not, we have max_low_pfn, and max_pfn to cap out end partial page.
>>
>
> Well, it is not just end of RAM, which is where the entire current
> implementation falls apart, obviously.
>

OK, I will fix that in memblock_x86_fill():

after we put the E820_RAM and E820_RESERVED_KERN ranges into memblock, do
one trim of memblock.memory.

Thanks

Yinghai Lu

2012-10-22 21:25:29

by Yinghai Lu

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On Mon, Oct 22, 2012 at 1:52 PM, H. Peter Anvin <[email protected]> wrote:
> On 10/22/2012 01:50 PM, Yinghai Lu wrote:
>> ok, let's see if we can meet this extreme corner case except user
>> specify not page aligned "memmap="
>
> If it is *only* memmap= there is a very simple solution: if the memmap
> is RAM then we round up the starting address and round down the end
> address; if the memmap is not RAM then we round up instead...

We can never be sure that the BIOS guys will not let the BIOS produce a crazy E820 map.

2012-10-22 21:27:53

by H. Peter Anvin

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On 10/22/2012 02:25 PM, Yinghai Lu wrote:
> On Mon, Oct 22, 2012 at 1:52 PM, H. Peter Anvin <[email protected]> wrote:
>> On 10/22/2012 01:50 PM, Yinghai Lu wrote:
>>> ok, let's see if we can meet this extreme corner case except user
>>> specify not page aligned "memmap="
>>
>> If it is *only* memmap= there is a very simple solution: if the memmap
>> is RAM then we round up the starting address and round down the end
>> address; if the memmap is not RAM then we round up instead...
>
> We never know that bios guys will not let bios produce crazy e820 map.
>

Yeah, well, that just *will* happen... that's a given.

We can trim those ranges, though. Who cares if we lose some RAM.

-hpa

2012-10-22 22:23:20

by H. Peter Anvin

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On 10/22/2012 01:50 PM, Yinghai Lu wrote:
> ok, let's see if we can meet this extreme corner case except user
> specify not page aligned "memmap="

If it is *only* memmap= there is a very simple solution: if the memmap
is RAM then we round up the starting address and round down the end
address; if the memmap is not RAM then we round up instead...

-hpa
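
A minimal sketch of that rounding rule, under one assumption that should be
stated loudly: "round up instead" for non-RAM is read here as growing the
region outward (start rounded down, end rounded up), so that a reserved area
never shrinks. The struct and the align_memmap() helper are hypothetical and
are not taken from the kernel's memmap= parser.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Hypothetical result type; end is exclusive. */
struct region {
	uint64_t start;
	uint64_t end;
	bool valid;	/* false if nothing is left after rounding */
};

/*
 * RAM shrinks inward to whole pages; anything else grows outward
 * (an interpretation of the rule above, not confirmed by the thread).
 */
static struct region align_memmap(uint64_t start, uint64_t end, bool is_ram)
{
	struct region r;

	if (is_ram) {
		r.start = (start + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
		r.end = end & ~(PAGE_SIZE - 1);
	} else {
		r.start = start & ~(PAGE_SIZE - 1);
		r.end = (end + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
	}
	r.valid = r.start < r.end;
	return r;
}
```

A RAM region smaller than one page can vanish entirely under this rule, which
is why the sketch carries a valid flag rather than returning a bare range.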

2012-10-22 23:35:23

by Yinghai Lu

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On Mon, Oct 22, 2012 at 2:27 PM, H. Peter Anvin <[email protected]> wrote:
>>
>> We never know that bios guys will not let bios produce crazy e820 map.
>>
>
> Yeah, well, that just *will* happen... that's a given.
>
> We can trim those ranges, though. Who cares if we lose some RAM.
>

Please check the two attached patches that handle partial pages for 3.7.

You still need the patch in
https://lkml.org/lkml/2012/8/24/469

to address the early page table size calculation problem for Tom Rini.

Thanks

Yinghai


Attachments:
memblock_trim_memory.patch (2.39 kB)
use_for_each_mem_pfn_range_setup.patch (1.29 kB)

2012-10-24 16:48:18

by Jacob Shin

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On Mon, Oct 22, 2012 at 04:35:18PM -0700, Yinghai Lu wrote:
> On Mon, Oct 22, 2012 at 2:27 PM, H. Peter Anvin <[email protected]> wrote:
> >>
> >> We never know that bios guys will not let bios produce crazy e820 map.
> >>
> >
> > Yeah, well, that just *will* happen... that's a given.
> >
> > We can trim those ranges, though. Who cares if we lose some RAM.
> >
>
> please check attached two patches that handle partial pages for 3.7.
>
> and you still need patch in
> https://lkml.org/lkml/2012/8/24/469
>
> to address early page table size calculation problem for Tom Rini

Acked-by: Jacob Shin <[email protected]>

hpa, we need this patch: https://lkml.org/lkml/2012/8/24/469 and the above
2 from Yinghai to handle corner case E820 layouts.

I got an email from Greg KH that 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a is
queued for stable, so these need to go to stable as well.

Thanks,

-Jacob

>
> Thanks
>
> Yinghai



2012-10-24 18:53:30

by H. Peter Anvin

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On 10/24/2012 09:48 AM, Jacob Shin wrote:
>
> hpa, we need this patch: https://lkml.org/lkml/2012/8/24/469 and the above
> 2 from Yinghai to handle corner case E820 layouts.
>

I can apply Yinghai's patches, but the above patch no longer applies.
Could you refresh it on top of tip:x86/u, please?

-hpa

2012-10-24 19:01:48

by Yinghai Lu

Subject: [tip:x86/urgent] x86, mm: Trim memory in memblock to be page aligned

Commit-ID: 6ede1fd3cb404c0016de6ac529df46d561bd558b
Gitweb: http://git.kernel.org/tip/6ede1fd3cb404c0016de6ac529df46d561bd558b
Author: Yinghai Lu <[email protected]>
AuthorDate: Mon, 22 Oct 2012 16:35:18 -0700
Committer: H. Peter Anvin <[email protected]>
CommitDate: Wed, 24 Oct 2012 11:52:21 -0700

x86, mm: Trim memory in memblock to be page aligned

We will not map partial pages, so need to make sure memblock
allocation will not allocate those bytes out.

Also we will use for_each_mem_pfn_range() to loop to map memory
range to keep them consistent.

Signed-off-by: Yinghai Lu <[email protected]>
Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.com
Acked-by: Jacob Shin <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
Cc: <[email protected]>
---
arch/x86/kernel/e820.c | 3 +++
include/linux/memblock.h | 1 +
mm/memblock.c | 24 ++++++++++++++++++++++++
3 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index ed858e9..df06ade 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1077,6 +1077,9 @@ void __init memblock_x86_fill(void)
 		memblock_add(ei->addr, ei->size);
 	}

+	/* throw away partial pages */
+	memblock_trim_memory(PAGE_SIZE);
+
 	memblock_dump_all();
 }

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 569d67d..d452ee1 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -57,6 +57,7 @@ int memblock_add(phys_addr_t base, phys_addr_t size);
 int memblock_remove(phys_addr_t base, phys_addr_t size);
 int memblock_free(phys_addr_t base, phys_addr_t size);
 int memblock_reserve(phys_addr_t base, phys_addr_t size);
+void memblock_trim_memory(phys_addr_t align);

 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
diff --git a/mm/memblock.c b/mm/memblock.c
index 931eef1..6259055 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -930,6 +930,30 @@ int __init_memblock memblock_is_region_reserved(phys_addr_t base, phys_addr_t si
 	return memblock_overlaps_region(&memblock.reserved, base, size) >= 0;
 }

+void __init_memblock memblock_trim_memory(phys_addr_t align)
+{
+	int i;
+	phys_addr_t start, end, orig_start, orig_end;
+	struct memblock_type *mem = &memblock.memory;
+
+	for (i = 0; i < mem->cnt; i++) {
+		orig_start = mem->regions[i].base;
+		orig_end = mem->regions[i].base + mem->regions[i].size;
+		start = round_up(orig_start, align);
+		end = round_down(orig_end, align);
+
+		if (start == orig_start && end == orig_end)
+			continue;
+
+		if (start < end) {
+			mem->regions[i].base = start;
+			mem->regions[i].size = end - start;
+		} else {
+			memblock_remove_region(mem, i);
+			i--;
+		}
+	}
+}

 void __init_memblock memblock_set_current_limit(phys_addr_t limit)
 {
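
For readers who want to try the trimming logic outside the kernel, here is a
userspace approximation. It is a sketch, not the kernel code: struct region
and trim_regions() are made-up names, the alignment is assumed to be a power
of two, and region removal is modelled with a plain memmove() instead of
memblock_remove_region().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a memblock region. */
struct region {
	uint64_t base;
	uint64_t size;
};

/*
 * Shrink every region to whole multiples of align (a power of two) and
 * drop regions that contain no complete aligned block. Returns the new
 * region count.
 */
static size_t trim_regions(struct region *regs, size_t cnt, uint64_t align)
{
	size_t i = 0;

	while (i < cnt) {
		uint64_t orig_end = regs[i].base + regs[i].size;
		uint64_t start = (regs[i].base + align - 1) & ~(align - 1);
		uint64_t end = orig_end & ~(align - 1);

		if (start < end) {
			regs[i].base = start;
			regs[i].size = end - start;
			i++;
		} else {
			/* Nothing usable left: close the gap in the array. */
			memmove(&regs[i], &regs[i + 1],
				(cnt - i - 1) * sizeof(regs[0]));
			cnt--;
		}
	}
	return cnt;
}
```

An already aligned region passes through unchanged, a sub-page fragment is
dropped, and a region with ragged edges loses its partial first and last pages.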

2012-10-24 19:02:47

by Yinghai Lu

Subject: [tip:x86/urgent] x86, mm: Use memblock memory loop instead of e820_RAM

Commit-ID: 1f2ff682ac951ed82cc043cf140d2851084512df
Gitweb: http://git.kernel.org/tip/1f2ff682ac951ed82cc043cf140d2851084512df
Author: Yinghai Lu <[email protected]>
AuthorDate: Mon, 22 Oct 2012 16:35:18 -0700
Committer: H. Peter Anvin <[email protected]>
CommitDate: Wed, 24 Oct 2012 11:52:36 -0700

x86, mm: Use memblock memory loop instead of e820_RAM

We need to handle E820_RAM and E820_RESERVED_KERNEL at the same time.

Also memblock has page aligned range for ram, so we could avoid mapping
partial pages.

Signed-off-by: Yinghai Lu <[email protected]>
Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.com
Acked-by: Jacob Shin <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
Cc: <[email protected]>
---
arch/x86/kernel/setup.c | 15 ++++++++-------
1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 468e98d..5d888af 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -921,18 +921,19 @@ void __init setup_arch(char **cmdline_p)
 #ifdef CONFIG_X86_64
 	if (max_pfn > max_low_pfn) {
 		int i;
-		for (i = 0; i < e820.nr_map; i++) {
-			struct e820entry *ei = &e820.map[i];
+		unsigned long start, end;
+		unsigned long start_pfn, end_pfn;

-			if (ei->addr + ei->size <= 1UL << 32)
-				continue;
+		for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn,
+							 NULL) {

-			if (ei->type == E820_RESERVED)
+			end = PFN_PHYS(end_pfn);
+			if (end <= (1UL<<32))
 				continue;

+			start = PFN_PHYS(start_pfn);
 			max_pfn_mapped = init_memory_mapping(
-				ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr,
-				ei->addr + ei->size);
+						max((1UL<<32), start), end);
 		}

 		/* can we preseve max_low_pfn ?*/
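
The clamping done inside the new loop can be modelled in a few lines of
standalone C. This is an illustrative sketch only: high_mapping_span() and
struct span are hypothetical names, and the pfn ranges are passed in directly
rather than coming from for_each_mem_pfn_range().

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define FOUR_GB    (1ULL << 32)

/* Hypothetical result type; end is exclusive. */
struct span {
	uint64_t start;
	uint64_t end;
};

/*
 * Given one memory range in page frame numbers, compute the part that
 * lies above 4 GB. Returns false when the range ends at or below 4 GB
 * and therefore needs no extra mapping call.
 */
static bool high_mapping_span(uint64_t start_pfn, uint64_t end_pfn,
			      struct span *out)
{
	uint64_t start = start_pfn << PAGE_SHIFT;
	uint64_t end = end_pfn << PAGE_SHIFT;

	if (end <= FOUR_GB)
		return false;

	out->start = start > FOUR_GB ? start : FOUR_GB;
	out->end = end;
	return true;
}
```

Because the inputs are page frame numbers, both ends of every span are page
aligned by construction, which is the property the commit message relies on.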

2012-10-24 19:53:24

by Jacob Shin

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On Wed, Oct 24, 2012 at 11:53:16AM -0700, H. Peter Anvin wrote:
> On 10/24/2012 09:48 AM, Jacob Shin wrote:
> >
> > hpa, we need this patch: https://lkml.org/lkml/2012/8/24/469 and the above
> > 2 from Yinghai to handle corner case E820 layouts.
> >
>
> I can apply Yinghai's patches, but the above patch no longer applies.
> Could you refresh it on top of tip:x86/u, please?

Sorry about that, it applied to Linus's 3.7-rc2 so I just assumed .. :-(

From 7d2a67f6b435ede202bdf5d1982f9b5af90cce34 Mon Sep 17 00:00:00 2001
From: Jacob Shin <[email protected]>
Date: Wed, 24 Oct 2012 14:24:44 -0500
Subject: [PATCH] x86/mm: find_early_table_space based on ranges that are
actually being mapped

Current logic finds enough space for direct mapping page tables from 0
to end. Instead, we only need to find enough space to cover mr[0].start
to mr[nr_range - 1].end -- the range that is actually being mapped by
init_memory_mapping()

This is needed after 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a, to address
the panic reported here:

https://lkml.org/lkml/2012/10/20/160
https://lkml.org/lkml/2012/10/21/157

Signed-off-by: Jacob Shin <[email protected]>
Tested-by: Tom Rini <[email protected]>

---
arch/x86/mm/init.c | 70 ++++++++++++++++++++++++++++++----------------------
1 file changed, 41 insertions(+), 29 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 8653b3a..bc287d6 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -29,36 +29,54 @@ int direct_gbpages
 #endif
 ;

-static void __init find_early_table_space(unsigned long end, int use_pse,
-					  int use_gbpages)
+struct map_range {
+	unsigned long start;
+	unsigned long end;
+	unsigned page_size_mask;
+};
+
+/*
+ * First calculate space needed for kernel direct mapping page tables to cover
+ * mr[0].start to mr[nr_range - 1].end, while accounting for possible 2M and 1GB
+ * pages. Then find enough contiguous space for those page tables.
+ */
+static void __init find_early_table_space(struct map_range *mr, int nr_range)
 {
-	unsigned long puds, pmds, ptes, tables, start = 0, good_end = end;
+	int i;
+	unsigned long puds = 0, pmds = 0, ptes = 0, tables;
+	unsigned long start = 0, good_end;
 	phys_addr_t base;

-	puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
-	tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
+	for (i = 0; i < nr_range; i++) {
+		unsigned long range, extra;

-	if (use_gbpages) {
-		unsigned long extra;
+		range = mr[i].end - mr[i].start;
+		puds += (range + PUD_SIZE - 1) >> PUD_SHIFT;

-		extra = end - ((end>>PUD_SHIFT) << PUD_SHIFT);
-		pmds = (extra + PMD_SIZE - 1) >> PMD_SHIFT;
-	} else
-		pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
-
-	tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
+		if (mr[i].page_size_mask & (1 << PG_LEVEL_1G)) {
+			extra = range - ((range >> PUD_SHIFT) << PUD_SHIFT);
+			pmds += (extra + PMD_SIZE - 1) >> PMD_SHIFT;
+		} else {
+			pmds += (range + PMD_SIZE - 1) >> PMD_SHIFT;
+		}

-	if (use_pse) {
-		unsigned long extra;
-
-		extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
+		if (mr[i].page_size_mask & (1 << PG_LEVEL_2M)) {
+			extra = range - ((range >> PMD_SHIFT) << PMD_SHIFT);
 #ifdef CONFIG_X86_32
-		extra += PMD_SIZE;
+			extra += PMD_SIZE;
 #endif
-		ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	} else
-		ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;
+			/* The first 2/4M doesn't use large pages. */
+			if (mr[i].start < PMD_SIZE)
+				extra += range;
+
+			ptes += (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		} else {
+			ptes += (range + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		}
+	}

+	tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
+	tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
 	tables += roundup(ptes * sizeof(pte_t), PAGE_SIZE);

 #ifdef CONFIG_X86_32
@@ -76,7 +94,7 @@ static void __init find_early_table_space(unsigned long end, int use_pse,
 	pgt_buf_top = pgt_buf_start + (tables >> PAGE_SHIFT);

 	printk(KERN_DEBUG "kernel direct mapping tables up to %#lx @ [mem %#010lx-%#010lx]\n",
-		end - 1, pgt_buf_start << PAGE_SHIFT,
+		mr[nr_range - 1].end - 1, pgt_buf_start << PAGE_SHIFT,
 		(pgt_buf_top << PAGE_SHIFT) - 1);
 }

@@ -85,12 +103,6 @@ void __init native_pagetable_reserve(u64 start, u64 end)
 	memblock_reserve(start, end - start);
 }

-struct map_range {
-	unsigned long start;
-	unsigned long end;
-	unsigned page_size_mask;
-};
-
 #ifdef CONFIG_X86_32
 #define NR_RANGE_MR 3
 #else /* CONFIG_X86_64 */
@@ -263,7 +275,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 	 * nodes are discovered.
 	 */
 	if (!after_bootmem)
-		find_early_table_space(end, use_pse, use_gbpages);
+		find_early_table_space(mr, nr_range);

 	for (i = 0; i < nr_range; i++)
 		ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
--
1.7.9.5
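
A minimal user-space sketch of the per-range accounting in the patch above, for trying numbers by hand. It assumes x86-64 constants (4K pages, 2M PMD, 1G PUD, 8-byte table entries), leaves out the CONFIG_X86_32 adjustment and the "first 2/4M" lines (which a later patch in this thread removes again), and uses hypothetical names (early_table_bytes, page_round, MASK_2M, MASK_1G, ENTRY_SIZE) that are not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 geometry; the kernel takes these from its own headers. */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define PUD_SHIFT  30
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define PMD_SIZE   (UINT64_C(1) << PMD_SHIFT)
#define PUD_SIZE   (UINT64_C(1) << PUD_SHIFT)
#define ENTRY_SIZE 8 /* assumed size of one pud/pmd/pte entry */

/* Stand-ins for (1 << PG_LEVEL_2M) and (1 << PG_LEVEL_1G). */
#define MASK_2M (1u << 2)
#define MASK_1G (1u << 3)

struct map_range {
	uint64_t start;
	uint64_t end;
	unsigned page_size_mask;
};

/* Round a byte count up to whole pages, like roundup(x, PAGE_SIZE). */
uint64_t page_round(uint64_t bytes)
{
	return (bytes + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/* Sum the table entries each range needs, then convert to bytes. */
uint64_t early_table_bytes(const struct map_range *mr, int nr_range)
{
	uint64_t puds = 0, pmds = 0, ptes = 0;

	for (int i = 0; i < nr_range; i++) {
		uint64_t range, extra;

		assert(mr[i].end >= mr[i].start);
		range = mr[i].end - mr[i].start;
		puds += (range + PUD_SIZE - 1) >> PUD_SHIFT;

		if (mr[i].page_size_mask & MASK_1G) {
			/* Only the part that does not fill a 1G page needs PMDs. */
			extra = range - ((range >> PUD_SHIFT) << PUD_SHIFT);
			pmds += (extra + PMD_SIZE - 1) >> PMD_SHIFT;
		} else {
			pmds += (range + PMD_SIZE - 1) >> PMD_SHIFT;
		}

		if (mr[i].page_size_mask & MASK_2M) {
			/* Only the part that does not fill a 2M page needs PTEs. */
			extra = range - ((range >> PMD_SHIFT) << PMD_SHIFT);
			ptes += (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
		} else {
			ptes += (range + PAGE_SIZE - 1) >> PAGE_SHIFT;
		}
	}

	return page_round(puds * ENTRY_SIZE) +
	       page_round(pmds * ENTRY_SIZE) +
	       page_round(ptes * ENTRY_SIZE);
}
```

Under these assumptions a single 2 MiB range mapped with 4K pages comes to three pages of tables (one each for PUD, PMD and PTE entries), and a 1G-aligned 1 GiB range that may use 1G pages comes to one.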

2012-10-24 21:49:51

by Jacob Shin

Subject: [tip:x86/urgent] x86, mm: Find_early_table_space based on ranges that are actually being mapped

Commit-ID: 844ab6f993b1d32eb40512503d35ff6ad0c57030
Gitweb: http://git.kernel.org/tip/844ab6f993b1d32eb40512503d35ff6ad0c57030
Author: Jacob Shin <[email protected]>
AuthorDate: Wed, 24 Oct 2012 14:24:44 -0500
Committer: H. Peter Anvin <[email protected]>
CommitDate: Wed, 24 Oct 2012 13:37:04 -0700

x86, mm: Find_early_table_space based on ranges that are actually being mapped

Current logic finds enough space for direct mapping page tables from 0
to end. Instead, we only need to find enough space to cover mr[0].start
to mr[nr_range - 1].end -- the range that is actually being mapped by
init_memory_mapping().

This is needed after 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a, to address
the panic reported here:

https://lkml.org/lkml/2012/10/20/160
https://lkml.org/lkml/2012/10/21/157

Signed-off-by: Jacob Shin <[email protected]>
Link: http://lkml.kernel.org/r/20121024195311.GB11779@jshin-Toonie
Tested-by: Tom Rini <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/mm/init.c | 70 ++++++++++++++++++++++++++++++---------------------
1 files changed, 41 insertions(+), 29 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 8653b3a..bc287d6 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -29,36 +29,54 @@ int direct_gbpages
#endif
;

-static void __init find_early_table_space(unsigned long end, int use_pse,
- int use_gbpages)
+struct map_range {
+ unsigned long start;
+ unsigned long end;
+ unsigned page_size_mask;
+};
+
+/*
+ * First calculate space needed for kernel direct mapping page tables to cover
+ * mr[0].start to mr[nr_range - 1].end, while accounting for possible 2M and 1GB
+ * pages. Then find enough contiguous space for those page tables.
+ */
+static void __init find_early_table_space(struct map_range *mr, int nr_range)
{
- unsigned long puds, pmds, ptes, tables, start = 0, good_end = end;
+ int i;
+ unsigned long puds = 0, pmds = 0, ptes = 0, tables;
+ unsigned long start = 0, good_end;
phys_addr_t base;

- puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
- tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
+ for (i = 0; i < nr_range; i++) {
+ unsigned long range, extra;

- if (use_gbpages) {
- unsigned long extra;
+ range = mr[i].end - mr[i].start;
+ puds += (range + PUD_SIZE - 1) >> PUD_SHIFT;

- extra = end - ((end>>PUD_SHIFT) << PUD_SHIFT);
- pmds = (extra + PMD_SIZE - 1) >> PMD_SHIFT;
- } else
- pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
-
- tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
+ if (mr[i].page_size_mask & (1 << PG_LEVEL_1G)) {
+ extra = range - ((range >> PUD_SHIFT) << PUD_SHIFT);
+ pmds += (extra + PMD_SIZE - 1) >> PMD_SHIFT;
+ } else {
+ pmds += (range + PMD_SIZE - 1) >> PMD_SHIFT;
+ }

- if (use_pse) {
- unsigned long extra;
-
- extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
+ if (mr[i].page_size_mask & (1 << PG_LEVEL_2M)) {
+ extra = range - ((range >> PMD_SHIFT) << PMD_SHIFT);
#ifdef CONFIG_X86_32
- extra += PMD_SIZE;
+ extra += PMD_SIZE;
#endif
- ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
- } else
- ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ /* The first 2/4M doesn't use large pages. */
+ if (mr[i].start < PMD_SIZE)
+ extra += range;
+
+ ptes += (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ } else {
+ ptes += (range + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ }
+ }

+ tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
+ tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
tables += roundup(ptes * sizeof(pte_t), PAGE_SIZE);

#ifdef CONFIG_X86_32
@@ -76,7 +94,7 @@ static void __init find_early_table_space(unsigned long end, int use_pse,
pgt_buf_top = pgt_buf_start + (tables >> PAGE_SHIFT);

printk(KERN_DEBUG "kernel direct mapping tables up to %#lx @ [mem %#010lx-%#010lx]\n",
- end - 1, pgt_buf_start << PAGE_SHIFT,
+ mr[nr_range - 1].end - 1, pgt_buf_start << PAGE_SHIFT,
(pgt_buf_top << PAGE_SHIFT) - 1);
}

@@ -85,12 +103,6 @@ void __init native_pagetable_reserve(u64 start, u64 end)
memblock_reserve(start, end - start);
}

-struct map_range {
- unsigned long start;
- unsigned long end;
- unsigned page_size_mask;
-};
-
#ifdef CONFIG_X86_32
#define NR_RANGE_MR 3
#else /* CONFIG_X86_64 */
@@ -263,7 +275,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
* nodes are discovered.
*/
if (!after_bootmem)
- find_early_table_space(end, use_pse, use_gbpages);
+ find_early_table_space(mr, nr_range);

for (i = 0; i < nr_range; i++)
ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
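
For comparison, the removed lines above size the tables from end alone. The sketch below restates that pre-patch formula in user space so it can be evaluated for the E820 entry quoted earlier in the thread. It assumes x86-64 constants and 8-byte entries, omits the CONFIG_X86_32 adjustment, and assumes the end passed in is the exclusive end of that entry (0x430000000). The names old_table_bytes, page_round and ENTRY_SIZE are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 geometry; the kernel takes these from its own headers. */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define PUD_SHIFT  30
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define PMD_SIZE   (UINT64_C(1) << PMD_SHIFT)
#define PUD_SIZE   (UINT64_C(1) << PUD_SHIFT)
#define ENTRY_SIZE 8 /* assumed size of one pud/pmd/pte entry */

/* Round a byte count up to whole pages, like roundup(x, PAGE_SIZE). */
uint64_t page_round(uint64_t bytes)
{
	return (bytes + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/* Pre-patch estimate: only `end` matters, the start of the mapping does not. */
uint64_t old_table_bytes(uint64_t end, int use_pse, int use_gbpages)
{
	uint64_t puds, pmds, ptes, extra, tables;

	assert(end > 0);

	puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
	tables = page_round(puds * ENTRY_SIZE);

	if (use_gbpages) {
		/* PMDs only for the tail of `end` past the last 1G boundary. */
		extra = end - ((end >> PUD_SHIFT) << PUD_SHIFT);
		pmds = (extra + PMD_SIZE - 1) >> PMD_SHIFT;
	} else {
		pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
	}
	tables += page_round(pmds * ENTRY_SIZE);

	if (use_pse) {
		/* PTEs only for the tail of `end` past the last 2M boundary. */
		extra = end - ((end >> PMD_SHIFT) << PMD_SHIFT);
		ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
	} else {
		ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;
	}
	tables += page_round(ptes * ENTRY_SIZE);

	return tables;
}
```

With end = 0x430000000 and both flags set, this sketch reserves 8192 bytes and counts zero PTEs, because end is 2M aligned. The formula never sees that the mapped range starts at 0x100001000, which is not 2M aligned, so the 4K-mapped head of the range is not accounted for.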

2012-10-25 06:42:43

by Yinghai Lu

Subject: Re: [tip:x86/urgent] x86, mm: Find_early_table_space based on ranges that are actually being mapped

On Wed, Oct 24, 2012 at 2:49 PM, tip-bot for Jacob Shin
<[email protected]> wrote:
> Commit-ID: 844ab6f993b1d32eb40512503d35ff6ad0c57030
> Gitweb: http://git.kernel.org/tip/844ab6f993b1d32eb40512503d35ff6ad0c57030
> Author: Jacob Shin <[email protected]>
> AuthorDate: Wed, 24 Oct 2012 14:24:44 -0500
> Committer: H. Peter Anvin <[email protected]>
> CommitDate: Wed, 24 Oct 2012 13:37:04 -0700
>
> x86, mm: Find_early_table_space based on ranges that are actually being mapped
>
> Current logic finds enough space for direct mapping page tables from 0
> to end. Instead, we only need to find enough space to cover mr[0].start
> to mr[nr_range - 1].end -- the range that is actually being mapped by
> init_memory_mapping().
>
> This is needed after 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a, to address
> the panic reported here:
>
> https://lkml.org/lkml/2012/10/20/160
> https://lkml.org/lkml/2012/10/21/157
>
> Signed-off-by: Jacob Shin <[email protected]>
> Link: http://lkml.kernel.org/r/20121024195311.GB11779@jshin-Toonie
> Tested-by: Tom Rini <[email protected]>
> Signed-off-by: H. Peter Anvin <[email protected]>
> ---
> arch/x86/mm/init.c | 70 ++++++++++++++++++++++++++++++---------------------
> 1 files changed, 41 insertions(+), 29 deletions(-)
>
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 8653b3a..bc287d6 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -29,36 +29,54 @@ int direct_gbpages
> #endif
> ;
>
> -static void __init find_early_table_space(unsigned long end, int use_pse,
> - int use_gbpages)
> +struct map_range {
> + unsigned long start;
> + unsigned long end;
> + unsigned page_size_mask;
> +};
> +
> +/*
> + * First calculate space needed for kernel direct mapping page tables to cover
> + * mr[0].start to mr[nr_range - 1].end, while accounting for possible 2M and 1GB
> + * pages. Then find enough contiguous space for those page tables.
> + */
> +static void __init find_early_table_space(struct map_range *mr, int nr_range)
> {
> - unsigned long puds, pmds, ptes, tables, start = 0, good_end = end;
> + int i;
> + unsigned long puds = 0, pmds = 0, ptes = 0, tables;
> + unsigned long start = 0, good_end;
> phys_addr_t base;
>
> - puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
> - tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
> + for (i = 0; i < nr_range; i++) {
> + unsigned long range, extra;
>
> - if (use_gbpages) {
> - unsigned long extra;
> + range = mr[i].end - mr[i].start;
> + puds += (range + PUD_SIZE - 1) >> PUD_SHIFT;
>
> - extra = end - ((end>>PUD_SHIFT) << PUD_SHIFT);
> - pmds = (extra + PMD_SIZE - 1) >> PMD_SHIFT;
> - } else
> - pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
> -
> - tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
> + if (mr[i].page_size_mask & (1 << PG_LEVEL_1G)) {
> + extra = range - ((range >> PUD_SHIFT) << PUD_SHIFT);
> + pmds += (extra + PMD_SIZE - 1) >> PMD_SHIFT;
> + } else {
> + pmds += (range + PMD_SIZE - 1) >> PMD_SHIFT;
> + }
>
> - if (use_pse) {
> - unsigned long extra;
> -
> - extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
> + if (mr[i].page_size_mask & (1 << PG_LEVEL_2M)) {
> + extra = range - ((range >> PMD_SHIFT) << PMD_SHIFT);
> #ifdef CONFIG_X86_32
> - extra += PMD_SIZE;
> + extra += PMD_SIZE;
> #endif
> - ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
> - } else
> - ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;
> + /* The first 2/4M doesn't use large pages. */
> + if (mr[i].start < PMD_SIZE)
> + extra += range;

those three lines should be added back.

they were just reverted in 7b16bbf9

Revert "x86/mm: Fix the size calculation of mapping tables"


> +
> + ptes += (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
> + } else {
> + ptes += (range + PAGE_SIZE - 1) >> PAGE_SHIFT;
> + }
> + }
>
> + tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
> + tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
> tables += roundup(ptes * sizeof(pte_t), PAGE_SIZE);
>
> #ifdef CONFIG_X86_32
> @@ -76,7 +94,7 @@ static void __init find_early_table_space(unsigned long end, int use_pse,
> pgt_buf_top = pgt_buf_start + (tables >> PAGE_SHIFT);
>
> printk(KERN_DEBUG "kernel direct mapping tables up to %#lx @ [mem %#010lx-%#010lx]\n",
> - end - 1, pgt_buf_start << PAGE_SHIFT,
> + mr[nr_range - 1].end - 1, pgt_buf_start << PAGE_SHIFT,
> (pgt_buf_top << PAGE_SHIFT) - 1);
> }
>
> @@ -85,12 +103,6 @@ void __init native_pagetable_reserve(u64 start, u64 end)
> memblock_reserve(start, end - start);
> }
>
> -struct map_range {
> - unsigned long start;
> - unsigned long end;
> - unsigned page_size_mask;
> -};
> -
> #ifdef CONFIG_X86_32
> #define NR_RANGE_MR 3
> #else /* CONFIG_X86_64 */
> @@ -263,7 +275,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
> * nodes are discovered.
> */
> if (!after_bootmem)
> - find_early_table_space(end, use_pse, use_gbpages);
> + find_early_table_space(mr, nr_range);
>
> for (i = 0; i < nr_range; i++)
> ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,

2012-10-25 07:55:17

by Ingo Molnar

Subject: Re: [tip:x86/urgent] x86, mm: Find_early_table_space based on ranges that are actually being mapped


* Yinghai Lu <[email protected]> wrote:

> On Wed, Oct 24, 2012 at 2:49 PM, tip-bot for Jacob Shin
> <[email protected]> wrote:
> > Commit-ID: 844ab6f993b1d32eb40512503d35ff6ad0c57030
> > Gitweb: http://git.kernel.org/tip/844ab6f993b1d32eb40512503d35ff6ad0c57030
> > Author: Jacob Shin <[email protected]>
> > AuthorDate: Wed, 24 Oct 2012 14:24:44 -0500
> > Committer: H. Peter Anvin <[email protected]>
> > CommitDate: Wed, 24 Oct 2012 13:37:04 -0700
> >
> > x86, mm: Find_early_table_space based on ranges that are actually being mapped
> >
> > Current logic finds enough space for direct mapping page tables from 0
> > to end. Instead, we only need to find enough space to cover mr[0].start
> > to mr[nr_range - 1].end -- the range that is actually being mapped by
> > init_memory_mapping().
> >
> > This is needed after 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a, to address
> > the panic reported here:
> >
> > https://lkml.org/lkml/2012/10/20/160
> > https://lkml.org/lkml/2012/10/21/157
> >
> > Signed-off-by: Jacob Shin <[email protected]>
> > Link: http://lkml.kernel.org/r/20121024195311.GB11779@jshin-Toonie
> > Tested-by: Tom Rini <[email protected]>
> > Signed-off-by: H. Peter Anvin <[email protected]>
> > ---
> > arch/x86/mm/init.c | 70 ++++++++++++++++++++++++++++++---------------------
> > 1 files changed, 41 insertions(+), 29 deletions(-)
> >
> > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> > index 8653b3a..bc287d6 100644
> > --- a/arch/x86/mm/init.c
> > +++ b/arch/x86/mm/init.c
> > @@ -29,36 +29,54 @@ int direct_gbpages
> > #endif
> > ;
> >
> > -static void __init find_early_table_space(unsigned long end, int use_pse,
> > - int use_gbpages)
> > +struct map_range {
> > + unsigned long start;
> > + unsigned long end;
> > + unsigned page_size_mask;
> > +};
> > +
> > +/*
> > + * First calculate space needed for kernel direct mapping page tables to cover
> > + * mr[0].start to mr[nr_range - 1].end, while accounting for possible 2M and 1GB
> > + * pages. Then find enough contiguous space for those page tables.
> > + */
> > +static void __init find_early_table_space(struct map_range *mr, int nr_range)
> > {
> > - unsigned long puds, pmds, ptes, tables, start = 0, good_end = end;
> > + int i;
> > + unsigned long puds = 0, pmds = 0, ptes = 0, tables;
> > + unsigned long start = 0, good_end;
> > phys_addr_t base;
> >
> > - puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
> > - tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
> > + for (i = 0; i < nr_range; i++) {
> > + unsigned long range, extra;
> >
> > - if (use_gbpages) {
> > - unsigned long extra;
> > + range = mr[i].end - mr[i].start;
> > + puds += (range + PUD_SIZE - 1) >> PUD_SHIFT;
> >
> > - extra = end - ((end>>PUD_SHIFT) << PUD_SHIFT);
> > - pmds = (extra + PMD_SIZE - 1) >> PMD_SHIFT;
> > - } else
> > - pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
> > -
> > - tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);
> > + if (mr[i].page_size_mask & (1 << PG_LEVEL_1G)) {
> > + extra = range - ((range >> PUD_SHIFT) << PUD_SHIFT);
> > + pmds += (extra + PMD_SIZE - 1) >> PMD_SHIFT;
> > + } else {
> > + pmds += (range + PMD_SIZE - 1) >> PMD_SHIFT;
> > + }
> >
> > - if (use_pse) {
> > - unsigned long extra;
> > -
> > - extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
> > + if (mr[i].page_size_mask & (1 << PG_LEVEL_2M)) {
> > + extra = range - ((range >> PMD_SHIFT) << PMD_SHIFT);
> > #ifdef CONFIG_X86_32
> > - extra += PMD_SIZE;
> > + extra += PMD_SIZE;
> > #endif
> > - ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > - } else
> > - ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > + /* The first 2/4M doesn't use large pages. */
> > + if (mr[i].start < PMD_SIZE)
> > + extra += range;
>
> those three lines should be added back.
>
> it just get reverted in 7b16bbf9

Could you please send a delta patch against tip:x86/urgent?

Thanks,

Ingo

2012-10-25 14:33:36

by Yinghai Lu

Subject: Re: [tip:x86/urgent] x86, mm: Find_early_table_space based on ranges that are actually being mapped

On Thu, Oct 25, 2012 at 12:55 AM, Ingo Molnar <[email protected]> wrote:
>> > - ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;
>> > + /* The first 2/4M doesn't use large pages. */
>> > + if (mr[i].start < PMD_SIZE)
>> > + extra += range;
>>
>> those three lines should be added back.

I missed a "not" there: those three lines should not be added back.

>>
>> it just get reverted in 7b16bbf9
>
> Could you please send a delta patch against tip:x86/urgent?

please check attached one.

Thanks

Yinghai


Attachments:
remove_wrong_addback.patch (859.00 B)

2012-10-25 22:23:38

by Jacob Shin

Subject: Re: [tip:x86/urgent] x86, mm: Find_early_table_space based on ranges that are actually being mapped

On Thu, Oct 25, 2012 at 07:33:32AM -0700, Yinghai Lu wrote:
> On Thu, Oct 25, 2012 at 12:55 AM, Ingo Molnar <[email protected]> wrote:
> >> > - ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;
> >> > + /* The first 2/4M doesn't use large pages. */
> >> > + if (mr[i].start < PMD_SIZE)
> >> > + extra += range;
> >>
> >> those three lines should be added back.
>
> missed "not" ...
>
> >>
> >> it just get reverted in 7b16bbf9
> >
> > Could you please send a delta patch against tip:x86/urgent?
>
> please check attached one.

Acked-by: Jacob Shin <[email protected]>

Sorry about that. I just retrofitted the patch and didn't notice that those
lines had been reverted.

Thanks!

>
> Thanks
>
> Yinghai


2012-10-25 23:31:42

by Yinghai Lu

Subject: [tip:x86/urgent] x86, mm: Undo incorrect revert in arch/x86/mm/init.c

Commit-ID: f82f64dd9f485e13f29f369772d4a0e868e5633a
Gitweb: http://git.kernel.org/tip/f82f64dd9f485e13f29f369772d4a0e868e5633a
Author: Yinghai Lu <[email protected]>
AuthorDate: Thu, 25 Oct 2012 15:45:26 -0700
Committer: H. Peter Anvin <[email protected]>
CommitDate: Thu, 25 Oct 2012 15:45:45 -0700

x86, mm: Undo incorrect revert in arch/x86/mm/init.c

Commit

844ab6f9 x86, mm: Find_early_table_space based on ranges that are actually being mapped

wrongly added back some lines that had been removed in commit

7b16bbf97 Revert "x86/mm: Fix the size calculation of mapping tables"

Remove them again.

Signed-off-by: Yinghai Lu <[email protected]>
Link: http://lkml.kernel.org/r/CAE9FiQW_vuaYQbmagVnxT2DGsYc=9tNeAbdBq53sYkitPOwxSQ@mail.gmail.com
Acked-by: Jacob Shin <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/mm/init.c | 4 ----
1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index bc287d6..d7aea41 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -65,10 +65,6 @@ static void __init find_early_table_space(struct map_range *mr, int nr_range)
#ifdef CONFIG_X86_32
extra += PMD_SIZE;
#endif
- /* The first 2/4M doesn't use large pages. */
- if (mr[i].start < PMD_SIZE)
- extra += range;
-
ptes += (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
} else {
ptes += (range + PAGE_SIZE - 1) >> PAGE_SHIFT;
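
To see what the four removed lines did to the estimate, the sketch below computes the PTE count for one range that is allowed to use 2M pages, with and without the "first 2/4M" adjustment. It assumes x86-64 constants and omits the CONFIG_X86_32 adjustment. The function name ptes_for_2m_range and its with_first_2m_lines flag are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 geometry; the kernel takes these from its own headers. */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define PMD_SIZE   (UINT64_C(1) << PMD_SHIFT)

/*
 * PTE entries counted for a range whose page_size_mask allows 2M pages.
 * with_first_2m_lines != 0 models the code before the lines were removed.
 */
uint64_t ptes_for_2m_range(uint64_t start, uint64_t end, int with_first_2m_lines)
{
	uint64_t range, extra;

	assert(end >= start);
	range = end - start;

	/* Remainder that does not fill a whole 2M page. */
	extra = range - ((range >> PMD_SHIFT) << PMD_SHIFT);

	/* The removed lines: count the whole range again if it starts below 2M. */
	if (with_first_2m_lines && start < PMD_SIZE)
		extra += range;

	return (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
}
```

For a range covering 0 to 1 GiB, the version with the extra lines counts 262144 PTEs (2 MiB of PTE tables at 8 bytes per entry), while the version without them counts none. A range starting at or above 2M is unaffected.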

2012-10-28 20:48:17

by Tom Rini

Subject: Re: BUG: 1bbbbe7 (x86: Exclude E820_RESERVED regions...) PANIC on boot

On 10/22/12 07:40, Jacob Shin wrote:
> On Sun, Oct 21, 2012 at 02:23:58PM -0700, Tom Rini wrote:
>> On 10/21/12 14:06, Jacob Shin wrote:
>>> Ah, sorry, this one should apply on top of 3.7-rc2:
>>>
>>> https://lkml.org/lkml/2012/8/24/469
>>>
>>> Could you try that? Just that single patch, not the whole patchset.
>>
>> That fixes it, replied with a note and Tested-by, thanks!
>
> Thanks for testing!
>
> hpa, so sorry, but it looks like we need one more patch [PATCH 2/5] x86:
> find_early_table_space based on memory ranges that are being mapped:
>
> https://lkml.org/lkml/2012/8/24/469
>
> on top of this, because find_early_table_space calculation does not come out
> correctly for this particular E820 table that Tom has:
>
> http://pastebin.com/4eSPEAvB
>
> The reason why we hit this now, and never hit it before is because before the
> start was hard coded to 1UL<<32.

As a final follow-up, v3.7-rc3 does not have the problem I reported
previously.

--
Tom
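
Jacob's remark about the start being hard coded to 1UL<<32 comes down to alignment. The sketch below is a simplified model, not the kernel's code, of how a physical range might be split into map_range pieces by alignment: a 4K head, a 2M head, a 1G body, a 2M tail and a 4K tail. It assumes x86-64 with 1G pages available and does not merge adjacent pieces. The names split_range, add_range, align_up, align_down, MASK_2M and MASK_1G are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PMD_SIZE (UINT64_C(1) << 21) /* assumed 2M large page */
#define PUD_SIZE (UINT64_C(1) << 30) /* assumed 1G large page */

/* Stand-ins for (1 << PG_LEVEL_2M) and (1 << PG_LEVEL_1G). */
#define MASK_2M (1u << 2)
#define MASK_1G (1u << 3)

struct map_range {
	uint64_t start;
	uint64_t end;
	unsigned page_size_mask;
};

uint64_t align_up(uint64_t x, uint64_t a)   { return (x + a - 1) & ~(a - 1); }
uint64_t align_down(uint64_t x, uint64_t a) { return x & ~(a - 1); }

/* Append [start, end) with the given mask and return the new count. */
int add_range(struct map_range *mr, int n, uint64_t start, uint64_t end,
	      unsigned mask)
{
	mr[n].start = start;
	mr[n].end = end;
	mr[n].page_size_mask = mask;
	return n + 1;
}

/* Split [start, end) into at most five pieces; mr must hold five entries. */
int split_range(uint64_t start, uint64_t end, struct map_range *mr)
{
	uint64_t pos = start, next;
	int n = 0;

	assert(start < end);

	/* 4K head, up to the first 2M boundary or the end of the range. */
	next = align_up(pos, PMD_SIZE);
	if (next > end)
		next = end;
	if (next > pos) {
		n = add_range(mr, n, pos, next, 0);
		pos = next;
	}

	/* 2M head, up to the first 1G boundary or the last 2M boundary. */
	next = align_up(pos, PUD_SIZE);
	if (next > align_down(end, PMD_SIZE))
		next = align_down(end, PMD_SIZE);
	if (next > pos) {
		n = add_range(mr, n, pos, next, MASK_2M);
		pos = next;
	}

	/* 1G body, up to the last 1G boundary. */
	next = align_down(end, PUD_SIZE);
	if (next > pos) {
		n = add_range(mr, n, pos, next, MASK_2M | MASK_1G);
		pos = next;
	}

	/* 2M tail, up to the last 2M boundary. */
	next = align_down(end, PMD_SIZE);
	if (next > pos) {
		n = add_range(mr, n, pos, next, MASK_2M);
		pos = next;
	}

	/* 4K tail. */
	if (end > pos)
		n = add_range(mr, n, pos, end, 0);

	return n;
}
```

In this model a start of 0x100000000 with end 0x430000000 yields two pieces (a 1G body and a 2M tail), whereas a start of 0x100001000 yields four, including a 4K head and a 2M head that an estimate based only on end does not cover.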