2013-05-25 04:29:17

by Yuanhan Liu

[permalink] [raw]
Subject: [PATCH] x86, mm: fix boot hang regression

Commit 8d57470d introduced a kernel panic while setting mem=2G at
boot time, and commit c9b3234a6 turned that panic into a hang.

Either way, the reason is the same: we are accessing a bad address
because the mapping is broken.

Here is a mem mapping range dumped at boot time:
[mem 0x00000000-0x000fffff] page 4k (0)
[mem 0x7fe00000-0x7fffffff] page 1G (1)
[mem 0x7c000000-0x7fdfffff] page 1G (2)
[mem 0x00100000-0x001fffff] page 4k (3)
[mem 0x00200000-0x7bffffff] page 2M (4)

Setting up the memory map for regions (0) to (3) causes no problems,
but (1) installs a PG_LEVEL_1G mapping at pud index 0x1.

The pud index is also 0x1 while setting up the 0x40000000-0x7bf00000
part of (4). That part is a PG_LEVEL_2M mapping, which results in
splitting the PG_LEVEL_1G mapping and thus breaks the earlier mappings
for (1) and (2). At the same time, because "end" is set to 0x7c000000,
we miss the chance to fix it up in phys_pmd_init() at this code:
	if (address >= end) {
		...
		continue;
	}
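For illustration, here is a minimal user-space sketch (not kernel code; it
just re-creates the usual x86_64 helpers with PUD_SHIFT = 30 and
PTRS_PER_PUD = 512) showing that regions (1), (2) and the 0x40000000+ part
of (4) all land on pud index 0x1:

#include <stdio.h>

#define PUD_SHIFT	30			/* each PUD entry covers 1G */
#define PTRS_PER_PUD	512
#define pud_index(a)	(((a) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))

int main(void)
{
	unsigned long addrs[] = { 0x00200000UL, 0x40000000UL,
				  0x7c000000UL, 0x7fe00000UL };
	int i;

	for (i = 0; i < 4; i++)
		printf("pud_index(%#010lx) = %lu\n",
		       addrs[i], pud_index(addrs[i]));
	/*
	 * Prints 0, 1, 1, 1: the 2M mappings of (4) above 1G share the PUD
	 * entry that already holds the 1G mappings for (1) and (2), so
	 * installing them splits that 1G entry.
	 */
	return 0;
}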

Thus, using an extra flag to indicate that we are splitting a large PUD
(or PMD) and changing the above if statement to the following makes this
issue go away:
	if (address >= end && !splitting) {
		...
	}

Reported-by: LKP <[email protected]>
CC: For 3.9+ <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Yinghai Lu <[email protected]>
Bisected-by: "Xie, ChanglongX" <[email protected]>
Signed-off-by: Yuanhan Liu <[email protected]>

---
I reported this panic regression a long time ago, but I hadn't noticed the
panic->hang change before, which may have made it hard for Yinghai to
understand what happened from the two logs I sent earlier (one from
8d57470d, the other from the HEAD commit at that time, which had turned
into a hang as stated). Moreover, it seems Yinghai couldn't reproduce it,
and I was busy with something else. I finally got a free day yesterday
(and a good mood ;).

Last, thanks to Changlong for his effort bisecting the two commits above.
---
arch/x86/mm/init_64.c | 51 +++++++++++++++++++++++++++++++++++++++++-------
1 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index bb00c46..e4c7038 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -401,7 +401,7 @@ void __init cleanup_highmap(void)

static unsigned long __meminit
phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
- pgprot_t prot)
+ pgprot_t prot, bool split_pmd)
{
unsigned long pages = 0, next;
unsigned long last_map_addr = end;
@@ -411,7 +411,7 @@ phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,

for (i = pte_index(addr); i < PTRS_PER_PTE; i++, addr = next, pte++) {
next = (addr & PAGE_MASK) + PAGE_SIZE;
- if (addr >= end) {
+ if (addr >= end && !split_pmd) {
if (!after_bootmem &&
!e820_any_mapped(addr & PAGE_MASK, next, E820_RAM) &&
!e820_any_mapped(addr & PAGE_MASK, next, E820_RESERVED_KERN))
@@ -446,7 +446,7 @@ phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,

static unsigned long __meminit
phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
- unsigned long page_size_mask, pgprot_t prot)
+ unsigned long page_size_mask, pgprot_t prot, bool split_pud)
{
unsigned long pages = 0, next;
unsigned long last_map_addr = end;
@@ -457,9 +457,10 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
pmd_t *pmd = pmd_page + pmd_index(address);
pte_t *pte;
pgprot_t new_prot = prot;
+ bool split_pmd = false;

next = (address & PMD_MASK) + PMD_SIZE;
- if (address >= end) {
+ if (address >= end && !split_pud) {
if (!after_bootmem &&
!e820_any_mapped(address & PMD_MASK, next, E820_RAM) &&
!e820_any_mapped(address & PMD_MASK, next, E820_RESERVED_KERN))
@@ -472,7 +473,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
spin_lock(&init_mm.page_table_lock);
pte = (pte_t *)pmd_page_vaddr(*pmd);
last_map_addr = phys_pte_init(pte, address,
- end, prot);
+ end, prot, split_pmd);
spin_unlock(&init_mm.page_table_lock);
continue;
}
@@ -495,6 +496,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
continue;
}
new_prot = pte_pgprot(pte_clrhuge(*(pte_t *)pmd));
+ split_pmd = true;
}

if (page_size_mask & (1<<PG_LEVEL_2M)) {
@@ -509,7 +511,8 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
}

pte = alloc_low_page();
- last_map_addr = phys_pte_init(pte, address, end, new_prot);
+ last_map_addr = phys_pte_init(pte, address, end,
+ new_prot, split_pmd);

spin_lock(&init_mm.page_table_lock);
pmd_populate_kernel(&init_mm, pmd, pte);
@@ -531,6 +534,7 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
pud_t *pud = pud_page + pud_index(addr);
pmd_t *pmd;
pgprot_t prot = PAGE_KERNEL;
+ bool split_pud = false;

next = (addr & PUD_MASK) + PUD_SIZE;
if (addr >= end) {
@@ -545,7 +549,8 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
if (!pud_large(*pud)) {
pmd = pmd_offset(pud, 0);
last_map_addr = phys_pmd_init(pmd, addr, end,
- page_size_mask, prot);
+ page_size_mask, prot,
+ split_pud);
__flush_tlb_all();
continue;
}
@@ -568,6 +573,36 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
continue;
}
prot = pte_pgprot(pte_clrhuge(*(pte_t *)pud));
+ /*
+ * We set up page tables top-down now, which means we
+ * might have set a PG_LEVEL_1G mapping for a higher
+ * address.
+ *
+ * And in the meantime, here we meet the same PUD in
+ * a lower mem region and we are about to split it.
+ * Set split_pud to make sure we re-map the former
+ * mapping as well. Otherwise we would just ignore
+ * it due to
+ * if (address >= end) {
+ * ...
+ * continue;
+ * }
+ * at phys_pmd_init().
+ *
+ * Example: here is one case I met:
+ * [mem 0x00000000-0x000fffff] page 4k (0)
+ * [mem 0x7fe00000-0x7fffffff] page 1G (1)
+ * [mem 0x7c000000-0x7fdfffff] page 1G (2)
+ * [mem 0x00100000-0x001fffff] page 4k (3)
+ * [mem 0x00200000-0x7bffffff] page 2M (4)
+ *
+ * Where mem 0x40000000 to mem 0x7fffffff will use the same
+ * PUD, and we have set a PG_LEVEL_1G mapping at (1).
+ * While handling the 0x40000000 - 0x7bf00000 part of (4),
+ * we will split the PUD and break the former mappings
+ * for (1) and (2) as stated above.
+ */
+ split_pud = true;
}

if (page_size_mask & (1<<PG_LEVEL_1G)) {
@@ -583,7 +618,7 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,

pmd = alloc_low_page();
last_map_addr = phys_pmd_init(pmd, addr, end, page_size_mask,
- prot);
+ prot, split_pud);

spin_lock(&init_mm.page_table_lock);
pud_populate(&init_mm, pud, pmd);
--
1.7.7.6


2013-05-25 07:31:44

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86, mm: fix boot hang regression

On Fri, May 24, 2013 at 9:30 PM, Yuanhan Liu
<[email protected]> wrote:
> Commit 8d57470d introduced a kernel panic while setting mem=2G at
> boot time, and commit c9b3234a6 turned that panic into a hang.
>
> Either way, the reason is the same: we are accessing a bad address
> because the mapping is broken.
>
> Here is a mem mapping range dumped at boot time:
> [mem 0x00000000-0x000fffff] page 4k (0)
> [mem 0x7fe00000-0x7fffffff] page 1G (1)
> [mem 0x7c000000-0x7fdfffff] page 1G (2)
> [mem 0x00100000-0x001fffff] page 4k (3)
> [mem 0x00200000-0x7bffffff] page 2M (4)
>
...
> I reported this panic regression a long time ago, but I hadn't noticed the
> panic->hang change before, which may have made it hard for Yinghai to
> understand what happened from the two logs I sent earlier (one from
> 8d57470d, the other from the HEAD commit at that time, which had turned
> into a hang as stated). Moreover, it seems Yinghai couldn't reproduce it,
> and I was busy with something else. I finally got a free day yesterday
> (and a good mood ;).
>
> Last, thanks to Changlong for his effort bisecting the two commits above.
> ---
> arch/x86/mm/init_64.c | 51 +++++++++++++++++++++++++++++++++++++++++-------
> 1 files changed, 43 insertions(+), 8 deletions(-)

Oh, I know the reason: my intel box has an acpi or reserved area just below 2GiB.

Your patch is not the right fix.

Attached patch should fix the problem.

Thanks

Yinghai


Attachments:
fix_adjust_page_mask_with_1g.patch (1.78 kB)

2013-05-25 10:24:59

by Yuanhan Liu

[permalink] [raw]
Subject: Re: [PATCH] x86, mm: fix boot hang regression

On Sat, May 25, 2013 at 12:31:43AM -0700, Yinghai Lu wrote:
> On Fri, May 24, 2013 at 9:30 PM, Yuanhan Liu
> <[email protected]> wrote:
> > Commit 8d57470d introduced a kernel panic while setting mem=2G at
> > boot time, and commit c9b3234a6 turned that panic into a hang.
> >
> > Either way, the reason is the same: we are accessing a bad address
> > because the mapping is broken.
> >
> > Here is a mem mapping range dumped at boot time:
> > [mem 0x00000000-0x000fffff] page 4k (0)
> > [mem 0x7fe00000-0x7fffffff] page 1G (1)
> > [mem 0x7c000000-0x7fdfffff] page 1G (2)
> > [mem 0x00100000-0x001fffff] page 4k (3)
> > [mem 0x00200000-0x7bffffff] page 2M (4)
> >
> ...
> > I reported this panic regression a long time ago, but I hadn't noticed the
> > panic->hang change before, which may have made it hard for Yinghai to
> > understand what happened from the two logs I sent earlier (one from
> > 8d57470d, the other from the HEAD commit at that time, which had turned
> > into a hang as stated). Moreover, it seems Yinghai couldn't reproduce it,
> > and I was busy with something else. I finally got a free day yesterday
> > (and a good mood ;).
> >
> > Last, thanks to Changlong for his effort bisecting the two commits above.
> > ---
> > arch/x86/mm/init_64.c | 51 +++++++++++++++++++++++++++++++++++++++++-------
> > 1 files changed, 43 insertions(+), 8 deletions(-)
>
> oh, I know the reason, my intel box has acpi or reserved area just below 2GiB.

Due to the 1GB page mapping support, it seems this issue does not
exist on platforms before Sandy Bridge.
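As a quick aside, whether a box supports 1G pages at all can be checked
via the pdpe1gb CPUID bit (leaf 0x80000001, EDX bit 26). A minimal
user-space sketch using GCC's <cpuid.h>, not part of any patch here:

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID leaf 0x80000001: EDX bit 26 is the pdpe1gb (1G pages) flag */
	if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
		return 1;

	printf("1G pages %ssupported by this CPU\n",
	       (edx & (1u << 26)) ? "" : "not ");
	return 0;
}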

>
> your patch is not right fix.
>
> Attached patch should fix the problem.

First of all, your patch works, so feel free to add:
Tested-by: Yuanhan Liu <[email protected]>

Your patch fixes this issue on the source side. I had thought about that
before, too, but I think it's better to fix it at the root as well. That's
why I sent a patch trying to fix it at the mapping stage; meanwhile, I was
going to send another patch later to fix this issue on the source side.
And you did that.

On the other hand, since we support splitting a PUD (and PMD), I guess we
should make it do the right thing, in case we meet such a case later.

So I still think my patch does fix something and should be merged,
unless I did it wrongly; please correct me if I'm wrong.

Thanks.

--yliu

2013-05-28 23:27:10

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86, mm: fix boot hang regression

On Sat, May 25, 2013 at 3:25 AM, Yuanhan Liu
<[email protected]> wrote:

> First of all, your patch works, so feel free to add:
> Tested-by: Yuanhan Liu <[email protected]>

ok, will resend it in complete form.

>
> Your patch fixes this issue on the source side. I had thought about that
> before, too, but I think it's better to fix it at the root as well. That's
> why I sent a patch trying to fix it at the mapping stage; meanwhile, I was
> going to send another patch later to fix this issue on the source side.
> And you did that.
>
> On the other hand, since we support splitting a PUD (and PMD), I guess we
> should make it do the right thing, in case we meet such a case later.

No, we should not split them.

Thanks

Yinghai

2013-05-28 23:28:59

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH] x86: Fix adjust_range_size_mask calling position

Commit 8d57470d causes a kernel panic while setting mem=2G.
[mem 0x00000000-0x000fffff] page 4k
[mem 0x7fe00000-0x7fffffff] page 1G
[mem 0x7c000000-0x7fdfffff] page 1G
[mem 0x00100000-0x001fffff] page 4k
[mem 0x00200000-0x7bffffff] page 2M

but for the last entry we should have
[mem 0x00200000-0x3fffffff] page 2M
[mem 0x40000000-0x7bffffff] page 1G

Actually there is a bug in the calling sequence for
adjust_range_page_size_mask(): merging first makes the
adjustment to 1g for the second, partial-1g range fail.

Fix that by calling adjust_range_size_mask before merging mem_range
entries with the same page size.

We need this one for v3.9 stable.

Bisected-by: "Xie, ChanglongX" <[email protected]>
Bisected-by: Yuanhan Liu <[email protected]>
Reported-and-tested-by: Yuanhan Liu <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/mm/init.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6/arch/x86/mm/init.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -277,6 +277,9 @@ static int __meminit split_mem_range(str
end_pfn = limit_pfn;
nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, 0);

+ if (!after_bootmem)
+ adjust_range_page_size_mask(mr, nr_range);
+
/* try to merge same page size and continuous */
for (i = 0; nr_range > 1 && i < nr_range - 1; i++) {
unsigned long old_start;
@@ -291,9 +294,6 @@ static int __meminit split_mem_range(str
nr_range--;
}

- if (!after_bootmem)
- adjust_range_page_size_mask(mr, nr_range);
-
for (i = 0; i < nr_range; i++)
printk(KERN_DEBUG " [mem %#010lx-%#010lx] page %s\n",
mr[i].start, mr[i].end - 1,

2013-05-28 23:37:05

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86: Fix adjust_range_size_mask calling position

On 05/28/2013 04:28 PM, Yinghai Lu wrote:
>
> Actually there is a bug in the calling sequence for
> adjust_range_page_size_mask(): merging first makes the
> adjustment to 1g for the second, partial-1g range fail.
>

Sorry, I'm not sure I understand what the above paragraph is trying to say.

-hpa

2013-05-28 23:43:46

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86: Fix adjust_range_size_mask calling position

On Tue, May 28, 2013 at 4:36 PM, H. Peter Anvin <[email protected]> wrote:
> On 05/28/2013 04:28 PM, Yinghai Lu wrote:
>>
>> Actually there is a bug in the calling sequence for
>> adjust_range_page_size_mask(): merging first makes the
>> adjustment to 1g for the second, partial-1g range fail.
>>
>
> Sorry, I'm not sure I understand what the above paragraph is trying to say.

We merge continuous ranges with the same page size too early.
In this case, before merging we have
[mem 0x00200000-0x3fffffff] page 2M
[mem 0x40000000-0x7bffffff] page 2M
and after merging them, we get
[mem 0x00200000-0x7bffffff] page 2M
even though we could use a 1G page to map
[mem 0x40000000-0x7bffffff]

That causes a problem, because we map
[mem 0x7fe00000-0x7fffffff] page 1G
[mem 0x7c000000-0x7fdfffff] page 1G
with 1G pages.

So we need to adjust the page size first, before merging
[mem 0x00200000-0x3fffffff] page 2M
[mem 0x40000000-0x7bffffff] page 2M
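To make the ordering issue concrete, here is a small user-space sketch.
It is only a toy model, not the real split_mem_range() or
adjust_range_page_size_mask() code: it assumes the promotion rule is
roughly "round the range out to 1G boundaries and promote only if the
rounded region is entirely RAM", and it models RAM for mem=2G as
everything between the legacy hole below 1MiB and 2GiB. Running the
adjust step before or after the merge reproduces the good and the bad
layouts above:

#include <stdio.h>
#include <stdbool.h>

#define SZ_1G		0x40000000UL

enum pagesz { P2M, P1G };

struct range {
	unsigned long start, end;
	enum pagesz size;
};

/* Toy RAM map for mem=2G: everything except the legacy hole below 1MiB. */
static bool all_ram(unsigned long start, unsigned long end)
{
	return start >= 0x00100000UL && end <= 0x80000000UL;
}

/*
 * Toy promotion rule: round the range out to 1G boundaries and promote it
 * to 1G pages only if the rounded region is entirely RAM.
 */
static void adjust(struct range *r)
{
	unsigned long s = r->start & ~(SZ_1G - 1);
	unsigned long e = (r->end + SZ_1G - 1) & ~(SZ_1G - 1);

	if (all_ram(s, e))
		r->size = P1G;
}

static void show(const char *tag, struct range *r)
{
	printf("%s: [mem %#010lx-%#010lx] page %s\n",
	       tag, r->start, r->end - 1, r->size == P1G ? "1G" : "2M");
}

int main(void)
{
	struct range lo     = { 0x00200000UL, 0x40000000UL, P2M };
	struct range hi     = { 0x40000000UL, 0x7c000000UL, P2M };
	struct range merged = { lo.start,     hi.end,       P2M };

	/* adjust first, merge later: the upper range gets promoted to 1G */
	adjust(&lo);
	adjust(&hi);
	show("adjust-then-merge", &lo);
	show("adjust-then-merge", &hi);

	/* merge first, adjust later: the sub-1M hole blocks the promotion */
	adjust(&merged);
	show("merge-then-adjust", &merged);
	return 0;
}

The merged range rounds out to [0, 2G), which includes the hole below
1MiB, so it can never be promoted; the un-merged [0x40000000-0x7c000000)
range rounds out to [1G, 2G) and can.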

Yinghai

2013-05-29 02:15:00

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86: Fix adjust_range_size_mask calling position

On 05/28/2013 04:43 PM, Yinghai Lu wrote:
>>
>> Sorry, I'm not sure I understand what the above paragraph is trying to say.
>
> We merge continuous ranges with the same page size too early.
> In this case, before merging we have
> [mem 0x00200000-0x3fffffff] page 2M
> [mem 0x40000000-0x7bffffff] page 2M
> and after merging them, we get
> [mem 0x00200000-0x7bffffff] page 2M
> even though we could use a 1G page to map
> [mem 0x40000000-0x7bffffff]
>
> That causes a problem, because we map
> [mem 0x7fe00000-0x7fffffff] page 1G
> [mem 0x7c000000-0x7fdfffff] page 1G
> with 1G pages.
>
> So we need to adjust the page size first, before merging
> [mem 0x00200000-0x3fffffff] page 2M
> [mem 0x40000000-0x7bffffff] page 2M
>

What I meant by that is: please update the description so someone
uninitiated can follow it, either now or in 5-10 years.

-hpa

2013-05-29 21:10:03

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v2] x86: Fix adjust_range_size_mask calling position

Commit 8d57470d causes a kernel panic while setting mem=2G.
[mem 0x00000000-0x000fffff] page 4k
[mem 0x7fe00000-0x7fffffff] page 1G
[mem 0x7c000000-0x7fdfffff] page 1G
[mem 0x00100000-0x001fffff] page 4k
[mem 0x00200000-0x7bffffff] page 2M

The last entry is not what we want; we should have
[mem 0x00200000-0x3fffffff] page 2M
[mem 0x40000000-0x7bffffff] page 1G

Actually we merge continuous ranges with the same page size too early.
In this case, before merging we have
[mem 0x00200000-0x3fffffff] page 2M
[mem 0x40000000-0x7bffffff] page 2M
and after merging them, we get
[mem 0x00200000-0x7bffffff] page 2M
even though we could use a 1G page to map
[mem 0x40000000-0x7bffffff]

That causes a problem, because we already map
[mem 0x7fe00000-0x7fffffff] page 1G
[mem 0x7c000000-0x7fdfffff] page 1G
with 1G pages.

Fix that by calling adjust_range_size_mask before merging ranges
with the same page size.

We need this one for v3.9 stable.

-v2: update change log.

Bisected-by: "Xie, ChanglongX" <[email protected]>
Bisected-by: Yuanhan Liu <[email protected]>
Reported-and-tested-by: Yuanhan Liu <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/mm/init.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6/arch/x86/mm/init.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -277,6 +277,9 @@ static int __meminit split_mem_range(str
end_pfn = limit_pfn;
nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, 0);

+ if (!after_bootmem)
+ adjust_range_page_size_mask(mr, nr_range);
+
/* try to merge same page size and continuous */
for (i = 0; nr_range > 1 && i < nr_range - 1; i++) {
unsigned long old_start;
@@ -291,9 +294,6 @@ static int __meminit split_mem_range(str
nr_range--;
}

- if (!after_bootmem)
- adjust_range_page_size_mask(mr, nr_range);
-
for (i = 0; i < nr_range; i++)
printk(KERN_DEBUG " [mem %#010lx-%#010lx] page %s\n",
mr[i].start, mr[i].end - 1,

2013-05-31 11:32:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2] x86: Fix adjust_range_size_mask calling position


* Yinghai Lu <[email protected]> wrote:

> Commit 8d57470d causes a kernel panic while setting mem=2G.

'Commit 8d57470d' is not the standard way we refer to upstream
commits in changelogs. See commit 5e427ec2d066 as an example of the
proper format.

Thanks,

Ingo

2013-05-31 11:34:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2] x86: Fix adjust_range_size_mask calling position


* Yinghai Lu <[email protected]> wrote:

> Commit 8d57470d causes a kernel panic while setting mem=2G.
> [mem 0x00000000-0x000fffff] page 4k
> [mem 0x7fe00000-0x7fffffff] page 1G
> [mem 0x7c000000-0x7fdfffff] page 1G
> [mem 0x00100000-0x001fffff] page 4k
> [mem 0x00200000-0x7bffffff] page 2M
>
> The last entry is not what we want; we should have
> [mem 0x00200000-0x3fffffff] page 2M
> [mem 0x40000000-0x7bffffff] page 1G

What this changelog does not mention is why the erroneous entry results
in a kernel panic. It looks like a valid range, so a crash is unexpected.

(This is worth explaining, even if the fix for that is in a separate
patch.)

Thanks,

Ingo

2013-05-31 15:53:40

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH v3] x86: Fix adjust_range_size_mask calling position

Commit 8d57470d causes a kernel panic while setting mem=2G.
[mem 0x00000000-0x000fffff] page 4k
[mem 0x7fe00000-0x7fffffff] page 1G
[mem 0x7c000000-0x7fdfffff] page 1G
[mem 0x00100000-0x001fffff] page 4k
[mem 0x00200000-0x7bffffff] page 2M

The last entry is not what we want; we should have
[mem 0x00200000-0x3fffffff] page 2M
[mem 0x40000000-0x7bffffff] page 1G

Actually we merge continuous ranges with the same page size too early.
In this case, before merging we have
[mem 0x00200000-0x3fffffff] page 2M
[mem 0x40000000-0x7bffffff] page 2M
and after merging them, we get
[mem 0x00200000-0x7bffffff] page 2M
even though we could use a 1G page to map
[mem 0x40000000-0x7bffffff]

That causes a problem, because we already map
[mem 0x7fe00000-0x7fffffff] page 1G
[mem 0x7c000000-0x7fdfffff] page 1G
with 1G pages, i.e. [0x40000000-0x7fffffff] is already mapped with 1G pages.
During phys_pud_init() for [0x40000000-0x7bffffff], it will not
reuse the existing pud page, but will allocate a new one and then
try to use 2M pages to map it instead, as page_size_mask does not
include PG_LEVEL_1G. In the end [0x7c000000-0x7fffffff] is left
unmapped, since the loop in phys_pmd_init() stops mapping at 0x7bffffff.

That is the right behaviour: it maps the exact range with the exact
page size we asked for, and we would have to explicitly call it to map
[0x7c000000-0x7fffffff] before or after mapping 0x40000000-0x7bffffff.
Anyway, we need to make sure each range's page_size_mask is correct and
consistent after split_mem_range().
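As an extra illustration of that last step (again a user-space toy model
using the usual x86_64 constants PMD_SHIFT = 21 and PTRS_PER_PMD = 512,
not the kernel code): the freshly allocated pmd page only gets entries
for [0x40000000, 0x7c000000), while the old 1G pud entry that used to
cover [0x7c000000, 0x7fffffff] has just been replaced, so that tail ends
up with no mapping at all:

#include <stdio.h>

#define PMD_SHIFT	21			/* each PMD entry covers 2M */
#define PMD_SIZE	(1UL << PMD_SHIFT)
#define PTRS_PER_PMD	512
#define pmd_index(a)	(((a) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))

int main(void)
{
	/*
	 * Toy model of the new pmd page that replaces the old 1G pud entry:
	 * phys_pmd_init() only fills entries for [addr, end), with
	 * end = 0x7c000000; every other entry stays empty (0).
	 */
	static unsigned long pmd[PTRS_PER_PMD];
	unsigned long addr, end = 0x7c000000UL;

	for (addr = 0x40000000UL; addr < end; addr += PMD_SIZE)
		pmd[pmd_index(addr)] = addr | 0x83;	/* toy "present + 2M page" flags */

	printf("pmd entry for 0x7c000000: %#lx\n", pmd[pmd_index(0x7c000000UL)]);
	printf("pmd entry for 0x7fe00000: %#lx\n", pmd[pmd_index(0x7fe00000UL)]);
	/*
	 * Both print 0: [0x7c000000-0x7fffffff] is left unmapped, so any
	 * later access to it faults during boot.
	 */
	return 0;
}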

Fix that by calling adjust_range_size_mask before merging ranges
with the same page size.

We need this one for v3.9 stable.

-v2: update change log.
-v3: add more explanation of why [0x7c000000-0x7fffffff] is not mapped
and why it causes a panic.

Bisected-by: "Xie, ChanglongX" <[email protected]>
Bisected-by: Yuanhan Liu <[email protected]>
Reported-and-tested-by: Yuanhan Liu <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>

---
arch/x86/mm/init.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6/arch/x86/mm/init.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -277,6 +277,9 @@ static int __meminit split_mem_range(str
end_pfn = limit_pfn;
nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, 0);

+ if (!after_bootmem)
+ adjust_range_page_size_mask(mr, nr_range);
+
/* try to merge same page size and continuous */
for (i = 0; nr_range > 1 && i < nr_range - 1; i++) {
unsigned long old_start;
@@ -291,9 +294,6 @@ static int __meminit split_mem_range(str
nr_range--;
}

- if (!after_bootmem)
- adjust_range_page_size_mask(mr, nr_range);
-
for (i = 0; i < nr_range; i++)
printk(KERN_DEBUG " [mem %#010lx-%#010lx] page %s\n",
mr[i].start, mr[i].end - 1,

2013-05-31 20:19:04

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v3] x86: Fix adjust_range_size_mask calling position

On 05/31/2013 08:53 AM, Yinghai Lu wrote:
> Commit 8d57470d causes a kernel panic while setting mem=2G.

As Ingo pointed out, don't just refer to a commit by number; you need to
include the number as well as the subject line. This is particularly
important if git ever switches away from SHA-1.

-hpa

2013-05-31 20:22:26

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v3] x86: Fix adjust_range_size_mask calling position

On 05/31/2013 08:53 AM, Yinghai Lu wrote:
>
> We need this one for v3.9 stable.
>

The right way to express this is:

Cc: <[email protected]> v3.9

-hpa

Subject: [tip:x86/urgent] x86: Fix adjust_range_size_mask calling position

Commit-ID: 7de3d66b1387ddf5a37d9689e5eb8510fb75c765
Gitweb: http://git.kernel.org/tip/7de3d66b1387ddf5a37d9689e5eb8510fb75c765
Author: Yinghai Lu <[email protected]>
AuthorDate: Fri, 31 May 2013 08:53:07 -0700
Committer: H. Peter Anvin <[email protected]>
CommitDate: Fri, 31 May 2013 13:21:32 -0700

x86: Fix adjust_range_size_mask calling position

Commit

8d57470d x86, mm: setup page table in top-down

causes a kernel panic while setting mem=2G.

[mem 0x00000000-0x000fffff] page 4k
[mem 0x7fe00000-0x7fffffff] page 1G
[mem 0x7c000000-0x7fdfffff] page 1G
[mem 0x00100000-0x001fffff] page 4k
[mem 0x00200000-0x7bffffff] page 2M

The last entry is not what we want; we should have
[mem 0x00200000-0x3fffffff] page 2M
[mem 0x40000000-0x7bffffff] page 1G

Actually we merge continuous ranges with the same page size too early.
In this case, before merging we have
[mem 0x00200000-0x3fffffff] page 2M
[mem 0x40000000-0x7bffffff] page 2M
and after merging them, we get
[mem 0x00200000-0x7bffffff] page 2M
even though we could use a 1G page to map
[mem 0x40000000-0x7bffffff]

That causes a problem, because we already map
[mem 0x7fe00000-0x7fffffff] page 1G
[mem 0x7c000000-0x7fdfffff] page 1G
with 1G pages, i.e. [0x40000000-0x7fffffff] is already mapped with 1G pages.
During phys_pud_init() for [0x40000000-0x7bffffff], it will not
reuse the existing pud page, but will allocate a new one and then
try to use 2M pages to map it instead, as page_size_mask does not
include PG_LEVEL_1G. In the end [0x7c000000-0x7fffffff] is left
unmapped, since the loop in phys_pmd_init() stops mapping at 0x7bffffff.

That is the right behaviour: it maps the exact range with the exact
page size we asked for, and we would have to explicitly call it to map
[0x7c000000-0x7fffffff] before or after mapping 0x40000000-0x7bffffff.
Anyway, we need to make sure each range's page_size_mask is correct and
consistent after split_mem_range().

Fix that by calling adjust_range_size_mask before merging ranges
with the same page size.

-v2: update change log.
-v3: add more explanation of why [0x7c000000-0x7fffffff] is not mapped
and why it causes a panic.

Bisected-by: "Xie, ChanglongX" <[email protected]>
Bisected-by: Yuanhan Liu <[email protected]>
Reported-and-tested-by: Yuanhan Liu <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Cc: <[email protected]> v3.9
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/mm/init.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index eaac174..1f34e92 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -277,6 +277,9 @@ static int __meminit split_mem_range(struct map_range *mr, int nr_range,
end_pfn = limit_pfn;
nr_range = save_mr(mr, nr_range, start_pfn, end_pfn, 0);

+ if (!after_bootmem)
+ adjust_range_page_size_mask(mr, nr_range);
+
/* try to merge same page size and continuous */
for (i = 0; nr_range > 1 && i < nr_range - 1; i++) {
unsigned long old_start;
@@ -291,9 +294,6 @@ static int __meminit split_mem_range(struct map_range *mr, int nr_range,
nr_range--;
}

- if (!after_bootmem)
- adjust_range_page_size_mask(mr, nr_range);
-
for (i = 0; i < nr_range; i++)
printk(KERN_DEBUG " [mem %#010lx-%#010lx] page %s\n",
mr[i].start, mr[i].end - 1,