2011-06-07 18:09:56

by Stefano Stabellini

Subject: [PATCH 0/3] x86: remove x86_init.mapping.pagetable_reserve

Currently find_early_table_space calculates an overestimate of how much
memory the pagetables for the 1:1 mapping are going to need. After
kernel_physical_mapping_init completes we know exactly how much memory
we used, so we memblock-reserve only the used memory and "free" the rest.

This patch series modifies find_early_table_space to calculate the exact
amount of memory needed for the 1:1 mapping, so that we can memblock-reserve
it right away and no longer need to free unused memory after
kernel_physical_mapping_init.
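
In rough pseudo-code, the before and after look like this (a simplified
sketch, not the actual code paths):

  /* before: worst-case estimate, trimmed after the fact */
  find_early_table_space(end, use_pse, use_gbpages);
  kernel_physical_mapping_init(...);   /* consumes pgt_buf_start..pgt_buf_end */
  x86_init.mapping.pagetable_reserve(pgt_buf_start, pgt_buf_end);

  /* after: exact estimate, memblock-reserved up front */
  find_early_table_space(start, end, use_pse, use_gbpages);
  kernel_physical_mapping_init(...);   /* nothing left over to free */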

At this point we can also safely revert "x86,xen: introduce
x86_init.mapping.pagetable_reserve".


The list of patches with diffstat follows:

Stefano Stabellini (3):
x86: calculate precisely the memory needed by init_memory_mapping
Revert "x86,xen: introduce x86_init.mapping.pagetable_reserve"
x86: move memblock_x86_reserve_range PGTABLE to find_early_table_space

arch/x86/include/asm/pgtable_types.h | 1 -
arch/x86/include/asm/x86_init.h | 12 -----
arch/x86/kernel/x86_init.c | 4 --
arch/x86/mm/init.c | 87 +++++++++++++++++++---------------
arch/x86/xen/mmu.c | 15 ------
5 files changed, 49 insertions(+), 70 deletions(-)


Many thanks to Konrad, who helped me review the patch series and
performed an impressive amount of tests:

*Configurations
baremetal Linux 64-bit
baremetal Linux 32-bit NOHIGHMEM, HIGHMEM4G and HIGHMEM64G
32-bit and 64-bit Linux on Xen
32-bit and 64-bit Linux HVM on Xen


*Hardware
AMD development box (Tilapia) - 8GB
AMD BIOSTAR Grp N61PB-M2S/N61PB-M2S (Sempron) - 4GB
Intel DX58SO (Core i7) - 8GB
Supermicro X7DB8/X7DB8 (Harpertown) - 4GB
IBM x3850 (Cranford) - 8GB
MSI MS-7680/H61M-P23 (MS-7680) (SandyBridge, i2500) - 8GB


A git branch based on 3.0-rc1 is available here:

git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 3.0-rc1-rem_pg_reserve-4

- Stefano


2011-06-07 18:10:15

by Stefano Stabellini

Subject: [PATCH 1/3] x86: calculate precisely the memory needed by init_memory_mapping

From: Stefano Stabellini <[email protected]>

- take into account the size of the initial pagetable;

- remove the extra PMD_SIZE added when use_pse is set, because the previously
allocated PMDs are always 2M aligned;

- remove the extra page added on x86_32 for the fixmap because it is not
needed: the PMD entry is already allocated and contiguous for the whole
range (a PMD page covers 4G of virtual addresses) and the pte entry is
already allocated by early_ioremap_init.
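
As a rough worked example of the exact sizing (assuming 8-byte page table
entries and ignoring the range already covered by the initial pagetable),
mapping 4GB with 4KB pages needs:

  4GB / 4KB = 1048576 ptes  -> 8MB of pte pages
  4GB / 2MB = 2048 pmds     -> 16KB of pmd pages
  4GB / 1GB = 4 puds        -> 32 bytes, rounded up to one page

so tables is just the sum of those three terms, with no extra PMD_SIZE worth
of ptes and no extra fixmap page padded on top.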

Signed-off-by: Stefano Stabellini <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/mm/init.c | 62 ++++++++++++++++++++++++++++++++++++++-------------
1 files changed, 46 insertions(+), 16 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 3032644..0cfe8d4 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -28,22 +28,52 @@ int direct_gbpages
#endif
;

-static void __init find_early_table_space(unsigned long end, int use_pse,
- int use_gbpages)
+static void __init find_early_table_space(unsigned long start,
+ unsigned long end, int use_pse, int use_gbpages)
{
- unsigned long puds, pmds, ptes, tables, start = 0, good_end = end;
+ unsigned long pmds = 0, ptes = 0, tables = 0, good_end = end,
+ pud_mapped = 0, pmd_mapped = 0, size = end - start;
phys_addr_t base;

- puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
- tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
+ pud_mapped = DIV_ROUND_UP(PFN_PHYS(max_pfn_mapped),
+ (PUD_SIZE * PTRS_PER_PUD));
+ pud_mapped *= (PUD_SIZE * PTRS_PER_PUD);
+ pmd_mapped = DIV_ROUND_UP(PFN_PHYS(max_pfn_mapped),
+ (PMD_SIZE * PTRS_PER_PMD));
+ pmd_mapped *= (PMD_SIZE * PTRS_PER_PMD);
+
+ if (start < PFN_PHYS(max_pfn_mapped)) {
+ if (PFN_PHYS(max_pfn_mapped) < end)
+ size -= PFN_PHYS(max_pfn_mapped) - start;
+ else
+ size = 0;
+ }
+
+#ifndef __PAGETABLE_PUD_FOLDED
+ if (end > pud_mapped) {
+ unsigned long puds;
+ if (start < pud_mapped)
+ puds = (end - pud_mapped + PUD_SIZE - 1) >> PUD_SHIFT;
+ else
+ puds = (end - start + PUD_SIZE - 1) >> PUD_SHIFT;
+ tables += roundup(puds * sizeof(pud_t), PAGE_SIZE);
+ }
+#endif

if (use_gbpages) {
unsigned long extra;

extra = end - ((end>>PUD_SHIFT) << PUD_SHIFT);
pmds = (extra + PMD_SIZE - 1) >> PMD_SHIFT;
- } else
- pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
+ }
+#ifndef __PAGETABLE_PMD_FOLDED
+ else if (end > pmd_mapped) {
+ if (start < pmd_mapped)
+ pmds = (end - pmd_mapped + PMD_SIZE - 1) >> PMD_SHIFT;
+ else
+ pmds = (end - start + PMD_SIZE - 1) >> PMD_SHIFT;
+ }
+#endif

tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);

@@ -51,23 +81,20 @@ static void __init find_early_table_space(unsigned long end, int use_pse,
unsigned long extra;

extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
-#ifdef CONFIG_X86_32
- extra += PMD_SIZE;
-#endif
ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
} else
- ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ ptes = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;

tables += roundup(ptes * sizeof(pte_t), PAGE_SIZE);

-#ifdef CONFIG_X86_32
- /* for fixmap */
- tables += roundup(__end_of_fixed_addresses * sizeof(pte_t), PAGE_SIZE);
+ if (!tables)
+ return;

+#ifdef CONFIG_X86_32
good_end = max_pfn_mapped << PAGE_SHIFT;
#endif

- base = memblock_find_in_range(start, good_end, tables, PAGE_SIZE);
+ base = memblock_find_in_range(0x00, good_end, tables, PAGE_SIZE);
if (base == MEMBLOCK_ERROR)
panic("Cannot find space for the kernel page tables");

@@ -261,7 +288,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
* nodes are discovered.
*/
if (!after_bootmem)
- find_early_table_space(end, use_pse, use_gbpages);
+ find_early_table_space(start, end, use_pse, use_gbpages);

for (i = 0; i < nr_range; i++)
ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
@@ -275,6 +302,9 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,

__flush_tlb_all();

+ if (pgt_buf_end != pgt_buf_top)
+ printk(KERN_DEBUG "initial kernel pagetable allocation wasted %lx"
+ " pages\n", pgt_buf_top - pgt_buf_end);
/*
* Reserve the kernel pagetable pages we used (pgt_buf_start -
* pgt_buf_end) and free the other ones (pgt_buf_end - pgt_buf_top)
--
1.7.2.3

2011-06-07 18:10:46

by Stefano Stabellini

Subject: [PATCH 2/3] Revert "x86,xen: introduce x86_init.mapping.pagetable_reserve"

From: Stefano Stabellini <[email protected]>

This reverts commit 279b706bf800b5967037f492dbe4fc5081ad5d0f.

Signed-off-by: Stefano Stabellini <[email protected]>
Acked-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/include/asm/pgtable_types.h | 1 -
arch/x86/include/asm/x86_init.h | 12 ------------
arch/x86/kernel/x86_init.c | 4 ----
arch/x86/mm/init.c | 25 +++----------------------
arch/x86/xen/mmu.c | 15 ---------------
5 files changed, 3 insertions(+), 54 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index d56187c..7db7723 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -299,7 +299,6 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
/* Install a pte for a particular vaddr in kernel space. */
void set_pte_vaddr(unsigned long vaddr, pte_t pte);

-extern void native_pagetable_reserve(u64 start, u64 end);
#ifdef CONFIG_X86_32
extern void native_pagetable_setup_start(pgd_t *base);
extern void native_pagetable_setup_done(pgd_t *base);
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index d3d8590..643ebf2 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -68,17 +68,6 @@ struct x86_init_oem {
};

/**
- * struct x86_init_mapping - platform specific initial kernel pagetable setup
- * @pagetable_reserve: reserve a range of addresses for kernel pagetable usage
- *
- * For more details on the purpose of this hook, look in
- * init_memory_mapping and the commit that added it.
- */
-struct x86_init_mapping {
- void (*pagetable_reserve)(u64 start, u64 end);
-};
-
-/**
* struct x86_init_paging - platform specific paging functions
* @pagetable_setup_start: platform specific pre paging_init() call
* @pagetable_setup_done: platform specific post paging_init() call
@@ -134,7 +123,6 @@ struct x86_init_ops {
struct x86_init_mpparse mpparse;
struct x86_init_irqs irqs;
struct x86_init_oem oem;
- struct x86_init_mapping mapping;
struct x86_init_paging paging;
struct x86_init_timers timers;
struct x86_init_iommu iommu;
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 6f164bd..6eee082 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -61,10 +61,6 @@ struct x86_init_ops x86_init __initdata = {
.banner = default_banner,
},

- .mapping = {
- .pagetable_reserve = native_pagetable_reserve,
- },
-
.paging = {
.pagetable_setup_start = native_pagetable_setup_start,
.pagetable_setup_done = native_pagetable_setup_done,
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 0cfe8d4..15590fd 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -106,11 +106,6 @@ static void __init find_early_table_space(unsigned long start,
end, pgt_buf_start << PAGE_SHIFT, pgt_buf_top << PAGE_SHIFT);
}

-void __init native_pagetable_reserve(u64 start, u64 end)
-{
- memblock_x86_reserve_range(start, end, "PGTABLE");
-}
-
struct map_range {
unsigned long start;
unsigned long end;
@@ -305,24 +300,10 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
if (pgt_buf_end != pgt_buf_top)
printk(KERN_DEBUG "initial kernel pagetable allocation wasted %lx"
" pages\n", pgt_buf_top - pgt_buf_end);
- /*
- * Reserve the kernel pagetable pages we used (pgt_buf_start -
- * pgt_buf_end) and free the other ones (pgt_buf_end - pgt_buf_top)
- * so that they can be reused for other purposes.
- *
- * On native it just means calling memblock_x86_reserve_range, on Xen it
- * also means marking RW the pagetable pages that we allocated before
- * but that haven't been used.
- *
- * In fact on xen we mark RO the whole range pgt_buf_start -
- * pgt_buf_top, because we have to make sure that when
- * init_memory_mapping reaches the pagetable pages area, it maps
- * RO all the pagetable pages, including the ones that are beyond
- * pgt_buf_end at that time.
- */
+
if (!after_bootmem && pgt_buf_end > pgt_buf_start)
- x86_init.mapping.pagetable_reserve(PFN_PHYS(pgt_buf_start),
- PFN_PHYS(pgt_buf_end));
+ memblock_x86_reserve_range(pgt_buf_start << PAGE_SHIFT,
+ pgt_buf_end << PAGE_SHIFT, "PGTABLE");

if (!after_bootmem)
early_memtest(start, end);
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index dc708dc..2004f1e 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1153,20 +1153,6 @@ static void __init xen_pagetable_setup_start(pgd_t *base)
{
}

-static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
-{
- /* reserve the range used */
- native_pagetable_reserve(start, end);
-
- /* set as RW the rest */
- printk(KERN_DEBUG "xen: setting RW the range %llx - %llx\n", end,
- PFN_PHYS(pgt_buf_top));
- while (end < PFN_PHYS(pgt_buf_top)) {
- make_lowmem_page_readwrite(__va(end));
- end += PAGE_SIZE;
- }
-}
-
static void xen_post_allocator_init(void);

static void __init xen_pagetable_setup_done(pgd_t *base)
@@ -1997,7 +1983,6 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {

void __init xen_init_mmu_ops(void)
{
- x86_init.mapping.pagetable_reserve = xen_mapping_pagetable_reserve;
x86_init.paging.pagetable_setup_start = xen_pagetable_setup_start;
x86_init.paging.pagetable_setup_done = xen_pagetable_setup_done;
pv_mmu_ops = xen_mmu_ops;
--
1.7.2.3

2011-06-07 18:10:22

by Stefano Stabellini

Subject: [PATCH 3/3] x86: move memblock_x86_reserve_range PGTABLE to find_early_table_space

From: Stefano Stabellini <[email protected]>

Now that find_early_table_space knows how to calculate the exact amount
of memory needed by the kernel pagetable, we can reserve the range
directly in find_early_table_space.

Signed-off-by: Stefano Stabellini <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/mm/init.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 15590fd..36bacfe 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -104,6 +104,10 @@ static void __init find_early_table_space(unsigned long start,

printk(KERN_DEBUG "kernel direct mapping tables up to %lx @ %lx-%lx\n",
end, pgt_buf_start << PAGE_SHIFT, pgt_buf_top << PAGE_SHIFT);
+
+ if (pgt_buf_top > pgt_buf_start)
+ memblock_x86_reserve_range(pgt_buf_start << PAGE_SHIFT,
+ pgt_buf_top << PAGE_SHIFT, "PGTABLE");
}

struct map_range {
@@ -301,10 +305,6 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
printk(KERN_DEBUG "initial kernel pagetable allocation wasted %lx"
" pages\n", pgt_buf_top - pgt_buf_end);

- if (!after_bootmem && pgt_buf_end > pgt_buf_start)
- memblock_x86_reserve_range(pgt_buf_start << PAGE_SHIFT,
- pgt_buf_end << PAGE_SHIFT, "PGTABLE");
-
if (!after_bootmem)
early_memtest(start, end);

--
1.7.2.3

2011-06-20 22:38:12

by H. Peter Anvin

Subject: Re: [PATCH 1/3] x86: calculate precisely the memory needed by init_memory_mapping

On 06/07/2011 11:13 AM, [email protected] wrote:
>
> - remove the extra page added on x86_32 for the fixmap because it is not
> needed: the PMD entry is already allocated and contiguous for the whole
> range (a PMD page covers 4G of virtual addresses) and the pte entry is
> already allocated by early_ioremap_init.
>

Hi Stefano,

I think this is wrong. A PMD page covers *1G* of virtual addresses, and
in the 2+2 and 1+3 memory configurations, we may or may not need a
separate PMD for the fixmap.

Am I missing something?

-hpa

2011-06-21 17:53:11

by Stefano Stabellini

Subject: Re: [PATCH 1/3] x86: calculate precisely the memory needed by init_memory_mapping

On Mon, 20 Jun 2011, H. Peter Anvin wrote:
> On 06/07/2011 11:13 AM, [email protected] wrote:
> >
> > - remove the extra page added on x86_32 for the fixmap because it is not
> > needed: the PMD entry is already allocated and contiguous for the whole
> > range (a PMD page covers 4G of virtual addresses) and the pte entry is
> > already allocated by early_ioremap_init.
> >
>
> Hi Stefano,
>
> I think this is wrong. A PMD page covers *1G* of virtual addresses, and
> in the 2+2 and 1+3 memory configurations, we may or may not need a
> separate PMD for the fixmap.
>
> Am I missing something?

You are right, a PMD page covers 1G of virtual addresses, so that part of
the explanation in the comment is wrong.

The reason why we don't need a separate PMD for the fixmap is that in
both PAE and non-PAE cases the last gigabyte of virtual addresses is
always covered by the initial allocation in head_32.S (swapper_pg_dir or
initial_pg_pmd).
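
For reference, a rough sketch of the layout in question (addresses assume the
default 3G/1G split; with a 2+2 or 1+3 split PAGE_OFFSET moves down, but the
last gigabyte below 4G is still covered by the static allocation):

  0x00000000 - PAGE_OFFSET : user space
  PAGE_OFFSET - 0xffffffff : kernel mapping
  0xc0000000 - 0xffffffff  : last gigabyte, PMD statically allocated by
                             head_32.S (swapper_pg_dir on non-PAE,
                             initial_pg_pmd on PAE); the fixmap sits just
                             below the top of this range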

2011-06-21 18:22:51

by H. Peter Anvin

Subject: Re: [PATCH 1/3] x86: calculate precisely the memory needed by init_memory_mapping

Stefano Stabellini <[email protected]> wrote:

>On Mon, 20 Jun 2011, H. Peter Anvin wrote:
>> On 06/07/2011 11:13 AM, [email protected] wrote:
>> >
>> > - remove the extra page added on x86_32 for the fixmap because it is
>> > not needed: the PMD entry is already allocated and contiguous for the
>> > whole range (a PMD page covers 4G of virtual addresses) and the pte
>> > entry is already allocated by early_ioremap_init.
>> >
>>
>> Hi Stefano,
>>
>> I think this is wrong. A PMD page covers *1G* of virtual addresses, and
>> in the 2+2 and 1+3 memory configurations, we may or may not need a
>> separate PMD for the fixmap.
>>
>> Am I missing something?
>
>You are right, a PMD page covers 1G of virtual addresses, so that part
>of the explanation in the comment is wrong.
>
>The reason why we don't need a separate PMD for the fixmap is that in
>both PAE and non-PAE cases the last gigabyte of virtual addresses is
>always covered by the initial allocation in head_32.S (swapper_pg_dir
>or initial_pg_pmd).

Ok, wasn't sure if Xen used the static allocation or not.
--
Sent from my mobile phone. Please excuse my brevity and lack of formatting.

2011-06-21 20:37:22

by Stefano Stabellini

Subject: [tip:x86/mm] x86, mm: Calculate precisely the memory needed by init_memory_mapping

Commit-ID: 8e7f9f8d40764a04454726c071ce5eac850ce219
Gitweb: http://git.kernel.org/tip/8e7f9f8d40764a04454726c071ce5eac850ce219
Author: Stefano Stabellini <[email protected]>
AuthorDate: Tue, 7 Jun 2011 19:13:27 +0100
Committer: H. Peter Anvin <[email protected]>
CommitDate: Tue, 21 Jun 2011 13:06:24 -0700

x86, mm: Calculate precisely the memory needed by init_memory_mapping

- take into account the size of the initial pagetable;

- remove the extra PMD_SIZE added when use_pse is set, because the previously
allocated PMDs are always 2M aligned;

- remove the extra page added on x86_32 for the fixmap because it is not
needed: we allocate all possible PMDs statically in head_32.S and
the pte entry is already allocated by early_ioremap_init.

Signed-off-by: Stefano Stabellini <[email protected]>
Link: http://lkml.kernel.org/r/1307470409-7654-1-git-send-email-stefano.stabellini@eu.citrix.com
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/mm/init.c | 62 ++++++++++++++++++++++++++++++++++++++-------------
1 files changed, 46 insertions(+), 16 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 3032644..0cfe8d4 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -28,22 +28,52 @@ int direct_gbpages
#endif
;

-static void __init find_early_table_space(unsigned long end, int use_pse,
- int use_gbpages)
+static void __init find_early_table_space(unsigned long start,
+ unsigned long end, int use_pse, int use_gbpages)
{
- unsigned long puds, pmds, ptes, tables, start = 0, good_end = end;
+ unsigned long pmds = 0, ptes = 0, tables = 0, good_end = end,
+ pud_mapped = 0, pmd_mapped = 0, size = end - start;
phys_addr_t base;

- puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
- tables = roundup(puds * sizeof(pud_t), PAGE_SIZE);
+ pud_mapped = DIV_ROUND_UP(PFN_PHYS(max_pfn_mapped),
+ (PUD_SIZE * PTRS_PER_PUD));
+ pud_mapped *= (PUD_SIZE * PTRS_PER_PUD);
+ pmd_mapped = DIV_ROUND_UP(PFN_PHYS(max_pfn_mapped),
+ (PMD_SIZE * PTRS_PER_PMD));
+ pmd_mapped *= (PMD_SIZE * PTRS_PER_PMD);
+
+ if (start < PFN_PHYS(max_pfn_mapped)) {
+ if (PFN_PHYS(max_pfn_mapped) < end)
+ size -= PFN_PHYS(max_pfn_mapped) - start;
+ else
+ size = 0;
+ }
+
+#ifndef __PAGETABLE_PUD_FOLDED
+ if (end > pud_mapped) {
+ unsigned long puds;
+ if (start < pud_mapped)
+ puds = (end - pud_mapped + PUD_SIZE - 1) >> PUD_SHIFT;
+ else
+ puds = (end - start + PUD_SIZE - 1) >> PUD_SHIFT;
+ tables += roundup(puds * sizeof(pud_t), PAGE_SIZE);
+ }
+#endif

if (use_gbpages) {
unsigned long extra;

extra = end - ((end>>PUD_SHIFT) << PUD_SHIFT);
pmds = (extra + PMD_SIZE - 1) >> PMD_SHIFT;
- } else
- pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
+ }
+#ifndef __PAGETABLE_PMD_FOLDED
+ else if (end > pmd_mapped) {
+ if (start < pmd_mapped)
+ pmds = (end - pmd_mapped + PMD_SIZE - 1) >> PMD_SHIFT;
+ else
+ pmds = (end - start + PMD_SIZE - 1) >> PMD_SHIFT;
+ }
+#endif

tables += roundup(pmds * sizeof(pmd_t), PAGE_SIZE);

@@ -51,23 +81,20 @@ static void __init find_early_table_space(unsigned long end, int use_pse,
unsigned long extra;

extra = end - ((end>>PMD_SHIFT) << PMD_SHIFT);
-#ifdef CONFIG_X86_32
- extra += PMD_SIZE;
-#endif
ptes = (extra + PAGE_SIZE - 1) >> PAGE_SHIFT;
} else
- ptes = (end + PAGE_SIZE - 1) >> PAGE_SHIFT;
+ ptes = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;

tables += roundup(ptes * sizeof(pte_t), PAGE_SIZE);

-#ifdef CONFIG_X86_32
- /* for fixmap */
- tables += roundup(__end_of_fixed_addresses * sizeof(pte_t), PAGE_SIZE);
+ if (!tables)
+ return;

+#ifdef CONFIG_X86_32
good_end = max_pfn_mapped << PAGE_SHIFT;
#endif

- base = memblock_find_in_range(start, good_end, tables, PAGE_SIZE);
+ base = memblock_find_in_range(0x00, good_end, tables, PAGE_SIZE);
if (base == MEMBLOCK_ERROR)
panic("Cannot find space for the kernel page tables");

@@ -261,7 +288,7 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
* nodes are discovered.
*/
if (!after_bootmem)
- find_early_table_space(end, use_pse, use_gbpages);
+ find_early_table_space(start, end, use_pse, use_gbpages);

for (i = 0; i < nr_range; i++)
ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
@@ -275,6 +302,9 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,

__flush_tlb_all();

+ if (pgt_buf_end != pgt_buf_top)
+ printk(KERN_DEBUG "initial kernel pagetable allocation wasted %lx"
+ " pages\n", pgt_buf_top - pgt_buf_end);
/*
* Reserve the kernel pagetable pages we used (pgt_buf_start -
* pgt_buf_end) and free the other ones (pgt_buf_end - pgt_buf_top)

2011-06-21 20:37:51

by Stefano Stabellini

Subject: [tip:x86/mm] Revert "x86,xen: introduce x86_init.mapping.pagetable_reserve"

Commit-ID: d8ca7b16cfc1496d57287caa12ade8a8e4d0c0f8
Gitweb: http://git.kernel.org/tip/d8ca7b16cfc1496d57287caa12ade8a8e4d0c0f8
Author: Stefano Stabellini <[email protected]>
AuthorDate: Tue, 7 Jun 2011 19:13:28 +0100
Committer: H. Peter Anvin <[email protected]>
CommitDate: Tue, 21 Jun 2011 13:07:11 -0700

Revert "x86,xen: introduce x86_init.mapping.pagetable_reserve"

This reverts commit 279b706bf800b5967037f492dbe4fc5081ad5d0f.

Signed-off-by: Stefano Stabellini <[email protected]>
Link: http://lkml.kernel.org/r/1307470409-7654-2-git-send-email-stefano.stabellini@eu.citrix.com
Acked-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/include/asm/pgtable_types.h | 1 -
arch/x86/include/asm/x86_init.h | 12 ------------
arch/x86/kernel/x86_init.c | 4 ----
arch/x86/mm/init.c | 25 +++----------------------
arch/x86/xen/mmu.c | 15 ---------------
5 files changed, 3 insertions(+), 54 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index d56187c..7db7723 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -299,7 +299,6 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
/* Install a pte for a particular vaddr in kernel space. */
void set_pte_vaddr(unsigned long vaddr, pte_t pte);

-extern void native_pagetable_reserve(u64 start, u64 end);
#ifdef CONFIG_X86_32
extern void native_pagetable_setup_start(pgd_t *base);
extern void native_pagetable_setup_done(pgd_t *base);
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index d3d8590..643ebf2 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -68,17 +68,6 @@ struct x86_init_oem {
};

/**
- * struct x86_init_mapping - platform specific initial kernel pagetable setup
- * @pagetable_reserve: reserve a range of addresses for kernel pagetable usage
- *
- * For more details on the purpose of this hook, look in
- * init_memory_mapping and the commit that added it.
- */
-struct x86_init_mapping {
- void (*pagetable_reserve)(u64 start, u64 end);
-};
-
-/**
* struct x86_init_paging - platform specific paging functions
* @pagetable_setup_start: platform specific pre paging_init() call
* @pagetable_setup_done: platform specific post paging_init() call
@@ -134,7 +123,6 @@ struct x86_init_ops {
struct x86_init_mpparse mpparse;
struct x86_init_irqs irqs;
struct x86_init_oem oem;
- struct x86_init_mapping mapping;
struct x86_init_paging paging;
struct x86_init_timers timers;
struct x86_init_iommu iommu;
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 6f164bd..6eee082 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -61,10 +61,6 @@ struct x86_init_ops x86_init __initdata = {
.banner = default_banner,
},

- .mapping = {
- .pagetable_reserve = native_pagetable_reserve,
- },
-
.paging = {
.pagetable_setup_start = native_pagetable_setup_start,
.pagetable_setup_done = native_pagetable_setup_done,
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 0cfe8d4..15590fd 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -106,11 +106,6 @@ static void __init find_early_table_space(unsigned long start,
end, pgt_buf_start << PAGE_SHIFT, pgt_buf_top << PAGE_SHIFT);
}

-void __init native_pagetable_reserve(u64 start, u64 end)
-{
- memblock_x86_reserve_range(start, end, "PGTABLE");
-}
-
struct map_range {
unsigned long start;
unsigned long end;
@@ -305,24 +300,10 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
if (pgt_buf_end != pgt_buf_top)
printk(KERN_DEBUG "initial kernel pagetable allocation wasted %lx"
" pages\n", pgt_buf_top - pgt_buf_end);
- /*
- * Reserve the kernel pagetable pages we used (pgt_buf_start -
- * pgt_buf_end) and free the other ones (pgt_buf_end - pgt_buf_top)
- * so that they can be reused for other purposes.
- *
- * On native it just means calling memblock_x86_reserve_range, on Xen it
- * also means marking RW the pagetable pages that we allocated before
- * but that haven't been used.
- *
- * In fact on xen we mark RO the whole range pgt_buf_start -
- * pgt_buf_top, because we have to make sure that when
- * init_memory_mapping reaches the pagetable pages area, it maps
- * RO all the pagetable pages, including the ones that are beyond
- * pgt_buf_end at that time.
- */
+
if (!after_bootmem && pgt_buf_end > pgt_buf_start)
- x86_init.mapping.pagetable_reserve(PFN_PHYS(pgt_buf_start),
- PFN_PHYS(pgt_buf_end));
+ memblock_x86_reserve_range(pgt_buf_start << PAGE_SHIFT,
+ pgt_buf_end << PAGE_SHIFT, "PGTABLE");

if (!after_bootmem)
early_memtest(start, end);
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index dc708dc..2004f1e 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1153,20 +1153,6 @@ static void __init xen_pagetable_setup_start(pgd_t *base)
{
}

-static __init void xen_mapping_pagetable_reserve(u64 start, u64 end)
-{
- /* reserve the range used */
- native_pagetable_reserve(start, end);
-
- /* set as RW the rest */
- printk(KERN_DEBUG "xen: setting RW the range %llx - %llx\n", end,
- PFN_PHYS(pgt_buf_top));
- while (end < PFN_PHYS(pgt_buf_top)) {
- make_lowmem_page_readwrite(__va(end));
- end += PAGE_SIZE;
- }
-}
-
static void xen_post_allocator_init(void);

static void __init xen_pagetable_setup_done(pgd_t *base)
@@ -1997,7 +1983,6 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {

void __init xen_init_mmu_ops(void)
{
- x86_init.mapping.pagetable_reserve = xen_mapping_pagetable_reserve;
x86_init.paging.pagetable_setup_start = xen_pagetable_setup_start;
x86_init.paging.pagetable_setup_done = xen_pagetable_setup_done;
pv_mmu_ops = xen_mmu_ops;

2011-06-21 20:38:28

by Stefano Stabellini

Subject: [tip:x86/mm] x86, init : Move memblock_x86_reserve_range PGTABLE to find_early_table_space

Commit-ID: 1938931a20da89359fb3f1189d46f9b0f29e5117
Gitweb: http://git.kernel.org/tip/1938931a20da89359fb3f1189d46f9b0f29e5117
Author: Stefano Stabellini <[email protected]>
AuthorDate: Tue, 7 Jun 2011 19:13:29 +0100
Committer: H. Peter Anvin <[email protected]>
CommitDate: Tue, 21 Jun 2011 13:07:14 -0700

x86, init : Move memblock_x86_reserve_range PGTABLE to find_early_table_space

Now that find_early_table_space knows how to calculate the exact amount
of memory needed by the kernel pagetable, we can reserve the range
directly in find_early_table_space.

This allows Xen to know what memory range these will occupy and
therefore how to manage that memory.

Signed-off-by: Stefano Stabellini <[email protected]>
Link: http://lkml.kernel.org/r/1307470409-7654-3-git-send-email-stefano.stabellini@eu.citrix.com
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/mm/init.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 15590fd..36bacfe 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -104,6 +104,10 @@ static void __init find_early_table_space(unsigned long start,

printk(KERN_DEBUG "kernel direct mapping tables up to %lx @ %lx-%lx\n",
end, pgt_buf_start << PAGE_SHIFT, pgt_buf_top << PAGE_SHIFT);
+
+ if (pgt_buf_top > pgt_buf_start)
+ memblock_x86_reserve_range(pgt_buf_start << PAGE_SHIFT,
+ pgt_buf_top << PAGE_SHIFT, "PGTABLE");
}

struct map_range {
@@ -301,10 +305,6 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
printk(KERN_DEBUG "initial kernel pagetable allocation wasted %lx"
" pages\n", pgt_buf_top - pgt_buf_end);

- if (!after_bootmem && pgt_buf_end > pgt_buf_start)
- memblock_x86_reserve_range(pgt_buf_start << PAGE_SHIFT,
- pgt_buf_end << PAGE_SHIFT, "PGTABLE");
-
if (!after_bootmem)
early_memtest(start, end);

2011-06-21 20:59:26

by Ingo Molnar

Subject: Re: [PATCH 0/3] x86: remove x86_init.mapping.pagetable_reserve


-tip testing found that these patches cause the following boot crash
on native:

[ 0.000000] Base memory trampoline at [ffff88000009d000] 9d000 size 8192
[ 0.000000] init_memory_mapping: 0000000000000000-000000003fff0000
[ 0.000000] 0000000000 - 003fff0000 page 4k
[ 0.000000] kernel direct mapping tables up to 3fff0000 @ 3fef0000-3fff0000
[ 0.000000] Kernel panic - not syncing: alloc_low_page: ran out of memory

Config attached, full bootlog below. I've excluded the commits for
now.

Ingo

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 3.0.0-rc4-tip+ (mingo@sirius) (gcc version 4.6.0 20110509 (Red Hat 4.6.0-7) (GCC) ) #139263 SMP PREEMPT Tue Jun 21 22:47:13 CEST 2011
[ 0.000000] Command line: root=/dev/sda6 earlyprintk=ttyS0,115200 console=ttyS0,115200 debug initcall_debug sysrq_always_enabled ignore_loglevel selinux=0 nmi_watchdog=1 panic=1 3
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
[ 0.000000] BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
[ 0.000000] BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
[ 0.000000] BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
[ 0.000000] BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
[ 0.000000] BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
[ 0.000000] bootconsole [earlyser0] enabled
[ 0.000000] debug: ignoring loglevel setting.
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI 2.3 present.
[ 0.000000] DMI: System manufacturer System Product Name/A8N-E, BIOS ASUS A8N-E ACPI BIOS Revision 1008 08/22/2005
[ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
[ 0.000000] No AGP bridge found
[ 0.000000] last_pfn = 0x3fff0 max_arch_pfn = 0x400000000
[ 0.000000] MTRR default type: uncachable
[ 0.000000] MTRR fixed ranges enabled:
[ 0.000000] 00000-9FFFF write-back
[ 0.000000] A0000-BFFFF uncachable
[ 0.000000] C0000-C7FFF write-protect
[ 0.000000] C8000-FFFFF uncachable
[ 0.000000] MTRR variable ranges enabled:
[ 0.000000] 0 base 0000000000 mask FFC0000000 write-back
[ 0.000000] 1 disabled
[ 0.000000] 2 disabled
[ 0.000000] 3 disabled
[ 0.000000] 4 disabled
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[ 0.000000] found SMP MP-table at [ffff8800000f5680] f5680
[ 0.000000] initial memory mapped : 0 - 20000000
[ 0.000000] Base memory trampoline at [ffff88000009d000] 9d000 size 8192
[ 0.000000] init_memory_mapping: 0000000000000000-000000003fff0000
[ 0.000000] 0000000000 - 003fff0000 page 4k
[ 0.000000] kernel direct mapping tables up to 3fff0000 @ 3fef0000-3fff0000
[ 0.000000] Kernel panic - not syncing: alloc_low_page: ran out of memory


Attachments:
(No filename) (3.25 kB)
config (66.28 kB)

2011-06-22 17:12:20

by Stefano Stabellini

Subject: Re: [PATCH 0/3] x86: remove x86_init.mapping.pagetable_reserve

On Tue, 21 Jun 2011, Ingo Molnar wrote:
>
> -tip testing found that these patches cause the following boot crash
> on native:
>
> [ 0.000000] Base memory trampoline at [ffff88000009d000] 9d000 size 8192
> [ 0.000000] init_memory_mapping: 0000000000000000-000000003fff0000
> [ 0.000000] 0000000000 - 003fff0000 page 4k
> [ 0.000000] kernel direct mapping tables up to 3fff0000 @ 3fef0000-3fff0000
> [ 0.000000] Kernel panic - not syncing: alloc_low_page: ran out of memory
>
> Config attached, full bootlog below. I've excluded the commits for
> now.
>

Thanks for the logs; I was able to reproduce the problem and I know what
the issue is: CONFIG_DEBUG_PAGEALLOC forces use_pse to 0 while
on x86_64 cpu_has_pse is 1.
As a consequence the initial pagetable allocator in head_64.S didn't
allocate any pte pages, but find_early_table_space assumed it did.
The issue doesn't happen on x86_32 (PAE and non-PAE) because head_32.S
always uses 4KB pages.
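
Plugging in the numbers from the log above (approximate, just to show the
size of the shortfall): head_64.S had mapped 0 - 0x20000000 (512MB) with 2MB
pages, so no pte pages existed yet; with use_pse forced to 0,
init_memory_mapping has to cover 0 - 0x3fff0000 (~1GB) with 4KB pages, which
takes about 1GB / 4KB * 8 bytes = 2MB of pte pages; but the calculation only
counted the ~512MB above max_pfn_mapped and reserved about 1MB (the
3fef0000-3fff0000 window in the log), so alloc_low_page ran out halfway
through.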

The patch below fixes the problem: on x86_64 we should not limit the
memory size we need to cover with 4KB ptes depending on the initial
allocation, because head_64.S always uses 2MB pages.

Ingo, if you know any other debug config options that might affect
page table allocations, please let me know.

---

commit 2b66a94cf8dbbf4cf2148456381b8674ed8191f0
Author: Stefano Stabellini <[email protected]>
Date: Wed Jun 22 11:46:23 2011 +0000

x86_64: do not assume head_64.S used 4KB pages when !use_pse

head_64.S, which sets up the initial page table on x86_64, is not aware
of PSE being enabled or disabled and it always allocates the initial
mapping using 2MB pages.

Therefore on x86_64 find_early_table_space shouldn't reduce the number
of pages needed for pte pages depending on the size of the initial
mapping, because we know for sure that no pte pages have been allocated
yet.

Signed-off-by: Stefano Stabellini <[email protected]>
Reported-by: Ingo Molnar <[email protected]>

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 36bacfe..1e3098b 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -42,12 +42,19 @@ static void __init find_early_table_space(unsigned long start,
(PMD_SIZE * PTRS_PER_PMD));
pmd_mapped *= (PMD_SIZE * PTRS_PER_PMD);

+ /*
+ * On x86_64 do not limit the size we need to cover with 4KB pages
+ * depending on the initial allocation because head_64.S always uses
+ * 2MB pages.
+ */
+#ifdef CONFIG_X86_32
if (start < PFN_PHYS(max_pfn_mapped)) {
if (PFN_PHYS(max_pfn_mapped) < end)
size -= PFN_PHYS(max_pfn_mapped) - start;
else
size = 0;
}
+#endif

#ifndef __PAGETABLE_PUD_FOLDED
if (end > pud_mapped) {