2006-08-01 09:02:19

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 7 of 13] Make __FIXADDR_TOP variable to allow it to make space for a hypervisor

* Jeremy Fitzhardinge ([email protected]) wrote:
> -#define MAXMEM (-__PAGE_OFFSET-__VMALLOC_RESERVE)
> +#define MAXMEM (__FIXADDR_TOP-__PAGE_OFFSET-__VMALLOC_RESERVE)

In the native case we lose one page of lowmem now.

thanks,
-chris


2006-08-01 14:34:48

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 7 of 13] Make __FIXADDR_TOP variable to allow it to make space for a hypervisor

On Tue, 1 Aug 2006 02:03:30 -0700
Chris Wright <[email protected]> wrote:

> * Jeremy Fitzhardinge ([email protected]) wrote:
> > -#define MAXMEM (-__PAGE_OFFSET-__VMALLOC_RESERVE)
> > +#define MAXMEM (__FIXADDR_TOP-__PAGE_OFFSET-__VMALLOC_RESERVE)
>
> In the native case we lose one page of lowmem now.

erm, isn't this the hunk which gave one of my machines a 4kb highmem zone?
I have memories of reverting it.

2006-08-01 21:36:37

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 7 of 13] Make __FIXADDR_TOP variable to allow it to make space for a hypervisor

* Andrew Morton ([email protected]) wrote:
> On Tue, 1 Aug 2006 02:03:30 -0700
> Chris Wright <[email protected]> wrote:
>
> > * Jeremy Fitzhardinge ([email protected]) wrote:
> > > -#define MAXMEM (-__PAGE_OFFSET-__VMALLOC_RESERVE)
> > > +#define MAXMEM (__FIXADDR_TOP-__PAGE_OFFSET-__VMALLOC_RESERVE)
> >
> > In the native case we lose one page of lowmem now.
>
> erm, isn't this the hunk which gave one of my machines a 4kb highmem zone?
> I have memories of reverting it.

Yes, that does sound quite familiar. I couldn't find the thread, do you
recall any of the details? I expect it's the same issue as the off by one
page I mentioned above.

thanks,
-chris

2006-08-02 01:47:35

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH 7 of 13] Make __FIXADDR_TOP variable to allow it to make space for a hypervisor

On Tue, 2006-08-01 at 14:37 -0700, Chris Wright wrote:
> * Andrew Morton ([email protected]) wrote:
> > On Tue, 1 Aug 2006 02:03:30 -0700
> > Chris Wright <[email protected]> wrote:
> >
> > > * Jeremy Fitzhardinge ([email protected]) wrote:
> > > > -#define MAXMEM (-__PAGE_OFFSET-__VMALLOC_RESERVE)
> > > > +#define MAXMEM (__FIXADDR_TOP-__PAGE_OFFSET-__VMALLOC_RESERVE)
> > >
> > > In the native case we lose one page of lowmem now.
> >
> > erm, isn't this the hunk which gave one of my machines a 4kb highmem zone?
> > I have memories of reverting it.
>
> Yes, that does sound quite familiar. I couldn't find the thread, do you
> recall any of the details? I expect it's the same issue as the off by one
> page I mentioned above.

Good catch Andrew. I did say we'd fix this. Will frob this patch
further...

Here's my final reply from my archives:

Subject:
Re: sparsemem panic in 2.6.17-rc5-mm1 and -mm2
Date:
Wed, 07 Jun 2006 16:49:12 +1000

On Tue, 2006-06-06 at 22:50 -0700, Andrew Morton wrote:
> On Wed, 07 Jun 2006 15:36:25 +1000
> Rusty Russell <[email protected]> wrote:
> > You now have 1 page more memory available in your system.
>
> The kernel has differing opinions about that:
>
> BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
> BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 0000000038000000 (usable)
> BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
> BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
> BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
> 0MB HIGHMEM available.
> 896MB LOWMEM available.

Sure, the pages are reserved, but we still map them into the kernel
address space.

> > I'm sure Gerd will slap me if I'm wrong on this. Here's the patch
> > fragment:
> >
> > -#define MAXMEM (-__PAGE_OFFSET-__VMALLOC_RESERVE)
> > +#define MAXMEM (__FIXADDR_TOP-__PAGE_OFFSET-__VMALLOC_RESERVE)
>
> Well. Applying this with `patch -R' would presumably restore the situation.
> But not having a clue why this change was made, I didn't bother trying it.

Actually, the comment above __VMALLOC_RESERVE already says "This much
address space is reserved for vmalloc() and iomap() as well as fixmap
mappings.", so it should have already been taken into account there.

So, please revert this. When we introduce an actual CONFIG_MEMORY hole,
we'll patch in an explicit "-MEMHOLE_SIZE" or something here.

> From what you're saying, it appears that this patch is an unrelated change,
> to fix the off-by-one? And that if this machine had anything other than
> exactly 7*128MB of physical memory, I wouldn't have noticed?

You wouldn't have noticed, yes. But I'm not convinced the "fix" was
right anyway.

Thanks for chasing this!
Rusty.
--
Help! Save Australia from the worst of the DMCA: http://linux.org.au/law

2006-08-02 07:00:36

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 7 of 13] Make __FIXADDR_TOP variable to allow it to make space for a hypervisor

* Rusty Russell ([email protected]) wrote:
> On Tue, 2006-08-01 at 14:37 -0700, Chris Wright wrote:
> > * Andrew Morton ([email protected]) wrote:
> > > On Tue, 1 Aug 2006 02:03:30 -0700
> > > Chris Wright <[email protected]> wrote:
> > >
> > > > * Jeremy Fitzhardinge ([email protected]) wrote:
> > > > > -#define MAXMEM (-__PAGE_OFFSET-__VMALLOC_RESERVE)
> > > > > +#define MAXMEM (__FIXADDR_TOP-__PAGE_OFFSET-__VMALLOC_RESERVE)
> > > >
> > > > In the native case we lose one page of lowmem now.
> > >
> > > erm, isn't this the hunk which gave one of my machines a 4kb highmem zone?
> > > I have memories of reverting it.
> >
> > Yes, that does sound quite familiar. I couldn't find the thread, do you
> > recall any of the details? I expect it's the same issue as the off by one
> > page I mentioned above.
>
> Good catch Andrew. I did say we'd fix this. Will frob this patch
> further...

Here's an updated patch. Rather than use __FIXADDR_TOP to adjust for
MAXMEM, directly update __VMALLOC_RESERVE which is used to reserve the
space for vmalloc, iomap, and fixmap (as comments aptly point out). I
tested this with a bunch of configurations, and booted a XenoLinux
kernel with this patch as well.

thanks,
-chris
--

Make __FIXADDR_TOP variable to allow it to make space for a hypervisor.

Make __FIXADDR_TOP a variable, so that it can be set to not get in the
way of address space a hypervisor may want to reserve.

Original patch by Gerd Hoffmann <[email protected]>

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Signed-off-by: Chris Wright <[email protected]>
Cc: Gerd Hoffmann <[email protected]>

---
arch/i386/Kconfig | 1 +
arch/i386/mm/init.c | 42 ++++++++++++++++++++++++++++++++++++++++++
arch/i386/mm/pgtable.c | 19 +++++++++++++++++++
include/asm-i386/fixmap.h | 7 ++++++-
4 files changed, 68 insertions(+), 1 deletion(-)

diff -r 5183f1f33cf4 arch/i386/Kconfig
--- a/arch/i386/Kconfig Tue Aug 01 01:06:48 2006 -0400
+++ b/arch/i386/Kconfig Wed Aug 02 02:34:44 2006 -0400
@@ -792,6 +792,7 @@ config COMPAT_VDSO
config COMPAT_VDSO
bool "Compat VDSO support"
default y
+ depends on !PARAVIRT
help
Map the VDSO to the predictable old-style address too.
---help---
diff -r 5183f1f33cf4 arch/i386/mm/init.c
--- a/arch/i386/mm/init.c Tue Aug 01 01:06:48 2006 -0400
+++ b/arch/i386/mm/init.c Wed Aug 02 02:34:44 2006 -0400
@@ -629,6 +629,48 @@ void __init mem_init(void)
(unsigned long) (totalhigh_pages << (PAGE_SHIFT-10))
);

+#if 1 /* double-sanity-check paranoia */
+ printk("virtual kernel memory layout:\n"
+ " fixmap : 0x%08lx - 0x%08lx (%4ld kB)\n"
+#ifdef CONFIG_HIGHMEM
+ " pkmap : 0x%08lx - 0x%08lx (%4ld kB)\n"
+#endif
+ " vmalloc : 0x%08lx - 0x%08lx (%4ld MB)\n"
+ " lowmem : 0x%08lx - 0x%08lx (%4ld MB)\n"
+ " .init : 0x%08lx - 0x%08lx (%4ld kB)\n"
+ " .data : 0x%08lx - 0x%08lx (%4ld kB)\n"
+ " .text : 0x%08lx - 0x%08lx (%4ld kB)\n",
+ FIXADDR_START, FIXADDR_TOP,
+ (FIXADDR_TOP - FIXADDR_START) >> 10,
+
+#ifdef CONFIG_HIGHMEM
+ PKMAP_BASE, PKMAP_BASE+LAST_PKMAP*PAGE_SIZE,
+ (LAST_PKMAP*PAGE_SIZE) >> 10,
+#endif
+
+ VMALLOC_START, VMALLOC_END,
+ (VMALLOC_END - VMALLOC_START) >> 20,
+
+ (unsigned long)__va(0), (unsigned long)high_memory,
+ ((unsigned long)high_memory - (unsigned long)__va(0)) >> 20,
+
+ (unsigned long)&__init_begin, (unsigned long)&__init_end,
+ ((unsigned long)&__init_end - (unsigned long)&__init_begin) >> 10,
+
+ (unsigned long)&_etext, (unsigned long)&_edata,
+ ((unsigned long)&_edata - (unsigned long)&_etext) >> 10,
+
+ (unsigned long)&_text, (unsigned long)&_etext,
+ ((unsigned long)&_etext - (unsigned long)&_text) >> 10);
+
+#ifdef CONFIG_HIGHMEM
+ BUG_ON(PKMAP_BASE+LAST_PKMAP*PAGE_SIZE > FIXADDR_START);
+ BUG_ON(VMALLOC_END > PKMAP_BASE);
+#endif
+ BUG_ON(VMALLOC_START > VMALLOC_END);
+ BUG_ON((unsigned long)high_memory > VMALLOC_START);
+#endif /* double-sanity-check paranoia */
+
#ifdef CONFIG_X86_PAE
if (!cpu_has_pae)
panic("cannot execute a PAE-enabled kernel on a PAE-less CPU!");
diff -r 5183f1f33cf4 arch/i386/mm/pgtable.c
--- a/arch/i386/mm/pgtable.c Tue Aug 01 01:06:48 2006 -0400
+++ b/arch/i386/mm/pgtable.c Wed Aug 02 02:34:44 2006 -0400
@@ -12,6 +12,7 @@
#include <linux/slab.h>
#include <linux/pagemap.h>
#include <linux/spinlock.h>
+#include <linux/module.h>

#include <asm/system.h>
#include <asm/pgtable.h>
@@ -137,6 +138,12 @@ void set_pmd_pfn(unsigned long vaddr, un
__flush_tlb_one(vaddr);
}

+static int fixmaps;
+#ifndef CONFIG_COMPAT_VDSO
+unsigned long __FIXADDR_TOP = 0xfffff000;
+EXPORT_SYMBOL(__FIXADDR_TOP);
+#endif
+
void __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t flags)
{
unsigned long address = __fix_to_virt(idx);
@@ -146,6 +153,18 @@ void __set_fixmap (enum fixed_addresses
return;
}
set_pte_pfn(address, phys >> PAGE_SHIFT, flags);
+ fixmaps++;
+}
+
+void set_fixaddr_top(unsigned long top)
+{
+ BUG_ON(fixmaps > 0);
+#ifdef CONFIG_COMPAT_VDSO
+ BUG_ON(top - PAGE_SIZE != __FIXADDR_TOP);
+#else
+ __FIXADDR_TOP = top - PAGE_SIZE;
+ __VMALLOC_RESERVE -= top;
+#endif
}

pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
diff -r 5183f1f33cf4 include/asm-i386/fixmap.h
--- a/include/asm-i386/fixmap.h Tue Aug 01 01:06:48 2006 -0400
+++ b/include/asm-i386/fixmap.h Wed Aug 02 02:34:44 2006 -0400
@@ -19,7 +19,11 @@
* Leave one empty page between vmalloc'ed areas and
* the start of the fixmap.
*/
-#define __FIXADDR_TOP 0xfffff000
+#ifndef CONFIG_COMPAT_VDSO
+extern unsigned long __FIXADDR_TOP;
+#else
+#define __FIXADDR_TOP 0xfffff000
+#endif

#ifndef __ASSEMBLY__
#include <linux/kernel.h>
@@ -93,6 +97,7 @@ enum fixed_addresses {

extern void __set_fixmap (enum fixed_addresses idx,
unsigned long phys, pgprot_t flags);
+extern void set_fixaddr_top(unsigned long top);

#define set_fixmap(idx, phys) \
__set_fixmap(idx, phys, PAGE_KERNEL)

2006-08-02 07:20:13

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH 7 of 13] Make __FIXADDR_TOP variable to allow it to make space for a hypervisor

On Wed, 2006-08-02 at 00:01 -0700, Chris Wright wrote:
> Here's an updated patch. Rather than use __FIXADDR_TOP to adjust for
> MAXMEM, directly update __VMALLOC_RESERVE which is used to reserve the
> space for vmalloc, iomap, and fixmap (as comments aptly point out). I
> tested this with a bunch of configurations, and booted a XenoLinux
> kernel with this patch as well.

Just one minor point:

> +void set_fixaddr_top(unsigned long top)
> +{
> + BUG_ON(fixmaps > 0);
> +#ifdef CONFIG_COMPAT_VDSO
> + BUG_ON(top - PAGE_SIZE != __FIXADDR_TOP);
> +#else
> + __FIXADDR_TOP = top - PAGE_SIZE;
> + __VMALLOC_RESERVE -= top;
> +#endif
> }

This no longer seems to be an appropriate name. How about
set_address_top_reserve or something?

void set_address_top_reserve(unsigned long reserve)
{
BUG_ON(fixmaps > 0);
#ifdef CONFIG_COMPAT_VDSO
BUG_ON(reserve != 0);
#else
__FIXADDR_TOP = -reserve - PAGE_SIZE;
__VMALLOC_RESERVE += reserve;
#endif
}

(I *think* I got the logic here correct).

Rusty.
--
Help! Save Australia from the worst of the DMCA: http://linux.org.au/law

2006-08-02 08:25:28

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 7 of 13] Make __FIXADDR_TOP variable to allow it to make space for a hypervisor

* Rusty Russell ([email protected]) wrote:
> On Wed, 2006-08-02 at 00:01 -0700, Chris Wright wrote:
> > Here's an updated patch. Rather than use __FIXADDR_TOP to adjust for
> > MAXMEM, directly update __VMALLOC_RESERVE which is used to reserve the
> > space for vmalloc, iomap, and fixmap (as comments aptly point out). I
> > tested this with a bunch of configurations, and booted a XenoLinux
> > kernel with this patch as well.
>
> Just one minor point:
>
> > +void set_fixaddr_top(unsigned long top)
> > +{
> > + BUG_ON(fixmaps > 0);
> > +#ifdef CONFIG_COMPAT_VDSO
> > + BUG_ON(top - PAGE_SIZE != __FIXADDR_TOP);
> > +#else
> > + __FIXADDR_TOP = top - PAGE_SIZE;
> > + __VMALLOC_RESERVE -= top;
> > +#endif
> > }
>
> This no longer seems to be an appropriate name. How about
> set_address_top_reserve or something?

Sure how about reserve_top_address()? Requires a respin of the next
patch as well. I've done that, but it still needs to be updated to
early_param, so I won't repost that one.

thanks,
-chris
--

Subject: Make __FIXADDR_TOP variable to allow it to make space for a hypervisor.

Make __FIXADDR_TOP a variable, so that it can be set to not get in the
way of address space a hypervisor may want to reserve.

Original patch by Gerd Hoffmann <[email protected]>

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
Signed-off-by: Chris Wright <[email protected]>
Cc: Gerd Hoffmann <[email protected]>

---
arch/i386/Kconfig | 1 +
arch/i386/mm/init.c | 42 ++++++++++++++++++++++++++++++++++++++++++
arch/i386/mm/pgtable.c | 26 ++++++++++++++++++++++++++
include/asm-i386/fixmap.h | 7 ++++++-
4 files changed, 75 insertions(+), 1 deletion(-)

===================================================================
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -792,6 +792,7 @@ config COMPAT_VDSO
config COMPAT_VDSO
bool "Compat VDSO support"
default y
+ depends on !PARAVIRT
help
Map the VDSO to the predictable old-style address too.
---help---
===================================================================
--- a/arch/i386/mm/init.c
+++ b/arch/i386/mm/init.c
@@ -629,6 +629,48 @@ void __init mem_init(void)
(unsigned long) (totalhigh_pages << (PAGE_SHIFT-10))
);

+#if 1 /* double-sanity-check paranoia */
+ printk("virtual kernel memory layout:\n"
+ " fixmap : 0x%08lx - 0x%08lx (%4ld kB)\n"
+#ifdef CONFIG_HIGHMEM
+ " pkmap : 0x%08lx - 0x%08lx (%4ld kB)\n"
+#endif
+ " vmalloc : 0x%08lx - 0x%08lx (%4ld MB)\n"
+ " lowmem : 0x%08lx - 0x%08lx (%4ld MB)\n"
+ " .init : 0x%08lx - 0x%08lx (%4ld kB)\n"
+ " .data : 0x%08lx - 0x%08lx (%4ld kB)\n"
+ " .text : 0x%08lx - 0x%08lx (%4ld kB)\n",
+ FIXADDR_START, FIXADDR_TOP,
+ (FIXADDR_TOP - FIXADDR_START) >> 10,
+
+#ifdef CONFIG_HIGHMEM
+ PKMAP_BASE, PKMAP_BASE+LAST_PKMAP*PAGE_SIZE,
+ (LAST_PKMAP*PAGE_SIZE) >> 10,
+#endif
+
+ VMALLOC_START, VMALLOC_END,
+ (VMALLOC_END - VMALLOC_START) >> 20,
+
+ (unsigned long)__va(0), (unsigned long)high_memory,
+ ((unsigned long)high_memory - (unsigned long)__va(0)) >> 20,
+
+ (unsigned long)&__init_begin, (unsigned long)&__init_end,
+ ((unsigned long)&__init_end - (unsigned long)&__init_begin) >> 10,
+
+ (unsigned long)&_etext, (unsigned long)&_edata,
+ ((unsigned long)&_edata - (unsigned long)&_etext) >> 10,
+
+ (unsigned long)&_text, (unsigned long)&_etext,
+ ((unsigned long)&_etext - (unsigned long)&_text) >> 10);
+
+#ifdef CONFIG_HIGHMEM
+ BUG_ON(PKMAP_BASE+LAST_PKMAP*PAGE_SIZE > FIXADDR_START);
+ BUG_ON(VMALLOC_END > PKMAP_BASE);
+#endif
+ BUG_ON(VMALLOC_START > VMALLOC_END);
+ BUG_ON((unsigned long)high_memory > VMALLOC_START);
+#endif /* double-sanity-check paranoia */
+
#ifdef CONFIG_X86_PAE
if (!cpu_has_pae)
panic("cannot execute a PAE-enabled kernel on a PAE-less CPU!");
===================================================================
--- a/arch/i386/mm/pgtable.c
+++ b/arch/i386/mm/pgtable.c
@@ -12,6 +12,7 @@
#include <linux/slab.h>
#include <linux/pagemap.h>
#include <linux/spinlock.h>
+#include <linux/module.h>

#include <asm/system.h>
#include <asm/pgtable.h>
@@ -137,6 +138,12 @@ void set_pmd_pfn(unsigned long vaddr, un
__flush_tlb_one(vaddr);
}

+static int fixmaps;
+#ifndef CONFIG_COMPAT_VDSO
+unsigned long __FIXADDR_TOP = 0xfffff000;
+EXPORT_SYMBOL(__FIXADDR_TOP);
+#endif
+
void __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t flags)
{
unsigned long address = __fix_to_virt(idx);
@@ -146,6 +153,25 @@ void __set_fixmap (enum fixed_addresses
return;
}
set_pte_pfn(address, phys >> PAGE_SHIFT, flags);
+ fixmaps++;
+}
+
+/**
+ * reserve_top_address - reserves a hole in the top of kernel address space
+ * @reserve - size of hole to reserve
+ *
+ * Can be used to relocate the fixmap area and poke a hole in the top
+ * of kernel address space to make room for a hypervisor.
+ */
+void reserve_top_address(unsigned long reserve)
+{
+ BUG_ON(fixmaps > 0);
+#ifdef CONFIG_COMPAT_VDSO
+ BUG_ON(reserve != 0);
+#else
+ __FIXADDR_TOP = -reserve - PAGE_SIZE;
+ __VMALLOC_RESERVE += reserve;
+#endif
}

pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
===================================================================
--- a/include/asm-i386/fixmap.h
+++ b/include/asm-i386/fixmap.h
@@ -19,7 +19,11 @@
* Leave one empty page between vmalloc'ed areas and
* the start of the fixmap.
*/
-#define __FIXADDR_TOP 0xfffff000
+#ifndef CONFIG_COMPAT_VDSO
+extern unsigned long __FIXADDR_TOP;
+#else
+#define __FIXADDR_TOP 0xfffff000
+#endif

#ifndef __ASSEMBLY__
#include <linux/kernel.h>
@@ -93,6 +97,7 @@ enum fixed_addresses {

extern void __set_fixmap (enum fixed_addresses idx,
unsigned long phys, pgprot_t flags);
+extern void reserve_top_address(unsigned long reserve);

#define set_fixmap(idx, phys) \
__set_fixmap(idx, phys, PAGE_KERNEL)