LinuxLists.cc - [RFC] [PATCH 2.6.19-rc4] kdump panics early in boot when reserving MP Tables located in high memory

Subject: Re: [Fastboot] [RFC] [PATCH 2.6.19-rc4] kdump panics early in boot when reserving MP Tables located in high memory

On Thu, Nov 02, 2006 at 05:24:32PM -0500, Amul Shah wrote:
> The kdump crash kernel panics when it tries to reserve the MP Config
> tables on an ES7000.
>
> The MP Config table is located above 1MB of physical memory in a
> reserved memory area. It is located outside the first 1MB area because
> the tables are too large, 240k.
>

Hi Amul,

Can you tell where it is placed in your system? At the end of physical
RAM?

> The crash kernel is given a user defined memory map with E820 reserved
> and ACPI areas passed in by kexec tools and a usable area from 16MB
> physical to 80MB physical. This user defined map causes the top of
> memory to be set as 80MB.
>
> The ACPI tables and MP Tables reside higher in memory. When reserving
> memory with reserve_bootmem_generic, the function has a BUG panic if the
> memory location to reserve is above the top of memory. The MP table is
> above the top of memory in a user defined memory map.
>
> This patch will ignore reserving the MP tables if the MP table resides
> in an area already reserved in the E820.
>
> I have two alternate patches that accomplish the same effect if this
> patch is not acceptable.
> 1. avoid reserving the MP tables if a user defined memory map or if a
> user defined memory limit ("mem=") is used.
> 2. avoid reserving the MP tables if a kernel parameter is passed in to
> ignore MP table reservation.
>
>

I don't think either method is the right way to solve the problem. They
will only fix the symptom you are facing. The current solution works for
you because you are using the MADT tables from ACPI, but it will fail if
you try to boot the second kernel on your system with acpi=off, because
the MP tables will not be accessible.

I think the right way to fix this problem would be to let kexec-tools know
where the MP table is in RAM, so that kexec-tools can create another memmap=
entry for that area and the MP tables are accessible in the second kernel.

I think you need to export the MP table location and size to user space,
say through /sys/kernel/, and also modify kexec-tools to parse it.
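
A minimal sketch of the user-space half of this idea, assuming the kernel
exported the MP table's physical base and length somewhere under
/sys/kernel/ (no such file exists in 2.6.19-rc4, and the helper name
format_memmap_reserved is made up for illustration). It only shows how a
tool could turn a base/length pair into a reserved-region argument using
the existing memmap=nn$ss boot-parameter syntax; it is not kexec-tools code.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a "memmap=<size>K$<start>K" argument that
 * marks [base, base + len) as reserved. The region is widened to whole
 * KiB: the start is rounded down and the end is rounded up.
 * Returns the snprintf() result (length of the formatted string).
 */
int format_memmap_reserved(char *buf, size_t buflen,
                           unsigned long long base, unsigned long long len)
{
    unsigned long long start_k = base / 1024;
    unsigned long long end_k = (base + len + 1023) / 1024;

    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "memmap=%lluK$%lluK",
                    end_k - start_k, start_k);
}
```

Called with a base of 896MB and a length of 240k (the figures reported for
the ES7000 in this thread), the helper produces "memmap=240K$917504K".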

Thanks
Vivek

2006-11-03 02:40:59

Subject: Re: [RFC] [PATCH 2.6.19-rc4] kdump panics early in boot when reserving MP Tables located in high memory

On Thursday 02 November 2006 23:24, Amul Shah wrote:

>
> The ACPI tables and MP Tables reside higher in memory. When reserving
> memory with reserve_bootmem_generic, the function has a BUG panic if the
> memory location to reserve is above the top of memory. The MP table is
> above the top of memory in a user defined memory map.

I think it would be cleaner to add a check in reserve_bootmem_generic
that just returns when pfn >= end_pfn && pfn < end_pfn_map

How about this patch? Does it work?

-Andi

Handle reserve_bootmem_generic beyond end_pfn

This can happen on kexec kernels with some configurations, in particular
on Unisys ES7000 systems.

Analysis by Amul Shah

Cc: Amul Shah <[email protected]>

Signed-off-by: Andi Kleen <[email protected]>

Index: linux/arch/x86_64/mm/init.c
===================================================================
--- linux.orig/arch/x86_64/mm/init.c
+++ linux/arch/x86_64/mm/init.c
@@ -655,9 +655,22 @@ void free_initrd_mem(unsigned long start

 void __init reserve_bootmem_generic(unsigned long phys, unsigned len)
 {
-	/* Should check here against the e820 map to avoid double free */
 #ifdef CONFIG_NUMA
 	int nid = phys_to_nid(phys);
+#endif
+	unsigned long pfn = phys >> PAGE_SHIFT;
+	if (pfn >= end_pfn) {
+		/* This can happen with kdump kernels when accessing firmware
+		   tables. */
+		if (pfn < end_pfn_map)
+			return;
+		printk(KERN_ERR "reserve_bootmem: illegal reserve %lx %u\n",
+			phys, len);
+		return;
+	}
+
+	/* Should check here against the e820 map to avoid double free */
+#ifdef CONFIG_NUMA
 	reserve_bootmem_node(NODE_DATA(nid), phys, len);
 #else
 	reserve_bootmem(phys, len);

2006-11-03 14:31:54

Subject: Re: [Fastboot] [RFC] [PATCH 2.6.19-rc4] kdump panics early in boot when reserving MP Tables located in high memory

On Thu, 2006-11-02 at 18:36 -0500, Vivek Goyal wrote:
> On Thu, Nov 02, 2006 at 05:24:32PM -0500, Amul Shah wrote:
> > The kdump crash kernel panics when it tries to reserve the MP Config
> > tables on an ES7000.
> >
> > The MP Config table is located above 1MB of physical memory in a
> > reserved memory area. It is located outside the first 1MB area because
> > the tables are too large, 240k.
> >
>
> Hi Amul,
>
> Can you tell where it is placed in your system? At the end of physical
> RAM?

The MP tables are located at 896MB of RAM. I believe that we ended up
there because i386 Linux would choke if the tables were located higher
than 896MB (the low memory boundary).
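
To put numbers on this, here is a small user-space model of the three-way
decision in the reserve_bootmem_generic() change proposed earlier in the
thread. It is a sketch, not kernel code: the enum and the function name
classify_reserve are invented for illustration, 4k pages are assumed, and
any end_pfn_map value fed to it is an assumption, since the thread never
states the real one.

```c
#define PAGE_SHIFT 12 /* 4 KiB pages, as on x86_64 */

/* Illustrative outcomes; these names do not exist in the kernel. */
enum reserve_action {
    DO_RESERVE,     /* below end_pfn: hand the range to bootmem */
    SKIP_FIRMWARE,  /* above usable RAM but inside the e820 map: return */
    REJECT_ILLEGAL  /* beyond the map entirely: log an error and return */
};

enum reserve_action classify_reserve(unsigned long long phys,
                                     unsigned long long end_pfn,
                                     unsigned long long end_pfn_map)
{
    unsigned long long pfn = phys >> PAGE_SHIFT;

    if (pfn < end_pfn)
        return DO_RESERVE;
    if (pfn < end_pfn_map)
        return SKIP_FIRMWARE;
    return REJECT_ILLEGAL;
}
```

With the top of usable memory at 80MB, end_pfn is 20480 (0x5000), while a
table at 896MB starts at page frame 229376 (0x38000), so the request falls
into the second case whenever end_pfn_map lies above it.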

> > The crash kernel is given a user defined memory map with E820 reserved
> > and ACPI areas passed in by kexec tools and a usable area from 16MB
> > physical to 80MB physical. This user defined map causes the top of
> > memory to be set as 80MB.
> >
> > The ACPI tables and MP Tables reside higher in memory. When reserving
> > memory with reserve_bootmem_generic, the function has a BUG panic if the
> > memory location to reserve is above the top of memory. The MP table is
> > above the top of memory in a user defined memory map.
> >
> > This patch will ignore reserving the MP tables if the MP table resides
> > in an area already reserved in the E820.
> >
> > I have two alternate patches that accomplish the same effect if this
> > patch is not acceptable.
> > 1. avoid reserving the MP tables if a user defined memory map or if a
> > user defined memory limit ("mem=") is used.
> > 2. avoid reserving the MP tables if a kernel parameter is passed in to
> > ignore MP table reservation.
> >
> >
>
> I don't think either method is the right way to solve the problem. They
> will only fix the symptom you are facing. The current solution works for
> you because you are using the MADT tables from ACPI, but it will fail if
> you try to boot the second kernel on your system with acpi=off, because
> the MP tables will not be accessible.

You are correct, this patch most certainly only fixes the Unisys problem
(does that make me a corporate shill? ;). Since we can't boot the
ES7000 without ACPI, the crash kernel must have access to the ACPI Data
area.

> I think the right way to fix this problem would be to let kexec-tools know
> where the MP table is in RAM, so that kexec-tools can create another memmap=
> entry for that area and the MP tables are accessible in the second kernel.

The MP Tables already reside in a reserved area which the kexec-tools
handles. Unfortunately because end of ram in the user defined map is
physical 80MB, trying to reserve memory above that point breaks the
ES7000.

> I think you need to export the MP table location and size to user space,
> say through /sys/kernel/, and also modify kexec-tools to parse it.
>
> Thanks
> Vivek

Andi's suggestion will work with a modification. I'll post it once I've
tested it.

thanks,
Amul

2006-11-03 14:56:42

Subject: Re: [RFC] [PATCH 2.6.19-rc4] kdump panics early in boot when reserving MP Tables located in high memory

On Fri, 2006-11-03 at 03:40 +0100, Andi Kleen wrote:
> On Thursday 02 November 2006 23:24, Amul Shah wrote:
>
> >
> > The ACPI tables and MP Tables reside higher in memory. When reserving
> > memory with reserve_bootmem_generic, the function has a BUG panic if the
> > memory location to reserve is above the top of memory. The MP table is
> > above the top of memory in a user defined memory map.
>
> I think it would be cleaner to add a check in reserve_bootmem_generic
> that just returns when pfn >= end_pfn && pfn < end_pfn_map
>
> How about this patch? Does it work?
>
> -Andi
>
> Handle reserve_bootmem_generic beyond end_pfn
>
> This can happen on kexec kernels with some configurations, in particular
> on Unisys ES7000 systems.
>
> Analysis by Amul Shah
>
> Cc: Amul Shah <[email protected]>
>
> Signed-off-by: Andi Kleen <[email protected]>
>
> Index: linux/arch/x86_64/mm/init.c
> ===================================================================
> --- linux.orig/arch/x86_64/mm/init.c
> +++ linux/arch/x86_64/mm/init.c
> @@ -655,9 +655,22 @@ void free_initrd_mem(unsigned long start
>
>  void __init reserve_bootmem_generic(unsigned long phys, unsigned len)
>  {
> -	/* Should check here against the e820 map to avoid double free */
>  #ifdef CONFIG_NUMA
>  	int nid = phys_to_nid(phys);
> +#endif
> +	unsigned long pfn = phys >> PAGE_SHIFT;
> +	if (pfn >= end_pfn) {
> +		/* This can happen with kdump kernels when accessing firmware
> +		   tables. */
> +		if (pfn < end_pfn_map)
> +			return;
> +		printk(KERN_ERR "reserve_bootmem: illegal reserve %lx %u\n",
> +			phys, len);
> +		return;
> +	}
> +
> +	/* Should check here against the e820 map to avoid double free */
> +#ifdef CONFIG_NUMA
>  	reserve_bootmem_node(NODE_DATA(nid), phys, len);
>  #else
>  	reserve_bootmem(phys, len);

Andi,
That won't work because in arch/x86_64/kernel/e820.c, the exactmap
parsing clobbers end_pfn_map.

static int __init parse_memmap_opt(char *p)
{
	char *oldp;
	unsigned long long start_at, mem_size;

	if (!strcmp(p, "exactmap")) {
#ifdef CONFIG_CRASH_DUMP
		/* If we are doing a crash dump, we
		 * still need to know the real mem
		 * size before original memory map is
		 * reset.
		 */
		saved_max_pfn = e820_end_of_ram();
#endif
		end_pfn_map = 0;
		e820.nr_map = 0;
		userdef = 1;
		return 0;
	}

The following was my alternate patch, which uses the saved_max_pfn
variable and avoids the MP config table reservation bug. I'll rewrite
it to go into reserve_bootmem_generic and submit that patch once I have
tested it.

I chose to use end_user_pfn because it is left unmodified unless the
user specifies an exactmap or a "mem=" value as a kernel boot parameter.
This might be a no-no, since I didn't check to see if I'm under the top
of memory. I'm making the assumption that since the user chose to
define memory the user knows what s/he is doing.

This patch is just for comment since I'll be using the same logic when I
update the patch that Andi sent.

thanks,
Amul

diff -Naur linux-2.6.19-rc4/arch/x86_64/kernel/mpparse.c linux-2.6.19-rc4-pfncheck/arch/x86_64/kernel/mpparse.c
--- linux-2.6.19-rc4/arch/x86_64/kernel/mpparse.c 2006-10-31 17:38:41.000000000 -0500
+++ linux-2.6.19-rc4-pfncheck/arch/x86_64/kernel/mpparse.c 2006-11-03 10:18:24.000000000 -0500
@@ -34,6 +34,7 @@
 /* Have we found an MP table */
 int smp_found_config;
 unsigned int __initdata maxcpus = NR_CPUS;
+extern unsigned long end_user_pfn;

 int acpi_found_madt;

@@ -528,6 +529,8 @@
 	extern void __bad_mpf_size(void);
 	unsigned int *bp = phys_to_virt(base);
 	struct intel_mp_floating *mpf;
+	int mpf_below_mem;
+	unsigned long mpf_pfn;

 	Dprintk("Scan SMP from %p for %ld bytes.\n", bp,length);
 	if (sizeof(*mpf) != 16)
@@ -542,8 +545,33 @@
 				|| (mpf->mpf_specification == 4)) ) {

 			smp_found_config = 1;
+			mpf_below_mem = 0;
 			reserve_bootmem_generic(virt_to_phys(mpf), PAGE_SIZE);
-			if (mpf->mpf_physptr)
+			if (mpf->mpf_physptr) {
+				mpf_pfn = mpf->mpf_physptr>>PAGE_SHIFT;
+				mpf_below_mem = 1;
+#ifdef CONFIG_CRASH_DUMP
+				if (mpf_pfn > end_pfn &&
+				    mpf_pfn < saved_max_pfn) {
+					printk(KERN_WARNING "WARNING: "
+					       "mpf_physptr > end_pfn %lx in "
+					       "user defined map\n", end_pfn);
+					printk(KERN_WARNING "WARNING: "
+					       "Not reserving the MP Tables\n");
+					mpf_below_mem = 0;
+				}
+#endif
+				if (mpf_pfn > end_pfn &&
+				    end_user_pfn != MAXMEM>>PAGE_SHIFT) {
+					printk(KERN_WARNING "WARNING: "
+					       "mpf_physptr > end_pfn %lx. Try"
+					       " a larger 'mem='\n", end_pfn);
+					printk(KERN_WARNING "WARNING: "
+					       "Not reserving the MP Tables\n");
+					mpf_below_mem = 0;
+				}
+			}
+			if (mpf_below_mem)
 				reserve_bootmem_generic(mpf->mpf_physptr, PAGE_SIZE);
 			mpf_found = mpf;
 			return 1;

2006-11-03 16:51:13

Subject: Re: [RFC] [PATCH 2.6.19-rc4] kdump panics early in boot when reserving MP Tables located in high memory

[Finally dropping that annoying fastboot list from cc. Please never include any closed
mailing lists in l-k posts. Thanks]

> That won't work because in arch/x86_64/kernel/e820.c, the exactmap
> parsing clobbers end_pfn_map.

That's a bug imho. It shouldn't do that.

end_pfn_map should be always the highest address in e820 so that we
can access all firmware tables safely.

-Andi

>
> static int __init parse_memmap_opt(char *p)
> {
> 	char *oldp;
> 	unsigned long long start_at, mem_size;
>
> 	if (!strcmp(p, "exactmap")) {
> #ifdef CONFIG_CRASH_DUMP
> 		/* If we are doing a crash dump, we
> 		 * still need to know the real mem
> 		 * size before original memory map is
> 		 * reset.
> 		 */
> 		saved_max_pfn = e820_end_of_ram();
> #endif
> 		end_pfn_map = 0;
> 		e820.nr_map = 0;
> 		userdef = 1;
> 		return 0;
> 	}

2006-11-03 17:18:52

Subject: Re: [RFC] [PATCH 2.6.19-rc4] kdump panics early in boot when reserving MP Tables located in high memory

On Fri, Nov 03, 2006 at 05:51:03PM +0100, Andi Kleen wrote:
>
> [Finally dropping that annoying fastboot list from cc. Please never include any closed
> mailing lists in l-k posts. Thanks]
>
> > That won't work because in arch/x86_64/kernel/e820.c, the exactmap
> > parsing clobbers end_pfn_map.
>
> That's a bug imho. It shouldn't do that.
>
> end_pfn_map should be always the highest address in e820 so that we
> can access all firmware tables safely.
>

Hi Andi,

end_pfn_map still contains the highest address in e820. The only difference
is that it is reset and then recalculated from the new memory map passed
in through the memmap= options.

With memmap=exactmap we are overriding the BIOS-provided memory map with
a user-defined memory map, so we reset end_pfn_map to zero and it is
calculated again from the new memory map passed in through the memmap=
options.

So to access all the firmware tables safely, one has to make sure that
right memmap= options have been passed to the kernel.
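
A simplified model of that recalculation may help. It is a sketch under
stated assumptions, not the kernel's e820 code: struct region and both
function names are invented for illustration, and 4k pages are assumed.
The point it shows is that the map-wide limit only covers a firmware table
if a memmap= entry for that table was actually passed in.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12 /* 4 KiB pages */

enum region_type { REGION_RAM, REGION_RESERVED, REGION_ACPI };

/* One user-supplied memmap= entry (illustrative layout). */
struct region {
    unsigned long long start; /* physical start address in bytes */
    unsigned long long size;  /* length in bytes */
    enum region_type type;
};

/* Stand-in for end_pfn: first page frame past the highest usable RAM. */
unsigned long long top_of_ram_pfn(const struct region *map, size_t n)
{
    unsigned long long top = 0;
    size_t i;

    assert(map != NULL || n == 0);
    for (i = 0; i < n; i++) {
        unsigned long long end = (map[i].start + map[i].size) >> PAGE_SHIFT;

        if (map[i].type == REGION_RAM && end > top)
            top = end;
    }
    return top;
}

/* Stand-in for end_pfn_map: first page frame past the highest entry of any type. */
unsigned long long top_of_map_pfn(const struct region *map, size_t n)
{
    unsigned long long top = 0; /* exactmap starts over from zero */
    size_t i;

    assert(map != NULL || n == 0);
    for (i = 0; i < n; i++) {
        unsigned long long end = (map[i].start + map[i].size) >> PAGE_SHIFT;

        if (end > top)
            top = end;
    }
    return top;
}
```

With only a usable region from 16MB to 80MB, both limits come out as page
frame 20480. Adding a reserved entry for a 240k table at 896MB leaves the
RAM limit unchanged but raises the map-wide limit to 229436.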

That's why, IMHO, the right way to fix this problem is not to add
special condition checks in the kernel but to pass the right memmap=
options. To do that, kexec-tools has to know where the firmware tables
are, and that's why the location of the MP tables should be exported to
user space.

Thanks
Vivek

2006-11-03 17:40:41

Subject: Re: [RFC] [PATCH 2.6.19-rc4] kdump panics early in boot when reserving MP Tables located in high memory

On Fri, Nov 03, 2006 at 05:51:03PM +0100, Andi Kleen wrote:
>
> [Finally dropping that annoying fastboot list from cc. Please never include any closed
> mailing lists in l-k posts. Thanks]
>

Sorry, did not notice your message in the last mail and copied my last
response to fastboot mailing list.

When did fastboot become a closed mailing list? AFAIK, it's an open list
and anybody can post to it.

Are you getting notifications that your mail has been waiting for
moderator's approval to be posted?

Thanks
Vivek

2006-11-03 18:43:49

Subject: Re: [RFC] [PATCH 2.6.19-rc4] kdump panics early in boot when reserving MP Tables located in high memory

On Friday 03 November 2006 18:40, Vivek Goyal wrote:

> When did fastboot become a closed mailing list? AFAIK, it's an open list
> and anybody can post to it.

It has been closed for a long time. At least I remember often getting these bounces.

>
> Are you getting notifications that your mail has been waiting for
> moderator's approval to be posted?

Yes.

-Andi

2006-11-03 18:52:08

Subject: Re: [RFC] [PATCH 2.6.19-rc4] kdump panics early in boot when reserving MP Tables located in high memory

On Fri, Nov 03, 2006 at 07:43:48PM +0100, Andi Kleen wrote:
> On Friday 03 November 2006 18:40, Vivek Goyal wrote:
>
> > When did fastboot become a closed mailing list? AFAIK, it's an open list
> > and anybody can post to it.
>
> It has been closed for a long time. At least I remember often getting these bounces.
>

You are right. I just sent a mail to the administrator of the list, and
he told me that he recently made fastboot a closed list to avoid spam. He
is now re-opening the list to everybody, as we want to archive the
kexec/kdump related discussions on the fastboot list. Finding a past
discussion on LKML is tough.

So from now on you should not be receiving those annoying messages.

Thanks
Vivek

2006-11-03 19:49:14

Subject: Re: [RFC] [PATCH 2.6.19-rc4] kdump panics early in boot when reserving MP Tables located in high memory

On Fri, 2006-11-03 at 12:17 -0500, Vivek Goyal wrote:
> On Fri, Nov 03, 2006 at 05:51:03PM +0100, Andi Kleen wrote:
> >
> > [Finally dropping that annoying fastboot list from cc. Please never include any closed
> > mailing lists in l-k posts. Thanks]
> >
> > > That won't work because in arch/x86_64/kernel/e820.c, the exactmap
> > > parsing clobbers end_pfn_map.
> >
> > That's a bug imho. It shouldn't do that.
> >
> > end_pfn_map should be always the highest address in e820 so that we
> > can access all firmware tables safely.
> >
>
> Hi Andi,
>
> end_pfn_map still contains the highest address in e820. The only difference
> is that it is reset and then recalculated from the new memory map passed
> in through the memmap= options.

Andi, Vivek is right. We can use end_pfn_map. My observation is wrong.

> With memmap=exactmap we are overriding the BIOS-provided memory map with
> a user-defined memory map, so we reset end_pfn_map to zero and it is
> calculated again from the new memory map passed in through the memmap=
> options.
>
> So to access all the firmware tables safely, one has to make sure that
> right memmap= options have been passed to the kernel.
>
> That's why, IMHO, the right way to fix this problem is not to add
> special condition checks in the kernel but to pass the right memmap=
> options. To do that, kexec-tools has to know where the firmware tables
> are, and that's why the location of the MP tables should be exported to
> user space.

Vivek, the problem condition is in generic reserve_bootmem_core
(mm/bootmem.c), where this
BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
checks the target address against the top of that node's memory.
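
As a stand-alone illustration of that condition (a user-space sketch, not
the bootmem code; the function name reserve_would_bug is made up and 4k
pages are assumed):

```c
#define PAGE_SHIFT 12                    /* 4 KiB pages */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)  /* address -> page frame, rounding down */

/*
 * Returns nonzero when the check quoted above would fire, that is, when
 * the address to reserve is at or above the top of the node's low memory.
 */
int reserve_would_bug(unsigned long long addr, unsigned long long node_low_pfn)
{
    return PFN_DOWN(addr) >= node_low_pfn;
}
```

With the node's memory ending at 80MB (page frame 20480), an address at
79MB passes the check, while 80MB itself and the MP table at 896MB both
trip it.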

When I said:
> The ACPI tables and MP Tables reside higher in memory. When reserving
> memory with reserve_bootmem_generic, the function has a BUG panic if the
> memory location to reserve is above the top of memory. The MP table is
> above the top of memory in a user defined memory map.
I wasn't accurate. I should have said that the top of memory as seen in
that function is the top of the memory for the node of usable memory.

Since the user defined map as passed in by the kexec tools is accurate, we
do need the conditional check to avoid this problem. I'm more than
happy to work on a second patch to export the MP table location to user
space for the kexec tools (the ES7000 will be a special case even for
that feature since the MP tables already reside in a reserved area).

thanks,
Amul

2006-11-03 19:52:09

Subject: Re: [RFC] [PATCH 2.6.19-rc4] kdump panics early in boot when reserving MP Tables located in high memory

On Friday 03 November 2006 20:47, Amul Shah wrote:

> Andi, Vivek is right. We can use end_pfn_map. My observation is wrong.

Ok. Then my patch should work?

> Vivek, the problem condition is in generic reserve_bootmem_core
> (mm/bootmem.c), where this
> BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
> checks the target address against the top of that node's memory.

In general these early BUGs should be eliminated - they are always
messy because the kernel exception handlers are not fully functional
yet. printks or worst case panics are better.

-Andi

2006-11-03 21:18:57

Subject: Re: [RFC] [PATCH 2.6.19-rc4] kdump panics early in boot when reserving MP Tables located in high memory

On Fri, 2006-11-03 at 20:52 +0100, Andi Kleen wrote:
> On Friday 03 November 2006 20:47, Amul Shah wrote:
>
> > Andi, Vivek is right. We can use end_pfn_map. My observation is wrong.
>
> Ok. Then my patch should work?

The patch does work on a 2.6.16 derived kernel (SLES 10 kernel). The
2.6.19-rc4 kernel is doing some funny things when I use it as a kdump
kernel (regardless of the patch).

> > Vivek, the problem condition is in generic reserve_bootmem_core
> > (mm/bootmem.c), where this
> > BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
> > checks the target address against the top of that node's memory.
>
> In general these early BUGs should be eliminated - they are always
> messy because the kernel exception handlers are not fully functional
> yet. printks or worst case panics are better.
>
> -Andi

I assume that we are not going to change mm/bootmem.c since your patch
works. Am I right?

Amul

2006-11-03 22:01:34