2006-11-02 13:21:11

by Magnus Damm

[permalink] [raw]
Subject: [PATCH] x86_64: setup saved_max_pfn correctly (kdump)

x86_64: setup saved_max_pfn correctly

2.6.19-rc4 has broken CONFIG_CRASH_DUMP support on x86_64. It is impossible
to read out the kernel contents from /proc/vmcore because saved_max_pfn is set
to zero instead of the max_pfn value before the user map is setup.

This happens because saved_max_pfn is initialized at parse_early_param() time,
and at this time no active regions have been registered. save_max_pfn is setup
from e820_end_of_ram(), more exact find_max_pfn_with_active_regions() which
returns 0 because no regions exist.

This patch fixes this by registering before and removing after the call
to e820_end_of_ram().

Signed-off-by: Magnus Damm <[email protected]>
---

Applies to 2.6.19-rc4.

arch/x86_64/kernel/e820.c | 2 ++
1 file changed, 2 insertions(+)

--- 0002/arch/x86_64/kernel/e820.c
+++ work/arch/x86_64/kernel/e820.c 2006-11-02 21:37:19.000000000 +0900
@@ -594,7 +594,9 @@ static int __init parse_memmap_opt(char
* size before original memory map is
* reset.
*/
+ e820_register_active_regions(0, 0, -1UL);
saved_max_pfn = e820_end_of_ram();
+ remove_all_active_ranges();
#endif
end_pfn_map = 0;
e820.nr_map = 0;


2006-11-02 14:29:30

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH] x86_64: setup saved_max_pfn correctly (kdump)

On Thu, Nov 02, 2006 at 10:19:34PM +0900, Magnus Damm wrote:
> x86_64: setup saved_max_pfn correctly
>
> 2.6.19-rc4 has broken CONFIG_CRASH_DUMP support on x86_64. It is impossible
> to read out the kernel contents from /proc/vmcore because saved_max_pfn is set
> to zero instead of the max_pfn value before the user map is setup.
>
> This happens because saved_max_pfn is initialized at parse_early_param() time,
> and at this time no active regions have been registered. save_max_pfn is setup
> from e820_end_of_ram(), more exact find_max_pfn_with_active_regions() which
> returns 0 because no regions exist.
>
> This patch fixes this by registering before and removing after the call
> to e820_end_of_ram().
>
> Signed-off-by: Magnus Damm <[email protected]>
> ---
>
> Applies to 2.6.19-rc4.
>
> arch/x86_64/kernel/e820.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> --- 0002/arch/x86_64/kernel/e820.c
> +++ work/arch/x86_64/kernel/e820.c 2006-11-02 21:37:19.000000000 +0900
> @@ -594,7 +594,9 @@ static int __init parse_memmap_opt(char
> * size before original memory map is
> * reset.
> */
> + e820_register_active_regions(0, 0, -1UL);
> saved_max_pfn = e820_end_of_ram();
> + remove_all_active_ranges();
> #endif
> end_pfn_map = 0;
> e820.nr_map = 0;

This looks fine to me for the time being.

Down the line I am thinking that how about passing saved_max_pfn as
command line parameter. I think that way we don't have to pass all the
memmap= options to second kernel and kexec-tools can pass the memory map
through parameter segment. This memory map can be modified to represent
only the memory which can be used by second kernel and not the whole of
the memory.

I think this will simplify the logic and also save us precious comand
line in second kernel for kdump purposes.

Thanks
Vivek



2006-11-02 16:08:51

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH] x86_64: setup saved_max_pfn correctly (kdump)

On Thu, 2 Nov 2006, Magnus Damm wrote:

> x86_64: setup saved_max_pfn correctly
>
> 2.6.19-rc4 has broken CONFIG_CRASH_DUMP support on x86_64. It is impossible
> to read out the kernel contents from /proc/vmcore because saved_max_pfn is set
> to zero instead of the max_pfn value before the user map is setup.
>
> This happens because saved_max_pfn is initialized at parse_early_param() time,
> and at this time no active regions have been registered. save_max_pfn is setup
> from e820_end_of_ram(), more exact find_max_pfn_with_active_regions() which
> returns 0 because no regions exist.
>
> This patch fixes this by registering before and removing after the call
> to e820_end_of_ram().
>

Hey Magnus,

I see what you are doing and why. However if you look in
arch/x86_64/kernel/setup.c, you'll see

parse_early_param();

finish_e820_parsing();

e820_register_active_regions(0, 0, -1UL);

If you just called e820_register_active_regions(0, 0, -1UL) before
parse_early_param(), would it still fix the problem without having to call
e820_register_active_regions(0, 0, -1UL) twice?


> Signed-off-by: Magnus Damm <[email protected]>
> ---
>
> Applies to 2.6.19-rc4.
>
> arch/x86_64/kernel/e820.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> --- 0002/arch/x86_64/kernel/e820.c
> +++ work/arch/x86_64/kernel/e820.c 2006-11-02 21:37:19.000000000 +0900
> @@ -594,7 +594,9 @@ static int __init parse_memmap_opt(char
> * size before original memory map is
> * reset.
> */
> + e820_register_active_regions(0, 0, -1UL);
> saved_max_pfn = e820_end_of_ram();
> + remove_all_active_ranges();
> #endif
> end_pfn_map = 0;
> e820.nr_map = 0;
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2006-11-02 17:40:20

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] x86_64: setup saved_max_pfn correctly (kdump)

On Thu, Nov 02, 2006 at 10:19:34PM +0900, Magnus Damm wrote:
> x86_64: setup saved_max_pfn correctly
>
> 2.6.19-rc4 has broken CONFIG_CRASH_DUMP support on x86_64. It is impossible
> to read out the kernel contents from /proc/vmcore because saved_max_pfn is set
> to zero instead of the max_pfn value before the user map is setup.

Do you know what patch has broken it?

Or did just nobody test crash dump at all since -rc* started?

-Andi

2006-11-02 17:40:35

by Magnus Damm

[permalink] [raw]
Subject: Re: [PATCH] x86_64: setup saved_max_pfn correctly (kdump)

On 11/2/06, Vivek Goyal <[email protected]> wrote:
> On Thu, Nov 02, 2006 at 10:19:34PM +0900, Magnus Damm wrote:
> > x86_64: setup saved_max_pfn correctly
> >
> > 2.6.19-rc4 has broken CONFIG_CRASH_DUMP support on x86_64. It is impossible
> > to read out the kernel contents from /proc/vmcore because saved_max_pfn is set
> > to zero instead of the max_pfn value before the user map is setup.
> >
> > This happens because saved_max_pfn is initialized at parse_early_param() time,
> > and at this time no active regions have been registered. save_max_pfn is setup
> > from e820_end_of_ram(), more exact find_max_pfn_with_active_regions() which
> > returns 0 because no regions exist.
> >
> > This patch fixes this by registering before and removing after the call
> > to e820_end_of_ram().
> >
> > Signed-off-by: Magnus Damm <[email protected]>
> > ---
> >
> > Applies to 2.6.19-rc4.
> >
> > arch/x86_64/kernel/e820.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > --- 0002/arch/x86_64/kernel/e820.c
> > +++ work/arch/x86_64/kernel/e820.c 2006-11-02 21:37:19.000000000 +0900
> > @@ -594,7 +594,9 @@ static int __init parse_memmap_opt(char
> > * size before original memory map is
> > * reset.
> > */
> > + e820_register_active_regions(0, 0, -1UL);
> > saved_max_pfn = e820_end_of_ram();
> > + remove_all_active_ranges();
> > #endif
> > end_pfn_map = 0;
> > e820.nr_map = 0;
>
> This looks fine to me for the time being.

Great, thanks.

> Down the line I am thinking that how about passing saved_max_pfn as
> command line parameter. I think that way we don't have to pass all the
> memmap= options to second kernel and kexec-tools can pass the memory map
> through parameter segment. This memory map can be modified to represent
> only the memory which can be used by second kernel and not the whole of
> the memory.

Hm, I'm not sure how that will improve things. Isn't memmap= just used
to inform the secondary kernel which space it can use? You need to
tell it somehow regardless - I think the current implementation seems
to work pretty well. But I guess you mean that we should pass an
modified e820 map that only includes valid areas for the second
kernel. That may simplify things. But changing the interface is
painful.

Right now I feel that so many things are happening in the kdump world
that it's difficult just to keep the current code working as is.

> I think this will simplify the logic and also save us precious comand
> line in second kernel for kdump purposes.

I remember seeing some email regarding extending the amount of command
line space, that's an alternative approach if we are running out of
space.

Thanks!

/ magnus

2006-11-02 18:05:13

by Magnus Damm

[permalink] [raw]
Subject: Re: [PATCH] x86_64: setup saved_max_pfn correctly (kdump)

Hi Mel,

Thanks for your input! Great work with the add_active_range() code.

On 11/3/06, Mel Gorman <[email protected]> wrote:
> Hey Magnus,
>
> I see what you are doing and why. However if you look in
> arch/x86_64/kernel/setup.c, you'll see
>
> parse_early_param();
>
> finish_e820_parsing();
>
> e820_register_active_regions(0, 0, -1UL);
>
> If you just called e820_register_active_regions(0, 0, -1UL) before
> parse_early_param(), would it still fix the problem without having to call
> e820_register_active_regions(0, 0, -1UL) twice?

Well, I guess it is possible to move the
e820_register_active_regions() up, but I'm not sure if that would give
us anything.

We need to call e820_register_active_regions() before e820_end_of_ram,
that's for sure, but the "exactmap" code in parse_memmap_opt() sets
e820.nr_map to 0 after the call to e820_end_of_ram(). Then it adds a
new set of user-supplied ranges to the e820 map which then need to be
registered using e820_register_active_regions().

So yeah, we can move the function up above parse_early_param() but
then we need to insert another call to e820_register_active_regions()
somewhere after all user-supplied ranges have been added.

Another solution could be to rewrite e820_end_of_ram() to instead scan
e820.map[] backwards from e820.nr_map - 1 to locate the last ram page.
But can you do that in two lines of code? =)

Thanks!

/ magnus

2006-11-02 18:08:17

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH] x86_64: setup saved_max_pfn correctly (kdump)

On Fri, Nov 03, 2006 at 02:40:26AM +0900, Magnus Damm wrote:
> On 11/2/06, Vivek Goyal <[email protected]> wrote:
> >On Thu, Nov 02, 2006 at 10:19:34PM +0900, Magnus Damm wrote:
> >> x86_64: setup saved_max_pfn correctly
> >>
> >> 2.6.19-rc4 has broken CONFIG_CRASH_DUMP support on x86_64. It is
> >impossible
> >> to read out the kernel contents from /proc/vmcore because saved_max_pfn
> >is set
> >> to zero instead of the max_pfn value before the user map is setup.
> >>
> >> This happens because saved_max_pfn is initialized at parse_early_param()
> >time,
> >> and at this time no active regions have been registered. save_max_pfn is
> >setup
> >> from e820_end_of_ram(), more exact find_max_pfn_with_active_regions()
> >which
> >> returns 0 because no regions exist.
> >>
> >> This patch fixes this by registering before and removing after the call
> >> to e820_end_of_ram().
> >>
> >> Signed-off-by: Magnus Damm <[email protected]>
> >> ---
> >>
> >> Applies to 2.6.19-rc4.
> >>
> >> arch/x86_64/kernel/e820.c | 2 ++
> >> 1 file changed, 2 insertions(+)
> >>
> >> --- 0002/arch/x86_64/kernel/e820.c
> >> +++ work/arch/x86_64/kernel/e820.c 2006-11-02 21:37:19.000000000 +0900
> >> @@ -594,7 +594,9 @@ static int __init parse_memmap_opt(char
> >> * size before original memory map is
> >> * reset.
> >> */
> >> + e820_register_active_regions(0, 0, -1UL);
> >> saved_max_pfn = e820_end_of_ram();
> >> + remove_all_active_ranges();
> >> #endif
> >> end_pfn_map = 0;
> >> e820.nr_map = 0;
> >
> >This looks fine to me for the time being.
>
> Great, thanks.
>

I looked at Mel's suggestion of shifting the call to
e820_register_active_regions() above parse_early_param() and that should
be a better solution.

> >Down the line I am thinking that how about passing saved_max_pfn as
> >command line parameter. I think that way we don't have to pass all the
> >memmap= options to second kernel and kexec-tools can pass the memory map
> >through parameter segment. This memory map can be modified to represent
> >only the memory which can be used by second kernel and not the whole of
> >the memory.
>
> Hm, I'm not sure how that will improve things. Isn't memmap= just used
> to inform the secondary kernel which space it can use? You need to
> tell it somehow regardless - I think the current implementation seems
> to work pretty well. But I guess you mean that we should pass an
> modified e820 map that only includes valid areas for the second
> kernel. That may simplify things. But changing the interface is
> painful.
>

Actually kexec already passes a memory map to second kernel through
parameter segment. We can just modify that memory map instead of
passing it through memmap= command line options. Currently memory map
passed by kexec is used to calculate the saved_max_pfn and it can be
calculated using a command line parameter. So effectively I am replacing
multiple memmap= command line parameter with one.

> Right now I feel that so many things are happening in the kdump world
> that it's difficult just to keep the current code working as is.
>

I agree. Probably we should do it later.

> >I think this will simplify the logic and also save us precious comand
> >line in second kernel for kdump purposes.
>
> I remember seeing some email regarding extending the amount of command
> line space, that's an alternative approach if we are running out of
> space.

I had also seen those patches. Does not seem to be upstream. Don't know
what happened to those.

Thanks
Vivek

2006-11-02 18:20:53

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH] x86_64: setup saved_max_pfn correctly (kdump)

On Fri, Nov 03, 2006 at 03:05:08AM +0900, Magnus Damm wrote:
> Hi Mel,
>
> Thanks for your input! Great work with the add_active_range() code.
>
> On 11/3/06, Mel Gorman <[email protected]> wrote:
> >Hey Magnus,
> >
> >I see what you are doing and why. However if you look in
> >arch/x86_64/kernel/setup.c, you'll see
> >
> > parse_early_param();
> >
> > finish_e820_parsing();
> >
> > e820_register_active_regions(0, 0, -1UL);
> >
> >If you just called e820_register_active_regions(0, 0, -1UL) before
> >parse_early_param(), would it still fix the problem without having to call
> >e820_register_active_regions(0, 0, -1UL) twice?
>
> Well, I guess it is possible to move the
> e820_register_active_regions() up, but I'm not sure if that would give
> us anything.
>
> We need to call e820_register_active_regions() before e820_end_of_ram,
> that's for sure, but the "exactmap" code in parse_memmap_opt() sets
> e820.nr_map to 0 after the call to e820_end_of_ram(). Then it adds a
> new set of user-supplied ranges to the e820 map which then need to be
> registered using e820_register_active_regions().
>
> So yeah, we can move the function up above parse_early_param() but
> then we need to insert another call to e820_register_active_regions()
> somewhere after all user-supplied ranges have been added.
>
> Another solution could be to rewrite e820_end_of_ram() to instead scan
> e820.map[] backwards from e820.nr_map - 1 to locate the last ram page.
> But can you do that in two lines of code? =)
>

Is there are reason that why e820_end_of_ram() should be looking at active
regions instead of dicrectly probing e820 memory map? If no, then modifying
e820_end_of_ram() wil save us extra calls.

Thanks
Vivek

2006-11-02 18:21:10

by Magnus Damm

[permalink] [raw]
Subject: Re: [PATCH] x86_64: setup saved_max_pfn correctly (kdump)

On 2 Nov 2006 18:40:16 +0100, Andi Kleen <[email protected]> wrote:
> On Thu, Nov 02, 2006 at 10:19:34PM +0900, Magnus Damm wrote:
> > x86_64: setup saved_max_pfn correctly
> >
> > 2.6.19-rc4 has broken CONFIG_CRASH_DUMP support on x86_64. It is impossible
> > to read out the kernel contents from /proc/vmcore because saved_max_pfn is set
> > to zero instead of the max_pfn value before the user map is setup.
>
> Do you know what patch has broken it?

Not really. I can find out if you want, but it has to wait until the
middle of next week - I'm out of the office at the moment. If I have
to guess then I point in Mel's direction. =)

> Or did just nobody test crash dump at all since -rc* started?

I think the problem is what to test. The entire chain of crash dump
tools is in an "interesting" state right now because the code is
pretty easy to break and it is under constant development - the
relocatable kernel code and my and Simon's kexec port for xen are two
good examples. And that the code is spread out over two kernels that
may not be of the same version together with a more or less
unmaintained userspace tool does not help...

I hope that someone else than me booted a crash kernel, but it looks
like noone really tested copying a crash image on x86_64. I didn't
until now. =) It's not that surprising though - there are only a few
kexec/kdump developers out there and there are several architectures
plus focus on features. Bound to break IMO.

Thanks,

/ magnus

2006-11-02 18:28:55

by Magnus Damm

[permalink] [raw]
Subject: Re: [PATCH] x86_64: setup saved_max_pfn correctly (kdump)

On 11/3/06, Vivek Goyal <[email protected]> wrote:
> On Fri, Nov 03, 2006 at 03:05:08AM +0900, Magnus Damm wrote:
> > Hi Mel,
> >
> > Thanks for your input! Great work with the add_active_range() code.
> >
> > On 11/3/06, Mel Gorman <[email protected]> wrote:
> > >Hey Magnus,
> > >
> > >I see what you are doing and why. However if you look in
> > >arch/x86_64/kernel/setup.c, you'll see
> > >
> > > parse_early_param();
> > >
> > > finish_e820_parsing();
> > >
> > > e820_register_active_regions(0, 0, -1UL);
> > >
> > >If you just called e820_register_active_regions(0, 0, -1UL) before
> > >parse_early_param(), would it still fix the problem without having to call
> > >e820_register_active_regions(0, 0, -1UL) twice?
> >
> > Well, I guess it is possible to move the
> > e820_register_active_regions() up, but I'm not sure if that would give
> > us anything.
> >
> > We need to call e820_register_active_regions() before e820_end_of_ram,
> > that's for sure, but the "exactmap" code in parse_memmap_opt() sets
> > e820.nr_map to 0 after the call to e820_end_of_ram(). Then it adds a
> > new set of user-supplied ranges to the e820 map which then need to be
> > registered using e820_register_active_regions().
> >
> > So yeah, we can move the function up above parse_early_param() but
> > then we need to insert another call to e820_register_active_regions()
> > somewhere after all user-supplied ranges have been added.
> >
> > Another solution could be to rewrite e820_end_of_ram() to instead scan
> > e820.map[] backwards from e820.nr_map - 1 to locate the last ram page.
> > But can you do that in two lines of code? =)
> >
>
> Is there are reason that why e820_end_of_ram() should be looking at active
> regions instead of dicrectly probing e820 memory map? If no, then modifying
> e820_end_of_ram() wil save us extra calls.

No special reason. I was just happy that I was able to solve the
problem by using two lines of new kernel functions. =) Also, the code
is not timing critical at all so I see no reason to write up something
else when we already have a working solution.

Feel free to hack up with a better patch and I'll test and report
before end of next week.

Thanks!

/ magnus

2006-11-02 22:41:35

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH] x86_64: setup saved_max_pfn correctly (kdump)

On Fri, 3 Nov 2006, Magnus Damm wrote:

> Hi Mel,
>
> Thanks for your input! Great work with the add_active_range() code.
>

Thanks

> On 11/3/06, Mel Gorman <[email protected]> wrote:
>> Hey Magnus,
>>
>> I see what you are doing and why. However if you look in
>> arch/x86_64/kernel/setup.c, you'll see
>>
>> parse_early_param();
>>
>> finish_e820_parsing();
>>
>> e820_register_active_regions(0, 0, -1UL);
>>
>> If you just called e820_register_active_regions(0, 0, -1UL) before
>> parse_early_param(), would it still fix the problem without having to call
>> e820_register_active_regions(0, 0, -1UL) twice?
>
> Well, I guess it is possible to move the
> e820_register_active_regions() up, but I'm not sure if that would give
> us anything.
>
> We need to call e820_register_active_regions() before e820_end_of_ram,
> that's for sure, but the "exactmap" code in parse_memmap_opt() sets
> e820.nr_map to 0 after the call to e820_end_of_ram(). Then it adds a
> new set of user-supplied ranges to the e820 map which then need to be
> registered using e820_register_active_regions().
>
> So yeah, we can move the function up above parse_early_param() but
> then we need to insert another call to e820_register_active_regions()
> somewhere after all user-supplied ranges have been added.
>

Ah right, I see the problem now and why you need to do things that way in
your patch. Sorry about that.

> Another solution could be to rewrite e820_end_of_ram() to instead scan
> e820.map[] backwards from e820.nr_map - 1 to locate the last ram page.
> But can you do that in two lines of code? =)
>

Nope. As the path you are doing this in is not time-critical, the patch is
fine to me.

> Thanks!
>
> / magnus
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab