2014-11-12 06:40:01

by Hatayama, Daisuke

Subject: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

Currently, the VMCOREINFO note reports only the virtual address of the
symbol phys_base. This is of little use: translating that virtual
address to a physical one, in order to read the value of phys_base,
requires the very value of phys_base we are trying to obtain.

Userland tools related to kdump, such as makedumpfile and the crash
utility, have so far made some effort to calculate phys_base from
memory mapping information in a variety of crash dump formats. But
there is no guarantee that this can be maintained in the future.

This is also useful for crash dump mechanisms that run outside the
Linux kernel, such as those provided by virtual machine hypervisors
(e.g. qemu dump, which ordinary users invoke via virsh dump) or those
implemented in vendor-specific firmware. They cannot get phys_base
without a special mechanism because phys_base is kernel-internal
information.

Since VMCOREINFO consists of plain strings, it is easy to extract it
from a vmcore with the strings and grep commands:

$ strings vmcore-3.10.0-121.el7.x86_64 | grep -E ".*VMCOREINFO.*" -A 100
VMCOREINFO
OSRELEASE=3.10.0-121.el7.x86_64
PAGESIZE=4096
...

Similarly, this is useful for getting the value of phys_base of the
kdump 2nd kernel contained in a vmcore taken by the above-mentioned
external crash dump mechanisms; the kdump 2nd kernel is inherently a
relocated kernel.

This commit doesn't remove the VMCOREINFO_SYMBOL(phys_base) line
because makedumpfile refers to it; if it were removed, old versions of
makedumpfile would no longer work.

Signed-off-by: HATAYAMA Daisuke <[email protected]>
---
arch/x86/kernel/machine_kexec_64.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 4859810..e6d00a4 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -334,6 +334,7 @@ void arch_crash_save_vmcoreinfo(void)
 #endif
 	vmcoreinfo_append_str("KERNELOFFSET=%lx\n",
 			      (unsigned long)&_text - __START_KERNEL);
+	VMCOREINFO_LENGTH(phys_base, phys_base);
 }
 
 /* arch-dependent functionality related to kexec file-based syscall */
--
1.9.3


2014-11-12 08:14:38

by Petr Tesařík

Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

On Wed, 12 Nov 2014 15:40:42 +0900 (JST)
HATAYAMA Daisuke <[email protected]> wrote:

> Currently, VMCOREINFO note information reports the virtual address of
> phys_base that is assigned to symbol phys_base. But this doesn't make
> sense because to refer to value of the phys_base, it's necessary to
> get the value of phys_base itself we are now about to refer to.
>
> Userland tools related to kdump such as makedumpfile and crash utility
> so far have made some efforts to calculate phys_base from memory
> mapping information on a variety of crash dump formats. But there's no
> guarantee to keep maintaining it in the future.
>
> This is also useful for crash dump mechanism running outside Linux
> kernel such as virtual machine hypervisor such as qemu dump, which
> ordinary users use via virsh dump, or ones implemented on vendor
> specific firmware. They cannot get phys_base without special mechanism
> because phys_base is kernel information.
>
> To get VMCOREINFO in vmcore, it's easy to use strings and grep
> commands like this; VMCOREINFO consists of simple string:
>
> $ strings vmcore-3.10.0-121.el7.x86_64 | grep -E ".*VMCOREINFO.*" -A 100
> VMCOREINFO
> OSRELEASE=3.10.0-121.el7.x86_64
> PAGESIZE=4096
> ...
>
> Similarly, this is also useful to get value of phys_base in kdump 2nd
> kernel contained in vmcore using the above-mentioned external crash
> dump mechanism; kdump 2nd kernel is an inherently relocated kernel.
>
> This commit doesn't remove VMCOREINFO_SYMBOL(phys_base) line because
> makedumpfile refers to it and if removing it, old versions
> makedumpfile doesn't work well.
>
> Signed-off-by: HATAYAMA Daisuke <[email protected]>
> ---
> arch/x86/kernel/machine_kexec_64.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 4859810..e6d00a4 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -334,6 +334,7 @@ void arch_crash_save_vmcoreinfo(void)
> #endif
> vmcoreinfo_append_str("KERNELOFFSET=%lx\n",
> (unsigned long)&_text - __START_KERNEL);
> + VMCOREINFO_LENGTH(phys_base, phys_base);

While I fully agree with the concept, I don't like the use of
VMCOREINFO_LENGTH. LENGTH(symbol) has been used to store array length
in VMCOREINFO.

OTOH there is currently no good syntax for storing a value (short of
VMCOREINFO_NUMBER, but that one is signed). I think it would be best to
extend the VMCOREINFO syntax for storing variables, preferably in
hexadecimal, e.g.:

#define VMCOREINFO_VALUE(name, value) \
	vmcoreinfo_append_str("VALUE(%s)=%lx\n", #name, (unsigned long) value)

This interface is somewhat suboptimal, because it can only store a
single long value. So, maybe we should dump the complete variable in
hex, or something similar...
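
To illustrate the output format such a macro would produce, here is a
minimal user-space sketch. It is not kernel code: the names
SKETCH_VMCOREINFO_VALUE and sketch_emit_phys_base are hypothetical, and
snprintf() merely stands in for vmcoreinfo_append_str().

```c
#include <stdio.h>
#include <stddef.h>

/*
 * User-space stand-in for the proposed VMCOREINFO_VALUE(). It formats
 * into a caller-supplied buffer instead of the kernel's vmcoreinfo
 * area (an assumption made purely for illustration).
 */
#define SKETCH_VMCOREINFO_VALUE(buf, len, name, value) \
	snprintf((buf), (len), "VALUE(%s)=%lx\n", #name, (unsigned long)(value))

/* Format a phys_base value; returns what snprintf() returns. */
int sketch_emit_phys_base(char *buf, size_t len, unsigned long phys_base)
{
	return SKETCH_VMCOREINFO_VALUE(buf, len, phys_base, phys_base);
}
```

With a phys_base of 0x1000000, the buffer would hold the single line
"VALUE(phys_base)=1000000".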

Does anyone have a better idea?

Petr T

2014-11-12 22:12:34

by Vivek Goyal

Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

On Wed, Nov 12, 2014 at 03:40:42PM +0900, HATAYAMA Daisuke wrote:
> Currently, VMCOREINFO note information reports the virtual address of
> phys_base that is assigned to symbol phys_base. But this doesn't make
> sense because to refer to value of the phys_base, it's necessary to
> get the value of phys_base itself we are now about to refer to.
>

Hi Hatayama,

/proc/vmcore ELF headers have virtual address information and using
that you should be able to read actual value of phys_base. gdb deals
with virtual addresses all the time and can read value of any symbol
using those headers.

So I am not sure what's the need for exporting actual value of
phys_base.

Thanks
Vivek

2014-11-13 00:16:25

by Hatayama, Daisuke

Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

From: Vivek Goyal <[email protected]>
Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
Date: Wed, 12 Nov 2014 17:12:05 -0500

> On Wed, Nov 12, 2014 at 03:40:42PM +0900, HATAYAMA Daisuke wrote:
>> Currently, VMCOREINFO note information reports the virtual address of
>> phys_base that is assigned to symbol phys_base. But this doesn't make
>> sense because to refer to value of the phys_base, it's necessary to
>> get the value of phys_base itself we are now about to refer to.
>>
>
> Hi Hatayama,
>
> /proc/vmcore ELF headers have virtual address information and using
> that you should be able to read actual value of phys_base. gdb deals
> with virtual addresses all the time and can read value of any symbol
> using those headers.
>
> So I am not sure what's the need for exporting actual value of
> phys_base.
>

Sorry, my logic in the patch description was wrong. You are correct
that /proc/vmcore has enough information for makedumpfile to get
phys_base. The problem here is that other crash dump mechanisms, which
run independently outside the Linux kernel, don't have the information
needed to get phys_base.

Thanks.
HATAYAMA, Daisuke

2014-11-13 00:32:01

by Hatayama, Daisuke

Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

From: Petr Tesarik <[email protected]>
Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
Date: Wed, 12 Nov 2014 09:14:34 +0100

> On Wed, 12 Nov 2014 15:40:42 +0900 (JST)
> HATAYAMA Daisuke <[email protected]> wrote:
>
>> Currently, VMCOREINFO note information reports the virtual address of
>> phys_base that is assigned to symbol phys_base. But this doesn't make
>> sense because to refer to value of the phys_base, it's necessary to
>> get the value of phys_base itself we are now about to refer to.
>>
>> Userland tools related to kdump such as makedumpfile and crash utility
>> so far have made some efforts to calculate phys_base from memory
>> mapping information on a variety of crash dump formats. But there's no
>> guarantee to keep maintaining it in the future.
>>
>> This is also useful for crash dump mechanism running outside Linux
>> kernel such as virtual machine hypervisor such as qemu dump, which
>> ordinary users use via virsh dump, or ones implemented on vendor
>> specific firmware. They cannot get phys_base without special mechanism
>> because phys_base is kernel information.
>>
>> To get VMCOREINFO in vmcore, it's easy to use strings and grep
>> commands like this; VMCOREINFO consists of simple string:
>>
>> $ strings vmcore-3.10.0-121.el7.x86_64 | grep -E ".*VMCOREINFO.*" -A 100
>> VMCOREINFO
>> OSRELEASE=3.10.0-121.el7.x86_64
>> PAGESIZE=4096
>> ...
>>
>> Similarly, this is also useful to get value of phys_base in kdump 2nd
>> kernel contained in vmcore using the above-mentioned external crash
>> dump mechanism; kdump 2nd kernel is an inherently relocated kernel.
>>
>> This commit doesn't remove VMCOREINFO_SYMBOL(phys_base) line because
>> makedumpfile refers to it and if removing it, old versions
>> makedumpfile doesn't work well.
>>
>> Signed-off-by: HATAYAMA Daisuke <[email protected]>
>> ---
>> arch/x86/kernel/machine_kexec_64.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>> index 4859810..e6d00a4 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -334,6 +334,7 @@ void arch_crash_save_vmcoreinfo(void)
>> #endif
>> vmcoreinfo_append_str("KERNELOFFSET=%lx\n",
>> (unsigned long)&_text - __START_KERNEL);
>> + VMCOREINFO_LENGTH(phys_base, phys_base);
>
> While I fully agree with the concept, I don't like the use of
> VMCOREINFO_LENGTH. LENGTH(symbol) has been used to store array length
> in VMCOREINFO.
>
> OTOH there is currently no good syntax for storing a value (short of
> VMCOREINFO_NUMBER, but that one is signed). I think it would be best to
> extend the VMCOREINFO syntax for storing variables, preferably in
> hexadecimal, e.g.:
>
> #define VMCOREINFO_VALUE(name, value) \
> vmcoreinfo_append_str("VALUE(%s)=%lx\n", #name, (unsigned long) value)
>
> This interface is somewhat suboptimal, because it can only store a
> single long value. So, maybe we should dump the complete variable in
> hex, or something similar...
>
> Anyone has a better idea?
>
> Petr T

Just as you say, it's natural to write the value of phys_base in
hexadecimal format.

As for VMCOREINFO_VALUE, looking at the current helper macros, it seems
more natural to me to give it a more specific name:

#define VMCOREINFO_PHYS_BASE(value) \
	vmcoreinfo_append_str("PHYS_BASE=%lx\n", (unsigned long) value)
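
On the consumer side, a tool could pick such an entry out of the
VMCOREINFO text with a few lines of C. The following is only a sketch
under assumptions: the "PHYS_BASE=" key is the one proposed above, not
an existing VMCOREINFO entry, and sketch_parse_phys_base is a
hypothetical helper name.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find a line starting with "PHYS_BASE=" in a NUL-terminated VMCOREINFO
 * text blob and parse its hexadecimal value. Returns 0 and stores the
 * value in *out on success, -1 if the key is missing or malformed.
 */
int sketch_parse_phys_base(const char *info, unsigned long *out)
{
	static const char key[] = "PHYS_BASE=";
	const size_t keylen = sizeof(key) - 1;
	const char *p = info;
	const char *val;
	char *end;
	unsigned long v;

	while (p && *p) {
		if (strncmp(p, key, keylen) == 0) {
			val = p + keylen;
			/* Reject an empty value so strtoul() cannot skip
			 * whitespace into the following line. */
			if (!isxdigit((unsigned char)*val))
				return -1;
			errno = 0;
			v = strtoul(val, &end, 16);
			if (end == val || errno != 0)
				return -1;
			*out = v;
			return 0;
		}
		p = strchr(p, '\n');
		if (p)
			p++;	/* step past the newline to the next line */
	}
	return -1;
}
```

The key is matched only at the start of a line, so an entry whose name
merely ends in PHYS_BASE would not be picked up by mistake.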

--
Thanks.
HATAYAMA, Daisuke

2014-11-13 08:07:03

by Petr Tesařík

Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

On Thu, 13 Nov 2014 09:17:09 +0900 (JST)
HATAYAMA Daisuke <[email protected]> wrote:

> From: Vivek Goyal <[email protected]>
> Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
> Date: Wed, 12 Nov 2014 17:12:05 -0500
>
> > On Wed, Nov 12, 2014 at 03:40:42PM +0900, HATAYAMA Daisuke wrote:
> >> Currently, VMCOREINFO note information reports the virtual address of
> >> phys_base that is assigned to symbol phys_base. But this doesn't make
> >> sense because to refer to value of the phys_base, it's necessary to
> >> get the value of phys_base itself we are now about to refer to.
> >>
> >
> > Hi Hatayama,
> >
> > /proc/vmcore ELF headers have virtual address information and using
> > that you should be able to read actual value of phys_base. gdb deals
> > with virtual addresses all the time and can read value of any symbol
> > using those headers.
> >
> > So I am not sure what's the need for exporting actual value of
> > phys_base.
> >
>
> Sorry, my logic in the patch description was wrong. For /proc/vmcore,
> there's enough information for makedumpfile to get phys_base. It's
> correct. The problem here is that other crash dump mechanisms that run
> outside Linux kernel independently don't have information to get
> phys_base.

Yes, but these mechanisms won't be able to read VMCOREINFO either, will
they?

Petr T

2014-11-13 08:30:43

by Hatayama, Daisuke

Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO



(2014/11/13 17:06), Petr Tesarik wrote:
> On Thu, 13 Nov 2014 09:17:09 +0900 (JST)
> HATAYAMA Daisuke <[email protected]> wrote:
>
>> From: Vivek Goyal <[email protected]>
>> Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
>> Date: Wed, 12 Nov 2014 17:12:05 -0500
>>
>>> On Wed, Nov 12, 2014 at 03:40:42PM +0900, HATAYAMA Daisuke wrote:
>>>> Currently, VMCOREINFO note information reports the virtual address of
>>>> phys_base that is assigned to symbol phys_base. But this doesn't make
>>>> sense because to refer to value of the phys_base, it's necessary to
>>>> get the value of phys_base itself we are now about to refer to.
>>>>
>>>
>>> Hi Hatayama,
>>>
>>> /proc/vmcore ELF headers have virtual address information and using
>>> that you should be able to read actual value of phys_base. gdb deals
>>> with virtual addresses all the time and can read value of any symbol
>>> using those headers.
>>>
>>> So I am not sure what's the need for exporting actual value of
>>> phys_base.
>>>
>>
>> Sorry, my logic in the patch description was wrong. For /proc/vmcore,
>> there's enough information for makedumpfile to get phys_base. It's
>> correct. The problem here is that other crash dump mechanisms that run
>> outside Linux kernel independently don't have information to get
>> phys_base.
>
> Yes, but these mechanisms won't be able to read VMCOREINFO either, will
> they?
>

I don't intend anything that sophisticated with VMCOREINFO alone.
All I have in mind here is searching the vmcore for VMCOREINFO with
strings + grep before opening it with crash.

--
Thanks.
HATAYAMA, Daisuke

2014-11-13 14:26:21

by Vivek Goyal

Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

On Thu, Nov 13, 2014 at 05:30:21PM +0900, HATAYAMA, Daisuke wrote:
>
>
> (2014/11/13 17:06), Petr Tesarik wrote:
> >On Thu, 13 Nov 2014 09:17:09 +0900 (JST)
> >HATAYAMA Daisuke <[email protected]> wrote:
> >
> >>From: Vivek Goyal <[email protected]>
> >>Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
> >>Date: Wed, 12 Nov 2014 17:12:05 -0500
> >>
> >>>On Wed, Nov 12, 2014 at 03:40:42PM +0900, HATAYAMA Daisuke wrote:
> >>>>Currently, VMCOREINFO note information reports the virtual address of
> >>>>phys_base that is assigned to symbol phys_base. But this doesn't make
> >>>>sense because to refer to value of the phys_base, it's necessary to
> >>>>get the value of phys_base itself we are now about to refer to.
> >>>>
> >>>
> >>>Hi Hatayama,
> >>>
> >>>/proc/vmcore ELF headers have virtual address information and using
> >>>that you should be able to read actual value of phys_base. gdb deals
> >>>with virtual addresses all the time and can read value of any symbol
> >>>using those headers.
> >>>
> >>>So I am not sure what's the need for exporting actual value of
> >>>phys_base.
> >>>
> >>
> >>Sorry, my logic in the patch description was wrong. For /proc/vmcore,
> >>there's enough information for makedumpfile to get phys_base. It's
> >>correct. The problem here is that other crash dump mechanisms that run
> >>outside Linux kernel independently don't have information to get
> >>phys_base.
> >
> >Yes, but these mechanisms won't be able to read VMCOREINFO either, will
> >they?
> >
>
> I don't intend such sophisticated function only by VMCOREINFO.
> Search vmcore for VMCOREINFO using strings + grep before opening it by crash.
> I intend that only here.

I think this is very crude and not a proper way to get to vmcoreinfo.
Can you give more context? What are those mechanisms, and what are you
trying to do?

Thanks
Vivek

2014-11-13 14:48:16

by Petr Tesařík

Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

On Thu, 13 Nov 2014 09:25:48 -0500
Vivek Goyal <[email protected]> wrote:

> On Thu, Nov 13, 2014 at 05:30:21PM +0900, HATAYAMA, Daisuke wrote:
> >
> > (2014/11/13 17:06), Petr Tesarik wrote:
> > >On Thu, 13 Nov 2014 09:17:09 +0900 (JST)
> > >HATAYAMA Daisuke <[email protected]> wrote:
> > >
> > >>From: Vivek Goyal <[email protected]>
> > >>Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
> > >>Date: Wed, 12 Nov 2014 17:12:05 -0500
> > >>
> > >>>On Wed, Nov 12, 2014 at 03:40:42PM +0900, HATAYAMA Daisuke wrote:
> > >>>>Currently, VMCOREINFO note information reports the virtual address of
> > >>>>phys_base that is assigned to symbol phys_base. But this doesn't make
> > >>>>sense because to refer to value of the phys_base, it's necessary to
> > >>>>get the value of phys_base itself we are now about to refer to.
> > >>>>
> > >>>
> > >>>Hi Hatayama,
> > >>>
> > >>>/proc/vmcore ELF headers have virtual address information and using
> > >>>that you should be able to read actual value of phys_base. gdb deals
> > >>>with virtual addresses all the time and can read value of any symbol
> > >>>using those headers.
> > >>>
> > >>>So I am not sure what's the need for exporting actual value of
> > >>>phys_base.
> > >>>
> > >>
> > >>Sorry, my logic in the patch description was wrong. For /proc/vmcore,
> > >>there's enough information for makedumpfile to get phys_base. It's
> > >>correct. The problem here is that other crash dump mechanisms that run
> > >>outside Linux kernel independently don't have information to get
> > >>phys_base.
> > >
> > >Yes, but these mechanisms won't be able to read VMCOREINFO either, will
> > >they?
> > >
> >
> > I don't intend such sophisticated function only by VMCOREINFO.
> > Search vmcore for VMCOREINFO using strings + grep before opening it by crash.
> > I intend that only here.
>
> I think this is very crude and not proper way to get to vmcoreinfo.

Same here. If VMCOREINFO must be locatable without communicating any
information to the hypervisor, then I would rather go for something
similar to what s390(x) folks do - a well-known location in physical
memory that contains a pointer to a checksummed OS info structure,
which in turn contains the VMCOREINFO pointers.
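
To make the idea concrete, here is a rough sketch of how a dump tool
might follow such an anchor inside a flat image of physical memory.
Everything specific in it is an assumption made for illustration: the
anchor address, the magic value, the record layout and the byte-sum
checksum are all hypothetical and do not reproduce the s390 os_info
format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ANCHOR_PADDR 0x500u		/* placeholder location */
#define SKETCH_MAGIC 0x4f53494e464f5631ULL	/* "OSINFOV1", made up */

/* Hypothetical OS info record found via the anchor pointer. */
struct sketch_os_info {
	uint64_t magic;
	uint64_t vmcoreinfo_paddr;
	uint64_t vmcoreinfo_size;
	uint32_t csum;		/* byte sum of all fields before csum */
	uint32_t reserved;
};

static uint32_t sketch_csum(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t sum = 0;

	while (len--)
		sum += *p++;
	return sum;
}

/*
 * Read the pointer stored at the anchor, bounds-check it, and validate
 * the record it points to. Returns 0 on success, -1 if anything looks
 * wrong.
 */
int sketch_find_vmcoreinfo(const unsigned char *mem, size_t mem_len,
			   uint64_t *paddr, uint64_t *size)
{
	struct sketch_os_info info;
	uint64_t info_paddr;

	if (mem_len < SKETCH_ANCHOR_PADDR + sizeof(info_paddr))
		return -1;
	memcpy(&info_paddr, mem + SKETCH_ANCHOR_PADDR, sizeof(info_paddr));

	if (info_paddr > mem_len || mem_len - info_paddr < sizeof(info))
		return -1;
	memcpy(&info, mem + info_paddr, sizeof(info));

	if (info.magic != SKETCH_MAGIC)
		return -1;
	if (info.csum != sketch_csum(&info, offsetof(struct sketch_os_info, csum)))
		return -1;

	*paddr = info.vmcoreinfo_paddr;
	*size = info.vmcoreinfo_size;
	return 0;
}
```

The magic and checksum let the tool reject a stale or overwritten
record instead of trusting whatever happens to be at the anchor.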

I'm a bit surprised such mechanism is not needed by Fujitsu SADUMP.
Or is that part of the current plan, Daisuke?

Petr T

2014-11-14 01:28:52

by Hatayama, Daisuke

Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

From: Vivek Goyal <[email protected]>
Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
Date: Thu, 13 Nov 2014 09:25:48 -0500

> On Thu, Nov 13, 2014 at 05:30:21PM +0900, HATAYAMA, Daisuke wrote:
>>
>>
>> (2014/11/13 17:06), Petr Tesarik wrote:
>> >On Thu, 13 Nov 2014 09:17:09 +0900 (JST)
>> >HATAYAMA Daisuke <[email protected]> wrote:
>> >
>> >>From: Vivek Goyal <[email protected]>
>> >>Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
>> >>Date: Wed, 12 Nov 2014 17:12:05 -0500
>> >>
>> >>>On Wed, Nov 12, 2014 at 03:40:42PM +0900, HATAYAMA Daisuke wrote:
>> >>>>Currently, VMCOREINFO note information reports the virtual address of
>> >>>>phys_base that is assigned to symbol phys_base. But this doesn't make
>> >>>>sense because to refer to value of the phys_base, it's necessary to
>> >>>>get the value of phys_base itself we are now about to refer to.
>> >>>>
>> >>>
>> >>>Hi Hatayama,
>> >>>
>> >>>/proc/vmcore ELF headers have virtual address information and using
>> >>>that you should be able to read actual value of phys_base. gdb deals
>> >>>with virtual addresses all the time and can read value of any symbol
>> >>>using those headers.
>> >>>
>> >>>So I am not sure what's the need for exporting actual value of
>> >>>phys_base.
>> >>>
>> >>
>> >>Sorry, my logic in the patch description was wrong. For /proc/vmcore,
>> >>there's enough information for makedumpfile to get phys_base. It's
>> >>correct. The problem here is that other crash dump mechanisms that run
>> >>outside Linux kernel independently don't have information to get
>> >>phys_base.
>> >
>> >Yes, but these mechanisms won't be able to read VMCOREINFO either, will
>> >they?
>> >
>>
>> I don't intend such sophisticated function only by VMCOREINFO.
>> Search vmcore for VMCOREINFO using strings + grep before opening it by crash.
>> I intend that only here.
>
> I think this is very crude and not proper way to get to vmcoreinfo. Can

I agree it's crude, but it's useful enough for my usecase.

> you give more context. What are those mechanisms and what are you trying
> to do.
>

This repeats what I wrote in the patch description, but I mean qemu
dump, xendump (and other hypervisor dumps), and the firmware dump
mechanisms implemented on each vendor's systems.

But no such information is provided today, so the crash utility has
made efforts to calculate phys_base from whatever information exists.
One example is to check how far the expected location of the
linux_banner string is from where the string actually sits in the
vmcore; another is to walk the page tables via the CR3 register if it
is included in the note information of the crash dump mechanism's
environment.

There's no guarantee that these always work well. For example, because
these crash dump mechanisms run independently of the Linux kernel, the
dump could be taken while a CPU is executing some firmware code. Then
CR3 could point at a different page table.
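
The arithmetic behind the linux_banner heuristic can be sketched as
follows. On x86_64 a kernel-text virtual address maps to
paddr = vaddr - __START_KERNEL_map + phys_base, so once the physical
offset of a known symbol's contents has been found in the dump,
phys_base follows by rearranging. The function name is hypothetical,
and the symbol's virtual address is assumed to come from vmlinux or
System.map.

```c
#include <stdint.h>

/* Base of the x86_64 kernel text mapping (__START_KERNEL_map). */
#define SKETCH_START_KERNEL_MAP 0xffffffff80000000ULL

/*
 * Rearranged from paddr = vaddr - __START_KERNEL_map + phys_base:
 *     phys_base = found_paddr - (sym_vaddr - __START_KERNEL_map)
 * The subtraction is done modulo 2^64, mirroring the kernel's
 * unsigned long phys_base.
 */
uint64_t sketch_phys_base_from_symbol(uint64_t sym_vaddr, uint64_t found_paddr)
{
	return found_paddr - (sym_vaddr - SKETCH_START_KERNEL_MAP);
}
```

For example, a symbol linked at 0xffffffff81600000 whose contents are
found at physical address 0x2d600000 gives a phys_base of 0x2c000000.
The weak point is the search itself: it depends on finding the right
copy of the string in the dump.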

VMCOREINFO already contains phys_base information, but only the
symbol's address, which is of no use on its own. So it should be
corrected; I think this is natural.

Also, I think it is important to keep externally running crash dump
mechanisms as independent as possible. If they depend on the Linux
kernel, their robustness could suffer. In the case of phys_base, if we
added a mechanism for handing phys_base to them, the crash dump
mechanisms would not work well until they had received phys_base. For
robustness, I think it best for the Linux kernel to put the necessary
information somewhere in memory and to let users pick it up
independently.

--
Thanks.
HATAYAMA, Daisuke

2014-11-14 01:39:45

by Hatayama, Daisuke

Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

From: Petr Tesarik <[email protected]>
Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
Date: Thu, 13 Nov 2014 15:48:10 +0100

> On Thu, 13 Nov 2014 09:25:48 -0500
> Vivek Goyal <[email protected]> wrote:
>
>> On Thu, Nov 13, 2014 at 05:30:21PM +0900, HATAYAMA, Daisuke wrote:
>> >
>> > (2014/11/13 17:06), Petr Tesarik wrote:
>> > >On Thu, 13 Nov 2014 09:17:09 +0900 (JST)
>> > >HATAYAMA Daisuke <[email protected]> wrote:
>> > >
>> > >>From: Vivek Goyal <[email protected]>
>> > >>Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
>> > >>Date: Wed, 12 Nov 2014 17:12:05 -0500
>> > >>
>> > >>>On Wed, Nov 12, 2014 at 03:40:42PM +0900, HATAYAMA Daisuke wrote:
>> > >>>>Currently, VMCOREINFO note information reports the virtual address of
>> > >>>>phys_base that is assigned to symbol phys_base. But this doesn't make
>> > >>>>sense because to refer to value of the phys_base, it's necessary to
>> > >>>>get the value of phys_base itself we are now about to refer to.
>> > >>>>
>> > >>>
>> > >>>Hi Hatayama,
>> > >>>
>> > >>>/proc/vmcore ELF headers have virtual address information and using
>> > >>>that you should be able to read actual value of phys_base. gdb deals
>> > >>>with virtual addresses all the time and can read value of any symbol
>> > >>>using those headers.
>> > >>>
>> > >>>So I am not sure what's the need for exporting actual value of
>> > >>>phys_base.
>> > >>>
>> > >>
>> > >>Sorry, my logic in the patch description was wrong. For /proc/vmcore,
>> > >>there's enough information for makedumpfile to get phys_base. It's
>> > >>correct. The problem here is that other crash dump mechanisms that run
>> > >>outside Linux kernel independently don't have information to get
>> > >>phys_base.
>> > >
>> > >Yes, but these mechanisms won't be able to read VMCOREINFO either, will
>> > >they?
>> > >
>> >
>> > I don't intend such sophisticated function only by VMCOREINFO.
>> > Search vmcore for VMCOREINFO using strings + grep before opening it by crash.
>> > I intend that only here.
>>
>> I think this is very crude and not proper way to get to vmcoreinfo.
>
> Same here. If VMCOREINFO must be locatable without communicating any
> information to the hypervisor, then I would rather go for something
> similar to what s390(x) folks do - a well-known location in physical
> memory that contains a pointer to a checksummed OS info structure,
> which in turn contains the VMCOREINFO pointers.
>
> I'm a bit surprised such mechanism is not needed by Fujitsu SADUMP.
> Or is that part of the current plan, Daisuke?
>

It would be useful if it existed, but I have no plan for it now. For
now, the idea of this patch is enough for me.

BTW, I suspect that if there is only a single fixed location in
physical memory, the above idea cannot deal with the kdump 2nd kernel
case. I think the scheme should be able to represent information for
multiple kernels.

--
Thanks.
HATAYAMA, Daisuke

2014-11-14 08:31:51

by Petr Tesařík

Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

On Fri, 14 Nov 2014 10:42:35 +0900 (JST)
HATAYAMA Daisuke <[email protected]> wrote:

> From: Petr Tesarik <[email protected]>
> Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
> Date: Thu, 13 Nov 2014 15:48:10 +0100
>
> > On Thu, 13 Nov 2014 09:25:48 -0500
> > Vivek Goyal <[email protected]> wrote:
> >
> >> On Thu, Nov 13, 2014 at 05:30:21PM +0900, HATAYAMA, Daisuke wrote:
> >> >
> >> > (2014/11/13 17:06), Petr Tesarik wrote:
> >> > >On Thu, 13 Nov 2014 09:17:09 +0900 (JST)
> >> > >HATAYAMA Daisuke <[email protected]> wrote:
> >> > >
> >> > >>From: Vivek Goyal <[email protected]>
> >> > >>Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
> >> > >>Date: Wed, 12 Nov 2014 17:12:05 -0500
> >> > >>
> >> > >>>On Wed, Nov 12, 2014 at 03:40:42PM +0900, HATAYAMA Daisuke wrote:
> >> > >>>>Currently, VMCOREINFO note information reports the virtual address of
> >> > >>>>phys_base that is assigned to symbol phys_base. But this doesn't make
> >> > >>>>sense because to refer to value of the phys_base, it's necessary to
> >> > >>>>get the value of phys_base itself we are now about to refer to.
> >> > >>>>
> >> > >>>
> >> > >>>Hi Hatayama,
> >> > >>>
> >> > >>>/proc/vmcore ELF headers have virtual address information and using
> >> > >>>that you should be able to read actual value of phys_base. gdb deals
> >> > >>>with virtual addresses all the time and can read value of any symbol
> >> > >>>using those headers.
> >> > >>>
> >> > >>>So I am not sure what's the need for exporting actual value of
> >> > >>>phys_base.
> >> > >>>
> >> > >>
> >> > >>Sorry, my logic in the patch description was wrong. For /proc/vmcore,
> >> > >>there's enough information for makedumpfile to get phys_base. It's
> >> > >>correct. The problem here is that other crash dump mechanisms that run
> >> > >>outside Linux kernel independently don't have information to get
> >> > >>phys_base.
> >> > >
> >> > >Yes, but these mechanisms won't be able to read VMCOREINFO either, will
> >> > >they?
> >> > >
> >> >
> >> > I don't intend such sophisticated function only by VMCOREINFO.
> >> > Search vmcore for VMCOREINFO using strings + grep before opening it by crash.
> >> > I intend that only here.
> >>
> >> I think this is very crude and not proper way to get to vmcoreinfo.
> >
> > Same here. If VMCOREINFO must be locatable without communicating any
> > information to the hypervisor, then I would rather go for something
> > similar to what s390(x) folks do - a well-known location in physical
> > memory that contains a pointer to a checksummed OS info structure,
> > which in turn contains the VMCOREINFO pointers.
> >
> > I'm a bit surprised such mechanism is not needed by Fujitsu SADUMP.
> > Or is that part of the current plan, Daisuke?
> >
>
> It's useful if there is. I don't plan now. For now, the idea of this
> patch is enough for me.
>
> BTW, for the above idea, I suspect that if the location in the
> physical memory is unique, it cannot deal with the kdump 2nd kernel
> case.

No, not at all. The low 640K are copied away to a pre-allocated area by
kexec purgatory code on x86_64, so it's safe to overwrite any location
in there. The copy is needed, because BIOS already uses some hardcoded
addresses in that range. I think the Linux kernel may safely use part of
PFN 0 starting at physical address 0x0500. This area was originally
used by MS-DOS, so chances are high that no broken BIOS out there
corrupts this part of RAM...

Anyway, I'm not going to implement it right now for lack of time. I'm
adding it to my TODO list, but if anybody wants to post a patch, I
won't be offended.

Petr T

2014-11-14 09:51:45

by Hatayama, Daisuke

Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

From: Petr Tesarik <[email protected]>
Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
Date: Fri, 14 Nov 2014 09:31:45 +0100

> On Fri, 14 Nov 2014 10:42:35 +0900 (JST)
> HATAYAMA Daisuke <[email protected]> wrote:
>
>> From: Petr Tesarik <[email protected]>
>> Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
>> Date: Thu, 13 Nov 2014 15:48:10 +0100
>>
>> > On Thu, 13 Nov 2014 09:25:48 -0500
>> > Vivek Goyal <[email protected]> wrote:
>> >
>> >> On Thu, Nov 13, 2014 at 05:30:21PM +0900, HATAYAMA, Daisuke wrote:
>> >> >
>> >> > (2014/11/13 17:06), Petr Tesarik wrote:
>> >> > >On Thu, 13 Nov 2014 09:17:09 +0900 (JST)
>> >> > >HATAYAMA Daisuke <[email protected]> wrote:
>> >> > >
>> >> > >>From: Vivek Goyal <[email protected]>
>> >> > >>Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
>> >> > >>Date: Wed, 12 Nov 2014 17:12:05 -0500
>> >> > >>
>> >> > >>>On Wed, Nov 12, 2014 at 03:40:42PM +0900, HATAYAMA Daisuke wrote:
>> >> > >>>>Currently, VMCOREINFO note information reports the virtual address of
>> >> > >>>>phys_base that is assigned to symbol phys_base. But this doesn't make
>> >> > >>>>sense because to refer to value of the phys_base, it's necessary to
>> >> > >>>>get the value of phys_base itself we are now about to refer to.
>> >> > >>>>
>> >> > >>>
>> >> > >>>Hi Hatayama,
>> >> > >>>
>> >> > >>>/proc/vmcore ELF headers have virtual address information and using
>> >> > >>>that you should be able to read actual value of phys_base. gdb deals
>> >> > >>>with virtual addresses all the time and can read value of any symbol
>> >> > >>>using those headers.
>> >> > >>>
>> >> > >>>So I am not sure what's the need for exporting actual value of
>> >> > >>>phys_base.
>> >> > >>>
>> >> > >>
>> >> > >>Sorry, my logic in the patch description was wrong. For /proc/vmcore,
>> >> > >>there's enough information for makedumpfile to get phys_base. It's
>> >> > >>correct. The problem here is that other crash dump mechanisms that run
>> >> > >>outside Linux kernel independently don't have information to get
>> >> > >>phys_base.
>> >> > >
>> >> > >Yes, but these mechanisms won't be able to read VMCOREINFO either, will
>> >> > >they?
>> >> > >
>> >> >
>> >> > I don't intend such sophisticated function only by VMCOREINFO.
>> >> > Search vmcore for VMCOREINFO using strings + grep before opening it by crash.
>> >> > I intend that only here.
>> >>
>> >> I think this is very crude and not proper way to get to vmcoreinfo.
>> >
>> > Same here. If VMCOREINFO must be locatable without communicating any
>> > information to the hypervisor, then I would rather go for something
>> > similar to what s390(x) folks do - a well-known location in physical
>> > memory that contains a pointer to a checksummed OS info structure,
>> > which in turn contains the VMCOREINFO pointers.
>> >
>> > I'm a bit surprised such mechanism is not needed by Fujitsu SADUMP.
>> > Or is that part of the current plan, Daisuke?
>> >
>>
>> It's useful if there is. I don't plan now. For now, the idea of this
>> patch is enough for me.
>>
>> BTW, for the above idea, I suspect that if the location in the
>> physical memory is unique, it cannot deal with the kdump 2nd kernel
>> case.
>
> No, not at all. The low 640K are copied away to a pre-allocated area by
> kexec purgatory code on x86_64, so it's safe to overwrite any location
> in there. The copy is needed, because BIOS already uses some hardcoded
> addresses in that range. I think the Linux kernel may safely use part of
> PFN 0 starting at physical address 0x0500. This area was originally
> used by MS-DOS, so chances are high that no broken BIOS out there
> corrupts this part of RAM...
>

In fact, I hadn't considered it that deeply... I had completely
forgotten about the backup region. But the low 640K area is still hard
to use: it is hard to get the phys_base of the kdump 1st kernel, which
is assumed to be saved in the low 640K. Because an externally running
mechanism can run after the kdump 2nd kernel has booted up, the crash
utility needs to redirect a read request for the low 640K area to the
corresponding part of the pre-allocated area. See
kdump_backup_region_init() in the crash utility, which tries to find
the pre-allocated area via the ELF header; the symbol
kexec_crash_image is read to find that ELF header. This means we need
phys_base to find the pre-allocated area.
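
The redirection of low-640K reads described above can be sketched
roughly as follows. This is only an illustration, not the crash
utility's code: struct backup_region, translate_paddr() and every field
name are hypothetical, and the sketch assumes the location of the
pre-allocated area is already known, which is exactly the part that
needs phys_base.

```c
#include <stdint.h>

/* Hypothetical description of the kdump backup region. */
struct backup_region {
	uint64_t src_start;    /* start of the range that was copied away */
	uint64_t src_size;     /* size of that range (640 KiB in this thread) */
	uint64_t backup_start; /* physical address of the pre-allocated copy */
};

/*
 * Map a physical address of the 1st kernel to the place where its
 * contents live after the 2nd kernel has booted. Addresses outside
 * the backed-up range are returned unchanged.
 */
static uint64_t translate_paddr(const struct backup_region *br,
				uint64_t paddr)
{
	if (paddr >= br->src_start && paddr - br->src_start < br->src_size)
		return br->backup_start + (paddr - br->src_start);
	return paddr;
}
```

For example, with src_start = 0, src_size = 640 KiB and a made-up
backup_start of 0x2f000000, a read of physical address 0x500 is
redirected to 0x2f000500, while 0x100000 is left unchanged.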

> Anyway, I'm not going to implement it right now for lack of time. I'm
> adding it to my TODO list, but if anybody wants to post a patch, I
> won't be offended.
>
> Petr T

--
Thanks.
HATAYAMA, Daisuke

2014-11-14 12:36:18

by Petr Tesařík

[permalink] [raw]
Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

On Fri, 14 Nov 2014 18:54:23 +0900 (JST)
HATAYAMA Daisuke <[email protected]> wrote:

> From: Petr Tesarik <[email protected]>
> Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
> Date: Fri, 14 Nov 2014 09:31:45 +0100
>
> > On Fri, 14 Nov 2014 10:42:35 +0900 (JST)
> > HATAYAMA Daisuke <[email protected]> wrote:
> >
> >> From: Petr Tesarik <[email protected]>
> >> Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
> >> Date: Thu, 13 Nov 2014 15:48:10 +0100
> >>
> >> > On Thu, 13 Nov 2014 09:25:48 -0500
> >> > Vivek Goyal <[email protected]> wrote:
> >> >
> >> >> On Thu, Nov 13, 2014 at 05:30:21PM +0900, HATAYAMA, Daisuke wrote:
> >> >> >
> >> >> > (2014/11/13 17:06), Petr Tesarik wrote:
> >> >> > >On Thu, 13 Nov 2014 09:17:09 +0900 (JST)
> >> >> > >HATAYAMA Daisuke <[email protected]> wrote:
> >> >> > >
> >> >> > >>From: Vivek Goyal <[email protected]>
> >> >> > >>Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
> >> >> > >>Date: Wed, 12 Nov 2014 17:12:05 -0500
> >> >> > >>
> >> >> > >>>On Wed, Nov 12, 2014 at 03:40:42PM +0900, HATAYAMA Daisuke wrote:
> >> >> > >>>>Currently, VMCOREINFO note information reports the virtual address of
> >> >> > >>>>phys_base that is assigned to symbol phys_base. But this doesn't make
> >> >> > >>>>sense because to refer to value of the phys_base, it's necessary to
> >> >> > >>>>get the value of phys_base itself we are now about to refer to.
> >> >> > >>>>
> >> >> > >>>
> >> >> > >>>Hi Hatayama,
> >> >> > >>>
> >> >> > >>>/proc/vmcore ELF headers have virtual address information and using
> >> >> > >>>that you should be able to read actual value of phys_base. gdb deals
> >> >> > >>>with virtual addresses all the time and can read value of any symbol
> >> >> > >>>using those headers.
> >> >> > >>>
> >> >> > >>>So I am not sure what's the need for exporting actual value of
> >> >> > >>>phys_base.
> >> >> > >>>
> >> >> > >>
> >> >> > >>Sorry, my logic in the patch description was wrong. For /proc/vmcore,
> >> >> > >>there's enough information for makedumpfile to get phys_base. It's
> >> >> > >>correct. The problem here is that other crash dump mechanisms that run
> >> >> > >>outside Linux kernel independently don't have information to get
> >> >> > >>phys_base.
> >> >> > >
> >> >> > >Yes, but these mechanisms won't be able to read VMCOREINFO either, will
> >> >> > >they?
> >> >> > >
> >> >> >
> >> >> > I don't intend such sophisticated function only by VMCOREINFO.
> >> >> > Search vmcore for VMCOREINFO using strings + grep before opening it by crash.
> >> >> > I intend that only here.
> >> >>
> >> >> I think this is very crude and not proper way to get to vmcoreinfo.
> >> >
> >> > Same here. If VMCOREINFO must be locatable without communicating any
> >> > information to the hypervisor, then I would rather go for something
> >> > similar to what s390(x) folks do - a well-known location in physical
> >> > memory that contains a pointer to a checksummed OS info structure,
> >> > which in turn contains the VMCOREINFO pointers.
> >> >
> >> > I'm a bit surprised such mechanism is not needed by Fujitsu SADUMP.
> >> > Or is that part of the current plan, Daisuke?
> >> >
> >>
> >> It's useful if there is. I don't plan now. For now, the idea of this
> >> patch is enough for me.
> >>
> >> BTW, for the above idea, I suspect that if the location in the
> >> physical memory is unique, it cannot deal with the kdump 2nd kernel
> >> case.
> >
> > No, not at all. The low 640K are copied away to a pre-allocated area by
> > kexec purgatory code on x86_64, so it's safe to overwrite any location
> > in there. The copy is needed, because BIOS already uses some hardcoded
> > addresses in that range. I think the Linux kernel may safely use part of
> > PFN 0 starting at physical address 0x0500. This area was originally
> > used by MS-DOS, so chances are high that no broken BIOS out there
> > corrupts this part of RAM...
> >
>
> In fact, I didn't consider in such deep way... I had forgot back up
> region at all. But it's hard to use the low 640K area. Then, it's hard
> to get phys_base of the kdump 1st kernel that is assumed to be saved
> in the low 640K now. Because externally running mechanism can run
> after kdump 2nd kernel has booted up, crash utility needs to convert a
> read request to the low 640K area into the corresponding part of the
> pre-allocated area. See kdump_backup_region_init() in crash utility,
> which tries to find the pre-allocated area via ELF header, where
> symbol kexec_crash_image is read to find ELF header. This means we
> need phys_base to find the pre-allocated area.

Wrong again, I'm afraid.

So, first of all, an admin should decide whether to use kexec-based
dumping or stand-alone dumping. OK, you seem to be addressing a corner
case where s/he configures both. But in that case, the stand-alone
dump can be used to look at _BOTH_ kernels, and the default should
indeed be the one that was running at the time of the dump. After all,
I have already debugged the _SECONDARY_ kernel environment several
times...

However, it even works. If somebody wants to see the crashed kernel
from the same dump, they can use the second kernel's internal
structures to locate the corresponding phys_base and pass that as an
option to crash.

Let me illustrate the situation:

+-------------------+
| secondary kernel | <--- low 640K
| private pointers -+--\
| | | (1)
| | |
+-------------------+<-+-----\
| | | |
| primary kernel | | |
Z Z | |
| | | |
+-------------------+<-/ | (3)
| secondary kernel | |
| (contains pointer | |
| to backup area) -+--\ |
+-------------------+ | (2) |
| backup area |<-/ |
| -+--------/
+-------------------+
| |
| 1st kernel again |
Z Z
+-------------------+

The information is nicely chained in this diagram:

(1) Low 640K allows you to find the currently running kernel
(here it is the kdump kernel).
(2) This kernel knows where to find the backup area (otherwise it
couldn't correctly map it in /proc/vmcore).
(3) The backup area allows you to find the previously running
kernel (the 1st kernel).
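
A minimal sketch of following this chain over a flat image of physical
memory is shown below. Everything in it is an assumption made for
illustration: struct os_info, WELL_KNOWN_SLOT, the field names and the
helper functions do not exist in the kernel or in crash, and the slot
address 0x500 is only the candidate location floated earlier in this
thread.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical well-known slot in the low 640K (assumption). */
#define WELL_KNOWN_SLOT 0x500u

/* Hypothetical per-kernel info block; not an existing structure. */
struct os_info {
	uint64_t phys_base;    /* phys_base of the kernel that wrote it */
	uint64_t backup_start; /* copy of the old low 640K, 0 if none */
};

/* Read a 64-bit value from the memory image at a physical address. */
static uint64_t read_u64(const uint8_t *mem, uint64_t paddr)
{
	uint64_t v;

	memcpy(&v, mem + paddr, sizeof(v));
	return v;
}

/*
 * Follow steps (1)-(3): fill *running with the info of the currently
 * running kernel and *first with that of the previously running one.
 * Returns 0 on success, -1 if the running kernel has no backup area.
 */
static int find_first_kernel(const uint8_t *mem, struct os_info *running,
			     struct os_info *first)
{
	uint64_t ptr;

	/* (1) The low 640K leads to the currently running kernel. */
	ptr = read_u64(mem, WELL_KNOWN_SLOT);
	memcpy(running, mem + ptr, sizeof(*running));

	/* (2) That kernel knows where the backup area is. */
	if (running->backup_start == 0)
		return -1;

	/* (3) The same slot inside the backup leads to the 1st kernel. */
	ptr = read_u64(mem, running->backup_start + WELL_KNOWN_SLOT);
	memcpy(first, mem + ptr, sizeof(*first));
	return 0;
}
```

The phys_base found in *first is the value that would then be passed
to crash as an option, as described above.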

I really don't see any issues with the concept, although I haven't
tried it in practice (yet).

Petr T

2014-11-17 05:19:10

by Hatayama, Daisuke

[permalink] [raw]
Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

From: Petr Tesarik <[email protected]>
Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
Date: Fri, 14 Nov 2014 13:36:10 +0100

> On Fri, 14 Nov 2014 18:54:23 +0900 (JST)
> HATAYAMA Daisuke <[email protected]> wrote:
>
>> From: Petr Tesarik <[email protected]>
>> Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
>> Date: Fri, 14 Nov 2014 09:31:45 +0100
>>
>> > On Fri, 14 Nov 2014 10:42:35 +0900 (JST)
>> > HATAYAMA Daisuke <[email protected]> wrote:
>> >
>> >> From: Petr Tesarik <[email protected]>
>> >> Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
>> >> Date: Thu, 13 Nov 2014 15:48:10 +0100
>> >>
>> >> > On Thu, 13 Nov 2014 09:25:48 -0500
>> >> > Vivek Goyal <[email protected]> wrote:
>> >> >
>> >> >> On Thu, Nov 13, 2014 at 05:30:21PM +0900, HATAYAMA, Daisuke wrote:
>> >> >> >
>> >> >> > (2014/11/13 17:06), Petr Tesarik wrote:
>> >> >> > >On Thu, 13 Nov 2014 09:17:09 +0900 (JST)
>> >> >> > >HATAYAMA Daisuke <[email protected]> wrote:
>> >> >> > >
>> >> >> > >>From: Vivek Goyal <[email protected]>
>> >> >> > >>Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
>> >> >> > >>Date: Wed, 12 Nov 2014 17:12:05 -0500
>> >> >> > >>
>> >> >> > >>>On Wed, Nov 12, 2014 at 03:40:42PM +0900, HATAYAMA Daisuke wrote:
>> >> >> > >>>>Currently, VMCOREINFO note information reports the virtual address of
>> >> >> > >>>>phys_base that is assigned to symbol phys_base. But this doesn't make
>> >> >> > >>>>sense because to refer to value of the phys_base, it's necessary to
>> >> >> > >>>>get the value of phys_base itself we are now about to refer to.
>> >> >> > >>>>
>> >> >> > >>>
>> >> >> > >>>Hi Hatayama,
>> >> >> > >>>
>> >> >> > >>>/proc/vmcore ELF headers have virtual address information and using
>> >> >> > >>>that you should be able to read actual value of phys_base. gdb deals
>> >> >> > >>>with virtual addresses all the time and can read value of any symbol
>> >> >> > >>>using those headers.
>> >> >> > >>>
>> >> >> > >>>So I am not sure what's the need for exporting actual value of
>> >> >> > >>>phys_base.
>> >> >> > >>>
>> >> >> > >>
>> >> >> > >>Sorry, my logic in the patch description was wrong. For /proc/vmcore,
>> >> >> > >>there's enough information for makedumpfile to get phys_base. It's
>> >> >> > >>correct. The problem here is that other crash dump mechanisms that run
>> >> >> > >>outside Linux kernel independently don't have information to get
>> >> >> > >>phys_base.
>> >> >> > >
>> >> >> > >Yes, but these mechanisms won't be able to read VMCOREINFO either, will
>> >> >> > >they?
>> >> >> > >
>> >> >> >
>> >> >> > I don't intend such sophisticated function only by VMCOREINFO.
>> >> >> > Search vmcore for VMCOREINFO using strings + grep before opening it by crash.
>> >> >> > I intend that only here.
>> >> >>
>> >> >> I think this is very crude and not proper way to get to vmcoreinfo.
>> >> >
>> >> > Same here. If VMCOREINFO must be locatable without communicating any
>> >> > information to the hypervisor, then I would rather go for something
>> >> > similar to what s390(x) folks do - a well-known location in physical
>> >> > memory that contains a pointer to a checksummed OS info structure,
>> >> > which in turn contains the VMCOREINFO pointers.
>> >> >
>> >> > I'm a bit surprised such mechanism is not needed by Fujitsu SADUMP.
>> >> > Or is that part of the current plan, Daisuke?
>> >> >
>> >>
>> >> It's useful if there is. I don't plan now. For now, the idea of this
>> >> patch is enough for me.
>> >>
>> >> BTW, for the above idea, I suspect that if the location in the
>> >> physical memory is unique, it cannot deal with the kdump 2nd kernel
>> >> case.
>> >
>> > No, not at all. The low 640K are copied away to a pre-allocated area by
>> > kexec purgatory code on x86_64, so it's safe to overwrite any location
>> > in there. The copy is needed, because BIOS already uses some hardcoded
>> > addresses in that range. I think the Linux kernel may safely use part of
>> > PFN 0 starting at physical address 0x0500. This area was originally
>> > used by MS-DOS, so chances are high that no broken BIOS out there
>> > corrupts this part of RAM...
>> >
>>
>> In fact, I didn't consider in such deep way... I had forgot back up
>> region at all. But it's hard to use the low 640K area. Then, it's hard
>> to get phys_base of the kdump 1st kernel that is assumed to be saved
>> in the low 640K now. Because externally running mechanism can run
>> after kdump 2nd kernel has booted up, crash utility needs to convert a
>> read request to the low 640K area into the corresponding part of the
>> pre-allocated area. See kdump_backup_region_init() in crash utility,
>> which tries to find the pre-allocated area via ELF header, where
>> symbol kexec_crash_image is read to find ELF header. This means we
>> need phys_base to find the pre-allocated area.
>
> Wrong again, I'm afraid.
>
> So, first of all, an admin should make up your mind if you want to use
> kexec-based dumping, or stand-alone dumping. OK, you seem to address
> a corner case when s/he configures both. But in that case, the

It's never a corner case. We usually use both. The two differ in data
reliability, in that kdump can do cleanup at the kernel logic level at
the end of the kdump 1st kernel before the kdump 2nd kernel starts,
and in dumping features, in that kdump has makedumpfile, which can
filter memory to reduce the size of the crash dump. OTOH, an external
dump may still work even when kdump doesn't, but it could generate
less reliable data and has fewer features. So it's best to use both.

> stand-alone dump can be used to look at _BOTH_ kernels, and the default
> should indeed be the one that was currently running. After all, I have
> already debugged the _SECONDARY_ kernel environment several times...
>
> However, it even works. If somebody wants to see the crashed kernel
> from the same dump, they can use the second kernel's internal
> structures to locate the corresponding phys_base and pass that as an
> option to crash.
>
> Let me illustrate the situation:
>
> +-------------------+
> | secondary kernel | <--- low 640K
> | private pointers -+--\
> | | | (1)
> | | |
> +-------------------+<-+-----\
> | | | |
> | primary kernel | | |
> Z Z | |
> | | | |
> +-------------------+<-/ | (3)
> | secondary kernel | |
> | (contains pointer | |
> | to backup area) -+--\ |
> +-------------------+ | (2) |
> | backup area |<-/ |
> | -+--------/
> +-------------------+
> | |
> | 1st kernel again |
> Z Z
> +-------------------+
>
> The information is nicely chained in this diagram:
>
> (1) Low 640K allows you to find the currently running kernel
> (here it is the kdump kernel).
> (2) This kernel knows where to find the backup area (otherwise it
> couldn't correctly map them in /proc/vmcore).
> (3) The backup area allows you to find the previously running
> kernel (the 1st kernel).
>
> I really don't see any issues with the concept, although I haven't
> tried it in practice (yet).
>
> Petr T

I assume you don't intend to implement this logic in external crash
dump mechanisms such as qemu; it is too specific to the Linux kernel.

I still think the idea of my patch is simple and practical enough.

--
Thanks.
HATAYAMA, Daisuke

2014-11-17 20:39:06

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO

On Fri, Nov 14, 2014 at 10:31:33AM +0900, HATAYAMA Daisuke wrote:
> From: Vivek Goyal <[email protected]>
> Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
> Date: Thu, 13 Nov 2014 09:25:48 -0500
>
> > On Thu, Nov 13, 2014 at 05:30:21PM +0900, HATAYAMA, Daisuke wrote:
> >>
> >>
> >> (2014/11/13 17:06), Petr Tesarik wrote:
> >> >On Thu, 13 Nov 2014 09:17:09 +0900 (JST)
> >> >HATAYAMA Daisuke <[email protected]> wrote:
> >> >
> >> >>From: Vivek Goyal <[email protected]>
> >> >>Subject: Re: [PATCH] kdump, x86: report actual value of phys_base in VMCOREINFO
> >> >>Date: Wed, 12 Nov 2014 17:12:05 -0500
> >> >>
> >> >>>On Wed, Nov 12, 2014 at 03:40:42PM +0900, HATAYAMA Daisuke wrote:
> >> >>>>Currently, VMCOREINFO note information reports the virtual address of
> >> >>>>phys_base that is assigned to symbol phys_base. But this doesn't make
> >> >>>>sense because to refer to value of the phys_base, it's necessary to
> >> >>>>get the value of phys_base itself we are now about to refer to.
> >> >>>>
> >> >>>
> >> >>>Hi Hatayama,
> >> >>>
> >> >>>/proc/vmcore ELF headers have virtual address information and using
> >> >>>that you should be able to read actual value of phys_base. gdb deals
> >> >>>with virtual addresses all the time and can read value of any symbol
> >> >>>using those headers.
> >> >>>
> >> >>>So I am not sure what's the need for exporting actual value of
> >> >>>phys_base.
> >> >>>
> >> >>
> >> >>Sorry, my logic in the patch description was wrong. For /proc/vmcore,
> >> >>there's enough information for makedumpfile to get phys_base. It's
> >> >>correct. The problem here is that other crash dump mechanisms that run
> >> >>outside Linux kernel independently don't have information to get
> >> >>phys_base.
> >> >
> >> >Yes, but these mechanisms won't be able to read VMCOREINFO either, will
> >> >they?
> >> >
> >>
> >> I don't intend such sophisticated function only by VMCOREINFO.
> >> Search vmcore for VMCOREINFO using strings + grep before opening it by crash.
> >> I intend that only here.
> >
> > I think this is very crude and not proper way to get to vmcoreinfo. Can
>
> I agree it's crude, but it's useful enough for my usecase.
>
> > you give more context. What are those mechanisms and what are you trying
> > to do.
> >
>
> I after all write the same thing in the patch description... I mean
> qemu dump, xendump (and other hypervisor dumps), firmware dumps
> implemented on each vendor system for the crash dump mechanism.

vmcoreinfo is exported by the kdump mechanism (/proc/vmcore). These
other dump mechanisms need to figure out a way to export the relevant
information, and it is not right to try to put more info in vmcoreinfo.

Don't try to write kernel data structures in such a way that
somebody can scan for them later. In an external dump mechanism there
is no notion of a vmcoreinfo ELF note. So these mechanisms need to
come up with their own way to query some basic information about
the kernel and export it appropriately.

Also, this notion of relying on two mechanisms is unnecessarily
introducing extra complexity. I think you should give the user
a choice so that they can configure one or the other. If you think
that firmware dump mechanisms are more reliable, just use those.
In fact, when a crash happens, the OS should call into some
firmware hook to trigger the dump. And along with that hook one
should be able to pass the relevant info.

Thanks
Vivek