2024-03-05 01:32:38

by Dave Young

[permalink] [raw]
Subject: [PATCH] x86/kexec: do not update E820 kexec table for setup_data

crashkernel reservation failed on a Thinkpad t440s laptop recently,
Actually the memblock reservation succeeded, but later insert_resource()
failed.

Test step:
kexec load ->
kexec reboot ->
check the crashkernel memory
dmesg|grep "crashkernel reserved"; saw reserved suceeeded:
0x00000000d0000000 - 0x00000000da000000
grep Crash /proc/iomem: got nothing

The background story is like below:
Currently E820 code reserves setup_data regions for both the current kernel
and the kexec kernel, and it will also insert them into resources list.
Before the kexec kernel reboot nobody passes the old setup_data, kexec only
passes SETUP_EFI and SETUP_IMA if needed. Thus the old setup data memory
are not used at all. But due to old kernel updated the kexec e820 table
as well so kexec kernel see them as E820_TYPE_RESERVED_KERN regions, later
the old setup_data regions will be inserted into resources list in kexec
kernel by e820__reserve_resources().

Note, due to no setup_data passed in for those old regions they are not
early reserved (by function early_reserve_memory), crashkernel memblock
reservation will just regard them as usable memory and it could reserve
reserve crashkernel region overlaps with the old setup_data regions.

Just like the bug I noticed here, kdump insert_resource failed because
e820__reserve_resources added the overlapped chunks in /proc/iomem already.

Finally, looking at the code, the old setup_data regions are not used
at all as no setup_data passed in by the kexec boot loader. Although
something like SETUP_PCI etc could be needed, kexec should pass
the info as setup_data so that kexec kernel can take care of them.
This should be taken care of in other separate patches if needed.

Thus drop the useless buggy code here.

Signed-off-by: Dave Young <[email protected]>
---
arch/x86/kernel/e820.c | 16 +---------------
1 file changed, 1 insertion(+), 15 deletions(-)

Index: linux/arch/x86/kernel/e820.c
===================================================================
--- linux.orig/arch/x86/kernel/e820.c
+++ linux/arch/x86/kernel/e820.c
@@ -1015,16 +1015,6 @@ void __init e820__reserve_setup_data(voi
pa_next = data->next;

e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
-
- /*
- * SETUP_EFI and SETUP_IMA are supplied by kexec and do not need
- * to be reserved.
- */
- if (data->type != SETUP_EFI && data->type != SETUP_IMA)
- e820__range_update_kexec(pa_data,
- sizeof(*data) + data->len,
- E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
-
if (data->type == SETUP_INDIRECT) {
len += data->len;
early_memunmap(data, sizeof(*data));
@@ -1036,12 +1026,9 @@ void __init e820__reserve_setup_data(voi

indirect = (struct setup_indirect *)data->data;

- if (indirect->type != SETUP_INDIRECT) {
+ if (indirect->type != SETUP_INDIRECT)
e820__range_update(indirect->addr, indirect->len,
E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
- e820__range_update_kexec(indirect->addr, indirect->len,
- E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
- }
}

pa_data = pa_next;
@@ -1049,7 +1036,6 @@ void __init e820__reserve_setup_data(voi
}

e820__update_table(e820_table);
- e820__update_table(e820_table_kexec);

pr_info("extended physical RAM map:\n");
e820__print_table("reserve setup_data");



2024-03-19 04:13:38

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH] x86/kexec: do not update E820 kexec table for setup_data

On Tue, 5 Mar 2024 at 09:32, Dave Young <[email protected]> wrote:
>
> crashkernel reservation failed on a Thinkpad t440s laptop recently,
> Actually the memblock reservation succeeded, but later insert_resource()
> failed.
>
> Test step:
> kexec load ->
> kexec reboot ->
> check the crashkernel memory
> dmesg|grep "crashkernel reserved"; saw reserved suceeeded:
> 0x00000000d0000000 - 0x00000000da000000
> grep Crash /proc/iomem: got nothing
>
> The background story is like below:
> Currently E820 code reserves setup_data regions for both the current kernel
> and the kexec kernel, and it will also insert them into resources list.
> Before the kexec kernel reboot nobody passes the old setup_data, kexec only
> passes SETUP_EFI and SETUP_IMA if needed. Thus the old setup data memory
> are not used at all. But due to old kernel updated the kexec e820 table
> as well so kexec kernel see them as E820_TYPE_RESERVED_KERN regions, later
> the old setup_data regions will be inserted into resources list in kexec
> kernel by e820__reserve_resources().
>
> Note, due to no setup_data passed in for those old regions they are not
> early reserved (by function early_reserve_memory), crashkernel memblock
> reservation will just regard them as usable memory and it could reserve
> reserve crashkernel region overlaps with the old setup_data regions.
>
> Just like the bug I noticed here, kdump insert_resource failed because
> e820__reserve_resources added the overlapped chunks in /proc/iomem already.
>
> Finally, looking at the code, the old setup_data regions are not used
> at all as no setup_data passed in by the kexec boot loader. Although
> something like SETUP_PCI etc could be needed, kexec should pass
> the info as setup_data so that kexec kernel can take care of them.
> This should be taken care of in other separate patches if needed.
>
> Thus drop the useless buggy code here.
>
> Signed-off-by: Dave Young <[email protected]>
> ---
> arch/x86/kernel/e820.c | 16 +---------------
> 1 file changed, 1 insertion(+), 15 deletions(-)
>
> Index: linux/arch/x86/kernel/e820.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/e820.c
> +++ linux/arch/x86/kernel/e820.c
> @@ -1015,16 +1015,6 @@ void __init e820__reserve_setup_data(voi
> pa_next = data->next;
>
> e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> -
> - /*
> - * SETUP_EFI and SETUP_IMA are supplied by kexec and do not need
> - * to be reserved.
> - */
> - if (data->type != SETUP_EFI && data->type != SETUP_IMA)
> - e820__range_update_kexec(pa_data,
> - sizeof(*data) + data->len,
> - E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> -
> if (data->type == SETUP_INDIRECT) {
> len += data->len;
> early_memunmap(data, sizeof(*data));
> @@ -1036,12 +1026,9 @@ void __init e820__reserve_setup_data(voi
>
> indirect = (struct setup_indirect *)data->data;
>
> - if (indirect->type != SETUP_INDIRECT) {
> + if (indirect->type != SETUP_INDIRECT)
> e820__range_update(indirect->addr, indirect->len,
> E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> - e820__range_update_kexec(indirect->addr, indirect->len,
> - E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> - }
> }
>
> pa_data = pa_next;
> @@ -1049,7 +1036,6 @@ void __init e820__reserve_setup_data(voi
> }
>
> e820__update_table(e820_table);
> - e820__update_table(e820_table_kexec);
>
> pr_info("extended physical RAM map:\n");
> e820__print_table("reserve setup_data");
>

Kindly ping for review.

Thanks!
Dave


2024-03-20 21:56:45

by Huang, Kai

[permalink] [raw]
Subject: Re: [PATCH] x86/kexec: do not update E820 kexec table for setup_data

Hi Dave,

Some nitpicking in changelog:

On 5/03/2024 2:32 pm, Dave Young wrote:
> crashkernel reservation failed on a Thinkpad t440s laptop recently,

',' -> '.' to make it as a standalone sentence.

> Actually the memblock reservation succeeded, but later insert_resource()
> failed.
>
> Test step:
> kexec load ->
> kexec reboot ->
> check the crashkernel memory
> dmesg|grep "crashkernel reserved"; saw reserved suceeeded:

"suceeeded" -> "succeeded".

> 0x00000000d0000000 - 0x00000000da000000
> grep Crash /proc/iomem: got nothing

And somehow I found it's not easy to read. :-)

>
> The background story is like below:

Better to have an blank line to make text more breathable.

> Currently E820 code reserves setup_data regions for both the current kernel
> and the kexec kernel, and it will also insert them into resources list.

"will insert" -> "inserts".

> Before the kexec kernel reboot nobody passes the old setup_data, kexec only

^ "reboots" ^ and

> passes SETUP_EFI and SETUP_IMA if needed. Thus the old setup data memory
> are not used at all. But due to old kernel updated the kexec e820 table

^ is ^ updates

> as well so kexec kernel see them as E820_TYPE_RESERVED_KERN regions, later

"so kexec kernel" -> ", the kexec kernel"

"see" -> "sees"

", later" -> ", and later"

> the old setup_data regions will be inserted into resources list in kexec

"will be" -> "are"

> kernel by e820__reserve_resources().
>
> Note, due to no setup_data passed in for those old regions they are not

^ is

> early reserved (by function early_reserve_memory), crashkernel memblock

^ and the

> reservation will just regard them as usable memory and it could reserve

"regard" -> "treat"

> reserve crashkernel region overlaps with the old setup_data regions.

double "reserve".

"crashkernel region" -> "the crashkernel region"

"overlaps" -> "which overlaps"

>
> Just like the bug I noticed here, kdump insert_resource failed because
> e820__reserve_resources added the overlapped chunks in /proc/iomem already.

^ has added
>
> Finally, looking at the code, the old setup_data regions are not used
> at all as no setup_data passed in by the kexec boot loader. Although

^ is passed

> something like SETUP_PCI etc could be needed, kexec should pass
> the info as setup_data so that kexec kernel can take care of them.
> This should be taken care of in other separate patches if needed.
>
> Thus drop the useless buggy code here.
>
> Signed-off-by: Dave Young <[email protected]>
> ---
> arch/x86/kernel/e820.c | 16 +---------------
> 1 file changed, 1 insertion(+), 15 deletions(-)
>
> Index: linux/arch/x86/kernel/e820.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/e820.c
> +++ linux/arch/x86/kernel/e820.c
> @@ -1015,16 +1015,6 @@ void __init e820__reserve_setup_data(voi
> pa_next = data->next;
>
> e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> -
> - /*
> - * SETUP_EFI and SETUP_IMA are supplied by kexec and do not need
> - * to be reserved.
> - */
> - if (data->type != SETUP_EFI && data->type != SETUP_IMA)
> - e820__range_update_kexec(pa_data,
> - sizeof(*data) + data->len,
> - E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> -
> if (data->type == SETUP_INDIRECT) {
> len += data->len;
> early_memunmap(data, sizeof(*data));
> @@ -1036,12 +1026,9 @@ void __init e820__reserve_setup_data(voi
>
> indirect = (struct setup_indirect *)data->data;
>
> - if (indirect->type != SETUP_INDIRECT) {
> + if (indirect->type != SETUP_INDIRECT)
> e820__range_update(indirect->addr, indirect->len,
> E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> - e820__range_update_kexec(indirect->addr, indirect->len,
> - E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> - }
> }
>
> pa_data = pa_next;
> @@ -1049,7 +1036,6 @@ void __init e820__reserve_setup_data(voi
> }
>
> e820__update_table(e820_table);
> - e820__update_table(e820_table_kexec);
>
> pr_info("extended physical RAM map:\n");
> e820__print_table("reserve setup_data");
>
>

2024-03-21 00:57:04

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH] x86/kexec: do not update E820 kexec table for setup_data

Hi,

On Thu, 21 Mar 2024 at 05:56, Huang, Kai <[email protected]> wrote:
>
> Hi Dave,
>
> Some nitpicking in changelog:

Will fix the grammar issues. Thanks for your review!

>
> On 5/03/2024 2:32 pm, Dave Young wrote:
> > crashkernel reservation failed on a Thinkpad t440s laptop recently,
>
> ',' -> '.' to make it as a standalone sentence.
>
> > Actually the memblock reservation succeeded, but later insert_resource()
> > failed.
> >
> > Test step:
> > kexec load ->
> > kexec reboot ->
> > check the crashkernel memory
> > dmesg|grep "crashkernel reserved"; saw reserved suceeeded:
>
> "suceeeded" -> "succeeded".
>
> > 0x00000000d0000000 - 0x00000000da000000
> > grep Crash /proc/iomem: got nothing
>
> And somehow I found it's not easy to read. :-)
>
> >
> > The background story is like below:
>
> Better to have an blank line to make text more breathable.
>
> > Currently E820 code reserves setup_data regions for both the current kernel
> > and the kexec kernel, and it will also insert them into resources list.
>
> "will insert" -> "inserts".
>
> > Before the kexec kernel reboot nobody passes the old setup_data, kexec only
>
> ^ "reboots" ^ and
>
> > passes SETUP_EFI and SETUP_IMA if needed. Thus the old setup data memory
> > are not used at all. But due to old kernel updated the kexec e820 table
>
> ^ is ^ updates
>
> > as well so kexec kernel see them as E820_TYPE_RESERVED_KERN regions, later
>
> "so kexec kernel" -> ", the kexec kernel"
>
> "see" -> "sees"
>
> ", later" -> ", and later"
>
> > the old setup_data regions will be inserted into resources list in kexec
>
> "will be" -> "are"
>
> > kernel by e820__reserve_resources().
> >
> > Note, due to no setup_data passed in for those old regions they are not
>
> ^ is
>
> > early reserved (by function early_reserve_memory), crashkernel memblock
>
> ^ and the
>
> > reservation will just regard them as usable memory and it could reserve
>
> "regard" -> "treat"
>
> > reserve crashkernel region overlaps with the old setup_data regions.
>
> double "reserve".
>
> "crashkernel region" -> "the crashkernel region"
>
> "overlaps" -> "which overlaps"
>
> >
> > Just like the bug I noticed here, kdump insert_resource failed because
> > e820__reserve_resources added the overlapped chunks in /proc/iomem already.
>
> ^ has added
> >
> > Finally, looking at the code, the old setup_data regions are not used
> > at all as no setup_data passed in by the kexec boot loader. Although
>
> ^ is passed
>
> > something like SETUP_PCI etc could be needed, kexec should pass
> > the info as setup_data so that kexec kernel can take care of them.
> > This should be taken care of in other separate patches if needed.
> >
> > Thus drop the useless buggy code here.
> >
> > Signed-off-by: Dave Young <[email protected]>
> > ---
> > arch/x86/kernel/e820.c | 16 +---------------
> > 1 file changed, 1 insertion(+), 15 deletions(-)
> >
> > Index: linux/arch/x86/kernel/e820.c
> > ===================================================================
> > --- linux.orig/arch/x86/kernel/e820.c
> > +++ linux/arch/x86/kernel/e820.c
> > @@ -1015,16 +1015,6 @@ void __init e820__reserve_setup_data(voi
> > pa_next = data->next;
> >
> > e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> > -
> > - /*
> > - * SETUP_EFI and SETUP_IMA are supplied by kexec and do not need
> > - * to be reserved.
> > - */
> > - if (data->type != SETUP_EFI && data->type != SETUP_IMA)
> > - e820__range_update_kexec(pa_data,
> > - sizeof(*data) + data->len,
> > - E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> > -
> > if (data->type == SETUP_INDIRECT) {
> > len += data->len;
> > early_memunmap(data, sizeof(*data));
> > @@ -1036,12 +1026,9 @@ void __init e820__reserve_setup_data(voi
> >
> > indirect = (struct setup_indirect *)data->data;
> >
> > - if (indirect->type != SETUP_INDIRECT) {
> > + if (indirect->type != SETUP_INDIRECT)
> > e820__range_update(indirect->addr, indirect->len,
> > E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> > - e820__range_update_kexec(indirect->addr, indirect->len,
> > - E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> > - }
> > }
> >
> > pa_data = pa_next;
> > @@ -1049,7 +1036,6 @@ void __init e820__reserve_setup_data(voi
> > }
> >
> > e820__update_table(e820_table);
> > - e820__update_table(e820_table_kexec);
> >
> > pr_info("extended physical RAM map:\n");
> > e820__print_table("reserve setup_data");
> >
> >
>