2022-01-27 16:42:32

by Tiezhu Yang

Subject: [RFC PATCH] kdump: Add support for crashkernel=auto

Set the reserved memory automatically for the crash kernel based on
architecture.

Most of the code in this patch comes from:
https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-8/-/tree/c8s

Signed-off-by: Tiezhu Yang <[email protected]>
---
Documentation/admin-guide/kdump/kdump.rst | 13 +++++++++++++
Documentation/admin-guide/kernel-parameters.txt | 5 +++++
kernel/crash_core.c | 20 ++++++++++++++++++++
3 files changed, 38 insertions(+)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index cb30ca3d..8f8a9cc 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -335,6 +335,19 @@ crashkernel syntax

crashkernel=0,low

+4) crashkernel=auto
+
+ You can use crashkernel=auto if you have enough memory. The threshold
+ is 1G on x86_64 and s390x, 2G on arm64, ppc64 and ppc64le. If your system
+ memory is less than the threshold crashkernel=auto will not reserve memory.
+
+ The automatically reserved memory size varies based on architecture.
+ The size changes according to system memory size like below:
+ x86_64: 1G-4G:160M,4G-64G:192M,64G-1T:256M,1T-:512M
+ s390x: 1G-4G:160M,4G-64G:192M,64G-1T:256M,1T-:512M
+ arm64: 2G-:448M
+ ppc64: 2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G
+
Boot into System Kernel
-----------------------
1) Update the boot loader (such as grub, yaboot, or lilo) configuration
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f5a27f0..14f052d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -783,6 +783,11 @@
Format:
<first_slot>,<last_slot>,<port>,<enum_bit>[,<debug>]

+ crashkernel=auto
+ [KNL] Set the reserved memory automatically for the crash kernel
+ based on architecture.
+ See Documentation/admin-guide/kdump/kdump.rst for further details.
+
crashkernel=size[KMG][@offset[KMG]]
[KNL] Using kexec, Linux can switch to a 'crash kernel'
upon panic. This parameter reserves the physical
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 256cf6d..32c51e2 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -252,6 +252,26 @@ static int __init __parse_crashkernel(char *cmdline,
if (suffix)
return parse_crashkernel_suffix(ck_cmdline, crash_size,
suffix);
+
+ if (strncmp(ck_cmdline, "auto", 4) == 0) {
+#if defined(CONFIG_X86_64) || defined(CONFIG_S390)
+ ck_cmdline = "1G-4G:160M,4G-64G:192M,64G-1T:256M,1T-:512M";
+#elif defined(CONFIG_ARM64)
+ ck_cmdline = "2G-:448M";
+#elif defined(CONFIG_PPC64)
+ char *fadump_cmdline;
+
+ fadump_cmdline = get_last_crashkernel(cmdline, "fadump=", NULL);
+ fadump_cmdline = fadump_cmdline ?
+ fadump_cmdline + strlen("fadump=") : NULL;
+ if (!fadump_cmdline || (strncmp(fadump_cmdline, "off", 3) == 0))
+ ck_cmdline = "2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G";
+ else
+ ck_cmdline = "4G-16G:768M,16G-64G:1G,64G-128G:2G,128G-1T:4G,1T-2T:6G,2T-4T:12G,4T-8T:20G,8T-16T:36G,16T-32T:64G,32T-64T:128G,64T-:180G";
+#endif
+ pr_info("Using crashkernel=auto, the size chosen is a best effort estimation.\n");
+ }
+
/*
* if the commandline contains a ':', then that's the extended
* syntax -- if not, it must be the classic syntax
--
2.1.0
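
The strings assigned to ck_cmdline in the patch use the extended crashkernel syntax: comma-separated <start>-[<end>]:<size> entries, where an entry applies when start <= system RAM < end and a missing end means "no upper limit". The following is a minimal user-space sketch of that lookup, not the kernel's parser. The names parse_size() and pick_crash_size() are hypothetical and exist only for this illustration.

```c
#include <limits.h>
#include <stdlib.h>

/* Parse a size such as "160M", "4G" or "1T" into bytes and leave *end
 * pointing just past it. Hypothetical helper, loosely modelled on the
 * kernel's memparse(); for illustration only. */
static unsigned long long parse_size(const char *s, char **end)
{
    unsigned long long v = strtoull(s, end, 10);

    switch (**end) {
    case 'T': v <<= 10; /* fall through */
    case 'G': v <<= 10; /* fall through */
    case 'M': v <<= 10; /* fall through */
    case 'K': v <<= 10; (*end)++; break;
    default: break;
    }
    return v;
}

/* Return the size reserved for system_ram bytes according to spec,
 * or 0 if no range matches or the spec is malformed. */
static unsigned long long pick_crash_size(const char *spec,
                                          unsigned long long system_ram)
{
    const char *cur = spec;

    while (*cur) {
        char *p;
        unsigned long long start = parse_size(cur, &p);
        unsigned long long end = ULLONG_MAX; /* open-ended range */
        unsigned long long size;

        if (*p != '-')
            return 0;
        p++;
        if (*p != ':')
            end = parse_size(p, &p);
        if (*p != ':')
            return 0;
        size = parse_size(p + 1, &p);

        if (system_ram >= start && system_ram < end)
            return size;
        if (*p != ',')
            break;
        cur = p + 1;
    }
    return 0;
}
```

With the x86_64 string from the patch, a machine with 8G of RAM falls into the 4G-64G range and gets 192M. With the arm64 string, a machine with 1G of RAM matches no range, so nothing is reserved.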


2022-01-28 08:36:47

by Petr Tesařík

Subject: Re: [RFC PATCH] kdump: Add support for crashkernel=auto

Hi Tiezhu Yang,

I'm afraid the whole concept is broken by design. See below.

On 27. 01. 22 at 10:31, Tiezhu Yang wrote:
> Set the reserved memory automatically for the crash kernel based on
> architecture.
>
> Most code of this patch come from:
> https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-8/-/tree/c8s

And that's the problem, I think. The solution might be good for this
specific OS, but not for others.

>[...]
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 256cf6d..32c51e2 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -252,6 +252,26 @@ static int __init __parse_crashkernel(char *cmdline,
> if (suffix)
> return parse_crashkernel_suffix(ck_cmdline, crash_size,
> suffix);
> +
> + if (strncmp(ck_cmdline, "auto", 4) == 0) {
> +#if defined(CONFIG_X86_64) || defined(CONFIG_S390)
> + ck_cmdline = "1G-4G:160M,4G-64G:192M,64G-1T:256M,1T-:512M";
> +#elif defined(CONFIG_ARM64)
> + ck_cmdline = "2G-:448M";
> +#elif defined(CONFIG_PPC64)
> + char *fadump_cmdline;
> +
> + fadump_cmdline = get_last_crashkernel(cmdline, "fadump=", NULL);
> + fadump_cmdline = fadump_cmdline ?
> + fadump_cmdline + strlen("fadump=") : NULL;
> + if (!fadump_cmdline || (strncmp(fadump_cmdline, "off", 3) == 0))
> + ck_cmdline = "2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G";
> + else
> + ck_cmdline = "4G-16G:768M,16G-64G:1G,64G-128G:2G,128G-1T:4G,1T-2T:6G,2T-4T:12G,4T-8T:20G,8T-16T:36G,16T-32T:64G,32T-64T:128G,64T-:180G";
> +#endif
> + pr_info("Using crashkernel=auto, the size chosen is a best effort estimation.\n");
> + }
> +

How did you even arrive at the above numbers? I've done some research on
this topic recently (i.e. during the last 7 years or so). My x86_64
system with 8G RAM running openSUSE Leap 15.3 seems to need 188M for
saving to the local disk, and 203M to save over the network (using
SFTP). My PPC64 LPAR with 16G RAM running the latest Beta of SLES 15 SP4
needs 587M, i.e. with the above numbers it may run out of memory while
saving the dump.
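
To make the comparison concrete, here is a small sketch that sets the measurements quoted above against what the patch's tables would reserve. The x86_64 value (192M for 4G-64G) follows directly from the table. The ppc64 value of 512M is an assumption: it applies only if the LPAR's usable RAM is reported as slightly below 16G and fadump is off; at 16G or more the table would give 1G. The struct and function names are hypothetical.

```c
#include <stdio.h>

/* One measurement: what crashkernel=auto would reserve versus what
 * the dump environment was observed to need, both in MiB. */
struct sample {
    const char *what;
    long reserved_mib;
    long needed_mib;
};

/* Positive result: headroom left. Negative: the reservation is too small. */
static long headroom_mib(const struct sample *s)
{
    return s->reserved_mib - s->needed_mib;
}

static const struct sample samples[] = {
    { "x86_64, 8G RAM, dump to local disk", 192, 188 },
    { "x86_64, 8G RAM, dump over SFTP",     192, 203 },
    /* Assumes usable RAM just under 16G and fadump off (4G-16G:512M). */
    { "ppc64 LPAR, 16G RAM",                512, 587 },
};

/* Print one line per sample with its headroom or shortfall. */
static void print_headroom(void)
{
    size_t i;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        printf("%-36s %+ld MiB\n", samples[i].what,
               headroom_mib(&samples[i]));
}
```

Under these assumptions only the local-disk case on x86_64 fits, and only by 4M.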

Since this is not the first time I'm trying to explain things, I've
written a blog post now:

https://sigillatum.tesarici.cz/2022-01-27-whats-wrong-with-crashkernel-auto.html

HTH
Petr Tesarik



2022-01-29 07:18:45

by Tiezhu Yang

Subject: Re: [RFC PATCH] kdump: Add support for crashkernel=auto



On 01/27/2022 11:53 PM, Petr Tesařík wrote:
> Hi Tiezhu Yang,
>
> I'm afraid the whole concept is broken by design. See below.
>
> On 27. 01. 22 at 10:31, Tiezhu Yang wrote:
>> Set the reserved memory automatically for the crash kernel based on
>> architecture.
>>
>> Most code of this patch come from:
>> https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-8/-/tree/c8s
>>
>
> And that's the problem, I think. The solution might be good for this
> specific OS, but not for others.

Hi Petr,

Thank you for your reply.

This is an RFC patch; its initial aim is to discuss the proper way to
support crashkernel=auto.

A moment ago, I found the following patch. It is more flexible, but it
has not been merged into the upstream kernel.

kernel/crash_core: Add crashkernel=auto for vmcore creation

https://lore.kernel.org/lkml/[email protected]/

>
>> [...]
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index 256cf6d..32c51e2 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -252,6 +252,26 @@ static int __init __parse_crashkernel(char *cmdline,
>> if (suffix)
>> return parse_crashkernel_suffix(ck_cmdline, crash_size,
>> suffix);
>> +
>> + if (strncmp(ck_cmdline, "auto", 4) == 0) {
>> +#if defined(CONFIG_X86_64) || defined(CONFIG_S390)
>> + ck_cmdline = "1G-4G:160M,4G-64G:192M,64G-1T:256M,1T-:512M";
>> +#elif defined(CONFIG_ARM64)
>> + ck_cmdline = "2G-:448M";
>> +#elif defined(CONFIG_PPC64)
>> + char *fadump_cmdline;
>> +
>> + fadump_cmdline = get_last_crashkernel(cmdline, "fadump=", NULL);
>> + fadump_cmdline = fadump_cmdline ?
>> + fadump_cmdline + strlen("fadump=") : NULL;
>> + if (!fadump_cmdline || (strncmp(fadump_cmdline, "off", 3) == 0))
>> + ck_cmdline =
>> "2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G";
>> + else
>> + ck_cmdline =
>> "4G-16G:768M,16G-64G:1G,64G-128G:2G,128G-1T:4G,1T-2T:6G,2T-4T:12G,4T-8T:20G,8T-16T:36G,16T-32T:64G,32T-64T:128G,64T-:180G";
>>
>> +#endif
>> + pr_info("Using crashkernel=auto, the size chosen is a best
>> effort estimation.\n");
>> + }
>> +
>
> How did you even arrive at the above numbers?

Memory requirements for kdump:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/supported-kdump-configurations-and-targets_managing-monitoring-and-updating-the-kernel#memory-requirements-for-kdump_supported-kdump-configurations-and-targets

> I've done some research on
> this topic recently (i.e. during the last 7 years or so). My x86_64
> system with 8G RAM running openSUSE Leap 15.3 seems to need 188M for
> saving to the local disk, and 203M to save over the network (using
> SFTP). My PPC64 LPAR with 16G RAM running the latest Beta of SLES 15 SP4
> needs 587M, i.e. with the above numbers it may run out of memory while
> saving the dump.
>
> Since this is not the first time I'm trying to explain things, I've
> written a blog post now:
>
> https://sigillatum.tesarici.cz/2022-01-27-whats-wrong-with-crashkernel-auto.html
>

Thank you, this is useful.

Thanks,
Tiezhu

>
> HTH
> Petr Tesarik

2022-01-30 23:41:43

by Petr Tesařík

Subject: Re: [RFC PATCH] kdump: Add support for crashkernel=auto

Hi Tiezhu Yang,

On Jan 28, 2022 at 02:20 Tiezhu Yang wrote:
>[...]
> Hi Petr,
>
> Thank you for your reply.
>
> This is a RFC patch, the initial aim of this patch is to discuss what is
> the proper way to support crashkernel=auto.

Well, the point I'm trying to make is that crashkernel=auto cannot be
implemented. Your code would have to know what happens in the future,
and AFAIK time travel has not been discovered yet. ;-)

A better approach is to make a very large allocation initially, e.g.
half of available RAM. The remaining RAM should still be big enough to
start booting the system. Later, when a kdump user-space service knows
what it wants to load, it can shrink the reservation by writing a lower
value into /sys/kernel/kexec_crash_size.

The alternative approach does not need any changes to the kernel, except
maybe adding something like "crashkernel=max".
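
As a rough sketch of the shrink step described above: a user-space service could write the new size, in bytes, to /sys/kernel/kexec_crash_size. The helper names below are hypothetical, and the path is passed in as a parameter so the code can be tried against an ordinary file. To my knowledge the kernel only accepts a value no larger than the current reservation and refuses the write while a crash kernel is loaded, so treat both as assumptions to verify on the target kernel.

```c
#include <stdio.h>

/* Write new_size (in bytes) to a kexec_crash_size-style file.
 * Returns 0 on success, -1 on failure. Hypothetical helper. */
static int shrink_crash_reservation(const char *path,
                                    unsigned long long new_size)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    if (fprintf(f, "%llu\n", new_size) < 0) {
        fclose(f);
        return -1;
    }
    /* The write reaches the file when the buffer is flushed, so a
     * rejection may only show up as an fclose() error. */
    return fclose(f) == 0 ? 0 : -1;
}

/* Read the current reservation (in bytes) from the same file.
 * Returns 0 on success, -1 on failure. Hypothetical helper. */
static int read_crash_reservation(const char *path,
                                  unsigned long long *size)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%llu", size) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A service would typically read the current value, compute what its dump environment needs, and call shrink_crash_reservation("/sys/kernel/kexec_crash_size", needed) before loading the crash kernel.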

Just my two cents,
Petr T

> A moment ago, I find the following patch, it is more flexible, but it is
> not merged into the upstream kernel now.
>
> kernel/crash_core: Add crashkernel=auto for vmcore creation
>
> https://lore.kernel.org/lkml/[email protected]/
>
>
>>
>>> [...]
>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>>> index 256cf6d..32c51e2 100644
>>> --- a/kernel/crash_core.c
>>> +++ b/kernel/crash_core.c
>>> @@ -252,6 +252,26 @@ static int __init __parse_crashkernel(char
>>> *cmdline,
>>>       if (suffix)
>>>           return parse_crashkernel_suffix(ck_cmdline, crash_size,
>>>                   suffix);
>>> +
>>> +    if (strncmp(ck_cmdline, "auto", 4) == 0) {
>>> +#if defined(CONFIG_X86_64) || defined(CONFIG_S390)
>>> +        ck_cmdline = "1G-4G:160M,4G-64G:192M,64G-1T:256M,1T-:512M";
>>> +#elif defined(CONFIG_ARM64)
>>> +        ck_cmdline = "2G-:448M";
>>> +#elif defined(CONFIG_PPC64)
>>> +        char *fadump_cmdline;
>>> +
>>> +        fadump_cmdline = get_last_crashkernel(cmdline, "fadump=",
>>> NULL);
>>> +        fadump_cmdline = fadump_cmdline ?
>>> +                fadump_cmdline + strlen("fadump=") : NULL;
>>> +        if (!fadump_cmdline || (strncmp(fadump_cmdline, "off", 3) ==
>>> 0))
>>> +            ck_cmdline =
>>> "2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G";
>>> +        else
>>> +            ck_cmdline =
>>> "4G-16G:768M,16G-64G:1G,64G-128G:2G,128G-1T:4G,1T-2T:6G,2T-4T:12G,4T-8T:20G,8T-16T:36G,16T-32T:64G,32T-64T:128G,64T-:180G";
>>>
>>>
>>> +#endif
>>> +        pr_info("Using crashkernel=auto, the size chosen is a best
>>> effort estimation.\n");
>>> +    }
>>> +
>>
>> How did you even arrive at the above numbers?
>
> Memory requirements for kdump:
>
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/supported-kdump-configurations-and-targets_managing-monitoring-and-updating-the-kernel#memory-requirements-for-kdump_supported-kdump-configurations-and-targets
>
>
> I've done some research on
>> this topic recently (ie. during the last 7 years or so). My x86_64
>> system with 8G RAM running openSUSE Leap 15.3 seems needs 188M for
>> saving to the local disk, and 203M to save over the network (using
>> SFTP). My PPC64 LPAR with 16G RAM running latest Beta of SLES 15 SP4
>> needs 587M, i.e. with the above numbers it may run out of memory while
>> saving the dump.
>>
>> Since this is not the first time, I'm trying to explain things, I've
>> written a blog post now:
>>
>> https://sigillatum.tesarici.cz/2022-01-27-whats-wrong-with-crashkernel-auto.html
>>
>>
>
> Thank you, this is useful.
>
> Thanks,
> Tiezhu
>
>>
>> HTH
>> Petr Tesarik
>
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec

2022-02-01 16:22:51

by Philipp Rudo

Subject: Re: [RFC PATCH] kdump: Add support for crashkernel=auto

Hi,

On Fri, 28 Jan 2022 11:31:49 +0100
Petr Tesařík <[email protected]> wrote:

> Hi Tiezhu Yang,
>
> On Jan 28, 2022 at 02:20 Tiezhu Yang wrote:
> >[...]
> > Hi Petr,
> >
> > Thank you for your reply.
> >
> > This is a RFC patch, the initial aim of this patch is to discuss what is
> > the proper way to support crashkernel=auto.
>
> Well, the point I'm trying to make is that crashkernel=auto cannot be
> implemented. Your code would have to know what happens in the future,
> and AFAIK time travel has not been discovered yet. ;-)
>
> A better approach is to make a very large allocation initially, e.g.
> half of available RAM. The remaining RAM should still be big enough to
> start booting the system. Later, when a kdump user-space service knows
> what it wants to load, it can shrink the reservation by writing a lower
> value into /sys/kernel/kexec_crash_size.

Even this approach doesn't work in every situation. For example, it
requires that the system has at least twice the RAM it needs to
boot safely. That's not always the case, e.g. for minimalistic VMs or
embedded systems.
Furthermore, the memory requirement can also change during runtime due
to, e.g., workload spikes, device hotplug, moving the dump target from
an unencrypted to an encrypted disk, etc. So even when your user-space
program can exactly calculate the memory requirement at the moment
it loads kdump, it might be too little at the moment the system panics.
For it to work, user space would constantly need to monitor
how much memory is needed and adjust the reservation. But that would
also require increasing the reservation during runtime, which would be
extremely expensive (if possible at all).

All in all, I agree with Petr that time travel is the only proper solution
for implementing crashkernel=auto. But once we have time travel, I
would prefer to use the gained knowledge to fix the bug that triggered
the panic rather than calculating the memory requirement for kdump.

> The alternative approach does not need any changes to the kernel, except
> maybe adding something like "crashkernel=max".

A slightly different approach is for the user-space tool to simply set
the crashkernel= parameter on the kernel command line for the next boot.
This also works for memory-constrained systems. It needs a reboot, though...

> > A moment ago, I find the following patch, it is more flexible, but it is
> > not merged into the upstream kernel now.
> >
> > kernel/crash_core: Add crashkernel=auto for vmcore creation
> >
> > https://lore.kernel.org/lkml/[email protected]/

The patch was ultimately rejected by Linus

https://lore.kernel.org/linux-mm/20210507010432.IN24PudKT%[email protected]/

Thanks
Philipp

> >
> >>
> >>> [...]
> >>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> >>> index 256cf6d..32c51e2 100644
> >>> --- a/kernel/crash_core.c
> >>> +++ b/kernel/crash_core.c
> >>> @@ -252,6 +252,26 @@ static int __init __parse_crashkernel(char
> >>> *cmdline,
> >>>       if (suffix)
> >>>           return parse_crashkernel_suffix(ck_cmdline, crash_size,
> >>>                   suffix);
> >>> +
> >>> +    if (strncmp(ck_cmdline, "auto", 4) == 0) {
> >>> +#if defined(CONFIG_X86_64) || defined(CONFIG_S390)
> >>> +        ck_cmdline = "1G-4G:160M,4G-64G:192M,64G-1T:256M,1T-:512M";
> >>> +#elif defined(CONFIG_ARM64)
> >>> +        ck_cmdline = "2G-:448M";
> >>> +#elif defined(CONFIG_PPC64)
> >>> +        char *fadump_cmdline;
> >>> +
> >>> +        fadump_cmdline = get_last_crashkernel(cmdline, "fadump=",
> >>> NULL);
> >>> +        fadump_cmdline = fadump_cmdline ?
> >>> +                fadump_cmdline + strlen("fadump=") : NULL;
> >>> +        if (!fadump_cmdline || (strncmp(fadump_cmdline, "off", 3) ==
> >>> 0))
> >>> +            ck_cmdline =
> >>> "2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G";
> >>> +        else
> >>> +            ck_cmdline =
> >>> "4G-16G:768M,16G-64G:1G,64G-128G:2G,128G-1T:4G,1T-2T:6G,2T-4T:12G,4T-8T:20G,8T-16T:36G,16T-32T:64G,32T-64T:128G,64T-:180G";
> >>>
> >>>
> >>> +#endif
> >>> +        pr_info("Using crashkernel=auto, the size chosen is a best
> >>> effort estimation.\n");
> >>> +    }
> >>> +
> >>
> >> How did you even arrive at the above numbers?
> >
> > Memory requirements for kdump:
> >
> > https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/supported-kdump-configurations-and-targets_managing-monitoring-and-updating-the-kernel#memory-requirements-for-kdump_supported-kdump-configurations-and-targets
> >
> >
> > I've done some research on
> >> this topic recently (ie. during the last 7 years or so). My x86_64
> >> system with 8G RAM running openSUSE Leap 15.3 seems needs 188M for
> >> saving to the local disk, and 203M to save over the network (using
> >> SFTP). My PPC64 LPAR with 16G RAM running latest Beta of SLES 15 SP4
> >> needs 587M, i.e. with the above numbers it may run out of memory while
> >> saving the dump.
> >>
> >> Since this is not the first time, I'm trying to explain things, I've
> >> written a blog post now:
> >>
> >> https://sigillatum.tesarici.cz/2022-01-27-whats-wrong-with-crashkernel-auto.html
> >>
> >>
> >
> > Thank you, this is useful.
> >
> > Thanks,
> > Tiezhu
> >
> >>
> >> HTH
> >> Petr Tesarik
> >
> >

2022-02-09 13:57:03

by Baoquan He

Subject: Re: [RFC PATCH] kdump: Add support for crashkernel=auto

Hi,

On 01/27/22 at 04:53pm, Petr Tesařík wrote:
> Hi Tiezhu Yang,
>
> I'm afraid the whole concept is broken by design. See below.
>
> On 27. 01. 22 at 10:31, Tiezhu Yang wrote:
> > Set the reserved memory automatically for the crash kernel based on
> > architecture.
> >
> > Most code of this patch come from:
> > https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-8/-/tree/c8s

This code is from RHEL 8/CentOS 8, which Red Hat has been using. It
works pretty well, except on big boxes with dozens of PCIe devices, or
with exceptional device drivers that consume a lot of memory.
See the patchset below for the i40e NIC case, where initializing the
NIC driver eats up 1.5G of memory on ppc64le.

[PATCH v1 0/3] Reducing memory usage of i40e for kdump
https://www.spinics.net/lists/kexec/msg26521.html

I agree that asking the customer to first estimate the needed reservation
size, then add the value to the cmdline and reboot, is much easier. The
thing is, it requires the customer to act. And no matter how much
reservation is needed, it can mostly be satisfied, including the crazy
i40e case above. You won't get a report about the exception unless the
user checks carefully each time. And you can't cover the hotplug case.

The question is which side gets to be lazy. If the customer does the
estimation and sets the value, the developer has it easy. If the customer
wants it easy, the developer needs to do more.

https://sigillatum.tesarici.cz/2022-01-27-whats-wrong-with-crashkernel-auto.html

I read Petr's article above, and I have a different opinion about parts of
it. crashkernel=auto is not for the upstream kernel or for customized
kernels; it is for distros. For a distro, the kernel config is fixed within
one release, so the kernel code and data are fixed. In RHEL 8,
crashkernel=160M is a default value, and it works in most cases. The kernel
modules differ; we filter out the unneeded ones according to the user's
kdump config. Of course, we also keep only the needed user-space tools. We
have documentation noting that NFS needs more memory and that some high-end
servers need more memory. We try to reserve a little more memory,
proportional to the total memory. With these efforts, we hope more people
can try kdump without worrying about lacking the knowledge.

As for the effort to get crashkernel=auto into the kernel, Philipp
pasted a patch from Oracle. The kernel config approach was what we expected,
but it was rejected by Linus. We gave up too soon. RHEL instead implements
it in a user-space package. We will see; we may try again if that does not
go well.


> And that's the problem, I think. The solution might be good for this
> specific OS, but not for others.
>
> > [...]
> > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > index 256cf6d..32c51e2 100644
> > --- a/kernel/crash_core.c
> > +++ b/kernel/crash_core.c
> > @@ -252,6 +252,26 @@ static int __init __parse_crashkernel(char *cmdline,
> > if (suffix)
> > return parse_crashkernel_suffix(ck_cmdline, crash_size,
> > suffix);
> > +
> > + if (strncmp(ck_cmdline, "auto", 4) == 0) {
> > +#if defined(CONFIG_X86_64) || defined(CONFIG_S390)
> > + ck_cmdline = "1G-4G:160M,4G-64G:192M,64G-1T:256M,1T-:512M";
> > +#elif defined(CONFIG_ARM64)
> > + ck_cmdline = "2G-:448M";
> > +#elif defined(CONFIG_PPC64)
> > + char *fadump_cmdline;
> > +
> > + fadump_cmdline = get_last_crashkernel(cmdline, "fadump=", NULL);
> > + fadump_cmdline = fadump_cmdline ?
> > + fadump_cmdline + strlen("fadump=") : NULL;
> > + if (!fadump_cmdline || (strncmp(fadump_cmdline, "off", 3) == 0))
> > + ck_cmdline = "2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G";
> > + else
> > + ck_cmdline = "4G-16G:768M,16G-64G:1G,64G-128G:2G,128G-1T:4G,1T-2T:6G,2T-4T:12G,4T-8T:20G,8T-16T:36G,16T-32T:64G,32T-64T:128G,64T-:180G";
> > +#endif
> > + pr_info("Using crashkernel=auto, the size chosen is a best effort estimation.\n");
> > + }
> > +
>
> How did you even arrive at the above numbers? I've done some research on
> this topic recently (i.e. during the last 7 years or so). My x86_64 system
> with 8G RAM running openSUSE Leap 15.3 seems to need 188M for saving to the
> local disk, and 203M to save over the network (using SFTP). My PPC64 LPAR
> with 16G RAM running the latest Beta of SLES 15 SP4 needs 587M, i.e. with the
> above numbers it may run out of memory while saving the dump.
>
> Since this is not the first time I'm trying to explain things, I've written
> a blog post now:
>
> https://sigillatum.tesarici.cz/2022-01-27-whats-wrong-with-crashkernel-auto.html
>
> HTH
> Petr Tesarik
>


2022-02-15 14:33:14

by Philipp Rudo

Subject: Re: [RFC PATCH] kdump: Add support for crashkernel=auto

Hi Petr,

On Fri, 4 Feb 2022 06:34:19 +0100
Petr Tesařík <[email protected]> wrote:

> Hi Philipp,
>
> On 31. 01. 22 at 11:33, Philipp Rudo wrote:
> > Hi,
> >
> > On Fri, 28 Jan 2022 11:31:49 +0100
> > Petr Tesařík <[email protected]> wrote:
> >
> >> Hi Tiezhu Yang,
> >>
> >> On Jan 28, 2022 at 02:20 Tiezhu Yang wrote:
> >>> [...]
> >>> Hi Petr,
> >>>
> >>> Thank you for your reply.
> >>>
> >>> This is a RFC patch, the initial aim of this patch is to discuss what is
> >>> the proper way to support crashkernel=auto.
> >>
> >> Well, the point I'm trying to make is that crashkernel=auto cannot be
> >> implemented. Your code would have to know what happens in the future,
> >> and AFAIK time travel has not been discovered yet. ;-)
> >>
> >> A better approach is to make a very large allocation initially, e.g.
> >> half of available RAM. The remaining RAM should still be big enough to
> >> start booting the system. Later, when a kdump user-space service knows
> >> what it wants to load, it can shrink the reservation by writing a lower
> >> value into /sys/kernel/kexec_crash_size.
> >
> > Even this approach doesn't work in every situation. For example it
> > requires that the system has at least twice the RAM it requires to
> > safely boot. That's not always given for e.g. minimalistic VMs or
> > embedded systems.
>
> If you reserve more RAM for the panic kernel than for running your
> actual workload, then you definitely have very special needs, and you
> should not expect that everything works out of the box.

That was basically the point I was trying to make. There is always a
scenario with special needs, so it is basically impossible to find
the one solution that works for everybody.

> > Furthermore the memory requirement can also change during runtime due
> > to, e.g. workload spikes, device hot plug, moving the dump target from
> > an unencrypted to an encrypted disk, etc. So even when your user-space
> > program can exactly calculate the memory requirement at the moment
> > it loads kdump it might be too little at the moment the system panics.
> > In order for it to work the user-space would constantly need to monitor
> > how much memory is needed and adjust the requirement. But that would
> > also require to increase the reservation during runtime which would be
> > extremely expensive (if possible at all).
> >
> > All in all I support Petr that time travel is the only proper solution
> > for implementing crashkernel=auto. But once we have time travel I
> > would prefer to use the gained knowledge to fix the bug that triggered
> > the panic rather than calculating the memory requirement for kdump.
>
> Yeah, long live patching! :-)
>
> >> The alternative approach does not need any changes to the kernel, except
> >> maybe adding something like "crashkernel=max".
> >
> > A slightly different approach is for the user-space tool to simply set
> > the crashkernel= parameter on the kernel commandline for the next boot.
> > This also works for memory-constrained systems. Needs a reboot though...
>
> The downside is that if you remove some memory while your system is off,
> then a reservation calculated for the previous RAM size may no longer be
> possible on the next boot, and the kernel will boot up without any
> reservation. That's where "crashkernel=max" would come in handy. Let me
> send a patch and see the discussion.

True, in that situation our approach will fail. I'm looking forward to
seeing your patch.

Thanks
Philipp

> >>> A moment ago, I find the following patch, it is more flexible, but it is
> >>> not merged into the upstream kernel now.
> >>>
> >>> kernel/crash_core: Add crashkernel=auto for vmcore creation
> >>>
> >>> https://lore.kernel.org/lkml/[email protected]/
> >
> > The patch was ultimately rejected by Linus
> >
> > https://lore.kernel.org/linux-mm/20210507010432.IN24PudKT%[email protected]/
> >
> > Thanks
> > Philipp
> >
> >>>
> >>>>
> >>>>> [...]
> >>>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> >>>>> index 256cf6d..32c51e2 100644
> >>>>> --- a/kernel/crash_core.c
> >>>>> +++ b/kernel/crash_core.c
> >>>>> @@ -252,6 +252,26 @@ static int __init __parse_crashkernel(char
> >>>>> *cmdline,
> >>>>>       if (suffix)
> >>>>>           return parse_crashkernel_suffix(ck_cmdline, crash_size,
> >>>>>                   suffix);
> >>>>> +
> >>>>> +    if (strncmp(ck_cmdline, "auto", 4) == 0) {
> >>>>> +#if defined(CONFIG_X86_64) || defined(CONFIG_S390)
> >>>>> +        ck_cmdline = "1G-4G:160M,4G-64G:192M,64G-1T:256M,1T-:512M";
> >>>>> +#elif defined(CONFIG_ARM64)
> >>>>> +        ck_cmdline = "2G-:448M";
> >>>>> +#elif defined(CONFIG_PPC64)
> >>>>> +        char *fadump_cmdline;
> >>>>> +
> >>>>> +        fadump_cmdline = get_last_crashkernel(cmdline, "fadump=",
> >>>>> NULL);
> >>>>> +        fadump_cmdline = fadump_cmdline ?
> >>>>> +                fadump_cmdline + strlen("fadump=") : NULL;
> >>>>> +        if (!fadump_cmdline || (strncmp(fadump_cmdline, "off", 3) ==
> >>>>> 0))
> >>>>> +            ck_cmdline =
> >>>>> "2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G";
> >>>>> +        else
> >>>>> +            ck_cmdline =
> >>>>> "4G-16G:768M,16G-64G:1G,64G-128G:2G,128G-1T:4G,1T-2T:6G,2T-4T:12G,4T-8T:20G,8T-16T:36G,16T-32T:64G,32T-64T:128G,64T-:180G";
> >>>>>
> >>>>>
> >>>>> +#endif
> >>>>> +        pr_info("Using crashkernel=auto, the size chosen is a best
> >>>>> effort estimation.\n");
> >>>>> +    }
> >>>>> +
> >>>>
> >>>> How did you even arrive at the above numbers?
> >>>
> >>> Memory requirements for kdump:
> >>>
> >>> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/supported-kdump-configurations-and-targets_managing-monitoring-and-updating-the-kernel#memory-requirements-for-kdump_supported-kdump-configurations-and-targets
> >>>
> >>>
> >>> I've done some research on
> >>>> this topic recently (ie. during the last 7 years or so). My x86_64
> >>>> system with 8G RAM running openSUSE Leap 15.3 seems needs 188M for
> >>>> saving to the local disk, and 203M to save over the network (using
> >>>> SFTP). My PPC64 LPAR with 16G RAM running latest Beta of SLES 15 SP4
> >>>> needs 587M, i.e. with the above numbers it may run out of memory while
> >>>> saving the dump.
> >>>>
> >>>> Since this is not the first time, I'm trying to explain things, I've
> >>>> written a blog post now:
> >>>>
> >>>> https://sigillatum.tesarici.cz/2022-01-27-whats-wrong-with-crashkernel-auto.html
> >>>>
> >>>>
> >>>
> >>> Thank you, this is useful.
> >>>
> >>> Thanks,
> >>> Tiezhu
> >>>
> >>>>
> >>>> HTH
> >>>> Petr Tesarik
> >>>
> >>>