2022-09-03 06:54:41

by Oleksandr Natalenko

[permalink] [raw]
Subject: [PATCH] core_pattern: add CPU specifier

Statistically, in a large deployment regular segfaults may indicate a CPU issue.

Currently, it is not possible to find out what CPU the segfault happened on.
There are at least two attempts to improve segfault logging with this regard,
but they do not help in case the logs rotate.

Hence, lets make sure it is possible to permanently record a CPU
the task ran on using a new core_pattern specifier.

Suggested-by: Renaud Métrich <[email protected]>
Signed-off-by: Oleksandr Natalenko <[email protected]>
---
Documentation/admin-guide/sysctl/kernel.rst | 1 +
fs/coredump.c | 5 +++++
include/linux/coredump.h | 1 +
3 files changed, 7 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 835c8844bba48..b566fff04946b 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -169,6 +169,7 @@ core_pattern
%f executable filename
%E executable path
%c maximum size of core file by resource limit RLIMIT_CORE
+ %C CPU the task ran on
%<OTHER> both are dropped
======== ==========================================

diff --git a/fs/coredump.c b/fs/coredump.c
index a8661874ac5b6..166d1f84a9b17 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -325,6 +325,10 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
err = cn_printf(cn, "%lu",
rlimit(RLIMIT_CORE));
break;
+ /* CPU the task ran on */
+ case 'C':
+ err = cn_printf(cn, "%d", cprm->cpu);
+ break;
default:
break;
}
@@ -535,6 +539,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
*/
.mm_flags = mm->flags,
.vma_meta = NULL,
+ .cpu = raw_smp_processor_id(),
};

audit_core_dumps(siginfo->si_signo);
diff --git a/include/linux/coredump.h b/include/linux/coredump.h
index 08a1d3e7e46d0..191dcf5af6cb9 100644
--- a/include/linux/coredump.h
+++ b/include/linux/coredump.h
@@ -22,6 +22,7 @@ struct coredump_params {
struct file *file;
unsigned long limit;
unsigned long mm_flags;
+ int cpu;
loff_t written;
loff_t pos;
loff_t to_skip;
--
2.37.2


2022-09-03 07:35:36

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] core_pattern: add CPU specifier

On 09/03, Oleksandr Natalenko wrote:
>
> Statistically, in a large deployment regular segfaults may indicate a CPU issue.
>
> Currently, it is not possible to find out what CPU the segfault happened on.
> There are at least two attempts to improve segfault logging with this regard,
> but they do not help in case the logs rotate.
>
> Hence, lets make sure it is possible to permanently record a CPU
> the task ran on using a new core_pattern specifier.
>
> Suggested-by: Renaud M?trich <[email protected]>
> Signed-off-by: Oleksandr Natalenko <[email protected]>
> ---
> Documentation/admin-guide/sysctl/kernel.rst | 1 +
> fs/coredump.c | 5 +++++
> include/linux/coredump.h | 1 +
> 3 files changed, 7 insertions(+)

Reviewed-by: Oleg Nesterov <[email protected]>

2022-09-04 18:59:30

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] core_pattern: add CPU specifier

On Sat, 3 Sep 2022 08:43:30 +0200 Oleksandr Natalenko <[email protected]> wrote:

> Statistically, in a large deployment regular segfaults may indicate a CPU issue.
>
> Currently, it is not possible to find out what CPU the segfault happened on.
> There are at least two attempts to improve segfault logging with this regard,
> but they do not help in case the logs rotate.
>
> Hence, lets make sure it is possible to permanently record a CPU
> the task ran on using a new core_pattern specifier.
>
> ...
>
> }
> @@ -535,6 +539,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
> */
> .mm_flags = mm->flags,
> .vma_meta = NULL,
> + .cpu = raw_smp_processor_id(),
> };

Why use the "raw_" function here?


2022-09-04 20:43:22

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] core_pattern: add CPU specifier

On 09/04, Andrew Morton wrote:
>
> On Sat, 3 Sep 2022 08:43:30 +0200 Oleksandr Natalenko <[email protected]> wrote:
>
> > @@ -535,6 +539,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
> > */
> > .mm_flags = mm->flags,
> > .vma_meta = NULL,
> > + .cpu = raw_smp_processor_id(),
> > };
>
> Why use the "raw_" function here?

To avoid check_preemption_disabled() from debug_smp_processor_id().

We do not disable preemption because this is pointless, this is the
"racy snapshot" anyway. The coredumping task can migrate to another
CPU even before get_signal/do_coredump, right after exit-to-user
path enables irqs.

Oleg.

2022-09-06 22:25:32

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] core_pattern: add CPU specifier

Oleksandr Natalenko <[email protected]> writes:

> Statistically, in a large deployment regular segfaults may indicate a CPU issue.
>
> Currently, it is not possible to find out what CPU the segfault happened on.
> There are at least two attempts to improve segfault logging with this regard,
> but they do not help in case the logs rotate.
>
> Hence, lets make sure it is possible to permanently record a CPU
> the task ran on using a new core_pattern specifier.

I am puzzled why make it part of the file name, and not part of the
core file? Say an elf note?

The big advantage is that you could always capture the cpu and
will not need to take special care configuring your system to
capture that information.

Eric

> Suggested-by: Renaud Métrich <[email protected]>
> Signed-off-by: Oleksandr Natalenko <[email protected]>
> ---
> Documentation/admin-guide/sysctl/kernel.rst | 1 +
> fs/coredump.c | 5 +++++
> include/linux/coredump.h | 1 +
> 3 files changed, 7 insertions(+)
>
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 835c8844bba48..b566fff04946b 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -169,6 +169,7 @@ core_pattern
> %f executable filename
> %E executable path
> %c maximum size of core file by resource limit RLIMIT_CORE
> + %C CPU the task ran on
> %<OTHER> both are dropped
> ======== ==========================================
>
> diff --git a/fs/coredump.c b/fs/coredump.c
> index a8661874ac5b6..166d1f84a9b17 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -325,6 +325,10 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
> err = cn_printf(cn, "%lu",
> rlimit(RLIMIT_CORE));
> break;
> + /* CPU the task ran on */
> + case 'C':
> + err = cn_printf(cn, "%d", cprm->cpu);
> + break;
> default:
> break;
> }
> @@ -535,6 +539,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
> */
> .mm_flags = mm->flags,
> .vma_meta = NULL,
> + .cpu = raw_smp_processor_id(),
> };
>
> audit_core_dumps(siginfo->si_signo);
> diff --git a/include/linux/coredump.h b/include/linux/coredump.h
> index 08a1d3e7e46d0..191dcf5af6cb9 100644
> --- a/include/linux/coredump.h
> +++ b/include/linux/coredump.h
> @@ -22,6 +22,7 @@ struct coredump_params {
> struct file *file;
> unsigned long limit;
> unsigned long mm_flags;
> + int cpu;
> loff_t written;
> loff_t pos;
> loff_t to_skip;

2022-09-07 06:53:23

by Oleksandr Natalenko

[permalink] [raw]
Subject: Re: [PATCH] core_pattern: add CPU specifier

Hello.

On středa 7. září 2022 0:22:42 CEST Eric W. Biederman wrote:
> Oleksandr Natalenko <[email protected]> writes:
>
> > Statistically, in a large deployment regular segfaults may indicate a CPU issue.
> >
> > Currently, it is not possible to find out what CPU the segfault happened on.
> > There are at least two attempts to improve segfault logging with this regard,
> > but they do not help in case the logs rotate.
> >
> > Hence, lets make sure it is possible to permanently record a CPU
> > the task ran on using a new core_pattern specifier.
>
> I am puzzled why make it part of the file name, and not part of the
> core file? Say an elf note?

This might be a good idea too, and one approach doesn't exclude the other one.

> The big advantage is that you could always capture the cpu and
> will not need to take special care configuring your system to
> capture that information.

The advantage of having CPU recorded in the file name is that in case of multiple cores one can summarise them with a simple ls+grep without invoking a fully-featured debugger to find out whether the segfaults happened on the same CPU.

Thanks.

> Eric
>
> > Suggested-by: Renaud Métrich <[email protected]>
> > Signed-off-by: Oleksandr Natalenko <[email protected]>
> > ---
> > Documentation/admin-guide/sysctl/kernel.rst | 1 +
> > fs/coredump.c | 5 +++++
> > include/linux/coredump.h | 1 +
> > 3 files changed, 7 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index 835c8844bba48..b566fff04946b 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -169,6 +169,7 @@ core_pattern
> > %f executable filename
> > %E executable path
> > %c maximum size of core file by resource limit RLIMIT_CORE
> > + %C CPU the task ran on
> > %<OTHER> both are dropped
> > ======== ==========================================
> >
> > diff --git a/fs/coredump.c b/fs/coredump.c
> > index a8661874ac5b6..166d1f84a9b17 100644
> > --- a/fs/coredump.c
> > +++ b/fs/coredump.c
> > @@ -325,6 +325,10 @@ static int format_corename(struct core_name *cn, struct coredump_params *cprm,
> > err = cn_printf(cn, "%lu",
> > rlimit(RLIMIT_CORE));
> > break;
> > + /* CPU the task ran on */
> > + case 'C':
> > + err = cn_printf(cn, "%d", cprm->cpu);
> > + break;
> > default:
> > break;
> > }
> > @@ -535,6 +539,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
> > */
> > .mm_flags = mm->flags,
> > .vma_meta = NULL,
> > + .cpu = raw_smp_processor_id(),
> > };
> >
> > audit_core_dumps(siginfo->si_signo);
> > diff --git a/include/linux/coredump.h b/include/linux/coredump.h
> > index 08a1d3e7e46d0..191dcf5af6cb9 100644
> > --- a/include/linux/coredump.h
> > +++ b/include/linux/coredump.h
> > @@ -22,6 +22,7 @@ struct coredump_params {
> > struct file *file;
> > unsigned long limit;
> > unsigned long mm_flags;
> > + int cpu;
> > loff_t written;
> > loff_t pos;
> > loff_t to_skip;

--
Oleksandr Natalenko (post-factum)
Principal Software Maintenance Engineer


2022-09-07 16:41:48

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH] core_pattern: add CPU specifier

On Sat, Sep 03, 2022 at 08:43:30AM +0200, Oleksandr Natalenko wrote:
> Statistically, in a large deployment regular segfaults may indicate a CPU issue.

Can you elaborate on this? How common is this observed to be true? Are
there any public findings or bugs where it showed this?

Luis

2022-09-07 18:24:31

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] core_pattern: add CPU specifier

On 09/07, Oleksandr Natalenko wrote:
>
> The advantage of having CPU recorded in the file name is that
> in case of multiple cores one can summarise them with a simple
> ls+grep without invoking a fully-featured debugger to find out
> whether the segfaults happened on the same CPU.

Besides, if you only need to gather the statistics about the faulting
CPU(s), you do not even need to actually dump the the core. For example,
something like

#!/usr/bin/sh

echo $* >> path/to/coredump-stat.txt

and
echo '| path-to-script-above %C' >/proc/sys/kernel/core_pattern

can help.

Oleg.

2022-09-07 22:28:22

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] core_pattern: add CPU specifier

Oleg Nesterov <[email protected]> writes:

> On 09/07, Oleksandr Natalenko wrote:
>>
>> The advantage of having CPU recorded in the file name is that
>> in case of multiple cores one can summarise them with a simple
>> ls+grep without invoking a fully-featured debugger to find out
>> whether the segfaults happened on the same CPU.
>
> Besides, if you only need to gather the statistics about the faulting
> CPU(s), you do not even need to actually dump the the core. For example,
> something like
>
> #!/usr/bin/sh
>
> echo $* >> path/to/coredump-stat.txt
>
> and
> echo '| path-to-script-above %C' >/proc/sys/kernel/core_pattern
>
> can help.

So I am confused. I thought someone had modified print_fatal_signal
to print this information. Looking at the code now I don't see it,
but perhaps that is in linux-next somewhere.

That would seem to be the really obvious place to put this and much
closer to the original fault so we ware more likely to record the
cpu on which things actually happened on.

If we don't care about the core dump just getting the information in
syslog where it can be analyzed seems like the thing to do.

For a developers box putting it in core pattern makes sense, isn't a
hinderance to use. For anyone else's box the information needs to come
out in a way that allows automated tools to look for a pattern.
Requiring someone to take an extra step to print the information seems
a hinderance to automated tools doing the looking.

Eric

2022-09-08 06:35:21

by Oleksandr Natalenko

[permalink] [raw]
Subject: Re: [PATCH] core_pattern: add CPU specifier

Hello.

On čtvrtek 8. září 2022 0:00:43 CEST Eric W. Biederman wrote:
> Oleg Nesterov <[email protected]> writes:
>
> > On 09/07, Oleksandr Natalenko wrote:
> >>
> >> The advantage of having CPU recorded in the file name is that
> >> in case of multiple cores one can summarise them with a simple
> >> ls+grep without invoking a fully-featured debugger to find out
> >> whether the segfaults happened on the same CPU.
> >
> > Besides, if you only need to gather the statistics about the faulting
> > CPU(s), you do not even need to actually dump the the core. For example,
> > something like
> >
> > #!/usr/bin/sh
> >
> > echo $* >> path/to/coredump-stat.txt
> >
> > and
> > echo '| path-to-script-above %C' >/proc/sys/kernel/core_pattern
> >
> > can help.
>
> So I am confused. I thought someone had modified print_fatal_signal
> to print this information. Looking at the code now I don't see it,
> but perhaps that is in linux-next somewhere.

That's a different story that gets solved here: [1] [2].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/cpu&id=c926087eb38520b268515ae1a842db6db62554cc
[2] https://lore.kernel.org/lkml/[email protected]/

> That would seem to be the really obvious place to put this and much
> closer to the original fault so we ware more likely to record the
> cpu on which things actually happened on.
>
> If we don't care about the core dump just getting the information in
> syslog where it can be analyzed seems like the thing to do.
>
> For a developers box putting it in core pattern makes sense, isn't a
> hinderance to use. For anyone else's box the information needs to come
> out in a way that allows automated tools to look for a pattern.
> Requiring someone to take an extra step to print the information seems
> a hinderance to automated tools doing the looking.
>
> Eric
>
>


--
Oleksandr Natalenko (post-factum)
Principal Software Maintenance Engineer


2022-09-08 06:48:25

by Renaud Métrich

[permalink] [raw]
Subject: Re: [PATCH] core_pattern: add CPU specifier

Hello,

I have been working closely with Oleksandr on a couple of cases where
customers could see segfaults for various processes, including basic
tools ("grep", "cut", etc.) that usually don't die.

The coredumps showed of course nothing because from userland's
perspective there was nothing wrong, but just a bad pointer which
couldn't be explained.

Memory testing (e.g. Memtest86+) and CPU testing (usually from hardware
vendor) never showed any issue with the hardware as well, even though
there was, probably because it required special conditions, such as
specific load and/or thermal conditions.

The troubleshooting of such cases takes several weeks or even months,
until we have enough evidence it's not the OS that is faulty, and it's
always struggling.

Usually when we start getting kernel crashes, we are then happy because
kernel crashes indicate the CPU the task was running on, and it seems to
always be reliable enough information to point to faulty CPU. For other
cases where no kernel crash could be observed, these are solved after
requesting the customer to replace the hardware components, which is
something difficult to explain since it usually costs the customer money
and time.

I hope such feature will be helpful for everybody doing Linux support.

Renaud.

Le 9/7/22 à 17:53, Luis Chamberlain a écrit :
> On Sat, Sep 03, 2022 at 08:43:30AM +0200, Oleksandr Natalenko wrote:
>> Statistically, in a large deployment regular segfaults may indicate a CPU issue.
> Can you elaborate on this? How common is this observed to be true? Are
> there any public findings or bugs where it showed this?
>
> Luis
>


Attachments:
OpenPGP_signature (855.00 B)
OpenPGP digital signature