2022-08-05 14:44:52

by Rik van Riel

Subject: [PATCH v3] x86,mm: print likely CPU at segfault time

In a large enough fleet of computers, it is common to have a few bad CPUs.
Those can often be identified by seeing that some commonly run kernel code,
which runs fine everywhere else, keeps crashing on the same CPU core on one
particular bad system.

However, the failure modes in CPUs that have gone bad over the years are
often oddly specific, and the only bad behavior seen might be segfaults
in programs like bash, python, or various system daemons that run fine
everywhere else.

Add a printk() to show_signal_msg() to print the CPU, core, and socket
at segfault time. This is not perfect, since the task might get rescheduled
on another CPU between when the fault hit, and when the message is printed,
but in practice this has been good enough to help us identify several bad
CPU cores.

segfault[1349]: segfault at 0 ip 000000000040113a sp 00007ffc6d32e360 error 4 in segfault[401000+1000] on CPU 0 (core 0, socket 0)

This printk can be controlled through /proc/sys/debug/exception-trace.

Signed-off-by: Rik van Riel <[email protected]>
CC: Dave Jones <[email protected]>
---
v3: READ_ONCE around raw_smp_processor_id() does not work, let's just omit that
instead of making the code harder to read
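
For anyone scraping these messages out of the kernel log, here is a minimal
user-space sketch (not part of the patch) that pulls the CPU, core, and socket
out of a line in the format shown above. The names struct segv_cpu and
parse_segv_cpu() are made up for illustration, and the sketch assumes the
suffix text stays exactly as this version prints it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three numbers at the end of the line. */
struct segv_cpu {
	int cpu;
	int core;
	int socket;
};

/*
 * Look for the " on CPU N (core N, socket N)" suffix in one log line.
 * Returns 0 and fills *out on success, -1 if the suffix is not found.
 */
static int parse_segv_cpu(const char *line, struct segv_cpu *out)
{
	const char *p = strstr(line, " on CPU ");

	if (!p)
		return -1;
	if (sscanf(p, " on CPU %d (core %d, socket %d)",
		   &out->cpu, &out->core, &out->socket) != 3)
		return -1;
	return 0;
}
```

Feeding each dmesg line through parse_segv_cpu() and counting hits per
(socket, core) pair is one way to spot a core that shows up far more often
than the rest.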

arch/x86/mm/fault.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fad8faa29d04..c7a5bbf40367 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -769,6 +769,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 		unsigned long address, struct task_struct *tsk)
 {
 	const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
+	/* This is a racy snapshot, but it's better than nothing. */
+	int cpu = raw_smp_processor_id();

 	if (!unhandled_signal(tsk, SIGSEGV))
 		return;
@@ -782,6 +784,14 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,

 	print_vma_addr(KERN_CONT " in ", regs->ip);

+	/*
+	 * Dump the likely CPU where the fatal segfault happened.
+	 * This can help identify faulty hardware.
+	 */
+	printk(KERN_CONT " on CPU %d (core %d, socket %d)", cpu,
+	       topology_core_id(cpu), topology_physical_package_id(cpu));
+
+
 	printk(KERN_CONT "\n");

 	show_opcodes(regs, loglvl);
--
2.37.1




2022-08-05 14:48:32

by Borislav Petkov

Subject: Re: [PATCH v3] x86,mm: print likely CPU at segfault time

On Fri, Aug 05, 2022 at 10:16:44AM -0400, Rik van Riel wrote:
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index fad8faa29d04..c7a5bbf40367 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -769,6 +769,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
> unsigned long address, struct task_struct *tsk)
> {
> const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
> + /* This is a racy snapshot, but it's better than nothing. */
> + int cpu = raw_smp_processor_id();

Please read this in exc_page_fault() and hand it down to helpers.

Alternatively, I'm being told there's a patchset in the works which
will allow for any exception handler to pass in additional information
downwards through an extended pt_regs. Then, saving the CPU number on
which the handler is running would work generically everywhere.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-08-05 14:49:24

by Rik van Riel

Subject: Re: [PATCH v3] x86,mm: print likely CPU at segfault time

On Fri, 5 Aug 2022 16:27:40 +0200
Borislav Petkov <[email protected]> wrote:

> On Fri, Aug 05, 2022 at 10:16:44AM -0400, Rik van Riel wrote:
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index fad8faa29d04..c7a5bbf40367 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -769,6 +769,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
> > unsigned long address, struct task_struct *tsk)
> > {
> > const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
> > + /* This is a racy snapshot, but it's better than nothing. */
> > + int cpu = raw_smp_processor_id();
>
> Please read this in exc_page_fault() and hand it down to helpers.

Below is the change that implements your suggestion.

If there is consensus among the x86 maintainers that this is
desirable, I am more than happy to merge that change into my
patch and resubmit v4.

I don't have a strong opinion either way.

---8<---

From 444f8588f0edfd8586a86e85191ad8fa8b7c6a6c Mon Sep 17 00:00:00 2001
From: Rik van Riel <[email protected]>
Date: Fri, 5 Aug 2022 10:32:11 -0400
Subject: [PATCH 2/2] x86,mm: get CPU number for segfault printk before
enabling preemption

Get the CPU number for the segfault printk earlier in the page fault
handler, before preemption is enabled.

Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
---
arch/x86/mm/fault.c | 58 +++++++++++++++++++++++----------------------
1 file changed, 30 insertions(+), 28 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index c7a5bbf40367..bd06b22826b2 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -766,11 +766,9 @@ kernelmode_fixup_or_oops(struct pt_regs *regs, unsigned long error_code,
  */
 static inline void
 show_signal_msg(struct pt_regs *regs, unsigned long error_code,
-		unsigned long address, struct task_struct *tsk)
+		unsigned long address, struct task_struct *tsk, int cpu)
 {
 	const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
-	/* This is a racy snapshot, but it's better than nothing. */
-	int cpu = raw_smp_processor_id();

 	if (!unhandled_signal(tsk, SIGSEGV))
 		return;
@@ -808,7 +806,7 @@ static bool is_vsyscall_vaddr(unsigned long vaddr)

 static void
 __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
-		       unsigned long address, u32 pkey, int si_code)
+		       unsigned long address, u32 pkey, int si_code, int cpu)
 {
 	struct task_struct *tsk = current;

@@ -846,7 +844,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		return;

 	if (likely(show_unhandled_signals))
-		show_signal_msg(regs, error_code, address, tsk);
+		show_signal_msg(regs, error_code, address, tsk, cpu);

 	set_signal_archinfo(address, error_code);

@@ -860,14 +858,14 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,

 static noinline void
 bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
-		     unsigned long address)
+		     unsigned long address, int cpu)
 {
-	__bad_area_nosemaphore(regs, error_code, address, 0, SEGV_MAPERR);
+	__bad_area_nosemaphore(regs, error_code, address, 0, SEGV_MAPERR, cpu);
 }

 static void
 __bad_area(struct pt_regs *regs, unsigned long error_code,
-	   unsigned long address, u32 pkey, int si_code)
+	   unsigned long address, u32 pkey, int si_code, int cpu)
 {
 	struct mm_struct *mm = current->mm;
 	/*
@@ -876,13 +874,14 @@ __bad_area(struct pt_regs *regs, unsigned long error_code,
 	 */
 	mmap_read_unlock(mm);

-	__bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
+	__bad_area_nosemaphore(regs, error_code, address, pkey, si_code, cpu);
 }

 static noinline void
-bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address)
+bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address,
+	 int cpu)
 {
-	__bad_area(regs, error_code, address, 0, SEGV_MAPERR);
+	__bad_area(regs, error_code, address, 0, SEGV_MAPERR, cpu);
 }

 static inline bool bad_area_access_from_pkeys(unsigned long error_code,
@@ -904,7 +903,8 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,

 static noinline void
 bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
-		      unsigned long address, struct vm_area_struct *vma)
+		      unsigned long address, struct vm_area_struct *vma,
+		      int cpu)
 {
 	/*
 	 * This OSPKE check is not strictly necessary at runtime.
@@ -934,9 +934,9 @@ bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
 		 */
 		u32 pkey = vma_pkey(vma);

-		__bad_area(regs, error_code, address, pkey, SEGV_PKUERR);
+		__bad_area(regs, error_code, address, pkey, SEGV_PKUERR, cpu);
 	} else {
-		__bad_area(regs, error_code, address, 0, SEGV_ACCERR);
+		__bad_area(regs, error_code, address, 0, SEGV_ACCERR, cpu);
 	}
 }

@@ -1155,7 +1155,7 @@ bool fault_in_kernel_space(unsigned long address)
  */
 static void
 do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
-		   unsigned long address)
+		   unsigned long address, int cpu)
 {
 	/*
 	 * Protection keys exceptions only happen on user pages. We
@@ -1214,7 +1214,7 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 	 * Don't take the mm semaphore here. If we fixup a prefetch
 	 * fault we could otherwise deadlock:
 	 */
-	bad_area_nosemaphore(regs, hw_error_code, address);
+	bad_area_nosemaphore(regs, hw_error_code, address, cpu);
 }
 NOKPROBE_SYMBOL(do_kern_addr_fault);

@@ -1229,7 +1229,8 @@ NOKPROBE_SYMBOL(do_kern_addr_fault);
 static inline
 void do_user_addr_fault(struct pt_regs *regs,
 			unsigned long error_code,
-			unsigned long address)
+			unsigned long address,
+			int cpu)
 {
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
@@ -1289,7 +1290,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 * in a region with pagefaults disabled then we must not take the fault
 	 */
 	if (unlikely(faulthandler_disabled() || !mm)) {
-		bad_area_nosemaphore(regs, error_code, address);
+		bad_area_nosemaphore(regs, error_code, address, cpu);
 		return;
 	}

@@ -1351,7 +1352,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 			 * Fault from code in kernel from
 			 * which we do not expect faults.
 			 */
-			bad_area_nosemaphore(regs, error_code, address);
+			bad_area_nosemaphore(regs, error_code, address, cpu);
 			return;
 		}
 retry:
@@ -1367,17 +1368,17 @@ void do_user_addr_fault(struct pt_regs *regs,

 	vma = find_vma(mm, address);
 	if (unlikely(!vma)) {
-		bad_area(regs, error_code, address);
+		bad_area(regs, error_code, address, cpu);
 		return;
 	}
 	if (likely(vma->vm_start <= address))
 		goto good_area;
 	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) {
-		bad_area(regs, error_code, address);
+		bad_area(regs, error_code, address, cpu);
 		return;
 	}
 	if (unlikely(expand_stack(vma, address))) {
-		bad_area(regs, error_code, address);
+		bad_area(regs, error_code, address, cpu);
 		return;
 	}

@@ -1387,7 +1388,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 */
 good_area:
 	if (unlikely(access_error(error_code, vma))) {
-		bad_area_access_error(regs, error_code, address, vma);
+		bad_area_access_error(regs, error_code, address, vma, cpu);
 		return;
 	}

@@ -1458,7 +1459,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 			     VM_FAULT_HWPOISON_LARGE))
 			do_sigbus(regs, error_code, address, fault);
 		else if (fault & VM_FAULT_SIGSEGV)
-			bad_area_nosemaphore(regs, error_code, address);
+			bad_area_nosemaphore(regs, error_code, address, cpu);
 		else
 			BUG();
 	}
@@ -1480,7 +1481,7 @@ trace_page_fault_entries(struct pt_regs *regs, unsigned long error_code,

 static __always_inline void
 handle_page_fault(struct pt_regs *regs, unsigned long error_code,
-			      unsigned long address)
+			      unsigned long address, int cpu)
 {
 	trace_page_fault_entries(regs, error_code, address);

@@ -1489,9 +1490,9 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,

 	/* Was the fault on kernel-controlled part of the address space? */
 	if (unlikely(fault_in_kernel_space(address))) {
-		do_kern_addr_fault(regs, error_code, address);
+		do_kern_addr_fault(regs, error_code, address, cpu);
 	} else {
-		do_user_addr_fault(regs, error_code, address);
+		do_user_addr_fault(regs, error_code, address, cpu);
 		/*
 		 * User address page fault handling might have reenabled
 		 * interrupts. Fixing up all potential exit points of
@@ -1506,6 +1507,7 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
 DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
 {
 	unsigned long address = read_cr2();
+	int cpu = raw_smp_processor_id();
 	irqentry_state_t state;

 	prefetchw(&current->mm->mmap_lock);
@@ -1547,7 +1549,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
 	state = irqentry_enter(regs);

 	instrumentation_begin();
-	handle_page_fault(regs, error_code, address);
+	handle_page_fault(regs, error_code, address, cpu);
 	instrumentation_end();

 	irqentry_exit(regs, state);
--
2.37.1


2022-08-05 17:29:46

by Ira Weiny

Subject: Re: [PATCH v3] x86,mm: print likely CPU at segfault time

On Fri, Aug 05, 2022 at 04:27:40PM +0200, Borislav Petkov wrote:
> On Fri, Aug 05, 2022 at 10:16:44AM -0400, Rik van Riel wrote:
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index fad8faa29d04..c7a5bbf40367 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -769,6 +769,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
> > unsigned long address, struct task_struct *tsk)
> > {
> > const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
> > + /* This is a racy snapshot, but it's better than nothing. */
> > + int cpu = raw_smp_processor_id();
>
> Please read this in exc_page_fault() and hand it down to helpers.
>
> Alternatively, I'm being told there's a patchset in the works which
> will allow for any exception handler to pass in additional information
> downwards through an extended pt_regs. Then, saving the CPU number on
> which the handler is running would work generically everywhere.

Indeed that was part of the PKS series.[1]

I've thrown together a quick RFC with the relevant patches from that series and
Rik's code in show_signal_msg().

I'll post it shortly,
Ira

[1] https://lore.kernel.org/lkml/[email protected]/

>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>

2022-08-06 09:02:21

by Ingo Molnar

Subject: Re: [PATCH v3] x86,mm: print likely CPU at segfault time


* Ingo Molnar <[email protected]> wrote:

> > instrumentation_begin();
> > - handle_page_fault(regs, error_code, address);
> > + handle_page_fault(regs, error_code, address, cpu);
>
> Not convinced that this is a good change: this will bloat all the
> affected code by a couple of dozen instructions - for no good reason in
> the context of this patch.
>
> Boris, why should we do this? Extracting a parameter at higher levels and
> passing it down to lower levels is almost always a bad idea from a code
> generation POV, unless the majority of lower levels needs this
> information anyway (which isn't the case here).

Oh, I just got to this series in my mbox:

[RFC PATCH 0/5] Print CPU at segfault time
...
[RFC PATCH 5/5] x86/entry: Store CPU info on exception entry

With that basis, printing the segfault CPU becomes a 'free' feature.

At the cost of putting ~2 new instructions into the hotpath of every
exception though. :-/

Thanks,

Ingo

2022-08-06 09:14:06

by Ingo Molnar

Subject: Re: [PATCH v3] x86,mm: print likely CPU at segfault time


* Rik van Riel <[email protected]> wrote:

> On Fri, 5 Aug 2022 16:27:40 +0200
> Borislav Petkov <[email protected]> wrote:
>
> > On Fri, Aug 05, 2022 at 10:16:44AM -0400, Rik van Riel wrote:
> > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > > index fad8faa29d04..c7a5bbf40367 100644
> > > --- a/arch/x86/mm/fault.c
> > > +++ b/arch/x86/mm/fault.c
> > > @@ -769,6 +769,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
> > > unsigned long address, struct task_struct *tsk)
> > > {
> > > const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
> > > + /* This is a racy snapshot, but it's better than nothing. */
> > > + int cpu = raw_smp_processor_id();
> >
> > Please read this in exc_page_fault() and hand it down to helpers.
>
> Below is the change that implements your suggestion.
>
> If there is consensus among the x86 maintainers that this is
> desirable, I am more than happy to merge that change into my
> patch and resubmit v4.
>
> I don't have a strong opinion either way.
>
> ---8<---
>
> From 444f8588f0edfd8586a86e85191ad8fa8b7c6a6c Mon Sep 17 00:00:00 2001
> From: Rik van Riel <[email protected]>
> Date: Fri, 5 Aug 2022 10:32:11 -0400
> Subject: [PATCH 2/2] x86,mm: get CPU number for segfault printk before
> enabling preemption
>
> Get the CPU number for the segfault printk earlier in the page fault
> handler, before preemption is enabled.
>
> Suggested-by: Borislav Petkov <[email protected]>
> Signed-off-by: Rik van Riel <[email protected]>
> ---
> arch/x86/mm/fault.c | 58 +++++++++++++++++++++++----------------------
> 1 file changed, 30 insertions(+), 28 deletions(-)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index c7a5bbf40367..bd06b22826b2 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -766,11 +766,9 @@ kernelmode_fixup_or_oops(struct pt_regs *regs, unsigned long error_code,
> */
> static inline void
> show_signal_msg(struct pt_regs *regs, unsigned long error_code,
> - unsigned long address, struct task_struct *tsk)
> + unsigned long address, struct task_struct *tsk, int cpu)

> __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
> - unsigned long address, u32 pkey, int si_code)
> + unsigned long address, u32 pkey, int si_code, int cpu)

> - show_signal_msg(regs, error_code, address, tsk);
> + show_signal_msg(regs, error_code, address, tsk, cpu);

> - unsigned long address)
> + unsigned long address, int cpu)

> - unsigned long address, u32 pkey, int si_code)
> + unsigned long address, u32 pkey, int si_code, int cpu)

> - __bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
> + __bad_area_nosemaphore(regs, error_code, address, pkey, si_code, cpu);

> -bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address)
> +bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address,
> + int cpu)
> {
> - __bad_area(regs, error_code, address, 0, SEGV_MAPERR);
> + __bad_area(regs, error_code, address, 0, SEGV_MAPERR, cpu);

> bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
> - unsigned long address, struct vm_area_struct *vma)
> + unsigned long address, struct vm_area_struct *vma,
> + int cpu)

> - __bad_area(regs, error_code, address, pkey, SEGV_PKUERR);
> + __bad_area(regs, error_code, address, pkey, SEGV_PKUERR, cpu);
> } else {
> - __bad_area(regs, error_code, address, 0, SEGV_ACCERR);
> + __bad_area(regs, error_code, address, 0, SEGV_ACCERR, cpu);

> static void
> do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
> - unsigned long address)
> + unsigned long address, int cpu)

> - bad_area_nosemaphore(regs, hw_error_code, address);
> + bad_area_nosemaphore(regs, hw_error_code, address, cpu);

> void do_user_addr_fault(struct pt_regs *regs,
> unsigned long error_code,
> - unsigned long address)
> + unsigned long address,
> + int cpu)

> - bad_area_nosemaphore(regs, error_code, address);
> + bad_area_nosemaphore(regs, error_code, address, cpu);

> - bad_area_nosemaphore(regs, error_code, address);
> + bad_area_nosemaphore(regs, error_code, address, cpu);

> - bad_area(regs, error_code, address);
> + bad_area(regs, error_code, address, cpu);

> - bad_area(regs, error_code, address);
> + bad_area(regs, error_code, address, cpu);

> - bad_area(regs, error_code, address);
> + bad_area(regs, error_code, address, cpu);

> - bad_area_access_error(regs, error_code, address, vma);
> + bad_area_access_error(regs, error_code, address, vma, cpu);

> - bad_area_nosemaphore(regs, error_code, address);
> + bad_area_nosemaphore(regs, error_code, address, cpu);

> - unsigned long address)
> + unsigned long address, int cpu)

> - do_kern_addr_fault(regs, error_code, address);
> + do_kern_addr_fault(regs, error_code, address, cpu);

> - do_user_addr_fault(regs, error_code, address);
> + do_user_addr_fault(regs, error_code, address, cpu);

> DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
> {
> unsigned long address = read_cr2();
> + int cpu = raw_smp_processor_id();
> irqentry_state_t state;
>
> prefetchw(&current->mm->mmap_lock);
> @@ -1547,7 +1549,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
> state = irqentry_enter(regs);
>
> instrumentation_begin();
> - handle_page_fault(regs, error_code, address);
> + handle_page_fault(regs, error_code, address, cpu);

Not convinced that this is a good change: this will bloat all the affected
code by a couple of dozen instructions - for no good reason in the context
of this patch.

Boris, why should we do this? Extracting a parameter at higher levels and
passing it down to lower levels is almost always a bad idea from a code
generation POV, unless the majority of lower levels needs this information
anyway (which isn't the case here).

Thanks,

Ingo

2022-08-24 11:21:10

by tip-bot2 for Rik van Riel

Subject: [tip: x86/cpu] x86/mm: Print likely CPU at segfault time

The following commit has been merged into the x86/cpu branch of tip:

Commit-ID: c926087eb38520b268515ae1a842db6db62554cc
Gitweb: https://git.kernel.org/tip/c926087eb38520b268515ae1a842db6db62554cc
Author: Rik van Riel <[email protected]>
AuthorDate: Fri, 05 Aug 2022 10:16:44 -04:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Wed, 24 Aug 2022 12:48:05 +02:00

x86/mm: Print likely CPU at segfault time

In a large enough fleet of computers, it is common to have a few bad CPUs.
Those can often be identified by seeing that some commonly run kernel code,
which runs fine everywhere else, keeps crashing on the same CPU core on one
particular bad system.

However, the failure modes in CPUs that have gone bad over the years are
often oddly specific, and the only bad behavior seen might be segfaults
in programs like bash, python, or various system daemons that run fine
everywhere else.

Add a printk() to show_signal_msg() to print the CPU, core, and socket
at segfault time.

This is not perfect, since the task might get rescheduled on another
CPU between when the fault hit, and when the message is printed, but in
practice this has been good enough to help people identify several bad
CPU cores.

For example:

segfault[1349]: segfault at 0 ip 000000000040113a sp 00007ffc6d32e360 error 4 in \
segfault[401000+1000] likely on CPU 0 (core 0, socket 0)

This printk can be controlled through /proc/sys/debug/exception-trace.

[ bp: Massage a bit, add "likely" to the printed line to denote that
the CPU number is not always reliable. ]

Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/mm/fault.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fa71a5d..a498ae1 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -769,6 +769,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 		unsigned long address, struct task_struct *tsk)
 {
 	const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
+	/* This is a racy snapshot, but it's better than nothing. */
+	int cpu = raw_smp_processor_id();

 	if (!unhandled_signal(tsk, SIGSEGV))
 		return;
@@ -782,6 +784,14 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,

 	print_vma_addr(KERN_CONT " in ", regs->ip);

+	/*
+	 * Dump the likely CPU where the fatal segfault happened.
+	 * This can help identify faulty hardware.
+	 */
+	printk(KERN_CONT " likely on CPU %d (core %d, socket %d)", cpu,
+	       topology_core_id(cpu), topology_physical_package_id(cpu));
+
+
 	printk(KERN_CONT "\n");

 	show_opcodes(regs, loglvl);
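
To cross-check the core and socket numbers from such a log line on a live
system, the sketch below reads the corresponding IDs from sysfs in user space.
It is not part of the commit. It assumes the usual
/sys/devices/system/cpu/cpuN/topology/ files core_id and physical_package_id
are present, and the helper names read_topology_id() and print_cpu_location()
are hypothetical.

```c
#include <stdio.h>

/*
 * Read one integer from /sys/devices/system/cpu/cpu<N>/topology/<name>.
 * Returns the value read, or -1 if the file cannot be opened or parsed.
 * (Assumption: callers treat -1 as "unknown".)
 */
static int read_topology_id(int cpu, const char *name)
{
	char path[128];
	int val = -1;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/topology/%s", cpu, name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/* Print a CPU's location in the same shape as the kernel message suffix. */
static void print_cpu_location(int cpu)
{
	printf("CPU %d (core %d, socket %d)\n", cpu,
	       read_topology_id(cpu, "core_id"),
	       read_topology_id(cpu, "physical_package_id"));
}
```

Calling print_cpu_location() with the CPU number taken from the log line
should print matching core and socket values on the machine that logged it,
provided the topology has not changed since.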