2015-04-30 13:36:06

by George Beshers

Subject: [PATCH 1/2] UV: NMI: insert per_cpu accessor function on uv_hub_nmi.

UV: NMI: insert this_cpu_read accessor function on uv_hub_nmi.

UV NMI was accidentally broken by this commit:

commit e16321709c8270f9803bbfdb51e5e02235078c7f
Author: Christoph Lameter <[email protected]>
Date: Sun Aug 17 12:30:41 2014 -0500

This patch inserts this_cpu_read() when accessing the
PER_CPU uv_cpu_nmi variable.

Signed-off-by: George Beshers <[email protected]>
Acked-by: Mike Travis <[email protected]>
Cc: Russ Anderson <[email protected]>
Cc: Dimitri Sivanich <[email protected]>
Cc: Hedi Berriche <[email protected]>
Cc: Alex Thorlton <[email protected]>
Cc: Christoph Lameter <[email protected]>

diff --git a/arch/x86/include/asm/uv/uv_hub.h b/arch/x86/include/asm/uv/uv_hub.h
index a00ad8f..ea707478 100644
--- a/arch/x86/include/asm/uv/uv_hub.h
+++ b/arch/x86/include/asm/uv/uv_hub.h
@@ -609,7 +609,7 @@ struct uv_cpu_nmi_s {

DECLARE_PER_CPU(struct uv_cpu_nmi_s, uv_cpu_nmi);

-#define uv_hub_nmi (uv_cpu_nmi.hub)
+#define uv_hub_nmi this_cpu_read(uv_cpu_nmi.hub)
#define uv_cpu_nmi_per(cpu) (per_cpu(uv_cpu_nmi, cpu))
#define uv_hub_nmi_per(cpu) (uv_cpu_nmi_per(cpu).hub)
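
For readers without the tree at hand, here is a minimal userspace sketch of why the accessor matters. It is a model, not kernel code: every `model_`/`MODEL_` name is made up for illustration, and it assumes the breakage is that the bare symbol refers to something other than the running CPU's copy (modeled here as an uninitialized template). The changelog above does not spell out the failure mode, so treat that as an assumption.

```c
#define MODEL_NR_CPUS 4

struct model_hub_nmi { int hub_id; };
struct model_cpu_nmi { struct model_hub_nmi *hub; };

/* Stand-in for the bare per-CPU symbol: never initialized per CPU. */
static struct model_cpu_nmi model_template;

/* Stand-in for the per-CPU copies that get set up at boot. */
static struct model_cpu_nmi model_copies[MODEL_NR_CPUS];
static struct model_hub_nmi model_hubs[MODEL_NR_CPUS / 2];

/* Hypothetical "which CPU am I on" value; the kernel derives this itself. */
static int model_current_cpu;

/* Models the old macro: touches the template, not this CPU's copy. */
#define MODEL_BARE_HUB()          (model_template.hub)

/* Models this_cpu_read(uv_cpu_nmi.hub): reads the running CPU's copy. */
#define MODEL_THIS_CPU_READ_HUB() (model_copies[model_current_cpu].hub)

/* Give each CPU a hub pointer, two CPUs per hub. */
static void model_setup(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		model_hubs[cpu / 2].hub_id = cpu / 2;
		model_copies[cpu].hub = &model_hubs[cpu / 2];
	}
}

/* Hub id seen through the accessor, or -1 if the pointer is NULL. */
static int model_hub_id_via_accessor(int cpu)
{
	model_current_cpu = cpu;
	struct model_hub_nmi *hub = MODEL_THIS_CPU_READ_HUB();
	return hub ? hub->hub_id : -1;
}

/* Hub id seen through the bare symbol, or -1 if the pointer is NULL. */
static int model_hub_id_via_bare_symbol(int cpu)
{
	model_current_cpu = cpu;
	struct model_hub_nmi *hub = MODEL_BARE_HUB();
	return hub ? hub->hub_id : -1;
}
```

After `model_setup()`, the accessor variant returns the hub that belongs to the CPU passed in, while the bare-symbol variant never sees a valid hub pointer in this model, whichever CPU is "running".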


2015-04-30 13:36:04

by George Beshers

Subject: [PATCH 2/2] UV: NMI: simple dump failover if kdump fails

UV: NMI: simple dump failover if kdump fails

The ability to trigger a kdump using the system NMI command
was added by

commit 12ba6c990fab50fe568f3ad8715e81e356552428
Author: Mike Travis <[email protected]>
Date: Mon Sep 23 16:25:03 2013 -0500

When kdump is works it is preferable to the set of backtraces
that dump provides; however a number of things can go wrong and
the backtraces are much more useful than nothing.

The two most common reason for kdump not to be available are
a problem during boot or the kdump daemon fails to start.
In either case the call to crash_kexec() returns unexpectedly;
when this happens uv_nmi_kdump() also returns with the
uv_nmi_kexec_failed flag set. This condition now causes a
standard dump.

One other minor change is that dump now generates both the
show_regs() stack trace and the uv_nmi_dump_ip{,_hdr} information
that is generated by the "ips" action; the additional information
has proved to be useful.

Signed-off-by: George Beshers <[email protected]>
Acked-by: Mike Travis <[email protected]>
Cc: Russ Anderson <[email protected]>
Cc: Dimitri Sivanich <[email protected]>
Cc: Hedi Berriche <[email protected]>
Cc: Alex Thorlton <[email protected]>
Cc: Christoph Lameter <[email protected]>

diff --git a/arch/x86/platform/uv/uv_nmi.c b/arch/x86/platform/uv/uv_nmi.c
index 7488caf..89f37c7 100644
--- a/arch/x86/platform/uv/uv_nmi.c
+++ b/arch/x86/platform/uv/uv_nmi.c
@@ -391,23 +391,27 @@ static void uv_nmi_dump_cpu_ip(int cpu, struct pt_regs *regs)
printk_address(regs->ip);
}

-/* Dump this cpu's state */
+/*
+ * Dump this cpu's state. Note that "kdump" only happens
+ * when crash_kexec() has failed and we are providing the user
+ * a standard dump instead.
+ */
static void uv_nmi_dump_state_cpu(int cpu, struct pt_regs *regs)
{
const char *dots = " ................................. ";

- if (uv_nmi_action_is("ips")) {
- if (cpu == 0)
- uv_nmi_dump_cpu_ip_hdr();
-
- if (current->pid != 0)
- uv_nmi_dump_cpu_ip(cpu, regs);
-
- } else if (uv_nmi_action_is("dump")) {
+ if (uv_nmi_action_is("dump") || uv_nmi_action_is("kdump")) {
printk(KERN_DEFAULT
"UV:%sNMI process trace for CPU %d\n", dots, cpu);
show_regs(regs);
}
+
+ if (cpu == 0)
+ uv_nmi_dump_cpu_ip_hdr();
+
+ if (current->pid != 0)
+ uv_nmi_dump_cpu_ip(cpu, regs);
+
this_cpu_write(uv_cpu_nmi.state, UV_NMI_STATE_DUMP_DONE);
}

@@ -492,8 +496,9 @@ static void uv_nmi_touch_watchdogs(void)
touch_nmi_watchdog();
}

-#if defined(CONFIG_KEXEC)
static atomic_t uv_nmi_kexec_failed;
+
+#if defined(CONFIG_KEXEC)
static void uv_nmi_kdump(int cpu, int master, struct pt_regs *regs)
{
/* Call crash to dump system state */
@@ -502,9 +507,9 @@ static void uv_nmi_kdump(int cpu, int master, struct pt_regs *regs)
crash_kexec(regs);

pr_emerg("UV: crash_kexec unexpectedly returned, ");
+ atomic_set(&uv_nmi_kexec_failed, 1);
if (!kexec_crash_image) {
pr_cont("crash kernel not loaded\n");
- atomic_set(&uv_nmi_kexec_failed, 1);
uv_nmi_sync_exit(1);
return;
}
@@ -524,6 +529,7 @@ static inline void uv_nmi_kdump(int cpu, int master, struct pt_regs *regs)
{
if (master)
pr_err("UV: NMI kdump: KEXEC not supported in this kernel\n");
+ atomic_set(&uv_nmi_kexec_failed, 1);
}
#endif /* !CONFIG_KEXEC */

@@ -620,7 +626,8 @@ int uv_handle_nmi(unsigned int reason, struct pt_regs *regs)
uv_nmi_wait(master);

/* Dump state of each cpu */
- if (uv_nmi_action_is("ips") || uv_nmi_action_is("dump"))
+ if (uv_nmi_action_is("ips") || uv_nmi_action_is("dump") ||
+ atomic_read(&uv_nmi_kexec_failed) == 1)
uv_nmi_dump_state(cpu, regs, master);

/* Call KGDB/KDB if enabled */
@@ -640,6 +647,7 @@ int uv_handle_nmi(unsigned int reason, struct pt_regs *regs)
atomic_set(&uv_nmi_cpus_in_nmi, -1);
atomic_set(&uv_nmi_cpu, -1);
atomic_set(&uv_in_nmi, 0);
+ atomic_set(&uv_nmi_kexec_failed, 0);
}

uv_nmi_touch_watchdogs();
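
The failover path in this patch can be sketched in userspace roughly as follows. This is a single-CPU model under stated assumptions, not the kernel code: all `model_` names are hypothetical, there is no master/slave synchronization, and a "successful" crash_kexec() is represented by a function returning true, whereas the real call does not return at all.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins; none of these are the kernel's symbols. */
static atomic_int model_kexec_failed;
static const char *model_action = "kdump";
static bool model_crash_image_loaded;	/* false: crash kernel never loaded */
static int model_dumps_run;		/* counts fallback/standard dumps */

static bool model_action_is(const char *name)
{
	return strcmp(model_action, name) == 0;
}

/*
 * Models uv_nmi_kdump(). Returns true if the model "jumped" to the
 * crash kernel; otherwise records the failure and returns false.
 */
static bool model_kdump(void)
{
	if (model_crash_image_loaded)
		return true;
	atomic_store(&model_kexec_failed, 1);
	return false;
}

/* Models the part of uv_handle_nmi() that this patch touches. */
static void model_handle_nmi(void)
{
	if (model_action_is("kdump") && model_kdump())
		return;

	/* Dump on request, or as the fallback when kdump came back. */
	if (model_action_is("ips") || model_action_is("dump") ||
	    atomic_load(&model_kexec_failed) == 1)
		model_dumps_run++;

	/* Cleared on the way out so the next NMI tries kdump again. */
	atomic_store(&model_kexec_failed, 0);
}
```

With `model_action` set to "kdump" and no crash image loaded, one call to `model_handle_nmi()` runs exactly one dump and leaves the flag cleared. With an action that is neither "ips", "dump" nor "kdump", no dump runs.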

2015-05-01 07:21:35

by Ingo Molnar

Subject: Re: [PATCH 1/2] UV: NMI: insert per_cpu accessor function on uv_hub_nmi.


* George Beshers <[email protected]> wrote:

> UV: NMI: insert this_cpu_read accessor function on uv_hub_nmi.
>
> UV NMI was accidentally broken by this commit:

Broken in what way?

> commit e16321709c8270f9803bbfdb51e5e02235078c7f
> Author: Christoph Lameter <[email protected]>
> Date: Sun Aug 17 12:30:41 2014 -0500

That's a rather old patch. Was no upstream kernel tested since ~August
last year on UV hardware, or is the bug sporadic? The changelog does
not tell us.

> This patch inserts this_cpu_read() when accessing the PER_CPU
> uv_cpu_nmi variable.

Why? What problem does it solve?

Thanks,

Ingo

2015-05-01 07:27:41

by Ingo Molnar

Subject: Re: [PATCH 2/2] UV: NMI: simple dump failover if kdump fails


* George Beshers <[email protected]> wrote:

> UV: NMI: simple dump failover if kdump fails
>
> The ability to trigger a kdump using the system NMI command
> was added by
>
> commit 12ba6c990fab50fe568f3ad8715e81e356552428
> Author: Mike Travis <[email protected]>
> Date: Mon Sep 23 16:25:03 2013 -0500
>
> When kdump is works it is preferable to the set of backtraces

(spelling error)

> that dump provides; however a number of things can go wrong and
> the backtraces are much more useful than nothing.
>
> The two most common reason for kdump not to be available are

(spelling error)

> a problem during boot or the kdump daemon fails to start.

(spelling error)

> In either case the call to crash_kexec() returns unexpectedly;
> when this happens uv_nmi_kdump() also returns with the
> uv_nmi_kexec_failed flag set. This condition now causes a
> standard dump.

'standard dump' == printing an NMI backtrace on all CPUs?

> One other minor change is that dump now generates both the
> show_regs() stack trace and the uv_nmi_dump_ip{,_hdr} information
> that is generated by the "ips" action; the additional information
> has proved to be useful.

Looks like a useful change.

> -/* Dump this cpu's state */
> +/*
> + * Dump this cpu's state. Note that "kdump" only happens

s/CPU's

> + * when crash_kexec() has failed and we are providing the user
> + * a standard dump instead.

So this sentence does not parse for me: kdump only happens if kdump
fails??

> + */
> static void uv_nmi_dump_state_cpu(int cpu, struct pt_regs *regs)
> {
> const char *dots = " ................................. ";
>
> - if (uv_nmi_action_is("ips")) {
> - if (cpu == 0)
> - uv_nmi_dump_cpu_ip_hdr();
> -
> - if (current->pid != 0)
> - uv_nmi_dump_cpu_ip(cpu, regs);
> -
> - } else if (uv_nmi_action_is("dump")) {
> + if (uv_nmi_action_is("dump") || uv_nmi_action_is("kdump")) {
> printk(KERN_DEFAULT
> "UV:%sNMI process trace for CPU %d\n", dots, cpu);

pr_info().

> show_regs(regs);
> }
> +
> + if (cpu == 0)
> + uv_nmi_dump_cpu_ip_hdr();
> +
> + if (current->pid != 0)
> + uv_nmi_dump_cpu_ip(cpu, regs);

What is an 'ip header'? If it's not an Internet IP address then it's
probably horribly named.

> +
> +#if defined(CONFIG_KEXEC)

#ifdef

> @@ -502,9 +507,9 @@ static void uv_nmi_kdump(int cpu, int master, struct pt_regs *regs)
> crash_kexec(regs);
>
> pr_emerg("UV: crash_kexec unexpectedly returned, ");
> + atomic_set(&uv_nmi_kexec_failed, 1);

Why is this flag an atomic variable?

Thanks,

Ingo

2015-05-01 16:33:12

by Mike Travis

Subject: Re: [PATCH 2/2] UV: NMI: simple dump failover if kdump fails



On 5/1/2015 12:27 AM, Ingo Molnar wrote:
>
> * George Beshers <[email protected]> wrote:
>
>> UV: NMI: simple dump failover if kdump fails
>>
>> The ability to trigger a kdump using the system NMI command
>> was added by
>>
>> commit 12ba6c990fab50fe568f3ad8715e81e356552428
>> Author: Mike Travis <[email protected]>
>> Date: Mon Sep 23 16:25:03 2013 -0500
>>
>> When kdump is works it is preferable to the set of backtraces
>
> (spelling error)
>
>> that dump provides; however a number of things can go wrong and
>> the backtraces are much more useful than nothing.
>>
>> The two most common reason for kdump not to be available are
>
> (spelling error)
>
>> a problem during boot or the kdump daemon fails to start.
>
> (spelling error)
>
>> In either case the call to crash_kexec() returns unexpectedly;
>> when this happens uv_nmi_kdump() also returns with the
>> uv_nmi_kexec_failed flag set. This condition now causes a
>> standard dump.
>
> 'standard dump' == printing an NMI backtrace on all CPUs?

Yes.
>
>> One other minor change is that dump now generates both the
>> show_regs() stack trace and the uv_nmi_dump_ip{,_hdr} information
>> that is generated by the "ips" action; the additional information
>> has proved to be useful.
>
> Looks like a useful change.
>
>> -/* Dump this cpu's state */
>> +/*
>> + * Dump this cpu's state. Note that "kdump" only happens
>
> s/CPU's
>
>> + * when crash_kexec() has failed and we are providing the user
>> + * a standard dump instead.
>
> So this sentence does not parse for me: kdump only happens if kdump
> fails??
>
>> + */
>> static void uv_nmi_dump_state_cpu(int cpu, struct pt_regs *regs)
>> {
>> const char *dots = " ................................. ";
>>
>> - if (uv_nmi_action_is("ips")) {
>> - if (cpu == 0)
>> - uv_nmi_dump_cpu_ip_hdr();
>> -
>> - if (current->pid != 0)
>> - uv_nmi_dump_cpu_ip(cpu, regs);
>> -
>> - } else if (uv_nmi_action_is("dump")) {
>> + if (uv_nmi_action_is("dump") || uv_nmi_action_is("kdump")) {
>> printk(KERN_DEFAULT
>> "UV:%sNMI process trace for CPU %d\n", dots, cpu);
>
> pr_info().
>
>> show_regs(regs);
>> }
>> +
>> + if (cpu == 0)
>> + uv_nmi_dump_cpu_ip_hdr();
>> +
>> + if (current->pid != 0)
>> + uv_nmi_dump_cpu_ip(cpu, regs);
>
> What is an 'ip header'? If it's not an Internet IP address then it's
> probably horribly named.

The IP or Instruction Pointer register. The "show ips" is sort of a
simplified ps showing the processes on non-idle CPUs. We'd need to
blame Intel for that name... :)

Currently you can have either the IPs or the stack dump, but both
contain useful info. So George's idea was if you asked for the dump
you'd get both, if you asked only for IPs, you'd just get them.
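
As a rough illustration of that before/after behaviour, here is a small userspace sketch of what uv_nmi_dump_state_cpu() prints per action. The `model_`/`MODEL_` names are made up for this example, and the model ignores the cpu == 0 header and the current->pid != 0 check, so it only captures which kinds of output are produced.

```c
#include <string.h>

/* Hypothetical output kinds: register/stack dump and the IP line. */
enum {
	MODEL_OUT_REGS = 1 << 0,	/* show_regs() trace */
	MODEL_OUT_IP   = 1 << 1,	/* uv_nmi_dump_cpu_ip() line */
};

/* Before the patch: either the IP line or the register dump. */
static int model_outputs_before(const char *action)
{
	if (strcmp(action, "ips") == 0)
		return MODEL_OUT_IP;
	if (strcmp(action, "dump") == 0)
		return MODEL_OUT_REGS;
	return 0;
}

/* After: the IP line always, plus show_regs() for "dump" and "kdump". */
static int model_outputs_after(const char *action)
{
	int out = MODEL_OUT_IP;

	if (strcmp(action, "dump") == 0 || strcmp(action, "kdump") == 0)
		out |= MODEL_OUT_REGS;
	return out;
}
```

In this model "ips" yields only the IP line both before and after, while "dump" goes from the register dump alone to both outputs, and "kdump" (reached only after a failed crash_kexec()) also yields both.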

>
>> +
>> +#if defined(CONFIG_KEXEC)
>
> #ifdef
>
>> @@ -502,9 +507,9 @@ static void uv_nmi_kdump(int cpu, int master, struct pt_regs *regs)
>> crash_kexec(regs);
>>
>> pr_emerg("UV: crash_kexec unexpectedly returned, ");
>> + atomic_set(&uv_nmi_kexec_failed, 1);
>
> Why is this flag an atomic variable?
>
> Thanks,
>
> Ingo
>

2015-05-01 16:42:54

by Ingo Molnar

Subject: Re: [PATCH 2/2] UV: NMI: simple dump failover if kdump fails


* Mike Travis <[email protected]> wrote:

>
>
> On 5/1/2015 12:27 AM, Ingo Molnar wrote:
> >
> > * George Beshers <[email protected]> wrote:
> >
> >> UV: NMI: simple dump failover if kdump fails
> >>
> >> The ability to trigger a kdump using the system NMI command
> >> was added by
> >>
> >> commit 12ba6c990fab50fe568f3ad8715e81e356552428
> >> Author: Mike Travis <[email protected]>
> >> Date: Mon Sep 23 16:25:03 2013 -0500
> >>
> >> When kdump is works it is preferable to the set of backtraces
> >
> > (spelling error)
> >
> >> that dump provides; however a number of things can go wrong and
> >> the backtraces are much more useful than nothing.
> >>
> >> The two most common reason for kdump not to be available are
> >
> > (spelling error)
> >
> >> a problem during boot or the kdump daemon fails to start.
> >
> > (spelling error)
> >
> >> In either case the call to crash_kexec() returns unexpectedly;
> >> when this happens uv_nmi_kdump() also returns with the
> >> uv_nmi_kexec_failed flag set. This condition now causes a
> >> standard dump.
> >
> > 'standard dump' == printing an NMI backtrace on all CPUs?
>
> Yes.
> >
> >> One other minor change is that dump now generates both the
> >> show_regs() stack trace and the uv_nmi_dump_ip{,_hdr} information
> >> that is generated by the "ips" action; the additional information
> >> has proved to be useful.
> >
> > Looks like a useful change.
> >
> >> -/* Dump this cpu's state */
> >> +/*
> >> + * Dump this cpu's state. Note that "kdump" only happens
> >
> > s/CPU's
> >
> >> + * when crash_kexec() has failed and we are providing the user
> >> + * a standard dump instead.
> >
> > So this sentence does not parse for me: kdump only happens if kdump
> > fails??
> >
> >> + */
> >> static void uv_nmi_dump_state_cpu(int cpu, struct pt_regs *regs)
> >> {
> >> const char *dots = " ................................. ";
> >>
> >> - if (uv_nmi_action_is("ips")) {
> >> - if (cpu == 0)
> >> - uv_nmi_dump_cpu_ip_hdr();
> >> -
> >> - if (current->pid != 0)
> >> - uv_nmi_dump_cpu_ip(cpu, regs);
> >> -
> >> - } else if (uv_nmi_action_is("dump")) {
> >> + if (uv_nmi_action_is("dump") || uv_nmi_action_is("kdump")) {
> >> printk(KERN_DEFAULT
> >> "UV:%sNMI process trace for CPU %d\n", dots, cpu);
> >
> > pr_info().
> >
> >> show_regs(regs);
> >> }
> >> +
> >> + if (cpu == 0)
> >> + uv_nmi_dump_cpu_ip_hdr();
> >> +
> >> + if (current->pid != 0)
> >> + uv_nmi_dump_cpu_ip(cpu, regs);
> >
> > What is an 'ip header'? If it's not an Internet IP address then it's
> > probably horribly named.
>
> The IP or Instruction Pointer register. The "show ips" is sort of a
> simplified ps showing the processes on non-idle CPUs. We'd need to
> blame Intel for that name... :)

Yes, but this is 64-bit code, why not call it RIP? :-)

that's kind of not unambiguous either, but at least in technical
discussions it should be ;-)

So what I found confusing is the ip_hdr - that sounds very network-ish
...

> Currently you can have either the IPs or the stack dump, but both
> contain useful info. So George's idea was if you asked for the dump
> you'd get both, if you asked only for IPs, you'd just get them.

Yeah, I'm not against the idea at all. The patch needs a bit of a face
lift and then it looks good to me.

Thanks,

Ingo