2013-03-29 07:22:19

by Zhenzhong Duan

Subject: [PATCH] xen: Don't call arch_trigger_all_cpu_backtrace in dom0(pvm)

NMI isn't supported in dom0, so fall back to the generic all-CPU backtrace code.

Without this fix, sysrq+l shows no backtrace on an xAPIC system.
On an x2APIC-enabled system, it triggers the NULL pointer dereference below.

SysRq : Show backtrace of all active CPUs
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8125e3cb>] memcpy+0xb/0x120
Call Trace:
[<ffffffff81039633>] ? __x2apic_send_IPI_mask+0x73/0x160
[<ffffffff8103973e>] x2apic_send_IPI_all+0x1e/0x20
[<ffffffff8103498c>] arch_trigger_all_cpu_backtrace+0x6c/0xb0
[<ffffffff81501be4>] ? _raw_spin_lock_irqsave+0x34/0x50
[<ffffffff8131654e>] sysrq_handle_showallcpus+0xe/0x10
[<ffffffff8131616d>] __handle_sysrq+0x7d/0x140
[<ffffffff81316230>] ? __handle_sysrq+0x140/0x140
[<ffffffff81316287>] write_sysrq_trigger+0x57/0x60
[<ffffffff811ca996>] proc_reg_write+0x86/0xc0
[<ffffffff8116dd8e>] vfs_write+0xce/0x190
[<ffffffff8116e3e5>] sys_write+0x55/0x90
[<ffffffff8150a242>] system_call_fastpath+0x16/0x1b

Signed-off-by: Zhenzhong Duan <[email protected]>
Tested-by: Tamon Shiose <[email protected]>
---
include/linux/nmi.h | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index db50840..b845757 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -32,6 +32,8 @@ static inline void touch_nmi_watchdog(void)
 #ifdef arch_trigger_all_cpu_backtrace
 static inline bool trigger_all_cpu_backtrace(void)
 {
+	if (xen_domain())
+		return false;
 	arch_trigger_all_cpu_backtrace();
 
 	return true;
--
1.7.3
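
The fallback that this patch relies on can be sketched in plain C. This is a user-space model under stated assumptions, not kernel code: every name ending in `_model` is hypothetical, the two counters stand in for the NMI path and the generic backtrace path, and the caller is assumed to fall back whenever the helper returns false, as the commit message describes.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of this is kernel API. */
static bool running_on_xen;      /* models the result of xen_domain() */
static int nmi_backtraces;       /* requests served by the NMI path */
static int fallback_backtraces;  /* requests served by the generic path */

static bool xen_domain_model(void)
{
    return running_on_xen;
}

/* Models arch_trigger_all_cpu_backtrace(): the NMI broadcast. */
static void arch_trigger_model(void)
{
    nmi_backtraces++;
}

/* Mirrors the patched helper: decline the NMI path under Xen. */
static bool trigger_all_cpu_backtrace_model(void)
{
    if (xen_domain_model())
        return false;
    arch_trigger_model();
    return true;
}

/* Assumed caller behavior: use the generic code when the helper declines. */
static void sysrq_showallcpus_model(void)
{
    if (!trigger_all_cpu_backtrace_model())
        fallback_backtraces++;
}
```

With `running_on_xen` set, a call to `sysrq_showallcpus_model()` never reaches the NMI path and only increments the fallback counter.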


2013-03-29 13:46:45

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH] xen: Don't call arch_trigger_all_cpu_backtrace in dom0(pvm)

On Fri, Mar 29, 2013 at 03:22:38PM +0800, Zhenzhong Duan wrote:
> NMI isn't supported in dom0, so fall back to the generic all-CPU backtrace code.
>
> Without this fix, sysrq+l shows no backtrace on an xAPIC system.
> On an x2APIC-enabled system, it triggers the NULL pointer dereference below.

Why would x2APIC or xAPIC make a difference here? The Linux dom0
is not fiddling with the APIC - that is the hypervisor's job.

Can you explain to me why x2apic_send_IPI_mask is even set? Wouldn't
the Xen version of send_IPI be present? (See xen_smp_ops)

Perhaps it is missing an override for send_IPI_all?

>
> SysRq : Show backtrace of all active CPUs
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<ffffffff8125e3cb>] memcpy+0xb/0x120
> Call Trace:
> [<ffffffff81039633>] ? __x2apic_send_IPI_mask+0x73/0x160
> [<ffffffff8103973e>] x2apic_send_IPI_all+0x1e/0x20
> [<ffffffff8103498c>] arch_trigger_all_cpu_backtrace+0x6c/0xb0
> [<ffffffff81501be4>] ? _raw_spin_lock_irqsave+0x34/0x50
> [<ffffffff8131654e>] sysrq_handle_showallcpus+0xe/0x10
> [<ffffffff8131616d>] __handle_sysrq+0x7d/0x140
> [<ffffffff81316230>] ? __handle_sysrq+0x140/0x140
> [<ffffffff81316287>] write_sysrq_trigger+0x57/0x60
> [<ffffffff811ca996>] proc_reg_write+0x86/0xc0
> [<ffffffff8116dd8e>] vfs_write+0xce/0x190
> [<ffffffff8116e3e5>] sys_write+0x55/0x90
> [<ffffffff8150a242>] system_call_fastpath+0x16/0x1b
>
> Signed-off-by: Zhenzhong Duan <[email protected]>
> Tested-by: Tamon Shiose <[email protected]>
> ---
> include/linux/nmi.h | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index db50840..b845757 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -32,6 +32,8 @@ static inline void touch_nmi_watchdog(void)
>  #ifdef arch_trigger_all_cpu_backtrace
>  static inline bool trigger_all_cpu_backtrace(void)
>  {
> +	if (xen_domain())
> +		return false;
>  	arch_trigger_all_cpu_backtrace();
>  
>  	return true;
> --
> 1.7.3
>

2013-04-01 05:26:12

by Zhenzhong Duan

Subject: Re: [PATCH] xen: Don't call arch_trigger_all_cpu_backtrace in dom0(pvm)


On 2013-03-29 21:46, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 29, 2013 at 03:22:38PM +0800, Zhenzhong Duan wrote:
>> NMI isn't supported in dom0, so fall back to the generic all-CPU backtrace code.
>>
>> Without this fix, sysrq+l shows no backtrace on an xAPIC system.
>> On an x2APIC-enabled system, it triggers the NULL pointer dereference below.
> Why would the x2APIC or xAPIC make a difference here? The Linux dom0
> is not fiddling with the APIC - that is the hypervisor job.
On an x2APIC-enabled system, the dom0 kernel sets the apic pointer to
apic_x2apic_cluster or apic_x2apic_phys.
When sending the NMI, apic->send_IPI_all copies a cpumask that isn't initialized.
On an xAPIC system, apic->send_IPI_all is xen_send_IPI_all; this function
does nothing for the NMI vector, so no backtrace is shown.
> Can you explain to me why x2apic_send_IPI_mask is even set? Wouldn't
> the Xen version of send_IPI be present? (See xen_smp_ops)
It's overwritten by the x2APIC initialization.
The problem is that even without the overwrite, as on an xAPIC system,
xen_send_IPI_all doesn't work for the NMI vector.
zduan
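
The two failure modes described in this reply can be modeled with a small function-pointer table. This is a hedged user-space sketch, not the kernel's `struct apic`: all `_model` names, `current_apic`, and the result enum are hypothetical, and the x2APIC hook reports the missing cpumask instead of dereferencing it.

```c
#include <stddef.h>

#define NMI_VECTOR_MODEL 2  /* the x86 NMI vector number */

/* What happened to a broadcast IPI request in this model. */
enum ipi_result {
    IPI_SENT,      /* delivered to the other CPUs */
    IPI_IGNORED,   /* silently dropped, so no backtrace appears */
    IPI_NULL_MASK  /* the real code would copy from a NULL cpumask */
};

/* Hypothetical stand-in for the kernel's APIC operations table. */
struct apic_model {
    const char *name;
    enum ipi_result (*send_IPI_all)(int vector);
};

/* Models the cpumask that, per the thread, is not initialized in dom0. */
static const unsigned long *x2apic_ipi_mask_model = NULL;

/* Xen hook: per the thread, it does nothing for the NMI vector. */
static enum ipi_result xen_send_ipi_all_model(int vector)
{
    return vector == NMI_VECTOR_MODEL ? IPI_IGNORED : IPI_SENT;
}

/* x2APIC hook: needs the cpumask first; report instead of crashing. */
static enum ipi_result x2apic_send_ipi_all_model(int vector)
{
    (void)vector;
    return x2apic_ipi_mask_model == NULL ? IPI_NULL_MASK : IPI_SENT;
}

static struct apic_model xen_apic_model = { "xen", xen_send_ipi_all_model };
static struct apic_model x2apic_model = { "x2apic", x2apic_send_ipi_all_model };

/* dom0 starts with the Xen hooks installed. */
static struct apic_model *current_apic = &xen_apic_model;

/* Models x2APIC initialization overwriting the apic pointer. */
static void x2apic_init_model(void)
{
    current_apic = &x2apic_model;
}
```

Before `x2apic_init_model()` runs, an NMI broadcast through `current_apic` is ignored (the xAPIC symptom); afterwards it hits the uninitialized mask (the x2APIC crash).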

2013-04-01 12:41:23

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH] xen: Don't call arch_trigger_all_cpu_backtrace in dom0(pvm)

On Mon, Apr 01, 2013 at 01:26:34PM +0800, Zhenzhong Duan wrote:
>
> On 2013-03-29 21:46, Konrad Rzeszutek Wilk wrote:
> >On Fri, Mar 29, 2013 at 03:22:38PM +0800, Zhenzhong Duan wrote:
> >>NMI isn't supported in dom0, so fall back to the generic all-CPU backtrace code.
> >>
> >>Without this fix, sysrq+l shows no backtrace on an xAPIC system.
> >>On an x2APIC-enabled system, it triggers the NULL pointer dereference below.
> >Why would the x2APIC or xAPIC make a difference here? The Linux dom0
> >is not fiddling with the APIC - that is the hypervisor job.
> On an x2APIC-enabled system, the dom0 kernel sets the apic pointer to
> apic_x2apic_cluster or apic_x2apic_phys.


> When sending the NMI, apic->send_IPI_all copies a cpumask that isn't initialized.
> On an xAPIC system, apic->send_IPI_all is xen_send_IPI_all; this
> function does nothing for the NMI vector, so no backtrace is shown.
> >Can you explain to me why x2apic_send_IPI_mask is even set? Wouldn't
> >the Xen version of send_IPI be present? (See xen_smp_ops)
> It's overwritten by the x2APIC initialization.

That explanation needs to be part of the git commit.
> The problem is that even without the overwrite, as on an xAPIC system,
> xen_send_IPI_all doesn't work for the NMI vector.

Can you set x2apic_mode = 0 in enlighten.c, for example?
Or clear X86_FEATURE_X2APIC in enlighten.c (similar
to how the other features are cleared)? Wouldn't
that stop x2apic_enabled from detecting x2APIC?


Sure.
> zduan
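
The second suggestion, hiding the x2APIC feature bit from the dom0 kernel, amounts to masking one CPUID bit. The sketch below is a user-space illustration only: the function and variable names are hypothetical, and how enlighten.c actually filters features is not shown in this thread. The one hard fact it uses is that x2APIC support is reported in CPUID leaf 1, ECX bit 21.

```c
#include <stdbool.h>
#include <stdint.h>

#define X2APIC_ECX_BIT 21u  /* CPUID leaf 1, ECX bit 21 reports x2APIC */

/* Hypothetical filter applied to leaf 1 ECX; all bits pass by default. */
static uint32_t leaf1_ecx_mask_model = ~UINT32_C(0);

/* Hide x2APIC from anything that reads the filtered value. */
static void mask_x2apic_model(void)
{
    leaf1_ecx_mask_model &= ~(UINT32_C(1) << X2APIC_ECX_BIT);
}

/* Models a filtered CPUID read: raw hardware value ANDed with the mask. */
static uint32_t filtered_leaf1_ecx_model(uint32_t raw_ecx)
{
    return raw_ecx & leaf1_ecx_mask_model;
}

/* Feature test a guest kernel would perform on the filtered value. */
static bool has_x2apic_model(uint32_t ecx)
{
    return ((ecx >> X2APIC_ECX_BIT) & 1u) != 0;
}
```

Once `mask_x2apic_model()` has run, a raw ECX value with bit 21 set no longer passes `has_x2apic_model()` after filtering, while the other feature bits are left untouched.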

2013-04-03 12:00:20

by Zhenzhong Duan

Subject: Re: [PATCH] xen: Don't call arch_trigger_all_cpu_backtrace in dom0(pvm)


On 2013-04-01 20:41, Konrad Rzeszutek Wilk wrote:
> On Mon, Apr 01, 2013 at 01:26:34PM +0800, Zhenzhong Duan wrote:
>> On 2013-03-29 21:46, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Mar 29, 2013 at 03:22:38PM +0800, Zhenzhong Duan wrote:
>> The problem is that even without the overwrite, as on an xAPIC system,
>> xen_send_IPI_all doesn't work for the NMI vector.
> Can you set x2apic_mode = 0 in enlighten.c, for example?
> Or clear X86_FEATURE_X2APIC in enlighten.c (similar
> to how the other features are cleared)? Wouldn't
> that stop x2apic_enabled from detecting x2APIC?
Hi Konrad,
I used the second method, so x2APIC is now completely masked in dom0. Thanks
to Tamon for running the test.

Testing result:

The server did not crash on "echo l > /proc/sysrq-trigger". VT-d is enabled in the BIOS on this machine.
However, no backtrace was shown.

[root@x4470m2-bur09-b ~]# uname -a
Linux x4470m2-bur09-b.us.oracle.com 2.6.39-200.1.14.el5uek.bug16372098.test #1 SMP Tue Apr 2 21:09:27 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@x4470m2-bur09-b ~]# cat /proc/cpuinfo | grep x2apic
[root@x4470m2-bur09-b ~]# dmesg | grep x2apic
[root@x4470m2-bur09-b ~]# cat /proc/cmdline
ro root=UUID=486fc42b-3383-462f-aca3-b1340fbd4ad9 console=tty1 console=ttyS0,9600n8
[root@x4470m2-bur09-b ~]# echo l > /proc/sysrq-trigger
[root@x4470m2-bur09-b ~]#

/var/log/messages:
(snip)
Apr 3 14:14:33 x4470m2-bur09-b kernel: SysRq : Show backtrace of all active CPUs
Apr 3 14:14:33 x4470m2-bur09-b kernel: sending NMI to all CPUs:
(EOF)

No backtrace appeared on the console either. I ran it twice and got the same result.

2013-04-03 14:46:12

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH] xen: Don't call arch_trigger_all_cpu_backtrace in dom0(pvm)

On Wed, Apr 03, 2013 at 08:00:37PM +0800, Zhenzhong Duan wrote:
>
> On 2013-04-01 20:41, Konrad Rzeszutek Wilk wrote:
> >On Mon, Apr 01, 2013 at 01:26:34PM +0800, Zhenzhong Duan wrote:
> >>On 2013-03-29 21:46, Konrad Rzeszutek Wilk wrote:
> >>>On Fri, Mar 29, 2013 at 03:22:38PM +0800, Zhenzhong Duan wrote:
> >>The problem is that even without the overwrite, as on an xAPIC system,
> >>xen_send_IPI_all doesn't work for the NMI vector.
> >Can you set x2apic_mode = 0 in enlighten.c, for example?
> >Or clear X86_FEATURE_X2APIC in enlighten.c (similar
> >to how the other features are cleared)? Wouldn't
> >that stop x2apic_enabled from detecting x2APIC?
> Hi Konrad,
> I used the second method, so x2APIC is now completely masked in dom0. Thanks
> to Tamon for running the test.
>
> Testing result:
>
> The server did not crash on "echo l > /proc/sysrq-trigger". VT-d is enabled in the BIOS on this machine.
> However, no backtrace was shown.
>
> [root@x4470m2-bur09-b ~]# uname -a
> Linux x4470m2-bur09-b.us.oracle.com 2.6.39-200.1.14.el5uek.bug16372098.test #1 SMP Tue Apr 2 21:09:27 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux
> [root@x4470m2-bur09-b ~]# cat /proc/cpuinfo | grep x2apic
> [root@x4470m2-bur09-b ~]# dmesg | grep x2apic
> [root@x4470m2-bur09-b ~]# cat /proc/cmdline
> ro root=UUID=486fc42b-3383-462f-aca3-b1340fbd4ad9 console=tty1 console=ttyS0,9600n8
> [root@x4470m2-bur09-b ~]# echo l > /proc/sysrq-trigger
> [root@x4470m2-bur09-b ~]#
>
> /var/log/messages:
> (snip)
> Apr 3 14:14:33 x4470m2-bur09-b kernel: SysRq : Show backtrace of all active CPUs
> Apr 3 14:14:33 x4470m2-bur09-b kernel: sending NMI to all CPUs:
> (EOF)
>
> No backtrace appeared on the console either. I ran it twice and got the same result.

Great.
Zhenzhong, do you want to prep a patch with a nice git commit description that mentions
your findings about the x2APIC overwrite and also includes the crash that would
appear?

And send it to xen-devel + LKML and to me so I can put it on my v3.9 branch?

Thanks
>
>