2015-08-04 16:41:13

by Jason A. Donenfeld

Subject: printk from softirq on xen: hard lockup

Hi folks,

Paul McKenney and I had an offline discussion about some rcu questions
that eventually led to me investigating a strange full lock-up I'm
experiencing as a consequence of a printk in softirq inside of an
rcu_read_lock, when using Xen PV. Relevant excerpts of the
conversation follow:

Jason:
> Looks like if I take away my [fixed text] pr_debug lines inside of
> softirq, then it doesn't lock up. That's crazy! Who knows what sort of
> bug I'm up against. Ahhh, nothing like debugging at 5am. :)

Paul:
> Are you using a serial console line? If so, what is the baud rate?
> 115Kbaud is usually too slow, and people do get serial-console-induced
> RCU CPU stall warnings from time to time.
> Same could apply due to slow mass storage if you are copying the console
> log to mass storage.

Jason:
> Wait, what?!? What you described sounds completely bonkers. Are you
> saying that printk()ing in softirq and/or while rcu_read_lock is held
> can result, in certain known circumstances, in an unrecoverable full
> system lockup? If so, this would be quite the unresolved kernel bug...

Paul:
> More like printk() while interrupts are disabled, but you got it.
> The RCU CPU stall timeout is 21 seconds in recent kernels. If you
> have a 115Kbaud serial line, you get about 1,150 characters per second.
> So if you printk() 24K characters within an irqs-disabled code region
> on such a system, you will very likely get an RCU CPU stall warning.
> And I agree that this can be annoying, but on the other hand, that is
> a pretty freaking slow console-output device, especially given that it
> is 2015.
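
The arithmetic Paul describes can be checked with a small userspace sketch. The function names and the 10-bits-per-character framing (8N1: one start bit, eight data bits, one stop bit) are assumptions for illustration, not something stated in the conversation. With the quoted rate of 1,150 characters per second, a 21-second timeout allows roughly 24K characters, which matches the figure above. Under the 8N1 assumption, a 115,200 baud line would carry about 11,520 characters per second, so the real budget may be closer to 240K characters.

```c
#include <assert.h>

/*
 * Characters per second on an asynchronous serial line.
 * bits_per_char is an assumption: 10 for 8N1 framing.
 */
unsigned long chars_per_second(unsigned long baud, unsigned int bits_per_char)
{
    assert(bits_per_char > 0);
    return baud / bits_per_char;
}

/*
 * Number of characters the console can drain before a stall
 * timeout of timeout_s seconds expires.
 */
unsigned long chars_before_stall(unsigned long cps, unsigned int timeout_s)
{
    return cps * timeout_s;
}
```

For example, chars_before_stall(1150, 21) reproduces the "24K characters" estimate, while chars_before_stall(chars_per_second(115200, 10), 21) gives the larger budget under the 8N1 assumption.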

Jason:
> Hah, that's crazy: seems like you were more or less right. Nice intuition.

Here's the backtrace of what's up during this lockup:

(gdb) target remote localhost:9999
Remote debugging using localhost:9999
__xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
(gdb) bt
#0 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
#1 __default_send_IPI_shortcut (shortcut=<optimized out>,
dest=<optimized out>, vector=<optimized out>) at
./arch/x86/include/asm/ipi.h:75
#2 apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
#3 0xffffffff81011336 in arch_irq_work_raise () at
arch/x86/kernel/irq_work.c:47
#4 0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at
kernel/irq_work.c:100
#5 0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
#6 0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized
out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>,
fmt=<optimized out>, args=<optimized out>)
at kernel/printk/printk.c:1778
#7 0xffffffff816010c8 in printk (fmt=<optimized out>) at
kernel/printk/printk.c:1868
#8 0xffffffffc00013ea in ?? ()
#9 0x0000000000000000 in ?? ()

And meanwhile I get this stall message:

> [ 1090.072011] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 1, t=61603 jiffies, g=7165, c=7164, q=28)
> [ 1090.072027] Task dump for CPU 0:
> [ 1090.072031] swapper/0 R running task 0 0 0 0x00000008
> [ 1090.072041] Call Trace:
> [ 1090.072049] [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [ 1090.072054] [<ffffffff81007acc>] ? xen_safe_halt+0xc/0x20
> [ 1090.072059] [<ffffffff810159a5>] ? default_idle+0x5/0x10
> [ 1090.072064] [<ffffffff8110394d>] ? cpu_startup_entry+0x1ed/0x220
> [ 1090.072070] [<ffffffff81a97e8d>] ? start_kernel+0x426/0x431
> [ 1090.072074] [<ffffffff81a998cd>] ? xen_start_kernel+0x350/0x356

So what's the deal exactly -- I can't use pr_debug in softirq in
rcu_read_lock()ed regions? I'm not using a slow serial modem -- just
the ordinary xen console. Userspace is logging dmesg to disk as well,
but the disk isn't especially slow.

Is this a xen problem? A softirq problem? Or is this simply... my
problem? It only happens with Xen PV. It doesn't happen otherwise.

Thanks,
Jason


2015-08-04 17:14:30

by David Vrabel

Subject: Re: [Xen-devel] printk from softirq on xen: hard lockup

On 04/08/15 17:41, Jason A. Donenfeld wrote:
> Hi folks,
>
> Paul McKenney and I had an offline discussion about some rcu questions
> that eventually led to me investigating a strange full lock-up I'm
> experiencing as a consequence of a printk in softirq inside of an
> rcu_read_lock, when using Xen PV. Relevant excerpts of the
^^ PV guest?

> (gdb) target remote localhost:9999
> Remote debugging using localhost:9999
> __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
> 56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ => HVM guest

Which is it?

A HVM guest's serial console may be particularly slow (since it's
emulated by qemu). Try using the PV console?

David

2015-08-04 23:01:34

by Jason A. Donenfeld

Subject: Re: [Xen-devel] printk from softirq on xen: hard lockup

Hey David,

Sorry for the premature response on my phone earlier. Real reply follows.
>> rcu_read_lock, when using Xen PV. Relevant excerpts of the
> ^^ PV guest?

Yes. The lockup occurs on a PV guest. Nothing special at all about the
configuration. Vanilla upstream 4.1.3 kernel.

>> __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
>> 56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ => HVM guest
>
> Which is it?

That's odd. It's a PV guest, not an HVM nor PVH guest.

>
> A HVM guest's serial console may be particularly slow (since it's
> emulated by qemu). Try using the PV console?

I am using the PV console. But actually the problem occurs too even if
/proc/sys/kernel/printk is set to all zeros (no klog on the console).

The plot thickens?

2015-08-05 16:37:58

by Jason A. Donenfeld

Subject: Re: [Xen-devel] printk from softirq on xen: hard lockup

Hi folks,

I have written an extremely simple reproducer. Xen 4.5.1. Linux 4.1.3.
Config attached. Reproducer attached. Makefile attached.

It results in the COMPLETE lockup of the system when it receives a
network packet over the Xen PV network interface.

The lockup is 100% reliable. As in the messages above, it puts this --
"while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)" into a busy
loop that never exits.
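
The shape of that loop can be modelled in userspace. This is only a sketch: the register read is replaced by a hypothetical function pointer, the always_busy stub simply assumes the busy bit never clears (the thread has not yet established why it would not), and the bounded variant is shown for contrast rather than as what the kernel does.

```c
#include <assert.h>
#include <stdint.h>

/* Delivery-status ("busy") bit of the ICR: bit 12 on x86 local APICs. */
#define ICR_BUSY (1u << 12)

/* Hypothetical stand-in for reading the ICR register. */
typedef uint32_t (*reg_read_fn)(void);

/* Models a register whose busy bit never clears (assumed behaviour). */
static uint32_t always_busy(void) { return ICR_BUSY; }

/* Models a register that is already idle. */
static uint32_t idle(void) { return 0; }

/*
 * Poll until the busy bit clears, giving up after max_polls reads.
 * Returns the number of busy reads seen before the register went idle,
 * or -1 if it was still busy after max_polls reads. An unbounded
 * version of this loop would spin forever against always_busy().
 */
long wait_icr_idle_bounded(reg_read_fn read_icr, long max_polls)
{
    long polls;

    for (polls = 0; polls < max_polls; polls++) {
        if (!(read_icr() & ICR_BUSY))
            return polls;
    }
    return -1;
}
```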

It is triggered by a simple printk in softirq.

Thanks,
Jason


Attachments:
Makefile (194.00 B)
xen-printk-crash.c (0.98 kB)
4.1.3-domU-config (54.23 kB)

2015-08-06 08:23:19

by Ian Campbell

Subject: Re: [Xen-devel] printk from softirq on xen: hard lockup

On Tue, 2015-08-04 at 18:12 +0100, David Vrabel wrote:
> On 04/08/15 17:41, Jason A. Donenfeld wrote:
> > Hi folks,
> >
> > Paul McKenney and I had an offline discussion about some rcu questions
> > that eventually led to me investigating a strange full lock-up I'm
> > experiencing as a consequence of a printk in softirq inside of an
> > rcu_read_lock, when using Xen PV. Relevant excerpts of the
> ^^ PV guest?
>
> > (gdb) target remote localhost:9999
> > Remote debugging using localhost:9999
> > __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
> > 56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ => HVM guest
>
> Which is it?

Aren't there still some code paths for PV guests which hit the native APIC
case (emulated in Xen even for PV these days, since pvops in the early days
didn't accept the hooks needed to make use of the hypercall versions of
apic read/write)?

In particular I'm thinking of the IPI which is (or was) used by the sysrq
to trigger the backtrace on all CPUs.

Ian.

2015-08-06 10:02:52

by David Vrabel

Subject: Re: [Xen-devel] printk from softirq on xen: hard lockup

On 05/08/15 00:01, Jason A. Donenfeld wrote:
> Hey David,
>
> Sorry for the premature response on my phone earlier. Real reply follows.
>>> rcu_read_lock, when using Xen PV. Relevant excerpts of the
>> ^^ PV guest?
>
> Yes. The lockup occurs on a PV guest. Nothing special at all about the
> configuration. Vanilla upstream 4.1.3 kernel.
>
>>> __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
>>> 56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ => HVM guest
>>
>> Which is it?
>
> That's odd. It's a PV guest, not an HVM nor PVH guest.

Linux PV guests must use the "Xen PV" APIC driver. You need to
investigate why your PV guest is not using this (although I'm surprised
it works at all with the wrong one).

David

2015-08-06 15:58:39

by Jason A. Donenfeld

Subject: Re: [Xen-devel] printk from softirq on xen: hard lockup

On Thu, Aug 6, 2015 at 12:02 PM, David Vrabel <[email protected]> wrote:
> Linux PV guests must use the "Xen PV" APIC driver. You need to
> investigate why your PV guest is not using this (although I'm surprised
> it works at all with the wrong one).

Actually it appears this PV Guest is using the "flat" APIC driver
instead of the Xen APIC driver.

But upon further investigation into why:

arch/x86/xen/Makefile:
obj-$(CONFIG_XEN_DOM0) += apic.o vga.o

It would appear that only dom0 gets to use the Xen APIC driver.

What gives?

Jason

2015-08-06 16:10:49

by David Vrabel

Subject: Re: [Xen-devel] printk from softirq on xen: hard lockup

On 06/08/15 16:58, Jason A. Donenfeld wrote:
> On Thu, Aug 6, 2015 at 12:02 PM, David Vrabel <[email protected]> wrote:
>> Linux PV guests must use the "Xen PV" APIC driver. You need to
>> investigate why your PV guest is not using this (although I'm surprised
>> it works at all with the wrong one).
>
> Actually it appears this PV Guest is using the "flat" APIC driver
> instead of the Xen APIC driver.
>
> But upon further investigation into why:
>
> arch/x86/xen/Makefile:
> obj-$(CONFIG_XEN_DOM0) += apic.o vga.o
>
> It would appear that only dom0 gets to use the Xen APIC driver.
>
> What gives?

Looks like the Makefile is wrong.

Try this:

8<------------------
x86/xen: build "Xen PV" APIC driver for domU as well

A PV domU also needs the Xen PV APIC driver but it was only built for
CONFIG_XEN_DOM0=y.

Signed-off-by: David Vrabel <[email protected]>
Cc: <[email protected]>
---
arch/x86/xen/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 7322755..4b6e29a 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -13,13 +13,13 @@ CFLAGS_mmu.o := $(nostackp)
obj-y := enlighten.o setup.o multicalls.o mmu.o irq.o \
time.o xen-asm.o xen-asm_$(BITS).o \
grant-table.o suspend.o platform-pci-unplug.o \
- p2m.o
+ p2m.o apic.o

obj-$(CONFIG_EVENT_TRACING) += trace.o

obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o
obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o
-obj-$(CONFIG_XEN_DOM0) += apic.o vga.o
+obj-$(CONFIG_XEN_DOM0) += vga.o
obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o
obj-$(CONFIG_XEN_EFI) += efi.o
--
2.1.4

2015-08-06 16:21:37

by Jason A. Donenfeld

Subject: Re: [Xen-devel] printk from softirq on xen: hard lockup

In that case, it doesn't compile.

arch/x86/xen/apic.c:204:13: error: redefinition of ‘xen_init_apic’
void __init xen_init_apic(void)
^
In file included from arch/x86/xen/apic.c:9:0:
arch/x86/xen/xen-ops.h:110:27: note: previous definition of
‘xen_init_apic’ was here
static inline void __init xen_init_apic(void)

It looks like whoever wrote this explicitly didn't want that APIC
driver used on domU:

#ifdef CONFIG_XEN_DOM0
void __init xen_init_vga(const struct dom0_vga_console_info *, size_t size);
void __init xen_init_apic(void);
#else
static inline void __init xen_init_vga(const struct dom0_vga_console_info *info,
size_t size)
{
}
static inline void __init xen_init_apic(void)
{
}
#endif
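
The error comes from a common header pattern, which can be reproduced in miniature in userspace. Everything here is hypothetical: HYPOTHETICAL_FEATURE stands in for CONFIG_XEN_DOM0, feature_init() for xen_init_apic(), and the two guarded sections for the header and the source file.

```c
#include <assert.h>

static int init_calls;

/* --- what the header provides --- */
#ifdef HYPOTHETICAL_FEATURE
void feature_init(void);                    /* real function, defined below */
#else
static inline void feature_init(void) { }   /* no-op stub when feature is off */
#endif

/* --- what the source file provides; only built when the feature is on --- */
#ifdef HYPOTHETICAL_FEATURE
void feature_init(void) { init_calls++; }
#endif

/*
 * Callers never need an #ifdef: with the feature off, the stub
 * compiles away and init_calls stays at zero.
 */
int boot(void)
{
    feature_init();
    return init_calls;
}
```

Dropping the second guard (building the real definition unconditionally) while leaving the stub in the header gives the same class of redefinition error as above, so both places have to change together.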

2015-08-06 16:29:46

by Jason A. Donenfeld

Subject: [PATCH] xen-apic: Enable on domU as well

It turns out that domU also requires the Xen APIC driver. Otherwise we
get stuck in busy loops that never exit, such as in this stack trace:

(gdb) target remote localhost:9999
Remote debugging using localhost:9999
__xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
(gdb) bt
#0 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
#1 __default_send_IPI_shortcut (shortcut=<optimized out>,
dest=<optimized out>, vector=<optimized out>) at
./arch/x86/include/asm/ipi.h:75
#2 apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
#3 0xffffffff81011336 in arch_irq_work_raise () at
arch/x86/kernel/irq_work.c:47
#4 0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at
kernel/irq_work.c:100
#5 0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
#6 0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized
out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>,
fmt=<optimized out>, args=<optimized out>)
at kernel/printk/printk.c:1778
#7 0xffffffff816010c8 in printk (fmt=<optimized out>) at
kernel/printk/printk.c:1868
#8 0xffffffffc00013ea in ?? ()
#9 0x0000000000000000 in ?? ()

Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755
Signed-off-by: Jason A. Donenfeld <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: Ian Campbell <[email protected]>
Cc: <[email protected]>
---
arch/x86/xen/Makefile | 4 ++--
arch/x86/xen/xen-ops.h | 10 ----------
2 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 7322755..4b6e29a 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -13,13 +13,13 @@ CFLAGS_mmu.o := $(nostackp)
obj-y := enlighten.o setup.o multicalls.o mmu.o irq.o \
time.o xen-asm.o xen-asm_$(BITS).o \
grant-table.o suspend.o platform-pci-unplug.o \
- p2m.o
+ p2m.o apic.o

obj-$(CONFIG_EVENT_TRACING) += trace.o

obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o
obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o
-obj-$(CONFIG_XEN_DOM0) += apic.o vga.o
+obj-$(CONFIG_XEN_DOM0) += vga.o
obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o
obj-$(CONFIG_XEN_EFI) += efi.o
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index c20fe29..7363e1b 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -99,18 +99,8 @@ static inline void xen_uninit_lock_cpu(int cpu)

struct dom0_vga_console_info;

-#ifdef CONFIG_XEN_DOM0
void __init xen_init_vga(const struct dom0_vga_console_info *, size_t size);
void __init xen_init_apic(void);
-#else
-static inline void __init xen_init_vga(const struct dom0_vga_console_info *info,
- size_t size)
-{
-}
-static inline void __init xen_init_apic(void)
-{
-}
-#endif

#ifdef CONFIG_XEN_EFI
extern void xen_efi_init(void);
--
2.5.0

2015-08-06 16:36:56

by Jason A. Donenfeld

Subject: [PATCH v2] xen-apic: Enable on domU as well

It turns out that domU also requires the Xen APIC driver. Otherwise we
get stuck in busy loops that never exit, such as in this stack trace:

(gdb) target remote localhost:9999
Remote debugging using localhost:9999
__xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
(gdb) bt
#0 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
#1 __default_send_IPI_shortcut (shortcut=<optimized out>,
dest=<optimized out>, vector=<optimized out>) at
./arch/x86/include/asm/ipi.h:75
#2 apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
#3 0xffffffff81011336 in arch_irq_work_raise () at
arch/x86/kernel/irq_work.c:47
#4 0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at
kernel/irq_work.c:100
#5 0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
#6 0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized
out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>,
fmt=<optimized out>, args=<optimized out>)
at kernel/printk/printk.c:1778
#7 0xffffffff816010c8 in printk (fmt=<optimized out>) at
kernel/printk/printk.c:1868
#8 0xffffffffc00013ea in ?? ()
#9 0x0000000000000000 in ?? ()

Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755
Signed-off-by: Jason A. Donenfeld <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: Ian Campbell <[email protected]>
Cc: <[email protected]>
---
arch/x86/xen/Makefile | 4 ++--
arch/x86/xen/xen-ops.h | 11 ++++-------
2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 7322755..4b6e29a 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -13,13 +13,13 @@ CFLAGS_mmu.o := $(nostackp)
obj-y := enlighten.o setup.o multicalls.o mmu.o irq.o \
time.o xen-asm.o xen-asm_$(BITS).o \
grant-table.o suspend.o platform-pci-unplug.o \
- p2m.o
+ p2m.o apic.o

obj-$(CONFIG_EVENT_TRACING) += trace.o

obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o
obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o
-obj-$(CONFIG_XEN_DOM0) += apic.o vga.o
+obj-$(CONFIG_XEN_DOM0) += vga.o
obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o
obj-$(CONFIG_XEN_EFI) += efi.o
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index c20fe29..cd248ff 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -98,20 +98,17 @@ static inline void xen_uninit_lock_cpu(int cpu)
#endif

struct dom0_vga_console_info;
-
#ifdef CONFIG_XEN_DOM0
void __init xen_init_vga(const struct dom0_vga_console_info *, size_t size);
-void __init xen_init_apic(void);
#else
-static inline void __init xen_init_vga(const struct dom0_vga_console_info *info,
- size_t size)
-{
-}
-static inline void __init xen_init_apic(void)
+void __init xen_init_vga(const struct dom0_vga_console_info *info,
+ size_t size);
{
}
#endif

+void __init xen_init_apic(void);
+
#ifdef CONFIG_XEN_EFI
extern void xen_efi_init(void);
#else
--
2.5.0

2015-08-07 14:23:40

by Konrad Rzeszutek Wilk

Subject: Re: [Xen-devel] [PATCH v2] xen-apic: Enable on domU as well

On Thu, Aug 06, 2015 at 06:37:05PM +0200, Jason A. Donenfeld wrote:
> It turns out that domU also requires the Xen APIC driver. Otherwise we
> get stuck in busy loops that never exit, such as in this stack trace:
>
> (gdb) target remote localhost:9999
> Remote debugging using localhost:9999
> __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
> 56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
> (gdb) bt
> #0 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
> #1 __default_send_IPI_shortcut (shortcut=<optimized out>,
> dest=<optimized out>, vector=<optimized out>) at
> ./arch/x86/include/asm/ipi.h:75
> #2 apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
> #3 0xffffffff81011336 in arch_irq_work_raise () at
> arch/x86/kernel/irq_work.c:47
> #4 0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at
> kernel/irq_work.c:100
> #5 0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
> #6 0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized
> out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>,
> fmt=<optimized out>, args=<optimized out>)
> at kernel/printk/printk.c:1778
> #7 0xffffffff816010c8 in printk (fmt=<optimized out>) at
> kernel/printk/printk.c:1868
> #8 0xffffffffc00013ea in ?? ()
> #9 0x0000000000000000 in ?? ()
>
> Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755
> Signed-off-by: Jason A. Donenfeld <[email protected]>
> Cc: David Vrabel <[email protected]>
> Cc: Ian Campbell <[email protected]>
> Cc: <[email protected]>

While this patch is OK for the trees that implement the PV APIC
driver it won't apply to older ones (and it does not need to).

In the older ones this was working with f447d56d36af18c5104ff29dcb1327c0c0ac3634
"xen: implement apic ipi interface", which should have worked for
your case.

And would have made the arch_irq_work_raise and such use the
Xen code paths:
#ifdef CONFIG_SMP
	apic->send_IPI_allbutself = xen_send_IPI_allbutself;
	apic->send_IPI_mask_allbutself = xen_send_IPI_mask_allbutself;
	apic->send_IPI_mask = xen_send_IPI_mask;
	apic->send_IPI_all = xen_send_IPI_all;
	apic->send_IPI_self = xen_send_IPI_self;
#endif
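
The effect of overriding those callbacks can be illustrated with a userspace model of a function-pointer table. This is a sketch under stated assumptions: struct ipi_ops, the two send functions, and their return values are all hypothetical stand-ins, not the kernel's struct apic or the real Xen implementations.

```c
#include <assert.h>

/* Hypothetical, much-reduced stand-in for an APIC driver's ops table. */
struct ipi_ops {
    const char *name;
    int (*send_ipi_self)(int vector);
};

/* Stands in for the native path, which on a PV guest ends up spinning. */
static int native_send_ipi_self(int vector)
{
    (void)vector;
    return -1;  /* modelled as "did not deliver" instead of hanging */
}

/* Stands in for a paravirtualized path that delivers the vector. */
static int pv_send_ipi_self(int vector)
{
    return vector;
}

static struct ipi_ops flat_ops = { "flat", native_send_ipi_self };
static struct ipi_ops *active = &flat_ops;

/* Overwrite the callback in place, as the assignments above do. */
void install_pv_hooks(struct ipi_ops *ops)
{
    ops->send_ipi_self = pv_send_ipi_self;
}

/* Callers such as an irq-work raise only ever go through the table. */
int raise_self_ipi(int vector)
{
    return active->send_ipi_self(vector);
}
```

Before install_pv_hooks(active) is called, raise_self_ipi(246) takes the native stand-in; afterwards the same call is routed to the paravirtualized one without the caller changing.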

Anyhow, your patch seems to fix a regression my patch
feb44f1f7a4ac299d1ab1c3606860e70b9b89d69
"x86/xen: Provide a "Xen PV" APIC driver to support >255 VCPUs"
introduced.

I would add to the stable.vger.kernel.org:
# apply only to v4.1

As the earlier ones will work fine.

Thank you!

Oh, and Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>


> ---
> arch/x86/xen/Makefile | 4 ++--
> arch/x86/xen/xen-ops.h | 11 ++++-------
> 2 files changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
> index 7322755..4b6e29a 100644
> --- a/arch/x86/xen/Makefile
> +++ b/arch/x86/xen/Makefile
> @@ -13,13 +13,13 @@ CFLAGS_mmu.o := $(nostackp)
> obj-y := enlighten.o setup.o multicalls.o mmu.o irq.o \
> time.o xen-asm.o xen-asm_$(BITS).o \
> grant-table.o suspend.o platform-pci-unplug.o \
> - p2m.o
> + p2m.o apic.o
>
> obj-$(CONFIG_EVENT_TRACING) += trace.o
>
> obj-$(CONFIG_SMP) += smp.o
> obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o
> obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o
> -obj-$(CONFIG_XEN_DOM0) += apic.o vga.o
> +obj-$(CONFIG_XEN_DOM0) += vga.o
> obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o
> obj-$(CONFIG_XEN_EFI) += efi.o
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index c20fe29..cd248ff 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -98,20 +98,17 @@ static inline void xen_uninit_lock_cpu(int cpu)
> #endif
>
> struct dom0_vga_console_info;
> -
> #ifdef CONFIG_XEN_DOM0
> void __init xen_init_vga(const struct dom0_vga_console_info *, size_t size);
> -void __init xen_init_apic(void);
> #else
> -static inline void __init xen_init_vga(const struct dom0_vga_console_info *info,
> - size_t size)
> -{
> -}
> -static inline void __init xen_init_apic(void)
> +void __init xen_init_vga(const struct dom0_vga_console_info *info,
> + size_t size);
> {
> }
> #endif
>
> +void __init xen_init_apic(void);
> +
> #ifdef CONFIG_XEN_EFI
> extern void xen_efi_init(void);
> #else
> --
> 2.5.0

2015-08-07 14:38:00

by Jason A. Donenfeld

Subject: Re: [Xen-devel] [PATCH v2] xen-apic: Enable on domU as well

On Fri, Aug 7, 2015 at 4:23 PM, Konrad Rzeszutek Wilk
<[email protected]> wrote:
> Anyhow, your patch seems to fix a regression my patch
> feb44f1f7a4ac299d1ab1c3606860e70b9b89d69
> "x86/xen: Provide a "Xen PV" APIC driver to support >255 VCPUs"
> introduced.

Ahhh, good, okay. That explains why I didn't encounter this with older
kernels. The whole picture makes sense now. Thanks for reviewing this.

David - mergeable?

2015-08-10 13:40:16

by Jason A. Donenfeld

Subject: [PATCH v3] xen-apic: Enable on domU as well

It turns out that domU also requires the Xen APIC driver. Otherwise we
get stuck in busy loops that never exit, such as in this stack trace:

(gdb) target remote localhost:9999
Remote debugging using localhost:9999
__xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
(gdb) bt
#0 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
#1 __default_send_IPI_shortcut (shortcut=<optimized out>,
dest=<optimized out>, vector=<optimized out>) at
./arch/x86/include/asm/ipi.h:75
#2 apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
#3 0xffffffff81011336 in arch_irq_work_raise () at
arch/x86/kernel/irq_work.c:47
#4 0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at
kernel/irq_work.c:100
#5 0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
#6 0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized
out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>,
fmt=<optimized out>, args=<optimized out>)
at kernel/printk/printk.c:1778
#7 0xffffffff816010c8 in printk (fmt=<optimized out>) at
kernel/printk/printk.c:1868
#8 0xffffffffc00013ea in ?? ()
#9 0x0000000000000000 in ?? ()

Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755
Signed-off-by: Jason A. Donenfeld <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: Ian Campbell <[email protected]>
Cc: <[email protected]>
---
arch/x86/xen/Makefile | 4 ++--
arch/x86/xen/xen-ops.h | 7 ++-----
2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 7322755..4b6e29a 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -13,13 +13,13 @@ CFLAGS_mmu.o := $(nostackp)
obj-y := enlighten.o setup.o multicalls.o mmu.o irq.o \
time.o xen-asm.o xen-asm_$(BITS).o \
grant-table.o suspend.o platform-pci-unplug.o \
- p2m.o
+ p2m.o apic.o

obj-$(CONFIG_EVENT_TRACING) += trace.o

obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o
obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o
-obj-$(CONFIG_XEN_DOM0) += apic.o vga.o
+obj-$(CONFIG_XEN_DOM0) += vga.o
obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o
obj-$(CONFIG_XEN_EFI) += efi.o
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index c20fe29..d0a543b 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -98,20 +98,17 @@ static inline void xen_uninit_lock_cpu(int cpu)
#endif

struct dom0_vga_console_info;
-
#ifdef CONFIG_XEN_DOM0
void __init xen_init_vga(const struct dom0_vga_console_info *, size_t size);
-void __init xen_init_apic(void);
#else
static inline void __init xen_init_vga(const struct dom0_vga_console_info *info,
size_t size)
{
}
-static inline void __init xen_init_apic(void)
-{
-}
#endif

+void __init xen_init_apic(void);
+
#ifdef CONFIG_XEN_EFI
extern void xen_efi_init(void);
#else
--
2.5.0

2015-08-10 13:42:05

by David Vrabel

Subject: Re: [PATCH v3] xen-apic: Enable on domU as well

On 10/08/15 14:40, Jason A. Donenfeld wrote:
> It turns out that domU also requires the Xen APIC driver. Otherwise we
> get stuck in busy loops that never exit, such as in this stack trace:

What's the difference between v3 and v2?

David

2015-08-10 13:44:17

by Jason A. Donenfeld

Subject: Re: [PATCH v3] xen-apic: Enable on domU as well

On Mon, Aug 10, 2015 at 3:41 PM, David Vrabel <[email protected]> wrote:
> On 10/08/15 14:40, Jason A. Donenfeld wrote:
>> It turns out that domU also requires the Xen APIC driver. Otherwise we
>> get stuck in busy loops that never exit, such as in this stack trace:
>
> What's the difference between v3 and v2?

I did some silly things with vim in v2, and there's an extra
semicolon, some other formatting things, and a function is made
unstatic by accident. v3 is what I should have originally sent.
Functionally the same though.

2015-08-11 10:47:05

by David Vrabel

Subject: Re: [Xen-devel] [PATCH v3] xen-apic: Enable on domU as well

On 10/08/15 14:40, Jason A. Donenfeld wrote:
> It turns out that domU also requires the Xen APIC driver. Otherwise we
> get stuck in busy loops that never exit, such as in this stack trace:

Applied to for-linus-4.2 and tagged for stable, thanks.

David