2015-11-28 15:51:39

by Sander Eikelenboom

Subject: linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.

Hi all,

I have just tested a 4.4-rc2 kernel (current linus tree) + the tip tree
pulled on top.

Running this kernel under Xen on PV-guests with multiple vcpus goes well
(on idle < 10% cpu usage),
but a guest with only a single vcpu doesn't idle at all, it seems a
kworker thread is stuck:
root 569 98.0 0.0 0 0 ? R 16:02 12:47 [kworker/0:1]

Running a 4.3 kernel works fine with a single vcpu. Bisecting would
probably be quite painful, since there were some breakages this merge
window with respect to Xen pv-guests.

There are some differences in the dmesg output between the 4.3,
4.4-single and 4.4-multi vcpu boots:

Between 4.3 and 4.4-single:

-NR_IRQS:4352 nr_irqs:32 16
+Using NULL legacy PIC
+NR_IRQS:4352 nr_irqs:32 0

-cpu 0 spinlock event irq 17
+cpu 0 spinlock event irq 1

and later on:

-hctosys: unable to open rtc device (rtc0)
+rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock

+genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)
+hvc_open: request_irq failed with rc -16.
+Warning: unable to open an initial console.


between 4.4-single and 4.4-multi:

Using NULL legacy PIC
-NR_IRQS:4352 nr_irqs:32 0
+NR_IRQS:4352 nr_irqs:48 0

and later on:

-rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock
+hctosys: unable to open rtc device (rtc0)

-genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)
-hvc_open: request_irq failed with rc -16.
-Warning: unable to open an initial console.

attached:
- dmesg with 4.3 kernel with 1 vcpu
- dmesg with 4.4 kernel with 1 vcpu
- dmesg with 4.4 kernel with 2 vcpus
- .config of the 4.4 kernel

--
Sander



Attachments:
dotconfig (102.81 kB)
dmesg-4.3.txt (15.06 kB)
dmesg-4.4.txt (15.00 kB)
dmesg-4.4-multi.txt (14.97 kB)

2015-12-02 14:16:18

by David Vrabel

Subject: Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.

On 28/11/15 15:47, Sander Eikelenboom wrote:
>
> -rtc_cmos rtc_cmos: hctosys: unable to read the hardware clock
> +hctosys: unable to open rtc device (rtc0)
>
> -genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)
> -hvc_open: request_irq failed with rc -16.

I have reproduced this issue. We really shouldn't have an RTC device in
a PV guest and I think this irq conflict breaks hvc0.

David

2015-12-02 14:56:09

by David Vrabel

Subject: Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.

On 28/11/15 15:47, Sander Eikelenboom wrote:
> genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)

We shouldn't register an rtc_cmos device because its legacy irq
conflicts with the irq needed for hvc0. For a multi VCPU guest irq 8 is
in use for the pv spinlocks and this gets requested first, preventing
the rtc device from probing.

Does this patch fix it for you?

David
8<--------------------
x86: rtc_cmos platform device requires legacy irqs

Adding the rtc platform device when there are no legacy irqs (no
legacy PIC) causes a conflict with other devices that end up using the
same irq number.

In a single VCPU PV guest we should have:

/proc/interrupts:
CPU0
0: 4934 xen-percpu-virq timer0
1: 0 xen-percpu-ipi spinlock0
2: 0 xen-percpu-ipi resched0
3: 0 xen-percpu-ipi callfunc0
4: 0 xen-percpu-virq debug0
5: 0 xen-percpu-ipi callfuncsingle0
6: 0 xen-percpu-ipi irqwork0
7: 321 xen-dyn-event xenbus
8: 90 xen-dyn-event hvc_console
...

But hvc_console cannot get its interrupt because it is already in use
by rtc0 and the console does not work.

genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)

The rtc_cmos device requires a particular legacy irq so don't add it
if there are no legacy irqs.

Signed-off-by: David Vrabel <[email protected]>
---
arch/x86/kernel/rtc.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
index cd96852..07c70f1 100644
--- a/arch/x86/kernel/rtc.c
+++ b/arch/x86/kernel/rtc.c
@@ -14,6 +14,7 @@
 #include <asm/time.h>
 #include <asm/intel-mid.h>
 #include <asm/rtc.h>
+#include <asm/i8259.h>
 
 #ifdef CONFIG_X86_32
 /*
@@ -200,6 +201,10 @@ static __init int add_rtc_cmos(void)
 	}
 #endif
 
+	/* RTC uses legacy IRQs. */
+	if (!nr_legacy_irqs())
+		return -ENODEV;
+
 	platform_device_register(&rtc_device);
 	dev_info(&rtc_device.dev,
 		 "registered platform RTC device (no PNP device found)\n");
--
2.1.4


2015-12-02 17:35:16

by Sander Eikelenboom

Subject: Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.

On 2015-12-02 15:55, David Vrabel wrote:
> On 28/11/15 15:47, Sander Eikelenboom wrote:
>> genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)
>
> We shouldn't register an rtc_cmos device because its legacy irq
> conflicts with the irq needed for hvc0. For a multi VCPU guest irq 8 is
> in use for the pv spinlocks and this gets requested first, preventing
> the rtc device from probing.
>
> Does this patch fix it for you?
>
> David

It does, thanks.

Reported-and-tested-by: Sander Eikelenboom <[email protected]>

--
Sander

> 8<--------------------
> x86: rtc_cmos platform device requires legacy irqs
>
> Adding the rtc platform device when there are no legacy irqs (no
> legacy PIC) causes a conflict with other devices that end up using the
> same irq number.
>
> In a single VCPU PV guest we should have:
>
> /proc/interrupts:
> CPU0
> 0: 4934 xen-percpu-virq timer0
> 1: 0 xen-percpu-ipi spinlock0
> 2: 0 xen-percpu-ipi resched0
> 3: 0 xen-percpu-ipi callfunc0
> 4: 0 xen-percpu-virq debug0
> 5: 0 xen-percpu-ipi callfuncsingle0
> 6: 0 xen-percpu-ipi irqwork0
> 7: 321 xen-dyn-event xenbus
> 8: 90 xen-dyn-event hvc_console
> ...
>
> But hvc_console cannot get its interrupt because it is already in use
> by rtc0 and the console does not work.
>
> genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)
>
> The rtc_cmos device requires a particular legacy irq so don't add it
> if there are no legacy irqs.
>
> Signed-off-by: David Vrabel <[email protected]>
> ---
> arch/x86/kernel/rtc.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
> index cd96852..07c70f1 100644
> --- a/arch/x86/kernel/rtc.c
> +++ b/arch/x86/kernel/rtc.c
> @@ -14,6 +14,7 @@
> #include <asm/time.h>
> #include <asm/intel-mid.h>
> #include <asm/rtc.h>
> +#include <asm/i8259.h>
>
> #ifdef CONFIG_X86_32
> /*
> @@ -200,6 +201,10 @@ static __init int add_rtc_cmos(void)
> }
> #endif
>
> + /* RTC uses legacy IRQs. */
> + if (!nr_legacy_irqs())
> + return -ENODEV;
> +
> platform_device_register(&rtc_device);
> dev_info(&rtc_device.dev,
> "registered platform RTC device (no PNP device found)\n");