The first three patches fix outstanding issues with v3.9 (and earlier)
kernels where the simple sequence of:
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/online
would, embarrassingly, not work. As such, they also carry the
[email protected] tag.
[PATCH 1/9] xen/smp: Fix leakage of timer interrupt line for every
[PATCH 2/9] xen/smp/spinlock: Fix leakage of the spinlock interrupt
[PATCH 3/9] xen/time: Fix kasprintf splat when allocating timer%d IRQ
The rest are just cleanups and some coalescing of the PV and PVHVM
paths.
arch/x86/xen/enlighten.c | 5 ++++-
arch/x86/xen/smp.c | 21 ++++++++++++++-------
arch/x86/xen/spinlock.c | 25 +++++++++++++++++++++++++
arch/x86/xen/time.c | 13 ++++++++++---
drivers/xen/events.c | 20 +++++++++++++++++++-
5 files changed, 72 insertions(+), 12 deletions(-)
Konrad Rzeszutek Wilk (9):
xen/smp: Fix leakage of timer interrupt line for every CPU online/offline.
xen/smp/spinlock: Fix leakage of the spinlock interrupt line for every CPU online/offline
xen/time: Fix kasprintf splat when allocating timer%d IRQ line.
xen/events: Check that IRQ value passed in is valid.
xen/time: Add default value of -1 for IRQ and check for that.
xen/spinlock: Check against default value of -1 for IRQ line.
xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM
xen/smp/pvhvm: Don't initialize IRQ_WORKER as we are using the native one.
xen/smp: Unify some of the PV and PVHVM offline CPU paths
While we don't use the spinlock interrupt line (see commit
f10cd522c5fbfec9ae3cc01967868c9c2401ed23, "xen: disable PV spinlocks
on HVM", for details), we should still do the proper init/deinit
sequence. We did not do that correctly: in the CPU-init path for a
PVHVM guest we would allocate a new interrupt line, but failed to
deallocate the old interrupt line.
This resulted in the leakage of an irq_desc, but more importantly in
this splat when we online an offlined CPU:
genirq: Flags mismatch irq 71. 0002cc20 (spinlock1) vs. 0002cc20 (spinlock1)
Pid: 2542, comm: init.late Not tainted 3.9.0-rc6upstream #1
Call Trace:
[<ffffffff811156de>] __setup_irq+0x23e/0x4a0
[<ffffffff81194191>] ? kmem_cache_alloc_trace+0x221/0x250
[<ffffffff811161bb>] request_threaded_irq+0xfb/0x160
[<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20
[<ffffffff813a8423>] bind_ipi_to_irqhandler+0xa3/0x160
[<ffffffff81303758>] ? kasprintf+0x38/0x40
[<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20
[<ffffffff810cad35>] ? update_max_interval+0x15/0x40
[<ffffffff816605db>] xen_init_lock_cpu+0x3c/0x78
[<ffffffff81660029>] xen_hvm_cpu_notify+0x29/0x33
[<ffffffff81676bdd>] notifier_call_chain+0x4d/0x70
[<ffffffff810bb2a9>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8109402b>] __cpu_notify+0x1b/0x30
[<ffffffff8166834a>] _cpu_up+0xa0/0x14b
[<ffffffff816684ce>] cpu_up+0xd9/0xec
[<ffffffff8165f754>] store_online+0x94/0xd0
[<ffffffff8141d15b>] dev_attr_store+0x1b/0x20
[<ffffffff81218f44>] sysfs_write_file+0xf4/0x170
[<ffffffff811a2864>] vfs_write+0xb4/0x130
[<ffffffff811a302a>] sys_write+0x5a/0xa0
[<ffffffff8167ada9>] system_call_fastpath+0x16/0x1b
cpu 1 spinlock event irq -16
smpboot: Booting Node 0 Processor 1 APIC 0x2
And if one looks at /proc/interrupts right after
offlining CPU1:
70: 0 0 xen-percpu-ipi spinlock0
71: 0 0 xen-percpu-ipi spinlock1
77: 0 0 xen-percpu-ipi spinlock2
There is the oddity of the 'spinlock1' still being present.
CC: [email protected]
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/xen/smp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index f80e69c..22c800a 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -662,6 +662,7 @@ static void xen_hvm_cpu_die(unsigned int cpu)
unbind_from_irqhandler(per_cpu(xen_debug_irq, cpu), NULL);
unbind_from_irqhandler(per_cpu(xen_callfuncsingle_irq, cpu), NULL);
unbind_from_irqhandler(per_cpu(xen_irq_work, cpu), NULL);
+ xen_uninit_lock_cpu(cpu);
xen_teardown_timer(cpu);
native_cpu_die(cpu);
}
--
1.8.1.4
See git commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23
(xen: disable PV spinlocks on HVM) for details.
But we did not disable it everywhere, which means that when
we boot as PVHVM we end up allocating a per-CPU IRQ line for
the spinlock. This patch fixes that.
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/xen/spinlock.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 47ae032..8b54603 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -367,6 +367,13 @@ void __cpuinit xen_init_lock_cpu(int cpu)
WARN(per_cpu(lock_kicker_irq, cpu) > 0, "spinlock on CPU%d exists on IRQ%d!\n",
cpu, per_cpu(lock_kicker_irq, cpu));
+ /*
+ * See git commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23
+ * (xen: disable PV spinlocks on HVM)
+ */
+ if (xen_hvm_domain())
+ return;
+
name = kasprintf(GFP_KERNEL, "spinlock%d", cpu);
irq = bind_ipi_to_irqhandler(XEN_SPIN_UNLOCK_VECTOR,
cpu,
@@ -385,12 +392,26 @@ void __cpuinit xen_init_lock_cpu(int cpu)
void xen_uninit_lock_cpu(int cpu)
{
+ /*
+ * See git commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23
+ * (xen: disable PV spinlocks on HVM)
+ */
+ if (xen_hvm_domain())
+ return;
+
unbind_from_irqhandler(per_cpu(lock_kicker_irq, cpu), NULL);
per_cpu(lock_kicker_irq, cpu) = -1;
}
void __init xen_init_spinlocks(void)
{
+ /*
+ * See git commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23
+ * (xen: disable PV spinlocks on HVM)
+ */
+ if (xen_hvm_domain())
+ return;
+
BUILD_BUG_ON(sizeof(struct xen_spinlock) > sizeof(arch_spinlock_t));
pv_lock_ops.spin_is_locked = xen_spin_is_locked;
--
1.8.1.4
When we online the CPU, we get this splat:
smpboot: Booting Node 0 Processor 1 APIC 0x2
installing Xen timer for CPU 1
BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179
in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1
Call Trace:
[<ffffffff810c1fea>] __might_sleep+0xda/0x100
[<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0
[<ffffffff81303758>] ? kasprintf+0x38/0x40
[<ffffffff813036eb>] kvasprintf+0x5b/0x90
[<ffffffff81303758>] kasprintf+0x38/0x40
[<ffffffff81044510>] xen_setup_timer+0x30/0xb0
[<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30
[<ffffffff81666d0a>] start_secondary+0x19c/0x1a8
The solution is to do the kasprintf in the CPU hotplug path
that 'online's the CPU, that is, in xen_hvm_cpu_notify,
and remove the call from xen_hvm_setup_cpu_clockevents.
Unfortunately, the latter is not a good idea, as the bootup path
does not use xen_hvm_cpu_notify, so we would end up never allocating
the timer%d interrupt lines when booting. As such, add the check for
in_atomic() to continue.
CC: [email protected]
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/xen/enlighten.c | 5 ++++-
arch/x86/xen/time.c | 6 +++++-
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 47d3243..ddbd54a 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1641,8 +1641,11 @@ static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
switch (action) {
case CPU_UP_PREPARE:
xen_vcpu_setup(cpu);
- if (xen_have_vector_callback)
+ if (xen_have_vector_callback) {
xen_init_lock_cpu(cpu);
+ if (xen_feature(XENFEAT_hvm_safe_pvclock))
+ xen_setup_timer(cpu);
+ }
break;
default:
break;
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 0296a95..054cc01 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -497,7 +497,11 @@ static void xen_hvm_setup_cpu_clockevents(void)
{
int cpu = smp_processor_id();
xen_setup_runstate_info(cpu);
- xen_setup_timer(cpu);
+ /*
+ * xen_setup_timer(cpu) - snprintf is bad in atomic context. Hence
+ * doing it xen_hvm_cpu_notify (which gets called by smp_init during
+ * early bootup and also during CPU hotplug events).
+ */
xen_setup_cpu_clockevents();
}
--
1.8.1.4
If the timer interrupt has been de-initialized, or is just now being
initialized, the interrupt line should be at its default value
of -1. Check for that, and if something is odd, WARN us.
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/xen/time.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 054cc01..3d88bfd 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -377,7 +377,7 @@ static const struct clock_event_device xen_vcpuop_clockevent = {
static const struct clock_event_device *xen_clockevent =
&xen_timerop_clockevent;
-static DEFINE_PER_CPU(struct clock_event_device, xen_clock_events);
+static DEFINE_PER_CPU(struct clock_event_device, xen_clock_events) = { .irq = -1 };
static irqreturn_t xen_timer_interrupt(int irq, void *dev_id)
{
@@ -401,6 +401,9 @@ void xen_setup_timer(int cpu)
struct clock_event_device *evt;
int irq;
+ evt = &per_cpu(xen_clock_events, cpu);
+ WARN(evt->irq >= 0, "IRQ%d for CPU%d is already allocated\n", evt->irq, cpu);
+
printk(KERN_INFO "installing Xen timer for CPU %d\n", cpu);
name = kasprintf(GFP_KERNEL, "timer%d", cpu);
@@ -413,7 +416,6 @@ void xen_setup_timer(int cpu)
IRQF_FORCE_RESUME,
name, NULL);
- evt = &per_cpu(xen_clock_events, cpu);
memcpy(evt, xen_clockevent, sizeof(*evt));
evt->cpumask = cpumask_of(cpu);
@@ -426,6 +428,7 @@ void xen_teardown_timer(int cpu)
BUG_ON(cpu == 0);
evt = &per_cpu(xen_clock_events, cpu);
unbind_from_irqhandler(evt->irq, NULL);
+ evt->irq = -1;
}
void xen_setup_cpu_clockevents(void)
--
1.8.1.4
The "xen_cpu_die" and "xen_hvm_cpu_die" functions are very similar.
Let's coalesce them.
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/xen/smp.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 415694c..0d466d7 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -428,7 +428,7 @@ static int xen_cpu_disable(void)
static void xen_cpu_die(unsigned int cpu)
{
- while (HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL)) {
+ while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL)) {
current->state = TASK_UNINTERRUPTIBLE;
schedule_timeout(HZ/10);
}
@@ -436,7 +436,8 @@ static void xen_cpu_die(unsigned int cpu)
unbind_from_irqhandler(per_cpu(xen_callfunc_irq, cpu), NULL);
unbind_from_irqhandler(per_cpu(xen_debug_irq, cpu), NULL);
unbind_from_irqhandler(per_cpu(xen_callfuncsingle_irq, cpu), NULL);
- unbind_from_irqhandler(per_cpu(xen_irq_work, cpu), NULL);
+ if (!xen_hvm_domain())
+ unbind_from_irqhandler(per_cpu(xen_irq_work, cpu), NULL);
xen_uninit_lock_cpu(cpu);
xen_teardown_timer(cpu);
}
@@ -667,14 +668,7 @@ static int __cpuinit xen_hvm_cpu_up(unsigned int cpu, struct task_struct *tidle)
static void xen_hvm_cpu_die(unsigned int cpu)
{
- unbind_from_irqhandler(per_cpu(xen_resched_irq, cpu), NULL);
- unbind_from_irqhandler(per_cpu(xen_callfunc_irq, cpu), NULL);
- unbind_from_irqhandler(per_cpu(xen_debug_irq, cpu), NULL);
- unbind_from_irqhandler(per_cpu(xen_callfuncsingle_irq, cpu), NULL);
- if (!xen_hvm_domain())
- unbind_from_irqhandler(per_cpu(xen_irq_work, cpu), NULL);
- xen_uninit_lock_cpu(cpu);
- xen_teardown_timer(cpu);
+ xen_cpu_die(cpu);
native_cpu_die(cpu);
}
--
1.8.1.4
The default (uninitialized) value of the IRQ line is -1.
Check whether we have already allocated a spinlock interrupt line
and somebody is trying to do it again. Also set it back to -1
when we offline the CPU.
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/xen/spinlock.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index f7a080e..47ae032 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -364,6 +364,9 @@ void __cpuinit xen_init_lock_cpu(int cpu)
int irq;
const char *name;
+ WARN(per_cpu(lock_kicker_irq, cpu) > 0, "spinlock on CPU%d exists on IRQ%d!\n",
+ cpu, per_cpu(lock_kicker_irq, cpu));
+
name = kasprintf(GFP_KERNEL, "spinlock%d", cpu);
irq = bind_ipi_to_irqhandler(XEN_SPIN_UNLOCK_VECTOR,
cpu,
@@ -383,6 +386,7 @@ void __cpuinit xen_init_lock_cpu(int cpu)
void xen_uninit_lock_cpu(int cpu)
{
unbind_from_irqhandler(per_cpu(lock_kicker_irq, cpu), NULL);
+ per_cpu(lock_kicker_irq, cpu) = -1;
}
void __init xen_init_spinlocks(void)
--
1.8.1.4
There is no need to use the PV version of the IRQ_WORKER mechanism,
as under PVHVM we are using the native version. The native
version uses the SMP API.
The PV interrupt lines just sit around unused:
69: 0 0 xen-percpu-ipi irqwork0
83: 0 0 xen-percpu-ipi irqwork1
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
arch/x86/xen/smp.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 22c800a..415694c 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -144,6 +144,13 @@ static int xen_smp_intr_init(unsigned int cpu)
goto fail;
per_cpu(xen_callfuncsingle_irq, cpu) = rc;
+ /*
+ * The IRQ worker on PVHVM goes through the native path and uses the
+ * IPI mechanism.
+ */
+ if (xen_hvm_domain())
+ return 0;
+
callfunc_name = kasprintf(GFP_KERNEL, "irqwork%d", cpu);
rc = bind_ipi_to_irqhandler(XEN_IRQ_WORK_VECTOR,
cpu,
@@ -167,6 +174,9 @@ static int xen_smp_intr_init(unsigned int cpu)
if (per_cpu(xen_callfuncsingle_irq, cpu) >= 0)
unbind_from_irqhandler(per_cpu(xen_callfuncsingle_irq, cpu),
NULL);
+ if (xen_hvm_domain())
+ return rc;
+
if (per_cpu(xen_irq_work, cpu) >= 0)
unbind_from_irqhandler(per_cpu(xen_irq_work, cpu), NULL);
@@ -661,7 +671,8 @@ static void xen_hvm_cpu_die(unsigned int cpu)
unbind_from_irqhandler(per_cpu(xen_callfunc_irq, cpu), NULL);
unbind_from_irqhandler(per_cpu(xen_debug_irq, cpu), NULL);
unbind_from_irqhandler(per_cpu(xen_callfuncsingle_irq, cpu), NULL);
- unbind_from_irqhandler(per_cpu(xen_irq_work, cpu), NULL);
+ if (!xen_hvm_domain())
+ unbind_from_irqhandler(per_cpu(xen_irq_work, cpu), NULL);
xen_uninit_lock_cpu(cpu);
xen_teardown_timer(cpu);
native_cpu_die(cpu);
--
1.8.1.4
In the PVHVM CPU online/offline path, we would leak the
timer%d IRQ line every time we did an offline event. The
online path (xen_hvm_setup_cpu_clockevents via
x86_cpuinit.setup_percpu_clockev) would allocate a new interrupt
line for timer%d, but we would still use the old interrupt line,
leading to:
kernel BUG at /home/konrad/ssd/konrad/linux/kernel/hrtimer.c:1261!
invalid opcode: 0000 [#1] SMP
RIP: 0010:[<ffffffff810b9e21>] [<ffffffff810b9e21>] hrtimer_interrupt+0x261/0x270
.. snip..
<IRQ>
[<ffffffff810445ef>] xen_timer_interrupt+0x2f/0x1b0
[<ffffffff81104825>] ? stop_machine_cpu_stop+0xb5/0xf0
[<ffffffff8111434c>] handle_irq_event_percpu+0x7c/0x240
[<ffffffff811175b9>] handle_percpu_irq+0x49/0x70
[<ffffffff813a74a3>] __xen_evtchn_do_upcall+0x1c3/0x2f0
[<ffffffff813a760a>] xen_evtchn_do_upcall+0x2a/0x40
[<ffffffff8167c26d>] xen_hvm_callback_vector+0x6d/0x80
<EOI>
[<ffffffff81666d01>] ? start_secondary+0x193/0x1a8
[<ffffffff81666cfd>] ? start_secondary+0x18f/0x1a8
There is also the oddity ('timer1' still present) in /proc/interrupts
after offlining CPU1:
64: 1121 0 xen-percpu-virq timer0
78: 0 0 xen-percpu-virq timer1
84: 0 2483 xen-percpu-virq timer2
This patch fixes it.
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
CC: [email protected]
---
arch/x86/xen/smp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 09ea61d..f80e69c 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -662,6 +662,7 @@ static void xen_hvm_cpu_die(unsigned int cpu)
unbind_from_irqhandler(per_cpu(xen_debug_irq, cpu), NULL);
unbind_from_irqhandler(per_cpu(xen_callfuncsingle_irq, cpu), NULL);
unbind_from_irqhandler(per_cpu(xen_irq_work, cpu), NULL);
+ xen_teardown_timer(cpu);
native_cpu_die(cpu);
}
--
1.8.1.4
We naively assume that the IRQ value passed in is correct.
If it is not, then any dereference operation on the 'info'
structure will result in a crash, so we might as well guard ourselves
and sprinkle copious amounts of WARN_ON.
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
drivers/xen/events.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index bb65f75..94daed1 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -515,6 +515,9 @@ static void xen_free_irq(unsigned irq)
{
struct irq_info *info = irq_get_handler_data(irq);
+ if (WARN_ON(!info))
+ return;
+
list_del(&info->list);
irq_set_handler_data(irq, NULL);
@@ -1003,6 +1006,9 @@ static void unbind_from_irq(unsigned int irq)
int evtchn = evtchn_from_irq(irq);
struct irq_info *info = irq_get_handler_data(irq);
+ if (WARN_ON(!info))
+ return;
+
mutex_lock(&irq_mapping_update_lock);
if (info->refcnt > 0) {
@@ -1130,6 +1136,10 @@ int bind_ipi_to_irqhandler(enum ipi_vector ipi,
void unbind_from_irqhandler(unsigned int irq, void *dev_id)
{
+ struct irq_info *info = irq_get_handler_data(irq);
+
+ if (WARN_ON(!info))
+ return;
free_irq(irq, dev_id);
unbind_from_irq(irq);
}
@@ -1441,6 +1451,9 @@ void rebind_evtchn_irq(int evtchn, int irq)
{
struct irq_info *info = info_for_irq(irq);
+ if (WARN_ON(!info))
+ return;
+
/* Make sure the irq is masked, since the new event channel
will also be masked. */
disable_irq(irq);
@@ -1714,7 +1727,12 @@ void xen_poll_irq(int irq)
int xen_test_irq_shared(int irq)
{
struct irq_info *info = info_for_irq(irq);
- struct physdev_irq_status_query irq_status = { .irq = info->u.pirq.pirq };
+ struct physdev_irq_status_query irq_status;
+
+ if (WARN_ON(!info))
+ return -ENOENT;
+
+ irq_status.irq = info->u.pirq.pirq;
if (HYPERVISOR_physdev_op(PHYSDEVOP_irq_status_query, &irq_status))
return 0;
--
1.8.1.4
On Tue, 16 Apr 2013, Konrad Rzeszutek Wilk wrote:
> While we don't use the spinlock interrupt line (see for details
> commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23 -
> xen: disable PV spinlocks on HVM) - we should still do the proper
> init / deinit sequence. We did not do that correctly and for the
> CPU init for PVHVM guest we would allocate an interrupt line - but
> failed to deallocate the old interrupt line.
>
> This resulted in leakage of an irq_desc but more importantly this splat
> as we online an offlined CPU:
>
> genirq: Flags mismatch irq 71. 0002cc20 (spinlock1) vs. 0002cc20 (spinlock1)
> Pid: 2542, comm: init.late Not tainted 3.9.0-rc6upstream #1
> Call Trace:
> [<ffffffff811156de>] __setup_irq+0x23e/0x4a0
> [<ffffffff81194191>] ? kmem_cache_alloc_trace+0x221/0x250
> [<ffffffff811161bb>] request_threaded_irq+0xfb/0x160
> [<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20
> [<ffffffff813a8423>] bind_ipi_to_irqhandler+0xa3/0x160
> [<ffffffff81303758>] ? kasprintf+0x38/0x40
> [<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20
> [<ffffffff810cad35>] ? update_max_interval+0x15/0x40
> [<ffffffff816605db>] xen_init_lock_cpu+0x3c/0x78
> [<ffffffff81660029>] xen_hvm_cpu_notify+0x29/0x33
> [<ffffffff81676bdd>] notifier_call_chain+0x4d/0x70
> [<ffffffff810bb2a9>] __raw_notifier_call_chain+0x9/0x10
> [<ffffffff8109402b>] __cpu_notify+0x1b/0x30
> [<ffffffff8166834a>] _cpu_up+0xa0/0x14b
> [<ffffffff816684ce>] cpu_up+0xd9/0xec
> [<ffffffff8165f754>] store_online+0x94/0xd0
> [<ffffffff8141d15b>] dev_attr_store+0x1b/0x20
> [<ffffffff81218f44>] sysfs_write_file+0xf4/0x170
> [<ffffffff811a2864>] vfs_write+0xb4/0x130
> [<ffffffff811a302a>] sys_write+0x5a/0xa0
> [<ffffffff8167ada9>] system_call_fastpath+0x16/0x1b
> cpu 1 spinlock event irq -16
> smpboot: Booting Node 0 Processor 1 APIC 0x2
>
> And if one looks at the /proc/interrupts right after
> offlining (CPU1):
>
> 70: 0 0 xen-percpu-ipi spinlock0
> 71: 0 0 xen-percpu-ipi spinlock1
> 77: 0 0 xen-percpu-ipi spinlock2
>
> There is the oddity of the 'spinlock1' still being present.
>
> CC: [email protected]
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
> arch/x86/xen/smp.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index f80e69c..22c800a 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -662,6 +662,7 @@ static void xen_hvm_cpu_die(unsigned int cpu)
> unbind_from_irqhandler(per_cpu(xen_debug_irq, cpu), NULL);
> unbind_from_irqhandler(per_cpu(xen_callfuncsingle_irq, cpu), NULL);
> unbind_from_irqhandler(per_cpu(xen_irq_work, cpu), NULL);
> + xen_uninit_lock_cpu(cpu);
> xen_teardown_timer(cpu);
> native_cpu_die(cpu);
> }
> --
> 1.8.1.4
>
On Tue, 16 Apr 2013, Konrad Rzeszutek Wilk wrote:
> In the PVHVM path when we do CPU online/offline path we would
> leak the timer%d IRQ line everytime we do a offline event. The
> online path (xen_hvm_setup_cpu_clockevents via
> x86_cpuinit.setup_percpu_clockev) would allocate a new interrupt
> line for the timer%d.
>
> But we would still use the old interrupt line leading to:
>
> kernel BUG at /home/konrad/ssd/konrad/linux/kernel/hrtimer.c:1261!
> invalid opcode: 0000 [#1] SMP
> RIP: 0010:[<ffffffff810b9e21>] [<ffffffff810b9e21>] hrtimer_interrupt+0x261/0x270
> .. snip..
> <IRQ>
> [<ffffffff810445ef>] xen_timer_interrupt+0x2f/0x1b0
> [<ffffffff81104825>] ? stop_machine_cpu_stop+0xb5/0xf0
> [<ffffffff8111434c>] handle_irq_event_percpu+0x7c/0x240
> [<ffffffff811175b9>] handle_percpu_irq+0x49/0x70
> [<ffffffff813a74a3>] __xen_evtchn_do_upcall+0x1c3/0x2f0
> [<ffffffff813a760a>] xen_evtchn_do_upcall+0x2a/0x40
> [<ffffffff8167c26d>] xen_hvm_callback_vector+0x6d/0x80
> <EOI>
> [<ffffffff81666d01>] ? start_secondary+0x193/0x1a8
> [<ffffffff81666cfd>] ? start_secondary+0x18f/0x1a8
>
> There is also the oddity (timer1) in the /proc/interrupts after
> offlining CPU1:
>
> 64: 1121 0 xen-percpu-virq timer0
> 78: 0 0 xen-percpu-virq timer1
> 84: 0 2483 xen-percpu-virq timer2
>
> This patch fixes it.
>
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> CC: [email protected]
Acked-by: Stefano Stabellini <[email protected]>
> arch/x86/xen/smp.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 09ea61d..f80e69c 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -662,6 +662,7 @@ static void xen_hvm_cpu_die(unsigned int cpu)
> unbind_from_irqhandler(per_cpu(xen_debug_irq, cpu), NULL);
> unbind_from_irqhandler(per_cpu(xen_callfuncsingle_irq, cpu), NULL);
> unbind_from_irqhandler(per_cpu(xen_irq_work, cpu), NULL);
> + xen_teardown_timer(cpu);
> native_cpu_die(cpu);
> }
>
> --
> 1.8.1.4
>
On Tue, 16 Apr 2013, Konrad Rzeszutek Wilk wrote:
> When we online the CPU, we get this splat:
>
> smpboot: Booting Node 0 Processor 1 APIC 0x2
> installing Xen timer for CPU 1
> BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179
> in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
> Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1
> Call Trace:
> [<ffffffff810c1fea>] __might_sleep+0xda/0x100
> [<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0
> [<ffffffff81303758>] ? kasprintf+0x38/0x40
> [<ffffffff813036eb>] kvasprintf+0x5b/0x90
> [<ffffffff81303758>] kasprintf+0x38/0x40
> [<ffffffff81044510>] xen_setup_timer+0x30/0xb0
> [<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30
> [<ffffffff81666d0a>] start_secondary+0x19c/0x1a8
>
> The solution to that is use kasprintf in the CPU hotplug path
> that 'online's the CPU. That is, do it in in xen_hvm_cpu_notify,
> and remove the call to in xen_hvm_setup_cpu_clockevents.
>
> Unfortunatly the later is not a good idea as the bootup path
> does not use xen_hvm_cpu_notify so we would end up never allocating
> timer%d interrupt lines when booting. As such add the check for
> atomic() to continue.
This last is not reflected in the code.
Also, is it actually OK to move xen_setup_timer out of
xen_hvm_setup_cpu_clockevents?
xen_setup_cpu_clockevents registers xen_clock_events as clocksource and
xen_clock_events is setup by xen_setup_timer so we need to make sure
that the call order remains the same.
> CC: [email protected]
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> ---
> arch/x86/xen/enlighten.c | 5 ++++-
> arch/x86/xen/time.c | 6 +++++-
> 2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index 47d3243..ddbd54a 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1641,8 +1641,11 @@ static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
> switch (action) {
> case CPU_UP_PREPARE:
> xen_vcpu_setup(cpu);
> - if (xen_have_vector_callback)
> + if (xen_have_vector_callback) {
> xen_init_lock_cpu(cpu);
> + if (xen_feature(XENFEAT_hvm_safe_pvclock))
> + xen_setup_timer(cpu);
> + }
> break;
> default:
> break;
> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> index 0296a95..054cc01 100644
> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -497,7 +497,11 @@ static void xen_hvm_setup_cpu_clockevents(void)
> {
> int cpu = smp_processor_id();
> xen_setup_runstate_info(cpu);
> - xen_setup_timer(cpu);
> + /*
> + * xen_setup_timer(cpu) - snprintf is bad in atomic context. Hence
> + * doing it xen_hvm_cpu_notify (which gets called by smp_init during
> + * early bootup and also during CPU hotplug events).
> + */
> xen_setup_cpu_clockevents();
> }
>
> --
> 1.8.1.4
>
On Tue, 16 Apr 2013, Konrad Rzeszutek Wilk wrote:
> We naively assume that the IRQ value passed in is correct.
> If it is not, then any dereference operation for the 'info'
> structure will result in crash - so might as well guard ourselves
> and sprinkle copious amounts of WARN_ON.
>
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
> drivers/xen/events.c | 20 +++++++++++++++++++-
> 1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/xen/events.c b/drivers/xen/events.c
> index bb65f75..94daed1 100644
> --- a/drivers/xen/events.c
> +++ b/drivers/xen/events.c
> @@ -515,6 +515,9 @@ static void xen_free_irq(unsigned irq)
> {
> struct irq_info *info = irq_get_handler_data(irq);
>
> + if (WARN_ON(!info))
> + return;
> +
> list_del(&info->list);
>
> irq_set_handler_data(irq, NULL);
> @@ -1003,6 +1006,9 @@ static void unbind_from_irq(unsigned int irq)
> int evtchn = evtchn_from_irq(irq);
> struct irq_info *info = irq_get_handler_data(irq);
>
> + if (WARN_ON(!info))
> + return;
> +
> mutex_lock(&irq_mapping_update_lock);
>
> if (info->refcnt > 0) {
> @@ -1130,6 +1136,10 @@ int bind_ipi_to_irqhandler(enum ipi_vector ipi,
>
> void unbind_from_irqhandler(unsigned int irq, void *dev_id)
> {
> + struct irq_info *info = irq_get_handler_data(irq);
> +
> + if (WARN_ON(!info))
> + return;
> free_irq(irq, dev_id);
> unbind_from_irq(irq);
> }
> @@ -1441,6 +1451,9 @@ void rebind_evtchn_irq(int evtchn, int irq)
> {
> struct irq_info *info = info_for_irq(irq);
>
> + if (WARN_ON(!info))
> + return;
> +
> /* Make sure the irq is masked, since the new event channel
> will also be masked. */
> disable_irq(irq);
> @@ -1714,7 +1727,12 @@ void xen_poll_irq(int irq)
> int xen_test_irq_shared(int irq)
> {
> struct irq_info *info = info_for_irq(irq);
> - struct physdev_irq_status_query irq_status = { .irq = info->u.pirq.pirq };
> + struct physdev_irq_status_query irq_status;
> +
> + if (WARN_ON(!info))
> + return -ENOENT;
> +
> + irq_status.irq = info->u.pirq.pirq;
>
> if (HYPERVISOR_physdev_op(PHYSDEVOP_irq_status_query, &irq_status))
> return 0;
> --
> 1.8.1.4
>
On Tue, 16 Apr 2013, Konrad Rzeszutek Wilk wrote:
> If the timer interrupt has been de-init or is just now being
> initialized, the default value of -1 should be preset as
> interrupt line. Check for that and if something is odd
> WARN us.
>
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
> arch/x86/xen/time.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> index 054cc01..3d88bfd 100644
> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -377,7 +377,7 @@ static const struct clock_event_device xen_vcpuop_clockevent = {
>
> static const struct clock_event_device *xen_clockevent =
> &xen_timerop_clockevent;
> -static DEFINE_PER_CPU(struct clock_event_device, xen_clock_events);
> +static DEFINE_PER_CPU(struct clock_event_device, xen_clock_events) = { .irq = -1 };
>
> static irqreturn_t xen_timer_interrupt(int irq, void *dev_id)
> {
> @@ -401,6 +401,9 @@ void xen_setup_timer(int cpu)
> struct clock_event_device *evt;
> int irq;
>
> + evt = &per_cpu(xen_clock_events, cpu);
> + WARN(evt->irq >= 0, "IRQ%d for CPU%d is already allocated\n", evt->irq, cpu);
> +
> printk(KERN_INFO "installing Xen timer for CPU %d\n", cpu);
>
> name = kasprintf(GFP_KERNEL, "timer%d", cpu);
> @@ -413,7 +416,6 @@ void xen_setup_timer(int cpu)
> IRQF_FORCE_RESUME,
> name, NULL);
>
> - evt = &per_cpu(xen_clock_events, cpu);
> memcpy(evt, xen_clockevent, sizeof(*evt));
>
> evt->cpumask = cpumask_of(cpu);
> @@ -426,6 +428,7 @@ void xen_teardown_timer(int cpu)
> BUG_ON(cpu == 0);
> evt = &per_cpu(xen_clock_events, cpu);
> unbind_from_irqhandler(evt->irq, NULL);
> + evt->irq = -1;
> }
>
> void xen_setup_cpu_clockevents(void)
> --
> 1.8.1.4
>
On Tue, 16 Apr 2013, Konrad Rzeszutek Wilk wrote:
> The default (uninitialized) value of the IRQ line is -1.
> Check if we already have allocated an spinlock interrupt line
> and if somebody is trying to do it again. Also set it to -1
> when we offline the CPU.
>
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> ---
> arch/x86/xen/spinlock.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index f7a080e..47ae032 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -364,6 +364,9 @@ void __cpuinit xen_init_lock_cpu(int cpu)
> int irq;
> const char *name;
>
> + WARN(per_cpu(lock_kicker_irq, cpu) > 0, "spinlock on CPU%d exists on IRQ%d!\n",
shouldn't this be >= ^
> + cpu, per_cpu(lock_kicker_irq, cpu));
>
> name = kasprintf(GFP_KERNEL, "spinlock%d", cpu);
> irq = bind_ipi_to_irqhandler(XEN_SPIN_UNLOCK_VECTOR,
> cpu,
> @@ -383,6 +386,7 @@ void __cpuinit xen_init_lock_cpu(int cpu)
> void xen_uninit_lock_cpu(int cpu)
> {
> unbind_from_irqhandler(per_cpu(lock_kicker_irq, cpu), NULL);
> + per_cpu(lock_kicker_irq, cpu) = -1;
> }
>
> void __init xen_init_spinlocks(void)
> --
> 1.8.1.4
>
On Tue, 16 Apr 2013, Konrad Rzeszutek Wilk wrote:
> See git commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23
> (xen: disable PV spinlocks on HVM) for details.
>
> But we did not disable it everywhere - which means that when
> we boot as PVHVM we end up allocating per-CPU irq line for
> spinlock. This fixes that.
>
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Is there any point in calling xen_init_lock_cpu in PVHVM guests?
At that point we might as well remove the call from xen_hvm_cpu_notify.
> arch/x86/xen/spinlock.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index 47ae032..8b54603 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -367,6 +367,13 @@ void __cpuinit xen_init_lock_cpu(int cpu)
> WARN(per_cpu(lock_kicker_irq, cpu) > 0, "spinlock on CPU%d exists on IRQ%d!\n",
> cpu, per_cpu(lock_kicker_irq, cpu));
>
> + /*
> + * See git commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23
> + * (xen: disable PV spinlocks on HVM)
> + */
> + if (xen_hvm_domain())
> + return;
> +
> name = kasprintf(GFP_KERNEL, "spinlock%d", cpu);
> irq = bind_ipi_to_irqhandler(XEN_SPIN_UNLOCK_VECTOR,
> cpu,
> @@ -385,12 +392,26 @@ void __cpuinit xen_init_lock_cpu(int cpu)
>
> void xen_uninit_lock_cpu(int cpu)
> {
> + /*
> + * See git commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23
> + * (xen: disable PV spinlocks on HVM)
> + */
> + if (xen_hvm_domain())
> + return;
> +
> unbind_from_irqhandler(per_cpu(lock_kicker_irq, cpu), NULL);
> per_cpu(lock_kicker_irq, cpu) = -1;
> }
>
> void __init xen_init_spinlocks(void)
> {
> + /*
> + * See git commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23
> + * (xen: disable PV spinlocks on HVM)
> + */
> + if (xen_hvm_domain())
> + return;
> +
> BUILD_BUG_ON(sizeof(struct xen_spinlock) > sizeof(arch_spinlock_t));
>
> pv_lock_ops.spin_is_locked = xen_spin_is_locked;
> --
> 1.8.1.4
>
On Tue, 16 Apr 2013, Konrad Rzeszutek Wilk wrote:
> There is no need to use the PV version of the IRQ_WORKER mechanism
> as under PVHVM we are using the native version. The native
> version is using the SMP API.
>
> They just sit around unused:
>
> 69: 0 0 xen-percpu-ipi irqwork0
> 83: 0 0 xen-percpu-ipi irqwork1
>
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Might be worth trying to make it work instead?
Is it just because we don't set the apic->send_IPI_* functions to the
xen specific version on PVHVM?
> arch/x86/xen/smp.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 22c800a..415694c 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -144,6 +144,13 @@ static int xen_smp_intr_init(unsigned int cpu)
> goto fail;
> per_cpu(xen_callfuncsingle_irq, cpu) = rc;
>
> + /*
> + * The IRQ worker on PVHVM goes through the native path and uses the
> + * IPI mechanism.
> + */
> + if (xen_hvm_domain())
> + return 0;
> +
> callfunc_name = kasprintf(GFP_KERNEL, "irqwork%d", cpu);
> rc = bind_ipi_to_irqhandler(XEN_IRQ_WORK_VECTOR,
> cpu,
> @@ -167,6 +174,9 @@ static int xen_smp_intr_init(unsigned int cpu)
> if (per_cpu(xen_callfuncsingle_irq, cpu) >= 0)
> unbind_from_irqhandler(per_cpu(xen_callfuncsingle_irq, cpu),
> NULL);
> + if (xen_hvm_domain())
> + return rc;
> +
> if (per_cpu(xen_irq_work, cpu) >= 0)
> unbind_from_irqhandler(per_cpu(xen_irq_work, cpu), NULL);
>
> @@ -661,7 +671,8 @@ static void xen_hvm_cpu_die(unsigned int cpu)
> unbind_from_irqhandler(per_cpu(xen_callfunc_irq, cpu), NULL);
> unbind_from_irqhandler(per_cpu(xen_debug_irq, cpu), NULL);
> unbind_from_irqhandler(per_cpu(xen_callfuncsingle_irq, cpu), NULL);
> - unbind_from_irqhandler(per_cpu(xen_irq_work, cpu), NULL);
> + if (!xen_hvm_domain())
> + unbind_from_irqhandler(per_cpu(xen_irq_work, cpu), NULL);
> xen_uninit_lock_cpu(cpu);
> xen_teardown_timer(cpu);
> native_cpu_die(cpu);
> --
> 1.8.1.4
>
On Fri, Apr 26, 2013 at 05:27:20PM +0100, Stefano Stabellini wrote:
> On Tue, 16 Apr 2013, Konrad Rzeszutek Wilk wrote:
> > There is no need to use the PV version of the IRQ_WORKER mechanism
> > as under PVHVM we are using the native version. The native
> > version is using the SMP API.
> >
> > They just sit around unused:
> >
> > 69: 0 0 xen-percpu-ipi irqwork0
> > 83: 0 0 xen-percpu-ipi irqwork1
> >
> > Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
>
> Might be worth trying to make it work instead?
> Is it just because we don't set the apic->send_IPI_* functions to the
> xen specific version on PVHVM?
>
Right. We use the baremetal mechanism to do it. And it works fine.
>
> > [patch quoted in full above; snipped]
On Fri, Apr 26, 2013 at 05:20:23PM +0100, Stefano Stabellini wrote:
> On Tue, 16 Apr 2013, Konrad Rzeszutek Wilk wrote:
> > See git commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23
> > (xen: disable PV spinlocks on HVM) for details.
> >
> > But we did not disable it everywhere - which means that when
> > we boot as PVHVM we end up allocating per-CPU irq line for
> > spinlock. This fixes that.
> >
> > Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
>
> Is there any point in calling xen_init_lock_cpu in PVHVM guests?
I was thinking about it.. but I still have hope that I will be able to
take Jeremy's patches for paravirt locking and redo them a bit.
> At that point we might as well remove the call from xen_hvm_cpu_notify.
>
>
> > [patch quoted in full above; snipped]
On Fri, Apr 26, 2013 at 05:18:01PM +0100, Stefano Stabellini wrote:
> On Tue, 16 Apr 2013, Konrad Rzeszutek Wilk wrote:
> > The default (uninitialized) value of the IRQ line is -1.
> > Check whether we have already allocated a spinlock interrupt line
> > and warn if somebody tries to do it again. Also set it back to -1
> > when we offline the CPU.
> >
> > Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> > ---
> > arch/x86/xen/spinlock.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> > index f7a080e..47ae032 100644
> > --- a/arch/x86/xen/spinlock.c
> > +++ b/arch/x86/xen/spinlock.c
> > @@ -364,6 +364,9 @@ void __cpuinit xen_init_lock_cpu(int cpu)
> > int irq;
> > const char *name;
> >
> > + WARN(per_cpu(lock_kicker_irq, cpu) > 0, "spinlock on CPU%d exists on IRQ%d!\n",
> shouldn't this be >= ^
>
Yes. Thanks for catching.
>
> > + cpu, per_cpu(lock_kicker_irq, cpu));
> >
> > name = kasprintf(GFP_KERNEL, "spinlock%d", cpu);
> > irq = bind_ipi_to_irqhandler(XEN_SPIN_UNLOCK_VECTOR,
> > cpu,
> > @@ -383,6 +386,7 @@ void __cpuinit xen_init_lock_cpu(int cpu)
> > void xen_uninit_lock_cpu(int cpu)
> > {
> > unbind_from_irqhandler(per_cpu(lock_kicker_irq, cpu), NULL);
> > + per_cpu(lock_kicker_irq, cpu) = -1;
> > }
> >
> > void __init xen_init_spinlocks(void)
> > --
> > 1.8.1.4
> >
On Fri, Apr 26, 2013 at 05:11:35PM +0100, Stefano Stabellini wrote:
> On Tue, 16 Apr 2013, Konrad Rzeszutek Wilk wrote:
> > When we online the CPU, we get this splat:
> >
> > smpboot: Booting Node 0 Processor 1 APIC 0x2
> > installing Xen timer for CPU 1
> > BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179
> > in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
> > Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1
> > Call Trace:
> > [<ffffffff810c1fea>] __might_sleep+0xda/0x100
> > [<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0
> > [<ffffffff81303758>] ? kasprintf+0x38/0x40
> > [<ffffffff813036eb>] kvasprintf+0x5b/0x90
> > [<ffffffff81303758>] kasprintf+0x38/0x40
> > [<ffffffff81044510>] xen_setup_timer+0x30/0xb0
> > [<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30
> > [<ffffffff81666d0a>] start_secondary+0x19c/0x1a8
> >
> > The solution is to call kasprintf in the CPU hotplug path
> > that onlines the CPU. That is, do it in xen_hvm_cpu_notify
> > and remove the call from xen_hvm_setup_cpu_clockevents.
> >
> > Unfortunately the latter alone is not a good idea, as the bootup path
> > does not use xen_hvm_cpu_notify, so we would end up never allocating
> > the timer%d interrupt lines when booting. As such, add the check for
> > in_atomic() to continue.
>
> This last is not reflected in the code.
I found out that it was not needed.
>
> Also, is it actually OK to move xen_setup_timer out of
> xen_hvm_setup_cpu_clockevents?
Yes. It ends up being called earlier - in the notifier.
>
> xen_setup_cpu_clockevents registers xen_clock_events as clocksource and
> xen_clock_events is setup by xen_setup_timer so we need to make sure
> that the call order remains the same.
The order is still the same.
>
>
> > CC: [email protected]
> > Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> > ---
> > arch/x86/xen/enlighten.c | 5 ++++-
> > arch/x86/xen/time.c | 6 +++++-
> > 2 files changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> > index 47d3243..ddbd54a 100644
> > --- a/arch/x86/xen/enlighten.c
> > +++ b/arch/x86/xen/enlighten.c
> > @@ -1641,8 +1641,11 @@ static int __cpuinit xen_hvm_cpu_notify(struct notifier_block *self,
> > switch (action) {
> > case CPU_UP_PREPARE:
> > xen_vcpu_setup(cpu);
> > - if (xen_have_vector_callback)
> > + if (xen_have_vector_callback) {
> > xen_init_lock_cpu(cpu);
> > + if (xen_feature(XENFEAT_hvm_safe_pvclock))
> > + xen_setup_timer(cpu);
> > + }
> > break;
> > default:
> > break;
> > diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
> > index 0296a95..054cc01 100644
> > --- a/arch/x86/xen/time.c
> > +++ b/arch/x86/xen/time.c
> > @@ -497,7 +497,11 @@ static void xen_hvm_setup_cpu_clockevents(void)
> > {
> > int cpu = smp_processor_id();
> > xen_setup_runstate_info(cpu);
> > - xen_setup_timer(cpu);
> > + /*
> > + * xen_setup_timer(cpu) - kasprintf is bad in atomic context. Hence
> > + * we do it in xen_hvm_cpu_notify (which gets called by smp_init during
> > + * early bootup and also during CPU hotplug events).
> > + */
> > xen_setup_cpu_clockevents();
> > }
> >
> > --
> > 1.8.1.4
> >
On Mon, 29 Apr 2013, Konrad Rzeszutek Wilk wrote:
> On Fri, Apr 26, 2013 at 05:27:20PM +0100, Stefano Stabellini wrote:
> > On Tue, 16 Apr 2013, Konrad Rzeszutek Wilk wrote:
> > > There is no need to use the PV version of the IRQ_WORKER mechanism
> > > as under PVHVM we are using the native version. The native
> > > version is using the SMP API.
> > >
> > > They just sit around unused:
> > >
> > > 69: 0 0 xen-percpu-ipi irqwork0
> > > 83: 0 0 xen-percpu-ipi irqwork1
> > >
> > > Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> >
> > Might be worth trying to make it work instead?
> > Is it just because we don't set the apic->send_IPI_* functions to the
> > xen specific version on PVHVM?
> >
>
> Right. We use the baremetal mechanism to do it. And it works fine.
OK, it works fine, but won't it generate many more trap-and-emulate
cycles?
> > > [patch quoted in full above; snipped]
>
On Wed, May 01, 2013 at 02:25:16PM +0100, Stefano Stabellini wrote:
> On Mon, 29 Apr 2013, Konrad Rzeszutek Wilk wrote:
> > On Fri, Apr 26, 2013 at 05:27:20PM +0100, Stefano Stabellini wrote:
> > > On Tue, 16 Apr 2013, Konrad Rzeszutek Wilk wrote:
> > > > There is no need to use the PV version of the IRQ_WORKER mechanism
> > > > as under PVHVM we are using the native version. The native
> > > > version is using the SMP API.
> > > >
> > > > They just sit around unused:
> > > >
> > > > 69: 0 0 xen-percpu-ipi irqwork0
> > > > 83: 0 0 xen-percpu-ipi irqwork1
> > > >
> > > > Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> > >
> > > Might be worth trying to make it work instead?
> > > Is it just because we don't set the apic->send_IPI_* functions to the
> > > xen specific version on PVHVM?
> > >
> >
> > Right. We use the baremetal mechanism to do it. And it works fine.
>
> OK, it works fine, but won't it generate many more trap-and-emulate
> cycles?
No idea. We can certainly make use of the PV IPI mechanism for the IRQ_WORKER
type of mechanism, but I would have to play with xentrace to get a good handle
on what is involved (and how the posted-interrupt feature affects this).
Right now that is something I can't do (buried in bugs).
On Wed, 1 May 2013, Konrad Rzeszutek Wilk wrote:
> On Wed, May 01, 2013 at 02:25:16PM +0100, Stefano Stabellini wrote:
> > On Mon, 29 Apr 2013, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Apr 26, 2013 at 05:27:20PM +0100, Stefano Stabellini wrote:
> > > > On Tue, 16 Apr 2013, Konrad Rzeszutek Wilk wrote:
> > > > > [commit message quoted in full above; snipped]
> > > >
> > > > Might be worth trying to make it work instead?
> > > > Is it just because we don't set the apic->send_IPI_* functions to the
> > > > xen specific version on PVHVM?
> > > >
> > >
> > > Right. We use the baremetal mechanism to do it. And it works fine.
> >
> > OK, it works fine, but won't it generate many more trap-and-emulate
> > cycles?
>
> No idea. We can certainly make use of the PV IPI mechanism for the IRQ_WORKER
> type of mechanism, but I would have to play with xentrace to get a good handle
> on what is involved (and how the posted-interrupt feature affects this).
>
> Right now that is something I can't do (buried in bugs).
OK