2012-10-29 20:49:19

by Steven Rostedt

Subject: [PATCH 04/32] x86: New cpuset nohz irq vector

From: Frederic Weisbecker <[email protected]>

We need a way to send an IPI (remote or local) in order to
asynchronously restart the tick for CPUs in nohz adaptive mode.

This must be asynchronous so that we can trigger it with irqs
disabled. It must also be usable as a self-IPI, for example in
cases where restarting the tick inline could otherwise lead to
random deadlock scenarios.

This only sets up the x86 backend. The core tick restart function
will be defined in a later patch.

[CHECKME: Perhaps we instead need to use irq work for self IPIs.
But we also need a way to send async remote IPIs.]

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Alessio Igor Bogani <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Avi Kivity <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Cc: Geoff Levand <[email protected]>
Cc: Gilad Ben Yossef <[email protected]>
Cc: Hakan Akkan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Max Krasnyansky <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Sven-Thorsten Dietrich <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
arch/x86/include/asm/entry_arch.h | 3 +++
arch/x86/include/asm/hw_irq.h | 7 +++++++
arch/x86/include/asm/irq_vectors.h | 2 ++
arch/x86/include/asm/smp.h | 11 ++++++++++-
arch/x86/kernel/entry_64.S | 4 ++++
arch/x86/kernel/irqinit.c | 4 ++++
arch/x86/kernel/smp.c | 24 ++++++++++++++++++++++++
7 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
index 40afa00..7e8c38c 100644
--- a/arch/x86/include/asm/entry_arch.h
+++ b/arch/x86/include/asm/entry_arch.h
@@ -10,6 +10,9 @@
* through the ICC by us (IPIs)
*/
#ifdef CONFIG_SMP
+#ifdef CONFIG_CPUSETS_NO_HZ
+BUILD_INTERRUPT(cpuset_update_nohz_interrupt,CPUSET_UPDATE_NOHZ_VECTOR)
+#endif
BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR)
BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR)
BUILD_INTERRUPT(call_function_single_interrupt,CALL_FUNCTION_SINGLE_VECTOR)
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index eb92a6e..0d26ed7 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -35,6 +35,10 @@ extern void spurious_interrupt(void);
extern void thermal_interrupt(void);
extern void reschedule_interrupt(void);

+#ifdef CONFIG_CPUSETS_NO_HZ
+extern void cpuset_update_nohz_interrupt(void);
+#endif
+
extern void invalidate_interrupt(void);
extern void invalidate_interrupt0(void);
extern void invalidate_interrupt1(void);
@@ -152,6 +156,9 @@ extern asmlinkage void smp_irq_move_cleanup_interrupt(void);
#endif
#ifdef CONFIG_SMP
extern void smp_reschedule_interrupt(struct pt_regs *);
+#ifdef CONFIG_CPUSETS_NO_HZ
+extern void smp_cpuset_update_nohz_interrupt(struct pt_regs *);
+#endif
extern void smp_call_function_interrupt(struct pt_regs *);
extern void smp_call_function_single_interrupt(struct pt_regs *);
#ifdef CONFIG_X86_32
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 1508e51..f54dea8 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -112,6 +112,8 @@
/* Xen vector callback to receive events in a HVM domain */
#define XEN_HVM_EVTCHN_CALLBACK 0xf3

+#define CPUSET_UPDATE_NOHZ_VECTOR 0xf2
+
/*
* Local APIC timer IRQ vector is on a different priority level,
* to work around the 'lost local interrupt if more than 2 IRQ
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 4f19a15..2c30bbd 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -71,7 +71,9 @@ struct smp_ops {

void (*stop_other_cpus)(int wait);
void (*smp_send_reschedule)(int cpu);
-
+#ifdef CONFIG_CPUSETS_NO_HZ
+ void (*smp_cpuset_update_nohz)(int cpu);
+#endif
int (*cpu_up)(unsigned cpu, struct task_struct *tidle);
int (*cpu_disable)(void);
void (*cpu_die)(unsigned int cpu);
@@ -140,6 +142,13 @@ static inline void smp_send_reschedule(int cpu)
smp_ops.smp_send_reschedule(cpu);
}

+static inline void smp_cpuset_update_nohz(int cpu)
+{
+#ifdef CONFIG_CPUSETS_NO_HZ
+ smp_ops.smp_cpuset_update_nohz(cpu);
+#endif
+}
+
static inline void arch_send_call_function_single_ipi(int cpu)
{
smp_ops.send_call_func_single_ipi(cpu);
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index b51b2c7..6d5b77d 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1173,6 +1173,10 @@ apicinterrupt CALL_FUNCTION_VECTOR \
call_function_interrupt smp_call_function_interrupt
apicinterrupt RESCHEDULE_VECTOR \
reschedule_interrupt smp_reschedule_interrupt
+#ifdef CONFIG_CPUSETS_NO_HZ
+apicinterrupt CPUSET_UPDATE_NOHZ_VECTOR \
+ cpuset_update_nohz_interrupt smp_cpuset_update_nohz_interrupt
+#endif
#endif

apicinterrupt ERROR_APIC_VECTOR \
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index 6e03b0d..394e9ec 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -171,6 +171,10 @@ static void __init smp_intr_init(void)
*/
alloc_intr_gate(RESCHEDULE_VECTOR, reschedule_interrupt);

+#ifdef CONFIG_CPUSETS_NO_HZ
+ alloc_intr_gate(CPUSET_UPDATE_NOHZ_VECTOR, cpuset_update_nohz_interrupt);
+#endif
+
/* IPI for generic function call */
alloc_intr_gate(CALL_FUNCTION_VECTOR, call_function_interrupt);

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 48d2b7d..4c0b7d2 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -126,6 +126,17 @@ static void native_smp_send_reschedule(int cpu)
apic->send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR);
}

+#ifdef CONFIG_CPUSETS_NO_HZ
+static void native_smp_cpuset_update_nohz(int cpu)
+{
+ if (unlikely(cpu_is_offline(cpu))) {
+ WARN_ON(1);
+ return;
+ }
+ apic->send_IPI_mask(cpumask_of(cpu), CPUSET_UPDATE_NOHZ_VECTOR);
+}
+#endif
+
void native_send_call_func_single_ipi(int cpu)
{
apic->send_IPI_mask(cpumask_of(cpu), CALL_FUNCTION_SINGLE_VECTOR);
@@ -259,6 +270,16 @@ void smp_reschedule_interrupt(struct pt_regs *regs)
*/
}

+#ifdef CONFIG_CPUSETS_NO_HZ
+void smp_cpuset_update_nohz_interrupt(struct pt_regs *regs)
+{
+ ack_APIC_irq();
+ irq_enter();
+ inc_irq_stat(irq_call_count);
+ irq_exit();
+}
+#endif
+
void smp_call_function_interrupt(struct pt_regs *regs)
{
ack_APIC_irq();
@@ -292,6 +313,9 @@ struct smp_ops smp_ops = {

.stop_other_cpus = native_stop_other_cpus,
.smp_send_reschedule = native_smp_send_reschedule,
+#ifdef CONFIG_CPUSETS_NO_HZ
+ .smp_cpuset_update_nohz = native_smp_cpuset_update_nohz,
+#endif

.cpu_up = native_cpu_up,
.cpu_die = native_cpu_die,
--
1.7.10.4


2012-10-30 17:39:51

by Steven Rostedt

Subject: Re: [PATCH 04/32] x86: New cpuset nohz irq vector

On Mon, 2012-10-29 at 16:27 -0400, Steven Rostedt wrote:
> plain text document attachment
> (0004-x86-New-cpuset-nohz-irq-vector.patch)
> From: Frederic Weisbecker <[email protected]>
>
> We need a way to send an IPI (remote or local) in order to
> asynchronously restart the tick for CPUs in nohz adaptive mode.
>
> This must be asynchronous so that we can trigger it with irqs
> disabled. It must also be usable as a self-IPI, for example in
> cases where restarting the tick inline could otherwise lead to
> random deadlock scenarios.
>
> This only sets up the x86 backend. The core tick restart function
> will be defined in a later patch.
>
> [CHECKME: Perhaps we instead need to use irq work for self IPIs.
> But we also need a way to send async remote IPIs.]

Probably just use irq_work for self ipis, and normal ipis for other
CPUs.

Also, what reason do we have to force a task out of nohz? IOW, do we
really need this?

Also, perhaps we could just tag onto the schedule_ipi() function instead
of having to create a new IPI for all archs?

-- Steve


2012-10-30 23:51:35

by Frederic Weisbecker

Subject: Re: [PATCH 04/32] x86: New cpuset nohz irq vector

2012/10/30 Steven Rostedt <[email protected]>:
> Probably just use irq_work for self ipis, and normal ipis for other
> CPUs.

Right. And that's one more reason why we want to know if the arch
implements irq work with self ipis or not. If the arch can't, then we
just don't stop the tick.

> Also, what reason do we have to force a task out of nohz? IOW, do we
> really need this?

When a posix CPU timer is enqueued, when a new task is enqueued, etc...

>
> Also, perhaps we could just tag onto the schedule_ipi() function instead
> of having to create a new IPI for all archs?

irq work should be just fine. No need to add more overhead on the
schedule ipi I think.

2012-10-31 00:07:58

by Steven Rostedt

Subject: Re: [PATCH 04/32] x86: New cpuset nohz irq vector

On Wed, 2012-10-31 at 00:51 +0100, Frederic Weisbecker wrote:

> > Probably just use irq_work for self ipis, and normal ipis for other
> > CPUs.
>
> Right. And that's one more reason why we want to know if the arch
> implements irq work with self ipis or not. If the arch can't, then we
> just don't stop the tick.

We can just allow certain archs to have cpuset/nohz. Make it depend on
features that you want (or that make nohz easier to implement).

>
> > Also, what reason do we have to force a task out of nohz? IOW, do we
> > really need this?
>
> When a posix CPU timer is enqueued, when a new task is enqueued, etc...

I was thinking about something other than the task itself. That is, who
would enqueue a posix cpu timer on the cpu other than the task running
with nohz on that cpu?

A new task would trigger the schedule IPI too, which would enqueue the
task and take the cpu out of nohz, no?


>
> >
> > Also, perhaps we could just tag onto the schedule_ipi() function instead
> > of having to create a new IPI for all archs?
>
> irq work should be just fine. No need to add more overhead on the
> schedule ipi I think.

irq_work can send the work to another CPU, right? This part I wasn't
sure about.

-- Steve

2012-10-31 00:45:08

by Frederic Weisbecker

Subject: Re: [PATCH 04/32] x86: New cpuset nohz irq vector

2012/10/31 Steven Rostedt <[email protected]>:
> On Wed, 2012-10-31 at 00:51 +0100, Frederic Weisbecker wrote:
>
>> > Probably just use irq_work for self ipis, and normal ipis for other
>> > CPUs.
>>
>> Right. And that's one more reason why we want to know if the arch
>> implements irq work with self ipis or not. If the arch can't, then we
>> just don't stop the tick.
>
> We can just allow certain archs to have cpuset/nohz. Make it depend on
> features that you want (or that make nohz easier to implement).

Right.

>>
>> > Also, what reason do we have to force a task out of nohz? IOW, do we
>> > really need this?
>>
>> When a posix CPU timer is enqueued, when a new task is enqueued, etc...
>
> I was thinking about something other than the task itself. That is, who
> would enqueue a posix cpu timer on the cpu other than the task running
> with nohz on that cpu?

If the posix cpu timer is process-wide (i.e. covers the whole thread group), this can happen.

> A new task would trigger the schedule IPI too, which would enqueue the
> task and take the cpu out of nohz, no?

Not if it's enqueued locally. And in that case we don't want to
restart the tick from the ttwu (try_to_wake_up) path, in order to avoid
funny locking scenarios. So a self-IPI would do the trick.

>> irq work should be just fine. No need to add more overhead on the
>> schedule ipi I think.
>
> irq_work can send the work to another CPU, right? This part I wasn't
> sure about.

"Claiming" a work itself can be a cross CPU competition: multiple CPUs
may want to queue the work at the same time, and only one should succeed.
Once claimed, though, the work can only be enqueued locally.