2014-06-03 14:40:29

by Frederic Weisbecker

[permalink] [raw]
Subject: [GIT PULL] nohz: Move nohz kick out of scheduler IPI, v7

Hi,

This version fixes the following concerns from Peterz:

* Warn _before_ work claim on irq_work_queue_on()
* Warn on in_nmi() while remote queueing
* Only disable preemption (and not irqs) on local queueing

Thanks.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
timers/nohz-irq-work-v5

Thanks,
Frederic
---

Frederic Weisbecker (5):
irq_work: Split raised and lazy lists
irq_work: Shorten a bit irq_work_needs_cpu()
irq_work: Implement remote queueing
nohz: Move full nohz kick to its own IPI
nohz: Use IPI implicit full barrier against rq->nr_running r/w


include/linux/irq_work.h | 2 ++
include/linux/tick.h | 9 +++++-
kernel/irq_work.c | 72 ++++++++++++++++++++++++++++--------------------
kernel/sched/core.c | 14 ++++------
kernel/sched/sched.h | 12 ++++++--
kernel/smp.c | 4 +++
kernel/time/tick-sched.c | 10 ++++---
7 files changed, 77 insertions(+), 46 deletions(-)


2014-06-03 14:40:33

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 1/5] irq_work: Split raised and lazy lists

An irq work can be handled from two places: from the tick if the work
carries the "lazy" flag and the tick is periodic, or from a self IPI.

We currently merge all these works into a single list and use a per-CPU
latch to avoid raising a self-IPI when one is already pending.

We could do away with this ugly latch if the list only contained non-lazy
works: enqueueing a work on an empty list would then be enough to know
whether we need to raise an IPI.

Also we are going to implement remote irq work queueing, which would
force the per-CPU latch to become atomic in the global scope. That's too
bad because, here as well, just enqueueing a work on an empty list of
non-lazy works would be enough to know whether we need to raise an IPI.

So let's take the way out: split the works into two distinct lists, one
for the works that can be handled by the next tick and another one for
those handled by the IPI. Checking whether the latter is empty when we
queue a new work is then enough to know whether we need to raise an IPI.
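
Roughly, the queueing path then boils down to the sketch below (same
names as in the patch): llist_add() returns true when the list was empty
before the insertion, so its return value alone tells us whether an IPI
must be raised:

	if (work->flags & IRQ_WORK_LAZY) {
		/* Lazy work: the next tick handles it, unless the tick is stopped */
		if (llist_add(&work->llnode, &__get_cpu_var(lazy_list)) &&
		    tick_nohz_tick_stopped())
			arch_irq_work_raise();
	} else {
		/* Non-lazy work: the first enqueuer on an empty list raises the IPI */
		if (llist_add(&work->llnode, &__get_cpu_var(raised_list)))
			arch_irq_work_raise();
	}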

Suggested-by: Peter Zijlstra <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/irq_work.c | 53 ++++++++++++++++++++++++-----------------------------
1 file changed, 24 insertions(+), 29 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index a82170e..44ac87c 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -19,8 +19,8 @@
#include <asm/processor.h>


-static DEFINE_PER_CPU(struct llist_head, irq_work_list);
-static DEFINE_PER_CPU(int, irq_work_raised);
+static DEFINE_PER_CPU(struct llist_head, raised_list);
+static DEFINE_PER_CPU(struct llist_head, lazy_list);

/*
* Claim the entry so that no one else will poke at it.
@@ -70,15 +70,14 @@ bool irq_work_queue(struct irq_work *work)
/* Queue the entry and raise the IPI if needed. */
preempt_disable();

- llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
-
- /*
- * If the work is not "lazy" or the tick is stopped, raise the irq
- * work interrupt (if supported by the arch), otherwise, just wait
- * for the next tick.
- */
- if (!(work->flags & IRQ_WORK_LAZY) || tick_nohz_tick_stopped()) {
- if (!this_cpu_cmpxchg(irq_work_raised, 0, 1))
+ /* If the work is "lazy", handle it from next tick if any */
+ if (work->flags & IRQ_WORK_LAZY) {
+ if (llist_add(&work->llnode, &__get_cpu_var(lazy_list)) &&
+ tick_nohz_tick_stopped())
+ arch_irq_work_raise();
+ } else {
+ if (llist_add(&work->llnode, &__get_cpu_var(raised_list)))
arch_irq_work_raise();
}

@@ -90,10 +89,10 @@ EXPORT_SYMBOL_GPL(irq_work_queue);

bool irq_work_needs_cpu(void)
{
- struct llist_head *this_list;
+ struct llist_head *list;

- this_list = &__get_cpu_var(irq_work_list);
- if (llist_empty(this_list))
+ list = &__get_cpu_var(lazy_list);
+ if (llist_empty(list))
return false;

/* All work should have been flushed before going offline */
@@ -102,28 +101,18 @@ bool irq_work_needs_cpu(void)
return true;
}

-static void __irq_work_run(void)
+static void irq_work_run_list(struct llist_head *list)
{
unsigned long flags;
struct irq_work *work;
- struct llist_head *this_list;
struct llist_node *llnode;

-
- /*
- * Reset the "raised" state right before we check the list because
- * an NMI may enqueue after we find the list empty from the runner.
- */
- __this_cpu_write(irq_work_raised, 0);
- barrier();
-
- this_list = &__get_cpu_var(irq_work_list);
- if (llist_empty(this_list))
- return;
-
BUG_ON(!irqs_disabled());

- llnode = llist_del_all(this_list);
+ if (llist_empty(list))
+ return;
+
+ llnode = llist_del_all(list);
while (llnode != NULL) {
work = llist_entry(llnode, struct irq_work, llnode);

@@ -148,6 +137,12 @@ static void __irq_work_run(void)
}
}

+static void __irq_work_run(void)
+{
+ irq_work_run_list(&__get_cpu_var(raised_list));
+ irq_work_run_list(&__get_cpu_var(lazy_list));
+}
+
/*
* Run the irq_work entries on this cpu. Requires to be ran from hardirq
* context with local IRQs disabled.
--
1.8.3.1

2014-06-03 14:40:39

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 4/5] nohz: Move full nohz kick to its own IPI

Now that the irq work subsystem can queue remote callbacks, it's
a perfect fit to safely queue IPIs when interrupts are disabled
without worrying about concurrent callers.

Let's use it for the full dynticks kick to notify a CPU that it's
exiting single task mode.

This unbloats the scheduler IPI a bit; the nohz code was abusing it
for its cool "callable anywhere/anytime" properties.
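
The resulting call chain, roughly sketched (names as in this series):

	tick_nohz_full_kick()				/* local kick */
	  tick_nohz_full_kick_cpu(cpu)			/* local or remote kick */
	    irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu)
	      -> generic SMP call IPI on the target
	        -> irq_work_run()
	          -> nohz_full_kick_work callback -> __tick_nohz_full_check()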

Cc: Andrew Morton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
include/linux/tick.h | 9 ++++++++-
kernel/sched/core.c | 5 +----
kernel/sched/sched.h | 2 +-
kernel/time/tick-sched.c | 10 ++++++----
4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index b84773c..8a4987f 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -181,7 +181,13 @@ static inline bool tick_nohz_full_cpu(int cpu)

extern void tick_nohz_init(void);
extern void __tick_nohz_full_check(void);
-extern void tick_nohz_full_kick(void);
+extern void tick_nohz_full_kick_cpu(int cpu);
+
+static inline void tick_nohz_full_kick(void)
+{
+ tick_nohz_full_kick_cpu(smp_processor_id());
+}
+
extern void tick_nohz_full_kick_all(void);
extern void __tick_nohz_task_switch(struct task_struct *tsk);
#else
@@ -189,6 +195,7 @@ static inline void tick_nohz_init(void) { }
static inline bool tick_nohz_full_enabled(void) { return false; }
static inline bool tick_nohz_full_cpu(int cpu) { return false; }
static inline void __tick_nohz_full_check(void) { }
+static inline void tick_nohz_full_kick_cpu(int cpu) { }
static inline void tick_nohz_full_kick(void) { }
static inline void tick_nohz_full_kick_all(void) { }
static inline void __tick_nohz_task_switch(struct task_struct *tsk) { }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d9d8ece..fb6dfad 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1500,9 +1500,7 @@ void scheduler_ipi(void)
*/
preempt_fold_need_resched();

- if (llist_empty(&this_rq()->wake_list)
- && !tick_nohz_full_cpu(smp_processor_id())
- && !got_nohz_idle_kick())
+ if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick())
return;

/*
@@ -1519,7 +1517,6 @@ void scheduler_ipi(void)
* somewhat pessimize the simple resched case.
*/
irq_enter();
- tick_nohz_full_check();
sched_ttwu_pending();

/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 456e492..6089e00 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1225,7 +1225,7 @@ static inline void inc_nr_running(struct rq *rq)
if (tick_nohz_full_cpu(rq->cpu)) {
/* Order rq->nr_running write against the IPI */
smp_wmb();
- smp_send_reschedule(rq->cpu);
+ tick_nohz_full_kick_cpu(rq->cpu);
}
}
#endif
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6558b7a..3d63944 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -224,13 +224,15 @@ static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
};

/*
- * Kick the current CPU if it's full dynticks in order to force it to
+ * Kick the CPU if it's full dynticks in order to force it to
* re-evaluate its dependency on the tick and restart it if necessary.
*/
-void tick_nohz_full_kick(void)
+void tick_nohz_full_kick_cpu(int cpu)
{
- if (tick_nohz_full_cpu(smp_processor_id()))
- irq_work_queue(&__get_cpu_var(nohz_full_kick_work));
+ if (!tick_nohz_full_cpu(cpu))
+ return;
+
+ irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
}

static void nohz_full_kick_ipi(void *info)
--
1.8.3.1

2014-06-03 14:40:52

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 5/5] nohz: Use IPI implicit full barrier against rq->nr_running r/w

A full dynticks CPU is allowed to stop its tick when a single task runs.
Meanwhile when a new task gets enqueued, the CPU must be notified so that
it can restart its tick to maintain local fairness and other accounting
details.

This notification is performed by way of an IPI. Then when the target
receives the IPI, we expect it to see the new value of rq->nr_running.

Hence the following ordering scenario:

CPU 0                     CPU 1

write rq->nr_running      get IPI
smp_wmb()                 smp_rmb()
send IPI                  read rq->nr_running

But Paul McKenney says that nowadays IPIs imply a full barrier on
all architectures. So we can safely remove this pair and rely on the
implicit barriers that come along with IPI send/receive. Let's just
document this new assumption with a comment.
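
In sketch form, both sides then look like this (see the diffs below):

	/* Waker, e.g. inc_nr_running() */
	rq->nr_running++;
	/* no smp_wmb(): the IPI send path implies a full barrier */
	tick_nohz_full_kick_cpu(rq->cpu);

	/* Target, in sched_can_stop_tick() */
	/* no smp_rmb(): the IPI receive path implies a full barrier */
	if (rq->nr_running > 1)
		return false;	/* more than one task running, keep the tick */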

Cc: Andrew Morton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/sched/core.c | 9 +++++----
kernel/sched/sched.h | 10 ++++++++--
2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fb6dfad..a06cac1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -670,10 +670,11 @@ bool sched_can_stop_tick(void)

rq = this_rq();

- /* Make sure rq->nr_running update is visible after the IPI */
- smp_rmb();
-
- /* More than one running task need preemption */
+ /*
+ * More than one running task need preemption.
+ * nr_running update is assumed to be visible
+ * after IPI is sent from wakers.
+ */
if (rq->nr_running > 1)
return false;

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6089e00..219bfbd 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1223,8 +1223,14 @@ static inline void inc_nr_running(struct rq *rq)
#ifdef CONFIG_NO_HZ_FULL
if (rq->nr_running == 2) {
if (tick_nohz_full_cpu(rq->cpu)) {
- /* Order rq->nr_running write against the IPI */
- smp_wmb();
+ /*
+ * Tick is needed if more than one task runs on a CPU.
+ * Send the target an IPI to kick it out of nohz mode.
+ *
+ * We assume that IPI implies full memory barrier and the
+ * new value of rq->nr_running is visible on reception
+ * from the target.
+ */
tick_nohz_full_kick_cpu(rq->cpu);
}
}
--
1.8.3.1

2014-06-03 14:41:13

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 3/5] irq_work: Implement remote queueing

irq work currently only supports local callbacks. However its code
is mostly ready to run remote callbacks and we have some potential users.

The full nohz subsystem currently open codes its own remote irq work
on top of the scheduler IPI when it wants a CPU to re-evaluate its next
tick. However this ad hoc solution bloats the scheduler IPI.

Let's just extend the irq work subsystem to support remote queueing on top
of the generic SMP IPI to handle this kind of user. This shouldn't add
noticeable overhead.
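
A minimal usage sketch (the callback and work names below are made up
for illustration):

	static void remote_kick_func(struct irq_work *work)
	{
		/* Runs on the target CPU from the SMP call IPI, irqs disabled */
	}

	static DEFINE_PER_CPU(struct irq_work, remote_kick) = {
		.func = remote_kick_func,
	};

	/* Ask @cpu to run the callback; returns false if it's already pending */
	irq_work_queue_on(&per_cpu(remote_kick, cpu), cpu);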

Suggested-by: Peter Zijlstra <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
include/linux/irq_work.h | 2 ++
kernel/irq_work.c | 22 +++++++++++++++++++++-
kernel/smp.c | 4 ++++
3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 19ae05d..ae44aa2 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -33,6 +33,8 @@ void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
#define DEFINE_IRQ_WORK(name, _f) struct irq_work name = { .func = (_f), }

bool irq_work_queue(struct irq_work *work);
+bool irq_work_queue_on(struct irq_work *work, int cpu);
+
void irq_work_run(void);
void irq_work_sync(struct irq_work *work);

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index a4350d0..42b87cb 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -56,11 +56,31 @@ void __weak arch_irq_work_raise(void)
}

/*
- * Enqueue the irq_work @entry unless it's already pending
+ * Enqueue the irq_work @work on @cpu unless it's already pending
* somewhere.
*
* Can be re-enqueued while the callback is still in progress.
*/
+bool irq_work_queue_on(struct irq_work *work, int cpu)
+{
+ /* All work should have been flushed before going offline */
+ WARN_ON_ONCE(cpu_is_offline(cpu));
+
+ /* Arch remote IPI send/receive backend aren't NMI safe */
+ WARN_ON_ONCE(in_nmi());
+
+ /* Only queue if not already pending */
+ if (!irq_work_claim(work))
+ return false;
+
+ if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
+ native_send_call_func_single_ipi(cpu);
+
+ return true;
+}
+EXPORT_SYMBOL_GPL(irq_work_queue_on);
+
+/* Enqueue the irq work @work on the current CPU */
bool irq_work_queue(struct irq_work *work)
{
/* Only queue if not already pending */
diff --git a/kernel/smp.c b/kernel/smp.c
index 06d574e..ba0d8fd 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -3,6 +3,7 @@
*
* (C) Jens Axboe <[email protected]> 2008
*/
+#include <linux/irq_work.h>
#include <linux/rcupdate.h>
#include <linux/rculist.h>
#include <linux/kernel.h>
@@ -198,6 +199,9 @@ void generic_smp_call_function_single_interrupt(void)
csd->func(csd->info);
csd_unlock(csd);
}
+
+ /* Handle irq works queued remotely by irq_work_queue_on() */
+ irq_work_run();
}

/*
--
1.8.3.1

2014-06-03 14:41:34

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 2/5] irq_work: Shorten a bit irq_work_needs_cpu()

Just a small cleanup.

Cc: Andrew Morton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/irq_work.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 44ac87c..a4350d0 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -89,10 +89,7 @@ EXPORT_SYMBOL_GPL(irq_work_queue);

bool irq_work_needs_cpu(void)
{
- struct llist_head *list;
-
- list = &__get_cpu_var(lazy_list);
- if (llist_empty(list))
+ if (llist_empty(&__get_cpu_var(lazy_list)))
return false;

/* All work should have been flushed before going offline */
--
1.8.3.1

2014-06-03 14:43:30

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [GIT PULL] nohz: Move nohz kick out of scheduler IPI, v7

On Tue, Jun 03, 2014 at 04:40:15PM +0200, Frederic Weisbecker wrote:
> Hi,
>
> This version fixes the following concerns from Peterz:
>
> * Warn _before_ work claim on irq_work_queue_on()
> * Warn on in_nmi() while remote queueing
> * Only disable preemption (and not irqs) on local queueing
>
> Thanks.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> timers/nohz-irq-work-v5

Oops, wait a minute, the tree is broken, I'll do another pull request.

>
> Thanks,
> Frederic
> ---
>
> Frederic Weisbecker (5):
> irq_work: Split raised and lazy lists
> irq_work: Shorten a bit irq_work_needs_cpu()
> irq_work: Implement remote queueing
> nohz: Move full nohz kick to its own IPI
> nohz: Use IPI implicit full barrier against rq->nr_running r/w
>
>
> include/linux/irq_work.h | 2 ++
> include/linux/tick.h | 9 +++++-
> kernel/irq_work.c | 72 ++++++++++++++++++++++++++++--------------------
> kernel/sched/core.c | 14 ++++------
> kernel/sched/sched.h | 12 ++++++--
> kernel/smp.c | 4 +++
> kernel/time/tick-sched.c | 10 ++++---
> 7 files changed, 77 insertions(+), 46 deletions(-)

2014-06-03 14:54:54

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/5] irq_work: Split raised and lazy lists

On Tue, Jun 03, 2014 at 04:40:16PM +0200, Frederic Weisbecker wrote:
> @@ -90,10 +89,10 @@ EXPORT_SYMBOL_GPL(irq_work_queue);
>
> bool irq_work_needs_cpu(void)
> {
> - struct llist_head *this_list;
> + struct llist_head *list;
>
> - this_list = &__get_cpu_var(irq_work_list);
> - if (llist_empty(this_list))
> + list = &__get_cpu_var(lazy_list);
> + if (llist_empty(list))
> return false;
>
> /* All work should have been flushed before going offline */

Does this mean needs_cpu() only checks the lazy list? What about archs
without the arch_irq_work_raise() function? They run the other list from
the tick too.



2014-06-03 14:56:58

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 1/5] irq_work: Split raised and lazy lists

On Tue, Jun 03, 2014 at 04:54:42PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 03, 2014 at 04:40:16PM +0200, Frederic Weisbecker wrote:
> > @@ -90,10 +89,10 @@ EXPORT_SYMBOL_GPL(irq_work_queue);
> >
> > bool irq_work_needs_cpu(void)
> > {
> > - struct llist_head *this_list;
> > + struct llist_head *list;
> >
> > - this_list = &__get_cpu_var(irq_work_list);
> > - if (llist_empty(this_list))
> > + list = &__get_cpu_var(lazy_list);
> > + if (llist_empty(list))
> > return false;
> >
> > /* All work should have been flushed before going offline */
>
> Does this mean needs_cpu() only checks the lazy list? What about archs
> without the arch_irq_work_raise() function? They run the other list from
> the tick too.

Right, I'll fix that too.
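
Probably by checking both lists there, something like:

	if (llist_empty(&__get_cpu_var(raised_list)) &&
	    llist_empty(&__get_cpu_var(lazy_list)))
		return false;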

2014-06-03 15:00:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/5] irq_work: Implement remote queueing

On Tue, Jun 03, 2014 at 04:40:18PM +0200, Frederic Weisbecker wrote:
> irq work currently only supports local callbacks. However its code
> is mostly ready to run remote callbacks and we have some potential users.
>
> The full nohz subsystem currently open codes its own remote irq work
> on top of the scheduler IPI when it wants a CPU to re-evaluate its next
> tick. However this ad hoc solution bloats the scheduler IPI.
>
> Let's just extend the irq work subsystem to support remote queueing on top
> of the generic SMP IPI to handle this kind of user. This shouldn't add
> noticeable overhead.
>
> Suggested-by: Peter Zijlstra <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Kevin Hilman <[email protected]>
> Cc: Paul E. McKenney <[email protected]>

Acked-by: Peter Zijlstra <[email protected]>

> Cc: Thomas Gleixner <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> ---
> include/linux/irq_work.h | 2 ++
> kernel/irq_work.c | 22 +++++++++++++++++++++-
> kernel/smp.c | 4 ++++
> 3 files changed, 27 insertions(+), 1 deletion(-)

> @@ -198,6 +199,9 @@ void generic_smp_call_function_single_interrupt(void)
> csd->func(csd->info);
> csd_unlock(csd);
> }
> +
> + /* Handle irq works queued remotely by irq_work_queue_on() */
> + irq_work_run();
> }

One could possibly extend that comment by stating that we explicitly run
the irq_work bits after the function bits, because the function bits are
typically synchronous and have people waiting on them, while not so for
the irq_works.



2014-06-03 15:01:04

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 4/5] nohz: Move full nohz kick to its own IPI

On Tue, Jun 03, 2014 at 04:40:19PM +0200, Frederic Weisbecker wrote:
> Now that the irq work subsystem can queue remote callbacks, it's
> a perfect fit to safely queue IPIs when interrupts are disabled
> without worrying about concurrent callers.
>
> Let's use it for the full dynticks kick to notify a CPU that it's
> exiting single task mode.
>
> This unbloats the scheduler IPI a bit; the nohz code was abusing it
> for its cool "callable anywhere/anytime" properties.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Kevin Hilman <[email protected]>
> Cc: Paul E. McKenney <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> ---
> include/linux/tick.h | 9 ++++++++-
> kernel/sched/core.c | 5 +----
> kernel/sched/sched.h | 2 +-
> kernel/time/tick-sched.c | 10 ++++++----
> 4 files changed, 16 insertions(+), 10 deletions(-)
>



2014-06-03 15:03:12

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 5/5] nohz: Use IPI implicit full barrier against rq->nr_running r/w

On Tue, Jun 03, 2014 at 04:40:20PM +0200, Frederic Weisbecker wrote:
> A full dynticks CPU is allowed to stop its tick when a single task runs.
> Meanwhile when a new task gets enqueued, the CPU must be notified so that
> it can restart its tick to maintain local fairness and other accounting
> details.
>
> This notification is performed by way of an IPI. Then when the target
> receives the IPI, we expect it to see the new value of rq->nr_running.
>
> Hence the following ordering scenario:
>
> CPU 0                     CPU 1
>
> write rq->nr_running      get IPI
> smp_wmb()                 smp_rmb()
> send IPI                  read rq->nr_running
>
> But Paul McKenney says that nowadays IPIs imply a full barrier on
> all architectures. So we can safely remove this pair and rely on the
> implicit barriers that come along with IPI send/receive. Let's just
> document this new assumption with a comment.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Kevin Hilman <[email protected]>
> Cc: Paul E. McKenney <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>



2014-06-03 15:08:26

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 3/5] irq_work: Implement remote queueing

On Tue, Jun 03, 2014 at 05:00:08PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 03, 2014 at 04:40:18PM +0200, Frederic Weisbecker wrote:
> > irq work currently only supports local callbacks. However its code
> > is mostly ready to run remote callbacks and we have some potential users.
> >
> > The full nohz subsystem currently open codes its own remote irq work
> > on top of the scheduler IPI when it wants a CPU to re-evaluate its next
> > tick. However this ad hoc solution bloats the scheduler IPI.
> >
> > Let's just extend the irq work subsystem to support remote queueing on top
> > of the generic SMP IPI to handle this kind of user. This shouldn't add
> > noticeable overhead.
> >
> > Suggested-by: Peter Zijlstra <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Kevin Hilman <[email protected]>
> > Cc: Paul E. McKenney <[email protected]>
>
> Acked-by: Peter Zijlstra <[email protected]>
>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Viresh Kumar <[email protected]>
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > ---
> > include/linux/irq_work.h | 2 ++
> > kernel/irq_work.c | 22 +++++++++++++++++++++-
> > kernel/smp.c | 4 ++++
> > 3 files changed, 27 insertions(+), 1 deletion(-)
>
> > @@ -198,6 +199,9 @@ void generic_smp_call_function_single_interrupt(void)
> > csd->func(csd->info);
> > csd_unlock(csd);
> > }
> > +
> > + /* Handle irq works queued remotely by irq_work_queue_on() */
> > + irq_work_run();
> > }
>
> One could possibly extend that comment by stating that we explicitly run
> the irq_work bits after the function bits, because the function bits are
> typically synchronous and have people waiting on them, while not so for
> the irq_works.
>

Good point! I'll expand the comment.
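
Something along these lines:

	/*
	 * Handle irq works queued remotely by irq_work_queue_on().
	 * We run them after the function call handlers on purpose:
	 * those are typically synchronous, with a remote caller waiting
	 * on their completion, while nobody waits for irq works.
	 */
	irq_work_run();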

2014-06-03 16:29:11

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 3/5] irq_work: Implement remote queueing

On Tue, 2014-06-03 at 16:40 +0200, Frederic Weisbecker wrote:
> irq work currently only supports local callbacks. However its code
> is mostly ready to run remote callbacks and we have some potential users.
>
> The full nohz subsystem currently open codes its own remote irq work
> on top of the scheduler IPI when it wants a CPU to re-evaluate its next
> tick. However this ad hoc solution bloats the scheduler IPI.
>
> Let's just extend the irq work subsystem to support remote queueing on top
> of the generic SMP IPI to handle this kind of user. This shouldn't add
> noticeable overhead.



> */
> +bool irq_work_queue_on(struct irq_work *work, int cpu)
> +{
> + /* All work should have been flushed before going offline */
> + WARN_ON_ONCE(cpu_is_offline(cpu));
> +
> + /* Arch remote IPI send/receive backend aren't NMI safe */
> + WARN_ON_ONCE(in_nmi());
> +
> + /* Only queue if not already pending */
> + if (!irq_work_claim(work))
> + return false;
> +
> + if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
> + native_send_call_func_single_ipi(cpu);
> +
> + return true;
> +}
> +EXPORT_SYMBOL_GPL(irq_work_queue_on);
> +

I am curious, this should only compile on x86, right ?


2014-06-03 16:56:08

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/5] irq_work: Implement remote queueing

On Tue, Jun 03, 2014 at 09:29:07AM -0700, Eric Dumazet wrote:
> > +bool irq_work_queue_on(struct irq_work *work, int cpu)
> > +{
> > + /* All work should have been flushed before going offline */
> > + WARN_ON_ONCE(cpu_is_offline(cpu));
> > +
> > + /* Arch remote IPI send/receive backend aren't NMI safe */
> > + WARN_ON_ONCE(in_nmi());
> > +
> > + /* Only queue if not already pending */
> > + if (!irq_work_claim(work))
> > + return false;
> > +
> > + if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
> > + native_send_call_func_single_ipi(cpu);
> > +
> > + return true;
> > +}
> > +EXPORT_SYMBOL_GPL(irq_work_queue_on);
> > +
>
> I am curious, this should only compile on x86, right ?

Oh, you tease, you forgot to say why you think this.

Are you referring to the in_nmi() usage? that's from
include/linux/hardirq.h, hardly x86 specific.



2014-06-03 20:02:43

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH 3/5] irq_work: Implement remote queueing

On Tue, 2014-06-03 at 18:37 +0200, Peter Zijlstra wrote:
> On Tue, Jun 03, 2014 at 09:29:07AM -0700, Eric Dumazet wrote:
> > > +bool irq_work_queue_on(struct irq_work *work, int cpu)
> > > +{
> > > + /* All work should have been flushed before going offline */
> > > + WARN_ON_ONCE(cpu_is_offline(cpu));
> > > +
> > > + /* Arch remote IPI send/receive backend aren't NMI safe */
> > > + WARN_ON_ONCE(in_nmi());
> > > +
> > > + /* Only queue if not already pending */
> > > + if (!irq_work_claim(work))
> > > + return false;
> > > +
> > > + if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
> > > + native_send_call_func_single_ipi(cpu);
> > > +
> > > + return true;
> > > +}
> > > +EXPORT_SYMBOL_GPL(irq_work_queue_on);
> > > +
> >
> > I am curious, this should only compile on x86, right ?
>
> Oh, you tease, you forgot to say why you think this.
>
> Are you referring to the in_nmi() usage? that's from
> include/linux/hardirq.h, hardly x86 specific.

No, my eyes were attracted by native_send_call_func_single_ipi()



2014-06-03 20:29:50

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 3/5] irq_work: Implement remote queueing

On Tue, Jun 03, 2014 at 01:02:39PM -0700, Eric Dumazet wrote:
> On Tue, 2014-06-03 at 18:37 +0200, Peter Zijlstra wrote:
> > On Tue, Jun 03, 2014 at 09:29:07AM -0700, Eric Dumazet wrote:
> > > > +bool irq_work_queue_on(struct irq_work *work, int cpu)
> > > > +{
> > > > + /* All work should have been flushed before going offline */
> > > > + WARN_ON_ONCE(cpu_is_offline(cpu));
> > > > +
> > > > + /* Arch remote IPI send/receive backend aren't NMI safe */
> > > > + WARN_ON_ONCE(in_nmi());
> > > > +
> > > > + /* Only queue if not already pending */
> > > > + if (!irq_work_claim(work))
> > > > + return false;
> > > > +
> > > > + if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
> > > > + native_send_call_func_single_ipi(cpu);
> > > > +
> > > > + return true;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(irq_work_queue_on);
> > > > +
> > >
> > > I am curious, this should only compile on x86, right ?
> >
> > Oh, you tease, you forgot to say why you think this.
> >
> > Are you referring to the in_nmi() usage? that's from
> > include/linux/hardirq.h, hardly x86 specific.
>
> No, my eyes were attracted by native_send_call_func_single_ipi()

Right, should be arch_send_call_function_single_ipi(). I'm like unable
to get a correct patchset before at least 9 takes...
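
IOW the queueing should use the generic hook instead, something like:

	if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
		arch_send_call_function_single_ipi(cpu);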