2009-09-13 16:14:45

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 0/4] Review comments, cleanups, and preemptable synchronize_rcu() fixes

This patchset provides updates to TREE_PREEMPT_RCU as follows:

1. Update the TREE_PREEMPT_RCU description to note that it is
suitable for small machines.

2. Add some WARN_ON_ONCE() calls to check for (incorrect)
concurrent grace-period initialization.

3. Simplify quiescent-state detection (which also speeds up
TREE_PREEMPT_RCU grace periods slightly).

4. Fix a thinko in TREE_PREEMPT_RCU's synchronize_rcu() that
could result in premature grace periods.
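To see why item 4 matters, consider the canonical RCU updater/reader pairing
below (a minimal sketch; struct foo, global_foo, and use_foo() are hypothetical
placeholders rather than code from this patchset). If synchronize_rcu() returns
before every pre-existing reader has finished, the kfree() becomes a
use-after-free for any reader still dereferencing the old structure, which is
exactly what a premature grace period would permit.

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
	int a;
};

static struct foo *global_foo;

extern void use_foo(int a);		/* hypothetical consumer */

void read_foo(void)
{
	struct foo *p;

	rcu_read_lock();
	p = rcu_dereference(global_foo);
	if (p)
		use_foo(p->a);
	rcu_read_unlock();
}

void update_foo(struct foo *newp)
{
	struct foo *oldp = global_foo;

	rcu_assign_pointer(global_foo, newp);
	synchronize_rcu();	/* must wait for all pre-existing readers */
	kfree(oldp);		/* a premature grace period makes this a use-after-free */
}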


2009-09-13 16:22:23

by Daniel Walker

Subject: Re: [PATCH tip/core/rcu 2/4] Add debug checks to TREE_PREEMPT_RCU for premature grace periods.

On Sun, 2009-09-13 at 09:15 -0700, Paul E. McKenney wrote:
> From: Paul E. McKenney <[email protected]>
>
> Check to make sure that there are no blocked tasks for the previous
> grace period while initializing for the next grace period, verify
> that rcu_preempt_qs() is given the correct CPU number and is never
> called for an offline CPU.
>

You've got a couple of whitespace issues in the WARN_ON_ONCE() lines..
As found by checkpatch,

ERROR: code indent should use tabs where possible
#97: FILE: kernel/rcutree_plugin.h:89:
+^I ^IWARN_ON_ONCE(cpu != smp_processor_id());$

ERROR: code indent should use tabs where possible
#109: FILE: kernel/rcutree_plugin.h:111:
+^I ^IWARN_ON_ONCE((rdp->grpmask & rnp->qsmaskinit) == 0);$


Could you fix these up?

Daniel

2009-09-13 16:31:44

by Paul E. McKenney

Subject: Re: [PATCH tip/core/rcu 2/4] Add debug checks to TREE_PREEMPT_RCU for premature grace periods.

On Sun, Sep 13, 2009 at 09:23:02AM -0700, Daniel Walker wrote:
> On Sun, 2009-09-13 at 09:15 -0700, Paul E. McKenney wrote:
> > From: Paul E. McKenney <[email protected]>
> >
> > Check to make sure that there are no blocked tasks for the previous
> > grace period while initializing for the next grace period, verify
> > that rcu_preempt_qs() is given the correct CPU number and is never
> > called for an offline CPU.
> >
>
> You've got a couple of whitespace issues in the WARN_ON_ONCE() lines..
> As found by checkpatch,
>
> ERROR: code indent should use tabs where possible
> #97: FILE: kernel/rcutree_plugin.h:89:
> +^I ^IWARN_ON_ONCE(cpu != smp_processor_id());$
>
> ERROR: code indent should use tabs where possible
> #109: FILE: kernel/rcutree_plugin.h:111:
> +^I ^IWARN_ON_ONCE((rdp->grpmask & rnp->qsmaskinit) == 0);$
>
> Could you fix these up?

Good catch! Here is a corrected version.

Thanx, Paul

------------------------------------------------------------------------

From f5807ddbd4fff957e6c2efdc874a740ff40f1c94 Mon Sep 17 00:00:00 2001
From: Paul E. McKenney <[email protected]>
Date: Tue, 8 Sep 2009 16:36:30 -0700
Subject: [PATCH tip/core/rcu 2/4] Add debug checks to TREE_PREEMPT_RCU for premature grace periods.

Check to make sure that there are no blocked tasks for the previous
grace period while initializing for the next grace period, verify
that rcu_preempt_qs() is given the correct CPU number and is never
called for an offline CPU.

Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcutree.c | 2 ++
kernel/rcutree_plugin.h | 25 +++++++++++++++++++++++++
2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index bca0aba..3a01405 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -627,6 +627,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
/* Special-case the common single-level case. */
if (NUM_RCU_NODES == 1) {
rnp->qsmask = rnp->qsmaskinit;
+ rcu_preempt_check_blocked_tasks(rnp);
rnp->gpnum = rsp->gpnum;
rsp->signaled = RCU_SIGNAL_INIT; /* force_quiescent_state OK. */
spin_unlock_irqrestore(&rnp->lock, flags);
@@ -660,6 +661,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
for (rnp_cur = &rsp->node[0]; rnp_cur < rnp_end; rnp_cur++) {
spin_lock(&rnp_cur->lock); /* irqs already disabled. */
rnp_cur->qsmask = rnp_cur->qsmaskinit;
+ rcu_preempt_check_blocked_tasks(rnp);
rnp->gpnum = rsp->gpnum;
spin_unlock(&rnp_cur->lock); /* irqs already disabled. */
}
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 4778936..51413cb 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -86,6 +86,7 @@ static void rcu_preempt_qs(int cpu)

if (t->rcu_read_lock_nesting &&
(t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {
+ WARN_ON_ONCE(cpu != smp_processor_id());

/* Possibly blocking in an RCU read-side critical section. */
rdp = rcu_preempt_state.rda[cpu];
@@ -103,7 +104,11 @@ static void rcu_preempt_qs(int cpu)
* state for the current grace period), then as long
* as that task remains queued, the current grace period
* cannot end.
+ *
+ * But first, note that the current CPU must still be
+ * on line!
*/
+ WARN_ON_ONCE((rdp->grpmask & rnp->qsmaskinit) == 0);
phase = !(rnp->qsmask & rdp->grpmask) ^ (rnp->gpnum & 0x1);
list_add(&t->rcu_node_entry, &rnp->blocked_tasks[phase]);
smp_mb(); /* Ensure later ctxt swtch seen after above. */
@@ -259,6 +264,18 @@ static void rcu_print_task_stall(struct rcu_node *rnp)
#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */

/*
+ * Check that the list of blocked tasks for the newly completed grace
+ * period is in fact empty. It is a serious bug to complete a grace
+ * period that still has RCU readers blocked! This function must be
+ * invoked -before- updating this rnp's ->gpnum, and the rnp's ->lock
+ * must be held by the caller.
+ */
+static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
+{
+ WARN_ON_ONCE(!list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1]));
+}
+
+/*
* Check for preempted RCU readers for the specified rcu_node structure.
* If the caller needs a reliable answer, it must hold the rcu_node's
* ->lock.
@@ -451,6 +468,14 @@ static void rcu_print_task_stall(struct rcu_node *rnp)
#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */

/*
+ * Because there is no preemptable RCU, there can be no readers blocked,
+ * so there is no need to check for blocked tasks.
+ */
+static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
+{
+}
+
+/*
* Because preemptable RCU does not exist, there are never any preempted
* RCU readers.
*/
--
1.5.2.5

2009-09-15 07:18:16

by Paul E. McKenney

Subject: [tip:core/urgent] rcu: Kconfig help needs to say that TREE_PREEMPT_RCU scales down

Commit-ID: da054e04e4d7c8d340ddc8dc45d4f7cad7672b96
Gitweb: http://git.kernel.org/tip/da054e04e4d7c8d340ddc8dc45d4f7cad7672b96
Author: Paul E. McKenney <[email protected]>
AuthorDate: Sun, 13 Sep 2009 09:15:08 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 15 Sep 2009 08:43:58 +0200

rcu: Kconfig help needs to say that TREE_PREEMPT_RCU scales down

To quote Valdis:

This leaves somebody who has a laptop wondering which
choice is best for a system with only one or two cores that
has CONFIG_PREEMPT defined. One choice says it scales down
nicely, the other explicitly has a 'depends on PREEMPT'
attached to it...

So add "scales down nicely" to TREE_PREEMPT_RCU to match that of
TREE_RCU.

Suggested-by: Valdis Kletnieks <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <12528585112362-git-send-email->
Signed-off-by: Ingo Molnar <[email protected]>


---
init/Kconfig | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 8e8b76d..4c2c936 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -331,7 +331,8 @@ config TREE_PREEMPT_RCU
This option selects the RCU implementation that is
designed for very large SMP systems with hundreds or
thousands of CPUs, but for which real-time response
- is also required.
+ is also required. It also scales down nicely to
+ smaller systems.

endchoice

2009-09-15 07:18:28

by Paul E. McKenney

Subject: [tip:core/urgent] rcu: Add debug checks to TREE_PREEMPT_RCU for premature grace periods

Commit-ID: 429e6f07df20175fa59927df415c41c5e1d82d91
Gitweb: http://git.kernel.org/tip/429e6f07df20175fa59927df415c41c5e1d82d91
Author: Paul E. McKenney <[email protected]>
AuthorDate: Sun, 13 Sep 2009 09:15:09 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 15 Sep 2009 08:43:58 +0200

rcu: Add debug checks to TREE_PREEMPT_RCU for premature grace periods

Check to make sure that there are no blocked tasks for the previous
grace period while initializing for the next grace period, verify
that rcu_preempt_qs() is given the correct CPU number and is never
called for an offline CPU.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <12528585111986-git-send-email->
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/rcutree.c | 2 ++
kernel/rcutree_plugin.h | 25 +++++++++++++++++++++++++
2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index da301e2..e9a4ae9 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -632,6 +632,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
/* Special-case the common single-level case. */
if (NUM_RCU_NODES == 1) {
rnp->qsmask = rnp->qsmaskinit;
+ rcu_preempt_check_blocked_tasks(rnp);
rnp->gpnum = rsp->gpnum;
rsp->signaled = RCU_SIGNAL_INIT; /* force_quiescent_state OK. */
spin_unlock_irqrestore(&rnp->lock, flags);
@@ -665,6 +666,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
for (rnp_cur = &rsp->node[0]; rnp_cur < rnp_end; rnp_cur++) {
spin_lock(&rnp_cur->lock); /* irqs already disabled. */
rnp_cur->qsmask = rnp_cur->qsmaskinit;
+ rcu_preempt_check_blocked_tasks(rnp);
rnp->gpnum = rsp->gpnum;
spin_unlock(&rnp_cur->lock); /* irqs already disabled. */
}
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 4778936..b8e4b03 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -86,6 +86,7 @@ static void rcu_preempt_qs(int cpu)

if (t->rcu_read_lock_nesting &&
(t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {
+ WARN_ON_ONCE(cpu != smp_processor_id());

/* Possibly blocking in an RCU read-side critical section. */
rdp = rcu_preempt_state.rda[cpu];
@@ -103,7 +104,11 @@ static void rcu_preempt_qs(int cpu)
* state for the current grace period), then as long
* as that task remains queued, the current grace period
* cannot end.
+ *
+ * But first, note that the current CPU must still be
+ * on line!
*/
+ WARN_ON_ONCE((rdp->grpmask & rnp->qsmaskinit) == 0);
phase = !(rnp->qsmask & rdp->grpmask) ^ (rnp->gpnum & 0x1);
list_add(&t->rcu_node_entry, &rnp->blocked_tasks[phase]);
smp_mb(); /* Ensure later ctxt swtch seen after above. */
@@ -259,6 +264,18 @@ static void rcu_print_task_stall(struct rcu_node *rnp)
#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */

/*
+ * Check that the list of blocked tasks for the newly completed grace
+ * period is in fact empty. It is a serious bug to complete a grace
+ * period that still has RCU readers blocked! This function must be
+ * invoked -before- updating this rnp's ->gpnum, and the rnp's ->lock
+ * must be held by the caller.
+ */
+static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
+{
+ WARN_ON_ONCE(!list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1]));
+}
+
+/*
* Check for preempted RCU readers for the specified rcu_node structure.
* If the caller needs a reliable answer, it must hold the rcu_node's
* ->lock.
@@ -451,6 +468,14 @@ static void rcu_print_task_stall(struct rcu_node *rnp)
#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */

/*
+ * Because there is no preemptable RCU, there can be no readers blocked,
+ * so there is no need to check for blocked tasks.
+ */
+static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
+{
+}
+
+/*
* Because preemptable RCU does not exist, there are never any preempted
* RCU readers.
*/

2009-09-15 07:18:37

by Paul E. McKenney

Subject: [tip:core/urgent] rcu: Simplify rcu_read_unlock_special() quiescent-state accounting

Commit-ID: ddaad21c6848c599edc9432747a5295ea4d060df
Gitweb: http://git.kernel.org/tip/ddaad21c6848c599edc9432747a5295ea4d060df
Author: Paul E. McKenney <[email protected]>
AuthorDate: Sun, 13 Sep 2009 09:15:10 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 15 Sep 2009 08:43:59 +0200

rcu: Simplify rcu_read_unlock_special() quiescent-state accounting

The earlier approach required two scheduling-clock ticks to note a
preemptable-RCU quiescent state in the situation in which the
scheduling-clock interrupt is unlucky enough to always interrupt an
RCU read-side critical section.

With this change, the quiescent state is instead noted by the
outermost rcu_read_unlock() immediately following the first
scheduling-clock tick, or, alternatively, by the first subsequent
context switch. Therefore, this change also speeds up grace
periods.
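
As an illustration (not part of the patch; do_inner() and do_outer() are
hypothetical), the reader-side effect looks like this: if a scheduling-clock
tick sets RCU_READ_UNLOCK_NEED_QS while the sections below are running, the
quiescent state is now reported when the outermost rcu_read_unlock() returns,
rather than waiting for a second tick or a context switch.

#include <linux/rcupdate.h>

static void do_inner(void)
{
	rcu_read_lock();	/* nesting level 2 */
	/* ... read-side accesses ... */
	rcu_read_unlock();	/* nested: only decrements the nesting count */
}

static void do_outer(void)
{
	rcu_read_lock();	/* nesting level 1 */
	do_inner();
	rcu_read_unlock();	/* outermost: may report the quiescent state
				 * via rcu_preempt_qs() if NEED_QS was set */
}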

Suggested-by: Josh Triplett <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <12528585111945-git-send-email->
Signed-off-by: Ingo Molnar <[email protected]>


---
include/linux/sched.h | 1 -
kernel/rcutree.c | 15 +++++-------
kernel/rcutree_plugin.h | 54 ++++++++++++++++++++++------------------------
3 files changed, 32 insertions(+), 38 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f3d74bd..c62a9f8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1740,7 +1740,6 @@ extern cputime_t task_gtime(struct task_struct *p);

#define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
#define RCU_READ_UNLOCK_NEED_QS (1 << 1) /* RCU core needs CPU response. */
-#define RCU_READ_UNLOCK_GOT_QS (1 << 2) /* CPU has responded to RCU core. */

static inline void rcu_copy_process(struct task_struct *p)
{
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index e9a4ae9..6c99553 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -107,27 +107,23 @@ static void __cpuinit rcu_init_percpu_data(int cpu, struct rcu_state *rsp,
*/
void rcu_sched_qs(int cpu)
{
- unsigned long flags;
struct rcu_data *rdp;

- local_irq_save(flags);
rdp = &per_cpu(rcu_sched_data, cpu);
- rdp->passed_quiesc = 1;
rdp->passed_quiesc_completed = rdp->completed;
- rcu_preempt_qs(cpu);
- local_irq_restore(flags);
+ barrier();
+ rdp->passed_quiesc = 1;
+ rcu_preempt_note_context_switch(cpu);
}

void rcu_bh_qs(int cpu)
{
- unsigned long flags;
struct rcu_data *rdp;

- local_irq_save(flags);
rdp = &per_cpu(rcu_bh_data, cpu);
- rdp->passed_quiesc = 1;
rdp->passed_quiesc_completed = rdp->completed;
- local_irq_restore(flags);
+ barrier();
+ rdp->passed_quiesc = 1;
}

#ifdef CONFIG_NO_HZ
@@ -615,6 +611,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)

/* Advance to a new grace period and initialize state. */
rsp->gpnum++;
+ WARN_ON_ONCE(rsp->signaled == RCU_GP_INIT);
rsp->signaled = RCU_GP_INIT; /* Hold off force_quiescent_state. */
rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;
record_gp_stall_check_time(rsp);
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index b8e4b03..c9616e4 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -64,34 +64,42 @@ EXPORT_SYMBOL_GPL(rcu_batches_completed);
* not in a quiescent state. There might be any number of tasks blocked
* while in an RCU read-side critical section.
*/
-static void rcu_preempt_qs_record(int cpu)
+static void rcu_preempt_qs(int cpu)
{
struct rcu_data *rdp = &per_cpu(rcu_preempt_data, cpu);
- rdp->passed_quiesc = 1;
rdp->passed_quiesc_completed = rdp->completed;
+ barrier();
+ rdp->passed_quiesc = 1;
}

/*
- * We have entered the scheduler or are between softirqs in ksoftirqd.
- * If we are in an RCU read-side critical section, we need to reflect
- * that in the state of the rcu_node structure corresponding to this CPU.
- * Caller must disable hardirqs.
+ * We have entered the scheduler, and the current task might soon be
+ * context-switched away from. If this task is in an RCU read-side
+ * critical section, we will no longer be able to rely on the CPU to
+ * record that fact, so we enqueue the task on the appropriate entry
+ * of the blocked_tasks[] array. The task will dequeue itself when
+ * it exits the outermost enclosing RCU read-side critical section.
+ * Therefore, the current grace period cannot be permitted to complete
+ * until the blocked_tasks[] entry indexed by the low-order bit of
+ * rnp->gpnum empties.
+ *
+ * Caller must disable preemption.
*/
-static void rcu_preempt_qs(int cpu)
+static void rcu_preempt_note_context_switch(int cpu)
{
struct task_struct *t = current;
+ unsigned long flags;
int phase;
struct rcu_data *rdp;
struct rcu_node *rnp;

if (t->rcu_read_lock_nesting &&
(t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {
- WARN_ON_ONCE(cpu != smp_processor_id());

/* Possibly blocking in an RCU read-side critical section. */
rdp = rcu_preempt_state.rda[cpu];
rnp = rdp->mynode;
- spin_lock(&rnp->lock);
+ spin_lock_irqsave(&rnp->lock, flags);
t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BLOCKED;
t->rcu_blocked_node = rnp;

@@ -112,7 +120,7 @@ static void rcu_preempt_qs(int cpu)
phase = !(rnp->qsmask & rdp->grpmask) ^ (rnp->gpnum & 0x1);
list_add(&t->rcu_node_entry, &rnp->blocked_tasks[phase]);
smp_mb(); /* Ensure later ctxt swtch seen after above. */
- spin_unlock(&rnp->lock);
+ spin_unlock_irqrestore(&rnp->lock, flags);
}

/*
@@ -124,9 +132,8 @@ static void rcu_preempt_qs(int cpu)
* grace period, then the fact that the task has been enqueued
* means that we continue to block the current grace period.
*/
- rcu_preempt_qs_record(cpu);
- t->rcu_read_unlock_special &= ~(RCU_READ_UNLOCK_NEED_QS |
- RCU_READ_UNLOCK_GOT_QS);
+ rcu_preempt_qs(cpu);
+ t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
}

/*
@@ -162,7 +169,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
special = t->rcu_read_unlock_special;
if (special & RCU_READ_UNLOCK_NEED_QS) {
t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
- t->rcu_read_unlock_special |= RCU_READ_UNLOCK_GOT_QS;
+ rcu_preempt_qs(smp_processor_id());
}

/* Hardware IRQ handlers cannot block. */
@@ -199,9 +206,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
*/
if (!empty && rnp->qsmask == 0 &&
list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1])) {
- t->rcu_read_unlock_special &=
- ~(RCU_READ_UNLOCK_NEED_QS |
- RCU_READ_UNLOCK_GOT_QS);
+ t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
if (rnp->parent == NULL) {
/* Only one rcu_node in the tree. */
cpu_quiet_msk_finish(&rcu_preempt_state, flags);
@@ -352,19 +357,12 @@ static void rcu_preempt_check_callbacks(int cpu)
struct task_struct *t = current;

if (t->rcu_read_lock_nesting == 0) {
- t->rcu_read_unlock_special &=
- ~(RCU_READ_UNLOCK_NEED_QS | RCU_READ_UNLOCK_GOT_QS);
- rcu_preempt_qs_record(cpu);
+ t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
+ rcu_preempt_qs(cpu);
return;
}
if (per_cpu(rcu_preempt_data, cpu).qs_pending) {
- if (t->rcu_read_unlock_special & RCU_READ_UNLOCK_GOT_QS) {
- rcu_preempt_qs_record(cpu);
- t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_GOT_QS;
- } else if (!(t->rcu_read_unlock_special &
- RCU_READ_UNLOCK_NEED_QS)) {
- t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
- }
+ t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
}
}

@@ -451,7 +449,7 @@ EXPORT_SYMBOL_GPL(rcu_batches_completed);
* Because preemptable RCU does not exist, we never have to check for
* CPUs being in quiescent states.
*/
-static void rcu_preempt_qs(int cpu)
+static void rcu_preempt_note_context_switch(int cpu)
{
}

2009-09-15 07:18:50

by Paul E. McKenney

Subject: [tip:core/urgent] rcu: Fix synchronize_rcu() for TREE_PREEMPT_RCU

Commit-ID: 366b04ca60c70479e2959fe8485b87ff380fdbbf
Gitweb: http://git.kernel.org/tip/366b04ca60c70479e2959fe8485b87ff380fdbbf
Author: Paul E. McKenney <[email protected]>
AuthorDate: Sun, 13 Sep 2009 09:15:11 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 15 Sep 2009 08:43:59 +0200

rcu: Fix synchronize_rcu() for TREE_PREEMPT_RCU

The redirection of synchronize_sched() to synchronize_rcu() was
appropriate for TREE_RCU, but not for TREE_PREEMPT_RCU.

Fix this by creating an underlying synchronize_sched(). TREE_RCU
then redirects synchronize_rcu() to synchronize_sched(), while
TREE_PREEMPT_RCU has its own version of synchronize_rcu().
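
A sketch of the hazard the old mapping created (reader(), updater(), use(),
and gp are hypothetical, not from this patch): the reader relies on the
synchronize_sched() guarantee that all preempt_disable() sections which began
before the call have completed by the time it returns. Under TREE_PREEMPT_RCU,
redirecting synchronize_sched() to the preemptible synchronize_rcu() does not
provide that guarantee, because a CPU can pass a preemptible-RCU quiescent
state while still running inside preempt_disable().

#include <linux/rcupdate.h>
#include <linux/preempt.h>
#include <linux/slab.h>

static int *gp;

extern void use(int v);			/* hypothetical consumer */

void reader(void)
{
	int *p;

	preempt_disable();		/* rcu-sched read-side critical section */
	p = rcu_dereference(gp);
	if (p)
		use(*p);
	preempt_enable();
}

void updater(int *newp)
{
	int *old = gp;

	rcu_assign_pointer(gp, newp);
	synchronize_sched();		/* must wait for reader()'s
					 * preempt_disable() section */
	kfree(old);
}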

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <12528585111916-git-send-email->
Signed-off-by: Ingo Molnar <[email protected]>


---
include/linux/rcupdate.h | 23 +++++------------------
include/linux/rcutree.h | 4 ++--
kernel/rcupdate.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 50 insertions(+), 21 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 95e0615..39dce83 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -52,8 +52,13 @@ struct rcu_head {
};

/* Exported common interfaces */
+#ifdef CONFIG_TREE_PREEMPT_RCU
extern void synchronize_rcu(void);
+#else /* #ifdef CONFIG_TREE_PREEMPT_RCU */
+#define synchronize_rcu synchronize_sched
+#endif /* #else #ifdef CONFIG_TREE_PREEMPT_RCU */
extern void synchronize_rcu_bh(void);
+extern void synchronize_sched(void);
extern void rcu_barrier(void);
extern void rcu_barrier_bh(void);
extern void rcu_barrier_sched(void);
@@ -262,24 +267,6 @@ struct rcu_synchronize {
extern void wakeme_after_rcu(struct rcu_head *head);

/**
- * synchronize_sched - block until all CPUs have exited any non-preemptive
- * kernel code sequences.
- *
- * This means that all preempt_disable code sequences, including NMI and
- * hardware-interrupt handlers, in progress on entry will have completed
- * before this primitive returns. However, this does not guarantee that
- * softirq handlers will have completed, since in some kernels, these
- * handlers can run in process context, and can block.
- *
- * This primitive provides the guarantees made by the (now removed)
- * synchronize_kernel() API. In contrast, synchronize_rcu() only
- * guarantees that rcu_read_lock() sections will have completed.
- * In "classic RCU", these two guarantees happen to be one and
- * the same, but can differ in realtime RCU implementations.
- */
-#define synchronize_sched() __synchronize_sched()
-
-/**
* call_rcu - Queue an RCU callback for invocation after a grace period.
* @head: structure to be used for queueing the RCU updates.
* @func: actual update function to be invoked after the grace period
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index a893077..00d08c0 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -53,6 +53,8 @@ static inline void __rcu_read_unlock(void)
preempt_enable();
}

+#define __synchronize_sched() synchronize_rcu()
+
static inline void exit_rcu(void)
{
}
@@ -68,8 +70,6 @@ static inline void __rcu_read_unlock_bh(void)
local_bh_enable();
}

-#define __synchronize_sched() synchronize_rcu()
-
extern void call_rcu_sched(struct rcu_head *head,
void (*func)(struct rcu_head *rcu));

diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index bd5d5c8..28d2f24 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -74,6 +74,8 @@ void wakeme_after_rcu(struct rcu_head *head)
complete(&rcu->completion);
}

+#ifdef CONFIG_TREE_PREEMPT_RCU
+
/**
* synchronize_rcu - wait until a grace period has elapsed.
*
@@ -87,7 +89,7 @@ void synchronize_rcu(void)
{
struct rcu_synchronize rcu;

- if (rcu_blocking_is_gp())
+ if (!rcu_scheduler_active)
return;

init_completion(&rcu.completion);
@@ -98,6 +100,46 @@ void synchronize_rcu(void)
}
EXPORT_SYMBOL_GPL(synchronize_rcu);

+#endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */
+
+/**
+ * synchronize_sched - wait until an rcu-sched grace period has elapsed.
+ *
+ * Control will return to the caller some time after a full rcu-sched
+ * grace period has elapsed, in other words after all currently executing
+ * rcu-sched read-side critical sections have completed. These read-side
+ * critical sections are delimited by rcu_read_lock_sched() and
+ * rcu_read_unlock_sched(), and may be nested. Note that preempt_disable(),
+ * local_irq_disable(), and so on may be used in place of
+ * rcu_read_lock_sched().
+ *
+ * This means that all preempt_disable code sequences, including NMI and
+ * hardware-interrupt handlers, in progress on entry will have completed
+ * before this primitive returns. However, this does not guarantee that
+ * softirq handlers will have completed, since in some kernels, these
+ * handlers can run in process context, and can block.
+ *
+ * This primitive provides the guarantees made by the (now removed)
+ * synchronize_kernel() API. In contrast, synchronize_rcu() only
+ * guarantees that rcu_read_lock() sections will have completed.
+ * In "classic RCU", these two guarantees happen to be one and
+ * the same, but can differ in realtime RCU implementations.
+ */
+void synchronize_sched(void)
+{
+ struct rcu_synchronize rcu;
+
+ if (rcu_blocking_is_gp())
+ return;
+
+ init_completion(&rcu.completion);
+ /* Will wake me after RCU finished. */
+ call_rcu_sched(&rcu.head, wakeme_after_rcu);
+ /* Wait for it. */
+ wait_for_completion(&rcu.completion);
+}
+EXPORT_SYMBOL_GPL(synchronize_sched);
+
/**
* synchronize_rcu_bh - wait until an rcu_bh grace period has elapsed.
*

2009-09-15 19:54:02

by Josh Triplett

Subject: Re: [PATCH tip/core/rcu 3/4] Simplify rcu_read_unlock_special() quiescent-state accounting

On Sun, Sep 13, 2009 at 09:15:10AM -0700, Paul E. McKenney wrote:
> From: Paul E. McKenney <[email protected]>
>
> The earlier approach required two scheduling-clock ticks to note
> a preemptable-RCU quiescent state in the situation in which the
> scheduling-clock interrupt is unlucky enough to always interrupt an RCU
> read-side critical section. With this change, the quiescent state is
> instead noted by the outermost rcu_read_unlock() immediately following the
> first scheduling-clock tick, or, alternatively, by the first subsequent
> context switch. Therefore, this change also speeds up grace periods.
>
> Suggested-by: Josh Triplett <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>

Acked-by: Josh Triplett <[email protected]>

(patch left quoted for context)

> ---
> include/linux/sched.h | 1 -
> kernel/rcutree.c | 15 +++++-------
> kernel/rcutree_plugin.h | 54 ++++++++++++++++++++++------------------------
> 3 files changed, 32 insertions(+), 38 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 855fd0d..e00ee56 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1731,7 +1731,6 @@ extern cputime_t task_gtime(struct task_struct *p);
>
> #define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
> #define RCU_READ_UNLOCK_NEED_QS (1 << 1) /* RCU core needs CPU response. */
> -#define RCU_READ_UNLOCK_GOT_QS (1 << 2) /* CPU has responded to RCU core. */
>
> static inline void rcu_copy_process(struct task_struct *p)
> {
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 3a01405..2454999 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -106,27 +106,23 @@ static void __cpuinit rcu_init_percpu_data(int cpu, struct rcu_state *rsp,
> */
> void rcu_sched_qs(int cpu)
> {
> - unsigned long flags;
> struct rcu_data *rdp;
>
> - local_irq_save(flags);
> rdp = &per_cpu(rcu_sched_data, cpu);
> - rdp->passed_quiesc = 1;
> rdp->passed_quiesc_completed = rdp->completed;
> - rcu_preempt_qs(cpu);
> - local_irq_restore(flags);
> + barrier();
> + rdp->passed_quiesc = 1;
> + rcu_preempt_note_context_switch(cpu);
> }
>
> void rcu_bh_qs(int cpu)
> {
> - unsigned long flags;
> struct rcu_data *rdp;
>
> - local_irq_save(flags);
> rdp = &per_cpu(rcu_bh_data, cpu);
> - rdp->passed_quiesc = 1;
> rdp->passed_quiesc_completed = rdp->completed;
> - local_irq_restore(flags);
> + barrier();
> + rdp->passed_quiesc = 1;
> }
>
> #ifdef CONFIG_NO_HZ
> @@ -610,6 +606,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
>
> /* Advance to a new grace period and initialize state. */
> rsp->gpnum++;
> + WARN_ON_ONCE(rsp->signaled == RCU_GP_INIT);
> rsp->signaled = RCU_GP_INIT; /* Hold off force_quiescent_state. */
> rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;
> record_gp_stall_check_time(rsp);
> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> index 51413cb..eb4bae3 100644
> --- a/kernel/rcutree_plugin.h
> +++ b/kernel/rcutree_plugin.h
> @@ -64,34 +64,42 @@ EXPORT_SYMBOL_GPL(rcu_batches_completed);
> * not in a quiescent state. There might be any number of tasks blocked
> * while in an RCU read-side critical section.
> */
> -static void rcu_preempt_qs_record(int cpu)
> +static void rcu_preempt_qs(int cpu)
> {
> struct rcu_data *rdp = &per_cpu(rcu_preempt_data, cpu);
> - rdp->passed_quiesc = 1;
> rdp->passed_quiesc_completed = rdp->completed;
> + barrier();
> + rdp->passed_quiesc = 1;
> }
>
> /*
> - * We have entered the scheduler or are between softirqs in ksoftirqd.
> - * If we are in an RCU read-side critical section, we need to reflect
> - * that in the state of the rcu_node structure corresponding to this CPU.
> - * Caller must disable hardirqs.
> + * We have entered the scheduler, and the current task might soon be
> + * context-switched away from. If this task is in an RCU read-side
> + * critical section, we will no longer be able to rely on the CPU to
> + * record that fact, so we enqueue the task on the appropriate entry
> + * of the blocked_tasks[] array. The task will dequeue itself when
> + * it exits the outermost enclosing RCU read-side critical section.
> + * Therefore, the current grace period cannot be permitted to complete
> + * until the blocked_tasks[] entry indexed by the low-order bit of
> + * rnp->gpnum empties.
> + *
> + * Caller must disable preemption.
> */
> -static void rcu_preempt_qs(int cpu)
> +static void rcu_preempt_note_context_switch(int cpu)
> {
> struct task_struct *t = current;
> + unsigned long flags;
> int phase;
> struct rcu_data *rdp;
> struct rcu_node *rnp;
>
> if (t->rcu_read_lock_nesting &&
> (t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {
> - WARN_ON_ONCE(cpu != smp_processor_id());
>
> /* Possibly blocking in an RCU read-side critical section. */
> rdp = rcu_preempt_state.rda[cpu];
> rnp = rdp->mynode;
> - spin_lock(&rnp->lock);
> + spin_lock_irqsave(&rnp->lock, flags);
> t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BLOCKED;
> t->rcu_blocked_node = rnp;
>
> @@ -112,7 +120,7 @@ static void rcu_preempt_qs(int cpu)
> phase = !(rnp->qsmask & rdp->grpmask) ^ (rnp->gpnum & 0x1);
> list_add(&t->rcu_node_entry, &rnp->blocked_tasks[phase]);
> smp_mb(); /* Ensure later ctxt swtch seen after above. */
> - spin_unlock(&rnp->lock);
> + spin_unlock_irqrestore(&rnp->lock, flags);
> }
>
> /*
> @@ -124,9 +132,8 @@ static void rcu_preempt_qs(int cpu)
> * grace period, then the fact that the task has been enqueued
> * means that we continue to block the current grace period.
> */
> - rcu_preempt_qs_record(cpu);
> - t->rcu_read_unlock_special &= ~(RCU_READ_UNLOCK_NEED_QS |
> - RCU_READ_UNLOCK_GOT_QS);
> + rcu_preempt_qs(cpu);
> + t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
> }
>
> /*
> @@ -162,7 +169,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
> special = t->rcu_read_unlock_special;
> if (special & RCU_READ_UNLOCK_NEED_QS) {
> t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
> - t->rcu_read_unlock_special |= RCU_READ_UNLOCK_GOT_QS;
> + rcu_preempt_qs(smp_processor_id());
> }
>
> /* Hardware IRQ handlers cannot block. */
> @@ -199,9 +206,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
> */
> if (!empty && rnp->qsmask == 0 &&
> list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1])) {
> - t->rcu_read_unlock_special &=
> - ~(RCU_READ_UNLOCK_NEED_QS |
> - RCU_READ_UNLOCK_GOT_QS);
> + t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
> if (rnp->parent == NULL) {
> /* Only one rcu_node in the tree. */
> cpu_quiet_msk_finish(&rcu_preempt_state, flags);
> @@ -352,19 +357,12 @@ static void rcu_preempt_check_callbacks(int cpu)
> struct task_struct *t = current;
>
> if (t->rcu_read_lock_nesting == 0) {
> - t->rcu_read_unlock_special &=
> - ~(RCU_READ_UNLOCK_NEED_QS | RCU_READ_UNLOCK_GOT_QS);
> - rcu_preempt_qs_record(cpu);
> + t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
> + rcu_preempt_qs(cpu);
> return;
> }
> if (per_cpu(rcu_preempt_data, cpu).qs_pending) {
> - if (t->rcu_read_unlock_special & RCU_READ_UNLOCK_GOT_QS) {
> - rcu_preempt_qs_record(cpu);
> - t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_GOT_QS;
> - } else if (!(t->rcu_read_unlock_special &
> - RCU_READ_UNLOCK_NEED_QS)) {
> - t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
> - }
> + t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
> }
> }
>
> @@ -451,7 +449,7 @@ EXPORT_SYMBOL_GPL(rcu_batches_completed);
> * Because preemptable RCU does not exist, we never have to check for
> * CPUs being in quiescent states.
> */
> -static void rcu_preempt_qs(int cpu)
> +static void rcu_preempt_note_context_switch(int cpu)
> {
> }
>

2009-09-17 22:11:23

by Paul E. McKenney

Subject: [tip:core/urgent] rcu: Kconfig help needs to say that TREE_PREEMPT_RCU scales down

Commit-ID: bbe3eae8bb039b5ffd64a6e3d1a0deaa1f3cbae9
Gitweb: http://git.kernel.org/tip/bbe3eae8bb039b5ffd64a6e3d1a0deaa1f3cbae9
Author: Paul E. McKenney <[email protected]>
AuthorDate: Sun, 13 Sep 2009 09:15:08 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 18 Sep 2009 00:05:53 +0200

rcu: Kconfig help needs to say that TREE_PREEMPT_RCU scales down

To quote Valdis:

This leaves somebody who has a laptop wondering which
choice is best for a system with only one or two cores that
has CONFIG_PREEMPT defined. One choice says it scales down
nicely, the other explicitly has a 'depends on PREEMPT'
attached to it...

So add "scales down nicely" to TREE_PREEMPT_RCU to match that of
TREE_RCU.

Suggested-by: Valdis Kletnieks <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <12528585112362-git-send-email->
Signed-off-by: Ingo Molnar <[email protected]>


---
init/Kconfig | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 8e8b76d..4c2c936 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -331,7 +331,8 @@ config TREE_PREEMPT_RCU
This option selects the RCU implementation that is
designed for very large SMP systems with hundreds or
thousands of CPUs, but for which real-time response
- is also required.
+ is also required. It also scales down nicely to
+ smaller systems.

endchoice

2009-09-17 22:11:35

by Paul E. McKenney

Subject: [tip:core/urgent] rcu: Add debug checks to TREE_PREEMPT_RCU for premature grace periods

Commit-ID: b0e165c035b13e1074fa0b555318bd9cb7102558
Gitweb: http://git.kernel.org/tip/b0e165c035b13e1074fa0b555318bd9cb7102558
Author: Paul E. McKenney <[email protected]>
AuthorDate: Sun, 13 Sep 2009 09:15:09 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 18 Sep 2009 00:06:13 +0200

rcu: Add debug checks to TREE_PREEMPT_RCU for premature grace periods

Check to make sure that there are no blocked tasks for the previous
grace period while initializing for the next grace period, verify
that rcu_preempt_qs() is given the correct CPU number and is never
called for an offline CPU.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <12528585111986-git-send-email->
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/rcutree.c | 2 ++
kernel/rcutree_plugin.h | 25 +++++++++++++++++++++++++
2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index da301e2..e9a4ae9 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -632,6 +632,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
/* Special-case the common single-level case. */
if (NUM_RCU_NODES == 1) {
rnp->qsmask = rnp->qsmaskinit;
+ rcu_preempt_check_blocked_tasks(rnp);
rnp->gpnum = rsp->gpnum;
rsp->signaled = RCU_SIGNAL_INIT; /* force_quiescent_state OK. */
spin_unlock_irqrestore(&rnp->lock, flags);
@@ -665,6 +666,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
for (rnp_cur = &rsp->node[0]; rnp_cur < rnp_end; rnp_cur++) {
spin_lock(&rnp_cur->lock); /* irqs already disabled. */
rnp_cur->qsmask = rnp_cur->qsmaskinit;
+ rcu_preempt_check_blocked_tasks(rnp);
rnp->gpnum = rsp->gpnum;
spin_unlock(&rnp_cur->lock); /* irqs already disabled. */
}
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 4778936..b8e4b03 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -86,6 +86,7 @@ static void rcu_preempt_qs(int cpu)

if (t->rcu_read_lock_nesting &&
(t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {
+ WARN_ON_ONCE(cpu != smp_processor_id());

/* Possibly blocking in an RCU read-side critical section. */
rdp = rcu_preempt_state.rda[cpu];
@@ -103,7 +104,11 @@ static void rcu_preempt_qs(int cpu)
* state for the current grace period), then as long
* as that task remains queued, the current grace period
* cannot end.
+ *
+ * But first, note that the current CPU must still be
+ * on line!
*/
+ WARN_ON_ONCE((rdp->grpmask & rnp->qsmaskinit) == 0);
phase = !(rnp->qsmask & rdp->grpmask) ^ (rnp->gpnum & 0x1);
list_add(&t->rcu_node_entry, &rnp->blocked_tasks[phase]);
smp_mb(); /* Ensure later ctxt swtch seen after above. */
@@ -259,6 +264,18 @@ static void rcu_print_task_stall(struct rcu_node *rnp)
#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */

/*
+ * Check that the list of blocked tasks for the newly completed grace
+ * period is in fact empty. It is a serious bug to complete a grace
+ * period that still has RCU readers blocked! This function must be
+ * invoked -before- updating this rnp's ->gpnum, and the rnp's ->lock
+ * must be held by the caller.
+ */
+static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
+{
+ WARN_ON_ONCE(!list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1]));
+}
+
+/*
* Check for preempted RCU readers for the specified rcu_node structure.
* If the caller needs a reliable answer, it must hold the rcu_node's
* ->lock.
@@ -451,6 +468,14 @@ static void rcu_print_task_stall(struct rcu_node *rnp)
#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */

/*
+ * Because there is no preemptable RCU, there can be no readers blocked,
+ * so there is no need to check for blocked tasks.
+ */
+static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
+{
+}
+
+/*
* Because preemptable RCU does not exist, there are never any preempted
* RCU readers.
*/

2009-09-17 22:11:52

by Paul E. McKenney

Subject: [tip:core/urgent] rcu: Simplify rcu_read_unlock_special() quiescent-state accounting

Commit-ID: c3422bea5f09b0e85704f51f2b01271630b8940b
Gitweb: http://git.kernel.org/tip/c3422bea5f09b0e85704f51f2b01271630b8940b
Author: Paul E. McKenney <[email protected]>
AuthorDate: Sun, 13 Sep 2009 09:15:10 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 18 Sep 2009 00:06:33 +0200

rcu: Simplify rcu_read_unlock_special() quiescent-state accounting

The earlier approach required two scheduling-clock ticks to note a
preemptable-RCU quiescent state in the situation in which the
scheduling-clock interrupt is unlucky enough to always interrupt an
RCU read-side critical section.

With this change, the quiescent state is instead noted by the
outermost rcu_read_unlock() immediately following the first
scheduling-clock tick, or, alternatively, by the first subsequent
context switch. Therefore, this change also speeds up grace
periods.

Suggested-by: Josh Triplett <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <12528585111945-git-send-email->
Signed-off-by: Ingo Molnar <[email protected]>


---
include/linux/sched.h | 1 -
kernel/rcutree.c | 15 +++++-------
kernel/rcutree_plugin.h | 54 ++++++++++++++++++++++------------------------
3 files changed, 32 insertions(+), 38 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f3d74bd..c62a9f8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1740,7 +1740,6 @@ extern cputime_t task_gtime(struct task_struct *p);

#define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
#define RCU_READ_UNLOCK_NEED_QS (1 << 1) /* RCU core needs CPU response. */
-#define RCU_READ_UNLOCK_GOT_QS (1 << 2) /* CPU has responded to RCU core. */

static inline void rcu_copy_process(struct task_struct *p)
{
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index e9a4ae9..6c99553 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -107,27 +107,23 @@ static void __cpuinit rcu_init_percpu_data(int cpu, struct rcu_state *rsp,
*/
void rcu_sched_qs(int cpu)
{
- unsigned long flags;
struct rcu_data *rdp;

- local_irq_save(flags);
rdp = &per_cpu(rcu_sched_data, cpu);
- rdp->passed_quiesc = 1;
rdp->passed_quiesc_completed = rdp->completed;
- rcu_preempt_qs(cpu);
- local_irq_restore(flags);
+ barrier();
+ rdp->passed_quiesc = 1;
+ rcu_preempt_note_context_switch(cpu);
}

void rcu_bh_qs(int cpu)
{
- unsigned long flags;
struct rcu_data *rdp;

- local_irq_save(flags);
rdp = &per_cpu(rcu_bh_data, cpu);
- rdp->passed_quiesc = 1;
rdp->passed_quiesc_completed = rdp->completed;
- local_irq_restore(flags);
+ barrier();
+ rdp->passed_quiesc = 1;
}

#ifdef CONFIG_NO_HZ
@@ -615,6 +611,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)

/* Advance to a new grace period and initialize state. */
rsp->gpnum++;
+ WARN_ON_ONCE(rsp->signaled == RCU_GP_INIT);
rsp->signaled = RCU_GP_INIT; /* Hold off force_quiescent_state. */
rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;
record_gp_stall_check_time(rsp);
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index b8e4b03..c9616e4 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -64,34 +64,42 @@ EXPORT_SYMBOL_GPL(rcu_batches_completed);
* not in a quiescent state. There might be any number of tasks blocked
* while in an RCU read-side critical section.
*/
-static void rcu_preempt_qs_record(int cpu)
+static void rcu_preempt_qs(int cpu)
{
struct rcu_data *rdp = &per_cpu(rcu_preempt_data, cpu);
- rdp->passed_quiesc = 1;
rdp->passed_quiesc_completed = rdp->completed;
+ barrier();
+ rdp->passed_quiesc = 1;
}

/*
- * We have entered the scheduler or are between softirqs in ksoftirqd.
- * If we are in an RCU read-side critical section, we need to reflect
- * that in the state of the rcu_node structure corresponding to this CPU.
- * Caller must disable hardirqs.
+ * We have entered the scheduler, and the current task might soon be
+ * context-switched away from. If this task is in an RCU read-side
+ * critical section, we will no longer be able to rely on the CPU to
+ * record that fact, so we enqueue the task on the appropriate entry
+ * of the blocked_tasks[] array. The task will dequeue itself when
+ * it exits the outermost enclosing RCU read-side critical section.
+ * Therefore, the current grace period cannot be permitted to complete
+ * until the blocked_tasks[] entry indexed by the low-order bit of
+ * rnp->gpnum empties.
+ *
+ * Caller must disable preemption.
*/
-static void rcu_preempt_qs(int cpu)
+static void rcu_preempt_note_context_switch(int cpu)
{
struct task_struct *t = current;
+ unsigned long flags;
int phase;
struct rcu_data *rdp;
struct rcu_node *rnp;

if (t->rcu_read_lock_nesting &&
(t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {
- WARN_ON_ONCE(cpu != smp_processor_id());

/* Possibly blocking in an RCU read-side critical section. */
rdp = rcu_preempt_state.rda[cpu];
rnp = rdp->mynode;
- spin_lock(&rnp->lock);
+ spin_lock_irqsave(&rnp->lock, flags);
t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BLOCKED;
t->rcu_blocked_node = rnp;

@@ -112,7 +120,7 @@ static void rcu_preempt_qs(int cpu)
phase = !(rnp->qsmask & rdp->grpmask) ^ (rnp->gpnum & 0x1);
list_add(&t->rcu_node_entry, &rnp->blocked_tasks[phase]);
smp_mb(); /* Ensure later ctxt swtch seen after above. */
- spin_unlock(&rnp->lock);
+ spin_unlock_irqrestore(&rnp->lock, flags);
}

/*
@@ -124,9 +132,8 @@ static void rcu_preempt_qs(int cpu)
* grace period, then the fact that the task has been enqueued
* means that we continue to block the current grace period.
*/
- rcu_preempt_qs_record(cpu);
- t->rcu_read_unlock_special &= ~(RCU_READ_UNLOCK_NEED_QS |
- RCU_READ_UNLOCK_GOT_QS);
+ rcu_preempt_qs(cpu);
+ t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
}

/*
@@ -162,7 +169,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
special = t->rcu_read_unlock_special;
if (special & RCU_READ_UNLOCK_NEED_QS) {
t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
- t->rcu_read_unlock_special |= RCU_READ_UNLOCK_GOT_QS;
+ rcu_preempt_qs(smp_processor_id());
}

/* Hardware IRQ handlers cannot block. */
@@ -199,9 +206,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
*/
if (!empty && rnp->qsmask == 0 &&
list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1])) {
- t->rcu_read_unlock_special &=
- ~(RCU_READ_UNLOCK_NEED_QS |
- RCU_READ_UNLOCK_GOT_QS);
+ t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
if (rnp->parent == NULL) {
/* Only one rcu_node in the tree. */
cpu_quiet_msk_finish(&rcu_preempt_state, flags);
@@ -352,19 +357,12 @@ static void rcu_preempt_check_callbacks(int cpu)
struct task_struct *t = current;

if (t->rcu_read_lock_nesting == 0) {
- t->rcu_read_unlock_special &=
- ~(RCU_READ_UNLOCK_NEED_QS | RCU_READ_UNLOCK_GOT_QS);
- rcu_preempt_qs_record(cpu);
+ t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
+ rcu_preempt_qs(cpu);
return;
}
if (per_cpu(rcu_preempt_data, cpu).qs_pending) {
- if (t->rcu_read_unlock_special & RCU_READ_UNLOCK_GOT_QS) {
- rcu_preempt_qs_record(cpu);
- t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_GOT_QS;
- } else if (!(t->rcu_read_unlock_special &
- RCU_READ_UNLOCK_NEED_QS)) {
- t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
- }
+ t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
}
}

@@ -451,7 +449,7 @@ EXPORT_SYMBOL_GPL(rcu_batches_completed);
* Because preemptable RCU does not exist, we never have to check for
* CPUs being in quiescent states.
*/
-static void rcu_preempt_qs(int cpu)
+static void rcu_preempt_note_context_switch(int cpu)
{
}

2009-09-17 22:11:57

by Paul E. McKenney

Subject: [tip:core/urgent] rcu: Fix synchronize_rcu() for TREE_PREEMPT_RCU

Commit-ID: 16e3081191837a6a04733de5cd5d1d1b303140d4
Gitweb: http://git.kernel.org/tip/16e3081191837a6a04733de5cd5d1d1b303140d4
Author: Paul E. McKenney <[email protected]>
AuthorDate: Sun, 13 Sep 2009 09:15:11 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 18 Sep 2009 00:06:53 +0200

rcu: Fix synchronize_rcu() for TREE_PREEMPT_RCU

The redirection of synchronize_sched() to synchronize_rcu() was
appropriate for TREE_RCU, but not for TREE_PREEMPT_RCU.

Fix this by creating an underlying synchronize_sched(). TREE_RCU
then redirects synchronize_rcu() to synchronize_sched(), while
TREE_PREEMPT_RCU has its own version of synchronize_rcu().

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <12528585111916-git-send-email->
Signed-off-by: Ingo Molnar <[email protected]>


---
include/linux/rcupdate.h | 23 +++++------------------
include/linux/rcutree.h | 4 ++--
kernel/rcupdate.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 50 insertions(+), 21 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 95e0615..39dce83 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -52,8 +52,13 @@ struct rcu_head {
};

/* Exported common interfaces */
+#ifdef CONFIG_TREE_PREEMPT_RCU
extern void synchronize_rcu(void);
+#else /* #ifdef CONFIG_TREE_PREEMPT_RCU */
+#define synchronize_rcu synchronize_sched
+#endif /* #else #ifdef CONFIG_TREE_PREEMPT_RCU */
extern void synchronize_rcu_bh(void);
+extern void synchronize_sched(void);
extern void rcu_barrier(void);
extern void rcu_barrier_bh(void);
extern void rcu_barrier_sched(void);
@@ -262,24 +267,6 @@ struct rcu_synchronize {
extern void wakeme_after_rcu(struct rcu_head *head);

/**
- * synchronize_sched - block until all CPUs have exited any non-preemptive
- * kernel code sequences.
- *
- * This means that all preempt_disable code sequences, including NMI and
- * hardware-interrupt handlers, in progress on entry will have completed
- * before this primitive returns. However, this does not guarantee that
- * softirq handlers will have completed, since in some kernels, these
- * handlers can run in process context, and can block.
- *
- * This primitive provides the guarantees made by the (now removed)
- * synchronize_kernel() API. In contrast, synchronize_rcu() only
- * guarantees that rcu_read_lock() sections will have completed.
- * In "classic RCU", these two guarantees happen to be one and
- * the same, but can differ in realtime RCU implementations.
- */
-#define synchronize_sched() __synchronize_sched()
-
-/**
* call_rcu - Queue an RCU callback for invocation after a grace period.
* @head: structure to be used for queueing the RCU updates.
* @func: actual update function to be invoked after the grace period
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index a893077..00d08c0 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -53,6 +53,8 @@ static inline void __rcu_read_unlock(void)
preempt_enable();
}

+#define __synchronize_sched() synchronize_rcu()
+
static inline void exit_rcu(void)
{
}
@@ -68,8 +70,6 @@ static inline void __rcu_read_unlock_bh(void)
local_bh_enable();
}

-#define __synchronize_sched() synchronize_rcu()
-
extern void call_rcu_sched(struct rcu_head *head,
void (*func)(struct rcu_head *rcu));

diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index bd5d5c8..28d2f24 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -74,6 +74,8 @@ void wakeme_after_rcu(struct rcu_head *head)
complete(&rcu->completion);
}

+#ifdef CONFIG_TREE_PREEMPT_RCU
+
/**
* synchronize_rcu - wait until a grace period has elapsed.
*
@@ -87,7 +89,7 @@ void synchronize_rcu(void)
{
struct rcu_synchronize rcu;

- if (rcu_blocking_is_gp())
+ if (!rcu_scheduler_active)
return;

init_completion(&rcu.completion);
@@ -98,6 +100,46 @@ void synchronize_rcu(void)
}
EXPORT_SYMBOL_GPL(synchronize_rcu);

+#endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */
+
+/**
+ * synchronize_sched - wait until an rcu-sched grace period has elapsed.
+ *
+ * Control will return to the caller some time after a full rcu-sched
+ * grace period has elapsed, in other words after all currently executing
+ * rcu-sched read-side critical sections have completed. These read-side
+ * critical sections are delimited by rcu_read_lock_sched() and
+ * rcu_read_unlock_sched(), and may be nested. Note that preempt_disable(),
+ * local_irq_disable(), and so on may be used in place of
+ * rcu_read_lock_sched().
+ *
+ * This means that all preempt_disable code sequences, including NMI and
+ * hardware-interrupt handlers, in progress on entry will have completed
+ * before this primitive returns. However, this does not guarantee that
+ * softirq handlers will have completed, since in some kernels, these
+ * handlers can run in process context, and can block.
+ *
+ * This primitive provides the guarantees made by the (now removed)
+ * synchronize_kernel() API. In contrast, synchronize_rcu() only
+ * guarantees that rcu_read_lock() sections will have completed.
+ * In "classic RCU", these two guarantees happen to be one and
+ * the same, but can differ in realtime RCU implementations.
+ */
+void synchronize_sched(void)
+{
+ struct rcu_synchronize rcu;
+
+ if (rcu_blocking_is_gp())
+ return;
+
+ init_completion(&rcu.completion);
+ /* Will wake me after RCU finished. */
+ call_rcu_sched(&rcu.head, wakeme_after_rcu);
+ /* Wait for it. */
+ wait_for_completion(&rcu.completion);
+}
+EXPORT_SYMBOL_GPL(synchronize_sched);
+
/**
* synchronize_rcu_bh - wait until an rcu_bh grace period has elapsed.
*

2009-09-13 16:15:16

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 2/4] Add debug checks to TREE_PREEMPT_RCU for premature grace periods.

From: Paul E. McKenney <[email protected]>

Check to make sure that there are no blocked tasks for the previous
grace period while initializing for the next grace period, verify
that rcu_preempt_qs() is given the correct CPU number and is never
called for an offline CPU.

Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcutree.c | 2 ++
kernel/rcutree_plugin.h | 25 +++++++++++++++++++++++++
2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index bca0aba..3a01405 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -627,6 +627,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
/* Special-case the common single-level case. */
if (NUM_RCU_NODES == 1) {
rnp->qsmask = rnp->qsmaskinit;
+ rcu_preempt_check_blocked_tasks(rnp);
rnp->gpnum = rsp->gpnum;
rsp->signaled = RCU_SIGNAL_INIT; /* force_quiescent_state OK. */
spin_unlock_irqrestore(&rnp->lock, flags);
@@ -660,6 +661,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
for (rnp_cur = &rsp->node[0]; rnp_cur < rnp_end; rnp_cur++) {
spin_lock(&rnp_cur->lock); /* irqs already disabled. */
rnp_cur->qsmask = rnp_cur->qsmaskinit;
+ rcu_preempt_check_blocked_tasks(rnp);
rnp->gpnum = rsp->gpnum;
spin_unlock(&rnp_cur->lock); /* irqs already disabled. */
}
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 4778936..51413cb 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -86,6 +86,7 @@ static void rcu_preempt_qs(int cpu)

if (t->rcu_read_lock_nesting &&
(t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {
+ WARN_ON_ONCE(cpu != smp_processor_id());

/* Possibly blocking in an RCU read-side critical section. */
rdp = rcu_preempt_state.rda[cpu];
@@ -103,7 +104,11 @@ static void rcu_preempt_qs(int cpu)
* state for the current grace period), then as long
* as that task remains queued, the current grace period
* cannot end.
+ *
+ * But first, note that the current CPU must still be
+ * on line!
*/
+ WARN_ON_ONCE((rdp->grpmask & rnp->qsmaskinit) == 0);
phase = !(rnp->qsmask & rdp->grpmask) ^ (rnp->gpnum & 0x1);
list_add(&t->rcu_node_entry, &rnp->blocked_tasks[phase]);
smp_mb(); /* Ensure later ctxt swtch seen after above. */
@@ -259,6 +264,18 @@ static void rcu_print_task_stall(struct rcu_node *rnp)
#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */

/*
+ * Check that the list of blocked tasks for the newly completed grace
+ * period is in fact empty. It is a serious bug to complete a grace
+ * period that still has RCU readers blocked! This function must be
+ * invoked -before- updating this rnp's ->gpnum, and the rnp's ->lock
+ * must be held by the caller.
+ */
+static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
+{
+ WARN_ON_ONCE(!list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1]));
+}
+
+/*
* Check for preempted RCU readers for the specified rcu_node structure.
* If the caller needs a reliable answer, it must hold the rcu_node's
* ->lock.
@@ -451,6 +468,14 @@ static void rcu_print_task_stall(struct rcu_node *rnp)
#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */

/*
+ * Because there is no preemptable RCU, there can be no readers blocked,
+ * so there is no need to check for blocked tasks.
+ */
+static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
+{
+}
+
+/*
* Because preemptable RCU does not exist, there are never any preempted
* RCU readers.
*/
--
1.5.2.5
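
For readers who have not met WARN_ON_ONCE() before, here is a minimal
user-space sketch of the warn-once idiom that these debug checks rely
on. The macro, names, and test program are invented for illustration;
the kernel's WARN_ON_ONCE() additionally dumps a stack trace and
evaluates to the condition's value.

#include <stdio.h>

/* Report the first violation of a condition, then stay silent. */
#define SKETCH_WARN_ON_ONCE(cond) \
	do { \
		static int warned; \
		if ((cond) && !warned) { \
			warned = 1; \
			fprintf(stderr, "warning: %s at %s:%d\n", \
				#cond, __FILE__, __LINE__); \
		} \
	} while (0)

int main(void)
{
	int blocked_tasks = 1;	/* pretend the invariant is violated */
	int i;

	for (i = 0; i < 3; i++)
		SKETCH_WARN_ON_ONCE(blocked_tasks != 0); /* warns only once */
	return 0;
}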

2009-09-13 16:15:22

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/rcu 3/4] Simplify rcu_read_unlock_special() quiescent-state accounting

From: Paul E. McKenney <[email protected]>

The earlier approach required two scheduling-clock ticks to note
a preemptable-RCU quiescent state in the situation in which the
scheduling-clock interrupt is unlucky enough to always interrupt an RCU
read-side critical section. With this change, the quiescent state is
instead noted by the outermost rcu_read_unlock() immediately following the
first scheduling-clock tick, or, alternatively, by the first subsequent
context switch. Therefore, this change also speeds up grace periods.

Suggested-by: Josh Triplett <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
include/linux/sched.h | 1 -
kernel/rcutree.c | 15 +++++-------
kernel/rcutree_plugin.h | 54 ++++++++++++++++++++++------------------------
3 files changed, 32 insertions(+), 38 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 855fd0d..e00ee56 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1731,7 +1731,6 @@ extern cputime_t task_gtime(struct task_struct *p);

#define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
#define RCU_READ_UNLOCK_NEED_QS (1 << 1) /* RCU core needs CPU response. */
-#define RCU_READ_UNLOCK_GOT_QS (1 << 2) /* CPU has responded to RCU core. */

static inline void rcu_copy_process(struct task_struct *p)
{
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 3a01405..2454999 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -106,27 +106,23 @@ static void __cpuinit rcu_init_percpu_data(int cpu, struct rcu_state *rsp,
*/
void rcu_sched_qs(int cpu)
{
- unsigned long flags;
struct rcu_data *rdp;

- local_irq_save(flags);
rdp = &per_cpu(rcu_sched_data, cpu);
- rdp->passed_quiesc = 1;
rdp->passed_quiesc_completed = rdp->completed;
- rcu_preempt_qs(cpu);
- local_irq_restore(flags);
+ barrier();
+ rdp->passed_quiesc = 1;
+ rcu_preempt_note_context_switch(cpu);
}

void rcu_bh_qs(int cpu)
{
- unsigned long flags;
struct rcu_data *rdp;

- local_irq_save(flags);
rdp = &per_cpu(rcu_bh_data, cpu);
- rdp->passed_quiesc = 1;
rdp->passed_quiesc_completed = rdp->completed;
- local_irq_restore(flags);
+ barrier();
+ rdp->passed_quiesc = 1;
}

#ifdef CONFIG_NO_HZ
@@ -610,6 +606,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)

/* Advance to a new grace period and initialize state. */
rsp->gpnum++;
+ WARN_ON_ONCE(rsp->signaled == RCU_GP_INIT);
rsp->signaled = RCU_GP_INIT; /* Hold off force_quiescent_state. */
rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;
record_gp_stall_check_time(rsp);
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 51413cb..eb4bae3 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -64,34 +64,42 @@ EXPORT_SYMBOL_GPL(rcu_batches_completed);
* not in a quiescent state. There might be any number of tasks blocked
* while in an RCU read-side critical section.
*/
-static void rcu_preempt_qs_record(int cpu)
+static void rcu_preempt_qs(int cpu)
{
struct rcu_data *rdp = &per_cpu(rcu_preempt_data, cpu);
- rdp->passed_quiesc = 1;
rdp->passed_quiesc_completed = rdp->completed;
+ barrier();
+ rdp->passed_quiesc = 1;
}

/*
- * We have entered the scheduler or are between softirqs in ksoftirqd.
- * If we are in an RCU read-side critical section, we need to reflect
- * that in the state of the rcu_node structure corresponding to this CPU.
- * Caller must disable hardirqs.
+ * We have entered the scheduler, and the current task might soon be
+ * context-switched away from. If this task is in an RCU read-side
+ * critical section, we will no longer be able to rely on the CPU to
+ * record that fact, so we enqueue the task on the appropriate entry
+ * of the blocked_tasks[] array. The task will dequeue itself when
+ * it exits the outermost enclosing RCU read-side critical section.
+ * Therefore, the current grace period cannot be permitted to complete
+ * until the blocked_tasks[] entry indexed by the low-order bit of
+ * rnp->gpnum empties.
+ *
+ * Caller must disable preemption.
*/
-static void rcu_preempt_qs(int cpu)
+static void rcu_preempt_note_context_switch(int cpu)
{
struct task_struct *t = current;
+ unsigned long flags;
int phase;
struct rcu_data *rdp;
struct rcu_node *rnp;

if (t->rcu_read_lock_nesting &&
(t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {
- WARN_ON_ONCE(cpu != smp_processor_id());

/* Possibly blocking in an RCU read-side critical section. */
rdp = rcu_preempt_state.rda[cpu];
rnp = rdp->mynode;
- spin_lock(&rnp->lock);
+ spin_lock_irqsave(&rnp->lock, flags);
t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BLOCKED;
t->rcu_blocked_node = rnp;

@@ -112,7 +120,7 @@ static void rcu_preempt_qs(int cpu)
phase = !(rnp->qsmask & rdp->grpmask) ^ (rnp->gpnum & 0x1);
list_add(&t->rcu_node_entry, &rnp->blocked_tasks[phase]);
smp_mb(); /* Ensure later ctxt swtch seen after above. */
- spin_unlock(&rnp->lock);
+ spin_unlock_irqrestore(&rnp->lock, flags);
}

/*
@@ -124,9 +132,8 @@ static void rcu_preempt_qs(int cpu)
* grace period, then the fact that the task has been enqueued
* means that we continue to block the current grace period.
*/
- rcu_preempt_qs_record(cpu);
- t->rcu_read_unlock_special &= ~(RCU_READ_UNLOCK_NEED_QS |
- RCU_READ_UNLOCK_GOT_QS);
+ rcu_preempt_qs(cpu);
+ t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
}

/*
@@ -162,7 +169,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
special = t->rcu_read_unlock_special;
if (special & RCU_READ_UNLOCK_NEED_QS) {
t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
- t->rcu_read_unlock_special |= RCU_READ_UNLOCK_GOT_QS;
+ rcu_preempt_qs(smp_processor_id());
}

/* Hardware IRQ handlers cannot block. */
@@ -199,9 +206,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
*/
if (!empty && rnp->qsmask == 0 &&
list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1])) {
- t->rcu_read_unlock_special &=
- ~(RCU_READ_UNLOCK_NEED_QS |
- RCU_READ_UNLOCK_GOT_QS);
+ t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
if (rnp->parent == NULL) {
/* Only one rcu_node in the tree. */
cpu_quiet_msk_finish(&rcu_preempt_state, flags);
@@ -352,19 +357,12 @@ static void rcu_preempt_check_callbacks(int cpu)
struct task_struct *t = current;

if (t->rcu_read_lock_nesting == 0) {
- t->rcu_read_unlock_special &=
- ~(RCU_READ_UNLOCK_NEED_QS | RCU_READ_UNLOCK_GOT_QS);
- rcu_preempt_qs_record(cpu);
+ t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
+ rcu_preempt_qs(cpu);
return;
}
if (per_cpu(rcu_preempt_data, cpu).qs_pending) {
- if (t->rcu_read_unlock_special & RCU_READ_UNLOCK_GOT_QS) {
- rcu_preempt_qs_record(cpu);
- t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_GOT_QS;
- } else if (!(t->rcu_read_unlock_special &
- RCU_READ_UNLOCK_NEED_QS)) {
- t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
- }
+ t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
}
}

@@ -451,7 +449,7 @@ EXPORT_SYMBOL_GPL(rcu_batches_completed);
* Because preemptable RCU does not exist, we never have to check for
* CPUs being in quiescent states.
*/
-static void rcu_preempt_qs(int cpu)
+static void rcu_preempt_note_context_switch(int cpu)
{
}

--
1.5.2.5
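
The subtle point in the hunks above is the barrier() placed between
recording which grace period a quiescent state belongs to and setting
the passed_quiesc flag: anything that observes passed_quiesc set on the
same CPU should also see an up-to-date passed_quiesc_completed. Below
is a user-space sketch of that store ordering, with invented names and
a plain compiler barrier standing in for the kernel's barrier(); it is
an illustration only, not kernel code.

#include <stdio.h>

#define compiler_barrier()	__asm__ __volatile__("" ::: "memory")

struct qs_record {
	long passed_quiesc_completed;	/* grace period the QS counts for */
	int passed_quiesc;		/* set only after the number above */
};

static void note_quiescent_state(struct qs_record *r, long completed)
{
	r->passed_quiesc_completed = completed;
	compiler_barrier();	/* forbid compiler store reordering */
	r->passed_quiesc = 1;
}

int main(void)
{
	struct qs_record r = { 0, 0 };

	note_quiescent_state(&r, 42);
	printf("quiescent state recorded for grace period %ld: %d\n",
	       r.passed_quiesc_completed, r.passed_quiesc);
	return 0;
}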

2009-09-13 16:15:14

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/rcu 1/4] Kconfig help needs to say that TREE_PREEMPT_RCU scales down

To quote Valdis:

This leaves somebody who has a laptop wondering which choice
is best for a system with only one or two cores that has
CONFIG_PREEMPT defined. One choice says it scales down nicely,
the other explicitly has a 'depends on PREEMPT' attached to it...

So add "scales down nicely" to the TREE_PREEMPT_RCU help text to match
that of TREE_RCU.

Suggested-by: Valdis Kletnieks <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
init/Kconfig | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 8e8b76d..4c2c936 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -331,7 +331,8 @@ config TREE_PREEMPT_RCU
This option selects the RCU implementation that is
designed for very large SMP systems with hundreds or
thousands of CPUs, but for which real-time response
- is also required.
+ is also required. It also scales down nicely to
+ smaller systems.

endchoice

--
1.5.2.5

2009-09-13 16:15:41

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/rcu 4/4] Fix synchronize_rcu() for TREE_PREEMPT_RCU

From: Paul E. McKenney <[email protected]>

The redirection of synchronize_sched() to synchronize_rcu() was appropriate
for TREE_RCU, but not for TREE_PREEMPT_RCU. Fix this by creating an
underlying synchronize_sched(). TREE_RCU then redirects synchronize_rcu()
to synchronize_sched(), while TREE_PREEMPT_RCU has its own version of
synchronize_rcu().

Signed-off-by: Paul E. McKenney <[email protected]>
---
include/linux/rcupdate.h | 23 +++++------------------
include/linux/rcutree.h | 4 ++--
kernel/rcupdate.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 50 insertions(+), 21 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 95e0615..39dce83 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -52,8 +52,13 @@ struct rcu_head {
};

/* Exported common interfaces */
+#ifdef CONFIG_TREE_PREEMPT_RCU
extern void synchronize_rcu(void);
+#else /* #ifdef CONFIG_TREE_PREEMPT_RCU */
+#define synchronize_rcu synchronize_sched
+#endif /* #else #ifdef CONFIG_TREE_PREEMPT_RCU */
extern void synchronize_rcu_bh(void);
+extern void synchronize_sched(void);
extern void rcu_barrier(void);
extern void rcu_barrier_bh(void);
extern void rcu_barrier_sched(void);
@@ -262,24 +267,6 @@ struct rcu_synchronize {
extern void wakeme_after_rcu(struct rcu_head *head);

/**
- * synchronize_sched - block until all CPUs have exited any non-preemptive
- * kernel code sequences.
- *
- * This means that all preempt_disable code sequences, including NMI and
- * hardware-interrupt handlers, in progress on entry will have completed
- * before this primitive returns. However, this does not guarantee that
- * softirq handlers will have completed, since in some kernels, these
- * handlers can run in process context, and can block.
- *
- * This primitive provides the guarantees made by the (now removed)
- * synchronize_kernel() API. In contrast, synchronize_rcu() only
- * guarantees that rcu_read_lock() sections will have completed.
- * In "classic RCU", these two guarantees happen to be one and
- * the same, but can differ in realtime RCU implementations.
- */
-#define synchronize_sched() __synchronize_sched()
-
-/**
* call_rcu - Queue an RCU callback for invocation after a grace period.
* @head: structure to be used for queueing the RCU updates.
* @func: actual update function to be invoked after the grace period
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index a893077..00d08c0 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -53,6 +53,8 @@ static inline void __rcu_read_unlock(void)
preempt_enable();
}

+#define __synchronize_sched() synchronize_rcu()
+
static inline void exit_rcu(void)
{
}
@@ -68,8 +70,6 @@ static inline void __rcu_read_unlock_bh(void)
local_bh_enable();
}

-#define __synchronize_sched() synchronize_rcu()
-
extern void call_rcu_sched(struct rcu_head *head,
void (*func)(struct rcu_head *rcu));

diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index bd5d5c8..28d2f24 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -74,6 +74,8 @@ void wakeme_after_rcu(struct rcu_head *head)
complete(&rcu->completion);
}

+#ifdef CONFIG_TREE_PREEMPT_RCU
+
/**
* synchronize_rcu - wait until a grace period has elapsed.
*
@@ -87,7 +89,7 @@ void synchronize_rcu(void)
{
struct rcu_synchronize rcu;

- if (rcu_blocking_is_gp())
+ if (!rcu_scheduler_active)
return;

init_completion(&rcu.completion);
@@ -98,6 +100,46 @@ void synchronize_rcu(void)
}
EXPORT_SYMBOL_GPL(synchronize_rcu);

+#endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */
+
+/**
+ * synchronize_sched - wait until an rcu-sched grace period has elapsed.
+ *
+ * Control will return to the caller some time after a full rcu-sched
+ * grace period has elapsed, in other words after all currently executing
+ * rcu-sched read-side critical sections have completed. These read-side
+ * critical sections are delimited by rcu_read_lock_sched() and
+ * rcu_read_unlock_sched(), and may be nested. Note that preempt_disable(),
+ * local_irq_disable(), and so on may be used in place of
+ * rcu_read_lock_sched().
+ *
+ * This means that all preempt_disable code sequences, including NMI and
+ * hardware-interrupt handlers, in progress on entry will have completed
+ * before this primitive returns. However, this does not guarantee that
+ * softirq handlers will have completed, since in some kernels, these
+ * handlers can run in process context, and can block.
+ *
+ * This primitive provides the guarantees made by the (now removed)
+ * synchronize_kernel() API. In contrast, synchronize_rcu() only
+ * guarantees that rcu_read_lock() sections will have completed.
+ * In "classic RCU", these two guarantees happen to be one and
+ * the same, but can differ in realtime RCU implementations.
+ */
+void synchronize_sched(void)
+{
+ struct rcu_synchronize rcu;
+
+ if (rcu_blocking_is_gp())
+ return;
+
+ init_completion(&rcu.completion);
+ /* Will wake me after RCU finished. */
+ call_rcu_sched(&rcu.head, wakeme_after_rcu);
+ /* Wait for it. */
+ wait_for_completion(&rcu.completion);
+}
+EXPORT_SYMBOL_GPL(synchronize_sched);
+
/**
* synchronize_rcu_bh - wait until an rcu_bh grace period has elapsed.
*
--
1.5.2.5
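
As a usage illustration for the new primitive, here is a hypothetical
driver-style sketch (not part of this patch; the structure, identifiers,
and locking shortcuts are invented, and serialization of updaters is
omitted): readers traverse a shared pointer inside preempt_disable()/
preempt_enable(), which the comment block above notes may be used in
place of rcu_read_lock_sched(), and the updater calls the new
synchronize_sched() before freeing the old version. call_rcu_sched()
would be the asynchronous alternative.

#include <linux/preempt.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct cfg {
	int threshold;
};

static struct cfg *global_cfg;	/* hypothetical shared pointer */

static int reader_peek_threshold(void)
{
	struct cfg *c;
	int val;

	preempt_disable();	/* rcu-sched read-side critical section */
	c = rcu_dereference(global_cfg);
	val = c ? c->threshold : 0;
	preempt_enable();
	return val;
}

static void updater_set_threshold(int threshold)
{
	struct cfg *newc, *oldc;

	newc = kmalloc(sizeof(*newc), GFP_KERNEL);
	if (newc == NULL)
		return;
	newc->threshold = threshold;

	oldc = global_cfg;	/* updaters assumed serialized elsewhere */
	rcu_assign_pointer(global_cfg, newc);
	synchronize_sched();	/* wait for pre-existing rcu-sched readers */
	kfree(oldc);
}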