2009-09-18 16:49:56

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 0/3] Cleanups/bugfixes for large systems and TREE_PREEMPT_RCU

This set of patches does some cleanups and fixes a bug that prevents
TREE_PREEMPT_RCU from running reliably on large systems (more than
64 CPUs, or on 32-bit systems, more than 32 CPUs). The patches are
as follows:

1. Add WARN_ON_ONCE() consistency checks to catch bugs. These
are all on slowpaths, so are appropriate for production use.

2. Apply the results of a code walkthrough of rcutree_plugin.h.
This includes a fix for a theoretical race that could result in
excessively long RCU grace periods, or perhaps even hangs/OOMs.

3. Fix a bug introduced in a bugfix commit #de078d875 that caused
large systems to only partially initialize the fields in the
rcu_node tree. TREE_RCU doesn't care about any of these fields,
which explains why only TREE_PREEMPT_RCU was broken.

With these fixes, large systems using TREE_PREEMPT_RCU pass moderate
rcutorture runs (but with Josh Triplett's mods that force frequent
preemption within RCU read-side critical sections). I am sure that
there are more bugs, but these fixes get things much closer.

b/kernel/rcutree.c | 13 +++++++++----
b/kernel/rcutree_plugin.h | 21 ++++++++++++++-------
kernel/rcutree.c | 38 +++++++++-----------------------------
kernel/rcutree_plugin.h | 16 +++++++++-------
4 files changed, 41 insertions(+), 47 deletions(-)
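
A note on why the 64-CPU boundary (32 CPUs on 32-bit) matters for patch 3: each rcu_node tracks its children in a single unsigned long bitmask, so a system with NR_CPUS <= BITS_PER_LONG gets a one-node "tree" and takes the NUM_RCU_NODES == 1 special case in rcu_start_gp(). Only larger systems run the per-node initialization loop where the bug lives. A rough user-space sketch of the sizing, assuming a simplified two-level geometry (the real calculation in kernel/rcutree.h also depends on CONFIG_RCU_FANOUT):

#include <stdio.h>

#define BITS_PER_LONG (8 * (int)sizeof(long))

/* Roughly how many rcu_node structures cover nr_cpus CPUs: one leaf
 * per BITS_PER_LONG CPUs, plus a root when more than one leaf is
 * needed.  Simplified; the real tree can be deeper. */
static int num_rcu_nodes(int nr_cpus)
{
        int leaves = (nr_cpus + BITS_PER_LONG - 1) / BITS_PER_LONG;

        return leaves == 1 ? 1 : leaves + 1;
}

int main(void)
{
        int cpus[] = { 8, 64, 65, 128, 4096 };

        for (int i = 0; i < 5; i++)
                printf("%4d CPUs -> %d rcu_node(s)\n",
                       cpus[i], num_rcu_nodes(cpus[i]));
        return 0;
}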


2009-09-18 16:58:11

by Daniel Walker

Subject: Re: [PATCH tip/core/rcu 2/3] Apply results of code inspection of kernel/rcutree_plugin.h

On Fri, 2009-09-18 at 09:50 -0700, Paul E. McKenney wrote:

> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> index 2b996c3..a2d586c 100644
> --- a/kernel/rcutree_plugin.h
> +++ b/kernel/rcutree_plugin.h
> @@ -117,9 +117,9 @@ static void rcu_preempt_note_context_switch(int cpu)
> * on line!
> */
> WARN_ON_ONCE((rdp->grpmask & rnp->qsmaskinit) == 0);
> - phase = !(rnp->qsmask & rdp->grpmask) ^ (rnp->gpnum & 0x1);
> + WARN_ON_ONCE(!list_empty(&t->rcu_node_entry));
> + phase = (rnp->gpnum + !(rnp->qsmask & rdp->grpmask)) & 0x1;
> list_add(&t->rcu_node_entry, &rnp->blocked_tasks[phase]);
> - smp_mb(); /* Ensure later ctxt swtch seen after above. */
> spin_unlock_irqrestore(&rnp->lock, flags);
> }

ERROR: code indent should use tabs where possible
#149: FILE: kernel/rcutree_plugin.h:120:
+^I ^IWARN_ON_ONCE(!list_empty(&t->rcu_node_entry));$


One funny indent in the line above.. If you're intending for Ingo to take
this, he might just fix it on apply ..

Daniel

2009-09-18 17:01:25

by Peter Zijlstra

Subject: Re: [PATCH tip/core/rcu 2/3] Apply results of code inspection of kernel/rcutree_plugin.h

On Fri, 2009-09-18 at 09:58 -0700, Daniel Walker wrote:
> On Fri, 2009-09-18 at 09:50 -0700, Paul E. McKenney wrote:
>
> > diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> > index 2b996c3..a2d586c 100644
> > --- a/kernel/rcutree_plugin.h
> > +++ b/kernel/rcutree_plugin.h
> > @@ -117,9 +117,9 @@ static void rcu_preempt_note_context_switch(int cpu)
> > * on line!
> > */
> > WARN_ON_ONCE((rdp->grpmask & rnp->qsmaskinit) == 0);
> > - phase = !(rnp->qsmask & rdp->grpmask) ^ (rnp->gpnum & 0x1);
> > + WARN_ON_ONCE(!list_empty(&t->rcu_node_entry));
> > + phase = (rnp->gpnum + !(rnp->qsmask & rdp->grpmask)) & 0x1;
> > list_add(&t->rcu_node_entry, &rnp->blocked_tasks[phase]);
> > - smp_mb(); /* Ensure later ctxt swtch seen after above. */
> > spin_unlock_irqrestore(&rnp->lock, flags);
> > }
>
> ERROR: code indent should use tabs where possible
> #149: FILE: kernel/rcutree_plugin.h:120:
> +^I ^IWARN_ON_ONCE(!list_empty(&t->rcu_node_entry));$
>
>
> One funny indent in the line above.. If you're intending for Ingo to take
> this, he might just fix it on apply ..

Daniel, seriously, get a new hobby.

2009-09-18 17:22:53

by Paul E. McKenney

Subject: Re: [PATCH tip/core/rcu 2/3] Apply results of code inspection of kernel/rcutree_plugin.h

On Fri, Sep 18, 2009 at 09:58:20AM -0700, Daniel Walker wrote:
> On Fri, 2009-09-18 at 09:50 -0700, Paul E. McKenney wrote:
>
> > diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> > index 2b996c3..a2d586c 100644
> > --- a/kernel/rcutree_plugin.h
> > +++ b/kernel/rcutree_plugin.h
> > @@ -117,9 +117,9 @@ static void rcu_preempt_note_context_switch(int cpu)
> > * on line!
> > */
> > WARN_ON_ONCE((rdp->grpmask & rnp->qsmaskinit) == 0);
> > - phase = !(rnp->qsmask & rdp->grpmask) ^ (rnp->gpnum & 0x1);
> > + WARN_ON_ONCE(!list_empty(&t->rcu_node_entry));
> > + phase = (rnp->gpnum + !(rnp->qsmask & rdp->grpmask)) & 0x1;
> > list_add(&t->rcu_node_entry, &rnp->blocked_tasks[phase]);
> > - smp_mb(); /* Ensure later ctxt swtch seen after above. */
> > spin_unlock_irqrestore(&rnp->lock, flags);
> > }
>
> ERROR: code indent should use tabs where possible
> #149: FILE: kernel/rcutree_plugin.h:120:
> +^I ^IWARN_ON_ONCE(!list_empty(&t->rcu_node_entry));$
>
> One funny indent in the line above.. If you're intending for Ingo to take
> this, he might just fix it on apply ..

I will be submitting a patch shortly to clean up the whitespace errors
in the include/linux/rcu* and kernel/rcu* files.

Thanx, Paul

2009-09-18 17:56:07

by Daniel Walker

Subject: Re: [PATCH tip/core/rcu 2/3] Apply results of code inspection of kernel/rcutree_plugin.h

On Fri, 2009-09-18 at 10:22 -0700, Paul E. McKenney wrote:
> > this, he might just fix it on apply ..
>
> I will be submitting a patch shortly to clean up the whitespace errors
> in the include/linux/rcu* and kernel/rcu* files.

Ok, thanks for doing that..

Daniel

2009-09-19 06:55:57

by Ingo Molnar

Subject: Re: [PATCH tip/core/rcu 0/3] Cleanups/bugfixes for large systems and TREE_PREEMPT_RCU


* Paul E. McKenney <[email protected]> wrote:

> This set of patches does some cleanups and fixes a bug that prevents
> TREE_PREEMPT_RCU from running reliably on large systems (more than 64
> CPUs, or on 32-bit systems, more than 32 CPUs). The patches are as
> follows:
>
> 1. Add WARN_ON_ONCE() consistency checks to catch bugs. These
> are all on slowpaths, so are appropriate for production use.
>
> 2. Apply the results of a code walkthrough of rcutree_plugin.h.
> This includes a fix for a theoretical race that could result in
> excessively long RCU grace periods, or perhaps even hangs/OOMs.
>
> 3. Fix a bug introduced in a bugfix commit #de078d875 that caused
> large systems to only partially initialize the fields in the
> rcu_node tree. TREE_RCU doesn't care about any of these fields,
> which explains why only TREE_PREEMPT_RCU was broken.
>
> With these fixes, large systems using TREE_PREEMPT_RCU pass moderate
> rcutorture runs (but with Josh Triplett's mods that force frequent
> preemption within RCU read-side critical sections). I am sure that
> there are more bugs, but these fixes get things much closer.
>
> b/kernel/rcutree.c | 13 +++++++++----
> b/kernel/rcutree_plugin.h | 21 ++++++++++++++-------
> kernel/rcutree.c | 38 +++++++++-----------------------------
> kernel/rcutree_plugin.h | 16 +++++++++-------
> 4 files changed, 41 insertions(+), 47 deletions(-)

Thanks Paul, applied them to tip:core/urgent.

Ingo

2009-09-19 07:59:05

by Paul E. McKenney

Subject: [tip:core/urgent] rcu: Add WARN_ON_ONCE() consistency checks covering state transitions

Commit-ID: 28ecd58020409be8eb176c716f957fc3386fa2fa
Gitweb: http://git.kernel.org/tip/28ecd58020409be8eb176c716f957fc3386fa2fa
Author: Paul E. McKenney <[email protected]>
AuthorDate: Fri, 18 Sep 2009 09:50:17 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Sat, 19 Sep 2009 08:53:19 +0200

rcu: Add WARN_ON_ONCE() consistency checks covering state transitions

o Verify that qsmask bits stay clear through GP
initialization.

o Verify that cpu_quiet_msk_finish() is never invoked unless
there actually is an RCU grace period in progress.

o Verify that all internal-node rcu_node structures have empty
blocked_tasks[] lists.

o Verify that child rcu_node structure's bits remain clear after
acquiring parent's lock.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <12532926191947-git-send-email->
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/rcutree.c | 13 +++++++++----
kernel/rcutree_plugin.h | 20 ++++++++++++++------
2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 6c99553..e8624eb 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -628,8 +628,8 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)

/* Special-case the common single-level case. */
if (NUM_RCU_NODES == 1) {
- rnp->qsmask = rnp->qsmaskinit;
rcu_preempt_check_blocked_tasks(rnp);
+ rnp->qsmask = rnp->qsmaskinit;
rnp->gpnum = rsp->gpnum;
rsp->signaled = RCU_SIGNAL_INIT; /* force_quiescent_state OK. */
spin_unlock_irqrestore(&rnp->lock, flags);
@@ -662,8 +662,8 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
rnp_end = &rsp->node[NUM_RCU_NODES];
for (rnp_cur = &rsp->node[0]; rnp_cur < rnp_end; rnp_cur++) {
spin_lock(&rnp_cur->lock); /* irqs already disabled. */
- rnp_cur->qsmask = rnp_cur->qsmaskinit;
rcu_preempt_check_blocked_tasks(rnp);
+ rnp_cur->qsmask = rnp_cur->qsmaskinit;
rnp->gpnum = rsp->gpnum;
spin_unlock(&rnp_cur->lock); /* irqs already disabled. */
}
@@ -708,6 +708,7 @@ rcu_process_gp_end(struct rcu_state *rsp, struct rcu_data *rdp)
static void cpu_quiet_msk_finish(struct rcu_state *rsp, unsigned long flags)
__releases(rnp->lock)
{
+ WARN_ON_ONCE(rsp->completed == rsp->gpnum);
rsp->completed = rsp->gpnum;
rcu_process_gp_end(rsp, rsp->rda[smp_processor_id()]);
rcu_start_gp(rsp, flags); /* releases root node's rnp->lock. */
@@ -725,6 +726,8 @@ cpu_quiet_msk(unsigned long mask, struct rcu_state *rsp, struct rcu_node *rnp,
unsigned long flags)
__releases(rnp->lock)
{
+ struct rcu_node *rnp_c;
+
/* Walk up the rcu_node hierarchy. */
for (;;) {
if (!(rnp->qsmask & mask)) {
@@ -748,8 +751,10 @@ cpu_quiet_msk(unsigned long mask, struct rcu_state *rsp, struct rcu_node *rnp,
break;
}
spin_unlock_irqrestore(&rnp->lock, flags);
+ rnp_c = rnp;
rnp = rnp->parent;
spin_lock_irqsave(&rnp->lock, flags);
+ WARN_ON_ONCE(rnp_c->qsmask);
}

/*
@@ -858,7 +863,7 @@ static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp)
spin_lock_irqsave(&rsp->onofflock, flags);

/* Remove the outgoing CPU from the masks in the rcu_node hierarchy. */
- rnp = rdp->mynode;
+ rnp = rdp->mynode; /* this is the outgoing CPU's rnp. */
mask = rdp->grpmask; /* rnp->grplo is constant. */
do {
spin_lock(&rnp->lock); /* irqs already disabled. */
@@ -867,7 +872,7 @@ static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp)
spin_unlock(&rnp->lock); /* irqs remain disabled. */
break;
}
- rcu_preempt_offline_tasks(rsp, rnp);
+ rcu_preempt_offline_tasks(rsp, rnp, rdp);
mask = rnp->grpmask;
spin_unlock(&rnp->lock); /* irqs remain disabled. */
rnp = rnp->parent;
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index c9616e4..5f94619 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -206,7 +206,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
*/
if (!empty && rnp->qsmask == 0 &&
list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1])) {
- t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
+ struct rcu_node *rnp_p;
+
if (rnp->parent == NULL) {
/* Only one rcu_node in the tree. */
cpu_quiet_msk_finish(&rcu_preempt_state, flags);
@@ -215,9 +216,10 @@ static void rcu_read_unlock_special(struct task_struct *t)
/* Report up the rest of the hierarchy. */
mask = rnp->grpmask;
spin_unlock_irqrestore(&rnp->lock, flags);
- rnp = rnp->parent;
- spin_lock_irqsave(&rnp->lock, flags);
- cpu_quiet_msk(mask, &rcu_preempt_state, rnp, flags);
+ rnp_p = rnp->parent;
+ spin_lock_irqsave(&rnp_p->lock, flags);
+ WARN_ON_ONCE(rnp->qsmask);
+ cpu_quiet_msk(mask, &rcu_preempt_state, rnp_p, flags);
return;
}
spin_unlock(&rnp->lock);
@@ -278,6 +280,7 @@ static void rcu_print_task_stall(struct rcu_node *rnp)
static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
{
WARN_ON_ONCE(!list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1]));
+ WARN_ON_ONCE(rnp->qsmask);
}

/*
@@ -302,7 +305,8 @@ static int rcu_preempted_readers(struct rcu_node *rnp)
* The caller must hold rnp->lock with irqs disabled.
*/
static void rcu_preempt_offline_tasks(struct rcu_state *rsp,
- struct rcu_node *rnp)
+ struct rcu_node *rnp,
+ struct rcu_data *rdp)
{
int i;
struct list_head *lp;
@@ -314,6 +318,9 @@ static void rcu_preempt_offline_tasks(struct rcu_state *rsp,
WARN_ONCE(1, "Last CPU thought to be offlined?");
return; /* Shouldn't happen: at least one CPU online. */
}
+ WARN_ON_ONCE(rnp != rdp->mynode &&
+ (!list_empty(&rnp->blocked_tasks[0]) ||
+ !list_empty(&rnp->blocked_tasks[1])));

/*
* Move tasks up to root rcu_node. Rely on the fact that the
@@ -489,7 +496,8 @@ static int rcu_preempted_readers(struct rcu_node *rnp)
* tasks that were blocked within RCU read-side critical sections.
*/
static void rcu_preempt_offline_tasks(struct rcu_state *rsp,
- struct rcu_node *rnp)
+ struct rcu_node *rnp,
+ struct rcu_data *rdp)
{
}
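
On the cover letter's claim that these checks are suitable for production use: WARN_ON_ONCE() evaluates its condition every time it is reached but emits the splat at most once, so the steady-state cost is one predictable branch per check. A rough user-space approximation of the idea (a sketch only; the kernel's actual macro in include/asm-generic/bug.h is more elaborate, e.g. marking the branch unlikely()):

#include <stdio.h>

/* One-shot warning: the condition is checked on every call, but the
 * diagnostic fires only the first time it is true.  Uses a GCC-style
 * statement expression, as kernel code does. */
#define WARN_ON_ONCE(cond) ({                                        \
        static int __warned;                                         \
        int __ret = !!(cond);                                        \
        if (__ret && !__warned) {                                    \
                __warned = 1;                                        \
                fprintf(stderr, "WARNING: %s at %s:%d\n",            \
                        #cond, __FILE__, __LINE__);                  \
        }                                                            \
        __ret;                                                       \
})

int main(void)
{
        for (int i = 0; i < 3; i++)
                WARN_ON_ONCE(i > 0);    /* prints exactly once, at i == 1 */
        return 0;
}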

2009-09-19 07:59:19

by Paul E. McKenney

Subject: [tip:core/urgent] rcu: Apply results of code inspection of kernel/rcutree_plugin.h

Commit-ID: e7d8842ed34a7fe19d1ed90f84c211fb056ac523
Gitweb: http://git.kernel.org/tip/e7d8842ed34a7fe19d1ed90f84c211fb056ac523
Author: Paul E. McKenney <[email protected]>
AuthorDate: Fri, 18 Sep 2009 09:50:18 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Sat, 19 Sep 2009 08:53:21 +0200

rcu: Apply results of code inspection of kernel/rcutree_plugin.h

o Drop the calls to cpu_quiet() from the online/offline code.
These are unnecessary, since force_quiescent_state() will
clean up, and removing them simplifies the code a bit.

o Add a warning to check that we don't enqueue the same blocked
task twice onto the ->blocked_tasks[] lists.

o Rework the phase computation in rcu_preempt_note_context_switch()
to be more readable, as suggested by Josh Triplett.

o Disable irqs to close a race between the scheduling clock
interrupt and rcu_preempt_note_context_switch() WRT the
->rcu_read_unlock_special field.

o Add comments to rnp->lock acquisition and release within
rcu_read_unlock_special() noting that irqs are already
disabled.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <12532926201851-git-send-email->
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/rcutree.c | 27 +++++----------------------
kernel/rcutree_plugin.h | 10 ++++++----
2 files changed, 11 insertions(+), 26 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index e8624eb..ae4a553 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -767,10 +767,10 @@ cpu_quiet_msk(unsigned long mask, struct rcu_state *rsp, struct rcu_node *rnp,

/*
* Record a quiescent state for the specified CPU, which must either be
- * the current CPU or an offline CPU. The lastcomp argument is used to
- * make sure we are still in the grace period of interest. We don't want
- * to end the current grace period based on quiescent states detected in
- * an earlier grace period!
+ * the current CPU. The lastcomp argument is used to make sure we are
+ * still in the grace period of interest. We don't want to end the current
+ * grace period based on quiescent states detected in an earlier grace
+ * period!
*/
static void
cpu_quiet(int cpu, struct rcu_state *rsp, struct rcu_data *rdp, long lastcomp)
@@ -805,7 +805,6 @@ cpu_quiet(int cpu, struct rcu_state *rsp, struct rcu_data *rdp, long lastcomp)
* This GP can't end until cpu checks in, so all of our
* callbacks can be processed during the next GP.
*/
- rdp = rsp->rda[smp_processor_id()];
rdp->nxttail[RCU_NEXT_READY_TAIL] = rdp->nxttail[RCU_NEXT_TAIL];

cpu_quiet_msk(mask, rsp, rnp, flags); /* releases rnp->lock */
@@ -881,9 +880,6 @@ static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp)

spin_unlock(&rsp->onofflock); /* irqs remain disabled. */

- /* Being offline is a quiescent state, so go record it. */
- cpu_quiet(cpu, rsp, rdp, lastcomp);
-
/*
* Move callbacks from the outgoing CPU to the running CPU.
* Note that the outgoing CPU is now quiscent, so it is now
@@ -1448,20 +1444,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptable)
rnp = rnp->parent;
} while (rnp != NULL && !(rnp->qsmaskinit & mask));

- spin_unlock(&rsp->onofflock); /* irqs remain disabled. */
-
- /*
- * A new grace period might start here. If so, we will be part of
- * it, and its gpnum will be greater than ours, so we will
- * participate. It is also possible for the gpnum to have been
- * incremented before this function was called, and the bitmasks
- * to not be filled out until now, in which case we will also
- * participate due to our gpnum being behind.
- */
-
- /* Since it is coming online, the CPU is in a quiescent state. */
- cpu_quiet(cpu, rsp, rdp, lastcomp);
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&rsp->onofflock, flags);
}

static void __cpuinit rcu_online_cpu(int cpu)
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 5f94619..cd6047c 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -117,9 +117,9 @@ static void rcu_preempt_note_context_switch(int cpu)
* on line!
*/
WARN_ON_ONCE((rdp->grpmask & rnp->qsmaskinit) == 0);
- phase = !(rnp->qsmask & rdp->grpmask) ^ (rnp->gpnum & 0x1);
+ WARN_ON_ONCE(!list_empty(&t->rcu_node_entry));
+ phase = (rnp->gpnum + !(rnp->qsmask & rdp->grpmask)) & 0x1;
list_add(&t->rcu_node_entry, &rnp->blocked_tasks[phase]);
- smp_mb(); /* Ensure later ctxt swtch seen after above. */
spin_unlock_irqrestore(&rnp->lock, flags);
}

@@ -133,7 +133,9 @@ static void rcu_preempt_note_context_switch(int cpu)
* means that we continue to block the current grace period.
*/
rcu_preempt_qs(cpu);
+ local_irq_save(flags);
t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
+ local_irq_restore(flags);
}

/*
@@ -189,10 +191,10 @@ static void rcu_read_unlock_special(struct task_struct *t)
*/
for (;;) {
rnp = t->rcu_blocked_node;
- spin_lock(&rnp->lock);
+ spin_lock(&rnp->lock); /* irqs already disabled. */
if (rnp == t->rcu_blocked_node)
break;
- spin_unlock(&rnp->lock);
+ spin_unlock(&rnp->lock); /* irqs remain disabled. */
}
empty = list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1]);
list_del_init(&t->rcu_node_entry);
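
A quick check that the reworked phase computation is the same function as the XOR form it replaces: for a single-bit value b and any integer g, (g + b) & 0x1 equals b ^ (g & 0x1), since addition and exclusive-or agree in the low-order bit. A user-space sketch exhausting the cases (names abbreviated from the kernel code; not kernel source):

#include <assert.h>
#include <stdio.h>

int main(void)
{
        /* gpnum plays the role of rnp->gpnum; blocked is the 0-or-1
         * value of !(rnp->qsmask & rdp->grpmask). */
        for (unsigned long gpnum = 0; gpnum < 4; gpnum++) {
                for (int blocked = 0; blocked <= 1; blocked++) {
                        int old_phase = blocked ^ (int)(gpnum & 0x1);
                        int new_phase = (int)((gpnum + blocked) & 0x1);

                        assert(old_phase == new_phase);
                        printf("gpnum=%lu blocked=%d -> phase=%d\n",
                               gpnum, blocked, new_phase);
                }
        }
        return 0;
}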

2009-09-19 07:59:41

by Paul E. McKenney

Subject: [tip:core/urgent] rcu: Fix thinko, actually initialize full tree

Commit-ID: 49e291266d0920264471d9d64268fb030e33a99a
Gitweb: http://git.kernel.org/tip/49e291266d0920264471d9d64268fb030e33a99a
Author: Paul E. McKenney <[email protected]>
AuthorDate: Fri, 18 Sep 2009 09:50:19 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Sat, 19 Sep 2009 08:53:21 +0200

rcu: Fix thinko, actually initialize full tree

Commit de078d8 ("rcu: Need to update rnp->gpnum if preemptable RCU
is to be reliable") repeatedly and incorrectly initializes the root
rcu_node structure's ->gpnum field rather than initializing the
->gpnum field of each node in the tree. Fix this. Also add an
additional consistency check to catch this in the future.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <125329262011-git-send-email->
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/rcutree.c | 11 ++++-------
kernel/rcutree_plugin.h | 4 +++-
2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index ae4a553..1b32cdd 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -601,8 +601,6 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
{
struct rcu_data *rdp = rsp->rda[smp_processor_id()];
struct rcu_node *rnp = rcu_get_root(rsp);
- struct rcu_node *rnp_cur;
- struct rcu_node *rnp_end;

if (!cpu_needs_another_gp(rsp, rdp)) {
spin_unlock_irqrestore(&rnp->lock, flags);
@@ -659,13 +657,12 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
* one corresponding to this CPU, due to the fact that we have
* irqs disabled.
*/
- rnp_end = &rsp->node[NUM_RCU_NODES];
- for (rnp_cur = &rsp->node[0]; rnp_cur < rnp_end; rnp_cur++) {
- spin_lock(&rnp_cur->lock); /* irqs already disabled. */
+ for (rnp = &rsp->node[0]; rnp < &rsp->node[NUM_RCU_NODES]; rnp++) {
+ spin_lock(&rnp->lock); /* irqs already disabled. */
rcu_preempt_check_blocked_tasks(rnp);
- rnp_cur->qsmask = rnp_cur->qsmaskinit;
+ rnp->qsmask = rnp->qsmaskinit;
rnp->gpnum = rsp->gpnum;
- spin_unlock(&rnp_cur->lock); /* irqs already disabled. */
+ spin_unlock(&rnp->lock); /* irqs already disabled. */
}

rsp->signaled = RCU_SIGNAL_INIT; /* force_quiescent_state now OK. */
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index cd6047c..09b7325 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -476,10 +476,12 @@ static void rcu_print_task_stall(struct rcu_node *rnp)

/*
* Because there is no preemptable RCU, there can be no readers blocked,
- * so there is no need to check for blocked tasks.
+ * so there is no need to check for blocked tasks. So check only for
+ * bogus qsmask values.
*/
static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
{
+ WARN_ON_ONCE(rnp->qsmask);
}

/*
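
The thinko itself is a loop-variable aliasing bug: the pre-fix loop stepped through the tree with rnp_cur, but the ->gpnum assignment went through rnp, which still pointed at the root, so each pass re-initialized the root and left every other node's ->gpnum stale. A distilled sketch of the pattern (toy types, not kernel code):

#include <stdio.h>

struct node { int gpnum; };

int main(void)
{
        struct node tree[3] = { {0}, {0}, {0} };
        struct node *rnp = &tree[0];            /* the "root" */
        int new_gp = 42;

        /* Buggy shape, as in pre-fix rcu_start_gp(): iterate with one
         * pointer but assign through another still aimed at the root. */
        for (struct node *cur = &tree[0]; cur < &tree[3]; cur++)
                rnp->gpnum = new_gp;            /* should be cur->gpnum */
        printf("buggy: %d %d %d\n",
               tree[0].gpnum, tree[1].gpnum, tree[2].gpnum);

        /* Fixed shape, matching the committed patch. */
        for (struct node *cur = &tree[0]; cur < &tree[3]; cur++)
                cur->gpnum = new_gp;
        printf("fixed: %d %d %d\n",
               tree[0].gpnum, tree[1].gpnum, tree[2].gpnum);
        return 0;
}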

2009-09-18 16:50:25

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 2/3] Apply results of code inspection of kernel/rcutree_plugin.h

From: Paul E. McKenney <[email protected]>

o Drop the calls to cpu_quiet() from the online/offline code.
These are unnecessary, since force_quiescent_state() will
clean up, and removing them simplifies the code a bit.

o Add a warning to check that we don't enqueue the same blocked
task twice onto the ->blocked_tasks[] lists.

o Rework the phase computation in rcu_preempt_note_context_switch()
to be more readable, as suggested by Josh Triplett.

o Disable irqs to close a race between the scheduling clock
interrupt and rcu_preempt_note_context_switch() WRT the
->rcu_read_unlock_special field.

o Add comments to rnp->lock acquisition and release within
rcu_read_unlock_special() noting that irqs are already
disabled.

Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcutree.c | 27 +++++----------------------
kernel/rcutree_plugin.h | 10 ++++++----
2 files changed, 11 insertions(+), 26 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 211442c..30e0e91 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -762,10 +762,10 @@ cpu_quiet_msk(unsigned long mask, struct rcu_state *rsp, struct rcu_node *rnp,

/*
* Record a quiescent state for the specified CPU, which must either be
- * the current CPU or an offline CPU. The lastcomp argument is used to
- * make sure we are still in the grace period of interest. We don't want
- * to end the current grace period based on quiescent states detected in
- * an earlier grace period!
+ * the current CPU. The lastcomp argument is used to make sure we are
+ * still in the grace period of interest. We don't want to end the current
+ * grace period based on quiescent states detected in an earlier grace
+ * period!
*/
static void
cpu_quiet(int cpu, struct rcu_state *rsp, struct rcu_data *rdp, long lastcomp)
@@ -800,7 +800,6 @@ cpu_quiet(int cpu, struct rcu_state *rsp, struct rcu_data *rdp, long lastcomp)
* This GP can't end until cpu checks in, so all of our
* callbacks can be processed during the next GP.
*/
- rdp = rsp->rda[smp_processor_id()];
rdp->nxttail[RCU_NEXT_READY_TAIL] = rdp->nxttail[RCU_NEXT_TAIL];

cpu_quiet_msk(mask, rsp, rnp, flags); /* releases rnp->lock */
@@ -876,9 +875,6 @@ static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp)

spin_unlock(&rsp->onofflock); /* irqs remain disabled. */

- /* Being offline is a quiescent state, so go record it. */
- cpu_quiet(cpu, rsp, rdp, lastcomp);
-
/*
* Move callbacks from the outgoing CPU to the running CPU.
* Note that the outgoing CPU is now quiscent, so it is now
@@ -1443,20 +1439,7 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp, int preemptable)
rnp = rnp->parent;
} while (rnp != NULL && !(rnp->qsmaskinit & mask));

- spin_unlock(&rsp->onofflock); /* irqs remain disabled. */
-
- /*
- * A new grace period might start here. If so, we will be part of
- * it, and its gpnum will be greater than ours, so we will
- * participate. It is also possible for the gpnum to have been
- * incremented before this function was called, and the bitmasks
- * to not be filled out until now, in which case we will also
- * participate due to our gpnum being behind.
- */
-
- /* Since it is coming online, the CPU is in a quiescent state. */
- cpu_quiet(cpu, rsp, rdp, lastcomp);
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&rsp->onofflock, flags);
}

static void __cpuinit rcu_online_cpu(int cpu)
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 2b996c3..a2d586c 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -117,9 +117,9 @@ static void rcu_preempt_note_context_switch(int cpu)
* on line!
*/
WARN_ON_ONCE((rdp->grpmask & rnp->qsmaskinit) == 0);
- phase = !(rnp->qsmask & rdp->grpmask) ^ (rnp->gpnum & 0x1);
+ WARN_ON_ONCE(!list_empty(&t->rcu_node_entry));
+ phase = (rnp->gpnum + !(rnp->qsmask & rdp->grpmask)) & 0x1;
list_add(&t->rcu_node_entry, &rnp->blocked_tasks[phase]);
- smp_mb(); /* Ensure later ctxt swtch seen after above. */
spin_unlock_irqrestore(&rnp->lock, flags);
}

@@ -133,7 +133,9 @@ static void rcu_preempt_note_context_switch(int cpu)
* means that we continue to block the current grace period.
*/
rcu_preempt_qs(cpu);
+ local_irq_save(flags);
t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
+ local_irq_restore(flags);
}

/*
@@ -189,10 +191,10 @@ static void rcu_read_unlock_special(struct task_struct *t)
*/
for (;;) {
rnp = t->rcu_blocked_node;
- spin_lock(&rnp->lock);
+ spin_lock(&rnp->lock); /* irqs already disabled. */
if (rnp == t->rcu_blocked_node)
break;
- spin_unlock(&rnp->lock);
+ spin_unlock(&rnp->lock); /* irqs remain disabled. */
}
empty = list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1]);
list_del_init(&t->rcu_node_entry);
--
1.5.2.5
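
The local_irq_save()/local_irq_restore() pair added around the ->rcu_read_unlock_special update guards a plain &=, which compiles to a load/modify/store sequence; if the scheduling-clock interrupt touches the same word between the load and the store, its update is silently overwritten. A user-space simulation of that general shape, with the "interrupt" invoked at exactly the vulnerable point (hypothetical toy variables, not kernel source):

#include <stdio.h>

#define RCU_READ_UNLOCK_NEED_QS 0x1

/* Stands in for t->rcu_read_unlock_special. */
static volatile unsigned int special;

/* What the scheduling-clock interrupt conceptually does. */
static void clock_interrupt(void)
{
        special |= RCU_READ_UNLOCK_NEED_QS;
}

int main(void)
{
        unsigned int tmp = special;             /* load:   reads 0      */
        clock_interrupt();                      /* irq fires here       */
        tmp &= ~RCU_READ_UNLOCK_NEED_QS;        /* modify: still 0      */
        special = tmp;                          /* store:  bit erased   */

        printf("special = %#x -- the interrupt's update was lost\n", special);
        return 0;
}

With irqs disabled across the read-modify-write, the interrupt cannot land between the load and the store, so no update can be lost.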

2009-09-18 16:50:24

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 1/3] Add WARN_ON_ONCE() consistency checks covering state transitions

o Verify that qsmask bits stay clear through GP initialization.

o Verify that cpu_quiet_msk_finish() is never invoked unless there
actually is an RCU grace period in progress.

o Verify that all internal-node rcu_node structures have empty
blocked_tasks[] lists.

o Verify that child rcu_node structure's bits remain clear after
acquiring parent's lock.

Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcutree.c | 13 +++++++++----
kernel/rcutree_plugin.h | 20 ++++++++++++++------
2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 2454999..211442c 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -623,8 +623,8 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)

/* Special-case the common single-level case. */
if (NUM_RCU_NODES == 1) {
- rnp->qsmask = rnp->qsmaskinit;
rcu_preempt_check_blocked_tasks(rnp);
+ rnp->qsmask = rnp->qsmaskinit;
rnp->gpnum = rsp->gpnum;
rsp->signaled = RCU_SIGNAL_INIT; /* force_quiescent_state OK. */
spin_unlock_irqrestore(&rnp->lock, flags);
@@ -657,8 +657,8 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
rnp_end = &rsp->node[NUM_RCU_NODES];
for (rnp_cur = &rsp->node[0]; rnp_cur < rnp_end; rnp_cur++) {
spin_lock(&rnp_cur->lock); /* irqs already disabled. */
- rnp_cur->qsmask = rnp_cur->qsmaskinit;
rcu_preempt_check_blocked_tasks(rnp);
+ rnp_cur->qsmask = rnp_cur->qsmaskinit;
rnp->gpnum = rsp->gpnum;
spin_unlock(&rnp_cur->lock); /* irqs already disabled. */
}
@@ -703,6 +703,7 @@ rcu_process_gp_end(struct rcu_state *rsp, struct rcu_data *rdp)
static void cpu_quiet_msk_finish(struct rcu_state *rsp, unsigned long flags)
__releases(rnp->lock)
{
+ WARN_ON_ONCE(rsp->completed == rsp->gpnum);
rsp->completed = rsp->gpnum;
rcu_process_gp_end(rsp, rsp->rda[smp_processor_id()]);
rcu_start_gp(rsp, flags); /* releases root node's rnp->lock. */
@@ -720,6 +721,8 @@ cpu_quiet_msk(unsigned long mask, struct rcu_state *rsp, struct rcu_node *rnp,
unsigned long flags)
__releases(rnp->lock)
{
+ struct rcu_node *rnp_c;
+
/* Walk up the rcu_node hierarchy. */
for (;;) {
if (!(rnp->qsmask & mask)) {
@@ -743,8 +746,10 @@ cpu_quiet_msk(unsigned long mask, struct rcu_state *rsp, struct rcu_node *rnp,
break;
}
spin_unlock_irqrestore(&rnp->lock, flags);
+ rnp_c = rnp;
rnp = rnp->parent;
spin_lock_irqsave(&rnp->lock, flags);
+ WARN_ON_ONCE(rnp_c->qsmask);
}

/*
@@ -853,7 +858,7 @@ static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp)
spin_lock_irqsave(&rsp->onofflock, flags);

/* Remove the outgoing CPU from the masks in the rcu_node hierarchy. */
- rnp = rdp->mynode;
+ rnp = rdp->mynode; /* this is the outgoing CPU's rnp. */
mask = rdp->grpmask; /* rnp->grplo is constant. */
do {
spin_lock(&rnp->lock); /* irqs already disabled. */
@@ -862,7 +867,7 @@ static void __rcu_offline_cpu(int cpu, struct rcu_state *rsp)
spin_unlock(&rnp->lock); /* irqs remain disabled. */
break;
}
- rcu_preempt_offline_tasks(rsp, rnp);
+ rcu_preempt_offline_tasks(rsp, rnp, rdp);
mask = rnp->grpmask;
spin_unlock(&rnp->lock); /* irqs remain disabled. */
rnp = rnp->parent;
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index eb4bae3..2b996c3 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -206,7 +206,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
*/
if (!empty && rnp->qsmask == 0 &&
list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1])) {
- t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
+ struct rcu_node *rnp_p;
+
if (rnp->parent == NULL) {
/* Only one rcu_node in the tree. */
cpu_quiet_msk_finish(&rcu_preempt_state, flags);
@@ -215,9 +216,10 @@ static void rcu_read_unlock_special(struct task_struct *t)
/* Report up the rest of the hierarchy. */
mask = rnp->grpmask;
spin_unlock_irqrestore(&rnp->lock, flags);
- rnp = rnp->parent;
- spin_lock_irqsave(&rnp->lock, flags);
- cpu_quiet_msk(mask, &rcu_preempt_state, rnp, flags);
+ rnp_p = rnp->parent;
+ spin_lock_irqsave(&rnp_p->lock, flags);
+ WARN_ON_ONCE(rnp->qsmask);
+ cpu_quiet_msk(mask, &rcu_preempt_state, rnp_p, flags);
return;
}
spin_unlock(&rnp->lock);
@@ -278,6 +280,7 @@ static void rcu_print_task_stall(struct rcu_node *rnp)
static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
{
WARN_ON_ONCE(!list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x1]));
+ WARN_ON_ONCE(rnp->qsmask);
}

/*
@@ -302,7 +305,8 @@ static int rcu_preempted_readers(struct rcu_node *rnp)
* The caller must hold rnp->lock with irqs disabled.
*/
static void rcu_preempt_offline_tasks(struct rcu_state *rsp,
- struct rcu_node *rnp)
+ struct rcu_node *rnp,
+ struct rcu_data *rdp)
{
int i;
struct list_head *lp;
@@ -314,6 +318,9 @@ static void rcu_preempt_offline_tasks(struct rcu_state *rsp,
WARN_ONCE(1, "Last CPU thought to be offlined?");
return; /* Shouldn't happen: at least one CPU online. */
}
+ WARN_ON_ONCE(rnp != rdp->mynode &&
+ (!list_empty(&rnp->blocked_tasks[0]) ||
+ !list_empty(&rnp->blocked_tasks[1])));

/*
* Move tasks up to root rcu_node. Rely on the fact that the
@@ -489,7 +496,8 @@ static int rcu_preempted_readers(struct rcu_node *rnp)
* tasks that were blocked within RCU read-side critical sections.
*/
static void rcu_preempt_offline_tasks(struct rcu_state *rsp,
- struct rcu_node *rnp)
+ struct rcu_node *rnp,
+ struct rcu_data *rdp)
{
}

--
1.5.2.5
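
The rnp_c check added to cpu_quiet_msk() is worth spelling out: the walk toward the root is hand over hand, dropping the child's lock before acquiring the parent's, and the invariant is that once a node's ->qsmask has gone to zero under its own lock it must still read zero under the parent's lock, because only grace-period initialization may set those bits again. A toy sketch of the walk, with assert() standing in for WARN_ON_ONCE() and the locking elided (not kernel source):

#include <assert.h>
#include <stddef.h>

struct toy_node {
        unsigned long qsmask;   /* children still owing a quiescent state */
        unsigned long grpmask;  /* this node's bit in its parent's qsmask */
        struct toy_node *parent;
};

/* Shaped like cpu_quiet_msk(): clear our bit, and if that empties the
 * node, propagate one level up.  After stepping to the parent, the
 * child just left must still be quiescent. */
static void report_qs_up(struct toy_node *rnp, unsigned long mask)
{
        struct toy_node *rnp_c;

        for (;;) {
                rnp->qsmask &= ~mask;   /* under rnp->lock in the kernel */
                if (rnp->qsmask != 0 || rnp->parent == NULL)
                        return;         /* siblings pending, or at root */
                mask = rnp->grpmask;
                rnp_c = rnp;
                rnp = rnp->parent;      /* kernel swaps child lock for parent's */
                assert(rnp_c->qsmask == 0);     /* the new consistency check */
        }
}

int main(void)
{
        struct toy_node root = { .qsmask = 0x1, .grpmask = 0, .parent = NULL };
        struct toy_node leaf = { .qsmask = 0x3, .grpmask = 0x1, .parent = &root };

        report_qs_up(&leaf, 0x1);       /* first CPU reports; leaf keeps 0x2 */
        report_qs_up(&leaf, 0x2);       /* last CPU reports; propagates up   */
        assert(root.qsmask == 0);
        return 0;
}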

2009-09-18 16:50:40

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 3/3] Fix thinko in commit #de078d875, actually initialize full tree

From: Paul E. McKenney <[email protected]>

Commit #de078d875 repeatedly and incorrectly initializes the root rcu_node
structure's ->gpnum field rather than initializing the ->gpnum field of
each node in the tree. Fix this. Also add an additional consistency
check to catch this in the future.

Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcutree.c | 11 ++++-------
kernel/rcutree_plugin.h | 4 +++-
2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 30e0e91..dd6e743 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -596,8 +596,6 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
{
struct rcu_data *rdp = rsp->rda[smp_processor_id()];
struct rcu_node *rnp = rcu_get_root(rsp);
- struct rcu_node *rnp_cur;
- struct rcu_node *rnp_end;

if (!cpu_needs_another_gp(rsp, rdp)) {
spin_unlock_irqrestore(&rnp->lock, flags);
@@ -654,13 +652,12 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
* one corresponding to this CPU, due to the fact that we have
* irqs disabled.
*/
- rnp_end = &rsp->node[NUM_RCU_NODES];
- for (rnp_cur = &rsp->node[0]; rnp_cur < rnp_end; rnp_cur++) {
- spin_lock(&rnp_cur->lock); /* irqs already disabled. */
+ for (rnp = &rsp->node[0]; rnp < &rsp->node[NUM_RCU_NODES]; rnp++) {
+ spin_lock(&rnp->lock); /* irqs already disabled. */
rcu_preempt_check_blocked_tasks(rnp);
- rnp_cur->qsmask = rnp_cur->qsmaskinit;
+ rnp->qsmask = rnp->qsmaskinit;
rnp->gpnum = rsp->gpnum;
- spin_unlock(&rnp_cur->lock); /* irqs already disabled. */
+ spin_unlock(&rnp->lock); /* irqs already disabled. */
}

rsp->signaled = RCU_SIGNAL_INIT; /* force_quiescent_state now OK. */
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index a2d586c..55c6497 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -476,10 +476,12 @@ static void rcu_print_task_stall(struct rcu_node *rnp)

/*
* Because there is no preemptable RCU, there can be no readers blocked,
- * so there is no need to check for blocked tasks.
+ * so there is no need to check for blocked tasks. So check only for
+ * bogus qsmask values.
*/
static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
{
+ WARN_ON_ONCE(rnp->qsmask);
}

/*
--
1.5.2.5