2010-08-09 22:15:08

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 0/N] Additional RCU commits queued for 2.6.37

Hello!

This patchset shows additional patches queued for 2.6.37, over and above
those posted at http://lkml.org/lkml/2010/7/14/334. These are all minor
fixes, with the exception of patch #8, which adds TINY_PREEMPT_RCU.
The patches are as follows:

1. Remove the rcu_head initialization macros (from Mathieu Desnoyers).
This patch can move forward now that all uses of these macros
have been removed from mainline.
2. Update documentation to note the demise of the rcu_head
initialization macros.
3. Fix kernel-locking.tmpl docbook documentation, which was still
using the now-ancient three-argument version of call_rcu().
4. Allow RCU's CPU stall-warning messages to be controlled via sysfs.
5. Now that TINY_RCU has been in-tree for a few releases, adjust
the configuration so that TINY_RCU is mandatory for kernels
built with !SMP and !PREEMPT. Once TINY_PREEMPT_RCU has gained
a similar level of experience, !SMP code will be eliminated
from TREE_RCU.
6. Allow kernels to be built such that RCU CPU stall warnings are
suppressed at boot time. Patch #4 above allows them to be
manually re-enabled once the system has booted.
7. Update the RCU_FANOUT help message to take commit cf244dc01bf68 into
account. That commit added a fourth level to the TREE_RCU hierarchy.
8. Add TINY_PREEMPT_RCU, allowing reduced memory footprint for
UP builds of preemptible RCU. This is a cleaned-up version of
the patch posted at http://lkml.org/lkml/2010/7/21/364.
9. The "It is illegal to block while in an RCU read-side critical
section" docbook comment was obsoleted long ago by preemptible
RCU, so this patch brings it up to the present day.
10. Add comments above the RCU CPU stall-warning printk()s pointing
people at the Documentation/RCU/stallwarn.txt documentation.

For a testing-only version of this patchset from git, please see:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git rcu/testing

Thanx, Paul

Documentation/DocBook/kernel-locking.tmpl | 6
b/Documentation/DocBook/kernel-locking.tmpl | 8
b/include/linux/hardirq.h | 2
b/include/linux/init_task.h | 10
b/include/linux/rcupdate.h | 6
b/include/linux/rcutiny.h | 126 ++++--
b/include/linux/rcutree.h | 2
b/include/linux/sched.h | 10
b/init/Kconfig | 1
b/kernel/Makefile | 1
b/kernel/rcutiny.c | 33 -
b/kernel/rcutiny_plugin.h | 580 +++++++++++++++++++++++++++-
b/kernel/rcutree.c | 2
b/kernel/rcutree.h | 6
b/lib/Kconfig.debug | 13
include/linux/rcupdate.h | 18
init/Kconfig | 25 +
kernel/rcutree.c | 14
18 files changed, 771 insertions(+), 92 deletions(-)


2010-08-09 22:15:31

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 06/10] rcu: Allow RCU CPU stall warnings to be off at boot, but manually enablable

Currently, if RCU CPU stall warnings are enabled, they are enabled
immediately upon boot. They can be manually disabled via /sys (and
also re-enabled via /sys), and are automatically disabled upon panic.
However, some users need RCU CPU stall detection to be disabled at boot
time, yet still be able to enable it later without rebuilding or
rebooting. For example, someone running a real-time application in
production might not want the additional latency of RCU CPU stall
detection during normal operation, but might need to enable it at any
point for fault-isolation purposes.

This commit therefore provides a new CONFIG_RCU_CPU_STALL_DETECTOR_RUNNABLE
kernel configuration parameter that preserves the current behavior
(checking enabled at boot) by default, while also allowing a kernel to
be built with RCU CPU stall detection included but disabled at boot time.
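
For example, assuming the usual sysfs layout for built-in module
parameters (the exact path below is an assumption, not something taken
from this patch), re-enabling stall checking at run time might look
like this minimal userspace sketch:

/*
 * Userspace sketch: clear rcu_cpu_stall_suppress via sysfs.  The path
 * is an assumption based on the usual layout for parameters of
 * built-in code (rcutree.o); adjust if your kernel differs.
 */
#include <stdio.h>

int main(void)
{
	const char *path =
		"/sys/module/rcutree/parameters/rcu_cpu_stall_suppress";
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return 1;
	}
	fputs("0\n", f);	/* 0 == do not suppress, i.e. warnings on */
	fclose(f);
	return 0;
}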

Requested-by: Clark Williams <[email protected]>
Requested-by: John Kacur <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcutree.c | 2 +-
kernel/rcutree.h | 6 ++++++
lib/Kconfig.debug | 13 +++++++++++++
3 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 5d910be..5aab7da 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -144,7 +144,7 @@ module_param(qhimark, int, 0);
module_param(qlowmark, int, 0);

#ifdef CONFIG_RCU_CPU_STALL_DETECTOR
-int rcu_cpu_stall_suppress __read_mostly;
+int rcu_cpu_stall_suppress __read_mostly = RCU_CPU_STALL_SUPPRESS_INIT;
module_param(rcu_cpu_stall_suppress, int, 0644);
#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */

diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 183ebf4..bb4d086 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -264,6 +264,12 @@ struct rcu_data {
/* scheduling clock irq */
/* before ratting on them. */

+#ifdef CONFIG_RCU_CPU_STALL_DETECTOR_RUNNABLE
+#define RCU_CPU_STALL_SUPPRESS_INIT 0
+#else
+#define RCU_CPU_STALL_SUPPRESS_INIT 1
+#endif
+
#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */

#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index af8b9c5..adfc9f1 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -832,6 +832,19 @@ config RCU_CPU_STALL_TIMEOUT
RCU grace period persists, additional CPU stall warnings are
printed at more widely spaced intervals.

+config RCU_CPU_STALL_DETECTOR_RUNNABLE
+ bool "RCU CPU stall checking starts automatically at boot"
+ depends on RCU_CPU_STALL_DETECTOR
+ default y
+ help
+ If set, start checking for RCU CPU stalls immediately on
+ boot. Otherwise, RCU CPU stall checking must be manually
+ enabled.
+
+ Say Y if you are unsure.
+
+ Say N if you wish to suppress RCU CPU stall checking during boot.
+
config RCU_CPU_STALL_VERBOSE
bool "Print additional per-task information for RCU_CPU_STALL_DETECTOR"
depends on RCU_CPU_STALL_DETECTOR && TREE_PREEMPT_RCU
--
1.7.0.6

2010-08-09 22:15:36

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 05/10] rcu: restrict TREE_RCU to SMP builds with !PREEMPT

Because both TINY_RCU and TREE_PREEMPT_RCU have been in mainline for
several releases, it is time to restrict the use of TREE_RCU to SMP
non-preemptible systems. This reduces testing/validation effort. This
commit is a first step towards driving the selection of RCU implementation
directly off of the SMP and PREEMPT configuration parameters.
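
The end state being aimed at can be sketched as follows (illustrative
only; this mapping is implied by the cover letter and the
TINY_PREEMPT_RCU patch, not code from this commit):

/*
 * Illustrative sketch only: how the RCU implementation could eventually
 * be derived from SMP and PREEMPT rather than chosen by hand.
 */
#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT)
/*	-> TREE_PREEMPT_RCU	*/
#elif defined(CONFIG_SMP)
/*	-> TREE_RCU		*/
#elif defined(CONFIG_PREEMPT)
/*	-> TINY_PREEMPT_RCU	*/
#else
/*	-> TINY_RCU		*/
#endif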

Signed-off-by: Paul E. McKenney <[email protected]>
---
init/Kconfig | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 5cff9a9..9366954 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -336,6 +336,7 @@ choice

config TREE_RCU
bool "Tree-based hierarchical RCU"
+ depends on !PREEMPT && SMP
help
This option selects the RCU implementation that is
designed for very large SMP system with hundreds or
--
1.7.0.6

2010-08-09 22:15:29

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 03/10] Update call_rcu() usage, add synchronize_rcu()

Reported-by: Kyle Hubert <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Josh Triplett <[email protected]>
---
Documentation/DocBook/kernel-locking.tmpl | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl
index e6cc574..ed64d22 100644
--- a/Documentation/DocBook/kernel-locking.tmpl
+++ b/Documentation/DocBook/kernel-locking.tmpl
@@ -1645,7 +1645,9 @@ the amount of locking which needs to be done.
all the readers who were traversing the list when we deleted the
element are finished. We use <function>call_rcu()</function> to
register a callback which will actually destroy the object once
- the readers are finished.
+ all pre-existing readers are finished. Alternatively,
+ <function>synchronize_rcu()</function> may be used to block until
+ all pre-existing readers are finished.
</para>
<para>
But how does Read Copy Update know when the readers are
@@ -1714,7 +1716,7 @@ the amount of locking which needs to be done.
- object_put(obj);
+ list_del_rcu(&amp;obj-&gt;list);
cache_num--;
-+ call_rcu(&amp;obj-&gt;rcu, cache_delete_rcu, obj);
++ call_rcu(&amp;obj-&gt;rcu, cache_delete_rcu);
}

/* Must be holding cache_lock */
--
1.7.0.6

2010-08-09 22:15:32

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 04/10] rcu: allow RCU CPU stall warning messages to be controlled in /sys

Set the permissions of the rcu_cpu_stall_suppress module parameter to
0644 so that RCU CPU stall warnings can be enabled and disabled at
runtime via sysfs.

Suggested-by: Josh Triplett <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcutree.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index f3d5906..5d910be 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -145,7 +145,7 @@ module_param(qlowmark, int, 0);

#ifdef CONFIG_RCU_CPU_STALL_DETECTOR
int rcu_cpu_stall_suppress __read_mostly;
-module_param(rcu_cpu_stall_suppress, int, 0);
+module_param(rcu_cpu_stall_suppress, int, 0644);
#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */

static void force_quiescent_state(struct rcu_state *rsp, int relaxed);
--
1.7.0.6

2010-08-09 22:15:35

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 10/10] rcu: refer RCU CPU stall-warning victims to stallwarn.txt

There is some documentation on RCU CPU stall warnings in
Documentation/RCU/stallwarn.txt, but its existence will not be apparent
to someone who runs into such a warning while under time pressure. This
commit therefore adds comments preceding the printk()s that point out
the location of this documentation.

Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcutree.c | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 5aab7da..ff21411 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -487,8 +487,11 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
rcu_print_task_stall(rnp);
raw_spin_unlock_irqrestore(&rnp->lock, flags);

- /* OK, time to rat on our buddy... */
-
+ /*
+ * OK, time to rat on our buddy...
+ * See Documentation/RCU/stallwarn.txt for info on how to debug
+ * RCU CPU stall warnings.
+ */
printk(KERN_ERR "INFO: %s detected stalls on CPUs/tasks: {",
rsp->name);
rcu_for_each_leaf_node(rsp, rnp) {
@@ -517,6 +520,11 @@ static void print_cpu_stall(struct rcu_state *rsp)
unsigned long flags;
struct rcu_node *rnp = rcu_get_root(rsp);

+ /*
+ * OK, time to rat on ourselves...
+ * See Documentation/RCU/stallwarn.txt for info on how to debug
+ * RCU CPU stall warnings.
+ */
printk(KERN_ERR "INFO: %s detected stall on CPU %d (t=%lu jiffies)\n",
rsp->name, smp_processor_id(), jiffies - rsp->gp_start);
trigger_all_cpu_backtrace();
--
1.7.0.6

2010-08-09 22:15:27

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 01/10] rcu head remove init

From: Mathieu Desnoyers <[email protected]>

RCU heads really don't need to be initialized. Their state before call_rcu()
really does not matter.

We need to keep init/destroy_rcu_head_on_stack() though, since we want
debugobjects to be able to keep track of these objects.
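
For callers, the practical effect is that dynamically allocated rcu_head
fields need no setup at all before call_rcu(), while on-stack rcu_head
structures still go through init_rcu_head_on_stack() and
destroy_rcu_head_on_stack() so that debugobjects can track them. A
minimal sketch (the structure and callback names are invented for
illustration):

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
	int data;
	struct rcu_head rcu;	/* no initialization required */
};

static void foo_reclaim(struct rcu_head *rh)
{
	kfree(container_of(rh, struct foo, rcu));
}

static void foo_delete(struct foo *fp)
{
	/* Previously INIT_RCU_HEAD(&fp->rcu) might have appeared here. */
	call_rcu(&fp->rcu, foo_reclaim);
}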

Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Mathieu Desnoyers <[email protected]>
CC: David S. Miller <[email protected]>
CC: "Paul E. McKenney" <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: Alexey Dobriyan <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Josh Triplett <[email protected]>
---
include/linux/rcupdate.h | 6 ------
1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 3e1b662..27b44b3 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -75,12 +75,6 @@ extern void rcu_init(void);
#error "Unknown RCU implementation specified to kernel configuration"
#endif

-#define RCU_HEAD_INIT { .next = NULL, .func = NULL }
-#define RCU_HEAD(head) struct rcu_head head = RCU_HEAD_INIT
-#define INIT_RCU_HEAD(ptr) do { \
- (ptr)->next = NULL; (ptr)->func = NULL; \
-} while (0)
-
/*
* init_rcu_head_on_stack()/destroy_rcu_head_on_stack() are needed for dynamic
* initialization and destruction of rcu_head on the stack. rcu_head structures
--
1.7.0.6

2010-08-09 22:15:30

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 07/10] rcu: Fix RCU_FANOUT help message

Commit cf244dc01bf68 added a fourth level to the TREE_RCU hierarchy,
but the RCU_FANOUT help message still said "cube root". This commit
updates the message to say "fourth root" and also emphasizes that
production systems are well-served by the default. (Stress-testing
RCU itself uses small RCU_FANOUT values in order to test large-system
code paths on small(er) systems.)
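
To put numbers on "fourth root" (illustrative only; 32 and 64 are the
usual RCU_FANOUT defaults for 32-bit and 64-bit builds):

#include <stdio.h>

/* Illustrative arithmetic: CPUs coverable by a four-level tree. */
static unsigned long max_cpus(unsigned long fanout)
{
	return fanout * fanout * fanout * fanout;
}

int main(void)
{
	printf("RCU_FANOUT=32 -> up to %lu CPUs\n", max_cpus(32)); /*  1048576 */
	printf("RCU_FANOUT=64 -> up to %lu CPUs\n", max_cpus(64)); /* 16777216 */
	return 0;
}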

Located-by: John Kacur <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
init/Kconfig | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 9366954..7faa5d8 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -384,9 +384,12 @@ config RCU_FANOUT
help
This option controls the fanout of hierarchical implementations
of RCU, allowing RCU to work efficiently on machines with
- large numbers of CPUs. This value must be at least the cube
- root of NR_CPUS, which allows NR_CPUS up to 32,768 for 32-bit
- systems and up to 262,144 for 64-bit systems.
+ large numbers of CPUs. This value must be at least the fourth
+ root of NR_CPUS, which allows NR_CPUS to be insanely large.
+ The default value of RCU_FANOUT should be used for production
+ systems, but if you are stress-testing the RCU implementation
+ itself, small RCU_FANOUT values allow you to test large-system
+ code paths on small(er) systems.

Select a specific number if testing RCU itself.
Take the default if unsure.
--
1.7.0.6

2010-08-09 22:15:34

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 09/10] rcu: update obsolete rcu_read_lock() comment.

The comment says that blocking is illegal in rcu_read_lock()-style
RCU read-side critical sections, which is no longer entirely true
given preemptible RCU. This commit provides a fix.
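
Concretely, the rule of thumb in the updated comment amounts to
something like the following sketch (the data structure and lock names
are invented; nothing here is taken from the patch itself):

#include <linux/rcupdate.h>
#include <linux/mutex.h>

struct foo { int a; };
static struct foo *gp;
static DEFINE_MUTEX(my_mutex);

static int peek(void)
{
	struct foo *p;
	int a = -1;

	rcu_read_lock();
	p = rcu_dereference(gp);
	if (p)
		a = p->a;	/* OK: may be preempted under *_PREEMPT_RCU */
	/*
	 * mutex_lock(&my_mutex) here would be explicit blocking: illegal
	 * under all of the implementations discussed above.  On -rt,
	 * acquiring a priority-inheritance spinlock would be the one
	 * permitted form of blocking.
	 */
	rcu_read_unlock();
	return a;
}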

Suggested-by: David Miller <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
include/linux/rcupdate.h | 15 ++++++++++++++-
1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 24b8966..d7af96e 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -458,7 +458,20 @@ extern int rcu_my_thread_group_empty(void);
* will be deferred until the outermost RCU read-side critical section
* completes.
*
- * It is illegal to block while in an RCU read-side critical section.
+ * You can avoid reading and understanding the next paragraph by
+ * following this rule: don't put anything in an rcu_read_lock() RCU
+ * read-side critical section that would block in a !PREEMPT kernel.
+ * But if you want the full story, read on!
+ *
+ * In non-preemptible RCU implementations (TREE_RCU and TINY_RCU), it
+ * is illegal to block while in an RCU read-side critical section. In
+ * preemptible RCU implementations (TREE_PREEMPT_RCU and TINY_PREEMPT_RCU)
+ * in CONFIG_PREEMPT kernel builds, RCU read-side critical sections may
+ * be preempted, but explicit blocking is illegal. Finally, in preemptible
+ * RCU implementations in real-time (CONFIG_PREEMPT_RT) kernel builds,
+ * RCU read-side critical sections may be preempted and they may also
+ * block, but only when acquiring spinlocks that are subject to priority
+ * inheritance.
*/
static inline void rcu_read_lock(void)
{
--
1.7.0.6

2010-08-09 22:15:24

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

Implement a small-memory-footprint uniprocessor-only implementation of
preemptible RCU. This implementation uses but a single blocked-tasks
list rather than the combinatorial number used per leaf rcu_node by
TREE_PREEMPT_RCU, which reduces memory consumption and greatly simplifies
processing. This version also takes advantage of uniprocessor execution
to accelerate grace periods in the case where there are no readers.

The general design is otherwise broadly similar to that of TREE_PREEMPT_RCU.

This implementation is a step towards having the choice of RCU
implementation driven directly off of the SMP and PREEMPT kernel
configuration variables, which can happen once this implementation has
accumulated sufficient experience.

Requested-by: Loïc Minier <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
include/linux/hardirq.h | 2 +-
include/linux/init_task.h | 10 +-
include/linux/rcupdate.h | 3 +-
include/linux/rcutiny.h | 126 +++++++---
include/linux/rcutree.h | 2 +
include/linux/sched.h | 10 +-
init/Kconfig | 16 ++-
kernel/Makefile | 1 +
kernel/rcutiny.c | 33 +--
kernel/rcutiny_plugin.h | 580 ++++++++++++++++++++++++++++++++++++++++++++-
10 files changed, 715 insertions(+), 68 deletions(-)

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index d5b3876..1f4517d 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -139,7 +139,7 @@ static inline void account_system_vtime(struct task_struct *tsk)
#endif

#if defined(CONFIG_NO_HZ)
-#if defined(CONFIG_TINY_RCU)
+#if defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)
extern void rcu_enter_nohz(void);
extern void rcu_exit_nohz(void);

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 6460fc6..2fea6c8 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -82,11 +82,17 @@ extern struct group_info init_groups;
# define CAP_INIT_BSET CAP_FULL_SET

#ifdef CONFIG_TREE_PREEMPT_RCU
+#define INIT_TASK_RCU_TREE_PREEMPT() \
+ .rcu_blocked_node = NULL,
+#else
+#define INIT_TASK_RCU_TREE_PREEMPT(tsk)
+#endif
+#ifdef CONFIG_PREEMPT_RCU
#define INIT_TASK_RCU_PREEMPT(tsk) \
.rcu_read_lock_nesting = 0, \
.rcu_read_unlock_special = 0, \
- .rcu_blocked_node = NULL, \
- .rcu_node_entry = LIST_HEAD_INIT(tsk.rcu_node_entry),
+ .rcu_node_entry = LIST_HEAD_INIT(tsk.rcu_node_entry), \
+ INIT_TASK_RCU_TREE_PREEMPT()
#else
#define INIT_TASK_RCU_PREEMPT(tsk)
#endif
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 27b44b3..24b8966 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -58,7 +58,6 @@ struct rcu_head {
};

/* Exported common interfaces */
-extern void rcu_barrier(void);
extern void rcu_barrier_bh(void);
extern void rcu_barrier_sched(void);
extern void synchronize_sched_expedited(void);
@@ -69,7 +68,7 @@ extern void rcu_init(void);

#if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU)
#include <linux/rcutree.h>
-#elif defined(CONFIG_TINY_RCU)
+#elif defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)
#include <linux/rcutiny.h>
#else
#error "Unknown RCU implementation specified to kernel configuration"
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index e2e8931..4cc5eba 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -29,66 +29,51 @@

void rcu_sched_qs(int cpu);
void rcu_bh_qs(int cpu);
-static inline void rcu_note_context_switch(int cpu)
-{
- rcu_sched_qs(cpu);
-}

+#ifdef CONFIG_TINY_RCU
#define __rcu_read_lock() preempt_disable()
#define __rcu_read_unlock() preempt_enable()
+#else /* #ifdef CONFIG_TINY_RCU */
+void __rcu_read_lock(void);
+void __rcu_read_unlock(void);
+#endif /* #else #ifdef CONFIG_TINY_RCU */
#define __rcu_read_lock_bh() local_bh_disable()
#define __rcu_read_unlock_bh() local_bh_enable()
-#define call_rcu_sched call_rcu
+extern void call_rcu_sched(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu));

#define rcu_init_sched() do { } while (0)
-extern void rcu_check_callbacks(int cpu, int user);

-static inline int rcu_needs_cpu(int cpu)
-{
- return 0;
-}
+extern void synchronize_sched(void);

-/*
- * Return the number of grace periods.
- */
-static inline long rcu_batches_completed(void)
-{
- return 0;
-}
+#ifdef CONFIG_TINY_RCU

-/*
- * Return the number of bottom-half grace periods.
- */
-static inline long rcu_batches_completed_bh(void)
-{
- return 0;
-}
+#define call_rcu call_rcu_sched

-static inline void rcu_force_quiescent_state(void)
+static inline void synchronize_rcu(void)
{
+ synchronize_sched();
}

-static inline void rcu_bh_force_quiescent_state(void)
+static inline void synchronize_rcu_expedited(void)
{
+ synchronize_sched(); /* Only one CPU, so pretty fast anyway!!! */
}

-static inline void rcu_sched_force_quiescent_state(void)
+static inline void rcu_barrier(void)
{
+ rcu_barrier_sched(); /* Only one CPU, so only one list of callbacks! */
}

-extern void synchronize_sched(void);
+#else /* #ifdef CONFIG_TINY_RCU */

-static inline void synchronize_rcu(void)
-{
- synchronize_sched();
-}
+void synchronize_rcu(void);
+void rcu_barrier(void);
+void synchronize_rcu_expedited(void);

-static inline void synchronize_rcu_bh(void)
-{
- synchronize_sched();
-}
+#endif /* #else #ifdef CONFIG_TINY_RCU */

-static inline void synchronize_rcu_expedited(void)
+static inline void synchronize_rcu_bh(void)
{
synchronize_sched();
}
@@ -117,15 +102,82 @@ static inline void rcu_exit_nohz(void)

#endif /* #else #ifdef CONFIG_NO_HZ */

+#ifdef CONFIG_TINY_RCU
+
+static inline void rcu_preempt_note_context_switch(void)
+{
+}
+
static inline void exit_rcu(void)
{
}

+static inline int rcu_needs_cpu(int cpu)
+{
+ return 0;
+}
+
static inline int rcu_preempt_depth(void)
{
return 0;
}

+#else /* #ifdef CONFIG_TINY_RCU */
+
+void rcu_preempt_note_context_switch(void);
+extern void exit_rcu(void);
+int rcu_preempt_needs_cpu(void);
+
+static inline int rcu_needs_cpu(int cpu)
+{
+ return rcu_preempt_needs_cpu();
+}
+
+/*
+ * Defined as macro as it is a very low level header
+ * included from areas that don't even know about current
+ * FIXME: combine with include/linux/rcutree.h into rcupdate.h.
+ */
+#define rcu_preempt_depth() (current->rcu_read_lock_nesting)
+
+#endif /* #else #ifdef CONFIG_TINY_RCU */
+
+static inline void rcu_note_context_switch(int cpu)
+{
+ rcu_sched_qs(cpu);
+ rcu_preempt_note_context_switch();
+}
+
+extern void rcu_check_callbacks(int cpu, int user);
+
+/*
+ * Return the number of grace periods.
+ */
+static inline long rcu_batches_completed(void)
+{
+ return 0;
+}
+
+/*
+ * Return the number of bottom-half grace periods.
+ */
+static inline long rcu_batches_completed_bh(void)
+{
+ return 0;
+}
+
+static inline void rcu_force_quiescent_state(void)
+{
+}
+
+static inline void rcu_bh_force_quiescent_state(void)
+{
+}
+
+static inline void rcu_sched_force_quiescent_state(void)
+{
+}
+
#ifdef CONFIG_DEBUG_LOCK_ALLOC

extern int rcu_scheduler_active __read_mostly;
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index c0ed1c0..c13b85d 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -95,6 +95,8 @@ static inline void synchronize_rcu_bh_expedited(void)
synchronize_sched_expedited();
}

+extern void rcu_barrier(void);
+
extern void rcu_check_callbacks(int cpu, int user);

extern long rcu_batches_completed(void);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 26b85fe..e7a2192 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1211,11 +1211,13 @@ struct task_struct {
unsigned int policy;
cpumask_t cpus_allowed;

-#ifdef CONFIG_TREE_PREEMPT_RCU
+#ifdef CONFIG_PREEMPT_RCU
int rcu_read_lock_nesting;
char rcu_read_unlock_special;
- struct rcu_node *rcu_blocked_node;
struct list_head rcu_node_entry;
+#endif /* #ifdef CONFIG_PREEMPT_RCU */
+#ifdef CONFIG_TREE_PREEMPT_RCU
+ struct rcu_node *rcu_blocked_node;
#endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */

#if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
@@ -1748,7 +1750,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
#define used_math() tsk_used_math(current)

-#ifdef CONFIG_TREE_PREEMPT_RCU
+#ifdef CONFIG_PREEMPT_RCU

#define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
#define RCU_READ_UNLOCK_NEED_QS (1 << 1) /* RCU core needs CPU response. */
@@ -1757,7 +1759,9 @@ static inline void rcu_copy_process(struct task_struct *p)
{
p->rcu_read_lock_nesting = 0;
p->rcu_read_unlock_special = 0;
+#ifdef CONFIG_TREE_PREEMPT_RCU
p->rcu_blocked_node = NULL;
+#endif
INIT_LIST_HEAD(&p->rcu_node_entry);
}

diff --git a/init/Kconfig b/init/Kconfig
index 7faa5d8..a2ac4ce 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -344,7 +344,7 @@ config TREE_RCU
smaller systems.

config TREE_PREEMPT_RCU
- bool "Preemptable tree-based hierarchical RCU"
+ bool "Preemptible tree-based hierarchical RCU"
depends on PREEMPT
help
This option selects the RCU implementation that is
@@ -362,8 +362,22 @@ config TINY_RCU
is not required. This option greatly reduces the
memory footprint of RCU.

+config TINY_PREEMPT_RCU
+ bool "Preemptible UP-only small-memory-footprint RCU"
+ depends on !SMP && PREEMPT
+ help
+ This option selects the RCU implementation that is designed
+ for real-time UP systems. This option greatly reduces the
+ memory footprint of RCU.
+
endchoice

+config PREEMPT_RCU
+ def_bool ( TREE_PREEMPT_RCU || TINY_PREEMPT_RCU )
+ help
+ This option enables preemptible-RCU code that is common between
+ the TREE_PREEMPT_RCU and TINY_PREEMPT_RCU implementations.
+
config RCU_TRACE
bool "Enable tracing for RCU"
depends on TREE_RCU || TREE_PREEMPT_RCU
diff --git a/kernel/Makefile b/kernel/Makefile
index 057472f..75b5d5c 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -85,6 +85,7 @@ obj-$(CONFIG_TREE_RCU) += rcutree.o
obj-$(CONFIG_TREE_PREEMPT_RCU) += rcutree.o
obj-$(CONFIG_TREE_RCU_TRACE) += rcutree_trace.o
obj-$(CONFIG_TINY_RCU) += rcutiny.o
+obj-$(CONFIG_TINY_PREEMPT_RCU) += rcutiny.o
obj-$(CONFIG_RELAY) += relay.o
obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index 196ec02..d806735 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -59,6 +59,14 @@ int rcu_scheduler_active __read_mostly;
EXPORT_SYMBOL_GPL(rcu_scheduler_active);
#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */

+/* Forward declarations for rcutiny_plugin.h. */
+static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp);
+static void __call_rcu(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu),
+ struct rcu_ctrlblk *rcp);
+
+#include "rcutiny_plugin.h"
+
#ifdef CONFIG_NO_HZ

static long rcu_dynticks_nesting = 1;
@@ -140,6 +148,7 @@ void rcu_check_callbacks(int cpu, int user)
rcu_sched_qs(cpu);
else if (!in_softirq())
rcu_bh_qs(cpu);
+ rcu_preempt_check_callbacks();
}

/*
@@ -162,6 +171,7 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
*rcp->donetail = NULL;
if (rcp->curtail == rcp->donetail)
rcp->curtail = &rcp->rcucblist;
+ rcu_preempt_remove_callbacks(rcp);
rcp->donetail = &rcp->rcucblist;
local_irq_restore(flags);

@@ -182,6 +192,7 @@ static void rcu_process_callbacks(struct softirq_action *unused)
{
__rcu_process_callbacks(&rcu_sched_ctrlblk);
__rcu_process_callbacks(&rcu_bh_ctrlblk);
+ rcu_preempt_process_callbacks();
}

/*
@@ -223,15 +234,15 @@ static void __call_rcu(struct rcu_head *head,
}

/*
- * Post an RCU callback to be invoked after the end of an RCU grace
+ * Post an RCU callback to be invoked after the end of an RCU-sched grace
* period. But since we have but one CPU, that would be after any
* quiescent state.
*/
-void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+void call_rcu_sched(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
{
__call_rcu(head, func, &rcu_sched_ctrlblk);
}
-EXPORT_SYMBOL_GPL(call_rcu);
+EXPORT_SYMBOL_GPL(call_rcu_sched);

/*
* Post an RCU bottom-half callback to be invoked after any subsequent
@@ -243,20 +254,6 @@ void call_rcu_bh(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
}
EXPORT_SYMBOL_GPL(call_rcu_bh);

-void rcu_barrier(void)
-{
- struct rcu_synchronize rcu;
-
- init_rcu_head_on_stack(&rcu.head);
- init_completion(&rcu.completion);
- /* Will wake me after RCU finished. */
- call_rcu(&rcu.head, wakeme_after_rcu);
- /* Wait for it. */
- wait_for_completion(&rcu.completion);
- destroy_rcu_head_on_stack(&rcu.head);
-}
-EXPORT_SYMBOL_GPL(rcu_barrier);
-
void rcu_barrier_bh(void)
{
struct rcu_synchronize rcu;
@@ -289,5 +286,3 @@ void __init rcu_init(void)
{
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
}
-
-#include "rcutiny_plugin.h"
diff --git a/kernel/rcutiny_plugin.h b/kernel/rcutiny_plugin.h
index d223a92..44f2c92 100644
--- a/kernel/rcutiny_plugin.h
+++ b/kernel/rcutiny_plugin.h
@@ -1,7 +1,7 @@
/*
- * Read-Copy Update mechanism for mutual exclusion (tree-based version)
+ * Read-Copy Update mechanism for mutual exclusion, the Bloatwatch edition
* Internal non-public definitions that provide either classic
- * or preemptable semantics.
+ * or preemptible semantics.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -17,11 +17,585 @@
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*
- * Copyright IBM Corporation, 2009
+ * Copyright (c) 2010 Linaro
*
* Author: Paul E. McKenney <[email protected]>
*/

+#ifdef CONFIG_TINY_PREEMPT_RCU
+
+#include <linux/delay.h>
+
+/* FIXME: merge with definitions in kernel/rcutree.h. */
+#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
+#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))
+
+/* Global control variables for preemptible RCU. */
+struct rcu_preempt_ctrlblk {
+ struct rcu_ctrlblk rcb; /* curtail: ->next ptr of last CB for GP. */
+ struct rcu_head **nexttail;
+ /* Tasks blocked in a preemptible RCU */
+ /* read-side critical section while a */
+ /* preemptible-RCU grace period is in */
+ /* progress must wait for a later grace */
+ /* period. This pointer points to the */
+ /* ->next pointer of the last task that */
+ /* must wait for a later grace period, or */
+ /* to &->rcb.rcucblist if there is no */
+ /* such task. */
+ struct list_head blkd_tasks;
+ /* Tasks blocked in RCU read-side critical */
+ /* section. Tasks are placed at the head */
+ /* of this list and age towards the tail. */
+ struct list_head *gp_tasks;
+ /* Pointer to the first task blocking the */
+ /* current grace period, or NULL if there */
+ /* is no such task. */
+ struct list_head *exp_tasks;
+ /* Pointer to first task blocking the */
+ /* current expedited grace period, or NULL */
+ /* if there is no such task. If there */
+ /* is no current expedited grace period, */
+ /* then there cannot be any such task. */
+ u8 gpnum; /* Current grace period. */
+ u8 gpcpu; /* Last grace period blocked by the CPU. */
+ u8 completed; /* Last grace period completed. */
+ /* If all three are equal, RCU is idle. */
+};
+
+static struct rcu_preempt_ctrlblk rcu_preempt_ctrlblk = {
+ .rcb.donetail = &rcu_preempt_ctrlblk.rcb.rcucblist,
+ .rcb.curtail = &rcu_preempt_ctrlblk.rcb.rcucblist,
+ .nexttail = &rcu_preempt_ctrlblk.rcb.rcucblist,
+ .blkd_tasks = LIST_HEAD_INIT(rcu_preempt_ctrlblk.blkd_tasks),
+};
+
+static int rcu_preempted_readers_exp(void);
+static void rcu_report_exp_done(void);
+
+/*
+ * Return true if the CPU has not yet responded to the current grace period.
+ */
+static int rcu_cpu_cur_gp(void)
+{
+ return rcu_preempt_ctrlblk.gpcpu != rcu_preempt_ctrlblk.gpnum;
+}
+
+/*
+ * Check for a running RCU reader. Because there is only one CPU,
+ * there can be but one running RCU reader at a time. ;-)
+ */
+static int rcu_preempt_running_reader(void)
+{
+ return current->rcu_read_lock_nesting;
+}
+
+/*
+ * Check for preempted RCU readers blocking any grace period.
+ * If the caller needs a reliable answer, it must disable hard irqs.
+ */
+static int rcu_preempt_blocked_readers_any(void)
+{
+ return !list_empty(&rcu_preempt_ctrlblk.blkd_tasks);
+}
+
+/*
+ * Check for preempted RCU readers blocking the current grace period.
+ * If the caller needs a reliable answer, it must disable hard irqs.
+ */
+static int rcu_preempt_blocked_readers_cgp(void)
+{
+ return rcu_preempt_ctrlblk.gp_tasks != NULL;
+}
+
+/*
+ * Return true if another preemptible-RCU grace period is needed.
+ */
+static int rcu_preempt_needs_another_gp(void)
+{
+ return *rcu_preempt_ctrlblk.rcb.curtail != NULL;
+}
+
+/*
+ * Return true if a preemptible-RCU grace period is in progress.
+ * The caller must disable hardirqs.
+ */
+static int rcu_preempt_gp_in_progress(void)
+{
+ return rcu_preempt_ctrlblk.completed != rcu_preempt_ctrlblk.gpnum;
+}
+
+/*
+ * Record a preemptible-RCU quiescent state for the specified CPU. Note
+ * that this just means that the task currently running on the CPU is
+ * in a quiescent state. There might be any number of tasks blocked
+ * while in an RCU read-side critical section.
+ *
+ * Unlike the other rcu_*_qs() functions, callers to this function
+ * must disable irqs in order to protect the assignment to
+ * ->rcu_read_unlock_special.
+ *
+ * Because this is a single-CPU implementation, the only way a grace
+ * period can end is if the CPU is in a quiescent state. The reason is
+ * that a blocked preemptible-RCU reader can exit its critical section
+ * only if the CPU is running it at the time. Therefore, when the
+ * last task blocking the current grace period exits its RCU read-side
+ * critical section, neither the CPU nor blocked tasks will be stopping
+ * the current grace period. (In contrast, SMP implementations
+ * might have CPUs running in RCU read-side critical sections that
+ * block later grace periods -- but this is not possible given only
+ * one CPU.)
+ */
+static void rcu_preempt_cpu_qs(void)
+{
+ /* Record both CPU and task as having responded to current GP. */
+ rcu_preempt_ctrlblk.gpcpu = rcu_preempt_ctrlblk.gpnum;
+ current->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
+
+ /*
+ * If there is no GP, or if blocked readers are still blocking GP,
+ * then there is nothing more to do.
+ */
+ if (!rcu_preempt_gp_in_progress() || rcu_preempt_blocked_readers_cgp())
+ return;
+
+ /* Advance callbacks. */
+ rcu_preempt_ctrlblk.completed = rcu_preempt_ctrlblk.gpnum;
+ rcu_preempt_ctrlblk.rcb.donetail = rcu_preempt_ctrlblk.rcb.curtail;
+ rcu_preempt_ctrlblk.rcb.curtail = rcu_preempt_ctrlblk.nexttail;
+
+ /* If there are no blocked readers, next GP is done instantly. */
+ if (!rcu_preempt_blocked_readers_any())
+ rcu_preempt_ctrlblk.rcb.donetail = rcu_preempt_ctrlblk.nexttail;
+
+ /* If there are done callbacks, make RCU_SOFTIRQ process them. */
+ if (*rcu_preempt_ctrlblk.rcb.donetail != NULL)
+ raise_softirq(RCU_SOFTIRQ);
+}
+
+/*
+ * Start a new RCU grace period if warranted. Hard irqs must be disabled.
+ */
+static void rcu_preempt_start_gp(void)
+{
+ if (!rcu_preempt_gp_in_progress() && rcu_preempt_needs_another_gp()) {
+
+ /* Official start of GP. */
+ rcu_preempt_ctrlblk.gpnum++;
+
+ /* Any blocked RCU readers block new GP. */
+ if (rcu_preempt_blocked_readers_any())
+ rcu_preempt_ctrlblk.gp_tasks =
+ rcu_preempt_ctrlblk.blkd_tasks.next;
+
+ /* If there is no running reader, CPU is done with GP. */
+ if (!rcu_preempt_running_reader())
+ rcu_preempt_cpu_qs();
+ }
+}
+
+/*
+ * We have entered the scheduler, and the current task might soon be
+ * context-switched away from. If this task is in an RCU read-side
+ * critical section, we will no longer be able to rely on the CPU to
+ * record that fact, so we enqueue the task on the blkd_tasks list.
+ * If the task started after the current grace period began, as recorded
+ * by ->gpcpu, we enqueue at the beginning of the list. Otherwise
+ * before the element referenced by ->gp_tasks (or at the tail if
+ * ->gp_tasks is NULL) and point ->gp_tasks at the newly added element.
+ * The task will dequeue itself when it exits the outermost enclosing
+ * RCU read-side critical section. Therefore, the current grace period
+ * cannot be permitted to complete until the ->gp_tasks pointer becomes
+ * NULL.
+ *
+ * Caller must disable preemption.
+ */
+void rcu_preempt_note_context_switch(void)
+{
+ struct task_struct *t = current;
+ unsigned long flags;
+
+ local_irq_save(flags); /* must exclude scheduler_tick(). */
+ if (rcu_preempt_running_reader() &&
+ (t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {
+
+ /* Possibly blocking in an RCU read-side critical section. */
+ t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BLOCKED;
+
+ /*
+ * If this CPU has already checked in, then this task
+ * will hold up the next grace period rather than the
+ * current grace period. Queue the task accordingly.
+ * If the task is queued for the current grace period
+ * (i.e., this CPU has not yet passed through a quiescent
+ * state for the current grace period), then as long
+ * as that task remains queued, the current grace period
+ * cannot end.
+ */
+ list_add(&t->rcu_node_entry, &rcu_preempt_ctrlblk.blkd_tasks);
+ if (rcu_cpu_cur_gp())
+ rcu_preempt_ctrlblk.gp_tasks = &t->rcu_node_entry;
+ }
+
+ /*
+ * Either we were not in an RCU read-side critical section to
+ * begin with, or we have now recorded that critical section
+ * globally. Either way, we can now note a quiescent state
+ * for this CPU. Again, if we were in an RCU read-side critical
+ * section, and if that critical section was blocking the current
+ * grace period, then the fact that the task has been enqueued
+ * means that current grace period continues to be blocked.
+ */
+ rcu_preempt_cpu_qs();
+ local_irq_restore(flags);
+}
+
+/*
+ * Tiny-preemptible RCU implementation for rcu_read_lock().
+ * Just increment ->rcu_read_lock_nesting, shared state will be updated
+ * if we block.
+ */
+void __rcu_read_lock(void)
+{
+ current->rcu_read_lock_nesting++;
+ barrier(); /* needed if we ever invoke rcu_read_lock in rcutiny.c */
+}
+EXPORT_SYMBOL_GPL(__rcu_read_lock);
+
+/*
+ * Handle special cases during rcu_read_unlock(), such as needing to
+ * notify RCU core processing or task having blocked during the RCU
+ * read-side critical section.
+ */
+static void rcu_read_unlock_special(struct task_struct *t)
+{
+ int empty;
+ int empty_exp;
+ unsigned long flags;
+ struct list_head *np;
+ int special;
+
+ /*
+ * NMI handlers cannot block and cannot safely manipulate state.
+ * They therefore cannot possibly be special, so just leave.
+ */
+ if (in_nmi())
+ return;
+
+ local_irq_save(flags);
+
+ /*
+ * If RCU core is waiting for this CPU to exit critical section,
+ * let it know that we have done so.
+ */
+ special = t->rcu_read_unlock_special;
+ if (special & RCU_READ_UNLOCK_NEED_QS)
+ rcu_preempt_cpu_qs();
+
+ /* Hardware IRQ handlers cannot block. */
+ if (in_irq()) {
+ local_irq_restore(flags);
+ return;
+ }
+
+ /* Clean up if blocked during RCU read-side critical section. */
+ if (special & RCU_READ_UNLOCK_BLOCKED) {
+ t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_BLOCKED;
+
+ /*
+ * Remove this task from the ->blkd_tasks list and adjust
+ * any pointers that might have been referencing it.
+ */
+ empty = !rcu_preempt_blocked_readers_cgp();
+ empty_exp = rcu_preempt_ctrlblk.exp_tasks == NULL;
+ np = t->rcu_node_entry.next;
+ if (np == &rcu_preempt_ctrlblk.blkd_tasks)
+ np = NULL;
+ list_del(&t->rcu_node_entry);
+ if (&t->rcu_node_entry == rcu_preempt_ctrlblk.gp_tasks)
+ rcu_preempt_ctrlblk.gp_tasks = np;
+ if (&t->rcu_node_entry == rcu_preempt_ctrlblk.exp_tasks)
+ rcu_preempt_ctrlblk.exp_tasks = np;
+ INIT_LIST_HEAD(&t->rcu_node_entry);
+
+ /*
+ * If this was the last task on the current list, and if
+ * we aren't waiting on the CPU, report the quiescent state
+ * and start a new grace period if needed.
+ */
+ if (!empty && !rcu_preempt_blocked_readers_cgp()) {
+ rcu_preempt_cpu_qs();
+ rcu_preempt_start_gp();
+ }
+
+ /*
+ * If this was the last task on the expedited lists,
+ * then we need to wake up the waiting task.
+ */
+ if (!empty_exp && rcu_preempt_ctrlblk.exp_tasks == NULL)
+ rcu_report_exp_done();
+ }
+ local_irq_restore(flags);
+}
+
+/*
+ * Tiny-preemptible RCU implementation for rcu_read_unlock().
+ * Decrement ->rcu_read_lock_nesting. If the result is zero (outermost
+ * rcu_read_unlock()) and ->rcu_read_unlock_special is non-zero, then
+ * invoke rcu_read_unlock_special() to clean up after a context switch
+ * in an RCU read-side critical section and other special cases.
+ */
+void __rcu_read_unlock(void)
+{
+ struct task_struct *t = current;
+
+ barrier(); /* needed if we ever invoke rcu_read_unlock in rcutiny.c */
+ if (--t->rcu_read_lock_nesting == 0 &&
+ unlikely(t->rcu_read_unlock_special))
+ rcu_read_unlock_special(t);
+#ifdef CONFIG_PROVE_LOCKING
+ WARN_ON_ONCE(t->rcu_read_lock_nesting < 0);
+#endif /* #ifdef CONFIG_PROVE_LOCKING */
+}
+EXPORT_SYMBOL_GPL(__rcu_read_unlock);
+
+/*
+ * Check for a quiescent state from the current CPU. When a task blocks,
+ * the task is recorded in the rcu_preempt_ctrlblk structure, which is
+ * checked elsewhere. This is called from the scheduling-clock interrupt.
+ *
+ * Caller must disable hard irqs.
+ */
+static void rcu_preempt_check_callbacks(void)
+{
+ struct task_struct *t = current;
+
+ if (!rcu_preempt_running_reader() && rcu_preempt_gp_in_progress())
+ rcu_preempt_cpu_qs();
+ if (&rcu_preempt_ctrlblk.rcb.rcucblist !=
+ rcu_preempt_ctrlblk.rcb.donetail)
+ raise_softirq(RCU_SOFTIRQ);
+ if (rcu_preempt_gp_in_progress() && rcu_preempt_running_reader())
+ t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
+}
+
+/*
+ * TINY_PREEMPT_RCU has an extra callback-list tail pointer to
+ * update, so this is invoked from __rcu_process_callbacks() to
+ * handle that case. Of course, it is invoked for all flavors of
+ * RCU, but RCU callbacks can appear only on one of the lists, and
+ * neither ->nexttail nor ->donetail can possibly be NULL, so there
+ * is no need for an explicit check.
+ */
+static void rcu_preempt_remove_callbacks(struct rcu_ctrlblk *rcp)
+{
+ if (rcu_preempt_ctrlblk.nexttail == rcp->donetail)
+ rcu_preempt_ctrlblk.nexttail = &rcp->rcucblist;
+}
+
+/*
+ * Process callbacks for preemptible RCU.
+ */
+static void rcu_preempt_process_callbacks(void)
+{
+ __rcu_process_callbacks(&rcu_preempt_ctrlblk.rcb);
+}
+
+/*
+ * Queue a preemptible-RCU callback for invocation after a grace period.
+ */
+void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+{
+ unsigned long flags;
+
+ debug_rcu_head_queue(head);
+ head->func = func;
+ head->next = NULL;
+
+ local_irq_save(flags);
+ *rcu_preempt_ctrlblk.nexttail = head;
+ rcu_preempt_ctrlblk.nexttail = &head->next;
+ rcu_preempt_start_gp(); /* checks to see if GP needed. */
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(call_rcu);
+
+void rcu_barrier(void)
+{
+ struct rcu_synchronize rcu;
+
+ init_rcu_head_on_stack(&rcu.head);
+ init_completion(&rcu.completion);
+ /* Will wake me after RCU finished. */
+ call_rcu(&rcu.head, wakeme_after_rcu);
+ /* Wait for it. */
+ wait_for_completion(&rcu.completion);
+ destroy_rcu_head_on_stack(&rcu.head);
+}
+EXPORT_SYMBOL_GPL(rcu_barrier);
+
+/*
+ * synchronize_rcu - wait until a grace period has elapsed.
+ *
+ * Control will return to the caller some time after a full grace
+ * period has elapsed, in other words after all currently executing RCU
+ * read-side critical sections have completed. RCU read-side critical
+ * sections are delimited by rcu_read_lock() and rcu_read_unlock(),
+ * and may be nested.
+ */
+void synchronize_rcu(void)
+{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ if (!rcu_scheduler_active)
+ return;
+#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
+
+ WARN_ON_ONCE(rcu_preempt_running_reader());
+ if (!rcu_preempt_blocked_readers_any())
+ return;
+
+ /* Once we get past the fastpath checks, same code as rcu_barrier(). */
+ rcu_barrier();
+}
+EXPORT_SYMBOL_GPL(synchronize_rcu);
+
+static DECLARE_WAIT_QUEUE_HEAD(sync_rcu_preempt_exp_wq);
+static unsigned long sync_rcu_preempt_exp_count;
+static DEFINE_MUTEX(sync_rcu_preempt_exp_mutex);
+
+/*
+ * Return non-zero if there are any tasks in RCU read-side critical
+ * sections blocking the current preemptible-RCU expedited grace period.
+ * If there is no preemptible-RCU expedited grace period currently in
+ * progress, returns zero unconditionally.
+ */
+static int rcu_preempted_readers_exp(void)
+{
+ return rcu_preempt_ctrlblk.exp_tasks != NULL;
+}
+
+/*
+ * Report the exit from RCU read-side critical section for the last task
+ * that queued itself during or before the current expedited preemptible-RCU
+ * grace period.
+ */
+static void rcu_report_exp_done(void)
+{
+ wake_up(&sync_rcu_preempt_exp_wq);
+}
+
+/*
+ * Wait for an rcu-preempt grace period, but expedite it. The basic idea
+ * is to rely on the fact that there is but one CPU, and that it is
+ * illegal for a task to invoke synchronize_rcu_expedited() while in a
+ * preemptible-RCU read-side critical section. Therefore, any such
+ * critical sections must correspond to blocked tasks, which must therefore
+ * be on the ->blkd_tasks list. So just record the current head of the
+ * list in the ->exp_tasks pointer, and wait for all tasks including and
+ * after the task pointed to by ->exp_tasks to drain.
+ */
+void synchronize_rcu_expedited(void)
+{
+ unsigned long flags;
+ struct rcu_preempt_ctrlblk *rpcp = &rcu_preempt_ctrlblk;
+ unsigned long snap;
+
+ barrier(); /* ensure prior action seen before grace period. */
+
+ WARN_ON_ONCE(rcu_preempt_running_reader());
+
+ /*
+ * Acquire lock so that there is only one preemptible RCU grace
+ * period in flight. Of course, if someone does the expedited
+ * grace period for us while we are acquiring the lock, just leave.
+ */
+ snap = sync_rcu_preempt_exp_count + 1;
+ mutex_lock(&sync_rcu_preempt_exp_mutex);
+ if (ULONG_CMP_LT(snap, sync_rcu_preempt_exp_count))
+ goto unlock_mb_ret; /* Others did our work for us. */
+
+ local_irq_save(flags);
+
+ /*
+ * All RCU readers have to already be on blkd_tasks because
+ * we cannot legally be executing in an RCU read-side critical
+ * section.
+ */
+
+ /* Snapshot current head of ->blkd_tasks list. */
+ rpcp->exp_tasks = rpcp->blkd_tasks.next;
+ if (rpcp->exp_tasks == &rpcp->blkd_tasks)
+ rpcp->exp_tasks = NULL;
+ local_irq_restore(flags);
+
+ /* Wait for tail of ->blkd_tasks list to drain. */
+ if (rcu_preempted_readers_exp())
+ wait_event(sync_rcu_preempt_exp_wq,
+ !rcu_preempted_readers_exp());
+
+ /* Clean up and exit. */
+ barrier(); /* ensure expedited GP seen before counter increment. */
+ sync_rcu_preempt_exp_count++;
+unlock_mb_ret:
+ mutex_unlock(&sync_rcu_preempt_exp_mutex);
+ barrier(); /* ensure subsequent action seen after grace period. */
+}
+EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
+
+/*
+ * Does preemptible RCU need the CPU to stay out of dynticks mode?
+ */
+int rcu_preempt_needs_cpu(void)
+{
+ if (!rcu_preempt_running_reader())
+ rcu_preempt_cpu_qs();
+ return rcu_preempt_ctrlblk.rcb.rcucblist != NULL;
+}
+
+/*
+ * Check for a task exiting while in a preemptible-RCU read-side
+ * critical section, clean up if so. No need to issue warnings,
+ * as debug_check_no_locks_held() already does this if lockdep
+ * is enabled.
+ */
+void exit_rcu(void)
+{
+ struct task_struct *t = current;
+
+ if (t->rcu_read_lock_nesting == 0)
+ return;
+ t->rcu_read_lock_nesting = 1;
+ rcu_read_unlock();
+}
+
+#else /* #ifdef CONFIG_TINY_PREEMPT_RCU */
+
+/*
+ * Because preemptible RCU does not exist, it never has any callbacks
+ * to check.
+ */
+static void rcu_preempt_check_callbacks(void)
+{
+}
+
+/*
+ * Because preemptible RCU does not exist, it never has any callbacks
+ * to remove.
+ */
+static void rcu_preempt_remove_callbacks(struct rcu_ctrlblk *rcp)
+{
+}
+
+/*
+ * Because preemptible RCU does not exist, it never has any callbacks
+ * to process.
+ */
+static void rcu_preempt_process_callbacks(void)
+{
+}
+
+#endif /* #else #ifdef CONFIG_TINY_PREEMPT_RCU */
+
#ifdef CONFIG_DEBUG_LOCK_ALLOC

#include <linux/kernel_stat.h>
--
1.7.0.6

2010-08-09 22:15:26

by Paul E. McKenney

Subject: [PATCH tip/core/rcu 02/10] Update documentation to note the passage of INIT_RCU_HEAD()

Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Josh Triplett <[email protected]>
---
Documentation/DocBook/kernel-locking.tmpl | 8 --------
1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl
index 084f6ad..e6cc574 100644
--- a/Documentation/DocBook/kernel-locking.tmpl
+++ b/Documentation/DocBook/kernel-locking.tmpl
@@ -1725,14 +1725,6 @@ the amount of locking which needs to be done.
if (++cache_num > MAX_CACHE_SIZE) {
struct object *i, *outcast = NULL;
list_for_each_entry(i, &amp;cache, list) {
-@@ -85,6 +94,7 @@
- obj-&gt;popularity = 0;
- atomic_set(&amp;obj-&gt;refcnt, 1); /* The cache holds a reference */
- spin_lock_init(&amp;obj-&gt;lock);
-+ INIT_RCU_HEAD(&amp;obj-&gt;rcu);
-
- spin_lock_irqsave(&amp;cache_lock, flags);
- __cache_add(obj);
@@ -104,12 +114,11 @@
struct object *cache_find(int id)
{
--
1.7.0.6

2010-08-16 14:45:36

by Mathieu Desnoyers

Subject: Re: [PATCH tip/core/rcu 09/10] rcu: update obsolete rcu_read_lock() comment.

* Paul E. McKenney ([email protected]) wrote:
> The comment says that blocking is illegal in rcu_read_lock()-style
> RCU read-side critical sections, which is no longer entirely true
> given preemptible RCU. This commit provides a fix.
>
> Suggested-by: David Miller <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> ---
> include/linux/rcupdate.h | 15 ++++++++++++++-
> 1 files changed, 14 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 24b8966..d7af96e 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -458,7 +458,20 @@ extern int rcu_my_thread_group_empty(void);
> * will be deferred until the outermost RCU read-side critical section
> * completes.
> *
> - * It is illegal to block while in an RCU read-side critical section.
> + * You can avoid reading and understanding the next paragraph by
> + * following this rule: don't put anything in an rcu_read_lock() RCU
> + * read-side critical section that would block in a !PREEMPT kernel.
> + * But if you want the full story, read on!
> + *
> + * In non-preemptible RCU implementations (TREE_RCU and TINY_RCU), it
> + * is illegal to block while in an RCU read-side critical section. In
> + * preemptible RCU implementations (TREE_PREEMPT_RCU and TINY_PREEMPT_RCU)
> + * in CONFIG_PREEMPT kernel builds, RCU read-side critical sections may
> + * be preempted, but explicit blocking is illegal. Finally, in preemptible
> + * RCU implementations in real-time (CONFIG_PREEMPT_RT) kernel builds,
> + * RCU read-side critical sections may be preempted and they may also
> + * block, but only when acquiring spinlocks that are subject to priority
> + * inheritance.

It might be good to add a note about the locking-chain dependency that
is created in the RT case, e.g., the lock we are sharing with another
context in preempt RT is subject to the same rules as the RCU C.S.
itself: whoever holds that lock should never call synchronize_rcu(), as
this would cause an RCU+lock-induced deadlock.

I must admit, however, that because calling synchronize_rcu() from
spinlocks is already forbidden, this is already implied.
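
A minimal sketch of the deadlock being described (the lock and function
names are invented; this assumes an -rt kernel where spinlock_t is a
sleeping, priority-inheriting lock):

#include <linux/spinlock.h>
#include <linux/rcupdate.h>

static DEFINE_SPINLOCK(shared_lock);	/* PI-aware sleeping lock on -rt */

static void reader(void)		/* task A */
{
	rcu_read_lock();
	spin_lock(&shared_lock);	/* blocks behind task B on -rt...  */
	/* ... */
	spin_unlock(&shared_lock);
	rcu_read_unlock();
}

static void updater(void)		/* task B */
{
	spin_lock(&shared_lock);
	synchronize_rcu();		/* ...which waits on task A's RCU  */
					/* read-side critical section:     */
					/* RCU+lock-induced deadlock.      */
	spin_unlock(&shared_lock);
}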

Thanks,

Mathieu

> */
> static inline void rcu_read_lock(void)
> {
> --
> 1.7.0.6
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-16 15:07:40

by Mathieu Desnoyers

Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

* Paul E. McKenney ([email protected]) wrote:
[...]
> +
> +/*
> + * Tiny-preemptible RCU implementation for rcu_read_unlock().
> + * Decrement ->rcu_read_lock_nesting. If the result is zero (outermost
> + * rcu_read_unlock()) and ->rcu_read_unlock_special is non-zero, then
> + * invoke rcu_read_unlock_special() to clean up after a context switch
> + * in an RCU read-side critical section and other special cases.
> + */
> +void __rcu_read_unlock(void)
> +{
> + struct task_struct *t = current;
> +
> + barrier(); /* needed if we ever invoke rcu_read_unlock in rcutiny.c */
> + if (--t->rcu_read_lock_nesting == 0 &&
> + unlikely(t->rcu_read_unlock_special))

Hrm, I think we discussed this in a past life, but would the following
sequence be possible and correct?

CPU 0

read t->rcu_read_unlock_special
interrupt comes in, preempts. sets t->rcu_read_unlock_special
<preempted>
<scheduled back>
iret
decrement and read t->rcu_read_lock_nesting
test both old "special" value (which we have locally on the stack) and
detect that rcu_read_lock_nesting is 0.

We actually missed a reschedule.

I think we might need a barrier() between the t->rcu_read_lock_nesting
and t->rcu_read_unlock_special reads. We might need to audit
TREE PREEMPT RCU for the same kind of behavior.
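
One way to picture the suggestion (purely illustrative and untested;
whether it is actually needed is exactly the question above):

void __rcu_read_unlock(void)
{
	struct task_struct *t = current;
	int nesting;

	barrier();	/* as in the posted patch */
	nesting = --t->rcu_read_lock_nesting;
	barrier();	/* proposed: keep the ->rcu_read_unlock_special   */
			/* load from being hoisted above the decrement.   */
	if (nesting == 0 && unlikely(t->rcu_read_unlock_special))
		rcu_read_unlock_special(t);
#ifdef CONFIG_PROVE_LOCKING
	WARN_ON_ONCE(t->rcu_read_lock_nesting < 0);
#endif /* #ifdef CONFIG_PROVE_LOCKING */
}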

But I might be (again?) missing something. I've got the feeling you
already convinced me that this was OK for some reason, but I trip on
this every time I read the code.

[...]

> +/*
> + * Check for a task exiting while in a preemptible -RCU read-side
> + * critical section, clean up if so. No need to issue warnings,
> + * as debug_check_no_locks_held() already does this if lockdep
> + * is enabled.
> + */
> +void exit_rcu(void)
> +{
> + struct task_struct *t = current;
> +
> + if (t->rcu_read_lock_nesting == 0)
> + return;
> + t->rcu_read_lock_nesting = 1;
> + rcu_read_unlock();
> +}
> +

The interaction with preemption is unclear here. exit.c disables
preemption around the call to exit_rcu(), but if, for some reason,
rcu_read_unlock_special was set earlier by preemption, then the
rcu_read_unlock() code might block and cause problems.

Maybe we should consider clearing rcu_read_unlock_special here ?

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-16 17:55:17

by Paul E. McKenney

Subject: Re: [PATCH tip/core/rcu 09/10] rcu: update obsolete rcu_read_lock() comment.

On Mon, Aug 16, 2010 at 10:45:32AM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > The comment says that blocking is illegal in rcu_read_lock()-style
> > RCU read-side critical sections, which is no longer entirely true
> > given preemptible RCU. This commit provides a fix.
> >
> > Suggested-by: David Miller <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> > ---
> > include/linux/rcupdate.h | 15 ++++++++++++++-
> > 1 files changed, 14 insertions(+), 1 deletions(-)
> >
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 24b8966..d7af96e 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -458,7 +458,20 @@ extern int rcu_my_thread_group_empty(void);
> > * will be deferred until the outermost RCU read-side critical section
> > * completes.
> > *
> > - * It is illegal to block while in an RCU read-side critical section.
> > + * You can avoid reading and understanding the next paragraph by
> > + * following this rule: don't put anything in an rcu_read_lock() RCU
> > + * read-side critical section that would block in a !PREEMPT kernel.
> > + * But if you want the full story, read on!
> > + *
> > + * In non-preemptible RCU implementations (TREE_RCU and TINY_RCU), it
> > + * is illegal to block while in an RCU read-side critical section. In
> > + * preemptible RCU implementations (TREE_PREEMPT_RCU and TINY_PREEMPT_RCU)
> > + * in CONFIG_PREEMPT kernel builds, RCU read-side critical sections may
> > + * be preempted, but explicit blocking is illegal. Finally, in preemptible
> > + * RCU implementations in real-time (CONFIG_PREEMPT_RT) kernel builds,
> > + * RCU read-side critical sections may be preempted and they may also
> > + * block, but only when acquiring spinlocks that are subject to priority
> > + * inheritance.
>
> It might be good to add a note about locking chain dependency that is
> created in the RT case, e.g., the lock we are sharing with another
> context in preempt RT is subject to the same rules as the RCU C.S.. It
> should never call synchronize_rcu(); this would cause a RCU+lock-induced
> deadlock.
>
> I must admit, however, that because calling synchronize_rcu() from
> spinlocks is already forbidden, this is already implied.

Thank you for looking this over!

I am updating the srcu_read_lock() docbook comments to call out the
potential for this problem, given that SRCU read-side critical sections
can acquire mutexes, which can be held across both synchronize_srcu()
and synchronize_srcu_expedited().
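
For instance, the indirect case could be sketched like this (names
invented; assume init_srcu_struct() has been called on my_srcu):

#include <linux/mutex.h>
#include <linux/srcu.h>

static DEFINE_MUTEX(my_mutex);
static struct srcu_struct my_srcu;	/* assume init_srcu_struct() done */

static void reader(void)
{
	int idx = srcu_read_lock(&my_srcu);

	mutex_lock(&my_mutex);		/* blocks behind updater()...      */
	/* ... */
	mutex_unlock(&my_mutex);
	srcu_read_unlock(&my_srcu, idx);
}

static void updater(void)
{
	mutex_lock(&my_mutex);
	synchronize_srcu(&my_srcu);	/* ...which waits on reader():     */
					/* deadlock.                       */
	mutex_unlock(&my_mutex);
}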

Seem reasonable?

Thanx, Paul

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 6f456a7..58971e8 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -139,7 +139,12 @@ static inline int srcu_read_lock_held(struct srcu_struct *sp)
* @sp: srcu_struct in which to register the new reader.
*
* Enter an SRCU read-side critical section. Note that SRCU read-side
- * critical sections may be nested.
+ * critical sections may be nested. However, it is illegal to
+ * call anything that waits on an SRCU grace period for the same
+ * srcu_struct, whether directly or indirectly. Please note that
+ * one way to indirectly wait on an SRCU grace period is to acquire
+ * a mutex that is held elsewhere while calling synchronize_srcu() or
+ * synchronize_srcu_expedited().
*/
static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
{

> Thanks,
>
> Mathieu
>
> > */
> > static inline void rcu_read_lock(void)
> > {
> > --
> > 1.7.0.6
> >
>
> --
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com

2010-08-16 18:29:49

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 09/10] rcu: update obsolete rcu_read_lock() comment.

* Paul E. McKenney ([email protected]) wrote:
> On Mon, Aug 16, 2010 at 10:45:32AM -0400, Mathieu Desnoyers wrote:
> > * Paul E. McKenney ([email protected]) wrote:
> > > The comment says that blocking is illegal in rcu_read_lock()-style
> > > RCU read-side critical sections, which is no longer entirely true
> > > given preemptible RCU. This commit provides a fix.
> > >
> > > Suggested-by: David Miller <[email protected]>
> > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > ---
> > > include/linux/rcupdate.h | 15 ++++++++++++++-
> > > 1 files changed, 14 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index 24b8966..d7af96e 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -458,7 +458,20 @@ extern int rcu_my_thread_group_empty(void);
> > > * will be deferred until the outermost RCU read-side critical section
> > > * completes.
> > > *
> > > - * It is illegal to block while in an RCU read-side critical section.
> > > + * You can avoid reading and understanding the next paragraph by
> > > + * following this rule: don't put anything in an rcu_read_lock() RCU
> > > + * read-side critical section that would block in a !PREEMPT kernel.
> > > + * But if you want the full story, read on!
> > > + *
> > > + * In non-preemptible RCU implementations (TREE_RCU and TINY_RCU), it
> > > + * is illegal to block while in an RCU read-side critical section. In
> > > + * preemptible RCU implementations (TREE_PREEMPT_RCU and TINY_PREEMPT_RCU)
> > > + * in CONFIG_PREEMPT kernel builds, RCU read-side critical sections may
> > > + * be preempted, but explicit blocking is illegal. Finally, in preemptible
> > > + * RCU implementations in real-time (CONFIG_PREEMPT_RT) kernel builds,
> > > + * RCU read-side critical sections may be preempted and they may also
> > > + * block, but only when acquiring spinlocks that are subject to priority
> > > + * inheritance.
> >
> > It might be good to add a note about locking chain dependency that is
> > created in the RT case, e.g., the lock we are sharing with another
> > context in preempt RT is subject to the same rules as the RCU C.S.. It
> > should never call synchronize_rcu(); this would cause a RCU+lock-induced
> > deadlock.
> >
> > I must admit, however, that because calling synchronize_rcu() from
> > spinlocks is already forbidden, this is already implied.
>
> Thank you for looking this over!
>
> I am updating the srcu_read_lock() docbook comments to call out the
> potential for this problem, given that SRCU read-side critical sections
> can acquire mutexes, which can be held across both synchronize_srcu()
> and synchronize_srcu_expedited().
>
> Seem reasonable?

Yep, thanks!

Mathieu

>
> Thanx, Paul
>
> diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> index 6f456a7..58971e8 100644
> --- a/include/linux/srcu.h
> +++ b/include/linux/srcu.h
> @@ -139,7 +139,12 @@ static inline int srcu_read_lock_held(struct srcu_struct *sp)
> * @sp: srcu_struct in which to register the new reader.
> *
> * Enter an SRCU read-side critical section. Note that SRCU read-side
> - * critical sections may be nested.
> + * critical sections may be nested. However, it is illegal to
> + * call anything that waits on an SRCU grace period for the same
> + * srcu_struct, whether directly or indirectly. Please note that
> + * one way to indirectly wait on an SRCU grace period is to acquire
> + * a mutex that is held elsewhere while calling synchronize_srcu() or
> + * synchronize_srcu_expedited().
> */
> static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
> {
>
> > Thanks,
> >
> > Mathieu
> >
> > > */
> > > static inline void rcu_read_lock(void)
> > > {
> > > --
> > > 1.7.0.6
> > >
> >
> > --
> > Mathieu Desnoyers
> > Operating System Efficiency R&D Consultant
> > EfficiOS Inc.
> > http://www.efficios.com

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-16 18:34:04

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

On Mon, Aug 16, 2010 at 11:07:37AM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney ([email protected]) wrote:
> [...]
> > +
> > +/*
> > + * Tiny-preemptible RCU implementation for rcu_read_unlock().
> > + * Decrement ->rcu_read_lock_nesting. If the result is zero (outermost
> > + * rcu_read_unlock()) and ->rcu_read_unlock_special is non-zero, then
> > + * invoke rcu_read_unlock_special() to clean up after a context switch
> > + * in an RCU read-side critical section and other special cases.
> > + */
> > +void __rcu_read_unlock(void)
> > +{
> > + struct task_struct *t = current;
> > +
> > + barrier(); /* needed if we ever invoke rcu_read_unlock in rcutiny.c */
> > + if (--t->rcu_read_lock_nesting == 0 &&
> > + unlikely(t->rcu_read_unlock_special))

First, thank you for looking this over!!!

> Hrm I think we discussed this in a past life, but would the following
> sequence be possible and correct ?
>
> CPU 0
>
> read t->rcu_read_unlock_special
> interrupt comes in, preempts. sets t->rcu_read_unlock_special
> <preempted>
> <scheduled back>
> iret
> decrement and read t->rcu_read_lock_nesting
> test both old "special" value (which we have locally on the stack) and
> detect that rcu_read_lock_nesting is 0.
>
> We actually missed a reschedule.
>
> I think we might need a barrier() between the t->rcu_read_lock_nesting
> and t->rcu_read_unlock_special reads.

You are correct -- I got too aggressive in eliminating synchronization.

Good catch!!!

I added an ACCESS_ONCE() to the second term of the "if" condition so
that it now reads:

if (--t->rcu_read_lock_nesting == 0 &&
    unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))

This prevents the compiler from reordering because the ACCESS_ONCE()
prohibits accessing t->rcu_read_unlock_special unless the value of
t->rcu_read_lock_nesting is known to be zero.

> We might need to audit
> TREE PREEMPT RCU for the same kind of behavior.

The version of __rcu_read_unlock() in kernel/rcutree_plugin.h is as
follows:

void __rcu_read_unlock(void)
{
        struct task_struct *t = current;

        barrier();  /* needed if we ever invoke rcu_read_unlock in rcutree.c */
        if (--ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
            unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
                rcu_read_unlock_special(t);
#ifdef CONFIG_PROVE_LOCKING
        WARN_ON_ONCE(ACCESS_ONCE(t->rcu_read_lock_nesting) < 0);
#endif /* #ifdef CONFIG_PROVE_LOCKING */
}

The ACCESS_ONCE() calls should cover this. I believe that the first
ACCESS_ONCE() is redundant, and checking this more closely is on my
todo list.

> But I might be (again ?) missing something. I've got the feeling you
> already convinced me that this was OK for some reason, but I trip on
> this every time I read the code.
>
> [...]
>
> > +/*
> > > + * Check for a task exiting while in a preemptible RCU read-side
> > + * critical section, clean up if so. No need to issue warnings,
> > + * as debug_check_no_locks_held() already does this if lockdep
> > + * is enabled.
> > + */
> > +void exit_rcu(void)
> > +{
> > + struct task_struct *t = current;
> > +
> > + if (t->rcu_read_lock_nesting == 0)
> > + return;
> > + t->rcu_read_lock_nesting = 1;
> > + rcu_read_unlock();
> > +}
> > +
>
> The interaction with preemption is unclear here. exit.c disables
> preemption around the call to exit_rcu(), but if, for some reason,
> rcu_read_unlock_special was set earlier by preemption, then the
> rcu_read_unlock() code might block and cause problems.

But rcu_read_unlock_special() does not block. In fact, it disables
interrupts over almost all of its execution. Or am I missing some
subtlety here?

> Maybe we should consider clearing rcu_read_unlock_special here ?

If the task blocked in an RCU read-side critical section just before
exit_rcu() was called, we need to remove the task from the ->blkd_tasks
list. If we fail to do so, we might get a segfault later on. Also,
we do need to handle any RCU_READ_UNLOCK_NEED_QS requests from the RCU
core.

So I really do like the current approach of calling rcu_read_unlock()
to do this sort of cleanup.

Thanx, Paul

> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com

2010-08-16 19:19:51

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

* Paul E. McKenney ([email protected]) wrote:
> On Mon, Aug 16, 2010 at 11:07:37AM -0400, Mathieu Desnoyers wrote:
> > * Paul E. McKenney ([email protected]) wrote:
> > [...]
> > > +
> > > +/*
> > > + * Tiny-preemptible RCU implementation for rcu_read_unlock().
> > > + * Decrement ->rcu_read_lock_nesting. If the result is zero (outermost
> > > + * rcu_read_unlock()) and ->rcu_read_unlock_special is non-zero, then
> > > + * invoke rcu_read_unlock_special() to clean up after a context switch
> > > + * in an RCU read-side critical section and other special cases.
> > > + */
> > > +void __rcu_read_unlock(void)
> > > +{
> > > + struct task_struct *t = current;
> > > +
> > > + barrier(); /* needed if we ever invoke rcu_read_unlock in rcutiny.c */
> > > + if (--t->rcu_read_lock_nesting == 0 &&
> > > + unlikely(t->rcu_read_unlock_special))
>
> First, thank you for looking this over!!!
>
> > Hrm I think we discussed this in a past life, but would the following
> > sequence be possible and correct ?
> >
> > CPU 0
> >
> > read t->rcu_read_unlock_special
> > interrupt comes in, preempts. sets t->rcu_read_unlock_special
> > <preempted>
> > <scheduled back>
> > iret
> > decrement and read t->rcu_read_lock_nesting
> > test both old "special" value (which we have locally on the stack) and
> > detect that rcu_read_lock_nesting is 0.
> >
> > We actually missed a reschedule.
> >
> > I think we might need a barrier() between the t->rcu_read_lock_nesting
> > and t->rcu_read_unlock_special reads.
>
> You are correct -- I got too aggressive in eliminating synchronization.
>
> Good catch!!!
>
> I added an ACCESS_ONCE() to the second term of the "if" condition so
> that it now reads:
>
> if (--t->rcu_read_lock_nesting == 0 &&
> unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))
>
> This prevents the compiler from reordering because the ACCESS_ONCE()
> prohibits accessing t->rcu_read_unlock_special unless the value of
> t->rcu_read_lock_nesting is known to be zero.

Hrm, --t->rcu_read_lock_nesting does not have any globally visible
side-effect, so the compiler is free to reorder the memory access across
the rcu_read_unlock_special access. I think we need the ACCESS_ONCE()
around the t->rcu_read_lock_nesting access too.

>
> > We might need to audit
> > TREE PREEMPT RCU for the same kind of behavior.
>
> The version of __rcu_read_unlock() in kernel/rcutree_plugin.h is as
> follows:
>
> void __rcu_read_unlock(void)
> {
> struct task_struct *t = current;
>
> barrier(); /* needed if we ever invoke rcu_read_unlock in rcutree.c */
> if (--ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
> unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))

This seems to work because we have:

volatile access (read/update t->rcu_read_lock_nesting)
&& (sequence point)
volatile access (t->rcu_read_unlock_special)

The C standard seems to forbid reordering of volatile accesses across
sequence points, so this should be fine. But it would probably be good
to document this implied ordering explicitly.
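
For illustration, the implied ordering could be called out in the source
with a comment along these lines (an editorial sketch, not part of the
posted code):

        /*
         * The "&&" below is a sequence point between two volatile
         * (ACCESS_ONCE()) accesses, so the compiler may not read
         * ->rcu_read_unlock_special until the volatile update of
         * ->rcu_read_lock_nesting has completed.
         */
        if (--ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
            unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
                rcu_read_unlock_special(t);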

> rcu_read_unlock_special(t);
> #ifdef CONFIG_PROVE_LOCKING
> WARN_ON_ONCE(ACCESS_ONCE(t->rcu_read_lock_nesting) < 0);
> #endif /* #ifdef CONFIG_PROVE_LOCKING */
> }
>
> The ACCESS_ONCE() calls should cover this. I believe that the first
> ACCESS_ONCE() is redundant, and have checking this more closely on my
> todo list.

I doubt so, see explanation above.

>
> > But I might be (again ?) missing something. I've got the feeling you
> > already convinced me that this was OK for some reason, but I trip on
> > this every time I read the code.
> >
> > [...]
> >
> > > +/*
> > > > + * Check for a task exiting while in a preemptible RCU read-side
> > > + * critical section, clean up if so. No need to issue warnings,
> > > + * as debug_check_no_locks_held() already does this if lockdep
> > > + * is enabled.
> > > + */
> > > +void exit_rcu(void)
> > > +{
> > > + struct task_struct *t = current;
> > > +
> > > + if (t->rcu_read_lock_nesting == 0)
> > > + return;
> > > + t->rcu_read_lock_nesting = 1;
> > > + rcu_read_unlock();
> > > +}
> > > +
> >
> > The interaction with preemption is unclear here. exit.c disables
> > preemption around the call to exit_rcu(), but if, for some reason,
> > rcu_read_unlock_special was set earlier by preemption, then the
> > rcu_read_unlock() code might block and cause problems.
>
> But rcu_read_unlock_special() does not block. In fact, it disables
> interrupts over almost all of its execution. Or am I missing some
> subtlety here?

I am probably the one who was missing a subtlety about how
rcu_read_unlock_special() works.

>
> > Maybe we should consider clearing rcu_read_unlock_special here ?
>
> If the task blocked in an RCU read-side critical section just before
> exit_rcu() was called, we need to remove the task from the ->blkd_tasks
> list. If we fail to do so, we might get a segfault later on. Also,
> we do need to handle any RCU_READ_UNLOCK_NEED_QS requests from the RCU
> core.
>
> So I really do like the current approach of calling rcu_read_unlock()
> to do this sort of cleanup.

It looks good then; I just wanted to make sure that the side effects of
calling rcu_read_unlock() in this code path were well thought through.

Thanks,

Mathieu

>
> Thanx, Paul
>
> > Thanks,
> >
> > Mathieu
> >
> > --
> > Mathieu Desnoyers
> > Operating System Efficiency R&D Consultant
> > EfficiOS Inc.
> > http://www.efficios.com

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-16 21:32:08

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

On Mon, Aug 16, 2010 at 03:19:47PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > On Mon, Aug 16, 2010 at 11:07:37AM -0400, Mathieu Desnoyers wrote:
> > > * Paul E. McKenney ([email protected]) wrote:
> > > [...]
> > > > +
> > > > +/*
> > > > + * Tiny-preemptible RCU implementation for rcu_read_unlock().
> > > > + * Decrement ->rcu_read_lock_nesting. If the result is zero (outermost
> > > > + * rcu_read_unlock()) and ->rcu_read_unlock_special is non-zero, then
> > > > + * invoke rcu_read_unlock_special() to clean up after a context switch
> > > > + * in an RCU read-side critical section and other special cases.
> > > > + */
> > > > +void __rcu_read_unlock(void)
> > > > +{
> > > > + struct task_struct *t = current;
> > > > +
> > > > + barrier(); /* needed if we ever invoke rcu_read_unlock in rcutiny.c */
> > > > + if (--t->rcu_read_lock_nesting == 0 &&
> > > > + unlikely(t->rcu_read_unlock_special))
> >
> > First, thank you for looking this over!!!
> >
> > > Hrm I think we discussed this in a past life, but would the following
> > > sequence be possible and correct ?
> > >
> > > CPU 0
> > >
> > > read t->rcu_read_unlock_special
> > > interrupt comes in, preempts. sets t->rcu_read_unlock_special
> > > <preempted>
> > > <scheduled back>
> > > iret
> > > decrement and read t->rcu_read_lock_nesting
> > > test both old "special" value (which we have locally on the stack) and
> > > detect that rcu_read_lock_nesting is 0.
> > >
> > > We actually missed a reschedule.
> > >
> > > I think we might need a barrier() between the t->rcu_read_lock_nesting
> > > and t->rcu_read_unlock_special reads.
> >
> > You are correct -- I got too aggressive in eliminating synchronization.
> >
> > Good catch!!!
> >
> > I added an ACCESS_ONCE() to the second term of the "if" condition so
> > that it now reads:
> >
> > if (--t->rcu_read_lock_nesting == 0 &&
> > unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))
> >
> > This prevents the compiler from reordering because the ACCESS_ONCE()
> > prohibits accessing t->rcu_read_unlock_special unless the value of
> > t->rcu_read_lock_nesting is known to be zero.
>
> Hrm, --t->rcu_read_lock_nesting does not have any globally visible
> side-effect, so the compiler is free to reorder the memory access across
> the rcu_read_unlock_special access. I think we need the ACCESS_ONCE()
> around the t->rcu_read_lock_nesting access too.

Indeed, it is free to reorder that access. This has the effect of
extending the scope of the RCU read-side critical section, which is
harmless as long as it doesn't pull a lock or some such into it.

> > > We might need to audit
> > > TREE PREEMPT RCU for the same kind of behavior.
> >
> > The version of __rcu_read_unlock() in kernel/rcutree_plugin.h is as
> > follows:
> >
> > void __rcu_read_unlock(void)
> > {
> > struct task_struct *t = current;
> >
> > barrier(); /* needed if we ever invoke rcu_read_unlock in rcutree.c */
> > if (--ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
> > unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
>
> This seem to work because we have:
>
> volatile access (read/update t->rcu_read_lock_nesting)
> && (sequence point)
> volatile access (t->rcu_read_unlock_special)

Yep!!! ;-)

> The C standard seems to forbid reordering of volatile accesses across
> sequence points, so this should be fine. But it would probably be good
> to document this implied ordering explicitly.

I should probably review commenting globally, and this might be one
place needing help.

> > rcu_read_unlock_special(t);
> > #ifdef CONFIG_PROVE_LOCKING
> > WARN_ON_ONCE(ACCESS_ONCE(t->rcu_read_lock_nesting) < 0);
> > #endif /* #ifdef CONFIG_PROVE_LOCKING */
> > }
> >
> > The ACCESS_ONCE() calls should cover this. I believe that the first
> > ACCESS_ONCE() is redundant, and have checking this more closely on my
> > todo list.
>
> I doubt so, see explanation above.

Ditto! ;-)

> > > But I might be (again ?) missing something. I've got the feeling you
> > > already convinced me that this was OK for some reason, but I trip on
> > > this every time I read the code.
> > >
> > > [...]
> > >
> > > > +/*
> > > > > + * Check for a task exiting while in a preemptible RCU read-side
> > > > + * critical section, clean up if so. No need to issue warnings,
> > > > + * as debug_check_no_locks_held() already does this if lockdep
> > > > + * is enabled.
> > > > + */
> > > > +void exit_rcu(void)
> > > > +{
> > > > + struct task_struct *t = current;
> > > > +
> > > > + if (t->rcu_read_lock_nesting == 0)
> > > > + return;
> > > > + t->rcu_read_lock_nesting = 1;
> > > > + rcu_read_unlock();
> > > > +}
> > > > +
> > >
> > > The interaction with preemption is unclear here. exit.c disables
> > > preemption around the call to exit_rcu(), but if, for some reason,
> > > rcu_read_unlock_special was set earlier by preemption, then the
> > > rcu_read_unlock() code might block and cause problems.
> >
> > But rcu_read_unlock_special() does not block. In fact, it disables
> > interrupts over almost all of its execution. Or am I missing some
> > subtlety here?
>
> I am probably the one who was missing a subtlety about how
> rcu_read_unlock_special() works.
>
> >
> > > Maybe we should consider clearing rcu_read_unlock_special here ?
> >
> > If the task blocked in an RCU read-side critical section just before
> > exit_rcu() was called, we need to remove the task from the ->blkd_tasks
> > list. If we fail to do so, we might get a segfault later on. Also,
> > we do need to handle any RCU_READ_UNLOCK_NEED_QS requests from the RCU
> > core.
> >
> > So I really do like the current approach of calling rcu_read_unlock()
> > to do this sort of cleanup.
>
> It looks good then, I just wanted to ensure that the side-effects of
> calling rcu_read_unlock() in this code path were well-thought.

Long ago on the first RCU priority-boosting implementation I tried doing
the rcu_read_unlock() by hand. The unhappy lessons learned caused me
to just use rcu_read_unlock() when I encountered similar situations
later on. ;-)

Thanx, Paul

2010-08-16 21:41:28

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

* Paul E. McKenney ([email protected]) wrote:
> On Mon, Aug 16, 2010 at 03:19:47PM -0400, Mathieu Desnoyers wrote:
> > * Paul E. McKenney ([email protected]) wrote:
> > > On Mon, Aug 16, 2010 at 11:07:37AM -0400, Mathieu Desnoyers wrote:
> > > > * Paul E. McKenney ([email protected]) wrote:
> > > > [...]
> > > > > +
> > > > > +/*
> > > > > + * Tiny-preemptible RCU implementation for rcu_read_unlock().
> > > > > + * Decrement ->rcu_read_lock_nesting. If the result is zero (outermost
> > > > > + * rcu_read_unlock()) and ->rcu_read_unlock_special is non-zero, then
> > > > > + * invoke rcu_read_unlock_special() to clean up after a context switch
> > > > > + * in an RCU read-side critical section and other special cases.
> > > > > + */
> > > > > +void __rcu_read_unlock(void)
> > > > > +{
> > > > > + struct task_struct *t = current;
> > > > > +
> > > > > + barrier(); /* needed if we ever invoke rcu_read_unlock in rcutiny.c */
> > > > > + if (--t->rcu_read_lock_nesting == 0 &&
> > > > > + unlikely(t->rcu_read_unlock_special))
> > >
> > > First, thank you for looking this over!!!
> > >
> > > > Hrm I think we discussed this in a past life, but would the following
> > > > sequence be possible and correct ?
> > > >
> > > > CPU 0
> > > >
> > > > read t->rcu_read_unlock_special
> > > > interrupt comes in, preempts. sets t->rcu_read_unlock_special
> > > > <preempted>
> > > > <scheduled back>
> > > > iret
> > > > decrement and read t->rcu_read_lock_nesting
> > > > test both old "special" value (which we have locally on the stack) and
> > > > detect that rcu_read_lock_nesting is 0.
> > > >
> > > > We actually missed a reschedule.
> > > >
> > > > I think we might need a barrier() between the t->rcu_read_lock_nesting
> > > > and t->rcu_read_unlock_special reads.
> > >
> > > You are correct -- I got too aggressive in eliminating synchronization.
> > >
> > > Good catch!!!
> > >
> > > I added an ACCESS_ONCE() to the second term of the "if" condition so
> > > that it now reads:
> > >
> > > if (--t->rcu_read_lock_nesting == 0 &&
> > > unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))
> > >
> > > This prevents the compiler from reordering because the ACCESS_ONCE()
> > > prohibits accessing t->rcu_read_unlock_special unless the value of
> > > t->rcu_read_lock_nesting is known to be zero.
> >
> > Hrm, --t->rcu_read_lock_nesting does not have any globally visible
> > side-effect, so the compiler is free to reorder the memory access across
> > the rcu_read_unlock_special access. I think we need the ACCESS_ONCE()
> > around the t->rcu_read_lock_nesting access too.
>
> Indeed, it is free to reorder that access. This has the effect of
> extending the scope of the RCU read-side critical section, which is
> harmless as long as it doesn't pull a lock or some such into it.
>

So what happens if we get:

CPU 0

read t->rcu_read_lock_nesting
check if equals to 1
read t->rcu_read_unlock_special
interrupt comes in, preempts. sets t->rcu_read_unlock_special
<preempted>
<scheduled back>
iret
decrement t->rcu_read_lock_nesting
test rcu_read_unlock_special value (read prior to interrupt)
-> fails to notice the preemption that came in after the
rcu_read_unlock_special read.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-16 21:56:00

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

On Mon, Aug 16, 2010 at 05:41:23PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > On Mon, Aug 16, 2010 at 03:19:47PM -0400, Mathieu Desnoyers wrote:
> > > * Paul E. McKenney ([email protected]) wrote:
> > > > On Mon, Aug 16, 2010 at 11:07:37AM -0400, Mathieu Desnoyers wrote:
> > > > > * Paul E. McKenney ([email protected]) wrote:
> > > > > [...]
> > > > > > +
> > > > > > +/*
> > > > > > + * Tiny-preemptible RCU implementation for rcu_read_unlock().
> > > > > > + * Decrement ->rcu_read_lock_nesting. If the result is zero (outermost
> > > > > > + * rcu_read_unlock()) and ->rcu_read_unlock_special is non-zero, then
> > > > > > + * invoke rcu_read_unlock_special() to clean up after a context switch
> > > > > > + * in an RCU read-side critical section and other special cases.
> > > > > > + */
> > > > > > +void __rcu_read_unlock(void)
> > > > > > +{
> > > > > > + struct task_struct *t = current;
> > > > > > +
> > > > > > + barrier(); /* needed if we ever invoke rcu_read_unlock in rcutiny.c */
> > > > > > + if (--t->rcu_read_lock_nesting == 0 &&
> > > > > > + unlikely(t->rcu_read_unlock_special))
> > > >
> > > > First, thank you for looking this over!!!
> > > >
> > > > > Hrm I think we discussed this in a past life, but would the following
> > > > > sequence be possible and correct ?
> > > > >
> > > > > CPU 0
> > > > >
> > > > > read t->rcu_read_unlock_special
> > > > > interrupt comes in, preempts. sets t->rcu_read_unlock_special
> > > > > <preempted>
> > > > > <scheduled back>
> > > > > iret
> > > > > decrement and read t->rcu_read_lock_nesting
> > > > > test both old "special" value (which we have locally on the stack) and
> > > > > detect that rcu_read_lock_nesting is 0.
> > > > >
> > > > > We actually missed a reschedule.
> > > > >
> > > > > I think we might need a barrier() between the t->rcu_read_lock_nesting
> > > > > and t->rcu_read_unlock_special reads.
> > > >
> > > > You are correct -- I got too aggressive in eliminating synchronization.
> > > >
> > > > Good catch!!!
> > > >
> > > > I added an ACCESS_ONCE() to the second term of the "if" condition so
> > > > that it now reads:
> > > >
> > > > if (--t->rcu_read_lock_nesting == 0 &&
> > > > unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))
> > > >
> > > > This prevents the compiler from reordering because the ACCESS_ONCE()
> > > > prohibits accessing t->rcu_read_unlock_special unless the value of
> > > > t->rcu_read_lock_nesting is known to be zero.
> > >
> > > Hrm, --t->rcu_read_lock_nesting does not have any globally visible
> > > side-effect, so the compiler is free to reorder the memory access across
> > > the rcu_read_unlock_special access. I think we need the ACCESS_ONCE()
> > > around the t->rcu_read_lock_nesting access too.
> >
> > Indeed, it is free to reorder that access. This has the effect of
> > extending the scope of the RCU read-side critical section, which is
> > harmless as long as it doesn't pull a lock or some such into it.
> >
>
> So what happens if we get:
>
> CPU 0
>
> read t->rcu_read_lock_nesting
> check if equals to 1
> read t->rcu_read_unlock_special
> interrupt comes in, preempts. sets t->rcu_read_unlock_special
> <preempted>
> <scheduled back>
> iret
> decrement t->rcu_read_lock_nesting

Moving this down past the check of t->rcu_read_unlock_special (which is
now covered by ACCESS_ONCE()) would violate the C standard, as it would
be equivalent to moving a volatile up past a sequence point.

Thanx, Paul

> test rcu_read_unlock_special value (read prior to interrupt)
> -> fails to notice the preemption that came in after the
> rcu_read_unlock_special read.
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com

2010-08-16 22:12:13

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

* Paul E. McKenney ([email protected]) wrote:
> On Mon, Aug 16, 2010 at 05:41:23PM -0400, Mathieu Desnoyers wrote:
> > * Paul E. McKenney ([email protected]) wrote:
> > > On Mon, Aug 16, 2010 at 03:19:47PM -0400, Mathieu Desnoyers wrote:
> > > > * Paul E. McKenney ([email protected]) wrote:
> > > > > On Mon, Aug 16, 2010 at 11:07:37AM -0400, Mathieu Desnoyers wrote:
> > > > > > * Paul E. McKenney ([email protected]) wrote:
> > > > > > [...]
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Tiny-preemptible RCU implementation for rcu_read_unlock().
> > > > > > > + * Decrement ->rcu_read_lock_nesting. If the result is zero (outermost
> > > > > > > + * rcu_read_unlock()) and ->rcu_read_unlock_special is non-zero, then
> > > > > > > + * invoke rcu_read_unlock_special() to clean up after a context switch
> > > > > > > + * in an RCU read-side critical section and other special cases.
> > > > > > > + */
> > > > > > > +void __rcu_read_unlock(void)
> > > > > > > +{
> > > > > > > + struct task_struct *t = current;
> > > > > > > +
> > > > > > > + barrier(); /* needed if we ever invoke rcu_read_unlock in rcutiny.c */
> > > > > > > + if (--t->rcu_read_lock_nesting == 0 &&
> > > > > > > + unlikely(t->rcu_read_unlock_special))
> > > > >
> > > > > First, thank you for looking this over!!!
> > > > >
> > > > > > Hrm I think we discussed this in a past life, but would the following
> > > > > > sequence be possible and correct ?
> > > > > >
> > > > > > CPU 0
> > > > > >
> > > > > > read t->rcu_read_unlock_special
> > > > > > interrupt comes in, preempts. sets t->rcu_read_unlock_special
> > > > > > <preempted>
> > > > > > <scheduled back>
> > > > > > iret
> > > > > > decrement and read t->rcu_read_lock_nesting
> > > > > > test both old "special" value (which we have locally on the stack) and
> > > > > > detect that rcu_read_lock_nesting is 0.
> > > > > >
> > > > > > We actually missed a reschedule.
> > > > > >
> > > > > > I think we might need a barrier() between the t->rcu_read_lock_nesting
> > > > > > and t->rcu_read_unlock_special reads.
> > > > >
> > > > > You are correct -- I got too aggressive in eliminating synchronization.
> > > > >
> > > > > Good catch!!!
> > > > >
> > > > > I added an ACCESS_ONCE() to the second term of the "if" condition so
> > > > > that it now reads:
> > > > >
> > > > > if (--t->rcu_read_lock_nesting == 0 &&
> > > > > unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))
> > > > >
> > > > > This prevents the compiler from reordering because the ACCESS_ONCE()
> > > > > prohibits accessing t->rcu_read_unlock_special unless the value of
> > > > > t->rcu_read_lock_nesting is known to be zero.
> > > >
> > > > Hrm, --t->rcu_read_lock_nesting does not have any globally visible
> > > > side-effect, so the compiler is free to reorder the memory access across
> > > > the rcu_read_unlock_special access. I think we need the ACCESS_ONCE()
> > > > around the t->rcu_read_lock_nesting access too.
> > >
> > > Indeed, it is free to reorder that access. This has the effect of
> > > extending the scope of the RCU read-side critical section, which is
> > > harmless as long as it doesn't pull a lock or some such into it.
> > >
> >
> > So what happens if we get:
> >
> > CPU 0
> >
> > read t->rcu_read_lock_nesting
> > check if equals to 1
> > read t->rcu_read_unlock_special
> > interrupt comes in, preempts. sets t->rcu_read_unlock_special
> > <preempted>
> > <scheduled back>
> > iret
> > decrement t->rcu_read_lock_nesting
>
> Moving this down past the check of t->rcu_read_lock_special (which is
> now covered by ACCESS_ONCE()) would violate the C standard, as it would
> be equivalent to moving a volatile up past a sequence point.

Hrm, I'm not quite convinced yet. I am not concerned about gcc moving
the volatile access prior to the sequence point (as you say, this is
forbidden by the C standard), but rather that:

--(t->rcu_read_lock_nesting)

could be split in two distinct operations:

read t->rcu_read_lock_nesting
decrement t->rcu_read_lock_nesting

Note that in order to know the result required to pass the sequence
point "&&" (the test), we only need to perform the read, not the
decrement. AFAIU, gcc would be in its rights to move the
t->rcu_read_lock_nesting update after the volatile access.
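
To illustrate the concern, the split would amount to something like the
following (an editorial sketch of hypothetical compiler output; whether
gcc may actually do this is exactly the question being raised):

        int tmp = t->rcu_read_lock_nesting;     /* read only */

        if (tmp - 1 == 0 &&
            unlikely(ACCESS_ONCE(t->rcu_read_unlock_special))) {
                t->rcu_read_lock_nesting = tmp - 1;     /* update sunk below the volatile read */
                rcu_read_unlock_special(t);
        } else {
                t->rcu_read_lock_nesting = tmp - 1;
        }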

Thanks,

Mathieu

>
> Thanx, Paul
>
> > test rcu_read_unlock_special value (read prior to interrupt)
> > -> fails to notice the preemption that came in after the
> > rcu_read_unlock_special read.
> >
> > Thanks,
> >
> > Mathieu
> >
> > --
> > Mathieu Desnoyers
> > Operating System Efficiency R&D Consultant
> > EfficiOS Inc.
> > http://www.efficios.com

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-16 22:24:08

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

On Mon, Aug 16, 2010 at 06:07:05PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > On Mon, Aug 16, 2010 at 05:41:23PM -0400, Mathieu Desnoyers wrote:
> > > * Paul E. McKenney ([email protected]) wrote:
> > > > On Mon, Aug 16, 2010 at 03:19:47PM -0400, Mathieu Desnoyers wrote:
> > > > > * Paul E. McKenney ([email protected]) wrote:
> > > > > > On Mon, Aug 16, 2010 at 11:07:37AM -0400, Mathieu Desnoyers wrote:
> > > > > > > * Paul E. McKenney ([email protected]) wrote:
> > > > > > > [...]
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Tiny-preemptible RCU implementation for rcu_read_unlock().
> > > > > > > > + * Decrement ->rcu_read_lock_nesting. If the result is zero (outermost
> > > > > > > > + * rcu_read_unlock()) and ->rcu_read_unlock_special is non-zero, then
> > > > > > > > + * invoke rcu_read_unlock_special() to clean up after a context switch
> > > > > > > > + * in an RCU read-side critical section and other special cases.
> > > > > > > > + */
> > > > > > > > +void __rcu_read_unlock(void)
> > > > > > > > +{
> > > > > > > > + struct task_struct *t = current;
> > > > > > > > +
> > > > > > > > + barrier(); /* needed if we ever invoke rcu_read_unlock in rcutiny.c */
> > > > > > > > + if (--t->rcu_read_lock_nesting == 0 &&
> > > > > > > > + unlikely(t->rcu_read_unlock_special))
> > > > > >
> > > > > > First, thank you for looking this over!!!
> > > > > >
> > > > > > > Hrm I think we discussed this in a past life, but would the following
> > > > > > > sequence be possible and correct ?
> > > > > > >
> > > > > > > CPU 0
> > > > > > >
> > > > > > > read t->rcu_read_unlock_special
> > > > > > > interrupt comes in, preempts. sets t->rcu_read_unlock_special
> > > > > > > <preempted>
> > > > > > > <scheduled back>
> > > > > > > iret
> > > > > > > decrement and read t->rcu_read_lock_nesting
> > > > > > > test both old "special" value (which we have locally on the stack) and
> > > > > > > detect that rcu_read_lock_nesting is 0.
> > > > > > >
> > > > > > > We actually missed a reschedule.
> > > > > > >
> > > > > > > I think we might need a barrier() between the t->rcu_read_lock_nesting
> > > > > > > and t->rcu_read_unlock_special reads.
> > > > > >
> > > > > > You are correct -- I got too aggressive in eliminating synchronization.
> > > > > >
> > > > > > Good catch!!!
> > > > > >
> > > > > > I added an ACCESS_ONCE() to the second term of the "if" condition so
> > > > > > that it now reads:
> > > > > >
> > > > > > if (--t->rcu_read_lock_nesting == 0 &&
> > > > > > unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))
> > > > > >
> > > > > > This prevents the compiler from reordering because the ACCESS_ONCE()
> > > > > > prohibits accessing t->rcu_read_unlock_special unless the value of
> > > > > > t->rcu_read_lock_nesting is known to be zero.
> > > > >
> > > > > Hrm, --t->rcu_read_lock_nesting does not have any globally visible
> > > > > side-effect, so the compiler is free to reorder the memory access across
> > > > > the rcu_read_unlock_special access. I think we need the ACCESS_ONCE()
> > > > > around the t->rcu_read_lock_nesting access too.
> > > >
> > > > Indeed, it is free to reorder that access. This has the effect of
> > > > extending the scope of the RCU read-side critical section, which is
> > > > harmless as long as it doesn't pull a lock or some such into it.
> > > >
> > >
> > > So what happens if we get:
> > >
> > > CPU 0
> > >
> > > read t->rcu_read_lock_nesting
> > > check if equals to 1
> > > read t->rcu_read_unlock_special
> > > interrupt comes in, preempts. sets t->rcu_read_unlock_special
> > > <preempted>
> > > <scheduled back>
> > > iret
> > > decrement t->rcu_read_lock_nesting
> >
> > Moving this down past the check of t->rcu_read_lock_special (which is
> > now covered by ACCESS_ONCE()) would violate the C standard, as it would
> > be equivalent to moving a volatile up past a sequence point.
>
> Hrm, I'm not quite convinced yet. I am not concerned about gcc moving
> the volatile access prior to the sequence point (as you say, this is
> forbidden by the C standard), but rather that:
>
> --(t->rcu_read_lock_nesting)
>
> could be split in two distinct operations:
>
> read t->rcu_read_lock_nesting
> decrement t->rcu_read_lock_nesting
>
> Note that in order to know the result required to pass the sequence
> point "&&" (the test), we only need to perform the read, not the
> decrement. AFAIU, gcc would be in its rights to move the
> t->rcu_read_lock_nesting update after the volatile access.

I will run this by some compiler experts.

Thanx, Paul

2010-08-17 09:32:10

by Lai Jiangshan

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

On 08/17/2010 06:24 AM, Paul E. McKenney wrote:

>> --(t->rcu_read_lock_nesting)
>>
>> could be split in two distinct operations:
>>
>> read t->rcu_read_lock_nesting
>> decrement t->rcu_read_lock_nesting
>>
>> Note that in order to know the result required to pass the sequence
>> point "&&" (the test), we only need to perform the read, not the
>> decrement. AFAIU, gcc would be in its rights to move the
>> t->rcu_read_lock_nesting update after the volatile access.
>
> I will run this by some compiler experts.
>

We could just use separate "read" and "decrement" statements instead of
"--" to avoid depending on compiler behavior.

2010-08-17 13:27:24

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

On Mon, 2010-08-16 at 18:07 -0400, Mathieu Desnoyers wrote:

> > Moving this down past the check of t->rcu_read_lock_special (which is
> > now covered by ACCESS_ONCE()) would violate the C standard, as it would
> > be equivalent to moving a volatile up past a sequence point.
>
> Hrm, I'm not quite convinced yet. I am not concerned about gcc moving
> the volatile access prior to the sequence point (as you say, this is
> forbidden by the C standard), but rather that:
>
> --(t->rcu_read_lock_nesting)
>
> could be split in two distinct operations:
>
> read t->rcu_read_lock_nesting
> decrement t->rcu_read_lock_nesting
>
> Note that in order to know the result required to pass the sequence
> point "&&" (the test), we only need to perform the read, not the
> decrement. AFAIU, gcc would be in its rights to move the
> t->rcu_read_lock_nesting update after the volatile access.
>

If we are this concerned, what about just doing:

--t->rcu_read_lock_nesting;
if (ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
    unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))

-- Steve

2010-08-17 14:16:46

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

* Steven Rostedt ([email protected]) wrote:
> On Mon, 2010-08-16 at 18:07 -0400, Mathieu Desnoyers wrote:
>
> > > Moving this down past the check of t->rcu_read_lock_special (which is
> > > now covered by ACCESS_ONCE()) would violate the C standard, as it would
> > > be equivalent to moving a volatile up past a sequence point.
> >
> > Hrm, I'm not quite convinced yet. I am not concerned about gcc moving
> > the volatile access prior to the sequence point (as you say, this is
> > forbidden by the C standard), but rather that:
> >
> > --(t->rcu_read_lock_nesting)
> >
> > could be split in two distinct operations:
> >
> > read t->rcu_read_lock_nesting
> > decrement t->rcu_read_lock_nesting
> >
> > Note that in order to know the result required to pass the sequence
> > point "&&" (the test), we only need to perform the read, not the
> > decrement. AFAIU, gcc would be in its rights to move the
> > t->rcu_read_lock_nesting update after the volatile access.
> >
>
> If we are this concerned, what about just doing:
>
> --t->rcu_read_lock_nesting;
> if (ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
> unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))

I'd be concerned by the fact that there is no strong ordering guarantee
that the non-volatile --t->rcu_read_lock_nesting is done before
ACCESS_ONCE(t->rcu_read_unlock_special).

My concern is that the compiler might be allowed to turn your code into:

if (ACCESS_ONCE(t->rcu_read_lock_nesting) == 1 &&
    unlikely(ACCESS_ONCE(t->rcu_read_unlock_special))) {
        --t->rcu_read_lock_nesting;
        do_something();
} else
        --t->rcu_read_lock_nesting;

So whether or not this could be done by the compiler depending on the
various definitions of volatile, I strongly recommend against using
volatile accesses to provide compiler ordering guarantees. It is bad in
terms of code documentation (we don't document _what_ is ordered) and it
is also bad because the volatile ordering guarantees seems to be
very easy to misinterpret.

ACCESS_ONCE() should be only that: a macro that tells the access should
be performed only once. Why are we suddenly presuming it should have any
ordering semantic ?

It should be totally valid to create arch-specific ACCESS_ONCE() macros
that only perform the "read once", without the ordering guarantees
provided by the current ACCESS_ONCE() "volatile" implementation. The
following code is only for unsigned long, but you get the idea: there is
no volatile at all, and I ensure that "val" is only read once by using
the "+m" (val) constraint, telling the compiler (falsely) that the
assembler is modifying the value (it therefore has a side-effect), so
gcc won't be tempted to re-issue the assembly statement.

static inline unsigned long arch_access_once(unsigned long val)
{
        unsigned long ret;

#if (__BITS_PER_LONG == 32)
        asm ("movl %1,%0": "=r" (ret), "+m" (val));
#else
        asm ("movq %1,%0": "=r" (ret), "+m" (val));
#endif
        return ret;
}

Thanks,

Mathieu

>
> -- Steve
>
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-17 14:36:35

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

On Tue, Aug 17, 2010 at 05:36:13PM +0800, Lai Jiangshan wrote:
> On 08/17/2010 06:24 AM, Paul E. McKenney wrote:
>
> >> --(t->rcu_read_lock_nesting)
> >>
> >> could be split in two distinct operations:
> >>
> >> read t->rcu_read_lock_nesting
> >> decrement t->rcu_read_lock_nesting
> >>
> >> Note that in order to know the result required to pass the sequence
> >> point "&&" (the test), we only need to perform the read, not the
> >> decrement. AFAIU, gcc would be in its rights to move the
> >> t->rcu_read_lock_nesting update after the volatile access.
> >
> > I will run this by some compiler experts.
>
> We can just use "read and decrement statements" instead of "--" to
> avoid dependency from compilers.

You mean something like local_add_return()? This turns into
atomic_add_return() on many platforms, including x86 as it turns out,
so I am reluctant to use it.

If you had something else in mind, please don't keep it a secret!

Thanx, Paul

2010-08-17 14:54:45

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

On Tue, 2010-08-17 at 10:16 -0400, Mathieu Desnoyers wrote:
> * Steven Rostedt ([email protected]) wrote:

> > If we are this concerned, what about just doing:
> >
> > --t->rcu_read_lock_nesting;
> > if (ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
> > unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))
>
> I'd be concerned by the fact that there is no strong ordering guarantee
> that the non-volatile --t->rcu_read_lock_nesting is done before
> ACCESS_ONCE(t->rcu_read_unlock_special).
>
> My concern is that the compiler might be allowed to turn your code into:
>
> if (ACCESS_ONCE(t->rcu_read_lock_nesting) == 1 &&
> unlikely((ACCESS_ONCE(t->rcu_read_unlock_special))) {
> --t->rcu_read_lock_nesting;
> do_something();
> } else
> --t->rcu_read_lock_nesting;


That just seems to break all sorts of rules.

>
> So whether or not this could be done by the compiler depending on the
> various definitions of volatile, I strongly recommend against using
> volatile accesses to provide compiler ordering guarantees. It is bad in
> terms of code documentation (we don't document _what_ is ordered) and it
> is also bad because the volatile ordering guarantees seems to be
> very easy to misinterpret.

Yes, volatile does not guarantee ordering of other accesses, but it
should at least guarantee ordering of access to the thing that is
volatile.

b++;
a++;
c = ACCESS_ONCE(a);

'b++' can be moved to anywhere. But I'm pretty sure the compiler is not
allowed to move the 'a++' after the ACCESS_ONCE(a) because it is the
thing that is volatile. We are telling the compiler that 'a' can change
outside our scope, which to me is the same as doing:

a++;
c = some_global_function(&a);

Here, the compiler does not know the result of 'a' and cannot move the
'a++'.


Maybe I'm wrong, and need to verify this with a compiler expert. But
what's the use of volatile if it can't protect the ordering of what is
volatile from itself.

>
> ACCESS_ONCE() should be only that: a macro that tells the access should
> be performed only once. Why are we suddenly presuming it should have any
> ordering semantic ?

Only ordering with the variable that is volatile. It has no ordering to
any other variable.

>
> It should be totally valid to create arch-specific ACCESS_ONCE() macros
> that only perform the "read once", without the ordering guarantees
> provided by the current ACCESS_ONCE() "volatile" implementation. The
> following code is only for unsigned long, but you get the idea: there is
> no volatile at all, and I ensure that "val" is only read once by using
> the "+m" (val) constraint, telling the compiler (falsely) that the
> assembler is modifying the value (it therefore has a side-effect), so
> gcc won't be tempted to re-issue the assembly statement.
>
> static inline unsigned long arch_access_once(unsigned long val)
> {
> unsigned long ret;
>
> #if (__BITS_PER_LONG == 32)
> asm ("movl %1,%0": "=r" (ret), "+m" (val));
> #else
> asm ("movq %1,%0": "=r" (ret), "+m" (val));
> #endif
> }

Heck, this is too much micro-optimization. We could just be safe and do
the:
--t->rcu_read_lock_nesting;
barrier();
if (ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
    unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))

And be done with it.

-- Steve

2010-08-17 16:00:38

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

* Steven Rostedt ([email protected]) wrote:
> On Tue, 2010-08-17 at 10:16 -0400, Mathieu Desnoyers wrote:
> > * Steven Rostedt ([email protected]) wrote:
>
> > > If we are this concerned, what about just doing:
> > >
> > > --t->rcu_read_lock_nesting;
> > > if (ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
> > > unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))
> >
> > I'd be concerned by the fact that there is no strong ordering guarantee
> > that the non-volatile --t->rcu_read_lock_nesting is done before
> > ACCESS_ONCE(t->rcu_read_unlock_special).
> >
> > My concern is that the compiler might be allowed to turn your code into:
> >
> > if (ACCESS_ONCE(t->rcu_read_lock_nesting) == 1 &&
> > unlikely((ACCESS_ONCE(t->rcu_read_unlock_special))) {
> > --t->rcu_read_lock_nesting;
> > do_something();
> > } else
> > --t->rcu_read_lock_nesting;
>
>
> That just seems to break all sorts of rules.

Is there a rule stating that a non-volatile access is ensured to be
issued in program order wrt other accesses to that same variable ?

The standard discourages compilers from doing any kind of optimization
when volatile is detected on a variable
(http://gcc.gnu.org/onlinedocs/gcc/Volatiles.html), but that's a very
weak guarantee I would not like to rely on.

The only strong guarantee it provides is:
"The minimum either standard specifies is that at a sequence point all
previous accesses to volatile objects have stabilized and no subsequent
accesses have occurred."

So the question here is: given that the "--t->rcu_read_lock_nesting"
access is not marked volatile, but the following
"ACCESS_ONCE(t->rcu_read_lock_nesting)" read is volatile, should we
consider that "--t->rcu_read_lock_nesting" applies to a volatile object
or not ?

This is the kind of grey zone I dislike.

>
> >
> > So whether or not this could be done by the compiler depending on the
> > various definitions of volatile, I strongly recommend against using
> > volatile accesses to provide compiler ordering guarantees. It is bad in
> > terms of code documentation (we don't document _what_ is ordered) and it
> > is also bad because the volatile ordering guarantees seems to be
> > very easy to misinterpret.
>
> Yes, volatile does not guarantee ordering of other accesses, but it
> should at least guarantee ordering of access to the thing that is
> volatile.
>
> b++;
> a++;
> c = ACCESS_ONCE(a);
>
> 'b++' can be moved to anywhere. But I'm pretty sure the compiler is not
> allowed to move the 'a++' after the ACCESS_ONCE(a) because it is the
> thing that is volatile.

AFAIU, the standard only requires ordering between volatile "objects".
So when we cast non-volatile objects to volatile, I assume the
non-volatile accesses apply only to the non-volatile version of the
object. So volatile ordering guarantees would not apply to "a".

> We are telling the compiler that 'a' can change
> outside our scope, which to me is the same as doing:
>
> a++;
> c = some_global_function(&a);
>
> Where, the compiler does not know the result of 'a' and can not move the
> 'a++'.

The code here is different: calling an external function is equivalent
to inserting a full compiler memory barrier ("memory" clobber). Volatile is
quite different from that; the compiler is only told to keep ordering
between volatile accesses, and to try not to optimize the volatile
access per se.

>
>
> Maybe I'm wrong, and need to verify this with a compiler expert. But
> what's the use of volatile if it can't protect the ordering of what is
> volatile from itself.

I'm concerned about the fact that volatile seems to have been initially
designed to apply to a variable declaration (object) rather than a
specific access through a cast. So I really wonder if the non-volatile
object accesses have the same guarantees as the accesses performed on its
volatile-cast version.

>
> >
> > ACCESS_ONCE() should be only that: a macro that tells the access should
> > be performed only once. Why are we suddenly presuming it should have any
> > ordering semantic ?
>
> Only ordering with the variable that is volatile. It has no ordering to
> any other variable.

This is not quite correct. Volatile orders with respect to _all_ other
volatile accesses.

What I am pointing out here is that the macro name "ACCESS_ONCE()" does
not convey any ordering semantic, and should not.

>
> >
> > It should be totally valid to create arch-specific ACCESS_ONCE() macros
> > that only perform the "read once", without the ordering guarantees
> > provided by the current ACCESS_ONCE() "volatile" implementation. The
> > following code is only for unsigned long, but you get the idea: there is
> > no volatile at all, and I ensure that "val" is only read once by using
> > the "+m" (val) constraint, telling the compiler (falsely) that the
> > assembler is modifying the value (it therefore has a side-effect), so
> > gcc won't be tempted to re-issue the assembly statement.
> >
> > static inline unsigned long arch_access_once(unsigned long val)
> > {
> > unsigned long ret;
> >
> > #if (__BITS_PER_LONG == 32)
> > asm ("movl %1,%0": "=r" (ret), "+m" (val));
> > #else
> > asm ("movq %1,%0": "=r" (ret), "+m" (val));
> > #endif
> > }
>
> Heck, this is too much micro optimization. We could just be safe and do
> the:
> --t->rcu_read_lock_nesting;
> barrier();
> if (ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
> unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))
>
> And be done with it.

Then we could go for the simpler:

--t->rcu_read_lock_nesting;
barrier();
if (t->rcu_read_lock_nesting == 0 &&
    unlikely(t->rcu_read_unlock_special))

Which puts a constraint across all memory accesses. I'd be fine with
that if you are afraid of too much micro-optimization (e.g. my
barrier2(a, b) proposal).

Thanks,

Mathieu


>
> -- Steve
>
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-17 16:04:56

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

On Tue, 2010-08-17 at 11:55 -0400, Mathieu Desnoyers wrote:

> > Heck, this is too much micro optimization. We could just be safe and do
> > the:
> > --t->rcu_read_lock_nesting;
> > barrier();
> > if (ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
> > unlikely((ACCESS_ONCE(t->rcu_read_unlock_special)))
> >
> > And be done with it.
>
> Then we could go for the simpler:
>
> --t->rcu_read_lock_nesting;
> barrier();
> if (t->rcu_read_lock_nesting == 0 &&
> unlikely((t->rcu_read_unlock_special))

Yeah, that's what I meant, I was too lazy to remove the ACCESS_ONCE()
from the cut and paste I did.

>
> Which puts a constraint across all memory accesses. I'd be fine with
> that if you are afraid of too much micro-optimization (e.g. my
> barrier2(a, b) proposal).

Not afraid, but just too much code for a simple solution.

-- Steve

2010-08-17 16:06:43

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

On Tue, 2010-08-17 at 12:04 -0400, Steven Rostedt wrote:

> > Then we could go for the simpler:
> >
> > --t->rcu_read_lock_nesting;
> > barrier();
> > if (t->rcu_read_lock_nesting == 0 &&
> > unlikely((t->rcu_read_unlock_special))
>
> Yeah, that's what I meant, I was too lazy to remove the ACCESS_ONCE()
> from the cut and paste I did.
>
> >
> > Which puts a constraint across all memory accesses. I'd be fine with
> > that if you are afraid of too much micro-optimization (e.g. my
> > barrier2(a, b) proposal).
>
> Not afraid, but just too much code for a simple solution.

IOW,

I think just pulling out the '--' and adding the barrier() is the proper
solution here. Compiler barriers are rather cheap.

Can we all agree on this solution?

-- Steve

2010-08-17 16:30:39

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

* Steven Rostedt ([email protected]) wrote:
> On Tue, 2010-08-17 at 12:04 -0400, Steven Rostedt wrote:
>
> > > Then we could go for the simpler:
> > >
> > > --t->rcu_read_lock_nesting;
> > > barrier();
> > > if (t->rcu_read_lock_nesting == 0 &&
> > > unlikely((t->rcu_read_unlock_special))
> >
> > Yeah, that's what I meant, I was too lazy to remove the ACCESS_ONCE()
> > from the cut and paste I did.
> >
> > >
> > > Which puts a constraint across all memory accesses. I'd be fine with
> > > that if you are afraid of too much micro-optimization (e.g. my
> > > barrier2(a, b) proposal).
> >
> > Not afraid, but just too much code for a simple solution.
>
> IOW,
>
> I think just pulling out the '--' and adding the barrier() is the proper
> solution here. Compiler barriers are rather cheap.
>
> Can we all agree on this solution?

Given that we already have a barrier() at the beginning of
rcu_read_unlock(), adding a second one will not have much more global
optimisation impact than what is already there. I'm personally fine with
this solution. Let's see what others have to say about this.

Thanks,

Mathieu


--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-17 19:34:07

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

On Tue, Aug 17, 2010 at 12:25:25PM -0400, Mathieu Desnoyers wrote:
> * Steven Rostedt ([email protected]) wrote:
> > On Tue, 2010-08-17 at 12:04 -0400, Steven Rostedt wrote:
> >
> > > > Then we could go for the simpler:
> > > >
> > > > --t->rcu_read_lock_nesting;
> > > > barrier();
> > > > if (t->rcu_read_lock_nesting == 0 &&
> > > > unlikely((t->rcu_read_unlock_special))
> > >
> > > Yeah, that's what I meant, I was too lazy to remove the ACCESS_ONCE()
> > > from the cut and paste I did.
> > >
> > > >
> > > > Which puts a constraint across all memory accesses. I'd be fine with
> > > > that if you are afraid of too much micro-optimization (e.g. my
> > > > barrier2(a, b) proposal).
> > >
> > > Not afraid, but just too much code for a simple solution.
> >
> > IOW,
> >
> > I think just pulling out the '--' and adding the barrier() is the proper
> > solution here. Compiler barriers are rather cheap.
> >
> > Can we all agree on this solution?
>
> Given that we already have a barrier() at the beginning of
> rcu_read_unlock(), adding a second one will not have much more global
> optimisation impact than what is already there. I'm personally fine with
> this solution. Let's see what others have to say about this.

Thank you both for the optimization work -- the read-side primitives do
need to be fast. And the barrier() approach generates decent code, on
some systems better than the original. So the second barrier wins. ;-)

Thanx, Paul

2010-08-17 20:00:56

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 08/10] rcu: Add a TINY_PREEMPT_RCU

On Tue, Aug 17, 2010 at 12:33:58PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 17, 2010 at 12:25:25PM -0400, Mathieu Desnoyers wrote:
> > * Steven Rostedt ([email protected]) wrote:
> > > On Tue, 2010-08-17 at 12:04 -0400, Steven Rostedt wrote:
> > >
> > > > > Then we could go for the simpler:
> > > > >
> > > > > --t->rcu_read_lock_nesting;
> > > > > barrier();
> > > > > if (t->rcu_read_lock_nesting == 0 &&
> > > > > unlikely((t->rcu_read_unlock_special))
> > > >
> > > > Yeah, that's what I meant, I was too lazy to remove the ACCESS_ONCE()
> > > > from the cut and paste I did.
> > > >
> > > > >
> > > > > Which puts a constraint across all memory accesses. I'd be fine with
> > > > > that if you are afraid of too much micro-optimization (e.g. my
> > > > > barrier2(a, b) proposal).
> > > >
> > > > Not afraid, but just too much code for a simple solution.
> > >
> > > IOW,
> > >
> > > I think just pulling out the '--' and adding the barrier() is the proper
> > > solution here. Compiler barriers are rather cheap.
> > >
> > > Can we all agree on this solution?
> >
> > Given that we already have a barrier() at the beginning of
> > rcu_read_unlock(), adding a second one will not have much more global
> > optimisation impact than what is already there. I'm personally fine with
> > this solution. Let's see what others have to say about this.
>
> Thank you both for the optimization work -- the read-side primitives do
> need to be fast. And the barrier() approach generates decent code, on
> some systems better than the original. So the second barrier wins. ;-)

And here is the updated version of this patch.

Thanx, Paul

------------------------------------------------------------------------

commit e4fb476c3a41e0c1dde1f8e303a328623c9eaa92
Author: Paul E. McKenney <[email protected]>
Date: Tue Jun 29 16:49:16 2010 -0700

rcu: Add a TINY_PREEMPT_RCU

Implement a small-memory-footprint uniprocessor-only implementation of
preemptible RCU. This implementation uses but a single blocked-tasks
list rather than the combinatorial number used per leaf rcu_node by
TREE_PREEMPT_RCU, which reduces memory consumption and greatly simplifies
processing. This version also takes advantage of uniprocessor execution
to accelerate grace periods in the case where there are no readers.

The general design is otherwise broadly similar to that of TREE_PREEMPT_RCU.

This implementation is a step towards having RCU implementation driven
off of the SMP and PREEMPT kernel configuration variables, which can
happen once this implementation has accumulated sufficient experience.

Removed ACCESS_ONCE() from __rcu_read_unlock() and added barrier() as
suggested by Steve Rostedt in order to avoid the compiler-reordering
issue noted by Mathieu Desnoyers (http://lkml.org/lkml/2010/8/16/183).

Requested-by: Loïc Minier <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index d5b3876..1f4517d 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -139,7 +139,7 @@ static inline void account_system_vtime(struct task_struct *tsk)
#endif

#if defined(CONFIG_NO_HZ)
-#if defined(CONFIG_TINY_RCU)
+#if defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)
extern void rcu_enter_nohz(void);
extern void rcu_exit_nohz(void);

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 6460fc6..2fea6c8 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -82,11 +82,17 @@ extern struct group_info init_groups;
# define CAP_INIT_BSET CAP_FULL_SET

#ifdef CONFIG_TREE_PREEMPT_RCU
+#define INIT_TASK_RCU_TREE_PREEMPT() \
+ .rcu_blocked_node = NULL,
+#else
+#define INIT_TASK_RCU_TREE_PREEMPT(tsk)
+#endif
+#ifdef CONFIG_PREEMPT_RCU
#define INIT_TASK_RCU_PREEMPT(tsk) \
.rcu_read_lock_nesting = 0, \
.rcu_read_unlock_special = 0, \
- .rcu_blocked_node = NULL, \
- .rcu_node_entry = LIST_HEAD_INIT(tsk.rcu_node_entry),
+ .rcu_node_entry = LIST_HEAD_INIT(tsk.rcu_node_entry), \
+ INIT_TASK_RCU_TREE_PREEMPT()
#else
#define INIT_TASK_RCU_PREEMPT(tsk)
#endif
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 27b44b3..24b8966 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -58,7 +58,6 @@ struct rcu_head {
};

/* Exported common interfaces */
-extern void rcu_barrier(void);
extern void rcu_barrier_bh(void);
extern void rcu_barrier_sched(void);
extern void synchronize_sched_expedited(void);
@@ -69,7 +68,7 @@ extern void rcu_init(void);

#if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU)
#include <linux/rcutree.h>
-#elif defined(CONFIG_TINY_RCU)
+#elif defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)
#include <linux/rcutiny.h>
#else
#error "Unknown RCU implementation specified to kernel configuration"
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index e2e8931..4cc5eba 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -29,66 +29,51 @@

void rcu_sched_qs(int cpu);
void rcu_bh_qs(int cpu);
-static inline void rcu_note_context_switch(int cpu)
-{
- rcu_sched_qs(cpu);
-}

+#ifdef CONFIG_TINY_RCU
#define __rcu_read_lock() preempt_disable()
#define __rcu_read_unlock() preempt_enable()
+#else /* #ifdef CONFIG_TINY_RCU */
+void __rcu_read_lock(void);
+void __rcu_read_unlock(void);
+#endif /* #else #ifdef CONFIG_TINY_RCU */
#define __rcu_read_lock_bh() local_bh_disable()
#define __rcu_read_unlock_bh() local_bh_enable()
-#define call_rcu_sched call_rcu
+extern void call_rcu_sched(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu));

#define rcu_init_sched() do { } while (0)
-extern void rcu_check_callbacks(int cpu, int user);

-static inline int rcu_needs_cpu(int cpu)
-{
- return 0;
-}
+extern void synchronize_sched(void);

-/*
- * Return the number of grace periods.
- */
-static inline long rcu_batches_completed(void)
-{
- return 0;
-}
+#ifdef CONFIG_TINY_RCU

-/*
- * Return the number of bottom-half grace periods.
- */
-static inline long rcu_batches_completed_bh(void)
-{
- return 0;
-}
+#define call_rcu call_rcu_sched

-static inline void rcu_force_quiescent_state(void)
+static inline void synchronize_rcu(void)
{
+ synchronize_sched();
}

-static inline void rcu_bh_force_quiescent_state(void)
+static inline void synchronize_rcu_expedited(void)
{
+ synchronize_sched(); /* Only one CPU, so pretty fast anyway!!! */
}

-static inline void rcu_sched_force_quiescent_state(void)
+static inline void rcu_barrier(void)
{
+ rcu_barrier_sched(); /* Only one CPU, so only one list of callbacks! */
}

-extern void synchronize_sched(void);
+#else /* #ifdef CONFIG_TINY_RCU */

-static inline void synchronize_rcu(void)
-{
- synchronize_sched();
-}
+void synchronize_rcu(void);
+void rcu_barrier(void);
+void synchronize_rcu_expedited(void);

-static inline void synchronize_rcu_bh(void)
-{
- synchronize_sched();
-}
+#endif /* #else #ifdef CONFIG_TINY_RCU */

-static inline void synchronize_rcu_expedited(void)
+static inline void synchronize_rcu_bh(void)
{
synchronize_sched();
}
@@ -117,15 +102,82 @@ static inline void rcu_exit_nohz(void)

#endif /* #else #ifdef CONFIG_NO_HZ */

+#ifdef CONFIG_TINY_RCU
+
+static inline void rcu_preempt_note_context_switch(void)
+{
+}
+
static inline void exit_rcu(void)
{
}

+static inline int rcu_needs_cpu(int cpu)
+{
+ return 0;
+}
+
static inline int rcu_preempt_depth(void)
{
return 0;
}

+#else /* #ifdef CONFIG_TINY_RCU */
+
+void rcu_preempt_note_context_switch(void);
+extern void exit_rcu(void);
+int rcu_preempt_needs_cpu(void);
+
+static inline int rcu_needs_cpu(int cpu)
+{
+ return rcu_preempt_needs_cpu();
+}
+
+/*
+ * Defined as macro as it is a very low level header
+ * included from areas that don't even know about current
+ * FIXME: combine with include/linux/rcutree.h into rcupdate.h.
+ */
+#define rcu_preempt_depth() (current->rcu_read_lock_nesting)
+
+#endif /* #else #ifdef CONFIG_TINY_RCU */
+
+static inline void rcu_note_context_switch(int cpu)
+{
+ rcu_sched_qs(cpu);
+ rcu_preempt_note_context_switch();
+}
+
+extern void rcu_check_callbacks(int cpu, int user);
+
+/*
+ * Return the number of grace periods.
+ */
+static inline long rcu_batches_completed(void)
+{
+ return 0;
+}
+
+/*
+ * Return the number of bottom-half grace periods.
+ */
+static inline long rcu_batches_completed_bh(void)
+{
+ return 0;
+}
+
+static inline void rcu_force_quiescent_state(void)
+{
+}
+
+static inline void rcu_bh_force_quiescent_state(void)
+{
+}
+
+static inline void rcu_sched_force_quiescent_state(void)
+{
+}
+
#ifdef CONFIG_DEBUG_LOCK_ALLOC

extern int rcu_scheduler_active __read_mostly;
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index c0ed1c0..c13b85d 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -95,6 +95,8 @@ static inline void synchronize_rcu_bh_expedited(void)
synchronize_sched_expedited();
}

+extern void rcu_barrier(void);
+
extern void rcu_check_callbacks(int cpu, int user);

extern long rcu_batches_completed(void);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9a2d9a0..87ba750 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1211,11 +1211,13 @@ struct task_struct {
unsigned int policy;
cpumask_t cpus_allowed;

-#ifdef CONFIG_TREE_PREEMPT_RCU
+#ifdef CONFIG_PREEMPT_RCU
int rcu_read_lock_nesting;
char rcu_read_unlock_special;
- struct rcu_node *rcu_blocked_node;
struct list_head rcu_node_entry;
+#endif /* #ifdef CONFIG_PREEMPT_RCU */
+#ifdef CONFIG_TREE_PREEMPT_RCU
+ struct rcu_node *rcu_blocked_node;
#endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */

#if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
@@ -1748,7 +1750,7 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
#define used_math() tsk_used_math(current)

-#ifdef CONFIG_TREE_PREEMPT_RCU
+#ifdef CONFIG_PREEMPT_RCU

#define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
#define RCU_READ_UNLOCK_NEED_QS (1 << 1) /* RCU core needs CPU response. */
@@ -1757,7 +1759,9 @@ static inline void rcu_copy_process(struct task_struct *p)
{
p->rcu_read_lock_nesting = 0;
p->rcu_read_unlock_special = 0;
+#ifdef CONFIG_TREE_PREEMPT_RCU
p->rcu_blocked_node = NULL;
+#endif
INIT_LIST_HEAD(&p->rcu_node_entry);
}

diff --git a/init/Kconfig b/init/Kconfig
index 7faa5d8..a2ac4ce 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -344,7 +344,7 @@ config TREE_RCU
smaller systems.

config TREE_PREEMPT_RCU
- bool "Preemptable tree-based hierarchical RCU"
+ bool "Preemptible tree-based hierarchical RCU"
depends on PREEMPT
help
This option selects the RCU implementation that is
@@ -362,8 +362,22 @@ config TINY_RCU
is not required. This option greatly reduces the
memory footprint of RCU.

+config TINY_PREEMPT_RCU
+ bool "Preemptible UP-only small-memory-footprint RCU"
+ depends on !SMP && PREEMPT
+ help
+ This option selects the RCU implementation that is designed
+ for real-time UP systems. This option greatly reduces the
+ memory footprint of RCU.
+
endchoice

+config PREEMPT_RCU
+ def_bool ( TREE_PREEMPT_RCU || TINY_PREEMPT_RCU )
+ help
+ This option enables preemptible-RCU code that is common between
+ the TREE_PREEMPT_RCU and TINY_PREEMPT_RCU implementations.
+
config RCU_TRACE
bool "Enable tracing for RCU"
depends on TREE_RCU || TREE_PREEMPT_RCU
diff --git a/kernel/Makefile b/kernel/Makefile
index 057472f..75b5d5c 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -85,6 +85,7 @@ obj-$(CONFIG_TREE_RCU) += rcutree.o
obj-$(CONFIG_TREE_PREEMPT_RCU) += rcutree.o
obj-$(CONFIG_TREE_RCU_TRACE) += rcutree_trace.o
obj-$(CONFIG_TINY_RCU) += rcutiny.o
+obj-$(CONFIG_TINY_PREEMPT_RCU) += rcutiny.o
obj-$(CONFIG_RELAY) += relay.o
obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index 196ec02..d806735 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -59,6 +59,14 @@ int rcu_scheduler_active __read_mostly;
EXPORT_SYMBOL_GPL(rcu_scheduler_active);
#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */

+/* Forward declarations for rcutiny_plugin.h. */
+static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp);
+static void __call_rcu(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu),
+ struct rcu_ctrlblk *rcp);
+
+#include "rcutiny_plugin.h"
+
#ifdef CONFIG_NO_HZ

static long rcu_dynticks_nesting = 1;
@@ -140,6 +148,7 @@ void rcu_check_callbacks(int cpu, int user)
rcu_sched_qs(cpu);
else if (!in_softirq())
rcu_bh_qs(cpu);
+ rcu_preempt_check_callbacks();
}

/*
@@ -162,6 +171,7 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
*rcp->donetail = NULL;
if (rcp->curtail == rcp->donetail)
rcp->curtail = &rcp->rcucblist;
+ rcu_preempt_remove_callbacks(rcp);
rcp->donetail = &rcp->rcucblist;
local_irq_restore(flags);

@@ -182,6 +192,7 @@ static void rcu_process_callbacks(struct softirq_action *unused)
{
__rcu_process_callbacks(&rcu_sched_ctrlblk);
__rcu_process_callbacks(&rcu_bh_ctrlblk);
+ rcu_preempt_process_callbacks();
}

/*
@@ -223,15 +234,15 @@ static void __call_rcu(struct rcu_head *head,
}

/*
- * Post an RCU callback to be invoked after the end of an RCU grace
+ * Post an RCU callback to be invoked after the end of an RCU-sched grace
* period. But since we have but one CPU, that would be after any
* quiescent state.
*/
-void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+void call_rcu_sched(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
{
__call_rcu(head, func, &rcu_sched_ctrlblk);
}
-EXPORT_SYMBOL_GPL(call_rcu);
+EXPORT_SYMBOL_GPL(call_rcu_sched);

/*
* Post an RCU bottom-half callback to be invoked after any subsequent
@@ -243,20 +254,6 @@ void call_rcu_bh(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
}
EXPORT_SYMBOL_GPL(call_rcu_bh);

-void rcu_barrier(void)
-{
- struct rcu_synchronize rcu;
-
- init_rcu_head_on_stack(&rcu.head);
- init_completion(&rcu.completion);
- /* Will wake me after RCU finished. */
- call_rcu(&rcu.head, wakeme_after_rcu);
- /* Wait for it. */
- wait_for_completion(&rcu.completion);
- destroy_rcu_head_on_stack(&rcu.head);
-}
-EXPORT_SYMBOL_GPL(rcu_barrier);
-
void rcu_barrier_bh(void)
{
struct rcu_synchronize rcu;
@@ -289,5 +286,3 @@ void __init rcu_init(void)
{
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
}
-
-#include "rcutiny_plugin.h"
diff --git a/kernel/rcutiny_plugin.h b/kernel/rcutiny_plugin.h
index d223a92..e6bc1b4 100644
--- a/kernel/rcutiny_plugin.h
+++ b/kernel/rcutiny_plugin.h
@@ -1,7 +1,7 @@
/*
- * Read-Copy Update mechanism for mutual exclusion (tree-based version)
+ * Read-Copy Update mechanism for mutual exclusion, the Bloatwatch edition
* Internal non-public definitions that provide either classic
- * or preemptable semantics.
+ * or preemptible semantics.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -17,11 +17,587 @@
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*
- * Copyright IBM Corporation, 2009
+ * Copyright (c) 2010 Linaro
*
* Author: Paul E. McKenney <[email protected]>
*/

+#ifdef CONFIG_TINY_PREEMPT_RCU
+
+#include <linux/delay.h>
+
+/* FIXME: merge with definitions in kernel/rcutree.h. */
+#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
+#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))
+
+/* Global control variables for preemptible RCU. */
+struct rcu_preempt_ctrlblk {
+ struct rcu_ctrlblk rcb; /* curtail: ->next ptr of last CB for GP. */
+ struct rcu_head **nexttail;
+ /* Tasks blocked in a preemptible RCU */
+ /* read-side critical section while an */
+ /* preemptible-RCU grace period is in */
+ /* progress must wait for a later grace */
+ /* period. This pointer points to the */
+ /* ->next pointer of the last task that */
+ /* must wait for a later grace period, or */
+ /* to &->rcb.rcucblist if there is no */
+ /* such task. */
+ struct list_head blkd_tasks;
+ /* Tasks blocked in RCU read-side critical */
+ /* section. Tasks are placed at the head */
+ /* of this list and age towards the tail. */
+ struct list_head *gp_tasks;
+ /* Pointer to the first task blocking the */
+ /* current grace period, or NULL if there */
+ /* is no such task. */
+ struct list_head *exp_tasks;
+ /* Pointer to first task blocking the */
+ /* current expedited grace period, or NULL */
+ /* if there is no such task. If there */
+ /* is no current expedited grace period, */
+ /* then there cannot be any such task. */
+ u8 gpnum; /* Current grace period. */
+ u8 gpcpu; /* Last grace period blocked by the CPU. */
+ u8 completed; /* Last grace period completed. */
+ /* If all three are equal, RCU is idle. */
+};
+
+static struct rcu_preempt_ctrlblk rcu_preempt_ctrlblk = {
+ .rcb.donetail = &rcu_preempt_ctrlblk.rcb.rcucblist,
+ .rcb.curtail = &rcu_preempt_ctrlblk.rcb.rcucblist,
+ .nexttail = &rcu_preempt_ctrlblk.rcb.rcucblist,
+ .blkd_tasks = LIST_HEAD_INIT(rcu_preempt_ctrlblk.blkd_tasks),
+};
+
+static int rcu_preempted_readers_exp(void);
+static void rcu_report_exp_done(void);
+
+/*
+ * Return true if the CPU has not yet responded to the current grace period.
+ */
+static int rcu_cpu_cur_gp(void)
+{
+ return rcu_preempt_ctrlblk.gpcpu != rcu_preempt_ctrlblk.gpnum;
+}
+
+/*
+ * Check for a running RCU reader. Because there is only one CPU,
+ * there can be but one running RCU reader at a time. ;-)
+ */
+static int rcu_preempt_running_reader(void)
+{
+ return current->rcu_read_lock_nesting;
+}
+
+/*
+ * Check for preempted RCU readers blocking any grace period.
+ * If the caller needs a reliable answer, it must disable hard irqs.
+ */
+static int rcu_preempt_blocked_readers_any(void)
+{
+ return !list_empty(&rcu_preempt_ctrlblk.blkd_tasks);
+}
+
+/*
+ * Check for preempted RCU readers blocking the current grace period.
+ * If the caller needs a reliable answer, it must disable hard irqs.
+ */
+static int rcu_preempt_blocked_readers_cgp(void)
+{
+ return rcu_preempt_ctrlblk.gp_tasks != NULL;
+}
+
+/*
+ * Return true if another preemptible-RCU grace period is needed.
+ */
+static int rcu_preempt_needs_another_gp(void)
+{
+ return *rcu_preempt_ctrlblk.rcb.curtail != NULL;
+}
+
+/*
+ * Return true if a preemptible-RCU grace period is in progress.
+ * The caller must disable hardirqs.
+ */
+static int rcu_preempt_gp_in_progress(void)
+{
+ return rcu_preempt_ctrlblk.completed != rcu_preempt_ctrlblk.gpnum;
+}
+
+/*
+ * Record a preemptible-RCU quiescent state for the specified CPU. Note
+ * that this just means that the task currently running on the CPU is
+ * in a quiescent state. There might be any number of tasks blocked
+ * while in an RCU read-side critical section.
+ *
+ * Unlike the other rcu_*_qs() functions, callers to this function
+ * must disable irqs in order to protect the assignment to
+ * ->rcu_read_unlock_special.
+ *
+ * Because this is a single-CPU implementation, the only way a grace
+ * period can end is if the CPU is in a quiescent state. The reason is
+ * that a blocked preemptible-RCU reader can exit its critical section
+ * only if the CPU is running it at the time. Therefore, when the
+ * last task blocking the current grace period exits its RCU read-side
+ * critical section, neither the CPU nor blocked tasks will be stopping
+ * the current grace period. (In contrast, SMP implementations
+ * might have CPUs running in RCU read-side critical sections that
+ * block later grace periods -- but this is not possible given only
+ * one CPU.)
+ */
+static void rcu_preempt_cpu_qs(void)
+{
+ /* Record both CPU and task as having responded to current GP. */
+ rcu_preempt_ctrlblk.gpcpu = rcu_preempt_ctrlblk.gpnum;
+ current->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
+
+ /*
+ * If there is no GP, or if blocked readers are still blocking GP,
+ * then there is nothing more to do.
+ */
+ if (!rcu_preempt_gp_in_progress() || rcu_preempt_blocked_readers_cgp())
+ return;
+
+ /* Advance callbacks. */
+ rcu_preempt_ctrlblk.completed = rcu_preempt_ctrlblk.gpnum;
+ rcu_preempt_ctrlblk.rcb.donetail = rcu_preempt_ctrlblk.rcb.curtail;
+ rcu_preempt_ctrlblk.rcb.curtail = rcu_preempt_ctrlblk.nexttail;
+
+ /* If there are no blocked readers, next GP is done instantly. */
+ if (!rcu_preempt_blocked_readers_any())
+ rcu_preempt_ctrlblk.rcb.donetail = rcu_preempt_ctrlblk.nexttail;
+
+ /* If there are done callbacks, make RCU_SOFTIRQ process them. */
+ if (*rcu_preempt_ctrlblk.rcb.donetail != NULL)
+ raise_softirq(RCU_SOFTIRQ);
+}
+
+/*
+ * Start a new RCU grace period if warranted. Hard irqs must be disabled.
+ */
+static void rcu_preempt_start_gp(void)
+{
+ if (!rcu_preempt_gp_in_progress() && rcu_preempt_needs_another_gp()) {
+
+ /* Official start of GP. */
+ rcu_preempt_ctrlblk.gpnum++;
+
+ /* Any blocked RCU readers block new GP. */
+ if (rcu_preempt_blocked_readers_any())
+ rcu_preempt_ctrlblk.gp_tasks =
+ rcu_preempt_ctrlblk.blkd_tasks.next;
+
+ /* If there is no running reader, CPU is done with GP. */
+ if (!rcu_preempt_running_reader())
+ rcu_preempt_cpu_qs();
+ }
+}
+
+/*
+ * We have entered the scheduler, and the current task might soon be
+ * context-switched away from. If this task is in an RCU read-side
+ * critical section, we will no longer be able to rely on the CPU to
+ * record that fact, so we enqueue the task on the blkd_tasks list.
+ * If the task started after the current grace period began, as recorded
+ * by ->gpcpu, we enqueue at the beginning of the list. Otherwise
+ * before the element referenced by ->gp_tasks (or at the tail if
+ * ->gp_tasks is NULL) and point ->gp_tasks at the newly added element.
+ * The task will dequeue itself when it exits the outermost enclosing
+ * RCU read-side critical section. Therefore, the current grace period
+ * cannot be permitted to complete until the ->gp_tasks pointer becomes
+ * NULL.
+ *
+ * Caller must disable preemption.
+ */
+void rcu_preempt_note_context_switch(void)
+{
+ struct task_struct *t = current;
+ unsigned long flags;
+
+ local_irq_save(flags); /* must exclude scheduler_tick(). */
+ if (rcu_preempt_running_reader() &&
+ (t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {
+
+ /* Possibly blocking in an RCU read-side critical section. */
+ t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BLOCKED;
+
+ /*
+ * If this CPU has already checked in, then this task
+ * will hold up the next grace period rather than the
+ * current grace period. Queue the task accordingly.
+ * If the task is queued for the current grace period
+ * (i.e., this CPU has not yet passed through a quiescent
+ * state for the current grace period), then as long
+ * as that task remains queued, the current grace period
+ * cannot end.
+ */
+ list_add(&t->rcu_node_entry, &rcu_preempt_ctrlblk.blkd_tasks);
+ if (rcu_cpu_cur_gp())
+ rcu_preempt_ctrlblk.gp_tasks = &t->rcu_node_entry;
+ }
+
+ /*
+ * Either we were not in an RCU read-side critical section to
+ * begin with, or we have now recorded that critical section
+ * globally. Either way, we can now note a quiescent state
+ * for this CPU. Again, if we were in an RCU read-side critical
+ * section, and if that critical section was blocking the current
+ * grace period, then the fact that the task has been enqueued
+ * means that current grace period continues to be blocked.
+ */
+ rcu_preempt_cpu_qs();
+ local_irq_restore(flags);
+}
+
+/*
+ * Tiny-preemptible RCU implementation for rcu_read_lock().
+ * Just increment ->rcu_read_lock_nesting, shared state will be updated
+ * if we block.
+ */
+void __rcu_read_lock(void)
+{
+ current->rcu_read_lock_nesting++;
+ barrier(); /* needed if we ever invoke rcu_read_lock in rcutiny.c */
+}
+EXPORT_SYMBOL_GPL(__rcu_read_lock);
+
+/*
+ * Handle special cases during rcu_read_unlock(), such as needing to
+ * notify RCU core processing or task having blocked during the RCU
+ * read-side critical section.
+ */
+static void rcu_read_unlock_special(struct task_struct *t)
+{
+ int empty;
+ int empty_exp;
+ unsigned long flags;
+ struct list_head *np;
+ int special;
+
+ /*
+ * NMI handlers cannot block and cannot safely manipulate state.
+ * They therefore cannot possibly be special, so just leave.
+ */
+ if (in_nmi())
+ return;
+
+ local_irq_save(flags);
+
+ /*
+ * If RCU core is waiting for this CPU to exit critical section,
+ * let it know that we have done so.
+ */
+ special = t->rcu_read_unlock_special;
+ if (special & RCU_READ_UNLOCK_NEED_QS)
+ rcu_preempt_cpu_qs();
+
+ /* Hardware IRQ handlers cannot block. */
+ if (in_irq()) {
+ local_irq_restore(flags);
+ return;
+ }
+
+ /* Clean up if blocked during RCU read-side critical section. */
+ if (special & RCU_READ_UNLOCK_BLOCKED) {
+ t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_BLOCKED;
+
+ /*
+ * Remove this task from the ->blkd_tasks list and adjust
+ * any pointers that might have been referencing it.
+ */
+ empty = !rcu_preempt_blocked_readers_cgp();
+ empty_exp = rcu_preempt_ctrlblk.exp_tasks == NULL;
+ np = t->rcu_node_entry.next;
+ if (np == &rcu_preempt_ctrlblk.blkd_tasks)
+ np = NULL;
+ list_del(&t->rcu_node_entry);
+ if (&t->rcu_node_entry == rcu_preempt_ctrlblk.gp_tasks)
+ rcu_preempt_ctrlblk.gp_tasks = np;
+ if (&t->rcu_node_entry == rcu_preempt_ctrlblk.exp_tasks)
+ rcu_preempt_ctrlblk.exp_tasks = np;
+ INIT_LIST_HEAD(&t->rcu_node_entry);
+
+ /*
+ * If this was the last task on the current list, and if
+ * we aren't waiting on the CPU, report the quiescent state
+ * and start a new grace period if needed.
+ */
+ if (!empty && !rcu_preempt_blocked_readers_cgp()) {
+ rcu_preempt_cpu_qs();
+ rcu_preempt_start_gp();
+ }
+
+ /*
+ * If this was the last task on the expedited lists,
+ * then we need to wake up the waiting task.
+ */
+ if (!empty_exp && rcu_preempt_ctrlblk.exp_tasks == NULL)
+ rcu_report_exp_done();
+ }
+ local_irq_restore(flags);
+}
+
+/*
+ * Tiny-preemptible RCU implementation for rcu_read_unlock().
+ * Decrement ->rcu_read_lock_nesting. If the result is zero (outermost
+ * rcu_read_unlock()) and ->rcu_read_unlock_special is non-zero, then
+ * invoke rcu_read_unlock_special() to clean up after a context switch
+ * in an RCU read-side critical section and other special cases.
+ */
+void __rcu_read_unlock(void)
+{
+ struct task_struct *t = current;
+
+ barrier(); /* needed if we ever invoke rcu_read_unlock in rcutiny.c */
+ --t->rcu_read_lock_nesting;
+ barrier(); /* decrement before load of ->rcu_read_unlock_special */
+ if (t->rcu_read_lock_nesting == 0 &&
+ unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
+ rcu_read_unlock_special(t);
+#ifdef CONFIG_PROVE_LOCKING
+ WARN_ON_ONCE(t->rcu_read_lock_nesting < 0);
+#endif /* #ifdef CONFIG_PROVE_LOCKING */
+}
+EXPORT_SYMBOL_GPL(__rcu_read_unlock);
+
+/*
+ * Check for a quiescent state from the current CPU. When a task blocks,
+ * the task is recorded in the rcu_preempt_ctrlblk structure, which is
+ * checked elsewhere. This is called from the scheduling-clock interrupt.
+ *
+ * Caller must disable hard irqs.
+ */
+static void rcu_preempt_check_callbacks(void)
+{
+ struct task_struct *t = current;
+
+ if (!rcu_preempt_running_reader() && rcu_preempt_gp_in_progress())
+ rcu_preempt_cpu_qs();
+ if (&rcu_preempt_ctrlblk.rcb.rcucblist !=
+ rcu_preempt_ctrlblk.rcb.donetail)
+ raise_softirq(RCU_SOFTIRQ);
+ if (rcu_preempt_gp_in_progress() && rcu_preempt_running_reader())
+ t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
+}
+
+/*
+ * TINY_PREEMPT_RCU has an extra callback-list tail pointer to
+ * update, so this is invoked from __rcu_process_callbacks() to
+ * handle that case. Of course, it is invoked for all flavors of
+ * RCU, but RCU callbacks can appear only on one of the lists, and
+ * neither ->nexttail nor ->donetail can possibly be NULL, so there
+ * is no need for an explicit check.
+ */
+static void rcu_preempt_remove_callbacks(struct rcu_ctrlblk *rcp)
+{
+ if (rcu_preempt_ctrlblk.nexttail == rcp->donetail)
+ rcu_preempt_ctrlblk.nexttail = &rcp->rcucblist;
+}
+
+/*
+ * Process callbacks for preemptible RCU.
+ */
+static void rcu_preempt_process_callbacks(void)
+{
+ __rcu_process_callbacks(&rcu_preempt_ctrlblk.rcb);
+}
+
+/*
+ * Queue a preemptible-RCU callback for invocation after a grace period.
+ */
+void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+{
+ unsigned long flags;
+
+ debug_rcu_head_queue(head);
+ head->func = func;
+ head->next = NULL;
+
+ local_irq_save(flags);
+ *rcu_preempt_ctrlblk.nexttail = head;
+ rcu_preempt_ctrlblk.nexttail = &head->next;
+ rcu_preempt_start_gp(); /* checks to see if GP needed. */
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(call_rcu);
+
+void rcu_barrier(void)
+{
+ struct rcu_synchronize rcu;
+
+ init_rcu_head_on_stack(&rcu.head);
+ init_completion(&rcu.completion);
+ /* Will wake me after RCU finished. */
+ call_rcu(&rcu.head, wakeme_after_rcu);
+ /* Wait for it. */
+ wait_for_completion(&rcu.completion);
+ destroy_rcu_head_on_stack(&rcu.head);
+}
+EXPORT_SYMBOL_GPL(rcu_barrier);
+
+/*
+ * synchronize_rcu - wait until a grace period has elapsed.
+ *
+ * Control will return to the caller some time after a full grace
+ * period has elapsed, in other words after all currently executing RCU
+ * read-side critical sections have completed. RCU read-side critical
+ * sections are delimited by rcu_read_lock() and rcu_read_unlock(),
+ * and may be nested.
+ */
+void synchronize_rcu(void)
+{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ if (!rcu_scheduler_active)
+ return;
+#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
+
+ WARN_ON_ONCE(rcu_preempt_running_reader());
+ if (!rcu_preempt_blocked_readers_any())
+ return;
+
+ /* Once we get past the fastpath checks, same code as rcu_barrier(). */
+ rcu_barrier();
+}
+EXPORT_SYMBOL_GPL(synchronize_rcu);
+
+static DECLARE_WAIT_QUEUE_HEAD(sync_rcu_preempt_exp_wq);
+static unsigned long sync_rcu_preempt_exp_count;
+static DEFINE_MUTEX(sync_rcu_preempt_exp_mutex);
+
+/*
+ * Return non-zero if there are any tasks in RCU read-side critical
+ * sections blocking the current preemptible-RCU expedited grace period.
+ * If there is no preemptible-RCU expedited grace period currently in
+ * progress, returns zero unconditionally.
+ */
+static int rcu_preempted_readers_exp(void)
+{
+ return rcu_preempt_ctrlblk.exp_tasks != NULL;
+}
+
+/*
+ * Report the exit from RCU read-side critical section for the last task
+ * that queued itself during or before the current expedited preemptible-RCU
+ * grace period.
+ */
+static void rcu_report_exp_done(void)
+{
+ wake_up(&sync_rcu_preempt_exp_wq);
+}
+
+/*
+ * Wait for an rcu-preempt grace period, but expedite it. The basic idea
+ * is to rely on the fact that there is but one CPU, and that it is
+ * illegal for a task to invoke synchronize_rcu_expedited() while in a
+ * preemptible-RCU read-side critical section. Therefore, any such
+ * critical sections must correspond to blocked tasks, which must therefore
+ * be on the ->blkd_tasks list. So just record the current head of the
+ * list in the ->exp_tasks pointer, and wait for all tasks including and
+ * after the task pointed to by ->exp_tasks to drain.
+ */
+void synchronize_rcu_expedited(void)
+{
+ unsigned long flags;
+ struct rcu_preempt_ctrlblk *rpcp = &rcu_preempt_ctrlblk;
+ unsigned long snap;
+
+ barrier(); /* ensure prior action seen before grace period. */
+
+ WARN_ON_ONCE(rcu_preempt_running_reader());
+
+ /*
+ * Acquire lock so that there is only one preemptible RCU grace
+ * period in flight. Of course, if someone does the expedited
+ * grace period for us while we are acquiring the lock, just leave.
+ */
+ snap = sync_rcu_preempt_exp_count + 1;
+ mutex_lock(&sync_rcu_preempt_exp_mutex);
+ if (ULONG_CMP_LT(snap, sync_rcu_preempt_exp_count))
+ goto unlock_mb_ret; /* Others did our work for us. */
+
+ local_irq_save(flags);
+
+ /*
+ * All RCU readers have to already be on blkd_tasks because
+ * we cannot legally be executing in an RCU read-side critical
+ * section.
+ */
+
+ /* Snapshot current head of ->blkd_tasks list. */
+ rpcp->exp_tasks = rpcp->blkd_tasks.next;
+ if (rpcp->exp_tasks == &rpcp->blkd_tasks)
+ rpcp->exp_tasks = NULL;
+ local_irq_restore(flags);
+
+ /* Wait for tail of ->blkd_tasks list to drain. */
+ if (rcu_preempted_readers_exp())
+ wait_event(sync_rcu_preempt_exp_wq,
+ !rcu_preempted_readers_exp());
+
+ /* Clean up and exit. */
+ barrier(); /* ensure expedited GP seen before counter increment. */
+ sync_rcu_preempt_exp_count++;
+unlock_mb_ret:
+ mutex_unlock(&sync_rcu_preempt_exp_mutex);
+ barrier(); /* ensure subsequent action seen after grace period. */
+}
+EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
+
+/*
+ * Does preemptible RCU need the CPU to stay out of dynticks mode?
+ */
+int rcu_preempt_needs_cpu(void)
+{
+ if (!rcu_preempt_running_reader())
+ rcu_preempt_cpu_qs();
+ return rcu_preempt_ctrlblk.rcb.rcucblist != NULL;
+}
+
+/*
+ * Check for a task exiting while in a preemptible-RCU read-side
+ * critical section, clean up if so. No need to issue warnings,
+ * as debug_check_no_locks_held() already does this if lockdep
+ * is enabled.
+ */
+void exit_rcu(void)
+{
+ struct task_struct *t = current;
+
+ if (t->rcu_read_lock_nesting == 0)
+ return;
+ t->rcu_read_lock_nesting = 1;
+ rcu_read_unlock();
+}
+
+#else /* #ifdef CONFIG_TINY_PREEMPT_RCU */
+
+/*
+ * Because preemptible RCU does not exist, it never has any callbacks
+ * to check.
+ */
+static void rcu_preempt_check_callbacks(void)
+{
+}
+
+/*
+ * Because preemptible RCU does not exist, it never has any callbacks
+ * to remove.
+ */
+static void rcu_preempt_remove_callbacks(struct rcu_ctrlblk *rcp)
+{
+}
+
+/*
+ * Because preemptible RCU does not exist, it never has any callbacks
+ * to process.
+ */
+static void rcu_preempt_process_callbacks(void)
+{
+}
+
+#endif /* #else #ifdef CONFIG_TINY_PREEMPT_RCU */
+
#ifdef CONFIG_DEBUG_LOCK_ALLOC

#include <linux/kernel_stat.h>