Hello!
This series contains documentation updates:
1. Further updates to RCU's lockdep.rst.
2. Update NMI-RCU.rst.
3. Update rcubarrier.rst.
4. Update rcu_dereference.rst.
5. Update and wordsmith rculist_nulls.rst.
6. Update rcu.rst.
7. Update stallwarn.rst.
8. Update torture.rst.
9. Update UP.rst.
10. Update rcu.rst URL to RCU publications.
11. Update whatisRCU.rst.
12. Document CONFIG_RCU_CPU_STALL_CPUTIME=y stall information,
courtesy of Zhen Lei.
13. docs/RCU/rcubarrier: Adjust 'Answer' parts of QQs as
definition-lists, courtesy of Akira Yokosawa.
14. docs/RCU/rcubarrier: Right-adjust line numbers in code snippets,
courtesy of Akira Yokosawa.
15. Fix htmldocs build warnings of stallwarn.rst, courtesy of
Zhen Lei.
Thanx, Paul
------------------------------------------------------------------------
Documentation/RCU/rcu.rst | 3
Documentation/RCU/rcubarrier.rst | 177 ++++++++++++++--------------
Documentation/RCU/stallwarn.rst | 144 +++++++++++++++++++----
b/Documentation/RCU/NMI-RCU.rst | 4
b/Documentation/RCU/UP.rst | 13 +-
b/Documentation/RCU/lockdep.rst | 13 --
b/Documentation/RCU/rcu.rst | 3
b/Documentation/RCU/rcu_dereference.rst | 21 ++-
b/Documentation/RCU/rcubarrier.rst | 196 +++++++++++++++++---------------
b/Documentation/RCU/rculist_nulls.rst | 109 ++++++++---------
b/Documentation/RCU/stallwarn.rst | 43 ++++---
b/Documentation/RCU/torture.rst | 89 +++++++++++++-
b/Documentation/RCU/whatisRCU.rst | 193 ++++++++++++++++++++-----------
13 files changed, 646 insertions(+), 362 deletions(-)
This commit updates NMI-RCU.rst to highlight the ancient heritage of
the example code and to discourage wanton compiler "optimizations".
Signed-off-by: Paul E. McKenney <[email protected]>
---
Documentation/RCU/NMI-RCU.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/RCU/NMI-RCU.rst b/Documentation/RCU/NMI-RCU.rst
index 2a92bc685ef1a..dff60a80b386e 100644
--- a/Documentation/RCU/NMI-RCU.rst
+++ b/Documentation/RCU/NMI-RCU.rst
@@ -8,7 +8,7 @@ Although RCU is usually used to protect read-mostly data structures,
it is possible to use RCU to provide dynamic non-maskable interrupt
handlers, as well as dynamic irq handlers. This document describes
how to do this, drawing loosely from Zwane Mwaikambo's NMI-timer
-work in "arch/x86/kernel/traps.c".
+work in an old version of "arch/x86/kernel/traps.c".
The relevant pieces of code are listed below, each followed by a
brief explanation::
@@ -116,7 +116,7 @@ Answer to Quick Quiz:
This same sad story can happen on other CPUs when using
a compiler with aggressive pointer-value speculation
- optimizations.
+ optimizations. (But please don't!)
More important, the rcu_dereference_sched() makes it
clear to someone reading the code that the pointer is
--
2.31.1.189.g2e36527f23
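To make the speculation hazard concrete, here is a minimal sketch
(struct foo, gp, old_p, and do_something() are made-up names, not taken
from the patch) of a reader and of what pointer-value speculation could
effectively turn it into:

	static struct foo __rcu *gp;	/* hypothetical RCU-protected pointer */

	static void reader(void)
	{
		struct foo *p;

		p = rcu_dereference_sched(gp);
		if (p)
			do_something(p->a);
	}

	/*
	 * A compiler speculating that gp probably still equals old_p
	 * could effectively generate:
	 *
	 *	if (gp == old_p)
	 *		do_something(old_p->a);	// ->a may load before gp!
	 *	else
	 *		// ...the original code...
	 *
	 * destroying the address dependency that orders the load of gp
	 * before the load of ->a. The volatile cast in
	 * rcu_dereference_sched() is what forbids such transformations.
	 */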
This commit updates rcu_dereference.rst to reflect RCU additions and
changes over the past few years.
Signed-off-by: Paul E. McKenney <[email protected]>
---
Documentation/RCU/rcu_dereference.rst | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/Documentation/RCU/rcu_dereference.rst b/Documentation/RCU/rcu_dereference.rst
index 81e828c8313b8..3b739f6243c85 100644
--- a/Documentation/RCU/rcu_dereference.rst
+++ b/Documentation/RCU/rcu_dereference.rst
@@ -19,8 +19,9 @@ Follow these rules to keep your RCU code working properly:
can reload the value, and won't your code have fun with two
different values for a single pointer! Without rcu_dereference(),
DEC Alpha can load a pointer, dereference that pointer, and
- return data preceding initialization that preceded the store of
- the pointer.
+ return data preceding initialization that preceded the store
+ of the pointer. (As noted later, in recent kernels READ_ONCE()
+ also prevents DEC Alpha from playing these tricks.)
In addition, the volatile cast in rcu_dereference() prevents the
compiler from deducing the resulting pointer value. Please see
@@ -34,7 +35,7 @@ Follow these rules to keep your RCU code working properly:
takes on the role of the lockless_dereference() primitive that
was removed in v4.15.
-- You are only permitted to use rcu_dereference on pointer values.
+- You are only permitted to use rcu_dereference() on pointer values.
The compiler simply knows too much about integral values to
trust it to carry dependencies through integer operations.
There are a very few exceptions, namely that you can temporarily
@@ -240,6 +241,7 @@ precautions. To see this, consider the following code fragment::
struct foo *q;
int r1, r2;
+ rcu_read_lock();
p = rcu_dereference(gp2);
if (p == NULL)
return;
@@ -248,7 +250,10 @@ precautions. To see this, consider the following code fragment::
if (p == q) {
/* The compiler decides that q->c is same as p->c. */
r2 = p->c; /* Could get 44 on weakly order system. */
+ } else {
+ r2 = p->c - r1; /* Unconditional access to p->c. */
}
+ rcu_read_unlock();
do_something_with(r1, r2);
}
@@ -297,6 +302,7 @@ Then one approach is to use locking, for example, as follows::
struct foo *q;
int r1, r2;
+ rcu_read_lock();
p = rcu_dereference(gp2);
if (p == NULL)
return;
@@ -306,7 +312,12 @@ Then one approach is to use locking, for example, as follows::
if (p == q) {
/* The compiler decides that q->c is same as p->c. */
r2 = p->c; /* Locking guarantees r2 == 144. */
+ } else {
+ spin_lock(&q->lock);
+ r2 = q->c - r1;
+ spin_unlock(&q->lock);
}
+ rcu_read_unlock();
spin_unlock(&p->lock);
do_something_with(r1, r2);
}
@@ -364,7 +375,7 @@ the exact value of "p" even in the not-equals case. This allows the
compiler to make the return values independent of the load from "gp",
in turn destroying the ordering between this load and the loads of the
return values. This can result in "p->b" returning pre-initialization
-garbage values.
+garbage values on weakly ordered systems.
In short, rcu_dereference() is *not* optional when you are going to
dereference the resulting pointer.
@@ -430,7 +441,7 @@ member of the rcu_dereference() to use in various situations:
SPARSE CHECKING OF RCU-PROTECTED POINTERS
-----------------------------------------
-The sparse static-analysis tool checks for direct access to RCU-protected
+The sparse static-analysis tool checks for non-RCU access to RCU-protected
pointers, which can result in "interesting" bugs due to compiler
optimizations involving invented loads and perhaps also load tearing.
For example, suppose someone mistakenly does something like this::
--
2.31.1.189.g2e36527f23
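For reference, the sparse checking mentioned in the final hunk keys off
the __rcu address-space annotation. A minimal sketch, with hypothetical
structure and variable names:

	struct foo {
		int a;
	};
	static struct foo __rcu *gp;	/* sparse-checked RCU pointer */

	/* Reader: sparse complains unless an RCU accessor is used. */
	rcu_read_lock();
	p = rcu_dereference(gp);	/* struct foo *p; */
	if (p)
		do_something(p->a);	/* do_something() is made up */
	rcu_read_unlock();

	/* Updater: publish a new version with rcu_assign_pointer(). */
	rcu_assign_pointer(gp, newp);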
This commit wordsmiths RCU's lockdep.rst.
Signed-off-by: Paul E. McKenney <[email protected]>
---
Documentation/RCU/lockdep.rst | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/Documentation/RCU/lockdep.rst b/Documentation/RCU/lockdep.rst
index 9308f1bdba05d..2749f43ec1b03 100644
--- a/Documentation/RCU/lockdep.rst
+++ b/Documentation/RCU/lockdep.rst
@@ -69,9 +69,8 @@ checking of rcu_dereference() primitives:
value of the pointer itself, for example, against NULL.
The rcu_dereference_check() check expression can be any boolean
-expression, but would normally include a lockdep expression. However,
-any boolean expression can be used. For a moderately ornate example,
-consider the following::
+expression, but would normally include a lockdep expression. For a
+moderately ornate example, consider the following::
file = rcu_dereference_check(fdt->fd[fd],
lockdep_is_held(&files->file_lock) ||
@@ -97,10 +96,10 @@ code, it could instead be written as follows::
atomic_read(&files->count) == 1);
This would verify cases #2 and #3 above, and furthermore lockdep would
-complain if this was used in an RCU read-side critical section unless one
-of these two cases held. Because rcu_dereference_protected() omits all
-barriers and compiler constraints, it generates better code than do the
-other flavors of rcu_dereference(). On the other hand, it is illegal
+complain even if this was used in an RCU read-side critical section unless
+one of these two cases held. Because rcu_dereference_protected() omits
+all barriers and compiler constraints, it generates better code than do
+the other flavors of rcu_dereference(). On the other hand, it is illegal
to use rcu_dereference_protected() if either the RCU-protected pointer
or the RCU-protected data that it points to can change concurrently.
--
2.31.1.189.g2e36527f23
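To illustrate the update-side case that the last paragraph contrasts
against, rcu_dereference_protected() is typically used as in the
following sketch (gp, gp_lock, and the ->a field are hypothetical):

	struct foo *p;

	spin_lock(&gp_lock);
	p = rcu_dereference_protected(gp, lockdep_is_held(&gp_lock));
	p->a = 42;	/* Safe: holding gp_lock excludes all updaters. */
	spin_unlock(&gp_lock);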
From: Akira Yokosawa <[email protected]>
Line numbers in code snippets in rcubarrier.rst have been left-adjusted
since commit 4af498306ffd ("doc: Convert to rcubarrier.txt to ReST").
This might have been because right-adjusting them had confused Sphinx.
The rules around a literal block in reST are:
- Need a blank line above it.
- A line with the same indent level as the line above it is regarded
as the end of it.
Those line numbers can be right-adjusted by keeping indents at two-digit
width. While at it, add some spaces between the line-number column and
the code area for better readability.
Signed-off-by: Akira Yokosawa <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
Documentation/RCU/rcubarrier.rst | 168 +++++++++++++++----------------
1 file changed, 84 insertions(+), 84 deletions(-)
diff --git a/Documentation/RCU/rcubarrier.rst b/Documentation/RCU/rcubarrier.rst
index 9fb9ed7773552..6da7f66da2a80 100644
--- a/Documentation/RCU/rcubarrier.rst
+++ b/Documentation/RCU/rcubarrier.rst
@@ -72,9 +72,9 @@ For example, if it uses call_rcu(), call_srcu() on srcu_struct_1, and
call_srcu() on srcu_struct_2, then the following three lines of code
will be required when unloading::
- 1 rcu_barrier();
- 2 srcu_barrier(&srcu_struct_1);
- 3 srcu_barrier(&srcu_struct_2);
+ 1 rcu_barrier();
+ 2 srcu_barrier(&srcu_struct_1);
+ 3 srcu_barrier(&srcu_struct_2);
If latency is of the essence, workqueues could be used to run these
three functions concurrently.
@@ -82,69 +82,69 @@ three functions concurrently.
An ancient version of the rcutorture module makes use of rcu_barrier()
in its exit function as follows::
- 1 static void
- 2 rcu_torture_cleanup(void)
- 3 {
- 4 int i;
- 5
- 6 fullstop = 1;
- 7 if (shuffler_task != NULL) {
- 8 VERBOSE_PRINTK_STRING("Stopping rcu_torture_shuffle task");
- 9 kthread_stop(shuffler_task);
- 10 }
- 11 shuffler_task = NULL;
+ 1 static void
+ 2 rcu_torture_cleanup(void)
+ 3 {
+ 4 int i;
+ 5
+ 6 fullstop = 1;
+ 7 if (shuffler_task != NULL) {
+ 8 VERBOSE_PRINTK_STRING("Stopping rcu_torture_shuffle task");
+ 9 kthread_stop(shuffler_task);
+ 10 }
+ 11 shuffler_task = NULL;
12
- 13 if (writer_task != NULL) {
- 14 VERBOSE_PRINTK_STRING("Stopping rcu_torture_writer task");
- 15 kthread_stop(writer_task);
- 16 }
- 17 writer_task = NULL;
+ 13 if (writer_task != NULL) {
+ 14 VERBOSE_PRINTK_STRING("Stopping rcu_torture_writer task");
+ 15 kthread_stop(writer_task);
+ 16 }
+ 17 writer_task = NULL;
18
- 19 if (reader_tasks != NULL) {
- 20 for (i = 0; i < nrealreaders; i++) {
- 21 if (reader_tasks[i] != NULL) {
- 22 VERBOSE_PRINTK_STRING(
- 23 "Stopping rcu_torture_reader task");
- 24 kthread_stop(reader_tasks[i]);
- 25 }
- 26 reader_tasks[i] = NULL;
- 27 }
- 28 kfree(reader_tasks);
- 29 reader_tasks = NULL;
- 30 }
- 31 rcu_torture_current = NULL;
+ 19 if (reader_tasks != NULL) {
+ 20 for (i = 0; i < nrealreaders; i++) {
+ 21 if (reader_tasks[i] != NULL) {
+ 22 VERBOSE_PRINTK_STRING(
+ 23 "Stopping rcu_torture_reader task");
+ 24 kthread_stop(reader_tasks[i]);
+ 25 }
+ 26 reader_tasks[i] = NULL;
+ 27 }
+ 28 kfree(reader_tasks);
+ 29 reader_tasks = NULL;
+ 30 }
+ 31 rcu_torture_current = NULL;
32
- 33 if (fakewriter_tasks != NULL) {
- 34 for (i = 0; i < nfakewriters; i++) {
- 35 if (fakewriter_tasks[i] != NULL) {
- 36 VERBOSE_PRINTK_STRING(
- 37 "Stopping rcu_torture_fakewriter task");
- 38 kthread_stop(fakewriter_tasks[i]);
- 39 }
- 40 fakewriter_tasks[i] = NULL;
- 41 }
- 42 kfree(fakewriter_tasks);
- 43 fakewriter_tasks = NULL;
- 44 }
+ 33 if (fakewriter_tasks != NULL) {
+ 34 for (i = 0; i < nfakewriters; i++) {
+ 35 if (fakewriter_tasks[i] != NULL) {
+ 36 VERBOSE_PRINTK_STRING(
+ 37 "Stopping rcu_torture_fakewriter task");
+ 38 kthread_stop(fakewriter_tasks[i]);
+ 39 }
+ 40 fakewriter_tasks[i] = NULL;
+ 41 }
+ 42 kfree(fakewriter_tasks);
+ 43 fakewriter_tasks = NULL;
+ 44 }
45
- 46 if (stats_task != NULL) {
- 47 VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task");
- 48 kthread_stop(stats_task);
- 49 }
- 50 stats_task = NULL;
+ 46 if (stats_task != NULL) {
+ 47 VERBOSE_PRINTK_STRING("Stopping rcu_torture_stats task");
+ 48 kthread_stop(stats_task);
+ 49 }
+ 50 stats_task = NULL;
51
- 52 /* Wait for all RCU callbacks to fire. */
- 53 rcu_barrier();
+ 52 /* Wait for all RCU callbacks to fire. */
+ 53 rcu_barrier();
54
- 55 rcu_torture_stats_print(); /* -After- the stats thread is stopped! */
+ 55 rcu_torture_stats_print(); /* -After- the stats thread is stopped! */
56
- 57 if (cur_ops->cleanup != NULL)
- 58 cur_ops->cleanup();
- 59 if (atomic_read(&n_rcu_torture_error))
- 60 rcu_torture_print_module_parms("End of test: FAILURE");
- 61 else
- 62 rcu_torture_print_module_parms("End of test: SUCCESS");
- 63 }
+ 57 if (cur_ops->cleanup != NULL)
+ 58 cur_ops->cleanup();
+ 59 if (atomic_read(&n_rcu_torture_error))
+ 60 rcu_torture_print_module_parms("End of test: FAILURE");
+ 61 else
+ 62 rcu_torture_print_module_parms("End of test: SUCCESS");
+ 63 }
Line 6 sets a global variable that prevents any RCU callbacks from
re-posting themselves. This will not be necessary in most cases, since
@@ -193,16 +193,16 @@ which point, all earlier RCU callbacks are guaranteed to have completed.
The original code for rcu_barrier() was roughly as follows::
- 1 void rcu_barrier(void)
- 2 {
- 3 BUG_ON(in_interrupt());
- 4 /* Take cpucontrol mutex to protect against CPU hotplug */
- 5 mutex_lock(&rcu_barrier_mutex);
- 6 init_completion(&rcu_barrier_completion);
- 7 atomic_set(&rcu_barrier_cpu_count, 1);
- 8 on_each_cpu(rcu_barrier_func, NULL, 0, 1);
- 9 if (atomic_dec_and_test(&rcu_barrier_cpu_count))
- 10 complete(&rcu_barrier_completion);
+ 1 void rcu_barrier(void)
+ 2 {
+ 3 BUG_ON(in_interrupt());
+ 4 /* Take cpucontrol mutex to protect against CPU hotplug */
+ 5 mutex_lock(&rcu_barrier_mutex);
+ 6 init_completion(&rcu_barrier_completion);
+ 7 atomic_set(&rcu_barrier_cpu_count, 1);
+ 8 on_each_cpu(rcu_barrier_func, NULL, 0, 1);
+ 9 if (atomic_dec_and_test(&rcu_barrier_cpu_count))
+ 10 complete(&rcu_barrier_completion);
11 wait_for_completion(&rcu_barrier_completion);
12 mutex_unlock(&rcu_barrier_mutex);
13 }
@@ -232,16 +232,16 @@ still gives the general idea.
The rcu_barrier_func() runs on each CPU, where it invokes call_rcu()
to post an RCU callback, as follows::
- 1 static void rcu_barrier_func(void *notused)
- 2 {
- 3 int cpu = smp_processor_id();
- 4 struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
- 5 struct rcu_head *head;
- 6
- 7 head = &rdp->barrier;
- 8 atomic_inc(&rcu_barrier_cpu_count);
- 9 call_rcu(head, rcu_barrier_callback);
- 10 }
+ 1 static void rcu_barrier_func(void *notused)
+ 2 {
+ 3 int cpu = smp_processor_id();
+ 4 struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
+ 5 struct rcu_head *head;
+ 6
+ 7 head = &rdp->barrier;
+ 8 atomic_inc(&rcu_barrier_cpu_count);
+ 9 call_rcu(head, rcu_barrier_callback);
+ 10 }
Lines 3 and 4 locate RCU's internal per-CPU rcu_data structure,
which contains the struct rcu_head that needed for the later call to
@@ -254,11 +254,11 @@ The rcu_barrier_callback() function simply atomically decrements the
rcu_barrier_cpu_count variable and finalizes the completion when it
reaches zero, as follows::
- 1 static void rcu_barrier_callback(struct rcu_head *notused)
- 2 {
- 3 if (atomic_dec_and_test(&rcu_barrier_cpu_count))
- 4 complete(&rcu_barrier_completion);
- 5 }
+ 1 static void rcu_barrier_callback(struct rcu_head *notused)
+ 2 {
+ 3 if (atomic_dec_and_test(&rcu_barrier_cpu_count))
+ 4 complete(&rcu_barrier_completion);
+ 5 }
.. _rcubarrier_quiz_3:
--
2.31.1.189.g2e36527f23
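The pattern that rcubarrier.rst builds up to can be condensed into a
hypothetical module-exit function (every name below is made up):

	static void __exit mymod_exit(void)
	{
		stop_posting_callbacks();	 /* No new call_rcu() invocations. */
		rcu_barrier();			 /* Wait for already-posted callbacks. */
		kmem_cache_destroy(mymod_cache); /* Only now is teardown safe. */
	}
	module_exit(mymod_exit);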
Do some wordsmithing and break up some overly long RCU readers.
Signed-off-by: Paul E. McKenney <[email protected]>
---
Documentation/RCU/rculist_nulls.rst | 109 ++++++++++++++--------------
1 file changed, 54 insertions(+), 55 deletions(-)
diff --git a/Documentation/RCU/rculist_nulls.rst b/Documentation/RCU/rculist_nulls.rst
index ca4692775ad41..f84d6970758bc 100644
--- a/Documentation/RCU/rculist_nulls.rst
+++ b/Documentation/RCU/rculist_nulls.rst
@@ -14,19 +14,19 @@ Using 'nulls'
=============
Using special makers (called 'nulls') is a convenient way
-to solve following problem :
+to solve the following problem.
-A typical RCU linked list managing objects which are
-allocated with SLAB_TYPESAFE_BY_RCU kmem_cache can
-use following algos :
+Without 'nulls', a typical RCU linked list managing objects which are
+allocated with SLAB_TYPESAFE_BY_RCU kmem_cache can use the following
+algorithms:
-1) Lookup algo
---------------
+1) Lookup algorithm
+-------------------
::
- rcu_read_lock()
begin:
+ rcu_read_lock()
obj = lockless_lookup(key);
if (obj) {
if (!try_get_ref(obj)) // might fail for free objects
@@ -38,6 +38,7 @@ use following algos :
*/
if (obj->key != key) { // not the object we expected
put_ref(obj);
+ rcu_read_unlock();
goto begin;
}
}
@@ -52,9 +53,9 @@ but a version with an additional memory barrier (smp_rmb())
{
struct hlist_node *node, *next;
for (pos = rcu_dereference((head)->first);
- pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
- ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
- pos = rcu_dereference(next))
+ pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
+ ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
+ pos = rcu_dereference(next))
if (obj->key == key)
return obj;
return NULL;
@@ -64,9 +65,9 @@ And note the traditional hlist_for_each_entry_rcu() misses this smp_rmb()::
struct hlist_node *node;
for (pos = rcu_dereference((head)->first);
- pos && ({ prefetch(pos->next); 1; }) &&
- ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
- pos = rcu_dereference(pos->next))
+ pos && ({ prefetch(pos->next); 1; }) &&
+ ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
+ pos = rcu_dereference(pos->next))
if (obj->key == key)
return obj;
return NULL;
@@ -82,36 +83,32 @@ Quoting Corey Minyard::
solved by pre-fetching the "next" field (with proper barriers) before
checking the key."
-2) Insert algo
---------------
+2) Insertion algorithm
+----------------------
We need to make sure a reader cannot read the new 'obj->obj_next' value
-and previous value of 'obj->key'. Or else, an item could be deleted
+and the previous value of 'obj->key'. Otherwise, an item could be deleted
from a chain, and inserted into another chain. If new chain was empty
-before the move, 'next' pointer is NULL, and lockless reader can
-not detect it missed following items in original chain.
+before the move, the 'next' pointer is NULL, and the lockless reader
+cannot detect that it missed the following items in the original chain.
::
/*
- * Please note that new inserts are done at the head of list,
- * not in the middle or end.
- */
+ * Please note that new inserts are done at the head of list,
+ * not in the middle or end.
+ */
obj = kmem_cache_alloc(...);
lock_chain(); // typically a spin_lock()
obj->key = key;
- /*
- * we need to make sure obj->key is updated before obj->next
- * or obj->refcnt
- */
- smp_wmb();
- atomic_set(&obj->refcnt, 1);
+ atomic_set_release(&obj->refcnt, 1); // key before refcnt
hlist_add_head_rcu(&obj->obj_node, list);
unlock_chain(); // typically a spin_unlock()
-3) Remove algo
---------------
+3) Removal algorithm
+--------------------
+
Nothing special here, we can use a standard RCU hlist deletion.
But thanks to SLAB_TYPESAFE_BY_RCU, beware a deleted object can be reused
very very fast (before the end of RCU grace period)
@@ -133,7 +130,7 @@ Avoiding extra smp_rmb()
========================
With hlist_nulls we can avoid extra smp_rmb() in lockless_lookup()
-and extra smp_wmb() in insert function.
+and extra _release() in the insert function.
For example, if we choose to store the slot number as the 'nulls'
end-of-list marker for each slot of the hash table, we can detect
@@ -142,59 +139,61 @@ to another chain) checking the final 'nulls' value if
the lookup met the end of chain. If final 'nulls' value
is not the slot number, then we must restart the lookup at
the beginning. If the object was moved to the same chain,
-then the reader doesn't care : It might eventually
+then the reader doesn't care: It might occasionally
scan the list again without harm.
-1) lookup algo
---------------
+1) lookup algorithm
+-------------------
::
head = &table[slot];
- rcu_read_lock();
begin:
+ rcu_read_lock();
hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
if (obj->key == key) {
- if (!try_get_ref(obj)) // might fail for free objects
+ if (!try_get_ref(obj)) { // might fail for free objects
+ rcu_read_unlock();
goto begin;
+ }
if (obj->key != key) { // not the object we expected
put_ref(obj);
+ rcu_read_unlock();
goto begin;
}
- goto out;
+ goto out;
+ }
+ }
+
+ // If the nulls value we got at the end of this lookup is
+ // not the expected one, we must restart lookup.
+ // We probably met an item that was moved to another chain.
+ if (get_nulls_value(node) != slot) {
+ put_ref(obj);
+ rcu_read_unlock();
+ goto begin;
}
- /*
- * if the nulls value we got at the end of this lookup is
- * not the expected one, we must restart lookup.
- * We probably met an item that was moved to another chain.
- */
- if (get_nulls_value(node) != slot)
- goto begin;
obj = NULL;
out:
rcu_read_unlock();
-2) Insert function
-------------------
+2) Insert algorithm
+-------------------
::
/*
- * Please note that new inserts are done at the head of list,
- * not in the middle or end.
- */
+ * Please note that new inserts are done at the head of list,
+ * not in the middle or end.
+ */
obj = kmem_cache_alloc(cachep);
lock_chain(); // typically a spin_lock()
obj->key = key;
+ atomic_set_release(&obj->refcnt, 1); // key before refcnt
/*
- * changes to obj->key must be visible before refcnt one
- */
- smp_wmb();
- atomic_set(&obj->refcnt, 1);
- /*
- * insert obj in RCU way (readers might be traversing chain)
- */
+ * insert obj in RCU way (readers might be traversing chain)
+ */
hlist_nulls_add_head_rcu(&obj->obj_node, list);
unlock_chain(); // typically a spin_unlock()
--
2.31.1.189.g2e36527f23
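Note that the "slot number as the 'nulls' value" trick used by the
lookup algorithm assumes that the hash-table heads were initialized
accordingly, for example as in this sketch (NR_SLOTS and table are
hypothetical):

	struct hlist_nulls_head table[NR_SLOTS];
	int i;

	for (i = 0; i < NR_SLOTS; i++)
		INIT_HLIST_NULLS_HEAD(&table[i], i);	/* 'nulls' = slot number */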
From: Zhen Lei <[email protected]>
This commit documents the additional RCU CPU stall warning output
produced by kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y or booted
with rcupdate.rcu_cpu_stall_cputime=1.
[ paulmck: Apply wordsmithing. ]
Signed-off-by: Zhen Lei <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
Documentation/RCU/stallwarn.rst | 88 +++++++++++++++++++++++++++++++++
1 file changed, 88 insertions(+)
diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
index dfa4db8c0931e..c1e92dfef40d5 100644
--- a/Documentation/RCU/stallwarn.rst
+++ b/Documentation/RCU/stallwarn.rst
@@ -390,3 +390,91 @@ for example, "P3421".
It is entirely possible to see stall warnings from normal and from
expedited grace periods at about the same time during the same run.
+
+RCU_CPU_STALL_CPUTIME
+=====================
+
+In kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y or booted with
+rcupdate.rcu_cpu_stall_cputime=1, the following additional information
+is supplied with each RCU CPU stall warning::
+
+rcu: hardirqs softirqs csw/system
+rcu: number: 624 45 0
+rcu: cputime: 69 1 2425 ==> 2500(ms)
+
+These statistics are collected during the sampling period. The values
+in row "number:" are the number of hard interrupts, number of soft
+interrupts, and number of context switches on the stalled CPU. The
+first three values in row "cputime:" indicate the CPU time in
+milliseconds consumed by hard interrupts, soft interrupts, and tasks
+on the stalled CPU. The last number is the measurement interval, again
+in milliseconds. Because user-mode tasks normally do not cause RCU CPU
+stalls, these tasks are typically kernel tasks, which is why only the
+system CPU time is considered.
+
+The sampling period is shown as follows:
+:<------------first timeout---------->:<-----second timeout----->:
+:<--half timeout-->:<--half timeout-->: :
+: :<--first period-->: :
+: :<-----------second sampling period---------->:
+: : : :
+: snapshot time point 1st-stall 2nd-stall
+
+
+The following describes four typical scenarios:
+
+1. A CPU looping with interrupts disabled.::
+
+ rcu: hardirqs softirqs csw/system
+ rcu: number: 0 0 0
+ rcu: cputime: 0 0 0 ==> 2500(ms)
+
+ Because interrupts have been disabled throughout the measurement
+ interval, there are no interrupts and no context switches.
+ Furthermore, because CPU time consumption was measured using interrupt
+ handlers, the system CPU consumption is misleadingly measured as zero.
+ This scenario will normally also have "(0 ticks this GP)" printed on
+ this CPU's summary line.
+
+2. A CPU looping with bottom halves disabled.
+
+ This is similar to the previous example, but here both the number of
+ and the CPU time consumed by hard interrupts are non-zero, as is the CPU
+ time consumed by in-kernel execution.::
+
+ rcu: hardirqs softirqs csw/system
+ rcu: number: 624 0 0
+ rcu: cputime: 49 0 2446 ==> 2500(ms)
+
+ The fact that there are zero softirqs gives a hint that these were
+ disabled, perhaps via local_bh_disable(). It is of course possible
+ that there were no softirqs, perhaps because all events that would
+ result in softirq execution are confined to other CPUs. In this case,
+ the diagnosis should continue as shown in the next example.
+
+3. A CPU looping with preemption disabled.
+
+ Here, only the number of context switches is zero.::
+
+ rcu: hardirqs softirqs csw/system
+ rcu: number: 624 45 0
+ rcu: cputime: 69 1 2425 ==> 2500(ms)
+
+ This situation hints that the stalled CPU was looping with preemption
+ disabled.
+
+4. No looping, but massive hard and soft interrupts.::
+
+ rcu: hardirqs softirqs csw/system
+ rcu: number: xx xx 0
+ rcu: cputime: xx xx 0 ==> 2500(ms)
+
+ Here, the number and CPU time of hard interrupts are all non-zero,
+ but the number of context switches and the in-kernel CPU time consumed
+ are zero. The number and cputime of soft interrupts will usually be
+ non-zero, but could be zero, for example, if the CPU was spinning
+ within a single hard interrupt handler.
+
+ If this type of RCU CPU stall warning can be reproduced, you can
+ narrow it down by looking at /proc/interrupts or by writing code to
+ trace each interrupt, for example, by referring to show_interrupts().
--
2.31.1.189.g2e36527f23
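As noted above, these statistics can be enabled either at build time or
at boot time, for example:

	# Build time:
	CONFIG_RCU_CPU_STALL_CPUTIME=y

	# Boot time, on the kernel command line:
	rcupdate.rcu_cpu_stall_cputime=1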
This commit updates UP.rst to reflect changes over the past few years,
including the advent of userspace RCU libraries for constrained systems.
Signed-off-by: Paul E. McKenney <[email protected]>
---
Documentation/RCU/UP.rst | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/Documentation/RCU/UP.rst b/Documentation/RCU/UP.rst
index e26dda27430c8..8b20fd45f2558 100644
--- a/Documentation/RCU/UP.rst
+++ b/Documentation/RCU/UP.rst
@@ -38,7 +38,7 @@ by having call_rcu() directly invoke its arguments only if it was called
from process context. However, this can fail in a similar manner.
Suppose that an RCU-based algorithm again scans a linked list containing
-elements A, B, and C in process contexts, but that it invokes a function
+elements A, B, and C in process context, but that it invokes a function
on each element as it is scanned. Suppose further that this function
deletes element B from the list, then passes it to call_rcu() for deferred
freeing. This may be a bit unconventional, but it is perfectly legal
@@ -59,7 +59,8 @@ Example 3: Death by Deadlock
Suppose that call_rcu() is invoked while holding a lock, and that the
callback function must acquire this same lock. In this case, if
call_rcu() were to directly invoke the callback, the result would
-be self-deadlock.
+be self-deadlock *even if* this invocation occurred from a later
+call_rcu() invocation a full grace period later.
In some cases, it would possible to restructure to code so that
the call_rcu() is delayed until after the lock is released. However,
@@ -85,6 +86,14 @@ Quick Quiz #2:
:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
+It is important to note that userspace RCU implementations *do*
+permit call_rcu() to directly invoke callbacks, but only if a full
+grace period has elapsed since those callbacks were queued. This is
+the case because some userspace environments are extremely constrained.
+Nevertheless, people writing userspace RCU implementations are strongly
+encouraged to avoid invoking callbacks from call_rcu(), thus obtaining
+the deadlock-avoidance benefits called out above.
+
Summary
-------
--
2.31.1.189.g2e36527f23
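The self-deadlock in Example 3 is perhaps easiest to see in code form.
A minimal sketch, with hypothetical lock, callback, structure, and
do_cleanup() names:

	static DEFINE_SPINLOCK(mylock);

	static void my_cb(struct rcu_head *rhp)
	{
		spin_lock(&mylock);	/* Self-deadlocks if my_cb() runs... */
		do_cleanup(rhp);
		spin_unlock(&mylock);
	}

	spin_lock(&mylock);
	call_rcu(&p->rh, my_cb);	/* ...synchronously from right here. */
	spin_unlock(&mylock);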
From: Zhen Lei <[email protected]>
Documentation/RCU/stallwarn.rst:
401: WARNING: Literal block expected; none found.
428: WARNING: Literal block expected; none found.
445: WARNING: Literal block expected; none found.
459: WARNING: Literal block expected; none found.
468: WARNING: Literal block expected; none found.
The literal block needs to be indented, so this commit adds two spaces
to each line.
In addition, ':', which is used as a boundary in the literal block, is
replaced by '|'.
Link: https://lore.kernel.org/linux-next/[email protected]/
Fixes: 3d2788ba4573 ("doc: Document CONFIG_RCU_CPU_STALL_CPUTIME=y stall information")
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Tested-by: Akira Yokosawa <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
Documentation/RCU/stallwarn.rst | 56 ++++++++++++++++++---------------
1 file changed, 30 insertions(+), 26 deletions(-)
diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
index c1e92dfef40d5..ca7b7cd806a16 100644
--- a/Documentation/RCU/stallwarn.rst
+++ b/Documentation/RCU/stallwarn.rst
@@ -398,9 +398,9 @@ In kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y or booted with
rcupdate.rcu_cpu_stall_cputime=1, the following additional information
is supplied with each RCU CPU stall warning::
-rcu: hardirqs softirqs csw/system
-rcu: number: 624 45 0
-rcu: cputime: 69 1 2425 ==> 2500(ms)
+ rcu: hardirqs softirqs csw/system
+ rcu: number: 624 45 0
+ rcu: cputime: 69 1 2425 ==> 2500(ms)
These statistics are collected during the sampling period. The values
in row "number:" are the number of hard interrupts, number of soft
@@ -412,22 +412,24 @@ in milliseconds. Because user-mode tasks normally do not cause RCU CPU
stalls, these tasks are typically kernel tasks, which is why only the
system CPU time is considered.
-The sampling period is shown as follows:
-:<------------first timeout---------->:<-----second timeout----->:
-:<--half timeout-->:<--half timeout-->: :
-: :<--first period-->: :
-: :<-----------second sampling period---------->:
-: : : :
-: snapshot time point 1st-stall 2nd-stall
+The sampling period is shown as follows::
+ |<------------first timeout---------->|<-----second timeout----->|
+ |<--half timeout-->|<--half timeout-->| |
+ | |<--first period-->| |
+ | |<-----------second sampling period---------->|
+ | | | |
+ snapshot time point 1st-stall 2nd-stall
The following describes four typical scenarios:
-1. A CPU looping with interrupts disabled.::
+1. A CPU looping with interrupts disabled.
- rcu: hardirqs softirqs csw/system
- rcu: number: 0 0 0
- rcu: cputime: 0 0 0 ==> 2500(ms)
+ ::
+
+ rcu: hardirqs softirqs csw/system
+ rcu: number: 0 0 0
+ rcu: cputime: 0 0 0 ==> 2500(ms)
Because interrupts have been disabled throughout the measurement
interval, there are no interrupts and no context switches.
@@ -440,11 +442,11 @@ The following describes four typical scenarios:
This is similar to the previous example, but here both the number of
and the CPU time consumed by hard interrupts are non-zero, as is the CPU
- time consumed by in-kernel execution.::
+ time consumed by in-kernel execution::
- rcu: hardirqs softirqs csw/system
- rcu: number: 624 0 0
- rcu: cputime: 49 0 2446 ==> 2500(ms)
+ rcu: hardirqs softirqs csw/system
+ rcu: number: 624 0 0
+ rcu: cputime: 49 0 2446 ==> 2500(ms)
The fact that there are zero softirqs gives a hint that these were
disabled, perhaps via local_bh_disable(). It is of course possible
@@ -454,20 +456,22 @@ The following describes four typical scenarios:
3. A CPU looping with preemption disabled.
- Here, only the number of context switches is zero.::
+ Here, only the number of context switches is zero::
- rcu: hardirqs softirqs csw/system
- rcu: number: 624 45 0
- rcu: cputime: 69 1 2425 ==> 2500(ms)
+ rcu: hardirqs softirqs csw/system
+ rcu: number: 624 45 0
+ rcu: cputime: 69 1 2425 ==> 2500(ms)
This situation hints that the stalled CPU was looping with preemption
disabled.
-4. No looping, but massive hard and soft interrupts.::
+4. No looping, but massive hard and soft interrupts.
+
+ ::
- rcu: hardirqs softirqs csw/system
- rcu: number: xx xx 0
- rcu: cputime: xx xx 0 ==> 2500(ms)
+ rcu: hardirqs softirqs csw/system
+ rcu: number: xx xx 0
+ rcu: cputime: xx xx 0 ==> 2500(ms)
Here, the number and CPU time of hard interrupts are all non-zero,
but the number of context switches and the in-kernel CPU time consumed
--
2.31.1.189.g2e36527f23
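For reference, the reST rules in play here can be seen in a small
sketch:

	A paragraph introducing a literal block ends with two colons::

	  These lines form the literal block: they follow a blank line
	  and are indented relative to the introducing paragraph.

	This line has the same indent as the introducing paragraph, so
	it ends the literal block.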