2014-02-17 21:26:32

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/rcu 0/6] Documentation changes for 3.15

Hello!

This series provides a variety of documentation updates:

1. Update the documentation on actions to take to avoid frenetic
call_rcu() invocations from exhausting memory, including a mention
of internal-to-RCU avoidance measures.

2. Add a note to Documentation/memory-barriers.txt stating that
ACCESS_ONCE() provides cache coherence for accesses to any
single given variable.

3. Add an explicit statement that in control dependencies, the
condition must include the load in question.

4. Add WQ_SYSFS workqueues as a work-offloading option in
Documentation/kernel-per-CPU-kthreads.txt.

5. It turns out that some types of control dependencies require
memory barriers, most notably when the exact same store is
at the beginning of both the then-clause and the else-clause.
Document this.

6. Fixups to RTFP.txt

Thanx, Paul

------------------------------------------------------------------------

Documentation/memory-barriers.txt | 29 +++--
b/Documentation/RCU/RTFP.txt | 149 +++++++++++++++++++++++-----
b/Documentation/RCU/checklist.txt | 19 ++-
b/Documentation/kernel-per-CPU-kthreads.txt | 13 ++
b/Documentation/memory-barriers.txt | 17 +++
5 files changed, 189 insertions(+), 38 deletions(-)


2014-02-17 21:27:09

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/rcu 2/6] Documentation/memory-barriers.txt: ACCESS_ONCE() provides cache coherence

From: "Paul E. McKenney" <[email protected]>

The ACCESS_ONCE() primitive provides cache coherence, but the
documentation does not clearly state this. This commit therefore upgrades
the documentation.

Signed-off-by: Paul E. McKenney <[email protected]>
---
Documentation/memory-barriers.txt | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 102dc19c4119..ad6db1d48f1f 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1249,6 +1249,23 @@ The ACCESS_ONCE() function can prevent any number of optimizations that,
while perfectly safe in single-threaded code, can be fatal in concurrent
code. Here are some examples of these sorts of optimizations:

+ (*) The compiler is within its rights to reorder loads and stores
+ to the same variable, and in some cases, the CPU is within its
+ rights to reorder loads to the same variable. This means that
+ the following code:
+
+ a[0] = x;
+ a[1] = x;
+
+ Might result in an older value of x stored in a[1] than in a[0].
+ Prevent both the compiler and the CPU from doing this as follows:
+
+ a[0] = ACCESS_ONCE(x);
+ a[1] = ACCESS_ONCE(x);
+
+ In short, ACCESS_ONCE() provides "cache coherence" for accesses from
+ multiple CPUs to a single variable.
+
(*) The compiler is within its rights to merge successive loads from
the same variable. Such merging can cause the compiler to "optimize"
the following code:
--
1.8.1.5

2014-02-17 21:27:11

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies

From: "Paul E. McKenney" <[email protected]>

Current compilers can "speculate" stores in the case where both legs
of the "if" statement start with identical stores. Because the stores
are identical, the compiler knows that the store will unconditionally
execute regardless of the "if" condition, and so the compiler is within
its rights to hoist the store to precede the condition. Such hoisting
destroys the control-dependency ordering. This ordering can be restored
by placing a barrier() at the beginning of each leg of the "if" statement.
This commit adds this requirement to the control-dependencies section.

Signed-off-by: Paul E. McKenney <[email protected]>
---
Documentation/memory-barriers.txt | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index f2668c19807e..adfaca831a90 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -608,26 +608,30 @@ as follows:
b = p; /* BUG: Compiler can reorder!!! */
do_something();

-The solution is again ACCESS_ONCE(), which preserves the ordering between
-the load from variable 'a' and the store to variable 'b':
+The solution is again ACCESS_ONCE() and barrier(), which preserves the
+ordering between the load from variable 'a' and the store to variable 'b':

q = ACCESS_ONCE(a);
if (q) {
+ barrier();
ACCESS_ONCE(b) = p;
do_something();
} else {
+ barrier();
ACCESS_ONCE(b) = p;
do_something_else();
}

-You could also use barrier() to prevent the compiler from moving
-the stores to variable 'b', but barrier() would not prevent the
-compiler from proving to itself that a==1 always, so ACCESS_ONCE()
-is also needed.
+The initial ACCESS_ONCE() is required to prevent the compiler from
+proving the value of 'a', and the pair of barrier() invocations are
+required to prevent the compiler from pulling the two identical stores
+to 'b' out from the legs of the "if" statement.

It is important to note that control dependencies absolutely require a
a conditional. For example, the following "optimized" version of
-the above example breaks ordering:
+the above example breaks ordering, which is why the barrier() invocations
+are absolutely required if you have identical stores in both legs of
+the "if" statement:

q = ACCESS_ONCE(a);
ACCESS_ONCE(b) = p; /* BUG: No ordering vs. load from a!!! */
@@ -643,9 +647,11 @@ It is of course legal for the prior load to be part of the conditional,
for example, as follows:

if (ACCESS_ONCE(a) > 0) {
+ barrier();
ACCESS_ONCE(b) = q / 2;
do_something();
} else {
+ barrier();
ACCESS_ONCE(b) = q / 3;
do_something_else();
}
@@ -659,9 +665,11 @@ the needed conditional. For example:

q = ACCESS_ONCE(a);
if (q % MAX) {
+ barrier();
ACCESS_ONCE(b) = p;
do_something();
} else {
+ barrier();
ACCESS_ONCE(b) = p;
do_something_else();
}
@@ -723,6 +731,10 @@ In summary:
use smb_rmb(), smp_wmb(), or, in the case of prior stores and
later loads, smp_mb().

+ (*) If both legs of the "if" statement begin with identical stores
+ to the same variable, a barrier() statement is required at the
+ beginning of each leg of the "if" statement.
+
(*) Control dependencies require at least one run-time conditional
between the prior load and the subsequent store, and this
conditional must involve the prior load. If the compiler
--
1.8.1.5

2014-02-17 21:27:07

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/rcu 4/6] Documentation/kernel-per-CPU-kthreads.txt: Workqueue affinity

From: "Paul E. McKenney" <[email protected]>

This commit documents the ability to apply CPU affinity to WQ_SYSFS
workqueues, thus offloading them from the desired worker CPUs.

Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Tejun Heo <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Lai Jiangshan <[email protected]>
---
Documentation/kernel-per-CPU-kthreads.txt | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt
index 827104fb9364..f3cd299fcc41 100644
--- a/Documentation/kernel-per-CPU-kthreads.txt
+++ b/Documentation/kernel-per-CPU-kthreads.txt
@@ -162,7 +162,18 @@ Purpose: Execute workqueue requests
To reduce its OS jitter, do any of the following:
1. Run your workload at a real-time priority, which will allow
preempting the kworker daemons.
-2. Do any of the following needed to avoid jitter that your
+2. A given workqueue can be made visible in the sysfs filesystem
+ by passing the WQ_SYSFS to that workqueue's alloc_workqueue().
+ Such a workqueue can be confined to a given subset of the
+ CPUs using the /sys/devices/virtual/workqueue/*/cpumask sysfs
+ files. The set of WQ_SYSFS workqueues can be displayed using
+ "ls sys/devices/virtual/workqueue". That said, the workqueues
+ maintainer would like to caution people against indiscriminately
+ sprinkling WQ_SYSFS across all the workqueues. The reason for
+ caution is that it is easy to add WQ_SYSFS, but because sysfs is
+ part of the formal user/kernel API, it can be nearly impossible
+ to remove it, even if its addition was a mistake.
+3. Do any of the following needed to avoid jitter that your
application cannot tolerate:
a. Build your kernel with CONFIG_SLUB=y rather than
CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
--
1.8.1.5

2014-02-17 21:27:05

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations

From: "Paul E. McKenney" <[email protected]>

The call_rcu() family of primitives will take action to accelerate
grace periods when the number of callbacks pending on a given CPU
becomes excessive. Although this safety mechanism can be useful,
it is no substitute for users of call_rcu() having rate-limit controls
in place. This commit adds this nuance to the documentation.

Reported-by: "Michael S. Tsirkin" <[email protected]>
Reported-by: Gleb Natapov <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
Documentation/RCU/checklist.txt | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 91266193b8f4..5733e31836b5 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -256,10 +256,11 @@ over a rather long period of time, but improvements are always welcome!
variations on this theme.

b. Limiting update rate. For example, if updates occur only
- once per hour, then no explicit rate limiting is required,
- unless your system is already badly broken. The dcache
- subsystem takes this approach -- updates are guarded
- by a global lock, limiting their rate.
+ once per hour, then no explicit rate limiting is
+ required, unless your system is already badly broken.
+ Older versions of the dcache subsystem takes this
+ approach -- updates were guarded by a global lock,
+ limiting their rate.

c. Trusted update -- if updates can only be done manually by
superuser or some other trusted user, then it might not
@@ -268,7 +269,8 @@ over a rather long period of time, but improvements are always welcome!
the machine.

d. Use call_rcu_bh() rather than call_rcu(), in order to take
- advantage of call_rcu_bh()'s faster grace periods.
+ advantage of call_rcu_bh()'s faster grace periods. (This
+ is only a partial solution, though.)

e. Periodically invoke synchronize_rcu(), permitting a limited
number of updates per grace period.
@@ -276,6 +278,13 @@ over a rather long period of time, but improvements are always welcome!
The same cautions apply to call_rcu_bh(), call_rcu_sched(),
call_srcu(), and kfree_rcu().

+ Note that although these primitives do take action to avoid memory
+ exhaustion when any given CPU has too many callbacks, a determined
+ user could still exhaust memory. This is especially the case
+ if a system with a large number of CPUs has been configured to
+ offload all of its RCU callbacks onto a single CPU, or if the
+ system has relatively little free memory.
+
9. All RCU list-traversal primitives, which include
rcu_dereference(), list_for_each_entry_rcu(), and
list_for_each_safe_rcu(), must be either within an RCU read-side
--
1.8.1.5

2014-02-17 21:28:13

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/rcu 3/6] Documentation/memory-barriers.txt: Conditional must use prior load

From: "Paul E. McKenney" <[email protected]>

A control dependency consists of a load, a conditional that depends on
that load, and a store. This commit emphasizes this point in the
summary.

Signed-off-by: Paul E. McKenney <[email protected]>
---
Documentation/memory-barriers.txt | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index ad6db1d48f1f..f2668c19807e 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -724,7 +724,8 @@ In summary:
later loads, smp_mb().

(*) Control dependencies require at least one run-time conditional
- between the prior load and the subsequent store. If the compiler
+ between the prior load and the subsequent store, and this
+ conditional must involve the prior load. If the compiler
is able to optimize the conditional away, it will have also
optimized away the ordering. Careful use of ACCESS_ONCE() can
help to preserve the needed conditional.
--
1.8.1.5

2014-02-17 21:28:11

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/rcu 6/6] documentation: Fix some inconsistencies in RTFP.txt

From: "Paul E. McKenney" <[email protected]>

Some of the early history leaves out some citations and vice versa. This
commit fixes these up.

Signed-off-by: Paul E. McKenney <[email protected]>
---
Documentation/RCU/RTFP.txt | 149 +++++++++++++++++++++++++++++++++++++--------
1 file changed, 125 insertions(+), 24 deletions(-)

diff --git a/Documentation/RCU/RTFP.txt b/Documentation/RCU/RTFP.txt
index 273e654d7d08..2f0fcb2112d2 100644
--- a/Documentation/RCU/RTFP.txt
+++ b/Documentation/RCU/RTFP.txt
@@ -31,6 +31,14 @@ has lapsed, so this approach may be used in non-GPL software, if desired.
(In contrast, implementation of RCU is permitted only in software licensed
under either GPL or LGPL. Sorry!!!)

+In 1987, Rashid et al. described lazy TLB-flush [RichardRashid87a].
+At first glance, this has nothing to do with RCU, but nevertheless
+this paper helped inspire the update-side batching used in the later
+RCU implementation in DYNIX/ptx. In 1988, Barbara Liskov published
+a description of Argus that noted that use of out-of-date values can
+be tolerated in some situations. Thus, this paper provides some early
+theoretical justification for use of stale data.
+
In 1990, Pugh [Pugh90] noted that explicitly tracking which threads
were reading a given data structure permitted deferred free to operate
in the presence of non-terminating threads. However, this explicit
@@ -41,11 +49,11 @@ providing a fine-grained locking design, however, it would be interesting
to see how much of the performance advantage reported in 1990 remains
today.

-At about this same time, Adams [Adams91] described ``chaotic relaxation'',
-where the normal barriers between successive iterations of convergent
-numerical algorithms are relaxed, so that iteration $n$ might use
-data from iteration $n-1$ or even $n-2$. This introduces error,
-which typically slows convergence and thus increases the number of
+At about this same time, Andrews [Andrews91textbook] described ``chaotic
+relaxation'', where the normal barriers between successive iterations
+of convergent numerical algorithms are relaxed, so that iteration $n$
+might use data from iteration $n-1$ or even $n-2$. This introduces
+error, which typically slows convergence and thus increases the number of
iterations required. However, this increase is sometimes more than made
up for by a reduction in the number of expensive barrier operations,
which are otherwise required to synchronize the threads at the end
@@ -55,7 +63,8 @@ is thus inapplicable to most data structures in operating-system kernels.

In 1992, Henry (now Alexia) Massalin completed a dissertation advising
parallel programmers to defer processing when feasible to simplify
-synchronization. RCU makes extremely heavy use of this advice.
+synchronization [HMassalinPhD]. RCU makes extremely heavy use of
+this advice.

In 1993, Jacobson [Jacobson93] verbally described what is perhaps the
simplest deferred-free technique: simply waiting a fixed amount of time
@@ -90,27 +99,29 @@ mechanism, which is quite similar to RCU [Gamsa99]. These operating
systems made pervasive use of RCU in place of "existence locks", which
greatly simplifies locking hierarchies and helps avoid deadlocks.

-2001 saw the first RCU presentation involving Linux [McKenney01a]
-at OLS. The resulting abundance of RCU patches was presented the
-following year [McKenney02a], and use of RCU in dcache was first
-described that same year [Linder02a].
+The year 2000 saw an email exchange that would likely have
+led to yet another independent invention of something like RCU
+[RustyRussell2000a,RustyRussell2000b]. Instead, 2001 saw the first
+RCU presentation involving Linux [McKenney01a] at OLS. The resulting
+abundance of RCU patches was presented the following year [McKenney02a],
+and use of RCU in dcache was first described that same year [Linder02a].

Also in 2002, Michael [Michael02b,Michael02a] presented "hazard-pointer"
techniques that defer the destruction of data structures to simplify
non-blocking synchronization (wait-free synchronization, lock-free
synchronization, and obstruction-free synchronization are all examples of
-non-blocking synchronization). In particular, this technique eliminates
-locking, reduces contention, reduces memory latency for readers, and
-parallelizes pipeline stalls and memory latency for writers. However,
-these techniques still impose significant read-side overhead in the
-form of memory barriers. Researchers at Sun worked along similar lines
-in the same timeframe [HerlihyLM02]. These techniques can be thought
-of as inside-out reference counts, where the count is represented by the
-number of hazard pointers referencing a given data structure rather than
-the more conventional counter field within the data structure itself.
-The key advantage of inside-out reference counts is that they can be
-stored in immortal variables, thus allowing races between access and
-deletion to be avoided.
+non-blocking synchronization). The corresponding journal article appeared
+in 2004 [MagedMichael04a]. This technique eliminates locking, reduces
+contention, reduces memory latency for readers, and parallelizes pipeline
+stalls and memory latency for writers. However, these techniques still
+impose significant read-side overhead in the form of memory barriers.
+Researchers at Sun worked along similar lines in the same timeframe
+[HerlihyLM02]. These techniques can be thought of as inside-out reference
+counts, where the count is represented by the number of hazard pointers
+referencing a given data structure rather than the more conventional
+counter field within the data structure itself. The key advantage
+of inside-out reference counts is that they can be stored in immortal
+variables, thus allowing races between access and deletion to be avoided.

By the same token, RCU can be thought of as a "bulk reference count",
where some form of reference counter covers all reference by a given CPU
@@ -123,8 +134,10 @@ can be thought of in other terms as well.

In 2003, the K42 group described how RCU could be used to create
hot-pluggable implementations of operating-system functions [Appavoo03a].
-Later that year saw a paper describing an RCU implementation of System
-V IPC [Arcangeli03], and an introduction to RCU in Linux Journal
+Later that year saw a paper describing an RCU implementation
+of System V IPC [Arcangeli03] (following up on a suggestion by
+Hugh Dickins [Dickins02a] and an implementation by Mingming Cao
+[MingmingCao2002IPCRCU]), and an introduction to RCU in Linux Journal
[McKenney03a].

2004 has seen a Linux-Journal article on use of RCU in dcache
@@ -383,6 +396,21 @@ for Programming Languages and Operating Systems}"
}
}

+@phdthesis{HMassalinPhD
+,author="H. Massalin"
+,title="Synthesis: An Efficient Implementation of Fundamental Operating
+System Services"
+,school="Columbia University"
+,address="New York, NY"
+,year="1992"
+,annotation={
+ Mondo optimizing compiler.
+ Wait-free stuff.
+ Good advice: defer work to avoid synchronization. See page 90
+ (PDF page 106), Section 5.4, fourth bullet point.
+}
+}
+
@unpublished{Jacobson93
,author="Van Jacobson"
,title="Avoid Read-Side Locking Via Delayed Free"
@@ -671,6 +699,20 @@ Orran Krieger and Rusty Russell and Dipankar Sarma and Maneesh Soni"
[Viewed October 18, 2004]"
}

+@conference{Michael02b
+,author="Maged M. Michael"
+,title="High Performance Dynamic Lock-Free Hash Tables and List-Based Sets"
+,Year="2002"
+,Month="August"
+,booktitle="{Proceedings of the 14\textsuperscript{th} Annual ACM
+Symposium on Parallel
+Algorithms and Architecture}"
+,pages="73-82"
+,annotation={
+Like the title says...
+}
+}
+
@Conference{Linder02a
,Author="Hanna Linder and Dipankar Sarma and Maneesh Soni"
,Title="Scalability of the Directory Entry Cache"
@@ -727,6 +769,24 @@ Andrea Arcangeli and Andi Kleen and Orran Krieger and Rusty Russell"
}
}

+@conference{Michael02a
+,author="Maged M. Michael"
+,title="Safe Memory Reclamation for Dynamic Lock-Free Objects Using Atomic
+Reads and Writes"
+,Year="2002"
+,Month="August"
+,booktitle="{Proceedings of the 21\textsuperscript{st} Annual ACM
+Symposium on Principles of Distributed Computing}"
+,pages="21-30"
+,annotation={
+ Each thread keeps an array of pointers to items that it is
+ currently referencing. Sort of an inside-out garbage collection
+ mechanism, but one that requires the accessing code to explicitly
+ state its needs. Also requires read-side memory barriers on
+ most architectures.
+}
+}
+
@unpublished{Dickins02a
,author="Hugh Dickins"
,title="Use RCU for System-V IPC"
@@ -735,6 +795,17 @@ Andrea Arcangeli and Andi Kleen and Orran Krieger and Rusty Russell"
,note="private communication"
}

+@InProceedings{HerlihyLM02
+,author={Maurice Herlihy and Victor Luchangco and Mark Moir}
+,title="The Repeat Offender Problem: A Mechanism for Supporting Dynamic-Sized,
+Lock-Free Data Structures"
+,booktitle={Proceedings of 16\textsuperscript{th} International
+Symposium on Distributed Computing}
+,year=2002
+,month="October"
+,pages="339-353"
+}
+
@unpublished{Sarma02b
,Author="Dipankar Sarma"
,Title="Some dcache\_rcu benchmark numbers"
@@ -749,6 +820,19 @@ Andrea Arcangeli and Andi Kleen and Orran Krieger and Rusty Russell"
}
}

+@unpublished{MingmingCao2002IPCRCU
+,Author="Mingming Cao"
+,Title="[PATCH]updated ipc lock patch"
+,month="October"
+,year="2002"
+,note="Available:
+\url{https://lkml.org/lkml/2002/10/24/262}
+[Viewed February 15, 2014]"
+,annotation={
+ Mingming Cao's patch to introduce RCU to SysV IPC.
+}
+}
+
@unpublished{LinusTorvalds2003a
,Author="Linus Torvalds"
,Title="Re: {[PATCH]} small fixes in brlock.h"
@@ -982,6 +1066,23 @@ Realtime Applications"
}
}

+@article{MagedMichael04a
+,author="Maged M. Michael"
+,title="Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects"
+,Year="2004"
+,Month="June"
+,journal="IEEE Transactions on Parallel and Distributed Systems"
+,volume="15"
+,number="6"
+,pages="491-504"
+,url="Available:
+\url{http://www.research.ibm.com/people/m/michael/ieeetpds-2004.pdf}
+[Viewed March 1, 2005]"
+,annotation={
+ New canonical hazard-pointer citation.
+}
+}
+
@phdthesis{PaulEdwardMcKenneyPhD
,author="Paul E. McKenney"
,title="Exploiting Deferred Destruction:
--
1.8.1.5

2014-02-17 21:40:18

by Josh Triplett

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations

On Mon, Feb 17, 2014 at 01:26:48PM -0800, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <[email protected]>
>
> The call_rcu() family of primitives will take action to accelerate
> grace periods when the number of callbacks pending on a given CPU
> becomes excessive. Although this safety mechanism can be useful,
> it is no substitute for users of call_rcu() having rate-limit controls
> in place. This commit adds this nuance to the documentation.
>
> Reported-by: "Michael S. Tsirkin" <[email protected]>
> Reported-by: Gleb Natapov <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>

Grammatical nit below; otherwise:
Reviewed-by: Josh Triplett <[email protected]>

> Documentation/RCU/checklist.txt | 19 ++++++++++++++-----
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
> index 91266193b8f4..5733e31836b5 100644
> --- a/Documentation/RCU/checklist.txt
> +++ b/Documentation/RCU/checklist.txt
> @@ -256,10 +256,11 @@ over a rather long period of time, but improvements are always welcome!
> variations on this theme.
>
> b. Limiting update rate. For example, if updates occur only
> - once per hour, then no explicit rate limiting is required,
> - unless your system is already badly broken. The dcache
> - subsystem takes this approach -- updates are guarded
> - by a global lock, limiting their rate.
> + once per hour, then no explicit rate limiting is
> + required, unless your system is already badly broken.
> + Older versions of the dcache subsystem takes this
> + approach -- updates were guarded by a global lock,
> + limiting their rate.

s/takes/take/ to match the change from the singular "The dcache
subsystem" to the plural "Older versions of the dcache subsystem"

(You might also change " -- updates are guarded by" to ", guarding
updates with".)

>
> c. Trusted update -- if updates can only be done manually by
> superuser or some other trusted user, then it might not
> @@ -268,7 +269,8 @@ over a rather long period of time, but improvements are always welcome!
> the machine.
>
> d. Use call_rcu_bh() rather than call_rcu(), in order to take
> - advantage of call_rcu_bh()'s faster grace periods.
> + advantage of call_rcu_bh()'s faster grace periods. (This
> + is only a partial solution, though.)
>
> e. Periodically invoke synchronize_rcu(), permitting a limited
> number of updates per grace period.
> @@ -276,6 +278,13 @@ over a rather long period of time, but improvements are always welcome!
> The same cautions apply to call_rcu_bh(), call_rcu_sched(),
> call_srcu(), and kfree_rcu().
>
> + Note that although these primitives do take action to avoid memory
> + exhaustion when any given CPU has too many callbacks, a determined
> + user could still exhaust memory. This is especially the case
> + if a system with a large number of CPUs has been configured to
> + offload all of its RCU callbacks onto a single CPU, or if the
> + system has relatively little free memory.
> +
> 9. All RCU list-traversal primitives, which include
> rcu_dereference(), list_for_each_entry_rcu(), and
> list_for_each_safe_rcu(), must be either within an RCU read-side
> --
> 1.8.1.5
>

2014-02-17 21:41:37

by Josh Triplett

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 2/6] Documentation/memory-barriers.txt: ACCESS_ONCE() provides cache coherence

On Mon, Feb 17, 2014 at 01:26:49PM -0800, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <[email protected]>
>
> The ACCESS_ONCE() primitive provides cache coherence, but the
> documentation does not clearly state this. This commit therefore upgrades
> the documentation.
>
> Signed-off-by: Paul E. McKenney <[email protected]>

Punctuation nit below; otherwise:
Reviewed-by: Josh Triplett <[email protected]>

> Documentation/memory-barriers.txt | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 102dc19c4119..ad6db1d48f1f 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1249,6 +1249,23 @@ The ACCESS_ONCE() function can prevent any number of optimizations that,
> while perfectly safe in single-threaded code, can be fatal in concurrent
> code. Here are some examples of these sorts of optimizations:
>
> + (*) The compiler is within its rights to reorder loads and stores
> + to the same variable, and in some cases, the CPU is within its
> + rights to reorder loads to the same variable. This means that
> + the following code:
> +
> + a[0] = x;
> + a[1] = x;
> +
> + Might result in an older value of x stored in a[1] than in a[0].
> + Prevent both the compiler and the CPU from doing this as follows:
> +
> + a[0] = ACCESS_ONCE(x);
> + a[1] = ACCESS_ONCE(x);
> +
> + In short, ACCESS_ONCE() provides "cache coherence" for accesses from
> + multiple CPUs to a single variable.

You don't need to "quote" the well-established term "cache coherence".

> (*) The compiler is within its rights to merge successive loads from
> the same variable. Such merging can cause the compiler to "optimize"
> the following code:
> --
> 1.8.1.5
>

2014-02-17 21:46:52

by Josh Triplett

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies

On Mon, Feb 17, 2014 at 01:26:52PM -0800, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <[email protected]>
>
> Current compilers can "speculate" stores in the case where both legs
> of the "if" statement start with identical stores. Because the stores
> are identical, the compiler knows that the store will unconditionally
> execute regardless of the "if" condition, and so the compiler is within
> its rights to hoist the store to precede the condition. Such hoisting
> destroys the control-dependency ordering. This ordering can be restored
> by placing a barrier() at the beginning of each leg of the "if" statement.
> This commit adds this requirement to the control-dependencies section.
>
> Signed-off-by: Paul E. McKenney <[email protected]>

This is starting to become a rather unreasonable level of fighting the
compiler. ("Threads cannot be implemented as a library" indeed.) This
doesn't seem like a reasonable thing to require users to do. Is there
really no other way to cope with this particular bit of "help" from the
compiler?

> Documentation/memory-barriers.txt | 26 +++++++++++++++++++-------
> 1 file changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index f2668c19807e..adfaca831a90 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -608,26 +608,30 @@ as follows:
> b = p; /* BUG: Compiler can reorder!!! */
> do_something();
>
> -The solution is again ACCESS_ONCE(), which preserves the ordering between
> -the load from variable 'a' and the store to variable 'b':
> +The solution is again ACCESS_ONCE() and barrier(), which preserves the
> +ordering between the load from variable 'a' and the store to variable 'b':
>
> q = ACCESS_ONCE(a);
> if (q) {
> + barrier();
> ACCESS_ONCE(b) = p;
> do_something();
> } else {
> + barrier();
> ACCESS_ONCE(b) = p;
> do_something_else();
> }
>
> -You could also use barrier() to prevent the compiler from moving
> -the stores to variable 'b', but barrier() would not prevent the
> -compiler from proving to itself that a==1 always, so ACCESS_ONCE()
> -is also needed.
> +The initial ACCESS_ONCE() is required to prevent the compiler from
> +proving the value of 'a', and the pair of barrier() invocations are
> +required to prevent the compiler from pulling the two identical stores
> +to 'b' out from the legs of the "if" statement.
>
> It is important to note that control dependencies absolutely require a
> a conditional. For example, the following "optimized" version of
> -the above example breaks ordering:
> +the above example breaks ordering, which is why the barrier() invocations
> +are absolutely required if you have identical stores in both legs of
> +the "if" statement:
>
> q = ACCESS_ONCE(a);
> ACCESS_ONCE(b) = p; /* BUG: No ordering vs. load from a!!! */
> @@ -643,9 +647,11 @@ It is of course legal for the prior load to be part of the conditional,
> for example, as follows:
>
> if (ACCESS_ONCE(a) > 0) {
> + barrier();
> ACCESS_ONCE(b) = q / 2;
> do_something();
> } else {
> + barrier();
> ACCESS_ONCE(b) = q / 3;
> do_something_else();
> }
> @@ -659,9 +665,11 @@ the needed conditional. For example:
>
> q = ACCESS_ONCE(a);
> if (q % MAX) {
> + barrier();
> ACCESS_ONCE(b) = p;
> do_something();
> } else {
> + barrier();
> ACCESS_ONCE(b) = p;
> do_something_else();
> }
> @@ -723,6 +731,10 @@ In summary:
> use smb_rmb(), smp_wmb(), or, in the case of prior stores and
> later loads, smp_mb().
>
> + (*) If both legs of the "if" statement begin with identical stores
> + to the same variable, a barrier() statement is required at the
> + beginning of each leg of the "if" statement.
> +
> (*) Control dependencies require at least one run-time conditional
> between the prior load and the subsequent store, and this
> conditional must involve the prior load. If the compiler
> --
> 1.8.1.5
>

2014-02-17 21:47:27

by Josh Triplett

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 0/6] Documentation changes for 3.15

On Mon, Feb 17, 2014 at 01:26:25PM -0800, Paul E. McKenney wrote:
> Hello!
>
> This series provides a variety of documentation updates:
>
> 1. Update the documentation on actions to take to avoid frenetic
> call_rcu() invocations from exhausting memory, including a mention
> of internal-to-RCU avoidance measures.
>
> 2. Add a note to Documentation/memory-barriers.txt stating that
> ACCESS_ONCE() provides cache coherence for accesses to any
> single given variable.
>
> 3. Add an explicit statement that in control dependencies, the
> condition must include the load in question.
>
> 4. Add WQ_SYSFS workqueues as a work-offloading option in
> Documentation/kernel-per-CPU-kthreads.txt.
>
> 5. It turns out that some types of control dependencies require
> memory barriers, most notably when the exact same store is
> at the beginning of both the then-clause and the else-clause.
> Document this.
>
> 6. Fixups to RTFP.txt

I've replied to 1, 2, and 5 with comments. For 3, 4, and 6:
Reviewed-by: Josh Triplett <[email protected]>

2014-02-17 22:52:37

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 1/6] documentation: Document call_rcu() safety mechanisms and limitations

On Mon, Feb 17, 2014 at 01:39:30PM -0800, Josh Triplett wrote:
> On Mon, Feb 17, 2014 at 01:26:48PM -0800, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <[email protected]>
> >
> > The call_rcu() family of primitives will take action to accelerate
> > grace periods when the number of callbacks pending on a given CPU
> > becomes excessive. Although this safety mechanism can be useful,
> > it is no substitute for users of call_rcu() having rate-limit controls
> > in place. This commit adds this nuance to the documentation.
> >
> > Reported-by: "Michael S. Tsirkin" <[email protected]>
> > Reported-by: Gleb Natapov <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
>
> Grammatical nit below; otherwise:
> Reviewed-by: Josh Triplett <[email protected]>
>
> > Documentation/RCU/checklist.txt | 19 ++++++++++++++-----
> > 1 file changed, 14 insertions(+), 5 deletions(-)
> >
> > diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
> > index 91266193b8f4..5733e31836b5 100644
> > --- a/Documentation/RCU/checklist.txt
> > +++ b/Documentation/RCU/checklist.txt
> > @@ -256,10 +256,11 @@ over a rather long period of time, but improvements are always welcome!
> > variations on this theme.
> >
> > b. Limiting update rate. For example, if updates occur only
> > - once per hour, then no explicit rate limiting is required,
> > - unless your system is already badly broken. The dcache
> > - subsystem takes this approach -- updates are guarded
> > - by a global lock, limiting their rate.
> > + once per hour, then no explicit rate limiting is
> > + required, unless your system is already badly broken.
> > + Older versions of the dcache subsystem takes this
> > + approach -- updates were guarded by a global lock,
> > + limiting their rate.
>
> s/takes/take/ to match the change from the singular "The dcache
> subsystem" to the plural "Older versions of the dcache subsystem"
>
> (You might also change " -- updates are guarded by" to ", guarding
> updates with".)

Took both suggested changes and applied your Reviewed-by. Thank you!

Thanx, Paul

> > c. Trusted update -- if updates can only be done manually by
> > superuser or some other trusted user, then it might not
> > @@ -268,7 +269,8 @@ over a rather long period of time, but improvements are always welcome!
> > the machine.
> >
> > d. Use call_rcu_bh() rather than call_rcu(), in order to take
> > - advantage of call_rcu_bh()'s faster grace periods.
> > + advantage of call_rcu_bh()'s faster grace periods. (This
> > + is only a partial solution, though.)
> >
> > e. Periodically invoke synchronize_rcu(), permitting a limited
> > number of updates per grace period.
> > @@ -276,6 +278,13 @@ over a rather long period of time, but improvements are always welcome!
> > The same cautions apply to call_rcu_bh(), call_rcu_sched(),
> > call_srcu(), and kfree_rcu().
> >
> > + Note that although these primitives do take action to avoid memory
> > + exhaustion when any given CPU has too many callbacks, a determined
> > + user could still exhaust memory. This is especially the case
> > + if a system with a large number of CPUs has been configured to
> > + offload all of its RCU callbacks onto a single CPU, or if the
> > + system has relatively little free memory.
> > +
> > 9. All RCU list-traversal primitives, which include
> > rcu_dereference(), list_for_each_entry_rcu(), and
> > list_for_each_safe_rcu(), must be either within an RCU read-side
> > --
> > 1.8.1.5
> >
>

2014-02-17 22:53:00

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 2/6] Documentation/memory-barriers.txt: ACCESS_ONCE() provides cache coherence

On Mon, Feb 17, 2014 at 01:40:56PM -0800, Josh Triplett wrote:
> On Mon, Feb 17, 2014 at 01:26:49PM -0800, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <[email protected]>
> >
> > The ACCESS_ONCE() primitive provides cache coherence, but the
> > documentation does not clearly state this. This commit therefore upgrades
> > the documentation.
> >
> > Signed-off-by: Paul E. McKenney <[email protected]>
>
> Punctuation nit below; otherwise:
> Reviewed-by: Josh Triplett <[email protected]>
>
> > Documentation/memory-barriers.txt | 17 +++++++++++++++++
> > 1 file changed, 17 insertions(+)
> >
> > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> > index 102dc19c4119..ad6db1d48f1f 100644
> > --- a/Documentation/memory-barriers.txt
> > +++ b/Documentation/memory-barriers.txt
> > @@ -1249,6 +1249,23 @@ The ACCESS_ONCE() function can prevent any number of optimizations that,
> > while perfectly safe in single-threaded code, can be fatal in concurrent
> > code. Here are some examples of these sorts of optimizations:
> >
> > + (*) The compiler is within its rights to reorder loads and stores
> > + to the same variable, and in some cases, the CPU is within its
> > + rights to reorder loads to the same variable. This means that
> > + the following code:
> > +
> > + a[0] = x;
> > + a[1] = x;
> > +
> > + Might result in an older value of x stored in a[1] than in a[0].
> > + Prevent both the compiler and the CPU from doing this as follows:
> > +
> > + a[0] = ACCESS_ONCE(x);
> > + a[1] = ACCESS_ONCE(x);
> > +
> > + In short, ACCESS_ONCE() provides "cache coherence" for accesses from
> > + multiple CPUs to a single variable.
>
> You don't need to "quote" the well-established term "cache coherence".

Good point, fixed and applied your Reviewed-by, thank you!

Thanx, Paul

> > (*) The compiler is within its rights to merge successive loads from
> > the same variable. Such merging can cause the compiler to "optimize"
> > the following code:
> > --
> > 1.8.1.5
> >
>

2014-02-17 22:58:22

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies

On Mon, Feb 17, 2014 at 01:46:06PM -0800, Josh Triplett wrote:
> On Mon, Feb 17, 2014 at 01:26:52PM -0800, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <[email protected]>
> >
> > Current compilers can "speculate" stores in the case where both legs
> > of the "if" statement start with identical stores. Because the stores
> > are identical, the compiler knows that the store will unconditionally
> > execute regardless of the "if" condition, and so the compiler is within
> > its rights to hoist the store to precede the condition. Such hoisting
> > destroys the control-dependency ordering. This ordering can be restored
> > by placing a barrier() at the beginning of each leg of the "if" statement.
> > This commit adds this requirement to the control-dependencies section.
> >
> > Signed-off-by: Paul E. McKenney <[email protected]>
>
> This is starting to become a rather unreasonable level of fighting the
> compiler. ("Threads cannot be implemented as a library" indeed.) This
> doesn't seem like a reasonable thing to require users to do. Is there
> really no other way to cope with this particular bit of "help" from the
> compiler?

Well, we could use smp_mb() instead of barrier(), but that was the
sort of thing that Peter Zijlstra was trying to avoid. That said,
I do sympathize completely with your position here -- it is just that
it is better to have our compiler-fights documented that not, right?

Thanx, Paul

> > Documentation/memory-barriers.txt | 26 +++++++++++++++++++-------
> > 1 file changed, 19 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> > index f2668c19807e..adfaca831a90 100644
> > --- a/Documentation/memory-barriers.txt
> > +++ b/Documentation/memory-barriers.txt
> > @@ -608,26 +608,30 @@ as follows:
> > b = p; /* BUG: Compiler can reorder!!! */
> > do_something();
> >
> > -The solution is again ACCESS_ONCE(), which preserves the ordering between
> > -the load from variable 'a' and the store to variable 'b':
> > +The solution is again ACCESS_ONCE() and barrier(), which preserves the
> > +ordering between the load from variable 'a' and the store to variable 'b':
> >
> > q = ACCESS_ONCE(a);
> > if (q) {
> > + barrier();
> > ACCESS_ONCE(b) = p;
> > do_something();
> > } else {
> > + barrier();
> > ACCESS_ONCE(b) = p;
> > do_something_else();
> > }
> >
> > -You could also use barrier() to prevent the compiler from moving
> > -the stores to variable 'b', but barrier() would not prevent the
> > -compiler from proving to itself that a==1 always, so ACCESS_ONCE()
> > -is also needed.
> > +The initial ACCESS_ONCE() is required to prevent the compiler from
> > +proving the value of 'a', and the pair of barrier() invocations are
> > +required to prevent the compiler from pulling the two identical stores
> > +to 'b' out from the legs of the "if" statement.
> >
> > It is important to note that control dependencies absolutely require a
> > a conditional. For example, the following "optimized" version of
> > -the above example breaks ordering:
> > +the above example breaks ordering, which is why the barrier() invocations
> > +are absolutely required if you have identical stores in both legs of
> > +the "if" statement:
> >
> > q = ACCESS_ONCE(a);
> > ACCESS_ONCE(b) = p; /* BUG: No ordering vs. load from a!!! */
> > @@ -643,9 +647,11 @@ It is of course legal for the prior load to be part of the conditional,
> > for example, as follows:
> >
> > if (ACCESS_ONCE(a) > 0) {
> > + barrier();
> > ACCESS_ONCE(b) = q / 2;
> > do_something();
> > } else {
> > + barrier();
> > ACCESS_ONCE(b) = q / 3;
> > do_something_else();
> > }
> > @@ -659,9 +665,11 @@ the needed conditional. For example:
> >
> > q = ACCESS_ONCE(a);
> > if (q % MAX) {
> > + barrier();
> > ACCESS_ONCE(b) = p;
> > do_something();
> > } else {
> > + barrier();
> > ACCESS_ONCE(b) = p;
> > do_something_else();
> > }
> > @@ -723,6 +731,10 @@ In summary:
> > use smb_rmb(), smp_wmb(), or, in the case of prior stores and
> > later loads, smp_mb().
> >
> > + (*) If both legs of the "if" statement begin with identical stores
> > + to the same variable, a barrier() statement is required at the
> > + beginning of each leg of the "if" statement.
> > +
> > (*) Control dependencies require at least one run-time conditional
> > between the prior load and the subsequent store, and this
> > conditional must involve the prior load. If the compiler
> > --
> > 1.8.1.5
> >
>

2014-02-18 00:03:04

by Josh Triplett

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies

On Mon, Feb 17, 2014 at 02:58:16PM -0800, Paul E. McKenney wrote:
> On Mon, Feb 17, 2014 at 01:46:06PM -0800, Josh Triplett wrote:
> > On Mon, Feb 17, 2014 at 01:26:52PM -0800, Paul E. McKenney wrote:
> > > From: "Paul E. McKenney" <[email protected]>
> > >
> > > Current compilers can "speculate" stores in the case where both legs
> > > of the "if" statement start with identical stores. Because the stores
> > > are identical, the compiler knows that the store will unconditionally
> > > execute regardless of the "if" condition, and so the compiler is within
> > > its rights to hoist the store to precede the condition. Such hoisting
> > > destroys the control-dependency ordering. This ordering can be restored
> > > by placing a barrier() at the beginning of each leg of the "if" statement.
> > > This commit adds this requirement to the control-dependencies section.
> > >
> > > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > This is starting to become a rather unreasonable level of fighting the
> > compiler. ("Threads cannot be implemented as a library" indeed.) This
> > doesn't seem like a reasonable thing to require users to do. Is there
> > really no other way to cope with this particular bit of "help" from the
> > compiler?
>
> Well, we could use smp_mb() instead of barrier(), but that was the
> sort of thing that Peter Zijlstra was trying to avoid.

Yeah, that's not an improvement. The goal would be to make the code no
more complex than it already needs to be with ACCESS_ONCE; changing
"barrier()" to something else doesn't help (quite apart from smp_mb()
being suboptimal).

> That said, I do sympathize completely with your position here -- it is
> just that it is better to have our compiler-fights documented that
> not, right?

Sure, better to document them, but better still to not have them. Is
there some other way we could avoid this one entirely?

- Josh Triplett

> > > Documentation/memory-barriers.txt | 26 +++++++++++++++++++-------
> > > 1 file changed, 19 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> > > index f2668c19807e..adfaca831a90 100644
> > > --- a/Documentation/memory-barriers.txt
> > > +++ b/Documentation/memory-barriers.txt
> > > @@ -608,26 +608,30 @@ as follows:
> > > b = p; /* BUG: Compiler can reorder!!! */
> > > do_something();
> > >
> > > -The solution is again ACCESS_ONCE(), which preserves the ordering between
> > > -the load from variable 'a' and the store to variable 'b':
> > > +The solution is again ACCESS_ONCE() and barrier(), which preserves the
> > > +ordering between the load from variable 'a' and the store to variable 'b':
> > >
> > > q = ACCESS_ONCE(a);
> > > if (q) {
> > > + barrier();
> > > ACCESS_ONCE(b) = p;
> > > do_something();
> > > } else {
> > > + barrier();
> > > ACCESS_ONCE(b) = p;
> > > do_something_else();
> > > }
> > >
> > > -You could also use barrier() to prevent the compiler from moving
> > > -the stores to variable 'b', but barrier() would not prevent the
> > > -compiler from proving to itself that a==1 always, so ACCESS_ONCE()
> > > -is also needed.
> > > +The initial ACCESS_ONCE() is required to prevent the compiler from
> > > +proving the value of 'a', and the pair of barrier() invocations are
> > > +required to prevent the compiler from pulling the two identical stores
> > > +to 'b' out from the legs of the "if" statement.
> > >
> > > It is important to note that control dependencies absolutely require a
> > > a conditional. For example, the following "optimized" version of
> > > -the above example breaks ordering:
> > > +the above example breaks ordering, which is why the barrier() invocations
> > > +are absolutely required if you have identical stores in both legs of
> > > +the "if" statement:
> > >
> > > q = ACCESS_ONCE(a);
> > > ACCESS_ONCE(b) = p; /* BUG: No ordering vs. load from a!!! */
> > > @@ -643,9 +647,11 @@ It is of course legal for the prior load to be part of the conditional,
> > > for example, as follows:
> > >
> > > if (ACCESS_ONCE(a) > 0) {
> > > + barrier();
> > > ACCESS_ONCE(b) = q / 2;
> > > do_something();
> > > } else {
> > > + barrier();
> > > ACCESS_ONCE(b) = q / 3;
> > > do_something_else();
> > > }
> > > @@ -659,9 +665,11 @@ the needed conditional. For example:
> > >
> > > q = ACCESS_ONCE(a);
> > > if (q % MAX) {
> > > + barrier();
> > > ACCESS_ONCE(b) = p;
> > > do_something();
> > > } else {
> > > + barrier();
> > > ACCESS_ONCE(b) = p;
> > > do_something_else();
> > > }
> > > @@ -723,6 +731,10 @@ In summary:
> > > use smb_rmb(), smp_wmb(), or, in the case of prior stores and
> > > later loads, smp_mb().
> > >
> > > + (*) If both legs of the "if" statement begin with identical stores
> > > + to the same variable, a barrier() statement is required at the
> > > + beginning of each leg of the "if" statement.
> > > +
> > > (*) Control dependencies require at least one run-time conditional
> > > between the prior load and the subsequent store, and this
> > > conditional must involve the prior load. If the compiler
> > > --
> > > 1.8.1.5
> > >
> >
>

2014-02-18 00:17:46

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies

On Mon, Feb 17, 2014 at 04:02:47PM -0800, Josh Triplett wrote:
> On Mon, Feb 17, 2014 at 02:58:16PM -0800, Paul E. McKenney wrote:
> > On Mon, Feb 17, 2014 at 01:46:06PM -0800, Josh Triplett wrote:
> > > On Mon, Feb 17, 2014 at 01:26:52PM -0800, Paul E. McKenney wrote:
> > > > From: "Paul E. McKenney" <[email protected]>
> > > >
> > > > Current compilers can "speculate" stores in the case where both legs
> > > > of the "if" statement start with identical stores. Because the stores
> > > > are identical, the compiler knows that the store will unconditionally
> > > > execute regardless of the "if" condition, and so the compiler is within
> > > > its rights to hoist the store to precede the condition. Such hoisting
> > > > destroys the control-dependency ordering. This ordering can be restored
> > > > by placing a barrier() at the beginning of each leg of the "if" statement.
> > > > This commit adds this requirement to the control-dependencies section.
> > > >
> > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > >
> > > This is starting to become a rather unreasonable level of fighting the
> > > compiler. ("Threads cannot be implemented as a library" indeed.) This
> > > doesn't seem like a reasonable thing to require users to do. Is there
> > > really no other way to cope with this particular bit of "help" from the
> > > compiler?
> >
> > Well, we could use smp_mb() instead of barrier(), but that was the
> > sort of thing that Peter Zijlstra was trying to avoid.
>
> Yeah, that's not an improvement. The goal would be to make the code no
> more complex than it already needs to be with ACCESS_ONCE; changing
> "barrier()" to something else doesn't help (quite apart from smp_mb()
> being suboptimal).
>
> > That said, I do sympathize completely with your position here -- it is
> > just that it is better to have our compiler-fights documented that
> > not, right?
>
> Sure, better to document them, but better still to not have them. Is
> there some other way we could avoid this one entirely?

We could try change the standard so as to outlaw pulling common code from
both legs of an "if" statement, but that will be a serious uphill battle.
Or perhaps do something to warn the developer about the possibility of
this happening.

Other thoughts?

Thanx, Paul

> - Josh Triplett
>
> > > > Documentation/memory-barriers.txt | 26 +++++++++++++++++++-------
> > > > 1 file changed, 19 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> > > > index f2668c19807e..adfaca831a90 100644
> > > > --- a/Documentation/memory-barriers.txt
> > > > +++ b/Documentation/memory-barriers.txt
> > > > @@ -608,26 +608,30 @@ as follows:
> > > > b = p; /* BUG: Compiler can reorder!!! */
> > > > do_something();
> > > >
> > > > -The solution is again ACCESS_ONCE(), which preserves the ordering between
> > > > -the load from variable 'a' and the store to variable 'b':
> > > > +The solution is again ACCESS_ONCE() and barrier(), which preserves the
> > > > +ordering between the load from variable 'a' and the store to variable 'b':
> > > >
> > > > q = ACCESS_ONCE(a);
> > > > if (q) {
> > > > + barrier();
> > > > ACCESS_ONCE(b) = p;
> > > > do_something();
> > > > } else {
> > > > + barrier();
> > > > ACCESS_ONCE(b) = p;
> > > > do_something_else();
> > > > }
> > > >
> > > > -You could also use barrier() to prevent the compiler from moving
> > > > -the stores to variable 'b', but barrier() would not prevent the
> > > > -compiler from proving to itself that a==1 always, so ACCESS_ONCE()
> > > > -is also needed.
> > > > +The initial ACCESS_ONCE() is required to prevent the compiler from
> > > > +proving the value of 'a', and the pair of barrier() invocations are
> > > > +required to prevent the compiler from pulling the two identical stores
> > > > +to 'b' out from the legs of the "if" statement.
> > > >
> > > > It is important to note that control dependencies absolutely require a
> > > > a conditional. For example, the following "optimized" version of
> > > > -the above example breaks ordering:
> > > > +the above example breaks ordering, which is why the barrier() invocations
> > > > +are absolutely required if you have identical stores in both legs of
> > > > +the "if" statement:
> > > >
> > > > q = ACCESS_ONCE(a);
> > > > ACCESS_ONCE(b) = p; /* BUG: No ordering vs. load from a!!! */
> > > > @@ -643,9 +647,11 @@ It is of course legal for the prior load to be part of the conditional,
> > > > for example, as follows:
> > > >
> > > > if (ACCESS_ONCE(a) > 0) {
> > > > + barrier();
> > > > ACCESS_ONCE(b) = q / 2;
> > > > do_something();
> > > > } else {
> > > > + barrier();
> > > > ACCESS_ONCE(b) = q / 3;
> > > > do_something_else();
> > > > }
> > > > @@ -659,9 +665,11 @@ the needed conditional. For example:
> > > >
> > > > q = ACCESS_ONCE(a);
> > > > if (q % MAX) {
> > > > + barrier();
> > > > ACCESS_ONCE(b) = p;
> > > > do_something();
> > > > } else {
> > > > + barrier();
> > > > ACCESS_ONCE(b) = p;
> > > > do_something_else();
> > > > }
> > > > @@ -723,6 +731,10 @@ In summary:
> > > > use smb_rmb(), smp_wmb(), or, in the case of prior stores and
> > > > later loads, smp_mb().
> > > >
> > > > + (*) If both legs of the "if" statement begin with identical stores
> > > > + to the same variable, a barrier() statement is required at the
> > > > + beginning of each leg of the "if" statement.
> > > > +
> > > > (*) Control dependencies require at least one run-time conditional
> > > > between the prior load and the subsequent store, and this
> > > > conditional must involve the prior load. If the compiler
> > > > --
> > > > 1.8.1.5
> > > >
> > >
> >
>

2014-02-18 00:45:25

by Josh Triplett

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies

On Mon, Feb 17, 2014 at 04:17:40PM -0800, Paul E. McKenney wrote:
> On Mon, Feb 17, 2014 at 04:02:47PM -0800, Josh Triplett wrote:
> > On Mon, Feb 17, 2014 at 02:58:16PM -0800, Paul E. McKenney wrote:
> > > On Mon, Feb 17, 2014 at 01:46:06PM -0800, Josh Triplett wrote:
> > > > On Mon, Feb 17, 2014 at 01:26:52PM -0800, Paul E. McKenney wrote:
> > > > > From: "Paul E. McKenney" <[email protected]>
> > > > >
> > > > > Current compilers can "speculate" stores in the case where both legs
> > > > > of the "if" statement start with identical stores. Because the stores
> > > > > are identical, the compiler knows that the store will unconditionally
> > > > > execute regardless of the "if" condition, and so the compiler is within
> > > > > its rights to hoist the store to precede the condition. Such hoisting
> > > > > destroys the control-dependency ordering. This ordering can be restored
> > > > > by placing a barrier() at the beginning of each leg of the "if" statement.
> > > > > This commit adds this requirement to the control-dependencies section.
> > > > >
> > > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > >
> > > > This is starting to become a rather unreasonable level of fighting the
> > > > compiler. ("Threads cannot be implemented as a library" indeed.) This
> > > > doesn't seem like a reasonable thing to require users to do. Is there
> > > > really no other way to cope with this particular bit of "help" from the
> > > > compiler?
> > >
> > > Well, we could use smp_mb() instead of barrier(), but that was the
> > > sort of thing that Peter Zijlstra was trying to avoid.
> >
> > Yeah, that's not an improvement. The goal would be to make the code no
> > more complex than it already needs to be with ACCESS_ONCE; changing
> > "barrier()" to something else doesn't help (quite apart from smp_mb()
> > being suboptimal).
> >
> > > That said, I do sympathize completely with your position here -- it is
> > > just that it is better to have our compiler-fights documented that
> > > not, right?
> >
> > Sure, better to document them, but better still to not have them. Is
> > there some other way we could avoid this one entirely?
>
> We could try change the standard so as to outlaw pulling common code from
> both legs of an "if" statement, but that will be a serious uphill battle.

And insufficient given widespread use of existing compilers.

> Or perhaps do something to warn the developer about the possibility of
> this happening.
>
> Other thoughts?

Might be worth bringing this up with the GCC folks to find out if
there's something obvious we're missing. (For non-obvious values of
"obvious".)

- Josh Triplett

2014-02-18 01:21:45

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies

On Mon, Feb 17, 2014 at 04:45:11PM -0800, Josh Triplett wrote:
> On Mon, Feb 17, 2014 at 04:17:40PM -0800, Paul E. McKenney wrote:
> > On Mon, Feb 17, 2014 at 04:02:47PM -0800, Josh Triplett wrote:
> > > On Mon, Feb 17, 2014 at 02:58:16PM -0800, Paul E. McKenney wrote:
> > > > On Mon, Feb 17, 2014 at 01:46:06PM -0800, Josh Triplett wrote:
> > > > > On Mon, Feb 17, 2014 at 01:26:52PM -0800, Paul E. McKenney wrote:
> > > > > > From: "Paul E. McKenney" <[email protected]>
> > > > > >
> > > > > > Current compilers can "speculate" stores in the case where both legs
> > > > > > of the "if" statement start with identical stores. Because the stores
> > > > > > are identical, the compiler knows that the store will unconditionally
> > > > > > execute regardless of the "if" condition, and so the compiler is within
> > > > > > its rights to hoist the store to precede the condition. Such hoisting
> > > > > > destroys the control-dependency ordering. This ordering can be restored
> > > > > > by placing a barrier() at the beginning of each leg of the "if" statement.
> > > > > > This commit adds this requirement to the control-dependencies section.
> > > > > >
> > > > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > > >
> > > > > This is starting to become a rather unreasonable level of fighting the
> > > > > compiler. ("Threads cannot be implemented as a library" indeed.) This
> > > > > doesn't seem like a reasonable thing to require users to do. Is there
> > > > > really no other way to cope with this particular bit of "help" from the
> > > > > compiler?
> > > >
> > > > Well, we could use smp_mb() instead of barrier(), but that was the
> > > > sort of thing that Peter Zijlstra was trying to avoid.
> > >
> > > Yeah, that's not an improvement. The goal would be to make the code no
> > > more complex than it already needs to be with ACCESS_ONCE; changing
> > > "barrier()" to something else doesn't help (quite apart from smp_mb()
> > > being suboptimal).
> > >
> > > > That said, I do sympathize completely with your position here -- it is
> > > > just that it is better to have our compiler-fights documented that
> > > > not, right?
> > >
> > > Sure, better to document them, but better still to not have them. Is
> > > there some other way we could avoid this one entirely?
> >
> > We could try change the standard so as to outlaw pulling common code from
> > both legs of an "if" statement, but that will be a serious uphill battle.
>
> And insufficient given widespread use of existing compilers.

Fair point...

> > Or perhaps do something to warn the developer about the possibility of
> > this happening.
> >
> > Other thoughts?
>
> Might be worth bringing this up with the GCC folks to find out if
> there's something obvious we're missing. (For non-obvious values of
> "obvious".)

Non-obvious values of "obvious" -- I have no idea what that means, but
it does have a nice counter-intuitive sound to it, doesn't it? ;-)

This conversation has started, albeit with much more noise and smoke
than signal or light.

Thanx, Paul

2014-02-18 03:30:11

by Josh Triplett

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies

On Mon, Feb 17, 2014 at 05:21:37PM -0800, Paul E. McKenney wrote:
> On Mon, Feb 17, 2014 at 04:45:11PM -0800, Josh Triplett wrote:
> > On Mon, Feb 17, 2014 at 04:17:40PM -0800, Paul E. McKenney wrote:
> > > On Mon, Feb 17, 2014 at 04:02:47PM -0800, Josh Triplett wrote:
> > > > On Mon, Feb 17, 2014 at 02:58:16PM -0800, Paul E. McKenney wrote:
> > > > > On Mon, Feb 17, 2014 at 01:46:06PM -0800, Josh Triplett wrote:
> > > > > > On Mon, Feb 17, 2014 at 01:26:52PM -0800, Paul E. McKenney wrote:
> > > > > > > From: "Paul E. McKenney" <[email protected]>
> > > > > > >
> > > > > > > Current compilers can "speculate" stores in the case where both legs
> > > > > > > of the "if" statement start with identical stores. Because the stores
> > > > > > > are identical, the compiler knows that the store will unconditionally
> > > > > > > execute regardless of the "if" condition, and so the compiler is within
> > > > > > > its rights to hoist the store to precede the condition. Such hoisting
> > > > > > > destroys the control-dependency ordering. This ordering can be restored
> > > > > > > by placing a barrier() at the beginning of each leg of the "if" statement.
> > > > > > > This commit adds this requirement to the control-dependencies section.
> > > > > > >
> > > > > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > > > >
> > > > > > This is starting to become a rather unreasonable level of fighting the
> > > > > > compiler. ("Threads cannot be implemented as a library" indeed.) This
> > > > > > doesn't seem like a reasonable thing to require users to do. Is there
> > > > > > really no other way to cope with this particular bit of "help" from the
> > > > > > compiler?
> > > > >
> > > > > Well, we could use smp_mb() instead of barrier(), but that was the
> > > > > sort of thing that Peter Zijlstra was trying to avoid.
> > > >
> > > > Yeah, that's not an improvement. The goal would be to make the code no
> > > > more complex than it already needs to be with ACCESS_ONCE; changing
> > > > "barrier()" to something else doesn't help (quite apart from smp_mb()
> > > > being suboptimal).
> > > >
> > > > > That said, I do sympathize completely with your position here -- it is
> > > > > just that it is better to have our compiler-fights documented that
> > > > > not, right?
> > > >
> > > > Sure, better to document them, but better still to not have them. Is
> > > > there some other way we could avoid this one entirely?
> > >
> > > We could try change the standard so as to outlaw pulling common code from
> > > both legs of an "if" statement, but that will be a serious uphill battle.
> >
> > And insufficient given widespread use of existing compilers.
>
> Fair point...
>
> > > Or perhaps do something to warn the developer about the possibility of
> > > this happening.
> > >
> > > Other thoughts?
> >
> > Might be worth bringing this up with the GCC folks to find out if
> > there's something obvious we're missing. (For non-obvious values of
> > "obvious".)
>
> Non-obvious values of "obvious" -- I have no idea what that means, but
> it does have a nice counter-intuitive sound to it, doesn't it? ;-)

I'm hoping for something that we'll consider obvious in hindsight.

> This conversation has started, albeit with much more noise and smoke
> than signal or light.

Whereabouts is that conversation taking place?

- Josh Triplett

2014-02-18 04:57:56

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 5/6] Documentation/memory-barriers.txt: Need barriers() for some control dependencies

On Mon, Feb 17, 2014 at 07:29:55PM -0800, Josh Triplett wrote:
> On Mon, Feb 17, 2014 at 05:21:37PM -0800, Paul E. McKenney wrote:
> > On Mon, Feb 17, 2014 at 04:45:11PM -0800, Josh Triplett wrote:
> > > On Mon, Feb 17, 2014 at 04:17:40PM -0800, Paul E. McKenney wrote:
> > > > On Mon, Feb 17, 2014 at 04:02:47PM -0800, Josh Triplett wrote:
> > > > > On Mon, Feb 17, 2014 at 02:58:16PM -0800, Paul E. McKenney wrote:
> > > > > > On Mon, Feb 17, 2014 at 01:46:06PM -0800, Josh Triplett wrote:
> > > > > > > On Mon, Feb 17, 2014 at 01:26:52PM -0800, Paul E. McKenney wrote:
> > > > > > > > From: "Paul E. McKenney" <[email protected]>
> > > > > > > >
> > > > > > > > Current compilers can "speculate" stores in the case where both legs
> > > > > > > > of the "if" statement start with identical stores. Because the stores
> > > > > > > > are identical, the compiler knows that the store will unconditionally
> > > > > > > > execute regardless of the "if" condition, and so the compiler is within
> > > > > > > > its rights to hoist the store to precede the condition. Such hoisting
> > > > > > > > destroys the control-dependency ordering. This ordering can be restored
> > > > > > > > by placing a barrier() at the beginning of each leg of the "if" statement.
> > > > > > > > This commit adds this requirement to the control-dependencies section.
> > > > > > > >
> > > > > > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > > > > >
> > > > > > > This is starting to become a rather unreasonable level of fighting the
> > > > > > > compiler. ("Threads cannot be implemented as a library" indeed.) This
> > > > > > > doesn't seem like a reasonable thing to require users to do. Is there
> > > > > > > really no other way to cope with this particular bit of "help" from the
> > > > > > > compiler?
> > > > > >
> > > > > > Well, we could use smp_mb() instead of barrier(), but that was the
> > > > > > sort of thing that Peter Zijlstra was trying to avoid.
> > > > >
> > > > > Yeah, that's not an improvement. The goal would be to make the code no
> > > > > more complex than it already needs to be with ACCESS_ONCE; changing
> > > > > "barrier()" to something else doesn't help (quite apart from smp_mb()
> > > > > being suboptimal).
> > > > >
> > > > > > That said, I do sympathize completely with your position here -- it is
> > > > > > just that it is better to have our compiler-fights documented that
> > > > > > not, right?
> > > > >
> > > > > Sure, better to document them, but better still to not have them. Is
> > > > > there some other way we could avoid this one entirely?
> > > >
> > > > We could try change the standard so as to outlaw pulling common code from
> > > > both legs of an "if" statement, but that will be a serious uphill battle.
> > >
> > > And insufficient given widespread use of existing compilers.
> >
> > Fair point...
> >
> > > > Or perhaps do something to warn the developer about the possibility of
> > > > this happening.
> > > >
> > > > Other thoughts?
> > >
> > > Might be worth bringing this up with the GCC folks to find out if
> > > there's something obvious we're missing. (For non-obvious values of
> > > "obvious".)
> >
> > Non-obvious values of "obvious" -- I have no idea what that means, but
> > it does have a nice counter-intuitive sound to it, doesn't it? ;-)
>
> I'm hoping for something that we'll consider obvious in hindsight.
>
> > This conversation has started, albeit with much more noise and smoke
> > than signal or light.
>
> Whereabouts is that conversation taking place?

Here you go: https://lkml.org/lkml/2014/2/6/193

Not too bad, only about 80 messages thus far. Spirited at times, though. ;-)

Thanx, Paul

2020-04-24 06:43:51

by Jon Masters

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 2/6] Documentation/memory-barriers.txt: ACCESS_ONCE() provides cache coherence

Hi Paul,

On 2/17/14 4:26 PM, Paul E. McKenney wrote:

> The ACCESS_ONCE() primitive provides cache coherence, but the
> documentation does not clearly state this. This commit therefore upgrades
> the documentation.

<snip>

> + In short, ACCESS_ONCE() provides "cache coherence" for accesses from
> + multiple CPUs to a single variable.

(ACCESS_ONCE is now READ_ONCE/WRITE_ONCE but the above added the
original language around cache coherence)

I would argue that we might want to avoid describing it in this manner.
The hardware provides cache coherency in order to keep a single memory
location coherent between multiple observers. These kernel macros only
tell the compiler to perform the load once. They take advantage of the
properties of coherence in the presence of multiple observers.

Jon.

--
Computer Architect

2020-04-27 23:02:57

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 2/6] Documentation/memory-barriers.txt: ACCESS_ONCE() provides cache coherence

On Thu, Apr 23, 2020 at 11:36:19PM -0400, Jon Masters wrote:
> Hi Paul,
>
> On 2/17/14 4:26 PM, Paul E. McKenney wrote:
>
> > The ACCESS_ONCE() primitive provides cache coherence, but the
> > documentation does not clearly state this. This commit therefore upgrades
> > the documentation.
>
> <snip>
>
> > + In short, ACCESS_ONCE() provides "cache coherence" for accesses from
> > + multiple CPUs to a single variable.
>
> (ACCESS_ONCE is now READ_ONCE/WRITE_ONCE but the above added the original
> language around cache coherence)
>
> I would argue that we might want to avoid describing it in this manner. The
> hardware provides cache coherency in order to keep a single memory location
> coherent between multiple observers. These kernel macros only tell the
> compiler to perform the load once. They take advantage of the properties of
> coherence in the presence of multiple observers.

You lost me on this one. Are you advocating that this be described
as constraining the compiler from invalidating the cache coherency
(single-variable sequential consistency) provided by modern hardware?

Just for background, my view is that "cache coherence", like "real time",
is a property of the system that can be destroyed by any component.

Thanx, Paul