2017-06-12 21:38:01

by Paul E. McKenney

[permalink] [raw]
Subject: [GIT PULL rcu/next] RCU commits for 4.13

Hello, Ingo,

This pull request is unusual in being a single linear set of commits,
as opposed to my usual topic branches. This is due to the many
large-footprint changes, which means that reasonable topic branches
result in large numbers of merge conflicts. In addition, some commits
depend on other commits that should be on different topic branches.
I will return to the topic-branch style next time.

The largest feature of this series is shrinking and simplification,
with the following diffstat summary:

79 files changed, 1496 insertions(+), 4211 deletions(-)

In other words, this series represents a net reduction of more than 2700
lines of code.

These commits were posted to LKML:

http://lkml.kernel.org/r/[email protected]

Two of these commits (46/88 and 48/88) have been deferred, most likely
to v4.14. All of the remaining commits have been subjected to the 0day
Test Robot and -next testing, and are availiable in teh git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git for-mingo

for you to fetch changes up to 6d48152eafde1f0d0a4a9e0584fa7d9ff4fbfdac:

rcu: Remove RCU CPU stall warnings from Tiny RCU (2017-06-08 18:52:45 -0700)

----------------------------------------------------------------
Arnd Bergmann (1):
bcm47xx: Fix build regression

Paul E. McKenney (83):
rcutorture: Add lockdep to one of the SRCU scenarios
rcutorture: Add three-level tree test for Tree SRCU
rcutorture: Fix bug in reporting Kconfig mis-settings
rcutorture: Add a scenario for Tiny SRCU
rcutorture: Add a scenario for Classic SRCU
rcu: Prevent rcu_barrier() from starting needless grace periods
rcutorture: Correctly handle CONFIG_RCU_TORTURE_TEST_* options
rcutorture: Update test scenarios based on new Kconfig dependencies
srcu: Eliminate possibility of destructive counter overflow
rcu: Complain if blocking in preemptible RCU read-side critical section
rcuperf: Defer expedited/normal check to end of test
rcuperf: Remove conflicting Kconfig options
rcu: Remove obsolete reference to synchronize_kernel()
rcuperf: Add ability to performance-test call_rcu() and friends
rcuperf: Add a Kconfig-fragment file for Classic SRCU
rcu: Make sync_rcu_preempt_exp_done() return bool
checkpatch: Remove checks for expedited grace periods
rcuperf: Add test for dynamically initialized srcu_struct
doc/atomic_ops: Clarify smp_mb__{before,after}_atomic()
atomics: Add header comment so spin_unlock_wait()
rcuperf: Add the ability to test tiny RCU flavors
srcu: Make Classic and Tree SRCU announce themselves at bootup
rcutorture: Reduce CPUs dedicated to testing Classic SRCU
srcu: Shrink Tiny SRCU a bit more
rcuperf: Set more user-friendly defaults
rcuperf: Add writer_holdoff boot parameter
rcutorture: Add "git diff" output to testid.txt file
srcu: Document auto-expediting requirement
doc: Take tail recursion into account in RCU requirements
rcu: Add preemptibility checks in rcu_sched_qs() and rcu_bh_qs()
rcu: Print out rcupdate.c non-default boot-time settings
rcu: Update rcu_bootup_announce_oddness()
srcu: Make exp_holdoff module parameter be static
srcu: Print non-default exp_holdoff values at boot time
rcu: Add lockdep_assert_held() teeth to tree.c
rcu: Add lockdep_assert_held() teeth to tree_plugin.h
srcu: Make SRCU be once again optional
srcu: Shrink Tiny SRCU a bit
srcu: Add DEBUG_OBJECTS_RCU_HEAD functionality
rcu: Make synchronize_rcu_mult() check for duplicates
sched: Rely on synchronize_rcu_mult() de-duplication
rcu: Use RCU_NOCB_WAKE rather than RCU_NOGP_WAKE
rcu: Add memory barriers for NOCB leader wakeup
rcu: Flag need for rcu_node_tree.h and rcu_segcblist.h visibility
rcu: Move docbook comments out of rcupdate.h
rcu: Move rcu_expedited and rcu_normal externs from rcupdate.h
rcu: Move expediting-related access/control out of rcupdate.h
rcu: Move torture-related definitions from rcupdate.h to rcu.h
rcu: Remove UINT_CMP_GE() and UINT_CMP_LT()
rcu: Move rcupdate.h to new empty-function style
rcu: Eliminate the unused __rcu_is_watching() function
rcu: Move the RCU_SCHEDULER_ definitions from rcupdate.h
rcu: Remove linux/debugobjects.h from rcupdate.h
rcu: Improve __call_rcu() debug-objects error message
rcu: Move rcu_is_nocb_cpu() from rcupdate.h to rcu.h
rcu: Move rcu_ftrace_dump() from rcupdate.h to rcu.h
rcu: move rcupdate.h to the new true/false-function style
rcu: Move torture-related functions out of rcutiny.h and rcutree.h
rcu: Move rcu_request_urgent_qs_task() out of rcutiny.h and rcutree.h
rcu: Move rcutiny.h to new empty/true/false-function style
srcu: Prevent sdp->srcu_gp_seq_needed counter wrap
srcu: Shrink srcu.h by moving docbook and private function
srcu: Apply trivial callback lists to shrink Tiny SRCU
lockdep: Use consistent printing primitives
rcu: Refactor #includes from include/linux/rcupdate.h
rcu: Convert rnp->lock wrappers to macros for SRCU use
rcu: Move rnp->lock wrappers for SRCU use
srcu: Use rnp->lock wrappers to replace explicit memory barriers
rcu: Remove *_SLOW_* Kconfig options
rcu: Remove the RCU_KTHREAD_PRIO Kconfig option
rcu: Remove nohz_full full-system-idle state machine
rcu: Remove #ifdef moving rcu_end_inkernel_boot from rcupdate.h
rcu: Remove typecheck() from RCU locking wrapper functions
rcu: Remove the now-obsolete PROVE_RCU_REPEATEDLY Kconfig option
rcu: Remove SPARSE_RCU_POINTER Kconfig option
srcu: Fix rcutorture-statistics typo
srcu: Remove Classic SRCU
rcu: Remove debugfs tracing
rcu: Eliminate NOCBs CPU-state Kconfig options
rcu: Move RCU non-debug Kconfig options to kernel/rcu
rcu: Move RCU debug Kconfig options to kernel/rcu
rcu: Remove event tracing from Tiny RCU
rcu: Remove RCU CPU stall warnings from Tiny RCU

Priyalee Kushwaha (1):
srcu-cbmc: Use /usr/bin/awk instead of /bin/awk

Stan Drozd (1):
docs: Fix typo in Documentation/memory-barriers.txt

Documentation/RCU/00-INDEX | 2 -
.../RCU/Design/Requirements/Requirements.html | 34 +-
Documentation/RCU/checklist.txt | 8 +-
Documentation/RCU/trace.txt | 535 -----------------
Documentation/admin-guide/kernel-parameters.txt | 41 +-
Documentation/core-api/atomic_ops.rst | 5 +
Documentation/dev-tools/sparse.rst | 6 -
Documentation/kernel-per-CPU-kthreads.txt | 31 +-
Documentation/memory-barriers.txt | 2 +-
Documentation/timers/NO_HZ.txt | 29 +-
include/linux/bcm47xx_nvram.h | 1 +
include/linux/compiler.h | 4 -
include/linux/rcu_node_tree.h | 4 +
include/linux/rcu_segcblist.h | 4 +
include/linux/rcupdate.h | 318 +---------
include/linux/rcutiny.h | 167 +-----
include/linux/rcutree.h | 21 +-
include/linux/spinlock.h | 20 +
include/linux/srcu.h | 25 +-
include/linux/srcuclassic.h | 115 ----
include/linux/srcutiny.h | 47 +-
include/linux/srcutree.h | 13 +-
include/trace/events/rcu.h | 1 +
init/Kconfig | 349 +----------
kernel/locking/lockdep.c | 176 +++---
kernel/rcu/Kconfig | 242 ++++++++
kernel/rcu/Kconfig.debug | 82 +++
kernel/rcu/Makefile | 2 -
kernel/rcu/rcu.h | 277 +++++++++
kernel/rcu/rcuperf.c | 129 +++-
kernel/rcu/rcutorture.c | 21 +-
kernel/rcu/srcu.c | 661 ---------------------
kernel/rcu/srcutiny.c | 86 +--
kernel/rcu/srcutree.c | 187 ++++--
kernel/rcu/tiny.c | 54 +-
kernel/rcu/tiny_plugin.h | 123 ----
kernel/rcu/tree.c | 195 +++---
kernel/rcu/tree.h | 109 +---
kernel/rcu/tree_exp.h | 2 +-
kernel/rcu/tree_plugin.h | 573 +++---------------
kernel/rcu/tree_trace.c | 494 ---------------
kernel/rcu/update.c | 77 ++-
kernel/sched/core.c | 8 +-
kernel/time/Kconfig | 50 --
lib/Kconfig.debug | 184 +-----
lib/Makefile | 3 -
scripts/checkpatch.pl | 17 -
.../selftests/rcutorture/bin/configcheck.sh | 2 +-
.../testing/selftests/rcutorture/bin/kvm-build.sh | 2 +-
tools/testing/selftests/rcutorture/bin/kvm.sh | 5 +-
.../selftests/rcutorture/configs/rcu/CFLIST | 2 +
.../selftests/rcutorture/configs/rcu/SRCU-C.boot | 1 +
.../selftests/rcutorture/configs/rcu/SRCU-N | 2 +-
.../selftests/rcutorture/configs/rcu/SRCU-P | 6 +-
.../selftests/rcutorture/configs/rcu/SRCU-t | 10 +
.../selftests/rcutorture/configs/rcu/SRCU-t.boot | 1 +
.../selftests/rcutorture/configs/rcu/SRCU-u | 9 +
.../selftests/rcutorture/configs/rcu/SRCU-u.boot | 1 +
.../selftests/rcutorture/configs/rcu/TINY02 | 5 +-
.../selftests/rcutorture/configs/rcu/TREE01 | 5 +-
.../selftests/rcutorture/configs/rcu/TREE01.boot | 4 +
.../selftests/rcutorture/configs/rcu/TREE02 | 5 +-
.../selftests/rcutorture/configs/rcu/TREE03 | 4 -
.../selftests/rcutorture/configs/rcu/TREE03.boot | 4 +
.../selftests/rcutorture/configs/rcu/TREE04 | 4 -
.../selftests/rcutorture/configs/rcu/TREE05 | 4 -
.../selftests/rcutorture/configs/rcu/TREE05.boot | 3 +
.../selftests/rcutorture/configs/rcu/TREE06 | 4 +-
.../selftests/rcutorture/configs/rcu/TREE06.boot | 3 +
.../selftests/rcutorture/configs/rcu/TREE07 | 6 -
.../selftests/rcutorture/configs/rcu/TREE08 | 1 -
.../selftests/rcutorture/configs/rcu/TREE08-T | 21 -
.../selftests/rcutorture/configs/rcu/TREE08.boot | 1 +
.../configs/{rcu/TREE02-T => rcuperf/TINY} | 19 +-
.../selftests/rcutorture/configs/rcuperf/TREE | 1 -
.../selftests/rcutorture/configs/rcuperf/TREE54 | 1 -
.../testing/selftests/rcutorture/doc/TINY_RCU.txt | 1 -
.../selftests/rcutorture/doc/TREE_RCU-kconfig.txt | 34 +-
.../rcutorture/formal/srcu-cbmc/modify_srcu.awk | 2 +-
79 files changed, 1496 insertions(+), 4211 deletions(-)
delete mode 100644 Documentation/RCU/trace.txt
delete mode 100644 include/linux/srcuclassic.h
create mode 100644 kernel/rcu/Kconfig
create mode 100644 kernel/rcu/Kconfig.debug
delete mode 100644 kernel/rcu/srcu.c
delete mode 100644 kernel/rcu/tree_trace.c
create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-C.boot
create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-t
create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-t.boot
create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-u
create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-u.boot
delete mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
rename tools/testing/selftests/rcutorture/configs/{rcu/TREE02-T => rcuperf/TINY} (50%)


2017-06-13 06:41:29

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13


* Paul E. McKenney <[email protected]> wrote:

> Hello, Ingo,
>
> This pull request is unusual in being a single linear set of commits,
> as opposed to my usual topic branches. This is due to the many
> large-footprint changes, which means that reasonable topic branches
> result in large numbers of merge conflicts. In addition, some commits
> depend on other commits that should be on different topic branches.
> I will return to the topic-branch style next time.
>
> The largest feature of this series is shrinking and simplification,
> with the following diffstat summary:
>
> 79 files changed, 1496 insertions(+), 4211 deletions(-)
>
> In other words, this series represents a net reduction of more than 2700
> lines of code.
>
> These commits were posted to LKML:
>
> http://lkml.kernel.org/r/[email protected]
>
> Two of these commits (46/88 and 48/88) have been deferred, most likely
> to v4.14. All of the remaining commits have been subjected to the 0day
> Test Robot and -next testing, and are availiable in teh git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git for-mingo
>
> for you to fetch changes up to 6d48152eafde1f0d0a4a9e0584fa7d9ff4fbfdac:
>
> rcu: Remove RCU CPU stall warnings from Tiny RCU (2017-06-08 18:52:45 -0700)
>
> ----------------------------------------------------------------
> Arnd Bergmann (1):
> bcm47xx: Fix build regression
>
> Paul E. McKenney (83):
> rcutorture: Add lockdep to one of the SRCU scenarios
> rcutorture: Add three-level tree test for Tree SRCU
> rcutorture: Fix bug in reporting Kconfig mis-settings
> rcutorture: Add a scenario for Tiny SRCU
> rcutorture: Add a scenario for Classic SRCU
> rcu: Prevent rcu_barrier() from starting needless grace periods
> rcutorture: Correctly handle CONFIG_RCU_TORTURE_TEST_* options
> rcutorture: Update test scenarios based on new Kconfig dependencies
> srcu: Eliminate possibility of destructive counter overflow
> rcu: Complain if blocking in preemptible RCU read-side critical section
> rcuperf: Defer expedited/normal check to end of test
> rcuperf: Remove conflicting Kconfig options
> rcu: Remove obsolete reference to synchronize_kernel()
> rcuperf: Add ability to performance-test call_rcu() and friends
> rcuperf: Add a Kconfig-fragment file for Classic SRCU
> rcu: Make sync_rcu_preempt_exp_done() return bool
> checkpatch: Remove checks for expedited grace periods
> rcuperf: Add test for dynamically initialized srcu_struct
> doc/atomic_ops: Clarify smp_mb__{before,after}_atomic()
> atomics: Add header comment so spin_unlock_wait()
> rcuperf: Add the ability to test tiny RCU flavors
> srcu: Make Classic and Tree SRCU announce themselves at bootup
> rcutorture: Reduce CPUs dedicated to testing Classic SRCU
> srcu: Shrink Tiny SRCU a bit more
> rcuperf: Set more user-friendly defaults
> rcuperf: Add writer_holdoff boot parameter
> rcutorture: Add "git diff" output to testid.txt file
> srcu: Document auto-expediting requirement
> doc: Take tail recursion into account in RCU requirements
> rcu: Add preemptibility checks in rcu_sched_qs() and rcu_bh_qs()
> rcu: Print out rcupdate.c non-default boot-time settings
> rcu: Update rcu_bootup_announce_oddness()
> srcu: Make exp_holdoff module parameter be static
> srcu: Print non-default exp_holdoff values at boot time
> rcu: Add lockdep_assert_held() teeth to tree.c
> rcu: Add lockdep_assert_held() teeth to tree_plugin.h
> srcu: Make SRCU be once again optional
> srcu: Shrink Tiny SRCU a bit
> srcu: Add DEBUG_OBJECTS_RCU_HEAD functionality
> rcu: Make synchronize_rcu_mult() check for duplicates
> sched: Rely on synchronize_rcu_mult() de-duplication
> rcu: Use RCU_NOCB_WAKE rather than RCU_NOGP_WAKE
> rcu: Add memory barriers for NOCB leader wakeup
> rcu: Flag need for rcu_node_tree.h and rcu_segcblist.h visibility
> rcu: Move docbook comments out of rcupdate.h
> rcu: Move rcu_expedited and rcu_normal externs from rcupdate.h
> rcu: Move expediting-related access/control out of rcupdate.h
> rcu: Move torture-related definitions from rcupdate.h to rcu.h
> rcu: Remove UINT_CMP_GE() and UINT_CMP_LT()
> rcu: Move rcupdate.h to new empty-function style
> rcu: Eliminate the unused __rcu_is_watching() function
> rcu: Move the RCU_SCHEDULER_ definitions from rcupdate.h
> rcu: Remove linux/debugobjects.h from rcupdate.h
> rcu: Improve __call_rcu() debug-objects error message
> rcu: Move rcu_is_nocb_cpu() from rcupdate.h to rcu.h
> rcu: Move rcu_ftrace_dump() from rcupdate.h to rcu.h
> rcu: move rcupdate.h to the new true/false-function style
> rcu: Move torture-related functions out of rcutiny.h and rcutree.h
> rcu: Move rcu_request_urgent_qs_task() out of rcutiny.h and rcutree.h
> rcu: Move rcutiny.h to new empty/true/false-function style
> srcu: Prevent sdp->srcu_gp_seq_needed counter wrap
> srcu: Shrink srcu.h by moving docbook and private function
> srcu: Apply trivial callback lists to shrink Tiny SRCU
> lockdep: Use consistent printing primitives
> rcu: Refactor #includes from include/linux/rcupdate.h
> rcu: Convert rnp->lock wrappers to macros for SRCU use
> rcu: Move rnp->lock wrappers for SRCU use
> srcu: Use rnp->lock wrappers to replace explicit memory barriers
> rcu: Remove *_SLOW_* Kconfig options
> rcu: Remove the RCU_KTHREAD_PRIO Kconfig option
> rcu: Remove nohz_full full-system-idle state machine
> rcu: Remove #ifdef moving rcu_end_inkernel_boot from rcupdate.h
> rcu: Remove typecheck() from RCU locking wrapper functions
> rcu: Remove the now-obsolete PROVE_RCU_REPEATEDLY Kconfig option
> rcu: Remove SPARSE_RCU_POINTER Kconfig option
> srcu: Fix rcutorture-statistics typo
> srcu: Remove Classic SRCU
> rcu: Remove debugfs tracing
> rcu: Eliminate NOCBs CPU-state Kconfig options
> rcu: Move RCU non-debug Kconfig options to kernel/rcu
> rcu: Move RCU debug Kconfig options to kernel/rcu
> rcu: Remove event tracing from Tiny RCU
> rcu: Remove RCU CPU stall warnings from Tiny RCU
>
> Priyalee Kushwaha (1):
> srcu-cbmc: Use /usr/bin/awk instead of /bin/awk
>
> Stan Drozd (1):
> docs: Fix typo in Documentation/memory-barriers.txt
>
> Documentation/RCU/00-INDEX | 2 -
> .../RCU/Design/Requirements/Requirements.html | 34 +-
> Documentation/RCU/checklist.txt | 8 +-
> Documentation/RCU/trace.txt | 535 -----------------
> Documentation/admin-guide/kernel-parameters.txt | 41 +-
> Documentation/core-api/atomic_ops.rst | 5 +
> Documentation/dev-tools/sparse.rst | 6 -
> Documentation/kernel-per-CPU-kthreads.txt | 31 +-
> Documentation/memory-barriers.txt | 2 +-
> Documentation/timers/NO_HZ.txt | 29 +-
> include/linux/bcm47xx_nvram.h | 1 +
> include/linux/compiler.h | 4 -
> include/linux/rcu_node_tree.h | 4 +
> include/linux/rcu_segcblist.h | 4 +
> include/linux/rcupdate.h | 318 +---------
> include/linux/rcutiny.h | 167 +-----
> include/linux/rcutree.h | 21 +-
> include/linux/spinlock.h | 20 +
> include/linux/srcu.h | 25 +-
> include/linux/srcuclassic.h | 115 ----
> include/linux/srcutiny.h | 47 +-
> include/linux/srcutree.h | 13 +-
> include/trace/events/rcu.h | 1 +
> init/Kconfig | 349 +----------
> kernel/locking/lockdep.c | 176 +++---
> kernel/rcu/Kconfig | 242 ++++++++
> kernel/rcu/Kconfig.debug | 82 +++
> kernel/rcu/Makefile | 2 -
> kernel/rcu/rcu.h | 277 +++++++++
> kernel/rcu/rcuperf.c | 129 +++-
> kernel/rcu/rcutorture.c | 21 +-
> kernel/rcu/srcu.c | 661 ---------------------
> kernel/rcu/srcutiny.c | 86 +--
> kernel/rcu/srcutree.c | 187 ++++--
> kernel/rcu/tiny.c | 54 +-
> kernel/rcu/tiny_plugin.h | 123 ----
> kernel/rcu/tree.c | 195 +++---
> kernel/rcu/tree.h | 109 +---
> kernel/rcu/tree_exp.h | 2 +-
> kernel/rcu/tree_plugin.h | 573 +++---------------
> kernel/rcu/tree_trace.c | 494 ---------------
> kernel/rcu/update.c | 77 ++-
> kernel/sched/core.c | 8 +-
> kernel/time/Kconfig | 50 --
> lib/Kconfig.debug | 184 +-----
> lib/Makefile | 3 -
> scripts/checkpatch.pl | 17 -
> .../selftests/rcutorture/bin/configcheck.sh | 2 +-
> .../testing/selftests/rcutorture/bin/kvm-build.sh | 2 +-
> tools/testing/selftests/rcutorture/bin/kvm.sh | 5 +-
> .../selftests/rcutorture/configs/rcu/CFLIST | 2 +
> .../selftests/rcutorture/configs/rcu/SRCU-C.boot | 1 +
> .../selftests/rcutorture/configs/rcu/SRCU-N | 2 +-
> .../selftests/rcutorture/configs/rcu/SRCU-P | 6 +-
> .../selftests/rcutorture/configs/rcu/SRCU-t | 10 +
> .../selftests/rcutorture/configs/rcu/SRCU-t.boot | 1 +
> .../selftests/rcutorture/configs/rcu/SRCU-u | 9 +
> .../selftests/rcutorture/configs/rcu/SRCU-u.boot | 1 +
> .../selftests/rcutorture/configs/rcu/TINY02 | 5 +-
> .../selftests/rcutorture/configs/rcu/TREE01 | 5 +-
> .../selftests/rcutorture/configs/rcu/TREE01.boot | 4 +
> .../selftests/rcutorture/configs/rcu/TREE02 | 5 +-
> .../selftests/rcutorture/configs/rcu/TREE03 | 4 -
> .../selftests/rcutorture/configs/rcu/TREE03.boot | 4 +
> .../selftests/rcutorture/configs/rcu/TREE04 | 4 -
> .../selftests/rcutorture/configs/rcu/TREE05 | 4 -
> .../selftests/rcutorture/configs/rcu/TREE05.boot | 3 +
> .../selftests/rcutorture/configs/rcu/TREE06 | 4 +-
> .../selftests/rcutorture/configs/rcu/TREE06.boot | 3 +
> .../selftests/rcutorture/configs/rcu/TREE07 | 6 -
> .../selftests/rcutorture/configs/rcu/TREE08 | 1 -
> .../selftests/rcutorture/configs/rcu/TREE08-T | 21 -
> .../selftests/rcutorture/configs/rcu/TREE08.boot | 1 +
> .../configs/{rcu/TREE02-T => rcuperf/TINY} | 19 +-
> .../selftests/rcutorture/configs/rcuperf/TREE | 1 -
> .../selftests/rcutorture/configs/rcuperf/TREE54 | 1 -
> .../testing/selftests/rcutorture/doc/TINY_RCU.txt | 1 -
> .../selftests/rcutorture/doc/TREE_RCU-kconfig.txt | 34 +-
> .../rcutorture/formal/srcu-cbmc/modify_srcu.awk | 2 +-
> 79 files changed, 1496 insertions(+), 4211 deletions(-)
> delete mode 100644 Documentation/RCU/trace.txt
> delete mode 100644 include/linux/srcuclassic.h
> create mode 100644 kernel/rcu/Kconfig
> create mode 100644 kernel/rcu/Kconfig.debug
> delete mode 100644 kernel/rcu/srcu.c
> delete mode 100644 kernel/rcu/tree_trace.c
> create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-C.boot
> create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-t
> create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-t.boot
> create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-u
> create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-u.boot
> delete mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
> rename tools/testing/selftests/rcutorture/configs/{rcu/TREE02-T => rcuperf/TINY} (50%)

Pulled, thanks a lot Paul!

Ingo

2017-06-14 02:54:17

by Andrea Parri

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Mon, Jun 12, 2017 at 02:37:55PM -0700, Paul E. McKenney wrote:
> Hello, Ingo,
>
> This pull request is unusual in being a single linear set of commits,
> as opposed to my usual topic branches. This is due to the many
> large-footprint changes, which means that reasonable topic branches
> result in large numbers of merge conflicts. In addition, some commits
> depend on other commits that should be on different topic branches.
> I will return to the topic-branch style next time.
>
> The largest feature of this series is shrinking and simplification,
> with the following diffstat summary:
>
> 79 files changed, 1496 insertions(+), 4211 deletions(-)
>
> In other words, this series represents a net reduction of more than 2700
> lines of code.
>
> These commits were posted to LKML:
>
> http://lkml.kernel.org/r/[email protected]

I did raise some issues (AFAICT, unresolved) concerning...


>
> Two of these commits (46/88 and 48/88) have been deferred, most likely
> to v4.14. All of the remaining commits have been subjected to the 0day
> Test Robot and -next testing, and are availiable in teh git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git for-mingo
>
> for you to fetch changes up to 6d48152eafde1f0d0a4a9e0584fa7d9ff4fbfdac:
>
> rcu: Remove RCU CPU stall warnings from Tiny RCU (2017-06-08 18:52:45 -0700)
>
> ----------------------------------------------------------------
> Arnd Bergmann (1):
> bcm47xx: Fix build regression
>
> Paul E. McKenney (83):
> rcutorture: Add lockdep to one of the SRCU scenarios
> rcutorture: Add three-level tree test for Tree SRCU
> rcutorture: Fix bug in reporting Kconfig mis-settings
> rcutorture: Add a scenario for Tiny SRCU
> rcutorture: Add a scenario for Classic SRCU
> rcu: Prevent rcu_barrier() from starting needless grace periods
> rcutorture: Correctly handle CONFIG_RCU_TORTURE_TEST_* options
> rcutorture: Update test scenarios based on new Kconfig dependencies
> srcu: Eliminate possibility of destructive counter overflow
> rcu: Complain if blocking in preemptible RCU read-side critical section
> rcuperf: Defer expedited/normal check to end of test
> rcuperf: Remove conflicting Kconfig options
> rcu: Remove obsolete reference to synchronize_kernel()
> rcuperf: Add ability to performance-test call_rcu() and friends
> rcuperf: Add a Kconfig-fragment file for Classic SRCU
> rcu: Make sync_rcu_preempt_exp_done() return bool
> checkpatch: Remove checks for expedited grace periods
> rcuperf: Add test for dynamically initialized srcu_struct
> doc/atomic_ops: Clarify smp_mb__{before,after}_atomic()
> atomics: Add header comment so spin_unlock_wait()

... this one: c.f.,

http://www.mail-archive.com/[email protected]/msg1418503.html

Any hints about those?

Andrea

> rcuperf: Add the ability to test tiny RCU flavors
> srcu: Make Classic and Tree SRCU announce themselves at bootup
> rcutorture: Reduce CPUs dedicated to testing Classic SRCU
> srcu: Shrink Tiny SRCU a bit more
> rcuperf: Set more user-friendly defaults
> rcuperf: Add writer_holdoff boot parameter
> rcutorture: Add "git diff" output to testid.txt file
> srcu: Document auto-expediting requirement
> doc: Take tail recursion into account in RCU requirements
> rcu: Add preemptibility checks in rcu_sched_qs() and rcu_bh_qs()
> rcu: Print out rcupdate.c non-default boot-time settings
> rcu: Update rcu_bootup_announce_oddness()
> srcu: Make exp_holdoff module parameter be static
> srcu: Print non-default exp_holdoff values at boot time
> rcu: Add lockdep_assert_held() teeth to tree.c
> rcu: Add lockdep_assert_held() teeth to tree_plugin.h
> srcu: Make SRCU be once again optional
> srcu: Shrink Tiny SRCU a bit
> srcu: Add DEBUG_OBJECTS_RCU_HEAD functionality
> rcu: Make synchronize_rcu_mult() check for duplicates
> sched: Rely on synchronize_rcu_mult() de-duplication
> rcu: Use RCU_NOCB_WAKE rather than RCU_NOGP_WAKE
> rcu: Add memory barriers for NOCB leader wakeup
> rcu: Flag need for rcu_node_tree.h and rcu_segcblist.h visibility
> rcu: Move docbook comments out of rcupdate.h
> rcu: Move rcu_expedited and rcu_normal externs from rcupdate.h
> rcu: Move expediting-related access/control out of rcupdate.h
> rcu: Move torture-related definitions from rcupdate.h to rcu.h
> rcu: Remove UINT_CMP_GE() and UINT_CMP_LT()
> rcu: Move rcupdate.h to new empty-function style
> rcu: Eliminate the unused __rcu_is_watching() function
> rcu: Move the RCU_SCHEDULER_ definitions from rcupdate.h
> rcu: Remove linux/debugobjects.h from rcupdate.h
> rcu: Improve __call_rcu() debug-objects error message
> rcu: Move rcu_is_nocb_cpu() from rcupdate.h to rcu.h
> rcu: Move rcu_ftrace_dump() from rcupdate.h to rcu.h
> rcu: move rcupdate.h to the new true/false-function style
> rcu: Move torture-related functions out of rcutiny.h and rcutree.h
> rcu: Move rcu_request_urgent_qs_task() out of rcutiny.h and rcutree.h
> rcu: Move rcutiny.h to new empty/true/false-function style
> srcu: Prevent sdp->srcu_gp_seq_needed counter wrap
> srcu: Shrink srcu.h by moving docbook and private function
> srcu: Apply trivial callback lists to shrink Tiny SRCU
> lockdep: Use consistent printing primitives
> rcu: Refactor #includes from include/linux/rcupdate.h
> rcu: Convert rnp->lock wrappers to macros for SRCU use
> rcu: Move rnp->lock wrappers for SRCU use
> srcu: Use rnp->lock wrappers to replace explicit memory barriers
> rcu: Remove *_SLOW_* Kconfig options
> rcu: Remove the RCU_KTHREAD_PRIO Kconfig option
> rcu: Remove nohz_full full-system-idle state machine
> rcu: Remove #ifdef moving rcu_end_inkernel_boot from rcupdate.h
> rcu: Remove typecheck() from RCU locking wrapper functions
> rcu: Remove the now-obsolete PROVE_RCU_REPEATEDLY Kconfig option
> rcu: Remove SPARSE_RCU_POINTER Kconfig option
> srcu: Fix rcutorture-statistics typo
> srcu: Remove Classic SRCU
> rcu: Remove debugfs tracing
> rcu: Eliminate NOCBs CPU-state Kconfig options
> rcu: Move RCU non-debug Kconfig options to kernel/rcu
> rcu: Move RCU debug Kconfig options to kernel/rcu
> rcu: Remove event tracing from Tiny RCU
> rcu: Remove RCU CPU stall warnings from Tiny RCU
>
> Priyalee Kushwaha (1):
> srcu-cbmc: Use /usr/bin/awk instead of /bin/awk
>
> Stan Drozd (1):
> docs: Fix typo in Documentation/memory-barriers.txt
>
> Documentation/RCU/00-INDEX | 2 -
> .../RCU/Design/Requirements/Requirements.html | 34 +-
> Documentation/RCU/checklist.txt | 8 +-
> Documentation/RCU/trace.txt | 535 -----------------
> Documentation/admin-guide/kernel-parameters.txt | 41 +-
> Documentation/core-api/atomic_ops.rst | 5 +
> Documentation/dev-tools/sparse.rst | 6 -
> Documentation/kernel-per-CPU-kthreads.txt | 31 +-
> Documentation/memory-barriers.txt | 2 +-
> Documentation/timers/NO_HZ.txt | 29 +-
> include/linux/bcm47xx_nvram.h | 1 +
> include/linux/compiler.h | 4 -
> include/linux/rcu_node_tree.h | 4 +
> include/linux/rcu_segcblist.h | 4 +
> include/linux/rcupdate.h | 318 +---------
> include/linux/rcutiny.h | 167 +-----
> include/linux/rcutree.h | 21 +-
> include/linux/spinlock.h | 20 +
> include/linux/srcu.h | 25 +-
> include/linux/srcuclassic.h | 115 ----
> include/linux/srcutiny.h | 47 +-
> include/linux/srcutree.h | 13 +-
> include/trace/events/rcu.h | 1 +
> init/Kconfig | 349 +----------
> kernel/locking/lockdep.c | 176 +++---
> kernel/rcu/Kconfig | 242 ++++++++
> kernel/rcu/Kconfig.debug | 82 +++
> kernel/rcu/Makefile | 2 -
> kernel/rcu/rcu.h | 277 +++++++++
> kernel/rcu/rcuperf.c | 129 +++-
> kernel/rcu/rcutorture.c | 21 +-
> kernel/rcu/srcu.c | 661 ---------------------
> kernel/rcu/srcutiny.c | 86 +--
> kernel/rcu/srcutree.c | 187 ++++--
> kernel/rcu/tiny.c | 54 +-
> kernel/rcu/tiny_plugin.h | 123 ----
> kernel/rcu/tree.c | 195 +++---
> kernel/rcu/tree.h | 109 +---
> kernel/rcu/tree_exp.h | 2 +-
> kernel/rcu/tree_plugin.h | 573 +++---------------
> kernel/rcu/tree_trace.c | 494 ---------------
> kernel/rcu/update.c | 77 ++-
> kernel/sched/core.c | 8 +-
> kernel/time/Kconfig | 50 --
> lib/Kconfig.debug | 184 +-----
> lib/Makefile | 3 -
> scripts/checkpatch.pl | 17 -
> .../selftests/rcutorture/bin/configcheck.sh | 2 +-
> .../testing/selftests/rcutorture/bin/kvm-build.sh | 2 +-
> tools/testing/selftests/rcutorture/bin/kvm.sh | 5 +-
> .../selftests/rcutorture/configs/rcu/CFLIST | 2 +
> .../selftests/rcutorture/configs/rcu/SRCU-C.boot | 1 +
> .../selftests/rcutorture/configs/rcu/SRCU-N | 2 +-
> .../selftests/rcutorture/configs/rcu/SRCU-P | 6 +-
> .../selftests/rcutorture/configs/rcu/SRCU-t | 10 +
> .../selftests/rcutorture/configs/rcu/SRCU-t.boot | 1 +
> .../selftests/rcutorture/configs/rcu/SRCU-u | 9 +
> .../selftests/rcutorture/configs/rcu/SRCU-u.boot | 1 +
> .../selftests/rcutorture/configs/rcu/TINY02 | 5 +-
> .../selftests/rcutorture/configs/rcu/TREE01 | 5 +-
> .../selftests/rcutorture/configs/rcu/TREE01.boot | 4 +
> .../selftests/rcutorture/configs/rcu/TREE02 | 5 +-
> .../selftests/rcutorture/configs/rcu/TREE03 | 4 -
> .../selftests/rcutorture/configs/rcu/TREE03.boot | 4 +
> .../selftests/rcutorture/configs/rcu/TREE04 | 4 -
> .../selftests/rcutorture/configs/rcu/TREE05 | 4 -
> .../selftests/rcutorture/configs/rcu/TREE05.boot | 3 +
> .../selftests/rcutorture/configs/rcu/TREE06 | 4 +-
> .../selftests/rcutorture/configs/rcu/TREE06.boot | 3 +
> .../selftests/rcutorture/configs/rcu/TREE07 | 6 -
> .../selftests/rcutorture/configs/rcu/TREE08 | 1 -
> .../selftests/rcutorture/configs/rcu/TREE08-T | 21 -
> .../selftests/rcutorture/configs/rcu/TREE08.boot | 1 +
> .../configs/{rcu/TREE02-T => rcuperf/TINY} | 19 +-
> .../selftests/rcutorture/configs/rcuperf/TREE | 1 -
> .../selftests/rcutorture/configs/rcuperf/TREE54 | 1 -
> .../testing/selftests/rcutorture/doc/TINY_RCU.txt | 1 -
> .../selftests/rcutorture/doc/TREE_RCU-kconfig.txt | 34 +-
> .../rcutorture/formal/srcu-cbmc/modify_srcu.awk | 2 +-
> 79 files changed, 1496 insertions(+), 4211 deletions(-)
> delete mode 100644 Documentation/RCU/trace.txt
> delete mode 100644 include/linux/srcuclassic.h
> create mode 100644 kernel/rcu/Kconfig
> create mode 100644 kernel/rcu/Kconfig.debug
> delete mode 100644 kernel/rcu/srcu.c
> delete mode 100644 kernel/rcu/tree_trace.c
> create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-C.boot
> create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-t
> create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-t.boot
> create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-u
> create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-u.boot
> delete mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
> rename tools/testing/selftests/rcutorture/configs/{rcu/TREE02-T => rcuperf/TINY} (50%)
>

2017-06-14 04:33:28

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Wed, Jun 14, 2017 at 04:54:04AM +0200, Andrea Parri wrote:
> On Mon, Jun 12, 2017 at 02:37:55PM -0700, Paul E. McKenney wrote:
> > Hello, Ingo,
> >
> > This pull request is unusual in being a single linear set of commits,
> > as opposed to my usual topic branches. This is due to the many
> > large-footprint changes, which means that reasonable topic branches
> > result in large numbers of merge conflicts. In addition, some commits
> > depend on other commits that should be on different topic branches.
> > I will return to the topic-branch style next time.
> >
> > The largest feature of this series is shrinking and simplification,
> > with the following diffstat summary:
> >
> > 79 files changed, 1496 insertions(+), 4211 deletions(-)
> >
> > In other words, this series represents a net reduction of more than 2700
> > lines of code.
> >
> > These commits were posted to LKML:
> >
> > http://lkml.kernel.org/r/[email protected]
>
> I did raise some issues (AFAICT, unresolved) concerning...
>
>
> >
> > Two of these commits (46/88 and 48/88) have been deferred, most likely
> > to v4.14. All of the remaining commits have been subjected to the 0day
> > Test Robot and -next testing, and are availiable in teh git repository at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git for-mingo
> >
> > for you to fetch changes up to 6d48152eafde1f0d0a4a9e0584fa7d9ff4fbfdac:
> >
> > rcu: Remove RCU CPU stall warnings from Tiny RCU (2017-06-08 18:52:45 -0700)
> >
> > ----------------------------------------------------------------
> > Arnd Bergmann (1):
> > bcm47xx: Fix build regression
> >
> > Paul E. McKenney (83):
> > rcutorture: Add lockdep to one of the SRCU scenarios
> > rcutorture: Add three-level tree test for Tree SRCU
> > rcutorture: Fix bug in reporting Kconfig mis-settings
> > rcutorture: Add a scenario for Tiny SRCU
> > rcutorture: Add a scenario for Classic SRCU
> > rcu: Prevent rcu_barrier() from starting needless grace periods
> > rcutorture: Correctly handle CONFIG_RCU_TORTURE_TEST_* options
> > rcutorture: Update test scenarios based on new Kconfig dependencies
> > srcu: Eliminate possibility of destructive counter overflow
> > rcu: Complain if blocking in preemptible RCU read-side critical section
> > rcuperf: Defer expedited/normal check to end of test
> > rcuperf: Remove conflicting Kconfig options
> > rcu: Remove obsolete reference to synchronize_kernel()
> > rcuperf: Add ability to performance-test call_rcu() and friends
> > rcuperf: Add a Kconfig-fragment file for Classic SRCU
> > rcu: Make sync_rcu_preempt_exp_done() return bool
> > checkpatch: Remove checks for expedited grace periods
> > rcuperf: Add test for dynamically initialized srcu_struct
> > doc/atomic_ops: Clarify smp_mb__{before,after}_atomic()
> > atomics: Add header comment so spin_unlock_wait()
>
> ... this one: c.f.,
>
> http://www.mail-archive.com/[email protected]/msg1418503.html
>
> Any hints about those?

I suggest being -extremely- clear. This is about ARM, correct?
If so, lay out the exact situation (code example, which hardware,
which spin_unlock_wait(), which sequence of events) that could lead to
the failure.

The problem here is that no one knows which of the 30 CPU families you
might be talking about, nor do they know exactly what the problem is.
I didn't worry about it at the time because I figured that you had
sent private email to Will with the full story.

Yes, the four of us (you, Alan, Luc, and me) discussed it, but we weren't
sure whether it was a bug in the memory model, the spin_unlock_wait()
code, or my description of spin_unlock_wait(). Given that Will didn't
object to my April 13th email (the one that you were not CCed on),
I figured that he wasn't going to claim that the spin_unlock_wait()
description was wrong, especially since he went to so much effort some
years back to make ARM64 meet that description.

So again, I recommend replying to your msg1418503.html email with
a code fragment demonstrating the problem, exact identification of
the hardware that might be susceptible (ARM64? ARM32? Which ARM32?),
exact identification of which spin_unlock_wait() function you suspect,
and a clear bullet-form sequence of events that shows how you believe
that the problem can occur.

That makes it easy for people to see what your concern is, easy for
them to check their code and hardware, and hard for them to ignore you.

Make sense?

Thanx, Paul

> Andrea
>
> > rcuperf: Add the ability to test tiny RCU flavors
> > srcu: Make Classic and Tree SRCU announce themselves at bootup
> > rcutorture: Reduce CPUs dedicated to testing Classic SRCU
> > srcu: Shrink Tiny SRCU a bit more
> > rcuperf: Set more user-friendly defaults
> > rcuperf: Add writer_holdoff boot parameter
> > rcutorture: Add "git diff" output to testid.txt file
> > srcu: Document auto-expediting requirement
> > doc: Take tail recursion into account in RCU requirements
> > rcu: Add preemptibility checks in rcu_sched_qs() and rcu_bh_qs()
> > rcu: Print out rcupdate.c non-default boot-time settings
> > rcu: Update rcu_bootup_announce_oddness()
> > srcu: Make exp_holdoff module parameter be static
> > srcu: Print non-default exp_holdoff values at boot time
> > rcu: Add lockdep_assert_held() teeth to tree.c
> > rcu: Add lockdep_assert_held() teeth to tree_plugin.h
> > srcu: Make SRCU be once again optional
> > srcu: Shrink Tiny SRCU a bit
> > srcu: Add DEBUG_OBJECTS_RCU_HEAD functionality
> > rcu: Make synchronize_rcu_mult() check for duplicates
> > sched: Rely on synchronize_rcu_mult() de-duplication
> > rcu: Use RCU_NOCB_WAKE rather than RCU_NOGP_WAKE
> > rcu: Add memory barriers for NOCB leader wakeup
> > rcu: Flag need for rcu_node_tree.h and rcu_segcblist.h visibility
> > rcu: Move docbook comments out of rcupdate.h
> > rcu: Move rcu_expedited and rcu_normal externs from rcupdate.h
> > rcu: Move expediting-related access/control out of rcupdate.h
> > rcu: Move torture-related definitions from rcupdate.h to rcu.h
> > rcu: Remove UINT_CMP_GE() and UINT_CMP_LT()
> > rcu: Move rcupdate.h to new empty-function style
> > rcu: Eliminate the unused __rcu_is_watching() function
> > rcu: Move the RCU_SCHEDULER_ definitions from rcupdate.h
> > rcu: Remove linux/debugobjects.h from rcupdate.h
> > rcu: Improve __call_rcu() debug-objects error message
> > rcu: Move rcu_is_nocb_cpu() from rcupdate.h to rcu.h
> > rcu: Move rcu_ftrace_dump() from rcupdate.h to rcu.h
> > rcu: move rcupdate.h to the new true/false-function style
> > rcu: Move torture-related functions out of rcutiny.h and rcutree.h
> > rcu: Move rcu_request_urgent_qs_task() out of rcutiny.h and rcutree.h
> > rcu: Move rcutiny.h to new empty/true/false-function style
> > srcu: Prevent sdp->srcu_gp_seq_needed counter wrap
> > srcu: Shrink srcu.h by moving docbook and private function
> > srcu: Apply trivial callback lists to shrink Tiny SRCU
> > lockdep: Use consistent printing primitives
> > rcu: Refactor #includes from include/linux/rcupdate.h
> > rcu: Convert rnp->lock wrappers to macros for SRCU use
> > rcu: Move rnp->lock wrappers for SRCU use
> > srcu: Use rnp->lock wrappers to replace explicit memory barriers
> > rcu: Remove *_SLOW_* Kconfig options
> > rcu: Remove the RCU_KTHREAD_PRIO Kconfig option
> > rcu: Remove nohz_full full-system-idle state machine
> > rcu: Remove #ifdef moving rcu_end_inkernel_boot from rcupdate.h
> > rcu: Remove typecheck() from RCU locking wrapper functions
> > rcu: Remove the now-obsolete PROVE_RCU_REPEATEDLY Kconfig option
> > rcu: Remove SPARSE_RCU_POINTER Kconfig option
> > srcu: Fix rcutorture-statistics typo
> > srcu: Remove Classic SRCU
> > rcu: Remove debugfs tracing
> > rcu: Eliminate NOCBs CPU-state Kconfig options
> > rcu: Move RCU non-debug Kconfig options to kernel/rcu
> > rcu: Move RCU debug Kconfig options to kernel/rcu
> > rcu: Remove event tracing from Tiny RCU
> > rcu: Remove RCU CPU stall warnings from Tiny RCU
> >
> > Priyalee Kushwaha (1):
> > srcu-cbmc: Use /usr/bin/awk instead of /bin/awk
> >
> > Stan Drozd (1):
> > docs: Fix typo in Documentation/memory-barriers.txt
> >
> > Documentation/RCU/00-INDEX | 2 -
> > .../RCU/Design/Requirements/Requirements.html | 34 +-
> > Documentation/RCU/checklist.txt | 8 +-
> > Documentation/RCU/trace.txt | 535 -----------------
> > Documentation/admin-guide/kernel-parameters.txt | 41 +-
> > Documentation/core-api/atomic_ops.rst | 5 +
> > Documentation/dev-tools/sparse.rst | 6 -
> > Documentation/kernel-per-CPU-kthreads.txt | 31 +-
> > Documentation/memory-barriers.txt | 2 +-
> > Documentation/timers/NO_HZ.txt | 29 +-
> > include/linux/bcm47xx_nvram.h | 1 +
> > include/linux/compiler.h | 4 -
> > include/linux/rcu_node_tree.h | 4 +
> > include/linux/rcu_segcblist.h | 4 +
> > include/linux/rcupdate.h | 318 +---------
> > include/linux/rcutiny.h | 167 +-----
> > include/linux/rcutree.h | 21 +-
> > include/linux/spinlock.h | 20 +
> > include/linux/srcu.h | 25 +-
> > include/linux/srcuclassic.h | 115 ----
> > include/linux/srcutiny.h | 47 +-
> > include/linux/srcutree.h | 13 +-
> > include/trace/events/rcu.h | 1 +
> > init/Kconfig | 349 +----------
> > kernel/locking/lockdep.c | 176 +++---
> > kernel/rcu/Kconfig | 242 ++++++++
> > kernel/rcu/Kconfig.debug | 82 +++
> > kernel/rcu/Makefile | 2 -
> > kernel/rcu/rcu.h | 277 +++++++++
> > kernel/rcu/rcuperf.c | 129 +++-
> > kernel/rcu/rcutorture.c | 21 +-
> > kernel/rcu/srcu.c | 661 ---------------------
> > kernel/rcu/srcutiny.c | 86 +--
> > kernel/rcu/srcutree.c | 187 ++++--
> > kernel/rcu/tiny.c | 54 +-
> > kernel/rcu/tiny_plugin.h | 123 ----
> > kernel/rcu/tree.c | 195 +++---
> > kernel/rcu/tree.h | 109 +---
> > kernel/rcu/tree_exp.h | 2 +-
> > kernel/rcu/tree_plugin.h | 573 +++---------------
> > kernel/rcu/tree_trace.c | 494 ---------------
> > kernel/rcu/update.c | 77 ++-
> > kernel/sched/core.c | 8 +-
> > kernel/time/Kconfig | 50 --
> > lib/Kconfig.debug | 184 +-----
> > lib/Makefile | 3 -
> > scripts/checkpatch.pl | 17 -
> > .../selftests/rcutorture/bin/configcheck.sh | 2 +-
> > .../testing/selftests/rcutorture/bin/kvm-build.sh | 2 +-
> > tools/testing/selftests/rcutorture/bin/kvm.sh | 5 +-
> > .../selftests/rcutorture/configs/rcu/CFLIST | 2 +
> > .../selftests/rcutorture/configs/rcu/SRCU-C.boot | 1 +
> > .../selftests/rcutorture/configs/rcu/SRCU-N | 2 +-
> > .../selftests/rcutorture/configs/rcu/SRCU-P | 6 +-
> > .../selftests/rcutorture/configs/rcu/SRCU-t | 10 +
> > .../selftests/rcutorture/configs/rcu/SRCU-t.boot | 1 +
> > .../selftests/rcutorture/configs/rcu/SRCU-u | 9 +
> > .../selftests/rcutorture/configs/rcu/SRCU-u.boot | 1 +
> > .../selftests/rcutorture/configs/rcu/TINY02 | 5 +-
> > .../selftests/rcutorture/configs/rcu/TREE01 | 5 +-
> > .../selftests/rcutorture/configs/rcu/TREE01.boot | 4 +
> > .../selftests/rcutorture/configs/rcu/TREE02 | 5 +-
> > .../selftests/rcutorture/configs/rcu/TREE03 | 4 -
> > .../selftests/rcutorture/configs/rcu/TREE03.boot | 4 +
> > .../selftests/rcutorture/configs/rcu/TREE04 | 4 -
> > .../selftests/rcutorture/configs/rcu/TREE05 | 4 -
> > .../selftests/rcutorture/configs/rcu/TREE05.boot | 3 +
> > .../selftests/rcutorture/configs/rcu/TREE06 | 4 +-
> > .../selftests/rcutorture/configs/rcu/TREE06.boot | 3 +
> > .../selftests/rcutorture/configs/rcu/TREE07 | 6 -
> > .../selftests/rcutorture/configs/rcu/TREE08 | 1 -
> > .../selftests/rcutorture/configs/rcu/TREE08-T | 21 -
> > .../selftests/rcutorture/configs/rcu/TREE08.boot | 1 +
> > .../configs/{rcu/TREE02-T => rcuperf/TINY} | 19 +-
> > .../selftests/rcutorture/configs/rcuperf/TREE | 1 -
> > .../selftests/rcutorture/configs/rcuperf/TREE54 | 1 -
> > .../testing/selftests/rcutorture/doc/TINY_RCU.txt | 1 -
> > .../selftests/rcutorture/doc/TREE_RCU-kconfig.txt | 34 +-
> > .../rcutorture/formal/srcu-cbmc/modify_srcu.awk | 2 +-
> > 79 files changed, 1496 insertions(+), 4211 deletions(-)
> > delete mode 100644 Documentation/RCU/trace.txt
> > delete mode 100644 include/linux/srcuclassic.h
> > create mode 100644 kernel/rcu/Kconfig
> > create mode 100644 kernel/rcu/Kconfig.debug
> > delete mode 100644 kernel/rcu/srcu.c
> > delete mode 100644 kernel/rcu/tree_trace.c
> > create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-C.boot
> > create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-t
> > create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-t.boot
> > create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-u
> > create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-u.boot
> > delete mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
> > rename tools/testing/selftests/rcutorture/configs/{rcu/TREE02-T => rcuperf/TINY} (50%)
> >
>

2017-06-14 14:33:36

by Andrea Parri

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Tue, Jun 13, 2017 at 09:33:17PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 14, 2017 at 04:54:04AM +0200, Andrea Parri wrote:
> > On Mon, Jun 12, 2017 at 02:37:55PM -0700, Paul E. McKenney wrote:
> > > Hello, Ingo,
> > >
> > > This pull request is unusual in being a single linear set of commits,
> > > as opposed to my usual topic branches. This is due to the many
> > > large-footprint changes, which means that reasonable topic branches
> > > result in large numbers of merge conflicts. In addition, some commits
> > > depend on other commits that should be on different topic branches.
> > > I will return to the topic-branch style next time.
> > >
> > > The largest feature of this series is shrinking and simplification,
> > > with the following diffstat summary:
> > >
> > > 79 files changed, 1496 insertions(+), 4211 deletions(-)
> > >
> > > In other words, this series represents a net reduction of more than 2700
> > > lines of code.
> > >
> > > These commits were posted to LKML:
> > >
> > > http://lkml.kernel.org/r/[email protected]
> >
> > I did raise some issues (AFAICT, unresolved) concerning...
> >
> >
> > >
> > > Two of these commits (46/88 and 48/88) have been deferred, most likely
> > > to v4.14. All of the remaining commits have been subjected to the 0day
> > > Test Robot and -next testing, and are availiable in teh git repository at:
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git for-mingo
> > >
> > > for you to fetch changes up to 6d48152eafde1f0d0a4a9e0584fa7d9ff4fbfdac:
> > >
> > > rcu: Remove RCU CPU stall warnings from Tiny RCU (2017-06-08 18:52:45 -0700)
> > >
> > > ----------------------------------------------------------------
> > > Arnd Bergmann (1):
> > > bcm47xx: Fix build regression
> > >
> > > Paul E. McKenney (83):
> > > rcutorture: Add lockdep to one of the SRCU scenarios
> > > rcutorture: Add three-level tree test for Tree SRCU
> > > rcutorture: Fix bug in reporting Kconfig mis-settings
> > > rcutorture: Add a scenario for Tiny SRCU
> > > rcutorture: Add a scenario for Classic SRCU
> > > rcu: Prevent rcu_barrier() from starting needless grace periods
> > > rcutorture: Correctly handle CONFIG_RCU_TORTURE_TEST_* options
> > > rcutorture: Update test scenarios based on new Kconfig dependencies
> > > srcu: Eliminate possibility of destructive counter overflow
> > > rcu: Complain if blocking in preemptible RCU read-side critical section
> > > rcuperf: Defer expedited/normal check to end of test
> > > rcuperf: Remove conflicting Kconfig options
> > > rcu: Remove obsolete reference to synchronize_kernel()
> > > rcuperf: Add ability to performance-test call_rcu() and friends
> > > rcuperf: Add a Kconfig-fragment file for Classic SRCU
> > > rcu: Make sync_rcu_preempt_exp_done() return bool
> > > checkpatch: Remove checks for expedited grace periods
> > > rcuperf: Add test for dynamically initialized srcu_struct
> > > doc/atomic_ops: Clarify smp_mb__{before,after}_atomic()
> > > atomics: Add header comment so spin_unlock_wait()
> >
> > ... this one: c.f.,
> >
> > http://www.mail-archive.com/[email protected]/msg1418503.html
> >
> > Any hints about those?
>
> I suggest being -extremely- clear. This is about ARM, correct?
> If so, lay out the exact situation (code example, which hardware,
> which spin_unlock_wait(), which sequence of events) that could lead to
> the failure.
>
> The problem here is that no one knows which of the 30 CPU families you
> might be talking about, nor do they know exactly what the problem is.
> I didn't worry about it at the time because I figured that you had
> sent private email to Will with the full story.
>
> Yes, the four of us (you, Alan, Luc, and me) discussed it, but we weren't
> sure whether it was a bug in the memory model, the spin_unlock_wait()
> code, or my description of spin_unlock_wait(). Given that Will didn't
> object to my April 13th email (the one that you were not CCed on),
> I figured that he wasn't going to claim that the spin_unlock_wait()
> description was wrong, especially since he went to so much effort some
> years back to make ARM64 meet that description.
>
> So again, I recommend replying to your msg1418503.html email with
> a code fragment demonstrating the problem, exact identification of
> the hardware that might be susceptible (ARM64? ARM32? Which ARM32?),
> exact identification of which spin_unlock_wait() function you suspect,
> and a clear bullet-form sequence of events that shows how you believe
> that the problem can occur.
>
> That makes it easy for people to see what your concern is, easy for
> them to check their code and hardware, and hard for them to ignore you.
>
> Make sense?

My concerns originates from the fact that none of the implementations
(of spin_unlock_wait()) for the architectures touched by:

726328d92a42b6d4b76078e2659f43067f82c4e8
("locking/spinlock, arch: Update and fix spin_unlock_wait() implementations"

currently contain any traces of that RELEASE/spin_unlock() from your:

"Semantically this is equivalent to a spin_lock() immediately followed
by a spin_unlock()."

In fact, the header of that commit states:

"The update is in semantics; where it previously was only a control
dependency, we now upgrade to a full load-acquire [...]"

For an example leveraging this RELEASE, consider:

[initially: X = 0, s UNLOCKED]

P0 P1
X = 1; spin_lock(s);
spin_unlock_wait(s); r0 = X;

According to the "spin_lock(); spin_unlock() semantics" this has one
non-deadlocking execution, and the RELEASE from the spin_unlock_wait()
(paired with the ACQUIRE from the spin_lock() in P1) guarantees that
r0 = 1 in this execution. AFAICT, this same conclusion does not hold
according to the "smp_cond_load_acquire() semantics" (726328d92a42b).

Andrea


>
> Thanx, Paul
>
> > Andrea
> >
> > > rcuperf: Add the ability to test tiny RCU flavors
> > > srcu: Make Classic and Tree SRCU announce themselves at bootup
> > > rcutorture: Reduce CPUs dedicated to testing Classic SRCU
> > > srcu: Shrink Tiny SRCU a bit more
> > > rcuperf: Set more user-friendly defaults
> > > rcuperf: Add writer_holdoff boot parameter
> > > rcutorture: Add "git diff" output to testid.txt file
> > > srcu: Document auto-expediting requirement
> > > doc: Take tail recursion into account in RCU requirements
> > > rcu: Add preemptibility checks in rcu_sched_qs() and rcu_bh_qs()
> > > rcu: Print out rcupdate.c non-default boot-time settings
> > > rcu: Update rcu_bootup_announce_oddness()
> > > srcu: Make exp_holdoff module parameter be static
> > > srcu: Print non-default exp_holdoff values at boot time
> > > rcu: Add lockdep_assert_held() teeth to tree.c
> > > rcu: Add lockdep_assert_held() teeth to tree_plugin.h
> > > srcu: Make SRCU be once again optional
> > > srcu: Shrink Tiny SRCU a bit
> > > srcu: Add DEBUG_OBJECTS_RCU_HEAD functionality
> > > rcu: Make synchronize_rcu_mult() check for duplicates
> > > sched: Rely on synchronize_rcu_mult() de-duplication
> > > rcu: Use RCU_NOCB_WAKE rather than RCU_NOGP_WAKE
> > > rcu: Add memory barriers for NOCB leader wakeup
> > > rcu: Flag need for rcu_node_tree.h and rcu_segcblist.h visibility
> > > rcu: Move docbook comments out of rcupdate.h
> > > rcu: Move rcu_expedited and rcu_normal externs from rcupdate.h
> > > rcu: Move expediting-related access/control out of rcupdate.h
> > > rcu: Move torture-related definitions from rcupdate.h to rcu.h
> > > rcu: Remove UINT_CMP_GE() and UINT_CMP_LT()
> > > rcu: Move rcupdate.h to new empty-function style
> > > rcu: Eliminate the unused __rcu_is_watching() function
> > > rcu: Move the RCU_SCHEDULER_ definitions from rcupdate.h
> > > rcu: Remove linux/debugobjects.h from rcupdate.h
> > > rcu: Improve __call_rcu() debug-objects error message
> > > rcu: Move rcu_is_nocb_cpu() from rcupdate.h to rcu.h
> > > rcu: Move rcu_ftrace_dump() from rcupdate.h to rcu.h
> > > rcu: move rcupdate.h to the new true/false-function style
> > > rcu: Move torture-related functions out of rcutiny.h and rcutree.h
> > > rcu: Move rcu_request_urgent_qs_task() out of rcutiny.h and rcutree.h
> > > rcu: Move rcutiny.h to new empty/true/false-function style
> > > srcu: Prevent sdp->srcu_gp_seq_needed counter wrap
> > > srcu: Shrink srcu.h by moving docbook and private function
> > > srcu: Apply trivial callback lists to shrink Tiny SRCU
> > > lockdep: Use consistent printing primitives
> > > rcu: Refactor #includes from include/linux/rcupdate.h
> > > rcu: Convert rnp->lock wrappers to macros for SRCU use
> > > rcu: Move rnp->lock wrappers for SRCU use
> > > srcu: Use rnp->lock wrappers to replace explicit memory barriers
> > > rcu: Remove *_SLOW_* Kconfig options
> > > rcu: Remove the RCU_KTHREAD_PRIO Kconfig option
> > > rcu: Remove nohz_full full-system-idle state machine
> > > rcu: Remove #ifdef moving rcu_end_inkernel_boot from rcupdate.h
> > > rcu: Remove typecheck() from RCU locking wrapper functions
> > > rcu: Remove the now-obsolete PROVE_RCU_REPEATEDLY Kconfig option
> > > rcu: Remove SPARSE_RCU_POINTER Kconfig option
> > > srcu: Fix rcutorture-statistics typo
> > > srcu: Remove Classic SRCU
> > > rcu: Remove debugfs tracing
> > > rcu: Eliminate NOCBs CPU-state Kconfig options
> > > rcu: Move RCU non-debug Kconfig options to kernel/rcu
> > > rcu: Move RCU debug Kconfig options to kernel/rcu
> > > rcu: Remove event tracing from Tiny RCU
> > > rcu: Remove RCU CPU stall warnings from Tiny RCU
> > >
> > > Priyalee Kushwaha (1):
> > > srcu-cbmc: Use /usr/bin/awk instead of /bin/awk
> > >
> > > Stan Drozd (1):
> > > docs: Fix typo in Documentation/memory-barriers.txt
> > >
> > > Documentation/RCU/00-INDEX | 2 -
> > > .../RCU/Design/Requirements/Requirements.html | 34 +-
> > > Documentation/RCU/checklist.txt | 8 +-
> > > Documentation/RCU/trace.txt | 535 -----------------
> > > Documentation/admin-guide/kernel-parameters.txt | 41 +-
> > > Documentation/core-api/atomic_ops.rst | 5 +
> > > Documentation/dev-tools/sparse.rst | 6 -
> > > Documentation/kernel-per-CPU-kthreads.txt | 31 +-
> > > Documentation/memory-barriers.txt | 2 +-
> > > Documentation/timers/NO_HZ.txt | 29 +-
> > > include/linux/bcm47xx_nvram.h | 1 +
> > > include/linux/compiler.h | 4 -
> > > include/linux/rcu_node_tree.h | 4 +
> > > include/linux/rcu_segcblist.h | 4 +
> > > include/linux/rcupdate.h | 318 +---------
> > > include/linux/rcutiny.h | 167 +-----
> > > include/linux/rcutree.h | 21 +-
> > > include/linux/spinlock.h | 20 +
> > > include/linux/srcu.h | 25 +-
> > > include/linux/srcuclassic.h | 115 ----
> > > include/linux/srcutiny.h | 47 +-
> > > include/linux/srcutree.h | 13 +-
> > > include/trace/events/rcu.h | 1 +
> > > init/Kconfig | 349 +----------
> > > kernel/locking/lockdep.c | 176 +++---
> > > kernel/rcu/Kconfig | 242 ++++++++
> > > kernel/rcu/Kconfig.debug | 82 +++
> > > kernel/rcu/Makefile | 2 -
> > > kernel/rcu/rcu.h | 277 +++++++++
> > > kernel/rcu/rcuperf.c | 129 +++-
> > > kernel/rcu/rcutorture.c | 21 +-
> > > kernel/rcu/srcu.c | 661 ---------------------
> > > kernel/rcu/srcutiny.c | 86 +--
> > > kernel/rcu/srcutree.c | 187 ++++--
> > > kernel/rcu/tiny.c | 54 +-
> > > kernel/rcu/tiny_plugin.h | 123 ----
> > > kernel/rcu/tree.c | 195 +++---
> > > kernel/rcu/tree.h | 109 +---
> > > kernel/rcu/tree_exp.h | 2 +-
> > > kernel/rcu/tree_plugin.h | 573 +++---------------
> > > kernel/rcu/tree_trace.c | 494 ---------------
> > > kernel/rcu/update.c | 77 ++-
> > > kernel/sched/core.c | 8 +-
> > > kernel/time/Kconfig | 50 --
> > > lib/Kconfig.debug | 184 +-----
> > > lib/Makefile | 3 -
> > > scripts/checkpatch.pl | 17 -
> > > .../selftests/rcutorture/bin/configcheck.sh | 2 +-
> > > .../testing/selftests/rcutorture/bin/kvm-build.sh | 2 +-
> > > tools/testing/selftests/rcutorture/bin/kvm.sh | 5 +-
> > > .../selftests/rcutorture/configs/rcu/CFLIST | 2 +
> > > .../selftests/rcutorture/configs/rcu/SRCU-C.boot | 1 +
> > > .../selftests/rcutorture/configs/rcu/SRCU-N | 2 +-
> > > .../selftests/rcutorture/configs/rcu/SRCU-P | 6 +-
> > > .../selftests/rcutorture/configs/rcu/SRCU-t | 10 +
> > > .../selftests/rcutorture/configs/rcu/SRCU-t.boot | 1 +
> > > .../selftests/rcutorture/configs/rcu/SRCU-u | 9 +
> > > .../selftests/rcutorture/configs/rcu/SRCU-u.boot | 1 +
> > > .../selftests/rcutorture/configs/rcu/TINY02 | 5 +-
> > > .../selftests/rcutorture/configs/rcu/TREE01 | 5 +-
> > > .../selftests/rcutorture/configs/rcu/TREE01.boot | 4 +
> > > .../selftests/rcutorture/configs/rcu/TREE02 | 5 +-
> > > .../selftests/rcutorture/configs/rcu/TREE03 | 4 -
> > > .../selftests/rcutorture/configs/rcu/TREE03.boot | 4 +
> > > .../selftests/rcutorture/configs/rcu/TREE04 | 4 -
> > > .../selftests/rcutorture/configs/rcu/TREE05 | 4 -
> > > .../selftests/rcutorture/configs/rcu/TREE05.boot | 3 +
> > > .../selftests/rcutorture/configs/rcu/TREE06 | 4 +-
> > > .../selftests/rcutorture/configs/rcu/TREE06.boot | 3 +
> > > .../selftests/rcutorture/configs/rcu/TREE07 | 6 -
> > > .../selftests/rcutorture/configs/rcu/TREE08 | 1 -
> > > .../selftests/rcutorture/configs/rcu/TREE08-T | 21 -
> > > .../selftests/rcutorture/configs/rcu/TREE08.boot | 1 +
> > > .../configs/{rcu/TREE02-T => rcuperf/TINY} | 19 +-
> > > .../selftests/rcutorture/configs/rcuperf/TREE | 1 -
> > > .../selftests/rcutorture/configs/rcuperf/TREE54 | 1 -
> > > .../testing/selftests/rcutorture/doc/TINY_RCU.txt | 1 -
> > > .../selftests/rcutorture/doc/TREE_RCU-kconfig.txt | 34 +-
> > > .../rcutorture/formal/srcu-cbmc/modify_srcu.awk | 2 +-
> > > 79 files changed, 1496 insertions(+), 4211 deletions(-)
> > > delete mode 100644 Documentation/RCU/trace.txt
> > > delete mode 100644 include/linux/srcuclassic.h
> > > create mode 100644 kernel/rcu/Kconfig
> > > create mode 100644 kernel/rcu/Kconfig.debug
> > > delete mode 100644 kernel/rcu/srcu.c
> > > delete mode 100644 kernel/rcu/tree_trace.c
> > > create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-C.boot
> > > create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-t
> > > create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-t.boot
> > > create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-u
> > > create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/SRCU-u.boot
> > > delete mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE08-T
> > > rename tools/testing/selftests/rcutorture/configs/{rcu/TREE02-T => rcuperf/TINY} (50%)
> > >
> >
>

2017-06-14 20:23:37

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Wed, Jun 14, 2017 at 04:33:22PM +0200, Andrea Parri wrote:
> On Tue, Jun 13, 2017 at 09:33:17PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 14, 2017 at 04:54:04AM +0200, Andrea Parri wrote:
> > > On Mon, Jun 12, 2017 at 02:37:55PM -0700, Paul E. McKenney wrote:
> > > > Hello, Ingo,
> > > >
> > > > This pull request is unusual in being a single linear set of commits,
> > > > as opposed to my usual topic branches. This is due to the many
> > > > large-footprint changes, which means that reasonable topic branches
> > > > result in large numbers of merge conflicts. In addition, some commits
> > > > depend on other commits that should be on different topic branches.
> > > > I will return to the topic-branch style next time.
> > > >
> > > > The largest feature of this series is shrinking and simplification,
> > > > with the following diffstat summary:
> > > >
> > > > 79 files changed, 1496 insertions(+), 4211 deletions(-)
> > > >
> > > > In other words, this series represents a net reduction of more than 2700
> > > > lines of code.
> > > >
> > > > These commits were posted to LKML:
> > > >
> > > > http://lkml.kernel.org/r/[email protected]
> > >
> > > I did raise some issues (AFAICT, unresolved) concerning...
> > >
> > >
> > > >
> > > > Two of these commits (46/88 and 48/88) have been deferred, most likely
> > > > to v4.14. All of the remaining commits have been subjected to the 0day
> > > > Test Robot and -next testing, and are availiable in teh git repository at:
> > > >
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git for-mingo
> > > >
> > > > for you to fetch changes up to 6d48152eafde1f0d0a4a9e0584fa7d9ff4fbfdac:
> > > >
> > > > rcu: Remove RCU CPU stall warnings from Tiny RCU (2017-06-08 18:52:45 -0700)
> > > >
> > > > ----------------------------------------------------------------
> > > > Arnd Bergmann (1):
> > > > bcm47xx: Fix build regression
> > > >
> > > > Paul E. McKenney (83):
> > > > rcutorture: Add lockdep to one of the SRCU scenarios
> > > > rcutorture: Add three-level tree test for Tree SRCU
> > > > rcutorture: Fix bug in reporting Kconfig mis-settings
> > > > rcutorture: Add a scenario for Tiny SRCU
> > > > rcutorture: Add a scenario for Classic SRCU
> > > > rcu: Prevent rcu_barrier() from starting needless grace periods
> > > > rcutorture: Correctly handle CONFIG_RCU_TORTURE_TEST_* options
> > > > rcutorture: Update test scenarios based on new Kconfig dependencies
> > > > srcu: Eliminate possibility of destructive counter overflow
> > > > rcu: Complain if blocking in preemptible RCU read-side critical section
> > > > rcuperf: Defer expedited/normal check to end of test
> > > > rcuperf: Remove conflicting Kconfig options
> > > > rcu: Remove obsolete reference to synchronize_kernel()
> > > > rcuperf: Add ability to performance-test call_rcu() and friends
> > > > rcuperf: Add a Kconfig-fragment file for Classic SRCU
> > > > rcu: Make sync_rcu_preempt_exp_done() return bool
> > > > checkpatch: Remove checks for expedited grace periods
> > > > rcuperf: Add test for dynamically initialized srcu_struct
> > > > doc/atomic_ops: Clarify smp_mb__{before,after}_atomic()
> > > > atomics: Add header comment so spin_unlock_wait()
> > >
> > > ... this one: c.f.,
> > >
> > > http://www.mail-archive.com/[email protected]/msg1418503.html
> > >
> > > Any hints about those?
> >
> > I suggest being -extremely- clear. This is about ARM, correct?
> > If so, lay out the exact situation (code example, which hardware,
> > which spin_unlock_wait(), which sequence of events) that could lead to
> > the failure.
> >
> > The problem here is that no one knows which of the 30 CPU families you
> > might be talking about, nor do they know exactly what the problem is.
> > I didn't worry about it at the time because I figured that you had
> > sent private email to Will with the full story.
> >
> > Yes, the four of us (you, Alan, Luc, and me) discussed it, but we weren't
> > sure whether it was a bug in the memory model, the spin_unlock_wait()
> > code, or my description of spin_unlock_wait(). Given that Will didn't
> > object to my April 13th email (the one that you were not CCed on),
> > I figured that he wasn't going to claim that the spin_unlock_wait()
> > description was wrong, especially since he went to so much effort some
> > years back to make ARM64 meet that description.
> >
> > So again, I recommend replying to your msg1418503.html email with
> > a code fragment demonstrating the problem, exact identification of
> > the hardware that might be susceptible (ARM64? ARM32? Which ARM32?),
> > exact identification of which spin_unlock_wait() function you suspect,
> > and a clear bullet-form sequence of events that shows how you believe
> > that the problem can occur.
> >
> > That makes it easy for people to see what your concern is, easy for
> > them to check their code and hardware, and hard for them to ignore you.
> >
> > Make sense?
>
> My concerns originates from the fact that none of the implementations
> (of spin_unlock_wait()) for the architectures touched by:
>
> 726328d92a42b6d4b76078e2659f43067f82c4e8
> ("locking/spinlock, arch: Update and fix spin_unlock_wait() implementations"
>
> currently contain any traces of that RELEASE/spin_unlock() from your:
>
> "Semantically this is equivalent to a spin_lock() immediately followed
> by a spin_unlock()."
>
> In fact, the header of that commit states:
>
> "The update is in semantics; where it previously was only a control
> dependency, we now upgrade to a full load-acquire [...]"
>
> For an example leveraging this RELEASE, consider:
>
> [initially: X = 0, s UNLOCKED]
>
> P0 P1
> X = 1; spin_lock(s);
> spin_unlock_wait(s); r0 = X;
>
> According to the "spin_lock(); spin_unlock() semantics" this has one
> non-deadlocking execution, and the RELEASE from the spin_unlock_wait()
> (paired with the ACQUIRE from the spin_lock() in P1) guarantees that
> r0 = 1 in this execution. AFAICT, this same conclusion does not hold
> according to the "smp_cond_load_acquire() semantics" (726328d92a42b).

OK. For exactly which CPU families do you believe that this fails to
hold. That is, given the implementations of spin_unlock_wait() and
spin_lock() for the various CPU families, which will break and why?

Thanx, Paul

2017-06-19 16:24:56

by Andrea Parri

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Wed, Jun 14, 2017 at 01:23:29PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 14, 2017 at 04:33:22PM +0200, Andrea Parri wrote:
> > On Tue, Jun 13, 2017 at 09:33:17PM -0700, Paul E. McKenney wrote:
> > > On Wed, Jun 14, 2017 at 04:54:04AM +0200, Andrea Parri wrote:
> > > > On Mon, Jun 12, 2017 at 02:37:55PM -0700, Paul E. McKenney wrote:
> > > > > Hello, Ingo,
> > > > >
> > > > > This pull request is unusual in being a single linear set of commits,
> > > > > as opposed to my usual topic branches. This is due to the many
> > > > > large-footprint changes, which means that reasonable topic branches
> > > > > result in large numbers of merge conflicts. In addition, some commits
> > > > > depend on other commits that should be on different topic branches.
> > > > > I will return to the topic-branch style next time.
> > > > >
> > > > > The largest feature of this series is shrinking and simplification,
> > > > > with the following diffstat summary:
> > > > >
> > > > > 79 files changed, 1496 insertions(+), 4211 deletions(-)
> > > > >
> > > > > In other words, this series represents a net reduction of more than 2700
> > > > > lines of code.
> > > > >
> > > > > These commits were posted to LKML:
> > > > >
> > > > > http://lkml.kernel.org/r/[email protected]
> > > >
> > > > I did raise some issues (AFAICT, unresolved) concerning...
> > > >
> > > >
> > > > >
> > > > > Two of these commits (46/88 and 48/88) have been deferred, most likely
> > > > > to v4.14. All of the remaining commits have been subjected to the 0day
> > > > > Test Robot and -next testing, and are availiable in teh git repository at:
> > > > >
> > > > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git for-mingo
> > > > >
> > > > > for you to fetch changes up to 6d48152eafde1f0d0a4a9e0584fa7d9ff4fbfdac:
> > > > >
> > > > > rcu: Remove RCU CPU stall warnings from Tiny RCU (2017-06-08 18:52:45 -0700)
> > > > >
> > > > > ----------------------------------------------------------------
> > > > > Arnd Bergmann (1):
> > > > > bcm47xx: Fix build regression
> > > > >
> > > > > Paul E. McKenney (83):
> > > > > rcutorture: Add lockdep to one of the SRCU scenarios
> > > > > rcutorture: Add three-level tree test for Tree SRCU
> > > > > rcutorture: Fix bug in reporting Kconfig mis-settings
> > > > > rcutorture: Add a scenario for Tiny SRCU
> > > > > rcutorture: Add a scenario for Classic SRCU
> > > > > rcu: Prevent rcu_barrier() from starting needless grace periods
> > > > > rcutorture: Correctly handle CONFIG_RCU_TORTURE_TEST_* options
> > > > > rcutorture: Update test scenarios based on new Kconfig dependencies
> > > > > srcu: Eliminate possibility of destructive counter overflow
> > > > > rcu: Complain if blocking in preemptible RCU read-side critical section
> > > > > rcuperf: Defer expedited/normal check to end of test
> > > > > rcuperf: Remove conflicting Kconfig options
> > > > > rcu: Remove obsolete reference to synchronize_kernel()
> > > > > rcuperf: Add ability to performance-test call_rcu() and friends
> > > > > rcuperf: Add a Kconfig-fragment file for Classic SRCU
> > > > > rcu: Make sync_rcu_preempt_exp_done() return bool
> > > > > checkpatch: Remove checks for expedited grace periods
> > > > > rcuperf: Add test for dynamically initialized srcu_struct
> > > > > doc/atomic_ops: Clarify smp_mb__{before,after}_atomic()
> > > > > atomics: Add header comment so spin_unlock_wait()
> > > >
> > > > ... this one: c.f.,
> > > >
> > > > http://www.mail-archive.com/[email protected]/msg1418503.html
> > > >
> > > > Any hints about those?
> > >
> > > I suggest being -extremely- clear. This is about ARM, correct?
> > > If so, lay out the exact situation (code example, which hardware,
> > > which spin_unlock_wait(), which sequence of events) that could lead to
> > > the failure.
> > >
> > > The problem here is that no one knows which of the 30 CPU families you
> > > might be talking about, nor do they know exactly what the problem is.
> > > I didn't worry about it at the time because I figured that you had
> > > sent private email to Will with the full story.
> > >
> > > Yes, the four of us (you, Alan, Luc, and me) discussed it, but we weren't
> > > sure whether it was a bug in the memory model, the spin_unlock_wait()
> > > code, or my description of spin_unlock_wait(). Given that Will didn't
> > > object to my April 13th email (the one that you were not CCed on),
> > > I figured that he wasn't going to claim that the spin_unlock_wait()
> > > description was wrong, especially since he went to so much effort some
> > > years back to make ARM64 meet that description.
> > >
> > > So again, I recommend replying to your msg1418503.html email with
> > > a code fragment demonstrating the problem, exact identification of
> > > the hardware that might be susceptible (ARM64? ARM32? Which ARM32?),
> > > exact identification of which spin_unlock_wait() function you suspect,
> > > and a clear bullet-form sequence of events that shows how you believe
> > > that the problem can occur.
> > >
> > > That makes it easy for people to see what your concern is, easy for
> > > them to check their code and hardware, and hard for them to ignore you.
> > >
> > > Make sense?
> >
> > My concerns originates from the fact that none of the implementations
> > (of spin_unlock_wait()) for the architectures touched by:
> >
> > 726328d92a42b6d4b76078e2659f43067f82c4e8
> > ("locking/spinlock, arch: Update and fix spin_unlock_wait() implementations"
> >
> > currently contain any traces of that RELEASE/spin_unlock() from your:
> >
> > "Semantically this is equivalent to a spin_lock() immediately followed
> > by a spin_unlock()."
> >
> > In fact, the header of that commit states:
> >
> > "The update is in semantics; where it previously was only a control
> > dependency, we now upgrade to a full load-acquire [...]"
> >
> > For an example leveraging this RELEASE, consider:
> >
> > [initially: X = 0, s UNLOCKED]
> >
> > P0 P1
> > X = 1; spin_lock(s);
> > spin_unlock_wait(s); r0 = X;
> >
> > According to the "spin_lock(); spin_unlock() semantics" this has one
> > non-deadlocking execution, and the RELEASE from the spin_unlock_wait()
> > (paired with the ACQUIRE from the spin_lock() in P1) guarantees that
> > r0 = 1 in this execution. AFAICT, this same conclusion does not hold
> > according to the "smp_cond_load_acquire() semantics" (726328d92a42b).
>
> OK. For exactly which CPU families do you believe that this fails to
> hold. That is, given the implementations of spin_unlock_wait() and
> spin_lock() for the various CPU families, which will break and why?

Considering the case of x86, the question amounts to ask whether
the "exists" clause of the following test can be satisfied:

X86 TEST
{
1:EAX=1;
}
P0 | P1 ;
MOV [X],$1 | LOCK XCHG [s],EAX ;
MOV EAX,[s] | MOV EBX,[X] ;
exists
(0:EAX=0 /\ 1:EBX=0)

The answer is "Yes", as illustrated by the following sequence of
events:

1) P0's store to X is placed into the store buffer (of P0);
2) P0 loads 0 from s;
3) P1 executes LOCK XCHG;
4) P1 loads X=0 from memory;
5) P0's store to X is flushed from store buffer to memory.

---

This analysis shows that the x86 implementation needs additional
synchronization to meet a "spin_lock();spin_unlock()" semantics.

I believe that the same conclusion holds for other architectures
(e.g., sparc, ia64, arm(32), alpha).

In fact, the only two implementations I know that _seem_ able to
meet that semantics are arm64's and powerpc's.

Andrea


>
> Thanx, Paul
>

2017-06-27 20:58:22

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Mon, Jun 19, 2017 at 06:24:28PM +0200, Andrea Parri wrote:
> On Wed, Jun 14, 2017 at 01:23:29PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 14, 2017 at 04:33:22PM +0200, Andrea Parri wrote:
> > > On Tue, Jun 13, 2017 at 09:33:17PM -0700, Paul E. McKenney wrote:
> > > > On Wed, Jun 14, 2017 at 04:54:04AM +0200, Andrea Parri wrote:
> > > > > On Mon, Jun 12, 2017 at 02:37:55PM -0700, Paul E. McKenney wrote:
> > > > > > Hello, Ingo,
> > > > > >
> > > > > > This pull request is unusual in being a single linear set of commits,
> > > > > > as opposed to my usual topic branches. This is due to the many
> > > > > > large-footprint changes, which means that reasonable topic branches
> > > > > > result in large numbers of merge conflicts. In addition, some commits
> > > > > > depend on other commits that should be on different topic branches.
> > > > > > I will return to the topic-branch style next time.
> > > > > >
> > > > > > The largest feature of this series is shrinking and simplification,
> > > > > > with the following diffstat summary:
> > > > > >
> > > > > > 79 files changed, 1496 insertions(+), 4211 deletions(-)
> > > > > >
> > > > > > In other words, this series represents a net reduction of more than 2700
> > > > > > lines of code.
> > > > > >
> > > > > > These commits were posted to LKML:
> > > > > >
> > > > > > http://lkml.kernel.org/r/[email protected]
> > > > >
> > > > > I did raise some issues (AFAICT, unresolved) concerning...
> > > > >
> > > > >
> > > > > >
> > > > > > Two of these commits (46/88 and 48/88) have been deferred, most likely
> > > > > > to v4.14. All of the remaining commits have been subjected to the 0day
> > > > > > Test Robot and -next testing, and are availiable in teh git repository at:
> > > > > >
> > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git for-mingo
> > > > > >
> > > > > > for you to fetch changes up to 6d48152eafde1f0d0a4a9e0584fa7d9ff4fbfdac:
> > > > > >
> > > > > > rcu: Remove RCU CPU stall warnings from Tiny RCU (2017-06-08 18:52:45 -0700)
> > > > > >
> > > > > > ----------------------------------------------------------------
> > > > > > Arnd Bergmann (1):
> > > > > > bcm47xx: Fix build regression
> > > > > >
> > > > > > Paul E. McKenney (83):
> > > > > > rcutorture: Add lockdep to one of the SRCU scenarios
> > > > > > rcutorture: Add three-level tree test for Tree SRCU
> > > > > > rcutorture: Fix bug in reporting Kconfig mis-settings
> > > > > > rcutorture: Add a scenario for Tiny SRCU
> > > > > > rcutorture: Add a scenario for Classic SRCU
> > > > > > rcu: Prevent rcu_barrier() from starting needless grace periods
> > > > > > rcutorture: Correctly handle CONFIG_RCU_TORTURE_TEST_* options
> > > > > > rcutorture: Update test scenarios based on new Kconfig dependencies
> > > > > > srcu: Eliminate possibility of destructive counter overflow
> > > > > > rcu: Complain if blocking in preemptible RCU read-side critical section
> > > > > > rcuperf: Defer expedited/normal check to end of test
> > > > > > rcuperf: Remove conflicting Kconfig options
> > > > > > rcu: Remove obsolete reference to synchronize_kernel()
> > > > > > rcuperf: Add ability to performance-test call_rcu() and friends
> > > > > > rcuperf: Add a Kconfig-fragment file for Classic SRCU
> > > > > > rcu: Make sync_rcu_preempt_exp_done() return bool
> > > > > > checkpatch: Remove checks for expedited grace periods
> > > > > > rcuperf: Add test for dynamically initialized srcu_struct
> > > > > > doc/atomic_ops: Clarify smp_mb__{before,after}_atomic()
> > > > > > atomics: Add header comment so spin_unlock_wait()
> > > > >
> > > > > ... this one: c.f.,
> > > > >
> > > > > http://www.mail-archive.com/[email protected]/msg1418503.html
> > > > >
> > > > > Any hints about those?
> > > >
> > > > I suggest being -extremely- clear. This is about ARM, correct?
> > > > If so, lay out the exact situation (code example, which hardware,
> > > > which spin_unlock_wait(), which sequence of events) that could lead to
> > > > the failure.
> > > >
> > > > The problem here is that no one knows which of the 30 CPU families you
> > > > might be talking about, nor do they know exactly what the problem is.
> > > > I didn't worry about it at the time because I figured that you had
> > > > sent private email to Will with the full story.
> > > >
> > > > Yes, the four of us (you, Alan, Luc, and me) discussed it, but we weren't
> > > > sure whether it was a bug in the memory model, the spin_unlock_wait()
> > > > code, or my description of spin_unlock_wait(). Given that Will didn't
> > > > object to my April 13th email (the one that you were not CCed on),
> > > > I figured that he wasn't going to claim that the spin_unlock_wait()
> > > > description was wrong, especially since he went to so much effort some
> > > > years back to make ARM64 meet that description.
> > > >
> > > > So again, I recommend replying to your msg1418503.html email with
> > > > a code fragment demonstrating the problem, exact identification of
> > > > the hardware that might be susceptible (ARM64? ARM32? Which ARM32?),
> > > > exact identification of which spin_unlock_wait() function you suspect,
> > > > and a clear bullet-form sequence of events that shows how you believe
> > > > that the problem can occur.
> > > >
> > > > That makes it easy for people to see what your concern is, easy for
> > > > them to check their code and hardware, and hard for them to ignore you.
> > > >
> > > > Make sense?
> > >
> > > My concerns originates from the fact that none of the implementations
> > > (of spin_unlock_wait()) for the architectures touched by:
> > >
> > > 726328d92a42b6d4b76078e2659f43067f82c4e8
> > > ("locking/spinlock, arch: Update and fix spin_unlock_wait() implementations"
> > >
> > > currently contain any traces of that RELEASE/spin_unlock() from your:
> > >
> > > "Semantically this is equivalent to a spin_lock() immediately followed
> > > by a spin_unlock()."
> > >
> > > In fact, the header of that commit states:
> > >
> > > "The update is in semantics; where it previously was only a control
> > > dependency, we now upgrade to a full load-acquire [...]"
> > >
> > > For an example leveraging this RELEASE, consider:
> > >
> > > [initially: X = 0, s UNLOCKED]
> > >
> > > P0 P1
> > > X = 1; spin_lock(s);
> > > spin_unlock_wait(s); r0 = X;
> > >
> > > According to the "spin_lock(); spin_unlock() semantics" this has one
> > > non-deadlocking execution, and the RELEASE from the spin_unlock_wait()
> > > (paired with the ACQUIRE from the spin_lock() in P1) guarantees that
> > > r0 = 1 in this execution. AFAICT, this same conclusion does not hold
> > > according to the "smp_cond_load_acquire() semantics" (726328d92a42b).
> >
> > OK. For exactly which CPU families do you believe that this fails to
> > hold. That is, given the implementations of spin_unlock_wait() and
> > spin_lock() for the various CPU families, which will break and why?
>
> Considering the case of x86, the question amounts to ask whether
> the "exists" clause of the following test can be satisfied:
>
> X86 TEST
> {
> 1:EAX=1;
> }
> P0 | P1 ;
> MOV [X],$1 | LOCK XCHG [s],EAX ;
> MOV EAX,[s] | MOV EBX,[X] ;
> exists
> (0:EAX=0 /\ 1:EBX=0)
>
> The answer is "Yes", as illustrated by the following sequence of
> events:
>
> 1) P0's store to X is placed into the store buffer (of P0);
> 2) P0 loads 0 from s;
> 3) P1 executes LOCK XCHG;
> 4) P1 loads X=0 from memory;
> 5) P0's store to X is flushed from store buffer to memory.
>
> ---
>
> This analysis shows that the x86 implementation needs additional
> synchronization to meet a "spin_lock();spin_unlock()" semantics.
>
> I believe that the same conclusion holds for other architectures
> (e.g., sparc, ia64, arm(32), alpha).
>
> In fact, the only two implementations I know that _seem_ able to
> meet that semantics are arm64's and powerpc's.

And you are quite correct. In fact, Luc Maranget was able to make this
happen on real x86 hardware using real Linux-kernel locking primitives.

So what next?

One option would be to weaken the definition of spin_unlock_wait() so
that it had acquire semantics but not release semantics. Alternatively,
we could keep the full empty-critical-section semantics and add memory
barriers to spin_unlock_wait() implementations, perhaps as shown in the
patch below. I could go either way, though I do have some preference
for the stronger semantics.

Thoughts?

Thanx, Paul

------------------------------------------------------------------------

commit a023454468fefc41c5d3c1054106ca77aa1b962d
Author: Paul E. McKenney <[email protected]>
Date: Fri Jun 23 13:20:27 2017 -0700

locking: Add leading smp_mb() for spin_unlock_wait()

The spin_unlock_wait() primitive is supposed to have the effect of
a spin_lock() immediately followed by a spin_unlock(). However,
the klitmus7 tool says otherwise on x86 when running a kernel module
generated from the following litmus test:

C C-unlock-wait-00
{
spinlock_t r;
}

P0(spinlock_t *r,int *x,int *y)
{
int r0;
WRITE_ONCE(*x,1) ;
spin_unlock_wait(r);
r0 = READ_ONCE(*y);

}

P1(spinlock_t *r,int *x,int *y) {
int r1;
spin_lock(r);
WRITE_ONCE(*y,1);
r1 = READ_ONCE(*x);
spin_unlock(r);
}

exists (0:r0=0 /\ 1:r1=0)

If P0()'s spin_unlock_wait() slots in before P1()'s critical section,
then r1 cannot be zero, and if P0()'s spin_unlock_wait() slots in after
P1()'s critical section, then r0 cannot be zero. Yet real code running
on real hardware observes both r0 and r1 being zero on real hardware,
including x86.

One option is to weaken the semantics of spin_unlock_wait(),
but there is one use case that seems to need the strong semantics
(ata_scsi_cmd_error_handler()), and another that has explicit memory
barriers that could be removed given the stronger semantics proposed
in this commit (nf_conntrack_lock()).

This commit therefore adds memory barriers to enforce the stronger
semantics. Please note that this commit takes an extremely conservative
approach because spin_unlock_wait() is not used heavily or on fastpaths.
It is quite possible that some of the added memory barriers are in
fact unnecessary.

Reported-by: Andrea Parri <[email protected]>
Reported-by: Luc Maranget <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: <[email protected]>

diff --git a/arch/alpha/include/asm/spinlock.h b/arch/alpha/include/asm/spinlock.h
index a40b9fc0c6c3..6ceb56e1e716 100644
--- a/arch/alpha/include/asm/spinlock.h
+++ b/arch/alpha/include/asm/spinlock.h
@@ -18,6 +18,7 @@

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
+ smp_mb();
smp_cond_load_acquire(&lock->lock, !VAL);
}

diff --git a/arch/arc/include/asm/spinlock.h b/arch/arc/include/asm/spinlock.h
index 233d5ffe6ec7..9d0f588caea3 100644
--- a/arch/arc/include/asm/spinlock.h
+++ b/arch/arc/include/asm/spinlock.h
@@ -18,6 +18,7 @@

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
+ smp_mb();
smp_cond_load_acquire(&lock->slock, !VAL);
}

diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index 4bec45442072..0b097ea6760d 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -56,6 +56,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
u16 owner = READ_ONCE(lock->tickets.owner);

+ smp_mb();
for (;;) {
arch_spinlock_t tmp = READ_ONCE(*lock);

diff --git a/arch/blackfin/include/asm/spinlock.h b/arch/blackfin/include/asm/spinlock.h
index c58f4a83ed6f..f4bdc8996d35 100644
--- a/arch/blackfin/include/asm/spinlock.h
+++ b/arch/blackfin/include/asm/spinlock.h
@@ -50,6 +50,7 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
+ smp_mb();
smp_cond_load_acquire(&lock->lock, !VAL);
}

diff --git a/arch/hexagon/include/asm/spinlock.h b/arch/hexagon/include/asm/spinlock.h
index a1c55788c5d6..9f09f900e146 100644
--- a/arch/hexagon/include/asm/spinlock.h
+++ b/arch/hexagon/include/asm/spinlock.h
@@ -181,6 +181,7 @@ static inline unsigned int arch_spin_trylock(arch_spinlock_t *lock)

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
+ smp_mb();
smp_cond_load_acquire(&lock->lock, !VAL);
}

diff --git a/arch/ia64/include/asm/spinlock.h b/arch/ia64/include/asm/spinlock.h
index ca9e76149a4a..4c76d7caf185 100644
--- a/arch/ia64/include/asm/spinlock.h
+++ b/arch/ia64/include/asm/spinlock.h
@@ -145,6 +145,7 @@ static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
+ smp_mb();
__ticket_spin_unlock_wait(lock);
}

diff --git a/arch/m32r/include/asm/spinlock.h b/arch/m32r/include/asm/spinlock.h
index 323c7fc953cd..1f2e97486557 100644
--- a/arch/m32r/include/asm/spinlock.h
+++ b/arch/m32r/include/asm/spinlock.h
@@ -32,6 +32,7 @@

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
+ smp_mb();
smp_cond_load_acquire(&lock->slock, VAL > 0);
}

diff --git a/arch/metag/include/asm/spinlock.h b/arch/metag/include/asm/spinlock.h
index c0c7a22be1ae..53a17023beed 100644
--- a/arch/metag/include/asm/spinlock.h
+++ b/arch/metag/include/asm/spinlock.h
@@ -17,6 +17,7 @@

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
+ smp_mb();
smp_cond_load_acquire(&lock->lock, !VAL);
}

diff --git a/arch/mips/include/asm/spinlock.h b/arch/mips/include/asm/spinlock.h
index a8df44d60607..2c5788223abc 100644
--- a/arch/mips/include/asm/spinlock.h
+++ b/arch/mips/include/asm/spinlock.h
@@ -53,7 +53,7 @@ static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
u16 owner = READ_ONCE(lock->h.serving_now);
- smp_rmb();
+ smp_mb();
for (;;) {
arch_spinlock_t tmp = READ_ONCE(*lock);

diff --git a/arch/mn10300/include/asm/spinlock.h b/arch/mn10300/include/asm/spinlock.h
index 9c7b8f7942d8..517db0908132 100644
--- a/arch/mn10300/include/asm/spinlock.h
+++ b/arch/mn10300/include/asm/spinlock.h
@@ -28,6 +28,7 @@

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
+ smp_mb();
smp_cond_load_acquire(&lock->slock, !VAL);
}

diff --git a/arch/parisc/include/asm/spinlock.h b/arch/parisc/include/asm/spinlock.h
index e32936cd7f10..aecf9e4f4335 100644
--- a/arch/parisc/include/asm/spinlock.h
+++ b/arch/parisc/include/asm/spinlock.h
@@ -18,6 +18,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *x)
{
volatile unsigned int *a = __ldcw_align(x);

+ smp_mb();
smp_cond_load_acquire(a, VAL);
}

diff --git a/arch/s390/include/asm/spinlock.h b/arch/s390/include/asm/spinlock.h
index f7838ecd83c6..5b97e8e9e444 100644
--- a/arch/s390/include/asm/spinlock.h
+++ b/arch/s390/include/asm/spinlock.h
@@ -100,6 +100,7 @@ static inline void arch_spin_unlock(arch_spinlock_t *lp)

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
+ smp_mb();
while (arch_spin_is_locked(lock))
arch_spin_relax(lock);
smp_acquire__after_ctrl_dep();
diff --git a/arch/sh/include/asm/spinlock-cas.h b/arch/sh/include/asm/spinlock-cas.h
index c46e8cc7b515..37af00015330 100644
--- a/arch/sh/include/asm/spinlock-cas.h
+++ b/arch/sh/include/asm/spinlock-cas.h
@@ -31,6 +31,7 @@ static inline unsigned __sl_cas(volatile unsigned *p, unsigned old, unsigned new

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
+ smp_mb();
smp_cond_load_acquire(&lock->lock, VAL > 0);
}

diff --git a/arch/sh/include/asm/spinlock-llsc.h b/arch/sh/include/asm/spinlock-llsc.h
index cec78143fa83..cf3e37e70ae8 100644
--- a/arch/sh/include/asm/spinlock-llsc.h
+++ b/arch/sh/include/asm/spinlock-llsc.h
@@ -23,6 +23,7 @@

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
+ smp_mb();
smp_cond_load_acquire(&lock->lock, VAL > 0);
}

diff --git a/arch/sparc/include/asm/spinlock_32.h b/arch/sparc/include/asm/spinlock_32.h
index 8011e79f59c9..441009963962 100644
--- a/arch/sparc/include/asm/spinlock_32.h
+++ b/arch/sparc/include/asm/spinlock_32.h
@@ -16,6 +16,7 @@

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
+ smp_mb();
smp_cond_load_acquire(&lock->lock, !VAL);
}

diff --git a/arch/sparc/include/asm/spinlock_64.h b/arch/sparc/include/asm/spinlock_64.h
index 07c9f2e9bf57..9be036a4a88b 100644
--- a/arch/sparc/include/asm/spinlock_64.h
+++ b/arch/sparc/include/asm/spinlock_64.h
@@ -28,6 +28,7 @@

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
+ smp_mb();
smp_cond_load_acquire(&lock->lock, !VAL);
}

diff --git a/arch/tile/lib/spinlock_32.c b/arch/tile/lib/spinlock_32.c
index 076c6cc43113..7a9bb8acdb6f 100644
--- a/arch/tile/lib/spinlock_32.c
+++ b/arch/tile/lib/spinlock_32.c
@@ -69,6 +69,7 @@ void arch_spin_unlock_wait(arch_spinlock_t *lock)
int next = READ_ONCE(lock->next_ticket);

/* Return immediately if unlocked. */
+ smp_mb();
if (next == curr)
return;

diff --git a/arch/tile/lib/spinlock_64.c b/arch/tile/lib/spinlock_64.c
index a4b5b2cbce93..2933e02582c9 100644
--- a/arch/tile/lib/spinlock_64.c
+++ b/arch/tile/lib/spinlock_64.c
@@ -69,6 +69,7 @@ void arch_spin_unlock_wait(arch_spinlock_t *lock)
u32 curr = arch_spin_current(val);

/* Return immediately if unlocked. */
+ smp_mb();
if (arch_spin_next(val) == curr)
return;

diff --git a/arch/xtensa/include/asm/spinlock.h b/arch/xtensa/include/asm/spinlock.h
index a36221cf6363..536417abd11b 100644
--- a/arch/xtensa/include/asm/spinlock.h
+++ b/arch/xtensa/include/asm/spinlock.h
@@ -35,6 +35,7 @@

static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
+ smp_mb();
smp_cond_load_acquire(&lock->slock, !VAL);
}


2017-06-27 21:48:29

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Tue, Jun 27, 2017 at 1:58 PM, Paul E. McKenney
<[email protected]> wrote:
>
> So what next?
>
> One option would be to weaken the definition of spin_unlock_wait() so
> that it had acquire semantics but not release semantics. Alternatively,
> we could keep the full empty-critical-section semantics and add memory
> barriers to spin_unlock_wait() implementations, perhaps as shown in the
> patch below. I could go either way, though I do have some preference
> for the stronger semantics.
>
> Thoughts?

I would prefer to just say

- document that spin_unlock_wait() has acquire semantics

- mindlessly add the smp_mb() to all users

- let users then decide if they are ok with just acquire

That's partly because I think we actually have *fewer* users than we
have implementations of spin_unlock_wait(). So adding a few smp_mb()'s
in the users is actually likely the smaller change.

But it's also because then that allows people who *can* say that
acquire is sufficient to just use it. People who use
spin_unlock_wait() tend to have some odd performance reason to do so,
so I think allowing them to use the more light-weight memory ordering
if it works for them is a good idea.

But finally, it's partly because I think "acquire" semantics are
actually the saner ones that we can explain the logic for much more
clearly.

Basically, acquire semantics means that you are guaranteed to see any
changes that were done inside a previously locked region.

Doesn't that sound like sensible semantics?

Then, the argument for "smp_mb()" (before the spin_unlock_wait()) becomes:

- I did a write that will affect any future lock takes

- the smp_mb() now means that that write will be ordered wrt the
acquire that guarantees we've seen all old actions taken by a lock.

Does those kinds of semantics make sense to people?

Linus

2017-06-27 23:38:08

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Tue, Jun 27, 2017 at 02:48:18PM -0700, Linus Torvalds wrote:
> On Tue, Jun 27, 2017 at 1:58 PM, Paul E. McKenney
> <[email protected]> wrote:
> >
> > So what next?
> >
> > One option would be to weaken the definition of spin_unlock_wait() so
> > that it had acquire semantics but not release semantics. Alternatively,
> > we could keep the full empty-critical-section semantics and add memory
> > barriers to spin_unlock_wait() implementations, perhaps as shown in the
> > patch below. I could go either way, though I do have some preference
> > for the stronger semantics.
> >
> > Thoughts?
>
> I would prefer to just say
>
> - document that spin_unlock_wait() has acquire semantics
>
> - mindlessly add the smp_mb() to all users
>
> - let users then decide if they are ok with just acquire
>
> That's partly because I think we actually have *fewer* users than we
> have implementations of spin_unlock_wait(). So adding a few smp_mb()'s
> in the users is actually likely the smaller change.

You are right about that! There are only five invocations of
spin_unlock_wait() in the kernel, with a sixth that has since been
converted to spin_lock() immediately followed by spin_unlock().

> But it's also because then that allows people who *can* say that
> acquire is sufficient to just use it. People who use
> spin_unlock_wait() tend to have some odd performance reason to do so,
> so I think allowing them to use the more light-weight memory ordering
> if it works for them is a good idea.
>
> But finally, it's partly because I think "acquire" semantics are
> actually the saner ones that we can explain the logic for much more
> clearly.
>
> Basically, acquire semantics means that you are guaranteed to see any
> changes that were done inside a previously locked region.
>
> Doesn't that sound like sensible semantics?

It is the semantics that most implementations of spin_unlock_wait()
provide. Of the six invocations, two of them very clearly rely
only on the acquire semantics and two others already have the needed
memory barriers in place. I have queued one patch to add smp_mb()
to the remaining spin_unlock_wait() of the surviving five instances,
and another patch to convert the spin_lock/unlock pair to smp_mb()
followed by spin_unlock_wait().

So, yes, it is a sensible set of semantics. At this point, agreeing
-any- reasonable semantics would be good, as it would allow us to get
locking added to the prototype Linux-kernel memory model. ;-)

> Then, the argument for "smp_mb()" (before the spin_unlock_wait()) becomes:
>
> - I did a write that will affect any future lock takes
>
> - the smp_mb() now means that that write will be ordered wrt the
> acquire that guarantees we've seen all old actions taken by a lock.
>
> Does those kinds of semantics make sense to people?

In case the answer is "yes", the (untested) patch below (combining
three commits) shows the changes that I believe would be required.
A preview is also available as individual commits on branch
spin_unlock_wait.2017.06.27a on -rcu here:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git

As usual, thoughts? ;-)

Thanx, Paul

------------------------------------------------------------------------

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index ef68232b5222..cc01b77a079a 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -704,8 +704,10 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,

/* initialize eh_tries */
ap->eh_tries = ATA_EH_MAX_TRIES;
- } else
+ } else {
+ smp_mb(); /* Add release semantics for spin_unlock_wait(). */
spin_unlock_wait(ap->lock);
+ }

}
EXPORT_SYMBOL(ata_scsi_cmd_error_handler);
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index d9510e8522d4..0c3f54e2a1d1 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -373,21 +373,21 @@ static __always_inline int spin_trylock_irq(spinlock_t *lock)
* spin_unlock_wait - Interpose between successive critical sections
* @lock: the spinlock whose critical sections are to be interposed.
*
- * Semantically this is equivalent to a spin_lock() immediately
- * followed by a spin_unlock(). However, most architectures have
- * more efficient implementations in which the spin_unlock_wait()
- * cannot block concurrent lock acquisition, and in some cases
- * where spin_unlock_wait() does not write to the lock variable.
- * Nevertheless, spin_unlock_wait() can have high overhead, so if
- * you feel the need to use it, please check to see if there is
- * a better way to get your job done.
+ * Semantically this is equivalent to a spin_lock() immediately followed
+ * by a mythical spin_unlock() that has no ordering semantics. However,
+ * most architectures have more efficient implementations in which the
+ * spin_unlock_wait() cannot block concurrent lock acquisition, and in some
+ * cases where spin_unlock_wait() does not write to the lock variable.
+ * Nevertheless, spin_unlock_wait() can have high overhead, so if you
+ * feel the need to use it, please check to see if there is a better way
+ * to get your job done.
*
- * The ordering guarantees provided by spin_unlock_wait() are:
- *
- * 1. All accesses preceding the spin_unlock_wait() happen before
- * any accesses in later critical sections for this same lock.
- * 2. All accesses following the spin_unlock_wait() happen after
- * any accesses in earlier critical sections for this same lock.
+ * The spin_unlock_wait() function guarantees that all accesses following
+ * the spin_unlock_wait() happen after any accesses in earlier critical
+ * sections for this same lock. Please note that it does -not- guarantee
+ * that accesses preceding the spin_unlock_wait() happen before any accesses
+ * in later critical sections for this same lock. If you need this latter
+ * ordering, precede the spin_unlock_wait() with an smp_mb() or similar.
*/
static __always_inline void spin_unlock_wait(spinlock_t *lock)
{
diff --git a/ipc/sem.c b/ipc/sem.c
index 947dc2348271..ef42e55e9dd0 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -307,8 +307,8 @@ static void complexmode_enter(struct sem_array *sma)

for (i = 0; i < sma->sem_nsems; i++) {
sem = sma->sem_base + i;
- spin_lock(&sem->lock);
- spin_unlock(&sem->lock);
+ smp_mb(); /* Add release semantics for spin_unlock_wait(). */
+ spin_unlock_wait(&sem->lock);
}
}


2017-06-28 15:32:02

by Alan Stern

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Tue, 27 Jun 2017, Paul E. McKenney wrote:

> On Tue, Jun 27, 2017 at 02:48:18PM -0700, Linus Torvalds wrote:
> > On Tue, Jun 27, 2017 at 1:58 PM, Paul E. McKenney
> > <[email protected]> wrote:
> > >
> > > So what next?
> > >
> > > One option would be to weaken the definition of spin_unlock_wait() so
> > > that it had acquire semantics but not release semantics. Alternatively,
> > > we could keep the full empty-critical-section semantics and add memory
> > > barriers to spin_unlock_wait() implementations, perhaps as shown in the
> > > patch below. I could go either way, though I do have some preference
> > > for the stronger semantics.
> > >
> > > Thoughts?
> >
> > I would prefer to just say
> >
> > - document that spin_unlock_wait() has acquire semantics
> >
> > - mindlessly add the smp_mb() to all users
> >
> > - let users then decide if they are ok with just acquire
> >
> > That's partly because I think we actually have *fewer* users than we
> > have implementations of spin_unlock_wait(). So adding a few smp_mb()'s
> > in the users is actually likely the smaller change.
>
> You are right about that! There are only five invocations of
> spin_unlock_wait() in the kernel, with a sixth that has since been
> converted to spin_lock() immediately followed by spin_unlock().
>
> > But it's also because then that allows people who *can* say that
> > acquire is sufficient to just use it. People who use
> > spin_unlock_wait() tend to have some odd performance reason to do so,
> > so I think allowing them to use the more light-weight memory ordering
> > if it works for them is a good idea.
> >
> > But finally, it's partly because I think "acquire" semantics are
> > actually the saner ones that we can explain the logic for much more
> > clearly.
> >
> > Basically, acquire semantics means that you are guaranteed to see any
> > changes that were done inside a previously locked region.
> >
> > Doesn't that sound like sensible semantics?
>
> It is the semantics that most implementations of spin_unlock_wait()
> provide. Of the six invocations, two of them very clearly rely
> only on the acquire semantics and two others already have the needed
> memory barriers in place. I have queued one patch to add smp_mb()
> to the remaining spin_unlock_wait() of the surviving five instances,
> and another patch to convert the spin_lock/unlock pair to smp_mb()
> followed by spin_unlock_wait().
>
> So, yes, it is a sensible set of semantics. At this point, agreeing
> -any- reasonable semantics would be good, as it would allow us to get
> locking added to the prototype Linux-kernel memory model. ;-)
>
> > Then, the argument for "smp_mb()" (before the spin_unlock_wait()) becomes:
> >
> > - I did a write that will affect any future lock takes
> >
> > - the smp_mb() now means that that write will be ordered wrt the
> > acquire that guarantees we've seen all old actions taken by a lock.
> >
> > Does those kinds of semantics make sense to people?

The problem is that adding smp_mb() before spin_unlock_wait() does not
provide release semantics, as Andrea has pointed out. ISTM that when
people want full release & acquire semantics, they should just use
"spin_lock(); spin_unlock();".

If there are any places where this would add unacceptable overhead,
maybe those places need some rethinking. For instance, perhaps we
could add a separate primitive that provides only release semantics.
(But would using the new primitive together with spin_unlock_wait
really be significantly better than lock-unlock?)

Alan Stern

2017-06-28 17:03:39

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Wed, Jun 28, 2017 at 11:31:55AM -0400, Alan Stern wrote:
> On Tue, 27 Jun 2017, Paul E. McKenney wrote:
>
> > On Tue, Jun 27, 2017 at 02:48:18PM -0700, Linus Torvalds wrote:
> > > On Tue, Jun 27, 2017 at 1:58 PM, Paul E. McKenney
> > > <[email protected]> wrote:
> > > >
> > > > So what next?
> > > >
> > > > One option would be to weaken the definition of spin_unlock_wait() so
> > > > that it had acquire semantics but not release semantics. Alternatively,
> > > > we could keep the full empty-critical-section semantics and add memory
> > > > barriers to spin_unlock_wait() implementations, perhaps as shown in the
> > > > patch below. I could go either way, though I do have some preference
> > > > for the stronger semantics.
> > > >
> > > > Thoughts?
> > >
> > > I would prefer to just say
> > >
> > > - document that spin_unlock_wait() has acquire semantics
> > >
> > > - mindlessly add the smp_mb() to all users
> > >
> > > - let users then decide if they are ok with just acquire
> > >
> > > That's partly because I think we actually have *fewer* users than we
> > > have implementations of spin_unlock_wait(). So adding a few smp_mb()'s
> > > in the users is actually likely the smaller change.
> >
> > You are right about that! There are only five invocations of
> > spin_unlock_wait() in the kernel, with a sixth that has since been
> > converted to spin_lock() immediately followed by spin_unlock().
> >
> > > But it's also because then that allows people who *can* say that
> > > acquire is sufficient to just use it. People who use
> > > spin_unlock_wait() tend to have some odd performance reason to do so,
> > > so I think allowing them to use the more light-weight memory ordering
> > > if it works for them is a good idea.
> > >
> > > But finally, it's partly because I think "acquire" semantics are
> > > actually the saner ones that we can explain the logic for much more
> > > clearly.
> > >
> > > Basically, acquire semantics means that you are guaranteed to see any
> > > changes that were done inside a previously locked region.
> > >
> > > Doesn't that sound like sensible semantics?
> >
> > It is the semantics that most implementations of spin_unlock_wait()
> > provide. Of the six invocations, two of them very clearly rely
> > only on the acquire semantics and two others already have the needed
> > memory barriers in place. I have queued one patch to add smp_mb()
> > to the remaining spin_unlock_wait() of the surviving five instances,
> > and another patch to convert the spin_lock/unlock pair to smp_mb()
> > followed by spin_unlock_wait().
> >
> > So, yes, it is a sensible set of semantics. At this point, agreeing
> > -any- reasonable semantics would be good, as it would allow us to get
> > locking added to the prototype Linux-kernel memory model. ;-)
> >
> > > Then, the argument for "smp_mb()" (before the spin_unlock_wait()) becomes:
> > >
> > > - I did a write that will affect any future lock takes
> > >
> > > - the smp_mb() now means that that write will be ordered wrt the
> > > acquire that guarantees we've seen all old actions taken by a lock.
> > >
> > > Does those kinds of semantics make sense to people?
>
> The problem is that adding smp_mb() before spin_unlock_wait() does not
> provide release semantics, as Andrea has pointed out. ISTM that when
> people want full release & acquire semantics, they should just use
> "spin_lock(); spin_unlock();".

Well, from what I can see, this thread is certainly demonstrating to my
satisfaction that we really do need a memory model. ;-)

I agree that calling a loops of loads a "release" is at best confusing,
and certainly conflicts with all know nomenclature. So let's forget
"release" or not and try to agree on litmus tests. Of course, as we
both know and as noted long ago on LKML, we cannot -define- semantics
via litmus tests, but we can use litmus tests to -check- semantics.

First, modeling lock acquisition with xchg(), which is fully ordered
(given that locking is still very much under development in our model):

C SUW+or-ow+l-ow-or
{}

P0(int *x, int *y, int *my_lock)
{
r0 = READ_ONCE(*x);
WRITE_ONCE(*y, 1);
smp_mb();
r1 = READ_ONCE(*my_lock);
}

P1(int *x, int *y, int *my_lock) {
r1 = xchg(my_lock, 1);
WRITE_ONCE(*x, 1);
r0 = READ_ONCE(*y);
}

exists (0:r0=1 /\ 0:r1=0 /\ 1:r0=0 /\ 1:r1=0)

This gives "Positive: 0 Negative: 5". But to your point, replacing
"xchg()" with "xchg_acquire()" gets us "Positive: 1 Negative: 7".

But xchg_acquire() would in fact work on x86 because the xchg instruction
does full ordering in any case, right? (I believe that this also applies
to the other TSO architectures, but have not checked.)

For PowerPC (and I believe ARM), the leading smp_mb() suffices, as
illustrated by this litmus test:

PPC spin_unlock_wait_st.litmus
""
{
l=0;
0:r1=1; 0:r3=42; 0:r4=x; 0:r10=0; 0:r12=l;
1:r1=1; 1:r3=42; 1:r4=x; 1:r10=0; 1:r11=0; 1:r12=l;
}
P0 | P1 ;
stw r1,0(r4) | lwarx r11,r10,r12 ;
sync | cmpwi r11,0 ;
lwarx r11,r10,r12 | bne Fail1 ;
stwcx. r11,r10,r12 | stwcx. r1,r10,r12 ;
bne Fail0 | bne Fail1 ;
lwz r3,0(r12) | lwsync ;
Fail0: | lwz r3,0(r4) ;
| Fail1: ;


exists
(0:r3=0 /\ 1:r3=0)

So what am I missing here?

> If there are any places where this would add unacceptable overhead,
> maybe those places need some rethinking. For instance, perhaps we
> could add a separate primitive that provides only release semantics.
> (But would using the new primitive together with spin_unlock_wait
> really be significantly better than lock-unlock?)

At the moment, I have no idea on the relative overheads. ;-)

Thanx, Paul

2017-06-28 20:16:14

by Alan Stern

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Wed, 28 Jun 2017, Paul E. McKenney wrote:

> > The problem is that adding smp_mb() before spin_unlock_wait() does not
> > provide release semantics, as Andrea has pointed out. ISTM that when
> > people want full release & acquire semantics, they should just use
> > "spin_lock(); spin_unlock();".
>
> Well, from what I can see, this thread is certainly demonstrating to my
> satisfaction that we really do need a memory model. ;-)
>
> I agree that calling a loops of loads a "release" is at best confusing,
> and certainly conflicts with all know nomenclature. So let's forget
> "release" or not and try to agree on litmus tests. Of course, as we
> both know and as noted long ago on LKML, we cannot -define- semantics
> via litmus tests, but we can use litmus tests to -check- semantics.
>
> First, modeling lock acquisition with xchg(), which is fully ordered
> (given that locking is still very much under development in our model):
>
> C SUW+or-ow+l-ow-or
> {}
>
> P0(int *x, int *y, int *my_lock)
> {
> r0 = READ_ONCE(*x);
> WRITE_ONCE(*y, 1);
> smp_mb();
> r1 = READ_ONCE(*my_lock);
> }
>
> P1(int *x, int *y, int *my_lock) {
> r1 = xchg(my_lock, 1);
> WRITE_ONCE(*x, 1);
> r0 = READ_ONCE(*y);
> }
>
> exists (0:r0=1 /\ 0:r1=0 /\ 1:r0=0 /\ 1:r1=0)
>
> This gives "Positive: 0 Negative: 5". But to your point, replacing
> "xchg()" with "xchg_acquire()" gets us "Positive: 1 Negative: 7".

Yes, that's what I had in mind.

> But xchg_acquire() would in fact work on x86 because the xchg instruction
> does full ordering in any case, right? (I believe that this also applies
> to the other TSO architectures, but have not checked.)

True.

> For PowerPC (and I believe ARM), the leading smp_mb() suffices, as
> illustrated by this litmus test:
>
> PPC spin_unlock_wait_st.litmus
> ""
> {
> l=0;
> 0:r1=1; 0:r3=42; 0:r4=x; 0:r10=0; 0:r12=l;
> 1:r1=1; 1:r3=42; 1:r4=x; 1:r10=0; 1:r11=0; 1:r12=l;
> }
> P0 | P1 ;
> stw r1,0(r4) | lwarx r11,r10,r12 ;
> sync | cmpwi r11,0 ;
> lwarx r11,r10,r12 | bne Fail1 ;
> stwcx. r11,r10,r12 | stwcx. r1,r10,r12 ;
> bne Fail0 | bne Fail1 ;
> lwz r3,0(r12) | lwsync ;
> Fail0: | lwz r3,0(r4) ;
> | Fail1: ;
>
>
> exists
> (0:r3=0 /\ 1:r3=0)

Yes. Bear in mind that the PowerPC version of arch_spin_unlock_wait
ends with smp_mb. That already makes it a lot stronger than the
smp_cond_load_acquire implementation on other architectures.

> So what am I missing here?

Our memory model is (deliberately) weaker than the guarantees provided
by any of the hardware implementations. So while adding smp_mb in
front of smp_cond_load_acquire may suffice to give the desired
semantics in many cases, it might not suffice for all architectures
(because it doesn't suffice in the model). In fact, we would need to
change the model to make it accept this idiom.

I admit not being able to point to any architectures where the
difference matters. But if you take this approach, you do need to
document that for any future architectures, putting smp_mb in front of
spin_unlock_wait must be guaranteed by the arch-specific code to have
the desired effect.

Also, it would not be obvious to a reader _why_ putting an explicit
smp_mb before spin_unlock_wait would make prior writes visible to later
critical sections, since it's not obvious that spin_unlock_wait needs
to do any writes. Replacing the whole thing with spin_lock +
spin_unlock would be easier to understand and wouldn't need any special
guarantees or explanations.

> > If there are any places where this would add unacceptable overhead,
> > maybe those places need some rethinking. For instance, perhaps we
> > could add a separate primitive that provides only release semantics.
> > (But would using the new primitive together with spin_unlock_wait
> > really be significantly better than lock-unlock?)
>
> At the moment, I have no idea on the relative overheads. ;-)

I don't either. But for architectures where spin_unlock_wait already
does the equivalent of load-locked + store-conditional, it's hard to
see why replacing it with spin_lock + spin_unlock would make it any
slower.

Alan Stern

2017-06-28 23:54:24

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Wed, Jun 28, 2017 at 04:16:03PM -0400, Alan Stern wrote:
> On Wed, 28 Jun 2017, Paul E. McKenney wrote:

[ . . . ]

> Yes. Bear in mind that the PowerPC version of arch_spin_unlock_wait
> ends with smp_mb. That already makes it a lot stronger than the
> smp_cond_load_acquire implementation on other architectures.

Fair enough!

> > So what am I missing here?
>
> Our memory model is (deliberately) weaker than the guarantees provided
> by any of the hardware implementations.

Agreed -- the model cannot say that something is guaranteed unless -all-
the architectures provide that guarantee. If I remember correctly, each
architecture currently running Linux is stronger in some way than at least
one other architecture. So yes, the model has to be strictly weaker than
each of the architectures taken individually, because otherwise the model
would be making a guarantee that at least one architecture does not meet,
which would disqualify it from being a valid Linux-kernel memory model.

Plus, yes, there are a few places where the model is a bit weaker than
it needs to be in order to keep the model's complexity from exploding
beyond all reason.

> So while adding smp_mb in
> front of smp_cond_load_acquire may suffice to give the desired
> semantics in many cases, it might not suffice for all architectures
> (because it doesn't suffice in the model). In fact, we would need to
> change the model to make it accept this idiom.

Understood. And if Murphy is being his usual self, the change wouldn't
make the model any simpler.

> I admit not being able to point to any architectures where the
> difference matters. But if you take this approach, you do need to
> document that for any future architectures, putting smp_mb in front of
> spin_unlock_wait must be guaranteed by the arch-specific code to have
> the desired effect.
>
> Also, it would not be obvious to a reader _why_ putting an explicit
> smp_mb before spin_unlock_wait would make prior writes visible to later
> critical sections, since it's not obvious that spin_unlock_wait needs
> to do any writes.

Agreed, my current comments need help. I will fix them as you suggest.

> Replacing the whole thing with spin_lock +
> spin_unlock would be easier to understand and wouldn't need any special
> guarantees or explanations.

That was my initial choice, so I would look pretty silly arguing all
that much. And I have had much better success getting the lock+unlock
definition across to people, so my experience agrees with yours on the
easy-to-understand part.

> > > If there are any places where this would add unacceptable overhead,
> > > maybe those places need some rethinking. For instance, perhaps we
> > > could add a separate primitive that provides only release semantics.
> > > (But would using the new primitive together with spin_unlock_wait
> > > really be significantly better than lock-unlock?)
> >
> > At the moment, I have no idea on the relative overheads. ;-)
>
> I don't either. But for architectures where spin_unlock_wait already
> does the equivalent of load-locked + store-conditional, it's hard to
> see why replacing it with spin_lock + spin_unlock would make it any
> slower.

Well, the current implementations don't actually take the lock, which
means that they don't delay other lock holders quite so much. Which might
not be a huge difference even at high levels of lock contention, and
would definitely be down in the noise for low levels of lock contention.

Digital packrat that I am, I am still carrying both sets of patches,
so can still go either way.

Linus, are you dead-set against defining spin_unlock_wait() to be
spin_lock + spin_unlock? For example, is the current x86 implementation
of spin_unlock_wait() really a non-negotiable hard requirement? Or
would you be willing to live with the spin_lock + spin_unlock semantics?

Thanx, Paul

2017-06-29 00:05:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Wed, Jun 28, 2017 at 4:54 PM, Paul E. McKenney
<[email protected]> wrote:
>
> Linus, are you dead-set against defining spin_unlock_wait() to be
> spin_lock + spin_unlock? For example, is the current x86 implementation
> of spin_unlock_wait() really a non-negotiable hard requirement? Or
> would you be willing to live with the spin_lock + spin_unlock semantics?

So I think the "same as spin_lock + spin_unlock" semantics are kind of insane.

One of the issues is that the same as "spin_lock + spin_unlock" is
basically now architecture-dependent. Is it really the
architecture-dependent ordering you want to define this as?

So I just think it's a *bad* definition. If somebody wants something
that is exactly equivalent to spin_lock+spin_unlock, then dammit, just
do *THAT*. It's completely pointless to me to define
spin_unlock_wait() in those terms.

And if it's not equivalent to the *architecture* behavior of
spin_lock+spin_unlock, then I think it should be descibed in terms
that aren't about the architecture implementation (so you shouldn't
describe it as "spin_lock+spin_unlock", you should describe it in
terms of memory barrier semantics.

And if we really have to use the spin_lock+spinunlock semantics for
this, then what is the advantage of spin_unlock_wait at all, if it
doesn't fundamentally avoid some locking overhead of just taking the
spinlock in the first place?

And if we can't use a cheaper model, maybe we should just get rid of
it entirely?

Finally: if the memory barrier semantics are exactly the same, and
it's purely about avoiding some nasty contention case, I think the
concept is broken - contention is almost never an actual issue, and if
it is, the problem is much deeper than spin_unlock_wait().

Linus

2017-06-29 00:46:12

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Wed, Jun 28, 2017 at 05:05:46PM -0700, Linus Torvalds wrote:
> On Wed, Jun 28, 2017 at 4:54 PM, Paul E. McKenney
> <[email protected]> wrote:
> >
> > Linus, are you dead-set against defining spin_unlock_wait() to be
> > spin_lock + spin_unlock? For example, is the current x86 implementation
> > of spin_unlock_wait() really a non-negotiable hard requirement? Or
> > would you be willing to live with the spin_lock + spin_unlock semantics?
>
> So I think the "same as spin_lock + spin_unlock" semantics are kind of insane.
>
> One of the issues is that the same as "spin_lock + spin_unlock" is
> basically now architecture-dependent. Is it really the
> architecture-dependent ordering you want to define this as?
>
> So I just think it's a *bad* definition. If somebody wants something
> that is exactly equivalent to spin_lock+spin_unlock, then dammit, just
> do *THAT*. It's completely pointless to me to define
> spin_unlock_wait() in those terms.
>
> And if it's not equivalent to the *architecture* behavior of
> spin_lock+spin_unlock, then I think it should be descibed in terms
> that aren't about the architecture implementation (so you shouldn't
> describe it as "spin_lock+spin_unlock", you should describe it in
> terms of memory barrier semantics.
>
> And if we really have to use the spin_lock+spinunlock semantics for
> this, then what is the advantage of spin_unlock_wait at all, if it
> doesn't fundamentally avoid some locking overhead of just taking the
> spinlock in the first place?
>
> And if we can't use a cheaper model, maybe we should just get rid of
> it entirely?
>
> Finally: if the memory barrier semantics are exactly the same, and
> it's purely about avoiding some nasty contention case, I think the
> concept is broken - contention is almost never an actual issue, and if
> it is, the problem is much deeper than spin_unlock_wait().

All good points!

I must confess that your sentence about getting rid of spin_unlock_wait()
entirely does resonate with me, especially given the repeated bouts of
"but what -exactly- is it -supposed- to do?" over the past 18 months
or so. ;-)

Just for completeness, here is a list of the definitions that have been
put forward, just in case it inspires someone to come up with something
better:

1. spin_unlock_wait() provides only acquire semantics. Code
placed after the spin_unlock_wait() will see the effects of
all previous critical sections, but there is no guarantees for
subsequent critical sections. The x86 implementation provides
this. I -think- that the ARM and PowerPC implementations could
get rid of a memory-barrier instruction and still provide this.

2. As #1 above, but a "smp_mb();spin_unlock_wait();" provides the
additional guarantee that code placed before this construct is
seen by all subsequent critical sections. The x86 implementation
provides this, as do ARM and PowerPC, but it is not clear that all
architectures do. As Alan noted, this is an extremely unnatural
definition for the current memory model.

3. [ Just for completeness, yes, this is off the table! ] The
spin_unlock_wait() has the same semantics as a spin_lock()
followed immediately by a spin_unlock().

4. spin_unlock_wait() is analogous to synchronize_rcu(), where
spin_unlock_wait()'s "read-side critical sections" are the lock's
normal critical sections. This was the first definition I heard
that made any sense to me, but it turns out to be equivalent
to #3. Thus, also off the table.

Does anyone know of any other possible definitions?

Thanx, Paul

2017-06-29 03:15:50

by Boqun Feng

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Wed, Jun 28, 2017 at 05:45:56PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 28, 2017 at 05:05:46PM -0700, Linus Torvalds wrote:
> > On Wed, Jun 28, 2017 at 4:54 PM, Paul E. McKenney
> > <[email protected]> wrote:
> > >
> > > Linus, are you dead-set against defining spin_unlock_wait() to be
> > > spin_lock + spin_unlock? For example, is the current x86 implementation
> > > of spin_unlock_wait() really a non-negotiable hard requirement? Or
> > > would you be willing to live with the spin_lock + spin_unlock semantics?
> >
> > So I think the "same as spin_lock + spin_unlock" semantics are kind of insane.
> >
> > One of the issues is that the same as "spin_lock + spin_unlock" is
> > basically now architecture-dependent. Is it really the
> > architecture-dependent ordering you want to define this as?
> >
> > So I just think it's a *bad* definition. If somebody wants something
> > that is exactly equivalent to spin_lock+spin_unlock, then dammit, just
> > do *THAT*. It's completely pointless to me to define
> > spin_unlock_wait() in those terms.
> >
> > And if it's not equivalent to the *architecture* behavior of
> > spin_lock+spin_unlock, then I think it should be descibed in terms
> > that aren't about the architecture implementation (so you shouldn't
> > describe it as "spin_lock+spin_unlock", you should describe it in
> > terms of memory barrier semantics.
> >
> > And if we really have to use the spin_lock+spinunlock semantics for
> > this, then what is the advantage of spin_unlock_wait at all, if it
> > doesn't fundamentally avoid some locking overhead of just taking the
> > spinlock in the first place?
> >
> > And if we can't use a cheaper model, maybe we should just get rid of
> > it entirely?
> >
> > Finally: if the memory barrier semantics are exactly the same, and
> > it's purely about avoiding some nasty contention case, I think the
> > concept is broken - contention is almost never an actual issue, and if
> > it is, the problem is much deeper than spin_unlock_wait().
>
> All good points!
>
> I must confess that your sentence about getting rid of spin_unlock_wait()
> entirely does resonate with me, especially given the repeated bouts of
> "but what -exactly- is it -supposed- to do?" over the past 18 months
> or so. ;-)
>
> Just for completeness, here is a list of the definitions that have been
> put forward, just in case it inspires someone to come up with something
> better:
>
> 1. spin_unlock_wait() provides only acquire semantics. Code
> placed after the spin_unlock_wait() will see the effects of
> all previous critical sections, but there is no guarantees for
> subsequent critical sections. The x86 implementation provides
> this. I -think- that the ARM and PowerPC implementations could
> get rid of a memory-barrier instruction and still provide this.
>

Yes, except we still need a smp_lwsync() in powerpc's
spin_unlock_wait().

And FWIW, the two smp_mb()s in spin_unlock_wait() on PowerPC exist there
just because when Peter worked on commit 726328d92a42, we decided to let
the fix for spin_unlock_wait() on PowerPC(i.e. commit 6262db7c088bb ) go
into the tree first to avoid some possible conflicts. And.. I forgot to
do the clean-up for an aquire-semantics spin_unlock_wait() later.. ;-)

I could send out the necessary fix once we have a conclusion for the
semantics part.

Regards,
Boqun

> 2. As #1 above, but a "smp_mb();spin_unlock_wait();" provides the
> additional guarantee that code placed before this construct is
> seen by all subsequent critical sections. The x86 implementation
> provides this, as do ARM and PowerPC, but it is not clear that all
> architectures do. As Alan noted, this is an extremely unnatural
> definition for the current memory model.
>
> 3. [ Just for completeness, yes, this is off the table! ] The
> spin_unlock_wait() has the same semantics as a spin_lock()
> followed immediately by a spin_unlock().
>
> 4. spin_unlock_wait() is analogous to synchronize_rcu(), where
> spin_unlock_wait()'s "read-side critical sections" are the lock's
> normal critical sections. This was the first definition I heard
> that made any sense to me, but it turns out to be equivalent
> to #3. Thus, also off the table.
>
> Does anyone know of any other possible definitions?
>
> Thanx, Paul
>


Attachments:
(No filename) (4.23 kB)
signature.asc (488.00 B)
Download all attachments

2017-06-29 11:36:51

by Will Deacon

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

Hey Paul,

On Wed, Jun 28, 2017 at 05:45:56PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 28, 2017 at 05:05:46PM -0700, Linus Torvalds wrote:
> > On Wed, Jun 28, 2017 at 4:54 PM, Paul E. McKenney
> > <[email protected]> wrote:
> > >
> > > Linus, are you dead-set against defining spin_unlock_wait() to be
> > > spin_lock + spin_unlock? For example, is the current x86 implementation
> > > of spin_unlock_wait() really a non-negotiable hard requirement? Or
> > > would you be willing to live with the spin_lock + spin_unlock semantics?
> >
> > So I think the "same as spin_lock + spin_unlock" semantics are kind of insane.
> >
> > One of the issues is that the same as "spin_lock + spin_unlock" is
> > basically now architecture-dependent. Is it really the
> > architecture-dependent ordering you want to define this as?
> >
> > So I just think it's a *bad* definition. If somebody wants something
> > that is exactly equivalent to spin_lock+spin_unlock, then dammit, just
> > do *THAT*. It's completely pointless to me to define
> > spin_unlock_wait() in those terms.
> >
> > And if it's not equivalent to the *architecture* behavior of
> > spin_lock+spin_unlock, then I think it should be descibed in terms
> > that aren't about the architecture implementation (so you shouldn't
> > describe it as "spin_lock+spin_unlock", you should describe it in
> > terms of memory barrier semantics.
> >
> > And if we really have to use the spin_lock+spinunlock semantics for
> > this, then what is the advantage of spin_unlock_wait at all, if it
> > doesn't fundamentally avoid some locking overhead of just taking the
> > spinlock in the first place?
> >
> > And if we can't use a cheaper model, maybe we should just get rid of
> > it entirely?
> >
> > Finally: if the memory barrier semantics are exactly the same, and
> > it's purely about avoiding some nasty contention case, I think the
> > concept is broken - contention is almost never an actual issue, and if
> > it is, the problem is much deeper than spin_unlock_wait().
>
> All good points!
>
> I must confess that your sentence about getting rid of spin_unlock_wait()
> entirely does resonate with me, especially given the repeated bouts of
> "but what -exactly- is it -supposed- to do?" over the past 18 months
> or so. ;-)
>
> Just for completeness, here is a list of the definitions that have been
> put forward, just in case it inspires someone to come up with something
> better:
>
> 1. spin_unlock_wait() provides only acquire semantics. Code
> placed after the spin_unlock_wait() will see the effects of
> all previous critical sections, but there is no guarantees for
> subsequent critical sections. The x86 implementation provides
> this. I -think- that the ARM and PowerPC implementations could
> get rid of a memory-barrier instruction and still provide this.
>
> 2. As #1 above, but a "smp_mb();spin_unlock_wait();" provides the
> additional guarantee that code placed before this construct is
> seen by all subsequent critical sections. The x86 implementation
> provides this, as do ARM and PowerPC, but it is not clear that all
> architectures do. As Alan noted, this is an extremely unnatural
> definition for the current memory model.
>
> 3. [ Just for completeness, yes, this is off the table! ] The
> spin_unlock_wait() has the same semantics as a spin_lock()
> followed immediately by a spin_unlock().
>
> 4. spin_unlock_wait() is analogous to synchronize_rcu(), where
> spin_unlock_wait()'s "read-side critical sections" are the lock's
> normal critical sections. This was the first definition I heard
> that made any sense to me, but it turns out to be equivalent
> to #3. Thus, also off the table.
>
> Does anyone know of any other possible definitions?

My understanding was that spin_unlock_wait() has:

* Acquire semantics
* Is ordered with respect to any prior spin_lock/spin_unlock operations
on the same thread.

so if you want order against other PO-prior accesses, like in Andrea's test,
then you need an explicit smp_mb() (see, for example, "CASE 2" of the big
comment in qspinlock.c).

That's what I used when implementing this for arm64, and I think that's what
Peter's been going by too (at least, I think the current implementations
meet those requirements).

Do we have users in-tree that need more than that?

Will

2017-06-29 11:38:56

by Will Deacon

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

[turns out I've not been on cc for this thread, but Jade pointed me to it
and I see my name came up at some point!]

On Wed, Jun 28, 2017 at 05:05:46PM -0700, Linus Torvalds wrote:
> On Wed, Jun 28, 2017 at 4:54 PM, Paul E. McKenney
> <[email protected]> wrote:
> >
> > Linus, are you dead-set against defining spin_unlock_wait() to be
> > spin_lock + spin_unlock? For example, is the current x86 implementation
> > of spin_unlock_wait() really a non-negotiable hard requirement? Or
> > would you be willing to live with the spin_lock + spin_unlock semantics?
>
> So I think the "same as spin_lock + spin_unlock" semantics are kind of insane.
>
> One of the issues is that the same as "spin_lock + spin_unlock" is
> basically now architecture-dependent. Is it really the
> architecture-dependent ordering you want to define this as?
>
> So I just think it's a *bad* definition. If somebody wants something
> that is exactly equivalent to spin_lock+spin_unlock, then dammit, just
> do *THAT*. It's completely pointless to me to define
> spin_unlock_wait() in those terms.
>
> And if it's not equivalent to the *architecture* behavior of
> spin_lock+spin_unlock, then I think it should be descibed in terms
> that aren't about the architecture implementation (so you shouldn't
> describe it as "spin_lock+spin_unlock", you should describe it in
> terms of memory barrier semantics.
>
> And if we really have to use the spin_lock+spinunlock semantics for
> this, then what is the advantage of spin_unlock_wait at all, if it
> doesn't fundamentally avoid some locking overhead of just taking the
> spinlock in the first place?

Just on this point -- the arm64 code provides the same ordering semantics
as you would get from a lock;unlock sequence, but we can optimise that
when compared to an actual lock;unlock sequence because we don't need to
wait in turn for our ticket. I suspect something similar could be done
if/when we move to qspinlocks.

Whether or not this is actually worth optimising is another question, but
it is worth noting that unlock_wait can be implemented more cheaply than
lock;unlock, whilst providing the same ordering guarantees (if that's
really what we want -- see my reply to Paul).

Simplicity tends to be my preference, so ripping this out would suit me
best ;)

Will

2017-06-29 15:59:35

by Alan Stern

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Thu, 29 Jun 2017, Will Deacon wrote:

> [turns out I've not been on cc for this thread, but Jade pointed me to it
> and I see my name came up at some point!]
>
> On Wed, Jun 28, 2017 at 05:05:46PM -0700, Linus Torvalds wrote:
> > On Wed, Jun 28, 2017 at 4:54 PM, Paul E. McKenney
> > <[email protected]> wrote:
> > >
> > > Linus, are you dead-set against defining spin_unlock_wait() to be
> > > spin_lock + spin_unlock? For example, is the current x86 implementation
> > > of spin_unlock_wait() really a non-negotiable hard requirement? Or
> > > would you be willing to live with the spin_lock + spin_unlock semantics?
> >
> > So I think the "same as spin_lock + spin_unlock" semantics are kind of insane.
> >
> > One of the issues is that the same as "spin_lock + spin_unlock" is
> > basically now architecture-dependent. Is it really the
> > architecture-dependent ordering you want to define this as?
> >
> > So I just think it's a *bad* definition. If somebody wants something
> > that is exactly equivalent to spin_lock+spin_unlock, then dammit, just
> > do *THAT*. It's completely pointless to me to define
> > spin_unlock_wait() in those terms.
> >
> > And if it's not equivalent to the *architecture* behavior of
> > spin_lock+spin_unlock, then I think it should be descibed in terms
> > that aren't about the architecture implementation (so you shouldn't
> > describe it as "spin_lock+spin_unlock", you should describe it in
> > terms of memory barrier semantics.
> >
> > And if we really have to use the spin_lock+spinunlock semantics for
> > this, then what is the advantage of spin_unlock_wait at all, if it
> > doesn't fundamentally avoid some locking overhead of just taking the
> > spinlock in the first place?
>
> Just on this point -- the arm64 code provides the same ordering semantics
> as you would get from a lock;unlock sequence, but we can optimise that
> when compared to an actual lock;unlock sequence because we don't need to
> wait in turn for our ticket. I suspect something similar could be done
> if/when we move to qspinlocks.
>
> Whether or not this is actually worth optimising is another question, but
> it is worth noting that unlock_wait can be implemented more cheaply than
> lock;unlock, whilst providing the same ordering guarantees (if that's
> really what we want -- see my reply to Paul).
>
> Simplicity tends to be my preference, so ripping this out would suit me
> best ;)

It would be best to know:

(1). How spin_unlock_wait() is currently being used.

(2). What it was originally intended for.

Paul has done some research into (1). He can correct me if I get this
wrong... Only a few (i.e., around one or two) of the usages don't seem
to require the full spin_lock+spin_unlock semantics. I go along with
Linus; the places which really do want it to behave like
spin_lock+spin_unlock should simply use spin_lock+spin_unlock. There
hasn't been any indication so far that the possible efficiency
improvement Will mentions is at all important.

According to Paul, most of the other places don't need anything more
than the acquire guarantee (any changes made in earlier critical
sections will be visible to the code following spin_unlock_wait). In
which case, the semantics of spin_unlock_wait could be redefined in
this simpler form.

Or we could literally replace all the existing definitions with
spin_lock+spin_unlock. Would that be so terrible?

As regards (2), I did a little digging. spin_unlock_wait was
introduced in the 2.1.36 kernel, in mid-April 1997. I wasn't able to
find a specific patch for it in the LKML archives. At the time it
was used in only one place in the entire kernel (in kernel/exit.c):

void release(struct task_struct * p)
{
int i;

if (!p)
return;
if (p == current) {
printk("task releasing itself\n");
return;
}
for (i=1 ; i<NR_TASKS ; i++)
if (task[i] == p) {
#ifdef __SMP__
/* FIXME! Cheesy, but kills the window... -DaveM */
while(p->processor != NO_PROC_ID)
barrier();
spin_unlock_wait(&scheduler_lock);
#endif
nr_tasks--;
task[i] = NULL;
REMOVE_LINKS(p);
release_thread(p);
if (STACK_MAGIC != *(unsigned long *)p->kernel_stack_page)
printk(KERN_ALERT "release: %s kernel stack corruption. Aiee\n", p->comm);
free_kernel_stack(p->kernel_stack_page);
current->cmin_flt += p->min_flt + p->cmin_flt;
current->cmaj_flt += p->maj_flt + p->cmaj_flt;
current->cnswap += p->nswap + p->cnswap;
free_task_struct(p);
return;
}
panic("trying to release non-existent task");
}

I'm not entirely clear on the point of this call. It looks like it
wanted to wait until p was guaranteed not to be running on any
processor ever again. (I don't see why it couldn't have just acquired
the scheduler_lock -- was release() a particularly hot path?)

Although it doesn't matter now, this would mean that the original
semantics of spin_unlock_wait were different from what we are
discussing. It apparently was meant to provide the release guarantee:
any future critical sections would see the values that were visible
before the call. Ironic.

Alan Stern

2017-06-29 18:11:44

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Thu, Jun 29, 2017 at 11:59:27AM -0400, Alan Stern wrote:
> On Thu, 29 Jun 2017, Will Deacon wrote:
>
> > [turns out I've not been on cc for this thread, but Jade pointed me to it
> > and I see my name came up at some point!]
> >
> > On Wed, Jun 28, 2017 at 05:05:46PM -0700, Linus Torvalds wrote:
> > > On Wed, Jun 28, 2017 at 4:54 PM, Paul E. McKenney
> > > <[email protected]> wrote:
> > > >
> > > > Linus, are you dead-set against defining spin_unlock_wait() to be
> > > > spin_lock + spin_unlock? For example, is the current x86 implementation
> > > > of spin_unlock_wait() really a non-negotiable hard requirement? Or
> > > > would you be willing to live with the spin_lock + spin_unlock semantics?
> > >
> > > So I think the "same as spin_lock + spin_unlock" semantics are kind of insane.
> > >
> > > One of the issues is that the same as "spin_lock + spin_unlock" is
> > > basically now architecture-dependent. Is it really the
> > > architecture-dependent ordering you want to define this as?
> > >
> > > So I just think it's a *bad* definition. If somebody wants something
> > > that is exactly equivalent to spin_lock+spin_unlock, then dammit, just
> > > do *THAT*. It's completely pointless to me to define
> > > spin_unlock_wait() in those terms.
> > >
> > > And if it's not equivalent to the *architecture* behavior of
> > > spin_lock+spin_unlock, then I think it should be descibed in terms
> > > that aren't about the architecture implementation (so you shouldn't
> > > describe it as "spin_lock+spin_unlock", you should describe it in
> > > terms of memory barrier semantics.
> > >
> > > And if we really have to use the spin_lock+spinunlock semantics for
> > > this, then what is the advantage of spin_unlock_wait at all, if it
> > > doesn't fundamentally avoid some locking overhead of just taking the
> > > spinlock in the first place?
> >
> > Just on this point -- the arm64 code provides the same ordering semantics
> > as you would get from a lock;unlock sequence, but we can optimise that
> > when compared to an actual lock;unlock sequence because we don't need to
> > wait in turn for our ticket. I suspect something similar could be done
> > if/when we move to qspinlocks.
> >
> > Whether or not this is actually worth optimising is another question, but
> > it is worth noting that unlock_wait can be implemented more cheaply than
> > lock;unlock, whilst providing the same ordering guarantees (if that's
> > really what we want -- see my reply to Paul).
> >
> > Simplicity tends to be my preference, so ripping this out would suit me
> > best ;)
>
> It would be best to know:
>
> (1). How spin_unlock_wait() is currently being used.
>
> (2). What it was originally intended for.
>
> Paul has done some research into (1). He can correct me if I get this
> wrong... Only a few (i.e., around one or two) of the usages don't seem
> to require the full spin_lock+spin_unlock semantics. I go along with
> Linus; the places which really do want it to behave like
> spin_lock+spin_unlock should simply use spin_lock+spin_unlock. There
> hasn't been any indication so far that the possible efficiency
> improvement Will mentions is at all important.
>
> According to Paul, most of the other places don't need anything more
> than the acquire guarantee (any changes made in earlier critical
> sections will be visible to the code following spin_unlock_wait). In
> which case, the semantics of spin_unlock_wait could be redefined in
> this simpler form.
>
> Or we could literally replace all the existing definitions with
> spin_lock+spin_unlock. Would that be so terrible?

And here they are...

spin_unlock_wait():

o drivers/ata/libata-eh.c ata_scsi_cmd_error_handler()
spin_unlock_wait(ap->lock) in else-clause where then-clause has
a full critical section for this same lock. This use case could
potentially require both acquire and release semantics. (I am
following up with the developers/maintainers, suggesting that
they convert to spin_lock+spin_unlock if they need release
semantics.)

This is error-handling code, which should be rare, so
spin_lock+spin_unlock should work fine here. Probably shouldn't
have bugged the maintainer, but email already sent. :-/

o ipc/sem.c exit_sem()
This use case appears to need to wait only on prior critical
sections, as the only way we get here is if the entry has already
been removed from the list. An acquire-only spin_unlock_wait()
works here. However, this is sem-exit code, which is not a
fastpath, and the race should be rare, so spin_lock+spin_unlock
should work fine here.

o kernel/sched/completion.c completion_done()
This use case appears to need to wait only on prior critical
sections, as the only way we get past the "if" is when the lock is
held by complete(), and you are only supposed to invoke complete()
once on a given completion. An acquire-only spin_unlock_wait()
works here, but the race should be rare, so spin_lock+spin_unlock
should also work fine here.

o net/netfilter/nf_conntrack_core.c nf_conntrack_lock()
This instance of spin_unlock_wait() interacts with
nf_conntrack_all_lock()'s instance of spin_unlock_wait().
Although nf_conntrack_all_lock() has an smp_mb(), which I
believe provides release semantics given current implementations,
nf_conntrack_lock() just has smp_rmb().

I believe that the smp_rmb() needs to be smp_mb(). Am I missing
something here that makes the current code safe on x86?

I believe that this code could use spin_lock+spin_unlock without
significant performance penalties -- I do not believe that
nf_conntrack_locks_all_lock gets significant contention.

raw_spin_unlock_wait() (Courtesy of Andrea Parri with added commentary):

o kernel/exit.c do_exit()
Seems to rely on both acquire and release semantics. The
raw_spin_unlock_wait() primitive is preceded by a smp_mb().
But this is task exit doing spin_unlock_wait() on the task's
lock, so spin_lock+spin_unlock should work fine here.

o kernel/sched/core.c do_task_dead()
Seems to rely on the acquire semantics only. The
raw_spin_unlock_wait() primitive is preceded by an inexplicable
smp_mb(). Again, this is task exit doing spin_unlock_wait() on
the task's lock, so spin_lock+spin_unlock should work fine here.

o kernel/task_work.c task_work_run()
Seems to rely on the acquire semantics only. This is to handle
a race with task_work_cancel(), which appears to be quite rare.
So the spin_lock+spin_unlock should work fine here.

spin_lock()/spin_unlock():

o ipc/sem.c complexmode_enter()
This used to be spin_unlock_wait(), but was changed to a
spin_lock()/spin_unlock() pair by 27d7be1801a4 ("ipc/sem.c:
avoid using spin_unlock_wait()").

Looks to me like we really can drop spin_unlock_wait() in favor of
momentarily acquiring the lock. There are so few use cases that I don't
see a problem open-coding this. I will put together yet another patch
series for my spin_unlock_wait() collection of patch serieses. ;-)

> As regards (2), I did a little digging. spin_unlock_wait was
> introduced in the 2.1.36 kernel, in mid-April 1997. I wasn't able to
> find a specific patch for it in the LKML archives. At the time it
> was used in only one place in the entire kernel (in kernel/exit.c):
>
> void release(struct task_struct * p)
> {
> int i;
>
> if (!p)
> return;
> if (p == current) {
> printk("task releasing itself\n");
> return;
> }
> for (i=1 ; i<NR_TASKS ; i++)
> if (task[i] == p) {
> #ifdef __SMP__
> /* FIXME! Cheesy, but kills the window... -DaveM */
> while(p->processor != NO_PROC_ID)
> barrier();
> spin_unlock_wait(&scheduler_lock);
> #endif
> nr_tasks--;
> task[i] = NULL;
> REMOVE_LINKS(p);
> release_thread(p);
> if (STACK_MAGIC != *(unsigned long *)p->kernel_stack_page)
> printk(KERN_ALERT "release: %s kernel stack corruption. Aiee\n", p->comm);
> free_kernel_stack(p->kernel_stack_page);
> current->cmin_flt += p->min_flt + p->cmin_flt;
> current->cmaj_flt += p->maj_flt + p->cmaj_flt;
> current->cnswap += p->nswap + p->cnswap;
> free_task_struct(p);
> return;
> }
> panic("trying to release non-existent task");
> }
>
> I'm not entirely clear on the point of this call. It looks like it
> wanted to wait until p was guaranteed not to be running on any
> processor ever again. (I don't see why it couldn't have just acquired
> the scheduler_lock -- was release() a particularly hot path?)
>
> Although it doesn't matter now, this would mean that the original
> semantics of spin_unlock_wait were different from what we are
> discussing. It apparently was meant to provide the release guarantee:
> any future critical sections would see the values that were visible
> before the call. Ironic.

Cute!!! ;-)

Thanx, Paul

2017-06-29 18:47:13

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Thu, Jun 29, 2017 at 12:38:48PM +0100, Will Deacon wrote:
> [turns out I've not been on cc for this thread, but Jade pointed me to it
> and I see my name came up at some point!]

My bad for not having you Cc: on the original patch, apologies!

> On Wed, Jun 28, 2017 at 05:05:46PM -0700, Linus Torvalds wrote:
> > On Wed, Jun 28, 2017 at 4:54 PM, Paul E. McKenney
> > <[email protected]> wrote:
> > >
> > > Linus, are you dead-set against defining spin_unlock_wait() to be
> > > spin_lock + spin_unlock? For example, is the current x86 implementation
> > > of spin_unlock_wait() really a non-negotiable hard requirement? Or
> > > would you be willing to live with the spin_lock + spin_unlock semantics?
> >
> > So I think the "same as spin_lock + spin_unlock" semantics are kind of insane.
> >
> > One of the issues is that the same as "spin_lock + spin_unlock" is
> > basically now architecture-dependent. Is it really the
> > architecture-dependent ordering you want to define this as?
> >
> > So I just think it's a *bad* definition. If somebody wants something
> > that is exactly equivalent to spin_lock+spin_unlock, then dammit, just
> > do *THAT*. It's completely pointless to me to define
> > spin_unlock_wait() in those terms.
> >
> > And if it's not equivalent to the *architecture* behavior of
> > spin_lock+spin_unlock, then I think it should be descibed in terms
> > that aren't about the architecture implementation (so you shouldn't
> > describe it as "spin_lock+spin_unlock", you should describe it in
> > terms of memory barrier semantics.
> >
> > And if we really have to use the spin_lock+spinunlock semantics for
> > this, then what is the advantage of spin_unlock_wait at all, if it
> > doesn't fundamentally avoid some locking overhead of just taking the
> > spinlock in the first place?
>
> Just on this point -- the arm64 code provides the same ordering semantics
> as you would get from a lock;unlock sequence, but we can optimise that
> when compared to an actual lock;unlock sequence because we don't need to
> wait in turn for our ticket. I suspect something similar could be done
> if/when we move to qspinlocks.
>
> Whether or not this is actually worth optimising is another question, but
> it is worth noting that unlock_wait can be implemented more cheaply than
> lock;unlock, whilst providing the same ordering guarantees (if that's
> really what we want -- see my reply to Paul).
>
> Simplicity tends to be my preference, so ripping this out would suit me
> best ;)

Creating the series to do just that, with you on Cc this time!

Thanx, Paul

2017-06-29 18:47:50

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Thu, Jun 29, 2017 at 11:17:26AM +0800, Boqun Feng wrote:
> On Wed, Jun 28, 2017 at 05:45:56PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 28, 2017 at 05:05:46PM -0700, Linus Torvalds wrote:
> > > On Wed, Jun 28, 2017 at 4:54 PM, Paul E. McKenney
> > > <[email protected]> wrote:
> > > >
> > > > Linus, are you dead-set against defining spin_unlock_wait() to be
> > > > spin_lock + spin_unlock? For example, is the current x86 implementation
> > > > of spin_unlock_wait() really a non-negotiable hard requirement? Or
> > > > would you be willing to live with the spin_lock + spin_unlock semantics?
> > >
> > > So I think the "same as spin_lock + spin_unlock" semantics are kind of insane.
> > >
> > > One of the issues is that the same as "spin_lock + spin_unlock" is
> > > basically now architecture-dependent. Is it really the
> > > architecture-dependent ordering you want to define this as?
> > >
> > > So I just think it's a *bad* definition. If somebody wants something
> > > that is exactly equivalent to spin_lock+spin_unlock, then dammit, just
> > > do *THAT*. It's completely pointless to me to define
> > > spin_unlock_wait() in those terms.
> > >
> > > And if it's not equivalent to the *architecture* behavior of
> > > spin_lock+spin_unlock, then I think it should be descibed in terms
> > > that aren't about the architecture implementation (so you shouldn't
> > > describe it as "spin_lock+spin_unlock", you should describe it in
> > > terms of memory barrier semantics.
> > >
> > > And if we really have to use the spin_lock+spinunlock semantics for
> > > this, then what is the advantage of spin_unlock_wait at all, if it
> > > doesn't fundamentally avoid some locking overhead of just taking the
> > > spinlock in the first place?
> > >
> > > And if we can't use a cheaper model, maybe we should just get rid of
> > > it entirely?
> > >
> > > Finally: if the memory barrier semantics are exactly the same, and
> > > it's purely about avoiding some nasty contention case, I think the
> > > concept is broken - contention is almost never an actual issue, and if
> > > it is, the problem is much deeper than spin_unlock_wait().
> >
> > All good points!
> >
> > I must confess that your sentence about getting rid of spin_unlock_wait()
> > entirely does resonate with me, especially given the repeated bouts of
> > "but what -exactly- is it -supposed- to do?" over the past 18 months
> > or so. ;-)
> >
> > Just for completeness, here is a list of the definitions that have been
> > put forward, just in case it inspires someone to come up with something
> > better:
> >
> > 1. spin_unlock_wait() provides only acquire semantics. Code
> > placed after the spin_unlock_wait() will see the effects of
> > all previous critical sections, but there is no guarantees for
> > subsequent critical sections. The x86 implementation provides
> > this. I -think- that the ARM and PowerPC implementations could
> > get rid of a memory-barrier instruction and still provide this.
> >
>
> Yes, except we still need a smp_lwsync() in powerpc's
> spin_unlock_wait().
>
> And FWIW, the two smp_mb()s in spin_unlock_wait() on PowerPC exist there
> just because when Peter worked on commit 726328d92a42, we decided to let
> the fix for spin_unlock_wait() on PowerPC(i.e. commit 6262db7c088bb ) go
> into the tree first to avoid some possible conflicts. And.. I forgot to
> do the clean-up for an aquire-semantics spin_unlock_wait() later.. ;-)
>
> I could send out the necessary fix once we have a conclusion for the
> semantics part.

If we end up still having spin_unlock_wait(), I will be happy to take
you up on that.

Thanx, Paul

> Regards,
> Boqun
>
> > 2. As #1 above, but a "smp_mb();spin_unlock_wait();" provides the
> > additional guarantee that code placed before this construct is
> > seen by all subsequent critical sections. The x86 implementation
> > provides this, as do ARM and PowerPC, but it is not clear that all
> > architectures do. As Alan noted, this is an extremely unnatural
> > definition for the current memory model.
> >
> > 3. [ Just for completeness, yes, this is off the table! ] The
> > spin_unlock_wait() has the same semantics as a spin_lock()
> > followed immediately by a spin_unlock().
> >
> > 4. spin_unlock_wait() is analogous to synchronize_rcu(), where
> > spin_unlock_wait()'s "read-side critical sections" are the lock's
> > normal critical sections. This was the first definition I heard
> > that made any sense to me, but it turns out to be equivalent
> > to #3. Thus, also off the table.
> >
> > Does anyone know of any other possible definitions?
> >
> > Thanx, Paul
> >


2017-06-30 02:49:42

by Boqun Feng

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Thu, Jun 29, 2017 at 11:11:26AM -0700, Paul E. McKenney wrote:
> On Thu, Jun 29, 2017 at 11:59:27AM -0400, Alan Stern wrote:
> > On Thu, 29 Jun 2017, Will Deacon wrote:
> >
> > > [turns out I've not been on cc for this thread, but Jade pointed me to it
> > > and I see my name came up at some point!]
> > >
> > > On Wed, Jun 28, 2017 at 05:05:46PM -0700, Linus Torvalds wrote:
> > > > On Wed, Jun 28, 2017 at 4:54 PM, Paul E. McKenney
> > > > <[email protected]> wrote:
> > > > >
> > > > > Linus, are you dead-set against defining spin_unlock_wait() to be
> > > > > spin_lock + spin_unlock? For example, is the current x86 implementation
> > > > > of spin_unlock_wait() really a non-negotiable hard requirement? Or
> > > > > would you be willing to live with the spin_lock + spin_unlock semantics?
> > > >
> > > > So I think the "same as spin_lock + spin_unlock" semantics are kind of insane.
> > > >
> > > > One of the issues is that the same as "spin_lock + spin_unlock" is
> > > > basically now architecture-dependent. Is it really the
> > > > architecture-dependent ordering you want to define this as?
> > > >
> > > > So I just think it's a *bad* definition. If somebody wants something
> > > > that is exactly equivalent to spin_lock+spin_unlock, then dammit, just
> > > > do *THAT*. It's completely pointless to me to define
> > > > spin_unlock_wait() in those terms.
> > > >
> > > > And if it's not equivalent to the *architecture* behavior of
> > > > spin_lock+spin_unlock, then I think it should be descibed in terms
> > > > that aren't about the architecture implementation (so you shouldn't
> > > > describe it as "spin_lock+spin_unlock", you should describe it in
> > > > terms of memory barrier semantics.
> > > >
> > > > And if we really have to use the spin_lock+spinunlock semantics for
> > > > this, then what is the advantage of spin_unlock_wait at all, if it
> > > > doesn't fundamentally avoid some locking overhead of just taking the
> > > > spinlock in the first place?
> > >
> > > Just on this point -- the arm64 code provides the same ordering semantics
> > > as you would get from a lock;unlock sequence, but we can optimise that
> > > when compared to an actual lock;unlock sequence because we don't need to
> > > wait in turn for our ticket. I suspect something similar could be done
> > > if/when we move to qspinlocks.
> > >
> > > Whether or not this is actually worth optimising is another question, but
> > > it is worth noting that unlock_wait can be implemented more cheaply than
> > > lock;unlock, whilst providing the same ordering guarantees (if that's
> > > really what we want -- see my reply to Paul).
> > >
> > > Simplicity tends to be my preference, so ripping this out would suit me
> > > best ;)
> >
> > It would be best to know:
> >
> > (1). How spin_unlock_wait() is currently being used.
> >
> > (2). What it was originally intended for.
> >
> > Paul has done some research into (1). He can correct me if I get this
> > wrong... Only a few (i.e., around one or two) of the usages don't seem
> > to require the full spin_lock+spin_unlock semantics. I go along with
> > Linus; the places which really do want it to behave like
> > spin_lock+spin_unlock should simply use spin_lock+spin_unlock. There
> > hasn't been any indication so far that the possible efficiency
> > improvement Will mentions is at all important.
> >
> > According to Paul, most of the other places don't need anything more
> > than the acquire guarantee (any changes made in earlier critical
> > sections will be visible to the code following spin_unlock_wait). In
> > which case, the semantics of spin_unlock_wait could be redefined in
> > this simpler form.
> >
> > Or we could literally replace all the existing definitions with
> > spin_lock+spin_unlock. Would that be so terrible?
>
> And here they are...
>
> spin_unlock_wait():
>
> o drivers/ata/libata-eh.c ata_scsi_cmd_error_handler()
> spin_unlock_wait(ap->lock) in else-clause where then-clause has
> a full critical section for this same lock. This use case could
> potentially require both acquire and release semantics. (I am
> following up with the developers/maintainers, suggesting that
> they convert to spin_lock+spin_unlock if they need release
> semantics.)
>
> This is error-handling code, which should be rare, so
> spin_lock+spin_unlock should work fine here. Probably shouldn't
> have bugged the maintainer, but email already sent. :-/
>
> o ipc/sem.c exit_sem()
> This use case appears to need to wait only on prior critical
> sections, as the only way we get here is if the entry has already
> been removed from the list. An acquire-only spin_unlock_wait()
> works here. However, this is sem-exit code, which is not a
> fastpath, and the race should be rare, so spin_lock+spin_unlock
> should work fine here.
>
> o kernel/sched/completion.c completion_done()
> This use case appears to need to wait only on prior critical
> sections, as the only way we get past the "if" is when the lock is
> held by complete(), and you are only supposed to invoke complete()
> once on a given completion. An acquire-only spin_unlock_wait()
> works here, but the race should be rare, so spin_lock+spin_unlock
> should also work fine here.
>
> o net/netfilter/nf_conntrack_core.c nf_conntrack_lock()
> This instance of spin_unlock_wait() interacts with
> nf_conntrack_all_lock()'s instance of spin_unlock_wait().
> Although nf_conntrack_all_lock() has an smp_mb(), which I
> believe provides release semantics given current implementations,
> nf_conntrack_lock() just has smp_rmb().
>
> I believe that the smp_rmb() needs to be smp_mb(). Am I missing
> something here that makes the current code safe on x86?
>

actually i think the smp_rmb() or even along with the spin_unlock_wait()
in nf_conntrack_lock() is not needed, we could
implementnf_conntrack_lock() as:


void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
{
spin_lock(lock);
while (unlikely(smp_load_acquire(nf_conntrack_locks_all))) {
spin_unlock(lock);
cpu_relaxed();
spin_lock(lock);
}
}

because in nf_conntrack_all_unlock(), we have:

smp_store_release(&nf_conntrack_locks_all, false);
spin_unlock(&nf_conntrack_locks_all_lock);

so if we exit the loop, which means we observe nf_conntrack_locks_all
being false, we actually hold the per bucket lock and observe everything
before the smp_store_release(), which is the same as everything in the
critical section of nf_conntrack_locks_all_lock. Otherwise, we observe
the nf_conntrack_locks_all being true, which means a global lock
critical section may be on its way, we simply drop the per bucket lock
and test whether the global lock is finished again some time later.

So I think spin_unlock_wait() in the nf_conntrack_lock() just requires
acquire semantics, at least.

Maybe I miss someting?

> I believe that this code could use spin_lock+spin_unlock without
> significant performance penalties -- I do not believe that
> nf_conntrack_locks_all_lock gets significant contention.
>
> raw_spin_unlock_wait() (Courtesy of Andrea Parri with added commentary):
>
> o kernel/exit.c do_exit()
> Seems to rely on both acquire and release semantics. The
> raw_spin_unlock_wait() primitive is preceded by a smp_mb().
> But this is task exit doing spin_unlock_wait() on the task's
> lock, so spin_lock+spin_unlock should work fine here.
>
> o kernel/sched/core.c do_task_dead()
> Seems to rely on the acquire semantics only. The
> raw_spin_unlock_wait() primitive is preceded by an inexplicable
> smp_mb(). Again, this is task exit doing spin_unlock_wait() on
> the task's lock, so spin_lock+spin_unlock should work fine here.
>
> o kernel/task_work.c task_work_run()
> Seems to rely on the acquire semantics only. This is to handle

I think this one needs the stronger semantics, the smp_mb() is just
hidden in the cmpxchg() before the raw_spin_unlock_wait() ;-)

cmpxchg() sets a special value to indicate the task_work has been taken,
and raw_spin_unlock_wait() must wait until the next critical section of
->pi_lock(in task_work_cancel()) could observe this, otherwise we may
cancel a task_work while executing it.

Regards,
Boqun
> a race with task_work_cancel(), which appears to be quite rare.
> So the spin_lock+spin_unlock should work fine here.
>
> spin_lock()/spin_unlock():
>
> o ipc/sem.c complexmode_enter()
> This used to be spin_unlock_wait(), but was changed to a
> spin_lock()/spin_unlock() pair by 27d7be1801a4 ("ipc/sem.c:
> avoid using spin_unlock_wait()").
>
> Looks to me like we really can drop spin_unlock_wait() in favor of
> momentarily acquiring the lock. There are so few use cases that I don't
> see a problem open-coding this. I will put together yet another patch
> series for my spin_unlock_wait() collection of patch serieses. ;-)
>
> > As regards (2), I did a little digging. spin_unlock_wait was
> > introduced in the 2.1.36 kernel, in mid-April 1997. I wasn't able to
> > find a specific patch for it in the LKML archives. At the time it
> > was used in only one place in the entire kernel (in kernel/exit.c):
> >
> > void release(struct task_struct * p)
> > {
> > int i;
> >
> > if (!p)
> > return;
> > if (p == current) {
> > printk("task releasing itself\n");
> > return;
> > }
> > for (i=1 ; i<NR_TASKS ; i++)
> > if (task[i] == p) {
> > #ifdef __SMP__
> > /* FIXME! Cheesy, but kills the window... -DaveM */
> > while(p->processor != NO_PROC_ID)
> > barrier();
> > spin_unlock_wait(&scheduler_lock);
> > #endif
> > nr_tasks--;
> > task[i] = NULL;
> > REMOVE_LINKS(p);
> > release_thread(p);
> > if (STACK_MAGIC != *(unsigned long *)p->kernel_stack_page)
> > printk(KERN_ALERT "release: %s kernel stack corruption. Aiee\n", p->comm);
> > free_kernel_stack(p->kernel_stack_page);
> > current->cmin_flt += p->min_flt + p->cmin_flt;
> > current->cmaj_flt += p->maj_flt + p->cmaj_flt;
> > current->cnswap += p->nswap + p->cnswap;
> > free_task_struct(p);
> > return;
> > }
> > panic("trying to release non-existent task");
> > }
> >
> > I'm not entirely clear on the point of this call. It looks like it
> > wanted to wait until p was guaranteed not to be running on any
> > processor ever again. (I don't see why it couldn't have just acquired
> > the scheduler_lock -- was release() a particularly hot path?)
> >
> > Although it doesn't matter now, this would mean that the original
> > semantics of spin_unlock_wait were different from what we are
> > discussing. It apparently was meant to provide the release guarantee:
> > any future critical sections would see the values that were visible
> > before the call. Ironic.
>
> Cute!!! ;-)
>
> Thanx, Paul
>


Attachments:
(No filename) (10.57 kB)
signature.asc (488.00 B)
Download all attachments

2017-06-30 04:02:53

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Fri, Jun 30, 2017 at 10:51:26AM +0800, Boqun Feng wrote:
> On Thu, Jun 29, 2017 at 11:11:26AM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 29, 2017 at 11:59:27AM -0400, Alan Stern wrote:
> > > On Thu, 29 Jun 2017, Will Deacon wrote:
> > >
> > > > [turns out I've not been on cc for this thread, but Jade pointed me to it
> > > > and I see my name came up at some point!]
> > > >
> > > > On Wed, Jun 28, 2017 at 05:05:46PM -0700, Linus Torvalds wrote:
> > > > > On Wed, Jun 28, 2017 at 4:54 PM, Paul E. McKenney
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Linus, are you dead-set against defining spin_unlock_wait() to be
> > > > > > spin_lock + spin_unlock? For example, is the current x86 implementation
> > > > > > of spin_unlock_wait() really a non-negotiable hard requirement? Or
> > > > > > would you be willing to live with the spin_lock + spin_unlock semantics?
> > > > >
> > > > > So I think the "same as spin_lock + spin_unlock" semantics are kind of insane.
> > > > >
> > > > > One of the issues is that the same as "spin_lock + spin_unlock" is
> > > > > basically now architecture-dependent. Is it really the
> > > > > architecture-dependent ordering you want to define this as?
> > > > >
> > > > > So I just think it's a *bad* definition. If somebody wants something
> > > > > that is exactly equivalent to spin_lock+spin_unlock, then dammit, just
> > > > > do *THAT*. It's completely pointless to me to define
> > > > > spin_unlock_wait() in those terms.
> > > > >
> > > > > And if it's not equivalent to the *architecture* behavior of
> > > > > spin_lock+spin_unlock, then I think it should be descibed in terms
> > > > > that aren't about the architecture implementation (so you shouldn't
> > > > > describe it as "spin_lock+spin_unlock", you should describe it in
> > > > > terms of memory barrier semantics.
> > > > >
> > > > > And if we really have to use the spin_lock+spinunlock semantics for
> > > > > this, then what is the advantage of spin_unlock_wait at all, if it
> > > > > doesn't fundamentally avoid some locking overhead of just taking the
> > > > > spinlock in the first place?
> > > >
> > > > Just on this point -- the arm64 code provides the same ordering semantics
> > > > as you would get from a lock;unlock sequence, but we can optimise that
> > > > when compared to an actual lock;unlock sequence because we don't need to
> > > > wait in turn for our ticket. I suspect something similar could be done
> > > > if/when we move to qspinlocks.
> > > >
> > > > Whether or not this is actually worth optimising is another question, but
> > > > it is worth noting that unlock_wait can be implemented more cheaply than
> > > > lock;unlock, whilst providing the same ordering guarantees (if that's
> > > > really what we want -- see my reply to Paul).
> > > >
> > > > Simplicity tends to be my preference, so ripping this out would suit me
> > > > best ;)
> > >
> > > It would be best to know:
> > >
> > > (1). How spin_unlock_wait() is currently being used.
> > >
> > > (2). What it was originally intended for.
> > >
> > > Paul has done some research into (1). He can correct me if I get this
> > > wrong... Only a few (i.e., around one or two) of the usages don't seem
> > > to require the full spin_lock+spin_unlock semantics. I go along with
> > > Linus; the places which really do want it to behave like
> > > spin_lock+spin_unlock should simply use spin_lock+spin_unlock. There
> > > hasn't been any indication so far that the possible efficiency
> > > improvement Will mentions is at all important.
> > >
> > > According to Paul, most of the other places don't need anything more
> > > than the acquire guarantee (any changes made in earlier critical
> > > sections will be visible to the code following spin_unlock_wait). In
> > > which case, the semantics of spin_unlock_wait could be redefined in
> > > this simpler form.
> > >
> > > Or we could literally replace all the existing definitions with
> > > spin_lock+spin_unlock. Would that be so terrible?
> >
> > And here they are...
> >
> > spin_unlock_wait():
> >
> > o drivers/ata/libata-eh.c ata_scsi_cmd_error_handler()
> > spin_unlock_wait(ap->lock) in else-clause where then-clause has
> > a full critical section for this same lock. This use case could
> > potentially require both acquire and release semantics. (I am
> > following up with the developers/maintainers, suggesting that
> > they convert to spin_lock+spin_unlock if they need release
> > semantics.)
> >
> > This is error-handling code, which should be rare, so
> > spin_lock+spin_unlock should work fine here. Probably shouldn't
> > have bugged the maintainer, but email already sent. :-/
> >
> > o ipc/sem.c exit_sem()
> > This use case appears to need to wait only on prior critical
> > sections, as the only way we get here is if the entry has already
> > been removed from the list. An acquire-only spin_unlock_wait()
> > works here. However, this is sem-exit code, which is not a
> > fastpath, and the race should be rare, so spin_lock+spin_unlock
> > should work fine here.
> >
> > o kernel/sched/completion.c completion_done()
> > This use case appears to need to wait only on prior critical
> > sections, as the only way we get past the "if" is when the lock is
> > held by complete(), and you are only supposed to invoke complete()
> > once on a given completion. An acquire-only spin_unlock_wait()
> > works here, but the race should be rare, so spin_lock+spin_unlock
> > should also work fine here.
> >
> > o net/netfilter/nf_conntrack_core.c nf_conntrack_lock()
> > This instance of spin_unlock_wait() interacts with
> > nf_conntrack_all_lock()'s instance of spin_unlock_wait().
> > Although nf_conntrack_all_lock() has an smp_mb(), which I
> > believe provides release semantics given current implementations,
> > nf_conntrack_lock() just has smp_rmb().
> >
> > I believe that the smp_rmb() needs to be smp_mb(). Am I missing
> > something here that makes the current code safe on x86?
> >
>
> actually i think the smp_rmb() or even along with the spin_unlock_wait()
> in nf_conntrack_lock() is not needed, we could
> implementnf_conntrack_lock() as:
>
>
> void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
> {
> spin_lock(lock);
> while (unlikely(smp_load_acquire(nf_conntrack_locks_all))) {
> spin_unlock(lock);
> cpu_relaxed();
> spin_lock(lock);
> }
> }
>
> because in nf_conntrack_all_unlock(), we have:
>
> smp_store_release(&nf_conntrack_locks_all, false);
> spin_unlock(&nf_conntrack_locks_all_lock);
>
> so if we exit the loop, which means we observe nf_conntrack_locks_all
> being false, we actually hold the per bucket lock and observe everything
> before the smp_store_release(), which is the same as everything in the
> critical section of nf_conntrack_locks_all_lock. Otherwise, we observe
> the nf_conntrack_locks_all being true, which means a global lock
> critical section may be on its way, we simply drop the per bucket lock
> and test whether the global lock is finished again some time later.
>
> So I think spin_unlock_wait() in the nf_conntrack_lock() just requires
> acquire semantics, at least.
>
> Maybe I miss someting?

Or perhaps I was being too paranoid.

But does the same analysis work in the case where an nf_conntrack_lock
races with an nf_contrack_all_lock()?

> > I believe that this code could use spin_lock+spin_unlock without
> > significant performance penalties -- I do not believe that
> > nf_conntrack_locks_all_lock gets significant contention.
> >
> > raw_spin_unlock_wait() (Courtesy of Andrea Parri with added commentary):
> >
> > o kernel/exit.c do_exit()
> > Seems to rely on both acquire and release semantics. The
> > raw_spin_unlock_wait() primitive is preceded by a smp_mb().
> > But this is task exit doing spin_unlock_wait() on the task's
> > lock, so spin_lock+spin_unlock should work fine here.
> >
> > o kernel/sched/core.c do_task_dead()
> > Seems to rely on the acquire semantics only. The
> > raw_spin_unlock_wait() primitive is preceded by an inexplicable
> > smp_mb(). Again, this is task exit doing spin_unlock_wait() on
> > the task's lock, so spin_lock+spin_unlock should work fine here.
> >
> > o kernel/task_work.c task_work_run()
> > Seems to rely on the acquire semantics only. This is to handle
>
> I think this one needs the stronger semantics, the smp_mb() is just
> hidden in the cmpxchg() before the raw_spin_unlock_wait() ;-)
>
> cmpxchg() sets a special value to indicate the task_work has been taken,
> and raw_spin_unlock_wait() must wait until the next critical section of
> ->pi_lock(in task_work_cancel()) could observe this, otherwise we may
> cancel a task_work while executing it.

But either way, replacing the spin_unlock_wait() with a spin_lock()
immediately followed by a spin_unlock() should work correctly, right?

Thanx, Paul

> Regards,
> Boqun
> > a race with task_work_cancel(), which appears to be quite rare.
> > So the spin_lock+spin_unlock should work fine here.
> >
> > spin_lock()/spin_unlock():
> >
> > o ipc/sem.c complexmode_enter()
> > This used to be spin_unlock_wait(), but was changed to a
> > spin_lock()/spin_unlock() pair by 27d7be1801a4 ("ipc/sem.c:
> > avoid using spin_unlock_wait()").
> >
> > Looks to me like we really can drop spin_unlock_wait() in favor of
> > momentarily acquiring the lock. There are so few use cases that I don't
> > see a problem open-coding this. I will put together yet another patch
> > series for my spin_unlock_wait() collection of patch serieses. ;-)
> >
> > > As regards (2), I did a little digging. spin_unlock_wait was
> > > introduced in the 2.1.36 kernel, in mid-April 1997. I wasn't able to
> > > find a specific patch for it in the LKML archives. At the time it
> > > was used in only one place in the entire kernel (in kernel/exit.c):
> > >
> > > void release(struct task_struct * p)
> > > {
> > > int i;
> > >
> > > if (!p)
> > > return;
> > > if (p == current) {
> > > printk("task releasing itself\n");
> > > return;
> > > }
> > > for (i=1 ; i<NR_TASKS ; i++)
> > > if (task[i] == p) {
> > > #ifdef __SMP__
> > > /* FIXME! Cheesy, but kills the window... -DaveM */
> > > while(p->processor != NO_PROC_ID)
> > > barrier();
> > > spin_unlock_wait(&scheduler_lock);
> > > #endif
> > > nr_tasks--;
> > > task[i] = NULL;
> > > REMOVE_LINKS(p);
> > > release_thread(p);
> > > if (STACK_MAGIC != *(unsigned long *)p->kernel_stack_page)
> > > printk(KERN_ALERT "release: %s kernel stack corruption. Aiee\n", p->comm);
> > > free_kernel_stack(p->kernel_stack_page);
> > > current->cmin_flt += p->min_flt + p->cmin_flt;
> > > current->cmaj_flt += p->maj_flt + p->cmaj_flt;
> > > current->cnswap += p->nswap + p->cnswap;
> > > free_task_struct(p);
> > > return;
> > > }
> > > panic("trying to release non-existent task");
> > > }
> > >
> > > I'm not entirely clear on the point of this call. It looks like it
> > > wanted to wait until p was guaranteed not to be running on any
> > > processor ever again. (I don't see why it couldn't have just acquired
> > > the scheduler_lock -- was release() a particularly hot path?)
> > >
> > > Although it doesn't matter now, this would mean that the original
> > > semantics of spin_unlock_wait were different from what we are
> > > discussing. It apparently was meant to provide the release guarantee:
> > > any future critical sections would see the values that were visible
> > > before the call. Ironic.
> >
> > Cute!!! ;-)
> >
> > Thanx, Paul
> >


2017-06-30 05:15:07

by Boqun Feng

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Thu, Jun 29, 2017 at 09:02:41PM -0700, Paul E. McKenney wrote:
[...]
> > > o net/netfilter/nf_conntrack_core.c nf_conntrack_lock()
> > > This instance of spin_unlock_wait() interacts with
> > > nf_conntrack_all_lock()'s instance of spin_unlock_wait().
> > > Although nf_conntrack_all_lock() has an smp_mb(), which I
> > > believe provides release semantics given current implementations,
> > > nf_conntrack_lock() just has smp_rmb().
> > >
> > > I believe that the smp_rmb() needs to be smp_mb(). Am I missing
> > > something here that makes the current code safe on x86?
> > >
> >
> > actually i think the smp_rmb() or even along with the spin_unlock_wait()
> > in nf_conntrack_lock() is not needed, we could
> > implementnf_conntrack_lock() as:
> >
> >
> > void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
> > {
> > spin_lock(lock);
> > while (unlikely(smp_load_acquire(nf_conntrack_locks_all))) {
> > spin_unlock(lock);
> > cpu_relaxed();
> > spin_lock(lock);
> > }
> > }
> >
> > because in nf_conntrack_all_unlock(), we have:
> >
> > smp_store_release(&nf_conntrack_locks_all, false);
> > spin_unlock(&nf_conntrack_locks_all_lock);
> >
> > so if we exit the loop, which means we observe nf_conntrack_locks_all
> > being false, we actually hold the per bucket lock and observe everything
> > before the smp_store_release(), which is the same as everything in the
> > critical section of nf_conntrack_locks_all_lock. Otherwise, we observe
> > the nf_conntrack_locks_all being true, which means a global lock
> > critical section may be on its way, we simply drop the per bucket lock
> > and test whether the global lock is finished again some time later.
> >
> > So I think spin_unlock_wait() in the nf_conntrack_lock() just requires
> > acquire semantics, at least.
> >
> > Maybe I miss someting?
>
> Or perhaps I was being too paranoid.
>
> But does the same analysis work in the case where an nf_conntrack_lock
> races with an nf_contrack_all_lock()?
>

You mean the smp_mb()+spin_unlock_wait() in nf_conntrack_all_lock(),
right? I think it's different, because nf_conntrack_all_lock() relies
this release-like operation to let all the next critical sections of per
bucket locks observe nf_conntrack_locks_all=true, otherwise
nf_conntrack_lock() will break out the loop and access some data while
the global lock crictial section is doing the same.

The variable @nf_conntrack_locks_all is used for synchronized between
two kinds of locks and is set by nf_conntrack_all_lock(), I think this
make things different.

> > > I believe that this code could use spin_lock+spin_unlock without
> > > significant performance penalties -- I do not believe that
> > > nf_conntrack_locks_all_lock gets significant contention.
> > >
> > > raw_spin_unlock_wait() (Courtesy of Andrea Parri with added commentary):
> > >
> > > o kernel/exit.c do_exit()
> > > Seems to rely on both acquire and release semantics. The
> > > raw_spin_unlock_wait() primitive is preceded by a smp_mb().
> > > But this is task exit doing spin_unlock_wait() on the task's
> > > lock, so spin_lock+spin_unlock should work fine here.
> > >
> > > o kernel/sched/core.c do_task_dead()
> > > Seems to rely on the acquire semantics only. The
> > > raw_spin_unlock_wait() primitive is preceded by an inexplicable
> > > smp_mb(). Again, this is task exit doing spin_unlock_wait() on
> > > the task's lock, so spin_lock+spin_unlock should work fine here.
> > >
> > > o kernel/task_work.c task_work_run()
> > > Seems to rely on the acquire semantics only. This is to handle
> >
> > I think this one needs the stronger semantics, the smp_mb() is just
> > hidden in the cmpxchg() before the raw_spin_unlock_wait() ;-)
> >
> > cmpxchg() sets a special value to indicate the task_work has been taken,
> > and raw_spin_unlock_wait() must wait until the next critical section of
> > ->pi_lock(in task_work_cancel()) could observe this, otherwise we may
> > cancel a task_work while executing it.
>
> But either way, replacing the spin_unlock_wait() with a spin_lock()
> immediately followed by a spin_unlock() should work correctly, right?
>

Yep ;-) I was thinking about the case that we kept spin_unlock_wait()
with a simpler acquire semantics, and if so, we would actually have to
do the replace. But I saw your patchset of removing it, so it doesn't
matter.

Regards,
Boqun

> Thanx, Paul
>
> > Regards,
> > Boqun
> > > a race with task_work_cancel(), which appears to be quite rare.
> > > So the spin_lock+spin_unlock should work fine here.
> > >
> > > spin_lock()/spin_unlock():
> > >
> > > o ipc/sem.c complexmode_enter()
> > > This used to be spin_unlock_wait(), but was changed to a
> > > spin_lock()/spin_unlock() pair by 27d7be1801a4 ("ipc/sem.c:
> > > avoid using spin_unlock_wait()").
> > >
> > > Looks to me like we really can drop spin_unlock_wait() in favor of
> > > momentarily acquiring the lock. There are so few use cases that I don't
> > > see a problem open-coding this. I will put together yet another patch
> > > series for my spin_unlock_wait() collection of patch serieses. ;-)
> > >
> > > > As regards (2), I did a little digging. spin_unlock_wait was
> > > > introduced in the 2.1.36 kernel, in mid-April 1997. I wasn't able to
> > > > find a specific patch for it in the LKML archives. At the time it
> > > > was used in only one place in the entire kernel (in kernel/exit.c):
> > > >
> > > > void release(struct task_struct * p)
> > > > {
> > > > int i;
> > > >
> > > > if (!p)
> > > > return;
> > > > if (p == current) {
> > > > printk("task releasing itself\n");
> > > > return;
> > > > }
> > > > for (i=1 ; i<NR_TASKS ; i++)
> > > > if (task[i] == p) {
> > > > #ifdef __SMP__
> > > > /* FIXME! Cheesy, but kills the window... -DaveM */
> > > > while(p->processor != NO_PROC_ID)
> > > > barrier();
> > > > spin_unlock_wait(&scheduler_lock);
> > > > #endif
> > > > nr_tasks--;
> > > > task[i] = NULL;
> > > > REMOVE_LINKS(p);
> > > > release_thread(p);
> > > > if (STACK_MAGIC != *(unsigned long *)p->kernel_stack_page)
> > > > printk(KERN_ALERT "release: %s kernel stack corruption. Aiee\n", p->comm);
> > > > free_kernel_stack(p->kernel_stack_page);
> > > > current->cmin_flt += p->min_flt + p->cmin_flt;
> > > > current->cmaj_flt += p->maj_flt + p->cmaj_flt;
> > > > current->cnswap += p->nswap + p->cnswap;
> > > > free_task_struct(p);
> > > > return;
> > > > }
> > > > panic("trying to release non-existent task");
> > > > }
> > > >
> > > > I'm not entirely clear on the point of this call. It looks like it
> > > > wanted to wait until p was guaranteed not to be running on any
> > > > processor ever again. (I don't see why it couldn't have just acquired
> > > > the scheduler_lock -- was release() a particularly hot path?)
> > > >
> > > > Although it doesn't matter now, this would mean that the original
> > > > semantics of spin_unlock_wait were different from what we are
> > > > discussing. It apparently was meant to provide the release guarantee:
> > > > any future critical sections would see the values that were visible
> > > > before the call. Ironic.
> > >
> > > Cute!!! ;-)
> > >
> > > Thanx, Paul
> > >
>
>


Attachments:
(No filename) (7.13 kB)
signature.asc (488.00 B)
Download all attachments

2017-06-30 17:31:35

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13

On Fri, Jun 30, 2017 at 01:16:54PM +0800, Boqun Feng wrote:
> On Thu, Jun 29, 2017 at 09:02:41PM -0700, Paul E. McKenney wrote:

[ . . . ]

> > > > o kernel/task_work.c task_work_run()
> > > > Seems to rely on the acquire semantics only. This is to handle
> > >
> > > I think this one needs the stronger semantics, the smp_mb() is just
> > > hidden in the cmpxchg() before the raw_spin_unlock_wait() ;-)
> > >
> > > cmpxchg() sets a special value to indicate the task_work has been taken,
> > > and raw_spin_unlock_wait() must wait until the next critical section of
> > > ->pi_lock(in task_work_cancel()) could observe this, otherwise we may
> > > cancel a task_work while executing it.
> >
> > But either way, replacing the spin_unlock_wait() with a spin_lock()
> > immediately followed by a spin_unlock() should work correctly, right?
> >
>
> Yep ;-) I was thinking about the case that we kept spin_unlock_wait()
> with a simpler acquire semantics, and if so, we would actually have to
> do the replace. But I saw your patchset of removing it, so it doesn't
> matter.

Well, there is a fair amount of review and testing between now and
it getting in (assuming that it in fact does get in), but I do very
much appreciate the vote of confidence! ;-)

Thanx, Paul