RCU's hotplug design will help readers understand the requirements an RCU
implementation needs to fulfill, such as deadlock avoidance.

The rcu_barrier() section of the "Hotplug CPU" section already talks
about deadlocks; however, the description of what else can deadlock, other
than rcu_barrier(), is rather incomplete.

This commit therefore continues the section by describing how RCU's
design handles CPU hotplug in a deadlock-free way.

Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
.../RCU/Design/Requirements/Requirements.rst | 30 +++++++++++++++++--
1 file changed, 28 insertions(+), 2 deletions(-)
diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index 1ae79a10a8de..e0413aa989dd 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -1929,8 +1929,10 @@ The Linux-kernel CPU-hotplug implementation has notifiers that are used
to allow the various kernel subsystems (including RCU) to respond
appropriately to a given CPU-hotplug operation. Most RCU operations may
be invoked from CPU-hotplug notifiers, including even synchronous
-grace-period operations such as ``synchronize_rcu()`` and
-``synchronize_rcu_expedited()``.
+grace-period operations such as. However, the synchronous variants
+(``synchronize_rcu()`` and ``synchronize_rcu_expedited()``) should not
+from notifiers that execute via ``stop_machine()`` -- specifically those
+between the ``CPUHP_AP_OFFLINE`` and ``CPUHP_AP_ONLINE`` states.
However, all-callback-wait operations such as ``rcu_barrier()`` are also
not supported, due to the fact that there are phases of CPU-hotplug
@@ -1940,6 +1942,30 @@ deadlock. Furthermore, ``rcu_barrier()`` blocks CPU-hotplug operations
during its execution, which results in another type of deadlock when
invoked from a CPU-hotplug notifier.
+Also, RCU's implementation avoids serious deadlocks which could occur due to
+interaction between hotplug, timers and grace period processing. It does so by
+maintaining its own books of every CPU's hotplug state, independent of
+the existing general-purpose CPU masks and by reporting quiescent states
+explicitly when an online CPU is going down. Due to this design, the force
+quiescent state loop (FQS) is not required to report quiescent states for
+offline CPUs, like it does for idle CPUs, but it does splat if offline CPUs are
+stalling the RCU grace period for too long.
+
+For an offline CPU, the quiescent state will be reported in either of:
+1. During CPU offlining, using RCU's hotplug notifier (``rcu_report_dead()``).
+2. During grace period initialization (``rcu_gp_init()``) if it detected a race
+ with CPU offlining, or a race with a task unblocking on a node which
+ previously had all of its CPUs offlined.
+
+The CPU onlining path (``rcu_cpu_starting()``) does not need to report a
+quiescent state for an offline CPU; in fact it would trigger a warning if a
+quiescent state was not already reported for that CPU.
+
+During the checking/modification of RCU's hotplug bookkeeping, the
+corresponding CPU's leaf node lock is held. This avoids race conditions between
+RCU's hotplug notifier hooks, grace period initialization code and the FQS loop
+which can concurrently refer to or modify the bookkeeping.
+
Scheduler and RCU
~~~~~~~~~~~~~~~~~
--
2.28.0.709.gb0816b6eb0-goog
Hi Paul,
On Tue, Sep 29, 2020 at 03:29:28PM -0400, Joel Fernandes (Google) wrote:
> RCU's hotplug design will help readers understand the requirements an RCU
> implementation needs to fulfill, such as deadlock avoidance.
>
> The rcu_barrier() section of the "Hotplug CPU" section already talks
> about deadlocks; however, the description of what else can deadlock, other
> than rcu_barrier(), is rather incomplete.
>
> This commit therefore continues the section by describing how RCU's
> design handles CPU hotplug in a deadlock-free way.
>
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> ---
> .../RCU/Design/Requirements/Requirements.rst | 30 +++++++++++++++++--
> 1 file changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
> index 1ae79a10a8de..e0413aa989dd 100644
> --- a/Documentation/RCU/Design/Requirements/Requirements.rst
> +++ b/Documentation/RCU/Design/Requirements/Requirements.rst
> @@ -1929,8 +1929,10 @@ The Linux-kernel CPU-hotplug implementation has notifiers that are used
> to allow the various kernel subsystems (including RCU) to respond
> appropriately to a given CPU-hotplug operation. Most RCU operations may
> be invoked from CPU-hotplug notifiers, including even synchronous
> -grace-period operations such as ``synchronize_rcu()`` and
> -``synchronize_rcu_expedited()``.
> +grace-period operations such as. However, the synchronous variants
> +(``synchronize_rcu()`` and ``synchronize_rcu_expedited()``) should not
> +from notifiers that execute via ``stop_machine()`` -- specifically those
The "should not from notifiers" should be "should not be used from
notifiers" here. Sorry and hope you can fix it up.
thanks,
- Joel
On Tue, Sep 29, 2020 at 03:32:48PM -0400, Joel Fernandes wrote:
> Hi Paul,
>
> On Tue, Sep 29, 2020 at 03:29:28PM -0400, Joel Fernandes (Google) wrote:
> > RCU's hotplug design will help readers understand the requirements an RCU
> > implementation needs to fulfill, such as deadlock avoidance.
> >
> > The rcu_barrier() section of the "Hotplug CPU" section already talks
> > about deadlocks; however, the description of what else can deadlock, other
> > than rcu_barrier(), is rather incomplete.
> >
> > This commit therefore continues the section by describing how RCU's
> > design handles CPU hotplug in a deadlock-free way.
> >
> > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > ---
> > .../RCU/Design/Requirements/Requirements.rst | 30 +++++++++++++++++--
> > 1 file changed, 28 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
> > index 1ae79a10a8de..e0413aa989dd 100644
> > --- a/Documentation/RCU/Design/Requirements/Requirements.rst
> > +++ b/Documentation/RCU/Design/Requirements/Requirements.rst
> > @@ -1929,8 +1929,10 @@ The Linux-kernel CPU-hotplug implementation has notifiers that are used
> > to allow the various kernel subsystems (including RCU) to respond
> > appropriately to a given CPU-hotplug operation. Most RCU operations may
> > be invoked from CPU-hotplug notifiers, including even synchronous
> > -grace-period operations such as ``synchronize_rcu()`` and
> > -``synchronize_rcu_expedited()``.
> > +grace-period operations such as. However, the synchronous variants
> > +(``synchronize_rcu()`` and ``synchronize_rcu_expedited()``) should not
> > +from notifiers that execute via ``stop_machine()`` -- specifically those
>
> The "should not from notifiers" should be "should not be used from
> notifiers" here. Sorry and hope you can fix it up.
Thank you, and queued for further review. How does the below look
for a general fixup?
Thanx, Paul
------------------------------------------------------------------------
commit a93716177eeac726037828b28e6b1a45e828688a
Author: Joel Fernandes (Google) <[email protected]>
Date: Tue Sep 29 15:29:28 2020 -0400
docs: Update RCU's hotplug requirements with a bit about design

The rcu_barrier() section of the "Hotplug CPU" section discusses
deadlocks; however, the description of deadlocks other than those involving
rcu_barrier() is rather incomplete.

This commit therefore continues the section by describing how RCU's
design handles CPU hotplug in a deadlock-free way.

Signed-off-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index 1ae79a1..98557fe 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -1929,16 +1929,45 @@ The Linux-kernel CPU-hotplug implementation has notifiers that are used
to allow the various kernel subsystems (including RCU) to respond
appropriately to a given CPU-hotplug operation. Most RCU operations may
be invoked from CPU-hotplug notifiers, including even synchronous
-grace-period operations such as ``synchronize_rcu()`` and
-``synchronize_rcu_expedited()``.
-
-However, all-callback-wait operations such as ``rcu_barrier()`` are also
-not supported, due to the fact that there are phases of CPU-hotplug
-operations where the outgoing CPU's callbacks will not be invoked until
-after the CPU-hotplug operation ends, which could also result in
-deadlock. Furthermore, ``rcu_barrier()`` blocks CPU-hotplug operations
-during its execution, which results in another type of deadlock when
-invoked from a CPU-hotplug notifier.
+grace-period operations such as ``synchronize_rcu()`` and
+``synchronize_rcu_expedited()``. However, these synchronous operations
+do block and therefore cannot be invoked from notifiers that execute via
+``stop_machine()``, specifically those between the ``CPUHP_AP_OFFLINE``
+and ``CPUHP_AP_ONLINE`` states.
+
+In addition, all-callback-wait operations such as ``rcu_barrier()`` may
+not be invoked from any CPU-hotplug notifier. This restriction is due
+to the fact that there are phases of CPU-hotplug operations where the
+outgoing CPU's callbacks will not be invoked until after the CPU-hotplug
+operation ends, which could also result in deadlock. Furthermore,
+``rcu_barrier()`` blocks CPU-hotplug operations during its execution,
+which results in another type of deadlock when invoked from a CPU-hotplug
+notifier.
+
+Finally, RCU must avoid deadlocks due to interaction between hotplug,
+timers and grace period processing. It does so by maintaining its own set
+of books that duplicate the centrally maintained ``cpu_online_mask``,
+and also by reporting quiescent states explicitly when a CPU goes
+offline. This explicit reporting of quiescent states avoids any need
+for the force-quiescent-state loop (FQS) to report quiescent states for
+offline CPUs. However, as a debugging measure, the FQS loop does splat
+if offline CPUs block an RCU grace period for too long.
+
+An offline CPU's quiescent state will be reported either:
+1. As the CPU goes offline using RCU's hotplug notifier (``rcu_report_dead()``).
+2. When grace period initialization (``rcu_gp_init()``) detects a
+ race either with CPU offlining or with a task unblocking on a leaf
+ ``rcu_node`` structure whose CPUs are all offline.
+
+The CPU-online path (``rcu_cpu_starting()``) should never need to report
+a quiescent state for an offline CPU. However, as a debugging measure,
+it does emit a warning if a quiescent state was not already reported
+for that CPU.
+
+During the checking/modification of RCU's hotplug bookkeeping, the
+corresponding CPU's leaf node lock is held. This avoids race conditions
+between RCU's hotplug notifier hooks, the grace period initialization
+code, and the FQS loop, all of which refer to or modify this bookkeeping.
Scheduler and RCU
~~~~~~~~~~~~~~~~~
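
The bookkeeping rules in the last few paragraphs of the patch above can
be modeled in userspace. What follows is only a minimal sketch, not
kernel code. The names struct leaf_node, online_mask, qs_pending,
cpu_going_offline(), cpu_coming_online() and gp_init() are hypothetical
and made up for illustration, and a pthread mutex stands in for the leaf
node lock. The sketch assumes only what the text states: RCU keeps its
own record of which CPUs are online, an outgoing CPU's quiescent state
is reported explicitly, grace-period initialization does not wait on
CPUs already recorded as offline, and every access to the bookkeeping
happens with the leaf node lock held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a leaf rcu_node structure. */
struct leaf_node {
	pthread_mutex_t lock;      /* models the leaf node lock */
	unsigned long online_mask; /* RCU's own record of online CPUs */
	unsigned long qs_pending;  /* CPUs the current grace period waits on */
};

static unsigned long cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < (int)(8 * sizeof(unsigned long)));
	return 1UL << cpu;
}

static void leaf_init(struct leaf_node *n, unsigned long online)
{
	pthread_mutex_init(&n->lock, NULL);
	n->online_mask = online;
	n->qs_pending = 0;
}

/* Grace-period initialization: wait only on CPUs recorded as online. */
static void gp_init(struct leaf_node *n)
{
	pthread_mutex_lock(&n->lock);
	n->qs_pending = n->online_mask;
	pthread_mutex_unlock(&n->lock);
}

/* An online CPU passes through a quiescent state. */
static void report_qs(struct leaf_node *n, int cpu)
{
	pthread_mutex_lock(&n->lock);
	n->qs_pending &= ~cpu_bit(cpu);
	pthread_mutex_unlock(&n->lock);
}

/*
 * Offline path: report the quiescent state explicitly and update the
 * books in one critical section, so gp_init() cannot observe a CPU
 * that is offline yet still owes a quiescent state.
 */
static void cpu_going_offline(struct leaf_node *n, int cpu)
{
	pthread_mutex_lock(&n->lock);
	n->qs_pending &= ~cpu_bit(cpu);
	n->online_mask &= ~cpu_bit(cpu);
	pthread_mutex_unlock(&n->lock);
}

/*
 * Online path: no quiescent state should be owed for a CPU that was
 * offline.  Returns false where the text says a warning is emitted.
 */
static bool cpu_coming_online(struct leaf_node *n, int cpu)
{
	pthread_mutex_lock(&n->lock);
	bool clean = !(n->qs_pending & cpu_bit(cpu));
	n->online_mask |= cpu_bit(cpu);
	pthread_mutex_unlock(&n->lock);
	return clean;
}

/* True once every CPU the grace period waited on has reported. */
static bool gp_done(struct leaf_node *n)
{
	pthread_mutex_lock(&n->lock);
	bool done = (n->qs_pending == 0);
	pthread_mutex_unlock(&n->lock);
	return done;
}
```

In this model, if CPUs 0 and 1 are online when gp_init() runs and CPU 1
then goes offline, only CPU 0 still has to call report_qs() before
gp_done() returns true. Nothing comparable to the FQS loop has to scan
for the offline CPU, which is the property the text describes.
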
On Fri, Oct 2, 2020 at 3:34 PM Paul E. McKenney <[email protected]> wrote:
>
> On Tue, Sep 29, 2020 at 03:32:48PM -0400, Joel Fernandes wrote:
> > Hi Paul,
> >
> > On Tue, Sep 29, 2020 at 03:29:28PM -0400, Joel Fernandes (Google) wrote:
> > > RCU's hotplug design will help readers understand the requirements an RCU
> > > implementation needs to fulfill, such as deadlock avoidance.
> > >
> > > The rcu_barrier() section of the "Hotplug CPU" section already talks
> > > about deadlocks; however, the description of what else can deadlock, other
> > > than rcu_barrier(), is rather incomplete.
> > >
> > > This commit therefore continues the section by describing how RCU's
> > > design handles CPU hotplug in a deadlock-free way.
> > >
> > > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > > ---
> > > .../RCU/Design/Requirements/Requirements.rst | 30 +++++++++++++++++--
> > > 1 file changed, 28 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
> > > index 1ae79a10a8de..e0413aa989dd 100644
> > > --- a/Documentation/RCU/Design/Requirements/Requirements.rst
> > > +++ b/Documentation/RCU/Design/Requirements/Requirements.rst
> > > @@ -1929,8 +1929,10 @@ The Linux-kernel CPU-hotplug implementation has notifiers that are used
> > > to allow the various kernel subsystems (including RCU) to respond
> > > appropriately to a given CPU-hotplug operation. Most RCU operations may
> > > be invoked from CPU-hotplug notifiers, including even synchronous
> > > -grace-period operations such as ``synchronize_rcu()`` and
> > > -``synchronize_rcu_expedited()``.
> > > +grace-period operations such as. However, the synchronous variants
> > > +(``synchronize_rcu()`` and ``synchronize_rcu_expedited()``) should not
> > > +from notifiers that execute via ``stop_machine()`` -- specifically those
> >
> > The "should not from notifiers" should be "should not be used from
> > notifiers" here. Sorry and hope you can fix it up.
>
> Thank you, and queued for further review. How does the below look
> for a general fixup?
Looks great, thanks!
-Joel