Add a warning if the CPU being onlined did not already report a QS. This
is to simplify the code in the CPU onlining path and also to make it
clear where the QS is reported. The act of reporting a QS in the CPU
onlining path is likely unnecessary, as shown by code reading and by
testing with rcutorture's TREE03 and hotplug parameters.
Cc: Paul E. McKenney <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Suggested-by: Paul E. McKenney <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/rcu/tree.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 65e1b5e92319..1e51962b565b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3996,7 +3996,19 @@ void rcu_cpu_starting(unsigned int cpu)
rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */
rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
- if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
+
+ /*
+ * Delete QS reporting from here, by June 2021, if warning does not
+ * fire. Let us make the rules for reporting QS for offline CPUs
+ * more explicit. The CPU onlining path does not need to report QS for
+ * an offline CPU. Either the QS should have been reported during CPU
+ * offlining, or during rcu_gp_init() if it detected a race with either
+ * CPU offlining or task unblocking on previously offlined CPUs. Note
+ * that the FQS loop also does not report QS for an offline CPU any
+ * longer (unless it splats due to an offline CPU blocking the GP for
+ * too long).
+ */
+ if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
rcu_disable_urgency_upon_qs(rdp);
/* Report QS -after- changing ->qsmaskinitnext! */
rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
--
2.28.0.rc0.142.g3c755180ce-goog
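A note on the idiom in the hunk above: the kernel's WARN_ON_ONCE() evaluates
to its condition, so wrapping the existing test in it preserves the
QS-reporting behavior exactly while emitting a one-time splat (with a stack
trace) if the condition is ever true. Below is a minimal userspace model of
that contract, as an illustration only (MODEL_WARN_ON_ONCE is a made-up
name, not the kernel macro):

	#include <stdbool.h>
	#include <stdio.h>

	/*
	 * Model: warn at most once, and evaluate to the condition so the
	 * macro can wrap an existing if () without changing its behavior.
	 * Uses GCC statement expressions, as the kernel's version does.
	 */
	#define MODEL_WARN_ON_ONCE(cond) ({				\
		static bool __warned;					\
		bool __cond = !!(cond);					\
		if (__cond && !__warned) {				\
			__warned = true;				\
			fprintf(stderr, "WARNING at %s:%d\n",		\
				__FILE__, __LINE__);			\
		}							\
		__cond;							\
	})

	int main(void)
	{
		for (int i = 0; i < 3; i++)
			if (MODEL_WARN_ON_ONCE(i > 0))
				;	/* warns once, at i == 1; silent at i == 2 */
		return 0;
	}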
At least since v4.19, the FQS loop no longer reports quiescent states
unless it is a dire situation where an offlined CPU failed to report
a quiescent state. Let us clarify the comment in rcu_gp_init() in order
to keep the comment current.
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/rcu/tree.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1e51962b565b..929568ff5989 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1701,8 +1701,8 @@ static bool rcu_gp_init(void)
/*
* Apply per-leaf buffered online and offline operations to the
- * rcu_node tree. Note that this new grace period need not wait
- * for subsequent online CPUs, and that quiescent-state forcing
+ * rcu_node tree. Note that this new grace period need not wait for
+ * subsequent online CPUs, and that RCU hooks in CPU offlining path
* will handle subsequent offline CPUs.
*/
rcu_state.gp_state = RCU_GP_ONOFF;
--
2.28.0.rc0.142.g3c755180ce-goog
On Wed, Jul 29, 2020 at 11:02 PM Joel Fernandes (Google)
<[email protected]> wrote:
>
> At least since v4.19, the FQS loop no longer reports quiescent states
I meant here, "FQS loop no longer reports quiescent states for offline CPUs."
Sorry,
- Joel
> unless it is a dire situation where an offlined CPU failed to report
> a quiescent state. Let us clarify the comment in rcu_gp_init() in order
> to keep the comment current.
>
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> ---
> kernel/rcu/tree.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 1e51962b565b..929568ff5989 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1701,8 +1701,8 @@ static bool rcu_gp_init(void)
>
> /*
> * Apply per-leaf buffered online and offline operations to the
> - * rcu_node tree. Note that this new grace period need not wait
> - * for subsequent online CPUs, and that quiescent-state forcing
> + * rcu_node tree. Note that this new grace period need not wait for
> + * subsequent online CPUs, and that RCU hooks in CPU offlining path
> * will handle subsequent offline CPUs.
> */
> rcu_state.gp_state = RCU_GP_ONOFF;
> --
> 2.28.0.rc0.142.g3c755180ce-goog
>
On Wed, Jul 29, 2020 at 11:02:20PM -0400, Joel Fernandes (Google) wrote:
> Add a warning if the CPU being onlined did not already report a QS. This
> is to simplify the code in the CPU onlining path and also to make it
> clear where the QS is reported. The act of reporting a QS in the CPU
> onlining path is likely unnecessary, as shown by code reading and by
> testing with rcutorture's TREE03 and hotplug parameters.
How about something like this for the commit log?
------------------------------------------------------------------------
Currently, rcu_cpu_starting() checks to see if the RCU core expects a
quiescent state from the incoming CPU. However, the current interaction
between RCU quiescent-state reporting and CPU-hotplug operations should
mean that the incoming CPU never needs to report a quiescent state.
First, the outgoing CPU reports a quiescent state if needed. Second,
the race where the CPU is leaving just as RCU is initializing a new
grace period is handled by an explicit check for this condition. Third,
the CPU's leaf rcu_node structure's ->lock serializes these checks.
This means that if rcu_cpu_starting() ever feels the need to report
a quiescent state, then there is a bug somewhere in the CPU hotplug
code or the RCU grace-period handling code. This commit therefore
adds a WARN_ON_ONCE() to bring that bug to everyone's attention.
------------------------------------------------------------------------
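The serialization argument above can be modeled in a few lines of userspace
code. The sketch below is an illustration under stated assumptions, not the
tree.c code: a single pthread mutex stands in for the leaf rcu_node
structure's ->lock, qsmaskinitnext for the set of online CPUs, and qsmask
for the CPUs the current grace period still waits on. Because the offline
path reports the QS and clears the CPU's bits under the same lock that
grace-period initialization and the online path take, the online path's
assertion (the WARN_ON_ONCE() analogue) cannot fire:

	#include <assert.h>
	#include <pthread.h>
	#include <stdio.h>

	static pthread_mutex_t rnp_lock = PTHREAD_MUTEX_INITIALIZER;
	static unsigned long qsmask;         /* CPUs this GP still waits on */
	static unsigned long qsmaskinitnext; /* CPUs online for the next GP */

	static void cpu_offline(unsigned long mask)
	{
		pthread_mutex_lock(&rnp_lock);
		qsmask &= ~mask;         /* outgoing CPU reports its QS... */
		qsmaskinitnext &= ~mask; /* ...and leaves the next-GP set */
		pthread_mutex_unlock(&rnp_lock);
	}

	static void gp_init(void)
	{
		pthread_mutex_lock(&rnp_lock);
		qsmask = qsmaskinitnext; /* wait only on online CPUs */
		pthread_mutex_unlock(&rnp_lock);
	}

	static void cpu_online(unsigned long mask)
	{
		pthread_mutex_lock(&rnp_lock);
		assert(!(qsmask & mask)); /* the WARN_ON_ONCE() analogue */
		qsmaskinitnext |= mask;
		pthread_mutex_unlock(&rnp_lock);
	}

	int main(void)
	{
		cpu_online(0x1);
		gp_init();        /* GP now waits on CPU 0 */
		cpu_offline(0x1); /* offline path reports the QS */
		gp_init();        /* new GP excludes the offline CPU */
		cpu_online(0x1);  /* assertion cannot fire */
		puts("invariant holds");
		return 0;
	}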
> Cc: Paul E. McKenney <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Suggested-by: Paul E. McKenney <[email protected]>
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
>
> ---
> kernel/rcu/tree.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 65e1b5e92319..1e51962b565b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3996,7 +3996,19 @@ void rcu_cpu_starting(unsigned int cpu)
> rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */
> rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
> rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> - if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> +
> + /*
> + * Delete QS reporting from here, by June 2021, if warning does not
> + * fire. Let us make the rules for reporting QS for offline CPUs
> + * more explicit. The CPU onlining path does not need to report QS for
> + * an offline CPU. Either the QS should have been reported during CPU
> + * offlining, or during rcu_gp_init() if it detected a race with either
> + * CPU offlining or task unblocking on previously offlined CPUs. Note
> + * that the FQS loop also does not report QS for an offline CPU any
> + * longer (unless it splats due to an offline CPU blocking the GP for
> + * too long).
> + */
Let's leave at least the WARN_ON_ONCE() indefinitely. If you don't
believe me, remove this code in your local tree, have someone give you
several branches, some with bugs injected, and then try to figure out
which have the bugs and then try to find those bugs.
This is not a fastpath, so the overhead of the check is not a concern.
Believe me, the difficulty of bug location without this check is a very
real concern! ;-)
On the other hand, I fully agree with the benefits of documenting the
design rules. But is this really the best place to do that from the
viewpoint of someone who is trying to figure out how RCU works?
Thanx, Paul
> + if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
> rcu_disable_urgency_upon_qs(rdp);
> /* Report QS -after- changing ->qsmaskinitnext! */
> rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> --
> 2.28.0.rc0.142.g3c755180ce-goog
>
On Wed, Jul 29, 2020 at 11:25:19PM -0400, Joel Fernandes wrote:
> On Wed, Jul 29, 2020 at 11:02 PM Joel Fernandes (Google)
> <[email protected]> wrote:
> >
> > At least since v4.19, the FQS loop no longer reports quiescent states
>
> I meant here, "FQS loop no longer reports quiescent states for offline CPUs."
>
> Sorry,
You did have me going there for a bit. ;-)
No period (".") at the end though, unless you fix up the following
to start a new sentence.
> > unless it is a dire situation where an offlined CPU failed to report
> > a quiescent state. Let us clarify the comment in rcu_gp_init() in order
> > to keep the comment current.
How about the following for this last sentence?
"This commit therefore fixes the comment in rcu_gp_init() to match
the current code."
> > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > ---
> > kernel/rcu/tree.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 1e51962b565b..929568ff5989 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1701,8 +1701,8 @@ static bool rcu_gp_init(void)
> >
> > /*
> > * Apply per-leaf buffered online and offline operations to the
> > - * rcu_node tree. Note that this new grace period need not wait
> > - * for subsequent online CPUs, and that quiescent-state forcing
> > + * rcu_node tree. Note that this new grace period need not wait for
> > + * subsequent online CPUs, and that RCU hooks in CPU offlining path
> > * will handle subsequent offline CPUs.
How about something like this?
... Note that this new grace period need not wait for subsequent
online CPUs, and that RCU hooks in the CPU offlining path, when
combined with checks in this function, will handle CPUs that
are currently going offline and that go offline later.
Thanx, Paul
> > */
> > rcu_state.gp_state = RCU_GP_ONOFF;
> > --
> > 2.28.0.rc0.142.g3c755180ce-goog
> >
On Thu, Jul 30, 2020 at 09:21:59AM -0700, Paul E. McKenney wrote:
> On Wed, Jul 29, 2020 at 11:02:20PM -0400, Joel Fernandes (Google) wrote:
> > Add a warning if the CPU being onlined did not already report a QS. This
> > is to simplify the code in the CPU onlining path and also to make it
> > clear where the QS is reported. The act of reporting a QS in the CPU
> > onlining path is likely unnecessary, as shown by code reading and by
> > testing with rcutorture's TREE03 and hotplug parameters.
>
> How about something like this for the commit log?
>
> ------------------------------------------------------------------------
>
> Currently, rcu_cpu_starting() checks to see if the RCU core expects a
> quiescent state from the incoming CPU. However, the current interaction
> between RCU quiescent-state reporting and CPU-hotplug operations should
> mean that the incoming CPU never needs to report a quiescent state.
> First, the outgoing CPU reports a quiescent state if needed. Second,
> the race where the CPU is leaving just as RCU is initializing a new
> grace period is handled by an explicit check for this condition. Third,
> the CPU's leaf rcu_node structure's ->lock serializes these checks.
>
> This means that if rcu_cpu_starting() ever feels the need to report
> a quiescent state, then there is a bug somewhere in the CPU hotplug
> code or the RCU grace-period handling code. This commit therefore
> adds a WARN_ON_ONCE() to bring that bug to everyone's attention.
Wow, this is so well written. I certainly need to up my writing game a bit.
But yes, this is a flawless explanation and it really will help people
reading it in the future.
I will make the change to my tree for the next revision.
thanks,
- Joel
> ------------------------------------------------------------------------
>
> > Cc: Paul E. McKenney <[email protected]>
> > Cc: Neeraj Upadhyay <[email protected]>
> > Suggested-by: Paul E. McKenney <[email protected]>
> > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> >
> > ---
> > kernel/rcu/tree.c | 14 +++++++++++++-
> > 1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 65e1b5e92319..1e51962b565b 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3996,7 +3996,19 @@ void rcu_cpu_starting(unsigned int cpu)
> > rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */
> > rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
> > rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> > - if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> > +
> > + /*
> > + * Delete QS reporting from here, by June 2021, if warning does not
> > + * fire. Let us make the rules for reporting QS for offline CPUs
> > + * more explicit. The CPU onlining path does not need to report QS for
> > + * an offline CPU. Either the QS should have been reported during CPU
> > + * offlining, or during rcu_gp_init() if it detected a race with either
> > + * CPU offlining or task unblocking on previously offlined CPUs. Note
> > + * that the FQS loop also does not report QS for an offline CPU any
> > + * longer (unless it splats due to an offline CPU blocking the GP for
> > + * too long).
> > + */
>
> Let's leave at least the WARN_ON_ONCE() indefinitely. If you don't
> believe me, remove this code in your local tree, have someone give you
> several branches, some with bugs injected, and then try to figure out
> which have the bugs and then try to find those bugs.
>
> This is not a fastpath, so the overhead of the check is not a concern.
> Believe me, the difficulty of bug location without this check is a very
> real concern! ;-)
>
> On the other hand, I fully agree with the benefits of documenting the
> design rules. But is this really the best place to do that from the
> viewpoint of someone who is trying to figure out how RCU works?
>
> Thanx, Paul
>
> > + if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
> > rcu_disable_urgency_upon_qs(rdp);
> > /* Report QS -after- changing ->qsmaskinitnext! */
> > rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> > --
> > 2.28.0.rc0.142.g3c755180ce-goog
> >
On Thu, Jul 30, 2020 at 09:35:20AM -0700, Paul E. McKenney wrote:
> On Wed, Jul 29, 2020 at 11:25:19PM -0400, Joel Fernandes wrote:
> > On Wed, Jul 29, 2020 at 11:02 PM Joel Fernandes (Google)
> > <[email protected]> wrote:
> > >
> > > At least since v4.19, the FQS loop no longer reports quiescent states
> >
> > I meant here, "FQS loop no longer reports quiescent states for offline CPUs."
> >
> > Sorry,
>
> You did have me going there for a bit. ;-)
>
> No period (".") at the end though, unless you fix up the following
> to start a new sentence.
Ok.
> > > unless it is a dire situation where an offlined CPU failed to report
> > > a quiescent state. Let us clarify the comment in rcu_gp_init() in order
> > > to keep the comment current.
>
> How about the following for this last sentence?
>
> "This commit therefore fixes the comment in rcu_gp_init() to match
> the current code."
As per:
https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html
It says:
Describe your changes in imperative mood, e.g. “make xyzzy do frotz” instead
of “[This patch] makes xyzzy do frotz” or “[I] changed xyzzy to do frotz”, as
if you are giving orders to the codebase to change its behaviour.
Maybe I should make it "Fix the comment in rcu_gp_init() to match the
current code"?
> > > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > > ---
> > > kernel/rcu/tree.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 1e51962b565b..929568ff5989 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -1701,8 +1701,8 @@ static bool rcu_gp_init(void)
> > >
> > > /*
> > > * Apply per-leaf buffered online and offline operations to the
> > > - * rcu_node tree. Note that this new grace period need not wait
> > > - * for subsequent online CPUs, and that quiescent-state forcing
> > > + * rcu_node tree. Note that this new grace period need not wait for
> > > + * subsequent online CPUs, and that RCU hooks in CPU offlining path
> > > * will handle subsequent offline CPUs.
>
> How about something like this?
>
> ... Note that this new grace period need not wait for subsequent
> online CPUs, and that RCU hooks in the CPU offlining path, when
> combined with checks in this function, will handle CPUs that
> are currently going offline and that go offline later.
Sounds good to me. I think s/and that go/or that go/ though.
I will make these changes and send v3, let me know though if you object.
thanks,
- Joel
> Thanx, Paul
>
> > > */
> > > rcu_state.gp_state = RCU_GP_ONOFF;
> > > --
> > > 2.28.0.rc0.142.g3c755180ce-goog
> > >
On Thu, Jul 30, 2020 at 09:21:52PM -0400, Joel Fernandes wrote:
> On Thu, Jul 30, 2020 at 09:35:20AM -0700, Paul E. McKenney wrote:
> > On Wed, Jul 29, 2020 at 11:25:19PM -0400, Joel Fernandes wrote:
> > > On Wed, Jul 29, 2020 at 11:02 PM Joel Fernandes (Google)
> > > <[email protected]> wrote:
> > > >
> > > > At least since v4.19, the FQS loop no longer reports quiescent states
> > >
> > > I meant here, "FQS loop no longer reports quiescent states for offline CPUs."
> > >
> > > Sorry,
> >
> > You did have me going there for a bit. ;-)
> >
> > No period (".") at the end though, unless you fix up the following
> > to start a new sentence.
>
> Ok.
>
> > > > unless it is a dire situation where an offlined CPU failed to report
> > > > a quiescent state. Let us clarify the comment in rcu_gp_init() in order
> > > > to keep the comment current.
> >
> > How about the following for this last sentence?
> >
> > "This commit therefore fixes the comment in rcu_gp_init() to match
> > the current code."
>
> As per:
> https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html
>
> It says:
> Describe your changes in imperative mood, e.g. “make xyzzy do frotz” instead
> of “[This patch] makes xyzzy do frotz” or “[I] changed xyzzy to do frotz”, as
> if you are giving orders to the codebase to change its behaviour.
>
> Maybe I should make it "Fix the comment in rcu_gp_init() to match the
> current code"?
What submitting-patches.rst is objecting to is starting the commit
log with "This patch...". I am suggesting something quite different,
namely providing a clear indication of the transition from problem
statement to solution.
> > > > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > > > ---
> > > > kernel/rcu/tree.c | 4 ++--
> > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 1e51962b565b..929568ff5989 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -1701,8 +1701,8 @@ static bool rcu_gp_init(void)
> > > >
> > > > /*
> > > > * Apply per-leaf buffered online and offline operations to the
> > > > - * rcu_node tree. Note that this new grace period need not wait
> > > > - * for subsequent online CPUs, and that quiescent-state forcing
> > > > + * rcu_node tree. Note that this new grace period need not wait for
> > > > + * subsequent online CPUs, and that RCU hooks in CPU offlining path
> > > > * will handle subsequent offline CPUs.
> >
> > How about something like this?
> >
> > ... Note that this new grace period need not wait for subsequent
> > online CPUs, and that RCU hooks in the CPU offlining path, when
> > combined with checks in this function, will handle CPUs that
> > are currently going offline and that go offline later.
>
> Sounds good to me. I think s/and that go/or that go/ though.
Good point! Another approach would be s/and that/and those that/
but yours works.
> I will make these changes and send v3, let me know though if you object.
Sounds good!
Thanx, Paul
> thanks,
>
> - Joel
>
>
> > Thanx, Paul
> >
> > > > */
> > > > rcu_state.gp_state = RCU_GP_ONOFF;
> > > > --
> > > > 2.28.0.rc0.142.g3c755180ce-goog
> > > >
On Thu, Jul 30, 2020 at 09:21:59AM -0700, Paul E. McKenney wrote:
> On Wed, Jul 29, 2020 at 11:02:20PM -0400, Joel Fernandes (Google) wrote:
> > Add a warning if the CPU being onlined did not already report a QS. This
> > is to simplify the code in the CPU onlining path and also to make it
> > clear where the QS is reported. The act of reporting a QS in the CPU
> > onlining path is likely unnecessary, as shown by code reading and by
> > testing with rcutorture's TREE03 and hotplug parameters.
>
> How about something like this for the commit log?
>
> ------------------------------------------------------------------------
>
> Currently, rcu_cpu_starting() checks to see if the RCU core expects a
> quiescent state from the incoming CPU. However, the current interaction
> between RCU quiescent-state reporting and CPU-hotplug operations should
> mean that the incoming CPU never needs to report a quiescent state.
> First, the outgoing CPU reports a quiescent state if needed. Second,
> the race where the CPU is leaving just as RCU is initializing a new
> grace period is handled by an explicit check for this condition. Third,
> the CPU's leaf rcu_node structure's ->lock serializes these checks.
>
> This means that if rcu_cpu_starting() ever feels the need to report
> a quiescent state, then there is a bug somewhere in the CPU hotplug
> code or the RCU grace-period handling code. This commit therefore
> adds a WARN_ON_ONCE() to bring that bug to everyone's attention.
>
> ------------------------------------------------------------------------
>
> > Cc: Paul E. McKenney <[email protected]>
> > Cc: Neeraj Upadhyay <[email protected]>
> > Suggested-by: Paul E. McKenney <[email protected]>
> > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> >
> > ---
> > kernel/rcu/tree.c | 14 +++++++++++++-
> > 1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 65e1b5e92319..1e51962b565b 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3996,7 +3996,19 @@ void rcu_cpu_starting(unsigned int cpu)
> > rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */
> > rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
> > rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> > - if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> > +
> > + /*
> > + * Delete QS reporting from here, by June 2021, if warning does not
> > + * fire. Let us make the rules for reporting QS for offline CPUs
> > + * more explicit. The CPU onlining path does not need to report QS for
> > + * an offline CPU. Either the QS should have been reported during CPU
> > + * offlining, or during rcu_gp_init() if it detected a race with either
> > + * CPU offlining or task unblocking on previously offlined CPUs. Note
> > + * that the FQS loop also does not report QS for an offline CPU any
> > + * longer (unless it splats due to an offline CPU blocking the GP for
> > + * too long).
> > + */
>
> Let's leave at least the WARN_ON_ONCE() indefinitely. If you don't
> believe me, remove this code in your local tree, have someone give you
> several branches, some with bugs injected, and then try to figure out
> which have the bugs and then try to find those bugs.
>
> This is not a fastpath, so the overhead of the check is not a concern.
> Believe me, the difficulty of bug location without this check is a very
> real concern! ;-)
>
> On the other hand, I fully agree with the benefits of documenting the
> design rules. But is this really the best place to do that from the
> viewpoint of someone who is trying to figure out how RCU works?
I can move this comment to the "Hotplug CPU" section in
Documentation/RCU/Design/Requirements/Requirements.rst
And I could change the comment here to:
/*
* Delete QS reporting from here, by June 2021, if the warning does not
* fire. Leave the warning indefinitely. See the RCU design
* requirements in Documentation/RCU/ for the CPU hotplug rules.
*/
I will post my v3 with changes to the requirements document.
Let me know any other comments, thanks,
- Joel
On Thu, Jul 30, 2020 at 09:42:22PM -0400, Joel Fernandes wrote:
> On Thu, Jul 30, 2020 at 09:21:59AM -0700, Paul E. McKenney wrote:
> > On Wed, Jul 29, 2020 at 11:02:20PM -0400, Joel Fernandes (Google) wrote:
> > > Add a warning if the CPU being onlined did not already report a QS. This
> > > is to simplify the code in the CPU onlining path and also to make it
> > > clear where the QS is reported. The act of reporting a QS in the CPU
> > > onlining path is likely unnecessary, as shown by code reading and by
> > > testing with rcutorture's TREE03 and hotplug parameters.
> >
> > How about something like this for the commit log?
> >
> > ------------------------------------------------------------------------
> >
> > Currently, rcu_cpu_starting() checks to see if the RCU core expects a
> > quiescent state from the incoming CPU. However, the current interaction
> > between RCU quiescent-state reporting and CPU-hotplug operations should
> > mean that the incoming CPU never needs to report a quiescent state.
> > First, the outgoing CPU reports a quiescent state if needed. Second,
> > the race where the CPU is leaving just as RCU is initializing a new
> > grace period is handled by an explicit check for this condition. Third,
> > the CPU's leaf rcu_node structure's ->lock serializes these checks.
> >
> > This means that if rcu_cpu_starting() ever feels the need to report
> > a quiescent state, then there is a bug somewhere in the CPU hotplug
> > code or the RCU grace-period handling code. This commit therefore
> > adds a WARN_ON_ONCE() to bring that bug to everyone's attention.
> >
> > ------------------------------------------------------------------------
> >
> > > Cc: Paul E. McKenney <[email protected]>
> > > Cc: Neeraj Upadhyay <[email protected]>
> > > Suggested-by: Paul E. McKenney <[email protected]>
> > > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > >
> > > ---
> > > kernel/rcu/tree.c | 14 +++++++++++++-
> > > 1 file changed, 13 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 65e1b5e92319..1e51962b565b 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3996,7 +3996,19 @@ void rcu_cpu_starting(unsigned int cpu)
> > > rcu_gpnum_ovf(rnp, rdp); /* Offline-induced counter wrap? */
> > > rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
> > > rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> > > - if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> > > +
> > > + /*
> > > + * Delete QS reporting from here, by June 2021, if warning does not
> > > + * fire. Let us make the rules for reporting QS for offline CPUs
> > > + * more explicit. The CPU onlining path does not need to report QS for
> > > + * an offline CPU. Either the QS should have been reported during CPU
> > > + * offlining, or during rcu_gp_init() if it detected a race with either
> > > + * CPU offlining or task unblocking on previously offlined CPUs. Note
> > > + * that the FQS loop also does not report QS for an offline CPU any
> > > + * longer (unless it splats due to an offline CPU blocking the GP for
> > > + * too long).
> > > + */
> >
> > Let's leave at least the WARN_ON_ONCE() indefinitely. If you don't
> > believe me, remove this code in your local tree, have someone give you
> > several branches, some with bugs injected, and then try to figure out
> > which have the bugs and then try to find those bugs.
> >
> > This is not a fastpath, so the overhead of the check is not a concern.
> > Believe me, the difficulty of bug location without this check is a very
> > real concern! ;-)
> >
> > On the other hand, I fully agree with the benefits of documenting the
> > design rules. But is this really the best place to do that from the
> > viewpoint of someone who is trying to figure out how RCU works?
>
> I can move this comment to the "Hotplug CPU" section in
> Documentation/RCU/Design/Requirements/Requirements.rst
That sounds like a better place for it, very good!
> And I could change the comment here to:
> /*
> * Delete QS reporting from here, by June 2021, if the warning does not
> * fire. Leave the warning indefinitely. See the RCU design
> * requirements in Documentation/RCU/ for the CPU hotplug rules.
> */
Rather than decide for our future selves, could we please just suggest
reviewing this in June 2021? Or, given enterprise distro schedules,
2024. :-/
Thanx, Paul
> I will post my v3 with changes to the requirements document.
>
> Let me know any other comments, thanks,
>
> - Joel
>
Hi Paul,
On Thu, Jul 30, 2020 at 08:48:25PM -0700, Paul E. McKenney wrote:
[...]
> > And I could change the comment here to:
> > /*
> > * Delete QS reporting from here, by June 2021, if the warning does not
> > * fire. Leave the warning indefinitely. See the RCU design
> > * requirements in Documentation/RCU/ for the CPU hotplug rules.
> > */
>
> Rather than decide for our future selves, could we please just suggest
> reviewing this in June 2021? Or, given enterprise distro schedules,
> 2024. :-/
I am replacing it with the following; let me know if there are any objections, thanks:
+ * XXX: The following rcu_report_qs_rnp() is redundant. If the below
+ * warning does not fire, consider replacing it with the "else" block,
+ * by June 2021 or so. The rationale for this is as follows: The CPU
+ * onlining path does not need to report QS for an offline CPU. Either
+ * the QS should have been reported during CPU offlining, or during
+ * rcu_gp_init() if it detected a race with either CPU offlining or
+ * task unblocking on previously offlined CPUs. To avoid deadlocks
+ * related to waiting on timers or cpu hotplug locks, only those paths
+ * do the QS reporting for offline CPUs.
thanks,
- Joel
>
> Thanx, Paul
>
> > I will post my v3 with changes to the requirements document.
> >
> > Let me know any other comments, thanks,
> >
> > - Joel
> >
On Fri, Aug 07, 2020 at 11:37:32AM -0400, Joel Fernandes wrote:
> Hi Paul,
>
> On Thu, Jul 30, 2020 at 08:48:25PM -0700, Paul E. McKenney wrote:
> [...]
> > > And I could change the comment here to:
> > > /*
> > > * Delete QS reporting from here, by June 2021, if the warning does not
> > > * fire. Leave the warning indefinitely. See the RCU design
> > > * requirements in Documentation/RCU/ for the CPU hotplug rules.
> > > */
> >
> > Rather than decide for our future selves, could we please just suggest
> > reviewing this in June 2021? Or, given enterprise distro schedules,
> > 2024. :-/
>
> I am replacing it with the following; let me know if there are any objections, thanks:
>
> + * XXX: The following rcu_report_qs_rnp() is redundant. If the below
> + * warning does not fire, consider replacing it with the "else" block,
> + * by June 2021 or so. The rationale for this is as follows: The CPU
> + * onlining path does not need to report QS for an offline CPU. Either
> + * the QS should have been reported during CPU offlining, or during
> + * rcu_gp_init() if it detected a race with either CPU offlining or
> + * task unblocking on previously offlined CPUs. To avoid deadlocks
> + * related to waiting on timers or cpu hotplug locks, only those paths
> + * do the QS reporting for offline CPUs.
And you did mention you still want the warn-on indefinitely, so I'll document
that in the comment as well.
Now it looks like:
/*
* XXX: The following rcu_report_qs_rnp() is redundant. If the below
* warning does not fire, consider replacing it with the "else" block,
* by June 2021 or so (while keeping the warning). The rationale for
* this is as follows: The CPU onlining path does not need to report QS
* for an offline CPU. Either the QS should have been reported during CPU
* offlining, or during rcu_gp_init() if it detected a race with either
* CPU offlining or task unblocking on a node with all of its CPUs
* previously offlined. To avoid deadlocks related to waiting on
* timers or cpu hotplug locks, only these paths do the QS reporting
* for offline CPUs, making the following reporting redundant.
*/
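For concreteness, a rough sketch of the post-June-2021 form that the
comment contemplates, assuming the surrounding locking in
rcu_cpu_starting() stays as it is today (a sketch, not a patch):

	/* Sketch only: keep the warning, drop the redundant reporting. */
	WARN_ON_ONCE(rnp->qsmask & mask); /* RCU should not be waiting on us. */
	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);

That is, keep the warning indefinitely but replace the conditional QS
reporting with the unconditional unlock from today's "else" block.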
thanks,
- Joel
>
> thanks,
>
> - Joel
>
> >
> > Thanx, Paul
> >
> > > I will post my v3 with changes to the requirements document.
> > >
> > > Let me know any other comments, thanks,
> > >
> > > - Joel
> > >