2020-02-15 00:29:55

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/rcu 0/4] SRCU updates for v5.7

Hello!

This series contains SRCU updates.

1. Fix __call_srcu()/process_srcu() datarace.

2. Fix __call_srcu()/srcu_get_delay() datarace.

3. Fix process_srcu()/srcu_batches_completed() datarace.

4. Add READ_ONCE() to srcu_struct ->srcu_gp_seq load.

Thanx, Paul

------------------------------------------------------------------------

srcutree.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)


2020-02-15 00:30:03

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/rcu 2/4] srcu: Fix __call_srcu()/srcu_get_delay() datarace

From: "Paul E. McKenney" <[email protected]>

The srcu_struct structure's ->srcu_gp_seq_needed_exp field is accessed
locklessly, so updates must use WRITE_ONCE(). This commit therefore
adds the needed WRITE_ONCE() invocations.

This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/srcutree.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index b1edac9..79848f7d 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -534,7 +534,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
rcu_seq_end(&ssp->srcu_gp_seq);
gpseq = rcu_seq_current(&ssp->srcu_gp_seq);
if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, gpseq))
- ssp->srcu_gp_seq_needed_exp = gpseq;
+ WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, gpseq);
spin_unlock_irq_rcu_node(ssp);
mutex_unlock(&ssp->srcu_gp_mutex);
/* A new grace period can start at this point. But only one. */
@@ -614,7 +614,7 @@ static void srcu_funnel_exp_start(struct srcu_struct *ssp, struct srcu_node *snp
}
spin_lock_irqsave_rcu_node(ssp, flags);
if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, s))
- ssp->srcu_gp_seq_needed_exp = s;
+ WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, s);
spin_unlock_irqrestore_rcu_node(ssp, flags);
}

@@ -674,7 +674,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
smp_store_release(&ssp->srcu_gp_seq_needed, s); /*^^^*/
}
if (!do_norm && ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, s))
- ssp->srcu_gp_seq_needed_exp = s;
+ WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, s);

/* If grace period not already done and none in progress, start it. */
if (!rcu_seq_done(&ssp->srcu_gp_seq, s) &&
--
2.9.5

2020-02-15 00:30:35

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/rcu 4/4] srcu: Add READ_ONCE() to srcu_struct ->srcu_gp_seq load

From: "Paul E. McKenney" <[email protected]>

The load of the srcu_struct structure's ->srcu_gp_seq field in
srcu_funnel_gp_start() is lockless, so this commit adds the requisite
READ_ONCE().

This data race was reported by KCSAN.

Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/srcutree.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 119a373..90ab475 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -678,7 +678,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,

/* If grace period not already done and none in progress, start it. */
if (!rcu_seq_done(&ssp->srcu_gp_seq, s) &&
- rcu_seq_state(ssp->srcu_gp_seq) == SRCU_STATE_IDLE) {
+ rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) == SRCU_STATE_IDLE) {
WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed));
srcu_gp_start(ssp);
if (likely(srcu_init_done))
--
2.9.5

2020-02-15 00:30:56

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/rcu 1/4] srcu: Fix __call_srcu()/process_srcu() datarace

From: "Paul E. McKenney" <[email protected]>

The srcu_node structure's ->srcu_gp_seq_needed_exp field is accessed
locklessly, so updates must use WRITE_ONCE(). This commit therefore
adds the needed WRITE_ONCE() invocations.

This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/srcutree.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 657e6a7..b1edac9 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -550,7 +550,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
snp->srcu_have_cbs[idx] = gpseq;
rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
if (ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, gpseq))
- snp->srcu_gp_seq_needed_exp = gpseq;
+ WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
mask = snp->srcu_data_have_cbs[idx];
snp->srcu_data_have_cbs[idx] = 0;
spin_unlock_irq_rcu_node(snp);
@@ -660,7 +660,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
if (snp == sdp->mynode)
snp->srcu_data_have_cbs[idx] |= sdp->grpmask;
if (!do_norm && ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, s))
- snp->srcu_gp_seq_needed_exp = s;
+ WRITE_ONCE(snp->srcu_gp_seq_needed_exp, s);
spin_unlock_irqrestore_rcu_node(snp, flags);
}

--
2.9.5

2020-02-15 00:31:11

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/rcu 3/4] srcu: Fix process_srcu()/srcu_batches_completed() datarace

From: "Paul E. McKenney" <[email protected]>

The srcu_struct structure's ->srcu_idx field is accessed locklessly,
so reads must use READ_ONCE(). This commit therefore adds the needed
READ_ONCE() invocation where it was missed.

This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/srcutree.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 79848f7d..119a373 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1079,7 +1079,7 @@ EXPORT_SYMBOL_GPL(srcu_barrier);
*/
unsigned long srcu_batches_completed(struct srcu_struct *ssp)
{
- return ssp->srcu_idx;
+ return READ_ONCE(ssp->srcu_idx);
}
EXPORT_SYMBOL_GPL(srcu_batches_completed);

--
2.9.5

2020-02-17 13:03:35

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 1/4] srcu: Fix __call_srcu()/process_srcu() datarace

On Fri, Feb 14, 2020 at 04:29:29PM -0800, [email protected] wrote:
> From: "Paul E. McKenney" <[email protected]>
>
> The srcu_node structure's ->srcu_gp_seq_needed_exp field is accessed
> locklessly, so updates must use WRITE_ONCE(). This commit therefore
> adds the needed WRITE_ONCE() invocations.
>
> This data race was reported by KCSAN. Not appropriate for backporting
> due to failure being unlikely.

This does indeed look like there can be a failure; but this Changelog
fails to describe an actual problem.

> Signed-off-by: Paul E. McKenney <[email protected]>
> ---
> kernel/rcu/srcutree.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 657e6a7..b1edac9 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -550,7 +550,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
> snp->srcu_have_cbs[idx] = gpseq;
> rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
> if (ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, gpseq))
> - snp->srcu_gp_seq_needed_exp = gpseq;
> + WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
> mask = snp->srcu_data_have_cbs[idx];
> snp->srcu_data_have_cbs[idx] = 0;
> spin_unlock_irq_rcu_node(snp);
> @@ -660,7 +660,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
> if (snp == sdp->mynode)
> snp->srcu_data_have_cbs[idx] |= sdp->grpmask;
> if (!do_norm && ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, s))
> - snp->srcu_gp_seq_needed_exp = s;
> + WRITE_ONCE(snp->srcu_gp_seq_needed_exp, s);
> spin_unlock_irqrestore_rcu_node(snp, flags);
> }
>
> --
> 2.9.5
>

2020-02-17 13:03:58

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 4/4] srcu: Add READ_ONCE() to srcu_struct ->srcu_gp_seq load

On Fri, Feb 14, 2020 at 04:29:32PM -0800, [email protected] wrote:
> From: "Paul E. McKenney" <[email protected]>
>
> The load of the srcu_struct structure's ->srcu_gp_seq field in
> srcu_funnel_gp_start() is lockless, so this commit adds the requisite
> READ_ONCE().
>
> This data race was reported by KCSAN.

But is there in actual fact a data-race? AFAICT this code was just fine.

> Signed-off-by: Paul E. McKenney <[email protected]>
> ---
> kernel/rcu/srcutree.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 119a373..90ab475 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -678,7 +678,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
>
> /* If grace period not already done and none in progress, start it. */
> if (!rcu_seq_done(&ssp->srcu_gp_seq, s) &&
> - rcu_seq_state(ssp->srcu_gp_seq) == SRCU_STATE_IDLE) {
> + rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) == SRCU_STATE_IDLE) {
> WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed));
> srcu_gp_start(ssp);
> if (likely(srcu_init_done))
> --
> 2.9.5
>

2020-02-17 17:09:42

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 1/4] srcu: Fix __call_srcu()/process_srcu() datarace

On Mon, Feb 17, 2020 at 01:42:31PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 14, 2020 at 04:29:29PM -0800, [email protected] wrote:
> > From: "Paul E. McKenney" <[email protected]>
> >
> > The srcu_node structure's ->srcu_gp_seq_needed_exp field is accessed
> > locklessly, so updates must use WRITE_ONCE(). This commit therefore
> > adds the needed WRITE_ONCE() invocations.
> >
> > This data race was reported by KCSAN. Not appropriate for backporting
> > due to failure being unlikely.
>
> This does indeed look like there can be a failure; but this Changelog
> fails to describe an actual problem.

Peter it sounds like you have a failure scenario in mind. Could you describe
more if so?

I am curious if you were thinking of invented-stores issue here.

For educational purposes, I was trying to come up with an example where my
compiler does something bad to code without WRITE_ONCE(). So far I only can
reproduce a write-tearing example when write with an immediate value is split
into 2 writes, like Will mentioned:
http://lore.kernel.org/r/20190821103200.kpufwtviqhpbuv2n@willie-the-truck
But that does not seem to apply to this code.

thanks,

- Joel


> > Signed-off-by: Paul E. McKenney <[email protected]>
> > ---
> > kernel/rcu/srcutree.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 657e6a7..b1edac9 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -550,7 +550,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
> > snp->srcu_have_cbs[idx] = gpseq;
> > rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
> > if (ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, gpseq))
> > - snp->srcu_gp_seq_needed_exp = gpseq;
> > + WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
> > mask = snp->srcu_data_have_cbs[idx];
> > snp->srcu_data_have_cbs[idx] = 0;
> > spin_unlock_irq_rcu_node(snp);
> > @@ -660,7 +660,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
> > if (snp == sdp->mynode)
> > snp->srcu_data_have_cbs[idx] |= sdp->grpmask;
> > if (!do_norm && ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, s))
> > - snp->srcu_gp_seq_needed_exp = s;
> > + WRITE_ONCE(snp->srcu_gp_seq_needed_exp, s);
> > spin_unlock_irqrestore_rcu_node(snp, flags);
> > }
> >
> > --
> > 2.9.5
> >

2020-02-17 17:13:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 1/4] srcu: Fix __call_srcu()/process_srcu() datarace

On Mon, Feb 17, 2020 at 12:01:57PM -0500, Joel Fernandes wrote:

> Peter it sounds like you have a failure scenario in mind. Could you describe
> more if so?
>
> I am curious if you were thinking of invented-stores issue here.
>
> For educational purposes, I was trying to come up with an example where my
> compiler does something bad to code without WRITE_ONCE(). So far I only can
> reproduce a write-tearing example when write with an immediate value is split
> into 2 writes, like Will mentioned:
> http://lore.kernel.org/r/20190821103200.kpufwtviqhpbuv2n@willie-the-truck
> But that does not seem to apply to this code.

> > > - snp->srcu_gp_seq_needed_exp = gpseq;
> > > + WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);

Yeah, store tearing. No sane compiler will actually do that, but it is
allowed to do random permutations of byte stores just to fuck with us.

WRITE_ONCE() disallows that.

In that case, the READ_ONCE()s could observe garbage and the compare
might accidentally report the wrong thing.

2020-02-17 18:22:47

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 1/4] srcu: Fix __call_srcu()/process_srcu() datarace

On Mon, Feb 17, 2020 at 06:11:04PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 17, 2020 at 12:01:57PM -0500, Joel Fernandes wrote:
>
> > Peter it sounds like you have a failure scenario in mind. Could you describe
> > more if so?
> >
> > I am curious if you were thinking of invented-stores issue here.
> >
> > For educational purposes, I was trying to come up with an example where my
> > compiler does something bad to code without WRITE_ONCE(). So far I only can
> > reproduce a write-tearing example when write with an immediate value is split
> > into 2 writes, like Will mentioned:
> > http://lore.kernel.org/r/20190821103200.kpufwtviqhpbuv2n@willie-the-truck
> > But that does not seem to apply to this code.
>
> > > > - snp->srcu_gp_seq_needed_exp = gpseq;
> > > > + WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
>
> Yeah, store tearing. No sane compiler will actually do that, but it is
> allowed to do random permutations of byte stores just to fuck with us.
>
> WRITE_ONCE() disallows that.
>
> In that case, the READ_ONCE()s could observe garbage and the compare
> might accidentally report the wrong thing.

Oh ok, I understand what you mean now. Thank you for clarification!

thanks,

- Joel

2020-02-17 18:29:36

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 1/4] srcu: Fix __call_srcu()/process_srcu() datarace

On Mon, Feb 17, 2020 at 01:42:31PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 14, 2020 at 04:29:29PM -0800, [email protected] wrote:
> > From: "Paul E. McKenney" <[email protected]>
> >
> > The srcu_node structure's ->srcu_gp_seq_needed_exp field is accessed
> > locklessly, so updates must use WRITE_ONCE(). This commit therefore
> > adds the needed WRITE_ONCE() invocations.
> >
> > This data race was reported by KCSAN. Not appropriate for backporting
> > due to failure being unlikely.
>
> This does indeed look like there can be a failure; but this Changelog
> fails to describe an actual problem.

As before, within RCU, the mere fact of a KCSAN report motivates a change.
I am not going to waste time brainstorming overly creative compiler
optimizations, present or future.

Thanx, Paul

> > Signed-off-by: Paul E. McKenney <[email protected]>
> > ---
> > kernel/rcu/srcutree.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 657e6a7..b1edac9 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -550,7 +550,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
> > snp->srcu_have_cbs[idx] = gpseq;
> > rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
> > if (ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, gpseq))
> > - snp->srcu_gp_seq_needed_exp = gpseq;
> > + WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
> > mask = snp->srcu_data_have_cbs[idx];
> > snp->srcu_data_have_cbs[idx] = 0;
> > spin_unlock_irq_rcu_node(snp);
> > @@ -660,7 +660,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
> > if (snp == sdp->mynode)
> > snp->srcu_data_have_cbs[idx] |= sdp->grpmask;
> > if (!do_norm && ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, s))
> > - snp->srcu_gp_seq_needed_exp = s;
> > + WRITE_ONCE(snp->srcu_gp_seq_needed_exp, s);
> > spin_unlock_irqrestore_rcu_node(snp, flags);
> > }
> >
> > --
> > 2.9.5
> >

2020-02-17 18:32:58

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 4/4] srcu: Add READ_ONCE() to srcu_struct ->srcu_gp_seq load

On Mon, Feb 17, 2020 at 01:45:07PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 14, 2020 at 04:29:32PM -0800, [email protected] wrote:
> > From: "Paul E. McKenney" <[email protected]>
> >
> > The load of the srcu_struct structure's ->srcu_gp_seq field in
> > srcu_funnel_gp_start() is lockless, so this commit adds the requisite
> > READ_ONCE().
> >
> > This data race was reported by KCSAN.
>
> But is there in actual fact a data-race? AFAICT this code was just fine.

Now that you mention it, the lock is held at that point, isn't it? So if
that READ_ONCE() actually does anything, there is a bug somewhere else.

Good catch, I will drop this patch, thank you!

Thanx, Paul

> > Signed-off-by: Paul E. McKenney <[email protected]>
> > ---
> > kernel/rcu/srcutree.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 119a373..90ab475 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -678,7 +678,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
> >
> > /* If grace period not already done and none in progress, start it. */
> > if (!rcu_seq_done(&ssp->srcu_gp_seq, s) &&
> > - rcu_seq_state(ssp->srcu_gp_seq) == SRCU_STATE_IDLE) {
> > + rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) == SRCU_STATE_IDLE) {
> > WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed));
> > srcu_gp_start(ssp);
> > if (likely(srcu_init_done))
> > --
> > 2.9.5
> >

2020-02-17 23:07:19

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 4/4] srcu: Add READ_ONCE() to srcu_struct ->srcu_gp_seq load

On Mon, Feb 17, 2020 at 10:32:20AM -0800, Paul E. McKenney wrote:
> On Mon, Feb 17, 2020 at 01:45:07PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 14, 2020 at 04:29:32PM -0800, [email protected] wrote:
> > > From: "Paul E. McKenney" <[email protected]>
> > >
> > > The load of the srcu_struct structure's ->srcu_gp_seq field in
> > > srcu_funnel_gp_start() is lockless, so this commit adds the requisite
> > > READ_ONCE().
> > >
> > > This data race was reported by KCSAN.
> >
> > But is there in actual fact a data-race? AFAICT this code was just fine.
>
> Now that you mention it, the lock is held at that point, isn't it? So if
> that READ_ONCE() actually does anything, there is a bug somewhere else.
>
> Good catch, I will drop this patch, thank you!

But looking more closely, I saw a lockless update to that field. Which
can be argued to be sort of OK, but it definitely was not the intent.
So please see below for the updated patch.

Thanx, Paul

------------------------------------------------------------------------

commit 52324a7b8a025f47a1a1a9fbd23ffe59fa764764
Author: Paul E. McKenney <[email protected]>
Date: Fri Jan 3 11:42:05 2020 -0800

srcu: Hold srcu_struct ->lock when updating ->srcu_gp_seq

A read of the srcu_struct structure's ->srcu_gp_seq field should not
need READ_ONCE() when that structure's ->lock is held. Except that this
lock is not always held when updating this field. This commit therefore
acquires the lock around updates and removes a now-unneeded READ_ONCE().

This data race was reported by KCSAN.

Signed-off-by: Paul E. McKenney <[email protected]>
[ paulmck: Switch from READ_ONCE() to lock per Peter Zilstra question. ]

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 119a373..c19c1df 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -450,7 +450,7 @@ static void srcu_gp_start(struct srcu_struct *ssp)
spin_unlock_rcu_node(sdp); /* Interrupts remain disabled. */
smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
rcu_seq_start(&ssp->srcu_gp_seq);
- state = rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq));
+ state = rcu_seq_state(ssp->srcu_gp_seq);
WARN_ON_ONCE(state != SRCU_STATE_SCAN1);
}

@@ -1130,7 +1130,9 @@ static void srcu_advance_state(struct srcu_struct *ssp)
return; /* readers present, retry later. */
}
srcu_flip(ssp);
+ spin_lock_irq_rcu_node(ssp);
rcu_seq_set_state(&ssp->srcu_gp_seq, SRCU_STATE_SCAN2);
+ spin_unlock_irq_rcu_node(ssp);
}

if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) == SRCU_STATE_SCAN2) {

2020-02-18 11:45:17

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 4/4] srcu: Add READ_ONCE() to srcu_struct ->srcu_gp_seq load

On Mon, Feb 17, 2020 at 10:32:20AM -0800, Paul E. McKenney wrote:
> On Mon, Feb 17, 2020 at 01:45:07PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 14, 2020 at 04:29:32PM -0800, [email protected] wrote:
> > > From: "Paul E. McKenney" <[email protected]>
> > >
> > > The load of the srcu_struct structure's ->srcu_gp_seq field in
> > > srcu_funnel_gp_start() is lockless, so this commit adds the requisite
> > > READ_ONCE().
> > >
> > > This data race was reported by KCSAN.
> >
> > But is there in actual fact a data-race? AFAICT this code was just fine.
>
> Now that you mention it, the lock is held at that point, isn't it? So if
> that READ_ONCE() actually does anything, there is a bug somewhere else.
>
> Good catch, I will drop this patch, thank you!

Well, I didn't get further than the Changelog fails to describe an
actual problem and it looks like compare-against-a-constant.

(worse, it masks off everything but the 2 lowest bits, so even if there
was a problem with load-tearing, it still wouldn't matter)

I'm not going to argue with you if you want to use READ_ONCE() vs
data_race() and a comment to denote false-positive KCSAN warnings, but I
do feel somewhat strongly that the Changelog should describe the actual
problem -- if there is one -- or just flat out state that this is to
make KCSAN shut up but the code is fine.

That is; every KCSAN report should be analysed, right? All I'm asking is
for that analysis to end up in the Changelog.

> > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > ---
> > > kernel/rcu/srcutree.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > > index 119a373..90ab475 100644
> > > --- a/kernel/rcu/srcutree.c
> > > +++ b/kernel/rcu/srcutree.c
> > > @@ -678,7 +678,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
> > >
> > > /* If grace period not already done and none in progress, start it. */
> > > if (!rcu_seq_done(&ssp->srcu_gp_seq, s) &&
> > > - rcu_seq_state(ssp->srcu_gp_seq) == SRCU_STATE_IDLE) {
> > > + rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) == SRCU_STATE_IDLE) {
> > > WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed));
> > > srcu_gp_start(ssp);
> > > if (likely(srcu_init_done))
> > > --
> > > 2.9.5
> > >

2020-02-18 11:48:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 4/4] srcu: Add READ_ONCE() to srcu_struct ->srcu_gp_seq load

On Mon, Feb 17, 2020 at 03:06:57PM -0800, Paul E. McKenney wrote:
> commit 52324a7b8a025f47a1a1a9fbd23ffe59fa764764
> Author: Paul E. McKenney <[email protected]>
> Date: Fri Jan 3 11:42:05 2020 -0800
>
> srcu: Hold srcu_struct ->lock when updating ->srcu_gp_seq
>
> A read of the srcu_struct structure's ->srcu_gp_seq field should not
> need READ_ONCE() when that structure's ->lock is held. Except that this
> lock is not always held when updating this field. This commit therefore
> acquires the lock around updates and removes a now-unneeded READ_ONCE().
>
> This data race was reported by KCSAN.
>
> Signed-off-by: Paul E. McKenney <[email protected]>

Acked-by: Peter Zijlstra (Intel) <[email protected]>

>
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 119a373..c19c1df 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -450,7 +450,7 @@ static void srcu_gp_start(struct srcu_struct *ssp)
> spin_unlock_rcu_node(sdp); /* Interrupts remain disabled. */
> smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
> rcu_seq_start(&ssp->srcu_gp_seq);
> - state = rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq));
> + state = rcu_seq_state(ssp->srcu_gp_seq);
> WARN_ON_ONCE(state != SRCU_STATE_SCAN1);
> }
>
> @@ -1130,7 +1130,9 @@ static void srcu_advance_state(struct srcu_struct *ssp)
> return; /* readers present, retry later. */
> }
> srcu_flip(ssp);
> + spin_lock_irq_rcu_node(ssp);
> rcu_seq_set_state(&ssp->srcu_gp_seq, SRCU_STATE_SCAN2);
> + spin_unlock_irq_rcu_node(ssp);
> }
>
> if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) == SRCU_STATE_SCAN2) {

2020-02-18 16:35:45

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 4/4] srcu: Add READ_ONCE() to srcu_struct ->srcu_gp_seq load

On Tue, Feb 18, 2020 at 12:43:34PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 17, 2020 at 10:32:20AM -0800, Paul E. McKenney wrote:
> > On Mon, Feb 17, 2020 at 01:45:07PM +0100, Peter Zijlstra wrote:
> > > On Fri, Feb 14, 2020 at 04:29:32PM -0800, [email protected] wrote:
> > > > From: "Paul E. McKenney" <[email protected]>
> > > >
> > > > The load of the srcu_struct structure's ->srcu_gp_seq field in
> > > > srcu_funnel_gp_start() is lockless, so this commit adds the requisite
> > > > READ_ONCE().
> > > >
> > > > This data race was reported by KCSAN.
> > >
> > > But is there in actual fact a data-race? AFAICT this code was just fine.
> >
> > Now that you mention it, the lock is held at that point, isn't it? So if
> > that READ_ONCE() actually does anything, there is a bug somewhere else.
> >
> > Good catch, I will drop this patch, thank you!
>
> Well, I didn't get further than the Changelog fails to describe an
> actual problem and it looks like compare-against-a-constant.
>
> (worse, it masks off everything but the 2 lowest bits, so even if there
> was a problem with load-tearing, it still wouldn't matter)

There is still the possibility of load fusing. And the possibility
of defending against possible future changes as well as the current
snapshot of the code base.

> I'm not going to argue with you if you want to use READ_ONCE() vs
> data_race() and a comment to denote false-positive KCSAN warnings, but I
> do feel somewhat strongly that the Changelog should describe the actual
> problem -- if there is one -- or just flat out state that this is to
> make KCSAN shut up but the code is fine.

The problem is that "the code is fine" is highly subjective and varies
over time. :-/

But in this case there was a real problem, just that I got confused
when analyzing.

> That is; every KCSAN report should be analysed, right? All I'm asking is
> for that analysis to end up in the Changelog.

Before responding further, I have to ask...

Are you intending your "every KCSAN report should be analyzed" to apply
globally or just when someone creates a patch based on such a report?

In any case, you have acked this patch's successor (thank you very
much!), so on this specific patch (or more accurately, its successor)
I presume that we are all good.

Thanx, Paul

> > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > > ---
> > > > kernel/rcu/srcutree.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > > > index 119a373..90ab475 100644
> > > > --- a/kernel/rcu/srcutree.c
> > > > +++ b/kernel/rcu/srcutree.c
> > > > @@ -678,7 +678,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
> > > >
> > > > /* If grace period not already done and none in progress, start it. */
> > > > if (!rcu_seq_done(&ssp->srcu_gp_seq, s) &&
> > > > - rcu_seq_state(ssp->srcu_gp_seq) == SRCU_STATE_IDLE) {
> > > > + rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) == SRCU_STATE_IDLE) {
> > > > WARN_ON_ONCE(ULONG_CMP_GE(ssp->srcu_gp_seq, ssp->srcu_gp_seq_needed));
> > > > srcu_gp_start(ssp);
> > > > if (likely(srcu_init_done))
> > > > --
> > > > 2.9.5
> > > >

2020-02-18 20:28:21

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 4/4] srcu: Add READ_ONCE() to srcu_struct ->srcu_gp_seq load

On Tue, Feb 18, 2020 at 08:34:05AM -0800, Paul E. McKenney wrote:
> On Tue, Feb 18, 2020 at 12:43:34PM +0100, Peter Zijlstra wrote:

> > Well, I didn't get further than the Changelog fails to describe an
> > actual problem and it looks like compare-against-a-constant.
> >
> > (worse, it masks off everything but the 2 lowest bits, so even if there
> > was a problem with load-tearing, it still wouldn't matter)
>
> There is still the possibility of load fusing.

Agreed; that can be an issue. But if so, that then needs to be stated.

> And the possibility
> of defending against possible future changes as well as the current
> snapshot of the code base.

Sure; and like I said, if you want to use READ_ONCE() I'm not going to
argue.

> > I'm not going to argue with you if you want to use READ_ONCE() vs
> > data_race() and a comment to denote false-positive KCSAN warnings, but I
> > do feel somewhat strongly that the Changelog should describe the actual
> > problem -- if there is one -- or just flat out state that this is to
> > make KCSAN shut up but the code is fine.
>
> The problem is that "the code is fine" is highly subjective and varies
> over time. :-/
>
> But in this case there was a real problem, just that I got confused
> when analyzing.

Shit happens :-)

> > That is; every KCSAN report should be analysed, right? All I'm asking is
> > for that analysis to end up in the Changelog.
>
> Before responding further, I have to ask...
>
> Are you intending your "every KCSAN report should be analyzed" to apply
> globally or just when someone creates a patch based on such a report?

Ideally every KCSAN report, but that is a longer term effort. But
specifically, when someone has written a patch, I expect that same
someone to have analysed the code. Then it also makes sense to put that
in the Changelog.

> In any case, you have acked this patch's successor (thank you very
> much!), so on this specific patch (or more accurately, its successor)
> I presume that we are all good.

We are. That new patch describes a clear problem and fixes it.

Anyway, the reaoson I'm being difficuly is partly because on the one
hand I'm just an annoying pendant at times, but also because I've seen
a bunch of, let's say, hasty, KCSAN patches.

2020-02-18 21:38:26

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 4/4] srcu: Add READ_ONCE() to srcu_struct ->srcu_gp_seq load

On Tue, Feb 18, 2020 at 09:26:44PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 18, 2020 at 08:34:05AM -0800, Paul E. McKenney wrote:
> > On Tue, Feb 18, 2020 at 12:43:34PM +0100, Peter Zijlstra wrote:
>
> > > Well, I didn't get further than the Changelog fails to describe an
> > > actual problem and it looks like compare-against-a-constant.
> > >
> > > (worse, it masks off everything but the 2 lowest bits, so even if there
> > > was a problem with load-tearing, it still wouldn't matter)
> >
> > There is still the possibility of load fusing.
>
> Agreed; that can be an issue. But if so, that then needs to be stated.
>
> > And the possibility
> > of defending against possible future changes as well as the current
> > snapshot of the code base.
>
> Sure; and like I said, if you want to use READ_ONCE() I'm not going to
> argue.
>
> > > I'm not going to argue with you if you want to use READ_ONCE() vs
> > > data_race() and a comment to denote false-positive KCSAN warnings, but I
> > > do feel somewhat strongly that the Changelog should describe the actual
> > > problem -- if there is one -- or just flat out state that this is to
> > > make KCSAN shut up but the code is fine.
> >
> > The problem is that "the code is fine" is highly subjective and varies
> > over time. :-/
> >
> > But in this case there was a real problem, just that I got confused
> > when analyzing.
>
> Shit happens :-)
>
> > > That is; every KCSAN report should be analysed, right? All I'm asking is
> > > for that analysis to end up in the Changelog.
> >
> > Before responding further, I have to ask...
> >
> > Are you intending your "every KCSAN report should be analyzed" to apply
> > globally or just when someone creates a patch based on such a report?
>
> Ideally every KCSAN report, but that is a longer term effort. But
> specifically, when someone has written a patch, I expect that same
> someone to have analysed the code. Then it also makes sense to put that
> in the Changelog.
>
> > In any case, you have acked this patch's successor (thank you very
> > much!), so on this specific patch (or more accurately, its successor)
> > I presume that we are all good.
>
> We are. That new patch describes a clear problem and fixes it.
>
> Anyway, the reaoson I'm being difficuly is partly because on the one
> hand I'm just an annoying pendant at times, but also because I've seen
> a bunch of, let's say, hasty, KCSAN patches.

Agreed. I briefly considered putting KCSAN for RCU on the newbie list,
but looking at a few of them put paid to that idea. Responding to them
properly does require a reasonably deep understanding of the code.

Thanx, Paul

2020-02-18 21:49:13

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/rcu 4/4] srcu: Add READ_ONCE() to srcu_struct ->srcu_gp_seq load

On Tue, Feb 18, 2020 at 12:46:31PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 17, 2020 at 03:06:57PM -0800, Paul E. McKenney wrote:
> > commit 52324a7b8a025f47a1a1a9fbd23ffe59fa764764
> > Author: Paul E. McKenney <[email protected]>
> > Date: Fri Jan 3 11:42:05 2020 -0800
> >
> > srcu: Hold srcu_struct ->lock when updating ->srcu_gp_seq
> >
> > A read of the srcu_struct structure's ->srcu_gp_seq field should not
> > need READ_ONCE() when that structure's ->lock is held. Except that this
> > lock is not always held when updating this field. This commit therefore
> > acquires the lock around updates and removes a now-unneeded READ_ONCE().
> >
> > This data race was reported by KCSAN.
> >
> > Signed-off-by: Paul E. McKenney <[email protected]>
>
> Acked-by: Peter Zijlstra (Intel) <[email protected]>

Applied, thank you!

Thanx, Paul

> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index 119a373..c19c1df 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -450,7 +450,7 @@ static void srcu_gp_start(struct srcu_struct *ssp)
> > spin_unlock_rcu_node(sdp); /* Interrupts remain disabled. */
> > smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
> > rcu_seq_start(&ssp->srcu_gp_seq);
> > - state = rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq));
> > + state = rcu_seq_state(ssp->srcu_gp_seq);
> > WARN_ON_ONCE(state != SRCU_STATE_SCAN1);
> > }
> >
> > @@ -1130,7 +1130,9 @@ static void srcu_advance_state(struct srcu_struct *ssp)
> > return; /* readers present, retry later. */
> > }
> > srcu_flip(ssp);
> > + spin_lock_irq_rcu_node(ssp);
> > rcu_seq_set_state(&ssp->srcu_gp_seq, SRCU_STATE_SCAN2);
> > + spin_unlock_irq_rcu_node(ssp);
> > }
> >
> > if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) == SRCU_STATE_SCAN2) {