2024-03-04 11:09:27

by linke li

Subject: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

Commit 202489101f2e ("rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race") made the following change to fix a data race:

{
int i;

- i = rp->rtort_pipe_count;
+ i = READ_ONCE(rp->rtort_pipe_count);
if (i > RCU_TORTURE_PIPE_LEN)
i = RCU_TORTURE_PIPE_LEN;
atomic_inc(&rcu_torture_wcount[i]);
- if (++rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
+ WRITE_ONCE(rp->rtort_pipe_count, i + 1);
+ if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
rp->rtort_mbtest = 0;
return true;
}

But ++rp->rtort_pipe_count is meant to increment rp->rtort_pipe_count by 1,
not to assign i + 1 to it, because rp->rtort_pipe_count may be written
concurrently by rcu_torture_writer().

Also, the read of rp->rtort_pipe_count on the next line should use
READ_ONCE() because of the data race.

Signed-off-by: linke li <[email protected]>
---
kernel/rcu/rcutorture.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 7567ca8e743c..00059ace4fd5 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -465,8 +465,8 @@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
if (i > RCU_TORTURE_PIPE_LEN)
i = RCU_TORTURE_PIPE_LEN;
atomic_inc(&rcu_torture_wcount[i]);
- WRITE_ONCE(rp->rtort_pipe_count, i + 1);
- if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
+ WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
+ if (READ_ONCE(rp->rtort_pipe_count) >= RCU_TORTURE_PIPE_LEN) {
rp->rtort_mbtest = 0;
return true;
}
--
2.39.3 (Apple Git-145)



2024-03-04 16:20:54

by Joel Fernandes

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug



On 3/4/2024 5:54 AM, linke li wrote:
> Some changes are done to fix a data race in commit 202489101f2e ("rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race")
>
> {
> int i;
>
> - i = rp->rtort_pipe_count;
> + i = READ_ONCE(rp->rtort_pipe_count);
> if (i > RCU_TORTURE_PIPE_LEN)
> i = RCU_TORTURE_PIPE_LEN;
> atomic_inc(&rcu_torture_wcount[i]);
> - if (++rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> + WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> + if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> rp->rtort_mbtest = 0;
> return true;
> }
>
> But ++rp->rtort_pipe_count is meant to add itself by 1, not give i+1 to
> rp->rtort_pipe_count, because rp->rtort_pipe_count may write by
> rcu_torture_writer() concurrently.
>
> Also, rp->rtort_pipe_count in the next line should be read using
> READ_ONCE() because of data race.
>
> Signed-off-by: linke li <[email protected]>
> ---
> kernel/rcu/rcutorture.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index 7567ca8e743c..00059ace4fd5 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -465,8 +465,8 @@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
> if (i > RCU_TORTURE_PIPE_LEN)
> i = RCU_TORTURE_PIPE_LEN;
> atomic_inc(&rcu_torture_wcount[i]);
> - WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> - if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> + WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
> + if (READ_ONCE(rp->rtort_pipe_count) >= RCU_TORTURE_PIPE_LEN) {

I want to say, I am not convinced by the patch, because what's wrong with
writing to an old index?

You win/lose the race anyway, say the CPU executed the WRITE_ONCE() a bit too
early/late and another WRITE_ONCE() lost/won, regardless of whether you wrote
the "incremented i" or "the increment from the latest value of pipe_count".

Anyway, a slightly related/different question:

Should that:
WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);

Be:
WRITE_ONCE(rp->rtort_pipe_count, READ_ONCE(rp->rtort_pipe_count) + 1);

?

thanks,

- Joel

2024-03-04 17:44:24

by Paul E. McKenney

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Mon, Mar 04, 2024 at 11:19:21AM -0500, Joel Fernandes wrote:
>
>
> On 3/4/2024 5:54 AM, linke li wrote:
> > Some changes are done to fix a data race in commit 202489101f2e ("rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race")
> >
> > {
> > int i;
> >
> > - i = rp->rtort_pipe_count;
> > + i = READ_ONCE(rp->rtort_pipe_count);
> > if (i > RCU_TORTURE_PIPE_LEN)
> > i = RCU_TORTURE_PIPE_LEN;
> > atomic_inc(&rcu_torture_wcount[i]);
> > - if (++rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> > + WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> > + if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> > rp->rtort_mbtest = 0;
> > return true;
> > }
> >
> > But ++rp->rtort_pipe_count is meant to add itself by 1, not give i+1 to
> > rp->rtort_pipe_count, because rp->rtort_pipe_count may write by
> > rcu_torture_writer() concurrently.
> >
> > Also, rp->rtort_pipe_count in the next line should be read using
> > READ_ONCE() because of data race.
> >
> > Signed-off-by: linke li <[email protected]>
> > ---
> > kernel/rcu/rcutorture.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > index 7567ca8e743c..00059ace4fd5 100644
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -465,8 +465,8 @@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
> > if (i > RCU_TORTURE_PIPE_LEN)
> > i = RCU_TORTURE_PIPE_LEN;
> > atomic_inc(&rcu_torture_wcount[i]);
> > - WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> > - if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> > + WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
> > + if (READ_ONCE(rp->rtort_pipe_count) >= RCU_TORTURE_PIPE_LEN) {
>
> I want to say, I am not convinced with the patch because what's wrong with
> writing to an old index?
>
> You win/lose the race anyway, say the CPU executed the WRITE_ONCE() a bit too
> early/late and another WRITE_ONCE() lost/won, regardless of whether you wrote
> the "incremented i" or "the increment from the latest value of pipe_count".
>
> Anyway, a slightly related/different question:
>
> Should that:
> WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
>
> Be:
> WRITE_ONCE(rp->rtort_pipe_count, READ_ONCE(rp->rtort_pipe_count) + 1);
>
> ?

Thank you both!

At first glance, I would argue for something like this:

------------------------------------------------------------------------

static bool
rcu_torture_pipe_update_one(struct rcu_torture *rp)
{
	int i;
	struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);

	if (rtrcp) {
		WRITE_ONCE(rp->rtort_chkp, NULL);
		smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
	}
	i = READ_ONCE(rp->rtort_pipe_count) + 1;
	if (i > RCU_TORTURE_PIPE_LEN)
		i = RCU_TORTURE_PIPE_LEN;
	atomic_inc(&rcu_torture_wcount[i]);
	WRITE_ONCE(rp->rtort_pipe_count, i);
	if (i >= RCU_TORTURE_PIPE_LEN) {
		rp->rtort_mbtest = 0;
		return true;
	}
	return false;
}

------------------------------------------------------------------------

That is, move the increment to the read and replace the re-read with
the value "i" that was just written.

Thoughts?

Thanx, Paul

2024-03-04 19:10:28

by Joel Fernandes

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug



On 3/4/2024 12:14 PM, Paul E. McKenney wrote:
> On Mon, Mar 04, 2024 at 11:19:21AM -0500, Joel Fernandes wrote:
>>
>>
>> On 3/4/2024 5:54 AM, linke li wrote:
>>> Some changes are done to fix a data race in commit 202489101f2e ("rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race")
>>>
>>> {
>>> int i;
>>>
>>> - i = rp->rtort_pipe_count;
>>> + i = READ_ONCE(rp->rtort_pipe_count);
>>> if (i > RCU_TORTURE_PIPE_LEN)
>>> i = RCU_TORTURE_PIPE_LEN;
>>> atomic_inc(&rcu_torture_wcount[i]);
>>> - if (++rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
>>> + WRITE_ONCE(rp->rtort_pipe_count, i + 1);
>>> + if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
>>> rp->rtort_mbtest = 0;
>>> return true;
>>> }
>>>
>>> But ++rp->rtort_pipe_count is meant to add itself by 1, not give i+1 to
>>> rp->rtort_pipe_count, because rp->rtort_pipe_count may write by
>>> rcu_torture_writer() concurrently.
>>>
>>> Also, rp->rtort_pipe_count in the next line should be read using
>>> READ_ONCE() because of data race.
>>>
>>> Signed-off-by: linke li <[email protected]>
>>> ---
>>> kernel/rcu/rcutorture.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
>>> index 7567ca8e743c..00059ace4fd5 100644
>>> --- a/kernel/rcu/rcutorture.c
>>> +++ b/kernel/rcu/rcutorture.c
>>> @@ -465,8 +465,8 @@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
>>> if (i > RCU_TORTURE_PIPE_LEN)
>>> i = RCU_TORTURE_PIPE_LEN;
>>> atomic_inc(&rcu_torture_wcount[i]);
>>> - WRITE_ONCE(rp->rtort_pipe_count, i + 1);
>>> - if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
>>> + WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
>>> + if (READ_ONCE(rp->rtort_pipe_count) >= RCU_TORTURE_PIPE_LEN) {
>>
>> I want to say, I am not convinced with the patch because what's wrong with
>> writing to an old index?
>>
>> You win/lose the race anyway, say the CPU executed the WRITE_ONCE() a bit too
>> early/late and another WRITE_ONCE() lost/won, regardless of whether you wrote
>> the "incremented i" or "the increment from the latest value of pipe_count".
>>
>> Anyway, a slightly related/different question:
>>
>> Should that:
>> WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
>>
>> Be:
>> WRITE_ONCE(rp->rtort_pipe_count, READ_ONCE(rp->rtort_pipe_count) + 1);
>>
>> ?
>
> Thank you both!
>
> At first glance, I would argue for something like this:
>
> ------------------------------------------------------------------------
>
> static bool
> rcu_torture_pipe_update_one(struct rcu_torture *rp)
> {
> int i;
> struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);
>
> if (rtrcp) {
> WRITE_ONCE(rp->rtort_chkp, NULL);
> smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
> }
> i = READ_ONCE(rp->rtort_pipe_count) + 1;
> if (i > RCU_TORTURE_PIPE_LEN)
> i = RCU_TORTURE_PIPE_LEN;
> atomic_inc(&rcu_torture_wcount[i]);
> WRITE_ONCE(rp->rtort_pipe_count, i);
> if (i >= RCU_TORTURE_PIPE_LEN) {
> rp->rtort_mbtest = 0;
> return true;
> }
> return false;
> }
>
> ------------------------------------------------------------------------
>
> That is, move the increment to the read and replace the re-read with
> the value "i" that was just written.

But that changes the original logic as well? It looks like with the above
change, you're now writing to rcu_torture_wcount[READ_ONCE(rp->rtort_pipe_count)
+ 1] instead of rcu_torture_wcount[READ_ONCE(rp->rtort_pipe_count)].

I think that might break rcutorture, because there is an increment outside of
the first 2 entries in rcu_torture_wcount but not sure (need to look more).

Thanks.

2024-03-04 19:44:21

by Paul E. McKenney

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Mon, Mar 04, 2024 at 02:10:09PM -0500, Joel Fernandes wrote:
>
>
> On 3/4/2024 12:14 PM, Paul E. McKenney wrote:
> > On Mon, Mar 04, 2024 at 11:19:21AM -0500, Joel Fernandes wrote:
> >>
> >>
> >> On 3/4/2024 5:54 AM, linke li wrote:
> >>> Some changes are done to fix a data race in commit 202489101f2e ("rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race")
> >>>
> >>> {
> >>> int i;
> >>>
> >>> - i = rp->rtort_pipe_count;
> >>> + i = READ_ONCE(rp->rtort_pipe_count);
> >>> if (i > RCU_TORTURE_PIPE_LEN)
> >>> i = RCU_TORTURE_PIPE_LEN;
> >>> atomic_inc(&rcu_torture_wcount[i]);
> >>> - if (++rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> >>> + WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> >>> + if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> >>> rp->rtort_mbtest = 0;
> >>> return true;
> >>> }
> >>>
> >>> But ++rp->rtort_pipe_count is meant to add itself by 1, not give i+1 to
> >>> rp->rtort_pipe_count, because rp->rtort_pipe_count may write by
> >>> rcu_torture_writer() concurrently.
> >>>
> >>> Also, rp->rtort_pipe_count in the next line should be read using
> >>> READ_ONCE() because of data race.
> >>>
> >>> Signed-off-by: linke li <[email protected]>
> >>> ---
> >>> kernel/rcu/rcutorture.c | 4 ++--
> >>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> >>> index 7567ca8e743c..00059ace4fd5 100644
> >>> --- a/kernel/rcu/rcutorture.c
> >>> +++ b/kernel/rcu/rcutorture.c
> >>> @@ -465,8 +465,8 @@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
> >>> if (i > RCU_TORTURE_PIPE_LEN)
> >>> i = RCU_TORTURE_PIPE_LEN;
> >>> atomic_inc(&rcu_torture_wcount[i]);
> >>> - WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> >>> - if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> >>> + WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
> >>> + if (READ_ONCE(rp->rtort_pipe_count) >= RCU_TORTURE_PIPE_LEN) {
> >>
> >> I want to say, I am not convinced with the patch because what's wrong with
> >> writing to an old index?
> >>
> >> You win/lose the race anyway, say the CPU executed the WRITE_ONCE() a bit too
> >> early/late and another WRITE_ONCE() lost/won, regardless of whether you wrote
> >> the "incremented i" or "the increment from the latest value of pipe_count".
> >>
> >> Anyway, a slightly related/different question:
> >>
> >> Should that:
> >> WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
> >>
> >> Be:
> >> WRITE_ONCE(rp->rtort_pipe_count, READ_ONCE(rp->rtort_pipe_count) + 1);
> >>
> >> ?
> >
> > Thank you both!
> >
> > At first glance, I would argue for something like this:
> >
> > ------------------------------------------------------------------------
> >
> > static bool
> > rcu_torture_pipe_update_one(struct rcu_torture *rp)
> > {
> > int i;
> > struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);
> >
> > if (rtrcp) {
> > WRITE_ONCE(rp->rtort_chkp, NULL);
> > smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
> > }
> > i = READ_ONCE(rp->rtort_pipe_count) + 1;
> > if (i > RCU_TORTURE_PIPE_LEN)
> > i = RCU_TORTURE_PIPE_LEN;
> > atomic_inc(&rcu_torture_wcount[i]);
> > WRITE_ONCE(rp->rtort_pipe_count, i);
> > if (i >= RCU_TORTURE_PIPE_LEN) {
> > rp->rtort_mbtest = 0;
> > return true;
> > }
> > return false;
> > }
> >
> > ------------------------------------------------------------------------
> >
> > That is, move the increment to the read and replace the re-read with
> > the value "i" that was just written.
>
> But that changes the original logic as well? It looks like with the above
> change, you're now writing to rcu_torture_wcount[READ_ONCE(rp->rtort_pipe_count)
> + 1] instead of rcu_torture_wcount[READ_ONCE(rp->rtort_pipe_count)].
>
> I think that might break rcutorture, because there is an increment outside of
> the first 2 entries in rcu_torture_wcount but not sure (need to look more).

Good point on never incrementing the zeroth entry! Clearly I should
have waited before replying.

How about the following?

------------------------------------------------------------------------

static bool
rcu_torture_pipe_update_one(struct rcu_torture *rp)
{
	int i;
	struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);

	if (rtrcp) {
		WRITE_ONCE(rp->rtort_chkp, NULL);
		smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
	}
	i = READ_ONCE(rp->rtort_pipe_count);
	if (i > RCU_TORTURE_PIPE_LEN)
		i = RCU_TORTURE_PIPE_LEN;
	atomic_inc(&rcu_torture_wcount[i]);
	WRITE_ONCE(rp->rtort_pipe_count, i + 1);
	if (i + 1 >= RCU_TORTURE_PIPE_LEN) {
		rp->rtort_mbtest = 0;
		return true;
	}
	return false;
}

------------------------------------------------------------------------

Thanx, Paul

2024-03-04 20:13:37

by Joel Fernandes

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug



On 3/4/2024 2:44 PM, Paul E. McKenney wrote:
> On Mon, Mar 04, 2024 at 02:10:09PM -0500, Joel Fernandes wrote:
>>
>>
>> On 3/4/2024 12:14 PM, Paul E. McKenney wrote:
>>> On Mon, Mar 04, 2024 at 11:19:21AM -0500, Joel Fernandes wrote:
>>>>
>>>>
>>>> On 3/4/2024 5:54 AM, linke li wrote:
>>>>> Some changes are done to fix a data race in commit 202489101f2e ("rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race")
>>>>>
>>>>> {
>>>>> int i;
>>>>>
>>>>> - i = rp->rtort_pipe_count;
>>>>> + i = READ_ONCE(rp->rtort_pipe_count);
>>>>> if (i > RCU_TORTURE_PIPE_LEN)
>>>>> i = RCU_TORTURE_PIPE_LEN;
>>>>> atomic_inc(&rcu_torture_wcount[i]);
>>>>> - if (++rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
>>>>> + WRITE_ONCE(rp->rtort_pipe_count, i + 1);
>>>>> + if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
>>>>> rp->rtort_mbtest = 0;
>>>>> return true;
>>>>> }
>>>>>
>>>>> But ++rp->rtort_pipe_count is meant to add itself by 1, not give i+1 to
>>>>> rp->rtort_pipe_count, because rp->rtort_pipe_count may write by
>>>>> rcu_torture_writer() concurrently.
>>>>>
>>>>> Also, rp->rtort_pipe_count in the next line should be read using
>>>>> READ_ONCE() because of data race.
>>>>>
>>>>> Signed-off-by: linke li <[email protected]>
>>>>> ---
>>>>> kernel/rcu/rcutorture.c | 4 ++--
>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
>>>>> index 7567ca8e743c..00059ace4fd5 100644
>>>>> --- a/kernel/rcu/rcutorture.c
>>>>> +++ b/kernel/rcu/rcutorture.c
>>>>> @@ -465,8 +465,8 @@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
>>>>> if (i > RCU_TORTURE_PIPE_LEN)
>>>>> i = RCU_TORTURE_PIPE_LEN;
>>>>> atomic_inc(&rcu_torture_wcount[i]);
>>>>> - WRITE_ONCE(rp->rtort_pipe_count, i + 1);
>>>>> - if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
>>>>> + WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
>>>>> + if (READ_ONCE(rp->rtort_pipe_count) >= RCU_TORTURE_PIPE_LEN) {
>>>>
>>>> I want to say, I am not convinced with the patch because what's wrong with
>>>> writing to an old index?
>>>>
>>>> You win/lose the race anyway, say the CPU executed the WRITE_ONCE() a bit too
>>>> early/late and another WRITE_ONCE() lost/won, regardless of whether you wrote
>>>> the "incremented i" or "the increment from the latest value of pipe_count".
>>>>
>>>> Anyway, a slightly related/different question:
>>>>
>>>> Should that:
>>>> WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
>>>>
>>>> Be:
>>>> WRITE_ONCE(rp->rtort_pipe_count, READ_ONCE(rp->rtort_pipe_count) + 1);
>>>>
>>>> ?
>>>
>>> Thank you both!
>>>
>>> At first glance, I would argue for something like this:
>>>
>>> ------------------------------------------------------------------------
>>>
>>> static bool
>>> rcu_torture_pipe_update_one(struct rcu_torture *rp)
>>> {
>>> int i;
>>> struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);
>>>
>>> if (rtrcp) {
>>> WRITE_ONCE(rp->rtort_chkp, NULL);
>>> smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
>>> }
>>> i = READ_ONCE(rp->rtort_pipe_count) + 1;
>>> if (i > RCU_TORTURE_PIPE_LEN)
>>> i = RCU_TORTURE_PIPE_LEN;
>>> atomic_inc(&rcu_torture_wcount[i]);
>>> WRITE_ONCE(rp->rtort_pipe_count, i);
>>> if (i >= RCU_TORTURE_PIPE_LEN) {
>>> rp->rtort_mbtest = 0;
>>> return true;
>>> }
>>> return false;
>>> }
>>>
>>> ------------------------------------------------------------------------
>>>
>>> That is, move the increment to the read and replace the re-read with
>>> the value "i" that was just written.
>>
>> But that changes the original logic as well? It looks like with the above
>> change, you're now writing to rcu_torture_wcount[READ_ONCE(rp->rtort_pipe_count)
>> + 1] instead of rcu_torture_wcount[READ_ONCE(rp->rtort_pipe_count)].
>>
>> I think that might break rcutorture, because there is an increment outside of
>> the first 2 entries in rcu_torture_wcount but not sure (need to look more).
>
> Good point on never incrementing the zeroth entry! Clearly I should
> have waited before replying.
>
> How about the following?
>
> ------------------------------------------------------------------------
>
> static bool
> rcu_torture_pipe_update_one(struct rcu_torture *rp)
> {
> int i;
> struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);
>
> if (rtrcp) {
> WRITE_ONCE(rp->rtort_chkp, NULL);
> smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
> }
> i = READ_ONCE(rp->rtort_pipe_count);
> if (i > RCU_TORTURE_PIPE_LEN)
> i = RCU_TORTURE_PIPE_LEN;
> atomic_inc(&rcu_torture_wcount[i]);
> WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> if (i + 1 >= RCU_TORTURE_PIPE_LEN) {
> rp->rtort_mbtest = 0;
> return true;
> }
> return false;
> }

Yes, this looks good to me. Thanks,
Reviewed-by: Joel Fernandes (Google) <[email protected]>


2024-03-04 20:47:59

by Paul E. McKenney

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Mon, Mar 04, 2024 at 03:13:10PM -0500, Joel Fernandes wrote:
>
>
> On 3/4/2024 2:44 PM, Paul E. McKenney wrote:
> > On Mon, Mar 04, 2024 at 02:10:09PM -0500, Joel Fernandes wrote:
> >>
> >>
> >> On 3/4/2024 12:14 PM, Paul E. McKenney wrote:
> >>> On Mon, Mar 04, 2024 at 11:19:21AM -0500, Joel Fernandes wrote:
> >>>>
> >>>>
> >>>> On 3/4/2024 5:54 AM, linke li wrote:
> >>>>> Some changes are done to fix a data race in commit 202489101f2e ("rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race")
> >>>>>
> >>>>> {
> >>>>> int i;
> >>>>>
> >>>>> - i = rp->rtort_pipe_count;
> >>>>> + i = READ_ONCE(rp->rtort_pipe_count);
> >>>>> if (i > RCU_TORTURE_PIPE_LEN)
> >>>>> i = RCU_TORTURE_PIPE_LEN;
> >>>>> atomic_inc(&rcu_torture_wcount[i]);
> >>>>> - if (++rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> >>>>> + WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> >>>>> + if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> >>>>> rp->rtort_mbtest = 0;
> >>>>> return true;
> >>>>> }
> >>>>>
> >>>>> But ++rp->rtort_pipe_count is meant to add itself by 1, not give i+1 to
> >>>>> rp->rtort_pipe_count, because rp->rtort_pipe_count may write by
> >>>>> rcu_torture_writer() concurrently.
> >>>>>
> >>>>> Also, rp->rtort_pipe_count in the next line should be read using
> >>>>> READ_ONCE() because of data race.
> >>>>>
> >>>>> Signed-off-by: linke li <[email protected]>
> >>>>> ---
> >>>>> kernel/rcu/rcutorture.c | 4 ++--
> >>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> >>>>> index 7567ca8e743c..00059ace4fd5 100644
> >>>>> --- a/kernel/rcu/rcutorture.c
> >>>>> +++ b/kernel/rcu/rcutorture.c
> >>>>> @@ -465,8 +465,8 @@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
> >>>>> if (i > RCU_TORTURE_PIPE_LEN)
> >>>>> i = RCU_TORTURE_PIPE_LEN;
> >>>>> atomic_inc(&rcu_torture_wcount[i]);
> >>>>> - WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> >>>>> - if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> >>>>> + WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
> >>>>> + if (READ_ONCE(rp->rtort_pipe_count) >= RCU_TORTURE_PIPE_LEN) {
> >>>>
> >>>> I want to say, I am not convinced with the patch because what's wrong with
> >>>> writing to an old index?
> >>>>
> >>>> You win/lose the race anyway, say the CPU executed the WRITE_ONCE() a bit too
> >>>> early/late and another WRITE_ONCE() lost/won, regardless of whether you wrote
> >>>> the "incremented i" or "the increment from the latest value of pipe_count".
> >>>>
> >>>> Anyway, a slightly related/different question:
> >>>>
> >>>> Should that:
> >>>> WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
> >>>>
> >>>> Be:
> >>>> WRITE_ONCE(rp->rtort_pipe_count, READ_ONCE(rp->rtort_pipe_count) + 1);
> >>>>
> >>>> ?
> >>>
> >>> Thank you both!
> >>>
> >>> At first glance, I would argue for something like this:
> >>>
> >>> ------------------------------------------------------------------------
> >>>
> >>> static bool
> >>> rcu_torture_pipe_update_one(struct rcu_torture *rp)
> >>> {
> >>> int i;
> >>> struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);
> >>>
> >>> if (rtrcp) {
> >>> WRITE_ONCE(rp->rtort_chkp, NULL);
> >>> smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
> >>> }
> >>> i = READ_ONCE(rp->rtort_pipe_count) + 1;
> >>> if (i > RCU_TORTURE_PIPE_LEN)
> >>> i = RCU_TORTURE_PIPE_LEN;
> >>> atomic_inc(&rcu_torture_wcount[i]);
> >>> WRITE_ONCE(rp->rtort_pipe_count, i);
> >>> if (i >= RCU_TORTURE_PIPE_LEN) {
> >>> rp->rtort_mbtest = 0;
> >>> return true;
> >>> }
> >>> return false;
> >>> }
> >>>
> >>> ------------------------------------------------------------------------
> >>>
> >>> That is, move the increment to the read and replace the re-read with
> >>> the value "i" that was just written.
> >>
> >> But that changes the original logic as well? It looks like with the above
> >> change, you're now writing to rcu_torture_wcount[READ_ONCE(rp->rtort_pipe_count)
> >> + 1] instead of rcu_torture_wcount[READ_ONCE(rp->rtort_pipe_count)].
> >>
> >> I think that might break rcutorture, because there is an increment outside of
> >> the first 2 entries in rcu_torture_wcount but not sure (need to look more).
> >
> > Good point on never incrementing the zeroth entry! Clearly I should
> > have waited before replying.
> >
> > How about the following?
> >
> > ------------------------------------------------------------------------
> >
> > static bool
> > rcu_torture_pipe_update_one(struct rcu_torture *rp)
> > {
> > int i;
> > struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);
> >
> > if (rtrcp) {
> > WRITE_ONCE(rp->rtort_chkp, NULL);
> > smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
> > }
> > i = READ_ONCE(rp->rtort_pipe_count);
> > if (i > RCU_TORTURE_PIPE_LEN)
> > i = RCU_TORTURE_PIPE_LEN;
> > atomic_inc(&rcu_torture_wcount[i]);
> > WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> > if (i + 1 >= RCU_TORTURE_PIPE_LEN) {
> > rp->rtort_mbtest = 0;
> > return true;
> > }
> > return false;
> > }
>
> Yes, this looks good to me. Thanks,
> Reviewed-by: Joel Fernandes (Google) <[email protected]>

Again, thank you.

linke li, does this approach work for you? If so, would you be willing to
send a new patch along these lines? If it does not work, what additional
problems do you see?

Thanx, Paul

2024-03-05 03:33:44

by linke li

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

Thank you both. This looks good to me. I will send a new patch.

> On Mar 5, 2024, at 04:47, Paul E. McKenney <[email protected]> wrote:
>
> On Mon, Mar 04, 2024 at 03:13:10PM -0500, Joel Fernandes wrote:
>>
>>
>> On 3/4/2024 2:44 PM, Paul E. McKenney wrote:
>>> On Mon, Mar 04, 2024 at 02:10:09PM -0500, Joel Fernandes wrote:
>>>>
>>>>
>>>> On 3/4/2024 12:14 PM, Paul E. McKenney wrote:
>>>>> On Mon, Mar 04, 2024 at 11:19:21AM -0500, Joel Fernandes wrote:
>>>>>>
>>>>>>
>>>>>> On 3/4/2024 5:54 AM, linke li wrote:
>>>>>>> Some changes are done to fix a data race in commit 202489101f2e ("rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race")
>>>>>>>
>>>>>>> {
>>>>>>> int i;
>>>>>>>
>>>>>>> - i = rp->rtort_pipe_count;
>>>>>>> + i = READ_ONCE(rp->rtort_pipe_count);
>>>>>>> if (i > RCU_TORTURE_PIPE_LEN)
>>>>>>> i = RCU_TORTURE_PIPE_LEN;
>>>>>>> atomic_inc(&rcu_torture_wcount[i]);
>>>>>>> - if (++rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
>>>>>>> + WRITE_ONCE(rp->rtort_pipe_count, i + 1);
>>>>>>> + if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
>>>>>>> rp->rtort_mbtest = 0;
>>>>>>> return true;
>>>>>>> }
>>>>>>>
>>>>>>> But ++rp->rtort_pipe_count is meant to add itself by 1, not give i+1 to
>>>>>>> rp->rtort_pipe_count, because rp->rtort_pipe_count may write by
>>>>>>> rcu_torture_writer() concurrently.
>>>>>>>
>>>>>>> Also, rp->rtort_pipe_count in the next line should be read using
>>>>>>> READ_ONCE() because of data race.
>>>>>>>
>>>>>>> Signed-off-by: linke li <[email protected]>
>>>>>>> ---
>>>>>>> kernel/rcu/rcutorture.c | 4 ++--
>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
>>>>>>> index 7567ca8e743c..00059ace4fd5 100644
>>>>>>> --- a/kernel/rcu/rcutorture.c
>>>>>>> +++ b/kernel/rcu/rcutorture.c
>>>>>>> @@ -465,8 +465,8 @@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
>>>>>>> if (i > RCU_TORTURE_PIPE_LEN)
>>>>>>> i = RCU_TORTURE_PIPE_LEN;
>>>>>>> atomic_inc(&rcu_torture_wcount[i]);
>>>>>>> - WRITE_ONCE(rp->rtort_pipe_count, i + 1);
>>>>>>> - if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
>>>>>>> + WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
>>>>>>> + if (READ_ONCE(rp->rtort_pipe_count) >= RCU_TORTURE_PIPE_LEN) {
>>>>>>
>>>>>> I want to say, I am not convinced with the patch because what's wrong with
>>>>>> writing to an old index?
>>>>>>
>>>>>> You win/lose the race anyway, say the CPU executed the WRITE_ONCE() a bit too
>>>>>> early/late and another WRITE_ONCE() lost/won, regardless of whether you wrote
>>>>>> the "incremented i" or "the increment from the latest value of pipe_count".
>>>>>>
>>>>>> Anyway, a slightly related/different question:
>>>>>>
>>>>>> Should that:
>>>>>> WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
>>>>>>
>>>>>> Be:
>>>>>> WRITE_ONCE(rp->rtort_pipe_count, READ_ONCE(rp->rtort_pipe_count) + 1);
>>>>>>
>>>>>> ?
>>>>>
>>>>> Thank you both!
>>>>>
>>>>> At first glance, I would argue for something like this:
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> static bool
>>>>> rcu_torture_pipe_update_one(struct rcu_torture *rp)
>>>>> {
>>>>> int i;
>>>>> struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);
>>>>>
>>>>> if (rtrcp) {
>>>>> WRITE_ONCE(rp->rtort_chkp, NULL);
>>>>> smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
>>>>> }
>>>>> i = READ_ONCE(rp->rtort_pipe_count) + 1;
>>>>> if (i > RCU_TORTURE_PIPE_LEN)
>>>>> i = RCU_TORTURE_PIPE_LEN;
>>>>> atomic_inc(&rcu_torture_wcount[i]);
>>>>> WRITE_ONCE(rp->rtort_pipe_count, i);
>>>>> if (i >= RCU_TORTURE_PIPE_LEN) {
>>>>> rp->rtort_mbtest = 0;
>>>>> return true;
>>>>> }
>>>>> return false;
>>>>> }
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> That is, move the increment to the read and replace the re-read with
>>>>> the value "i" that was just written.
>>>>
>>>> But that changes the original logic as well? It looks like with the above
>>>> change, you're now writing to rcu_torture_wcount[READ_ONCE(rp->rtort_pipe_count)
>>>> + 1] instead of rcu_torture_wcount[READ_ONCE(rp->rtort_pipe_count)].
>>>>
>>>> I think that might break rcutorture, because there is an increment outside of
>>>> the first 2 entries in rcu_torture_wcount but not sure (need to look more).
>>>
>>> Good point on never incrementing the zeroth entry! Clearly I should
>>> have waited before replying.
>>>
>>> How about the following?
>>>
>>> ------------------------------------------------------------------------
>>>
>>> static bool
>>> rcu_torture_pipe_update_one(struct rcu_torture *rp)
>>> {
>>> int i;
>>> struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);
>>>
>>> if (rtrcp) {
>>> WRITE_ONCE(rp->rtort_chkp, NULL);
>>> smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
>>> }
>>> i = READ_ONCE(rp->rtort_pipe_count);
>>> if (i > RCU_TORTURE_PIPE_LEN)
>>> i = RCU_TORTURE_PIPE_LEN;
>>> atomic_inc(&rcu_torture_wcount[i]);
>>> WRITE_ONCE(rp->rtort_pipe_count, i + 1);
>>> if (i + 1 >= RCU_TORTURE_PIPE_LEN) {
>>> rp->rtort_mbtest = 0;
>>> return true;
>>> }
>>> return false;
>>> }
>>
>> Yes, this looks good to me. Thanks,
>> Reviewed-by: Joel Fernandes (Google) <[email protected]>
>
> Again, thank you.
>
> linke li, does this approach work for you? If so, would you be willing to
> send a new patch along these lines? If it does not work, what additional
> problems do you see?
>
> Thanx, Paul


2024-03-05 06:24:42

by linke li

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

> Anyway, a slightly related/different question:
>
> Should that:
> WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
>
> Be:
> WRITE_ONCE(rp->rtort_pipe_count, READ_ONCE(rp->rtort_pipe_count) + 1);
>
> ?

Hi, Joel. Sorry, I am not very sure about this, so I did a little research
on it.

I looked through all the code in the kernel that looks like this; both
forms are used. I also tried compiling this file with and without the
READ_ONCE() inside the WRITE_ONCE() using gcc-9, and the resulting
binaries are identical.

So with the current compiler the two forms make no real difference, but if
the compiler ever optimizes more aggressively, the latter form may be the
safer one.
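
A reduced standalone version of the comparison might look like this (a
sketch only; the macro definitions here are simplified stand-ins, not the
kernel's real <asm/rwonce.h> ones):

	/* compare.c: build with "gcc -O2 -S" and diff the two functions' assembly. */
	#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
	#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

	struct rcu_torture_stub {
		int rtort_pipe_count;
	};

	/* Form 1: plain C read of the old value inside the WRITE_ONCE(). */
	void bump_plain(struct rcu_torture_stub *rp)
	{
		WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
	}

	/* Form 2: marked read of the old value. */
	void bump_marked(struct rcu_torture_stub *rp)
	{
		WRITE_ONCE(rp->rtort_pipe_count, READ_ONCE(rp->rtort_pipe_count) + 1);
	}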


2024-03-06 15:36:34

by Steven Rostedt

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Tue, 5 Mar 2024 14:24:20 +0800
linke li <[email protected]> wrote:

> > Anyway, a slightly related/different question:
> >
> > Should that:
> > WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
> >
> > Be:
> > WRITE_ONCE(rp->rtort_pipe_count, READ_ONCE(rp->rtort_pipe_count) + 1);
> >
> > ?
>
> Hi, Joel. Sorry, I am not very sure about this. I do a little research on
> it.
>
> I looked through all code in kernel that looks like this, both kinds are
> used. I also try to compile this file with and without the READ_ONCE in
> WRITE_ONCE using gcc-9. Their binary is just the same.
>
> Thus I think in the current compiler version, they do not have a big
> difference, but if the compiler wants to optimize more, maybe the latter
> one is better.

I'm sorry, but all these READ_ONCE()/WRITE_ONCE() additions that are being
added out of fear of the compiler tearing long words are going to make the
code really ugly and hard to read.

If we take the policy of handling a compiler that can tear reads and writes
of any size word, then we should have proper macros to handle it.

Perhaps READ_SHARED(), WRITE_SHARED(), ADD_SHARED(), SUB_SHARED(). The ONCE
has nothing to do with the reasons for these changes. But at least "SHARED"
can be considered "this variable is shared between different contexts".
Note, this is different than "atomic". It's just to document that this
variable must be loaded or stored in one transaction.
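
A rough sketch of what I mean (purely hypothetical; nothing like this
exists in the tree today, and the names are made up):

	/* Hypothetical wrappers for documenting shared, lockless accesses. */
	#define READ_SHARED(x)		READ_ONCE(x)
	#define WRITE_SHARED(x, val)	WRITE_ONCE(x, val)
	#define ADD_SHARED(x, val)	WRITE_ONCE(x, READ_ONCE(x) + (val))
	#define SUB_SHARED(x, val)	WRITE_ONCE(x, READ_ONCE(x) - (val))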

I don't know if Linus even cares about fixing "read/write tearing" which is
why I Cc'd him.

But I'm not going to take any patches that add these macros to fix
compilers that tear words on load and store until we have a set policy on
what to do with them.

-- Steve

2024-03-06 17:36:29

by Paul E. McKenney

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, Mar 06, 2024 at 10:37:19AM -0500, Steven Rostedt wrote:
> On Tue, 5 Mar 2024 14:24:20 +0800
> linke li <[email protected]> wrote:
>
> > > Anyway, a slightly related/different question:
> > >
> > > Should that:
> > > WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
> > >
> > > Be:
> > > WRITE_ONCE(rp->rtort_pipe_count, READ_ONCE(rp->rtort_pipe_count) + 1);
> > >
> > > ?
> >
> > Hi, Joel. Sorry, I am not very sure about this. I do a little research on
> > it.
> >
> > I looked through all code in kernel that looks like this, both kinds are
> > used. I also try to compile this file with and without the READ_ONCE in
> > WRITE_ONCE using gcc-9. Their binary is just the same.
> >
> > Thus I think in the current compiler version, they do not have a big
> > difference, but if the compiler wants to optimize more, maybe the latter
> > one is better.
>
> I'm sorry but all these READ_ONCE/WRITE_ONCE() additions that are being
> added because there's a fear of the compiler tearing long words, is going to
> make the code really ugly and hard to read.

There are quite a few other things to fear besides tearing. The compiler
can and does invent and fuse normal loads, and it can and does fuse
normal stores. And there really are compilers that tear stores of
certain constants on systems with short immediate fields.
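
For example (a minimal sketch, nothing to do with RCU itself), load fusing
is the classic case:

	/* A flag set from some other context. */
	int need_to_stop;

	void wait_for_stop(void)
	{
		/*
		 * Written as "while (!need_to_stop) ;", the compiler may load
		 * need_to_stop once, keep it in a register, and turn this into
		 * an infinite loop.  READ_ONCE() forbids that fusing and forces
		 * a fresh load on every iteration.
		 */
		while (!READ_ONCE(need_to_stop))
			;
	}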

I would argue that use of normal C-language loads and stores should be
accompanied by comments saying why the compiler cannot mess things up.
But maintainer's choice. Those maintainers who keep a close personal
relationship with the ever-growing list of optimizations should have
no problem working out which use cases are safe for normal C-language
loads and stores. Me, I would really rather play it safe, especially
in the innards of something like RCU. ;-)

> If we take the policy of handling a compiler that can tear reads and writes
> of any size word, then we should have proper macros to handle it.

Those are in fact READ_ONCE() and WRITE_ONCE() when given machine-word
sized/aligned variables.

> Perhaps READ_SHARED(), WRITE_SHARED(), ADD_SHARED(), SUB_SHARED(). The ONCE
> has nothing to do with the reasons for these changes. But at least "SHARED"
> can be considered "this variable is shared between different contexts".
> Note, this is different than "atomic". It's just to document that this
> variable must be loaded or stored in one transaction.

We already have READ_ONCE() and WRITE_ONCE(). An ADD_SHARED() might
be useful, though compilers are starting to learn how to emit good code
for things like WRITE_ONCE(a, READ_ONCE(a) + 1).

But such things should also be documented and added to LKMM.

> I don't know if Linus even cares about fixing "read/write tearing" which is
> why I Cc'd him.

I am sure that whatever his views, he will not suffer in silence. ;-)

> But I'm not going to take any patches that add these macros to fix
> compilers that tear words on load and store until we have a set policy on
> what to do with them.

Maintainer's choice!

For RCU, I want the code to just work with future compiler optimizations
as well as with current ones. This stuff is fun enough without giving
the compiler opportunities for more mischief!

Thanx, Paul

2024-03-06 17:59:20

by Steven Rostedt

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, 6 Mar 2024 09:36:16 -0800
"Paul E. McKenney" <[email protected]> wrote:

> > If we take the policy of handling a compiler that can tear reads and writes
> > of any size word, then we should have proper macros to handle it.
>
> Those are in fact READ_ONCE() and WRITE_ONCE() when given machine-word
> sized/aligned variables.

IIRC, the original purpose of READ_ONCE() and WRITE_ONCE() was to make sure
that the compiler only reads or writes the variable "once". Hence the name.
That way, after a load, you do not need to worry that the variable will be
read again from its original location because the compiler decided to save
stack space and registers.

But that macro has now been extended for other purposes.

>
> > Perhaps READ_SHARED(), WRITE_SHARED(), ADD_SHARED(), SUB_SHARED(). The ONCE
> > has nothing to do with the reasons for these changes. But at least "SHARED"
> > can be considered "this variable is shared between different contexts".
> > Note, this is different than "atomic". It's just to document that this
> > variable must be loaded or stored in one transaction.
>
> We already have READ_ONCE() and WRITE_ONCE(). An ADD_SHARED() might
> be useful, though compilers are starting to learn how to emit good code
> for things like WRITE_ONCE(a, READ_ONCE(a) + 1).

Well, if we keep the _ONCE() naming, it should be ADD_ONCE(). Because

WRITE_ONCE(a, READ_ONCE(a) + 1)

is an abomination and should only be present in obfuscation contests.

>
> But such things should also be documented and added to LKMM.
>
> > I don't know if Linus even cares about fixing "read/write tearing" which is
> > why I Cc'd him.
>
> I am sure that whatever his views, he will not suffer in silence. ;-)
>
> > But I'm not going to take any patches that add these macros to fix
> > compilers that tear words on load and store until we have a set policy on
> > what to do with them.
>
> Maintainer's choice!
>
> For RCU, I want the code to just work with future compiler optimizations
> as well as with current ones. This stuff is fun enough without giving
> the compiler opportunities for more mischief!

I'm not against the changes. I'm against the ugliness of the changes.
Should we just create a ADD_ONCE() macro?

If the approach is now to find all places that access a variable between
different contexts, and create READ_ONCE()/WRITE_ONCE() around them, I'm
fine with it.

Perhaps we need a way to annotate them, like we have with __rcu. "__shared"?

Then all accesses to that variable must be wrapped with a READ_ONCE() or
WRITE_ONCE()? I mean, if this can cause legitimate bugs, we should probably
address it like we do with locking and RCU.

-- Steve



2024-03-06 18:09:56

by Paul E. McKenney

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, Mar 06, 2024 at 01:01:03PM -0500, Steven Rostedt wrote:
> On Wed, 6 Mar 2024 09:36:16 -0800
> "Paul E. McKenney" <[email protected]> wrote:
>
> > > If we take the policy of handling a compiler that can tear reads and writes
> > > of any size word, then we should have proper macros to handle it.
> >
> > Those are in fact READ_ONCE() and WRITE_ONCE() when given machine-word
> > sized/aligned variables.
>
> IIRC, the original purpose of READ_ONCE() and WRITE_ONCE() was to make sure
> that the compiler only reads or writes the variable "once". Hence the name.
> That way after a load, you don't need to worry that the content of the
> variable you read isn't going to be read again from the original location
> because the compiler decided to save stack space and registers.
>
> But that macro has now been extended for other purposes.

If I remember correctly, some 32-bit system had 64-bit PTEs that it
wanted to use WRITE_ONCE() on. Does Linux still support that system?
If not, maybe it is time to remove that extension.

> > > Perhaps READ_SHARED(), WRITE_SHARED(), ADD_SHARED(), SUB_SHARED(). The ONCE
> > > has nothing to do with the reasons for these changes. But at least "SHARED"
> > > can be considered "this variable is shared between different contexts".
> > > Note, this is different than "atomic". It's just to document that this
> > > variable must be loaded or stored in one transaction.
> >
> > We already have READ_ONCE() and WRITE_ONCE(). An ADD_SHARED() might
> > be useful, though compilers are starting to learn how to emit good code
> > for things like WRITE_ONCE(a, READ_ONCE(a) + 1).
>
> Well, if we keep the _ONCE() naming, it should be ADD_ONCE(). Because
>
> WRITE_ONCE(a, READ_ONCE(a) + 1)
>
> is an abomination and should only be present in obfuscation contests.

I have no problem with replacing that sort of thing with ADD_ONCE().

> > But such things should also be documented and added to LKMM.
> >
> > > I don't know if Linus even cares about fixing "read/write tearing" which is
> > > why I Cc'd him.
> >
> > I am sure that whatever his views, he will not suffer in silence. ;-)
> >
> > > But I'm not going to take any patches that add these macros to fix
> > > compilers that tear words on load and store until we have a set policy on
> > > what to do with them.
> >
> > Maintainer's choice!
> >
> > For RCU, I want the code to just work with future compiler optimizations
> > as well as with current ones. This stuff is fun enough without giving
> > the compiler opportunities for more mischief!
>
> I'm not against the changes. I'm against the ugliness of the changes.
> Should we just create a ADD_ONCE() macro?

Works for me! We should also update tools/memory-model/linux-kernel.defs
to allow it to be used in litmus tests. (I can help with that.)

Plus of course documentation.

> If the approach is now to find all places that access a variable between
> different contexts, and create READ_ONCE()/WRITE_ONCE() around them, I'm
> fine with it.

I don't know that the entire kernel is going that far, but RCU has had
that philosophy for a very long time. Yes, KCSAN sometimes finds places
where we slipped up, but those get fixed.

> Perhaps we need a way to annotate them, like we have with __rcu. "__shared"?
>
> Then all accesses to that variable must be wrapped with a READ_ONCE() or
> WRITE_ONCE()? I mean, if this can cause legitimate bugs, we should probably
> address it like we do with locking and RCU.

If we want that, just mark the field "volatile", as in "jiffies".

And one of the strengths of READ_ONCE() and WRITE_ONCE() is that they
allow non-volatile access where it is safe. For example, if you hold the
lock protecting all stores to that variable, you still need WRITE_ONCE()
but not READ_ONCE(). In initialization and cleanup code, you don't
need either.
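
Schematically (a made-up example with hypothetical names, just to
illustrate the rule):

	static DEFINE_SPINLOCK(foo_lock);	/* serializes all updates of foo_count */
	static int foo_count;

	/* Updater: holds the lock that excludes all other writers. */
	void foo_inc(void)
	{
		spin_lock(&foo_lock);
		/*
		 * The plain read is fine because the lock excludes concurrent
		 * stores, but the store still needs WRITE_ONCE() because
		 * lockless readers may be running concurrently.
		 */
		WRITE_ONCE(foo_count, foo_count + 1);
		spin_unlock(&foo_lock);
	}

	/* Lockless reader: needs READ_ONCE(). */
	int foo_read(void)
	{
		return READ_ONCE(foo_count);
	}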

Thanx, Paul

2024-03-06 18:18:35

by Steven Rostedt

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, 6 Mar 2024 10:09:48 -0800
"Paul E. McKenney" <[email protected]> wrote:

> > Perhaps we need a way to annotate them, like we have with __rcu. "__shared"?
> >
> > Then all accesses to that variable must be wrapped with a READ_ONCE() or
> > WRITE_ONCE()? I mean, if this can cause legitimate bugs, we should probably
> > address it like we do with locking and RCU.
>
> If we want that, just mark the field "volatile", as in "jiffies".

I already know Linus's view on "volatile" variables ;-)

>
> And one of the strengths of READ_ONCE() and WRITE_ONCE() is that they
> allow non-volatile access where it is safe. For example, if you hold the
> lock protecting all stores to that variable, you still need WRITE_ONCE()
> but not READ_ONCE(). In initialization and cleanup code, you don't
> need either.

I guess the current static analyzers just look to see where READ_ONCE() or
WRITE_ONCE() is used and checks to see if other places have them properly
used. I'm guessing that's where the OP patch came from.

Sounds like we just need a ADD_ONCE() or INC_ONCE() then. Because I am not
taking a

WRITE_ONCE(a, READ_ONCE(a) + 1);

patch that replaces a simple "a++".

-- Steve


2024-03-06 18:44:38

by Linus Torvalds

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, 6 Mar 2024 at 09:59, Steven Rostedt <[email protected]> wrote:
>
> IIRC, the original purpose of READ_ONCE() and WRITE_ONCE() was to make sure
> that the compiler only reads or writes the variable "once". Hence the name.
> That way after a load, you don't need to worry that the content of the
> variable you read isn't going to be read again from the original location
> because the compiler decided to save stack space and registers.
>
> But that macro has now been extended for other purposes.

Not really.

Tearing of simple types (as opposed to structures or bitfields or
"more than one word" or whatever) has never really been a real
concern.

It keeps being brought up as a "compilers could do this", but it's
basically just BS fear-mongering. Compilers _don't_ do it, and the
reason why compilers don't do it isn't some "compilers are trying to
be nice" issue, but simply a "it is insane and generates worse code"
issue.

So what happens is that READ_ONCE() and WRITE_ONCE() have always been
about reading and writing *consistent* values. There is no locking,
but the idea is - and has always been - that you get one *single*
answer from READ_ONCE(), and that single answer will always be
consistent with something that has been written by WRITE_ONCE.

That's often useful - lots of code doesn't really care if you get the
old or the new value, but the code *does* care that it gets *one*
value, and not some random mix of "I tested one value for validity,
then it got reloaded due to register pressure, and I actually used
another value".

And not some "I read one value, and it was a mix of two other values".
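
Concretely (a made-up example; shared_idx, table, TABLE_SIZE and process()
are all hypothetical):

	void consume(void)
	{
		int idx = READ_ONCE(shared_idx);

		/*
		 * Without READ_ONCE(), the compiler may discard idx under
		 * register pressure and reload shared_idx here, so the bounds
		 * check and the array index could end up using two different
		 * values.
		 */
		if (idx >= 0 && idx < TABLE_SIZE)
			process(table[idx]);
	}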

But in order to get those semantics, the READ_ONCE() and WRITE_ONCE()
macros don't do just the 'volatile' (to get the "no reloads"
guarantee), but they also do that "simple types" check.

So READ_ONCE/WRITE_ONCE has never really been "extended for other
purposes". The purpose has always been the same: one single consistent
value.

What did happen that our *original* name for this was not "read vs
write", but just "access".

So instead of "READ_ONCE(x)" you'd do "ACCESS_ONCE(x)", and instead of
"WRITE_ONCE(x,y)" you'd do "ACCESS_ONCE(x) = y".

And, to make matters more interesting, we had code that did that on
things that were *not* simple values. IOW, we'd have things like
ACCESS_ONCE() on things that literally *couldn't* be accessed as one
single value.

The most notable was accessing page table entries, which on multiple
architectures (including plain old 32-bit x86) ended up being two
words.

So the extension that *did* happen is that READ_ONCE and WRITE_ONCE
actually verify that the type is simple, and that you can't do a
64-bit READ_ONCE on a 32-bit architecture. Because then while you
might guarantee that the value isn't reloaded multiple times, you
cannot guarantee that you actually get a value that is consistent with
a WRITE_ONCE (because the reads and writes are both two operations).
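
Schematically (a made-up illustration, not the actual pte code), the
two-operation problem looks like this:

	/* On a 32-bit architecture this is two machine words. */
	u64 shared_val;

	void writer(void)
	{
		/*
		 * Even as a single C store, the hardware does two 32-bit
		 * stores, one per half.  A concurrent reader can observe the
		 * update halfway through, i.e. a mix of old and new halves,
		 * and no _ONCE() annotation on a plain u64 can turn that into
		 * one single consistent value.
		 */
		shared_val = 0x100000000ULL;
	}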

Now, we've gotten rid of the whole ACCESS_ONCE() thing, and so some of
that history is no longer visible (although you can still see that
pattern in the rseq self-tests).

So yes, READ_ONCE/WRITE_ONCE do control "tearing", but realistically,
it was always only about the "complex values" kind of tearing that the
old ACCESS_ONCE() model silently and incorrectly allowed.

Linus

2024-03-06 18:53:28

by Steven Rostedt

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, 6 Mar 2024 10:43:25 -0800
Linus Torvalds <[email protected]> wrote:

Thanks for the history lesson ;-)

> So yes, READ_ONCE/WRITE_ONCE do control "tearing", but realistically,
> it was always only about the "complex values" kind of tearing that the
> old ACCESS_ONCE() model silently and incorrectly allowed.

Now, are you OK with an addition of ADD_ONCE() and/or INC_ONCE()? So that we
don't have to look at:

WRITE_ONCE(a, READ_ONCE(a) + 1);

?

-- Steve

2024-03-06 19:02:50

by Linus Torvalds

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, 6 Mar 2024 at 10:53, Steven Rostedt <[email protected]> wrote:
>
> Now, are you OK with an addition of ADD_ONCE() and/or INC_ONCE()? So that we
> don't have to look at:
>
> WRITE_ONCE(a, READ_ONCE(a) + 1);
>
> ?

In a generic header file under include/linux/?

Absolutely not. The above is a completely broken operation. There is
no way in hell we should expose it as a general helper.

So there is no way we'd add that kind of sh*t-for-brains operation in
(for example) our <asm/rwonce.h> header file next to the normal
READ/WRITE_ONCE defines.

In some individual tracing C file where it has a comment above it how
it's braindamaged and unsafe and talking about why it's ok in that
particular context? Go wild.

But honestly, I do not see when a ADD_ONCE() would ever be a valid
thing to do, and *if* it's a valid thing to do, why you'd do it with
READ_ONCE and WRITE_ONCE.

If you don't care about races, just do a simple "++" and be done with
it. The end result is random.

Adding a "ADD_ONCE()" macro doesn't make it one whit less random. It
just makes a broken concept even uglier.

So honestly, I think the ADD_ONCE macro not only needs to be in some
tracing-specific C file, the comment needs to be pretty damn big too.
Because as a random number generator, it's not even a very good one.
So you need to explain *why* you want a particularly bad random number
generator in the first place.

Linus

2024-03-06 19:28:03

by Linus Torvalds

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, 6 Mar 2024 at 11:01, Linus Torvalds
<[email protected]> wrote:
>
> In some individual tracing C file where it has a comment above it how
> it's braindamaged and unsafe and talking about why it's ok in that
> particular context? Go wild.

Actually, I take that back.

Even in a random C file, the naming makes no sense. There's no "once" about it.

So if you want to do something like

#define UNSAFE_INCREMENTISH(x) (WRITE_ONCE(x, READ_ONCE(x) + 1))

then that's fine, I guess. Because that's what the operation is.

It's not safe, and it's not an increment, but it _approximates_ an
increment most of the time. So UNSAFE_INCREMENTISH() pretty much
perfectly describes what it is doing.

Note that you'll also almost certainly end up with worse code
generation, ie don't expect to see a single "inc" instruction (or "add
$1") for the above.

Because at least for gcc, the volatiles involved with those "ONCE"
operations end up often generating much worse code, so rather than an
"inc" instruction, you'll almost certainly get "load+add+store" and
the inevitable code expansion and extra register use.
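
Roughly (the exact output varies by compiler and flags):

	void inc_plain(int *p)
	{
		(*p)++;			/* often a single memory-destination add */
	}

	void inc_oncish(int *p)
	{
		/*
		 * The volatile accesses typically become a separate load, add,
		 * and store, with the extra register use that goes along.
		 */
		WRITE_ONCE(*p, READ_ONCE(*p) + 1);
	}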

I really don't know what you want to do, but it smells bad. A big
comment about why you'd want that "incrementish" operation will be
needed.

To me, this smells like "Steven did something fundamentally wrong
again, some tool is now complaining about it, and Steven doesn't want
to fix the problem but instead paper over it again".

Not a good look.

But I don't have a link to the original report, and I'm not thrilled
enough about this to go looking for it.

Linus

2024-03-06 19:28:09

by Steven Rostedt

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, 6 Mar 2024 11:01:55 -0800
Linus Torvalds <[email protected]> wrote:

> On Wed, 6 Mar 2024 at 10:53, Steven Rostedt <[email protected]> wrote:
> >
> > Now, are you OK with an addition of ADD_ONCE() and/or INC_ONCE()? So that we
> > don't have to look at:
> >
> > WRITE_ONCE(a, READ_ONCE(a) + 1);
> >
> > ?
>
> In a generic header file under include/linux/?
>
> Absolutely not. The above is a completely broken operation. There is
> no way in hell we should expose it as a general helper.
>
> So there is no way we'd add that kind of sh*t-for-brains operation in
> (for example) our <asm/rwonce.h> header file next to the normal
> READ/WRITE_ONCE defines.
>
> In some individual tracing C file where it has a comment above it how
> it's braindamaged and unsafe and talking about why it's ok in that
> particular context? Go wild.

Note this has nothing to do with tracing. This thread is in RCU. I just
happen to receive the same patch "fix" for my code.

>
> But honestly, I do not see when a ADD_ONCE() would ever be a valid
> thing to do, and *if* it's a valid thing to do, why you'd do it with
> READ_ONCE and WRITE_ONCE.
>
> If you don't care about races, just do a simple "++" and be done with
> it. The end result is random.

That was my feeling. But when I saw this going into RCU, I was thinking
there was a more fundamental problem here.

>
> Adding a "ADD_ONCE()" macro doesn't make it one whit less random. It
> just makes a broken concept even uglier.
>
> So honestly, I think the ADD_ONCE macro not only needs to be in some
> tracing-specific C file, the comment needs to be pretty damn big too.
> Because as a random number generator, it's not even a very good one.
> So you need to explain *why* you want a particularly bad random number
> generator in the first place.

Again, this has nothing to do with tracing. The code here is solely in
RCU. I did receive a patch in the tracing code, but that had to deal
with wakeups of readers with respect to writers which is a common thing
across the kernel and is not anything tracing specific.

I wasn't about to take the patch to my code, but when I saw the same
changes in RCU, then I thought this might be something I need to worry
about.

-- Steve


2024-03-06 19:45:30

by Steven Rostedt

Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, 6 Mar 2024 11:27:04 -0800
Linus Torvalds <[email protected]> wrote:

> On Wed, 6 Mar 2024 at 11:01, Linus Torvalds
> <[email protected]> wrote:
> >
> > In some individual tracing C file where it has a comment above it how
> > it's braindamaged and unsafe and talking about why it's ok in that
> > particular context? Go wild.
>
> Actually, I take that back.
>
> Even in a random C file, the naming makes no sense. There's no "once" about it.
>
> So if you want to do something like
>
> #define UNSAFE_INCREMENTISH(x) (WRITE_ONCE(a, READ_ONCE(a) + 1))
>
> then that's fine, I guess. Because that's what the operation is.
>
> It's not safe, and it's not an increment, but it _approximates_ an
> increment most of the time. So UNSAFE_INCREMENTISH() pretty much
> perfectly describes what it is doing.
>
> Note that you'll also almost certainly end up with worse code
> generation, ie don't expect to see a single "inc" instruction (or "add
> $1") for the above.
>
> Because at least for gcc, the volatiles involved with those "ONCE"
> operations end up often generating much worse code, so rather than an
> "inc" instruction, you'll almost certainly get "load+add+store" and
> the inevitable code expansion and extra register use.
>
> I really don't know what you want to do, but it smells bad. A big
> comment about why you'd want that "incrementish" operation will be
> needed.
>
> To me, this smells like "Steven did something fundamentally wrong
> again, some tool is now complaining about it, and Steven doesn't want
> to fix the problem but instead paper over it again".
>
> Not a good look.
>
> But I don't have a link to the original report, and I'm not thrilled
> enough about this to go looking for it.

Well, it's not the above, and I was afraid you would even think that.

Here's the back story. I received the following patch:

https://lore.kernel.org/all/[email protected]/

I didn't like it. My reply was:

> - rbwork->wait_index++;
> + WRITE_ONCE(rbwork->wait_index, READ_ONCE(rbwork->wait_index) + 1);

I mean the above is really ugly. If this is the new thing to do, we need
better macros.

If anything, just convert it to an atomic_t.

Then because I'm a reviewer of RCU patches, I saw the same fix for RCU
(this thread):

https://lore.kernel.org/all/[email protected]/

Where it was recommended to do the same WRITE_ONCE(a, READ_ONCE(a) + 1),
and this is when I Cc'd you into the conversation to get your view of the
situation.

Sounds like my original gut feeling that this is a non-issue is proving to
be correct.

I even argued against using the _ONCE() macros. If WRITE_ONCE(a, READ_ONCE(a) + 1)
is a valid operation, I sure as hell don't want it in my code, but I would
be OK if it had a macro wrapper. Which you seem to be against, so I'm
against it too.

-- Steve

2024-03-06 19:46:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, 6 Mar 2024 at 11:27, Steven Rostedt <[email protected]> wrote:
>
> Note this has nothing to do with tracing. This thread is in RCU. I just
> happen to receive the same patch "fix" for my code.

Ok, googling for rtort_pipe_count, I can only state that that code is
complete garbage.

And no amount of READ_ONCE/WRITE_ONCE will fix it.

For one thing, we have this code:

WRITE_ONCE(rp->rtort_pipe_count, i + 1);
if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {

which is broken by design. The compiler is allowed to (and probably
does) turn that into just

WRITE_ONCE(rp->rtort_pipe_count, i + 1);
if (i + 1 >= RCU_TORTURE_PIPE_LEN) {

which only results in the question "Why didn't the source code do that
obvious simplification itself?"

So that code is actively *STUPID*. It's randomly mixing READ_ONCE and
regular reads in ways that just makes me go: "there's no saving this
shit".

This needs fixing. Having tests that have random code in them only
makes me doubt that the *TEST* itself is correct, rather than the code
it is trying to actually test.

And dammit, none of that makes sense anyway. This is not some
performance-critical code. Why is it not using proper atomics if there
is an actual data race?

The reason to use READ_ONCE() and WRITE_ONCE() is that they can be a
lot faster than atomics, or - more commonly - because you have some
fundamental algorithm that doesn't do arithmetic, but cares about some
"state at time X" (the RCU _pointer_ being one such obvious case, but
doing an *increment* sure as hell isn't).

So using those READ_ONCE/WRITE_ONCE macros for that thing is
fundamentally wrong to begin with.

The question should not be "should we add another READ_ONCE()". The
question should be "what drugs were people on when writing this code"?

People - please just stop writing garbage.

That 'rtort_pipe_count' should be an atomic_t, and the "add one and
return the old value" should be an "atomic_inc_return()-1" (the "-1"
is because "inc_return" returns the *new* value).

And feel free to add "_relaxed()" to that atomic op because this code
doesn't care about ordering of that counter. It will help on some
architectures, but as mentioned, this is not performance-critical code
to begin with.
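
IOW, something along these lines (completely untested sketch, just to show
the shape, not an actual patch):

        i = atomic_inc_return_relaxed(&rp->rtort_pipe_count) - 1; /* old value */
        if (i > RCU_TORTURE_PIPE_LEN)
                i = RCU_TORTURE_PIPE_LEN;
        atomic_inc(&rcu_torture_wcount[i]);
        if (i + 1 >= RCU_TORTURE_PIPE_LEN) {
                rp->rtort_mbtest = 0;
                return true;
        }

with ->rtort_pipe_count changed over to atomic_t, of course.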

Linus

2024-03-06 20:19:56

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, 6 Mar 2024 at 11:45, Steven Rostedt <[email protected]> wrote:
>
> Here's the back story. I received the following patch:
>
> https://lore.kernel.org/all/[email protected]/
>
> I didn't like it. My reply was:
>
> > - rbwork->wait_index++;
> > + WRITE_ONCE(rbwork->wait_index, READ_ONCE(rbwork->wait_index) + 1);
>
> I mean the above is really ugly. If this is the new thing to do, we need
> better macros.
>
> If anything, just convert it to an atomic_t.

The right thing is definitely to convert it to an atomic_t.

The memory barriers can probably also be turned into atomic ordering,
although we don't always have all the variants.

But for example, that

/* Make sure to see the new wait index */
smp_rmb();
if (wait_index != work->wait_index)
break;

looks odd, and should probably do an "atomic_read_acquire()" instead
of a rmb and a (non-atomic and non-READ_ONCE thing).

The first READ_ONCE() should probably also be that atomic_read_acquire() op.

On the writing side, my gut feel is that the

rbwork->wait_index++;
/* make sure the waiters see the new index */
smp_wmb();

should be an "atomic_inc_release(&rbwork->wait_index);" but we don't
actually have that operation. We only have the "release" versions for
things that return a value.

So it would probably need to be either

atomic_inc(&rbwork->wait_index);
/* make sure the waiters see the new index */
smp_wmb();

or

atomic_inc_return_release(&rbwork->wait_index);

or we'd need to add the "basic atomics with ordering semantics" (which
we aren't going to do unless we end up with a lot more people who want
them).

I dunno. I didn't look all *that* closely at the code. The above might
be garbage too. Somebody who actually knows the code should think
about what ordering they actually were looking for.

(And I note that 'wait_index' is of type 'long' in 'struct
rb_irq_work', so I guess it should be "atomic_long_t" instead - just
shows how little attention I paid on the first read-through, which
should make everybody go "I need to double-check Linus here")

Linus

2024-03-06 20:20:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, 6 Mar 2024 at 11:46, Linus Torvalds
<[email protected]> wrote:
>
> That 'rtort_pipe_count' should be an atomic_t, and the "add one and
> return the old value" should be an "atomic_inc_return()-1" (the "-1"
> is because "inc_return" returns the *new* value).

Bah. I am lost in a twisty maze of operations, all the same.

One final correction to myself: if you want the old value, the nicer
thing to use is probably just "atomic_fetch_inc()".

It generates the same result as "atomic_inc_return()-1", but since we
do have that native "return old value" variant of this, let's just use
it.

So the rules are "atomic_op_return()" returns the new value after the
op, and "atomic_fetch_op()" returns the old value.

For some ops, this matters more than for others. For 'add' like
operations, it's that you can deduce the old from the new (and vice versa).

But for bitwise ops, only the 'fetch" version makes much sense,
because you can see the end result from that, but you can't figure out
the original value from the final one.

And to *really* confuse things, as with the memory ordering variants,
we don't always have the full complement of operations.

So we have atomic_fetch_and() (returns old version) and atomic_and()
(doesn't return any version), but we don't have "atomic_and_return()"
because it's less useful.

But for 'inc' we have all three.
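
Concretely (untested, just to show the returned values):

        atomic_t v = ATOMIC_INIT(5);

        atomic_inc_return(&v);          /* returns 6: the new value */
        atomic_fetch_inc(&v);           /* returns 6: the old value (v is now 7) */
        atomic_fetch_and(3, &v);        /* returns 7: the old value (v is now 7 & 3 = 3) */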

Linus

2024-03-07 02:29:52

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, Mar 06, 2024 at 11:46:10AM -0800, Linus Torvalds wrote:
> On Wed, 6 Mar 2024 at 11:27, Steven Rostedt <[email protected]> wrote:
> >
> > Note this has nothing to do with tracing. This thread is in RCU. I just
> > happen to receive the same patch "fix" for my code.
>
> Ok, googling for rtort_pipe_count, I can only state that that code is
> complete garbage.

Well, you all do have to admit that I was right about something, namely
that Linus did not suffer in silence. ;-)

TL;DR: Those ->rtort_pipe_count increments cannot run concurrently
with each other or any other update of that field, so that update-side
READ_ONCE() call is unnecessary and the update-side plain C-language
read is OK. The WRITE_ONCE() calls are there for the benefit of the
lockless read-side accesses to rtort_pipe_count.

Of course, I will fix.

> And no amount of READ_ONCE/WRITE_ONCE will fix it.
>
> For one thing, we have this code:
>
> WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
>
> which is broken by design. The compiler is allowed to (and probably
> does) turn that into just
>
> WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> if (i + 1 >= RCU_TORTURE_PIPE_LEN) {
>
> which only results in the question "Why didn't the source code do that
> obvious simplification itself?"

Oddly enough, we got a patch to this effect just yesterday evening,
Pacific Time:

https://lore.kernel.org/all/[email protected]/

But that of course only fixes one aspect of this mess.

> So that code is actively *STUPID*. It's randomly mixing READ_ONCE and
> regular reads in ways that just makes me go: "there's no saving this
> shit".
>
> This needs fixing. Having tests that have random code in them only
> makes me doubt that the *TEST* itself is correct, rather than the code
> it is trying to actually test.

Huh. There should only be one CPU updating ->rtort_pipe_count at a time.
Which to your point would make that READ_ONCE() call not only unnecessary,
but confusing as well. Here is the intended uses of ->rtort_pipe_count:

1. Zeroed when the item is first allocated.

2. Incremented at the end of each subsequent grace period,
so there is no concurrency with previous instances of
this increment and also none with the zeroing.

This happens in two different ways. When testing things like
call_rcu(), the RCU callback does the update, and if we have not
yet reached the limit, passes that same callback to call_rcu().
When testing other grace-period primitives (synchronize_rcu(),
for example), we put the item on a list, and update each element
of that list after each subsequent synchronous grace period.
Once an element has passed through RCU_TORTURE_PIPE_LEN grace
periods, it is removed from that list and freed.

Either way, there is only ever one CPU incrementing a given
->rtort_pipe_count at any given time.

3. Freed, and not passed to another grace period, thus avoiding
concurrency with a future #2. Allocation and free is protected by
a spinlock, so there is no concurrency with #1 above. The same
CPU that did the last increment does the free, so there is also
no concurrency with the last #2.

I fired up a KCSAN run with added ASSERT_EXCLUSIVE_WRITER() calls, which
agrees with this analysis. (And I will run longer runs to double-check.)

> And dammit, none of that makes sense anyway. This is not some
> performance-critical code. Why is it not using proper atomics if there
> is an actual data race?
>
> The reason to use READ_ONCE() and WRITE_ONCE() is that they can be a
> lot faster than atomics, or - more commonly - because you have some
> fundamental algorithm that doesn't do arithmetic, but cares about some
> "state at time X" (the RCU _pointer_ being one such obvious case, but
> doing an *increment* sure as hell isn't).

The only data race is between the one-at-a-time increment and the
lockless reads in the RCU readers. So the RCU readers need READ_ONCE()
for ->rtort_pipe_count, but again the updater does not.

Which means that the compiler optimization that Linus pointed out above
really is an optimization for once because the compiler is for once
correct that nothing else is updating ->rtort_pipe_count.

> So using those READ_ONCE/WRITE_ONCE macros for that thing is
> fundamentally wrong to begin with.
>
> The question should not be "should we add another READ_ONCE()". The
> question should be "what drugs were people on when writing this code"?

So what (if anything) was I thinking when I committed those update-side
READ_ONCE() calls?

202489101f2e6c ("rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race")

The commit log says the right thing, but I nevertheless added that
unnecessary READ_ONCE() call. And here I was hoping to be able to blame
someone else! ;-)

> People - please just stop writing garbage.
>
> That 'rtort_pipe_count' should be an atomic_t, and the "add one and
> return the old value" should be an "atomic_inc_return()-1" (the "-1"
> is because "inc_return" returns the *new* value).
>
> And feel free to add "_relaxed()" to that atomic op because this code
> doesn't care about ordering of that counter. It will help on some
> architectures, but as mentioned, this is not performance-critical code
> to begin with.

There is no update-side concurrency, so there is no need for atomics.
I just need to get rid of that extraneous update-side READ_ONCE() call.

Although I am not sure that I can credibly promise to never ever write
garbage, I most certainly can fix this particular instance. :-/

Thanx, Paul

2024-03-07 02:44:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, 6 Mar 2024 at 18:29, Paul E. McKenney <[email protected]> wrote:
>
> TL;DR: Those ->rtort_pipe_count increments cannot run concurrently
> with each other or any other update of that field, so that update-side
> READ_ONCE() call is unnecessary and the update-side plain C-language
> read is OK. The WRITE_ONCE() calls are there for the benefit of the
> lockless read-side accesses to rtort_pipe_count.

Ahh. Ok. That makes a bit more sense.

So if that's the case, then the "updating side" should never use
READ_ONCE, because there's nothing else to protect against.

Honestly, this all makes me think that we'd be *much* better off
showing the real "handoff" with smp_store_release() and
smp_load_acquire().

IOW, something like this (TOTALLY UNTESTED!) patch, perhaps?

And please note that this patch is not only untested, it really is a
very handwavy patch.

I'm sending it as a patch just because it's a more precise way of
saying "I think the writers and readers could use the store-release ->
load-acquire not just to avoid any worries about accessing things
once, but also as a way to show the directional 'flow' of the data".

I dunno.

Linus


Attachments:
patch.diff (1.65 kB)

2024-03-07 02:50:32

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, 6 Mar 2024 at 18:43, Linus Torvalds
<[email protected]> wrote:
>
> I dunno.

Oh, and just looking at that patch, I still think the code is confused.

On the reading side, we have:

pipe_count = smp_load_acquire(&p->rtort_pipe_count);
if (pipe_count > RCU_TORTURE_PIPE_LEN) {
/* Should not happen, but... */

where that comment clearly says that the pipe_count we read (whether
with READ_ONCE() or with my smp_load_acquire() suggestion) should
never be larger than RCU_TORTURE_PIPE_LEN.

But the writing side very clearly did:

i = rp->rtort_pipe_count;
if (i > RCU_TORTURE_PIPE_LEN)
i = RCU_TORTURE_PIPE_LEN;
...
smp_store_release(&rp->rtort_pipe_count, ++i);

(again, syntactically it could have been "i + 1" instead of my "++i" -
same value), so clearly the writing side *can* write a value that is >
RCU_TORTURE_PIPE_LEN.

So while the whole READ/WRITE_ONCE vs smp_load_acquire/store_release
is one thing that might be worth looking at, I think there are other
very confusing aspects here.

Linus

2024-03-07 03:06:20

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, Mar 06, 2024 at 06:43:42PM -0800, Linus Torvalds wrote:
> On Wed, 6 Mar 2024 at 18:29, Paul E. McKenney <[email protected]> wrote:
> >
> > TL;DR: Those ->rtort_pipe_count increments cannot run concurrently
> > with each other or any other update of that field, so that update-side
> > READ_ONCE() call is unnecessary and the update-side plain C-language
> > read is OK. The WRITE_ONCE() calls are there for the benefit of the
> > lockless read-side accesses to rtort_pipe_count.
>
> Ahh. Ok. That makes a bit more sense.
>
> So if that's the case, then the "updating side" should never use
> READ_ONCE, because there's nothing else to protect against.
>
> Honestly, this all makes me think that we'd be *much* better off
> showing the real "handoff" with smp_store_release() and
> smp_load_acquire().
>
> IOW, something like this (TOTALLY UNTESTED!) patch, perhaps?
>
> And please note that this patch is not only untested, it really is a
> very handwavy patch.
>
> I'm sending it as a patch just because it's a more precise way of
> saying "I think the writers and readers could use the store-release ->
> load-acquire not just to avoid any worries about accessing things
> once, but also as a way to show the directional 'flow' of the data".
>
> I dunno.

Thank you for looking at this!

I will look carefully at this, but the reason I didn't do it this way
to begin with is that I did not want false negatives that let weak-memory
RCU bugs escape unnoticed due to that synchronization and its overhead.

Of course on x86, that synchronization is (nearly) free.

Thanx, Paul

> Linus

> kernel/rcu/rcutorture.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index 7567ca8e743c..60b74df3eae2 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -461,12 +461,12 @@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
> WRITE_ONCE(rp->rtort_chkp, NULL);
> smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
> }
> - i = READ_ONCE(rp->rtort_pipe_count);
> + i = rp->rtort_pipe_count;
> if (i > RCU_TORTURE_PIPE_LEN)
> i = RCU_TORTURE_PIPE_LEN;
> atomic_inc(&rcu_torture_wcount[i]);
> - WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> - if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> + smp_store_release(&rp->rtort_pipe_count, ++i);
> + if (i >= RCU_TORTURE_PIPE_LEN) {
> rp->rtort_mbtest = 0;
> return true;
> }
> @@ -1408,8 +1408,7 @@ rcu_torture_writer(void *arg)
> if (i > RCU_TORTURE_PIPE_LEN)
> i = RCU_TORTURE_PIPE_LEN;
> atomic_inc(&rcu_torture_wcount[i]);
> - WRITE_ONCE(old_rp->rtort_pipe_count,
> - old_rp->rtort_pipe_count + 1);
> + smp_store_release(&old_rp->rtort_pipe_count, ++i);
>
> // Make sure readers block polled grace periods.
> if (cur_ops->get_gp_state && cur_ops->poll_gp_state) {
> @@ -1991,7 +1990,7 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp, long myid)
> rcu_torture_reader_do_mbchk(myid, p, trsp);
> rtrsp = rcutorture_loop_extend(&readstate, trsp, rtrsp);
> preempt_disable();
> - pipe_count = READ_ONCE(p->rtort_pipe_count);
> + pipe_count = smp_load_acquire(&p->rtort_pipe_count);
> if (pipe_count > RCU_TORTURE_PIPE_LEN) {
> /* Should not happen, but... */
> pipe_count = RCU_TORTURE_PIPE_LEN;


2024-03-07 03:06:32

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On 2024-03-06 21:43, Linus Torvalds wrote:
[...]
>
> Honestly, this all makes me think that we'd be *much* better off
> showing the real "handoff" with smp_store_release() and
> smp_load_acquire().

We've done something similar in liburcu (userspace code) to allow
Thread Sanitizer to understand the happens-before relationships
within the RCU implementations and lock-free data structures.

Moving to load-acquire/store-release (C11 model in our case)
allowed us to provide enough happens-before relationship for
Thread Sanitizer to understand what is happening under the
hood in liburcu and perform relevant race detection of user
code.

As far as the WRITE_ONCE(x, READ_ONCE(x) + 1) pattern
is concerned, the only valid use-case I can think of is
split counters or RCU implementations where there is a
single updater doing the increment, and one or more
concurrent reader threads that need to snapshot a
consistent value with READ_ONCE().
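
A kernel-side split counter along those lines could look roughly like
this (sketch only, the names are made up):

        DEFINE_PER_CPU(unsigned long, my_count);

        /* updater: a single updater per counter (preemption disabled) */
        unsigned long *c = this_cpu_ptr(&my_count);

        WRITE_ONCE(*c, *c + 1);         /* plain read is fine, marked write for readers */

        /* concurrent reader: sums a snapshot of all the per-CPU counters */
        unsigned long sum = 0;
        int cpu;

        for_each_possible_cpu(cpu)
                sum += READ_ONCE(*per_cpu_ptr(&my_count, cpu));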

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


2024-03-07 03:21:24

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, Mar 06, 2024 at 06:49:38PM -0800, Linus Torvalds wrote:
> On Wed, 6 Mar 2024 at 18:43, Linus Torvalds
> <[email protected]> wrote:
> >
> > I dunno.
>
> Oh, and just looking at that patch, I still think the code is confused.
>
> On the reading side, we have:
>
> pipe_count = smp_load_acquire(&p->rtort_pipe_count);
> if (pipe_count > RCU_TORTURE_PIPE_LEN) {
> /* Should not happen, but... */
>
> where that comment clearly says that the pipe_count we read (whether
> with READ_ONCE() or with my smp_load_acquire() suggestion) should
> never be larger than RCU_TORTURE_PIPE_LEN.

I will fix that comment. It should not happen *if* RCU is working
correctly. It can happen if you have an RCU that is so broken that a
single RCU reader can span more than ten grace periods. An example of
an RCU that really is this broken can be selected using rcutorture's
torture_type=busted module parameter. No surprise, given that its
implementation of call_rcu() invokes the callback function directly and
its implementation of synchronize_rcu() is a complete no-op. ;-)

Of course, the purpose of that value of the torture_type module parameter
(along with all other possible values containing the string "busted")
is to test rcutorture itself.

> But the writing side very clearly did:
>
> i = rp->rtort_pipe_count;
> if (i > RCU_TORTURE_PIPE_LEN)
> i = RCU_TORTURE_PIPE_LEN;
> ...
> smp_store_release(&rp->rtort_pipe_count, ++i);
>
> (again, syntactically it could have been "i + 1" instead of my "++i" -
> same value), so clearly the writing side *can* write a value that is >
> RCU_TORTURE_PIPE_LEN.
>
> So while the whole READ/WRITE_ONCE vs smp_load_acquire/store_release
> is one thing that might be worth looking at, I think there are other
> very confusing aspects here.

With this change in that comment, are things better?

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 6b821a7037b03..0cb5452ecd945 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -2000,7 +2000,8 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp, long myid)
preempt_disable();
pipe_count = READ_ONCE(p->rtort_pipe_count);
if (pipe_count > RCU_TORTURE_PIPE_LEN) {
- /* Should not happen, but... */
+ // Should not happen in a correct RCU implementation,
+ // happens quite often for torture_type=busted.
pipe_count = RCU_TORTURE_PIPE_LEN;
}
completed = cur_ops->get_gp_seq();

2024-03-07 03:38:39

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, Mar 06, 2024 at 10:06:21PM -0500, Mathieu Desnoyers wrote:
> On 2024-03-06 21:43, Linus Torvalds wrote:
> [...]
> >
> > Honestly, this all makes me think that we'd be *much* better off
> > showing the real "handoff" with smp_store_release() and
> > smp_load_acquire().
>
> We've done something similar in liburcu (userspace code) to allow
> Thread Sanitizer to understand the happens-before relationships
> within the RCU implementations and lock-free data structures.
>
> Moving to load-acquire/store-release (C11 model in our case)
> allowed us to provide enough happens-before relationship for
> Thread Sanitizer to understand what is happening under the
> hood in liburcu and perform relevant race detection of user
> code.

Good point!

In the kernel, though, KCSAN understands the Linux-kernel memory model,
and so we don't get that sort of false positive.

> As far as the WRITE_ONCE(x, READ_ONCE(x) + 1) pattern
> is concerned, the only valid use-case I can think of is
> split counters or RCU implementations where there is a
> single updater doing the increment, and one or more
> concurrent reader threads that need to snapshot a
> consistent value with READ_ONCE().

It is wrong here. OK, not wrong from a functional viewpoint, but it
is at best very confusing. I am applying patches to fix this. I will
push out a new "dev" branch on -rcu that will make this function appear
as shown below.

So what would you use that pattern for?

One possibility is a per-CPU statistical counter in userspace on a
fastpath, in cases where losing the occasional count is OK. Then learning
your CPU (and possibly being immediately migrated to some other CPU),
READ_ONCE() of the count, increment, and WRITE_ONCE() might (or might not)
make sense.

I suppose the same in the kernel if there was a fastpath so extreme you
could not afford to disable preemption.
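
Something like the following userspace sketch, perhaps, where the stats[]
array is made up and losing the occasional count on migration is the
accepted price:

        #define _GNU_SOURCE
        #include <sched.h>
        #include <stdatomic.h>

        extern _Atomic unsigned long stats[];   /* one slot per possible CPU */

        static inline void stat_inc(void)
        {
                int cpu = sched_getcpu();
                unsigned long v = atomic_load_explicit(&stats[cpu], memory_order_relaxed);

                /* A migration here can lose this count. */
                atomic_store_explicit(&stats[cpu], v + 1, memory_order_relaxed);
        }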

At best, very niche.

Or am I suffering a failure of imagination yet again? ;-)

Thanx, Paul

------------------------------------------------------------------------

static bool
rcu_torture_pipe_update_one(struct rcu_torture *rp)
{
int i;
struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);

if (rtrcp) {
WRITE_ONCE(rp->rtort_chkp, NULL);
smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
}
i = rp->rtort_pipe_count;
if (i > RCU_TORTURE_PIPE_LEN)
i = RCU_TORTURE_PIPE_LEN;
atomic_inc(&rcu_torture_wcount[i]);
WRITE_ONCE(rp->rtort_pipe_count, i + 1);
ASSERT_EXCLUSIVE_WRITER(rp->rtort_pipe_count);
if (i + 1 >= RCU_TORTURE_PIPE_LEN) {
rp->rtort_mbtest = 0;
return true;
}
return false;
}

2024-03-07 05:46:53

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug



On 3/6/2024 10:37 PM, Paul E. McKenney wrote:
> On Wed, Mar 06, 2024 at 10:06:21PM -0500, Mathieu Desnoyers wrote:
>> On 2024-03-06 21:43, Linus Torvalds wrote:
>> [...]
>>>
>>> Honestly, this all makes me think that we'd be *much* better off
>>> showing the real "handoff" with smp_store_release() and
>>> smp_load_acquire().
>>
>> We've done something similar in liburcu (userspace code) to allow
>> Thread Sanitizer to understand the happens-before relationships
>> within the RCU implementations and lock-free data structures.
>>
>> Moving to load-acquire/store-release (C11 model in our case)
>> allowed us to provide enough happens-before relationship for
>> Thread Sanitizer to understand what is happening under the
>> hood in liburcu and perform relevant race detection of user
>> code.
>
> Good point!
>
> In the kernel, though, KCSAN understands the Linux-kernel memory model,
> and so we don't get that sort of false positive.
>
>> As far as the WRITE_ONCE(x, READ_ONCE(x) + 1) pattern
>> is concerned, the only valid use-case I can think of is
>> split counters or RCU implementations where there is a
>> single updater doing the increment, and one or more
>> concurrent reader threads that need to snapshot a
>> consistent value with READ_ONCE().
>
> It is wrong here. OK, not wrong from a functional viewpoint, but it
> is at best very confusing. I am applying patches to fix this. I will
> push out a new "dev" branch on -rcu that will make this function appear
> as shown below.
>
> So what would you use that pattern for?
>
> One possibility is a per-CPU statistical counter in userspace on a
> fastpath, in cases where losing the occasional count is OK. Then learning
> your CPU (and possibly being immediately migrated to some other CPU),
> READ_ONCE() of the count, increment, and WRITE_ONCE() might (or might not)
> make sense.
>
> I suppose the same in the kernel if there was a fastpath so extreme you
> could not afford to disable preemption.
>
> At best, very niche.
>
> Or am I suffering a failure of imagination yet again? ;-)
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> static bool
> rcu_torture_pipe_update_one(struct rcu_torture *rp)
> {
> int i;
> struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);
>
> if (rtrcp) {
> WRITE_ONCE(rp->rtort_chkp, NULL);
> smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
> }
> i = rp->rtort_pipe_count;
> if (i > RCU_TORTURE_PIPE_LEN)
> i = RCU_TORTURE_PIPE_LEN;
> atomic_inc(&rcu_torture_wcount[i]);
> WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> ASSERT_EXCLUSIVE_WRITER(rp->rtort_pipe_count);

I was going to say to add a comment here for the future reader, that update-side
->rtort_pipe_count READ/WRITE are already mutually exclusive, but this ASSERT
already documents it ;-)

Also FWIW I confirmed after staring at the code that indeed only one update-side
access is possible at a time! Thanks,
Reviewed-by: Joel Fernandes (Google) <[email protected]>

- Joel


2024-03-07 13:20:20

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, 6 Mar 2024 12:06:00 -0800
Linus Torvalds <[email protected]> wrote:

> On Wed, 6 Mar 2024 at 11:45, Steven Rostedt <[email protected]> wrote:
> >
> > Here's the back story. I received the following patch:
> >
> > https://lore.kernel.org/all/[email protected]/
> >
> > I didn't like it. My reply was:
> >
> > > - rbwork->wait_index++;
> > > + WRITE_ONCE(rbwork->wait_index, READ_ONCE(rbwork->wait_index) + 1);
> >
> > I mean the above is really ugly. If this is the new thing to do, we need
> > better macros.
> >
> > If anything, just convert it to an atomic_t.
>
> The right thing is definitely to convert it to an atomic_t.

Agreed.

>
> The memory barriers can probably also be turned into atomic ordering,
> although we don't always have all the variants.
>
> But for example, that
>
> /* Make sure to see the new wait index */
> smp_rmb();
> if (wait_index != work->wait_index)
> break;
>
> looks odd, and should probably do an "atomic_read_acquire()" instead
> of a rmb and a (non-atomic and non-READ_ONCE thing).
>
> The first READ_ONCE() should probably also be that atomic_read_acquire() op.
>
> On the writing side, my gut feel is that the
>
> rbwork->wait_index++;
> /* make sure the waiters see the new index */
> smp_wmb();
>
> should be an "atomic_inc_release(&rbwork->wait_index);" but we don't
> actually have that operation. We only have the "release" versions for
> things that return a value.
>
> So it would probably need to be either
>
> atomic_inc(&rbwork->wait_index);
> /* make sure the waiters see the new index */
> smp_wmb();
>
> or
>
> atomic_inc_return_release(&rbwork->wait_index);
>
> or we'd need to add the "basic atomics with ordering semantics" (which
> we aren't going to do unless we end up with a lot more people who want
> them).
>
> I dunno. I didn't look all *that* closely at the code. The above might
> be garbage too. Somebody who actually knows the code should think
> about what ordering they actually were looking for.
>
> (And I note that 'wait_index' is of type 'long' in 'struct
> rb_irq_work', so I guess it should be "atomic_long_t" instead - just
> shows how little attention I paid on the first read-through, which
> should make everybody go "I need to double-check Linus here")

It doesn't need to be long. I'm not even sure why I made it long. Probably
for natural alignment.

The code isn't that complex. You can have a list of waiters waiting for the
ring buffer to fill to various watermarks (in most cases, it's just one waiter).

When a write happens, it looks to see if the smallest watermark is hit,
if so, calls irqwork to wakeup all the waiters.

The waiters will wake up, check to see if a signal is pending or if the
ring buffer has hit the watermark the waiter was waiting for and exit the
wait loop.

What the wait_index does, is just a way to force all waiters out of the wait
loop regardless of the watermark the waiter is waiting for. Before a waiter
goes into the wait loop, it saves the current wait_index. The waker will
increment the wait_index and then call the same irq_work to wake up all the
waiters.

After the wakeup happens, the waiter will test if the new wait_index
matches what it was before it entered the loop, and if it is different, it
falls out of the loop. Then the caller of the ring_buffer_wait() can
re-evaluate if it needs to enter the wait again.

The wait_index loop exit was needed for when the file descriptor of a file
that uses a ring buffer closes and it needs to wake up all the readers of
that file descriptor to notify their tasks that the file closed.

So we can switch the:

rbwork->wait_index++;
smp_wmb();

into just a:

(void)atomic_inc_return_release(&rbwork->wait_index);

and the:

smp_rmb()
if (wait_index != work->wait_index)

into:

if (wait_index != atomic_read_acquire(&rb->wait_index))

I'll write up a patch.

Hmm, I have the same wait_index logic at the higher level doing basically
the same thing (at the close of the file). I'll switch that over too.
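
Roughly, the two sides would then look something like this (untested;
rbwork->wait_index is as above, the rest of the names are placeholders):

        /* waker side (e.g. on close): force all waiters out of their wait loops */
        (void)atomic_inc_return_release(&rbwork->wait_index);
        irq_work_queue(&rbwork->work);

        /* waiter side */
        int wait_index;
        DEFINE_WAIT(wait);

        wait_index = atomic_read(&rbwork->wait_index);
        prepare_to_wait(&rbwork->waiters, &wait, TASK_INTERRUPTIBLE);
        if (!watermark_hit && wait_index == atomic_read_acquire(&rbwork->wait_index))
                schedule();
        finish_wait(&rbwork->waiters, &wait);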

-- Steve

2024-03-07 13:53:07

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On 2024-03-06 22:37, Paul E. McKenney wrote:
> On Wed, Mar 06, 2024 at 10:06:21PM -0500, Mathieu Desnoyers wrote:
[...]
>
>> As far as the WRITE_ONCE(x, READ_ONCE(x) + 1) pattern
>> is concerned, the only valid use-case I can think of is
>> split counters or RCU implementations where there is a
>> single updater doing the increment, and one or more
>> concurrent reader threads that need to snapshot a
>> consistent value with READ_ONCE().
>
[...]
>
> So what would you use that pattern for?
>
> One possibility is a per-CPU statistical counter in userspace on a
> fastpath, in cases where losing the occasional count is OK. Then learning
> your CPU (and possibly being immediately migrated to some other CPU),
> READ_ONCE() of the count, increment, and WRITE_ONCE() might (or might not)
> make sense.
>
> I suppose the same in the kernel if there was a fastpath so extreme you
> could not afford to disable preemption.
>
> At best, very niche.
>
> Or am I suffering a failure of imagination yet again? ;-)

The (niche) use-cases I have in mind are split-counters and RCU
grace period tracking, where precise counters totals are needed
(no lost count).

In the kernel, this could be:

- A per-cpu counter, each counter incremented from thread context with
preemption disabled (single updater per counter), read concurrently by
other threads. WRITE_ONCE/READ_ONCE is useful to make sure there
is no store/load tearing there. Atomics on the update would be stronger
than necessary on the increment fast-path.

- A per-thread counter (e.g. within task_struct), only incremented by the
single thread, read by various threads concurrently.

- A counter which increment happens to be already protected by a lock, read
by various threads without taking the lock. (technically doable, but
I'm not sure I see a relevant use-case for it)

In user-space:

- The "per-cpu" counter would have to use rseq for increments to prevent
inopportune migrations, which needs to be implemented in assembler anyway.
The counter reads would have to use READ_ONCE().

- The per-thread counter (Thread-Local Storage) incremented by a single
thread, read by various threads concurrently, is a good target
for WRITE_ONCE()/READ_ONCE() pairing. This is actually what we do in
various liburcu implementations which track read-side critical sections
per-thread.

- Same as for the kernel, a counter increment protected by a lock which
needs to be read from various threads concurrently without taking
the lock could be a valid use-case, though I fail to see how it is
useful due to lack of imagination on my part. ;-)

I'm possibly missing other use-cases, but those come to mind as not
involving racy counter increments.
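
As a concrete sketch of the per-thread kernel case above (the field is
made up for illustration):

        /* in struct task_struct:  unsigned long my_events; */

        /* updater: only the owning thread ("current") ever writes this field */
        WRITE_ONCE(current->my_events, current->my_events + 1);

        /* lockless reader looking at some other task "t": */
        unsigned long snap = READ_ONCE(t->my_events);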

I agree that use-cases are so niche that we probably do _not_ want to
introduce ADD_SHARED() for that purpose in a common header file,
because I suspect that it would then become misused in plenty of
scenarios where the updates are actually racy and would be better
served by atomics or local-atomics.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


2024-03-07 16:10:42

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Thu, 7 Mar 2024 08:20:59 -0500
Steven Rostedt <[email protected]> wrote:

> When a write happens, it looks to see if the smallest watermark is hit,
> if so, calls irqwork to wakeup all the waiters.
>
> The waiters will wake up, check to see if a signal is pending or if the
> ring buffer has hit the watermark the waiter was waiting for and exit the
> wait loop.
>
> What the wait_index does, is just a way to force all waiters out of the wait
> loop regardless of the watermark the waiter is waiting for. Before a waiter
> goes into the wait loop, it saves the current wait_index. The waker will
> increment the wait_index and then call the same irq_work to wake up all the
> waiters.
>
> After the wakeup happens, the waiter will test if the new wait_index
> matches what it was before it entered the loop, and if it is different, it
> falls out of the loop. Then the caller of the ring_buffer_wait() can
> re-evaluate if it needs to enter the wait again.
>
> The wait_index loop exit was needed for when the file descriptor of a file
> that uses a ring buffer closes and it needs to wake up all the readers of
> that file descriptor to notify their tasks that the file closed.
>
> So we can switch the:
>
> rbwork->wait_index++;
> smp_wmb();
>
> into just a:
>
> (void)atomic_inc_return_release(&rbwork->wait_index);
>
> and the:
>
> smp_rmb()
> if (wait_index != work->wait_index)
>
> into:
>
> if (wait_index != atomic_read_acquire(&rb->wait_index))
>
> I'll write up a patch.
>
> Hmm, I have the same wait_index logic at the higher level doing basically
> the same thing (at the close of the file). I'll switch that over too.

Discussing this with Mathieu on IRC, we found two bugs with the current
implementation. One was a stupid bug with an easy fix, and the other is
actually a design flaw.

The first bug was the (wait_index != work->wait_index) check was done
*after* the schedule() call and not before it.

The second more fundamental bug is that there's still a race between the
first read of wait_index and the call to prepare_to_wait().

The ring_buffer code doesn't have enough context to know whether to loop or
not. If a file is being closed when another thread is just entering this
code, it could miss the wakeup.

As the callers of ring_buffer_wait() also do a loop, it's redundant to have
ring_buffer_wait() do a loop. It should just do a single wait, and then
exit and let the callers decide if it should loop again.

This will get rid of the need for the rbwork->wait_index and simplifies the
code.

Working on that patch now.

-- Steve

2024-03-07 19:05:38

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Thu, Mar 07, 2024 at 12:44:39AM -0500, Joel Fernandes wrote:
>
>
> On 3/6/2024 10:37 PM, Paul E. McKenney wrote:
> > On Wed, Mar 06, 2024 at 10:06:21PM -0500, Mathieu Desnoyers wrote:
> >> On 2024-03-06 21:43, Linus Torvalds wrote:
> >> [...]
> >>>
> >>> Honestly, this all makes me think that we'd be *much* better off
> >>> showing the real "handoff" with smp_store_release() and
> >>> smp_load_acquire().
> >>
> >> We've done something similar in liburcu (userspace code) to allow
> >> Thread Sanitizer to understand the happens-before relationships
> >> within the RCU implementations and lock-free data structures.
> >>
> >> Moving to load-acquire/store-release (C11 model in our case)
> >> allowed us to provide enough happens-before relationship for
> >> Thread Sanitizer to understand what is happening under the
> >> hood in liburcu and perform relevant race detection of user
> >> code.
> >
> > Good point!
> >
> > In the kernel, though, KCSAN understands the Linux-kernel memory model,
> > and so we don't get that sort of false positive.
> >
> >> As far as the WRITE_ONCE(x, READ_ONCE(x) + 1) pattern
> >> is concerned, the only valid use-case I can think of is
> >> split counters or RCU implementations where there is a
> >> single updater doing the increment, and one or more
> >> concurrent reader threads that need to snapshot a
> >> consistent value with READ_ONCE().
> >
> > It is wrong here. OK, not wrong from a functional viewpoint, but it
> > is at best very confusing. I am applying patches to fix this. I will
> > push out a new "dev" branch on -rcu that will make this function appear
> > as shown below.
> >
> > So what would you use that pattern for?
> >
> > One possibility is a per-CPU statistical counter in userspace on a
> > fastpath, in cases where losing the occasional count is OK. Then learning
> > your CPU (and possibly being immediately migrated to some other CPU),
> > READ_ONCE() of the count, increment, and WRITE_ONCE() might (or might not)
> > make sense.
> >
> > I suppose the same in the kernel if there was a fastpath so extreme you
> > could not afford to disable preemption.
> >
> > At best, very niche.
> >
> > Or am I suffering a failure of imagination yet again? ;-)
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > static bool
> > rcu_torture_pipe_update_one(struct rcu_torture *rp)
> > {
> > int i;
> > struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);
> >
> > if (rtrcp) {
> > WRITE_ONCE(rp->rtort_chkp, NULL);
> > smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
> > }
> > i = rp->rtort_pipe_count;
> > if (i > RCU_TORTURE_PIPE_LEN)
> > i = RCU_TORTURE_PIPE_LEN;
> > atomic_inc(&rcu_torture_wcount[i]);
> > WRITE_ONCE(rp->rtort_pipe_count, i + 1);
> > ASSERT_EXCLUSIVE_WRITER(rp->rtort_pipe_count);
>
> I was going to say to add a comment here for the future reader, that update-side
> ->rtort_pipe_count READ/WRITE are already mutually exclusive, but this ASSERT
> already documents it ;-)

Plus KCSAN is way better at repeatedly inspecting code for this sort of
issue than I am. ;-)

> Also FWIW I confirmed after staring at the code that indeed only one update-side
> access is possible at a time! Thanks,
> Reviewed-by: Joel Fernandes (Google) <[email protected]>

Thank you very much! I will apply your Reviewed-by to these commits
on my next rebase:

28455c73b620 ("rcutorture: ASSERT_EXCLUSIVE_WRITER() for ->rtort_pipe_count updates")
b0b99e7db12e ("rcutorture: Remove extraneous rcu_torture_pipe_update_one() READ_ONCE()")

Thanx, Paul

2024-03-07 19:48:53

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Thu, Mar 07, 2024 at 08:53:05AM -0500, Mathieu Desnoyers wrote:
> On 2024-03-06 22:37, Paul E. McKenney wrote:
> > On Wed, Mar 06, 2024 at 10:06:21PM -0500, Mathieu Desnoyers wrote:
> [...]
> >
> > > As far as the WRITE_ONCE(x, READ_ONCE(x) + 1) pattern
> > > is concerned, the only valid use-case I can think of is
> > > split counters or RCU implementations where there is a
> > > single updater doing the increment, and one or more
> > > concurrent reader threads that need to snapshot a
> > > consistent value with READ_ONCE().
> >
> [...]
> >
> > So what would you use that pattern for?
> >
> > One possibility is a per-CPU statistical counter in userspace on a
> > fastpath, in cases where losing the occasional count is OK. Then learning
> > your CPU (and possibly being immediately migrated to some other CPU),
> > READ_ONCE() of the count, increment, and WRITE_ONCE() might (or might not)
> > make sense.
> >
> > I suppose the same in the kernel if there was a fastpath so extreme you
> > could not afford to disable preemption.
> >
> > At best, very niche.
> >
> > Or am I suffering a failure of imagination yet again? ;-)
>
> The (niche) use-cases I have in mind are split-counters and RCU
> grace period tracking, where precise counters totals are needed
> (no lost count).
>
> In the kernel, this could be:

Thank you for looking into this!

> - A per-cpu counter, each counter incremented from thread context with
> preemption disabled (single updater per counter), read concurrently by
> other threads. WRITE_ONCE/READ_ONCE is useful to make sure there
> is no store/load tearing there. Atomics on the update would be stronger
> than necessary on the increment fast-path.

But if preemption is disabled, the updater can read the value without
READ_ONCE() without risk of concurrent update. Or are you concerned about
interrupt handlers? This would have to be a read from the interrupt
handler, given that an update from the interrupt handler could result
in a lost count.

> - A per-thread counter (e.g. within task_struct), only incremented by the
> single thread, read by various threads concurrently.

Ditto.

> - A counter which increment happens to be already protected by a lock, read
> by various threads without taking the lock. (technically doable, but
> I'm not sure I see a relevant use-case for it)

In that case, the lock would exclude concurrent updates, so the lock
holder would not need READ_ONCE(), correct?

> In user-space:
>
> - The "per-cpu" counter would have to use rseq for increments to prevent
> inopportune migrations, which needs to be implemented in assembler anyway.
> The counter reads would have to use READ_ONCE().

Fair enough!

> - The per-thread counter (Thread-Local Storage) incremented by a single
> thread, read by various threads concurrently, is a good target
> for WRITE_ONCE()/READ_ONCE() pairing. This is actually what we do in
> various liburcu implementations which track read-side critical sections
> per-thread.

Agreed, but do any of these use WRITE_ONCE(x, READ_ONCE(x) + 1) or
similar?

> - Same as for the kernel, a counter increment protected by a lock which
> needs to be read from various threads concurrently without taking
> the lock could be a valid use-case, though I fail to see how it is
> useful due to lack of imagination on my part. ;-)

In RCU, we have "WRITE_ONCE(*sp, *sp + 1)" for this use case, though
here we have the WRITE_ONCE() but not the READ_ONCE() because we hold
the lock excluding any other updates.
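
That is, something like this sketch (the lock name is made up):

        /* update side, lock held, so no other updaters: plain read of *sp is fine */
        spin_lock(&sp_lock);
        WRITE_ONCE(*sp, *sp + 1);
        spin_unlock(&sp_lock);

        /* lockless read side elsewhere */
        cur = READ_ONCE(*sp);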

> I'm possibly missing other use-cases, but those come to mind as not
> involving racy counter increments.
>
> I agree that use-cases are so niche that we probably do _not_ want to
> introduce ADD_SHARED() for that purpose in a common header file,
> because I suspect that it would then become misused in plenty of
> scenarios where the updates are actually racy and would be better
> served by atomics or local-atomics.

I agree that unless or until we have a reasonable number of use cases, we
should open-code it. With a big fat comment explaining how it works. ;-)

Thanx, Paul

2024-03-07 19:54:51

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On 2024-03-07 14:47, Paul E. McKenney wrote:
> On Thu, Mar 07, 2024 at 08:53:05AM -0500, Mathieu Desnoyers wrote:
>> On 2024-03-06 22:37, Paul E. McKenney wrote:
>>> On Wed, Mar 06, 2024 at 10:06:21PM -0500, Mathieu Desnoyers wrote:
>> [...]
>>>
>>>> As far as the WRITE_ONCE(x, READ_ONCE(x) + 1) pattern
>>>> is concerned, the only valid use-case I can think of is
>>>> split counters or RCU implementations where there is a
>>>> single updater doing the increment, and one or more
>>>> concurrent reader threads that need to snapshot a
>>>> consistent value with READ_ONCE().
>>>
>> [...]
>>>
>>> So what would you use that pattern for?
>>>
>>> One possibility is a per-CPU statistical counter in userspace on a
>>> fastpath, in cases where losing the occasional count is OK. Then learning
>>> your CPU (and possibly being immediately migrated to some other CPU),
>>> READ_ONCE() of the count, increment, and WRITE_ONCE() might (or might not)
>>> make sense.
>>>
>>> I suppose the same in the kernel if there was a fastpath so extreme you
>>> could not afford to disable preemption.
>>>
>>> At best, very niche.
>>>
>>> Or am I suffering a failure of imagination yet again? ;-)
>>
>> The (niche) use-cases I have in mind are split-counters and RCU
>> grace period tracking, where precise counters totals are needed
>> (no lost count).
>>
>> In the kernel, this could be:
>
> Thank you for looking into this!
>
>> - A per-cpu counter, each counter incremented from thread context with
>> preemption disabled (single updater per counter), read concurrently by
>> other threads. WRITE_ONCE/READ_ONCE is useful to make sure there
>> is no store/load tearing there. Atomics on the update would be stronger
>> than necessary on the increment fast-path.
>
> But if preemption is disabled, the updater can read the value without
> READ_ONCE() without risk of concurrent update. Or are you concerned about
> interrupt handlers? This would have to be a read from the interrupt
> handler, given that an updated from the interrupt handler could result
> in a lost count.

You are correct that the updater doesn't need READ_ONCE() there. It would
however require a WRITE_ONCE() to match the READ_ONCE() from concurrent
reader threads.

>
>> - A per-thread counter (e.g. within task_struct), only incremented by the
>> single thread, read by various threads concurrently.
>
> Ditto.

Right, only WRITE_ONCE() on the single updater, READ_ONCE() on readers.

>
>> - A counter which increment happens to be already protected by a lock, read
>> by various threads without taking the lock. (technically doable, but
>> I'm not sure I see a relevant use-case for it)
>
> In that case, the lock would exclude concurrent updates, so the lock
> holder would not need READ_ONCE(), correct?

Correct.

>
>> In user-space:
>>
>> - The "per-cpu" counter would have to use rseq for increments to prevent
>> inopportune migrations, which needs to be implemented in assembler anyway.
>> The counter reads would have to use READ_ONCE().
>
> Fair enough!
>
>> - The per-thread counter (Thread-Local Storage) incremented by a single
>> thread, read by various threads concurrently, is a good target
>> for WRITE_ONCE()/READ_ONCE() pairing. This is actually what we do in
>> various liburcu implementations which track read-side critical sections
>> per-thread.
>
> Agreed, but do any of these use WRITE_ONCE(x, READ_ONCE(x) + 1) or
> similar?

Not quite, I recall it's more like WRITE_ONCE(x, READ_ONCE(y)) or such,
so we can grab the value of the current gp counter and store it into a
TLS variable.


>
>> - Same as for the kernel, a counter increment protected by a lock which
>> needs to be read from various threads concurrently without taking
>> the lock could be a valid use-case, though I fail to see how it is
>> useful due to lack of imagination on my part. ;-)
>
> In RCU, we have "WRITE_ONCE(*sp, *sp + 1)" for this use case, though
> here we have the WRITE_ONCE() but not the READ_ONCE() because we hold
> the lock excluding any other updates.

You are right, the READ_ONCE() is not needed in this case for the
updater, only for the concurrent readers.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


2024-03-07 20:14:55

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Thu, 7 Mar 2024 at 11:47, Paul E. McKenney <[email protected]> wrote:
>
> > - The per-thread counter (Thread-Local Storage) incremented by a single
> > thread, read by various threads concurrently, is a good target
> > for WRITE_ONCE()/READ_ONCE() pairing. This is actually what we do in
> > various liburcu implementations which track read-side critical sections
> > per-thread.
>
> Agreed, but do any of these use WRITE_ONCE(x, READ_ONCE(x) + 1) or
> similar?

Absolutely not.

The READ_ONCE->WRITE_ONCE pattern is almost certainly a bug.

The valid reason to have a WRITE_ONCE() is that there's a _later_
READ_ONCE() on another CPU.

So WRITE_ONCE->READ_ONCE (across threads) is very valid. But
READ_ONCE->WRITE_ONCE (inside a thread) simply is not a valid
operation.

We do have things like "local_t", which allows for non-smp-safe local
thread atomic accesses, but they explicitly are *NOT* about some kind
of READ_ONCE -> WRITE_ONCE sequence that by definition cannot be
atomic unless you disable interrupts and disable preemption (at which
point they become pointless and only generate worse code).

But the point of "local_t" is that you can do things that aresafe if
there is no SMP issues. They are kind of an extension of the
percpu_add() kind of operations.
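
So for the local-to-this-CPU case you'd write something like (sketch,
names made up):

        static DEFINE_PER_CPU(local_t, my_stat);

        /* atomic with respect to this CPU (including its interrupts), not SMP-atomic */
        local_inc(this_cpu_ptr(&my_stat));

        /* or just the per-CPU ops for plain counters: */
        static DEFINE_PER_CPU(unsigned long, my_other_stat);

        this_cpu_inc(my_other_stat);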

In fact, I think it might be interesting to catch those
READ_ONCE->WRITE_ONCE chains (perhaps with coccinelle?) because they
are a sign of bugs.

Now, there's certainly some possibility of "I really don't care about
some stats, I'm willing to do non-smp-safe and non-thread safe
operations if they are faster". So I'm not saying a
READ_ONCE->WRITE_ONCE data dependency is _always_ a bug, but I do
think it's a pattern that is very very close to being one.

Linus

2024-03-07 20:57:31

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Thu, Mar 07, 2024 at 12:00:44PM -0800, Linus Torvalds wrote:
> On Thu, 7 Mar 2024 at 11:47, Paul E. McKenney <[email protected]> wrote:
> >
> > > - The per-thread counter (Thread-Local Storage) incremented by a single
> > > thread, read by various threads concurrently, is a good target
> > > for WRITE_ONCE()/READ_ONCE() pairing. This is actually what we do in
> > > various liburcu implementations which track read-side critical sections
> > > per-thread.
> >
> > Agreed, but do any of these use WRITE_ONCE(x, READ_ONCE(x) + 1) or
> > similar?
>
> Absolutely not.
>
> The READ_ONCE->WRITE_ONCE pattern is almost certainly a bug.
>
> The valid reason to have a WRITE_ONCE() is that there's a _later_
> READ_ONCE() on another CPU.
>
> So WRITE_ONCE->READ_ONCE (across threads) is very valid. But
> READ_ONCE->WRITE_ONCE (inside a thread) simply is not a valid
> operation.
>
> We do have things like "local_t", which allows for non-smp-safe local
> thread atomic accesses, but they explicitly are *NOT* about some kind
> of READ_ONCE -> WRITE_ONCE sequence that by definition cannot be
> atomic unless you disable interrupts and disable preemption (at which
> point they become pointless and only generate worse code).
>
> But the point of "local_t" is that you can do things that aresafe if
> there is no SMP issues. They are kind of an extension of the
> percpu_add() kind of operations.
>
> In fact, I think it might be interesting to catch those
> READ_ONCE->WRITE_ONCE chains (perhaps with coccinelle?) because they
> are a sign of bugs.

Good point, adding Julia Lawall on CC.

A really stupid version of this pattern (WRITE_ONCE.*READ_ONCE) found
four more in RCU, so I will take a look at those and either fix or add
comments as appropriate.

> Now, there's certainly some possibility of "I really don't care about
> some stats, I'm willing to do non-smp-safe and non-thread safe
> operations if they are faster". So I'm not saying a
> READ_ONCE->WRITE_ONCE data dependency is _always_ a bug, but I do
> think it's a pattern that is very very close to being one.

I agree that valid use cases should be quite rare.

Thanx, Paul

2024-03-07 21:42:26

by Julia Lawall

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug



On Thu, 7 Mar 2024, Paul E. McKenney wrote:

> On Thu, Mar 07, 2024 at 12:00:44PM -0800, Linus Torvalds wrote:
> > On Thu, 7 Mar 2024 at 11:47, Paul E. McKenney <[email protected]> wrote:
> > >
> > > > - The per-thread counter (Thread-Local Storage) incremented by a single
> > > > thread, read by various threads concurrently, is a good target
> > > > for WRITE_ONCE()/READ_ONCE() pairing. This is actually what we do in
> > > > various liburcu implementations which track read-side critical sections
> > > > per-thread.
> > >
> > > Agreed, but do any of these use WRITE_ONCE(x, READ_ONCE(x) + 1) or
> > > similar?
> >
> > Absolutely not.
> >
> > The READ_ONCE->WRITE_ONCE pattern is almost certainly a bug.
> >
> > The valid reason to have a WRITE_ONCE() is that there's a _later_
> > READ_ONCE() on another CPU.
> >
> > So WRITE_ONCE->READ_ONCE (across threads) is very valid. But
> > READ_ONCE->WRITE_ONCE (inside a thread) simply is not a valid
> > operation.
> >
> > We do have things like "local_t", which allows for non-smp-safe local
> > thread atomic accesses, but they explicitly are *NOT* about some kind
> > of READ_ONCE -> WRITE_ONCE sequence that by definition cannot be
> > atomic unless you disable interrupts and disable preemption (at which
> > point they become pointless and only generate worse code).
> >
> > But the point of "local_t" is that you can do things that are safe if
> > there are no SMP issues. They are kind of an extension of the
> > percpu_add() kind of operations.
> >
> > In fact, I think it might be interesting to catch those
> > READ_ONCE->WRITE_ONCE chains (perhaps with coccinelle?) because they
> > are a sign of bugs.
>
> Good point, adding Julia Lawall on CC.

I tried the following:

@@
expression x;
@@

*WRITE_ONCE(x,<+...READ_ONCE(x)...+>)
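
run over the tree with something like (assuming the rule is saved as
rmw.cocci):

	spatch --sp-file rmw.cocci --dir ~/linux --very-quiet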

This gave a number of results, shown below. Let me know if some of them
are undesirable.

I also tried:

@@
expression x,v;
@@

*v = READ_ONCE(x)
... when != v
*WRITE_ONCE(x,<+...v...+>)

I found the results somewhat less compelling. Maybe the following are
interesting (note that - means this line is interesting, not a suggestion
to remove it):

479 files match
diff -u -p /home/julia/linux/net/netfilter/nf_tables_api.c /tmp/nothing/net/netfilter/nf_tables_api.c
--- /home/julia/linux/net/netfilter/nf_tables_api.c
+++ /tmp/nothing/net/netfilter/nf_tables_api.c
@@ -10026,8 +10026,6 @@ static unsigned int nft_gc_seq_begin(str
unsigned int gc_seq;

/* Bump gc counter, it becomes odd, this is the busy mark. */
- gc_seq = READ_ONCE(nft_net->gc_seq);
- WRITE_ONCE(nft_net->gc_seq, ++gc_seq);

return gc_seq;
}
diff -u -p /home/julia/linux/fs/xfs/xfs_icache.c /tmp/nothing/fs/xfs/xfs_icache.c
--- /home/julia/linux/fs/xfs/xfs_icache.c
+++ /tmp/nothing/fs/xfs/xfs_icache.c
@@ -2076,8 +2076,6 @@ xfs_inodegc_queue(
cpu_nr = get_cpu();
gc = this_cpu_ptr(mp->m_inodegc);
llist_add(&ip->i_gclist, &gc->list);
- items = READ_ONCE(gc->items);
- WRITE_ONCE(gc->items, items + 1);
shrinker_hits = READ_ONCE(gc->shrinker_hits);

/*
@@ -2199,9 +2197,7 @@ xfs_inodegc_shrinker_scan(
for_each_cpu(cpu, &mp->m_inodegc_cpumask) {
gc = per_cpu_ptr(mp->m_inodegc, cpu);
if (!llist_empty(&gc->list)) {
- unsigned int h = READ_ONCE(gc->shrinker_hits);

- WRITE_ONCE(gc->shrinker_hits, h + 1);
mod_delayed_work_on(cpu, mp->m_inodegc_wq, &gc->work, 0);
no_items = false;
}

In other cases, more work is done between the read and the write. I can send them if they are of interest.

julia

diff -u -p /home/julia/linux/kernel/sched/fair.c /tmp/nothing/kernel/sched/fair.c
--- /home/julia/linux/kernel/sched/fair.c
+++ /tmp/nothing/kernel/sched/fair.c
@@ -3153,7 +3153,6 @@ static void reset_ptenuma_scan(struct ta
* statistical sampling. Use READ_ONCE/WRITE_ONCE, which are not
* expensive, to avoid any form of compiler optimizations:
*/
- WRITE_ONCE(p->mm->numa_scan_seq, READ_ONCE(p->mm->numa_scan_seq) + 1);
p->mm->numa_scan_offset = 0;
}

diff -u -p /home/julia/linux/net/ipv4/inet_hashtables.c /tmp/nothing/net/ipv4/inet_hashtables.c
--- /home/julia/linux/net/ipv4/inet_hashtables.c
+++ /tmp/nothing/net/ipv4/inet_hashtables.c
@@ -1109,7 +1109,6 @@ ok:
* it may be inexistent.
*/
i = max_t(int, i, get_random_u32_below(8) * step);
- WRITE_ONCE(table_perturb[index], READ_ONCE(table_perturb[index]) + i + step);

/* Head lock still held and bh's disabled */
inet_bind_hash(sk, tb, tb2, port);
diff -u -p /home/julia/linux/kernel/rcu/tree.c /tmp/nothing/kernel/rcu/tree.c
--- /home/julia/linux/kernel/rcu/tree.c
+++ /tmp/nothing/kernel/rcu/tree.c
@@ -1620,8 +1620,6 @@ static void rcu_gp_fqs(bool first_time)
/* Clear flag to prevent immediate re-entry. */
if (READ_ONCE(rcu_state.gp_flags) & RCU_GP_FLAG_FQS) {
raw_spin_lock_irq_rcu_node(rnp);
- WRITE_ONCE(rcu_state.gp_flags,
- READ_ONCE(rcu_state.gp_flags) & ~RCU_GP_FLAG_FQS);
raw_spin_unlock_irq_rcu_node(rnp);
}
}
@@ -1882,8 +1880,6 @@ static void rcu_report_qs_rsp(unsigned l
{
raw_lockdep_assert_held_rcu_node(rcu_get_root());
WARN_ON_ONCE(!rcu_gp_in_progress());
- WRITE_ONCE(rcu_state.gp_flags,
- READ_ONCE(rcu_state.gp_flags) | RCU_GP_FLAG_FQS);
raw_spin_unlock_irqrestore_rcu_node(rcu_get_root(), flags);
rcu_gp_kthread_wake();
}
@@ -2392,8 +2388,6 @@ void rcu_force_quiescent_state(void)
raw_spin_unlock_irqrestore_rcu_node(rnp_old, flags);
return; /* Someone beat us to it. */
}
- WRITE_ONCE(rcu_state.gp_flags,
- READ_ONCE(rcu_state.gp_flags) | RCU_GP_FLAG_FQS);
raw_spin_unlock_irqrestore_rcu_node(rnp_old, flags);
rcu_gp_kthread_wake();
}
diff -u -p /home/julia/linux/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c /tmp/nothing/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
--- /home/julia/linux/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
+++ /tmp/nothing/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
@@ -80,8 +80,6 @@ static int cn23xx_vf_reset_io_queues(str
q_no);
return -1;
}
- WRITE_ONCE(reg_val, READ_ONCE(reg_val) &
- ~CN23XX_PKT_INPUT_CTL_RST);
octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no),
READ_ONCE(reg_val));

diff -u -p /home/julia/linux/kernel/rcu/srcutree.c /tmp/nothing/kernel/rcu/srcutree.c
--- /home/julia/linux/kernel/rcu/srcutree.c
+++ /tmp/nothing/kernel/rcu/srcutree.c
@@ -628,7 +628,6 @@ static unsigned long srcu_get_delay(stru
if (time_after(j, gpstart))
jbase += j - gpstart;
if (!jbase) {
- WRITE_ONCE(sup->srcu_n_exp_nodelay, READ_ONCE(sup->srcu_n_exp_nodelay) + 1);
if (READ_ONCE(sup->srcu_n_exp_nodelay) > srcu_max_nodelay_phase)
jbase = 1;
}
@@ -1799,7 +1798,6 @@ static void process_srcu(struct work_str
} else {
j = jiffies;
if (READ_ONCE(sup->reschedule_jiffies) == j) {
- WRITE_ONCE(sup->reschedule_count, READ_ONCE(sup->reschedule_count) + 1);
if (READ_ONCE(sup->reschedule_count) > srcu_max_nodelay)
curdelay = 1;
} else {
diff -u -p /home/julia/linux/kernel/kcsan/kcsan_test.c /tmp/nothing/kernel/kcsan/kcsan_test.c
--- /home/julia/linux/kernel/kcsan/kcsan_test.c
+++ /tmp/nothing/kernel/kcsan/kcsan_test.c
@@ -381,7 +381,6 @@ static noinline void test_kernel_change_
test_var ^= TEST_CHANGE_BITS;
kcsan_nestable_atomic_end();
} else
- WRITE_ONCE(test_var, READ_ONCE(test_var) ^ TEST_CHANGE_BITS);
}

static noinline void test_kernel_assert_bits_change(void)
diff -u -p /home/julia/linux/arch/s390/kernel/idle.c /tmp/nothing/arch/s390/kernel/idle.c
--- /home/julia/linux/arch/s390/kernel/idle.c
+++ /tmp/nothing/arch/s390/kernel/idle.c
@@ -43,8 +43,6 @@ void account_idle_time_irq(void)
S390_lowcore.last_update_timer = S390_lowcore.sys_enter_timer;

/* Account time spent with enabled wait psw loaded as idle time. */
- WRITE_ONCE(idle->idle_time, READ_ONCE(idle->idle_time) + idle_time);
- WRITE_ONCE(idle->idle_count, READ_ONCE(idle->idle_count) + 1);
account_idle_time(cputime_to_nsecs(idle_time));
}

diff -u -p /home/julia/linux/mm/mmap.c /tmp/nothing/mm/mmap.c
--- /home/julia/linux/mm/mmap.c
+++ /tmp/nothing/mm/mmap.c
@@ -3476,7 +3476,6 @@ bool may_expand_vm(struct mm_struct *mm,

void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages)
{
- WRITE_ONCE(mm->total_vm, READ_ONCE(mm->total_vm)+npages);

if (is_exec_mapping(flags))
mm->exec_vm += npages;
diff -u -p /home/julia/linux/fs/xfs/libxfs/xfs_iext_tree.c /tmp/nothing/fs/xfs/libxfs/xfs_iext_tree.c
--- /home/julia/linux/fs/xfs/libxfs/xfs_iext_tree.c
+++ /tmp/nothing/fs/xfs/libxfs/xfs_iext_tree.c
@@ -618,7 +618,6 @@ xfs_iext_realloc_root(
*/
static inline void xfs_iext_inc_seq(struct xfs_ifork *ifp)
{
- WRITE_ONCE(ifp->if_seq, READ_ONCE(ifp->if_seq) + 1);
}

void
diff -u -p /home/julia/linux/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c /tmp/nothing/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
--- /home/julia/linux/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
+++ /tmp/nothing/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
@@ -379,8 +379,6 @@ static int cn23xx_reset_io_queues(struct
q_no);
return -1;
}
- WRITE_ONCE(reg_val, READ_ONCE(reg_val) &
- ~CN23XX_PKT_INPUT_CTL_RST);
octeon_write_csr64(oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no),
READ_ONCE(reg_val));

@@ -879,9 +877,6 @@ static void cn23xx_disable_io_queues(str
/* start the Reset for a particular ring */
WRITE_ONCE(d64, octeon_read_csr64(
oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no)));
- WRITE_ONCE(d64, READ_ONCE(d64) &
- (~(CN23XX_PKT_INPUT_CTL_RING_ENB)));
- WRITE_ONCE(d64, READ_ONCE(d64) | CN23XX_PKT_INPUT_CTL_RST);
octeon_write_csr64(oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no),
READ_ONCE(d64));

diff -u -p /home/julia/linux/mm/kfence/kfence_test.c /tmp/nothing/mm/kfence/kfence_test.c
--- /home/julia/linux/mm/kfence/kfence_test.c
+++ /tmp/nothing/mm/kfence/kfence_test.c
@@ -501,7 +501,6 @@ static void test_kmalloc_aligned_oob_wri
* fault immediately after it.
*/
expect.addr = buf + size;
- WRITE_ONCE(*expect.addr, READ_ONCE(*expect.addr) + 1);
KUNIT_EXPECT_FALSE(test, report_available());
test_free(buf);
KUNIT_EXPECT_TRUE(test, report_matches(&expect));
diff -u -p /home/julia/linux/io_uring/io_uring.c /tmp/nothing/io_uring/io_uring.c
--- /home/julia/linux/io_uring/io_uring.c
+++ /tmp/nothing/io_uring/io_uring.c
@@ -363,7 +363,6 @@ static void io_account_cq_overflow(struc
{
struct io_rings *r = ctx->rings;

- WRITE_ONCE(r->cq_overflow, READ_ONCE(r->cq_overflow) + 1);
ctx->cq_extra--;
}

@@ -2403,8 +2402,6 @@ static bool io_get_sqe(struct io_ring_ct
spin_lock(&ctx->completion_lock);
ctx->cq_extra--;
spin_unlock(&ctx->completion_lock);
- WRITE_ONCE(ctx->rings->sq_dropped,
- READ_ONCE(ctx->rings->sq_dropped) + 1);
return false;
}
}
diff -u -p /home/julia/linux/security/apparmor/apparmorfs.c /tmp/nothing/security/apparmor/apparmorfs.c
--- /home/julia/linux/security/apparmor/apparmorfs.c
+++ /tmp/nothing/security/apparmor/apparmorfs.c
@@ -596,7 +596,6 @@ static __poll_t ns_revision_poll(struct

void __aa_bump_ns_revision(struct aa_ns *ns)
{
- WRITE_ONCE(ns->revision, READ_ONCE(ns->revision) + 1);
wake_up_interruptible(&ns->wait);
}

diff -u -p /home/julia/linux/arch/riscv/kvm/vmid.c /tmp/nothing/arch/riscv/kvm/vmid.c
--- /home/julia/linux/arch/riscv/kvm/vmid.c
+++ /tmp/nothing/arch/riscv/kvm/vmid.c
@@ -90,7 +90,6 @@ void kvm_riscv_gstage_vmid_update(struct

/* First user of a new VMID version? */
if (unlikely(vmid_next == 0)) {
- WRITE_ONCE(vmid_version, READ_ONCE(vmid_version) + 1);
vmid_next = 1;

/*

2024-03-07 22:13:44

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Thu, 7 Mar 2024 at 13:40, Julia Lawall <[email protected]> wrote:
>
> I tried the following:
>
> @@
> expression x;
> @@
>
> *WRITE_ONCE(x,<+...READ_ONCE(x)...+>)
>
> This gave a number of results, shown below. Let me know if some of them
> are undesirable.

Well, all the ones you list do look like garbage.

That said, quite often the garbage does seem to be "we don't actually
care about the result". Several of them look like statistics.

Some of them look outright nasty, though:

> --- /home/julia/linux/net/netfilter/nf_tables_api.c
> +++ /tmp/nothing/net/netfilter/nf_tables_api.c
> @@ -10026,8 +10026,6 @@ static unsigned int nft_gc_seq_begin(str
> unsigned int gc_seq;
>
> /* Bump gc counter, it becomes odd, this is the busy mark. */
> - gc_seq = READ_ONCE(nft_net->gc_seq);
> - WRITE_ONCE(nft_net->gc_seq, ++gc_seq);

The above is garbage code, and the comment implies that it is garbage
code that _should_ be reliable.

> diff -u -p /home/julia/linux/fs/xfs/xfs_icache.c /tmp/nothing/fs/xfs/xfs_icache.c
> --- /home/julia/linux/fs/xfs/xfs_icache.c
> +++ /tmp/nothing/fs/xfs/xfs_icache.c
> @@ -2076,8 +2076,6 @@ xfs_inodegc_queue(
> cpu_nr = get_cpu();
> gc = this_cpu_ptr(mp->m_inodegc);
> llist_add(&ip->i_gclist, &gc->list);
> - items = READ_ONCE(gc->items);
> - WRITE_ONCE(gc->items, items + 1);

In contrast, this is also garbage code, but the only user of it seems
to be a heuristic, so if 'items' is off by one (or by a hundred), it
probably doesn't matter.

The xfs code is basically using that 'items' count to decide if it
really wants to do GC or not.

This is actually a case where having a "UNSAFE_INCREMENTISH()" macro
might make sense.

That said, this is also a case where using a "local_t" and using
"local_add_return()" might be a better option. It falls back on true
atomics, but at least on x86 you probably get *better* code generation
for the "incrementish" operation than you get with READ_ONCE ->
WRITE_ONCE.
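
Roughly (hypothetical type and helper names, and assuming all increments
really do happen on the owning CPU):

	#include <asm/local.h>

	struct foo_gc {
		local_t	items;		/* approximate, per-cpu */
	};

	static void foo_gc_queue(struct foo_gc *gc)
	{
		/*
		 * Single instruction on x86, no LOCK prefix, safe against
		 * interrupts on this CPU.  Remote CPUs that want a peek
		 * use local_read() and accept an approximate answer.
		 */
		if (local_add_return(1, &gc->items) > FOO_GC_THRESHOLD)
			foo_gc_kick(gc);
	}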


> diff -u -p /home/julia/linux/kernel/rcu/tree.c /tmp/nothing/kernel/rcu/tree.c
> --- /home/julia/linux/kernel/rcu/tree.c
> +++ /tmp/nothing/kernel/rcu/tree.c
> @@ -1620,8 +1620,6 @@ static void rcu_gp_fqs(bool first_time)
> /* Clear flag to prevent immediate re-entry. */
> if (READ_ONCE(rcu_state.gp_flags) & RCU_GP_FLAG_FQS) {
> raw_spin_lock_irq_rcu_node(rnp);
> - WRITE_ONCE(rcu_state.gp_flags,
> - READ_ONCE(rcu_state.gp_flags) & ~RCU_GP_FLAG_FQS);
> raw_spin_unlock_irq_rcu_node(rnp);

This smells bad to me. The code is holding a lock, but apparently not
one that protects gp_flags.

And that READ_ONCE->WRITE_ONCE sequence can corrupt all the other flags.

Maybe it's fine for some reason (that reason being either that the
ONCE operations aren't actually needed at all, or because nobody
*really* cares about the flags), but it smells.

> @@ -1882,8 +1880,6 @@ static void rcu_report_qs_rsp(unsigned l
> {
> raw_lockdep_assert_held_rcu_node(rcu_get_root());
> WARN_ON_ONCE(!rcu_gp_in_progress());
> - WRITE_ONCE(rcu_state.gp_flags,
> - READ_ONCE(rcu_state.gp_flags) | RCU_GP_FLAG_FQS);
> raw_spin_unlock_irqrestore_rcu_node(rcu_get_root(), flags);

Same field, same lock held, same odd smelly pattern.

> - WRITE_ONCE(rcu_state.gp_flags,
> - READ_ONCE(rcu_state.gp_flags) | RCU_GP_FLAG_FQS);
> raw_spin_unlock_irqrestore_rcu_node(rnp_old, flags);

.. and again.

> --- /home/julia/linux/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
> +++ /tmp/nothing/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
> @@ -80,8 +80,6 @@ static int cn23xx_vf_reset_io_queues(str
> q_no);
> return -1;
> }
> - WRITE_ONCE(reg_val, READ_ONCE(reg_val) &
> - ~CN23XX_PKT_INPUT_CTL_RST);
> octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no),
> READ_ONCE(reg_val));

I suspect this is garbage that has been triggered by the usual
mindless "fix the symptoms, not the bug" as a result of a "undefined
behavior report".

> --- /home/julia/linux/kernel/kcsan/kcsan_test.c
> +++ /tmp/nothing/kernel/kcsan/kcsan_test.c
> @@ -381,7 +381,6 @@ static noinline void test_kernel_change_
> test_var ^= TEST_CHANGE_BITS;
> kcsan_nestable_atomic_end();
> } else
> - WRITE_ONCE(test_var, READ_ONCE(test_var) ^ TEST_CHANGE_BITS);

Presumably this is intentionally testing whether KCSAN notices these
things at all.

> diff -u -p /home/julia/linux/arch/s390/kernel/idle.c /tmp/nothing/arch/s390/kernel/idle.c
> /* Account time spent with enabled wait psw loaded as idle time. */
> - WRITE_ONCE(idle->idle_time, READ_ONCE(idle->idle_time) + idle_time);
> - WRITE_ONCE(idle->idle_count, READ_ONCE(idle->idle_count) + 1);
> account_idle_time(cputime_to_nsecs(idle_time));

This looks like another "UNSAFE_INCREMENTISH()" case.

> --- /home/julia/linux/mm/mmap.c
> +++ /tmp/nothing/mm/mmap.c
> @@ -3476,7 +3476,6 @@ bool may_expand_vm(struct mm_struct *mm,
>
> void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages)
> {
> - WRITE_ONCE(mm->total_vm, READ_ONCE(mm->total_vm)+npages);

As does this.

> diff -u -p /home/julia/linux/fs/xfs/libxfs/xfs_iext_tree.c /tmp/nothing/fs/xfs/libxfs/xfs_iext_tree.c
> static inline void xfs_iext_inc_seq(struct xfs_ifork *ifp)
> {
> - WRITE_ONCE(ifp->if_seq, READ_ONCE(ifp->if_seq) + 1);
> }

Ugh. A sequence count that is "incrementish"? That smells wrong to me.
But I didn't go look at the users. Maybe it's another case of "we
don't *actually* care about the sequence count".

>
> +++ /tmp/nothing/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
> @@ -379,8 +379,6 @@ static int cn23xx_reset_io_queues(struct
> q_no);
> return -1;
> }
> - WRITE_ONCE(reg_val, READ_ONCE(reg_val) &
> - ~CN23XX_PKT_INPUT_CTL_RST);
> ....
> - WRITE_ONCE(d64, READ_ONCE(d64) &
> - (~(CN23XX_PKT_INPUT_CTL_RING_ENB)));
> - WRITE_ONCE(d64, READ_ONCE(d64) | CN23XX_PKT_INPUT_CTL_RST);


More "likely wrong" cases.

> +++ /tmp/nothing/mm/kfence/kfence_test.c
> @@ -501,7 +501,6 @@ static void test_kmalloc_aligned_oob_wri
> * fault immediately after it.
> */
> expect.addr = buf + size;
> - WRITE_ONCE(*expect.addr, READ_ONCE(*expect.addr) + 1);

Looks like questionable test-code again.

> +++ /tmp/nothing/io_uring/io_uring.c
> @@ -363,7 +363,6 @@ static void io_account_cq_overflow(struc
> {
> struct io_rings *r = ctx->rings;
>
> - WRITE_ONCE(r->cq_overflow, READ_ONCE(r->cq_overflow) + 1);
> ctx->cq_extra--;

Bah. Looks like garbage, but the kernel doesn't actually use that
value. Looks like a random number generator exposed to user space.
Presumably this is another "statistics, but I don't care enough".

> @@ -2403,8 +2402,6 @@ static bool io_get_sqe(struct io_ring_ct
> - WRITE_ONCE(ctx->rings->sq_dropped,
> - READ_ONCE(ctx->rings->sq_dropped) + 1);

As is the above.

> +++ /tmp/nothing/security/apparmor/apparmorfs.c
> @@ -596,7 +596,6 @@ static __poll_t ns_revision_poll(struct
>
> void __aa_bump_ns_revision(struct aa_ns *ns)
> {
> - WRITE_ONCE(ns->revision, READ_ONCE(ns->revision) + 1);
> wake_up_interruptible(&ns->wait);

This looks like somebody copied the RCU / tracing pattern?

> +++ /tmp/nothing/arch/riscv/kvm/vmid.c
> @@ -90,7 +90,6 @@ void kvm_riscv_gstage_vmid_update(struct
>
> /* First user of a new VMID version? */
> if (unlikely(vmid_next == 0)) {
> - WRITE_ONCE(vmid_version, READ_ONCE(vmid_version) + 1);
> vmid_next = 1;

Looks bogus and wrong. An unreliable address space version does _not_
sound sane, but who knows.

Anyway, from a quick look, there's a mix of "this is just wrong" and a
couple of "this seems to just want approximate statistics".

Maybe the RCU 'flags' field is using WRITE_ONCE() because while the
spinlock protects the bit changes, there are readers that look at
other bits with READ_ONCE.

That would imply that the READ_ONCE->WRITE_ONCE is just broken garbage
- the WRITE_ONCE() part may be right, but the READ_ONCE is wrong
because the value is stable.

Linus

2024-03-08 00:55:16

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Thu, Mar 07, 2024 at 02:09:00PM -0800, Linus Torvalds wrote:
> On Thu, 7 Mar 2024 at 13:40, Julia Lawall <[email protected]> wrote:
> >
> > I tried the following:
> >
> > @@
> > expression x;
> > @@
> >
> > *WRITE_ONCE(x,<+...READ_ONCE(x)...+>)
> >
> > This gave a number of results, shown below. Let me know if some of them
> > are undesirable.
>
> Well, all the ones you list do look like garbage.
>
> That said, quite often the garbage does seem to be "we don't actually
> care about the result". Several of them look like statistics.

[ . . . ]

> > diff -u -p /home/julia/linux/kernel/rcu/tree.c /tmp/nothing/kernel/rcu/tree.c
> > --- /home/julia/linux/kernel/rcu/tree.c
> > +++ /tmp/nothing/kernel/rcu/tree.c
> > @@ -1620,8 +1620,6 @@ static void rcu_gp_fqs(bool first_time)
> > /* Clear flag to prevent immediate re-entry. */
> > if (READ_ONCE(rcu_state.gp_flags) & RCU_GP_FLAG_FQS) {
> > raw_spin_lock_irq_rcu_node(rnp);
> > - WRITE_ONCE(rcu_state.gp_flags,
> > - READ_ONCE(rcu_state.gp_flags) & ~RCU_GP_FLAG_FQS);
> > raw_spin_unlock_irq_rcu_node(rnp);
>
> This smells bad to me. The code is holding a lock, but apparently not
> one that protects gp_flags.
>
> And that READ_ONCE->WRITE_ONCE sequence can corrupt all the other flags.
>
> Maybe it's fine for some reason (that reason being either that the
> ONCE operations aren't actually needed at all, or because nobody
> *really* cares about the flags), but it smells.
>
> > @@ -1882,8 +1880,6 @@ static void rcu_report_qs_rsp(unsigned l
> > {
> > raw_lockdep_assert_held_rcu_node(rcu_get_root());
> > WARN_ON_ONCE(!rcu_gp_in_progress());
> > - WRITE_ONCE(rcu_state.gp_flags,
> > - READ_ONCE(rcu_state.gp_flags) | RCU_GP_FLAG_FQS);
> > raw_spin_unlock_irqrestore_rcu_node(rcu_get_root(), flags);
>
> Same field, same lock held, same odd smelly pattern.
>
> > - WRITE_ONCE(rcu_state.gp_flags,
> > - READ_ONCE(rcu_state.gp_flags) | RCU_GP_FLAG_FQS);
> > raw_spin_unlock_irqrestore_rcu_node(rnp_old, flags);
>
> .. and again.

In all three cases, the updates are protected by the lock, so the
READ_ONCE() is unnecessary. I have queued a commit to remove the
READ_ONCE()s.
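
That should leave something along these lines, with the WRITE_ONCE()
kept only for the lockless readers of gp_flags:

	raw_spin_lock_irq_rcu_node(rnp);
	/* Plain read is fine here: the lock excludes other updaters. */
	WRITE_ONCE(rcu_state.gp_flags, rcu_state.gp_flags & ~RCU_GP_FLAG_FQS);
	raw_spin_unlock_irq_rcu_node(rnp);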

Thanks to both of you (Julia and Linus) for spotting these!

Thanx, Paul

2024-03-08 01:00:08

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Thu, Mar 07, 2024 at 02:53:43PM -0500, Mathieu Desnoyers wrote:
> On 2024-03-07 14:47, Paul E. McKenney wrote:
> > On Thu, Mar 07, 2024 at 08:53:05AM -0500, Mathieu Desnoyers wrote:
> > > On 2024-03-06 22:37, Paul E. McKenney wrote:
> > > > On Wed, Mar 06, 2024 at 10:06:21PM -0500, Mathieu Desnoyers wrote:
> > > [...]
> > > >
> > > > > As far as the WRITE_ONCE(x, READ_ONCE(x) + 1) pattern
> > > > > is concerned, the only valid use-case I can think of is
> > > > > split counters or RCU implementations where there is a
> > > > > single updater doing the increment, and one or more
> > > > > concurrent reader threads that need to snapshot a
> > > > > consistent value with READ_ONCE().
> > > >
> > > [...]
> > > >
> > > > So what would you use that pattern for?
> > > >
> > > > One possibility is a per-CPU statistical counter in userspace on a
> > > > fastpath, in cases where losing the occasional count is OK. Then learning
> > > > your CPU (and possibly being immediately migrated to some other CPU),
> > > > READ_ONCE() of the count, increment, and WRITE_ONCE() might (or might not)
> > > > make sense.
> > > >
> > > > I suppose the same in the kernel if there was a fastpath so extreme you
> > > > could not afford to disable preemption.
> > > >
> > > > At best, very niche.
> > > >
> > > > Or am I suffering a failure of imagination yet again? ;-)
> > >
> > > The (niche) use-cases I have in mind are split-counters and RCU
> > > grace period tracking, where precise counters totals are needed
> > > (no lost count).
> > >
> > > In the kernel, this could be:
> >
> > Thank you for looking into this!
> >
> > > - A per-cpu counter, each counter incremented from thread context with
> > > preemption disabled (single updater per counter), read concurrently by
> > > other threads. WRITE_ONCE/READ_ONCE is useful to make sure there
> > > is no store/load tearing there. Atomics on the update would be stronger
> > > than necessary on the increment fast-path.
> >
> > But if preemption is disabled, the updater can read the value without
> > READ_ONCE() without risk of concurrent update. Or are you concerned about
> > interrupt handlers? This would have to be a read from the interrupt
> > handler, given that an update from the interrupt handler could result
> > in a lost count.
>
> You are correct that the updater doesn't need READ_ONCE() there. It would
> however require a WRITE_ONCE() to match READ_ONCE() from concurrent
> reader threads.
>
> >
> > > - A per-thread counter (e.g. within task_struct), only incremented by the
> > > single thread, read by various threads concurrently.
> >
> > Ditto.
>
> Right, only WRITE_ONCE() on the single updater, READ_ONCE() on readers.
>
> >
> > > - A counter which increment happens to be already protected by a lock, read
> > > by various threads without taking the lock. (technically doable, but
> > > I'm not sure I see a relevant use-case for it)
> >
> > In that case, the lock would exclude concurrent updates, so the lock
> > holder would not need READ_ONCE(), correct?
>
> Correct.
>
> >
> > > In user-space:
> > >
> > > - The "per-cpu" counter would have to use rseq for increments to prevent
> > > inopportune migrations, which needs to be implemented in assembler anyway.
> > > The counter reads would have to use READ_ONCE().
> >
> > Fair enough!
> >
> > > - The per-thread counter (Thread-Local Storage) incremented by a single
> > > thread, read by various threads concurrently, is a good target
> > > for WRITE_ONCE()/READ_ONCE() pairing. This is actually what we do in
> > > various liburcu implementations which track read-side critical sections
> > > per-thread.
> >
> > Agreed, but do any of these use WRITE_ONCE(x, READ_ONCE(x) + 1) or
> > similar?
>
> Not quite, I recall it's more like WRITE_ONCE(x, READ_ONCE(y)) or such,
> so we can grab the value of the current gp counter and store it into a
> TLS variable.

Good point, you could use that pattern to grab a shared snapshot.

Still sounds niche, but you never know! ;-)
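
Something like the following, with made-up names: only the owning thread
stores to the TLS field, and other threads sample it with READ_ONCE():

	WRITE_ONCE(my_reader_state.gp_seq_snap, READ_ONCE(shared_gp_seq));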

Thanx, Paul

> > > - Same as for the kernel, a counter increment protected by a lock which
> > > needs to be read from various threads concurrently without taking
> > > the lock could be a valid use-case, though I fail to see how it is
> > > useful due to lack of imagination on my part. ;-)
> >
> > In RCU, we have "WRITE_ONCE(*sp, *sp + 1)" for this use case, though
> > here we have the WRITE_ONCE() but not the READ_ONCE() because we hold
> > the lock excluding any other updates.
>
> You are right, the READ_ONCE() is not needed in this case for the
> updater, only for the concurrent readers.
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>