The prev->next pointer can be accessed concurrently, as noticed by KCSAN:
write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
osq_lock+0x25f/0x350
osq_wait_next at kernel/locking/osq_lock.c:79
(inlined by) osq_lock at kernel/locking/osq_lock.c:185
rwsem_optimistic_spin
<snip>
read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
osq_lock+0x196/0x350
osq_lock at kernel/locking/osq_lock.c:157
rwsem_optimistic_spin
<snip>
Since the write only stores NULL to prev->next, and the read merely tests
whether prev->next equals this_cpu_ptr(&osq_node), the code still works
correctly even if the loaded value is torn. Thus, mark it as an
intentional data race using the data_race() macro.
Signed-off-by: Qian Cai <[email protected]>
---
v2: insert some code comments.
kernel/locking/osq_lock.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 1f7734949ac8..f733bcd99e8a 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
*/
for (;;) {
- if (prev->next == node &&
+ /*
+ * cpu_relax() below implies a compiler barrier which would
+ * prevent this comparison being optimized away.
+ */
+ if (data_race(prev->next == node) &&
cmpxchg(&prev->next, node, NULL) == node)
break;
--
1.8.3.1
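For readers unfamiliar with the annotation: data_race(expr) evaluates
expr with KCSAN checking suppressed for the accesses inside it, while
still compiling to the same plain (unordered, non-atomic) accesses. A
minimal sketch of the idea, assuming the usual KCSAN disable/enable
helpers (the real macro lives in include/linux/compiler.h and also
deals with type qualifiers and nesting):

	#define data_race(expr)					\
	({							\
		typeof(({ expr; })) __v;			\
		kcsan_disable_current();			\
		__v = ({ expr; });				\
		kcsan_enable_current();				\
		__v;						\
	})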
> On Feb 11, 2020, at 8:54 AM, Qian Cai <[email protected]> wrote:
>
> prev->next could be accessed concurrently as noticed by KCSAN,
>
> write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
> osq_lock+0x25f/0x350
> osq_wait_next at kernel/locking/osq_lock.c:79
> (inlined by) osq_lock at kernel/locking/osq_lock.c:185
> rwsem_optimistic_spin
> <snip>
>
> read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
> osq_lock+0x196/0x350
> osq_lock at kernel/locking/osq_lock.c:157
> rwsem_optimistic_spin
> <snip>
>
> Since the write only stores NULL to prev->next and the read tests if
> prev->next equals to this_cpu_ptr(&osq_node). Even if the value is
> shattered, the code is still working correctly. Thus, mark it as an
> intentional data race using the data_race() macro.
>
> Signed-off-by: Qian Cai <[email protected]>
Hmm, this patch has been dropped from linux-next for some reason.
Paul, can you pick this up along with KCSAN fixes?
https://lore.kernel.org/lkml/[email protected]/
> ---
>
> v2: insert some code comments.
>
> kernel/locking/osq_lock.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 1f7734949ac8..f733bcd99e8a 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> */
>
> for (;;) {
> - if (prev->next == node &&
> + /*
> + * cpu_relax() below implies a compiler barrier which would
> + * prevent this comparison being optimized away.
> + */
> + if (data_race(prev->next == node) &&
> cmpxchg(&prev->next, node, NULL) == node)
> break;
>
> --
> 1.8.3.1
>
On Fri, May 08, 2020 at 04:59:05PM -0400, Qian Cai wrote:
>
>
> > On Feb 11, 2020, at 8:54 AM, Qian Cai <[email protected]> wrote:
> >
> > prev->next could be accessed concurrently as noticed by KCSAN,
> >
> > write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
> > osq_lock+0x25f/0x350
> > osq_wait_next at kernel/locking/osq_lock.c:79
> > (inlined by) osq_lock at kernel/locking/osq_lock.c:185
> > rwsem_optimistic_spin
> > <snip>
> >
> > read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
> > osq_lock+0x196/0x350
> > osq_lock at kernel/locking/osq_lock.c:157
> > rwsem_optimistic_spin
> > <snip>
> >
> > Since the write only stores NULL to prev->next and the read tests if
> > prev->next equals to this_cpu_ptr(&osq_node). Even if the value is
> > shattered, the code is still working correctly. Thus, mark it as an
> > intentional data race using the data_race() macro.
> >
> > Signed-off-by: Qian Cai <[email protected]>
>
> Hmm, this patch has been dropped from linux-next from some reasons.
>
> Paul, can you pick this up along with KCSAN fixes?
>
> https://lore.kernel.org/lkml/[email protected]/
I have queued it on -rcu, but it is too late for v5.8 via the -rcu tree,
so this is v5.9 at the earliest. Plus I would need an ack from one of
the locking folks.
Thanx, Paul
> > ---
> >
> > v2: insert some code comments.
> >
> > kernel/locking/osq_lock.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > index 1f7734949ac8..f733bcd99e8a 100644
> > --- a/kernel/locking/osq_lock.c
> > +++ b/kernel/locking/osq_lock.c
> > @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> > */
> >
> > for (;;) {
> > - if (prev->next == node &&
> > + /*
> > + * cpu_relax() below implies a compiler barrier which would
> > + * prevent this comparison being optimized away.
> > + */
> > + if (data_race(prev->next == node) &&
> > cmpxchg(&prev->next, node, NULL) == node)
> > break;
> >
> > --
> > 1.8.3.1
> >
>
> On May 9, 2020, at 12:33 AM, Paul E. McKenney <[email protected]> wrote:
>
> On Fri, May 08, 2020 at 04:59:05PM -0400, Qian Cai wrote:
>>
>>
>>> On Feb 11, 2020, at 8:54 AM, Qian Cai <[email protected]> wrote:
>>>
>>> prev->next could be accessed concurrently as noticed by KCSAN,
>>>
>>> write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
>>> osq_lock+0x25f/0x350
>>> osq_wait_next at kernel/locking/osq_lock.c:79
>>> (inlined by) osq_lock at kernel/locking/osq_lock.c:185
>>> rwsem_optimistic_spin
>>> <snip>
>>>
>>> read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
>>> osq_lock+0x196/0x350
>>> osq_lock at kernel/locking/osq_lock.c:157
>>> rwsem_optimistic_spin
>>> <snip>
>>>
>>> Since the write only stores NULL to prev->next and the read tests if
>>> prev->next equals to this_cpu_ptr(&osq_node). Even if the value is
>>> shattered, the code is still working correctly. Thus, mark it as an
>>> intentional data race using the data_race() macro.
>>>
>>> Signed-off-by: Qian Cai <[email protected]>
>>
>> Hmm, this patch has been dropped from linux-next from some reasons.
>>
>> Paul, can you pick this up along with KCSAN fixes?
>>
>> https://lore.kernel.org/lkml/[email protected]/
>
> I have queued it on -rcu, but it is too late for v5.8 via the -rcu tree,
> so this is v5.9 at the earliest. Plus I would need an ack from one of
> the locking folks.
Peter, Will, can you give an ACK? This v2 should incorporate all the feedback from Peter,
https://lore.kernel.org/lkml/[email protected]/
v5.9 is fine. All I care about is that it stays in linux-next (so the testing bots won't keep triggering this report) and lands in mainline at some point in the future.
>
> Thanx, Paul
>
>>> ---
>>>
>>> v2: insert some code comments.
>>>
>>> kernel/locking/osq_lock.c | 6 +++++-
>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
>>> index 1f7734949ac8..f733bcd99e8a 100644
>>> --- a/kernel/locking/osq_lock.c
>>> +++ b/kernel/locking/osq_lock.c
>>> @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>>> */
>>>
>>> for (;;) {
>>> - if (prev->next == node &&
>>> + /*
>>> + * cpu_relax() below implies a compiler barrier which would
>>> + * prevent this comparison being optimized away.
>>> + */
>>> + if (data_race(prev->next == node) &&
>>> cmpxchg(&prev->next, node, NULL) == node)
>>> break;
>>>
>>> --
>>> 1.8.3.1
On Sat, May 09, 2020 at 09:01:53AM -0400, Qian Cai wrote:
>
>
> > On May 9, 2020, at 12:33 AM, Paul E. McKenney <[email protected]> wrote:
> >
> > On Fri, May 08, 2020 at 04:59:05PM -0400, Qian Cai wrote:
> >>
> >>
> >>> On Feb 11, 2020, at 8:54 AM, Qian Cai <[email protected]> wrote:
> >>>
> >>> prev->next could be accessed concurrently as noticed by KCSAN,
> >>>
> >>> write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
> >>> osq_lock+0x25f/0x350
> >>> osq_wait_next at kernel/locking/osq_lock.c:79
> >>> (inlined by) osq_lock at kernel/locking/osq_lock.c:185
> >>> rwsem_optimistic_spin
> >>> <snip>
> >>>
> >>> read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
> >>> osq_lock+0x196/0x350
> >>> osq_lock at kernel/locking/osq_lock.c:157
> >>> rwsem_optimistic_spin
> >>> <snip>
> >>>
> >>> Since the write only stores NULL to prev->next and the read tests if
> >>> prev->next equals to this_cpu_ptr(&osq_node). Even if the value is
> >>> shattered, the code is still working correctly. Thus, mark it as an
> >>> intentional data race using the data_race() macro.
> >>>
> >>> Signed-off-by: Qian Cai <[email protected]>
> >>
> >> Hmm, this patch has been dropped from linux-next from some reasons.
> >>
> >> Paul, can you pick this up along with KCSAN fixes?
> >>
> >> https://lore.kernel.org/lkml/[email protected]/
> >
> > I have queued it on -rcu, but it is too late for v5.8 via the -rcu tree,
> > so this is v5.9 at the earliest. Plus I would need an ack from one of
> > the locking folks.
>
> Peter, Will, can you give an ACK? This v2 should incorporate all the feedback from Peter,
>
> https://lore.kernel.org/lkml/[email protected]/
>
> V5.9 is fine. All I care about is it is always in linux-next (so the testing bots won’t trigger this over and over again) and to be in mainline at some point in the future.
Ah, and I forgot to ask. Why "if (data_race(prev->next == node)" instead
of "if (data_race(prev->next) == node"?
Thanx, Paul
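Spelled out, the difference between the two placements is only which
part of the expression is covered by the annotation; both variants
appear verbatim in this thread:

	/* v2 as posted: the whole comparison sits inside the annotation. */
	if (data_race(prev->next == node) &&
	    cmpxchg(&prev->next, node, NULL) == node)
		break;

	/* Paul's suggestion: only the racy load of prev->next is marked,
	 * which documents exactly which access participates in the race. */
	if (data_race(prev->next) == node &&
	    cmpxchg(&prev->next, node, NULL) == node)
		break;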
> >>> ---
> >>>
> >>> v2: insert some code comments.
> >>>
> >>> kernel/locking/osq_lock.c | 6 +++++-
> >>> 1 file changed, 5 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> >>> index 1f7734949ac8..f733bcd99e8a 100644
> >>> --- a/kernel/locking/osq_lock.c
> >>> +++ b/kernel/locking/osq_lock.c
> >>> @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> >>> */
> >>>
> >>> for (;;) {
> >>> - if (prev->next == node &&
> >>> + /*
> >>> + * cpu_relax() below implies a compiler barrier which would
> >>> + * prevent this comparison being optimized away.
> >>> + */
> >>> + if (data_race(prev->next == node) &&
> >>> cmpxchg(&prev->next, node, NULL) == node)
> >>> break;
> >>>
> >>> --
> >>> 1.8.3.1
>
> On May 9, 2020, at 12:12 PM, Paul E. McKenney <[email protected]> wrote:
>
> Ah, and I forgot to ask. Why "if (data_race(prev->next == node)" instead
> of "if (data_race(prev->next) == node"?
I think the one you suggested is slightly better at pointing out the exact race. Do you want me to resend, or could you squash it instead?
On Sat, May 09, 2020 at 12:53:38PM -0400, Qian Cai wrote:
>
>
> > On May 9, 2020, at 12:12 PM, Paul E. McKenney <[email protected]> wrote:
> >
> > Ah, and I forgot to ask. Why "if (data_race(prev->next == node)" instead
> > of "if (data_race(prev->next) == node"?
>
> I think the one you suggested is slightly better to point out the exact race. Do you want me to resend or you could squash it instead?
The patch was still at the top of my stack, so I just amended it. Here
is the updated version.
Thanx, Paul
------------------------------------------------------------------------
commit 13e69ca01ce1621ce74248bda86cfad47fa5a0fa
Author: Qian Cai <[email protected]>
Date: Tue Feb 11 08:54:15 2020 -0500
locking/osq_lock: Annotate a data race in osq_lock
The prev->next pointer can be accessed concurrently as noticed by KCSAN:
write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
osq_lock+0x25f/0x350
osq_wait_next at kernel/locking/osq_lock.c:79
(inlined by) osq_lock at kernel/locking/osq_lock.c:185
rwsem_optimistic_spin
<snip>
read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
osq_lock+0x196/0x350
osq_lock at kernel/locking/osq_lock.c:157
rwsem_optimistic_spin
<snip>
Since the write only stores NULL to prev->next, and the read merely tests
whether prev->next equals this_cpu_ptr(&osq_node), the code still works
correctly even if the loaded value is torn. Thus, mark it as an
intentional data race using the data_race() macro.
Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 1f77349..1de006e 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
*/
for (;;) {
- if (prev->next == node &&
+ /*
+ * cpu_relax() below implies a compiler barrier which would
+ * prevent this comparison being optimized away.
+ */
+ if (data_race(prev->next) == node &&
cmpxchg(&prev->next, node, NULL) == node)
break;
On Sat, May 09, 2020 at 02:36:54PM -0700, Paul E. McKenney wrote:
> On Sat, May 09, 2020 at 12:53:38PM -0400, Qian Cai wrote:
> >
> >
> > > On May 9, 2020, at 12:12 PM, Paul E. McKenney <[email protected]> wrote:
> > >
> > > Ah, and I forgot to ask. Why "if (data_race(prev->next == node)" instead
> > > of "if (data_race(prev->next) == node"?
> >
> > I think the one you suggested is slightly better to point out the exact race. Do you want me to resend or you could squash it instead?
>
> The patch was still at the top of my stack, so I just amended it. Here
> is the updated version.
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 13e69ca01ce1621ce74248bda86cfad47fa5a0fa
> Author: Qian Cai <[email protected]>
> Date: Tue Feb 11 08:54:15 2020 -0500
>
> locking/osq_lock: Annotate a data race in osq_lock
>
> The prev->next pointer can be accessed concurrently as noticed by KCSAN:
>
> write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
> osq_lock+0x25f/0x350
> osq_wait_next at kernel/locking/osq_lock.c:79
> (inlined by) osq_lock at kernel/locking/osq_lock.c:185
> rwsem_optimistic_spin
> <snip>
>
> read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
> osq_lock+0x196/0x350
> osq_lock at kernel/locking/osq_lock.c:157
> rwsem_optimistic_spin
> <snip>
>
> Since the write only stores NULL to prev->next and the read tests if
> prev->next equals to this_cpu_ptr(&osq_node). Even if the value is
> shattered, the code is still working correctly. Thus, mark it as an
> intentional data race using the data_race() macro.
>
> Signed-off-by: Qian Cai <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 1f77349..1de006e 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> */
>
> for (;;) {
> - if (prev->next == node &&
> + /*
> + * cpu_relax() below implies a compiler barrier which would
> + * prevent this comparison being optimized away.
> + */
> + if (data_race(prev->next) == node &&
> cmpxchg(&prev->next, node, NULL) == node)
> break;
I'm fine with the data_race() placement, but I don't find the comment
very helpful. We assign the result of a READ_ONCE() to 'prev' in the
loop, so I don't think that the cpu_relax() is really relevant.
The reason we don't need READ_ONCE() here is because if we race with
the writer then either we'll go round the loop again after accidentally
thinking prev->next != node, or we'll erroneously attempt the cmpxchg()
because we thought they were equal and that will fail.
Make sense?
Will
On Mon, May 11, 2020 at 04:58:13PM +0100, Will Deacon wrote:
> On Sat, May 09, 2020 at 02:36:54PM -0700, Paul E. McKenney wrote:
> > On Sat, May 09, 2020 at 12:53:38PM -0400, Qian Cai wrote:
> > >
> > >
> > > > On May 9, 2020, at 12:12 PM, Paul E. McKenney <[email protected]> wrote:
> > > >
> > > > Ah, and I forgot to ask. Why "if (data_race(prev->next == node)" instead
> > > > of "if (data_race(prev->next) == node"?
> > >
> > > I think the one you suggested is slightly better to point out the exact race. Do you want me to resend or you could squash it instead?
> >
> > The patch was still at the top of my stack, so I just amended it. Here
> > is the updated version.
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit 13e69ca01ce1621ce74248bda86cfad47fa5a0fa
> > Author: Qian Cai <[email protected]>
> > Date: Tue Feb 11 08:54:15 2020 -0500
> >
> > locking/osq_lock: Annotate a data race in osq_lock
> >
> > The prev->next pointer can be accessed concurrently as noticed by KCSAN:
> >
> > write (marked) to 0xffff9d3370dbbe40 of 8 bytes by task 3294 on cpu 107:
> > osq_lock+0x25f/0x350
> > osq_wait_next at kernel/locking/osq_lock.c:79
> > (inlined by) osq_lock at kernel/locking/osq_lock.c:185
> > rwsem_optimistic_spin
> > <snip>
> >
> > read to 0xffff9d3370dbbe40 of 8 bytes by task 3398 on cpu 100:
> > osq_lock+0x196/0x350
> > osq_lock at kernel/locking/osq_lock.c:157
> > rwsem_optimistic_spin
> > <snip>
> >
> > Since the write only stores NULL to prev->next and the read tests if
> > prev->next equals to this_cpu_ptr(&osq_node). Even if the value is
> > shattered, the code is still working correctly. Thus, mark it as an
> > intentional data race using the data_race() macro.
> >
> > Signed-off-by: Qian Cai <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > index 1f77349..1de006e 100644
> > --- a/kernel/locking/osq_lock.c
> > +++ b/kernel/locking/osq_lock.c
> > @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> > */
> >
> > for (;;) {
> > - if (prev->next == node &&
> > + /*
> > + * cpu_relax() below implies a compiler barrier which would
> > + * prevent this comparison being optimized away.
> > + */
> > + if (data_race(prev->next) == node &&
> > cmpxchg(&prev->next, node, NULL) == node)
> > break;
>
> I'm fine with the data_race() placement, but I don't find the comment
> very helpful. We assign the result of a READ_ONCE() to 'prev' in the
> loop, so I don't think that the cpu_relax() is really relevant.
Suppose that the compiler loaded a value that was not equal to "node".
In that case, the cmpxchg() won't happen, so something else must force
the compiler to do the reload in order to avoid an infinite loop, right?
Or am I missing something here?
Thanx, Paul
> The reason we don't need READ_ONCE() here is because if we race with
> the writer then either we'll go round the loop again after accidentally
> thinking prev->next != node, or we'll erroneously attempt the cmpxchg()
> because we thought they were equal and that will fail.
>
> Make sense?
>
> Will
> On May 11, 2020, at 11:58 AM, Will Deacon <[email protected]> wrote:
>
> I'm fine with the data_race() placement, but I don't find the comment
> very helpful. We assign the result of a READ_ONCE() to 'prev' in the
> loop, so I don't think that the cpu_relax() is really relevant.
>
> The reason we don't need READ_ONCE() here is because if we race with
> the writer then either we'll go round the loop again after accidentally
> thinking prev->next != node, or we'll erroneously attempt the cmpxchg()
> because we thought they were equal and that will fail.
>
> Make sense?
I think the significant concern from the previous reviews was whether compilers could prove that prev->next == node was always true, since they have no knowledge of the concurrency, and then optimize the whole if statement away, resulting in an infinite loop.
The comment tried to explain that the cpu_relax() would, in theory, save us from that infinite loop.
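To make that worry concrete, here is a minimal standalone sketch (not
the osq_lock() code itself; 'flag' is just an illustrative variable) of
how a plain load in a spin loop can legally be hoisted, and why a
volatile access or a compiler barrier prevents that:

	/* Illustrative only; 'flag' stands in for any plain shared variable. */
	static int flag;

	static void spin_plain(void)
	{
		/*
		 * Nothing here tells the compiler that 'flag' can change
		 * behind its back, so it may load the value once and spin
		 * forever on the cached result.
		 */
		while (!flag)
			;
	}

	static void spin_marked(void)
	{
		/*
		 * READ_ONCE() forces a fresh (volatile) load on each pass,
		 * and cpu_relax() additionally acts as a compiler barrier.
		 */
		while (!READ_ONCE(flag))
			cpu_relax();
	}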
On Mon, May 11, 2020 at 09:43:19AM -0700, Paul E. McKenney wrote:
> On Mon, May 11, 2020 at 04:58:13PM +0100, Will Deacon wrote:
> > On Sat, May 09, 2020 at 02:36:54PM -0700, Paul E. McKenney wrote:
> > > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > > index 1f77349..1de006e 100644
> > > --- a/kernel/locking/osq_lock.c
> > > +++ b/kernel/locking/osq_lock.c
> > > @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> > > */
> > >
> > > for (;;) {
> > > - if (prev->next == node &&
> > > + /*
> > > + * cpu_relax() below implies a compiler barrier which would
> > > + * prevent this comparison being optimized away.
> > > + */
> > > + if (data_race(prev->next) == node &&
> > > cmpxchg(&prev->next, node, NULL) == node)
> > > break;
> >
> > I'm fine with the data_race() placement, but I don't find the comment
> > very helpful. We assign the result of a READ_ONCE() to 'prev' in the
> > loop, so I don't think that the cpu_relax() is really relevant.
>
> Suppose that the compiler loaded a value that was not equal to "node".
> In that case, the cmpxchg() won't happen, so something else must force
> the compiler to do the reload in order to avoid an infinite loop, right?
> Or am I missing something here?
Then we just go round the loop and reload prev:
prev = READ_ONCE(node->prev);
which should be enough to stop the compiler, no?
Will
On Mon, May 11, 2020 at 12:44:26PM -0400, Qian Cai wrote:
>
>
> > On May 11, 2020, at 11:58 AM, Will Deacon <[email protected]> wrote:
> >
> > I'm fine with the data_race() placement, but I don't find the comment
> > very helpful. We assign the result of a READ_ONCE() to 'prev' in the
> > loop, so I don't think that the cpu_relax() is really relevant.
> >
> > The reason we don't need READ_ONCE() here is because if we race with
> > the writer then either we'll go round the loop again after accidentally
> > thinking prev->next != node, or we'll erroneously attempt the cmpxchg()
> > because we thought they were equal and that will fail.
> >
> > Make sense?
>
> I think the significant concern from the previous reviews was if compilers
> could prove that prev->next == node was always true because it had no
> knowledge of the concurrency, and then took out the whole if statement
> away resulting in an infinite loop.
Hmm, I don't see how it can remove the cmpxchg(). Do you have a link to that
discussion, please?
Will
> On May 11, 2020, at 12:54 PM, Will Deacon <[email protected]> wrote:
>
> Hmm, I don't see how it can remove the cmpxchg(). Do you have a link to that
> discussion, please?
lore.kernel.org/lkml/[email protected]
Correction: if compilers could prove "prev->next != node" is always true, that cmpxchg() would not run. cpu_relax() should be sufficient to keep that "if statement" from being optimized away in any case.
On Mon, May 11, 2020 at 05:52:17PM +0100, Will Deacon wrote:
> On Mon, May 11, 2020 at 09:43:19AM -0700, Paul E. McKenney wrote:
> > On Mon, May 11, 2020 at 04:58:13PM +0100, Will Deacon wrote:
> > > On Sat, May 09, 2020 at 02:36:54PM -0700, Paul E. McKenney wrote:
> > > > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > > > index 1f77349..1de006e 100644
> > > > --- a/kernel/locking/osq_lock.c
> > > > +++ b/kernel/locking/osq_lock.c
> > > > @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> > > > */
> > > >
> > > > for (;;) {
> > > > - if (prev->next == node &&
> > > > + /*
> > > > + * cpu_relax() below implies a compiler barrier which would
> > > > + * prevent this comparison being optimized away.
> > > > + */
> > > > + if (data_race(prev->next) == node &&
> > > > cmpxchg(&prev->next, node, NULL) == node)
> > > > break;
> > >
> > > I'm fine with the data_race() placement, but I don't find the comment
> > > very helpful. We assign the result of a READ_ONCE() to 'prev' in the
> > > loop, so I don't think that the cpu_relax() is really relevant.
> >
> > Suppose that the compiler loaded a value that was not equal to "node".
> > In that case, the cmpxchg() won't happen, so something else must force
> > the compiler to do the reload in order to avoid an infinite loop, right?
> > Or am I missing something here?
>
> Then we just go round the loop and reload prev:
>
> prev = READ_ONCE(node->prev);
>
> which should be enough to stop the compiler, no?
Yes, that would also work. Either rely on the cpu_relax() (or a barrier()
or whatever) on the one hand, or, as you say, turn the data_race() into
a READ_ONCE() on the other. I would prefer the READ_ONCE() myself, unless
that would undesirably suppress other KCSAN warnings.
Thanx, Paul
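For comparison, READ_ONCE() is at its core a volatile load, so it forces
the compiler to re-read the value on every pass; KCSAN also treats such
volatile accesses as marked rather than plain, which is what Paul's
caveat about suppressing other warnings refers to. A rough sketch of
what the generic definition amounts to for scalar types:

	#define READ_ONCE(x)	(*(const volatile typeof(x) *)&(x))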
On Mon, May 11, 2020 at 10:29:18AM -0700, Paul E. McKenney wrote:
> On Mon, May 11, 2020 at 05:52:17PM +0100, Will Deacon wrote:
> > On Mon, May 11, 2020 at 09:43:19AM -0700, Paul E. McKenney wrote:
> > > On Mon, May 11, 2020 at 04:58:13PM +0100, Will Deacon wrote:
> > > > On Sat, May 09, 2020 at 02:36:54PM -0700, Paul E. McKenney wrote:
> > > > > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > > > > index 1f77349..1de006e 100644
> > > > > --- a/kernel/locking/osq_lock.c
> > > > > +++ b/kernel/locking/osq_lock.c
> > > > > @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> > > > > */
> > > > >
> > > > > for (;;) {
> > > > > - if (prev->next == node &&
> > > > > + /*
> > > > > + * cpu_relax() below implies a compiler barrier which would
> > > > > + * prevent this comparison being optimized away.
> > > > > + */
> > > > > + if (data_race(prev->next) == node &&
> > > > > cmpxchg(&prev->next, node, NULL) == node)
> > > > > break;
> > > >
> > > > I'm fine with the data_race() placement, but I don't find the comment
> > > > very helpful. We assign the result of a READ_ONCE() to 'prev' in the
> > > > loop, so I don't think that the cpu_relax() is really relevant.
> > >
> > > Suppose that the compiler loaded a value that was not equal to "node".
> > > In that case, the cmpxchg() won't happen, so something else must force
> > > the compiler to do the reload in order to avoid an infinite loop, right?
> > > Or am I missing something here?
> >
> > Then we just go round the loop and reload prev:
> >
> > prev = READ_ONCE(node->prev);
> >
> > which should be enough to stop the compiler, no?
>
> Yes, that would also work. Either have the cpu_relax() or a barrier()
> or whatever on the one hand, or, as you say, turn the data_race() into
> a READ_ONCE(). I personally prefer the READ_ONCE() myself, unless that
> would undesirably suppress other KCSAN warnings.
No, I mean here is the code after this patch is applied:
for (;;) {
if (data_race(prev->next) == node &&
cmpxchg(&prev->next, node, NULL) == node)
break;
/*
* We can only fail the cmpxchg() racing against an unlock(),
* in which case we should observe @node->locked becomming
* true.
*/
if (smp_load_acquire(&node->locked))
return true;
cpu_relax();
/*
* Or we race against a concurrent unqueue()'s step-B, in which
* case its step-C will write us a new @node->prev pointer.
*/
prev = READ_ONCE(node->prev);
}
I'm saying that this READ_ONCE at the end of the loop should be sufficient
to stop the compiler making value assumptions about prev->next. Do you
agree?
Will
On Mon, May 11, 2020 at 06:34:13PM +0100, Will Deacon wrote:
> On Mon, May 11, 2020 at 10:29:18AM -0700, Paul E. McKenney wrote:
> > On Mon, May 11, 2020 at 05:52:17PM +0100, Will Deacon wrote:
> > > On Mon, May 11, 2020 at 09:43:19AM -0700, Paul E. McKenney wrote:
> > > > On Mon, May 11, 2020 at 04:58:13PM +0100, Will Deacon wrote:
> > > > > On Sat, May 09, 2020 at 02:36:54PM -0700, Paul E. McKenney wrote:
> > > > > > diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> > > > > > index 1f77349..1de006e 100644
> > > > > > --- a/kernel/locking/osq_lock.c
> > > > > > +++ b/kernel/locking/osq_lock.c
> > > > > > @@ -154,7 +154,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> > > > > > */
> > > > > >
> > > > > > for (;;) {
> > > > > > - if (prev->next == node &&
> > > > > > + /*
> > > > > > + * cpu_relax() below implies a compiler barrier which would
> > > > > > + * prevent this comparison being optimized away.
> > > > > > + */
> > > > > > + if (data_race(prev->next) == node &&
> > > > > > cmpxchg(&prev->next, node, NULL) == node)
> > > > > > break;
> > > > >
> > > > > I'm fine with the data_race() placement, but I don't find the comment
> > > > > very helpful. We assign the result of a READ_ONCE() to 'prev' in the
> > > > > loop, so I don't think that the cpu_relax() is really relevant.
> > > >
> > > > Suppose that the compiler loaded a value that was not equal to "node".
> > > > In that case, the cmpxchg() won't happen, so something else must force
> > > > the compiler to do the reload in order to avoid an infinite loop, right?
> > > > Or am I missing something here?
> > >
> > > Then we just go round the loop and reload prev:
> > >
> > > prev = READ_ONCE(node->prev);
> > >
> > > which should be enough to stop the compiler, no?
> >
> > Yes, that would also work. Either have the cpu_relax() or a barrier()
> > or whatever on the one hand, or, as you say, turn the data_race() into
> > a READ_ONCE(). I personally prefer the READ_ONCE() myself, unless that
> > would undesirably suppress other KCSAN warnings.
>
> No, I mean here is the code after this patch is applied:
>
> for (;;) {
> if (data_race(prev->next) == node &&
> cmpxchg(&prev->next, node, NULL) == node)
> break;
>
> /*
> * We can only fail the cmpxchg() racing against an unlock(),
> * in which case we should observe @node->locked becomming
> * true.
> */
> if (smp_load_acquire(&node->locked))
> return true;
>
> cpu_relax();
>
> /*
> * Or we race against a concurrent unqueue()'s step-B, in which
> * case its step-C will write us a new @node->prev pointer.
> */
> prev = READ_ONCE(node->prev);
> }
>
> I'm saying that this READ_ONCE at the end of the loop should be sufficient
> to stop the compiler making value assumptions about prev->next. Do you
> agree?
Good point, and I would certainly hope so!
Thanx, Paul