2016-03-03 15:38:14

by Jianyu Zhan

[permalink] [raw]
Subject: [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE()

Commit e91467ecd1ef ("bug in futex unqueue_me") introduces a barrier()
in unqueue_me(), to address below problem.

The scenario is like this:

====================
original code:

retry:
lock_ptr = q->lock_ptr;
if (lock_ptr != 0) {
spin_lock(lock_ptr)
if (unlikely(lock_ptr != q->lock_ptr)) {
spin_unlock(lock_ptr);
goto retry;
}
...
}

====================
It was observed that compiler generates code that is equivalent to:

retry:
if (q->lock_ptr != 0) {
spin_lock(q->lock_ptr)
if (unlikely(lock_ptr != q->lock_ptr)) {
spin_unlock(lock_ptr);
goto retry;
}
...
}

since q->lock_ptr might change between the test of non-nullness and spin_lock(),
the double load will cause trouble. So that commit uses a barrier() to prevent this.

This patch replaces this bare barrier() with a READ_ONCE().

The reasons are:

1) READ_ONCE() is a more weak form of barrier() that affect only the specific
accesses, while barrier() is a more general compiler level memroy barrier.
READ_ONCE() was not available at that time when that patch was written.

2) READ_ONCE() which could be more informative by its name, while a bare barrier()
without comment leads to quite a bit of perplexity.

Assembly code before(barrier version) and after this patch(READ_ONCE version) are the same:

====================
Before(barrier version):

unqueue_me():
linux/kernel/futex.c:1930
1df6: 4c 8b bd 28 ff ff ff mov -0xd8(%rbp),%r15
linux/kernel/futex.c:1932
1dfd: 4d 85 ff test %r15,%r15
1e00: 0f 84 5c 01 00 00 je 1f62 <futex_wait+0x292>
spin_lock():
linux/include/linux/spinlock.h:302
1e06: 4c 89 ff mov %r15,%rdi
1e09: e8 00 00 00 00 callq 1e0e <futex_wait+0x13e>

====================
After(READ_ONCE version):

__read_once_size():
linux/include/linux/compiler.h:218
1df6: 4c 8b bd 28 ff ff ff mov -0xd8(%rbp),%r15
unqueue_me():
linux/kernel/futex.c:1935
1dfd: 4d 85 ff test %r15,%r15
1e00: 0f 84 5c 01 00 00 je 1f62 <futex_wait+0x292>
spin_lock():
linux/include/linux/spinlock.h:302
1e06: 4c 89 ff mov %r15,%rdi
1e09: e8 00 00 00 00 callq 1e0e <futex_wait+0x13e>

Code size is also the same.

Suggested-by: Darren Hart <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Signed-off-by: Jianyu Zhan <[email protected]>
---
kernel/futex.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 5d6ce64..58c1bcc 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1927,8 +1927,11 @@ static int unqueue_me(struct futex_q *q)

/* In the common case we don't take the spinlock, which is nice. */
retry:
- lock_ptr = q->lock_ptr;
- barrier();
+ /*
+ * Prevent the compiler to read q->lock_ptr twice (if and spin_lock),
+ * or that would cause trouble since q->lock_ptr can change in between.
+ */
+ lock_ptr = READ_ONCE(q->lock_ptr);
if (lock_ptr != NULL) {
spin_lock(lock_ptr);
/*
--
2.4.3


2016-03-03 17:05:42

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE()

On Thu, Mar 03, 2016 at 11:38:05PM +0800, Jianyu Zhan wrote:
> Commit e91467ecd1ef ("bug in futex unqueue_me") introduces a barrier()
> in unqueue_me(), to address below problem.
>
> The scenario is like this:
>
> ====================
> original code:
>
> retry:
> lock_ptr = q->lock_ptr;
> if (lock_ptr != 0) {
> spin_lock(lock_ptr)
> if (unlikely(lock_ptr != q->lock_ptr)) {
> spin_unlock(lock_ptr);
> goto retry;
> }
> ...
> }
>
> ====================
> It was observed that compiler generates code that is equivalent to:
>
> retry:
> if (q->lock_ptr != 0) {
> spin_lock(q->lock_ptr)
> if (unlikely(lock_ptr != q->lock_ptr)) {
> spin_unlock(lock_ptr);
> goto retry;
> }
> ...
> }
>
> since q->lock_ptr might change between the test of non-nullness and spin_lock(),
> the double load will cause trouble. So that commit uses a barrier() to prevent this.
>
> This patch replaces this bare barrier() with a READ_ONCE().
>
> The reasons are:
>
> 1) READ_ONCE() is a more weak form of barrier() that affect only the specific
> accesses, while barrier() is a more general compiler level memroy barrier.
> READ_ONCE() was not available at that time when that patch was written.
>
> 2) READ_ONCE() which could be more informative by its name, while a bare barrier()
> without comment leads to quite a bit of perplexity.
>
> Assembly code before(barrier version) and after this patch(READ_ONCE version) are the same:
>
> ====================
> Before(barrier version):
>
> unqueue_me():
> linux/kernel/futex.c:1930
> 1df6: 4c 8b bd 28 ff ff ff mov -0xd8(%rbp),%r15
> linux/kernel/futex.c:1932
> 1dfd: 4d 85 ff test %r15,%r15
> 1e00: 0f 84 5c 01 00 00 je 1f62 <futex_wait+0x292>
> spin_lock():
> linux/include/linux/spinlock.h:302
> 1e06: 4c 89 ff mov %r15,%rdi
> 1e09: e8 00 00 00 00 callq 1e0e <futex_wait+0x13e>
>
> ====================
> After(READ_ONCE version):
>
> __read_once_size():
> linux/include/linux/compiler.h:218
> 1df6: 4c 8b bd 28 ff ff ff mov -0xd8(%rbp),%r15
> unqueue_me():
> linux/kernel/futex.c:1935
> 1dfd: 4d 85 ff test %r15,%r15
> 1e00: 0f 84 5c 01 00 00 je 1f62 <futex_wait+0x292>
> spin_lock():
> linux/include/linux/spinlock.h:302
> 1e06: 4c 89 ff mov %r15,%rdi
> 1e09: e8 00 00 00 00 callq 1e0e <futex_wait+0x13e>
>
> Code size is also the same.
>
> Suggested-by: Darren Hart <[email protected]>

A simple suggested-by may cause people to think this patch was my idea, which
would not be accurate. You can credit people for recommendations in the patch
changelog ("Since v1: ..." below the --- line).

> Acked-by: Christian Borntraeger <[email protected]>
> Signed-off-by: Jianyu Zhan <[email protected]>
> ---
> kernel/futex.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 5d6ce64..58c1bcc 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1927,8 +1927,11 @@ static int unqueue_me(struct futex_q *q)
>
> /* In the common case we don't take the spinlock, which is nice. */
> retry:
> - lock_ptr = q->lock_ptr;
> - barrier();
> + /*
> + * Prevent the compiler to read q->lock_ptr twice (if and spin_lock),
> + * or that would cause trouble since q->lock_ptr can change in between.
> + */

I thought I provided a corrected comment block.... maybe I didn't. We have been
working on improving the futex documentation, so we're paying close attention to
terminology as well as grammar. This one needs a couple minor tweaks. I suggest:

/*
* Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
* optimizing lock_ptr out of the logic below.
*/

The bit about q->lock_ptr possibly changing is already covered by the large
comment block below the spin_lock(lock_ptr) call.

With the above change, I'll add my Reviewed-by.

Thanks,

--
Darren Hart
Intel Open Source Technology Center

2016-03-04 01:13:13

by Jianyu Zhan

[permalink] [raw]
Subject: Re: [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE()

On Fri, Mar 4, 2016 at 1:05 AM, Darren Hart <[email protected]> wrote:
> I thought I provided a corrected comment block.... maybe I didn't. We have been
> working on improving the futex documentation, so we're paying close attention to
> terminology as well as grammar. This one needs a couple minor tweaks. I suggest:
>
> /*
> * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
> * optimizing lock_ptr out of the logic below.
> */
>
> The bit about q->lock_ptr possibly changing is already covered by the large
> comment block below the spin_lock(lock_ptr) call.

The large comment block is explaining the why the retry logic is required.
To achieve this semantic requirement, the READ_ONCE is needed to prevent
compiler optimizing it by doing double loads.

So I think the comment above should explain this tricky part.

> /* Use READ_ONCE to forbid the compiler from reloading q->lock_ptr in spin_lock() */

And as for preventing from optimizing the lock_ptr out of the retry
code block, I have consult
Paul Mckenney, he suggests one more READ_ONCE should be added here:

if (unlikely(lock_ptr != READ_ONCE(q->lock_ptr))) {
<------------------------------
spin_unlock(lock_ptr);
goto retry;
}

And I think this are two problem, and should be separated into two patches?



Regards,
Jianyu Zhan

2016-03-04 21:05:32

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE()

On Fri, Mar 04, 2016 at 09:12:31AM +0800, Jianyu Zhan wrote:
> On Fri, Mar 4, 2016 at 1:05 AM, Darren Hart <[email protected]> wrote:
> > I thought I provided a corrected comment block.... maybe I didn't. We have been
> > working on improving the futex documentation, so we're paying close attention to
> > terminology as well as grammar. This one needs a couple minor tweaks. I suggest:
> >
> > /*
> > * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
> > * optimizing lock_ptr out of the logic below.
> > */
> >
> > The bit about q->lock_ptr possibly changing is already covered by the large
> > comment block below the spin_lock(lock_ptr) call.
>
> The large comment block is explaining the why the retry logic is required.
> To achieve this semantic requirement, the READ_ONCE is needed to prevent
> compiler optimizing it by doing double loads.
>
> So I think the comment above should explain this tricky part.

Fair point. Consider:


/*
* q->lock_ptr can change between this read and the following spin_lock.
* Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
* optimizing lock_ptr out of the logic below.
*/

>
> > /* Use READ_ONCE to forbid the compiler from reloading q->lock_ptr in spin_lock() */
>
> And as for preventing from optimizing the lock_ptr out of the retry
> code block, I have consult
> Paul Mckenney, he suggests one more READ_ONCE should be added here:

Let's keep this discussion together so we have a record of the
justification.

+Paul McKenney

Paul, my understanding was that spin_lock was a CPU memory barrier,
which in turn is an implicit compiler barrier (aka barrier()), of which
READ_ONCE is described as a weaker form. Reviewing this, I realize the
scope of barrier() wasn't clear to me. It seems while barrier() ensures
ordering, it does not offer the same guarantee regarding reloading that
READ_ONCE offers. So READ_ONCE is not strictly a weaker form of
barrier() as I had gathered from a spotty reading of
memory-barriers.txt, but it also offers guarantees regarding memory
references that barrier() does not.

Correct?

>
> if (unlikely(lock_ptr != READ_ONCE(q->lock_ptr))) {
> <------------------------------
> spin_unlock(lock_ptr);
> goto retry;
> }
>
> And I think this are two problem, and should be separated into two patches?

Yes (pending results of the conversation above).

--
Darren Hart
Intel Open Source Technology Center

2016-03-04 22:07:32

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE()

On Fri, Mar 04, 2016 at 01:05:24PM -0800, Darren Hart wrote:
> On Fri, Mar 04, 2016 at 09:12:31AM +0800, Jianyu Zhan wrote:
> > On Fri, Mar 4, 2016 at 1:05 AM, Darren Hart <[email protected]> wrote:
> > > I thought I provided a corrected comment block.... maybe I didn't. We have been
> > > working on improving the futex documentation, so we're paying close attention to
> > > terminology as well as grammar. This one needs a couple minor tweaks. I suggest:
> > >
> > > /*
> > > * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
> > > * optimizing lock_ptr out of the logic below.
> > > */
> > >
> > > The bit about q->lock_ptr possibly changing is already covered by the large
> > > comment block below the spin_lock(lock_ptr) call.
> >
> > The large comment block is explaining the why the retry logic is required.
> > To achieve this semantic requirement, the READ_ONCE is needed to prevent
> > compiler optimizing it by doing double loads.
> >
> > So I think the comment above should explain this tricky part.
>
> Fair point. Consider:
>
>
> /*
> * q->lock_ptr can change between this read and the following spin_lock.
> * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
> * optimizing lock_ptr out of the logic below.
> */
>
> >
> > > /* Use READ_ONCE to forbid the compiler from reloading q->lock_ptr in spin_lock() */
> >
> > And as for preventing from optimizing the lock_ptr out of the retry
> > code block, I have consult
> > Paul Mckenney, he suggests one more READ_ONCE should be added here:
>
> Let's keep this discussion together so we have a record of the
> justification.
>
> +Paul McKenney
>
> Paul, my understanding was that spin_lock was a CPU memory barrier,
> which in turn is an implicit compiler barrier (aka barrier()), of which
> READ_ONCE is described as a weaker form. Reviewing this, I realize the
> scope of barrier() wasn't clear to me. It seems while barrier() ensures
> ordering, it does not offer the same guarantee regarding reloading that
> READ_ONCE offers. So READ_ONCE is not strictly a weaker form of
> barrier() as I had gathered from a spotty reading of
> memory-barriers.txt, but it also offers guarantees regarding memory
> references that barrier() does not.
>
> Correct?

If q->lock_ptr is never changed except under that lock, then there is
indeed no reason for the ACCESS_ONCE().

So, is q->lock_ptr ever changed while the lock is -not- held? If so,
I suggest that you put an ACCESS_ONCE() there.

Thanx, Paul

> > if (unlikely(lock_ptr != READ_ONCE(q->lock_ptr))) {
> > <------------------------------
> > spin_unlock(lock_ptr);
> > goto retry;
> > }
> >
> > And I think this are two problem, and should be separated into two patches?
>
> Yes (pending results of the conversation above).
>
> --
> Darren Hart
> Intel Open Source Technology Center
>

2016-03-04 22:38:09

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE()

On Fri, Mar 04, 2016 at 01:57:20PM -0800, Paul McKenney wrote:
> On Fri, Mar 04, 2016 at 01:05:24PM -0800, Darren Hart wrote:
> > On Fri, Mar 04, 2016 at 09:12:31AM +0800, Jianyu Zhan wrote:
> > > On Fri, Mar 4, 2016 at 1:05 AM, Darren Hart <[email protected]> wrote:
> > > > I thought I provided a corrected comment block.... maybe I didn't. We have been
> > > > working on improving the futex documentation, so we're paying close attention to
> > > > terminology as well as grammar. This one needs a couple minor tweaks. I suggest:
> > > >
> > > > /*
> > > > * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
> > > > * optimizing lock_ptr out of the logic below.
> > > > */
> > > >
> > > > The bit about q->lock_ptr possibly changing is already covered by the large
> > > > comment block below the spin_lock(lock_ptr) call.
> > >
> > > The large comment block is explaining the why the retry logic is required.
> > > To achieve this semantic requirement, the READ_ONCE is needed to prevent
> > > compiler optimizing it by doing double loads.
> > >
> > > So I think the comment above should explain this tricky part.
> >
> > Fair point. Consider:
> >
> >
> > /*
> > * q->lock_ptr can change between this read and the following spin_lock.
> > * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
> > * optimizing lock_ptr out of the logic below.
> > */
> >
> > >
> > > > /* Use READ_ONCE to forbid the compiler from reloading q->lock_ptr in spin_lock() */
> > >
> > > And as for preventing from optimizing the lock_ptr out of the retry
> > > code block, I have consult
> > > Paul Mckenney, he suggests one more READ_ONCE should be added here:
> >
> > Let's keep this discussion together so we have a record of the
> > justification.
> >
> > +Paul McKenney
> >
> > Paul, my understanding was that spin_lock was a CPU memory barrier,
> > which in turn is an implicit compiler barrier (aka barrier()), of which
> > READ_ONCE is described as a weaker form. Reviewing this, I realize the
> > scope of barrier() wasn't clear to me. It seems while barrier() ensures
> > ordering, it does not offer the same guarantee regarding reloading that
> > READ_ONCE offers. So READ_ONCE is not strictly a weaker form of
> > barrier() as I had gathered from a spotty reading of
> > memory-barriers.txt, but it also offers guarantees regarding memory
> > references that barrier() does not.
> >
> > Correct?
>
> If q->lock_ptr is never changed except under that lock, then there is
> indeed no reason for the ACCESS_ONCE().

The only location where a q->lock_ptr is updated without that lock being held is
in queue_lock(). This is safe as the futex_q is not yet queued onto an hb until
after the lock is held (so unqueue_me() cannot race with queue_lock()).

> So, is q->lock_ptr ever changed while the lock is -not- held? If so,
> I suggest that you put an ACCESS_ONCE() there.

It is not.

--
Darren Hart
Intel Open Source Technology Center

2016-03-04 22:45:17

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE()

On Fri, Mar 04, 2016 at 02:38:01PM -0800, Darren Hart wrote:
> On Fri, Mar 04, 2016 at 01:57:20PM -0800, Paul McKenney wrote:
> > On Fri, Mar 04, 2016 at 01:05:24PM -0800, Darren Hart wrote:
> > > On Fri, Mar 04, 2016 at 09:12:31AM +0800, Jianyu Zhan wrote:
> > > > On Fri, Mar 4, 2016 at 1:05 AM, Darren Hart <[email protected]> wrote:
> > > > > I thought I provided a corrected comment block.... maybe I didn't. We have been
> > > > > working on improving the futex documentation, so we're paying close attention to
> > > > > terminology as well as grammar. This one needs a couple minor tweaks. I suggest:
> > > > >
> > > > > /*
> > > > > * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
> > > > > * optimizing lock_ptr out of the logic below.
> > > > > */
> > > > >
> > > > > The bit about q->lock_ptr possibly changing is already covered by the large
> > > > > comment block below the spin_lock(lock_ptr) call.
> > > >
> > > > The large comment block is explaining the why the retry logic is required.
> > > > To achieve this semantic requirement, the READ_ONCE is needed to prevent
> > > > compiler optimizing it by doing double loads.
> > > >
> > > > So I think the comment above should explain this tricky part.
> > >
> > > Fair point. Consider:
> > >
> > >
> > > /*
> > > * q->lock_ptr can change between this read and the following spin_lock.
> > > * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
> > > * optimizing lock_ptr out of the logic below.
> > > */
> > >
> > > >
> > > > > /* Use READ_ONCE to forbid the compiler from reloading q->lock_ptr in spin_lock() */
> > > >
> > > > And as for preventing from optimizing the lock_ptr out of the retry
> > > > code block, I have consult
> > > > Paul Mckenney, he suggests one more READ_ONCE should be added here:
> > >
> > > Let's keep this discussion together so we have a record of the
> > > justification.
> > >
> > > +Paul McKenney
> > >
> > > Paul, my understanding was that spin_lock was a CPU memory barrier,
> > > which in turn is an implicit compiler barrier (aka barrier()), of which
> > > READ_ONCE is described as a weaker form. Reviewing this, I realize the
> > > scope of barrier() wasn't clear to me. It seems while barrier() ensures
> > > ordering, it does not offer the same guarantee regarding reloading that
> > > READ_ONCE offers. So READ_ONCE is not strictly a weaker form of
> > > barrier() as I had gathered from a spotty reading of
> > > memory-barriers.txt, but it also offers guarantees regarding memory
> > > references that barrier() does not.
> > >
> > > Correct?
> >
> > If q->lock_ptr is never changed except under that lock, then there is
> > indeed no reason for the ACCESS_ONCE().
>
> The only location where a q->lock_ptr is updated without that lock being held is
> in queue_lock(). This is safe as the futex_q is not yet queued onto an hb until
> after the lock is held (so unqueue_me() cannot race with queue_lock()).
>
> > So, is q->lock_ptr ever changed while the lock is -not- held? If so,
> > I suggest that you put an ACCESS_ONCE() there.
>
> It is not.

If I followed that correctly, then I agree that you don't need an
ACCESS_ONCE() in this case.

Thanx, Paul

2016-03-04 22:54:07

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH] futex: replace bare barrier() with more lightweight READ_ONCE()

On Fri, Mar 04, 2016 at 02:45:11PM -0800, Paul McKenney wrote:
> On Fri, Mar 04, 2016 at 02:38:01PM -0800, Darren Hart wrote:
> > On Fri, Mar 04, 2016 at 01:57:20PM -0800, Paul McKenney wrote:
> > > On Fri, Mar 04, 2016 at 01:05:24PM -0800, Darren Hart wrote:
> > > > On Fri, Mar 04, 2016 at 09:12:31AM +0800, Jianyu Zhan wrote:
> > > > > On Fri, Mar 4, 2016 at 1:05 AM, Darren Hart <[email protected]> wrote:
> > > > > > I thought I provided a corrected comment block.... maybe I didn't. We have been
> > > > > > working on improving the futex documentation, so we're paying close attention to
> > > > > > terminology as well as grammar. This one needs a couple minor tweaks. I suggest:
> > > > > >
> > > > > > /*
> > > > > > * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
> > > > > > * optimizing lock_ptr out of the logic below.
> > > > > > */
> > > > > >
> > > > > > The bit about q->lock_ptr possibly changing is already covered by the large
> > > > > > comment block below the spin_lock(lock_ptr) call.
> > > > >
> > > > > The large comment block is explaining the why the retry logic is required.
> > > > > To achieve this semantic requirement, the READ_ONCE is needed to prevent
> > > > > compiler optimizing it by doing double loads.
> > > > >
> > > > > So I think the comment above should explain this tricky part.
> > > >
> > > > Fair point. Consider:
> > > >
> > > >
> > > > /*
> > > > * q->lock_ptr can change between this read and the following spin_lock.
> > > > * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
> > > > * optimizing lock_ptr out of the logic below.
> > > > */
> > > >
> > > > >
> > > > > > /* Use READ_ONCE to forbid the compiler from reloading q->lock_ptr in spin_lock() */
> > > > >
> > > > > And as for preventing from optimizing the lock_ptr out of the retry
> > > > > code block, I have consult
> > > > > Paul Mckenney, he suggests one more READ_ONCE should be added here:
> > > >
> > > > Let's keep this discussion together so we have a record of the
> > > > justification.
> > > >
> > > > +Paul McKenney
> > > >
> > > > Paul, my understanding was that spin_lock was a CPU memory barrier,
> > > > which in turn is an implicit compiler barrier (aka barrier()), of which
> > > > READ_ONCE is described as a weaker form. Reviewing this, I realize the
> > > > scope of barrier() wasn't clear to me. It seems while barrier() ensures
> > > > ordering, it does not offer the same guarantee regarding reloading that
> > > > READ_ONCE offers. So READ_ONCE is not strictly a weaker form of
> > > > barrier() as I had gathered from a spotty reading of
> > > > memory-barriers.txt, but it also offers guarantees regarding memory
> > > > references that barrier() does not.
> > > >
> > > > Correct?
> > >
> > > If q->lock_ptr is never changed except under that lock, then there is
> > > indeed no reason for the ACCESS_ONCE().
> >
> > The only location where a q->lock_ptr is updated without that lock being held is
> > in queue_lock(). This is safe as the futex_q is not yet queued onto an hb until
> > after the lock is held (so unqueue_me() cannot race with queue_lock()).
> >
> > > So, is q->lock_ptr ever changed while the lock is -not- held? If so,
> > > I suggest that you put an ACCESS_ONCE() there.
> >
> > It is not.
>
> If I followed that correctly, then I agree that you don't need an
> ACCESS_ONCE() in this case.
>

Thanks for offering your time and expertise Paul. It's a major effort every time
I open memory-barriers.txt :-)

--
Darren Hart
Intel Open Source Technology Center

2016-03-07 01:32:49

by Jianyu Zhan

[permalink] [raw]
Subject: [PATCH v3] futex: replace bare barrier() with more lightweight READ_ONCE()

Commit e91467ecd1ef ("bug in futex unqueue_me") introduces a barrier()
in unqueue_me(), to address below problem.

The scenario is like this:

====================
original code:

retry:
lock_ptr = q->lock_ptr;
if (lock_ptr != 0) {
spin_lock(lock_ptr)
if (unlikely(lock_ptr != q->lock_ptr)) {
spin_unlock(lock_ptr);
goto retry;
}
...
}

====================
It was observed that compiler generates code that is equivalent to:

retry:
if (q->lock_ptr != 0) {
spin_lock(q->lock_ptr)
if (unlikely(lock_ptr != q->lock_ptr)) {
spin_unlock(lock_ptr);
goto retry;
}
...
}

since q->lock_ptr might change between the test of non-nullness and spin_lock(),
the double load will cause trouble. So that commit uses a barrier() to prevent this.

This patch replaces this bare barrier() with a READ_ONCE().

The reasons are:

1) READ_ONCE() is a more weak form of barrier() that affect only the specific
accesses, while barrier() is a more general compiler level memroy barrier.
READ_ONCE() was not available at that time when that patch was written.

2) READ_ONCE() which could be more informative by its name, while a bare barrier()
without comment leads to quite a bit of perplexity.

Assembly code before(barrier version) and after this patch(READ_ONCE version) are the same:

====================
Before(barrier version):

unqueue_me():
linux/kernel/futex.c:1930
1df6: 4c 8b bd 28 ff ff ff mov -0xd8(%rbp),%r15
linux/kernel/futex.c:1932
1dfd: 4d 85 ff test %r15,%r15
1e00: 0f 84 5c 01 00 00 je 1f62 <futex_wait+0x292>
spin_lock():
linux/include/linux/spinlock.h:302
1e06: 4c 89 ff mov %r15,%rdi
1e09: e8 00 00 00 00 callq 1e0e <futex_wait+0x13e>

====================
After(READ_ONCE version):

__read_once_size():
linux/include/linux/compiler.h:218
1df6: 4c 8b bd 28 ff ff ff mov -0xd8(%rbp),%r15
unqueue_me():
linux/kernel/futex.c:1935
1dfd: 4d 85 ff test %r15,%r15
1e00: 0f 84 5c 01 00 00 je 1f62 <futex_wait+0x292>
spin_lock():
linux/include/linux/spinlock.h:302
1e06: 4c 89 ff mov %r15,%rdi
1e09: e8 00 00 00 00 callq 1e0e <futex_wait+0x13e>

Code size is also the same.

Many thanks to Darren Hart <[email protected]> for reviewing and
suggestion.

Acked-by: Christian Borntraeger <[email protected]>
Signed-off-by: Jianyu Zhan <[email protected]>
---
kernel/futex.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 5d6ce64..25dbfed 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1927,8 +1927,12 @@ static int unqueue_me(struct futex_q *q)

/* In the common case we don't take the spinlock, which is nice. */
retry:
- lock_ptr = q->lock_ptr;
- barrier();
+ /*
+ * q->lock_ptr can change between this read and the following spin_lock.
+ * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
+ * optimizing lock_ptr out of the logic below.
+ */
+ lock_ptr = READ_ONCE(q->lock_ptr);
if (lock_ptr != NULL) {
spin_lock(lock_ptr);
/*
--
2.4.3

2016-03-08 11:26:43

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH v3] futex: replace bare barrier() with more lightweight READ_ONCE()

On Mon, Mar 07, 2016 at 09:32:24AM +0800, Jianyu Zhan wrote:
> Commit e91467ecd1ef ("bug in futex unqueue_me") introduces a barrier()
> in unqueue_me(), to address below problem.
>
> The scenario is like this:
>
> ====================
> original code:
>
> retry:
> lock_ptr = q->lock_ptr;
> if (lock_ptr != 0) {
> spin_lock(lock_ptr)
> if (unlikely(lock_ptr != q->lock_ptr)) {
> spin_unlock(lock_ptr);
> goto retry;
> }
> ...
> }
>
> ====================
> It was observed that compiler generates code that is equivalent to:
>
> retry:
> if (q->lock_ptr != 0) {
> spin_lock(q->lock_ptr)
> if (unlikely(lock_ptr != q->lock_ptr)) {
> spin_unlock(lock_ptr);
> goto retry;
> }
> ...
> }
>
> since q->lock_ptr might change between the test of non-nullness and spin_lock(),
> the double load will cause trouble. So that commit uses a barrier() to prevent this.
>
> This patch replaces this bare barrier() with a READ_ONCE().
>
> The reasons are:
>
> 1) READ_ONCE() is a more weak form of barrier() that affect only the specific
> accesses, while barrier() is a more general compiler level memroy barrier.
> READ_ONCE() was not available at that time when that patch was written.
>
> 2) READ_ONCE() which could be more informative by its name, while a bare barrier()
> without comment leads to quite a bit of perplexity.
>
> Assembly code before(barrier version) and after this patch(READ_ONCE version) are the same:
>
> ====================
> Before(barrier version):
>
> unqueue_me():
> linux/kernel/futex.c:1930
> 1df6: 4c 8b bd 28 ff ff ff mov -0xd8(%rbp),%r15
> linux/kernel/futex.c:1932
> 1dfd: 4d 85 ff test %r15,%r15
> 1e00: 0f 84 5c 01 00 00 je 1f62 <futex_wait+0x292>
> spin_lock():
> linux/include/linux/spinlock.h:302
> 1e06: 4c 89 ff mov %r15,%rdi
> 1e09: e8 00 00 00 00 callq 1e0e <futex_wait+0x13e>
>
> ====================
> After(READ_ONCE version):
>
> __read_once_size():
> linux/include/linux/compiler.h:218
> 1df6: 4c 8b bd 28 ff ff ff mov -0xd8(%rbp),%r15
> unqueue_me():
> linux/kernel/futex.c:1935
> 1dfd: 4d 85 ff test %r15,%r15
> 1e00: 0f 84 5c 01 00 00 je 1f62 <futex_wait+0x292>
> spin_lock():
> linux/include/linux/spinlock.h:302
> 1e06: 4c 89 ff mov %r15,%rdi
> 1e09: e8 00 00 00 00 callq 1e0e <futex_wait+0x13e>
>
> Code size is also the same.
>
> Many thanks to Darren Hart <[email protected]> for reviewing and
> suggestion.
>
> Acked-by: Christian Borntraeger <[email protected]>
> Signed-off-by: Jianyu Zhan <[email protected]>

Thanks for sticking with this Jianyu.

Reviewed-by: Darren Hart <[email protected]>


> ---
> kernel/futex.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 5d6ce64..25dbfed 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1927,8 +1927,12 @@ static int unqueue_me(struct futex_q *q)
>
> /* In the common case we don't take the spinlock, which is nice. */
> retry:
> - lock_ptr = q->lock_ptr;
> - barrier();
> + /*
> + * q->lock_ptr can change between this read and the following spin_lock.
> + * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
> + * optimizing lock_ptr out of the logic below.
> + */
> + lock_ptr = READ_ONCE(q->lock_ptr);
> if (lock_ptr != NULL) {
> spin_lock(lock_ptr);
> /*
> --
> 2.4.3
>
>

--
Darren Hart
Intel Open Source Technology Center

Subject: [tip:locking/core] futex: Replace barrier() in unqueue_me() with READ_ONCE()

Commit-ID: 29b75eb2d56a714190a93d7be4525e617591077a
Gitweb: http://git.kernel.org/tip/29b75eb2d56a714190a93d7be4525e617591077a
Author: Jianyu Zhan <[email protected]>
AuthorDate: Mon, 7 Mar 2016 09:32:24 +0800
Committer: Thomas Gleixner <[email protected]>
CommitDate: Tue, 8 Mar 2016 17:04:02 +0100

futex: Replace barrier() in unqueue_me() with READ_ONCE()

Commit e91467ecd1ef ("bug in futex unqueue_me") introduced a barrier() in
unqueue_me() to prevent the compiler from rereading the lock pointer which
might change after a check for NULL.

Replace the barrier() with a READ_ONCE() for the following reasons:

1) READ_ONCE() is a weaker form of barrier() that affects only the specific
load operation, while barrier() is a general compiler level memory barrier.
READ_ONCE() was not available at the time when the barrier was added.

2) Aside of that READ_ONCE() is descriptive and self explainatory while a
barrier without comment is not clear to the casual reader.

No functional change.

[ tglx: Massaged changelog ]

Signed-off-by: Jianyu Zhan <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Acked-by: Darren Hart <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---
kernel/futex.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index bae542e..a5d2e74 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2010,8 +2010,12 @@ static int unqueue_me(struct futex_q *q)

/* In the common case we don't take the spinlock, which is nice. */
retry:
- lock_ptr = q->lock_ptr;
- barrier();
+ /*
+ * q->lock_ptr can change between this read and the following spin_lock.
+ * Use READ_ONCE to forbid the compiler from reloading q->lock_ptr and
+ * optimizing lock_ptr out of the logic below.
+ */
+ lock_ptr = READ_ONCE(q->lock_ptr);
if (lock_ptr != NULL) {
spin_lock(lock_ptr);
/*