2016-03-02 01:38:06

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH] futex: replace bare barrier() with a READ_ONCE()

On Tue, Mar 01, 2016 at 08:44:48AM +0800, Jianyu Zhan wrote:
> On Tue, Mar 1, 2016 at 6:22 AM, Darren Hart <[email protected]> wrote:
> > You've mention it causes problems a few times, but you do not specify what
> > problem it causes or how it manifests.
> >
> > Is this a theoretical bug, or have you experienced a failure case. If so, how
> > did this manifest? Do you have a reproducer we could add to the futex testsuite
> > in the kernel selftests?
>
>
> 1. For the first problem I memtioned, it is a bug that described in
> commit e91467ecd1ef.
>
> The scenario is like:
>
> lock_ptr = q->lock_ptr
> if (lock_ptr != 0) spin_lock(lock_ptr)
>
> and the compiler generates code that is equivalent to :
>
> if (q->lock_ptr != 0) spin_lock(q->lock_ptr)
>
> The problem is q->lock_ptr can change between the test of nullness of
> q->lock_ptr and the lock on q->lock_ptr
> So a barrier() is inserted into the load of q->lock_ptr and the test
> of nullness, to avoid the pointer aliasing.
>
> Apparently, a READ_ONCE() fits the goal here.


read_once will use a *volatile assignment instead of calling barrier()
directly for a word size argument.

With weak statements like "apparently" (above) and "could be" (from the original
post: This patch replaces this bare barrier() with a READ_ONCE(), a weaker form
of barrier(), which could be more informative.)" I do not see a compelling
reason to change what is notoriously fragile code with respect to barriers.

As for #2...

> 2. For the second problem I memtioned, yes, it is theoretical, and
> it is also due to q->lock_ptr can change between
> the test of nullness of q->lock_ptr and the lock on q->lock_ptr.
>
> the code is
>
> retry:
> lock_ptr = q->lock_ptr;
> if (lock_ptr != 0) {
> spin_lock(lock_ptr)
> if (unlikely(lock_ptr != q->lock_ptr)) {
> spin_unlock(lock_ptr);
> goto retry;
> }
> ...
> }
>
> which is effectively the same as :
>
> while (lock_ptr = q->lock_ptr) {
> spin_lock(lock_ptr)
> if (unlikely(lock_ptr != q->lock_ptr)) {
> spin_unlock(lock_ptr);
> goto retry;
> }
> ...
> }
>
> This might cause the compiler load the q->lock_ptr once and use it
> many times, quoted from
> memory-barriers.txt:
>
>
> (*) The compiler is within its rights to reload a variable, for example,
> in cases where high register pressure prevents the compiler from
> keeping all data of interest in registers. The compiler might
> therefore optimize the variable 'tmp' out of our previous example:
>
> while (tmp = a)
> do_something_with(tmp);
>
> This could result in the following code, which is perfectly safe in
> single-threaded code, but can be fatal in concurrent code:
>
> while (a)
> do_something_with(a);
>
> For example, the optimized version of this code could result in
> passing a zero to do_something_with() in the case where the variable
> a was modified by some other CPU between the "while" statement and
> the call to do_something_with().
>
> Again, use READ_ONCE() to prevent the compiler from doing this:
>
> while (tmp = READ_ONCE(a))
> do_something_with(tmp);
>

OK, so this is really the meat of the argument for the patch. You are looking to
add a compiler barrier instead of a CPU memory barrier. This should be what your
commit message is focused on and it should provide compelling evidence to
justify risking the change.

Compelling evidence for a compiler barrier would be a disassembly of the code
block before and after, demonstrating the compiler generating broken code and the
compiler generating correct code.

In addition to this, however, I would want to see a strong convincing argument
that the READ_ONCE volatile cast is sufficient to cover the original case which
motivated the addition of the barrier() (not "apparently" and "could be").

Thanks,

--
Darren Hart
Intel Open Source Technology Center


2016-03-02 02:32:07

by Jianyu Zhan

[permalink] [raw]
Subject: Re: [PATCH] futex: replace bare barrier() with a READ_ONCE()

On Wed, Mar 2, 2016 at 9:37 AM, Darren Hart <[email protected]> wrote:
> read_once will use a *volatile assignment instead of calling barrier()
> directly for a word size argument.
>
> With weak statements like "apparently" (above) and "could be" (from the original
> post: This patch replaces this bare barrier() with a READ_ONCE(), a weaker form
> of barrier(), which could be more informative.)" I do not see a compelling
> reason to change what is notoriously fragile code with respect to barriers.
>


Fair enough. The "apparently" wording is lame without assembly code evidence.
I do not have such a s390 machine, so I have cc''ed the original
author incorporating this barrier(),
hopefully he could help test this.

By "could be more informative" I mean a READ_ONCE has explanatory
effect by its name, instead of
a bare barrier() without more inline comment for why.

As I said the retry code logic is effectively the same as

while (lock_ptr = READ_ONCE(q->lock_ptr)) {
spin_lock(lock_ptr)
if (unlikely(lock_ptr != q->lock_ptr)) {
spin_unlock(lock_ptr);
}

...
}

in which case READ_ONCE could perfectly fit, in that it will make
compiler only read it once in every testing condition,
which will eliminate the original problem that commit e91467ecd1ef
addressed, though assembly code proof is needed.

Actually, the quotation I argued in previous mail could be used again
here, from memory-barriers.txt:

(*) The compiler is within its rights to merge successive loads from
the same variable. Such merging can cause the compiler to "optimize"
the following code:

while (tmp = a)
do_something_with(tmp);

into the following code, which, although in some sense legitimate
for single-threaded code, is almost certainly not what the developer
intended:

if (tmp = a)
for (;;)
do_something_with(tmp);

Use READ_ONCE() to prevent the compiler from doing this to you:

while (tmp = READ_ONCE(a))
do_something_with(tmp);


I might be wrong, so I have cc'ed Paul, and Peter, I wish they give comment :-)


> As for #2...
>
>> 2. For the second problem I memtioned, yes, it is theoretical, and
>> it is also due to q->lock_ptr can change between
>> the test of nullness of q->lock_ptr and the lock on q->lock_ptr.
>>
>> the code is
>>
>> retry:
>> lock_ptr = q->lock_ptr;
>> if (lock_ptr != 0) {
>> spin_lock(lock_ptr)
>> if (unlikely(lock_ptr != q->lock_ptr)) {
>> spin_unlock(lock_ptr);
>> goto retry;
>> }
>> ...
>> }
>>
>> which is effectively the same as :
>>
>> while (lock_ptr = q->lock_ptr) {
>> spin_lock(lock_ptr)
>> if (unlikely(lock_ptr != q->lock_ptr)) {
>> spin_unlock(lock_ptr);
>> goto retry;
>> }
>> ...
>> }
>>
>> This might cause the compiler load the q->lock_ptr once and use it
>> many times, quoted from
>> memory-barriers.txt:
>>
>>
>> (*) The compiler is within its rights to reload a variable, for example,
>> in cases where high register pressure prevents the compiler from
>> keeping all data of interest in registers. The compiler might
>> therefore optimize the variable 'tmp' out of our previous example:
>>
>> while (tmp = a)
>> do_something_with(tmp);
>>
>> This could result in the following code, which is perfectly safe in
>> single-threaded code, but can be fatal in concurrent code:
>>
>> while (a)
>> do_something_with(a);
>>
>> For example, the optimized version of this code could result in
>> passing a zero to do_something_with() in the case where the variable
>> a was modified by some other CPU between the "while" statement and
>> the call to do_something_with().
>>
>> Again, use READ_ONCE() to prevent the compiler from doing this:
>>
>> while (tmp = READ_ONCE(a))
>> do_something_with(tmp);
>>
>
> OK, so this is really the meat of the argument for the patch. You are looking to
> add a compiler barrier instead of a CPU memory barrier. This should be what your
> commit message is focused on and it should provide compelling evidence to
> justify risking the change.
>
> Compelling evidence for a compiler barrier would be a disassembly of the code
> block before and after, demonstrating the compiler generating broken code and the
> compiler generating correct code.
>
> In addition to this, however, I would want to see a strong convincing argument
> that the READ_ONCE volatile cast is sufficient to cover the original case which
> motivated the addition of the barrier() (not "apparently" and "could be").

As for #2, this is pure theoretical induction from this quotation, I
do have no convincing argument
and again I would like Paul to help clarify this.


Thanks very much!


Regards,
Jianyu Zhan

2016-03-02 06:39:21

by Darren Hart

[permalink] [raw]
Subject: Re: [PATCH] futex: replace bare barrier() with a READ_ONCE()

On Wed, Mar 02, 2016 at 10:31:18AM +0800, Jianyu Zhan wrote:
> On Wed, Mar 2, 2016 at 9:37 AM, Darren Hart <[email protected]> wrote:
> > read_once will use a *volatile assignment instead of calling barrier()
> > directly for a word size argument.
> >
> > With weak statements like "apparently" (above) and "could be" (from the original
> > post: This patch replaces this bare barrier() with a READ_ONCE(), a weaker form
> > of barrier(), which could be more informative.)" I do not see a compelling
> > reason to change what is notoriously fragile code with respect to barriers.
> >
>
>
> Fair enough. The "apparently" wording is lame without assembly code evidence.
> I do not have such a s390 machine, so I have cc''ed the original
> author incorporating this barrier(),
> hopefully he could help test this.
>
> By "could be more informative" I mean a READ_ONCE has explanatory
> effect by its name, instead of
> a bare barrier() without more inline comment for why.
>
> As I said the retry code logic is effectively the same as
>
> while (lock_ptr = READ_ONCE(q->lock_ptr)) {
> spin_lock(lock_ptr)
> if (unlikely(lock_ptr != q->lock_ptr)) {
> spin_unlock(lock_ptr);
> }
>
> ...
> }
>
> in which case READ_ONCE could perfectly fit, in that it will make
> compiler only read it once in every testing condition,
> which will eliminate the original problem that commit e91467ecd1ef
> addressed, though assembly code proof is needed.
>
> Actually, the quotation I argued in previous mail could be used again
> here, from memory-barriers.txt:
>
> (*) The compiler is within its rights to merge successive loads from
> the same variable. Such merging can cause the compiler to "optimize"
> the following code:
>
> while (tmp = a)
> do_something_with(tmp);
>
> into the following code, which, although in some sense legitimate
> for single-threaded code, is almost certainly not what the developer
> intended:
>
> if (tmp = a)
> for (;;)
> do_something_with(tmp);
>
> Use READ_ONCE() to prevent the compiler from doing this to you:
>
> while (tmp = READ_ONCE(a))
> do_something_with(tmp);
>
>
> I might be wrong, so I have cc'ed Paul, and Peter, I wish they give comment :-)

I believe you are correct with respect to the retry and while condition being an
appropriate place for the application of READ_ONCE. The question is why is this
preferred to the existing barrier()? I suggest:

While barrier() is a fairly brute force general application of a compiler
barrier, READ_ONCE() is very specific, it targets only operations dealing with
the specified variable. As such, its application both clearly identifies the
volatile variable and frees the compiler to make optimizations a more general
barrier would forbid.


>
>
> > As for #2...
> >
> >> 2. For the second problem I memtioned, yes, it is theoretical, and
> >> it is also due to q->lock_ptr can change between
> >> the test of nullness of q->lock_ptr and the lock on q->lock_ptr.
> >>
> >> the code is
> >>
> >> retry:
> >> lock_ptr = q->lock_ptr;
> >> if (lock_ptr != 0) {
> >> spin_lock(lock_ptr)
> >> if (unlikely(lock_ptr != q->lock_ptr)) {
> >> spin_unlock(lock_ptr);
> >> goto retry;
> >> }
> >> ...
> >> }
> >>
> >> which is effectively the same as :
> >>
> >> while (lock_ptr = q->lock_ptr) {
> >> spin_lock(lock_ptr)
> >> if (unlikely(lock_ptr != q->lock_ptr)) {
> >> spin_unlock(lock_ptr);
> >> goto retry;
> >> }
> >> ...
> >> }
> >>
> >> This might cause the compiler load the q->lock_ptr once and use it
> >> many times, quoted from

Which is already covered by the barrier() in place today as a more general
compiler barrier.

Your argument is then simply that READ_ONCE is a more specific solution to the
problem.

> >> memory-barriers.txt:
> >>
> >>
> >> (*) The compiler is within its rights to reload a variable, for example,
> >> in cases where high register pressure prevents the compiler from
> >> keeping all data of interest in registers. The compiler might
> >> therefore optimize the variable 'tmp' out of our previous example:
> >>
> >> while (tmp = a)
> >> do_something_with(tmp);
> >>
> >> This could result in the following code, which is perfectly safe in
> >> single-threaded code, but can be fatal in concurrent code:
> >>
> >> while (a)
> >> do_something_with(a);
> >>
> >> For example, the optimized version of this code could result in
> >> passing a zero to do_something_with() in the case where the variable
> >> a was modified by some other CPU between the "while" statement and
> >> the call to do_something_with().
> >>
> >> Again, use READ_ONCE() to prevent the compiler from doing this:
> >>
> >> while (tmp = READ_ONCE(a))
> >> do_something_with(tmp);
> >>
> >
> > OK, so this is really the meat of the argument for the patch. You are looking to
> > add a compiler barrier instead of a CPU memory barrier. This should be what your

I was incorrect in my statement here. barrier() is a general compiler barrier and
READ_ONCE is a targeted compiler barrier (impacting only the operations dealing
with the specified variable). This helps the argument considerably.

> > commit message is focused on and it should provide compelling evidence to
> > justify risking the change.
> >
> > Compelling evidence for a compiler barrier would be a disassembly of the code
> > block before and after, demonstrating the compiler generating broken code and the
> > compiler generating correct code.
> >
> > In addition to this, however, I would want to see a strong convincing argument
> > that the READ_ONCE volatile cast is sufficient to cover the original case which
> > motivated the addition of the barrier() (not "apparently" and "could be").
>
> As for #2, this is pure theoretical induction from this quotation, I
> do have no convincing argument
> and again I would like Paul to help clarify this.
>
>
> Thanks very much!
>
>
> Regards,
> Jianyu Zhan
>

--
Darren Hart
Intel Open Source Technology Center

2016-03-02 07:13:33

by Jianyu Zhan

[permalink] [raw]
Subject: Re: [PATCH] futex: replace bare barrier() with a READ_ONCE()

On Wed, Mar 2, 2016 at 2:39 PM, Darren Hart <[email protected]> wrote:
> I believe you are correct with respect to the retry and while condition being an
> appropriate place for the application of READ_ONCE. The question is why is this
> preferred to the existing barrier()? I suggest:
>
> While barrier() is a fairly brute force general application of a compiler
> barrier, READ_ONCE() is very specific, it targets only operations dealing with
> the specified variable. As such, its application both clearly identifies the
> volatile variable and frees the compiler to make optimizations a more general
> barrier would forbid.
>
>

Yep, beside the informative point, the more specifics of READ_ONCE
over barrier
is what I meant "lightweight", I missed emphasizing this point.
Thanks for pointing it.

I will respin this patch to reflect this.

>>
>>
>> > As for #2...
>> >
>> >> 2. For the second problem I memtioned, yes, it is theoretical, and
>> >> it is also due to q->lock_ptr can change between
>> >> the test of nullness of q->lock_ptr and the lock on q->lock_ptr.
>> >>
>> >> the code is
>> >>
>> >> retry:
>> >> lock_ptr = q->lock_ptr;
>> >> if (lock_ptr != 0) {
>> >> spin_lock(lock_ptr)
>> >> if (unlikely(lock_ptr != q->lock_ptr)) {
>> >> spin_unlock(lock_ptr);
>> >> goto retry;
>> >> }
>> >> ...
>> >> }
>> >>
>> >> which is effectively the same as :
>> >>
>> >> while (lock_ptr = q->lock_ptr) {
>> >> spin_lock(lock_ptr)
>> >> if (unlikely(lock_ptr != q->lock_ptr)) {
>> >> spin_unlock(lock_ptr);
>> >> goto retry;
>> >> }
>> >> ...
>> >> }
>> >>
>> >> This might cause the compiler load the q->lock_ptr once and use it
>> >> many times, quoted from
>
> Which is already covered by the barrier() in place today as a more general
> compiler barrier.
>
> Your argument is then simply that READ_ONCE is a more specific solution to the
> problem.
>

Yep. And after re-thinking, I am now less convinced in this second
argument, since
it involves a comparison of q->lock_ptr in the loop body, so this may
defeat any attempts
that compilers try to optimize the load out of the loop, even without
a READ_ONCE().

But I will also incorporate this in the second submission for review .



Regards,
Jianyu Zhan