2023-10-27 21:08:23

by Paul E. McKenney

Subject: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion

Hello!

FYI, unless someone complains, it is quite likely that C++ (and thus
likely C) compilers and standards will enforce Hans Boehm's proposal
for ordering relaxed loads before relaxed stores. The document [1]
cites "Bounding data races in space and time" by Dolan et al. [2], and
notes an "average a 2.x% slow down" for ARMv8 and PowerPC. In the past,
this has been considered unacceptable, among other things, due to the
fact that this issue is strictly theoretical.
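
For readers who have not seen the pattern under discussion, here is a minimal sketch of the load-buffering (LB) litmus test, which is the shape behind out-of-thin-air (OOTA) concerns. It is an illustration only and is not taken from the proposal: the function names `run_lb_once` and `count_lb_outcomes` are hypothetical. Under the current ISO C++ model the outcome r1 == 1 && r2 == 1 is permitted for relaxed accesses; ordering each relaxed load before later relaxed stores would forbid it. Whether the outcome is ever observed depends on the compiler and hardware.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Runs one load-buffering trial and returns the values (r1, r2) that the
// two threads loaded. Hypothetical helper, for illustration only.
std::pair<int, int> run_lb_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0;

    // Thread 1: relaxed load of x, then relaxed store to y.
    std::thread t1([&] {
        r1 = x.load(std::memory_order_relaxed);
        y.store(1, std::memory_order_relaxed);
    });
    // Thread 2: relaxed load of y, then relaxed store to x.
    std::thread t2([&] {
        r2 = y.load(std::memory_order_relaxed);
        x.store(1, std::memory_order_relaxed);
    });

    // join() synchronizes with thread completion, so reading r1 and r2
    // afterwards is race-free.
    t1.join();
    t2.join();
    return {r1, r2};
}

// Counts trials in which both loads returned 1, which requires each load
// to have been reordered after the store that follows it in program order.
int count_lb_outcomes(int trials) {
    assert(trials >= 0);
    int hits = 0;
    for (int i = 0; i < trials; ++i) {
        const std::pair<int, int> r = run_lb_once();
        if (r.first == 1 && r.second == 1) {
            ++hits;
        }
    }
    return hits;
}
```

Calling `count_lb_outcomes(100000)` from `main` gives a rough empirical count; a count of zero does not show that the outcome is forbidden, only that it was not observed on that machine and build.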

This would not (repeat, not) affect the current Linux kernel, which
relies on volatile loads and stores rather than C/C++ atomics.
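
To make the distinction concrete, the sketch below contrasts a volatile access with a relaxed atomic access. The helpers `read_once` and `write_once` are hypothetical C++ analogues loosely modelled on the kernel's READ_ONCE() and WRITE_ONCE(); they are assumptions for illustration, not the kernel's actual macros. Note that ISO C++ treats concurrent non-atomic accesses as data races, so this style depends on compiler behaviour rather than on the standard.

```cpp
#include <atomic>
#include <type_traits>

// Hypothetical analogue of READ_ONCE(): load through a volatile-qualified
// lvalue, so the compiler must perform the access as written.
template <typename T>
T read_once(const T& var) {
    static_assert(std::is_scalar<T>::value, "sketch handles scalar types only");
    return *static_cast<const volatile T*>(&var);
}

// Hypothetical analogue of WRITE_ONCE(): store through a volatile-qualified
// lvalue. U is a separate parameter so that write_once(x, 1) compiles for
// any scalar x.
template <typename T, typename U>
void write_once(T& var, U val) {
    static_assert(std::is_scalar<T>::value, "sketch handles scalar types only");
    *static_cast<volatile T*>(&var) = static_cast<T>(val);
}

// For contrast: a C++ relaxed atomic load, the kind of access that the
// proposed stronger ordering targets.
inline int relaxed_read(const std::atomic<int>& a) {
    return a.load(std::memory_order_relaxed);
}
```

Only accesses of the second kind go through the C/C++ atomics machinery, which is why, per the paragraph above, a change to relaxed atomics would leave code written in the first style untouched.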

To be clear, the initial proposal is not to change the standards, but
rather to add a command-line argument to enforce the stronger ordering.
However, given the long list of ARM-related folks in the Acknowledgments
section, the future direction is clear.

So, do any ARMv8, PowerPC, or RISC-V people still care? If so, I strongly
recommend speaking up. ;-)

Thanx, Paul

[1] https://lukegeeson.com/blog/2023-10-17-A-Proposal-For-Relaxed-Atomics/
[2] https://dl.acm.org/doi/10.1145/3192366.3192421

----- Forwarded message from David Goldblatt via Parallel <[email protected]> -----

Date: Fri, 27 Oct 2023 11:09:18 -0700
From: David Goldblatt via Parallel <[email protected]>
To: SG1 concurrency and parallelism <[email protected]>
Reply-To: [email protected]
Cc: David Goldblatt <[email protected]>
Subject: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion

Those who read this list but not the LLVM discourse might be interested in:
- This discussion, proposing `-mstrict-rlx-atomics` to enforce load-store ordering:
https://discourse.llvm.org/t/rfc-strengthen-relaxed-atomics-implementation-behind-mstrict-rlx-atomics-flag/74473
- The associated blog post here:
https://lukegeeson.com/blog/2023-10-17-A-Proposal-For-Relaxed-Atomics/

- David

_______________________________________________
Parallel mailing list
[email protected]
Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/parallel
Link to this post: http://lists.isocpp.org/parallel/2023/10/4151.php


----- End forwarded message -----


2023-11-03 17:03:17

by Alglave, Jade

Subject: Re: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion

Dear all (resending because I accidentally sent it in HTML first, sorry),

Arm’s official position on the topic can be found in this recent blog:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-technical-view-on-relaxed-atomics

Please do reach out to [email protected] if there are any questions.
Thanks,
Jade



2023-11-04 18:21:11

by Jonas Oberhauser

Subject: Re: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion

Thanks Jade.

I agree with the position you linked to in that the move is... unwise.

IMO, for a high-level language like C, if you need to rule out OOTA, just
declare it impossible by a "no OOTA axiom" (Viktor, in CC, made this
suggestion a while ago).

BTW, is there at least a proof that just making relaxed atomics ordered
in this way rules out OOTA in programs that contain non-atomics?
Or can we have something like the LKMM OOTA example I sent around last year?


best wishes,

jonas



2023-11-05 23:09:42

by Peter Zijlstra

Subject: Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion

On Fri, Oct 27, 2023 at 02:08:13PM -0700, Paul E. McKenney wrote:
> Hello!
>
> FYI, unless someone complains, it is quite likely that C++ (and thus
> likely C) compilers and standards will enforce Hans Boehm's proposal
> for ordering relaxed loads before relaxed stores. The document [1]
> cites "Bounding data races in space and time" by Dolan et al. [2], and
> notes an "average a 2.x% slow down" for ARMv8 and PowerPC. In the past,
> this has been considered unacceptable, among other things, due to the
> fact that this issue is strictly theoretical.
>
> This would not (repeat, not) affect the current Linux kernel, which
> relies on volatile loads and stores rather than C/C++ atomics.
>
> To be clear, the initial proposal is not to change the standards, but
> rather to add a command-line argument to enforce the stronger ordering.
> However, given the long list of ARM-related folks in the Acknowledgments
> section, the future direction is clear.
>
> So, do any ARMv8, PowerPC, or RISC-V people still care? If so, I strongly
> recommend speaking up. ;-)

OK, I finally had some time to read up...

Colour me properly confused. To me this all reads like C people can't
deal with relaxed atomics and are doing crazy things to try and 'fix'
it.

And while I don't speak for ARM/Power, I do worry this all takes C/C++
even further away from LKMM instead of closing the gap.

Worse, things like:

https://lukegeeson.com/blog/2023-10-17-A-Proposal-For-Relaxed-Atomics/

Which state:

"It would solve real issues in the Linux Kernel without costly fences
(the kernel does not use relaxed atomics or the ISO C/C++ model - the
load buffering issue affects the ISO C and linux memory models) ..."

Which is a contradiction if ever I saw one. It both claims this atrocity
fixes our volatile_if() woes while at the same time saying we're
unaffected because we don't use any of the C/C++ atomic batshit.

Anyway, I worry that all this faffing about will get in the way of our
volatile_if() 'demands'. Compiler people will tell us, just use relaxed
atomics, which is very much not what we want. We know relaxed loads
and stores behave 'funny', we've been doing that for a long long time.
Don't impose that madness on us. And certainly don't use us as an excuse
to peddle this nonsense.

Bah, what a load of crazy.

/me stomps off in disgust.

2023-11-07 02:16:41

by Paul E. McKenney

Subject: Re: Fw: [isocpp-parallel] OOTA fix (via fake branch-after-load) discussion

On Mon, Nov 06, 2023 at 12:08:59AM +0100, Peter Zijlstra wrote:
> On Fri, Oct 27, 2023 at 02:08:13PM -0700, Paul E. McKenney wrote:
> > Hello!
> >
> > FYI, unless someone complains, it is quite likely that C++ (and thus
> > likely C) compilers and standards will enforce Hans Boehm's proposal
> > for ordering relaxed loads before relaxed stores. The document [1]
> > cites "Bounding data races in space and time" by Dolan et al. [2], and
> > notes an "average a 2.x% slow down" for ARMv8 and PowerPC. In the past,
> > this has been considered unacceptable, among other things, due to the
> > fact that this issue is strictly theoretical.
> >
> > This would not (repeat, not) affect the current Linux kernel, which
> > relies on volatile loads and stores rather than C/C++ atomics.
> >
> > To be clear, the initial proposal is not to change the standards, but
> > rather to add a command-line argument to enforce the stronger ordering.
> > However, given the long list of ARM-related folks in the Acknowledgments
> > section, the future direction is clear.
> >
> > So, do any ARMv8, PowerPC, or RISC-V people still care? If so, I strongly
> > recommend speaking up. ;-)
>
> OK, I finally had some time to read up...
>
> Colour me properly confused. To me this all reads like C people can't
> deal with relaxed atomics and are doing crazy things to try and 'fix'
> it.
>
> And while I don't speak for ARM/Power, I do worry this all takes C/C++
> even further away from LKMM instead of closing the gap.
>
> Worse, things like:
>
> https://lukegeeson.com/blog/2023-10-17-A-Proposal-For-Relaxed-Atomics/
>
> Which state:
>
> "It would solve real issues in the Linux Kernel without costly fences
> (the kernel does not use relaxed atomics or the ISO C/C++ model - the
> load buffering issue affects the ISO C and linux memory models) ..."
>
> Which is a contradiction if ever I saw one. It both claims this atrocity
> fixes our volatile_if() woes while at the same time saying we're
> unaffected because we don't use any of the C/C++ atomic batshit.

I guess that my traditional reply would be that if you are properly
confused by all this, that just means that you were reading carefully.

> Anyway, I worry that all this faffing about will get in the way of our
> volatile_if() 'demands'. Compiler people will tell us, just use relaxed
> atomics, which is very much not what we want. We know relaxed loads
> and stores behave 'funny', we've been doing that for a long long time.
> Don't impose that madness on us. And certainly don't use us as an excuse
> to peddle this nonsense.

I am very much against incurring real overhead to solve an issue that is
an issue only in theory and not in practice. I wish I could confidently
say that my view will prevail, but...

> Bah, what a load of crazy.
>
> /me stomps off in disgust.

If this goes through and if developers see any overhead from relaxed
atomics in a situation that matters to them, they will reach for some
other tool. Inline assembly and volatile accesses, I suppose. Or the
traditional approach of a compiler flag.

Thanx, Paul