LinuxLists.cc - LKMM: Making RMW barriers explicit

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
> Hernan and Jonas:
>
> Can you explain more fully the changes you want to make to herd7 and/or
> the LKMM? The goal is to make the memory barriers currently implicit in
> RMW operations explicit, but I couldn't understand how you propose to do
> this.
>
> Are you going to change herd7 somehow, and if so, how? It seems like
> you should want to provide sufficient information so that the .bell
> and .cat files can implement the appropriate memory barriers associated
> with each RMW operation. What additional information is needed? And
> how (explained in English, not by quoting source code) will the .bell
> and .cat files make use of this information?
>
> Alan

I don't know whether herd7 needs to be changed. Probably, herd7 does the
following:
- if a tag called Mb appears on an rmw instruction (by instruction I
mean things like xchg(), atomic_inc_return_relaxed()), replace it with
one of those things:
* full mb ; once (the rmw) ; full mb, if a value returning
(successful) rmw
* once (the rmw) otherwise
- everything else gets translated 1:1 into some internal representation

What I'm proposing is:
1. remove this transpilation step,
2. and instead allow the Mb tag to actually appear on RMW instructions
3. change the cat file to explicitly define the behavior of the Mb tag
on RMW instructions

There are probably two ways to achieve this:
a) change herd7 to remove the special behavior for Mb, after that we
should be able to do everything else in the .cat/.bell/.def files.
b) sidestep herd7's behavior by renaming Mb to _Mb in the .def file,
and then defining Mb=_Mb in the .bell file, and define the semantics
of the Mb tag in the .cat files.

The latter would not include modification to herd7, but it's a bit hacky.

I'm not sure if the second way really works since I don't know exactly
how the herd7 tool does the transpilation, e.g., if it really looks for
an Mb tag or rather for the names of the instructions.

Does it make sense?

jonas

2024-05-16 08:54:12

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
>
>
> Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
>> Hernan and Jonas:
>>
>> Can you explain more fully the changes you want to make to herd7 and/or
>> the LKMM? The goal is to make the memory barriers currently implicit in
>> RMW operations explicit, but I couldn't understand how you propose to do
>> this.
>>
>> Are you going to change herd7 somehow, and if so, how? It seems like
>> you should want to provide sufficient information so that the .bell
>> and .cat files can implement the appropriate memory barriers associated
>> with each RMW operation. What additional information is needed? And
>> how (explained in English, not by quoting source code) will the .bell
>> and .cat files make use of this information?
>>
>> Alan
>
>
> I don't know whether herd7 needs to be changed. Probably, herd7 does the
> following:
> - if a tag called Mb appears on an rmw instruction (by instruction I
> mean things like xchg(), atomic_inc_return_relaxed()), replace it with
> one of those things:
> * full mb ; once (the rmw) ; full mb, if a value returning
> (successful) rmw
> * once (the rmw) otherwise
> - everything else gets translated 1:1 into some internal representation

This is my understanding from reading the source code of CSem.ml in
herd7's repo.

Also, this is exactly what dartagnan is currently doing.

>
> What I'm proposing is:
> 1. remove this transpilation step,
> 2. and instead allow the Mb tag to actually appear on RMW instructions
> 3. change the cat file to explicitly define the behavior of the Mb tag
> on RMW instructions

These are the exact 3 things I changed in dartagnan for testing what
Jonas proposed.

I am not sure if further changes are needed for herd7.

Hernan

2024-05-18 00:31:44

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
> On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
> >
> >
> > Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
> > > Hernan and Jonas:
> > >
> > > Can you explain more fully the changes you want to make to herd7 and/or
> > > the LKMM?? The goal is to make the memory barriers currently implicit in
> > > RMW operations explicit, but I couldn't understand how you propose to do
> > > this.
> > >
> > > Are you going to change herd7 somehow, and if so, how?? It seems like
> > > you should want to provide sufficient information so that the .bell
> > > and .cat files can implement the appropriate memory barriers associated
> > > with each RMW operation.? What additional information is needed?? And
> > > how (explained in English, not by quoting source code) will the .bell
> > > and .cat files make use of this information?
> > >
> > > Alan
> >
> >
> > I don't know whether herd7 needs to be changed. Probably, herd7 does the
> > following:
> > - if a tag called Mb appears on an rmw instruction (by instruction I
> > mean things like xchg(), atomic_inc_return_relaxed()), replace it with
> > one of those things:
> > ? * full mb ; once (the rmw) ; full mb, if a value returning
> > (successful) rmw
> > ? * once (the rmw)?? otherwise
> > - everything else gets translated 1:1 into some internal representation
>
> This is my understanding from reading the source code of CSem.ml in herd7's
> repo.
>
> Also, this is exactly what dartagnan is currently doing.
>
> >
> > What I'm proposing is:
> > 1. remove this transpilation step,
> > 2. and instead allow the Mb tag to actually appear on RMW instructions
> > 3. change the cat file to explicitly define the behavior of the Mb tag
> > on RMW instructions
>
> These are the exact 3 things I changed in dartagnan for testing what Jonas
> proposed.
>
> I am not sure if further changes are needed for herd7.

Okay, good. This answers the first part of what I asked. What about
the second part? That is, how will the changes to the .def, .bell, and
cat files achieve your goals?

Alan

2024-05-21 09:58:12

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

Am 5/18/2024 um 2:31 AM schrieb Alan Stern:
> On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
>> On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
>>>
>>>
>>> Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
>>>> Hernan and Jonas:
>>>>
>>>> Can you explain more fully the changes you want to make to herd7 and/or
>>>> the LKMM? The goal is to make the memory barriers currently implicit in
>>>> RMW operations explicit, but I couldn't understand how you propose to do
>>>> this.
>>>>
>>>> Are you going to change herd7 somehow, and if so, how? It seems like
>>>> you should want to provide sufficient information so that the .bell
>>>> and .cat files can implement the appropriate memory barriers associated
>>>> with each RMW operation. What additional information is needed? And
>>>> how (explained in English, not by quoting source code) will the .bell
>>>> and .cat files make use of this information?
>>>>
>>>> Alan
>>>
>>>
>>> I don't know whether herd7 needs to be changed. Probably, herd7 does the
>>> following:
>>> - if a tag called Mb appears on an rmw instruction (by instruction I
>>> mean things like xchg(), atomic_inc_return_relaxed()), replace it with
>>> one of those things:
>>> * full mb ; once (the rmw) ; full mb, if a value returning
>>> (successful) rmw
>>> * once (the rmw) otherwise
>>> - everything else gets translated 1:1 into some internal representation
>>
>> This is my understanding from reading the source code of CSem.ml in herd7's
>> repo.
>>
>> Also, this is exactly what dartagnan is currently doing.
>>
>>>
>>> What I'm proposing is:
>>> 1. remove this transpilation step,
>>> 2. and instead allow the Mb tag to actually appear on RMW instructions
>>> 3. change the cat file to explicitly define the behavior of the Mb tag
>>> on RMW instructions
>>
>> These are the exact 3 things I changed in dartagnan for testing what Jonas
>> proposed.
>>
>> I am not sure if further changes are needed for herd7.
>
> Okay, good. This answers the first part of what I asked. What about
> the second part? That is, how will the changes to the .def, .bell, and
> .cat files achieve your goals?
>
> Alan

Firstly, we'd allow 'mb as a barrier mode in events, e.g.

instructions RMW[{'once,'acquire,'release,'mb}]

then the Mb tags would appear in the graph. And then I'd define the
ordering explicitly. One way is to say that an Mb tag orders all memory
accesses before(or at) the tag with all memory accesses after(or at) the
tag, except the accesses of the rmw with each other.
This is the same as the full fence before the read, which orders all
memory accesses before the read with every access after(or at) the read,
plus the full fence after the write, which orders all memory accesses
before(or at) the write with all accesses after the write.

That would be done by adding

[M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]

to ppo.

One could also split it into two rules to keep with the "two full
fences" analogy. Maybe a good way would be like this:

[M] ; po; [RMW_MB & R] ; po^? ; [M]

[M] ; po^?; [RMW_MB & W] ; po ; [M]

Hope that makes sense,

jonas

2024-05-21 11:42:28

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On 5/18/2024 2:31 AM, Alan Stern wrote:
> On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
>> On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
>>>
>>>
>>> Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
>>>> Hernan and Jonas:
>>>>
>>>> Can you explain more fully the changes you want to make to herd7 and/or
>>>> the LKMM? The goal is to make the memory barriers currently implicit in
>>>> RMW operations explicit, but I couldn't understand how you propose to do
>>>> this.
>>>>
>>>> Are you going to change herd7 somehow, and if so, how? It seems like
>>>> you should want to provide sufficient information so that the .bell
>>>> and .cat files can implement the appropriate memory barriers associated
>>>> with each RMW operation. What additional information is needed? And
>>>> how (explained in English, not by quoting source code) will the .bell
>>>> and .cat files make use of this information?
>>>>
>>>> Alan
>>>
>>>
>>> I don't know whether herd7 needs to be changed. Probably, herd7 does the
>>> following:
>>> - if a tag called Mb appears on an rmw instruction (by instruction I
>>> mean things like xchg(), atomic_inc_return_relaxed()), replace it with
>>> one of those things:
>>> * full mb ; once (the rmw) ; full mb, if a value returning
>>> (successful) rmw
>>> * once (the rmw) otherwise
>>> - everything else gets translated 1:1 into some internal representation
>>
>> This is my understanding from reading the source code of CSem.ml in herd7's
>> repo.
>>
>> Also, this is exactly what dartagnan is currently doing.
>>
>>>
>>> What I'm proposing is:
>>> 1. remove this transpilation step,
>>> 2. and instead allow the Mb tag to actually appear on RMW instructions
>>> 3. change the cat file to explicitly define the behavior of the Mb tag
>>> on RMW instructions
>>
>> These are the exact 3 things I changed in dartagnan for testing what Jonas
>> proposed.
>>
>> I am not sure if further changes are needed for herd7.

I implemented these changes in herd7 and they seem enough.
I opened a PRs for discussion
https://github.com/herd/herdtools7/pull/865

>
> Okay, good. This answers the first part of what I asked. What about
> the second part? That is, how will the changes to the .def, .bell, and
> .cat files achieve your goals?

At least the cat model needs to be updated. If I remove the fences from
herd7, but keep the current model, these 4 tests fail.

--- Summary:
!!! Result changed:
/litmus/manual/locked/SUW+or-ow+l-ow-or.litmus.out.new
!!! Result changed:
/litmus/manual/atomic/C-PaulEMcKenney-SB+adat-o+adat-o.litmus.out.new
!!! Result changed: ./litmus/manual/atomic/C-atomic-01.litmus.out.new
!!! Result changed: ./litmus/manual/atomic/C-atomic-02.litmus.out.new

Using this patch I get the correct results

diff --git a/tools/memory-model/linux-kernel.cat
b/tools/memory-model/linux-kernel.cat
index adf3c4f41229..7e4787cdbd66 100644
--- a/tools/memory-model/linux-kernel.cat
+++ b/tools/memory-model/linux-kernel.cat
@@ -34,6 +34,12 @@ let R4rmb = R \ Noreturn (* Reads for which rmb
works *)
let rmb = [R4rmb] ; fencerel(Rmb) ; [R4rmb]
let wmb = [W] ; fencerel(Wmb) ; [W]
let mb = ([M] ; fencerel(Mb) ; [M]) |
+ (* everything across a full barrier RMW is ordered. This
includes up to one event inside the RMW. *)
+ ([M] ; po ; [RMW & Mb] ; po ; [M]) |
+ (* full barrier RMW writes are ordered with everything behind
the RMW *)
+ ([W & RMW & Mb] ; po ; [M]) |
+ (* full barrier RMW reads are ordered with everything before the
RMW *)
+ ([M] ; po ; [R & RMW & Mb]) |
([M] ; fencerel(Before-atomic) ; [RMW] ; po? ; [M]) |
([M] ; po? ; [RMW] ; fencerel(After-atomic) ; [M]) |
([M] ; po? ; [LKW] ; fencerel(After-spinlock) ; [M]) |

Hernan

>
> Alan

2024-05-21 15:36:49

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote:
>
>
> Am 5/18/2024 um 2:31 AM schrieb Alan Stern:
> > On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
> > > On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
> > > >
> > > >
> > > > Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
> > > > > Hernan and Jonas:
> > > > >
> > > > > Can you explain more fully the changes you want to make to herd7 and/or
> > > > > the LKMM?? The goal is to make the memory barriers currently implicit in
> > > > > RMW operations explicit, but I couldn't understand how you propose to do
> > > > > this.
> > > > >
> > > > > Are you going to change herd7 somehow, and if so, how?? It seems like
> > > > > you should want to provide sufficient information so that the .bell
> > > > > and .cat files can implement the appropriate memory barriers associated
> > > > > with each RMW operation.? What additional information is needed?? And
> > > > > how (explained in English, not by quoting source code) will the .bell
> > > > > and .cat files make use of this information?
> > > > >
> > > > > Alan
> > > >
> > > >
> > > > I don't know whether herd7 needs to be changed. Probably, herd7 does the
> > > > following:
> > > > - if a tag called Mb appears on an rmw instruction (by instruction I
> > > > mean things like xchg(), atomic_inc_return_relaxed()), replace it with
> > > > one of those things:
> > > > ? * full mb ; once (the rmw) ; full mb, if a value returning
> > > > (successful) rmw
> > > > ? * once (the rmw)?? otherwise
> > > > - everything else gets translated 1:1 into some internal representation
> > >
> > > This is my understanding from reading the source code of CSem.ml in herd7's
> > > repo.
> > >
> > > Also, this is exactly what dartagnan is currently doing.
> > >
> > > >
> > > > What I'm proposing is:
> > > > 1. remove this transpilation step,
> > > > 2. and instead allow the Mb tag to actually appear on RMW instructions
> > > > 3. change the cat file to explicitly define the behavior of the Mb tag
> > > > on RMW instructions
> > >
> > > These are the exact 3 things I changed in dartagnan for testing what Jonas
> > > proposed.
> > >
> > > I am not sure if further changes are needed for herd7.

What about failed RMW instructions? IIRC, herd7 generates just an R for
these, not both R and W, but won't it still be annotated with an mb tag?
And wasn't this matter of failed RMWs one of the issues that the two of
you specifically wanted to make explicit in the memory model, rather
than implicit in the operation of herd7?

And wasn't another one of these issues the difference between
value-returning and non-value-returning RMWs? As far as I can, nothing
in the .def file specifically mentions this. There's the noreturn tag
in the .bell file, but nothing in the .def file says which instructions
it applies to. Or are we supposed to know that it automatically applies
to all __atomic_op() instances?

> > Okay, good. This answers the first part of what I asked. What about
> > the second part? That is, how will the changes to the .def, .bell, and
> > .cat files achieve your goals?
> >
> > Alan
>
>
> Firstly, we'd allow 'mb as a barrier mode in events, e.g.

You mean as a mode in RMW events. 'mb already is a mode for F events.

> instructions RMW[{'once,'acquire,'release,'mb}]
>
> then the Mb tags would appear in the graph. And then I'd define the ordering
> explicitly. One way is to say that an Mb tag orders all memory accesses
> before(or at) the tag with all memory accesses after(or at) the tag, except
> the accesses of the rmw with each other.

I don't think you need to add very much. The .cat file already defines
the mb relation as including the term:

([M] ; fencerel(Mb) ; [M])

All that's needed is to replace the fencerel(Mb) with something more
general...

> This is the same as the full fence before the read, which orders all memory
> accesses before the read with every access after(or at) the read,
> plus the full fence after the write, which orders all memory accesses
> before(or at) the write with all accesses after the write.
>
> That would be done by adding
>
> [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]

.. like this.

> to ppo.

You need to update the definition of mb, not ppo. And the RMW_MB above
looks wrong; shouldn't it be just Mb?

Also, is the "\ rmw" part really necessary? I don't think it makes any
difference; the memory model already knows that the read part of an RMW
has to happen before the write part.

> One could also split it into two rules to keep with the "two full fences"
> analogy. Maybe a good way would be like this:
>
> [M] ; po; [RMW_MB & R] ; po^? ; [M]
>
> [M] ; po^?; [RMW_MB & W] ; po ; [M]

My preference is for the first approach.

Alan

2024-05-22 09:21:26

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

Am 5/21/2024 um 5:36 PM schrieb Alan Stern:
> On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote:
>>
>>
>> Am 5/18/2024 um 2:31 AM schrieb Alan Stern:
>>> On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
>>>> On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
>>>>>
>>>>>
>>>>> Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
>>>>>> Hernan and Jonas:
>>>>>>
>>>>>> Can you explain more fully the changes you want to make to herd7 and/or
>>>>>> the LKMM? The goal is to make the memory barriers currently implicit in
>>>>>> RMW operations explicit, but I couldn't understand how you propose to do
>>>>>> this.
>>>>>>
>>>>>> Are you going to change herd7 somehow, and if so, how? It seems like
>>>>>> you should want to provide sufficient information so that the .bell
>>>>>> and .cat files can implement the appropriate memory barriers associated
>>>>>> with each RMW operation. What additional information is needed? And
>>>>>> how (explained in English, not by quoting source code) will the .bell
>>>>>> and .cat files make use of this information?
>>>>>>
>>>>>> Alan
>>>>>
>>>>>
>>>>> I don't know whether herd7 needs to be changed. Probably, herd7 does the
>>>>> following:
>>>>> - if a tag called Mb appears on an rmw instruction (by instruction I
>>>>> mean things like xchg(), atomic_inc_return_relaxed()), replace it with
>>>>> one of those things:
>>>>> * full mb ; once (the rmw) ; full mb, if a value returning
>>>>> (successful) rmw
>>>>> * once (the rmw) otherwise
>>>>> - everything else gets translated 1:1 into some internal representation
>>>>
>>>> This is my understanding from reading the source code of CSem.ml in herd7's
>>>> repo.
>>>>
>>>> Also, this is exactly what dartagnan is currently doing.
>>>>
>>>>>
>>>>> What I'm proposing is:
>>>>> 1. remove this transpilation step,
>>>>> 2. and instead allow the Mb tag to actually appear on RMW instructions
>>>>> 3. change the cat file to explicitly define the behavior of the Mb tag
>>>>> on RMW instructions
>>>>
>>>> These are the exact 3 things I changed in dartagnan for testing what Jonas
>>>> proposed.
>>>>
>>>> I am not sure if further changes are needed for herd7.
>
> What about failed RMW instructions? IIRC, herd7 generates just an R for
> these, not both R and W, but won't it still be annotated with an mb tag?
> And wasn't this matter of failed RMWs one of the issues that the two of
> you specifically wanted to make explicit in the memory model, rather
> than implicit in the operation of herd7?

That's why we use the RMW_MB tag. I should have copied that definition too:

(* full barrier events that appear in non-failing RMW *)
let RMW_MB = Mb & (dom(rmw) | range(rmw))

This ensures that the only successful rmw instructions have an RMW_MB tag.

>
> And wasn't another one of these issues the difference between
> value-returning and non-value-returning RMWs? As far as I can, nothing
> in the .def file specifically mentions this. There's the noreturn tag
> in the .bell file, but nothing in the .def file says which instructions
> it applies to. Or are we supposed to know that it automatically applies
> to all __atomic_op() instances?

Ah, now you're already forestalling my next suggestion :))

I would say let's fix one issue at a time (learned this from Andrea).

For the other issue, do noreturn rmws always have the same ordering as once?

I suspect we need to extend herd slightly more to support the second
one, I don't know if there's syntax for passing tags to __atomic_op.

>
>>> Okay, good. This answers the first part of what I asked. What about
>>> the second part? That is, how will the changes to the .def, .bell, and
>>> .cat files achieve your goals?
>>>
>>> Alan
>>
>>
>> Firstly, we'd allow 'mb as a barrier mode in events, e.g.
>
> You mean as a mode in RMW events. 'mb already is a mode for F events.

Thanks, you're right.

>
>> instructions RMW[{'once,'acquire,'release,'mb}]
>>
>> then the Mb tags would appear in the graph. And then I'd define the ordering
>> explicitly. One way is to say that an Mb tag orders all memory accesses
>> before(or at) the tag with all memory accesses after(or at) the tag, except
>> the accesses of the rmw with each other.
>
> I don't think you need to add very much. The .cat file already defines
> the mb relation as including the term:
>
> ([M] ; fencerel(Mb) ; [M])
>
> All that's needed is to replace the fencerel(Mb) with something more
> general...
>
>> This is the same as the full fence before the read, which orders all memory
>> accesses before the read with every access after(or at) the read,
>> plus the full fence after the write, which orders all memory accesses
>> before(or at) the write with all accesses after the write.
>>
>> That would be done by adding
>>
>> [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]
>
> .. like this.
>
>> to ppo.
>
> You need to update the definition of mb, not ppo.

Yes, I meant to update the definition of mb, but I momentarily thought
the effect of that is just that

ppo_new = ppo_old | [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]

I forgot that extending mb has another effect, in pb.

> And the RMW_MB above
> looks wrong; shouldn't it be just Mb?

As discussed above, no, since the Mb will also be on the failed RMW.

>
> Also, is the "\ rmw" part really necessary? I don't think it makes any
> difference; the memory model already knows that the read part of an RMW
> has to happen before the write part.

It unfortunately does make a difference, because mb is a strong fence.
This means that an Mb in an rmw sequence would create additional pb edges.

prop;(rfe ; [Mb];rmw;[Mb])

I believe this is might give wrong results on powerpc, but I'd need to
double check.

>
>> One could also split it into two rules to keep with the "two full fences"
>> analogy. Maybe a good way would be like this:
>>
>> [M] ; po; [RMW_MB & R] ; po^? ; [M]
>>
>> [M] ; po^?; [RMW_MB & W] ; po ; [M]
>
> My preference is for the first approach.

That was also my early preference, but to be honest I expected that
you'd prefer the second approach.
Actually, I also started warming up to it.

Have fun,
jonas

2024-05-22 14:21:31

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On Wed, May 22, 2024 at 11:20:47AM +0200, Jonas Oberhauser wrote:
>
>
> Am 5/21/2024 um 5:36 PM schrieb Alan Stern:
> > On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote:
> > >
> > >
> > > Am 5/18/2024 um 2:31 AM schrieb Alan Stern:
> > > > On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
> > > > > On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
> > > > > >
> > > > > >
> > > > > > Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
> > > > > > > Hernan and Jonas:
> > > > > > >
> > > > > > > Can you explain more fully the changes you want to make to herd7 and/or
> > > > > > > the LKMM?? The goal is to make the memory barriers currently implicit in
> > > > > > > RMW operations explicit, but I couldn't understand how you propose to do
> > > > > > > this.
> > > > > > >
> > > > > > > Are you going to change herd7 somehow, and if so, how?? It seems like
> > > > > > > you should want to provide sufficient information so that the .bell
> > > > > > > and .cat files can implement the appropriate memory barriers associated
> > > > > > > with each RMW operation.? What additional information is needed?? And
> > > > > > > how (explained in English, not by quoting source code) will the .bell
> > > > > > > and .cat files make use of this information?
> > > > > > >
> > > > > > > Alan
> > > > > >
> > > > > >
> > > > > > I don't know whether herd7 needs to be changed. Probably, herd7 does the
> > > > > > following:
> > > > > > - if a tag called Mb appears on an rmw instruction (by instruction I
> > > > > > mean things like xchg(), atomic_inc_return_relaxed()), replace it with
> > > > > > one of those things:
> > > > > > ? * full mb ; once (the rmw) ; full mb, if a value returning
> > > > > > (successful) rmw
> > > > > > ? * once (the rmw)?? otherwise
> > > > > > - everything else gets translated 1:1 into some internal representation
> > > > >
> > > > > This is my understanding from reading the source code of CSem.ml in herd7's
> > > > > repo.
> > > > >
> > > > > Also, this is exactly what dartagnan is currently doing.
> > > > >
> > > > > >
> > > > > > What I'm proposing is:
> > > > > > 1. remove this transpilation step,
> > > > > > 2. and instead allow the Mb tag to actually appear on RMW instructions
> > > > > > 3. change the cat file to explicitly define the behavior of the Mb tag
> > > > > > on RMW instructions
> > > > >
> > > > > These are the exact 3 things I changed in dartagnan for testing what Jonas
> > > > > proposed.
> > > > >
> > > > > I am not sure if further changes are needed for herd7.
> >
> > What about failed RMW instructions? IIRC, herd7 generates just an R for
> > these, not both R and W, but won't it still be annotated with an mb tag?
> > And wasn't this matter of failed RMWs one of the issues that the two of
> > you specifically wanted to make explicit in the memory model, rather
> > than implicit in the operation of herd7?
>
> That's why we use the RMW_MB tag. I should have copied that definition too:
>
>
> (* full barrier events that appear in non-failing RMW *)
> let RMW_MB = Mb & (dom(rmw) | range(rmw))
>
>
> This ensures that the only successful rmw instructions have an RMW_MB tag.

It would be better if there was a way to tell herd7 not to add the 'mb
tag to failed instructions in the first place. This approach is
brittle; see below.

An alternative would be to have a way for the .cat file to remove the
'mb tag from a failed RMW instruction. But I don't know if this is
feasible.

> > And wasn't another one of these issues the difference between
> > value-returning and non-value-returning RMWs? As far as I can, nothing
> > in the .def file specifically mentions this. There's the noreturn tag
> > in the .bell file, but nothing in the .def file says which instructions
> > it applies to. Or are we supposed to know that it automatically applies
> > to all __atomic_op() instances?
>
> Ah, now you're already forestalling my next suggestion :))
>
> I would say let's fix one issue at a time (learned this from Andrea).
>
> For the other issue, do noreturn rmws always have the same ordering as once?

If they aren't annotated with _acquire or _release then yes... And I
don't know whether there _are_ any annotated no-return RMWs. If
somebody wanted such a thing, they probably would just use a
value-returning RMW instead.

> I suspect we need to extend herd slightly more to support the second one, I
> don't know if there's syntax for passing tags to __atomic_op.

This may not be be needed. But still, it would nice to be explicit (in
a comment in the .def file if nowhere else) that __atomic_op always adds
a 'noreturn tag.

> > > instructions RMW[{'once,'acquire,'release,'mb}]
> > >
> > > then the Mb tags would appear in the graph. And then I'd define the ordering
> > > explicitly. One way is to say that an Mb tag orders all memory accesses
> > > before(or at) the tag with all memory accesses after(or at) the tag, except
> > > the accesses of the rmw with each other.
> >
> > I don't think you need to add very much. The .cat file already defines
> > the mb relation as including the term:
> >
> > ([M] ; fencerel(Mb) ; [M])
> >
> > All that's needed is to replace the fencerel(Mb) with something more
> > general...

And this is why I said the RMW_MB mechanism is brittle. With the 'mb
tag still added to failed RMW events, the term above will cause the
memory model to think there is ordering even though the event isn't in
the RMW_MB class.

> > Also, is the "\ rmw" part really necessary? I don't think it makes any
> > difference; the memory model already knows that the read part of an RMW
> > has to happen before the write part.
>
> It unfortunately does make a difference, because mb is a strong fence.
> This means that an Mb in an rmw sequence would create additional pb edges.
>
> prop;(rfe ; [Mb];rmw;[Mb])
>
> I believe this is might give wrong results on powerpc, but I'd need to
> double check.

PowerPC uses a load-reserve/store-conditional approach for RMW, so it's
tricky. However, you're right that having a fictitious mb between the
read and the write of an RMW would mean that the preceding (in coherence
order) write would have to be visible to all CPUs before the RMW write
could execute, and I don't believe we want to assert this.

> > > One could also split it into two rules to keep with the "two full fences"
> > > analogy. Maybe a good way would be like this:
> > >
> > > [M] ; po; [RMW_MB & R] ; po^? ; [M]
> > >
> > > [M] ; po^?; [RMW_MB & W] ; po ; [M]
> >
> > My preference is for the first approach.
>
> That was also my early preference, but to be honest I expected that you'd
> prefer the second approach.
> Actually, I also started warming up to it.

If you do want to use this approach, it should be simplified. All you
need is:

[M] ; po ; [RMW_MB]

[RMW_MB] ; po ; [M]

This is because events tagged with RMW_MB always are memory accesses,
and accesses that aren't part of the RMW are already covered by the
fencerel(Mb) thing above.

Alan

2024-05-22 16:54:48

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

Alan, all,

("randomly" picking a recent post in the thread, after having observed
this discussion for a while...)

> It would be better if there was a way to tell herd7 not to add the 'mb
> tag to failed instructions in the first place. This approach is
> brittle; see below.

AFAIU, changing the herd representation to generate mb-accesses in place
of certain mb-fences...

> If you do want to use this approach, it should be simplified. All you
> need is:
>
> [M] ; po ; [RMW_MB]
>
> [RMW_MB] ; po ; [M]
>
> This is because events tagged with RMW_MB always are memory accesses,
> and accesses that aren't part of the RMW are already covered by the
> fencerel(Mb) thing above.

.. and updating the .cat file to the effects of something like

-let mb = ([M] ; fencerel(Mb) ; [M]) |
+let mb = (([M] ; po? ; [Mb] ; po? ; [M]) \ id) |

.. can hardly be called "making RMW barriers explicit". (So much so
that the first commit in PR #865 was titled "Remove explicit barriers
from RMWs". :-))

Overall, this discussion rather seems to confirm the close link between
tools/memory-model/ and herdtools7. (After all, to what extent could
any putative RMW_MB be considered "explicit" without _knowing the under-
lying representation of the RMW operations...) My understanding is that
this discussion was at least in part motivated by a desire to experiment
and familiarize with the current herd representation (that does indeed
require some getting-used-to...); this suggests, as some of you already
mentioned, to add some comments or a .txt in tools/memory-model/ in order
to document such representation and ameliorate that experience. OTOH, I
must admit, I'm unable to see here sufficient motivation(tm) for changing
the current representation (and model): not the how, but the why...

Andrea

2024-05-22 18:21:07

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On Wed, May 22, 2024 at 06:54:25PM +0200, Andrea Parri wrote:
> Alan, all,
>
> ("randomly" picking a recent post in the thread, after having observed
> this discussion for a while...)
>
> > It would be better if there was a way to tell herd7 not to add the 'mb
> > tag to failed instructions in the first place. This approach is
> > brittle; see below.
>
> AFAIU, changing the herd representation to generate mb-accesses in place
> of certain mb-fences...

I believe herd7 already generates mb accesses (not fences) for certain
RMW operations. But then it does some post-processing on them, and that
post-processing is what we are thinking of changing.

> > If you do want to use this approach, it should be simplified. All you
> > need is:
> >
> > [M] ; po ; [RMW_MB]
> >
> > [RMW_MB] ; po ; [M]
> >
> > This is because events tagged with RMW_MB always are memory accesses,
> > and accesses that aren't part of the RMW are already covered by the
> > fencerel(Mb) thing above.
>
> ... and updating the .cat file to the effects of something like
>
> -let mb = ([M] ; fencerel(Mb) ; [M]) |
> +let mb = (([M] ; po? ; [Mb] ; po? ; [M]) \ id) |
>
> ... can hardly be called "making RMW barriers explicit". (So much so
> that the first commit in PR #865 was titled "Remove explicit barriers
> from RMWs". :-))

There is another point, something we didn't spell out explicitly in the
email discussion. Namely, in linux-kernel.def there is a long list of
instructions along with corresponding herd7 implementation instructions,
and those instructions explicitly contain either {once}, {acquire},
{release}, or {mb} tags. So to a large extent, these barriers already
are explicit in the memory model. Not in the .cat file, but in the .def
file.

What is not so explicit is how the {mb} tag works. Its operation isn't
as simple as the operation of the {acquire} and {release} tags; those
just modify the R or W access in the RMW pair as you would expect.
Instead, an {mb} tag says to insert strong memory barriers before the R
access and after the W access. This is more or less what the
post-processing mentioned earlier does, and Jonas and Hernan want to
move this out of herd7 and into the memory model.

> Overall, this discussion rather seems to confirm the close link between
> tools/memory-model/ and herdtools7. (After all, to what extent could
> any putative RMW_MB be considered "explicit" without _knowing the under-
> lying representation of the RMW operations...) My understanding is that
> this discussion was at least in part motivated by a desire to experiment
> and familiarize with the current herd representation (that does indeed
> require some getting-used-to...); this suggests, as some of you already
> mentioned, to add some comments or a .txt in tools/memory-model/ in order
> to document such representation and ameliorate that experience. OTOH, I
> must admit, I'm unable to see here sufficient motivation(tm) for changing
> the current representation (and model): not the how, but the why...

Well, it's not a big change. And in my opinion, if something can be
moved out of herd7's innards and into the memory model files, then doing
so is generally a good idea.

However, I do agree that there will still be a close link between
tools/memory-model/ and herdtools7. This may be unavoidable, at least
to some extent, but any way to reduce it is worth considering.

Alan

2024-05-22 19:48:53

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On 5/22/2024 8:20 PM, Alan Stern wrote:
> On Wed, May 22, 2024 at 06:54:25PM +0200, Andrea Parri wrote:
>> Alan, all,
>>
>> ("randomly" picking a recent post in the thread, after having observed
>> this discussion for a while...)
>>
>>> It would be better if there was a way to tell herd7 not to add the 'mb
>>> tag to failed instructions in the first place. This approach is
>>> brittle; see below.
>>
>> AFAIU, changing the herd representation to generate mb-accesses in place
>> of certain mb-fences...
>
> I believe herd7 already generates mb accesses (not fences) for certain
> RMW operations. But then it does some post-processing on them, and that
> post-processing is what we are thinking of changing.
>
>>> If you do want to use this approach, it should be simplified. All you
>>> need is:
>>>
>>> [M] ; po ; [RMW_MB]
>>>
>>> [RMW_MB] ; po ; [M]
>>>
>>> This is because events tagged with RMW_MB always are memory accesses,
>>> and accesses that aren't part of the RMW are already covered by the
>>> fencerel(Mb) thing above.
>>
>> ... and updating the .cat file to the effects of something like
>>
>> -let mb = ([M] ; fencerel(Mb) ; [M]) |
>> +let mb = (([M] ; po? ; [Mb] ; po? ; [M]) \ id) |
>>
>> ... can hardly be called "making RMW barriers explicit". (So much so
>> that the first commit in PR #865 was titled "Remove explicit barriers
>> from RMWs". :-))
>
> There is another point, something we didn't spell out explicitly in the
> email discussion. Namely, in linux-kernel.def there is a long list of
> instructions along with corresponding herd7 implementation instructions,
> and those instructions explicitly contain either {once}, {acquire},
> {release}, or {mb} tags. So to a large extent, these barriers already
> are explicit in the memory model. Not in the .cat file, but in the .def
> file.
>
> What is not so explicit is how the {mb} tag works. Its operation isn't
> as simple as the operation of the {acquire} and {release} tags; those
> just modify the R or W access in the RMW pair as you would expect.
> Instead, an {mb} tag says to insert strong memory barriers before the R
> access and after the W access. This is more or less what the
> post-processing mentioned earlier does, and Jonas and Hernan want to
> move this out of herd7 and into the memory model.
>
>> Overall, this discussion rather seems to confirm the close link between
>> tools/memory-model/ and herdtools7. (After all, to what extent could
>> any putative RMW_MB be considered "explicit" without _knowing the under-
>> lying representation of the RMW operations...) My understanding is that
>> this discussion was at least in part motivated by a desire to experiment
>> and familiarize with the current herd representation (that does indeed
>> require some getting-used-to...); this suggests, as some of you already
>> mentioned, to add some comments or a .txt in tools/memory-model/ in order
>> to document such representation and ameliorate that experience. OTOH, I
>> must admit, I'm unable to see here sufficient motivation(tm) for changing
>> the current representation (and model): not the how, but the why...
>
> Well, it's not a big change. And in my opinion, if something can be
> moved out of herd7's innards and into the memory model files, then doing
> so is generally a good idea.
>
> However, I do agree that there will still be a close link between
> tools/memory-model/ and herdtools7. This may be unavoidable, at least
> to some extent, but any way to reduce it is worth considering.

I can give my motivation as a tool developer. It would be much simpler
if one could find all the information to support lkmm in
tools/memory-model/ (in the form of the model + some comments or a .txt
to cover those things we cannot move out of the tool implementation),
rather than having to dive into herd7 code.

Hernan

>
> Alan

2024-05-23 09:04:40

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

> It would be much simpler if
> one could find all the information to support lkmm in tools/memory-model/
> (in the form of the model + some comments or a .txt to cover those things we
> cannot move out of the tool implementation), rather than having to dive into
> herd7 code.

Got it. And I do find that very relatable to LKMM developers who, like me,
are definitely not fluent in OCaml. :-/

Let me draft some .txt to such effect, based on but expanding and hopefully
completing my previous remarks in

https://lore.kernel.org/lkml/ZjrSm119+IfIU9eU@andrea/

Having something like that "on paper" can also help evaluate future changes
to the tool and/or model.

Thank you for the suggestion.

Andrea

2024-05-23 12:54:50

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

Am 5/22/2024 um 4:20 PM schrieb Alan Stern:
> On Wed, May 22, 2024 at 11:20:47AM +0200, Jonas Oberhauser wrote:
>>
>>
>> Am 5/21/2024 um 5:36 PM schrieb Alan Stern:
>>> On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote:
>>>>
>>>>
>>>> Am 5/18/2024 um 2:31 AM schrieb Alan Stern:
>>>>> On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
>>>>>> On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
>>>>>>>
>>>>>>>
>>>>>>> Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
>>>>>>>> Hernan and Jonas:
>>>>>>>>
>>>>>>>> Can you explain more fully the changes you want to make to herd7 and/or
>>>>>>>> the LKMM? The goal is to make the memory barriers currently implicit in
>>>>>>>> RMW operations explicit, but I couldn't understand how you propose to do
>>>>>>>> this.
>>>>>>>>
>>>>>>>> Are you going to change herd7 somehow, and if so, how? It seems like
>>>>>>>> you should want to provide sufficient information so that the .bell
>>>>>>>> and .cat files can implement the appropriate memory barriers associated
>>>>>>>> with each RMW operation. What additional information is needed? And
>>>>>>>> how (explained in English, not by quoting source code) will the .bell
>>>>>>>> and .cat files make use of this information?
>>>>>>>>
>>>>>>>> Alan
>>>>>>>
>>>>>>>
>>>>>>> I don't know whether herd7 needs to be changed. Probably, herd7 does the
>>>>>>> following:
>>>>>>> - if a tag called Mb appears on an rmw instruction (by instruction I
>>>>>>> mean things like xchg(), atomic_inc_return_relaxed()), replace it with
>>>>>>> one of those things:
>>>>>>> * full mb ; once (the rmw) ; full mb, if a value returning
>>>>>>> (successful) rmw
>>>>>>> * once (the rmw) otherwise
>>>>>>> - everything else gets translated 1:1 into some internal representation
>>>>>>
>>>>>> This is my understanding from reading the source code of CSem.ml in herd7's
>>>>>> repo.
>>>>>>
>>>>>> Also, this is exactly what dartagnan is currently doing.
>>>>>>
>>>>>>>
>>>>>>> What I'm proposing is:
>>>>>>> 1. remove this transpilation step,
>>>>>>> 2. and instead allow the Mb tag to actually appear on RMW instructions
>>>>>>> 3. change the cat file to explicitly define the behavior of the Mb tag
>>>>>>> on RMW instructions
>>>>>>
>>>>>> These are the exact 3 things I changed in dartagnan for testing what Jonas
>>>>>> proposed.
>>>>>>
>>>>>> I am not sure if further changes are needed for herd7.
>>>
>>> What about failed RMW instructions? IIRC, herd7 generates just an R for
>>> these, not both R and W, but won't it still be annotated with an mb tag?
>>> And wasn't this matter of failed RMWs one of the issues that the two of
>>> you specifically wanted to make explicit in the memory model, rather
>>> than implicit in the operation of herd7?
>>
>> That's why we use the RMW_MB tag. I should have copied that definition too:
>>
>>
>> (* full barrier events that appear in non-failing RMW *)
>> let RMW_MB = Mb & (dom(rmw) | range(rmw))
>>
>>
>> This ensures that the only successful rmw instructions have an RMW_MB tag.
>
> It would be better if there was a way to tell herd7 not to add the 'mb
> tag to failed instructions in the first place. This approach is
> brittle; see below.

Hernan told me that in fact that is actually currently the case in
herd7. Failing RMW get assigned the Once tag implicitly.
Another thing that I'd suggest to change.

>
> An alternative would be to have a way for the .cat file to remove the
> 'mb tag from a failed RMW instruction. But I don't know if this is
> feasible.

For Mb it's feasible, as there is no Mb read or Mb store.

Mb = Mb & (~M | dom(rmw) | range(rmw))

However one would want to do the same for Acq and Rel.

For that one would need to distinguish e.g. between a read that comes
from a failed rmw instruction, and where the tag would disappear, or a
normal standalone read.

For example, by using two different acquire tags, 'acquire and
'rmw-acquire, and defining

Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw))

Anyways we can do this change independently. So for now, we don't need
RMW_MB.

>
>>> And wasn't another one of these issues the difference between
>>> value-returning and non-value-returning RMWs? As far as I can, nothing
>>> in the .def file specifically mentions this. There's the noreturn tag
>>> in the .bell file, but nothing in the .def file says which instructions
>>> it applies to. Or are we supposed to know that it automatically applies
>>> to all __atomic_op() instances?
>>
>> Ah, now you're already forestalling my next suggestion :))
>>
>> I would say let's fix one issue at a time (learned this from Andrea).
>>
>> For the other issue, do noreturn rmws always have the same ordering as once?
>
> If they aren't annotated with _acquire or _release then yes... And I
> don't know whether there _are_ any annotated no-return RMWs. If
> somebody wanted such a thing, they probably would just use a
> value-returning RMW instead.
>
>> I suspect we need to extend herd slightly more to support the second one, I
>> don't know if there's syntax for passing tags to __atomic_op.
>
> This may not be be needed. But still, it would nice to be explicit (in
> a comment in the .def file if nowhere else) that __atomic_op always adds
> a 'noreturn tag.
>
>>>> instructions RMW[{'once,'acquire,'release,'mb}]
>>>>
>>>> then the Mb tags would appear in the graph. And then I'd define the ordering
>>>> explicitly. One way is to say that an Mb tag orders all memory accesses
>>>> before(or at) the tag with all memory accesses after(or at) the tag, except
>>>> the accesses of the rmw with each other.
>>>
>>> I don't think you need to add very much. The .cat file already defines
>>> the mb relation as including the term:
>>>
>>> ([M] ; fencerel(Mb) ; [M])
>>>
>>> All that's needed is to replace the fencerel(Mb) with something more
>>> general...
>
> And this is why I said the RMW_MB mechanism is brittle. With the 'mb
> tag still added to failed RMW events, the term above will cause the
> memory model to think there is ordering even though the event isn't in
> the RMW_MB class.
>

Huh, I thought that fencerel(Mb) is something like po ; [F & Mb] ; po
(where F includes standalone fences). But looking into the stdlib.cat,
you're right.

>>> Also, is the "\ rmw" part really necessary? I don't think it makes any
>>> difference; the memory model already knows that the read part of an RMW
>>> has to happen before the write part.
>>
>> It unfortunately does make a difference, because mb is a strong fence.
>> This means that an Mb in an rmw sequence would create additional pb edges.
>>
>> prop;(rfe ; [Mb];rmw;[Mb])
>>
>> I believe this is might give wrong results on powerpc, but I'd need to
>> double check.
>
> PowerPC uses a load-reserve/store-conditional approach for RMW, so it's
> tricky. However, you're right that having a fictitious mb between the
> read and the write of an RMW would mean that the preceding (in coherence
> order) write would have to be visible to all CPUs before the RMW write
> could execute, and I don't believe we want to assert this.
>
>>>> One could also split it into two rules to keep with the "two full fences"
>>>> analogy. Maybe a good way would be like this:
>>>>
>>>> [M] ; po; [RMW_MB & R] ; po^? ; [M]
>>>>
>>>> [M] ; po^?; [RMW_MB & W] ; po ; [M]
>>>
>>> My preference is for the first approach.
>>
>> That was also my early preference, but to be honest I expected that you'd
>> prefer the second approach.
>> Actually, I also started warming up to it.
>
> If you do want to use this approach, it should be simplified. All you
> need is:
>
> [M] ; po ; [RMW_MB]
>
> [RMW_MB] ; po ; [M]
>
> This is because events tagged with RMW_MB always are memory accesses,
> and accesses that aren't part of the RMW are already covered by the
> fencerel(Mb) thing above.

This has exactly the issue mentioned above - it will cause the rmw to
have an internal strong fence that on powerpc probably isn't there.

We could do (with the assumption that Mb applies only to successful rmw):

[M] ; po ; [Mb & R]
[Mb & W] ; po ; [M]

Kind regards, jonas

2024-05-23 13:36:04

by Paul E. McKenney

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote:
> Am 5/22/2024 um 4:20 PM schrieb Alan Stern:
> > On Wed, May 22, 2024 at 11:20:47AM +0200, Jonas Oberhauser wrote:
> > > Am 5/21/2024 um 5:36 PM schrieb Alan Stern:
> > > > On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote:
> > > > > Am 5/18/2024 um 2:31 AM schrieb Alan Stern:
> > > > > > On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
> > > > > > > On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
> > > > > > > > Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
> > > > > > > > > Hernan and Jonas:
> > > > > > > > >
> > > > > > > > > Can you explain more fully the changes you want to make to herd7 and/or
> > > > > > > > > the LKMM?? The goal is to make the memory barriers currently implicit in
> > > > > > > > > RMW operations explicit, but I couldn't understand how you propose to do
> > > > > > > > > this.
> > > > > > > > >
> > > > > > > > > Are you going to change herd7 somehow, and if so, how?? It seems like
> > > > > > > > > you should want to provide sufficient information so that the .bell
> > > > > > > > > and .cat files can implement the appropriate memory barriers associated
> > > > > > > > > with each RMW operation.? What additional information is needed?? And
> > > > > > > > > how (explained in English, not by quoting source code) will the .bell
> > > > > > > > > and .cat files make use of this information?
> > > > > > > > >
> > > > > > > > > Alan
> > > > > > > >
> > > > > > > >
> > > > > > > > I don't know whether herd7 needs to be changed. Probably, herd7 does the
> > > > > > > > following:
> > > > > > > > - if a tag called Mb appears on an rmw instruction (by instruction I
> > > > > > > > mean things like xchg(), atomic_inc_return_relaxed()), replace it with
> > > > > > > > one of those things:
> > > > > > > > ? * full mb ; once (the rmw) ; full mb, if a value returning
> > > > > > > > (successful) rmw
> > > > > > > > ? * once (the rmw)?? otherwise
> > > > > > > > - everything else gets translated 1:1 into some internal representation
> > > > > > >
> > > > > > > This is my understanding from reading the source code of CSem.ml in herd7's
> > > > > > > repo.
> > > > > > >
> > > > > > > Also, this is exactly what dartagnan is currently doing.
> > > > > > >
> > > > > > > >
> > > > > > > > What I'm proposing is:
> > > > > > > > 1. remove this transpilation step,
> > > > > > > > 2. and instead allow the Mb tag to actually appear on RMW instructions
> > > > > > > > 3. change the cat file to explicitly define the behavior of the Mb tag
> > > > > > > > on RMW instructions
> > > > > > >
> > > > > > > These are the exact 3 things I changed in dartagnan for testing what Jonas
> > > > > > > proposed.
> > > > > > >
> > > > > > > I am not sure if further changes are needed for herd7.
> > > >
> > > > What about failed RMW instructions? IIRC, herd7 generates just an R for
> > > > these, not both R and W, but won't it still be annotated with an mb tag?
> > > > And wasn't this matter of failed RMWs one of the issues that the two of
> > > > you specifically wanted to make explicit in the memory model, rather
> > > > than implicit in the operation of herd7?
> > >
> > > That's why we use the RMW_MB tag. I should have copied that definition too:
> > >
> > >
> > > (* full barrier events that appear in non-failing RMW *)
> > > let RMW_MB = Mb & (dom(rmw) | range(rmw))
> > >
> > >
> > > This ensures that the only successful rmw instructions have an RMW_MB tag.
> >
> > It would be better if there was a way to tell herd7 not to add the 'mb
> > tag to failed instructions in the first place. This approach is
> > brittle; see below.
>
> Hernan told me that in fact that is actually currently the case in herd7.
> Failing RMW get assigned the Once tag implicitly.
> Another thing that I'd suggest to change.

Let's please be careful, though. There is a lot out there that depends
on this semantic, odd though it might seem at first glance. ;-)

Thanx, Paul

> > An alternative would be to have a way for the .cat file to remove the
> > 'mb tag from a failed RMW instruction. But I don't know if this is
> > feasible.
>
> For Mb it's feasible, as there is no Mb read or Mb store.
>
> Mb = Mb & (~M | dom(rmw) | range(rmw))
>
> However one would want to do the same for Acq and Rel.
>
> For that one would need to distinguish e.g. between a read that comes from a
> failed rmw instruction, and where the tag would disappear, or a normal
> standalone read.
>
> For example, by using two different acquire tags, 'acquire and 'rmw-acquire,
> and defining
>
> Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw))
>
> Anyways we can do this change independently. So for now, we don't need
> RMW_MB.
>
>
> >
> > > > And wasn't another one of these issues the difference between
> > > > value-returning and non-value-returning RMWs? As far as I can, nothing
> > > > in the .def file specifically mentions this. There's the noreturn tag
> > > > in the .bell file, but nothing in the .def file says which instructions
> > > > it applies to. Or are we supposed to know that it automatically applies
> > > > to all __atomic_op() instances?
> > >
> > > Ah, now you're already forestalling my next suggestion :))
> > >
> > > I would say let's fix one issue at a time (learned this from Andrea).
> > >
> > > For the other issue, do noreturn rmws always have the same ordering as once?
> >
> > If they aren't annotated with _acquire or _release then yes... And I
> > don't know whether there _are_ any annotated no-return RMWs. If
> > somebody wanted such a thing, they probably would just use a
> > value-returning RMW instead.
> >
> > > I suspect we need to extend herd slightly more to support the second one, I
> > > don't know if there's syntax for passing tags to __atomic_op.
> >
> > This may not be be needed. But still, it would nice to be explicit (in
> > a comment in the .def file if nowhere else) that __atomic_op always adds
> > a 'noreturn tag.
> >
> > > > > instructions RMW[{'once,'acquire,'release,'mb}]
> > > > >
> > > > > then the Mb tags would appear in the graph. And then I'd define the ordering
> > > > > explicitly. One way is to say that an Mb tag orders all memory accesses
> > > > > before(or at) the tag with all memory accesses after(or at) the tag, except
> > > > > the accesses of the rmw with each other.
> > > >
> > > > I don't think you need to add very much. The .cat file already defines
> > > > the mb relation as including the term:
> > > >
> > > > ([M] ; fencerel(Mb) ; [M])
> > > >
> > > > All that's needed is to replace the fencerel(Mb) with something more
> > > > general...
> >
> > And this is why I said the RMW_MB mechanism is brittle. With the 'mb
> > tag still added to failed RMW events, the term above will cause the
> > memory model to think there is ordering even though the event isn't in
> > the RMW_MB class.
> >
>
> Huh, I thought that fencerel(Mb) is something like po ; [F & Mb] ; po (where
> F includes standalone fences). But looking into the stdlib.cat, you're
> right.
>
>
> > > > Also, is the "\ rmw" part really necessary? I don't think it makes any
> > > > difference; the memory model already knows that the read part of an RMW
> > > > has to happen before the write part.
> > >
> > > It unfortunately does make a difference, because mb is a strong fence.
> > > This means that an Mb in an rmw sequence would create additional pb edges.
> > >
> > > prop;(rfe ; [Mb];rmw;[Mb])
> > >
> > > I believe this is might give wrong results on powerpc, but I'd need to
> > > double check.
> >
> > PowerPC uses a load-reserve/store-conditional approach for RMW, so it's
> > tricky. However, you're right that having a fictitious mb between the
> > read and the write of an RMW would mean that the preceding (in coherence
> > order) write would have to be visible to all CPUs before the RMW write
> > could execute, and I don't believe we want to assert this.
> >
> > > > > One could also split it into two rules to keep with the "two full fences"
> > > > > analogy. Maybe a good way would be like this:
> > > > >
> > > > > [M] ; po; [RMW_MB & R] ; po^? ; [M]
> > > > >
> > > > > [M] ; po^?; [RMW_MB & W] ; po ; [M]
> > > >
> > > > My preference is for the first approach.
> > >
> > > That was also my early preference, but to be honest I expected that you'd
> > > prefer the second approach.
> > > Actually, I also started warming up to it.
> >
> > If you do want to use this approach, it should be simplified. All you
> > need is:
> >
> > [M] ; po ; [RMW_MB]
> >
> > [RMW_MB] ; po ; [M]
> >
> > This is because events tagged with RMW_MB always are memory accesses,
> > and accesses that aren't part of the RMW are already covered by the
> > fencerel(Mb) thing above.
>
> This has exactly the issue mentioned above - it will cause the rmw to have
> an internal strong fence that on powerpc probably isn't there.
>
> We could do (with the assumption that Mb applies only to successful rmw):
>
> [M] ; po ; [Mb & R]
> [Mb & W] ; po ; [M]
>
>
> Kind regards, jonas
>

2024-05-23 14:05:26

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote:
>
>
> Am 5/22/2024 um 4:20 PM schrieb Alan Stern:
> > It would be better if there was a way to tell herd7 not to add the 'mb
> > tag to failed instructions in the first place. This approach is
> > brittle; see below.
>
> Hernan told me that in fact that is actually currently the case in herd7.
> Failing RMW get assigned the Once tag implicitly.
> Another thing that I'd suggest to change.

Indeed.

> > An alternative would be to have a way for the .cat file to remove the
> > 'mb tag from a failed RMW instruction. But I don't know if this is
> > feasible.
>
> For Mb it's feasible, as there is no Mb read or Mb store.
>
> Mb = Mb & (~M | dom(rmw) | range(rmw))
>
> However one would want to do the same for Acq and Rel.
>
> For that one would need to distinguish e.g. between a read that comes from a
> failed rmw instruction, and where the tag would disappear, or a normal
> standalone read.
>
> For example, by using two different acquire tags, 'acquire and 'rmw-acquire,
> and defining
>
> Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw))
>
> Anyways we can do this change independently. So for now, we don't need
> RMW_MB.

Overall, it seems better to have herd7 assign the right tag, but change
the way the .def file works so that it can tell herd7 which tag to use
in each of the success and failure cases.

> > [M] ; po ; [RMW_MB]
> >
> > [RMW_MB] ; po ; [M]
> >
> > This is because events tagged with RMW_MB always are memory accesses,
> > and accesses that aren't part of the RMW are already covered by the
> > fencerel(Mb) thing above.
>
> This has exactly the issue mentioned above - it will cause the rmw to have
> an internal strong fence that on powerpc probably isn't there.

Oops, that's right. Silly oversight on my part. But at least you
understood what I meant.

> We could do (with the assumption that Mb applies only to successful rmw):
>
> [M] ; po ; [Mb & R]
> [Mb & W] ; po ; [M]

That works.

Alan

2024-05-23 14:26:59

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On 5/23/2024 4:05 PM, Alan Stern wrote:
> On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote:
>>
>>
>> Am 5/22/2024 um 4:20 PM schrieb Alan Stern:
>>> It would be better if there was a way to tell herd7 not to add the 'mb
>>> tag to failed instructions in the first place. This approach is
>>> brittle; see below.
>>
>> Hernan told me that in fact that is actually currently the case in herd7.
>> Failing RMW get assigned the Once tag implicitly.
>> Another thing that I'd suggest to change.
>
> Indeed.
>
>>> An alternative would be to have a way for the .cat file to remove the
>>> 'mb tag from a failed RMW instruction. But I don't know if this is
>>> feasible.
>>
>> For Mb it's feasible, as there is no Mb read or Mb store.
>>
>> Mb = Mb & (~M | dom(rmw) | range(rmw))
>>
>> However one would want to do the same for Acq and Rel.
>>
>> For that one would need to distinguish e.g. between a read that comes from a
>> failed rmw instruction, and where the tag would disappear, or a normal
>> standalone read.
>>
>> For example, by using two different acquire tags, 'acquire and 'rmw-acquire,
>> and defining
>>
>> Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw))
>>
>> Anyways we can do this change independently. So for now, we don't need
>> RMW_MB.
>
> Overall, it seems better to have herd7 assign the right tag, but change
> the way the .def file works so that it can tell herd7 which tag to use
> in each of the success and failure cases.

I am not fully sure how herd7 uses the .def file, but I guess something
like adding a second memory tag to __cmpxchg could work

cmpxchg(X,V,W) __cmpxchg{mb, once}(X,V,W)
cmpxchg_relaxed(X,V,W) __cmpxchg{once, once}(X,V,W)
cmpxchg_acquire(X,V,W) __cmpxchg{acquire, acquire}(X,V,W)
cmpxchg_release(X,V,W) __cmpxchg{release, release}(X,V,W)

Hernan

>
>>> [M] ; po ; [RMW_MB]
>>>
>>> [RMW_MB] ; po ; [M]
>>>
>>> This is because events tagged with RMW_MB always are memory accesses,
>>> and accesses that aren't part of the RMW are already covered by the
>>> fencerel(Mb) thing above.
>>
>> This has exactly the issue mentioned above - it will cause the rmw to have
>> an internal strong fence that on powerpc probably isn't there.
>
> Oops, that's right. Silly oversight on my part. But at least you
> understood what I meant.
>
>> We could do (with the assumption that Mb applies only to successful rmw):
>>
>> [M] ; po ; [Mb & R]
>> [Mb & W] ; po ; [M]
>
> That works.
>
> Alan

2024-05-23 14:28:15

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

Am 5/22/2024 um 6:54 PM schrieb Andrea Parri:
> Alan, all,
>
> ("randomly" picking a recent post in the thread, after having observed
> this discussion for a while...)
>
>> It would be better if there was a way to tell herd7 not to add the 'mb
>> tag to failed instructions in the first place. This approach is
>> brittle; see below.
>
> AFAIU, changing the herd representation to generate mb-accesses in place
> of certain mb-fences...
>
>> If you do want to use this approach, it should be simplified. All you
>> need is:
>>
>> [M] ; po ; [RMW_MB]
>>
>> [RMW_MB] ; po ; [M]
>>
>> This is because events tagged with RMW_MB always are memory accesses,
>> and accesses that aren't part of the RMW are already covered by the
>> fencerel(Mb) thing above.
>
> .. and updating the .cat file to the effects of something like
>
> -let mb = ([M] ; fencerel(Mb) ; [M]) |
> +let mb = (([M] ; po? ; [Mb] ; po? ; [M]) \ id) |
>
> .. can hardly be called "making RMW barriers explicit". (So much so
> that the first commit in PR #865 was titled "Remove explicit barriers
> from RMWs". :-))
>
> Overall, this discussion rather seems to confirm the close link between
> tools/memory-model/ and herdtools7. (After all, to what extent could
> any putative RMW_MB be considered "explicit" without _knowing the under-
> lying representation of the RMW operations...)

My view is a bit different. There's more or less standard theory for
weak memory models, including how operations at the source code level
map to events in the graph.

Part of that standard theory are relations like rmw, and the
circumstances under which they appear in the graph.

You'll see these relations in generic papers about weak memory models,
like GenMC, Hahn, etc. as well as in pretty much every specific memory
model like TSO, C11, PowerPC, VMM, etc., totally independently of herd7
(even though these notations have historically developed together with
herd).

Naively I would expect that a tool like herd7 would be a generic WMM
exploration tool, which follows these standards with a very obvious 1:1
mapping of what we see in the code and the events we see in the graph,
plus perhaps a thin and explicit configurable herd7-specific mapping
layer like what we see in linux_kernel.cat to configure the syntax for a
specific model.

In that mapping layer, we currently see that xchg() is an exchange
operation with an Mb tag. Just like an smp_load_acquire is a read with
an acquire tag.
Or so it seems.

Instead, we find that contrary to what's written in that file, and
contrary to the conventions, an xchg() is an F[mb] ; R[once] ; W[once]
; F[Mb]. And in fact a cmp_xchg could even be a R[once].

That's because the herd7 tool isn't quite as generic as one might think,
but rather specifically "detects" that it's running the LKMM and then
the mapping isn't what you'd naively think, but something hidden in the
OCaml implementation of herd7.

That would be comparable to a popular tool for matrix calculations using
the syntax
[ 10 5 4
3 4 2 ]

to define a 2x3 matrix, unless one of the values is -3, in which case it
becomes a vector of 6 elements, because that's what a really important
user of the tool wanted, and then didn't see enough of a need to change.

My point is that to anyone who has seen standard matrix notation, this
syntax in a matrix-computation-tool looks like it would be a matrix,
right? And it usually is a matrix; then it should always be a matrix.

I usually say I don't look much at natural language documentation and
only read code, because code doesn't lie. In this case, the natural
language documentation is saying the correct thing thanks to the hard
work of a lot of people, but the code (in .cat etc.) doesn't do what it
seems to be doing.

And I think that should be changed, both to reduce the anyways high
mental load of reading the code without having to do mental
transformations in your head, and also to make herd more Lkmm-agnostic.
In the ideal world, herd doesn't know anything about Lkmm except what we
tell it through tools/memory-model, and generic things like C syntax +
semantics.

So when using syntax like dom(rmw), I don't see it as confirming the
close link to herd more than before, but rather depending more on
standard notations and conventions, and relying on herd's close link to
those standard notations (such as using rmw for the rmw relation and
dom(r) for the domain of r) for dom(rmw) to mean what anyone who has
deeply read a couple of modern WMM papers would expect.

> My understanding is that
> this discussion was at least in part motivated by a desire to experiment
> and familiarize with the current herd representation

I would phrase it more extreme: I want to get rid of the unnecessary
non-standard parts of the herd representation.

Those parts have led me astray several times. Ok, who cares about me,
but even Luc once forgot about the non-standard parts and thought it is
a bug:

https://github.com/herd/herdtools7/issues/384#issuecomment-1132859904

I also know that Viktor stumbled over it before and also suggested we
change it.

I think there's 0 benefit to them, they're just wasting people's time
and energy and lead to misunderstanding.

Ok, this e-mail became long. Hope it at least helps clarify my
motivation :))

jonas

2024-05-23 14:37:32

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

Am 5/23/2024 um 4:05 PM schrieb Alan Stern:
> On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote:
>>
>>
>> Am 5/22/2024 um 4:20 PM schrieb Alan Stern:
>>> It would be better if there was a way to tell herd7 not to add the 'mb
>>> tag to failed instructions in the first place. This approach is
>>> brittle; see below.
>>
>> Hernan told me that in fact that is actually currently the case in herd7.
>> Failing RMW get assigned the Once tag implicitly.
>> Another thing that I'd suggest to change.
>
> Indeed.
>
>>> An alternative would be to have a way for the .cat file to remove the
>>> 'mb tag from a failed RMW instruction. But I don't know if this is
>>> feasible.
>>
>> For Mb it's feasible, as there is no Mb read or Mb store.
>>
>> Mb = Mb & (~M | dom(rmw) | range(rmw))
>>
>> However one would want to do the same for Acq and Rel.
>>
>> For that one would need to distinguish e.g. between a read that comes from a
>> failed rmw instruction, and where the tag would disappear, or a normal
>> standalone read.
>>
>> For example, by using two different acquire tags, 'acquire and 'rmw-acquire,
>> and defining
>>
>> Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw))
>>
>> Anyways we can do this change independently. So for now, we don't need
>> RMW_MB.
>
> Overall, it seems better to have herd7 assign the right tag, but change
> the way the .def file works so that it can tell herd7 which tag to use
> in each of the success and failure cases.

Yes, that would be good.
In principle herd should already support this kind of logic for e.g. C11
which also has distinct success and failure modes.
But of course I don't know if there's syntax to make this change in
def, let alone what it would look like.

> But at least you
> understood what I meant.

I do try :) (: :)

>
>> We could do (with the assumption that Mb applies only to successful rmw):
>>
>> [M] ; po ; [Mb & R]
>> [Mb & W] ; po ; [M]
>
> That works.

Ok, I'll prepare a patch for this and Andrea or anyone else can still
interject.
I suppose the patch would not change the semantics at all with the
current herd7, since Mb does not appear on any reads and writes for the
time being.

best wishes,
jonas

2024-05-23 15:14:52

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On Thu, May 23, 2024 at 04:26:23PM +0200, Hernan Ponce de Leon wrote:
> On 5/23/2024 4:05 PM, Alan Stern wrote:
> > On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote:
> > >
> > >
> > > Am 5/22/2024 um 4:20 PM schrieb Alan Stern:
> > > > It would be better if there was a way to tell herd7 not to add the 'mb
> > > > tag to failed instructions in the first place. This approach is
> > > > brittle; see below.
> > >
> > > Hernan told me that in fact that is actually currently the case in herd7.
> > > Failing RMW get assigned the Once tag implicitly.
> > > Another thing that I'd suggest to change.
> >
> > Indeed.
> >
> > > > An alternative would be to have a way for the .cat file to remove the
> > > > 'mb tag from a failed RMW instruction. But I don't know if this is
> > > > feasible.
> > >
> > > For Mb it's feasible, as there is no Mb read or Mb store.
> > >
> > > Mb = Mb & (~M | dom(rmw) | range(rmw))
> > >
> > > However one would want to do the same for Acq and Rel.
> > >
> > > For that one would need to distinguish e.g. between a read that comes from a
> > > failed rmw instruction, and where the tag would disappear, or a normal
> > > standalone read.
> > >
> > > For example, by using two different acquire tags, 'acquire and 'rmw-acquire,
> > > and defining
> > >
> > > Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw))
> > >
> > > Anyways we can do this change independently. So for now, we don't need
> > > RMW_MB.
> >
> > Overall, it seems better to have herd7 assign the right tag, but change
> > the way the .def file works so that it can tell herd7 which tag to use
> > in each of the success and failure cases.
>
> I am not fully sure how herd7 uses the .def file, but I guess something like
> adding a second memory tag to __cmpxchg could work
>
> cmpxchg(X,V,W) __cmpxchg{mb, once}(X,V,W)
> cmpxchg_relaxed(X,V,W) __cmpxchg{once, once}(X,V,W)
> cmpxchg_acquire(X,V,W) __cmpxchg{acquire, acquire}(X,V,W)
> cmpxchg_release(X,V,W) __cmpxchg{release, release}(X,V,W)
>

Note that cmpxchg_acquire() and cmpxchg_release() don't have _acqurie
or _release ordering if they fails.

Besides, I'm not sure this is a good idea. Because the "{mb}, {once},
etc" part is a syntax thing, you write a cmpxchg(), it should be
translated to a cmpxchg event with MB tag on. As to failed cmpxchg()
doesn't provide ordering, it's a semantics thing, as Jonas showed that
it can be represent in cat file. As long as it's a semanitc thing and we
can represent in cat file, I don't think we want herd to give a special
treatment.

What you and Jonas looks fine to me, since it moves the semantic bits
from herd internal to cat file.

Regards,
Boqun

> Hernan
>
> >
> > > > [M] ; po ; [RMW_MB]
> > > >
> > > > [RMW_MB] ; po ; [M]
> > > >
> > > > This is because events tagged with RMW_MB always are memory accesses,
> > > > and accesses that aren't part of the RMW are already covered by the
> > > > fencerel(Mb) thing above.
> > >
> > > This has exactly the issue mentioned above - it will cause the rmw to have
> > > an internal strong fence that on powerpc probably isn't there.
> >
> > Oops, that's right. Silly oversight on my part. But at least you
> > understood what I meant.
> >
> > > We could do (with the assumption that Mb applies only to successful rmw):
> > >
> > > [M] ; po ; [Mb & R]
> > > [Mb & W] ; po ; [M]
> >
> > That works.
> >
> > Alan
>

2024-05-23 16:06:42

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On Thu, May 23, 2024 at 04:26:23PM +0200, Hernan Ponce de Leon wrote:
> On 5/23/2024 4:05 PM, Alan Stern wrote:
> > Overall, it seems better to have herd7 assign the right tag, but change
> > the way the .def file works so that it can tell herd7 which tag to use
> > in each of the success and failure cases.
>
> I am not fully sure how herd7 uses the .def file, but I guess something like
> adding a second memory tag to __cmpxchg could work
>
> cmpxchg(X,V,W) __cmpxchg{mb, once}(X,V,W)
> cmpxchg_relaxed(X,V,W) __cmpxchg{once, once}(X,V,W)
> cmpxchg_acquire(X,V,W) __cmpxchg{acquire, acquire}(X,V,W)
> cmpxchg_release(X,V,W) __cmpxchg{release, release}(X,V,W)

Right, except that the last two should be:

cmpxchg_acquire(X,V,W) __cmpxchg{acquire, once}(X,V,W)
cmpxchg_release(X,V,W) __cmpxchg{release, once}(X,V,W)

Alan

2024-05-23 16:36:18

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

> I would phrase it more extreme: I want to get rid of the unnecessary
> non-standard parts of the herd representation.

Please indulge the thought that what might appear to be "non-standard"
to one or one's community might appear differently to others.

Continuing with the example above, I'm pretty sure your "standard" and
simple idea of mb-reads and mb-writes, or even "unsuccessful" mb-reads
will turn up the nose of many kernel developers... The current repre-
sentation for xchg() was described in the ASPLOS'18 work and it's been
used (& tested) upstream since the LKMM was first merged ~6 years ago.

But that's not the point, "standards" can change and certainly kernels
and tools do. My remark was more to point out that you're not getting
rid of anything..., you're simply proposing a different representation
(asking kernel developers & maintainers to "deal with it"): here's why
I was and I am looking forward to something more than "because we can".

Andrea

2024-05-23 20:31:17

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

Am 5/23/2024 um 6:35 PM schrieb Andrea Parri:
>> I would phrase it more extreme: I want to get rid of the unnecessary
>> non-standard parts of the herd representation.
>
> Please indulge the thought that what might appear to be "non-standard"
> to one or one's community might appear differently to others.

Certainly. And of course, the formalization of the LKMM doesn't have to
be optimized for the WMM community, even though I suspect that they (and
possibly the tools they develop) are the main direct consumers of the
formalization.

> Continuing with the example above, I'm pretty sure your "standard" and
> simple idea of mb-reads and mb-writes, or even "unsuccessful" mb-reads
> will turn up the nose of many kernel developers...

I'm not sure how true that is. Firstly I'm not sure how many kernel
developers really read the formalization and try to see what it does,
rather than relying on tools to gain an intuition of what's going on.

But let's say a kernel developer wants to make sure that their intuition
of how cmpxchg works is matched by the formal model, e.g., they want to
double check that the formal model is correct
after they got some unexpected results in a herd7 litmus test.

How would they go about it today? The only way is to read the OCaml
source code, because there's no other code that obviously defines the
behavior. At best, they would read

atomic_cmpxchg_acquire(X,V,W) __cmpxchg{acquire}(X,V,W)
atomic_cmpxchg_release(X,V,W) __cmpxchg{release}(X,V,W)
atomic_cmpxchg(X,V,W) __cmpxchg{mb}(X,V,W)

see the Acquire, Release, and Mb references in the model, and think "ok,
so '__cmpxchg{acquire}' means I get an Acquire tag in the success case
which gives acq_po ordering, '__cmpxchg{release}' means I get a Release
tag in the success case which gives po_rel ordering. Therefore
'__cmpxchg{mb}' must give me an Mb tag in the success case. That would
give me mb ordering, but not between the store and subsequent
operations, and that's just wrong."

But of course this "understanding by analogy" is broken, because there's
a bunch of special OCaml match expressions in herd that look only for
release or mb and just give totally different behavior for that one
case. Every other tag behaves exactly the same way.

At worst they wouldn't even guess that this only is the success ordering
(that's definitely what happened to us in the weak memory model
community, but we don't matter for this argument).

A better situation would be to do something like this:

(*success*)(*fail*)
atomic_cmpxchg_acquire(X,V,W) __cmpxchg {acquire} {once} (X,V,W)
atomic_cmpxchg_release(X,V,W) __cmpxchg {release} {once} (X,V,W)
atomic_cmpxchg(X,V,W) __cmpxchg {mb} {once} (X,V,W)

and being explicit in the model about what the Mb tag does:

| [M];fencerel(Mb);[M] (* smp_mb() and operations with Mb ordering
such as cmpxchg() *)
| [M];po;[R & Mb] (* reads with an Mb tag - coming from full fence
rmws - are ordered as-if there was an smp_mb() right before the read *)
| [W & Mb];po;[M] (* correspondingly, writes with an Mb tag are
ordered as-if there was an smp_mb() right after the write *)

And suddenly you can read the model and map it 1:1 to the intuition that
you can treat a successful cmpxchg() (or an xchg() etc.) as-if it was
enclosed by smp_mb().

There doesn't need to be a real fence.

> The current repre-
> sentation for xchg() was described in the ASPLOS'18 work

Do you mean the one example in Table 3?
What about cmpxchg() or cmpxchg_acquire()?

> But that's not the point, "standards" can change and certainly kernels
> and tools do. My remark was more to point out that you're not getting
> rid of anything...,

We're definitely getting rid of some lines in herd7, that have been
added solely for dealing with this specific case of LKMM.

For example, there's some code looking like this (for a memory ordering
tag `a`)

(match a with ["release"] -> a | _ -> a_once)

which specifically refers to the release tag in LKMM and we can turn
that into

a

with no more reference to any LKMM tags. Of course, this is not the
Linux community's problem, just the herd (and others who want to use the
literal cat file of LKMM) maintainers who have to deal with it.

And imagine that at some point you want to add another tag to the linux
kernel - like for example for 'accessing an atomic in initialization
code, so that the compiler can do optimizations like merge a bunch of
bitwise operations'. Let's call it 'init.

What will be the behavior of

WRITE_INIT(X,V) __store{init}(X,V);

with the current herd7? Honestly I have no clue, because there might be a

(match a with ["release"] -> a | _ -> a_once)

somewhere that will turn this 'init into 'once. We'd have to comb the
herd7 codebase to know for sure (or hope that our experiments are
sufficiently thorough to catch all cases).

>you're simply proposing a different representation

I wouldn't phrase it like that, since it's a representation many people
familiar with WMM would expect, but admittedly that doesn't have to be
the deciding factor.

> asking kernel developers & maintainers to "deal with it"

Deal with what, no longer having to learn OCaml to be sure that the
LKMM's formal definition matches the description in memory_barriers.txt?
Otherwise, I don't see anything that changes for the worse.
With the change, people need to think for a few seconds to see that the
explicit rules in the .cat file are a formalization of "treat xchg() as
if it had smp_mb() before and after".
Without the change, they need to read an OCaml codebase to see that is
meant by

atomic_xchg(X,V,W) __xchg{mb}(X,V,W)

So I don't see any downsides.

I would also support making the representation explicit in the .def
files instead, with something like

cmp_xchg(x,e,v) = { if (*x==e) { smp_mb(); int r =
__cmp_xchg{once}(x,e,v) ; smp_mb(); return r } else { __read{once}(x) } }

Then you get to keep the representation.
Without knowing herd's internals, I have no idea of how to seriously
specify a meta language so that herd could effectively and efficiently
deal with it in general. But one could at least envision some specific
syntax for cmpxchg with a code sequence for the failure case and a code
sequence for the success case.

Personally I don't prefer this option, firstly because I don't see a
good reason for the Linux community to go their own way here. I don't
think there's really much of a problem with saying "this is the
intuition; this is how we formalize it" and for the two not to be
completely identical. It happens all the time, and in this case the gap
between "we formalize it by really having two smp_mb() appear in the
graph if it's successful" and "we formalize it by providing the same
ordering that the smp_mb() would have if it was there" isn't big.
Especially for those kernel people who have enough motivation to
actually deep dive into the formalization in the first place. But it
makes it a lot more accessible for the WMM community, which can only
benefit LKMM.

And secondly because people will probably mostly focus on the .cat file,
which means that it's still a bit of a booby trap to put something so
important (and perhaps surprising) into the .def file, but at least one
that is in the open for people who are more careful and also read the
def file.

> I am looking forward to something more than "because we can".

- it makes it easier to maintain the LKMM in the future, because you
don't have to work around hidden transformations inside herd7
- it makes implicit behavior explicit
- it makes it easier to understand that the formalization matches the
intention
- it makes it easier to learn the LKMM from the formalization without
having to cross-reference every bit with the informal documentation to
avoid misunderstandings

Have a good time,
jonas

2024-05-24 01:38:24

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On Thu, May 23, 2024 at 08:14:38AM -0700, Boqun Feng wrote:
> Besides, I'm not sure this is a good idea. Because the "{mb}, {once},
> etc" part is a syntax thing, you write a cmpxchg(), it should be
> translated to a cmpxchg event with MB tag on. As to failed cmpxchg()
> doesn't provide ordering, it's a semantics thing, as Jonas showed that
> it can be represent in cat file. As long as it's a semanitc thing and we
> can represent in cat file, I don't think we want herd to give a special
> treatment.

I don't really understand the distinction you're making between
syntactic things and semantic things. For most instructions there's no
problem, because the instruction does just one thing. But a cmpxchg
instruction can do either of two things, depending on whether it
succeeds or fails, so it makes sense to tell herd7 how to represent
both of them.

> What you and Jonas looks fine to me, since it moves the semantic bits
> from herd internal to cat file.

Trying to recognize failed RMW events by looking for R events with an mb
tag that aren't in the rmw relation seems very artificial. That fact
that it would work is merely an artifact of herd7's internal actions. I
think it would be much cleaner if herd7 represented these events in some
other way, particularly if we can tell it how.

After all, herd7 already does generate different sets of events for
successful (both R and W) and failed (only R) RMWs. It's not such a big
stretch to make the R events it generates different in the two cases.

Alan

2024-05-24 02:50:38

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On Thu, May 23, 2024 at 09:38:05PM -0400, Alan Stern wrote:
> On Thu, May 23, 2024 at 08:14:38AM -0700, Boqun Feng wrote:
> > Besides, I'm not sure this is a good idea. Because the "{mb}, {once},
> > etc" part is a syntax thing, you write a cmpxchg(), it should be
> > translated to a cmpxchg event with MB tag on. As to failed cmpxchg()
> > doesn't provide ordering, it's a semantics thing, as Jonas showed that
> > it can be represent in cat file. As long as it's a semanitc thing and we
> > can represent in cat file, I don't think we want herd to give a special
> > treatment.
>
> I don't really understand the distinction you're making between
> syntactic things and semantic things. For most instructions there's no

Syntax is how the code is written, and semantic is how the code is
executed (in each execution candidate). So if we write a cmpxchg{mb}(),
and in execution candiates, it could generates a read{MB} event and a
write{MB} event (succeed case), or a read{MB} event (fail case), "{MB}"
here doesn't mean it's a full barrier, it only means the event comes
from a no suffix API. Here "{MB}" only has syntactic meaning (no
semantic meaning).

> problem, because the instruction does just one thing. But a cmpxchg
> instruction can do either of two things, depending on whether it
> succeeds or fails, so it makes sense to tell herd7 how to represent
> both of them.
>
> > What you and Jonas looks fine to me, since it moves the semantic bits
> > from herd internal to cat file.
>
> Trying to recognize failed RMW events by looking for R events with an mb
> tag that aren't in the rmw relation seems very artificial. That fact

Not really, RMW events contains all events generated from
read-modify-write primitives, if there is an R event without a rmw
relation (i.e there is no corresponding write event), it's a failed RWM
by definition: it cannot be anything else.

> that it would work is merely an artifact of herd7's internal actions. I
> think it would be much cleaner if herd7 represented these events in some
> other way, particularly if we can tell it how.
>
> After all, herd7 already does generate different sets of events for
> successful (both R and W) and failed (only R) RMWs. It's not such a big
> stretch to make the R events it generates different in the two cases.
>

I thought we want to simplify the herd internal processing by avoid
adding mb events in cmpxchg() cases, in the same spirit, if syntactic
tagging is already good enough, why do we want to make herd complicate?

Regards,
Boqun

> Alan

2024-05-24 03:30:47

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

> Do you mean the one example in Table 3?
> What about cmpxchg() or cmpxchg_acquire()?

Yes, Table 3.

The cmpxchg*() primitives were not discussed in the paper. IIRC, their
representation has not changed since at least 1c27b644c0fd.

> We're definitely getting rid of some lines in herd7, that have been added
> solely for dealing with this specific case of LKMM.

Good. If the herd7 maintainers are "tired" of dealing with those lines,
that's definitely a big fat "why" to put in a changelog.

> Deal with what, no longer having to learn OCaml to be sure that the LKMM's
> formal definition matches the description in memory_barriers.txt?

Nope. ;-) Dealing with the review, testing, and maintainance of a new
representation.

> - it makes it easier to maintain the LKMM in the future, because you don't
> have to work around hidden transformations inside herd7
> - it makes implicit behavior explicit
> - it makes it easier to understand that the formalization matches the
> intention
> - it makes it easier to learn the LKMM from the formalization without having
> to cross-reference every bit with the informal documentation to avoid
> misunderstandings

Jonas - You write "less hidden", "less implicit", but I keep reading "a
representation I/some people would expect". We've already acknowledged
that's no deciding factor to abandon the current seasoned representation.

Andrea

2024-05-24 08:16:50

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On 5/23/2024 10:30 PM, Jonas Oberhauser wrote:
>
>
> Am 5/23/2024 um 6:35 PM schrieb Andrea Parri:
>>> I would phrase it more extreme: I want to get rid of the unnecessary
>>> non-standard parts of the herd representation.
>>
>> Please indulge the thought that what might appear to be "non-standard"
>> to one or one's community might appear differently to others.
>
> Certainly. And of course, the formalization of the LKMM doesn't have to
> be optimized for the WMM community, even though I suspect that they (and
> possibly the tools they develop) are the main direct consumers of the
> formalization.
>
>> Continuing with the example above, I'm pretty sure your "standard" and
>> simple idea of mb-reads and mb-writes, or even "unsuccessful" mb-reads
>> will turn up the nose of many kernel developers...
>
> I'm not sure how true that is. Firstly I'm not sure how many kernel
> developers really read the formalization and try to see what it does,
> rather than relying on tools to gain an intuition of what's going on.
>
> But let's say a kernel developer wants to make sure that their intuition
> of how cmpxchg works is matched by the formal model, e.g., they want to
> double check that the formal model is correct
> after they got some unexpected results in a herd7 litmus test.
>
> How would they go about it today? The only way is to read the OCaml
> source code, because there's no other code that obviously defines the
> behavior. At best, they would read
>
> atomic_cmpxchg_acquire(X,V,W) __cmpxchg{acquire}(X,V,W)
> atomic_cmpxchg_release(X,V,W) __cmpxchg{release}(X,V,W)
> atomic_cmpxchg(X,V,W) __cmpxchg{mb}(X,V,W)
>
> see the Acquire, Release, and Mb references in the model, and think "ok,
> so '__cmpxchg{acquire}' means I get an Acquire tag in the success case
> which gives acq_po ordering, '__cmpxchg{release}' means I get a Release
> tag in the success case which gives po_rel ordering. Therefore
> '__cmpxchg{mb}' must give me an Mb tag in the success case. That would
> give me mb ordering, but not between the store and subsequent
> operations, and that's just wrong."
>
> But of course this "understanding by analogy" is broken, because there's
> a bunch of special OCaml match expressions in herd that look only for
> release or mb and just give totally different behavior for that one
> case. Every other tag behaves exactly the same way.
>
> At worst they wouldn't even guess that this only is the success ordering
> (that's definitely what happened to us in the weak memory model
> community, but we don't matter for this argument).
>
> A better situation would be to do something like this:
>
>                                        (*success*)(*fail*)
> atomic_cmpxchg_acquire(X,V,W) __cmpxchg {acquire} {once} (X,V,W)
> atomic_cmpxchg_release(X,V,W) __cmpxchg {release} {once} (X,V,W)
> atomic_cmpxchg(X,V,W)         __cmpxchg {mb}       {once} (X,V,W)

I also think this is the best option. We could change the internal
representation of __cmpxchg in herd7 to have two memory orders (*)
similar to the current internal representation of C11 CAS and get the
memory order to be used for the success/fail case from the .def instead
of having this hardcoded in the code.

(*) Not to be confused to how LKMM sees a CAS instruction.

Hernan

>
> and being explicit in the model about what the Mb tag does:
>
>      | [M];fencerel(Mb);[M] (* smp_mb() and operations with Mb ordering
> such as cmpxchg() *)
>      | [M];po;[R & Mb] (* reads with an Mb tag - coming from full fence
> rmws - are ordered as-if there was an smp_mb() right before the read *)
>      | [W & Mb];po;[M] (* correspondingly, writes with an Mb tag are
> ordered as-if there was an smp_mb() right after the write *)
>
> And suddenly you can read the model and map it 1:1 to the intuition that
> you can treat a successful cmpxchg() (or an xchg() etc.) as-if it was
> enclosed by smp_mb().
>
> There doesn't need to be a real fence.
>
>
>> The current repre-
>> sentation for xchg() was described in the ASPLOS'18 work
>
> Do you mean the one example in Table 3?
> What about cmpxchg() or cmpxchg_acquire()?
>
>> But that's not the point, "standards" can change and certainly kernels
>> and tools do. My remark was more to point out that you're not getting
>> rid of anything...,
>
> We're definitely getting rid of some lines in herd7, that have been
> added solely for dealing with this specific case of LKMM.
>
> For example, there's some code looking like this (for a memory ordering
> tag `a`)
>
>                 (match a with ["release"] -> a | _ -> a_once)
>
> which specifically refers to the release tag in LKMM and we can turn
> that into
>
>                 a
>
> with no more reference to any LKMM tags. Of course, this is not the
> Linux community's problem, just the herd (and others who want to use the
> literal cat file of LKMM) maintainers who have to deal with it.
>
> And imagine that at some point you want to add another tag to the linux
> kernel - like for example for 'accessing an atomic in initialization
> code, so that the compiler can do optimizations like merge a bunch of
> bitwise operations'. Let's call it 'init.
>
> What will be the behavior of
>
> WRITE_INIT(X,V) __store{init}(X,V);
>
> with the current herd7? Honestly I have no clue, because there might be a
>
>                 (match a with ["release"] -> a | _ -> a_once)
>
> somewhere that will turn this 'init into 'once. We'd have to comb the
> herd7 codebase to know for sure (or hope that our experiments are
> sufficiently thorough to catch all cases).
>
> >you're simply proposing a different representation
>
> I wouldn't phrase it like that, since it's a representation many people
> familiar with WMM would expect, but admittedly that doesn't have to be
> the deciding factor.
>
>> asking kernel developers & maintainers to "deal with it"
>
> Deal with what, no longer having to learn OCaml to be sure that the
> LKMM's formal definition matches the description in memory_barriers.txt?
> Otherwise, I don't see anything that changes for the worse.
> With the change, people need to think for a few seconds to see that the
> explicit rules in the .cat file are a formalization of "treat xchg() as
> if it had smp_mb() before and after".
> Without the change, they need to read an OCaml codebase to see that is
> meant by
>
> atomic_xchg(X,V,W) __xchg{mb}(X,V,W)
>
> So I don't see any downsides.
>
>
> I would also support making the representation explicit in the .def
> files instead, with something like
>
> cmp_xchg(x,e,v) = { if (*x==e) { smp_mb(); int r =
> __cmp_xchg{once}(x,e,v) ; smp_mb(); return r } else { __read{once}(x) } }
>
> Then you get to keep the representation.
> Without knowing herd's internals, I have no idea of how to seriously
> specify a meta language so that herd could effectively and efficiently
> deal with it in general. But one could at least envision some specific
> syntax for cmpxchg with a code sequence for the failure case and a code
> sequence for the success case.
>
> Personally I don't prefer this option, firstly because I don't see a
> good reason for the Linux community to go their own way here. I don't
> think there's really much of a problem with saying "this is the
> intuition; this is how we formalize it" and for the two not to be
> completely identical. It happens all the time, and in this case the gap
> between "we formalize it by really having two smp_mb() appear in the
> graph if it's successful" and "we formalize it by providing the same
> ordering that the smp_mb() would have if it was there" isn't big.
> Especially for those kernel people who have enough motivation to
> actually deep dive into the formalization in the first place. But it
> makes it a lot more accessible for the WMM community, which can only
> benefit LKMM.
>
> And secondly because people will probably mostly focus on the .cat file,
> which means that it's still a bit of a booby trap to put something so
> important (and perhaps surprising) into the .def file, but at least one
> that is in the open for people who are more careful and also read the
> .def file.
>
> > I am looking forward to something more than "because we can".
>
> - it makes it easier to maintain the LKMM in the future, because you
> don't have to work around hidden transformations inside herd7
> - it makes implicit behavior explicit
> - it makes it easier to understand that the formalization matches the
> intention
> - it makes it easier to learn the LKMM from the formalization without
> having to cross-reference every bit with the informal documentation to
> avoid misunderstandings
>
>
> Have a good time,
> jonas
>
>
>

2024-05-24 14:14:37

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On Thu, May 23, 2024 at 07:50:11PM -0700, Boqun Feng wrote:
> On Thu, May 23, 2024 at 09:38:05PM -0400, Alan Stern wrote:
> > On Thu, May 23, 2024 at 08:14:38AM -0700, Boqun Feng wrote:
> > > Besides, I'm not sure this is a good idea. Because the "{mb}, {once},
> > > etc" part is a syntax thing, you write a cmpxchg(), it should be
> > > translated to a cmpxchg event with MB tag on. As to failed cmpxchg()
> > > doesn't provide ordering, it's a semantics thing, as Jonas showed that
> > > it can be represent in cat file. As long as it's a semanitc thing and we
> > > can represent in cat file, I don't think we want herd to give a special
> > > treatment.
> >
> > I don't really understand the distinction you're making between
> > syntactic things and semantic things. For most instructions there's no
>
> Syntax is how the code is written, and semantic is how the code is
> executed (in each execution candidate). So if we write a cmpxchg{mb}(),
> and in execution candiates, it could generates a read{MB} event and a
> write{MB} event (succeed case), or a read{MB} event (fail case), "{MB}"
> here doesn't mean it's a full barrier, it only means the event comes
> from a no suffix API. Here "{MB}" only has syntactic meaning (no
> semantic meaning).

Okay, I get it. Then you might agree that it probably would be better
to use a different tag here, because the mb tag is already in use with
other instructions (like smp_mb()) where it does always mean there's a
full barrier.

> Not really, RMW events contains all events generated from
> read-modify-write primitives, if there is an R event without a rmw
> relation (i.e there is no corresponding write event), it's a failed RWM
> by definition: it cannot be anything else.

Not true. An R event without an rmw relation could be a READ_ONCE().
Or a plain read. The memory model uses the tag to distinguish these
cases.

> > that it would work is merely an artifact of herd7's internal actions. I
> > think it would be much cleaner if herd7 represented these events in some
> > other way, particularly if we can tell it how.
> >
> > After all, herd7 already does generate different sets of events for
> > successful (both R and W) and failed (only R) RMWs. It's not such a big
> > stretch to make the R events it generates different in the two cases.
> >
>
> I thought we want to simplify the herd internal processing by avoid
> adding mb events in cmpxchg() cases, in the same spirit, if syntactic
> tagging is already good enough, why do we want to make herd complicate?

Herd7 already is complicated by the fact that it needs to handle
cmpxchg() instructions in two ways: success and failure. This
complication is unavoidable. Adding one extra layer (different tags for
the different ways) is an insignificant increase in the complication,
IMO. And it's a net reduction when you compare it to the amount of
complication currently in the herd7 code.

Also what about cmpxchg_acquire()? If it fails, it will generate an R
event with an acquire tag not in the rmw relation. There is no way to
tell such events apart from a normal smp_load_acquire(), and so the .cat
file would have no way to know that the event should not have acquire
semantics. I guess we would have to rename this tag, as well.

Alan

2024-05-24 14:34:19

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On Fri, May 24, 2024 at 10:14:25AM -0400, Alan Stern wrote:
[...]
> > Not really, RMW events contains all events generated from
> > read-modify-write primitives, if there is an R event without a rmw
> > relation (i.e there is no corresponding write event), it's a failed RWM
> > by definition: it cannot be anything else.
>
> Not true. An R event without an rmw relation could be a READ_ONCE().

No, the R event is already in the RWM events, it has come from a rwm
atomic.

> Or a plain read. The memory model uses the tag to distinguish these
> cases.
>
> > > that it would work is merely an artifact of herd7's internal actions. I
> > > think it would be much cleaner if herd7 represented these events in some
> > > other way, particularly if we can tell it how.
> > >
> > > After all, herd7 already does generate different sets of events for
> > > successful (both R and W) and failed (only R) RMWs. It's not such a big
> > > stretch to make the R events it generates different in the two cases.
> > >
> >
> > I thought we want to simplify the herd internal processing by avoid
> > adding mb events in cmpxchg() cases, in the same spirit, if syntactic
> > tagging is already good enough, why do we want to make herd complicate?
>
> Herd7 already is complicated by the fact that it needs to handle
> cmpxchg() instructions in two ways: success and failure. This
> complication is unavoidable. Adding one extra layer (different tags for
> the different ways) is an insignificant increase in the complication,
> IMO. And it's a net reduction when you compare it to the amount of
> complication currently in the herd7 code.
>
> Also what about cmpxchg_acquire()? If it fails, it will generate an R
> event with an acquire tag not in the rmw relation. There is no way to
> tell such events apart from a normal smp_load_acquire(), and so the .cat
> file would have no way to know that the event should not have acquire
> semantics. I guess we would have to rename this tag, as well.

No,

let read_of_rmw = (RMW & R)
let fail_read_of_rmw = read_of_rmw / dom(rmw)
let fail_cmpxchg_acquire = fail_read_of_rmw & [Acquire]

gives you all the failed cmpxchg_acquire() apart from a normal
smp_load_acquire().

Regards,
Boqun

>
> Alan

2024-05-24 14:53:28

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On Fri, May 24, 2024 at 07:34:06AM -0700, Boqun Feng wrote:
> On Fri, May 24, 2024 at 10:14:25AM -0400, Alan Stern wrote:
> [...]
> > > Not really, RMW events contains all events generated from
> > > read-modify-write primitives, if there is an R event without a rmw
> > > relation (i.e there is no corresponding write event), it's a failed RWM
> > > by definition: it cannot be anything else.
> >
> > Not true. An R event without an rmw relation could be a READ_ONCE().
>
> No, the R event is already in the RWM events, it has come from a rwm
> atomic.

Oh, right. For some reason I was thinking that an instruction could
belong to only one set, R or RMW. But that doesn't make sense.

> > Or a plain read. The memory model uses the tag to distinguish these
> > cases.
> >
> > > > that it would work is merely an artifact of herd7's internal actions. I
> > > > think it would be much cleaner if herd7 represented these events in some
> > > > other way, particularly if we can tell it how.
> > > >
> > > > After all, herd7 already does generate different sets of events for
> > > > successful (both R and W) and failed (only R) RMWs. It's not such a big
> > > > stretch to make the R events it generates different in the two cases.
> > > >
> > >
> > > I thought we want to simplify the herd internal processing by avoid
> > > adding mb events in cmpxchg() cases, in the same spirit, if syntactic
> > > tagging is already good enough, why do we want to make herd complicate?
> >
> > Herd7 already is complicated by the fact that it needs to handle
> > cmpxchg() instructions in two ways: success and failure. This
> > complication is unavoidable. Adding one extra layer (different tags for
> > the different ways) is an insignificant increase in the complication,
> > IMO. And it's a net reduction when you compare it to the amount of
> > complication currently in the herd7 code.
> >
> > Also what about cmpxchg_acquire()? If it fails, it will generate an R
> > event with an acquire tag not in the rmw relation. There is no way to
> > tell such events apart from a normal smp_load_acquire(), and so the .cat
> > file would have no way to know that the event should not have acquire
> > semantics. I guess we would have to rename this tag, as well.
>
> No,
>
> let read_of_rmw = (RMW & R)
> let fail_read_of_rmw = read_of_rmw / dom(rmw)
> let fail_cmpxchg_acquire = fail_read_of_rmw & [Acquire]
>
> gives you all the failed cmpxchg_acquire() apart from a normal
> smp_load_acquire().

Yes, I see. Using this approach, no further changes to herd7 or the
def file would be needed. We would just have to add 'mb to the
Accesses enumeration and to the list of tags allowed for an RMW
instruction.

Question: Since R and RMW have different lists of allowable tags, how
does herd7 decide which tags are allowed for an event that is in both
the R and RMW sets?

Alan

2024-05-24 18:10:08

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

Am 5/24/2024 um 4:53 PM schrieb Alan Stern:
> On Fri, May 24, 2024 at 07:34:06AM -0700, Boqun Feng wrote:
>> On Fri, May 24, 2024 at 10:14:25AM -0400, Alan Stern wrote:
>> [...]
>>>> Not really, RMW events contains all events generated from
>>>> read-modify-write primitives, if there is an R event without a rmw
>>>> relation (i.e there is no corresponding write event), it's a failed RWM
>>>> by definition: it cannot be anything else.
>>>
>>> Not true. An R event without an rmw relation could be a READ_ONCE().
>>
>> No, the R event is already in the RWM events, it has come from a rwm
>> atomic.
>
> Oh, right. For some reason I was thinking that an instruction could
> belong to only one set, R or RMW. But that doesn't make sense.

I thought the same, so I perhaps contributed to that confusion.

>
>>> Or a plain read. The memory model uses the tag to distinguish these
>>> cases.
>>>
>>>>> that it would work is merely an artifact of herd7's internal actions. I
>>>>> think it would be much cleaner if herd7 represented these events in some
>>>>> other way, particularly if we can tell it how.
>>>>>
>>>>> After all, herd7 already does generate different sets of events for
>>>>> successful (both R and W) and failed (only R) RMWs. It's not such a big
>>>>> stretch to make the R events it generates different in the two cases.
>>>>>
>>>>
>>>> I thought we want to simplify the herd internal processing by avoid
>>>> adding mb events in cmpxchg() cases, in the same spirit, if syntactic
>>>> tagging is already good enough, why do we want to make herd complicate?
>>>
>>> Herd7 already is complicated by the fact that it needs to handle
>>> cmpxchg() instructions in two ways: success and failure. This
>>> complication is unavoidable. Adding one extra layer (different tags for
>>> the different ways) is an insignificant increase in the complication,
>>> IMO. And it's a net reduction when you compare it to the amount of
>>> complication currently in the herd7 code.
>>>
>>> Also what about cmpxchg_acquire()? If it fails, it will generate an R
>>> event with an acquire tag not in the rmw relation.

I would like that, but that is not the current implementation, a failed
atomic_compare_exchange always produces a R*[once]; this behavior is
currently hardcoded in herd7.

>>> There is no way to
>>> tell such events apart from a normal smp_load_acquire(), and so the .cat
>>> file would have no way to know that the event should not have acquire
>>> semantics. I guess we would have to rename this tag, as well.
>>
>> No,
>>
>> let read_of_rmw = (RMW & R)
>> let fail_read_of_rmw = read_of_rmw / dom(rmw)
>> let fail_cmpxchg_acquire = fail_read_of_rmw & [Acquire]
>>
>> gives you all the failed cmpxchg_acquire() apart from a normal
>> smp_load_acquire().
>
> Yes, I see. Using this approach, no further changes to herd7 or the
> def file would be needed. We would just have to add 'mb to the
> Accesses enumeration and to the list of tags allowed for an RMW
> instruction.
>
> Question: Since R and RMW have different lists of allowable tags, how
> does herd7 decide which tags are allowed for an event that is in both
> the R and RMW sets?

After looking over the patch draft for herd7 written by Hernan [1], my
best guess is: it doen't. It seems that after herd7 detects you are
using LKMM it simply drops all tags except 'acquire on read and 'release
on store. Everything else becomes 'once (and 'mb adds some F[mb] sometimes).

That means that if one were to go through with the earlier suggestion to
distinguish between the smp_mb() Mb and the cmpxchg() Mb, e.g. by
calling the latter RmwMb, current herd7 would always erase the RmwMb tag
because it is not called "acquire" or "release".
The same would happen of course if you introduced an RmwAcquire tag.

So there are several options:

- treat the tags as a syntactic thing which is always present, and
specify their meaning purely in the cat file, analogous to what you
have defined above. This is personally my favorite solution. To
implement this in herd7 we would simply remove all the current special
cases for the LKMM barriers, which I like.

However then we need to actually define the behavior of herd if
an inappropriate tag (like release on a load) is provided. Since the
restriction is actually mostly enforced by the .def file - by simply
not providing a smp_store_acquire() etc. -, that only concerns RMWs,
where xchg_acquire() would apply the Acquire tag to the write also.

The easiest solution is to simply remove the syntax for specifying
restrictions - it seems it is not doing much right now anyways - and
do the filtering inside .bell, with things like

(* only writes can have Release tags *)
let Release = Release & W \ (RMW \ rng(rmw))

One good thing about this way is that it would work even without
modifying herd, since it is in a sense idempotent with the
transformations done by herd.

FWIW, in our internal weak memory model in Dresden we did exactly this,
and use REL for the syntax and Rel for the semantic release ordering to
make the distinction more clear, with things like:

let Acq = (ACQ | SC) & (R | F)
let Rel = (REL | SC) & (W | F)

(SC is our equivalent to LKMM's Mb)

- treat the tags as semantic markers that are only present or not under
some circumstances, and define the semantics fully based on the present
tags. The circumstances are currently hardcoded in herd7, but we should
move them out with some syntax like __cmpxchg{acquire}{once}.

This requires touching the parser and of course still has the issue
with the acquire tag appearing on the store as well.

- provide a full syntax for defining the event sequence that is
expected. For example, for a cmpxchg we could do

cmpxchg = { F[success-cmpxchg]; c = R&RMW[once]; if (c==v) {
W&RMW[once]; } F[success-cmpxchg] }

and then define in .bell that a success-cmpxchg is an mb if it is
directly next to a cmpxchg that succeeds.

The advantage is that you can customize the representation to whatever
kernel devs thing is the most intuitive. The downside is that it
requires major rewrites to herd7, someone to think about a reasonable
language for specifying this etc.

Best wishes,
jonas

[1] https://github.com/herd/herdtools7/pull/865

2024-05-24 18:47:43

[permalink] [raw]

Subject: Re: LKMM: Making RMW barriers explicit

On Fri, May 24, 2024 at 08:09:28PM +0200, Jonas Oberhauser wrote:
[...]
> >
> > Question: Since R and RMW have different lists of allowable tags, how
> > does herd7 decide which tags are allowed for an event that is in both
> > the R and RMW sets?
>
> After looking over the patch draft for herd7 written by Hernan [1], my best
> guess is: it doen't. It seems that after herd7 detects you are using LKMM it

Right, you can event put a 'acquire tag to WRITE_ONCE():

-WRITE_ONCE(X,V) { __store{once}(X,V); }
+WRITE_ONCE(X,V) { __store{acquire}(X,V); }

, herd won't complain, but it may change the model behavior.

Here is what I know from the herd code:

* First, normally herd just put whatever the tag you specify in
.def file to the accesses event (it has to be one of the tags
in Access list in .bell file).

* An access event looks like:

(read_or_write?, ..., anonatation, is_atomic?, ...)

"is_atomic?" means whether the event comes from an rmw
primitives. So a READ_ONCE() will look like:

(read, ..., once, false, ...)

and a WRITE_ONCE() will look like:

(write, ..., once, false, ...)

the read part of an atomic_add_return_relaxed() is:

(read, ..., once, true, ...)

note that the "is_atomic?" is how this event ends up in "RMW"
set.

> simply drops all tags except 'acquire on read and 'release on store.

Right, herd does some extra work to filter out RELEASE reads and ACQUIRE
writes for rwm atomics, basically, when herd is generating events for an
rmw atomic, if it's not "mb", it will only kill 'acquire for read and
'release on store, otherwise, it changes the annotation part to 'once.
Funny example, if you do a __atomic_fetch_op{hello}(..), herd will treat
it as a __atomic_fetch_op{once}(..) because of this.

> Everything else becomes 'once (and 'mb adds some F[mb] sometimes).
>
> That means that if one were to go through with the earlier suggestion to
> distinguish between the smp_mb() Mb and the cmpxchg() Mb, e.g. by calling
> the latter RmwMb, current herd7 would always erase the RmwMb tag because it
> is not called "acquire" or "release".
> The same would happen of course if you introduced an RmwAcquire tag.
>
> So there are several options:
>
> - treat the tags as a syntactic thing which is always present, and
> specify their meaning purely in the cat file, analogous to what you
> have defined above. This is personally my favorite solution. To
> implement this in herd7 we would simply remove all the current special
> cases for the LKMM barriers, which I like.
>

Agreed.

> However then we need to actually define the behavior of herd if
> an inappropriate tag (like release on a load) is provided. Since the
> restriction is actually mostly enforced by the .def file - by simply
> not providing a smp_store_acquire() etc. -, that only concerns RMWs,
> where xchg_acquire() would apply the Acquire tag to the write also.
>
> The easiest solution is to simply remove the syntax for specifying
> restrictions - it seems it is not doing much right now anyways - and
> do the filtering inside .bell, with things like
>
> (* only writes can have Release tags *)
> let Release = Release & W \ (RMW \ rng(rmw))
>
> One good thing about this way is that it would work even without
> modifying herd, since it is in a sense idempotent with the

Well, we still need to turn off the "smart" annotation rewritting in
herd (e.g. -> once), but I think that's a good thing: it simplifies the
internal work herd does, and it also helps people on other tools
understand LKMM better.

> transformations done by herd.
>
> FWIW, in our internal weak memory model in Dresden we did exactly this,

so the model doesn't work elsewhere even in Germany? ;-) Sorry, couldn'
t resist ;-) ;-) ;-)

> and use REL for the syntax and Rel for the semantic release ordering to
> make the distinction more clear, with things like:
>
> let Acq = (ACQ | SC) & (R | F)
> let Rel = (REL | SC) & (W | F)
>
> (SC is our equivalent to LKMM's Mb)
>
> - treat the tags as semantic markers that are only present or not under
> some circumstances, and define the semantics fully based on the present
> tags. The circumstances are currently hardcoded in herd7, but we should
> move them out with some syntax like __cmpxchg{acquire}{once}.
>
> This requires touching the parser and of course still has the issue
> with the acquire tag appearing on the store as well.
>
> - provide a full syntax for defining the event sequence that is
> expected. For example, for a cmpxchg we could do
>
> cmpxchg = { F[success-cmpxchg]; c = R&RMW[once]; if (c==v) {
> W&RMW[once]; } F[success-cmpxchg] }
>
> and then define in .bell that a success-cmpxchg is an mb if it is
> directly next to a cmpxchg that succeeds.
>
> The advantage is that you can customize the representation to whatever
> kernel devs thing is the most intuitive. The downside is that it
> requires major rewrites to herd7, someone to think about a reasonable
> language for specifying this etc.
>

I no doubt am the fan of the first approach, herd is powerful because
the ability to customize the semantic rules for model ordering models,
moving more parts from herd internals to the cat file (or bell file) is
always a good thing to me.

Regards,
Boqun

>
>
> Best wishes,
> jonas
>
>
>
> [1] https://github.com/herd/herdtools7/pull/865
>

2024-05-24 18:57:46