Remove this subtle (and, AFAICT, unused) ordering: we can add it back,
if necessary, but let us not encourage people to rely on this thing.
For example, the following "exists" clause can be satisfied with this
change:
C dep-rfi
{ }
P0(int *x, int *y)
{
WRITE_ONCE(*x, 1);
smp_store_release(y, 1);
}
P1(int *x, int *y, int *z)
{
int r0;
int r1;
int r2;
r0 = READ_ONCE(*y);
WRITE_ONCE(*z, r0);
r1 = smp_load_acquire(z);
r2 = READ_ONCE(*x);
}
exists (1:r0=1 /\ 1:r2=0)
Signed-off-by: Andrea Parri <[email protected]>
Cc: Alan Stern <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: David Howells <[email protected]>
Cc: Jade Alglave <[email protected]>
Cc: Luc Maranget <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Akira Yokosawa <[email protected]>
Cc: Daniel Lustig <[email protected]>
---
tools/memory-model/Documentation/explanation.txt | 28 ------------------------
tools/memory-model/linux-kernel.cat | 2 +-
2 files changed, 1 insertion(+), 29 deletions(-)
diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt
index 68caa9a976d0c..965e11744d090 100644
--- a/tools/memory-model/Documentation/explanation.txt
+++ b/tools/memory-model/Documentation/explanation.txt
@@ -1019,34 +1019,6 @@ section for more details). The kernel includes a workaround for this
problem when the loads come from READ_ONCE(), and therefore the LKMM
includes address dependencies to loads in the ppo relation.
-On the other hand, dependencies can indirectly affect the ordering of
-two loads. This happens when there is a dependency from a load to a
-store and a second, po-later load reads from that store:
-
- R ->dep W ->rfi R',
-
-where the dep link can be either an address or a data dependency. In
-this situation we know it is possible for the CPU to execute R' before
-W, because it can forward the value that W will store to R'. But it
-cannot execute R' before R, because it cannot forward the value before
-it knows what that value is, or that W and R' do access the same
-location. However, if there is merely a control dependency between R
-and W then the CPU can speculatively forward W to R' before executing
-R; if the speculation turns out to be wrong then the CPU merely has to
-restart or abandon R'.
-
-(In theory, a CPU might forward a store to a load when it runs across
-an address dependency like this:
-
- r1 = READ_ONCE(ptr);
- WRITE_ONCE(*r1, 17);
- r2 = READ_ONCE(*r1);
-
-because it could tell that the store and the second load access the
-same location even before it knows what the location's address is.
-However, none of the architectures supported by the Linux kernel do
-this.)
-
Two memory accesses of the same location must always be executed in
program order if the second access is a store. Thus, if we have
diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat
index 8dcb37835b613..6b9e3bb4e397f 100644
--- a/tools/memory-model/linux-kernel.cat
+++ b/tools/memory-model/linux-kernel.cat
@@ -62,7 +62,7 @@ let dep = addr | data
let rwdep = (dep | ctrl) ; [W]
let overwrite = co | fr
let to-w = rwdep | (overwrite & int)
-let to-r = addr | (dep ; rfi)
+let to-r = addr ; [R]
let fence = strong-fence | wmb | po-rel | rmb | acq-po
let ppo = to-r | to-w | fence | (po-unlock-rf-lock-po & int)
--
2.7.4
On Tue, Feb 19, 2019 at 11:57:37PM +0100, Andrea Parri wrote:
> Remove this subtle (and, AFAICT, unused) ordering: we can add it back,
> if necessary, but let us not encourage people to rely on this thing.
>
> For example, the following "exists" clause can be satisfied with this
> change:
>
> C dep-rfi
>
> { }
>
> P0(int *x, int *y)
> {
> WRITE_ONCE(*x, 1);
> smp_store_release(y, 1);
> }
>
> P1(int *x, int *y, int *z)
> {
> int r0;
> int r1;
> int r2;
>
> r0 = READ_ONCE(*y);
> WRITE_ONCE(*z, r0);
> r1 = smp_load_acquire(z);
> r2 = READ_ONCE(*x);
> }
>
> exists (1:r0=1 /\ 1:r2=0)
Any objections? If I don't hear any in a couple days, I will apply this.
Thanx, Paul
> Signed-off-by: Andrea Parri <[email protected]>
> Cc: Alan Stern <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: David Howells <[email protected]>
> Cc: Jade Alglave <[email protected]>
> Cc: Luc Maranget <[email protected]>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Akira Yokosawa <[email protected]>
> Cc: Daniel Lustig <[email protected]>
> ---
> tools/memory-model/Documentation/explanation.txt | 28 ------------------------
> tools/memory-model/linux-kernel.cat | 2 +-
> 2 files changed, 1 insertion(+), 29 deletions(-)
>
> diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt
> index 68caa9a976d0c..965e11744d090 100644
> --- a/tools/memory-model/Documentation/explanation.txt
> +++ b/tools/memory-model/Documentation/explanation.txt
> @@ -1019,34 +1019,6 @@ section for more details). The kernel includes a workaround for this
> problem when the loads come from READ_ONCE(), and therefore the LKMM
> includes address dependencies to loads in the ppo relation.
>
> -On the other hand, dependencies can indirectly affect the ordering of
> -two loads. This happens when there is a dependency from a load to a
> -store and a second, po-later load reads from that store:
> -
> - R ->dep W ->rfi R',
> -
> -where the dep link can be either an address or a data dependency. In
> -this situation we know it is possible for the CPU to execute R' before
> -W, because it can forward the value that W will store to R'. But it
> -cannot execute R' before R, because it cannot forward the value before
> -it knows what that value is, or that W and R' do access the same
> -location. However, if there is merely a control dependency between R
> -and W then the CPU can speculatively forward W to R' before executing
> -R; if the speculation turns out to be wrong then the CPU merely has to
> -restart or abandon R'.
> -
> -(In theory, a CPU might forward a store to a load when it runs across
> -an address dependency like this:
> -
> - r1 = READ_ONCE(ptr);
> - WRITE_ONCE(*r1, 17);
> - r2 = READ_ONCE(*r1);
> -
> -because it could tell that the store and the second load access the
> -same location even before it knows what the location's address is.
> -However, none of the architectures supported by the Linux kernel do
> -this.)
> -
> Two memory accesses of the same location must always be executed in
> program order if the second access is a store. Thus, if we have
>
> diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat
> index 8dcb37835b613..6b9e3bb4e397f 100644
> --- a/tools/memory-model/linux-kernel.cat
> +++ b/tools/memory-model/linux-kernel.cat
> @@ -62,7 +62,7 @@ let dep = addr | data
> let rwdep = (dep | ctrl) ; [W]
> let overwrite = co | fr
> let to-w = rwdep | (overwrite & int)
> -let to-r = addr | (dep ; rfi)
> +let to-r = addr ; [R]
> let fence = strong-fence | wmb | po-rel | rmb | acq-po
> let ppo = to-r | to-w | fence | (po-unlock-rf-lock-po & int)
>
> --
> 2.7.4
>
On Tue, Feb 19, 2019 at 06:01:17PM -0800, Paul E. McKenney wrote:
> On Tue, Feb 19, 2019 at 11:57:37PM +0100, Andrea Parri wrote:
> > Remove this subtle (and, AFAICT, unused) ordering: we can add it back,
> > if necessary, but let us not encourage people to rely on this thing.
> >
> > For example, the following "exists" clause can be satisfied with this
> > change:
> >
> > C dep-rfi
> >
> > { }
> >
> > P0(int *x, int *y)
> > {
> > WRITE_ONCE(*x, 1);
> > smp_store_release(y, 1);
> > }
> >
> > P1(int *x, int *y, int *z)
> > {
> > int r0;
> > int r1;
> > int r2;
> >
> > r0 = READ_ONCE(*y);
> > WRITE_ONCE(*z, r0);
> > r1 = smp_load_acquire(z);
> > r2 = READ_ONCE(*x);
> > }
> >
> > exists (1:r0=1 /\ 1:r2=0)
>
> Any objections? If I don't hear any in a couple days, I will apply this.
IIUC you cannot build hardware that allows the above, so why would we
allow it?
On Wed, Feb 20, 2019 at 10:26:04AM +0100, Peter Zijlstra wrote:
> On Tue, Feb 19, 2019 at 06:01:17PM -0800, Paul E. McKenney wrote:
> > On Tue, Feb 19, 2019 at 11:57:37PM +0100, Andrea Parri wrote:
> > > Remove this subtle (and, AFAICT, unused) ordering: we can add it back,
> > > if necessary, but let us not encourage people to rely on this thing.
> > >
> > > For example, the following "exists" clause can be satisfied with this
> > > change:
> > >
> > > C dep-rfi
> > >
> > > { }
> > >
> > > P0(int *x, int *y)
> > > {
> > > WRITE_ONCE(*x, 1);
> > > smp_store_release(y, 1);
> > > }
> > >
> > > P1(int *x, int *y, int *z)
> > > {
> > > int r0;
> > > int r1;
> > > int r2;
> > >
> > > r0 = READ_ONCE(*y);
> > > WRITE_ONCE(*z, r0);
> > > r1 = smp_load_acquire(z);
> > > r2 = READ_ONCE(*x);
> > > }
> > >
> > > exists (1:r0=1 /\ 1:r2=0)
> >
> > Any objections? If I don't hear any in a couple days, I will apply this.
>
> IIUC you cannot build hardware that allows the above, so why would we
> allow it?
Agreed. Maybe the intention was to make the dependency between the read of
*y and the write of *z on P1 a control dependency instead? That's certainly
allowed on arm64.
Will
On Wed, Feb 20, 2019 at 10:26:04AM +0100, Peter Zijlstra wrote:
> On Tue, Feb 19, 2019 at 06:01:17PM -0800, Paul E. McKenney wrote:
> > On Tue, Feb 19, 2019 at 11:57:37PM +0100, Andrea Parri wrote:
> > > Remove this subtle (and, AFAICT, unused) ordering: we can add it back,
> > > if necessary, but let us not encourage people to rely on this thing.
> > >
> > > For example, the following "exists" clause can be satisfied with this
> > > change:
> > >
> > > C dep-rfi
> > >
> > > { }
> > >
> > > P0(int *x, int *y)
> > > {
> > > WRITE_ONCE(*x, 1);
> > > smp_store_release(y, 1);
> > > }
> > >
> > > P1(int *x, int *y, int *z)
> > > {
> > > int r0;
> > > int r1;
> > > int r2;
> > >
> > > r0 = READ_ONCE(*y);
> > > WRITE_ONCE(*z, r0);
> > > r1 = smp_load_acquire(z);
> > > r2 = READ_ONCE(*x);
> > > }
> > >
> > > exists (1:r0=1 /\ 1:r2=0)
> >
> > Any objections? If I don't hear any in a couple days, I will apply this.
>
> IIUC you cannot build hardware that allows the above, so why would we
> allow it?
The change/simplification was mainly intended as precautionary measure
(hence the "we can add it back, ..."): I do agree that it shouldn't be
possible to realize the above state; OTOH, you really don't need to be
too "creative" to imagine possible mis-uses/mis-interpretations of the
(dep ; rfi) term ("forget" ONCEs, trick herd7 with "false dependencies"
or simply wrongly assume that control dependencies are part this "dep",
what else? ...). So, no, I'm not that fond to this term; why should I
be? or you are simply suggesting to expand the changelog?
I should probably also remark that "such a simplification" wouldn't be
a first time for the LKMM and, in fact, for the ppo term itself:
https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/LWNLinuxMM/WeakModel.html#Preserved%20Program%20Order
Andrea
On Wed, Feb 20, 2019 at 09:57:00AM +0000, Will Deacon wrote:
> On Wed, Feb 20, 2019 at 10:26:04AM +0100, Peter Zijlstra wrote:
> > On Tue, Feb 19, 2019 at 06:01:17PM -0800, Paul E. McKenney wrote:
> > > On Tue, Feb 19, 2019 at 11:57:37PM +0100, Andrea Parri wrote:
> > > > Remove this subtle (and, AFAICT, unused) ordering: we can add it back,
> > > > if necessary, but let us not encourage people to rely on this thing.
> > > >
> > > > For example, the following "exists" clause can be satisfied with this
> > > > change:
> > > >
> > > > C dep-rfi
> > > >
> > > > { }
> > > >
> > > > P0(int *x, int *y)
> > > > {
> > > > WRITE_ONCE(*x, 1);
> > > > smp_store_release(y, 1);
> > > > }
> > > >
> > > > P1(int *x, int *y, int *z)
> > > > {
> > > > int r0;
> > > > int r1;
> > > > int r2;
> > > >
> > > > r0 = READ_ONCE(*y);
> > > > WRITE_ONCE(*z, r0);
> > > > r1 = smp_load_acquire(z);
> > > > r2 = READ_ONCE(*x);
> > > > }
> > > >
> > > > exists (1:r0=1 /\ 1:r2=0)
> > >
> > > Any objections? If I don't hear any in a couple days, I will apply this.
> >
> > IIUC you cannot build hardware that allows the above, so why would we
> > allow it?
>
> Agreed. Maybe the intention was to make the dependency between the read of
> *y and the write of *z on P1 a control dependency instead? That's certainly
> allowed on arm64.
No no, I did mean dep (= addr | data). As you remarked, control dep.
aren't going to work here. I expanded on this in my reply to peterz.
Andrea
>
> Will
On Wed, Feb 20, 2019 at 02:14:56PM +0100, Andrea Parri wrote:
> On Wed, Feb 20, 2019 at 10:26:04AM +0100, Peter Zijlstra wrote:
> > On Tue, Feb 19, 2019 at 06:01:17PM -0800, Paul E. McKenney wrote:
> > > On Tue, Feb 19, 2019 at 11:57:37PM +0100, Andrea Parri wrote:
> > > > Remove this subtle (and, AFAICT, unused) ordering: we can add it back,
> > > > if necessary, but let us not encourage people to rely on this thing.
> > > >
> > > > For example, the following "exists" clause can be satisfied with this
> > > > change:
> > > >
> > > > C dep-rfi
> > > >
> > > > { }
> > > >
> > > > P0(int *x, int *y)
> > > > {
> > > > WRITE_ONCE(*x, 1);
> > > > smp_store_release(y, 1);
> > > > }
> > > >
> > > > P1(int *x, int *y, int *z)
> > > > {
> > > > int r0;
> > > > int r1;
> > > > int r2;
> > > >
> > > > r0 = READ_ONCE(*y);
> > > > WRITE_ONCE(*z, r0);
> > > > r1 = smp_load_acquire(z);
> > > > r2 = READ_ONCE(*x);
> > > > }
> > > >
> > > > exists (1:r0=1 /\ 1:r2=0)
> > >
> > > Any objections? If I don't hear any in a couple days, I will apply this.
> >
> > IIUC you cannot build hardware that allows the above, so why would we
> > allow it?
>
> The change/simplification was mainly intended as precautionary measure
> (hence the "we can add it back, ..."): I do agree that it shouldn't be
> possible to realize the above state; OTOH, you really don't need to be
> too "creative" to imagine possible mis-uses/mis-interpretations of the
> (dep ; rfi) term ("forget" ONCEs, trick herd7 with "false dependencies"
> or simply wrongly assume that control dependencies are part this "dep",
> what else? ...). So, no, I'm not that fond to this term; why should I
> be? or you are simply suggesting to expand the changelog?
>
> I should probably also remark that "such a simplification" wouldn't be
> a first time for the LKMM and, in fact, for the ppo term itself:
>
> https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/LWNLinuxMM/WeakModel.html#Preserved%20Program%20Order
I'm completely failing to understand any of that.
What I do object to is a model that's weaker than any possible sane
hardware.
On Wed, Feb 20, 2019 at 02:14:56PM +0100, Andrea Parri wrote:
> On Wed, Feb 20, 2019 at 10:26:04AM +0100, Peter Zijlstra wrote:
> > On Tue, Feb 19, 2019 at 06:01:17PM -0800, Paul E. McKenney wrote:
> > > On Tue, Feb 19, 2019 at 11:57:37PM +0100, Andrea Parri wrote:
> > > > Remove this subtle (and, AFAICT, unused) ordering: we can add it back,
> > > > if necessary, but let us not encourage people to rely on this thing.
> > > >
> > > > For example, the following "exists" clause can be satisfied with this
> > > > change:
> > > >
> > > > C dep-rfi
> > > >
> > > > { }
> > > >
> > > > P0(int *x, int *y)
> > > > {
> > > > WRITE_ONCE(*x, 1);
> > > > smp_store_release(y, 1);
> > > > }
> > > >
> > > > P1(int *x, int *y, int *z)
> > > > {
> > > > int r0;
> > > > int r1;
> > > > int r2;
> > > >
> > > > r0 = READ_ONCE(*y);
> > > > WRITE_ONCE(*z, r0);
> > > > r1 = smp_load_acquire(z);
> > > > r2 = READ_ONCE(*x);
> > > > }
> > > >
> > > > exists (1:r0=1 /\ 1:r2=0)
> > >
> > > Any objections? If I don't hear any in a couple days, I will apply this.
> >
> > IIUC you cannot build hardware that allows the above, so why would we
> > allow it?
>
> The change/simplification was mainly intended as precautionary measure
> (hence the "we can add it back, ..."): I do agree that it shouldn't be
> possible to realize the above state; OTOH, you really don't need to be
> too "creative" to imagine possible mis-uses/mis-interpretations of the
> (dep ; rfi) term ("forget" ONCEs, trick herd7 with "false dependencies"
> or simply wrongly assume that control dependencies are part this "dep",
> what else? ...). So, no, I'm not that fond to this term; why should I
> be? or you are simply suggesting to expand the changelog?
Simplification can mean different things to different people.
Whilst I completely agree that relying on the ordering provided by "dep ;
rfi" is subtle and error prone, having it forbid the outcome above appeals
to a hardware-based mindset of how memory ordering works. In the kernel
community, I would posit that the majority of developers are writing code
with the underlying hardware in mind and so allowing behaviours in the
memory model which are counter to how a real machine operates is likely to
make things more confusing, rather than simplifying them!
IIRC, herd has a feature where you can "flag" the result of a litmus test
to highlight certain internal constraint violations (e.g. warning that a
data race is present in a concurrent C11 program). How about we preserve
the existing semantics, but flag any use of "dep; rfi" to indicate that
the ordering guarantees being relied upon are subtle and error-prone, and
therefore should only be considered for fast-path code?
Will
On Tue, 19 Feb 2019, Andrea Parri wrote:
> Remove this subtle (and, AFAICT, unused) ordering: we can add it back,
> if necessary, but let us not encourage people to rely on this thing.
>
> For example, the following "exists" clause can be satisfied with this
> change:
>
> C dep-rfi
>
> { }
>
> P0(int *x, int *y)
> {
> WRITE_ONCE(*x, 1);
> smp_store_release(y, 1);
> }
>
> P1(int *x, int *y, int *z)
> {
> int r0;
> int r1;
> int r2;
>
> r0 = READ_ONCE(*y);
> WRITE_ONCE(*z, r0);
> r1 = smp_load_acquire(z);
> r2 = READ_ONCE(*x);
> }
>
> exists (1:r0=1 /\ 1:r2=0)
>
> Signed-off-by: Andrea Parri <[email protected]>
> Cc: Alan Stern <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Cc: David Howells <[email protected]>
> Cc: Jade Alglave <[email protected]>
> Cc: Luc Maranget <[email protected]>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Akira Yokosawa <[email protected]>
> Cc: Daniel Lustig <[email protected]>
> ---
> tools/memory-model/Documentation/explanation.txt | 28 ------------------------
> tools/memory-model/linux-kernel.cat | 2 +-
> 2 files changed, 1 insertion(+), 29 deletions(-)
>
> diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt
> index 68caa9a976d0c..965e11744d090 100644
> --- a/tools/memory-model/Documentation/explanation.txt
> +++ b/tools/memory-model/Documentation/explanation.txt
> @@ -1019,34 +1019,6 @@ section for more details). The kernel includes a workaround for this
> problem when the loads come from READ_ONCE(), and therefore the LKMM
> includes address dependencies to loads in the ppo relation.
>
> -On the other hand, dependencies can indirectly affect the ordering of
> -two loads. This happens when there is a dependency from a load to a
> -store and a second, po-later load reads from that store:
> -
> - R ->dep W ->rfi R',
> -
> -where the dep link can be either an address or a data dependency. In
> -this situation we know it is possible for the CPU to execute R' before
> -W, because it can forward the value that W will store to R'. But it
> -cannot execute R' before R, because it cannot forward the value before
> -it knows what that value is, or that W and R' do access the same
> -location. However, if there is merely a control dependency between R
> -and W then the CPU can speculatively forward W to R' before executing
> -R; if the speculation turns out to be wrong then the CPU merely has to
> -restart or abandon R'.
> -
> -(In theory, a CPU might forward a store to a load when it runs across
> -an address dependency like this:
> -
> - r1 = READ_ONCE(ptr);
> - WRITE_ONCE(*r1, 17);
> - r2 = READ_ONCE(*r1);
> -
> -because it could tell that the store and the second load access the
> -same location even before it knows what the location's address is.
> -However, none of the architectures supported by the Linux kernel do
> -this.)
> -
> Two memory accesses of the same location must always be executed in
> program order if the second access is a store. Thus, if we have
If we do decide to remove the (dep;rfi) term, I would prefer not to
erase this discussion from explanation.txt completely. Instead, we
should explain that the operational model predicts this ordering but it
has been left out of the LKMM for practical (not technical) reasons.
Alan
On Wed, 20 Feb 2019, Will Deacon wrote:
> Whilst I completely agree that relying on the ordering provided by "dep ;
> rfi" is subtle and error prone, having it forbid the outcome above appeals
> to a hardware-based mindset of how memory ordering works. In the kernel
> community, I would posit that the majority of developers are writing code
> with the underlying hardware in mind and so allowing behaviours in the
> memory model which are counter to how a real machine operates is likely to
> make things more confusing, rather than simplifying them!
>
> IIRC, herd has a feature where you can "flag" the result of a litmus test
> to highlight certain internal constraint violations (e.g. warning that a
> data race is present in a concurrent C11 program). How about we preserve
> the existing semantics, but flag any use of "dep; rfi" to indicate that
> the ordering guarantees being relied upon are subtle and error-prone, and
> therefore should only be considered for fast-path code?
Unfortunately, herd can't really tell whether a particular ordering is
being _used_; it can only tell when the ordering is present. Therefore
such a flag would be prone to false positives.
Alan
> What I do object to is a model that's weaker than any possible sane
> hardware.
Not the first time I hear you calling this out. And inevitably, every
time, other slogans come to my mind: "C is not an assembly language",
"No features (ordering) without users", ...
For the record, I won't try to push this patch further; I also have no
plans to touch herd7 internals in order to add the ad-hoc flag for the
(dep ; rfi) thing. (Maybe others will/can step in here.)
In the meantime, the hope (admittedly, probably vain) is that this RFC
could serve as a further warning or as a reference to those developers
who are quivering to use (dep ; rfi): enjoy it, be careful.
Andrea
On Fri, Feb 22, 2019 at 12:21:28PM +0100, Andrea Parri wrote:
> > What I do object to is a model that's weaker than any possible sane
> > hardware.
>
> Not the first time I hear you calling this out. And inevitably, every
> time, other slogans come to my mind: "C is not an assembly language",
But it bloody well should be ;-)
On Fri, Feb 22, 2019 at 02:00:14PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 22, 2019 at 12:21:28PM +0100, Andrea Parri wrote:
> > > What I do object to is a model that's weaker than any possible sane
> > > hardware.
> >
> > Not the first time I hear you calling this out. And inevitably, every
> > time, other slogans come to my mind: "C is not an assembly language",
>
> But it bloody well should be ;-)
Yeah, we had some debates about this sort of thing at the C++ Standards
Committee meeting last week.
Pointer provenance and concurrent algorithms, though for once not
affecting RCU! We might actually be on the road to a fix that preserves
the relevant optimizations while still allowing most (if not all) existing
concurrent C/C++ code to continue working correctly. (The current thought
is that loads and stores involving inline assembly, C/C++ atomics, or
volatile get their provenance stripped. There may need to be some other
mechanisms for plain C-language loads and stores in some cases as well.)
But if you know of any code in the Linux kernel that needs to compare
pointers, one of which might be in the process of being freed, please
do point me at it. I thought that the smp_call_function() code fit,
but it avoids the problem because only the sending CPU is allowed to
push onto the stack of pending smp_call_function() invocations.
That same concurrent linked stack pattern using cmpxchg() to atomically
push and xchg() to atomically pop the full list -would- have this problem.
The old pointer passed to cmpxchg() might reference an object that was
freed between the time that the old pointer was loaded and the time
that the cmpxchg() executed. One way to avoid this is to do the push
operation in an RCU read-side critical section and use kfree_rcu()
instead of kfree(). Of course, code in the idle loop or that might
run on offline CPUs cannot use RCU, plus some data structures are not
happy with kfree_rcu() delays, so...
Again, if you know of any code in the Linux kernel that would have
problems with aggressive optimizations based on pointer provenance,
please let me know!
Thanx, Paul
On Mon, Feb 25, 2019 at 09:55:17AM -0800, Paul E. McKenney wrote:
> Again, if you know of any code in the Linux kernel that would have
> problems with aggressive optimizations based on pointer provenance,
> please let me know!
I've no clue what pointer provenance is to begin with.
On Mon, Feb 25, 2019 at 09:55:17AM -0800, Paul E. McKenney wrote:
> But if you know of any code in the Linux kernel that needs to compare
> pointers, one of which might be in the process of being freed, please
> do point me at it.
I'm having the utmost difficulty of understanding why that would be a
problem. It's just a value. Freeing memory does not affect the actual
memory or any pointers to it.
*confusion*
None of this makes any kind of sense.
On Tue, Feb 26, 2019 at 10:30:09AM +0100, Peter Zijlstra wrote:
> On Mon, Feb 25, 2019 at 09:55:17AM -0800, Paul E. McKenney wrote:
> > But if you know of any code in the Linux kernel that needs to compare
> > pointers, one of which might be in the process of being freed, please
> > do point me at it.
>
> I'm having the utmost difficulty of understanding why that would be a
> problem. It's just a value. Freeing memory does not affect the actual
> memory or any pointers to it.
>
> *confusion*
>
> None of this makes any kind of sense.
I found and started to read:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2311.pdf
and that's all massive bong-hits. That's utterly insane.
Even the proposed semantics are crazy; they include UB and are therefore
broken on principle. But also; the provenance ID has words like:
"allocated storage duration", how is that well defined in the context of
concurrent execution?
Also, isn't the kernel filled with inter-object pointer arithmetic?
PNVI also looks like something that simply cannot work; how are we at
compile time supposed to know the active (concurrent modified) heap
layout. What is a 'live' object?
On Tue, Feb 26, 2019 at 11:45:51AM +0100, Peter Zijlstra wrote:
> On Tue, Feb 26, 2019 at 10:30:09AM +0100, Peter Zijlstra wrote:
> > On Mon, Feb 25, 2019 at 09:55:17AM -0800, Paul E. McKenney wrote:
> > > But if you know of any code in the Linux kernel that needs to compare
> > > pointers, one of which might be in the process of being freed, please
> > > do point me at it.
> >
> > I'm having the utmost difficulty of understanding why that would be a
> > problem. It's just a value. Freeing memory does not affect the actual
> > memory or any pointers to it.
> >
> > *confusion*
> >
> > None of this makes any kind of sense.
>
> I found and started to read:
>
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2311.pdf
>
> and that's all massive bong-hits. That's utterly insane.
>
> Even the proposed semantics are crazy; they include UB and are therefore
> broken on principle. But also; the provenance ID has words like:
> "allocated storage duration", how is that well defined in the context of
> concurrent execution?
>
> Also, isn't the kernel filled with inter-object pointer arithmetic?
>
> PNVI also looks like something that simply cannot work; how are we at
> compile time supposed to know the active (concurrent modified) heap
> layout. What is a 'live' object?
Also; we need to find a GCC person to find/give us a knob to kill this
entire class of nonsense. This is just horrible broken shit:
~/tmp# gcc -O2 -fno-strict-aliasing -o ptr ptr.c ; ./ptr
p=0x5635dd3d5034 q=0x5635dd3d5034
x=1 y=2 *p=11 *q=2
~/tmp# cat ptr.c
#include <stdio.h>
#include <string.h>
int y = 2, x = 1;
int main (int argc, char **argv) {
int *p = &x + argc;
int *q = &y;
printf("p=%p q=%p\n", p, q);
if (!memcmp(&p, &q, sizeof(p))) {
*p = 11;
printf("x=%d y=%d *p=%d *q=%d\n", x, y, *p, *q);
}
}
On Tue, Feb 26, 2019 at 12:21:33PM +0100, Peter Zijlstra wrote:
> Also; we need to find a GCC person to find/give us a knob to kill this
> entire class of nonsense. This is just horrible broken shit:
>
>
> ~/tmp# gcc -O2 -fno-strict-aliasing -o ptr ptr.c ; ./ptr
> p=0x5635dd3d5034 q=0x5635dd3d5034
> x=1 y=2 *p=11 *q=2
> ~/tmp# cat ptr.c
> #include <stdio.h>
> #include <string.h>
> int y = 2, x = 1;
> int main (int argc, char **argv) {
> int *p = &x + argc;
damn; wrong version; that should've been: s/argc/1/
same result though.
> int *q = &y;
> printf("p=%p q=%p\n", p, q);
> if (!memcmp(&p, &q, sizeof(p))) {
> *p = 11;
> printf("x=%d y=%d *p=%d *q=%d\n", x, y, *p, *q);
> }
> }
On Tue, Feb 26, 2019 at 12:25:21PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 26, 2019 at 12:21:33PM +0100, Peter Zijlstra wrote:
>
> > Also; we need to find a GCC person to find/give us a knob to kill this
> > entire class of nonsense. This is just horrible broken shit:
> >
> >
> > ~/tmp# gcc -O2 -fno-strict-aliasing -o ptr ptr.c ; ./ptr
> > p=0x5635dd3d5034 q=0x5635dd3d5034
> > x=1 y=2 *p=11 *q=2
> > ~/tmp# cat ptr.c
> > #include <stdio.h>
> > #include <string.h>
> > int y = 2, x = 1;
> > int main (int argc, char **argv) {
> > int *p = &x + argc;
>
> damn; wrong version; that should've been: s/argc/1/
> same result though.
Note that gcc (as used above) was:
gcc (Debian 7.3.0-3) 7.3.0
When I used the argc variant, gcc-8 'works', but with s/argc/1/ it is
still broken.
gcc-8 (Debian 8.2.0-21) 8.2.0
gcc (Debian 7.3.0-3) 7.3.0
gcc-6 (Debian 6.4.0-12) 6.4.0 20180123
gcc-5 (Debian 5.5.0-8) 5.5.0 20171010
gcc-4.6 (Debian 4.6.3-14) 4.6.3
all generate the exact same broken crap with the '+ 1' variant.
> > int *q = &y;
> > printf("p=%p q=%p\n", p, q);
> > if (!memcmp(&p, &q, sizeof(p))) {
> > *p = 11;
> > printf("x=%d y=%d *p=%d *q=%d\n", x, y, *p, *q);
> > }
> > }
On Tue, Feb 26, 2019 at 12:30:08PM +0100, Peter Zijlstra wrote:
> When I used the argc variant, gcc-8 'works', but with s/argc/1/ it is
> still broken.
As requested on IRC:
$ gcc --version
gcc (SUSE Linux) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ cat ptr.c
#include <stdio.h>
#include <string.h>
int y = 2, x = 1;
int main (int argc, char **argv) {
int *p = &x + 1;
int *q = &y;
printf("p=%p q=%p\n", p, q);
if (!memcmp(&p, &q, sizeof(p))) {
*p = 11;
printf("x=%d y=%d *p=%d *q=%d\n", x, y, *p, *q);
$ gcc -O2 -fno-strict-aliasing -o ptr ptr.c ; ./ptr
p=0x601044 q=0x601044
x=1 y=2 *p=11 *q=2
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Tue, Feb 26, 2019 at 12:38:13PM +0100, Borislav Petkov wrote:
> On Tue, Feb 26, 2019 at 12:30:08PM +0100, Peter Zijlstra wrote:
> > When I used the argc variant, gcc-8 'works', but with s/argc/1/ it is
> > still broken.
>
> As requested on IRC:
What I asked was if you could get your GCC developer friends to have a
look at this :-)
On Tue, Feb 26, 2019 at 02:49:06PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 26, 2019 at 12:38:13PM +0100, Borislav Petkov wrote:
> > On Tue, Feb 26, 2019 at 12:30:08PM +0100, Peter Zijlstra wrote:
> > > When I used the argc variant, gcc-8 'works', but with s/argc/1/ it is
> > > still broken.
> >
> > As requested on IRC:
>
> What I asked was if you could get your GCC developer friends to have a
> look at this :-)
Yes, this all is a bit on the insane side from a kernel viewpoint.
But the paper you found does not impose this; it has instead been there
for about 20 years, back before C and C++ admitted to the existence
of concurrency. But of course compilers are getting more aggressive,
and yes, some of the problems show up in single-threaded code.
The usual response is "then cast the pointers to intptr_t!" but of
course that breaks type checking.
There is an effort to claw back the concurrency pieces, and I would
be happy to run the resulting paper past you guys.
I must confess to not being all that sympathetic to code that takes
advantage of happenstance stack-frame layout. Is there some reason
we need that?
Thanx, Paul
On Tue, Feb 26, 2019 at 06:28:45AM -0800, Paul E. McKenney wrote:
> I must confess to not being all that sympathetic to code that takes
> advantage of happenstance stack-frame layout. Is there some reason
> we need that?
Not that I'm aware; but if it gets this 'obvious' case wrong, I worry
what else it gets wrong.
At the very least we should get this fixed and compile a kernel with the
fixed compiler to see what (if anything) changes in the generated code
and analyse the changes (if any) to make sure we were ok (or not).
I mean; yes that example is UB, but the result is also clearly batshit
insane.
Hi Paul,
On Tue, 26 Feb 2019 06:28:45 -0800, Paul E. McKenney wrote:
> On Tue, Feb 26, 2019 at 02:49:06PM +0100, Peter Zijlstra wrote:
>> On Tue, Feb 26, 2019 at 12:38:13PM +0100, Borislav Petkov wrote:
>>> On Tue, Feb 26, 2019 at 12:30:08PM +0100, Peter Zijlstra wrote:
>>>> When I used the argc variant, gcc-8 'works', but with s/argc/1/ it is
>>>> still broken.
>>>
>>> As requested on IRC:
>>
>> What I asked was if you could get your GCC developer friends to have a
>> look at this :-)
>
> Yes, this all is a bit on the insane side from a kernel viewpoint.
> But the paper you found does not impose this; it has instead been there
> for about 20 years, back before C and C++ admitted to the existence
> of concurrency.
By "it", do you mean the concept of "pointer provenance"?
I'm asking because the paper's header reads:
"ISO/IEC JTC1/SC22/WG14 N2311, 2018-11-09"
Just wanted to make sure.
Thanks, Akira
> But of course compilers are getting more aggressive,
> and yes, some of the problems show up in single-threaded code.
>
> The usual response is "then cast the pointers to intptr_t!" but of
> course that breaks type checking.
>
> There is an effort to claw back the concurrency pieces, and I would
> be happy to run the resulting paper past you guys.
>
> I must confess to not being all that sympathetic to code that takes
> advantage of happenstance stack-frame layout. Is there some reason
> we need that?
>
> Thanx, Paul
>
On Tue, Feb 26, 2019 at 11:56:57PM +0900, Akira Yokosawa wrote:
> Hi Paul,
>
> On Tue, 26 Feb 2019 06:28:45 -0800, Paul E. McKenney wrote:
> > On Tue, Feb 26, 2019 at 02:49:06PM +0100, Peter Zijlstra wrote:
> >> On Tue, Feb 26, 2019 at 12:38:13PM +0100, Borislav Petkov wrote:
> >>> On Tue, Feb 26, 2019 at 12:30:08PM +0100, Peter Zijlstra wrote:
> >>>> When I used the argc variant, gcc-8 'works', but with s/argc/1/ it is
> >>>> still broken.
> >>>
> >>> As requested on IRC:
> >>
> >> What I asked was if you could get your GCC developer friends to have a
> >> look at this :-)
> >
> > Yes, this all is a bit on the insane side from a kernel viewpoint.
> > But the paper you found does not impose this; it has instead been there
> > for about 20 years, back before C and C++ admitted to the existence
> > of concurrency.
>
> By "it", do you mean the concept of "pointer provenance"?
>
> I'm asking because the paper's header reads:
>
> "ISO/IEC JTC1/SC22/WG14 N2311, 2018-11-09"
>
> Just wanted to make sure.
This paper introduces neither pointer provenance nor indeterminate-on-free,
but rather proposes modification. These things have been around for a
few decades.
Thanx, Paul
> Thanks, Akira
>
> > But of course compilers are getting more aggressive,
> > and yes, some of the problems show up in single-threaded code.
> >
> > The usual response is "then cast the pointers to intptr_t!" but of
> > course that breaks type checking.
> >
> > There is an effort to claw back the concurrency pieces, and I would
> > be happy to run the resulting paper past you guys.
> >
> > I must confess to not being all that sympathetic to code that takes
> > advantage of happenstance stack-frame layout. Is there some reason
> > we need that?
> >
> > Thanx, Paul
> >
>
On Tue, Feb 26, 2019 at 06:28:45AM -0800, Paul E. McKenney wrote:
> Yes, this all is a bit on the insane side from a kernel viewpoint.
> But the paper you found does not impose this; it has instead been there
> for about 20 years, back before C and C++ admitted to the existence
> of concurrency. But of course compilers are getting more aggressive,
> and yes, some of the problems show up in single-threaded code.
But that paper is from last year!! It has Peter Sewell on, I'm sure he's
heard of concurrency.
> The usual response is "then cast the pointers to intptr_t!" but of
> course that breaks type checking.
I tried laundering the pointer through intptr_t, but I can't seem to
unbreak it.
root@ivb-ep:~/tmp# gcc-8 -O2 -fno-strict-aliasing -o ptr ptr.c ; ./ptr
p=0x55aacdc80034 q=0x55aacdc80034
x=1 y=2 *p=11 *q=2
root@ivb-ep:~/tmp# cat ptr.c
#include <stdio.h>
#include <string.h>
#include <stdint.h>
int y = 2, x = 1;
int main (int argc, char **argv) {
intptr_t P = (intptr_t)&x;
intptr_t Q = (intptr_t)&y;
P += sizeof(int);
int *q = &y;
printf("p=%p q=%p\n", (int*)P, (int*)Q);
if (P == Q) {
int *p = (int *)P;
*p = 11;
printf("x=%d y=%d *p=%d *q=%d\n", x, y, *p, *q);
}
}
On Tue, 26 Feb 2019 07:04:27 -0800, Paul E. McKenney wrote:
> On Tue, Feb 26, 2019 at 11:56:57PM +0900, Akira Yokosawa wrote:
>> Hi Paul,
>>
>> On Tue, 26 Feb 2019 06:28:45 -0800, Paul E. McKenney wrote:
>>> On Tue, Feb 26, 2019 at 02:49:06PM +0100, Peter Zijlstra wrote:
>>>> On Tue, Feb 26, 2019 at 12:38:13PM +0100, Borislav Petkov wrote:
>>>>> On Tue, Feb 26, 2019 at 12:30:08PM +0100, Peter Zijlstra wrote:
>>>>>> When I used the argc variant, gcc-8 'works', but with s/argc/1/ it is
>>>>>> still broken.
>>>>>
>>>>> As requested on IRC:
>>>>
>>>> What I asked was if you could get your GCC developer friends to have a
>>>> look at this :-)
>>>
>>> Yes, this all is a bit on the insane side from a kernel viewpoint.
>>> But the paper you found does not impose this; it has instead been there
>>> for about 20 years, back before C and C++ admitted to the existence
>>> of concurrency.
>>
>> By "it", do you mean the concept of "pointer provenance"?
>>
>> I'm asking because the paper's header reads:
>>
>> "ISO/IEC JTC1/SC22/WG14 N2311, 2018-11-09"
>>
>> Just wanted to make sure.
>
> This paper introduces neither pointer provenance nor indeterminate-on-free,
> but rather proposes modification. These things have been around for a
> few decades.
Got it!
Thank, Akira
>
> Thanx, Paul
>
>> Thanks, Akira
>>
>>> But of course compilers are getting more aggressive,
>>> and yes, some of the problems show up in single-threaded code.
>>>
>>> The usual response is "then cast the pointers to intptr_t!" but of
>>> course that breaks type checking.
>>>
>>> There is an effort to claw back the concurrency pieces, and I would
>>> be happy to run the resulting paper past you guys.
>>>
>>> I must confess to not being all that sympathetic to code that takes
>>> advantage of happenstance stack-frame layout. Is there some reason
>>> we need that?
>>>
>>> Thanx, Paul
>>>
>>
>
On Tue, Feb 26, 2019 at 03:47:30PM +0100, Peter Zijlstra wrote:
> On Tue, Feb 26, 2019 at 06:28:45AM -0800, Paul E. McKenney wrote:
> > I must confess to not being all that sympathetic to code that takes
> > advantage of happenstance stack-frame layout. Is there some reason
> > we need that?
>
> Not that I'm aware; but if it gets this 'obvious' case wrong, I worry
> what else it gets wrong.
>
> At the very least we should get this fixed and compile a kernel with the
> fixed compiler to see what (if anything) changes in the generated code
> and analyse the changes (if any) to make sure we were ok (or not).
>
> I mean; yes that example is UB, but the result is also clearly batshit
> insane.
My understanding is that their goal is better analysis of pointer
aliasing, and that this case is a side effect.
Thanx, Paul
Hello Peter,
On Tue, 26 Feb 2019 14:49:06 +0100, Peter Zijlstra wrote:
> On Tue, Feb 26, 2019 at 12:38:13PM +0100, Borislav Petkov wrote:
>> On Tue, Feb 26, 2019 at 12:30:08PM +0100, Peter Zijlstra wrote:
>>> When I used the argc variant, gcc-8 'works', but with s/argc/1/ it is
>>> still broken.
>>
>> As requested on IRC:
>
> What I asked was if you could get your GCC developer friends to have a
> look at this :-)
>
JFYI, there is a bugzilla ticket regarding this behavior of GCC
at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502,
which started on 13 June 2014 and the latest entry was on
02 Feb 2019.
Thanks, Akira
On Sun, Mar 03, 2019 at 12:27:12AM +0900, Akira Yokosawa wrote:
> Hello Peter,
>
> On Tue, 26 Feb 2019 14:49:06 +0100, Peter Zijlstra wrote:
> > On Tue, Feb 26, 2019 at 12:38:13PM +0100, Borislav Petkov wrote:
> >> On Tue, Feb 26, 2019 at 12:30:08PM +0100, Peter Zijlstra wrote:
> >>> When I used the argc variant, gcc-8 'works', but with s/argc/1/ it is
> >>> still broken.
> >>
> >> As requested on IRC:
> >
> > What I asked was if you could get your GCC developer friends to have a
> > look at this :-)
>
> JFYI, there is a bugzilla ticket regarding this behavior of GCC
> at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502,
> which started on 13 June 2014 and the latest entry was on
> 02 Feb 2019.
And that bug was submitted by none other than Peter Sewell, thank you!
This bug references https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88775,
where someone appears to -want- this sort of optimization. :-/
Thanx, Paul
On Tue, 26 Feb 2019 16:04:50 +0100, Peter Zijlstra wrote:
> On Tue, Feb 26, 2019 at 06:28:45AM -0800, Paul E. McKenney wrote:
>
>> Yes, this all is a bit on the insane side from a kernel viewpoint.
>> But the paper you found does not impose this; it has instead been there
>> for about 20 years, back before C and C++ admitted to the existence
>> of concurrency. But of course compilers are getting more aggressive,
>> and yes, some of the problems show up in single-threaded code.
>
> But that paper is from last year!! It has Peter Sewell on, I'm sure he's
> heard of concurrency.
>
>> The usual response is "then cast the pointers to intptr_t!" but of
>> course that breaks type checking.
>
> I tried laundering the pointer through intptr_t, but I can't seem to
> unbreak it.
>
>
> root@ivb-ep:~/tmp# gcc-8 -O2 -fno-strict-aliasing -o ptr ptr.c ; ./ptr
> p=0x55aacdc80034 q=0x55aacdc80034
> x=1 y=2 *p=11 *q=2
> root@ivb-ep:~/tmp# cat ptr.c
> #include <stdio.h>
> #include <string.h>
> #include <stdint.h>
> int y = 2, x = 1;
> int main (int argc, char **argv) {
> intptr_t P = (intptr_t)&x;
> intptr_t Q = (intptr_t)&y;
> P += sizeof(int);
> int *q = &y;
> printf("p=%p q=%p\n", (int*)P, (int*)Q);
> if (P == Q) {
> int *p = (int *)P;
> *p = 11;
> printf("x=%d y=%d *p=%d *q=%d\n", x, y, *p, *q);
> }
> }
>
So, I'm looking at the macro RELOC_HIDE() defined in include/linux/compiler-gcc.h.
It says:
--------
/*
* This macro obfuscates arithmetic on a variable address so that gcc
* shouldn't recognize the original var, and make assumptions about it.
*
* This is needed because the C standard makes it undefined to do
* pointer arithmetic on "objects" outside their boundaries and the
* gcc optimizers assume this is the case. In particular they
* assume such arithmetic does not wrap.
*
[...]
*/
#define RELOC_HIDE(ptr, off) \
({ \
unsigned long __ptr; \
__asm__ ("" : "=r"(__ptr) : "0"(ptr)); \
(typeof(ptr)) (__ptr + (off)); \
})
--------
Looks like this macro has existed ever since the origin of Linus' git repo.
And the optimization "bug" discussed in this thread can be suppressed by
this macro.
For example,
$ gcc -O2 -o reloc_hide reloc_hide.c; ./reloc_hide
x=1 y=11 *p=11 *q=11
$ cat reloc_hide.c
#include <stdio.h>
#include <stdint.h>
#define RELOC_HIDE(ptr, off) \
({ \
uintptr_t __ptr; \
__asm__ ("" : "=r"(__ptr) : "0"(ptr)); \
(typeof(ptr)) (__ptr + (off)); \
})
int y = 2, x = 1;
int main (int argc, char **argv) {
int *p = RELOC_HIDE(&x, sizeof(*p));
int *q = RELOC_HIDE(&y, 0);
if (p == q) {
*p = 11;
printf("x=%d y=%d *p=%d *q=%d\n", x, y, *p, *q);
}
}
Note that "uintptr_t" is used in this version of RELOC_HIDE() for user-land
code.
Am I the only one who was not aware of this gcc-specific macro?
Thanks, Akira
On Thu, Mar 07, 2019 at 12:46:05AM +0900, Akira Yokosawa wrote:
> So, I'm looking at the macro RELOC_HIDE() defined in include/linux/compiler-gcc.h.
> Am I the only one who was not aware of this gcc-specific macro?
It's one I regularly see, but had forgotten about in this context.
However; you can also fix things by adding asm volatile ("":::"memory");
in places.
But that's not really the point; I would really rather have a cmdline
knob to fix things. That way we can compile the kernel with and without
and look for differences. -fno-unicorns or something :-)
While I understand some compiler people revel in UB and love to make
unicorns happen, I think in this case the produces result is utterly
insane.
On Wed, Mar 06, 2019 at 05:58:31PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 07, 2019 at 12:46:05AM +0900, Akira Yokosawa wrote:
> > So, I'm looking at the macro RELOC_HIDE() defined in include/linux/compiler-gcc.h.
>
> > Am I the only one who was not aware of this gcc-specific macro?
>
> It's one I regularly see, but had forgotten about in this context.
>
> However; you can also fix things by adding asm volatile ("":::"memory");
> in places.
>
> But that's not really the point; I would really rather have a cmdline
> knob to fix things. That way we can compile the kernel with and without
> and look for differences. -fno-unicorns or something :-)
>
> While I understand some compiler people revel in UB and love to make
> unicorns happen, I think in this case the produces result is utterly
> insane.
Because I couldn't resist...
https://raphlinus.github.io/assets/Anything_is_Possible_scaled.jpg
Thanx, Paul
On Thu, Mar 07, 2019 at 12:46:05AM +0900, Akira Yokosawa wrote:
> On Tue, 26 Feb 2019 16:04:50 +0100, Peter Zijlstra wrote:
> > On Tue, Feb 26, 2019 at 06:28:45AM -0800, Paul E. McKenney wrote:
> >
> >> Yes, this all is a bit on the insane side from a kernel viewpoint.
> >> But the paper you found does not impose this; it has instead been there
> >> for about 20 years, back before C and C++ admitted to the existence
> >> of concurrency. But of course compilers are getting more aggressive,
> >> and yes, some of the problems show up in single-threaded code.
> >
> > But that paper is from last year!! It has Peter Sewell on, I'm sure he's
> > heard of concurrency.
> >
> >> The usual response is "then cast the pointers to intptr_t!" but of
> >> course that breaks type checking.
> >
> > I tried laundering the pointer through intptr_t, but I can't seem to
> > unbreak it.
> >
> >
> > root@ivb-ep:~/tmp# gcc-8 -O2 -fno-strict-aliasing -o ptr ptr.c ; ./ptr
> > p=0x55aacdc80034 q=0x55aacdc80034
> > x=1 y=2 *p=11 *q=2
> > root@ivb-ep:~/tmp# cat ptr.c
> > #include <stdio.h>
> > #include <string.h>
> > #include <stdint.h>
> > int y = 2, x = 1;
> > int main (int argc, char **argv) {
> > intptr_t P = (intptr_t)&x;
> > intptr_t Q = (intptr_t)&y;
> > P += sizeof(int);
> > int *q = &y;
> > printf("p=%p q=%p\n", (int*)P, (int*)Q);
> > if (P == Q) {
> > int *p = (int *)P;
> > *p = 11;
> > printf("x=%d y=%d *p=%d *q=%d\n", x, y, *p, *q);
> > }
> > }
> >
>
> So, I'm looking at the macro RELOC_HIDE() defined in include/linux/compiler-gcc.h.
>
> It says:
>
> --------
> /*
> * This macro obfuscates arithmetic on a variable address so that gcc
> * shouldn't recognize the original var, and make assumptions about it.
> *
> * This is needed because the C standard makes it undefined to do
> * pointer arithmetic on "objects" outside their boundaries and the
> * gcc optimizers assume this is the case. In particular they
> * assume such arithmetic does not wrap.
> *
> [...]
> */
> #define RELOC_HIDE(ptr, off) \
> ({ \
> unsigned long __ptr; \
> __asm__ ("" : "=r"(__ptr) : "0"(ptr)); \
> (typeof(ptr)) (__ptr + (off)); \
> })
> --------
>
> Looks like this macro has existed ever since the origin of Linus' git repo.
>
> And the optimization "bug" discussed in this thread can be suppressed by
> this macro.
>
> For example,
>
> $ gcc -O2 -o reloc_hide reloc_hide.c; ./reloc_hide
> x=1 y=11 *p=11 *q=11
> $ cat reloc_hide.c
> #include <stdio.h>
> #include <stdint.h>
>
> #define RELOC_HIDE(ptr, off) \
> ({ \
> uintptr_t __ptr; \
> __asm__ ("" : "=r"(__ptr) : "0"(ptr)); \
> (typeof(ptr)) (__ptr + (off)); \
> })
>
> int y = 2, x = 1;
> int main (int argc, char **argv) {
> int *p = RELOC_HIDE(&x, sizeof(*p));
> int *q = RELOC_HIDE(&y, 0);
> if (p == q) {
> *p = 11;
> printf("x=%d y=%d *p=%d *q=%d\n", x, y, *p, *q);
> }
> }
>
> Note that "uintptr_t" is used in this version of RELOC_HIDE() for user-land
> code.
>
> Am I the only one who was not aware of this gcc-specific macro?
I have seen it before, but had forgotten it. ;-)
But people on the committee seem to agree that inline assembly should
"launder" pointers, along with atomic and volatile accesses. The case
of revalidating pointers fetched during a previous critical section for
a given lock is very much in play, but then again, we don't have any
known good use cases identified.
Thanx, Paul