2011-02-15 00:43:15

by Paul E. McKenney

Subject: Re: [PATCH 0/2] jump label: 2.6.38 updates

On Mon, Feb 14, 2011 at 06:29:47PM -0500, Mathieu Desnoyers wrote:
> [ added Segher Boessenkool and Paul Mackerras to CC list ]
>
> * Paul E. McKenney ([email protected]) wrote:
> > On Mon, Feb 14, 2011 at 06:03:01PM -0500, Mathieu Desnoyers wrote:
> > > * Matt Fleming ([email protected]) wrote:
> > > > On Mon, 14 Feb 2011 13:46:00 -0800 (PST)
> > > > David Miller <[email protected]> wrote:
> > > >
> > > > > From: Steven Rostedt <[email protected]>
> > > > > Date: Mon, 14 Feb 2011 16:39:36 -0500
> > > > >
> > > > > > Thus it is not about global, as global is updated by normal means
> > > > > > and will update the caches. atomic_t is updated via the ll/sc that
> > > > > > ignores the cache and causes all this to break down. IOW... broken
> > > > > > hardware ;)
> > > > >
> > > > > I don't see how cache coherency can possibly work if the hardware
> > > > > behaves this way.
> > > >
> > > > Cache coherency is still maintained provided writes/reads both go
> > > > through the cache ;-)
> > > >
> > > > The problem is that for read-modify-write operations the arbitration
> > > > logic that decides who "wins" and is allowed to actually perform the
> > > > write, assuming two or more CPUs are competing for a single memory
> > > > address, is not implemented in the cache controller, I think. I'm not a
> > > > hardware engineer and I never understood how the arbitration logic
> > > > worked but I'm guessing that's the reason that the ll/sc instructions
> > > > bypass the cache.
> > > >
> > > > Which is why the atomic_t functions worked out really well for that
> > > > arch, such that any accesses to an atomic_t * had to go through the
> > > > wrapper functions.
> >
> > ???
> >
> > What CPU family are we talking about here? For cache coherent CPUs,
> > cache coherence really is supposed to work, even for mixed atomic and
> > non-atomic instructions to the same variable.
> >
>
> I'm really curious to know which CPU families too. I've used git blame
> to see where these lwz/stw instructions were added to powerpc, and it
> points to:

But lwz and stw instructions are normal non-atomic PowerPC loads and
stores. No LL/SC -- those would instead be lwarx and stwcx.
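
To illustrate the distinction, here is a minimal sketch in portable C11. It is not code from the kernel or from this thread, and the names my_atomic_t, my_atomic_read(), my_atomic_set() and my_atomic_add_return() are hypothetical stand-ins for the kernel's atomic_t wrappers. The assumption is that a PowerPC compiler typically turns the relaxed load and store into plain lwz/stw, and the compare-and-exchange retry loop into a lwarx/stwcx. sequence; the exact instructions depend on compiler and target.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's atomic_t. */
typedef struct {
	atomic_int counter;
} my_atomic_t;

/*
 * Plain read: a single relaxed load. No reservation is taken, so on
 * PowerPC this is expected to be an ordinary load (lwz), not lwarx.
 */
static inline int my_atomic_read(my_atomic_t *v)
{
	return atomic_load_explicit(&v->counter, memory_order_relaxed);
}

/* Plain write: a single relaxed store (stw on PowerPC, not stwcx.). */
static inline void my_atomic_set(my_atomic_t *v, int i)
{
	atomic_store_explicit(&v->counter, i, memory_order_relaxed);
}

/*
 * Read-modify-write: this is where LL/SC comes in. The weak
 * compare-exchange may fail if another CPU touched the location
 * between the load and the store, in which case 'old' is refreshed
 * with the current value and the loop retries.
 */
static inline int my_atomic_add_return(my_atomic_t *v, int a)
{
	int old = atomic_load_explicit(&v->counter, memory_order_relaxed);

	while (!atomic_compare_exchange_weak_explicit(&v->counter, &old,
						      old + a,
						      memory_order_relaxed,
						      memory_order_relaxed))
		; /* 'old' now holds the latest value; try again */

	return old + a;
}
```

Only the read-modify-write path needs the reservation-based retry loop; the read and set paths are single accesses that a cache-coherent machine handles like any other load or store to the same variable.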

Thanx, Paul

> commit 9f0cbea0d8cc47801b853d3c61d0e17475b0cc89
> Author: Segher Boessenkool <[email protected]>
> Date: Sat Aug 11 10:15:30 2007 +1000
>
> [POWERPC] Implement atomic{, 64}_{read, write}() without volatile
>
> Instead, use asm() like all other atomic operations already do.
>
> Also use inline functions instead of macros; this actually
> improves code generation (some code becomes a little smaller,
> probably because of improved alias information -- just a few
> hundred bytes total on a default kernel build, nothing shocking).
>
> Signed-off-by: Segher Boessenkool <[email protected]>
> Signed-off-by: Paul Mackerras <[email protected]>
>
> So let's ping the relevant people to see if there was any reason for
> making these atomic read/set operations different from other
> architectures in the first place.
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com


2011-02-15 00:51:39

by Mathieu Desnoyers

Subject: Re: [PATCH 0/2] jump label: 2.6.38 updates

* Paul E. McKenney ([email protected]) wrote:
> On Mon, Feb 14, 2011 at 06:29:47PM -0500, Mathieu Desnoyers wrote:
> > [ added Segher Boessenkool and Paul Mackerras to CC list ]
> >
> > * Paul E. McKenney ([email protected]) wrote:
> > > On Mon, Feb 14, 2011 at 06:03:01PM -0500, Mathieu Desnoyers wrote:
> > > > * Matt Fleming ([email protected]) wrote:
> > > > > On Mon, 14 Feb 2011 13:46:00 -0800 (PST)
> > > > > David Miller <[email protected]> wrote:
> > > > >
> > > > > > From: Steven Rostedt <[email protected]>
> > > > > > Date: Mon, 14 Feb 2011 16:39:36 -0500
> > > > > >
> > > > > > > Thus it is not about global, as global is updated by normal means
> > > > > > > and will update the caches. atomic_t is updated via the ll/sc that
> > > > > > > ignores the cache and causes all this to break down. IOW... broken
> > > > > > > hardware ;)
> > > > > >
> > > > > > I don't see how cache coherency can possibly work if the hardware
> > > > > > behaves this way.
> > > > >
> > > > > Cache coherency is still maintained provided writes/reads both go
> > > > > through the cache ;-)
> > > > >
> > > > > The problem is that for read-modify-write operations the arbitration
> > > > > logic that decides who "wins" and is allowed to actually perform the
> > > > > write, assuming two or more CPUs are competing for a single memory
> > > > > address, is not implemented in the cache controller, I think. I'm not a
> > > > > hardware engineer and I never understood how the arbitration logic
> > > > > worked but I'm guessing that's the reason that the ll/sc instructions
> > > > > bypass the cache.
> > > > >
> > > > > Which is why the atomic_t functions worked out really well for that
> > > > > arch, such that any accesses to an atomic_t * had to go through the
> > > > > wrapper functions.
> > >
> > > ???
> > >
> > > What CPU family are we talking about here? For cache coherent CPUs,
> > > cache coherence really is supposed to work, even for mixed atomic and
> > > non-atomic instructions to the same variable.
> > >
> >
> > I'm really curious to know which CPU families too. I've used git blame
> > to see where these lwz/stw instructions were added to powerpc, and it
> > points to:
>
> But lwz and stw instructions are normal non-atomic PowerPC loads and
> stores. No LL/SC -- those would instead be lwarx and stwcx.

Ah, right. Color me confused ;) I think Matt was talking about a secret
"out of tree" architecture. It sure feels like a James Bond movie. :)

Thanks,

Mathieu

>
> Thanx, Paul
>
> > commit 9f0cbea0d8cc47801b853d3c61d0e17475b0cc89
> > Author: Segher Boessenkool <[email protected]>
> > Date: Sat Aug 11 10:15:30 2007 +1000
> >
> > [POWERPC] Implement atomic{, 64}_{read, write}() without volatile
> >
> > Instead, use asm() like all other atomic operations already do.
> >
> > Also use inline functions instead of macros; this actually
> > improves code generation (some code becomes a little smaller,
> > probably because of improved alias information -- just a few
> > hundred bytes total on a default kernel build, nothing shocking).
> >
> > Signed-off-by: Segher Boessenkool <[email protected]>
> > Signed-off-by: Paul Mackerras <[email protected]>
> >
> > So let's ping the relevant people to see if there was any reason for
> > making these atomic read/set operations different from other
> > architectures in the first place.
> >
> > Thanks,
> >
> > Mathieu
> >
> > --
> > Mathieu Desnoyers
> > Operating System Efficiency R&D Consultant
> > EfficiOS Inc.
> > http://www.efficios.com

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com