2009-04-19 19:46:17

by Robert P. J. Day

[permalink] [raw]
Subject: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars


from arch/x86/Kconfig:
...
select HAVE_READQ
select HAVE_WRITEQ
...

yet there are no such defined Kconfig vars anywhere. thoughts?

rday
-


========================================================================
Robert P. J. Day Waterloo, Ontario, CANADA

Linux Consulting, Training and Annoying Kernel Pedantry.

Web page: http://crashcourse.ca
Linked In: http://www.linkedin.com/in/rpjday
Twitter: http://twitter.com/rpjday
========================================================================


2009-04-19 21:12:20

by Roland Dreier

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

> from arch/x86/Kconfig:
> ...
> select HAVE_READQ
> select HAVE_WRITEQ
> ...
>
> yet there are no such defined Kconfig vars anywhere. thoughts?

git blame shows that this came in from 2c5643b1 ("x86: provide
readq()/writeq() on 32-bit too"). And that commit looks very dubious
indeed to me -- it defines readq() and writeq() in a way that is not
atomic and probably won't generate single 64-bit bus cycles.

Now, many drivers do "#ifndef readq <define my own implementation> #endif"
but exactly what is required is very hardware-dependent, and accessing
32-bit halves in the wrong order may lead to very subtle bugs. For
example, the changelog for e23a59e1 ("niu: Fix readq implementation when
architecture does not provide one.") says:

In particular one of the issues is whether the top 32-bits
or the bottom 32-bits of the 64-bit register should be read
first. There could be side effects, and in fact that is
exactly the problem here.

By coincidence, the 32-bit x86 implementation is actually OK for niu,
but I didn't audit every similar driver, and I don't think any
implementation of readq()/writeq() that generates multiple bus cycles is
suitable in general -- it doesn't meet the requirements of the API.

So I would strongly suggest reverting 2c5643b1 since as far as I can
tell it just sets a trap for subtle bugs that only show up on 32-bit
x86 -- any portable driver still needs to provide readq()/writeq() for
other 32-bit architectures, so it doesn't really help anyone.

- R.

2009-04-19 21:46:44

by Ingo Molnar

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars


* Roland Dreier <[email protected]> wrote:

> > from arch/x86/Kconfig:
> > ...
> > select HAVE_READQ
> > select HAVE_WRITEQ
> > ...
> >
> > yet there are no such defined Kconfig vars anywhere. thoughts?
>
> git blame shows that this came in from 2c5643b1 ("x86: provide
> readq()/writeq() on 32-bit too"). And that commit looks very
> dubious indeed to me -- it defines readq() and writeq() in a way
> that is not atomic and probably won't generate single 64-bit bus
> cycles.

Look at the drivers that define their own wrappers:

#ifndef readq
static inline unsigned long long readq(void __iomem *addr)
{
return readl(addr) | (((unsigned long long)readl(addr + 4)) << 32LL);
}
#endif

... it's the obvious 32-bit semantics for reading a 64-bit value
from an mmio address. We made that available on 32-bit too.

It's being used ... and has been in use for some time. Where's the
problem? readl is serializing on all default-ioremap mmio targets on
x86 so there's no ambiguity in ordering.

> Now, many drivers do "#ifndef readq <define my own implementation>
> #endif" but exactly what is required is very hardware-dependent,
> and accessing 32-bit halves in the wrong order may lead to very
> subtle bugs. For example, the changelog for e23a59e1 ("niu: Fix
> readq implementation when architecture does not provide one.")
> says:
>
> In particular one of the issues is whether the top 32-bits
> or the bottom 32-bits of the 64-bit register should be read
> first. There could be side effects, and in fact that is
> exactly the problem here.
>
> By coincidence, the 32-bit x86 implementation is actually OK for
> niu, but I didn't audit every similar driver, and I don't think
> any implementation of readq()/writeq() that generates multiple bus
> cycles is suitable in general -- it doesn't meet the requirements
> of the API.
>
> So I would strongly suggest reverting 2c5643b1 since as far as I
> can tell it just sets a trap for subtle bugs that only show up on
> 32-bit x86 [...]

Heh. It "only" shows up on the platform that ~80% of all our kernel
testers use? ;-)

> [...] -- any portable driver still needs to provide
> readq()/writeq() for other 32-bit architectures, so it doesn't
> really help anyone.

So, are you arguing for a per driver definition of readq/writeq? If
so then that does not make much technical sense. If not ... then
what is your technical point?

Your complains are unfounded i think:

Firstly, any such bug would have shown up already.

Secondly, currently there's about 4 drivers in mainline that define
readq/writeq methods: one of them is a very popular x86 driver and
works fine, the other one is niu.c that you just mentioned is fine -
the other two are for PARISC.

Thirdly, a driver simply has no business defining such a low-level
hardware API that just happens not to be implemented on 32-bit (but
is implemented on 64-bit).

Unless you can point to a real breakage that happened in the past ~6
months since this commit has been upstream, i'm not sure what the
fuss is about. Drivers found a hole in our APIs and filled it in an
ad-hoc way - we plugged the hole on x86.

Pretty much the only valid argument to make here is that it should
be filled in on all the other platforms as well - but you cant blame
the x86 commit for that, can you?

So the only beef i have with that commit at the moment is that the
HAVE_READQ / HAVE_WRITEQ Kconfig symbols are still orphan - this
stuff should have been implemented in a tree-wide way long ago.
Note, there's ongoing work regarding that in -mm - i saw related
threads about that, recently. Obviously it take a lot of time to
propagate something across a space of 20 architectures.

Ingo

2009-04-19 22:02:51

by H. Peter Anvin

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

Ingo Molnar wrote:
>
> Look at the drivers that define their own wrappers:
>
> #ifndef readq
> static inline unsigned long long readq(void __iomem *addr)
> {
> return readl(addr) | (((unsigned long long)readl(addr + 4)) << 32LL);
> }
> #endif
>
> ... it's the obvious 32-bit semantics for reading a 64-bit value
> from an mmio address. We made that available on 32-bit too.
>
> It's being used ... and has been in use for some time. Where's the
> problem? readl is serializing on all default-ioremap mmio targets on
> x86 so there's no ambiguity in ordering.
>

I think his point is that they're not atomic. For what it's worth,
atomic readq()/writeq() *are* possible with any x86-32 CPU which
supports MMX, but it is very costly to do in the kernel since it
involves touching the FPU state.

For the vast number of users, a non-atomic primitive which is available
for both 32- and 64-bit x86 is a win. For a small number of users,
it'll be confusing, and for a very small minority it's going to be
desirable to have the atomic primitive.

The reason the non-atomic is generally fine is because most device
drivers are inherently single-threaded on a per-hardware device basis.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-04-19 22:35:43

by Ingo Molnar

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars


* H. Peter Anvin <[email protected]> wrote:

> Ingo Molnar wrote:
> >
> > Look at the drivers that define their own wrappers:
> >
> > #ifndef readq
> > static inline unsigned long long readq(void __iomem *addr)
> > {
> > return readl(addr) | (((unsigned long long)readl(addr + 4)) << 32LL);
> > }
> > #endif
> >
> > ... it's the obvious 32-bit semantics for reading a 64-bit value
> > from an mmio address. We made that available on 32-bit too.
> >
> > It's being used ... and has been in use for some time. Where's
> > the problem? readl is serializing on all default-ioremap mmio
> > targets on x86 so there's no ambiguity in ordering.
>
> I think his point is that they're not atomic. [...]

Ok - i didnt really consider atomicity, because that's not really
feasible for a number of reasons anyway:

> [...] For what it's worth, atomic readq()/writeq() *are* possible
> with any x86-32 CPU which supports MMX, but it is very costly to
> do in the kernel since it involves touching the FPU state.
>
> For the vast number of users, a non-atomic primitive which is
> available for both 32- and 64-bit x86 is a win. For a small
> number of users, it'll be confusing, and for a very small minority
> it's going to be desirable to have the atomic primitive.
>
> The reason the non-atomic is generally fine is because most device
> drivers are inherently single-threaded on a per-hardware device
> basis.

Also, atomicity might not be possible to guarantee on the bus level:
say the device sits on a 32-bit PCI bus. (No matter what instruction
the CPU gets, a readq/writeq there has to be done as two 32-bit bus
accesses.)

(Also, even a genuine 64-bit device might be bridged over 32-bit
pathways so a driver cannot really assume atomicity on that level.)

Ingo

2009-04-20 00:54:12

by Roland Dreier

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

> Look at the drivers that define their own wrappers:
>
> #ifndef readq
> static inline unsigned long long readq(void __iomem *addr)
> {
> return readl(addr) | (((unsigned long long)readl(addr + 4)) << 32LL);
> }
> #endif
>
> ... it's the obvious 32-bit semantics for reading a 64-bit value
> from an mmio address. We made that available on 32-bit too.

But look at, say, drivers/infiniband/hw/amso100/c2.h:

#ifndef readq
static inline u64 readq(const void __iomem * addr)
{
u64 ret = readl(addr + 4);
ret <<= 32;
ret |= readl(addr);

return ret;
}
#endif

Notice that it reads from addr+4 *before* it reads from addr, rather
than after as in your example (and in fact your example depends on
undefined compiler semantics, since there is no sequence point between
the two operands of the | operator). Now, I don't know that hardware,
so I don't know if it makes a difference, but the niu example I gave in
my original email shows that given hardware with clear-on-read
registers, the order does very much matter.

In a similar vein, drivers/infiniband/hw/mthca (which I wrote) deals
with hardware that has 64-bit registers, where we can write in two
32-bit chunks, as long as we have the right order and no other writes to
the same page of registers come in between. So on 32-bit architectures,
the driver must use a spinlock around the pair of 32-bit writes (see
drivers/infiniband/hw/mthca/mthca_doorbell.h for the code). And the
simple fact is that if that driver used "#ifdef writeq" (instead of "#if
BITS_PER_LONG == 64" as it actually does) then it would be broken on
32-bit x86 right now.

> > So I would strongly suggest reverting 2c5643b1 since as far as I
> > can tell it just sets a trap for subtle bugs that only show up on
> > 32-bit x86 [...]

> Heh. It "only" shows up on the platform that ~80% of all our kernel
> testers use? ;-)

Well, most of the drivers using readq()/writeq() are probably driving
"high-end" hardware (InfiniBand, 10G ethernet, "enterprise" SCSI) that
is much more tilted to 64-bit architectures. But yes, such bugs would
probably be seen quickly -- but the effort to debug "works on x86-64,
fails on x86-32 under high load" bugs is pretty big, given that the
symptoms of non-atomic access to a 64-bit register are probably pretty
mysterious (you can read about how the niu bug I mentioned was fixed --
it took a while to zero in on the root cause).

> So, are you arguing for a per driver definition of readq/writeq? If
> so then that does not make much technical sense. If not ... then
> what is your technical point?

Yes, I am arguing for exactly that, because dealing with the semantics
of non-atomic access to 64-bit registers involved low-level knowledge of
the specific hardware being driven.

As it stands 32-bit x86 has readq()/writeq() that are subtly different
subtly different from all other 64-bit architectures, in a way that sets
a booby trap for any driver that uses them. So yes I stick to my
original point that the commit that added them for 32-bit x86 should be
reverted.

- R.

2009-04-20 00:57:18

by Roland Dreier

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

> Also, atomicity might not be possible to guarantee on the bus level:
> say the device sits on a 32-bit PCI bus. (No matter what instruction
> the CPU gets, a readq/writeq there has to be done as two 32-bit bus
> accesses.)

Well, the conventional PCI devices I know of with 64-bit registers were
PCI-X cards, keyed so they would only fit into a 64-bit slot. And of
course there is no such thing as 32-bit PCI Express.

> (Also, even a genuine 64-bit device might be bridged over 32-bit
> pathways so a driver cannot really assume atomicity on that level.)

I have never even heard of a system with a 64-bit PCI slot that went
through a 32-bit pathway -- in fact I'm not sure how one could build
that.

But yes, for example on 32-bit PowerPC I don't think it's possible to
generate a 64-bit bus transaction in general. So if a device requires
such a cycle then it simply can't work on such a system. But there is
also the case where racing accesses to other registers must be avoided
(the mthca example I gave in my previous example) where the current
32-bit x86 definition is broken, but it could be fixed in a
driver-specific version that used a spinlock.

- R.

2009-04-20 01:20:59

by H. Peter Anvin

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

Roland Dreier wrote:
>
> Notice that it reads from addr+4 *before* it reads from addr, rather
> than after as in your example (and in fact your example depends on
> undefined compiler semantics, since there is no sequence point between
> the two operands of the | operator). Now, I don't know that hardware,
> so I don't know if it makes a difference, but the niu example I gave in
> my original email shows that given hardware with clear-on-read
> registers, the order does very much matter.
>

At least for x86, the order should be low-high, because that is the
order that those two transactions would be seen on a 32-bit bus
downstream from the CPU if the CPU issued a 64-bit transaction.

The only sane way to handle this as something other than per-driver
hacks would be something like:

#include <linux/io64.h> /* Any 64-bit I/O OK */

#include <linux/io64lh.h> /* Low-high splitting OK */

#include <linux/io64hl.h> /* High-low splitting OK */

#include <linux/io64atomic.h> /* 64-bit I/O must be atomic */

... i.e. letting the driver choose what fallback method it will accept.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-04-20 02:09:13

by Robert Hancock

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

Roland Dreier wrote:
> > Also, atomicity might not be possible to guarantee on the bus level:
> > say the device sits on a 32-bit PCI bus. (No matter what instruction
> > the CPU gets, a readq/writeq there has to be done as two 32-bit bus
> > accesses.)
>
> Well, the conventional PCI devices I know of with 64-bit registers were
> PCI-X cards, keyed so they would only fit into a 64-bit slot. And of
> course there is no such thing as 32-bit PCI Express.

It's quite possible to use most 64-bit PCI-X cards in a 32-bit slot,
with the 64-bit part of the card not plugged into anything. It's
supposed to work (as long as the card can handle the voltage the slot uses).

2009-04-20 10:53:24

by Ingo Molnar

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars


* H. Peter Anvin <[email protected]> wrote:

> Roland Dreier wrote:
> >
> > Notice that it reads from addr+4 *before* it reads from addr, rather
> > than after as in your example (and in fact your example depends on
> > undefined compiler semantics, since there is no sequence point between
> > the two operands of the | operator). Now, I don't know that hardware,
> > so I don't know if it makes a difference, but the niu example I gave in
> > my original email shows that given hardware with clear-on-read
> > registers, the order does very much matter.
> >
>
> At least for x86, the order should be low-high, because that is the
> order that those two transactions would be seen on a 32-bit bus
> downstream from the CPU if the CPU issued a 64-bit transaction.
>
> The only sane way to handle this as something other than per-driver
> hacks would be something like:
>
> #include <linux/io64.h> /* Any 64-bit I/O OK */
>
> #include <linux/io64lh.h> /* Low-high splitting OK */
>
> #include <linux/io64hl.h> /* High-low splitting OK */
>
> #include <linux/io64atomic.h> /* 64-bit I/O must be atomic */
>
> ... i.e. letting the driver choose what fallback method it will accept.

Yeah - with the default being the natural low-high order.

The other argument is that if a driver really wants some rare, oddly
different order it should better define its own method that is not
named in the same (or in a similar) way as an existing generic API.
Otherwise, confusion will ensue.

Ingo

2009-04-20 14:47:21

by Hitoshi Mitake

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

On Mon, Apr 20, 2009 at 19:53, Ingo Molnar <[email protected]> wrote:
>
> * H. Peter Anvin <[email protected]> wrote:
>
>> Roland Dreier wrote:
>> >
>> > Notice that it reads from addr+4 *before* it reads from addr, rather
>> > than after as in your example (and in fact your example depends on
>> > undefined compiler semantics, since there is no sequence point between
>> > the two operands of the | operator). ?Now, I don't know that hardware,
>> > so I don't know if it makes a difference, but the niu example I gave in
>> > my original email shows that given hardware with clear-on-read
>> > registers, the order does very much matter.
>> >
>>
>> At least for x86, the order should be low-high, because that is the
>> order that those two transactions would be seen on a 32-bit bus
>> downstream from the CPU if the CPU issued a 64-bit transaction.
>>
>> The only sane way to handle this as something other than per-driver
>> hacks would be something like:
>>
>> #include <linux/io64.h> ? ? ? ? ? ? ? /* Any 64-bit I/O OK */
>>
>> #include <linux/io64lh.h> ? ? /* Low-high splitting OK */
>>
>> #include <linux/io64hl.h> ? ? /* High-low splitting OK */
>>
>> #include <linux/io64atomic.h> /* 64-bit I/O must be atomic */
>>
>> ... i.e. letting the driver choose what fallback method it will accept.
>
> Yeah - with the default being the natural low-high order.
>
> The other argument is that if a driver really wants some rare, oddly
> different order it should better define its own method that is not
> named in the same (or in a similar) way as an existing generic API.
> Otherwise, confusion will ensue.
I think this is a good way.
readq/writeq are already in Linus's tree, removing these is not a good idea.

And I've sent the patch to fix a little problem of Kconfig about
readq/writeq to you.
http://marc.info/?l=linux-kernel&m=123521109218008&w=2
Did you notice?

Adding cautions about accessing order or non-atomic to Kconfig's help
part may be benefit.

2009-04-20 16:03:51

by Ingo Molnar

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars


* Hitoshi Mitake <[email protected]> wrote:

> On Mon, Apr 20, 2009 at 19:53, Ingo Molnar <[email protected]> wrote:
> >
> > * H. Peter Anvin <[email protected]> wrote:
> >
> >> Roland Dreier wrote:
> >> >
> >> > Notice that it reads from addr+4 *before* it reads from addr, rather
> >> > than after as in your example (and in fact your example depends on
> >> > undefined compiler semantics, since there is no sequence point between
> >> > the two operands of the | operator). ?Now, I don't know that hardware,
> >> > so I don't know if it makes a difference, but the niu example I gave in
> >> > my original email shows that given hardware with clear-on-read
> >> > registers, the order does very much matter.
> >> >
> >>
> >> At least for x86, the order should be low-high, because that is the
> >> order that those two transactions would be seen on a 32-bit bus
> >> downstream from the CPU if the CPU issued a 64-bit transaction.
> >>
> >> The only sane way to handle this as something other than per-driver
> >> hacks would be something like:
> >>
> >> #include <linux/io64.h> ? ? ? ? ? ? ? /* Any 64-bit I/O OK */
> >>
> >> #include <linux/io64lh.h> ? ? /* Low-high splitting OK */
> >>
> >> #include <linux/io64hl.h> ? ? /* High-low splitting OK */
> >>
> >> #include <linux/io64atomic.h> /* 64-bit I/O must be atomic */
> >>
> >> ... i.e. letting the driver choose what fallback method it will accept.
> >
> > Yeah - with the default being the natural low-high order.
> >
> > The other argument is that if a driver really wants some rare, oddly
> > different order it should better define its own method that is not
> > named in the same (or in a similar) way as an existing generic API.
> > Otherwise, confusion will ensue.
> I think this is a good way.
> readq/writeq are already in Linus's tree, removing these is not a good idea.
>
> And I've sent the patch to fix a little problem of Kconfig about
> readq/writeq to you.
> http://marc.info/?l=linux-kernel&m=123521109218008&w=2
> Did you notice?
>
> Adding cautions about accessing order or non-atomic to Kconfig's help
> part may be benefit.

It's better to add add such non-interactive help text as Makefile
comments:

#
# This option ...
#

and they should be invisible in make menuconfig. This is a facility
provided by architectures.

Note, the whole patchset is still incomplete - readq/writeq wrappers
should be provided on all 32-bit architectures. Are those in the
works?

Ingo

2009-04-21 08:33:56

by Hitoshi Mitake

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

On Tue, Apr 21, 2009 at 01:03, Ingo Molnar <[email protected]> wrote:
>
> * Hitoshi Mitake <[email protected]> wrote:
>
>> On Mon, Apr 20, 2009 at 19:53, Ingo Molnar <[email protected]> wrote:
>> >
>> > * H. Peter Anvin <[email protected]> wrote:
>> >
>> >> Roland Dreier wrote:
>> >> >
>> >> > Notice that it reads from addr+4 *before* it reads from addr, rather
>> >> > than after as in your example (and in fact your example depends on
>> >> > undefined compiler semantics, since there is no sequence point between
>> >> > the two operands of the | operator). ?Now, I don't know that hardware,
>> >> > so I don't know if it makes a difference, but the niu example I gave in
>> >> > my original email shows that given hardware with clear-on-read
>> >> > registers, the order does very much matter.
>> >> >
>> >>
>> >> At least for x86, the order should be low-high, because that is the
>> >> order that those two transactions would be seen on a 32-bit bus
>> >> downstream from the CPU if the CPU issued a 64-bit transaction.
>> >>
>> >> The only sane way to handle this as something other than per-driver
>> >> hacks would be something like:
>> >>
>> >> #include <linux/io64.h> ? ? ? ? ? ? ? /* Any 64-bit I/O OK */
>> >>
>> >> #include <linux/io64lh.h> ? ? /* Low-high splitting OK */
>> >>
>> >> #include <linux/io64hl.h> ? ? /* High-low splitting OK */
>> >>
>> >> #include <linux/io64atomic.h> /* 64-bit I/O must be atomic */
>> >>
>> >> ... i.e. letting the driver choose what fallback method it will accept.
>> >
>> > Yeah - with the default being the natural low-high order.
>> >
>> > The other argument is that if a driver really wants some rare, oddly
>> > different order it should better define its own method that is not
>> > named in the same (or in a similar) way as an existing generic API.
>> > Otherwise, confusion will ensue.
>> I think this is a good way.
>> readq/writeq are already in Linus's tree, removing these is not a good idea.
>>
>> And I've sent the patch to fix a little problem of Kconfig about
>> readq/writeq to you.
>> http://marc.info/?l=linux-kernel&m=123521109218008&w=2
>> Did you notice?
>>
>> Adding cautions about accessing order or non-atomic to Kconfig's help
>> part may be benefit.
>
> It's better to add add such non-interactive help text as Makefile
> comments:
>
> #
> # This option ...
> #
>
> and they should be invisible in make menuconfig. This is a facility
> provided by architectures.
I'll move the help text from Kconfig to Makefile.
(My original patch also doesn't make help text visible in make menuconfig.)

>
> Note, the whole patchset is still incomplete - readq/writeq wrappers
> should be provided on all 32-bit architectures. Are those in the
> works?

I'm not working on porting readq/writeq on all 32-bit architectures.
If I port these, HAVE_READQ will be needless. Because there's no reason
to judge that architecture provides readq/writeq.

Porting readq/writeq on all architectures is radical way to solve.
But the problem related to order of accessing and non-atomic still exists.

I think there are 3 ways to choose:

1) Removing readq/writeq from x86_32
This is the way Roland mentioned.
This way removes the bugs related to order of accessing and non-atomic forever.
But driver programmers must implement their own version of readq/writeq,
and Andrew Morton said such case is sucks.
http://marc.info/?l=linux-kernel&m=122625885124798&w=2

2) Adding HAVE_READQ and HAVE_WRITEQ to Kconfigs of architectures
which provides readq/writeq
despite of 32/64bit
This is the nearest with current state of Linux.
But some day non-atomic or order of accessing which driver programmers
didn't expect may cause
subtle bugs.

3) Porting readq/writeq on all architectures despite of 32/64bit
This is a very radical way.
This frees us from the problem of "#ifdef readq <implement driver own
version> #endif"
or HAVE_READQ forever.
But the possibility of subtle bugs caused by non-atomic or order of
accessing still exists.

Which one should we choose?

I suggest 2) (or 3)).
Because there's no problem since ported readq/writeq on x86_32.
And as H. Peter Anvin mentioned non-atomic is generally fine.

2009-04-21 08:46:10

by Ingo Molnar

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars


* Hitoshi Mitake <[email protected]> wrote:

> On Tue, Apr 21, 2009 at 01:03, Ingo Molnar <[email protected]> wrote:
> >
> > * Hitoshi Mitake <[email protected]> wrote:
> >
> >> On Mon, Apr 20, 2009 at 19:53, Ingo Molnar <[email protected]> wrote:
> >> >
> >> > * H. Peter Anvin <[email protected]> wrote:
> >> >
> >> >> Roland Dreier wrote:
> >> >> >
> >> >> > Notice that it reads from addr+4 *before* it reads from addr, rather
> >> >> > than after as in your example (and in fact your example depends on
> >> >> > undefined compiler semantics, since there is no sequence point between
> >> >> > the two operands of the | operator). ?Now, I don't know that hardware,
> >> >> > so I don't know if it makes a difference, but the niu example I gave in
> >> >> > my original email shows that given hardware with clear-on-read
> >> >> > registers, the order does very much matter.
> >> >> >
> >> >>
> >> >> At least for x86, the order should be low-high, because that is the
> >> >> order that those two transactions would be seen on a 32-bit bus
> >> >> downstream from the CPU if the CPU issued a 64-bit transaction.
> >> >>
> >> >> The only sane way to handle this as something other than per-driver
> >> >> hacks would be something like:
> >> >>
> >> >> #include <linux/io64.h> ? ? ? ? ? ? ? /* Any 64-bit I/O OK */
> >> >>
> >> >> #include <linux/io64lh.h> ? ? /* Low-high splitting OK */
> >> >>
> >> >> #include <linux/io64hl.h> ? ? /* High-low splitting OK */
> >> >>
> >> >> #include <linux/io64atomic.h> /* 64-bit I/O must be atomic */
> >> >>
> >> >> ... i.e. letting the driver choose what fallback method it will accept.
> >> >
> >> > Yeah - with the default being the natural low-high order.
> >> >
> >> > The other argument is that if a driver really wants some rare, oddly
> >> > different order it should better define its own method that is not
> >> > named in the same (or in a similar) way as an existing generic API.
> >> > Otherwise, confusion will ensue.
> >> I think this is a good way.
> >> readq/writeq are already in Linus's tree, removing these is not a good idea.
> >>
> >> And I've sent the patch to fix a little problem of Kconfig about
> >> readq/writeq to you.
> >> http://marc.info/?l=linux-kernel&m=123521109218008&w=2
> >> Did you notice?
> >>
> >> Adding cautions about accessing order or non-atomic to Kconfig's help
> >> part may be benefit.
> >
> > It's better to add add such non-interactive help text as Makefile
> > comments:
> >
> > #
> > # This option ...
> > #
> >
> > and they should be invisible in make menuconfig. This is a facility
> > provided by architectures.
> I'll move the help text from Kconfig to Makefile.
> (My original patch also doesn't make help text visible in make menuconfig.)

sorry i meant the Kconfig file - you can put comments in there too.
help text is really meant for things that are interactive.

Ingo

2009-04-21 08:57:24

by Hitoshi Mitake

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

On Tue, Apr 21, 2009 at 17:45, Ingo Molnar <[email protected]> wrote:
>
> * Hitoshi Mitake <[email protected]> wrote:
>
>> On Tue, Apr 21, 2009 at 01:03, Ingo Molnar <[email protected]> wrote:
>> >
>> > * Hitoshi Mitake <[email protected]> wrote:
>> >
>> >> On Mon, Apr 20, 2009 at 19:53, Ingo Molnar <[email protected]> wrote:
>> >> >
>> >> > * H. Peter Anvin <[email protected]> wrote:
>> >> >
>> >> >> Roland Dreier wrote:
>> >> >> >
>> >> >> > Notice that it reads from addr+4 *before* it reads from addr, rather
>> >> >> > than after as in your example (and in fact your example depends on
>> >> >> > undefined compiler semantics, since there is no sequence point between
>> >> >> > the two operands of the | operator). ?Now, I don't know that hardware,
>> >> >> > so I don't know if it makes a difference, but the niu example I gave in
>> >> >> > my original email shows that given hardware with clear-on-read
>> >> >> > registers, the order does very much matter.
>> >> >> >
>> >> >>
>> >> >> At least for x86, the order should be low-high, because that is the
>> >> >> order that those two transactions would be seen on a 32-bit bus
>> >> >> downstream from the CPU if the CPU issued a 64-bit transaction.
>> >> >>
>> >> >> The only sane way to handle this as something other than per-driver
>> >> >> hacks would be something like:
>> >> >>
>> >> >> #include <linux/io64.h> ? ? ? ? ? ? ? /* Any 64-bit I/O OK */
>> >> >>
>> >> >> #include <linux/io64lh.h> ? ? /* Low-high splitting OK */
>> >> >>
>> >> >> #include <linux/io64hl.h> ? ? /* High-low splitting OK */
>> >> >>
>> >> >> #include <linux/io64atomic.h> /* 64-bit I/O must be atomic */
>> >> >>
>> >> >> ... i.e. letting the driver choose what fallback method it will accept.
>> >> >
>> >> > Yeah - with the default being the natural low-high order.
>> >> >
>> >> > The other argument is that if a driver really wants some rare, oddly
>> >> > different order it should better define its own method that is not
>> >> > named in the same (or in a similar) way as an existing generic API.
>> >> > Otherwise, confusion will ensue.
>> >> I think this is a good way.
>> >> readq/writeq are already in Linus's tree, removing these is not a good idea.
>> >>
>> >> And I've sent the patch to fix a little problem of Kconfig about
>> >> readq/writeq to you.
>> >> http://marc.info/?l=linux-kernel&m=123521109218008&w=2
>> >> Did you notice?
>> >>
>> >> Adding cautions about accessing order or non-atomic to Kconfig's help
>> >> part may be benefit.
>> >
>> > It's better to add add such non-interactive help text as Makefile
>> > comments:
>> >
>> > #
>> > # This option ...
>> > #
>> >
>> > and they should be invisible in make menuconfig. This is a facility
>> > provided by architectures.
>> I'll move the help text from Kconfig to Makefile.
>> (My original patch also doesn't make help text visible in make menuconfig.)
>
> sorry i meant the Kconfig file - you can put comments in there too.
> help text is really meant for things that are interactive.

Can I think that I should add help text to both of Kconfig and Makefile?
(I'm not a native English speaker, and I'm not good at English.
I'm sorry if I misunderstand something...)

2009-04-21 15:45:34

by H. Peter Anvin

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

Hitoshi Mitake wrote:
> I suggest 2) (or 3)).
> Because there's no problem since ported readq/writeq on x86_32.
> And as H. Peter Anvin mentioned non-atomic is generally fine.

Okay, I'm going to throw in yet another wrench in the machinery.

There are devices which really only want to use writeq() if it is an
inexpensive (no x86 MMX hacks) atomic operation.

What follows is *NOT* a hypothetical example, I worked on a real device
which behaved this way:

Consider a device with a set of control registers in device memory. You
can issue a whole command sequence at one point, but the command isn't
activated until a write is received with a certain bit set (located in
the high half of the last qword.)

For performance, the device provides its registers mapped to two
different physical addresses, with the intent that the OS will map one
of them write-combining.

Thus, what you want is to perform all writes but the very last one as a
write-combining write; the very last write should be a conventional I/O
write that flushes the write combiners ahead of itself and triggers the
write.

Now, in theory you could use writel() for this, but if you have a
working writeq() [e.g. 64-bit mode] you want to use it.

So it's important that the driver can know when readq/writeq are
emulated at all.

One way to deal with that is the <linux/io64*.h> method, another is with
feature test macros.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-04-21 17:08:07

by Roland Dreier

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

> Okay, I'm going to throw in yet another wrench in the machinery.

> There are devices which really only want to use writeq() if it is an
> inexpensive (no x86 MMX hacks) atomic operation.

Interesting example. It does seem there are several reasonable hardware
design patters where a driver needs to know if writeq() and/or readq()
is atomic.

> One way to deal with that is the <linux/io64*.h> method, another is with
> feature test macros.

A further idea would be to add readq_atomic()/writeq_atomic() that
behave as the current 64-bit versions do: a single instruction,
generates a single bus cycle, etc. Then drivers that really need to
use the full semantics that the 64-bit versions gave can use those,
while drivers that just want a convenient way to read a 64-bit register
can use readq()/writeq().

This only makes sense if we define a 32-bit fallback for
readq()/writeq() for all 32-bit architectures -- in fact it would be
good to do it in asm-generic so that there can be a single
implementation that guarantees that non-atomic versions always do, say,
low 32 bits then high 32 bits. (So eg niu can use the generic version)
And then drivers like drivers/infiniband/hw/mthca can be switched to
"#ifndef writeq_atomic <...hardware specific fallback...>"

However I worry that this just leaves driver authors too much rope.
Choosing readq_atomic() vs. readq() is just one more thing to get wrong.

- R.

2009-04-21 17:20:08

by H. Peter Anvin

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

Roland Dreier wrote:
>
> However I worry that this just leaves driver authors too much rope.
> Choosing readq_atomic() vs. readq() is just one more thing to get wrong.
>

... as is having each driver implementing their own substitutes.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2009-04-21 17:23:48

by Roland Dreier

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

> > However I worry that this just leaves driver authors too much rope.
> > Choosing readq_atomic() vs. readq() is just one more thing to get wrong.

> ... as is having each driver implementing their own substitutes.

Yes, I agree with that. However at least status quo ante (readq/writeq
64-bit only) means that driver authors who use readq/writeq are forced
(by a compile error) to spend a little thought on what 32-bit fallback
they should use.

I guess one possibility is to make readq/writeq the atomic version, and
add readq_nonatomic()/writeq_nonatomic() for 32-bit architectures. Then
it's much more opt-in -- but then that makes the (perhaps) more common
case look a bit uglier.

- R.

2009-04-21 19:10:08

by H. Peter Anvin

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

Roland Dreier wrote:
> > > However I worry that this just leaves driver authors too much rope.
> > > Choosing readq_atomic() vs. readq() is just one more thing to get wrong.
>
> > ... as is having each driver implementing their own substitutes.
>
> Yes, I agree with that. However at least status quo ante (readq/writeq
> 64-bit only) means that driver authors who use readq/writeq are forced
> (by a compile error) to spend a little thought on what 32-bit fallback
> they should use.
>
> I guess one possibility is to make readq/writeq the atomic version, and
> add readq_nonatomic()/writeq_nonatomic() for 32-bit architectures. Then
> it's much more opt-in -- but then that makes the (perhaps) more common
> case look a bit uglier.
>

Do you really expect driver authors to type writeq_nonatomic() for every
register reference?

I think an #include at the top is one thing, but something that
heavyweight for each call site really isn't going to fly.

-hpa

2009-04-21 21:12:15

by Roland Dreier

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

> Do you really expect driver authors to type writeq_nonatomic() for
> every register reference?

No -- that's why I didn't even bring it up at first and why I consider
it ugly.

> I think an #include at the top is one thing, but something that
> heavyweight for each call site really isn't going to fly.

Yeah, I guess that could work, although I do worry that debugging the
wrong choice of #include might be a pain (mysterious symptoms on 32-bit
architectures caused by the name of an include file would be hard to
track).

To be honest I think the status quo ante was not really that bad.

- R.

2009-04-21 21:17:24

by H. Peter Anvin

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

Roland Dreier wrote:
>
> To be honest I think the status quo ante was not really that bad.
>

That I have to vehemently disagree with.

-hpa

2009-04-22 00:26:11

by David Miller

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

From: Roland Dreier <[email protected]>
Date: Tue, 21 Apr 2009 10:07:50 -0700

> This only makes sense if we define a 32-bit fallback for
> readq()/writeq() for all 32-bit architectures -- in fact it would be
> good to do it in asm-generic so that there can be a single
> implementation that guarantees that non-atomic versions always do, say,
> low 32 bits then high 32 bits. (So eg niu can use the generic version)
> And then drivers like drivers/infiniband/hw/mthca can be switched to
> "#ifndef writeq_atomic <...hardware specific fallback...>"

I think if you want to do this right you have to provide two versions
of the 32-bit implementations, when the cpu cannot generate full
64-bit transactions. Especially for readq().

Some devices clear the status bits of a 64-bit register when read, so
it might matter deeply whether the top-half or the bottom-half 32-bits
are read first.

The following are ugly names, but something like "readq_hifirst()"
and "readq_lofirst()" and they just get defined to play "readq()"
in situations where a full 64-bit transaction can be generated by
the cpu.

The driver author has to figure out which is appropriate.

And I'm pretty sure similar high-first/low-first issues can exist
for writeq() as well.

2009-04-22 00:28:18

by David Miller

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

From: "H. Peter Anvin" <[email protected]>
Date: Tue, 21 Apr 2009 12:09:20 -0700

> Do you really expect driver authors to type writeq_nonatomic() for
> every register reference?

They'll write local driver macros, as every driver does
to save typing. It also allows to get rid of the
redundant "foop->regs" in every register access too.

Look at what all of these drivers do:

#define nr64(reg) readq(np->regs + (reg))
#define nw64(reg, val) writeq((val), np->regs + (reg))

So nobody actually types readq() for every register access,
just as they don't type foop->regs for every register access
either :-)

2009-04-22 00:31:40

by David Miller

[permalink] [raw]
Subject: Re: arch/x86/Kconfig selects invalid HAVE_READQ, HAVE_WRITEQ vars

From: "H. Peter Anvin" <[email protected]>
Date: Tue, 21 Apr 2009 14:16:31 -0700

> Roland Dreier wrote:
>> To be honest I think the status quo ante was not really that bad.
>
> That I have to vehemently disagree with.

I have to agree with Roland.

Unless you make it a compile failure, no driver author is going to
spend any amount of time trying to figure out how to deal with this
situation properly.

So in this sense, the current situation works really well.

If you make it just compile and make an arbitrary choice of whether
the top-32bits is read first or not, you're going to end up with
mysterious driver failures that only occur on some machines and the
cause of which won't be determined until after a lot of painful
digging.

This painful debugging experience is eliminated if the driver author
is told with a compile failure that there is an issue to resolve.

And that is what happens right now.

2009-04-28 19:05:27

by Roland Dreier

[permalink] [raw]
Subject: [PATCH] x86: Remove readq()/writeq() on 32-bit

As discussed in <http://lkml.org/lkml/2009/4/19/164> and follow-ups,
readq()/writeq() for 32-bit x86 are implemented as two readl()/writel()
operations. This is not atomic (in the sense that another MMIO
operation from another CPU or thread can be done in the middle of the
two read/writes), and may not access the two halves of the register in
the correct order to work with hardware.

Rather than silently providing a 32-bit fallback that leaves a
possibility for strange driver bugs, it's better to provide readq()
and writeq() only for 64-bit architectures, and have a compile failure
on 32-bit architectures that forces driver authors to think about what
the correct solution is.

This essentially reverts 2c5643b1 ("x86: provide readq()/writeq() on
32-bit too") and follow-on commits. If in the future someone wants to
provide a generic solution for all 32-bit architectures, that's great,
but there's not much point in providing (arguably broken)
implementations for only one architecture, since any portable driver
will have to implement fallbacks for other architectures anyway.

Signed-off-by: Roland Dreier <[email protected]>
---
We never seemed to reach closure on this. I would strongly suggest
merging something like this, and then if someone has a grand plan to
unify all fallbacks, we can add that when it shows up. As it stands,
the x86-32 situation is not progress towards that grand unified plans,
and does nothing that I can tell beyond setting a trap for drivers.

arch/x86/Kconfig | 2 --
arch/x86/include/asm/io.h | 23 ++---------------------
2 files changed, 2 insertions(+), 23 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index df9e885..0be277b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -19,8 +19,6 @@ config X86_64
config X86
def_bool y
select HAVE_AOUT if X86_32
- select HAVE_READQ
- select HAVE_WRITEQ
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_IDE
select HAVE_OPROFILE
diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 7373932..199c7b9 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -51,27 +51,6 @@ build_mmio_write(__writel, "l", unsigned int, "r", )
build_mmio_read(readq, "q", unsigned long, "=r", :"memory")
build_mmio_write(writeq, "q", unsigned long, "r", :"memory")

-#else
-
-static inline __u64 readq(const volatile void __iomem *addr)
-{
- const volatile u32 __iomem *p = addr;
- u32 low, high;
-
- low = readl(p);
- high = readl(p + 1);
-
- return low + ((u64)high << 32);
-}
-
-static inline void writeq(__u64 val, volatile void __iomem *addr)
-{
- writel(val, addr);
- writel(val >> 32, addr+4);
-}
-
-#endif
-
#define readq_relaxed(a) readq(a)

#define __raw_readq(a) readq(a)
@@ -81,6 +60,8 @@ static inline void writeq(__u64 val, volatile void __iomem *addr)
#define readq readq
#define writeq writeq

+#endif
+
/**
* virt_to_phys - map virtual addresses to physical
* @address: address to remap

2009-04-29 05:12:44

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] x86: Remove readq()/writeq() on 32-bit

From: Roland Dreier <[email protected]>
Date: Tue, 28 Apr 2009 12:05:10 -0700

> As discussed in <http://lkml.org/lkml/2009/4/19/164> and follow-ups,
> readq()/writeq() for 32-bit x86 are implemented as two readl()/writel()
> operations. This is not atomic (in the sense that another MMIO
> operation from another CPU or thread can be done in the middle of the
> two read/writes), and may not access the two halves of the register in
> the correct order to work with hardware.
>
> Rather than silently providing a 32-bit fallback that leaves a
> possibility for strange driver bugs, it's better to provide readq()
> and writeq() only for 64-bit architectures, and have a compile failure
> on 32-bit architectures that forces driver authors to think about what
> the correct solution is.
>
> This essentially reverts 2c5643b1 ("x86: provide readq()/writeq() on
> 32-bit too") and follow-on commits. If in the future someone wants to
> provide a generic solution for all 32-bit architectures, that's great,
> but there's not much point in providing (arguably broken)
> implementations for only one architecture, since any portable driver
> will have to implement fallbacks for other architectures anyway.
>
> Signed-off-by: Roland Dreier <[email protected]>

Acked-by: David S. Miller <[email protected]>

2009-04-29 11:57:30

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86: Remove readq()/writeq() on 32-bit


(Linus Cc:-ed)

* David Miller <[email protected]> wrote:

> From: Roland Dreier <[email protected]>
> Date: Tue, 28 Apr 2009 12:05:10 -0700
>
> > As discussed in <http://lkml.org/lkml/2009/4/19/164> and follow-ups,
> > readq()/writeq() for 32-bit x86 are implemented as two readl()/writel()
> > operations. This is not atomic (in the sense that another MMIO
> > operation from another CPU or thread can be done in the middle of the
> > two read/writes), and may not access the two halves of the register in
> > the correct order to work with hardware.
> >
> > Rather than silently providing a 32-bit fallback that leaves a
> > possibility for strange driver bugs, it's better to provide readq()
> > and writeq() only for 64-bit architectures, and have a compile failure
> > on 32-bit architectures that forces driver authors to think about what
> > the correct solution is.
> >
> > This essentially reverts 2c5643b1 ("x86: provide readq()/writeq() on
> > 32-bit too") and follow-on commits. If in the future someone wants to
> > provide a generic solution for all 32-bit architectures, that's great,
> > but there's not much point in providing (arguably broken)
> > implementations for only one architecture, since any portable driver
> > will have to implement fallbacks for other architectures anyway.
> >
> > Signed-off-by: Roland Dreier <[email protected]>
>
> Acked-by: David S. Miller <[email protected]>
[...]
> > We never seemed to reach closure on this. I would strongly
> > suggest merging something like this, and then if someone has a
> > grand plan to unify all fallbacks, we can add that when it shows
> > up. As it stands, the x86-32 situation is not progress towards
> > that grand unified plans, and does nothing that I can tell
> > beyond setting a trap for drivers.

I still have no particularly strong opinion on this - other the
reluctance i expressed already in the previous threads. My arguments
are not reflected (and not addressed) in the changelog AFAICS, so
let me repeat them here:

Firstly, it doesnt really matter in practice because such use is
very rare and non-spinlocked access to IO regions is even rarer.

What caused 2c5643b1 was that right now we have ugly per driver
#defines and inlines for readq/writeq. See:

git grep 'define.*readq' drivers/

for a rough estimation of what the current practices are. The 32-bit
wrapper we added 6 months ago is the obvious implementation on x86
and that it matches existing wrappers.

Atomicity of a 64-bit IO space access on 32-bit platforms, on an
unknown-bitness transport (it might even be a 64-bit PCI device
bridged over a 32-bit bridge) is obviously not guaranteed.

Not even 64-bit-on-64-bit is really guaranteed to be atomic. The
bitness here is what the CPU runs its _own_ code in (and how it
accesses its cached memory space) - it does not transform over to
the uncached IO bus.

So trying to suggest that 64-bit readq/writeq when running on a
64-bit kernel is somehow atomic (or can be made atomic) is really
wrong IMO. The 32-bit wrapper is simply the expression of how the
CPU would do a 64-bit access even in 64-bit mode anyway [if the
transport is 32-bit].

aligned 32-bit access can be assumed atomic to a certain degree by
virtue of PCI being 32 bit or better - but assuming any 64-bit
IO-space read/write atomicity is wrong on many levels.

Driver authors will have to think about it anyway _even on 64-bit_,
regardless of the existence of a 32-bit fallback.

A driver and hw _might_ be quirky and might require atomicity or
might define a different order of access ... but then _that_ driver
should become ugly, not all the others, right?

So my (slight) preference would be to keep the default generic
implementation and not make any atomicity guarantees - we never made
any. _If_ you want atomicity then provide a readq_atomic() /
writeq_atomic() facility, with various higher level checks that make
it sure that the IO transport is really atomic. (i dont see this
happening any time soon for anything else but some really rare
high-end IO transport.)

For the common case of there not being any atomicity assumption on
the driver case it should result in cleaner code. (assuming all
other 64-bit architectures implement a fallback too)

But ... i might be wrong about it, so i've Cc:-ed Linus who usually
has a rather strong opinion about IO APIs. I'll apply the patch if
Linus acks it (or Linus might take it straight out of email).

Thanks,

Ingo

2009-04-29 12:11:48

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] x86: Remove readq()/writeq() on 32-bit

Ingo Molnar wrote:
> (Linus Cc:-ed)
>
> * David Miller <[email protected]> wrote:
>
>> From: Roland Dreier <[email protected]>
>> Date: Tue, 28 Apr 2009 12:05:10 -0700
>>
>>> As discussed in <http://lkml.org/lkml/2009/4/19/164> and follow-ups,
>>> readq()/writeq() for 32-bit x86 are implemented as two readl()/writel()
>>> operations. This is not atomic (in the sense that another MMIO
>>> operation from another CPU or thread can be done in the middle of the
>>> two read/writes), and may not access the two halves of the register in
>>> the correct order to work with hardware.
>>>
>>> Rather than silently providing a 32-bit fallback that leaves a
>>> possibility for strange driver bugs, it's better to provide readq()
>>> and writeq() only for 64-bit architectures, and have a compile failure
>>> on 32-bit architectures that forces driver authors to think about what
>>> the correct solution is.
>>>
>>> This essentially reverts 2c5643b1 ("x86: provide readq()/writeq() on
>>> 32-bit too") and follow-on commits. If in the future someone wants to
>>> provide a generic solution for all 32-bit architectures, that's great,
>>> but there's not much point in providing (arguably broken)
>>> implementations for only one architecture, since any portable driver
>>> will have to implement fallbacks for other architectures anyway.
>>>
>>> Signed-off-by: Roland Dreier <[email protected]>
>> Acked-by: David S. Miller <[email protected]>
> [...]
>>> We never seemed to reach closure on this. I would strongly
>>> suggest merging something like this, and then if someone has a
>>> grand plan to unify all fallbacks, we can add that when it shows
>>> up. As it stands, the x86-32 situation is not progress towards
>>> that grand unified plans, and does nothing that I can tell
>>> beyond setting a trap for drivers.
>
> I still have no particularly strong opinion on this - other the
> reluctance i expressed already in the previous threads. My arguments
> are not reflected (and not addressed) in the changelog AFAICS, so
> let me repeat them here:

I do.


> What caused 2c5643b1 was that right now we have ugly per driver
> #defines and inlines for readq/writeq. See:
>
> git grep 'define.*readq' drivers/
>
> for a rough estimation of what the current practices are. The 32-bit
> wrapper we added 6 months ago is the obvious implementation on x86
> and that it matches existing wrappers.

This is the key...


> So my (slight) preference would be to keep the default generic
> implementation and not make any atomicity guarantees - we never made
> any.

Agreed.

This removal patch is completely pointless, because it moves us
backwards to the point where we had a bunch of drivers defining it.

Why is that any better?

"Forcing driver writers to think" translates in the real world to each
hardware vendor putting the common "#define readq" into their driver.

At least the networking drivers I messed with (until 11/2008) were
always fine with a non-atomic readq.

The x86 kernel 32-bit implementation of readq/writeq is the code that
every hardware vendor otherwise would re-create, when doing a Linux driver.

Jeff


2009-04-29 17:22:04

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH] x86: Remove readq()/writeq() on 32-bit

> What caused 2c5643b1 was that right now we have ugly per driver
> #defines and inlines for readq/writeq. See:

but 2c5643b1 doesn't fix that situation -- a portable driver still needs
the #defines for other 32-bit architectures. And 2c5643b1 isn't really
a particularly good step towards rectifying the situation, since if
every 32-bit architecture follows suit and adds its own compatibility
versions, then we'll want someone to go through and unify them into a
generic implementation. In other words removing the x86 private version
will be part of the work in getting to a final solution.

> Atomicity of a 64-bit IO space access on 32-bit platforms, on an
> unknown-bitness transport (it might even be a 64-bit PCI device
> bridged over a 32-bit bridge) is obviously not guaranteed.

Yes, some platforms may not be able to give true atomicity. eg 32-bit
PowerPC has no instructions that generate 64-bit cycles, even on the CPU
bus. I do think the 32-bit PCI thing is a bit of a red herring, since
eg PCIe devices can rely on a 64-bit bus.

> So trying to suggest that 64-bit readq/writeq when running on a
> 64-bit kernel is somehow atomic (or can be made atomic) is really
> wrong IMO. The 32-bit wrapper is simply the expression of how the
> CPU would do a 64-bit access even in 64-bit mode anyway [if the
> transport is 32-bit].

As far as I know, all 64-bit CPUs doing 64-bit accesses to a PCIe device
(eg the NIC driven by the niu driver) will generate 64-bit bus cycles.

The issue for me is that the benefit of having this compatibility define
is rather minimal, while the cost is potentially high: spending a long
time debugging platform-specific bugs -- the symptoms will not point
immediately to the IO define, of course.

- R.

2009-04-29 17:26:10

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH] x86: Remove readq()/writeq() on 32-bit

> This removal patch is completely pointless, because it moves us
> backwards to the point where we had a bunch of drivers defining it.

No, the current kernel still requires drivers to define it anyway,
because there are tons of 32-bit architectures that are not x86.

And more than that, centralizing the definition makes the API much more
dangerous for driver authors.

> At least the networking drivers I messed with (until 11/2008) were
> always fine with a non-atomic readq.

The commit to niu I keep citing (e23a59e1, "niu: Fix readq
implementation when architecture does not provide one.") shows that
drivers need to take care. Now, the x86 implementation would happen to
work for that hardware, but eg drivers/infiniband/hw/amso1100 defines
readq with the opposite order -- whether that's required or just an
arbitrary choice, I don't know. And drivers/infiniband/hw/mthca has
some uses of __raw_writeq() that only work if no other CPU accesses to
the same page can happen between the two halves, so it adds a per-page
spinlock for 32-bit architectures. etc.

- R.

2009-04-29 20:01:03

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] x86: Remove readq()/writeq() on 32-bit

Roland Dreier wrote:
> > This removal patch is completely pointless, because it moves us
> > backwards to the point where we had a bunch of drivers defining it.
>
> No, the current kernel still requires drivers to define it anyway,
> because there are tons of 32-bit architectures that are not x86.

Then let's fix that issue... by propagating the common definition to
other platforms that properly implement {read,write}[bwl] in terms of
the PCI bus.


> And more than that, centralizing the definition makes the API much more
> dangerous for driver authors.

I think that's really cranking the hyperbole level to 11.

The common definition is... the one found most commonly in the wild.
For weird drivers, they will do their own thing.

That's pretty much how other drivers handle things.

Apply your logic here to _any_ API in the kernel, for the same result.


> > At least the networking drivers I messed with (until 11/2008) were
> > always fine with a non-atomic readq.
>
> The commit to niu I keep citing (e23a59e1, "niu: Fix readq
> implementation when architecture does not provide one.") shows that
> drivers need to take care. Now, the x86 implementation would happen to

That commit also shows that, had the driver been using a common
definition, problems would not have arisen.


> work for that hardware, but eg drivers/infiniband/hw/amso1100 defines
> readq with the opposite order -- whether that's required or just an

'required' seems unlikely, given that

a) their readq only exists when #ifndef readq, thus implying the
driver-local readq is far less tested, on their most-tested,
highest-volume platform.

b) their readq still operates in LE order -- as it should:
read,write[bwl] were defined in terms of PCI originally, and thus
defined to be LE.

c) their __raw_writeq writes in lower-32-bits-first, as one would expect


> arbitrary choice, I don't know. And drivers/infiniband/hw/mthca has
> some uses of __raw_writeq() that only work if no other CPU accesses to
> the same page can happen between the two halves, so it adds a per-page
> spinlock for 32-bit architectures. etc.

Any use of __raw_xxx implies that You Know What You're Doing And Accept
The Consequences. __raw_xxx means _you_ handle endian conversions,
barriers, and other arch-specific details. I don't think that a driver
intentionally using the "raw" APIs is a good source of ideas and
generalizations.

So, for your three examples,

1) niu - common definition is OK

2) amso1100 - common definition is OK; driver-local definition
never used on common PCI platforms

3) mthca - intentionally uses raw API, an API which ditches
arch-specific barriers, endian conversions, and other
guarantees.

Given that, I see zero justification for API removal. I see
justification for propagating this code to other PCI-capable platforms.

Finally, I think given all this time we've had driver-define writeq and
readq, and "driver authors were forced to think about this API" -- the
result was the obvious definition now in place!

Jeff