LinuxLists.cc - lib/iomap.c mmio_{in,out}s* vs. __raw

2006-11-04 07:53:20

Subject: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

I'm implementing an optional mecanism that some platforms can use on
powerpc to hook the various MMIO and PIO operations (to handle things
ranging from iSeries-like MMIO via HyperVisor stuff to platform specific
errata on the PCI bus).

When that mecanism is enabled, I stop using arch/powerpc's iomap
implementation and fallback instead on the generic one in lib/iomap.c
since by using the standard accessors, it will directly benefit from the
hooks.

However, I noticed the following:

static inline void mmio_insb(void __iomem *addr, u8 *dst, int count)
{
while (--count >= 0) {
u8 data = __raw_readb(addr);
*dst = data;
dst++;
}
}

etc...

The problem is that the current implementation, at least on powerpc, of
the "raw" accessors is to both not swap _and_ have no barriers.

I don't want to dig back out the big discussion we had about in-order
vs. "relaxed" IO operations and my proposed document about it that's
currently sitting in limbo mostly due to apparent lack of interest to
fix the issue on other parts than mine (that's ok though, I'm not
complaining, I'll come back and push when I get some more time and moti
vation for it :)

However, there is some issue here which I think is that the "raw"
accessors are badly defined.

In fact, I would be very very very much in favor of, instead of the
above, defining a set of:

readsb, readsw, readsl, readsq
writesb, writesw, writesl, writesq

And have lib/iomap.c use those. That has several benefits, though two
pop off the top of my mind at the moment:

- Consistent naming, which is very useful, especially with my new
hookable mecanism where I generate things with macros :) But still,
appart from that, consistent naming is I think a big bonus to the users
of those APIs.

- The arch is responsible from implementing them, exactly like the
others readb...q, writeb....q, inb...l, outb...l, and thus is in precise
control of which barriers are necessary, while still being able to use
the generic iomap because it's handy and avoids having to re-implement
everything duplicate.

Now, of course, currently no arch implement them :-)

So I'd like to propose that we do that. I can put together a "default"
implementation that everybody uses based on __raw (assuming that works
fine for a lot of archs) and have powerpc do it's "correct" one.

Oh, also, the insb...insl etc.. versions of those are generally not
strongly types (buffer is a void *) while the iomap are (buffer is a
pointer to the native type of the operation: u8...u32). What do you guys
prefer ? I tend to personally prefer void * for IO related bits but I
wouldn't fight for that one.

Cheers,
Ben.

2006-11-04 14:06:14

by Russell King

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

On Sat, Nov 04, 2006 at 06:52:41PM +1100, Benjamin Herrenschmidt wrote:
> In fact, I would be very very very much in favor of, instead of the
> above, defining a set of:
>
> readsb, readsw, readsl, readsq
> writesb, writesw, writesl, writesq

ARM already has these. Sounds like a good idea for everyone else to also
implement them. 8)

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-11-04 22:18:47

by Benjamin Herrenschmidt

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

On Sat, 2006-11-04 at 14:06 +0000, Russell King wrote:
> On Sat, Nov 04, 2006 at 06:52:41PM +1100, Benjamin Herrenschmidt wrote:
> > In fact, I would be very very very much in favor of, instead of the
> > above, defining a set of:
> >
> > readsb, readsw, readsl, readsq
> > writesb, writesw, writesl, writesq
>
> ARM already has these. Sounds like a good idea for everyone else to also
> implement them. 8)

Ok, powerpc will have these in 2.6.20 then :-)

I'm tempted to remove those mmio_* things from iomap.c completely. I
need to check who uses them, but in all cases, I don't see what they do
in iomap.c, it's not their place.

Versions that would transparently use MMIO or PIO would make sense. A
pure MMIO implementation doesn't, that has to be arch specific. It makes
the generic iomap suddently non-portable in some ways.

So I think we need to make sure all archs grow readsb,sw,sl etc... and
just have iomap use those for the "transparent" versions.

Ben.

2006-11-04 23:54:49

by Linus Torvalds

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

On Sun, 5 Nov 2006, Benjamin Herrenschmidt wrote:
>
> I'm tempted to remove those mmio_* things from iomap.c completely. I
> need to check who uses them, but in all cases, I don't see what they do
> in iomap.c, it's not their place.

I don't think you understand the point. The point is that a lot of the
tests for whether something is MMIO or PIO can be done _once_, instead of
doing it for every access.

> Versions that would transparently use MMIO or PIO would make sense.

No, they would be idiotic, because we already have those. If you want to
use MMIO or PIO transparently one access at a time, YOU SHOULD NOT USE THE
"STRING" VERSION. You should just use "ioread8()" or something like that.

That _is_ the single-access-does-MMIO-or-PIO-transparently function!

> A pure MMIO implementation doesn't, that has to be arch specific. It makes
> the generic iomap suddently non-portable in some ways.

Whaa? I really don't see what's wrong with the one that is in lib/iomap.c.
But if you want to do your own, go ahead and do so - the whole point of
lib/iomap.c is to be a library that can be used by architectures that can
use the generic functionality. It's all hidden behind
CONFIG_GENERIC_IOMAP.

In other words, this whole thread makes absolutely _zero_ sense. Either
you use those functions or you don't. Trying to change them would be
insane.

> So I think we need to make sure all archs grow readsb,sw,sl etc... and
> just have iomap use those for the "transparent" versions.

Again, totally insane. If you don't want to use GENERIC_IOMAP, don't. But
don't force other architectures to follow that path to insanity.

So, in short:
(a) if the generic library doesn't work for you, stop using it

(b) the whole _point_ of the "repeat" instructions is to avoid doing the
same tests over and over again for an iomem address that won't
change, so doing them in the individual accessor functions would be
_crazy_.

(c) if you want to add #IO barriers to that thing, again, do it _around_
the repeat string, not in the individual accesses. If you need them
on an individual access basis, you're probably better off doing your
own version altogether.

Please explain what is so wrong with the current setup, and please explain
why you'd want to break the obvious "check the address only once!" rule
that makes sense for _any_ architecture that has separate #PIO and #MMIO
address spaces.

Linus

2006-11-05 01:10:35

by Benjamin Herrenschmidt

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

On Sat, 2006-11-04 at 15:52 -0800, Linus Torvalds wrote:
>
> On Sun, 5 Nov 2006, Benjamin Herrenschmidt wrote:
> >
> > I'm tempted to remove those mmio_* things from iomap.c completely. I
> > need to check who uses them, but in all cases, I don't see what they do
> > in iomap.c, it's not their place.
>
> I don't think you understand the point. The point is that a lot of the
> tests for whether something is MMIO or PIO can be done _once_, instead of
> doing it for every access.

.../...

I think we are just not talking about the same thing here... (and are
quickly jumping on big words :-)

Let me rephrase my point, I think we are simply not understanding each
other.

So Yes, iomap is a library that can be replaced, Yes, all you said is
good about testing the type once and doing the "repeat", and that's
exactly what iomap does.

The Generic iomap basically does something around the lines of

- define an IO_COND macro that decides wether an adddress is PIO or
MMIO (and that can even be customized in some ways with
HAVE_ARCH_PIO_SIZE, which is all good).

- implements a set of io{read,write}{8,16,32}{le,be}[_rep] functions
that provide pretty much the complete set of accessors using that macro
such that those accessors operate on both MMIO and PIO provided the
address passed comes pci_iomap/ioport_map.

So far so good, up to this point, we all agree and the world is
beautiful.

The -only- problem I'm talking about is that in order to implement the
above, this IOCOND does something around the line of:

if (pio)
use_pio_accessor()
else
use_mmio_accessor()

It does that, by using the existing, arch provided, accessors for PIO
and MMIO (inX, outX, readX, writeX), which are doing the right thing vs.
barriers etc, so again, all is good and well.... until you hit cases
where such accessors don't exist !

And that's exactly where the shit hits the fan. That is, when the
appropriate accessor don't already exist, it tries to define it's own,
and that's where I have the problem. This is more specifically for the
following cases:

- "be" versions of MMIO accesses. Example:

unsigned int fastcall ioread16be(void __iomem *addr)
{
IO_COND(addr, return inw(port), return be16_to_cpu(__raw_readw(addr)));
}

- repeat versions of MMIO accessors. Example:

static inline void mmio_outsl(void __iomem *addr, const u32 *src, int count)
{
while (--count >= 0) {
__raw_writel(*src, addr);
src++;
}
}

Followed by

void fastcall ioread32_rep(void __iomem *addr, void *dst, unsigned long count)
{
IO_COND(addr, insl(port,dst,count), mmio_insl(addr, dst, count));
}

In both those cases, as you can see, we are using an existing arch
provided accessor for the PIO case which does the right thing. But due
to the lack of standard "stream" (ie. "repeat") accessors provided by
archs for MMIO, we invent one based on __raw_* and due to the lack of
"big endian" standard accessor (see my point below), we invent one based
on be16_to_cpu(__raw*).

The problem with the above is that these "replacement" accessors we make
up for MMIO based on __raw, because they use __raw, don't provide any
barrier _at_all_. They are thus incorrect for some platforms like
PowerPC.

So yes, we can have our own iomap alltogether. In fact, we do right now
and it works fine. It's just that I need to do this specific option for
powerpc where IO accessors can be hooked (to workaround HW and
Hypervisor issues) when building for some platforms, in which case, I
can't use our current iomap implementation and what I need is really
just the generic one ... except for those couple of issues with
"missing" MMIO accessors.

Hence my proposal: instead of having iomap create it's own versions of
the missing MMIO accessors, have them defined generally, and have the
generic iomap use them.

In the above case, that means defining generally some "repeat" version
of the PCI MMIO accessors (readsb, readsw, etc...) which would have a
consistent naming with the IO versions and would even be useable by
MMIO-only PCI drivers who don't wnat to take the iomap overhead. In
fact, as Russell mentioned, ARM already have those and I have a patch
adding them to PowerPC. It should be trivial to use the implementation
abvoe based on __raw_* as a default for archs that don't provide them
for now too.

In the remaning case of the "be" versions, I suppose we still have a
problem... I'd be almost tempted to use swab16(readw) instead of the
current be16_to_cpu(__raw_readw(addr) (which should probably be
cpu_to_be16 btw).

Now I'm not -that- big about it. As you said, and I agree, I can just do
my own (or rather, have powerpc have 2 versions of iomap, the current
one, and a modified version of the generic one for when my "IOs can be
hooked" option is set.

I just felt that it was a bit of a waste considering that most of the
generic iomap just fits nicely, only those few MMIO operations that
aren't generically defined and that iomap tries to define itself based
on the __raw_ versions are a problem.

(BTW. That leads to the point that the __raw_ accessors are generally
not useful as they both don't endian swap and don't do barriers and that
we don't provide generic barrier ops that can be used to do the job
manually afaik).

Cheers,
Ben.

2006-11-05 03:32:38

by Benjamin Herrenschmidt

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

Make the generic lib/iomap.c use arch provided MMIO accessors when
available for big endian and repeat operations. Also while at it,
fix the *_be version which are currently broken for PIO

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---

And because a patch is better than a long email...

Index: linux-cell/lib/iomap.c
===================================================================
--- linux-cell.orig/lib/iomap.c 2006-11-05 14:21:29.000000000 +1100
+++ linux-cell/lib/iomap.c 2006-11-05 14:30:32.000000000 +1100
@@ -34,6 +34,71 @@
#define PIO_RESERVED 0x40000UL
#endif

+#ifndef HAVE_ARCH_FULL_MMIO_SET
+/*
+ * Allow arch to provide _be versions and "repeat" versions. If not, we
+ * define default implementations here
+ */
+#define readw_be(addr) be16_to_cpu(__raw_readw(addr))
+#define readl_be(addr) be32_to_cpu(__raw_readl(addr))
+#define writew_be(val, addr) __raw_writew(cpu_to_be16(val), addr)
+#define writel_be(val, addr) __raw_writel(cpu_to_be32(val), addr)
+
+/*
+ * These are the "repeat MMIO read/write" functions.
+ * Note the "__raw" accesses, since we don't want to
+ * convert to CPU byte order. We write in "IO byte
+ * order" (we also don't have IO barriers).
+ */
+static inline void readsb(void __iomem *addr, u8 *dst, int count)
+{
+ while (--count >= 0) {
+ u8 data = __raw_readb(addr);
+ *dst = data;
+ dst++;
+ }
+}
+static inline void readsw(void __iomem *addr, u16 *dst, int count)
+{
+ while (--count >= 0) {
+ u16 data = __raw_readw(addr);
+ *dst = data;
+ dst++;
+ }
+}
+static inline void readsl(void __iomem *addr, u32 *dst, int count)
+{
+ while (--count >= 0) {
+ u32 data = __raw_readl(addr);
+ *dst = data;
+ dst++;
+ }
+}
+
+static inline void writesb(void __iomem *addr, const u8 *src, int count)
+{
+ while (--count >= 0) {
+ __raw_writeb(*src, addr);
+ src++;
+ }
+}
+static inline void writesw(void __iomem *addr, const u16 *src, int count)
+{
+ while (--count >= 0) {
+ __raw_writew(*src, addr);
+ src++;
+ }
+}
+static inline void writesl(void __iomem *addr, const u32 *src, int count)
+{
+ while (--count >= 0) {
+ __raw_writel(*src, addr);
+ src++;
+ }
+}
+
+#endif /* !defined(HAVE_ARCH_FULL_MMIO_SET) */
+
/*
* Ugly macros are a way of life.
*/
@@ -60,7 +125,7 @@
}
unsigned int fastcall ioread16be(void __iomem *addr)
{
- IO_COND(addr, return inw(port), return be16_to_cpu(__raw_readw(addr)));
+ IO_COND(addr, return swab16(inw(port)), return readw_be(addr));
}
unsigned int fastcall ioread32(void __iomem *addr)
{
@@ -68,7 +133,7 @@
}
unsigned int fastcall ioread32be(void __iomem *addr)
{
- IO_COND(addr, return inl(port), return be32_to_cpu(__raw_readl(addr)));
+ IO_COND(addr, return swab32(inl(port)), return readl_be(addr));
}
EXPORT_SYMBOL(ioread8);
EXPORT_SYMBOL(ioread16);
@@ -86,7 +151,7 @@
}
void fastcall iowrite16be(u16 val, void __iomem *addr)
{
- IO_COND(addr, outw(val,port), __raw_writew(cpu_to_be16(val), addr));
+ IO_COND(addr, outw(swab16(val),port), writew_be(val, addr));
}
void fastcall iowrite32(u32 val, void __iomem *addr)
{
@@ -94,7 +159,7 @@
}
void fastcall iowrite32be(u32 val, void __iomem *addr)
{
- IO_COND(addr, outl(val,port), __raw_writel(cpu_to_be32(val), addr));
+ IO_COND(addr, outl(swab32(val),port), writel_be(val, addr));
}
EXPORT_SYMBOL(iowrite8);
EXPORT_SYMBOL(iowrite16);
@@ -102,70 +167,18 @@
EXPORT_SYMBOL(iowrite32);
EXPORT_SYMBOL(iowrite32be);

-/*
- * These are the "repeat MMIO read/write" functions.
- * Note the "__raw" accesses, since we don't want to
- * convert to CPU byte order. We write in "IO byte
- * order" (we also don't have IO barriers).
- */
-static inline void mmio_insb(void __iomem *addr, u8 *dst, int count)
-{
- while (--count >= 0) {
- u8 data = __raw_readb(addr);
- *dst = data;
- dst++;
- }
-}
-static inline void mmio_insw(void __iomem *addr, u16 *dst, int count)
-{
- while (--count >= 0) {
- u16 data = __raw_readw(addr);
- *dst = data;
- dst++;
- }
-}
-static inline void mmio_insl(void __iomem *addr, u32 *dst, int count)
-{
- while (--count >= 0) {
- u32 data = __raw_readl(addr);
- *dst = data;
- dst++;
- }
-}
-
-static inline void mmio_outsb(void __iomem *addr, const u8 *src, int count)
-{
- while (--count >= 0) {
- __raw_writeb(*src, addr);
- src++;
- }
-}
-static inline void mmio_outsw(void __iomem *addr, const u16 *src, int count)
-{
- while (--count >= 0) {
- __raw_writew(*src, addr);
- src++;
- }
-}
-static inline void mmio_outsl(void __iomem *addr, const u32 *src, int count)
-{
- while (--count >= 0) {
- __raw_writel(*src, addr);
- src++;
- }
-}

void fastcall ioread8_rep(void __iomem *addr, void *dst, unsigned long count)
{
- IO_COND(addr, insb(port,dst,count), mmio_insb(addr, dst, count));
+ IO_COND(addr, insb(port,dst,count), readsb(addr, dst, count));
}
void fastcall ioread16_rep(void __iomem *addr, void *dst, unsigned long count)
{
- IO_COND(addr, insw(port,dst,count), mmio_insw(addr, dst, count));
+ IO_COND(addr, insw(port,dst,count), readsw(addr, dst, count));
}
void fastcall ioread32_rep(void __iomem *addr, void *dst, unsigned long count)
{
- IO_COND(addr, insl(port,dst,count), mmio_insl(addr, dst, count));
+ IO_COND(addr, insl(port,dst,count), readsl(addr, dst, count));
}
EXPORT_SYMBOL(ioread8_rep);
EXPORT_SYMBOL(ioread16_rep);
@@ -173,15 +186,15 @@

void fastcall iowrite8_rep(void __iomem *addr, const void *src, unsigned long count)
{
- IO_COND(addr, outsb(port, src, count), mmio_outsb(addr, src, count));
+ IO_COND(addr, outsb(port, src, count), writesb(addr, src, count));
}
void fastcall iowrite16_rep(void __iomem *addr, const void *src, unsigned long count)
{
- IO_COND(addr, outsw(port, src, count), mmio_outsw(addr, src, count));
+ IO_COND(addr, outsw(port, src, count), writesw(addr, src, count));
}
void fastcall iowrite32_rep(void __iomem *addr, const void *src, unsigned long count)
{
- IO_COND(addr, outsl(port, src,count), mmio_outsl(addr, src, count));
+ IO_COND(addr, outsl(port, src,count), writesl(addr, src, count));
}
EXPORT_SYMBOL(iowrite8_rep);
EXPORT_SYMBOL(iowrite16_rep);

2006-11-05 03:34:42

by Linus Torvalds

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

On Sun, 5 Nov 2006, Benjamin Herrenschmidt wrote:
>
> The -only- problem I'm talking about is that in order to implement the
> above, this IOCOND does something around the line of:
>
> if (pio)
> use_pio_accessor()
> else
> use_mmio_accessor()
>
> It does that, by using the existing, arch provided, accessors for PIO
> and MMIO (inX, outX, readX, writeX), which are doing the right thing vs.
> barriers etc, so again, all is good and well.... until you hit cases
> where such accessors don't exist !

Right. Your first email seemed to make sense. However, you then said (in
the second email, the one I responded to), and THAT didn't make any sense
at all:

Versions that would transparently use MMIO or PIO would make sense. A
pure MMIO implementation doesn't, that has to be arch specific. It makes
the generic iomap suddently non-portable in some ways.

This is the part I claim says that you don't seem to make any sense. You
seemed to be missing the point about the whole "repeat" instructions,
since the WHOLE POINT there was that it would _not_ transparently use MMIO
or PIO, it would very much _nontransparently_ use one or the other,
exactly because the choice has already very much been made.

> - "be" versions of MMIO accesses. Example:

Now, this one we should probably just drop, because nobody uses it.
Big-endian IO devices basically don't exist any more, and if they do, they
wouldn't use any generic structures.

> - repeat versions of MMIO accessors. Example:

So this one is the only one that makes sense. But it DOES NOT MAKE SENSE
in light of your second email that talked about "transparently use MMIO or
PIO". That's what made me think that you were just blathering.

But I also don't want to add _yet_another_ bogus IO accessor function to
all architectures that nobody actually uses except for this one little
thing. Quite frankly, I'd rather just add something like

#ifndef iobarrier_w
# define iobarrier_w() do { } while (0)
#endif

and then use iobarrier_w()" in the "mmio" version of the __raw_writel loop
(and same for iobarrier_r() for the __raw_readl).

Linus

2006-11-05 03:47:15

by Linus Torvalds

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

On Sun, 5 Nov 2006, Benjamin Herrenschmidt wrote:
>
> Make the generic lib/iomap.c use arch provided MMIO accessors when
> available for big endian and repeat operations. Also while at it,
> fix the *_be version which are currently broken for PIO

Just rip the _be versions out, methinks.

Also, what does your "writesb()" actually look like? I assume it's the
exact same thing as the generic one, with just another barrier. No?

Linus

2006-11-05 03:50:08

by Benjamin Herrenschmidt

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

> > - "be" versions of MMIO accesses. Example:
>
> Now, this one we should probably just drop, because nobody uses it.
> Big-endian IO devices basically don't exist any more, and if they do, they
> wouldn't use any generic structures.

I'd be happy to do so :) I just added _be versions for PowerPC for that
(writew_be, etc...) and an ARCH_HAVE_* to tell iomap that they exit. I
also found out that the _be implementation in lib/iomap is broken for
PIO (it just uses the LE versions without swap).

I just sent you a patch that allows the arch to provide it's own
versions. But if you want, I can redo it dropping the "be" thingies.

There are BE devices though... just generally not on PCI (Heh, I've just
had to deal with some folks doing a BE version of EHCI !). For plaform
specific stuffs, PowerPC has it's own low level accessors
(in_le16/in_be16/etc....) so it doesn't really matter unless we start
wanting drivers for those weird animals to get a bit more generic.

> > - repeat versions of MMIO accessors. Example:
>
> So this one is the only one that makes sense. But it DOES NOT MAKE SENSE
> in light of your second email that talked about "transparently use MMIO or
> PIO". That's what made me think that you were just blathering.

I think I just badly expressed myself. My appologies.

> But I also don't want to add _yet_another_ bogus IO accessor function to
> all architectures that nobody actually uses except for this one little
> thing. Quite frankly, I'd rather just add something like
>
> #ifndef iobarrier_w
> # define iobarrier_w() do { } while (0)
> #endif
>
> and then use iobarrier_w()" in the "mmio" version of the __raw_writel loop
> (and same for iobarrier_r() for the __raw_readl).

Unfortunately, that is not enough for us...

In the case of reads. I need a barrier before the read and I need to
create a data dependency on the read data to force the CPU to perform
the read followed by an instruction barrier...

Also, with our new ultra-hard-like-x86 semantics we have now (at least
until I try again sorting that other mess out), our stores also need
additional barriers before the stores and setting a magic per-CPU bit
after... That sort of crap isn't easy to expose in the form of a couple
of barrier macros so I'd rather keep it out of the way safely inside
asm-powerpc/io.h

I've actually come accross buggy drivers that could use some "repeat"
version of the MMIO accessors a couple of times in the past in fact
(they used a loop instead and got the endian wrong of course). So I'm
not convinced they are that useless.... That sort of shit is actually
fairly common in the embedded world.

Anyway, the patch I sent you earlier allows the arch to provide them and
just falls back to the "hand made" versions if not. You might still want
to add this iobarrier_w() thingy in addition to that for other archs
that might want to use the fallback and need barriers though...

Cheers
Ben.

2006-11-05 03:56:12

by Benjamin Herrenschmidt

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

On Sat, 2006-11-04 at 19:46 -0800, Linus Torvalds wrote:
>
> On Sun, 5 Nov 2006, Benjamin Herrenschmidt wrote:
> >
> > Make the generic lib/iomap.c use arch provided MMIO accessors when
> > available for big endian and repeat operations. Also while at it,
> > fix the *_be version which are currently broken for PIO
>
> Just rip the _be versions out, methinks.
>
> Also, what does your "writesb()" actually look like? I assume it's the
> exact same thing as the generic one, with just another barrier. No?

With our old naming (we used _ins* for the base operations, the PIO ones
just calling them with a magic offset) :

void _insb(const volatile u8 __iomem *port, void *buf, long count)
{
u8 *tbuf = buf;
u8 tmp;

if (unlikely(count <= 0))
return;
asm volatile("sync");
do {
tmp = *port;
asm volatile("eieio");
*tbuf++ = tmp;
} while (--count != 0);
asm volatile("twi 0,%0,0; isync" : : "r" (tmp));
}
EXPORT_SYMBOL(_insb);

void _outsb(volatile u8 __iomem *port, const void *buf, long count)
{
const u8 *tbuf = buf;

if (unlikely(count <= 0))
return;
asm volatile("sync");
do {
*port = *tbuf++;
} while (--count != 0);
asm volatile("sync");
}
EXPORT_SYMBOL(_outsb);

I'm not even entirely sure that the insb one is correct, maybe it should
or-in all the values to get a proper data dependency that acutally
depends on all previous loads to pass to the twi instructions (it's a
conditional trap instruction with the condition set as never, it forces
the CPU to think the data has been consumed and thus, in conjunction
with the subsequent isync, forces the loads to be performed before any
further code can execute).

The store also has two barriers as you can see. That's mostly related to
the need for very synchronous IOs to please drivers, though the second
one could be replaced with

get_paca()->io_sync = 1;

(Which is a new mecanism we introduced recently to tell spin_unlock that
one or previous MMIO stores might need a sync before we unlock. Looks
like we didn't update the "s" functions when adding that).

Ben.

2006-11-05 04:01:23

by Benjamin Herrenschmidt

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

2006-11-05 04:17:19

by Linus Torvalds

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

On Sun, 5 Nov 2006, Benjamin Herrenschmidt wrote:
> >
> > Just rip the _be versions out, methinks.
>
> At least one user:
>
> ./drivers/scsi/53c700.h: __u32 value = bEBus ? ioread32be(hostdata->base + reg) :
> ./drivers/scsi/53c700.h: bEBus ? iowrite32be(value, hostdata->base + reg):
>
> Should I make it use explicit swab32 instead ?

Well, I actually really dislike your version with the explicit swab.

The _only_ reason to use "ioread32be()" would be because the machine is
actually natively BE, and you want to avoid swab. That's kind of the point
of using "be32_to_cpu(__raw_readl(addr)))" like we do now - it will do the
byte swap only if it's necessary.

In contrast, your "swab(readl())" does _two_ byteswaps - once to turn it
into LE, then to turn it back into BE.

So if we can't just rip it out, then we sure as hell shouldn't replace it
with something that is obviously worse either.

In other words - I don't see the reasoning here again. You seem to want to
make the code just worse.

Linus

2006-11-05 04:39:34

by Benjamin Herrenschmidt

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

> The _only_ reason to use "ioread32be()" would be because the machine is
> actually natively BE, and you want to avoid swab. That's kind of the point
> of using "be32_to_cpu(__raw_readl(addr)))" like we do now - it will do the
> byte swap only if it's necessary.
>
> In contrast, your "swab(readl())" does _two_ byteswaps - once to turn it
> into LE, then to turn it back into BE.

I'm not doing a swab(readl()), I'm doing a swab(insl()) and have the
arch provide a native BE accessor for readl_be(). The idea is that I
don't want to add _be accessors for PIO and PIO is slow anyway. But I'm
providing one for MMIO, along with the repeat versions. Have a second
look.

> So if we can't just rip it out, then we sure as hell shouldn't replace it
> with something that is obviously worse either.
>
> In other words - I don't see the reasoning here again. You seem to want to
> make the code just worse.

Wait, let's make thing clear, there are 2 things here:

- MMIO : For that, I'm providing readw_be etc... which my patch defines
based on __raw_* just as your suggest, I'm just adding a way for the
arch to provide its own.

- PIO : This is broken -now-. The current code doesn't swap at all in
the PIO case, thus you get LE result when using ioread32be() on PIO. I
propose to fix that with swab() because PIO sucks already, there is no
"__raw" for PIO and it doesn't deserve new accessors nor speed.

Cheers,
Ben.

2006-11-05 05:01:36

by Linus Torvalds

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

On Sun, 5 Nov 2006, Benjamin Herrenschmidt wrote:
>
> I'm not doing a swab(readl()), I'm doing a swab(insl()) and have the
> arch provide a native BE accessor for readl_be().

Ok.

Can you work based on something like this instead?

(Totally untested, I just did this as an example of what I think is a lot
more maintainable)

Basically, it allow you to replace _all_ of the individual small thing,
and doesn't require any architecture that doesn't need to, to do anything
new at all. I find it very irritating to have architectures that don't
actually need any new stuff to have to define new accessor functions, and
I also find it silly to have to have magic new #define's to show that you
have some other functions.

This just basically says:

If something isn't #define'd by the architecture, we'll fall back
to using our own defaults.

which is nice, in that you don't have to have any definitions, and if you
_do_ have definitions, it's only for the functions that you actually
define, ie there is no new magic ARCH_HAS_xyzzy to keep track of.

So if you can do (for example) a "pio_read16be()" in the architecture some
sane way, just do it, either directly as a #define, or as an inline
function that you then tell the rest of the system is there by just doing

#define pio_read16be pio_read16be

and you're all done. In other words, you can have fine granularity of what
the architecture supports, _without_ having to have tons of illogical and
hard-to-track ARCH_HAS_xyzzy things. You define exactly what you have.

This does assume that if you have the "16be" functions, you'll also have
the 32be versions, of course - there's "fine granularity" and then there's
"insanity", and this goes for the non-insane granularity ;)

Anyway, I'll just blow away this example from my tree, I generated it just
for discussion, and as an example of what I think is much easier to
maintain. It's also an example of something I'll happily apply if it comes
back with a "yeah, that works for me, and I tested it"

Linus

---
diff --git a/lib/iomap.c b/lib/iomap.c
index 55689c5..d6ccdd8 100644
--- a/lib/iomap.c
+++ b/lib/iomap.c
@@ -50,6 +50,16 @@
} \
} while (0)

+#ifndef pio_read16be
+#define pio_read16be(port) swab16(inw(port))
+#define pio_read32be(port) swab32(inl(port))
+#endif
+
+#ifndef mmio_read16be
+#define mmio_read16be(addr) be16_to_cpu(__raw_readw(addr))
+#define mmio_read32be(addr) be32_to_cpu(__raw_readl(addr))
+#endif
+
unsigned int fastcall ioread8(void __iomem *addr)
{
IO_COND(addr, return inb(port), return readb(addr));
@@ -60,7 +70,7 @@ unsigned int fastcall ioread16(void __io
}
unsigned int fastcall ioread16be(void __iomem *addr)
{
- IO_COND(addr, return inw(port), return be16_to_cpu(__raw_readw(addr)));
+ IO_COND(addr, return pio_read16be(port), return mmio_read16be(addr));
}
unsigned int fastcall ioread32(void __iomem *addr)
{
@@ -68,7 +78,7 @@ unsigned int fastcall ioread32(void __io
}
unsigned int fastcall ioread32be(void __iomem *addr)
{
- IO_COND(addr, return inl(port), return be32_to_cpu(__raw_readl(addr)));
+ IO_COND(addr, return pio_read32be(port), return mmio_read32be(addr));
}
EXPORT_SYMBOL(ioread8);
EXPORT_SYMBOL(ioread16);
@@ -76,6 +86,16 @@ EXPORT_SYMBOL(ioread16be);
EXPORT_SYMBOL(ioread32);
EXPORT_SYMBOL(ioread32be);

+#ifndef pio_write16be
+#define pio_write16be(val,port) outw(swab16(val),port)
+#define pio_write32be(val,port) outl(swab32(val),port)
+#endif
+
+#ifndef mmio_write16be
+#define mmio_write16be(val,port) __raw_writew(be16_to_cpu(val),port)
+#define mmio_write32be(val,port) __raw_writel(be32_to_cpu(val),port)
+#endif
+
void fastcall iowrite8(u8 val, void __iomem *addr)
{
IO_COND(addr, outb(val,port), writeb(val, addr));
@@ -86,7 +106,7 @@ void fastcall iowrite16(u16 val, void __
}
void fastcall iowrite16be(u16 val, void __iomem *addr)
{
- IO_COND(addr, outw(val,port), __raw_writew(cpu_to_be16(val), addr));
+ IO_COND(addr, pio_write16be(val,port), mmio_write16be(val, addr));
}
void fastcall iowrite32(u32 val, void __iomem *addr)
{
@@ -94,7 +114,7 @@ void fastcall iowrite32(u32 val, void __
}
void fastcall iowrite32be(u32 val, void __iomem *addr)
{
- IO_COND(addr, outl(val,port), __raw_writel(cpu_to_be32(val), addr));
+ IO_COND(addr, pio_write32be(val,port), mmio_write32be(val, addr));
}
EXPORT_SYMBOL(iowrite8);
EXPORT_SYMBOL(iowrite16);
@@ -108,6 +128,7 @@ EXPORT_SYMBOL(iowrite32be);
* convert to CPU byte order. We write in "IO byte
* order" (we also don't have IO barriers).
*/
+#ifndef mmio_insb
static inline void mmio_insb(void __iomem *addr, u8 *dst, int count)
{
while (--count >= 0) {
@@ -132,7 +153,9 @@ static inline void mmio_insl(void __iome
dst++;
}
}
+#endif

+#ifndef mmio_outsb
static inline void mmio_outsb(void __iomem *addr, const u8 *src, int count)
{
while (--count >= 0) {
@@ -154,6 +177,7 @@ static inline void mmio_outsl(void __iom
src++;
}
}
+#endif

void fastcall ioread8_rep(void __iomem *addr, void *dst, unsigned long count)
{

2006-11-05 05:09:28

by Benjamin Herrenschmidt

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

> Can you work based on something like this instead?
>
> (Totally untested, I just did this as an example of what I think is a lot
> more maintainable)

Yup, that would definitely work for me.

I'll do the same for the repeat ops..

Thanks,
Ben.

2006-11-05 05:30:53

by Benjamin Herrenschmidt

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

On Sun, 2006-11-05 at 16:08 +1100, Benjamin Herrenschmidt wrote:
> > Can you work based on something like this instead?
> >
> > (Totally untested, I just did this as an example of what I think is a lot
> > more maintainable)
>
> Yup, that would definitely work for me.
>
> I'll do the same for the repeat ops..

I'm blind, didn't see you did it for them already :-)

Ok, your patch builds fine here. I can't test at the moment as I don't
have a machine at hand that has a device whose driver uses the ops in
iomap though, but I can't see any reason why it wouldn't work if it
builds, so as far as I'm concerned, that's good to go in 2.6.20.
(earlier if you wish but I won't submit the patch doing the powerpc
changes that makes me use those change before 2.6.20 obviously :-)

Cheers,
Ben.

2006-11-06 02:56:42

by Benjamin Herrenschmidt

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

> Ok, your patch builds fine here. I can't test at the moment as I don't
> have a machine at hand that has a device whose driver uses the ops in
> iomap though, but I can't see any reason why it wouldn't work if it
> builds, so as far as I'm concerned, that's good to go in 2.6.20.
> (earlier if you wish but I won't submit the patch doing the powerpc
> changes that makes me use those change before 2.6.20 obviously :-)

In fact, you might want to push it to 2.6.19 since it fixes a bug
(current _be operations are incorrect for PIO without the patch).

Cheers,
Ben.

2006-11-06 03:16:38

by Linus Torvalds

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

On Mon, 6 Nov 2006, Benjamin Herrenschmidt wrote:
>
> In fact, you might want to push it to 2.6.19 since it fixes a bug
> (current _be operations are incorrect for PIO without the patch).

Well, I doubt anybody uses them (or we'd have seen the problem), and more
importantly, I've already blown the patch away. If you think it's easier
for you to sync up later, though, I'll happily apply it. Mind sending it
back to me with a Tested-by: line or something? I literally didn't even
compile-test the thing, and blew that file away after I had generated the
trial patch.

Linus

2006-11-06 04:51:20

by Benjamin Herrenschmidt

[permalink] [raw]

Subject: Re: lib/iomap.c mmio_{in,out}s* vs. __raw_* accessors

On Sun, 2006-11-05 at 19:13 -0800, Linus Torvalds wrote:
>
> On Mon, 6 Nov 2006, Benjamin Herrenschmidt wrote:
> >
> > In fact, you might want to push it to 2.6.19 since it fixes a bug
> > (current _be operations are incorrect for PIO without the patch).
>
> Well, I doubt anybody uses them (or we'd have seen the problem), and more
> importantly, I've already blown the patch away. If you think it's easier
> for you to sync up later, though, I'll happily apply it. Mind sending it
> back to me with a Tested-by: line or something? I literally didn't even
> compile-test the thing, and blew that file away after I had generated the
> trial patch.

I'll send it to you later then, in the merge window, along with the
stuff that uses it. That will give me a bit more time to actually test
it.

Cheers,
Ben.