2004-09-15 16:33:17

by Linus Torvalds

[permalink] [raw]
Subject: Being more anal about iospace accesses..


This is a background mail mainly for driver writers and/or architecture
people. Or others that are just interested in really low-level hw access
details. Others - please feel free to ignore.

[ This has been discussed to some degree already on the architecture
mailing lists and obviously among the people who actually worked on it,
but I thought I'd bounce it off linux-kernel too, in order to make
people more aware of what the new type-checking does. Most people may
have seen it as only generating a ton of new warnings for some crufty
device drivers. ]

The background for this iospace type-checking change is that we've long
had some serious confusion about how to access PCI memory mapped IO
(MMIO), mainly because on a PC (and some non-PC's too) that IO really does
look like regular memory, so you can have a driver that just accesses a
pointer directly, and it will actually work on most machines.

At the same time, we've had the proper "accessor" functions (read[bwl](),
write[bwl]() and friends) that on purpose dropped all type information
from the MMIO pointer, mostly just because of historical reasons, and as a
result some drivers didn't use a pointer at all, but some kind of integer.
Sometimes even one that couldn't _fit_ a MMIO address in it on a 64-bit
machine.

In short, the PCI MMIO access case was largely the same as the user
pointer case, except the access functions were different (readb vs
get_user) and they were even less lax about checking for sanity. At least
the user access code required a pointer with the right size.

We've been very successful in annotating user pointers, and that found a
couple of bugs, and more importantly it made the kernel code much more
"aware" of what kind of pointer was passed around. In general, a big
success, I think. And an obvious example for what MMIO pointers should do.

So lately, the kernel infrastructure for MMIO accesses has become a _lot_
more strict about what it accepts. Not only do the MMIO access functions
want a real pointer (which is already more type-checking than we did
before, and causes gcc to spew out lots of warnings for some drivers), but
as with user pointers, sparse annotations mark them as being in a
different address space, and building the kernel with checking on will
warn about mixing up address spaces. So far so good.

So right now the current snapshots (and 2.6.9-rc2) have this enabled, and
some drivers will be _very_ noisy when compiled. Most of the regular ones
are fine, so maybe people haven't even noticed it that much, but some of
them were using things like "u32" to store MMIO pointers, and are
generally extremely broken on anything but an x86. We'll hopefully get
around to fixing them up eventually, but in the meantime this should at
least explain the background for some of the new noise people may see.

Perhaps even more interesting is _another_ case of driver, though: one
that started warning not because it was ugly and broken, but because it
did something fairly rare but something that does happen occasionally: it
mixed PIO and MMIO accesses on purpose, because it drove hardware that
literally uses one or the other.

Sometimes such a "mixed interface" driver does it based on a compile
option that just #defines 'writel()' to 'inl()', sometimes it's a runtime
decision depending on the hardware or configuration.

The anal typechecking obviously ended up being very unhappy about this,
since it wants "void __iomem *" for MMIO pointers, and a normal "unsigned
long" for PIO accesses. The compile-time option could have been easily
fixed up by adding the proper cast when re-defining the IO accessor, but
that doesn't work for the dynamic case.

Also, the compile-time switchers often really _wanted_ to be dynamic, but
it was just too painful with the regular Linux IO interfaces to duplicate
the code and do things conditionally one way or the other.

To make a long story even longer: rather than scrapping the typechecking,
or requiring drivers to do strange and nasty casts all over the place,
there's now a new interface in town. It's called "iomap", because it
extends the old "ioremap()" interface to work on the PIO accesses too.

That way, the drivers that really want to mix both PIO and MMIO accesses
can very naturally do it: they just need to remap the PIO space too, the
same way that we've required people to remap the MMIO space for a long
long time.

For example, if you don't know (or, more importantly - don't care) what
kind of IO interface you use, you can now do something like

void __iomem * map = pci_iomap(dev, bar, maxbytes);
...
status = ioread32(map + DRIVER_STATUS_OFFSET);

and it will do the proper IO mapping for the named PCI BAR for that
device. Regardless of whether the BAR was an IO or MEM mapping. Very
convenient for cases where the hardware migt expose its IO window in
either (or sometimes both).

Nothing in the current tree actually uses this new interface, although
Jeff has patches for SATA for testing (and they clean up the code quite
noticeably, never mind getting rid of the warnings). The interface has
been implemented by yours truly for x86 and ppc64, and David did a
first-pass version for sparc64 too (missing the "xxxx_rep()" functions
that were added a bit later, I believe).

So far experience seems to show that it's a very natural interface for
most non-x86 hardware - they all tend to map in both PIO and MMIO into one
address space _anyway_, so the two aren't really any different. It's
mainly just x86 and it's ilk that actually have two different interfaces
for the two kinds of PCI accesses, and at least in that case it's trivial
to encode the difference in the virtual ioremap pointer.

The best way to explain the interface is to just point you guys at the
<asm-generic/iomap.h> file, which isn't very big, has about as much
comments than code, and contains nothing but the necessary function
declarations. The actual meaning of the functions should be pretty
obvious even without the comments.

Feel free to flame or discuss rationally,

Linus


2004-09-15 16:57:56

by Jörn Engel

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

On Wed, 15 September 2004 09:30:42 -0700, Linus Torvalds wrote:
>
> For example, if you don't know (or, more importantly - don't care) what
> kind of IO interface you use, you can now do something like
>
> void __iomem * map = pci_iomap(dev, bar, maxbytes);
> ...
> status = ioread32(map + DRIVER_STATUS_OFFSET);

C now supports pointer arithmetic with void*? I hope the width of a
void is not architecture dependent, that would introduce more subtle
bugs.

J?rn

--
Sometimes, asking the right question is already the answer.
-- Unknown

2004-09-15 16:58:44

by Dave Jones

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

On Wed, Sep 15, 2004 at 09:30:42AM -0700, Linus Torvalds wrote:

> So right now the current snapshots (and 2.6.9-rc2) have this enabled, and
> some drivers will be _very_ noisy when compiled. Most of the regular ones
> are fine, so maybe people haven't even noticed it that much, but some of
> them were using things like "u32" to store MMIO pointers, and are
> generally extremely broken on anything but an x86. We'll hopefully get
> around to fixing them up eventually, but in the meantime this should at
> least explain the background for some of the new noise people may see.

For the curious, 6MB of sparse output is generated from a make allmodconfig
right now. (http://www.codemonkey.org.uk/junk/2.6.9-rc2-warnings.txt)

You can filter out just the __iomem warnings by grepping for asn:2

Dave

2004-09-15 17:11:06

by Jeff Garzik

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

On Wed, Sep 15, 2004 at 06:54:50PM +0200, J?rn Engel wrote:
> On Wed, 15 September 2004 09:30:42 -0700, Linus Torvalds wrote:
> >
> > For example, if you don't know (or, more importantly - don't care) what
> > kind of IO interface you use, you can now do something like
> >
> > void __iomem * map = pci_iomap(dev, bar, maxbytes);
> > ...
> > status = ioread32(map + DRIVER_STATUS_OFFSET);
>
> C now supports pointer arithmetic with void*? I hope the width of a
> void is not architecture dependent, that would introduce more subtle
> bugs.

gcc extension, which has been used in the kernel for ages.

Jeff



2004-09-15 17:11:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..



On Wed, 15 Sep 2004, J?rn Engel wrote:
>
> C now supports pointer arithmetic with void*?

C doesn't. gcc does. It's a documented extension, and it acts like if it
was done on a byte.

See gcc's user guide "Extension to the C Language Family".

It's a singularly good feature.

Linus

2004-09-15 17:23:49

by Roger Luethi

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

On Wed, 15 Sep 2004 09:30:42 -0700, Linus Torvalds wrote:
> This is a background mail mainly for driver writers and/or architecture

Thanks, I appreciate the effort (speaking as the maintainer of an
"even more interesting" driver).

Roger

2004-09-15 17:28:08

by Richard B. Johnson

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

On Wed, 15 Sep 2004, Linus Torvalds wrote:

>
> This is a background mail mainly for driver writers and/or architecture
> people. Or others that are just interested in really low-level hw access
> details. Others - please feel free to ignore.
>
[SNIPPED mostly....]


> For example, if you don't know (or, more importantly - don't care) what
> kind of IO interface you use, you can now do something like
>
> void __iomem * map = pci_iomap(dev, bar, maxbytes);
> ...
> status = ioread32(map + DRIVER_STATUS_OFFSET);
^^^^^^^^^^^^^^^^

Doesn't this rely on the non-standard GNUism that you can
perform pointer-arithmetic on a void-pointer? Which is illegal,
immoral, and fattening. I'd much rather see char-pointers so
it's valid to perform the offset math. That way, in the future,
a new tool that follows (ANSI, IEEE, POSIX) rules doesn't barf.
I suggest a new pointer type like (BASE *) or (BAR *) that
hides the (unsigned char *) necessary to not barf, plus
minimize side-effects.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.26 on an i686 machine (5570.56 BogoMips).
Note 96.31% of all statistics are fiction.

2004-09-15 17:21:32

by Roland Dreier

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

J?rn> C now supports pointer arithmetic with void*? I hope the
J?rn> width of a void is not architecture dependent, that would
J?rn> introduce more subtle bugs.

Pointer arithmetic on void * has been a gcc extension since forever
(gcc acts as though sizeof (void) == 1).

However, I somewhat agree -- it's ugly for drivers rely on this and do
arithmetic on void *. It should be OK for a driver to use
char __iomem * for its IO base if it needs to add in offsets, right?

- R.

2004-09-15 17:38:17

by Horst H. von Brand

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

=?iso-8859-1?Q?J=F6rn?= Engel <[email protected]> said:
> On Wed, 15 September 2004 09:30:42 -0700, Linus Torvalds wrote:

[...]

> > For example, if you don't know (or, more importantly - don't care) what
> > kind of IO interface you use, you can now do something like
> >
> > void __iomem * map = pci_iomap(dev, bar, maxbytes);
> > ...
> > status = ioread32(map + DRIVER_STATUS_OFFSET);

> C now supports pointer arithmetic with void*?

It doesn't. It's a gcc-ism.

> I hope the width of a
> void is not architecture dependent, that would introduce more subtle
> bugs.

gcc takes it as a char pointer for such uses.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513

2004-09-15 17:42:53

by Linus Torvalds

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..



On Wed, 15 Sep 2004, Roland Dreier wrote:
>
> However, I somewhat agree -- it's ugly for drivers rely on this and do
> arithmetic on void *. It should be OK for a driver to use
> char __iomem * for its IO base if it needs to add in offsets, right?

"char __iomem *" will certainly work - all the normal pointer conversions
are ok. Some people in fact use pointers to structures in MMIO space, and
this is quite reasonable when working with a chip that uses "mailboxes"
for commands.

However, I disagree with "void *" arithmetic being ugly. It's a very nice
feature to have a pointer that can be validly cast to any other type, and
that is the whole _point_ of "void *". The fact that C++ got that wrong is
arguably the worst failing of the language, causing tons of unnecessary
casts that can silently hide real bugs (maybe the thing you cast wasn't a
"void *" in the first place, but you'll never know - the compiler will do
the cast for you).

For example, to go back to the mailbox example, let's say that your
hardware has an IO area that is 8kB in size, with the last 4kB being
mailboxes.

The _sane_ way to do that is to do

void __iomem *base_io = ioremap(...);
struct mailbox __iomem *mbox = base_io + MAILBOX_OFFSET;

and then just work on that.

In contrast, havign to cast to a "char *" in order to do arithmetic, and
then casting back to the resultant structure type pointer is not only
ugly and unreadable, it's a lot more prone to errors as a result.

In other words, think of "void *" as a pointer to storage. Not "char"
(which is the C name for a signed byte), but really, it's the pointer to
whatever underlying memory there is. And a _fundamental_ part of such
memory is the fact that it is addressable. Thus "pointer to storage
arithmetic" really does make sense on a very fundamental level. It has
nothing to do with C types, and that also explains why "void *" silently
converts to anything else. It's a very internally consistent world-view.

Now, I disagree with gcc when it comes to actually taking the "size" of
void. Gcc will silently accept

void *x;
x = malloc(sizeof(*x));

which I consider to be an abomination (and the above _can_ happen, quite
easily, as part of macros for doing allocation etc - nobody would write
it in that form, but if you have an "MEMALLOC(x)" macro that does the
sizeof, you could end up trying to feed the compiler bogus code).

The fact that you can do arithmetic on typeless storage does _not_ imply
that typeless storage would have a "size" in my book.

So sparse will say:

warning: cannot size expression

and refuse to look at broken code like the above. But hey, the fact that I
have better taste than anybody else in the universe is just something I
have to live with. It's not easy being me.

Linus

2004-09-15 17:42:54

by Brian Gerst

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

J?rn Engel wrote:
> On Wed, 15 September 2004 09:30:42 -0700, Linus Torvalds wrote:
>
>>For example, if you don't know (or, more importantly - don't care) what
>>kind of IO interface you use, you can now do something like
>>
>> void __iomem * map = pci_iomap(dev, bar, maxbytes);
>> ...
>> status = ioread32(map + DRIVER_STATUS_OFFSET);
>
>
> C now supports pointer arithmetic with void*? I hope the width of a
> void is not architecture dependent, that would introduce more subtle
> bugs.
>
> J?rn
>

http://gcc.gnu.org/onlinedocs/gcc-3.4.2/gcc/Pointer-Arith.html#Pointer-Arith

5.18 Arithmetic on void- and Function-Pointers

In GNU C, addition and subtraction operations are supported on pointers
to void and on pointers to functions. This is done by treating the size
of a void or of a function as 1.

A consequence of this is that sizeof is also allowed on void and on
function types, and returns 1.

The option -Wpointer-arith requests a warning if these extensions are used.

--
Brian Gerst

2004-09-15 17:49:11

by Nikita Danilov

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

Linus Torvalds writes:
>
>
> On Wed, 15 Sep 2004, J?rn Engel wrote:
> >
> > C now supports pointer arithmetic with void*?
>
> C doesn't. gcc does. It's a documented extension, and it acts like if it
> was done on a byte.
>
> See gcc's user guide "Extension to the C Language Family".
>
> It's a singularly good feature.

Unfortunately it breaks even better identity

foo *p;

p + nr == (foo *)((char *)p + nr * sizeof *p)

unless one thinks that is --together with gcc-- that nothing occupies
exactly one byte.

>
> Linus

Nikita.

2004-09-15 17:36:34

by Jörn Engel

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

On Wed, 15 September 2004 10:07:29 -0700, Linus Torvalds wrote:
> On Wed, 15 Sep 2004, J?rn Engel wrote:
> >
> > C now supports pointer arithmetic with void*?
>
> C doesn't. gcc does. It's a documented extension, and it acts like if it
> was done on a byte.
>
> See gcc's user guide "Extension to the C Language Family".
>
> It's a singularly good feature.

Nice.

But it still leaves me confused. Before I had this code:

struct regs {
uint32_t status;
...
}

...

struct regs *regs = ioremap(...);
uint32_t status = regs->status;
...

So now I should do it like this:

#define REG_STATUS 0

...

void __iomem *regs = ioremap(...);
uint32_t status = readl(regs + REG_STATUS);
...

But wait, that only works when long is 32bit wide. Plus I could be
stupid enough and "#define REG_STATUS 64" while the register space is
just 64 bytes long. It solves the confusion about address spaces,
agreed, but overall I'm more confused now. Hope it's just temporary.

J?rn

--
There is no worse hell than that provided by the regrets
for wasted opportunities.
-- Andre-Louis Moreau in Scarabouche

2004-09-15 18:01:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..



On Wed, 15 Sep 2004, J?rn Engel wrote:
>
> But it still leaves me confused. Before I had this code:
>
> struct regs {
> uint32_t status;
> ...
> }
>
> ...
>
> struct regs *regs = ioremap(...);
> uint32_t status = regs->status;
> ...
>
> So now I should do it like this:
>
> #define REG_STATUS 0
>
> ...
>
> void __iomem *regs = ioremap(...);
> uint32_t status = readl(regs + REG_STATUS);

No, you can certainly continue to use non-void pointers. The "void __iomem
*" case is just the typeless one, exactly equivalent to regular void
pointer usage.

So let me clarify my original post with two points:

- if your device only supports MMIO, you might as well just use the old
interfaces. The new interface will _also_ work, but there is no real
advantage, unless you count the "pci_iomap()" as a simpler interface.

The new interface is really only meaningful for things that want to
support _both_ PIO and MMIO. It's also, perhaps, a bit syntactically
easier to work with, so some people might prefer that for that
reason. See my comments further down on the auto-sizing. BUT it doesn't
make the old interfaces go away by any means, and I'm not even
suggesting that people should re-write drivers just for the hell of it.

In short: if you don't go "ooh, that will simplify XXX", then you
should just ignore the new interfaces.

- you can _absolutely_ use other pointers than "void *". You should
annotate them with "__iomem" if you want to be sparse-clean (and it
often helps visually to clarify the issue), but gcc won't care, the
"__iomem" annotation is purely a extended check.

So you can absolutely still continue with

struct mydev_iolayout {
__u32 status;
__u32 irqmask;
...

struct mydev_iolayout __iomem *regs = pci_map(...);
status = ioread32(&regs.status);

which is often a lot more readable, and thus in fact _preferred_. It also
adds another level of type-checking, so I applaud drivers that do this.

Now, I'm _contemplating_ also allowing the "get_user()" kind of "unsized"
access function for the new interface. Right now all the old (and the new)
access functions are all explicitly sized. But for the "struct layout"
case, it's actually often nice to just say

status = ioread(&regs.status);

and the compiler can certainly figure out the size of the register on its
own. This was impossible with the old interface, because the old
interfaces didn't even take a _pointer_, much less one that could be sized
up automatically.

(The auto-sizing is something that "get_user()" and "put_user()" have
always done, and it makes them very easy to use. It involved a few pretty
ugly macros, but hey, that's all hidden away, and is actually pretty
simple in the end).

Linus

2004-09-15 18:09:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..



On Wed, 15 Sep 2004, Linus Torvalds wrote:
>
> In short: if you don't go "ooh, that will simplify XXX", then you
> should just ignore the new interfaces.

Btw, to get an example of what _is_ simplified, look at
drivers/scsi/libata-core.c:

void ata_tf_load(struct ata_port *ap, struct ata_taskfile *tf)
{
if (ap->flags & ATA_FLAG_MMIO)
ata_tf_load_mmio(ap, tf);
else
ata_tf_load_pio(ap, tf);
}

and realize that "ata_tf_load_mmio()" and "ata_tf_load_pio()" are exactly
the SAME FUNCTION. Except one uses MMIO, the other uses PIO. With the new
setup, it literally collapses into one function, and code size goes down
pretty dramatically. Not to mention making the code more readable.

For another example of this (of the static kind), look at something like
drivers/net/8139too.c:

#ifdef USE_IO_OPS

#define RTL_R8(reg) inb (((unsigned long)ioaddr) + (reg))
#define RTL_R16(reg) inw (((unsigned long)ioaddr) + (reg))
#define RTL_R32(reg) ((unsigned long) inl (((unsigned long)ioaddr) + (reg)))
...

#else

...
/* read MMIO register */
#define RTL_R8(reg) readb (ioaddr + (reg))
#define RTL_R16(reg) readw (ioaddr + (reg))
#define RTL_R32(reg) ((unsigned long) readl (ioaddr + (reg)))

see? In this case, USE_IO_OPS depends on a static configuration variable,
namely CONFIG_8139TOO_PIO. So the user at _compile_ time has to decide
whether he wants to use MMIO or PIO. See the Kconfig help file:

This instructs the driver to use programmed I/O ports (PIO) instead
of PCI shared memory (MMIO). This can possibly solve some problems
in case your mainboard has memory consistency issues. If unsure,
say N.

In other words, the new interface is not designed to replace the old ones,
it's designed to help drivers like these, that either go to a lot of extra
pain in order to support both methods, or then have a _static_ config
option that makes it really hard for system vendors to just release one
driver that knows when it needs to use PIO and when it needs MMIO.

Linus

2004-09-15 18:14:12

by Linus Torvalds

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..



On Wed, 15 Sep 2004, Nikita Danilov wrote:
>
> Unfortunately it breaks even better identity
>
> foo *p;
>
> p + nr == (foo *)((char *)p + nr * sizeof *p)

No, gcc allows the above, by making sizeof(void) be 1.

And sane compilers would just inform the user at compile-time with a nice
readable message that he's doing something stupid.

Linus

2004-09-15 18:26:19

by George Spelvin

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

> Nice.
>
> But it still leaves me confused. Before I had this code:
>
> struct regs {
> uint32_t status;
> ...
> }
>
> ...
>
> struct regs *regs = ioremap(...);
> uint32_t status = regs->status;
> ...
>
> So now I should do it like this:
>
> #define REG_STATUS 0
>
> ...
>
> void __iomem *regs = ioremap(...);
> uint32_t status = readl(regs + REG_STATUS);
> ...
>
> But wait, that only works when long is 32bit wide. Plus I could be
> stupid enough and "#define REG_STATUS 64" while the register space is
> just 64 bytes long. It solves the confusion about address spaces,
> agreed, but overall I'm more confused now. Hope it's just temporary.

No, you should do:

struct regs {
uint32_t status;
...
}

...

struct regs __iomem *regs = ioremap(...);
uint32_t status = ioread32(&regs->status);

2004-09-15 18:40:35

by Chris Wedgwood

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

On Wed, Sep 15, 2004 at 10:07:29AM -0700, Linus Torvalds wrote:

> C doesn't. gcc does. It's a documented extension, and it acts like
> if it was done on a byte.

[...]

> It's a singularly good feature.

I dunno about that. Maybe it is, but it has some gotchas.

Recently when doing a sparsification of code I noticed there are
places which essentially do things like:

void *foo;

[...]

foo += bar * n;

Part of the fix (cleanup) was to change the 'void *foo' to
'gratuitous_typedef_t __user *foo' --- which silently breaks the math
if you don't explicitly check for this.


--cw

2004-09-15 19:34:58

by Greg KH

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

On Wed, Sep 15, 2004 at 10:57:25AM -0700, Linus Torvalds wrote:
>
> So you can absolutely still continue with
>
> struct mydev_iolayout {
> __u32 status;
> __u32 irqmask;
> ...
>
> struct mydev_iolayout __iomem *regs = pci_map(...);
> status = ioread32(&regs.status);
>
> which is often a lot more readable, and thus in fact _preferred_. It also
> adds another level of type-checking, so I applaud drivers that do this.

Currently a few drivers do:
status = readl(&regs.status);
which causes sparse warnings.

How should that code be changed to prevent this? Convert them all to
ioread32()? Or figure out a way to supress the warning for readl()?

thanks,

greg k-h

2004-09-15 19:54:11

by Linus Torvalds

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..



On Wed, 15 Sep 2004, Greg KH wrote:
>
> Currently a few drivers do:
> status = readl(&regs.status);

I assume that's "&regs->status", since regs had better be a pointer..

> which causes sparse warnings.
>
> How should that code be changed to prevent this? Convert them all to
> ioread32()? Or figure out a way to supress the warning for readl()?

Just make sure that you annotate "regs" as a pointer to IO space.

Ie if you have

struct xxxx __iomem *regs;

then everything will be fine - sparse knows that "regs" is a IO pointer,
and that obviously makes "&regs->member" _also_ an IO pointer.

You'll need that __iomem annotation anyway, since otherwise sparse would
complain when you do the assignment from the "ioremap()" call (and the
thing had better come from ioremap() some way, or it's not valid in the
first place to do "readl()" on it).

Linus

2004-09-15 20:09:13

by Russell King

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

On Wed, Sep 15, 2004 at 10:39:02AM -0700, Linus Torvalds wrote:
> In other words, think of "void *" as a pointer to storage. Not "char"
> (which is the C name for a signed byte),

Common Programming Error #99: "char" is implementation whether it is
signed or may be unsigned. Only a "char" type qualified by "signed"
or "unsigned" can be relied upon to have the requested property.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core

2004-09-15 20:11:37

by Greg KH

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

On Wed, Sep 15, 2004 at 12:53:51PM -0700, Linus Torvalds wrote:
>
>
> On Wed, 15 Sep 2004, Greg KH wrote:
> >
> > Currently a few drivers do:
> > status = readl(&regs.status);
>
> I assume that's "&regs->status", since regs had better be a pointer..

Yes, sorry.

> > which causes sparse warnings.
> >
> > How should that code be changed to prevent this? Convert them all to
> > ioread32()? Or figure out a way to supress the warning for readl()?
>
> Just make sure that you annotate "regs" as a pointer to IO space.

Ah, ok, that works. Thanks for the clarification, I now realize that
__iomem can be a marker for any type of pointer, not just a void.
That's where I was confused.

thanks,

greg k-h

2004-09-15 22:31:32

by Roland Dreier

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

Linus, while we're on the subject of new sparse checks, could you give
a quick recap of the semantics of the new __leXX types (and what
__bitwise means to sparse)? I don't think I've ever seen this stuff
described on LKML.

Thanks,
Roland

2004-09-15 22:48:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..



On Wed, 15 Sep 2004, Deepak Saxena wrote:
>
> Since we are on the subject of io-access, I would like a
> clarification/opinion on the read*/write* & in*/out* accessors
> (and now the ioread/write equivalents). Are these functions only meant
> to be used for PCI memory-mapped devices or _any_ memory mapped devices?

It really depends on the bus architecture.

At some point, if the bus is different enough from a "normal" setup, you
should just use your own accessor functions. Trying to overload
"readl/writel" is just too painful.

However, at that point you should also realize that you can't re-use _any_
of the existing chip drivers, and you'll have to write your own. If the
bus is exotic enough, that's not a problem, and you'd have to do that
anyway. But there really aren't all that many "exotic" buses around any
more.

Quite frankly, of your two suggested interfaces, I would select neither.
I'd just say that if your bus is special enough, just write your own
drivers, and use

cookie = ixp4xx_iomap(dev, xx);
...
ixp4xx_iowrite(val, cookie + offset);

which is perfectly valid. You don't have to make these devices even _look_
like a PCI device. Why should you?

Linus

2004-09-15 23:12:44

by Deepak Saxena

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

On Sep 15 2004, at 15:46, Linus Torvalds was caught saying:
> Quite frankly, of your two suggested interfaces, I would select neither.
> I'd just say that if your bus is special enough, just write your own
> drivers, and use
>
> cookie = ixp4xx_iomap(dev, xx);
> ...
> ixp4xx_iowrite(val, cookie + offset);
>
> which is perfectly valid. You don't have to make these devices even _look_
> like a PCI device. Why should you?

Problem is that some of those devices are not that special. For example,
the on-board 16550 is accessed using readb/writeb in the 8250.c driver.
I don't think we want to add that level of low-level detail to that
driver and instead should just hide it in the platform code. I look
at it from the point of view that the driver should not care about how
the access actually occurs on the bus. It just says, write data foo at
location bar regardless of whether bar is ISA, PCI, on-chip, RapidIO,
etc and that writing of the data is hidden in the implementation of
the accessor API.

~Deepak


--
Deepak Saxena - dsaxena at plexity dot net - http://www.plexity.net/

"Unlike me, many of you have accepted the situation of your imprisonment
and will die here like rotten cabbages." - Number 6

2004-09-15 23:33:12

by Linus Torvalds

[permalink] [raw]
Subject: Re: Being more careful about iospace accesses..


[ Subject line changed to avoid getting caught as spam ;]

On Wed, 15 Sep 2004, Roland Dreier wrote:
>
> Linus, while we're on the subject of new sparse checks, could you give
> a quick recap of the semantics of the new __leXX types (and what
> __bitwise means to sparse)? I don't think I've ever seen this stuff
> described on LKML.

[ The bitwise checks are actually by Al Viro, but I'll explain the basic
idea. Al is Cc'd so that he can add any corrections or extensions. ]

Sparse allows a number of extra type qualifiers, including address spaces
and various random extra restrictions on what you can do with them. There
are "context" bits that allow you to use a symbol or type only in certain
contexts, for example, and there are type qualifiers like "noderef" that
just say that a pointer cannot be dereferenced (it looks _exactly_ like a
pointer in all other respects, but trying to actually access anything
through it will cause a sparse warning).

The "bitwise" attribute is very much like the "noderef" one, in that it
restricts how you can use an expression of that type. Unlike "noderef",
it's designed for integer types, though. In fact, sparse will refuse to
apply the bitwise attribute to non-integer types.

As the name suggests, a "bitwise" expression is one that is restricted to
only a certain "bitwise" operations that make sense within that class. In
particular, you can't mix a "bitwise" class with a normal integer
expression (the constant zero happens to be special, since it's "safe"
for all bitwise ops), and in fact you can't even mix it with _another_
bitwise expression of a different type.

And when I say "different", I mean even _slightly_ different. Each typedef
creates a type of it's own, and will thus create a bitwise type that is
not compatible with anything else. So if you declare

int __bitwise i;
int __bitwise j;

the two variables "i" and "j" are _not_ compatible, simply because they
were declared separately, while in the case of

int __bitwise i, j;

they _are_ compatible. The above is a horribly contrieved example, as it
shows an extreme case that doesn't make much sense, but it shows how
"bitwise" always creates its own new "class".

Normally you'd always use "__bitwise" in a typedef, which effectively
makes that particular typedef one single "bitwise class". After that, you
can obviously declare any number of variables in that class.

Now apart from the classes having to match, "bitwise" as it's name
suggests, also restricts all operations within that class to a subset of
"bit-safe" operations. For example, addition isn't "bit-safe", since
clearly the carry-chain moves bits around. But you can do normal bit-wise
operations, and you can compare the values against other values in the
same class, since those are all "bit-safe".

Oh, as an example of something that isn't obviously bit-safe: look out for
things like bit negation: doing a ~ is ok on an bitwise "int" type, but it
is _not_ ok on a bitwise "short" or "char". Why? Because on a bitwise
"int" you actually stay within the type. But doing the same thing on a
short or char will move "outside" the type by virtue of setting the high
bits (normal C semantics: a short gets promoted to an "int", so doign a
bitwise negation on a short will actually set the high bits).

So as far as sparse is concerned, a "bitwise" type is not really so much
about endianness as it is about making sure bits are never lost or moved
around.

For example, you can use the bitwise operation to verify the __GFP_XXX
mask bits. Right now they are just regular integers, which means that you
can write

kmalloc(GFP_KERNEL, size);

and the compiler will not notice anything wrong. But something is
_seriously_ wrong: the GFP_KERNEL should be the _second_ argument. If we
mark it to be a "bitwise" type (which it is), that bug would have been
noticed immediately, and you could still do all the operations that are
valid of GFP_xxx values.

See the usage?

In the byte-order case, what we have is:

typedef __u16 __bitwise __le16;
typedef __u16 __bitwise __be16;
typedef __u32 __bitwise __le32;
typedef __u32 __bitwise __be32;
typedef __u64 __bitwise __le64;
typedef __u64 __bitwise __be64;

and if you think about the above rules about what is acceptable for
bitwise types, you'll likely immediately notivce that it automatically
means

- you can never assign a __le16 type to any other integer type or any
other bitwise type. You'd get a warnign about incompatible types. Makes
sense, no?
- you can only do operations that are safe within that byte order. For
example, it is safe to do a bitwise "&" on two __le16 values. Clearly
the result is meaningful.
- if you want to go outside that bitwise type, you have to convert it
properly first. For example, if you want to add a constant to a __le16
type, you can do so, but you have to use the proper sequence:

__le16 sum, a, b;

sum = a + b; /* INVALID! "warning: incompatible types for operation (+)" */
sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b)); /* Ok */

See?

In short, "bitwise" is about more than just byte-order, but the semantics
of bitwise-restricted ops happen to be the semantics that are valid for
byte-order operations too.

Oh, btw, right now you only get the warnings from sparse if you use
"-Wbitwise" on the command line. Without that, sparse will ignore the
bitwise attribute.

Linus

2004-09-15 22:25:57

by Deepak Saxena

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

On Sep 15 2004, at 09:30, Linus Torvalds was caught saying:
> At the same time, we've had the proper "accessor" functions (read[bwl](),
> write[bwl]() and friends) that on purpose dropped all type information
> from the MMIO pointer, mostly just because of historical reasons, and as a
> result some drivers didn't use a pointer at all, but some kind of integer.
> Sometimes even one that couldn't _fit_ a MMIO address in it on a 64-bit
> machine.

Linus,

Since we are on the subject of io-access, I would like a
clarification/opinion on the read*/write* & in*/out* accessors
(and now the ioread/write equivalents). Are these functions only meant
to be used for PCI memory-mapped devices or _any_ memory mapped devices?
Same with ioremap(). I ask because there are bits of code in the
kernel that use these on non-PCI devices and this sometimes causes
some complication in platform-level code. For example, b/c of
the way PCI access work on the IXP4xx (indirect access via register
read/writes), we have to do the following to differentiate b/w
PCI and non-PCI devices (include/asm-arm/arch-ixp4xx/io.h):

static inline void *
__ixp4xx_ioremap(unsigned long addr, size_t size, unsigned long flags, unsigned long align)
{
extern void * __ioremap(unsigned long, size_t, unsigned long, unsigned long);
if((addr < 0x48000000) || (addr > 0x4fffffff))
return __ioremap(addr, size, flags, align);

return (void *)addr;
}

static inline void
__ixp4xx_writeb(u8 value, u32 addr)
{
u32 n, byte_enables, data;

if (addr >= VMALLOC_START) {
__raw_writeb(value, addr);
return;
}

n = addr % 4;
byte_enables = (0xf & ~BIT(n)) << IXP4XX_PCI_NP_CBE_BESL;
data = value << (8*n);
ixp4xx_pci_write(addr, byte_enables | NP_CMD_MEMWRITE, data);
}

#define writeb(p, v) __ixp4xx_writeb(p, v)

(0x48000000 -> 0x4fffffff is the PCI memory window on this CPU).

While this is not a huge level of uglyness, I have systems where
this is going to get much uglier b/c we have overlapping addresses
on different buses and we need to be able to differentiate accesses
It raises the question of whether we need a different interface
for non-PCI devices, if we should be passing a struct device into all
the I/O accessors functions to make it easier for platform code to
determine what to do, or if we should make I/O access functions a
property of devices. So instead of doing read*/write/in*/out*, we
would do either:

a) pass device into io-access routines:

cookie = iomap(dev, foo, len);
bar = read32(dev, cookie + offset);

or

b) make access routines a function of the devices themselves

cookie = dev->iomap(foo, len);
bar = dev->read32(cookie + REG_OFFSET);

The former is nicer b/c it allows the dev to be ignored on systems where
we do not care about PCI vs non-PCI devices.

Comments?

~Deepak

--
Deepak Saxena - dsaxena at plexity dot net - http://www.plexity.net/

"Unlike me, many of you have accepted the situation of your imprisonment
and will die here like rotten cabbages." - Number 6

2004-09-16 00:18:29

by Al Viro

[permalink] [raw]
Subject: Re: Being more careful about iospace accesses..

On Wed, Sep 15, 2004 at 04:26:12PM -0700, Linus Torvalds wrote:

> other bitwise type. You'd get a warnign about incompatible types. Makes
> sense, no?
> - you can only do operations that are safe within that byte order. For
> example, it is safe to do a bitwise "&" on two __le16 values. Clearly
> the result is meaningful.

BTW, so far the most frequent class of endianness bugs had been along the
lines of
foo->le16_field = cpu_to_le32(12);
and vice versa. On big-endian it's a guaranteed FUBAR - think carefully about
the value that will end up there.

> Oh, btw, right now you only get the warnings from sparse if you use
> "-Wbitwise" on the command line. Without that, sparse will ignore the
> bitwise attribute.

We probably want __attribute__((opaque)) in addition to bitwise - e.g. for
the handles of all sorts passed in network filesystem protocols. I'll look
into that when we get the endianness warnings somewhat under control.

For now I'm going to #define __opaque __bitwise and use it for stuff like
typedef __u32 __opaque cifs_fid;
etc.

2004-09-16 11:51:40

by David Woodhouse

[permalink] [raw]
Subject: Re: Being more careful about iospace accesses..

On Thu, 2004-09-16 at 01:10 +0100,
[email protected] wrote:
> On Wed, Sep 15, 2004 at 04:26:12PM -0700, Linus Torvalds wrote:
>
> > other bitwise type. You'd get a warnign about incompatible types. Makes
> > sense, no?
> > - you can only do operations that are safe within that byte order. For
> > example, it is safe to do a bitwise "&" on two __le16 values. Clearly
> > the result is meaningful.
>
> BTW, so far the most frequent class of endianness bugs had been along the
> lines of
> foo->le16_field = cpu_to_le32(12);
> and vice versa. On big-endian it's a guaranteed FUBAR - think carefully about
> the value that will end up there.

Is that really more frequent than just 'foo->le16_field = 12' ?
I'm surprised.

Certainly, it was the frequency of pure assignment without _any_ attempt
at byte-swapping which caused me to introduce the 'jint32_t' et al
structures in jffs2, which even gcc then bitches about if you use them
wrongly.

I suppose I can ditch those now -- I always intended to after a while
anyway.

--
dwmw2


2004-09-16 12:27:37

by David Woodhouse

[permalink] [raw]
Subject: Re: Being more careful about iospace accesses..

On Wed, 2004-09-15 at 16:26 -0700, Linus Torvalds wrote:
> - if you want to go outside that bitwise type, you have to convert it
> properly first. For example, if you want to add a constant to a __le16
> type, you can do so, but you have to use the proper sequence:
>
> __le16 sum, a, b;
>
> sum = a + b; /* INVALID! "warning: incompatible types for operation (+)" */
> sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b)); /* Ok */
>
> See?

Yeah right, that latter case is _so_ much more readable, and makes it
_so_ easy for the compiler to optimise precisely when it wants to do the
byte-swapping, especially if the back end has load-and-swap or
store-and-swap instructions. :)

It's even nicer when it ends up as:

sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b)); /* Ok */
sum |= c;
sum = cpu_to_le16(le16_to_cpu(sum) + le16_to_cpu(d));

I'd really quite like to see the real compiler know about endianness,
too. I dare say I _could_ optimise the above (admittedly contrived but
not _so_ unlikely) case, but I don't _want_ to hand-optimise my code --
that's what I keep a compiler _for_.

--
dwmw2


2004-09-16 12:51:55

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

On Wed, 15 Sep 2004, Deepak Saxena wrote:
> On Sep 15 2004, at 15:46, Linus Torvalds was caught saying:
> > Quite frankly, of your two suggested interfaces, I would select neither.
> > I'd just say that if your bus is special enough, just write your own
> > drivers, and use
> >
> > cookie = ixp4xx_iomap(dev, xx);
> > ...
> > ixp4xx_iowrite(val, cookie + offset);
> >
> > which is perfectly valid. You don't have to make these devices even _look_
> > like a PCI device. Why should you?
>
> Problem is that some of those devices are not that special. For example,
> the on-board 16550 is accessed using readb/writeb in the 8250.c driver.

Just what I was thinking...

> I don't think we want to add that level of low-level detail to that
> driver and instead should just hide it in the platform code. I look
> at it from the point of view that the driver should not care about how
> the access actually occurs on the bus. It just says, write data foo at
> location bar regardless of whether bar is ISA, PCI, on-chip, RapidIO,
> etc and that writing of the data is hidden in the implementation of
> the accessor API.

While 16550 serial is a bad example since it does byte accesses only (and thus
doesn't suffer from endianness), I have no problem to imagine there exist
platforms where you have multiple instances of a `standard' (cfr. 16550 serial)
device block, while each of them must be accessed differently:
- one of them is in PCI I/O space (little endian)
- one of them is in PCI MMIO space (little endian)
- one of them is on-chip MMIO (big endian)
- one of them is somewhere else, but sparsely addressed (some bytes of
padding between each register)
and we can for sure come up with a few more examples of weird addressing.

How to solve this, without cluttering each ioread*() with many if clauses?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2004-09-16 14:01:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: Being more careful about iospace accesses..



On Thu, 16 Sep 2004, David Woodhouse wrote:

> On Wed, 2004-09-15 at 16:26 -0700, Linus Torvalds wrote:
> > - if you want to go outside that bitwise type, you have to convert it
> > properly first. For example, if you want to add a constant to a __le16
> > type, you can do so, but you have to use the proper sequence:
> >
> > __le16 sum, a, b;
> >
> > sum = a + b; /* INVALID! "warning: incompatible types for operation (+)" */
> > sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b)); /* Ok */
> >
> > See?
>
> Yeah right, that latter case is _so_ much more readable

It's not about readability.

It's about the first case being WRONG!

You can't add two values in the wrong byte-order. It's not an operation
that makes sense. You _have_ to convert them to CPU byte order first.

I certainly agree that the first version "looks nicer".

> It's even nicer when it ends up as:
>
> sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b)); /* Ok */
> sum |= c;
> sum = cpu_to_le16(le16_to_cpu(sum) + le16_to_cpu(d));

This is actually the strongest argument _against_ hiding endianness in the
compiler, or hiding it behind macros. Make it very explicit, and just make
sure there are tools (ie 'sparse') that can tell you when you do something
wrong.

Any programmer who sees the above will go "well that's stupid", and
rewrite it as something saner instead. You can certainly rewrite it as

cpu_sum = le16_to_cpu(a) + le16_to_cpu(b);
cpu_sum |= le16_to_cpu(c);
cpu_sum += le16_to_cpu(d);
sum = cpu_to_le16(d);

which gets rid of the double conversions.

But if you hide the endianness in macro's, you'll never see the mess at
all, and won't be able to fix it.

> I'd really quite like to see the real compiler know about endianness,
> too.

I would have agreed with you some time ago. Having been bitten by too damn
many bompiler bugs I'e become convinced that the compiler doing things
behind your back to "help" you just isn't worth it. Not in a kernel, at
least. It's much better to build up good typechecking and the
infrastructure to help you get the job done.

Expressions like the above might happen once or twice in a project with
several million lines of code. It's just not worth compiler infrastructure
for - that just makes people use it as if it is free, and impossible to
find the bugs when they _do_ happen. Much better to have a type system
that can warn about the bad uses, but that doesn't actually change any of
the code generated..

Linus

2004-09-16 15:07:21

by Horst H. von Brand

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

Linus Torvalds <[email protected]> said:
> On Wed, 15 Sep 2004, J?rn Engel wrote:
> >
> > But it still leaves me confused. Before I had this code:

[...]

> No, you can certainly continue to use non-void pointers. The "void __iomem
> *" case is just the typeless one, exactly equivalent to regular void
> pointer usage.
>
> So let me clarify my original post with two points:

[...]

Could you please put the clarified message into Documentation? It would be
a shame if it got lost.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513

2004-09-16 22:14:16

by Deepak Saxena

[permalink] [raw]
Subject: Re: Being more anal about iospace accesses..

On Sep 16 2004, at 14:51, Geert Uytterhoeven was caught saying:
> While 16550 serial is a bad example since it does byte accesses only (and thus
> doesn't suffer from endianness), I have no problem to imagine there exist
> platforms where you have multiple instances of a `standard' (cfr. 16550 serial)

No longer true. We now have UPIO_IOMEM32 for some of the fscked up HW
that large silicon manufacturer has released with 32-bit wide registers
that must be accessed as full 32-bits.

> device block, while each of them must be accessed differently:
> - one of them is in PCI I/O space (little endian)
> - one of them is in PCI MMIO space (little endian)
> - one of them is on-chip MMIO (big endian)
> - one of them is somewhere else, but sparsely addressed (some bytes of
> padding between each register)
> and we can for sure come up with a few more examples of weird addressing.
>
> How to solve this, without cluttering each ioread*() with many if clauses?

If clauses will still be needed, the only difference is that instead
of basing them on hardcoded addresses we now base them on
the devices coming in.

~Deepak

--
Deepak Saxena - dsaxena at plexity dot net - http://www.plexity.net/

"Unlike me, many of you have accepted the situation of your imprisonment
and will die here like rotten cabbages." - Number 6

2004-09-18 13:06:18

by kaih

[permalink] [raw]
Subject: Re: Being more careful about iospace accesses..

[email protected] (Linus Torvalds) wrote on 16.09.04 in <[email protected]>:

> On Thu, 16 Sep 2004, David Woodhouse wrote:
>
> > On Wed, 2004-09-15 at 16:26 -0700, Linus Torvalds wrote:
> > > - if you want to go outside that bitwise type, you have to convert it
> > > properly first. For example, if you want to add a constant to a
> > > __le16 type, you can do so, but you have to use the proper sequence:
> > >
> > > __le16 sum, a, b;
> > >
> > > sum = a + b; /* INVALID! "warning: incompatible types for operation
> > > (+)" */ sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b)); /* Ok */
> > >
> > > See?
> >
> > Yeah right, that latter case is _so_ much more readable
>
> It's not about readability.
>
> It's about the first case being WRONG!

... in general.

And on the machines where it works (le machines), I'd certainly expect
those conversion functions to be trivially eliminated by the compiler (ie,
they're either trivial macros or trivial inline functions).

> > I'd really quite like to see the real compiler know about endianness,
> > too.

Well, these day, optimizers often can recognize the usual endianness
conversion idioms, so the compiler still gets a chance at inserting your
load-with-swap or whatever.

MfG Kai