2007-01-25 14:58:14

by Alan

[permalink] [raw]
Subject: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

Fixes bogus accesses to ports 0-15 with a non DMA capable controller.
This I think should go in for 2.6.20

Arguably it shouldn't be called for PIO commands at all but thats a
matter for Jeff to decide

Signed-off-by: Alan Cox <[email protected]>

diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.6.20-rc4-mm1/drivers/ata/libata-sff.c linux-2.6.20-rc4-mm1/drivers/ata/libata-sff.c
--- linux.vanilla-2.6.20-rc4-mm1/drivers/ata/libata-sff.c 2007-01-22 16:26:50.000000000 +0000
+++ linux-2.6.20-rc4-mm1/drivers/ata/libata-sff.c 2007-01-24 17:31:40.000000000 +0000
@@ -827,7 +827,8 @@
*/
void ata_bmdma_post_internal_cmd(struct ata_queued_cmd *qc)
{
- ata_bmdma_stop(qc);
+ if (qc->ap->ioaddr.bmdma_addr)
+ ata_bmdma_stop(qc);
}

#ifdef CONFIG_PCI


2007-01-25 16:14:37

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

On Thu, 2007-01-25 at 15:09 +0000, Alan wrote:
>
> diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.6.20-rc4-mm1/drivers/ata/libata-sff.c linux-2.6.20-rc4-mm1/drivers/ata/libata-sff.c
> --- linux.vanilla-2.6.20-rc4-mm1/drivers/ata/libata-sff.c 2007-01-22 16:26:50.000000000 +0000
> +++ linux-2.6.20-rc4-mm1/drivers/ata/libata-sff.c 2007-01-24 17:31:40.000000000 +0000
> @@ -827,7 +827,8 @@
> */
> void ata_bmdma_post_internal_cmd(struct ata_queued_cmd *qc)
> {
> - ata_bmdma_stop(qc);
> + if (qc->ap->ioaddr.bmdma_addr)
> + ata_bmdma_stop(qc);
> }

But what if the bmdma_addr _is_ zero? Please, let's not allow the "zero
is not a valid number" braindamage to spread any further than the IRQ
setup it's already broken.

All the world is not a PeeCee.

--
dwmw2

2007-01-25 16:17:32

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

David Woodhouse wrote:
> On Thu, 2007-01-25 at 15:09 +0000, Alan wrote:
>> diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.6.20-rc4-mm1/drivers/ata/libata-sff.c linux-2.6.20-rc4-mm1/drivers/ata/libata-sff.c
>> --- linux.vanilla-2.6.20-rc4-mm1/drivers/ata/libata-sff.c 2007-01-22 16:26:50.000000000 +0000
>> +++ linux-2.6.20-rc4-mm1/drivers/ata/libata-sff.c 2007-01-24 17:31:40.000000000 +0000
>> @@ -827,7 +827,8 @@
>> */
>> void ata_bmdma_post_internal_cmd(struct ata_queued_cmd *qc)
>> {
>> - ata_bmdma_stop(qc);
>> + if (qc->ap->ioaddr.bmdma_addr)
>> + ata_bmdma_stop(qc);
>> }
>
> But what if the bmdma_addr _is_ zero? Please, let's not allow the "zero
> is not a valid number" braindamage to spread any further than the IRQ
> setup it's already broken.

Read the code... This test is already widely in use in libata.

Jeff



2007-01-25 16:19:18

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

On Thu, 2007-01-25 at 11:17 -0500, Jeff Garzik wrote:
> Read the code... This test is already widely in use in libata.

That doesn't make it correct; it just means it's your fault not Alan's,
right? :)

--
dwmw2

2007-01-25 16:23:16

by Russell King

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

On Thu, Jan 25, 2007 at 11:17:20AM -0500, Jeff Garzik wrote:
> David Woodhouse wrote:
> >On Thu, 2007-01-25 at 15:09 +0000, Alan wrote:
> >>diff -u --new-file --recursive --exclude-from /usr/src/exclude
> >>linux.vanilla-2.6.20-rc4-mm1/drivers/ata/libata-sff.c
> >>linux-2.6.20-rc4-mm1/drivers/ata/libata-sff.c
> >>--- linux.vanilla-2.6.20-rc4-mm1/drivers/ata/libata-sff.c
> >>2007-01-22 16:26:50.000000000 +0000
> >>+++ linux-2.6.20-rc4-mm1/drivers/ata/libata-sff.c 2007-01-24
> >>17:31:40.000000000 +0000
> >>@@ -827,7 +827,8 @@
> >> */
> >> void ata_bmdma_post_internal_cmd(struct ata_queued_cmd *qc)
> >> {
> >>- ata_bmdma_stop(qc);
> >>+ if (qc->ap->ioaddr.bmdma_addr)
> >>+ ata_bmdma_stop(qc);
> >> }
> >
> >But what if the bmdma_addr _is_ zero? Please, let's not allow the "zero
> >is not a valid number" braindamage to spread any further than the IRQ
> >setup it's already broken.
>
> Read the code... This test is already widely in use in libata.

Ditto. The only interpretation that can be placed upon DMA addresses
is done by dma_mapping_error(), which _is_ a per-architecture defined
test because these things aren't defined in the API.

The zero DMA address is absolutely and totally valid. It might not
correspond with physical address zero on platforms.

Also, DMA address zero is not the same as NULL.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:

2007-01-25 16:26:37

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

On Thu, 2007-01-25 at 16:22 +0000, Russell King wrote:
> The zero DMA address is absolutely and totally valid. It might not
> correspond with physical address zero on platforms.
>
> Also, DMA address zero is not the same as NULL.

This isn't actually "DMA address" in that context -- it's the I/O port
number at which the controller's BM-DMA control registers reside. But
still it can be zero.

--
dwmw2

2007-01-25 17:16:43

by Alan

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

> > void ata_bmdma_post_internal_cmd(struct ata_queued_cmd *qc)
> > {
> > - ata_bmdma_stop(qc);
> > + if (qc->ap->ioaddr.bmdma_addr)
> > + ata_bmdma_stop(qc);
> > }
>
> But what if the bmdma_addr _is_ zero? Please, let's not allow the "zero
> is not a valid number" braindamage to spread any further than the IRQ
> setup it's already broken.

The fix matches the rest of libata on this and fixes a bug that wants
fixing urgently for 2.6.20.

If you want to put your bmdma address at zero then libata-sff won't help
you at the moment, and assumes zero is a safe "not in use" value because
the PCI layer also takes a similar view in places.

libata-sff and thus ata_bmdma_post_internal_cmd() aren't used by non PCI
devices so the bmdma test is ok providing it stays within libata-sff. I
don't think its a problem, although clearly its an assumption that the
core code rather than the SFF code cannot make.

drivers/ide uses roughly the same assumptions without problem.

Alan

2007-01-25 17:56:54

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers



On Thu, 25 Jan 2007, Alan wrote:
>
> If you want to put your bmdma address at zero then libata-sff won't help
> you at the moment, and assumes zero is a safe "not in use" value because
> the PCI layer also takes a similar view in places.

Indeed. Zero means "not in use". It really is that simple.

And I'm sorry, David, but 99% of the world _is_ a PC, and that is where
PCI and ATA came from, so anybody who thinks that zero is a valid PCI
address is just sadly mistaken and has a bug. In fact, even on a hardware
level, a lot of devices will literally have "zero means disabled".

Broken architectures that put PCI things at some "PCI physical address
zero" need to map their PCI addresses to something else. It's part of why
we have the whole infrastructure for doing things like

pcibios_bus_to_resource()
pcibios_resource_to_bus()

etc - not just because a resource having zero in "start" means that it is
disabled, but because normally such broken setups have *other* problems
too (ie they don't have a 1:1 mapping between PCI addresses and physical
addresses _anyway_, so they need to address translations).

This has come up before. For example: for an IRQ, 0 means "does not
exist", it does _not_ mean "physical irq 0", and we test for whether a
device has a valid irq by doing "if (dev->irq)" rather than having some
insane archiecture-specific "IRQ_NONE". And if you validly really have an
irq at the hardware level that is zero, then that just means that the irq
numbers you should tell the kernel should be translated some way.

(On a PC, hardware irq 0 is a real irq too, but it's a _special_ irq, and
it is set up by architecture-specific code. So as far as the generic
kernel and all devices are concerned, "!dev->irq" means that the irq
doesn't exist or hasn't been mapped for that device yet).

So there are three issues:

- we always need a way of saying "not mapped"/"nonexistent", and using
the value zero is the one that GIVES THE CLEANEST SOURCE CODE! It's why
NULL pointers are zero too. Sure, virtual address zero is a real
virtual address, but that doesn't change the fact that C made 0 be
special on a language level. If you want to access that virtual
address, and use NULL as a "doesn't exist" at the same time, you need
to swizzle your pointer somehow.

- the x86[-64] architecture is the one that gets tested the most. So
everybody else should try to look like it, rather than say "we are
different/better/strange". Don't have any special IO instructions?
Tough. You'd better do "inb/outb" anyway, and map them onto your
memory-mapped IO somehow.

- per-architecture magic values is a bad idea. If you need a magic value
(and things like this _do_ need one, unless we always want to carry a
separate flag around saying "valid" or "not valid"), it's a lot better
to just say "everybody uses this value" and then have the _small_
architecture-specific code work around it, than have everybody have to
work around a lot of architectures doing things differently.

All these issues boil down to the same thing: whenever at all physically
possible, we make sure that everything looks as much as a PC as possible.
Only when that literally isn't an option do we add other abstractions.

Linus

2007-01-25 23:36:24

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

Alan wrote:
> Fixes bogus accesses to ports 0-15 with a non DMA capable controller.
> This I think should go in for 2.6.20

applied to #upstream-fixes, but it's a hack based on a misunderstanding.
See comments below for further work needed.


> Arguably it shouldn't be called for PIO commands at all but thats a
> matter for Jeff to decide

You are getting misled by the function name.

ata_bmdma_post_internal_cmd() is the common ->post_internal_cmd() hook
for BMDMA-like (SFF-like) controllers. ->post_internal_cmd() hook will
always be called, for all commands, when present.

For PIO-only controllers, simply delete the post_internal_cmd hook from
that specific driver's ata_port_operations. (assuming no other cleanup
is needed)

For other SFF controllers, perhaps ata_bmdma_post_internal_cmd() should
be revised to check the taskfile protocol (PIO, DMA, ...)?

I leave that up to your judgement, to figure out what's best. I
certainly AGREE that an unconditional ata_bmdma_stop() for all commands,
for all taskfile protocols, sounds wrong.

Jeff



2007-01-26 00:23:24

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

On Thu, 2007-01-25 at 09:56 -0800, Linus Torvalds wrote:
> Broken architectures that put PCI things at some "PCI physical address
> zero" need to map their PCI addresses to something else. It's part of why
> we have the whole infrastructure for doing things like
>
> pcibios_bus_to_resource()
> pcibios_resource_to_bus()
>
> etc - not just because a resource having zero in "start" means that it is
> disabled, but because normally such broken setups have *other* problems
> too (ie they don't have a 1:1 mapping between PCI addresses and physical
> addresses _anyway_, so they need to address translations).

You're thinking of MMIO, while the case we were discussing was PIO. My
laptop is perfectly happy to assign PIO resources from zero.

The claim that "zero is disabled" is a relatively new (and IMO broken)
thing, and doesn't seem to be universal. This bizarre new special case
certainly isn't shared by the PCMCIA code, for example, which is quite
happy to put devices are PIO address 0...

shinybook /root # cat /sys/class/pcmcia_socket/pcmcia_socket0/available_resources_io
0x00000000 - 0x007fffff
shinybook /root # dmesg | grep 'max PIO'
ata2: PATA max PIO0 cmd 0x0 ctl 0xE bmdma 0x0 irq 53
ata2.00: CFA, max PIO4, 500400 sectors: LBA

I believe PCMCIA just uses the generic resource code, which also seems
to lack any knowledge of this hackish special case for zero.

> This has come up before. For example: for an IRQ, 0 means "does not
> exist", it does _not_ mean "physical irq 0", and we test for whether a
> device has a valid irq by doing "if (dev->irq)" rather than having some
> insane archiecture-specific "IRQ_NONE". And if you validly really have an
> irq at the hardware level that is zero, then that just means that the irq
> numbers you should tell the kernel should be translated some way.
>
> (On a PC, hardware irq 0 is a real irq too, but it's a _special_ irq, and
> it is set up by architecture-specific code. So as far as the generic
> kernel and all devices are concerned, "!dev->irq" means that the irq
> doesn't exist or hasn't been mapped for that device yet).

So again you end up in a situation where zero is a strange special case.
It works sometimes (setup_irq()) but not other times, and has meaning in
some places but not others. Once we go tickless and I start playing with
the PC speaker audio driver again, that hackish special case is going to
be fun to play with, I'm sure.

And your IRQ numbers no longer match the labels on the IRQ lines in the
hardware documentation; you have to have some 'mapping', which should
ideally be used for user-visible displays of IRQ numbers too.

That doesn't really sound much like the 'CLEANEST SOURCE CODE' to me.

Programmers do seem to cope with comparing file descriptors with -1.
Counting from zero is quite normal. But would you suggest that our "zero
is invalid" policy should extend to those too, just in case our kernel
hackers can't manage? Obviously userspace programs will still expect
stdin to be fd #0 but glibc can fix that up for us -- it can handle the
"should be translated in some way" requirement of which you speak, so
that we can have the special case which seems to be permeating the
kernel.

> So there are three issues:
>
> - we always need a way of saying "not mapped"/"nonexistent", and using
> the value zero is the one that GIVES THE CLEANEST SOURCE CODE! It's why
> NULL pointers are zero too. Sure, virtual address zero is a real
> virtual address, but that doesn't change the fact that C made 0 be
> special on a language level. If you want to access that virtual
> address, and use NULL as a "doesn't exist" at the same time, you need
> to swizzle your pointer somehow.

Really, it doesn't make the code cleaner when you have to add a whole
layer of translation just because of a single misguided special case.
The "special case" of NULL isn't really a special case because we just
refrain from _mapping_ that page. We don't actually treat it any
differently.

> - the x86[-64] architecture is the one that gets tested the most. So
> everybody else should try to look like it, rather than say "we are
> different/better/strange". Don't have any special IO instructions?
> Tough. You'd better do "inb/outb" anyway, and map them onto your
> memory-mapped IO somehow.

I look forward to pretending the world is little-endian and
dma-coherent, and doesn't need memory barriers :)

> - per-architecture magic values is a bad idea. If you need a magic value
> (and things like this _do_ need one, unless we always want to carry a
> separate flag around saying "valid" or "not valid"), it's a lot better
> to just say "everybody uses this value" and then have the _small_
> architecture-specific code work around it, than have everybody have to
> work around a lot of architectures doing things differently.

It doesn't need to be per-architecture; it can just be -1. The only
reason it was per-arch before is because some architectures were using
zero and someone considered it easier to #define NO_IRQ rather than just
making it consistent.

> All these issues boil down to the same thing: whenever at all physically
> possible, we make sure that everything looks as much as a PC as possible.
> Only when that literally isn't an option do we add other abstractions.

No, Linux is better than that. We tend to use models which encompass the
PC and other systems, and we're actually quite good at it.

The argument falls down a little more when we observe that our
assumptions about zero aren't necessarily valid on a PC either. On a PC,
IRQ0 _is_ valid, and the current code is inconsistent and confusing.
Even the PC needs some hackish "IRQ mapping" if we actually put our
money where your mouth is and make the code self-consistent by doing
something like this:

--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -220,7 +220,7 @@ int setup_irq(unsigned int irq, struct irqaction *new)
unsigned long flags;
int shared = 0;

- if (irq >= NR_IRQS)
+ if (!irq || irq >= NR_IRQS)
return -EINVAL;

if (desc->chip == &no_irq_chip)


PIO address 0 is also valid on a PC -- assuming you have the NO_ISA bit
turned off on any PCI bridges in the path. And even on a PC I've
sometimes had to manually clear that bit in order to get PCMCIA resource
assignment to work.

PCMCIA resource assignment can be hard enough already without
gratuitously throwing away a perfectly valid location on the grounds
that we've lost the ability to use zero.

This is a computer. We count from zero.

--
dwmw2

2007-01-26 01:28:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers



On Fri, 26 Jan 2007, David Woodhouse wrote:
>
> You're thinking of MMIO, while the case we were discussing was PIO. My
> laptop is perfectly happy to assign PIO resources from zero.

I was indeed thinking MMIO, but I really think it should extend to PIO
also. It certainly is (again) true on PC's, where the low IO space is
special and reserved for motherboard/system devices.

If you want to be different, that's YOUR problem. Some architectures have
tried to look different, and then drivers break on them, but they have
only themselves to blame.

> I believe PCMCIA just uses the generic resource code, which also seems
> to lack any knowledge of this hackish special case for zero.

The resource code actually knows what "enabled" means. A lot of other code
does not.

> > kernel and all devices are concerned, "!dev->irq" means that the irq
> > doesn't exist or hasn't been mapped for that device yet).
>
> So again you end up in a situation where zero is a strange special case.

No. We end up in a situation where *drivers* never have any strange or
special cases.

You need to have a "no irq" thing. It might as well be zero, since that is
not just the de-facto standard, it's also the one and only value that
leads to easily readable source-code (ie test it as a boolean in C).

The exact same thing has been true of MMIO. I would be not at all
surprised if several drivers do the same for PIO.

It's something you can trust on a PC. See above on your problems if you
decide that you want to be "generic" and use a value that is illegal on
99% of all hardware.

> It doesn't need to be per-architecture; it can just be -1.

Bollocks. People tried that. People tried to force this idiotic notion of
"NO_IRQ" down my throat for several years. I even accepted it.

And then, after several years, when it was clear that it still didn't
work, and drivers just weren't getting updated, it was time to just face
reality: if the choice is between 0 and -1, 0 is simply much easier for
the bulk of the code.

Live with it, or don't. I really don't care what you do on your hardware.
But if you can't face that

if (!dev->irq)
..

is simpler for people to write, and that it's what we've done for a long
time, then that really is YOUR problem.

The exact same issues have been true in MMIO. Some code will keep track of
separate "enabled" bits: the resource management code is such code. Guess
what? Not a lot of drivers tend to do that. You can try to fight
windmills, or you can just accept that the very language we use (namely,
C) has made 0 be special, and tends to be used to say "nobody home" simply
because it has that special meaning for a C compiler.

And I bet there are PIO devices out there that consider address zero to be
disabled. For EXACTLY the same reason.

(And yes, hardware actually tends to do the same thing. For PCI irq
routing registers, an irq value of 0 pretty much universally means
"disabled". In fact, even your lovely Cardbus example actually is an
example of exactly this: the very IO limit registers are DEFINED IN
HARDWARE to special-case address zero - so that making the base/limit
registers be zero actually disables the IO window, rather than making it
mean "four IO bytes at address zero").

But hey, if it works for you, go wild. Just don't expect drivers to always
work.

Linus

2007-01-26 01:45:43

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

Linus Torvalds wrote:
>
> On Fri, 26 Jan 2007, David Woodhouse wrote:
>> You're thinking of MMIO, while the case we were discussing was PIO. My
>> laptop is perfectly happy to assign PIO resources from zero.
>
> I was indeed thinking MMIO, but I really think it should extend to PIO
> also. It certainly is (again) true on PC's, where the low IO space is
> special and reserved for motherboard/system devices.

Via the even-less-of-an-excuse-than-you-thought department:

Many (most?) non-x86 handle PIO via special mappings and additional
serialization instructions, but otherwise treat PIO register space in a
very similar manner to MMIO.

Thus, it's /easier/ on non-x86 to ensure that PIO addresses never land
at zero, because you must remap /anyway/. It's only on x86 that PIO
register spaces are accessed by vastly different CPU instructions. Most
other arches convert PIO accesses into massage+mmio R/W+massage.

On sparc64, for example, after I pointed this out to DaveM, he was able
to implement the new iomap interface without the 'if (pio-mem-area)'
branch present on x86.

Jeff



2007-01-26 02:01:54

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers



On Thu, 25 Jan 2007, Jeff Garzik wrote:
>
> On sparc64, for example, after I pointed this out to DaveM, he was able to
> implement the new iomap interface without the 'if (pio-mem-area)' branch
> present on x86.

However, in all honesty, we have triggered bugs in that area too, simply
because some driver code "knew" that PIO addresses could fit in 16 bits,
and used u16 or "unsigned short" to remember the PIO address. Both ARM and
Sparc was bitten by this, although usually the issue is trivial to fix
once found.

Also, many ISA-only drivers actually have hardcoded PIO numbers (eg
"0x1f0").

But yes, I would generally suggest that architectures where the PIO range
is really just another magic MMIO range (which is most of the non-x86
world, as you point out) might as well at least aim for doing the
remapping early (ie with "pci_resource_start()")

Making that easy was one of my goals for the "new" IO accessor functions,
in fact.

Not that many people actually use them.

So *if* you use the new "iomap" interfaces, and the new "pci_iomap()"
things, that should actually not just allow drivers (like the ATA layer)
to share much more code between the PIO and MMIO cases, but it hopefully
actually makes it easier for strange architectures to do it all.

So traditionally, we've had PIO be "limited integer addresses, and some
drivers know magic numbers", but hopefully new drivers could at least try
to use some of the infrastructure where we try to help people not have to
deal with it so much as a special case any more.

Linus

2007-01-26 02:11:40

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

Linus Torvalds wrote:
> So *if* you use the new "iomap" interfaces, and the new "pci_iomap()"
> things, that should actually not just allow drivers (like the ATA layer)
> to share much more code between the PIO and MMIO cases, but it hopefully
> actually makes it easier for strange architectures to do it all.


Another aside: Tejun took my libata iomap and came up with something
I'm quite happy with, so libata will /finally/ switch over to using the
new iomap interfaces for 98% of all drivers, as of 2.6.21.

Look at "[PATCHSET] Managed device resources, take #3" on LKML if you're
interested. Tejun's 'devres' stuff makes it a lot easier for drivers to
reserve, map, unmap, and free various hardware resources.

Jeff


2007-01-26 02:24:15

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

On Thu, 2007-01-25 at 17:28 -0800, Linus Torvalds wrote:
>
> On Fri, 26 Jan 2007, David Woodhouse wrote:
> >
> > You're thinking of MMIO, while the case we were discussing was PIO. My
> > laptop is perfectly happy to assign PIO resources from zero.
>
> I was indeed thinking MMIO, but I really think it should extend to PIO
> also. It certainly is (again) true on PC's, where the low IO space is
> special and reserved for motherboard/system devices.

As you wish.

There's a trade-off to be made. I happen to disagree with your choice --
you seem to want to introduce special cases and new layers of
'translation', and in some (admittedly, non-PC and relatively rare)
cases reduce functionality just to account for the fact that Linux
driver authors are fairly incompetent, and I think that's the wrong
choice.

So although I've mostly given up on interrupts¹, I reserve the right to
object if the "zero is not a real number" fallacy extends itself into
new areas. The thing about PIO addresses is an example of that. The
ide-cs and pata_pcmcia drivers (and indeed most other drivers AFAICT;
certainly all PCMCIA drivers I've used in my laptop) work _fine_ with
their main PIO address being zero. This thread started when I noticed
pata_pcmcia getting it wrong if its _BMDMA_ I/O range is zero.

Certainly, the resource code does not know about this newly-invented
special case for PIO address zero and is happy to assign it.

> > It doesn't need to be per-architecture; it can just be -1.
>
> Bollocks. People tried that. People tried to force this idiotic notion of
> "NO_IRQ" down my throat for several years. I even accepted it.
>
> And then, after several years, when it was clear that it still didn't
> work, and drivers just weren't getting updated, it was time to just face
> reality: if the choice is between 0 and -1, 0 is simply much easier for
> the bulk of the code.

The quality of our drivers is low; I'm fully aware that trying to
improve driver quality is a quixotic task. But where do we draw the
line? Should we abandon the dma-mapping stuff too? Declare that page
zero is a special case and you can't DMA to it? Should we try to make
every PCI write also do a read in order to flush posted writes, because
people can't cope with the real world?

> Live with it, or don't. I really don't care what you do on your hardware.
> But if you can't face that
>
> if (!dev->irq)
> ..
>
> is simpler for people to write, and that it's what we've done for a long
> time, then that really is YOUR problem.

Even userspace people seem to cope with it in the case of file
descriptors. Kernel people have to cope too, for stuff like DMA
addresses.

> And I bet there are PIO devices out there that consider address zero to be
> disabled. For EXACTLY the same reason.

> (And yes, hardware actually tends to do the same thing. For PCI irq
> routing registers, an irq value of 0 pretty much universally means
> "disabled". In fact, even your lovely Cardbus example actually is an
> example of exactly this: the very IO limit registers are DEFINED IN
> HARDWARE to special-case address zero - so that making the base/limit
> registers be zero actually disables the IO window, rather than making it
> mean "four IO bytes at address zero").

My example was 16-bit PCMCIA (actually CompactFlash), and it's
_certainly_ not true in that case. Devices often don't bother to decode
address lines higher than the two or three they need to tell which of
their registers is being accessed; they let the socket do the equivalent
of the 'chip select' decode.

--
dwmw2

¹ Mostly. I still wonder occasionally if we could use _pointers_ in the
generic code, and let drivers deal with a (struct irq_desc *) instead of
just a number. Since the old system of ISA IRQ numbering is fairly out
of date now and we're starting to recognise the fact that IRQs are a
_tree_, we might get away with it. And it would let us preserve the special
case for NULL -- but I haven't had time to fully work through the
implications.

2007-01-26 02:59:13

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers



On Fri, 26 Jan 2007, David Woodhouse wrote:
>
> The quality of our drivers is low; I'm fully aware that trying to
> improve driver quality is a quixotic task.

This is an important point. I actually hold things like the kernel VM
layer to _much_ higher standards than I would ever hold a driver writer.
That said, we tend to have "0 is special" even there, although we tend to
try to make it always be about a pointer for those cases.

But drivers really are not just the bulk of the code, but they also can't
be tested nearly as well. So for that reason (and really _only_ for that
reason), we should always just accept that a "driver" should never have to
worry, and *all* of the inconvenience of some special case or whatever
must be solidly elsewhere.

> But where do we draw the line? Should we abandon the dma-mapping stuff
> too? Declare that page zero is a special case and you can't DMA to it?
> Should we try to make every PCI write also do a read in order to flush
> posted writes, because people can't cope with the real world?

I think we should try to aim for two things:

- things that "just work" on a PC should generally "just work" everywhere
else. Just for the simple reason that most drivers never get tested
anywhere else.

- if it can't "just work", we should have as many static checks as
possible, and not let it compile.

- and if it does compile, the stuff that "looks simple" should just work.

For example, the reason "0" is special really _is_ about the compiler. For
any integer (or pointer, or FP) type in C, there really is just one
special value. 0 (or NULL, for pointers).

The reason why resources work fine with zeroes is that for a *resource*,
zero isn't actually a special value at all. Why? A resource isn't an
integer value. It's a struct.

So compiler type safety (or lack thereof) really ends up forcing some of
these issues. If something is represented as an integral type, the only
*obvious* real special case is always going to be 0, just because it's the
only one you can "test for" implicitly. Similarly, if you return a
pointer, you only have NULL that can sanely be used as a special value.

(Yeah, mmap() has taught some people about MAP_FAIL, but that's pretty
unusual too. And in the VFS layer, we use the magic "error pointers" that
actually encode error values in the bits too - but then, VFS people tend
to have to know a lot more about the kernel than a driver writer should
have to - VFS people are just held to higher standards).

With a pointer, NULL is usually ok anyway, and always tends to mean
something special - and for the other stuff you can then hide things
"behind" the pointer. So with a pointer, you could often have "ptr->valid"
or something else. With an integer, you can't really do that, because 0
always remains special, just because EVEN JUST BY MISTAKE the test of the
integer actually just ends up making zero be special.

So if you want a separate "enable" value, it almost has to be a structure
or some opaque type, because that's the only type where there isn't an
implicit special value.

We've done it occasionally. The VM has done it, for example. And we do it
in drivers for more complex cases (and resources is one such case: it's
already not a single value, since it has a start and a length, and other
structure).

We could have done it for interrupts too. A "struct irqnum" that has a bit
that specifies "valid". That would work. But it tends to be painful, so it
really has to give you something more than "zero is disabled".

It's just not worth it.

And it's why I decreed, that the ONLY SANE THING is to just let people do
the obvious thing:

if (!dev->irq)
return -ENODEV;

you don't have to know ANYTHING, and that code just works, and just looks
obvious. And you know what? If it causes a bit of pain for some platform
maintainer, I don't care one whit. Because it's obviously much better than
the alternatives.

We may not "need" that rule for IO ports. If IO port 0 "happens to work",
then hey, fine. But on the other hand, _all_ the same arguments really
end up being still 100% true. If some driver happens to write

if (!dev->ioport)
return -ENODEV;

then I say: go for it. The _driver_ is correct. It's the obvious thing to
do. It works on PC's, and it's simple and looks fine. Why not? If it
causes some minor heartburn for an architecture maintainer, so what?
Really? The tradeoff is obvious, and the architecture maintainer needs to
just go and fix his IO mappings so that no device ever sees a "valid" port
at port 0.

But in the meantime: if nobody complains, and it happens to work on
hardware even though some devices _can_ see a port of zero, I also don't
care. So I'm certainly not going to claim that your laptop "must be
fixed". If it works, it works. Hey fine.

But the first driver that doesn't work because it thought it didn't have
an IO port (beause it was zero), and the first time somebody complains, I
know *exactly* whose fault it is. It's not the driver.

Linus

2007-01-26 03:28:58

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

On Thu, 2007-01-25 at 18:58 -0800, Linus Torvalds wrote:
> And it's why I decreed, that the ONLY SANE THING is to just let people do
> the obvious thing:
>
> if (!dev->irq)
> return -ENODEV;
>
> you don't have to know ANYTHING, and that code just works, and just looks
> obvious. And you know what? If it causes a bit of pain for some platform
> maintainer, I don't care one whit. Because it's obviously much better than
> the alternatives.

I do understand the benefits; I'm just dubious about the trade-off. If
we could _completely_ isolate our poor crack-addled driver authors from
the nasty number zero then perhaps I'd be less unimpressed, but as it is
they still have to wake up and smell the coffee when it comes to DMA
addresses _anyway_.

When I eventually get to go home, which will hopefully still be some
time this month, I'll give some more coherent thought to the idea of
just using a (struct irq_desc *) directly instead of an integer. Then
you get to use NULL as a special case still. It's not as if people
should be pulling 'raw' IRQ numbers out of a hat or even module
parameters these days, except for ISA drivers where they can do
something like isa_irq[7] for the parallel port etc. Although that kind
of stuff should be done through a platform_device anyway too these days.

> But in the meantime: if nobody complains, and it happens to work on
> hardware even though some devices _can_ see a port of zero, I also
> don't care. So I'm certainly not going to claim that your laptop "must
> be fixed". If it works, it works. Hey fine.

It's not just "my laptop", I believe. It's the generic resource code,
which is happy to assign address zero since it's never been taught that
zero is now a special case. If we're not going to ask for the bug I
observed to be fixed -- if we're going to declare that driver authors
don't have to sober up and clean up their code -- then the resource code
should be modified accordingly.

--
dwmw2

2007-01-26 04:00:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers



On Fri, 26 Jan 2007, David Woodhouse wrote:
>
> It's not just "my laptop", I believe. It's the generic resource code,
> which is happy to assign address zero since it's never been taught that
> zero is now a special case. If we're not going to ask for the bug I
> observed to be fixed -- if we're going to declare that driver authors
> don't have to sober up and clean up their code -- then the resource code
> should be modified accordingly.

The resource code really is totally agnostic, and you're barking up the
wrong tree there. In many ways, the resource code isn't even about "IO
resources", it could do other things too.

[ In practice, of course, IO resources is all it does and what it was
designed for, since there really aren't a lot of hierarchical things
that need to be able to nest and handle byte-range-like things. ]

It's really up to the architecture-specific PCI initialization what the
PCI resources look like. The resource code just takes whatever resource
layout it is given. Yes, there's a "root" ioport_resource, but that's just
the container for the whole PCI resource tree, and generally you'd show
the different PCI domains exposed with their buses in that tree.

Of course, for all the historical reasons (a single domain, and it was
written for a PC), on PC's, the root PCI bus just points directly to the
root io port resource. But the way things work is that you cardbus card
doesn't just allocate space from that "ioport_resource" itself. No, it
allocates space from the cardbus controller resources, which in turn have
allocated space from the PCI bridge controller resources, etc etc all the
way up to whatever is the PCI root resource.

There *are* drivers that use the "ioport_resource" directly, but they are
system devices (where "ISA" counts as a system device - augh: it's not
enumerable or discoverable) which know where they go. But a normal driver
never does in any modern world.

So the way to make sure that PCI devices get allocated in the proper area
is not to change the resource manager, but to make sure that the
architecture initializes the root bridges for all the domains properly.

(A lot of them do the "PC thing", of course: they just make the ioport
resource the direct parent of the root bridge, and that's ok if the root
really _is_ supposed to cover everything from zero. On a PC, that's
actually the right thing to do, because the system devices will insert
themselves into the low area, and then PCIBIOS_MIN_IO - 0x1000 on a PC -
is used as the minimum for any *dynamic* allocation.)

PCI PIO/IOMEM resource allocation is actually fairly complicated, and most
people really *really* never need to care. It should be considered a sign
of how well the resource code works that it all usually works without most
people ever really needing to understand it.

Linus

2007-01-26 04:19:32

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

On Thu, 2007-01-25 at 20:00 -0800, Linus Torvalds wrote:
> The resource code really is totally agnostic, and you're barking up the
> wrong tree there. In many ways, the resource code isn't even about "IO
> resources", it could do other things too.

Of course it could, but then again why shouldn't we special-case zero in
_all_ of those use cases, just to make life easier for those poor driver
authors who presumably can't manage to write userspace code using stdin
or open() either.

I've already _said_ I think it's barking up the wrong tree; I think we
should just fix the driver code. But you disagree, so I'm trying to
understand what _you_ think the answer is here -- how the architecture
code should cope with this new decree that PIO address zero is invalid
even though it's actually always been OK in the past.

On this particular platform I believe that the PCI I/O resources are
allocated by firmware before we boot, and they look something like
this...

00000000-007fffff : /pci@f2000000
00000000-0000000f : pcmcia_socket0 (the CF card at zero)
00001000-000011ff : PCI CardBus #11
00001400-000015ff : PCI CardBus #11
00802000-01001fff : /pci@f0000000
00802400-008024ff : 0000:00:10.0
ff7fe000-ffffdfff : /pci@f4000000

Since you don't like the idea that the resource code should special-case
zero, are you instead suggesting that we should _reassign_ the
already-assigned PCI resources when we boot, just to make sure that
driver authors don't have to deal with zeroes (at least, not until they
start to think about DMA)? Or are you thinking of some other solution?

--
dwmw2

2007-01-26 04:48:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers



On Fri, 26 Jan 2007, David Woodhouse wrote:
>
> Of course it could, but then again why shouldn't we special-case zero in
> _all_ of those use cases, just to make life easier for those poor driver
> authors who presumably can't manage to write userspace code using stdin
> or open() either.

You're not thinking, or, more likely YOU DON'T WANT TO.

What is so hard to understand? IRQ0 is a _real_irq_ as far as the
*platform* goes even on PC's. Same goes for IO-port 0 and MMIO area zero.

Same goes for virtual address zero in a lot of user programs. It's a real
virtual address. The fact that the NULL pointer points there doesn't make
it less real.

Do you really have such a hard time understanding this fundamental issue?

Irq0 may _exist_. IO Port 0 may _exist_. Virtual address 0 may _exist_.

Got it?

But they ARE NOT VALID THINGS FOR DRIVERS TO WORRY ABOUT.

When a *DRIVER* sees a NULL pointer, it's always a sign of "not here".

It's not a "valid pointer that just happens to be at virtual address
zero". But in other contexts it actually _can_ be exactly that (ie when
you call "mmap(NULL .. MAP_FIXED)" you actually will get a NULL pointer
that is a *REAL POINTER*.

Similarly, when a *DRIVER* seens a "port 0" or MMIO 0, it is perfectly
valid for that driver to say "ok, my device apparently doesn't have an IO
port".

That does not mean that "port 0" doesn't exist. It just means that AS FAR
AS A DRIVER IS CONCERNED, port 0 is not a valid port for a driver!

Port 0 actually *does* exist on a PC. It's a system port (it also happens
to be a total legacy port that nobody would ever use, but that's another
issue entirely).

The same thing is true of irq 0. It exists. It's a valid IRQ for
archtiecture code for a PC. It's just NOT A VALID IRQ FOR A DRIVER! So
when a driver sees a device with !irq, it knows that the irq hasn't been
allocated.

I don't understand why this is so hard for you to accept. Even when you
yourself accept that irq0 actually *exists*, and even when you give as an
example why "setup_irq()" must be able to take that irq, you give that as
some kind of ass-hat example of why you are "right". Now you do exactly
the same thing for the IO port space.

You're totally confused. You say the words, and you seem to understand
that device drivers are special. But then you don't seem to follow that
understanding through, and you want to then say that everything else is
special too.

Don't you get it? If everybody is special, then nobody is special.

System code is special. System code can do things that drivers shouldn't
do. System code can know that irq0 is actually the timer, and can set it
up. System code can know that IO port 0 is actually decoded by the old and
insane AT architecture DMA controller.

This is not even kernel-specific. "normal programs" think that NULL is
special, and is never a valid pointer. But "system programs" may actually
know better. If you're programming in DOS-EMU, and use vm86 mode, you know
that you actually need to map the virtual address that is at 0, and NULL
is suddenly not actually an invalid pointer: it's literally a pointer to
the first byte in the vm86 model.

But when "malloc()" returns NULL, you know that it doesn't return such a
"system pointer", so when malloc returns NULL, you realize that it's a
special value.

The *EXACT* same thing is true within the kernel. When x86 architecture
code explicitly sets up IRQ0, it does so because it knows that it's the
timer interrupt. But that doesn't make it a valid irq for *anybody* else.

Ok, enough shouting.

Comprende? Do you _really_ think that the NULL pointer "doesn't exist"? Or
can you realize that it's generally just a convention, and it's a
convention that has been selected to be maximally easy to test for (both
on a code generation level and on a C syntax level)? It doesn't mean that
virtual address 0 "doesn't exist", and could not be mapped.

The exact same thing is true of "IO port 0". It's the maximally simple
_convention_ for someting that may actually exist, but it's somethign that
NO NORMAL USER SHOULD EVER SEE AS A REAL IO PORT. There are special users
that may use it, exactly the same way special users who know deep hardware
details may decide that "on this architecture, the NULL pointer actually
_literally_ means virtual address zero, and when I do *xyz* I actually can
access things through it".

Does the fact that some things can use NULL as meaning something else than
"no pointer" invalidate NULL as a concept? No. It just means that those
things are very architecture-specific. They're not "common code" aka
"drivers" any more.

Same exact deal.

Linus

2007-01-26 04:53:25

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

David Woodhouse wrote:
> When I eventually get to go home, which will hopefully still be some
> time this month, I'll give some more coherent thought to the idea of
> just using a (struct irq_desc *) directly instead of an integer. Then

Tejun beat you to it with devres. devres makes all sorts of resource
allocation and tracking loads easier.

Learn it, love it, lick it.

:)

Jeff


2007-01-26 05:10:35

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

On Thu, 2007-01-25 at 20:48 -0800, Linus Torvalds wrote:
> Irq0 may _exist_. IO Port 0 may _exist_. Virtual address 0 may
> _exist_.
>
> Got it?
>
> But they ARE NOT VALID THINGS FOR DRIVERS TO WORRY ABOUT.

I do understand what you're saying; there's no need to shout. I think
it's very misguided and leads to both internal inconsistency (as
demonstrated by the setup_irq() patch) and external inconsistency with
stuff like hardware documentation. But I _do_ understand what you're
saying.

> When a *DRIVER* sees a [zero], it's always a sign of "not here".

Except when it isn't. Like when it's a DMA address. Or a file
descriptor. Or a CPU number. Or one of numerous other things.

But still, I do understand what you're saying although I disagree with
your intention and your statement above is plain wrong (well, at least
my misquote of it is wrong -- you actually said 'NULL' which is fair
enough, but in the middle of a rant about _zero_ so I edited the quote
to say zero because that's what we're actually talking about).

> But they ARE NOT VALID THINGS FOR DRIVERS TO WORRY ABOUT.
...
> NO NORMAL USER SHOULD EVER SEE [zero] AS A REAL IO PORT.

Yes, that much I understand. We disagree, but I understand you. My last
response was not intending to pursue that part of the discussion.

My question was about _how_ you think this should be achieved in this
particular case. You didn't like the suggestion that we should put your
new special-case hack into the resource code... where/how _do_ you
suggest that it's done, so that we can protect those poor driver authors
from the number zero?

--
dwmw2

2007-01-26 06:01:32

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers



On Fri, 26 Jan 2007, David Woodhouse wrote:
>
> My question was about _how_ you think this should be achieved in this
> particular case.

I already told you.

Have you noticed that the resource-allocation code _already_ never returns
zero on a PC?

Have you noticed that the resource-allocation code _already_ never returns
zero on sparc64?

Your special-case hack would actually be actively *wrong*, because the
resource-allocation code already does this correctly.

But by "correctly" I very much mean "the architecture has to tell it what
the allocation rules are".

And different architectures will have different rules.

On x86[-64], the reason the allocation code never returns a zero is
because of several factors:

- PCIBIOS_MIN_IO is 0x1000
- PCIBIOS_MIN_CARDBUS_IO is 0x1000
- the x86 arch startup code has already reserved all the special ranges,
including IO port 0.

so on a PC, no dynamic resource will _ever_ be allocated at zero simply
because there are several rules in place (that the resource allocation
code honors) that just makes sure it doesn't.

Now, the reason I say that it needs the architecture-specific code to tell
it those rules is that on other architectures, the reasons for not
returning zero may simply be *different*.

For example, on other platforms, as I already tried to explain, the root
PCI bus itself may not even start at 0 at all. You may have 16 bits worth
of "PIO space", but for all you know, you may have that for 8 different
PCI domains, and for all the kernel cares about, the domains may end up
mapping their PCI IO ports in any random way. For example, what might
"physically" be port 0 on domain0 might be mapped into the IO port
resource tree at 0xff000000, and "physical port 0" in domain1 might be at
0xff010000 - so that the eight different PCI domains really end up having
8 separate ranges of 64k ports each, covering 0xff000000-0xff07ffff in the
IO resource map.

See? I tried to explain this to you already..

If you are a cardbus device, and you get plugged in, your IO port range
DOES NOT GET allocated from "ioport_resource". No, no, no. It gets
allocated from the IO resource of the cardbus controller. And _that_ IO
resource was allocated within the resource that was the PCI bridge for the
PCI bus that the cardbus controller was on. And so on, all the way up to
the root bridge on that PCI domain.

And on x86, the root bridge will use the ioport_resource directly, since
it will always cover the whole area from 0 - IO_SPACE_LIMIT.

But that's an *architecture-specific* choice. It makes sense on x86,
because that's how the IO ports physically work. They're literally tied to
the CPU, and the CPU ends up being in that sense the "root" of the IO port
resources.

But on a lot of non-x86 architectures, it actually could make most sense
to never use "ioport_resource" AT ALL (in which case "cat /proc/ioports"
will always be empty), and instead make the root PCI controller have it's
IORESOURFE_IO resource be a resource that is then mapped into
"iomem_resource". Because that's _physically_ how most non-x86
architectures are literally wired up.

See? Nobody happens to do that (probably because a number of them want to
actually emulate ISA accesses too, and then you actually want to make the
PIO accesses look like a different address space, and probably because
others just copied that, and never did anything different), but the point
is, the resource allocation code really doesn't care. And it _shouldn't_
care. Because the resource allocation code tries to be generic and cater
to all these *different* ways people hook up hardware.

Now, I should finish this by saying that there's a number of legacy
issues, like the fact that "ioport_resource" and "iomem_resource" actually
even *exist* for all platforms. As mentioned, in some cases, it would
probably actually make more sense to not even have "ioport_resource" at
all, except perhaps as a sub-resource of "iomem_resource" (*).

So a lot of this ends up then beign slightly set up in certain way - or at
least only tested in certain configurations - due to various historical
reasons.

For example, we've never needed to abstract out the IO port address
translations as much as we did for MMIO. MMIO has always had
"remap_io_range()" and other translator functions, because even x86 needed
them. The PIO resources have generally needed less indirection, not only
because x86 can't even use them *anyway* (for the "iomap" interfaces we
actually do the mapping totally in software on x86, because the hardware
cannot do it), but also because quite often, PIO simply isn't used as
much, and thus we've often ended up having ugly hacks instead of any real
"IO port abstraction".

For an example of this, look at the IDE driver. It has a lot of crap to
just allow other architectures to just use their own MMIO accessors
instead of even trying to abstract PIO some way. So the PIO layer actually
lacks some of the abstraction, simply because it's no used very much.

Linus

(*) It should actually be possible to really just let an architecture
insert the "ioport_resource" as a sub-resource of the "iomem_resource". It
would be a bit strange, with "cat /proc/iomem" literally as part of it
showing all the same things that "cat /proc/ioport" shows, but it would
really be technically *correct* in many ways. BUT! I won't guarantee that
it works, simply because I've never tried it. There might be some reason
why something like that would break.

2007-01-26 06:18:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers



On Thu, 25 Jan 2007, Linus Torvalds wrote:
>
> Have you noticed that the resource-allocation code _already_ never returns
> zero on sparc64?

Btw, that was a rhetorical question, and I'm not actually sure what the
heck sparc64 will _really_ do ;)

I picked sparc64 as an example, because I _think_ that sparc64 actually is
an example of an architecture that sets up a separate root resource for
each PCI domain, and they are actually set up so that the ioport regions
are literally offset to match the hardware bases (and there are several
different kinds of PCI domain controllers that sparc supports, so those
bases will depend on that too).

So on sparc64, "ioport_resource" really is just a container for the actual
per-domain resource buckets that the hardware (within that domain) will
then do the resource allocation from. Afaik.

But you should actaully verify that with somebody like Davem if you
_really_ care. I cc'd him in case he wants to pipe up and perhaps prove
me wrong.


Linus

2007-01-26 08:17:26

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

From: Linus Torvalds <[email protected]>
Date: Thu, 25 Jan 2007 22:18:21 -0800 (PST)

> So on sparc64, "ioport_resource" really is just a container for the actual
> per-domain resource buckets that the hardware (within that domain) will
> then do the resource allocation from. Afaik.
>
> But you should actaully verify that with somebody like Davem if you
> _really_ care. I cc'd him in case he wants to pipe up and perhaps prove
> me wrong.

They are all physical addresses of the I/O locations.

It just so happens that all the non-memory physicall addresses
have some high bit set.

So it all works out, but not intentionally, it's merely a side
effect of how the physical address space is layed out.

2007-01-26 08:20:22

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

From: David Woodhouse <[email protected]>
Date: Fri, 26 Jan 2007 13:09:40 +0800

> My question was about _how_ you think this should be achieved in this
> particular case. You didn't like the suggestion that we should put your
> new special-case hack into the resource code... where/how _do_ you
> suggest that it's done, so that we can protect those poor driver authors
> from the number zero?

Just set some high bit in the arch code that creates the resources,
then when you interpret them in arch code simply clear the bit.

That's effectively what happens on sparc64 since all non-memory
physical addressing occurs at the top of the physical address
space. But you don't need that layout, you can fake it.

2007-01-26 10:26:05

by Alan

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

> Also, many ISA-only drivers actually have hardcoded PIO numbers (eg
> "0x1f0").

Bad example - 0x1F0 is one of those that leaked into the PCI world as
well 8)

2007-01-26 15:29:38

by Mark Lord

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

> (Yeah, mmap() has taught some people about MAP_FAIL, but that's pretty
> unusual too.

It's so unusual, that I regularly find buggy code checking for NULL
instead of MAP_FAILED. Just about every program I look at has it wrong
in at least one place.

Ditto for the result value from wait() -- lots of code fails to use
the fancy macros required for correct interpretation of it.

> The quality of our drivers is low;

I think that's a bit harsh. But yes, compared to core parts
of the kernel perhaps (eg. page dirty mappings: remember the recent bugfix?).

The real deal is that using zero/NULL makes it *easier* to get the code right.
And we certainly have tons more people working on drivers than anything else,
so making it *easier* for those coders to reduce the bug count is a Good Thing(tm).

Cheers

2007-01-26 15:32:38

by Mark Lord

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

> It's not as if people should be pulling 'raw' IRQ numbers out
> of a hat or even module parameters these days, except for ISA drivers..

Err.. in the embedded space, this is exceptionally common today,
on the most modern of chips, too. Lots of magic hard(ware)-coded IRQ numbers.

Cheers

2007-01-28 22:52:06

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers


> Indeed. Zero means "not in use". It really is that simple.

I tried to avoid stepping into that specific flamewar as I don't have
strong opinions either way, but I've been asked to so ... and there are
a few things in your statement that I do like to argue about a bit :-)

> And I'm sorry, David, but 99% of the world _is_ a PC, and that is where
> PCI and ATA came from, so anybody who thinks that zero is a valid PCI
> address is just sadly mistaken and has a bug. In fact, even on a hardware
> level, a lot of devices will literally have "zero means disabled".
>
> Broken architectures that put PCI things at some "PCI physical address
> zero" need to map their PCI addresses to something else. It's part of why
> we have the whole infrastructure for doing things like

As I said, I don't have a firm pro/con position in that matter, but I
notice that we had some firmwares putting things at IO port 0 for a
while now and it generally worked for what we did with it.

I do still think that there is some value to the argument that, for IO
ports (and I'm really talking about IO ports here, the HW one, not
what's in the struct resource which may or may not be the actual HW IO
port number depending on how the arch implements them) :

- 0 is valid and works fine on some platforms with some devices
- 0 seems to be in some rare cases the only valid/useable spot
according to David and even pretty common in pcmcia land
- some existing firmwares -are- allocating things at 0 and having
to relocate things is a pain.

Thus, based on that, while I -do- agree that it's a bad idea to
purposefully put things at 0 if you can avoid it (looking for trouble...
I always tell FW folks to start assigning IO ports at 0x1000), I don't
think it's nice to say that it's ok for drivers to use it as a special
case and thus intentionally break some of the cases that would have
worked just fine.

> pcibios_bus_to_resource()
> pcibios_resource_to_bus()
>
> etc - not just because a resource having zero in "start" means that it is
> disabled, but because normally such broken setups have *other* problems
> too (ie they don't have a 1:1 mapping between PCI addresses and physical
> addresses _anyway_, so they need to address translations).

Yes well... remapping... there are several issues with that, see below.

> This has come up before. For example: for an IRQ, 0 means "does not
> exist", it does _not_ mean "physical irq 0", and we test for whether a
> device has a valid irq by doing "if (dev->irq)" rather than having some
> insane archiecture-specific "IRQ_NONE". And if you validly really have an
> irq at the hardware level that is zero, then that just means that the irq
> numbers you should tell the kernel should be translated some way.
>
> (On a PC, hardware irq 0 is a real irq too, but it's a _special_ irq, and
> it is set up by architecture-specific code. So as far as the generic
> kernel and all devices are concerned, "!dev->irq" means that the irq
> doesn't exist or hasn't been mapped for that device yet).

While I didn't much like your decree forcing everybody to consider IRQ 0
as invalid (I think nor everybody adapted yet even, isn't ARM still
using -1 as NO_IRQ ? BTW... I do like having a constant even if it's
defined to 0 :-) ... I do admit that writing a remapping layer on
powerpc (in part because of that, in part to solve other issues) ended
up cleaning up a lot of cruft, especially in the area of multiple
cascaded controllers.

Once the remapping layer was there, HW numbers became totally irrelevant
to anything but the controller instance they sit on, and linux irqs
became dynamically allocated virtual numbers, thus allowing to do some
tricks like 0 is always invalid or 1...15 are never assigned to anybody
except legacy 8259 if any. A nice way to fool-proof against stupid
drivers. In the end, remapping of IRQs did improve more stuffs than it
added bloat so I think it was a winning situation. Now, the next step is
to completely get rid of linux irq numbers alltogether and use irq_desc
* :-)

However, for IOs, things are a bit more annoying in several areas.
First, if you remap, do you remap at the HW level or the linux resource
content ? (And I'm talking about both PIO and MMIO here).

Because if we just do a linux remapping layer, we basically:

- Don't solve any problem where HW actually doesn't like 0

- Add a possibly complicated remapping layer on top of something that
doesn't necessarily need one.

Besides, there's already enough convoluted issues in the way archs remap
or don't remap PCI resources to not want to start playing more games in
that area (and possibly break X while at it, heh, I know it needs to be
fixed and Ian is working on it, but still).

Thus, if for some reason, IO resource 0 or memory resource 0 are
unacceptable, then they should probably be remapped by the arch code in
HW (moving the devices elsewhere), _NOT_ as a virtual remapping thing in
linux.

> So there are three issues:
>
> - we always need a way of saying "not mapped"/"nonexistent", and using
> the value zero is the one that GIVES THE CLEANEST SOURCE CODE!

I have to agree with that one... Though in various parts of the kernel,
we actually use resource->flags = 0 :-) And I think that's a nicer way
to do it.

That is, the resource flags define the type of what's in start/end, and
if no type ... then non existent.

Or maybe you think we should keep the information that there's actually
something like a BAR there (and thus remember it's IO/MEM type) but we
just didn't assign it ? I like keeping the resource "value" as something
totally "magic" that has no defined semantic outside of the accessors
provided to use it.

> It's why
> NULL pointers are zero too. Sure, virtual address zero is a real
> virtual address, but that doesn't change the fact that C made 0 be
> special on a language level. If you want to access that virtual
> address, and use NULL as a "doesn't exist" at the same time, you need
> to swizzle your pointer somehow.
>
> - the x86[-64] architecture is the one that gets tested the most. So
> everybody else should try to look like it, rather than say "we are
> different/better/strange". Don't have any special IO instructions?
> Tough. You'd better do "inb/outb" anyway, and map them onto your
> memory-mapped IO somehow.

Well, for IO space, most non-x86 archs basically ioremap some magic area
of bus space used to generate IO cycles, put that virtual address in a
global, and have inb/outb add that as an offset to the port number.

That means that port numbers are kept intact (0 based) and I don't see
the need to add another remapping layer on top of that. Works fine.

If you want 0 to be an illegal port number, then make it so by having
the arch code actually -move- the device elsewhere, don't add a
remapping trick to please drivers.

> - per-architecture magic values is a bad idea. If you need a magic value
> (and things like this _do_ need one, unless we always want to carry a
> separate flag around saying "valid" or "not valid"), it's a lot better
> to just say "everybody uses this value" and then have the _small_
> architecture-specific code work around it, than have everybody have to
> work around a lot of architectures doing things differently.
>
> All these issues boil down to the same thing: whenever at all physically
> possible, we make sure that everything looks as much as a PC as possible.
> Only when that literally isn't an option do we add other abstractions.

They are everywhere !!! :-)

Ben.

2007-01-28 22:58:53

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

> On sparc64, for example, after I pointed this out to DaveM, he was able
> to implement the new iomap interface without the 'if (pio-mem-area)'
> branch present on x86.

Yup, we did that from day 1 on powerpc :-) However, I don't totally
agree with adding some other remapping layer here, I think if we want
PIO 0 to be illegal, then make it illegal at the HW level too.

The reason is in fact the same as Linus invoked for remapping it in the
first place -> make things look like an x86 :-)

That is, quite a few non-x86 machines do have some kind of superIO chip
or other set of legacy devices around. They also commonly have VGA cards
hard decoding VGA PIO addresses.

So here's a very simplified version on how most non-x86 platforms do
PIO :

at boot:

pci_io_base = ioremap(MAGIC_PIO_REGION);

and then

#define inb(port) (readb(pci_io_base + (port)))

The nice thing with that approach is that all those legacy x86 drivers
for bits in your SuperIO chip or for VGA do actually still work when they
do

inb(STUPID_HARD_CODED_IO_PORT);

While if we now add some magic remapping to make 0 illegal, that will break
and all those legacy drivers would have to be fixed, which nobody wants
to do.

So I do stand firm there. I don't necessarily mind deciding that 0 is an
illegal PIO address, but if we're going to do that, we should make it
illegal as a HW PIO address, not by adding a remapping trick to something
that really doesn't want more than it already has :-)

Ben.


2007-01-28 23:01:37

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

> However, in all honesty, we have triggered bugs in that area too, simply
> because some driver code "knew" that PIO addresses could fit in 16 bits,
> and used u16 or "unsigned short" to remember the PIO address. Both ARM and
> Sparc was bitten by this, although usually the issue is trivial to fix
> once found.
>
> Also, many ISA-only drivers actually have hardcoded PIO numbers (eg
> "0x1f0").
>
> But yes, I would generally suggest that architectures where the PIO range
> is really just another magic MMIO range (which is most of the non-x86
> world, as you point out) might as well at least aim for doing the
> remapping early (ie with "pci_resource_start()")
>
> Making that easy was one of my goals for the "new" IO accessor functions,
> in fact.
>
> Not that many people actually use them.

Well, as I said, I prefer keeping PIO numbers 0 based so that legacy
crap works. If 0 is to be illegal, then remap any devive that's sitting
there, but don't do weirder remapping tricks than necessary ;-)

Note that on PowerPC, we do actually remap PIO on non-primary busses
(well, we have to since we have to present a flat space to inx/outx).

What we do there is basically we "pick" a bus as beeing primary
(typically the one that has an ISA bridge and/or SuperIO on it) and
that's the one that gets PIO 0. Resources for device on that domain
aren't fixed up.

All other ones are fixed up in such a way that pointer arithmeric gives
you an inx/outx landing in the right spot for PCI devices.

> So *if* you use the new "iomap" interfaces, and the new "pci_iomap()"
> things, that should actually not just allow drivers (like the ATA layer)
> to share much more code between the PIO and MMIO cases, but it hopefully
> actually makes it easier for strange architectures to do it all.

Yes. Though adoption of iomap is a bit slow on the driver side of
things.

> So traditionally, we've had PIO be "limited integer addresses, and some
> drivers know magic numbers", but hopefully new drivers could at least try
> to use some of the infrastructure where we try to help people not have to
> deal with it so much as a special case any more.

Well, hopefully new drivers don't need PIO as it's about time people
stop releasing devices that do PIO, bloody hell ! That should have been
"deprecated use for compatibility only" from day #1 :-) In fact, if I'm
not mistaken, the PCI spec mandates that device should have an MMIO way
of doing everything that can be done via PIO nowadays no ?

Ben.


2007-01-28 23:04:25

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers


> We could have done it for interrupts too. A "struct irqnum" that has a bit
> that specifies "valid". That would work. But it tends to be painful, so it
> really has to give you something more than "zero is disabled".
>
> It's just not worth it.
>
> And it's why I decreed, that the ONLY SANE THING is to just let people do
> the obvious thing:
>
> if (!dev->irq)
> return -ENODEV;
>
> you don't have to know ANYTHING, and that code just works, and just looks
> obvious. And you know what? If it causes a bit of pain for some platform
> maintainer, I don't care one whit. Because it's obviously much better than
> the alternatives.

I'd rather have dev->irq be a struct interrupt * :-) The NULL check
would then make more sense and we completely remove "numbers" from
drivers, they have to obtain their struct interrupt via some interrupt
mapping code....

Ben.