Here's the return of an oooold problem for which we really need a
solution asap since it's now biting us in real life configurations...
So the problem happens when you have a machine with more than one PCI
host bridge. This is typically the case of all new Apple machines as they
have 3 host bridges in one chip (2 of them are relevant: the AGP and the
PCI). I don't think the problem exists on x86 machines with real IO
cycles; at least in that case, the problem is different.
In order to generate IO cycles, the bridge provides us with a region in
CPU physical memory space (a 16MB region in our case) that translates
accesses to IO cycles on the PCI bus. Our implementation of inb/outb
currently relies on the kernel ioremap'ing one of these regions (the PCI
one) and using the ioremap result as a base (offset) inside the inb/outb
functions.
So that means the current design won't allow access to IOs located on
any bus but the one we arbitrarily chose (the PCI bus). That's fine in
most cases, until you decide to put a 3dfx or nvidia card in the AGP slot.
Those cards require some IO accesses to be done to the legacy VGA
addresses, and of course, our inb/outb functions can't do that.
Obviously, we can hack some driver specific thing that would use the
arch-specific code to retrieve the proper io base address for a given
host bridge, but that's a hack. I'm looking for a solution that would
cleanly apply to all archs that may potentially face this problem.
The problem potentially also exists for any PCI card that has PCI IOs on
anything but the main PCI bus.
One possibility is to limit our IO space to 64k per bus (to avoid
bloating) and then use a hacked ioremap to create a single virtually
contiguous kernel region that appends all those IO spaces together.
Accessing IOs on bus N would just be a matter of calculating an address
of the type 64k*N+offset and doing normal inb/outb on the result. The
arch PCI code could then properly fix up PCI IO resources for PCI drivers,
and we could add a function of the kind
    unsigned long pci_bus_io_offset(int busno);
that would return the offset to add to inb/outb when accessing IOs on the
N'th PCI bus.
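To make the arithmetic concrete, here is a minimal sketch of the 64k-per-bus layout described above. It only models the offset calculation, not the hacked ioremap or the real inb/outb. IO_SPACE_PER_BUS and bus_io_addr() are names made up for this illustration, and pci_bus_io_offset() is the proposed function, not an existing one.

```c
#include <assert.h>

/* Assumed window size: 64k of IO space per bus, as proposed above. */
#define IO_SPACE_PER_BUS 0x10000UL

/* Proposed helper: offset of bus N inside the virtually contiguous region. */
unsigned long pci_bus_io_offset(int busno)
{
    assert(busno >= 0);
    return IO_SPACE_PER_BUS * (unsigned long)busno;
}

/* Hypothetical helper: the value a driver would hand to inb/outb for a
 * bus-local port number on the given bus. */
unsigned long bus_io_addr(int busno, unsigned int port)
{
    assert(port < IO_SPACE_PER_BUS);
    return pci_bus_io_offset(busno) + port;
}
```

Under these assumptions, the legacy VGA port 0x3c0 on bus 1 would be reached by passing bus_io_addr(1, 0x3c0), i.e. 0x103c0, to the unchanged inb/outb.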
If we want to go a bit further, and allow ISA drivers that don't have a
pci_dev structure to work on legacy devices on any bus, we could provide
a set of functions of the type
    int isa_get_bus_count();
    unsigned long isa_get_bus_io_offset(int busno);
and possibly
    int isa_bus_to_pci_bus(int isa_busno);
    int pci_bus_to_isa_bus(int pci_busno);
if we want to figure out on which PCI bus a given ISA bus is located, if
any (-1 meaning no mapping exists).
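As a rough illustration of how these lookups could behave, here is a small table-driven sketch. The table contents are invented for the example (three "ISA" busses, one of them with no PCI bus behind it), and struct isa_bus_info is a hypothetical name; only the four function names come from the proposal above.

```c
#include <assert.h>

/* Hypothetical per-ISA-bus record; all values below are made up. */
struct isa_bus_info {
    unsigned long io_offset; /* offset to add to port numbers for inb/outb */
    int pci_busno;           /* PCI bus it is located on, or -1 if none */
};

static const struct isa_bus_info isa_busses[] = {
    { 0x00000UL,  0 }, /* legacy IO of the main PCI bus */
    { 0x10000UL,  1 }, /* legacy IO of the AGP bus */
    { 0x20000UL, -1 }, /* ISA-like bus with no PCI bus behind it */
};

int isa_get_bus_count(void)
{
    return (int)(sizeof(isa_busses) / sizeof(isa_busses[0]));
}

unsigned long isa_get_bus_io_offset(int busno)
{
    assert(busno >= 0 && busno < isa_get_bus_count());
    return isa_busses[busno].io_offset;
}

/* Returns the PCI bus number, or -1 when no mapping exists. */
int isa_bus_to_pci_bus(int isa_busno)
{
    if (isa_busno < 0 || isa_busno >= isa_get_bus_count())
        return -1;
    return isa_busses[isa_busno].pci_busno;
}

/* Returns the ISA bus number, or -1 when no mapping exists. */
int pci_bus_to_isa_bus(int pci_busno)
{
    int i;

    if (pci_busno < 0)
        return -1;
    for (i = 0; i < isa_get_bus_count(); i++)
        if (isa_busses[i].pci_busno == pci_busno)
            return i;
    return -1;
}
```

A legacy driver that knows nothing about this would keep using bus 0 with an offset of 0, which is the "default bus" behaviour.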
Of course, the same problem exists for ISA memory (used by legacy VGA
modes). It's not a problem in real life currently since no powermac can
produce PCI cycles in the ISA memory range today, and non-powermac PPC
machines currently have no need for video cards on anything but the
main bus, but the potential issue is there, and the need for a solution
may pop up too.
I'm, of course, open to any comments about this (in fact, I'd really like
some feedback). One thing is that we also need to find a way to pass
this information to userland. Currently, we implement an arch-specific syscall
that retrieves the IO physical base of a given PCI bus. That may
be enough, but we may also want something that matches more closely what we
do in the kernel.
Regards,
Ben.
>
>If we want to go a bit further, and allow ISA drivers that don't have a
>pci_dev structure to work on legacy devices on any bus, we could provide
>a set of function of the type
>
> int isa_get_bus_count();
> unsigned long isa_get_bus_io_offset(int busno);
I would add that I'd prefer to keep it separated from the PCI layer, in
the sense that it can also help handle 16-bit ISA-like IO busses on
embedded hardware which may (will most of the time) not have anything
like a PCI bus. Having the ability to map PCI<->ISA bus numbers should be
an option.
Ben.
> I'm, of course open to any comments about this (in fact, I'd really like
> some feedback). One thing is that we also need to find a way to pass
> those infos to userland. Currently, we implement an arch-specific syscall
> that allow to retreive the IO physical base of a given PCI bus. That may
> be enough, but we may also want something that match more closely what we
This is also a problem for MMIO and, to an extent, other things on PA-RISC.
You might want to talk to Grant and the other HPPA hackers.
Benjamin Herrenschmidt writes:
> I'm, of course open to any comments about this (in fact, I'd really like
> some feedback). One thing is that we also need to find a way to pass
> those infos to userland. Currently, we implement an arch-specific syscall
> that allow to retreive the IO physical base of a given PCI bus. That may
> be enough, but we may also want something that match more closely what we
> do in the kernel.
Same problem on sparc64. Using a special PCI syscall is fine, _if_ we
all end up using the same one. However, I would prefer another
mechanism...
I think a cleaner scheme is to allow mmap() on
/proc/bus/pci/${BUS}/${DEVICE} nodes, that is much cleaner and solves
transparently any "different word size between userland and kernel"
issues (specifically 32-bit userlands executing on 64-bit kernels).
I played around with something akin to this, and some of the necessary
Xfree86-4.0.x hackery needed, some time ago. But I never finished
this.
Later,
David S. Miller
"David S. Miller" wrote:
> I played around with something akin to this, and some of the necessary
> Xfree86-4.0.x hackery needed, some time ago. But I never finished
> this.
Sounds pretty sweet. How about we finish it? Any complaints (well
reasonable ones :-) or concerns that came out of discussions or
your testing we need to consider?
Thanks.
-- Dan
Dan Malek writes:
> "David S. Miller" wrote:
>
> > I played around with something akin to this, and some of the necessary
> > Xfree86-4.0.x hackery needed, some time ago. But I never finished
> > this.
>
> Sounds pretty sweet. How about we finish it? Any complaints (well
> reasonable ones :-) or concerns that came out of discussions or
> your testing we need to consider?
There is only one sticking point, and that is how to convey to the
mmap() call whether you want I/O or Memory space. In the end, my
analysis came up with basically an ioctl() on the same PCI device
node to set this, and you could keep track of this state in the
filp private area.
I thought originally you could do this with the lower bits of the
mmap() offset, but that won't work in 2.4.x because they are stripped
out and you only get a page number by the time the driver mmap
call runs.
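The following sketch illustrates why a flag hidden in the low bits of the mmap() offset does not survive. It assumes 4K pages and uses invented names (SKETCH_PAGE_SHIFT, SKETCH_WANT_IO, and the two helpers); it models only the shift, not the actual kernel mmap path.

```c
#include <assert.h>

/* Assumption for this sketch: 4K pages. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical flag a caller might try to hide in the low offset bits. */
#define SKETCH_WANT_IO 0x1UL

/* What the driver's mmap method effectively receives: a page number. */
unsigned long offset_to_pgoff(unsigned long offset)
{
    return offset >> SKETCH_PAGE_SHIFT;
}

/* The byte offset the driver can reconstruct from that page number. */
unsigned long pgoff_to_offset(unsigned long pgoff)
{
    return pgoff << SKETCH_PAGE_SHIFT;
}
```

Passing 0x2000 | SKETCH_WANT_IO through both helpers yields plain 0x2000 again, so the I/O-versus-memory choice has to be conveyed some other way, such as the per-open-file state set by an ioctl() described above.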
I really like this solution because it does not involve any new
syscalls to be added to glibc and/or the Xfree86 arch/os specific
code. Just opening files, mmap, and an ioctl number or two. All
of this can be shared between ports.
As a side note, Alpha has a special PCI syscall to get the "PCI
controller number" a given PCI device is behind. We could add
another ioctl number which does the same thing on /proc/bus/pci/*/*
nodes. This way sparc64 and Alpha could have the same user visible
API for this as well.
Later,
David S. Miller
> > I'm, of course open to any comments about this (in fact, I'd really like
> > some feedback). One thing is that we also need to find a way to pass
> > those infos to userland. Currently, we implement an arch-specific syscall
> > that allow to retreive the IO physical base of a given PCI bus. That may
> > be enough, but we may also want something that match more closely what we
> > do in the kernel.
>
>Same problem on sparc64. Using a special PCI syscall is fine, _if_ we
>all end up using the same one. However, I would prefer another
>mechanism...
Right, I remember we discussed this some months ago. Currently, we have
a syscall that is slightly different from the sparc/alpha ones but very
similar.
>I think a cleaner scheme is to allow mmap() on
>/proc/bus/pci/${BUS}/${DEVICE} nodes, that is much cleaner and solves
>transparently any "different word size between userland and kernel"
>issues (specifically 32-bit userlands executing on 64-bit kernels).
>
>I played around with something akin to this, and some of the necessary
>Xfree86-4.0.x hackery needed, some time ago. But I never finished
>this.
I do agree with you on this. I didn't have time to really work on it so
far, I remember you posted a test patch but I was busy at that time with
other PCI issues we had with multiple bus systems.
Note that this is only the userland side of the story. For now, I'm more
concerned about finding a good solution to the kernel side.
Also, the problem of finding where the legacy ISA IOs of a given PCI bus
are is a bit different than simply mmap'ing a BAR. Some video cards
require some access to their VGA IOs without having a BAR covering them;
in some cases this is needed just to switch the chip from VGA to MMIO mode.
I've looked at the parisc code (thanks Alan for pointing that out), and
it seems they implement all inb/outb as quite big functions that decipher
the address, retrieve the bus, and do the proper IO call. Unfortunately,
that's a bit bloated, and I don't think I'll ever get other PPC
maintainers to agree with such a mechanism (everybody seems to be quite
concerned with IO speed, I admit including me).
Also, that wouldn't really help the case of legacy drivers or video
drivers using legacy addresses for VGA. In all cases, whatever solution
we end up having, those will have to be adapted. What I'd like is a
smooth path that allows unchanged drivers to still work with the default
bus, while adapted drivers can be converted with minimal changes (mostly
ending up storing an io base and creating a virtual "ISA bus number").
That way, an ISA-like bus (legacy IO bus) can be mapped to either a PCI bus,
or whatever. Maybe "ISA" is not the proper word for it; it could be
"basic_io_bus" maybe.
Alan also pointed out that there may be similar issues with MMIOs. In
fact, as long as we are working with PCI devices, we can easily get
things fixed up by munging the resource structures at fixup time. There
_is_ however a similar issue with legacy ISA memory, especially since
some platforms simply cannot let you access it.
Looking at those in more detail (other archs), it appears that the
problem happens on most non-x86 archs and is handled differently for each
of them, when it's handled at all.
So what would be a preferred way? Create that fake ISA bus number and
provide functions for looking them up, getting their IO and mem bases,
and possibly mapping PCI busses to ISA busses? Or does someone have a
better idea? The goal is to try not to change the semantics of inb/outb
and friends so that most legacy drivers can still work using the
"default" IO bus if they are not upgraded to the new scheme.
Thanks for your feedback,
Regards,
Ben.
"David S. Miller" wrote:
> There is only one sticking point, and that is how to convey to the
> mmap() call whether you want I/O or Memory space.
Isn't I/O space obsolete by now :-)? It actually caused me to think
of something else....I have cards with multiple memory and I/O
spaces (rare, but I have them). What if we did:
/proc/bus/pci/${BUS}/${DEVICE}/mem
/proc/bus/pci/${BUS}/${DEVICE}/io
/proc/bus/pci/${BUS}/${DEVICE}/BARn
The 'mem' or 'io' would map the first instance of these spaces
on the device, and would probably be suitable for nearly all devices.
If you really knew what you were doing (or wanted to make a big mess),
you could use the 'BARn' to specify the area.
You could even do something like map in as much virtually contiguous
space as indicated in the mmap(). For example, if the card has 2M I/O
and 8M memory (in this order), the first 2M of the mmap()'ed space
would be the I/O and the next 8M would be the memory. I know, some
cards lie about the actual amount of space they have or need, but it
was just another idea that popped in.......
Thanks.
-- Dan
>As a side note, Alpha has a special PCI syscall to get the "PCI
>controller number" a given PCI device is behind. We could add
>another ioctl number which does the same thing on /proc/bus/pci/*/*
>nodes. This way sparc64 and Alpha could have the same user visible
>API for this as well.
And on PPC too since I adapted the pci controller mechanism to
it in 2.4.
In fact, all that is done by our various syscalls could be done by
ioctl's on /proc/bus/pci/*/*.
To be generic, the pci controller number should rather be the pci bus
number of the host bridge (the top of the PCI tree a given device lives
on). The internal controller numbers have no real meaning I think to
userland.
Also, an ioctl to retrieve the iobase would be useful too (in addition
to the mmap), especially for getting access to VGA IOs associated with a
given PCI card, but also for whatever test tool one would want to write
in userland that accesses legacy IOs on a given PCI bus.
Having the mmap is fine, but I'd also like the ability to retrieve
all the information via an ioctl.
I believe that if we can agree on the in-kernel format of the PCI
controller structure and a function to retrieve it from a bus number, we
can make this generic.
For us, the pci controller requires at least an iobase (physical &
virtual, as we always ioremap the IO space during boot) for generating
io cycles, the config ops, and the mem offset (some platforms don't have
a 1:1 mapping of memory cycles vs. CPU bus cycles for PCI memory; for
example, on PReP, you write to physical c0000000 to get a PCI memory
write to 00000000). Then there is the isa memory base (it may be located
differently: some bridges have 1:1 mappings and so allow only high
memory addresses to go to the PCI, but do open a "window" at a different
physical address to generate ISA memory cycles (low address cycles)).
Finally, we have some private data (pointer to OF node for example) and
the resource structures (so that we know what a given host bridge can
decode and can allocate unallocated PCI resources properly).
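To illustrate the fields listed above, here is a cut-down, hypothetical controller record with the two address translations it implies. The structure and field names are invented for this sketch and do not match any real kernel structure; config ops, private data and resources are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified controller record (names are illustrative). */
struct sketch_pci_controller {
    int first_busno;         /* bus number of the host bridge */
    uint32_t io_base_phys;   /* CPU physical base that generates IO cycles */
    void *io_base_virt;      /* the same region once ioremap'ed at boot */
    uint32_t pci_mem_offset; /* CPU physical = PCI memory address + this */
    uint32_t isa_mem_base;   /* CPU physical window for low (ISA) memory cycles */
};

/* CPU physical address to use for a given PCI memory address. */
uint32_t pci_mem_to_phys(const struct sketch_pci_controller *hose,
                         uint32_t pci_addr)
{
    assert(hose != NULL);
    return pci_addr + hose->pci_mem_offset;
}

/* CPU physical address that generates an IO cycle for a given port. */
uint32_t pci_io_to_phys(const struct sketch_pci_controller *hose,
                        uint16_t port)
{
    assert(hose != NULL);
    return hose->io_base_phys + port;
}
```

With pci_mem_offset set to 0xc0000000, as in the PReP example, pci_mem_to_phys() maps PCI memory address 00000000 to physical c0000000.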
I'm not familiar with the requirements of other archs however.
Ben.
Benjamin Herrenschmidt writes:
> Also, the problem of finding where the legacy ISA IOs of a given PCI bus
> are is a bit different that simply mmap'ing a BAR. Some video cards
> require some access to their VGA IOs without having a BAR covering them,
> in some case it's necessary to switch the chip from VGA to MMIO mode.
Many platforms, sparc64 included, do not have an ISA IO space nor do
they provide VGA accesses at all.
If things such as XFree86 are coded for such platforms to not require
VGA accesses (the 'ati' driver is already like this when certain
build time defines are set), this could become a non-issue in this
case.
> So what would be a preferred way ? Create that fake ISA bus number and
> provide functions for looking them up, getting their IO and mem bases,
> and eventually mapping PCI busses to ISA busses ? Or does someone have a
> better idea ? The goal is to try not to change the semantics of inb/outb
> and friends so that most legacy drivers can still work using the
> "default" IO bus if they are not upgraded to the new scheme.
There is no 'fake' ISA bus number you need. There is a 'real' one,
the one on which the PCI-->ISA bridge lives, why not use that one
:-)
Then you could find such an ISA bridge, open that PCI device, then
finally perform the PCI_IOCTL_GETIOBASE thingy on it, but I don't like
this get-iobase idea at all, see my next email in this thread for why.
Later,
David S. Miller
Dan Malek writes:
> It actually caused me to think of something else....I have cards
> with multiple memory and I/O spaces (rare, but I have them).
So what? All such BARs within mem/io space are part of unique
regions of the total MEM/IO space.
Thus you can pass non-conflicting offset/size pairs, based upon the
BAR value of interest, to mmap and everything is fine.
Later,
David S. Miller
Benjamin Herrenschmidt writes:
> Also, an ioctl to retreive the iobase would be useful too
No, the whole point of my suggested mmap() interface is to
_ENTIRELY_ eliminate any reason for the user to even see
what the physical addressing of the machine looks like.
If you start pushing iobases to the user, you break this.
I do not want an interface where the user still has to do
grotty stuff like mmap() on /dev/{mem,kmem}, this was the
core of the problem I had with the syscall idea, don't bring
it back.
Make mmap()'s on a PCI-->ISA bridge do something special, for
example.
The user doesn't need to know anything about physical addressing of
the machine, it all can and should be abstracted away. This is why I
really detest the XFree86 PCI bus probing layer, it should not need to
poke around at so much of the config space information of devices :-(
It is the reason why, at least still today in XFree86 CVS, it simply
cannot cope with multiple PCI controllers in a machine because it
assumes a flat MEM/IO space. They know about the problem and are
working on fixes, but my point is that making this overly knowledgeable
PCI prober in the first place is what created these problems.
Later,
David S. Miller
> There is no 'fake' ISA bus number you need. There is a 'real' one,
> the one on which the PCI-->ISA bridge lives, why not use that one
> :-)
IFF the ISA bus hangs off the PCI bridge. Similarly, not all machines have
PCI as the primary I/O bus. On hppa, PCI busses hang off the GSC bus.
Benjamin Herrenschmidt wrote:
> Hi Grant !
>
> Alan Cox suggested I contact you about this. I'm trying to figure out a
> way to cleanly resolve the problem of doing IO accesses on machines with
> multiple PCI host bridges (and multiple IO bases when IO cycles are not
> generated by the CPU). I'd be glad if you could catch on the
> "The IO problem on multiple PCI busses" thread on linux-kernel list
> and let us share your point of viw.
To l-k, Benjamin wrote:
| I've looked at the parisc code (thanks Alan for pointing that out), and
| it seem they implement all inb/outb as quite big functions that decypher
| the address, retreive the bus, and do the proper IO call. Unfortunately,
| that's a bit bloated, and I don't think I'll ever get other PPC
| maintainers to agree with such a mecanism (everybody seem to be quite
| concerned with IO speed, I admit including me).
Benjamin,
As the main author/maintainer of that code, let me explain why
it's so ugly. Hopefully this will give you insight into a "better"
(arch independent) solution. Apologies for the length.
For IO Port space, I didn't worry about the bloat. A nice side effect of
this bloat is it will discourage use of I/O Port space. That's good for
everyone, AFAICT. (I know some devices *only* support I/O port space and
I personally don't care about them. If someone who does care about one
wants to talk to me about it...fine...I'll help)
[ Caveat: I've simplified the following *a lot* to keep it short. ]
parisc supports two different PCI host bus adapters with each having
variants that behave differently. All work under the model we are using
with one binary. One kernel binary is important since we want to make
installs easy for users.
Under Dino (GSCtoPCI), each PCI HBA has its own 64K I/O port space.
I/O port space transactions are generated by poking registers on Dino.
Yes - performance sucks - that's why HPUX (almost) exclusively
uses devices which support MMIO.
Under Elroy (aka LBA or RopesToPCI), we have two methods of accessing
I/O port space. One view of I/O space can be shared across all Elroys
which share the same IOMMU (aka SBA). This method distributes the 64K
I/O space over the 8 (or 16) "ropes" with rope 0 getting the first
8k (or 4k) and so on. The other view is each LBA has its own 64K
of I/O port space. The second view is mapped above 4GB and requires
a 64-bit kernel to access. In both cases, processor loads/stores from/to
the region will generate an I/O cycle on the respective PCI bus.
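As a small worked example of the shared view, the sketch below computes the slice each rope gets and which rope would decode a given port. It assumes the slices are equal and handed out in rope order, which is how the description above reads; the function and macro names are invented for the illustration.

```c
#include <assert.h>

#define SHARED_IO_SPACE 0x10000u /* 64K shared across all ropes */

/* Size of the slice each rope gets: 8k for 8 ropes, 4k for 16. */
unsigned int rope_io_slice(unsigned int nropes)
{
    assert(nropes == 8 || nropes == 16);
    return SHARED_IO_SPACE / nropes;
}

/* Which rope decodes a given port in the shared view (assumed layout). */
unsigned int rope_for_port(unsigned int port, unsigned int nropes)
{
    assert(port < SHARED_IO_SPACE);
    return port / rope_io_slice(nropes);
}
```

So with 8 ropes, ports 0x0000-0x1fff would belong to rope 0 and port 0x2000 would be the first port of rope 1.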
Generally speaking, parisc doesn't support VGA or ISA legacy crud on
its PCI busses. But I think those are orthogonal issues.
The inb/outb support hinges on this definition in include/asm-parisc/pci.h:
struct pci_port_ops {
    u8 (*inb) (struct pci_hba_data *hba, u16 port);
    u16 (*inw) (struct pci_hba_data *hba, u16 port);
    u32 (*inl) (struct pci_hba_data *hba, u16 port);
    void (*outb) (struct pci_hba_data *hba, u16 port, u8 data);
    void (*outw) (struct pci_hba_data *hba, u16 port, u16 data);
    void (*outl) (struct pci_hba_data *hba, u16 port, u32 data);
};
Code which uses this is in arch/parisc/kernel/pci.c at:
http://puffin.external.hp.com/cvs/linux/arch/parisc/kernel/pci.c
(look for PCI_PORT_HBA usage)
In a nutshell, the HBA number is encoded in the upper 16 bits
of the 32-bit I/O port space address. The inb() *function* uses the
decoded HBA number to look up the matching pci_port_ops function table
and pci_hba_data * to pass in. PCI fixup_bus() code virtualizes the
I/O port addresses found by the generic PCI bus walk. inb() is a
function so drivers work under *all* parisc PCI HBAs with one binary.
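The dispatch described above can be sketched roughly as follows. This is a user-space model under stated assumptions, not the parisc code: all sketch_* and fake_* names are invented, only the byte accessors are shown, and a small array stands in for the hardware behind each HBA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_HBA 4

struct sketch_hba; /* per-HBA data, playing the role of pci_hba_data */

/* Cut-down version of the pci_port_ops table: byte accessors only. */
struct sketch_port_ops {
    uint8_t (*inb)(struct sketch_hba *hba, uint16_t port);
    void (*outb)(struct sketch_hba *hba, uint16_t port, uint8_t data);
};

struct sketch_hba {
    const struct sketch_port_ops *ops;
    uint8_t regs[16]; /* stand-in for real hardware, for this sketch only */
};

static struct sketch_hba *hba_table[SKETCH_MAX_HBA];

/* Upper 16 bits select the HBA, lower 16 bits are the port on that bus. */
static unsigned int port_hba(uint32_t addr) { return addr >> 16; }
static uint16_t port_addr(uint32_t addr) { return (uint16_t)(addr & 0xffff); }

/* What bus fixup would do: turn a bus-local port into a virtualized one. */
uint32_t virtualize_port(unsigned int hba_num, uint16_t port)
{
    return ((uint32_t)hba_num << 16) | port;
}

void sketch_register_hba(unsigned int num, struct sketch_hba *hba)
{
    assert(num < SKETCH_MAX_HBA && hba != NULL && hba->ops != NULL);
    hba_table[num] = hba;
}

/* inb as a real function: decode the HBA, then call through its table. */
uint8_t sketch_inb(uint32_t addr)
{
    struct sketch_hba *hba;

    assert(port_hba(addr) < SKETCH_MAX_HBA);
    hba = hba_table[port_hba(addr)];
    assert(hba != NULL);
    return hba->ops->inb(hba, port_addr(addr));
}

void sketch_outb(uint8_t data, uint32_t addr)
{
    struct sketch_hba *hba;

    assert(port_hba(addr) < SKETCH_MAX_HBA);
    hba = hba_table[port_hba(addr)];
    assert(hba != NULL);
    hba->ops->outb(hba, port_addr(addr), data);
}

/* Fake backend so the sketch can run without hardware. */
uint8_t fake_inb(struct sketch_hba *hba, uint16_t port)
{
    return hba->regs[port % 16];
}

void fake_outb(struct sketch_hba *hba, uint16_t port, uint8_t data)
{
    hba->regs[port % 16] = data;
}

const struct sketch_port_ops fake_ops = { fake_inb, fake_outb };
```

The cost being discussed in this thread is visible here: every access pays for a table lookup and an indirect call, in exchange for one binary that works with any mix of HBAs.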
This scheme allows us to support "PCI-like" busses as well.
Some parisc machines have both PCI and EISA slots which are completely
independent of each other. We'd like to keep the semantics of inb/outb
the same and support both at the same time. It might be possible
to do this by feeding the drivers different versions of inb/outb
definitions at compile time. But initial attempts to do this ran
into problems (which I don't remember the details of).
Last comment is regarding who *configures* the PCI devices. On legacy PDC
(parisc's "BIOS on steroids"), the PDC sets everything up but does
not enable everything (ie pci_enable_device will set bits in PCI_COMMAND
cfg register). On card-mode Dino, (GSC cards plugged in proprietary bus),
the firmware doesn't know *anything* about the PCI devices and the arch
support has to set everything up - PCI MMIO space is not currently
supported there. And new servers (like L2000 or A500) with "PAT PDC" only
initialize PCI devices for boot. OS has to initialize the rest.
grant
Grant Grundler
parisc-linux {PCI|IOMMU|SMP} hacker
+1.408.447.7253
Grant Grundler writes:
> A nice side effect of this bloat is it will discourage use of I/O
> Port space. That's good for everyone, AFAICT. (I know some devices
> *only* support I/O port space and I personnally don't care about
> them. If someone who does care about one wants to talk to me about
> it...fine...I'll help)
There is another case you are ignoring. Some devices support memory
space as well as I/O space, but only operate reliably when their
I/O space window is used to access it.
It just sounds to me like the hppa pci controllers are crap,
especially the GSC one. At least the rope one does something
reasonable when you have a 64-bit kernel. The horrors you've told me
about the IOMMUs and stream-caches on these chips further confirm my
theory :-)
Later,
David S. Miller
>Many platforms, sparc64 included, do not have an ISA IO space nor do
>they provide VGA accesses at all.
>
>If things such as XFree86 are coded for such platforms to not require
>VGA accesses (the 'ati' driver is already like this when certain
>build time defines are set), this could become a non-issue in this
>case.
What I call ISA IOs here doesn't necessarily mean there's an ISA bridge
on the PCI. In fact, VGA cards usually don't live behind such a bridge
while still requiring occasional access to the "legacy" ISA IO addresses.
On PPC, we don't have an "IO" space either; all we have is a range of
memory addresses that will cause IO cycles to happen on the PCI bus. But
we can have one per bus, thus causing this need for several "legacy" IO
spaces (each bus can have IO addresses in the range 0-64k).
The typical problem that happens right now is a VGA card in the AGP slot
and a VGA card in a PCI slot. Both may (depending on the card) need
access to the legacy VGA IO addresses on the PCI bus on which they are
located. So in this case, we clearly have 2 legacy busses, available at
2 different physical memory addresses in the CPU space. What we can do is
use mapping tricks to map them contiguously in kernel virtual space so
that only an "offset" is needed to go from one to the other with inb/outb.
Without that, we need to create new versions of inb/outb that take a bus
number.
Ben.
>I do not want an interface where the user still has to do
>grotty stuff like mmap() on /dev/{mem,kmem}, this was the
>core of the problem I had with the syscall idea, don't bring
>it back.
>
>Make mmap()'s on a PCI-->ISA bridge do something special, for
>example.
>
>The user doesn't need to know anything about physical addressing of
>the machine, it all can and should be abstracted away. This is why I
>really detest the XFree86 PCI bus probing layer, it should not need to
>poke around at so much of the config space information of devices :-(
>
>It is the reason why, at least still today in Xfree86 CVS, it simply
>cannot cope with multiple PCI controllers in a machine because it
>assumes a flat MEM/IO space. They know about the problem and are
>working on fixes, but my point is that making this overly knowledgable
>PCI prober in the first place is what created these problems.
Ok, I see your point and I agree.
There is still the need, in the ioctl we use to "select" what needs to be
mapped by the next mmap, to ask for the "legacy IO range of the bus where
the card resides" (if it exists of course). That would be the 0-64k (or less,
actually a couple of pages would probably be enough) that generates IO cycles
in the "low" addresses used for VGA registers on the card.
Ben.
"David S. Miller" wrote:
> There is another case you are ignoring. Some devices support memory
> space as well as I/O space, but only operate reliably when their
> I/O space window is used to access it.
ok. Those also fall into the category of "I personally don't care" :^)
> It just sounds to me like the hppa pci controllers are crap,
> especially the GSC one.
In defense of the HW designers, Dino operates extremely well
in the environment it was designed for. Principally, workstations
with HP graphics cards (which only use MMIO). Optimizations for
graphics make it one of the fastest PCI-1X (and Cujo is PCI-2X)
HBA's - that's according to a 3rd party graphics card vendor who
has ported to the major high-end platforms.
> At least the rope one does something
> reasonable when you have a 64-bit kernel. The horrors you've told me
> about the IOMMUs and stream-caches on these chips further confirms my
> theory :-)
Yup. *sigh*. Between chip bugs, tradeoffs of performance, time to market,
and simple programming interface, things got pretty ugly (it's the
old saying about "Pick any two").
grant
Grant Grundler
parisc-linux {PCI|IOMMU|SMP} hacker
+1.408.447.7253
Benjamin Herrenschmidt writes:
> What I call ISA IOs here doesn't necessarily mean there's an ISA bridge
> on the PCI.
Ok.
> On PPC, we don't have an "IO" space neither, all we have is a range of
> memory addresses that will cause IO cycles to happen on the PCI bus.
This is precisely what the "next MMAP is XXX space" ioctl I've
suggested is for. I think I've addressed this concern in my
proposal already. Look:
    fd = open("/proc/bus/pci/${BUS}/${DEV}", ...);
    if (fd < 0)
        return -errno;
    err = ioctl(fd, PCI_MMAP_IO, 0);
    if (err < 0) {
        close(fd);
        return -errno;
    }
    ptr = mmap(NULL, pdev->bar[3].size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE, fd, pdev->bar[3].start);
Something like that.
> Without that, we need to create new versions of inb/outb that take a bus
> number.
No, don't do this, it is evil. Use mappings, specify the device
related info somehow when creating the mapping (in the userspace
variant you do this by opening a specific device to mmap, in the
kernel variant you can encode the bus/dev/etc. info in the device's
resource and decode this at ioremap() time, see?).
Later,
David S. Miller
Benjamin Herrenschmidt writes:
> There is still the need, in the ioctl we use the "select" what need to be
> mapped by the next mmap, to ask for the "legacy IO range of the bus where
> the card reside" (if it exist of course). That would be the 0-64k (or less,
> actually a couple of pages would probably be enough) that generates IO cycles
> in the "low" addresses used for VGA registers on the card.
As I've stated in another email, this is perfectly fine and is
precisely the kind of thing implied by my original proposal
in this thread.
You can even have arch-specific "next mmap is" ioctl values to do
"special things".
The generic part of the ioctl()/mmap() bits the PCI driver will have
added won't care about these ioctl's all that much, the
include/asm/pcimmap.h header will deal with all such details. This
header is also where the physical address and the actual creation of
the page table mappings will occur. The generic PCI code will only
provide the skeletal parts of the mmap() method and call into the
arch-specific hooks coded in asm/pcimmap.h
Later,
David S. Miller
>No, don't do this, it is evil. Use mappings, specify the device
>related info somehow when creating the mapping (in the userspace
>variant you do this by openning a specific device to mmap, in the
>kernel variant you can encode the bus/dev/etc. info in the device's
>resource and decode this at ioremap() time, see?).
Well, except that drivers doing IOs don't ioremap...
Maybe we could define an ioremap-like function for IOs, but the more
we discuss this, the more I feel that for in-kernel, a simple function
that returns a per-bus io base (and another one for ISA mem) is plenty
enough for the few legacy things we have to deal with (mostly VGA).
For PCI drivers doing IOs, we just need the IO resource
structures to be properly fixed up (so they already include the correct iobase).
That iobase can either be a mix of a real io address and a "cooking" in
the high bits like parisc, or it can be an address ioremap'd in the
correct bus mapping when it's possible, or whatever...
Ben.
>I/O is not supposed to be fast, that's what MMIO is for. :) Just do
>
>void outb (u8 val, u16 port)
>{
>    /* map just the one byte we are about to touch */
>    void *addr = ioremap (ISA_IO_BASE + port, 1);
>    if (addr) {
>        writeb (val, addr);
>        iounmap (addr);
>    }
>}
>
>You can map and unmap for each call :) Ugly and slow, but hey, it's
>I/O...
Well, that would really suck ;) And I don't think it would be necessary
as we can probably limit each IO bus to 64k without much problem, and
have them permanently ioremap'ed.
Ben.
[email protected] said:
> You can map and unmap for each call :) Ugly and slow, but hey, it's
> I/O...
outb(bus *bus, u8 val, u16 addr);
#ifdef ONE_TRUE_BUS_SPACE
#define outb(bus, val, addr) __outb(val, addr)
#else
#define outb(bus, val, addr) bus->out8(bus, val, addr)
#endif
--
dwmw2
On Thu, Mar 01, 2001 at 11:09:00AM -0800, David S. Miller wrote:
I think a cleaner scheme is to allow mmap() on
/proc/bus/pci/${BUS}/${DEVICE} nodes, that is much cleaner and solves
transparently any "different word size between userland and kernel"
issues (specifically 32-bit userlands executing on 64-bit
kernels).
This works great for when you want to do IO cycles from userland; but
what about the case of hardware which requires IO cycles from a
device driver for some very non-video hardware that may support
multiple cards across multiple busses?
Or do I not understand?
--cw
On Fri, 2 Mar 2001, David S. Miller wrote:
> > On PPC, we don't have an "IO" space neither, all we have is a range of
> > memory addresses that will cause IO cycles to happen on the PCI bus.
>
> This is precisely what the "next MMAP is XXX space" ioctl I've
> suggested is for. I think I've addressed this concern in my
> proposal already. Look:
>
> fd = open("/proc/bus/pci/${BUS}/${DEV}", ...);
> if (fd < 0)
> return -errno;
> err = ioctl(fd, PCI_MMAP_IO, 0);
I know I'm coming in on this late, but wouldn't it be cleaner to have
separate files for memory and io cycles, eg ${BUS}/${DEV}.(io|mem)?
They're logically different so they might as well be embodied separately.
--
"Love the dolphins," she advised him. "Write by W.A.S.T.E.."
At 5:01 PM -0600 3/6/2001, Oliver Xymoron wrote:
>On Fri, 2 Mar 2001, David S. Miller wrote:
>
>> > On PPC, we don't have an "IO" space neither, all we have is a range of
>> > memory addresses that will cause IO cycles to happen on the PCI bus.
>>
>> This is precisely what the "next MMAP is XXX space" ioctl I've
>> suggested is for. I think I've addressed this concern in my
>> proposal already. Look:
>>
>> fd = open("/proc/bus/pci/${BUS}/${DEV}", ...);
>> if (fd < 0)
>> return -errno;
>> err = ioctl(fd, PCI_MMAP_IO, 0);
>
>I know I'm coming in on this late, but wouldn't it be cleaner to have
>separate files for memory and io cycles, eg ${BUS}/${DEV}.(io|mem)?
>They're logically different so they might as well be embodied separately.
If I were designing this (and I'm not), I would do it as thus:
/proc/bus/pci/${BUS}/${DEV} is same as it always is
/proc/bus/pci/${BUS}/${DEV}.d/io.n for IO resources, where n is the number
of the IO resource
/proc/bus/pci/${BUS}/${DEV}.d/mem.n for Mem resources, where n is...
/proc/bus/pci/${BUS}/${DEV}.d/ints for interrupts, which would block on
read when there are no interrupts pending, and after an interrupt is
triggered the data read would be some sort of information about the
interrupt.
And that should (in theory) be all you need for writing a basic userspace
PCI device driver. (You wouldn't really be able to set up DMA or such, but
at that point I think "put the damn driver in the kernel" would be an
appropriate utterance)
This is just off the top of my head, so no warranties expressed or implied
about the sanity of this kind of system.
Come to think of it, is /proc really the best place to put all this stuff?
It would be a pain to put it in /dev and mess with assigning majors and
minors and making sure all the special devices get created and stuff...
Makes me wish Linux had an /hw fs like on IRIX. (I suppose devfs is close,
but I don't personally like the idea of completely replacing /dev with an
automatic filesystem)
Anyways...
Cheers - Tony 'Nicoya' Mantler :)
--
Tony "Nicoya" Mantler - Renaissance Nerd Extraordinaire - [email protected]
Winnipeg, Manitoba, Canada -- http://nicoya.feline.pp.se/
Benjamin Herrenschmidt wrote:
>
> >No, don't do this, it is evil. Use mappings, specify the device
> >related info somehow when creating the mapping (in the userspace
> >variant you do this by openning a specific device to mmap, in the
> >kernel variant you can encode the bus/dev/etc. info in the device's
> >resource and decode this at ioremap() time, see?).
>
> Well, except that drivers doing IOs don't ioremap...
>
> Maybe we could define an ioremap-like function for IOs, but the more
I/O is not supposed to be fast, that's what MMIO is for. :) Just do
void outb (u8 val, u16 port)
{
    /* map just the one byte we are about to touch */
    void *addr = ioremap (ISA_IO_BASE + port, 1);
    if (addr) {
        writeb (val, addr);
        iounmap (addr);
    }
}
You can map and unmap for each call :) Ugly and slow, but hey, it's
I/O...
--
Jeff Garzik | "You see, in this world there's two kinds of
Building 1024 | people, my friend: Those with loaded guns
MandrakeSoft | and those who dig. You dig." --Blondie