All programmers
I am relatively new to linux kernel. Please advise
what is the safe way to get the original virtaul
address from dma address e.g.,
dma_addr = pci_map_single(dev, vaddr, sizeof(struct
some), PCI_DMA_BIDIRECTIONAL);
issue_command_to_the_controller();
my_isr()
{
struct some *some;
dma_addr = this_address_completed();
--->some = how_to_get_from_dma_addr(dma_addr);
}
Would ioremap work.
TIA
__________________________________________________
Do You Yahoo!?
Listen to your Yahoo! Mail messages from any phone.
http://phone.yahoo.com
>>>>> "Linux" == Linux Bigot <[email protected]> writes:
Linux> All programmers I am relatively new to linux kernel. Please
Linux> advise what is the safe way to get the original virtaul address
Linux> from dma address e.g.,
You have to store the address you pass to pci_map_single() somewhere
in your data structures together with the dma address.
Jes
Please tell me why can't I use bus_to_virt().
Thanks
-----Original Message-----
From: Jes Sorensen [SMTP:[email protected]]
Sent: Wednesday, October 03, 2001 11:26 AM
To: Linux Bigot
Cc: [email protected]
Subject: Re: how to get virtual address from dma
address
>>>>> "Linux" == Linux Bigot <[email protected]>
writes:
Linux> All programmers I am relatively new to linux
kernel. Please
Linux> advise what is the safe way to get the original
virtaul address
Linux> from dma address e.g.,
You have to store the address you pass to
pci_map_single() somewhere
in your data structures together with the dma address.
Jes
On Wed, Oct 03, 2001 at 09:37:00AM -0700, Linux Bigot wrote:
> Please tell me why can't I use bus_to_virt().
Because bus_to_virt/virt_to_bus is obsolete, and using it will make your
driver non-portable to some architectures.
--
.----------=======-=-======-=========-----------=====------------=-=-----.
/ Ben Collins -- Debian GNU/Linux \
` [email protected] -- [email protected] -- [email protected] '
`---=========------=======-------------=-=-----=-===-======-------=--=---'
Then why isn't there a call like pci_unmap_single()
Thanks
--- Ben Collins <[email protected]> wrote:
> On Wed, Oct 03, 2001 at 09:37:00AM -0700, Linux
> Bigot wrote:
> > Please tell me why can't I use bus_to_virt().
>
> Because bus_to_virt/virt_to_bus is obsolete, and
> using it will make your
> driver non-portable to some architectures.
>
> --
>
>
.----------=======-=-======-=========-----------=====------------=-=-----.
> / Ben Collins -- Debian
> GNU/Linux \
> ` [email protected] -- [email protected]
> -- [email protected] '
>
`---=========------=======-------------=-=-----=-===-======-------=--=---'
__________________________________________________
Do You Yahoo!?
NEW from Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
http://geocities.yahoo.com/ps/info1
On Wed, Oct 03, 2001 at 02:11:08PM -0700, Linux Bigot wrote:
> Then why isn't there a call like pci_unmap_single()
include/asm-i386/pci.h:static inline void pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr,
Uh, there is.
--
.----------=======-=-======-=========-----------=====------------=-=-----.
/ Ben Collins -- Debian GNU/Linux \
` [email protected] -- [email protected] -- [email protected] '
`---=========------=======-------------=-=-----=-===-======-------=--=---'
> Then why isn't there a call like pci_unmap_single()
<<< linux/include/asm-i386/pci.h:
/* Unmap a single streaming mode DMA translation. The dma_addr and size
* must match what was provided for in a previous pci_map_single call.
All
* other usages are undefined.
*
* After this call, reads by the cpu to the buffer are guarenteed to see
* whatever the device wrote there.
*/
static inline void pci_unmap_single(struct pci_dev *hwdev, dma_addr_t
dma_addr,
size_t size, int direction)
<<<<
Please try to store both the dma address and the virtual address in your
driver and use pci_map_single().
I'm aware of one case where that's not possible: if your hardware builds
a linked list of finished commands, and that list is in DMA addresses,
not virtual addresses.
I think both sym53c8xx and one of the USB controllers (usb-ohci.c?) have
that problem, they use a hash table for the reverse dma address->linear
address mapping.
I hope that answers your question.
--
Manfred
oops! I am sorry. I did not meant "pci_unmap_single"
pci_unmap_single.
I meant to get a routine which does exactly opposite
of what pci_map_single does, i.e., give me a virtual
address for a dma address.
Some people suggested, bus_to_virt() is not good a
call.
Thanks
--- Kurt Ferreira <[email protected]> wrote:
> I have to admit I have not been following this
> thread and I apologize if I
> am stating somethign already stated, but there is a
> call in 2.4 kernels
> called pci_unmap_single. see
> Documentation/DMA-mapping.txt. Now on some
> arch's this call does nothing (for example i386) but
> for some arch's
> (for example alpha's, and others) it does stuff. I
> really would stay
> aways from the virt_to_phys family of calls as has
> been stated.
>
> Hope this helps,
> Kurt
>
>
> On Wed, 3 Oct 2001, Linux Bigot wrote:
>
> > Then why isn't there a call like
> pci_unmap_single()
> >
> >
> > Thanks
> >
> >
> > --- Ben Collins <[email protected]> wrote:
> > > On Wed, Oct 03, 2001 at 09:37:00AM -0700, Linux
> > > Bigot wrote:
> > > > Please tell me why can't I use bus_to_virt().
> > >
> > > Because bus_to_virt/virt_to_bus is obsolete, and
> > > using it will make your
> > > driver non-portable to some architectures.
> > >
> > > --
> > >
> > >
> >
>
.----------=======-=-======-=========-----------=====------------=-=-----.
> > > / Ben Collins -- Debian
> > > GNU/Linux \
> > > ` [email protected] --
> [email protected]
> > > -- [email protected] '
> > >
> >
>
`---=========------=======-------------=-=-----=-===-======-------=--=---'
> >
> >
> > __________________________________________________
> > Do You Yahoo!?
> > NEW from Yahoo! GeoCities - quick and easy web
> site hosting, just $8.95/month.
> > http://geocities.yahoo.com/ps/info1
> > -
> > To unsubscribe from this list: send the line
> "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at
> http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
>
> --
>
>
>
__________________________________________________
Do You Yahoo!?
NEW from Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
http://geocities.yahoo.com/ps/info1
On Wed, Oct 03, 2001 at 02:48:58PM -0700, Linux Bigot wrote:
> oops! I am sorry. I did not meant "pci_unmap_single"
> pci_unmap_single.
>
> I meant to get a routine which does exactly opposite
> of what pci_map_single does, i.e., give me a virtual
> address for a dma address.
pci_unmap_single is the exact opposite of pci_map_single. It unmaps a
memory region. It just doesn't do what you want.
Bottom line, to keep your driver portable, you are going to have to keep
a mapping of the dma and real addresses either in a structure as most
drivers do, or as a hash like a couple difficult ones do.
Ben
--
.----------=======-=-======-=========-----------=====------------=-=-----.
/ Ben Collins -- Debian GNU/Linux \
` [email protected] -- [email protected] -- [email protected] '
`---=========------=======-------------=-=-----=-===-======-------=--=---'
> Linux> All programmers I am relatively new to linux kernel. Please
> Linux> advise what is the safe way to get the original virtaul address
> Linux> from dma address e.g.,
> You have to store the address you pass to pci_map_single() somewhere
> in your data structures together with the dma address.
Yes, but speaking as someone who had to use a large hammer to convert his
driver from bus_to_virt et al., it does seem rather hard not to have the
equivalent for the new pci_dma paradigm. It does present an obstacle
persuading people to convert drivers, particularly if the hardware is going to
present a linked list of addresses (as SCSI hardware often does).
After all, whatever device maps between the io bus and the memory bus, it must
always map a given dma_addr_t to a known physical address. It can't be that
hard to provide an an API in the kernel which can compute this relationship
(although I can see it may be expensive to walk iommu page tables). I'm only
really asking for a dma_addr_t to virtual address by the way. I see that
mapping the other way would be problematic.
James Bottomley
From: James Bottomley <[email protected]>
Date: Wed, 03 Oct 2001 17:44:18 -0500
(although I can see it may be expensive to walk iommu page tables)
I know of hardware where doing the reverse mapping would not even be
possible, the page tables are in hardware registers and are "write
only". This means you can't even read the PTEs back, you'd have to
keep track of them in software and that is totally unacceptable
overhead when it won't even be used %99 of the time.
The DMA API allows us to support such hardware cleanly and
efficiently, but once we add this feature which "everyone absolutely
needs" we have a problem with the above mentioned piece of hardware.
Franks a lot,
David S. Miller
[email protected]
With Rik's reverse mapping patch, wouldn't we have the virtual address for the given
physical address ? I have no clue about how the patch works, somebody willing to explain
it?
Balbir
David S. Miller wrote:
> From: James Bottomley <[email protected]>
> Date: Wed, 03 Oct 2001 17:44:18 -0500
>
> (although I can see it may be expensive to walk iommu page tables)
>
>I know of hardware where doing the reverse mapping would not even be
>possible, the page tables are in hardware registers and are "write
>only". This means you can't even read the PTEs back, you'd have to
>keep track of them in software and that is totally unacceptable
>overhead when it won't even be used %99 of the time.
>
>The DMA API allows us to support such hardware cleanly and
>efficiently, but once we add this feature which "everyone absolutely
>needs" we have a problem with the above mentioned piece of hardware.
>
>Franks a lot,
>David S. Miller
>[email protected]
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/
>
From: "BALBIR SINGH" <[email protected]>
Date: Thu, 04 Oct 2001 15:41:08 +0530
With Rik's reverse mapping patch, wouldn't we have the virtual address for the given
physical address ? I have no clue about how the patch works, somebody willing to explain
it?
Rik's work is for user process PTEs, we're talking about IOMMU PTEs on
PCI controllers used to map the 32-bit PCI address space to the (often
larger) physical address space of main memory.
These two PTE types are totally unrelated.
Franks a lot,
David S. Miller
[email protected]
[email protected] said:
> I know of hardware where doing the reverse mapping would not even be
> possible, the page tables are in hardware registers and are "write
> only". This means you can't even read the PTEs back, you'd have to
> keep track of them in software and that is totally unacceptable
> overhead when it won't even be used %99 of the time.
> The DMA API allows us to support such hardware cleanly and
> efficiently, but once we add this feature which "everyone absolutely
> needs" we have a problem with the above mentioned piece of hardware.
Walking the iommu page tables was only a suggestion of how it should be done.
I'm not advocating providing this as a generic feature of the iommu handling
software, but as an API.
What's wrong with implementing a separate API, like
pci_register_mapping(struct pci_dev *dev, void *virtualAddress, dma_addr_t
dmaAddress, size_t size);
void *pci_dma_to_virtual(struct pci_dev *dev, dma_addr_t dmaAddress);
pci_unregister_mapping(struct pci_dev *dev, void *virtualAddress, dma_addr_t
dmaAddress, size_t size);
?
That way, the driver will only ask for translations of addresses it knows its
going to have difficulty with, so there's no overhead in the general mapping
case where you don't request registration for a mapping lookup.
For an IOMMU with readable page tables, you can chose to implement the
register/unregister as nops and do the pci_dma_to_virtual as a PTE walk. For
the really recalcitrant iommu hardware, you only save the mapping tables (or
more likely just the mappings in a hash lookup table) when requested to.
This has the dual benefits of being completely backwards convertible from the
old bus_to_virt et al. scheme and requiring no overhead unless the driver
actually asks for it.
James Bottomley
>>>>> "James" == James Bottomley <[email protected]> writes:
Linux> All programmers I am relatively new to linux kernel. Please
Linux> advise what is the safe way to get the original virtaul address
Linux> from dma address e.g.,
>> You have to store the address you pass to pci_map_single()
>> somewhere in your data structures together with the dma address.
James> Yes, but speaking as someone who had to use a large hammer to
James> convert his driver from bus_to_virt et al., it does seem rather
James> hard not to have the equivalent for the new pci_dma paradigm.
James> It does present an obstacle persuading people to convert
James> drivers, particularly if the hardware is going to present a
James> linked list of addresses (as SCSI hardware often does).
Because as DaveM pointed out, some hardware can't do it, and as I said
earlier, it's a lot cheaper and easier for driver writers to just
store the extra pointer in their data structures than it is to
implement a database to maintain it.
Remember you often need this address in the hot path (say TX interrupt
handler) so you don't want to introduce any unnecessary function calls.
Jes
>>>>> "Linux" == Linux Bigot <[email protected]> writes:
Linux> oops! I am sorry. I did not meant "pci_unmap_single"
Linux> pci_unmap_single.
Linux> I meant to get a routine which does exactly opposite of what
Linux> pci_map_single does, i.e., give me a virtual address for a dma
Linux> address.
Because it is cheaper for you to keep both addresses in the driver
than it is to implement a database to keep track of it for you.
Jes
Jes Sorensen writes:
> Because as DaveM pointed out, some hardware can't do it, and as I said
> earlier, it's a lot cheaper and easier for driver writers to just
> store the extra pointer in their data structures than it is to
> implement a database to maintain it.
For some devices yes, for others storing the extra pointer doesn't
help all that much. Devices which can complete requests out of order
and which tell you the DMA addresses of the requests they have
completed would be in the second class, and that includes the OHCI USB
controller, and some intelligent SCSI controllers I believe.
So you end up having to map the DMA address back to the pointer to the
data structure for the request. At the moment the drivers that need
this each have their own implementation, using a hash table or
whatever.
The argument for supplying this functionality in the PCI DMA code
would be that if it was done there it could be done once, and in a
sophisticated and efficient (and SMP-safe :) fashion, rather than
ad-hoc in each driver.
It may also be possible for the PCI DMA code to take advantage of its
knowledge of a particular platform, for example if the platform only
has a small range of possible DMA addresses then it could use a simple
and fast lookup table. Or it may be possible to read the IOMMU tables
on some platforms and do the reverse mapping quickly that way - this
would certainly be the case for the IBM RS/6000 machines since the
IOMMU tables are in system RAM.
> Remember you often need this address in the hot path (say TX interrupt
> handler) so you don't want to introduce any unnecessary function calls.
The drivers that need this would already be doing internal function
calls to do the reverse mapping anyway. Of course we would not want
this functionality to add extra overhead at the point where we set up
the mapping (at least in the cases where we won't need the reverse
mapping).
There is a question though as to whether a reverse mapping from DMA
address to virtual address is sufficient for these drivers, or whether
they would need a mapping from DMA address to something else such as a
pointer to a request structure. The usb-ohci driver looks like it
would be sufficient to get back to the virtual address.
Regards,
Paul.
From: Paul Mackerras <[email protected]>
Date: Sat, 6 Oct 2001 18:06:23 +1000 (EST)
The argument for supplying this functionality in the PCI DMA code
would be that if it was done there it could be done once, and in a
sophisticated and efficient (and SMP-safe :) fashion, rather than
ad-hoc in each driver.
The argument against it is that if you provide such an easy way out,
people will just blindly transform bus_to_virt/virt_to_bus without
considering more straightforward methods when they exist.
I can not even count on one hand how many people I've helped
converting, who wanted a bus_to_virt() and when I showed them
how to do it with information the device provided already they
said "oh wow, I never would have thought of that". That process
won't happen as often with the suggested feature.
I am adamently against generic infrastructure to do this. Yes, it's
social engineering, tough cookies... it's social engineering that I
know is working :-)
Franks a lot,
David S. Miller
[email protected]
David S. Miller writes:
> I can not even count on one hand how many people I've helped
> converting, who wanted a bus_to_virt() and when I showed them
> how to do it with information the device provided already they
> said "oh wow, I never would have thought of that". That process
> won't happen as often with the suggested feature.
Well, let's see if we can come up with a way to achieve this goal as
well as the other.
I look at all the hash-table stuff in the usb-ohci driver and I think
to myself about all the complexity that is there (and I haven't
managed to convince myself yet that it is actually SMP-safe) and all
the time wasted doing that stuff, when on probably 95% of the
machines that use the usb-ohci driver, the hashing stuff is totally
unnecessary. I am talking about powermacs, which don't have an iommu,
and where the reverse mapping is as simple as adding a constant.
That was my second argument, which you didn't reply to - that doing
the reverse mapping is very simple on some platforms, and so the right
place to do reverse mapping is in the platform-aware code, not in the
drivers. On other platforms the reverse mapping is more complex, but
the complexity is bounded by the complexity that is already there in
drivers like the usb-ohci driver.
> I am adamently against generic infrastructure to do this. Yes, it's
> social engineering, tough cookies... it's social engineering that I
> know is working :-)
Well, I don't maintain any of the affected drivers, so it's not an
issue that affects me personally.
Maybe we want a reverse-mapping API with a copyright notice on it that
says you can't use it unless you have written permission from David
S. Miller. Just joking, but I do think that we should see if we can
think of something that achieves that sort of effect while still
making the facilities available for the drivers that truly do need it.
Regards,
Paul.
[email protected] said:
> I look at all the hash-table stuff in the usb-ohci driver and I think
> to myself about all the complexity that is there (and I haven't
> managed to convince myself yet that it is actually SMP-safe) and all
> the time wasted doing that stuff, when on probably 95% of the machines
> that use the usb-ohci driver, the hashing stuff is totally
> unnecessary. I am talking about powermacs, which don't have an iommu,
> and where the reverse mapping is as simple as adding a constant.
> That was my second argument, which you didn't reply to - that doing
> the reverse mapping is very simple on some platforms, and so the right
> place to do reverse mapping is in the platform-aware code, not in the
> drivers. On other platforms the reverse mapping is more complex, but
> the complexity is bounded by the complexity that is already there in
> drivers like the usb-ohci driver.
This is precisely my point as well: You've made all the drivers that must be
able to do this type of mapping assume the worst case because we cannot now
have any window into the iommu if there is one.
Worse still, every driver that needs this feature is doing it on its own, so
the code for doing this in usb-ohci is different from the code in the
sym53c8xx driver etc. We're now duplicating this fairly subtle and complex
code all over the driver base.
At the very least, providing an API to do this will get us away from all the
error prone duplication. And if it's going to be an API, it might as well be
architecture specific and iommu aware so it functions in the simplest and
fastest way.
[email protected] said:
> The argument against it is that if you provide such an easy way out,
> people will just blindly transform bus_to_virt/virt_to_bus without
> considering more straightforward methods when they exist.
> I can not even count on one hand how many people I've helped
> converting, who wanted a bus_to_virt() and when I showed them how to
> do it with information the device provided already they said "oh wow,
> I never would have thought of that". That process won't happen as
> often with the suggested feature.
I agree that some people are very prone to abusing the tools provided to them.
I cannot agree that this is a good reason not to provide the tools in the
first place. This is the type of argument usually heard from lawmakers and
politicians. The linux community is better than that: we have a review
process for new drivers. We even have a bunch of people who trawl through the
old code looking for mistakes and problems who would probably be more that
willing to send in patches for abuse of this API. Can we not educate the
community (document the API properly) and rely on it to make the correct
choices?
James Bottomley
[...]
> Worse still, every driver that needs this feature is doing it on its own, so
> the code for doing this in usb-ohci is different from the code in the
> sym53c8xx driver etc. We're now duplicating this fairly subtle and complex
> code all over the driver base.
What's complex??
The sym53c8xx driver uses a simple hash table to retrieve the IO control
block from the DSA value. This DSA value is in fact the bus physical
address. Indeed, this is kind of reverse mapping DMA address -> Virtual
address, but the code is about 20 lines _only_, and it is absolutely not
complex, neither it impacts performances nor makes significant difference
about memory used. The driver could, for example, use a simple index
instead and retrieve the IO control block from an array of virtual
addresses, as does the aic7xxx driver, for example. This simple reverse
mapping seemed to me more appropriate for the sym53c8xx stuff.
In my opinion, any bus_to_virt() thing hurts a lot. It only makes sense if
it refers to the virt_to_bus() mapping that was used to generate the bus
address and assumes the reverse function to be a mapping. A general
bus_to_virt() thing looks so ugly thing to me that I donnot want to ever
use such. Even if we implicitely assume that it refers to some 'virtual to
DMA mapping' that ensures that each virtual address only maps a single DMA
address, it may be trivial to implement on some arch with no significant
overhead but it can be complex to implement on some other or may suffer of
significant overhead.
Frankly, I definitely would not like any general bus_to_virt() to be
resurrected.
By the way, what is a bit complex in the sym53c8xx driver is probably the
memory allocator that provides virtual to bus address mapping for internal
driver data structures + naturally aligned allocations. I wrote it once
and now it is also useful for SYM-2 that runs on 3 O/Ses. I haven't had
any problem with that code since day one, so I will keep with it even if
some O/S does provide an equivalent service. :-)
[...]
Regards,
G?rard.
>>>>> "Paul" == Paul Mackerras <[email protected]> writes:
Paul> David S. Miller writes:
>> I can not even count on one hand how many people I've helped
>> converting, who wanted a bus_to_virt() and when I showed them how
>> to do it with information the device provided already they said "oh
>> wow, I never would have thought of that". That process won't
>> happen as often with the suggested feature.
Paul> Well, let's see if we can come up with a way to achieve this
Paul> goal as well as the other.
Paul> I look at all the hash-table stuff in the usb-ohci driver and I
Paul> think to myself about all the complexity that is there (and I
Paul> haven't managed to convince myself yet that it is actually
Paul> SMP-safe) and all the time wasted doing that stuff, when on
Paul> probably 95% of the machines that use the usb-ohci driver, the
Paul> hashing stuff is totally unnecessary. I am talking about
Paul> powermacs, which don't have an iommu, and where the reverse
Paul> mapping is as simple as adding a constant.
I haven't looked at the ohci driver at all, however doesn't it return
anything but the dma address? No index, no offset, no nothing? If
thats the case, someone really needs to go visit the designers with a
large bat ;-(
Jes
>[...]
> The argument for supplying this functionality in the PCI DMA code
> would be that if it was done there it could be done once, and in a
> sophisticated and efficient (and SMP-safe :) fashion, rather than
> ad-hoc in each driver.
This is exactly the kind of thinking that brought us pci_pool.
With all due respect to David-B, it was totally unnecessary,
in my view. However, attempts to make it "pretty" resulted in
something that cannot be implemented right. Just think if
pci_pool_free can be entered from an interrupt. If you allow
that, you cannot free full pages (because some broken architectures
implement pci_alloc_consistent with vmalloc, and vfree is not
interrupt safe). The root of the problem is an attempt to specify
variable sized pools.
I am afraid that if we start adding first class citizen APIs
in the area of pci_alloc_something, it's going to be more and
more interdependent and will place very strong constraints
on what architectures can and cannot do. For example, already
pci_alloc_consistent _cannot_ be implemented on some HP machines
(and, by extension, pci_pool).
> It may also be possible for the PCI DMA code to take advantage of its
> knowledge of a particular platform, for example if the platform only
> has a small range of possible DMA addresses then it could use a simple
> and fast lookup table. Or it may be possible to read the IOMMU tables
> on some platforms and do the reverse mapping quickly that way - this
> would certainly be the case for the IBM RS/6000 machines since the
> IOMMU tables are in system RAM.
And it neither of these is possible, then what? Then you fall
back on the most generic code, which is the worst case of
all possible partial implementations in drivers.
-- Pete
> > I can not even count on one hand how many people I've helped
> > converting, who wanted a bus_to_virt() and when I showed them
> > how to do it with information the device provided already they
> > said "oh wow, I never would have thought of that". That process
> > won't happen as often with the suggested feature.
> I look at all the hash-table stuff in the usb-ohci driver and I think
> to myself about all the complexity that is there (and I haven't
> managed to convince myself yet that it is actually SMP-safe) and all
> the time wasted doing that stuff, when on probably 95% of the
> machines that use the usb-ohci driver, the hashing stuff is totally
> unnecessary. I am talking about powermacs, which don't have an iommu,
> and where the reverse mapping is as simple as adding a constant.
That's one kinky driver, no wonder you were traumatized by looking
at it. I think you must not project the shock and horror of usb-ohci
onto other drivers. Gerard already defended the Symbios SCSI.
There may be some approaches to deal with the problem. One is
to leave hash in and clean up the rest. It would probably
break a number of devices and take a time to straighten out.
Another possibility is to limit the number of URBs that are
posted in any given time to the hardware.
> That was my second argument, which you didn't reply to - that doing
> the reverse mapping is very simple on some platforms, and so the right
> place to do reverse mapping is in the platform-aware code, not in the
> drivers. On other platforms the reverse mapping is more complex, but
> the complexity is bounded by the complexity that is already there in
> drivers like the usb-ohci driver.
No, it's not bounded. Outside implementation has to be much more
complex to accomodate slightly different requirements of different
drivers.
And remember, you can put it in, but cannot pull it out.
-- Pete
Jes Sorensen writes:
> I haven't looked at the ohci driver at all, however doesn't it return
> anything but the dma address? No index, no offset, no nothing? If
> thats the case, someone really needs to go visit the designers with a
> large bat ;-(
The OHCI hardware works with linked lists of transfer descriptors,
using bus addresses for the pointers of course. When a transfer
descriptor is completed, it gets linked onto a done-list by the
hardware (on to the front of the list, so you get the completed
descriptors in reverse order).
There is no way to predict the completion order in general because you
can have transfers in progress to several different devices, and to
several endpoints on each device, at the same time, which can each
supply or accept data at different rates.
I don't think the OHCI designers considered the possibility that it
might be less than straightforward for the CPU to access some memory
given its bus address. And I would have a hard time arguing against
someone who claimed that a platform where that was difficult was
just terminally broken. :)
Paul.
On 6 Oct 2001, Jes Sorensen wrote:
> >>>>> "Paul" == Paul Mackerras <[email protected]> writes:
>
> Paul> David S. Miller writes:
> >> I can not even count on one hand how many people I've helped
> >> converting, who wanted a bus_to_virt() and when I showed them how
> >> to do it with information the device provided already they said "oh
> >> wow, I never would have thought of that". That process won't
> >> happen as often with the suggested feature.
>
> Paul> Well, let's see if we can come up with a way to achieve this
> Paul> goal as well as the other.
>
> Paul> I look at all the hash-table stuff in the usb-ohci driver and I
> Paul> think to myself about all the complexity that is there (and I
> Paul> haven't managed to convince myself yet that it is actually
> Paul> SMP-safe) and all the time wasted doing that stuff, when on
> Paul> probably 95% of the machines that use the usb-ohci driver, the
> Paul> hashing stuff is totally unnecessary. I am talking about
> Paul> powermacs, which don't have an iommu, and where the reverse
> Paul> mapping is as simple as adding a constant.
>
> I haven't looked at the ohci driver at all, however doesn't it return
> anything but the dma address? No index, no offset, no nothing? If
> thats the case, someone really needs to go visit the designers with a
> large bat ;-(
I would apply the bat to people that wants such a dma to virtual general
translation. This thing is obviously gross shit.
I would also apply the bat to people that look into stuff of other people
and, instead of trying to actually understand the code, just give a look
and send inappropriate statements to the list.
G?rard.
> I would apply the bat to people that wants such a dma to virtual
> general translation. This thing is obviously gross shit.
If you read back in the thread, you'll find the proposed API was per
pci_device and only when the device driver writer actually asked for it. This
makes it potentially as efficient as the code in sym53c8xx in the worst case
and much more efficient in the best.
We have all agreed that just doing the mapping generally will be inefficient
for a particular class of hardware with no readable access to its page tables.
> I would also apply the bat to people that look into stuff of other
> people and, instead of trying to actually understand the code, just
> give a look and send inappropriate statements to the list.
The statement currently made to the list:
> Worse still, every driver that needs this feature is doing it on its own, so
> the code for doing this in usb-ohci is different from the code in the
> sym53c8xx driver etc.
Is true: both drivers use hashes to do dma to virtual mapping. They both have
their own code for doing it (I have looked). We can disagree about whether
the code is subtle or complex, but you can't deny that these two drivers have
separate implementations of the same function.
> In my opinion, any bus_to_virt() thing hurts a lot. It only makes
> sense if it refers to the virt_to_bus() mapping that was used to
> generate the bus address and assumes the reverse function to be a
> mapping. A general bus_to_virt() thing looks so ugly thing to me that
> I donnot want to ever use such.
Right, but we're not arguing over whether to do it generally. The argument is
whether an API looking like this:
pci_register_mapping(struct pci_dev *dev, void *virtualAddress, dma_addr_t
dmaAddress, size_t size);
void *pci_dma_to_virtual(struct pci_dev *dev, dma_addr_t dmaAddress);
dma_addr_t pci_virtual_to_dma(struct pci_dev *dev, void *virtualAddress);
pci_unregister_mapping(struct pci_dev *dev, void *virtualAddress, dma_addr_t
dmaAddress, size_t size);
should or should not be provided.
For this API I claim that:
1. I can do the worst case MMU hardware as efficiently as your driver (because
it would essentially be a hashed look up of registered mappings for your
single pci device instance alone, functionally identical to the code you use
today).
2. on optimal hardware (like x86 where this can be done by bit flipping of the
address) I can do a whole lot better.
I also claim the same is true for every driver which still needs to do this
type of mapping (which is a significant fraction of drivers for devices which
have an intelligent processor core and multiple outstanding jobs with a
non-predictable order of completion).
Therefore, every driver that needs to do this today is using the least
efficient method. Further, they're all coding their own least efficient
methods.
If you can provide a reasoned counter argument to the above (preferably
stripped of the invective periphrasis) I'm listening.
James Bottomley
>>>>> "Paul" == Paul Mackerras <[email protected]> writes:
Paul> Jes Sorensen writes:
>> I haven't looked at the ohci driver at all, however doesn't it
>> return anything but the dma address? No index, no offset, no
>> nothing? If thats the case, someone really needs to go visit the
>> designers with a large bat ;-(
Paul> The OHCI hardware works with linked lists of transfer
Paul> descriptors, using bus addresses for the pointers of course.
Paul> When a transfer descriptor is completed, it gets linked onto a
Paul> done-list by the hardware (on to the front of the list, so you
Paul> get the completed descriptors in reverse order).
Paul> There is no way to predict the completion order in general
Paul> because you can have transfers in progress to several different
Paul> devices, and to several endpoints on each device, at the same
Paul> time, which can each supply or accept data at different rates.
Ok, so this is actually quite similar to how the AceNIC does it,
however the great thing about the AceNIC descriptors is that it has an
opague field in the descriptor which you can use as an index into a
table or similar to dig out your dma descriptor addresses.
Jes
On Sun, 7 Oct 2001, James Bottomley wrote:
> > I would apply the bat to people that wants such a dma to virtual
> > general translation. This thing is obviously gross shit.
>
> If you read back in the thread, you'll find the proposed API was per
> pci_device and only when the device driver writer actually asked for it. This
> makes it potentially as efficient as the code in sym53c8xx in the worst case
> and much more efficient in the best.
>
> We have all agreed that just doing the mapping generally will be inefficient
> for a particular class of hardware with no readable access to its page tables.
>
> > I would also apply the bat to people that look into stuff of other
> > people and, instead of trying to actually understand the code, just
> > give a look and send inappropriate statements to the list.
>
> The statement currently made to the list:
>
> > Worse still, every driver that needs this feature is doing it on its own, so
> > the code for doing this in usb-ohci is different from the code in the
> > sym53c8xx driver etc.
>
> Is true: both drivers use hashes to do dma to virtual mapping. They both have
> their own code for doing it (I have looked). We can disagree about whether
> the code is subtle or complex, but you can't deny that these two drivers have
> separate implementations of the same function.
Am I dreaming? It is less than 30 lines of code one-shot written within a
couple of minutes. Too much sharing code has counter-effects on
maintainability, performances and also may have the adverse effect of
bloating the overall code or making it less readable. We must share what
is worth to be so. The right compromize I see here is absolutely not
yours, it seems.
> > In my opinion, any bus_to_virt() thing hurts a lot. It only makes
> > sense if it refers to the virt_to_bus() mapping that was used to
> > generate the bus address and assumes the reverse function to be a
> > mapping. A general bus_to_virt() thing looks so ugly thing to me that
> > I donnot want to ever use such.
>
> Right, but we're not arguing over whether to do it generally. The argument is
> whether an API looking like this:
>
> pci_register_mapping(struct pci_dev *dev, void *virtualAddress, dma_addr_t
> dmaAddress, size_t size);
> void *pci_dma_to_virtual(struct pci_dev *dev, dma_addr_t dmaAddress);
> dma_addr_t pci_virtual_to_dma(struct pci_dev *dev, void *virtualAddress);
> pci_unregister_mapping(struct pci_dev *dev, void *virtualAddress, dma_addr_t
> dmaAddress, size_t size);
The prototypes seem larger in characters that the full code that does the
work in the sym53c8xx driver. On the other hand, it involves the pcidev
structure and thus may require additionnal synchronisation. You just
illustrates exactly what I wrote above, in my opinion.
> should or should not be provided.
>
> For this API I claim that:
>
> 1. I can do the worst case MMU hardware as efficiently as your driver (because
> it would essentially be a hashed look up of registered mappings for your
> single pci device instance alone, functionally identical to the code you use
> today).
>
> 2. on optimal hardware (like x86 where this can be done by bit flipping of the
> address) I can do a whole lot better.
>
> I also claim the same is true for every driver which still needs to do this
> type of mapping (which is a significant fraction of drivers for devices which
> have an intelligent processor core and multiple outstanding jobs with a
> non-predictable order of completion).
>
> Therefore, every driver that needs to do this today is using the least
> efficient method. Further, they're all coding their own least efficient
> methods.
>
> If you can provide a reasoned counter argument to the above (preferably
> stripped of the invective periphrasis) I'm listening.
No problem.
Here is the FULL simple code from SYM-2 driver that perform the reverse
mapping (it is mostly the same in sym53c8xx):
/*
* A CCB hashed table is used to retrieve CCB address
* from DSA value.
*/
#define CCB_HASH_SHIFT 8
#define CCB_HASH_SIZE (1UL << CCB_HASH_SHIFT)
#define CCB_HASH_MASK (CCB_HASH_SIZE-1)
#if 1
#define CCB_HASH_CODE(dsa) \
(((dsa) >> (_LGRU16_(sizeof(struct sym_ccb)))) & CCB_HASH_MASK)
#else
#define CCB_HASH_CODE(dsa) (((dsa) >> 9) & CCB_HASH_MASK)
#endif
--
/*
* Insert this ccb into the hashed list.
*/
hcode = CCB_HASH_CODE(cp->ccb_ba);
cp->link_ccbh = np->ccbh[hcode];
np->ccbh[hcode] = cp;
--
/*
* Look up a CCB from a DSA value.
*/
static ccb_p sym_ccb_from_dsa(hcb_p np, u32 dsa)
{
int hcode;
ccb_p cp;
hcode = CCB_HASH_CODE(dsa);
cp = np->ccbh[hcode];
while (cp) {
if (cp->ccb_ba == dsa)
break;
cp = cp->link_ccbh;
}
return cp;
}
--
No need to ever provide me with a patch that wants to use _any_ API for
that. It will obviously not be applied.
If there are candidates for using some API like the one your propose, it
will not be a problem for me. I just will not use it. It is as simple as
this. :-)
G?rard.
[email protected] said:
> No problem. Here is the FULL simple code from SYM-2 driver that
> perform the reverse mapping (it is mostly the same in sym53c8xx):
Well, since this piece of code isn't in the current kernel, it makes it more
difficult, but it looks to me like there's an internal ccb structure in the
driver that would contain pointers to the scsi command pointer, the dsa
structure and various other things. The dsa structure is the chip's internal
representation of an outstanding command. When a command is completed, you
get a dma address of the dsa pointer back from the chip, so you do the
sym_ccb_from_dsa() conversion to get the ccb and from that you get the virtual
address of the dsa structure because you need to collect completion
information about the command before sending it back into the mid-layer.
The way you do this is to walk a list of ccb structures hashed on the dsa
pointer for efficient reverse lookup until you find the ccb of the returning
dsa.
It's certainly a valid thing to do, I've done it before myself. However, an
equally valid way of processing a returning dsa is to embed a pointer to the
ccb structure in the dsa structure. Then to get back to the ccb you simply
dereference the dsa structure. The catch now is that I need the virtual
address of the dsa pointer. If I had the API I could do this. How
efficiently? well on the x86 it's a simple bit flip to get the virtual
address and away I go. The cost is O(1). On the most opaque DMA hardware, the
mappings would have to be stored separately in a hash table. The cost of
doing the lookup is O(number of pages registered for this device) which, since
I would expect to pack a few dsa structures per page is less than O(total dsa
structures).
Your method, providing only outstanding dsa structures are hashed, has an
efficiency O(total outstanding commands).
So, for the worst case DMA hardware the two methods are very comparable (you
can probably get each to beat the other by judicious tuning of the hash bucket
size). For the best case DMA hardware, my lookup is O(1) which is hard for a
hash lookup to beat.
So, in summary we have two methods, each of which could beat the other under
optimal hardware conditions. Which should be used? well that's up to the
choice of the device driver writer.
My point here is that there isn't a choice any more because the API to do DMA
to virtual address mappings is gone. What I've done is proposed a replacement
API that will have no impact on a device driver writer who doesn't want to use
it.
The fact that you wouldn't use it is irrelevant to the argument, since I've no
wish to force you to. However, I do want the the freedom to write my drivers
according to my choice of method.
So the outstanding point of debate still remains: what is wrong with the
proposed API? The arguments I've heard so far are:
- It might be misused [true but not relevant].
- It would bloat the kernel [Actually, it would be implemented as #defines
like the standard dma APIs, so would only bloat the kernel for hardware that
has no window into the IOMMU]
- You can do the same thing differently [true, but you cannot do it as
efficiently on optimal dma hardware].
James Bottomley
On Sun, 7 Oct 2001, James Bottomley wrote:
> [email protected] said:
> > No problem. Here is the FULL simple code from SYM-2 driver that
> > perform the reverse mapping (it is mostly the same in sym53c8xx):
>
> Well, since this piece of code isn't in the current kernel,
It cames from sym53c8xx. Wheel didn't get reinvented here.
> it makes it more
> difficult, but it looks to me like there's an internal ccb structure in the
> driver that would contain pointers to the scsi command pointer, the dsa
> structure and various other things. The dsa structure is the chip's internal
> representation of an outstanding command. When a command is completed, you
> get a dma address of the dsa pointer back from the chip, so you do the
> sym_ccb_from_dsa() conversion to get the ccb and from that you get the virtual
Is ncr_ccb_from_dsa() in sym53c8xx.
> address of the dsa structure because you need to collect completion
> information about the command before sending it back into the mid-layer.
>
> The way you do this is to walk a list of ccb structures hashed on the dsa
> pointer for efficient reverse lookup until you find the ccb of the returning
> dsa.
Correct. :)
> It's certainly a valid thing to do, I've done it before myself. However, an
> equally valid way of processing a returning dsa is to embed a pointer to the
> ccb structure in the dsa structure. Then to get back to the ccb you simply
> dereference the dsa structure. The catch now is that I need the virtual
> address of the dsa pointer. If I had the API I could do this. How
> efficiently? well on the x86 it's a simple bit flip to get the virtual
> address and away I go. The cost is O(1).
Compared to other load involving an IO, the current driver hash load is
_negligible_. The DSA value is returned by the SCSI scripts that runs in
the chip hardware. Dereferencing this value looks brain-damage to me. I
_do_ want to _reliably_ catch situations where this value is just bogus,
whatever it is due to a hardware failure or a driver bug.
> On the most opaque DMA hardware, the
> mappings would have to be stored separately in a hash table. The cost of
> doing the lookup is O(number of pages registered for this device) which, since
> I would expect to pack a few dsa structures per page is less than O(total dsa
> structures).
In my taste and in the context of sym* drivers, you suggestion is a wrong
solution to a simple problem. OTOH, the optimization it provides is not
worthwhile here.
> Your method, providing only outstanding dsa structures are hashed, has an
> efficiency O(total outstanding commands).
All data structures (for simplicity) are hashed in 256 buckets.
The hash, despite it is simplistic, is tricky enough to give good results.
It is based on the fact that all driver internal data structures are
naturally aligned and the following simple alchemy from my own does the
trick:
#define CCB_HASH_CODE(dsa) \
(((dsa) >> (_LGRU16_(sizeof(struct sym_ccb)))) & CCB_HASH_MASK)
You are not required to understand how it works, but please donnot send
inaccurate statements to the list.
> So, for the worst case DMA hardware the two methods are very comparable (you
> can probably get each to beat the other by judicious tuning of the hash bucket
> size). For the best case DMA hardware, my lookup is O(1) which is hard for a
> hash lookup to beat.
It just gives a different service that the one I want for the driver and
does try to optimize code that does not need to be faster.
> So, in summary we have two methods, each of which could beat the other under
> optimal hardware conditions. Which should be used? well that's up to the
> choice of the device driver writer.
>
> My point here is that there isn't a choice any more because the API to do DMA
> to virtual address mappings is gone. What I've done is proposed a replacement
> API that will have no impact on a device driver writer who doesn't want to use
> it.
>
> The fact that you wouldn't use it is irrelevant to the argument, since I've no
> wish to force you to. However, I do want the the freedom to write my drivers
> according to my choice of method.
>
> So the outstanding point of debate still remains: what is wrong with the
> proposed API? The arguments I've heard so far are:
>
> - It might be misused [true but not relevant].
> - It would bloat the kernel [Actually, it would be implemented as #defines
> like the standard dma APIs, so would only bloat the kernel for hardware that
> has no window into the IOMMU]
> - You can do the same thing differently [true, but you cannot do it as
> efficiently on optimal dma hardware].
I donnot decide of anything regarding kernel APIs. If you think your
proposal of this new API is good then just send a patch to Linus and if it
gets accepted, then you win.
OTOH, I would not lose anything. :)
G?rard.