Hi,
I am writing a driver, and I am facing the following problem:
I receive a kernel pointer, and I want the device to DMA into that memory.
However, the device is only capable of 32-bit DMA.
I can create a 'consistent' mapping and memcpy from/to it, but that
feels like a waste of performance.
According to Documentation/DMA-mapping.txt, I can tell the kernel that the
hardware supports only 32-bit DMA using pci_set_dma_mask. However, what will
happen if I pass an arbitrary kernel address into pci_map_single?
What will happen if the address is above 32 bits?
I tried to follow the source of pci_map_single, but it is buried quite
deeply.
Also note that I don't need any scatter-gather lists, because the buffer
will always be 512 bytes long.
Note that I am not writing the block driver itself, but a small driver that
plugs into that driver.
I am writing a driver for the mtd subsystem, more precisely a NAND
driver.
Unfortunately the mtd subsystem can call the driver with an arbitrary kernel
pointer, although I am sure it's not vmalloc'ed.
Usually it passes the pointer that it received from the block subsystem.
Thanks in advance,
Maxim Levitsky
On 12/11/2009 03:39 PM, Maxim Levitsky wrote:
> According to Documentation/DMA-mapping.txt, I can tell that hw supports
> 32 bit dma using pci_set_dma_mask, however, what will happen if I pass
> arbitrary kernel address into pci_map_single.
> What will happen if the address is above 32 bit?
The kernel should set up an IOMMU (either hardware or software) mapping
for that memory so that the device can access it through an address
below 4GB. This is assuming it's a 64-bit kernel (on 32-bit, a kernel
memory address will always be below 4GB).
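To make the "below 4GB" condition concrete, here is a minimal user-space sketch of the check a DMA layer has to make before handing an address to a 32-bit-only device. It is a model, not kernel code: model_dma_bit_mask() and model_fits_dma_mask() are hypothetical names invented for this illustration, and the physical addresses are plain integers supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low 'bits' bits set (1..64). */
uint64_t model_dma_bit_mask(unsigned int bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/*
 * Hypothetical helper: true if every byte of [phys, phys + len) can be
 * addressed by a device whose DMA mask is 'mask'.
 */
bool model_fits_dma_mask(uint64_t phys, size_t len, uint64_t mask)
{
    if (len == 0)
        return phys <= mask;
    /* Reject ranges that would wrap around the 64-bit address space. */
    if (phys > UINT64_MAX - (uint64_t)(len - 1))
        return false;
    /* The last byte of the buffer is the one that must fit. */
    return phys + (uint64_t)(len - 1) <= mask;
}
```

With a 32-bit mask, a 512-byte buffer at 0x20000000 fits, while the same buffer at 0x100000000 (or one that straddles the 4GB boundary) does not, and that is the case where a remapping or a bounce copy becomes necessary.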
On Fri, 2009-12-11 at 18:07 -0600, Robert Hancock wrote:
> The kernel should set up an IOMMU (either hardware or software) mapping
> for that memory so that the device can access it through an address
> below 4GB. This is assuming it's a 64-bit kernel (on 32-bit, a kernel
> memory address will always be below 4GB).
What do you mean by a software IOMMU?
On my system there is no IOMMU present, so the only way to ensure a 32-bit
address is to copy pages.
pci_map_single could copy the data in the write case, and pci_unmap_single
in the read case, but I now strongly doubt they do.
I am not at all sure that these functions will fail if too high an address
is specified...
Also, very recently, I found that in
Documentation/DMA-API.txt, there is a statement that says that
pci_map_single fails for a >32-bit memory address.
I guess I will just do a memcpy...
Best regards,
Maxim Levitsky
On Fri, Dec 11, 2009 at 6:18 PM, Maxim Levitsky wrote:
> What do you mean by software IOMMU?
>
> On my system there is no IOMMU present, so only way to ensure 32 bit
> address is to copy pages.
> pci_map_single could copy the data for write case, and pci_unmap_single
> for read case, but I now strongly doubt they do.
It does. See the swiotlb code. You should see some message on bootup like:
PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
Placing 64MB software IO TLB between ffff880020000000 - ffff880024000000
software IO TLB at phys 0x20000000 - 0x24000000
> I am not sure at all that these functions will fail if too high address
> is specified....
>
> Also, very recently, I found that in
> Documentation/DMA-API.txt, there is a statement that says that
> pci_map_single fails for >32 bit memory address.
Where do you see this?
>
> I guess I just do a memcpy...
This isn't optimal because if there is a hardware IOMMU, the copy can
be avoided.
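The bounce-buffer behaviour discussed above (copy on map for writes to the device, copy on unmap for reads from it) can be modelled in user space as follows. This is only a sketch under stated assumptions: every model_* name is hypothetical, the "reachable" flag stands in for the DMA-mask check that the kernel performs on the physical address, and a fake device is simulated with memset. It is not the swiotlb implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BUF_LEN 512

enum model_dir { MODEL_TO_DEVICE, MODEL_FROM_DEVICE };

struct model_mapping {
    unsigned char bounce[MODEL_BUF_LEN]; /* stands in for low memory */
    void *orig;                          /* caller's buffer */
    size_t len;
    enum model_dir dir;
    bool bounced;
};

/* Returns the address the device should use for this transfer. */
void *model_map_single(struct model_mapping *m, void *buf, size_t len,
                       enum model_dir dir, bool reachable)
{
    assert(len <= MODEL_BUF_LEN);
    m->orig = buf;
    m->len = len;
    m->dir = dir;
    m->bounced = !reachable;
    if (reachable)
        return buf;                     /* no copy needed */
    if (dir == MODEL_TO_DEVICE)
        memcpy(m->bounce, buf, len);    /* device will read the copy */
    return m->bounce;
}

/* Copies device-written data back to the caller's buffer if it was bounced. */
void model_unmap_single(struct model_mapping *m)
{
    if (m->bounced && m->dir == MODEL_FROM_DEVICE)
        memcpy(m->orig, m->bounce, m->len);
}

/* Usage sketch: a fake device fills whatever address it was handed. */
bool model_demo_read(bool reachable)
{
    unsigned char buf[MODEL_BUF_LEN] = {0};
    struct model_mapping m;
    void *dev_addr = model_map_single(&m, buf, sizeof(buf),
                                      MODEL_FROM_DEVICE, reachable);

    memset(dev_addr, 0xAB, sizeof(buf));    /* the "DMA" */
    model_unmap_single(&m);
    return buf[0] == 0xAB && buf[MODEL_BUF_LEN - 1] == 0xAB;
}

/* Usage sketch: the fake device must see the caller's data at its address. */
bool model_demo_write(bool reachable)
{
    unsigned char buf[MODEL_BUF_LEN];
    struct model_mapping m;
    const unsigned char *dev_addr;
    bool ok;

    memset(buf, 0x5A, sizeof(buf));
    dev_addr = model_map_single(&m, buf, sizeof(buf),
                                MODEL_TO_DEVICE, reachable);
    ok = dev_addr[0] == 0x5A && dev_addr[MODEL_BUF_LEN - 1] == 0x5A;
    model_unmap_single(&m);
    return ok;
}
```

In both directions the caller sees the same result whether or not a bounce happened; the only difference is the extra memcpy, which is the cost a hardware IOMMU would avoid.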
On Fri, 2009-12-11 at 18:38 -0600, Robert Hancock wrote:
> On Fri, Dec 11, 2009 at 6:18 PM, Maxim Levitsky wrote:
> > What do you mean by software IOMMU?
> >
> > On my system there is no IOMMU present, so only way to ensure 32 bit
> > address is to copy pages.
> > pci_map_single could copy the data for write case, and pci_unmap_single
> > for read case, but I now strongly doubt they do.
>
> It does. See the swiotlb code. You should see some message on bootup like:
>
> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> Placing 64MB software IO TLB between ffff880020000000 - ffff880024000000
> software IO TLB at phys 0x20000000 - 0x24000000
Finally got my hands on this.
I thought that swiotlb was a hardware feature, supported only on
AMD CPUs.
I think I have seen that in Kconfig.
swiotlb is not used on my system, but that is just because my
system has a 64-bit kernel and 2 GB of memory.
Since swiotlb is generic, I guess this means that any physical address
and alignment can be thrown at the DMA API, and it will still work.
Very nice.
However, the fact that the swiotlb memory pool has a fixed size is a bit of a
problem, because it introduces a point of failure.
For me it doesn't matter at all, because all I need is 512 bytes.
Best regards,
Maxim Levitsky
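The "point of failure" mentioned above comes from the bounce pool having a fixed size: once every slot is in use, a further mapping request cannot be satisfied and the caller has to handle the error. In the kernel, a driver would normally check the returned handle with dma_mapping_error() (or its pci_ wrapper). The sketch below models only the exhaustion behaviour in user space; the model_* names and the tiny slot count are assumptions made for illustration and do not reflect how swiotlb manages its memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_SLOT_SIZE 512
#define MODEL_NSLOTS    4   /* deliberately tiny for the demonstration */

struct model_pool {
    unsigned char mem[MODEL_NSLOTS][MODEL_SLOT_SIZE];
    bool used[MODEL_NSLOTS];
};

/* Returns a free slot, or NULL when the fixed-size pool is exhausted. */
void *model_pool_get(struct model_pool *p)
{
    for (size_t i = 0; i < MODEL_NSLOTS; i++) {
        if (!p->used[i]) {
            p->used[i] = true;
            return p->mem[i];
        }
    }
    return NULL;
}

/* Releases a slot previously returned by model_pool_get(). */
void model_pool_put(struct model_pool *p, void *slot)
{
    for (size_t i = 0; i < MODEL_NSLOTS; i++) {
        if (slot == (void *)p->mem[i]) {
            assert(p->used[i]);
            p->used[i] = false;
            return;
        }
    }
    assert(!"slot does not belong to this pool");
}

/* Usage sketch: count how many mappings succeed before the pool runs dry. */
size_t model_demo_exhaust(void)
{
    struct model_pool pool;
    size_t granted = 0;

    memset(&pool, 0, sizeof(pool));
    while (model_pool_get(&pool) != NULL)
        granted++;
    return granted;
}

/* Usage sketch: releasing one slot makes exactly one new mapping possible. */
bool model_demo_reuse(void)
{
    struct model_pool pool;
    void *first;

    memset(&pool, 0, sizeof(pool));
    first = model_pool_get(&pool);
    while (model_pool_get(&pool) != NULL)
        ;                               /* drain the rest of the pool */
    model_pool_put(&pool, first);
    return model_pool_get(&pool) == first && model_pool_get(&pool) == NULL;
}
```

A driver that maps a single 512-byte buffer at a time and unmaps it promptly, as described in this thread, holds at most one slot, so exhaustion would have to be caused by other users of the pool.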