2006-12-13 07:48:12

by Andrew Morton

[permalink] [raw]
Subject: Re: IB: Add DMA mapping functions to allow device drivers to interpose

On Wed, 13 Dec 2006 03:59:45 GMT
Linux Kernel Mailing List <[email protected]> wrote:

> IB: Add DMA mapping functions to allow device drivers to interpose
>
> The QLogic InfiniPath HCAs use programmed I/O instead of HW DMA.
> This patch allows a verbs device driver to interpose on DMA mapping
> function calls in order to avoid relying on bus_to_virt() and
> phys_to_virt() to undo the mappings created by dma_map_single(),
> dma_map_sg(), etc.
>
> Signed-off-by: Ralph Campbell <[email protected]>
> Signed-off-by: Roland Dreier <[email protected]>

include/rdma/ib_verbs.h: In function 'ib_dma_alloc_coherent':
include/rdma/ib_verbs.h:1635: warning: passing argument 3 of 'dma_alloc_coherent' from incompatible pointer type
In file included from drivers/infiniband/hw/mthca/mthca_provider.h:41,
from drivers/infiniband/hw/mthca/mthca_dev.h:53,
from drivers/infiniband/hw/mthca/mthca_main.c:44:
include/rdma/ib_verbs.h: In function 'ib_dma_alloc_coherent':
include/rdma/ib_verbs.h:1635: warning: passing argument 3 of 'dma_alloc_coherent' from incompatible pointer type


That u64 needs to become a dma_addr_t. That means that
ib_dma_mapping_ops.alloc_coherent() and ib_dma_mapping_ops.free_coherent() are
wrong as well.

> +struct ib_dma_mapping_ops {
> ...
> + void *(*alloc_coherent)(struct ib_device *dev,
> + size_t size,
> + u64 *dma_handle,
> + gfp_t flag);
> + void (*free_coherent)(struct ib_device *dev,
> + size_t size, void *cpu_addr,
> + u64 dma_handle);
> +};

I'd have picked this up if it had been in git-infiniband for even a couple
of days. I'm assuming this all got slammed into mainline because of the
merge window thing.

I cannot find these patches on the kernel mailing list. I cannot find the
pull request anywhere.

> +static inline u64 ib_dma_map_single(struct ib_device *dev,
> + void *cpu_addr, size_t size,
> + enum dma_data_direction direction)

no, dma_map_single() returns a dma_addr_t.



2006-12-13 20:31:09

by Andrew Morton

[permalink] [raw]
Subject: Re: IB: Add DMA mapping functions to allow device drivers to interpose

On Wed, 13 Dec 2006 12:11:27 -0800
Ralph Campbell <[email protected]> wrote:

> > I'd have picked this up if it had been in git-infiniband for even a couple
> > of days. I'm assuming this all got slammed into mainline because of the
> > merge window thing.
> >
> > I cannot find these patches on the kernel mailing list. I cannot find the
> > pull request anywhere.
> >
> > > +static inline u64 ib_dma_map_single(struct ib_device *dev,
> > > + void *cpu_addr, size_t size,
> > > + enum dma_data_direction direction)
> >
> > no, dma_map_single() returns a dma_addr_t.
>
> ib_dma_map_single() allows the ib_ipath device driver to interpose
> on IOMMU allocations and not do them by returning the kernel
> virtual address as the "DMA address". I started with dma_addr_t
> but it was pointed out to me that sparc64 defines dma_addr_t
> as u32. This would cause addresses to be truncated.
> Also, I chose u64 because the return value from ib_dma_*() is
> stored in the ib_sge.addr field which is u64.
>
> My preference would be to change the offending uses of dma_addr_t
> to u64. Do you have a better solution?

We should be able to use dma_addr_t for this? Is it not the case that the
values we're dealing with here _are_ DMA addresses? I think a more complete
description of the problem we're trying to solve here would help.

I'm not sure what the problem is with sparc64 - perhaps its dma_addr_t
really is a "cookie" and isn't a physical bus address? But you want a value
which is really a physical bus address? Dunno.

Perhaps dma64_addr_t can be used here.

2006-12-13 20:44:54

by Ralph Campbell

[permalink] [raw]
Subject: Re: IB: Add DMA mapping functions to allow device drivers to interpose

On Tue, 2006-12-12 at 23:47 -0800, Andrew Morton wrote:
> On Wed, 13 Dec 2006 03:59:45 GMT
> Linux Kernel Mailing List <[email protected]> wrote:
>
> > IB: Add DMA mapping functions to allow device drivers to interpose
> >
> > The QLogic InfiniPath HCAs use programmed I/O instead of HW DMA.
> > This patch allows a verbs device driver to interpose on DMA mapping
> > function calls in order to avoid relying on bus_to_virt() and
> > phys_to_virt() to undo the mappings created by dma_map_single(),
> > dma_map_sg(), etc.
> >
> > Signed-off-by: Ralph Campbell <[email protected]>
> > Signed-off-by: Roland Dreier <[email protected]>
>
> include/rdma/ib_verbs.h: In function 'ib_dma_alloc_coherent':
> include/rdma/ib_verbs.h:1635: warning: passing argument 3 of 'dma_alloc_coherent' from incompatible pointer type
> In file included from drivers/infiniband/hw/mthca/mthca_provider.h:41,
> from drivers/infiniband/hw/mthca/mthca_dev.h:53,
> from drivers/infiniband/hw/mthca/mthca_main.c:44:
> include/rdma/ib_verbs.h: In function 'ib_dma_alloc_coherent':
> include/rdma/ib_verbs.h:1635: warning: passing argument 3 of 'dma_alloc_coherent' from incompatible pointer type
>
>
> That u64 needs to become a dma_addr_t. That means that
> ib_dma_mapping_ops.alloc_coherent() and ib_dma_mapping_ops.free_coherent() are
> wrong as well.
>
> > +struct ib_dma_mapping_ops {
> > ...
> > + void *(*alloc_coherent)(struct ib_device *dev,
> > + size_t size,
> > + u64 *dma_handle,
> > + gfp_t flag);
> > + void (*free_coherent)(struct ib_device *dev,
> > + size_t size, void *cpu_addr,
> > + u64 dma_handle);
> > +};
>
> I'd have picked this up if it had been in git-infiniband for even a couple
> of days. I'm assuming this all got slammed into mainline because of the
> merge window thing.
>
> I cannot find these patches on the kernel mailing list. I cannot find the
> pull request anywhere.
>
> > +static inline u64 ib_dma_map_single(struct ib_device *dev,
> > + void *cpu_addr, size_t size,
> > + enum dma_data_direction direction)
>
> no, dma_map_single() returns a dma_addr_t.

ib_dma_map_single() allows the ib_ipath device driver to interpose
on IOMMU allocations and not do them by returning the kernel
virtual address as the "DMA address". I started with dma_addr_t
but it was pointed out to me that sparc64 defines dma_addr_t
as u32. This would cause addresses to be truncated.
Also, I chose u64 because the return value from ib_dma_*() is
stored in the ib_sge.addr field which is u64.

My preference would be to change the offending uses of dma_addr_t
to u64. Do you have a better solution?

2006-12-13 21:22:42

by Ralph Campbell

[permalink] [raw]
Subject: Re: IB: Add DMA mapping functions to allow device drivers to interpose

On Wed, 2006-12-13 at 12:30 -0800, Andrew Morton wrote:
> On Wed, 13 Dec 2006 12:11:27 -0800
> Ralph Campbell <[email protected]> wrote:
snip.
> > My preference would be to change the offending uses of dma_addr_t
> > to u64. Do you have a better solution?
>
> We should be able to use dma_addr_t for this? Is it not the case that the
> values we're dealing with here _are_ DMA addresses? I think a more complete
> description of the problem we're trying to solve here would help.
>
> I'm not sure what the problem is with sparc64 - perhaps its dma_addr_t
> really is a "cookie" and isn't a physical bus address? But you want a value
> which is really a physical bus address? Dunno.
>
> Perhaps dma64_addr_t can be used here.

The fundamental problem is that ib_get_dma_mr() returns a
pointer representing a memory region for all of physical memory
and the IB interface consumer is expected to call dma_map_single()
to get a dma_addr_t to pass to the HCA driver with the memory
region pointer.

For an HCA like Mellanox, the dma_addr_t really is a DMA address
which the hardware uses to DMA data from the chip to memory.

The ib_ipath HCA driver does not generally use DMA addresses.
The hardware does DMA the receive packet to a ring of buffers
but the receive interrupt handler then has to copy the data
from the receive buffer to the address specified by the
ib_post_recv() function. This means the driver needs a
kernel virtual address instead of the memory region + DMA address
that is passed by ib_post_recv().

The ib_dma_*() functions were added to allow the ib_ipath
driver to interpose on the dma_*() functions so that a
kernel virtual address can be returned instead of allocating
an IOMMU DMA address. Without the interposing functions,
there is no way for the ib_ipath driver to convert the IOMMU
address back into a kernel virtual address since it never
sees the original inputs used to allocate the IOMMU address.

I don't want to make ib_dma_map_single() return a dma_addr_t
cookie and use it to lookup the kernel virtual address since
this is a performance critical code path in the receive
interrupt handler and the "cookie" needs to be a byte address
which can be offset within a page.

The secondary problem is that on sparc64, dma_addr_t is a
u32 so if ib_dma_map_single() returns a void * cast to dma_addr_t,
it is truncated. We might be able to convince the sparc64
maintainers to make dma_addr_t u64 but dma64_addr_t won't
work because that is specific to sparc64. The type has
to be architecture neutral.