2006-05-01 18:50:39

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 5 of 13] ipath - use proper address translation routine

Bryan> Move away from an obsolete, unportable routine for
Bryan> translating physical addresses.

This change:

> - isge->vaddr = bus_to_virt(sge->addr);
> + isge->vaddr = phys_to_virt(sge->addr);

is really wrong. bus_to_virt() is really what you want, because in
this case the address is a bus address that came from dma_map_xxx().
You're still going to be hosed on systems with IOMMUs for example.

- R.


2006-05-01 18:54:18

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH 5 of 13] ipath - use proper address translation routine

On Mon, 2006-05-01 at 11:50 -0700, Roland Dreier wrote:
> Bryan> Move away from an obsolete, unportable routine for
> Bryan> translating physical addresses.
>
> This change:
>
> > - isge->vaddr = bus_to_virt(sge->addr);
> > + isge->vaddr = phys_to_virt(sge->addr);
>
> is really wrong. bus_to_virt() is really what you want, because in
> this case the address is a bus address that came from dma_map_xxx().
> You're still going to be hosed on systems with IOMMUs for example.

do you really NEED the vaddr?
(most of the time linux drivers don't need it, while other OSes do)
If you really need it you should grab it at dma_map time ...
(and realize that it's not kernel addressable per se ;)

2006-05-01 19:00:13

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 5 of 13] ipath - use proper address translation routine

Arjan> do you really NEED the vaddr? (most of the time linux
Arjan> drivers don't need it, while other OSes do) If you really
Arjan> need it you should grab it at dma_map time ... (and
Arjan> realize that it's not kernel addressable per se ;)

Yes, they need some kind of vaddr.

It's kind of a layering problem. The IB stack assumes that IB devices
have a DMA engine that deals with bus addresses. But the ipath driver
has to simulate this by using a memcpy on the CPU to move data to the
PCI device.

I really don't know what the right solution is. Maybe having some way
to override the dma mapping operations so that the ipath driver can
keep the info it needs?

- R.

2006-05-01 19:03:17

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 5 of 13] ipath - use proper address translation routine

On Mon, 2006-05-01 at 11:50 -0700, Roland Dreier wrote:
> Bryan> Move away from an obsolete, unportable routine for
> Bryan> translating physical addresses.
>
> This change:
>
> > - isge->vaddr = bus_to_virt(sge->addr);
> > + isge->vaddr = phys_to_virt(sge->addr);
>
> is really wrong. bus_to_virt() is really what you want, because in
> this case the address is a bus address that came from dma_map_xxx().

Well, bus_to_virt is not portable, so we definitely can't use it. I'll
have to do some thinking about this.

<b

2006-05-01 19:20:08

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH 5 of 13] ipath - use proper address translation routine

On Mon, 2006-05-01 at 12:00 -0700, Roland Dreier wrote:
> Arjan> do you really NEED the vaddr? (most of the time linux
> Arjan> drivers don't need it, while other OSes do) If you really
> Arjan> need it you should grab it at dma_map time ... (and
> Arjan> realize that it's not kernel addressable per se ;)
>
> Yes, they need some kind of vaddr.
>
> It's kind of a layering problem. The IB stack assumes that IB devices
> have a DMA engine that deals with bus addresses. But the ipath driver
> has to simulate this by using a memcpy on the CPU to move data to the
> PCI device.
>
> I really don't know what the right solution is. Maybe having some way
> to override the dma mapping operations so that the ipath driver can
> keep the info it needs?

sounds like you need to redesign your layering ;)
In linux it's common to have the lowest level driver do the mapping
(even when the mid layer will provide the most commonly used helper to
do it for the common case)...


2006-05-01 19:28:44

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 5 of 13] ipath - use proper address translation routine

Arjan> sounds like you need to redesign your layering ;) In linux
Arjan> it's common to have the lowest level driver do the mapping
Arjan> (even when the mid layer will provide the most commonly
Arjan> used helper to do it for the common case)...

It's not that simple of course...

InfiniBand allows RDMA -- _remote_ DMA. So that address might be
something that a protocol sent to the remote host and which is now
showing up for a DMA operation initiated by the remote side. And we
can't very well send a struct page * + offset to the remote side...

2006-05-02 13:35:14

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 5 of 13] ipath - use proper address translation routine

On Mon, May 01, 2006 at 12:00:00PM -0700, Roland Dreier wrote:
> Arjan> do you really NEED the vaddr? (most of the time linux
> Arjan> drivers don't need it, while other OSes do) If you really
> Arjan> need it you should grab it at dma_map time ... (and
> Arjan> realize that it's not kernel addressable per se ;)
>
> Yes, they need some kind of vaddr.
>
> It's kind of a layering problem. The IB stack assumes that IB devices
> have a DMA engine that deals with bus addresses. But the ipath driver
> has to simulate this by using a memcpy on the CPU to move data to the
> PCI device.
>
> I really don't know what the right solution is. Maybe having some way
> to override the dma mapping operations so that the ipath driver can
> keep the info it needs?

Or stop doing the dma mapping in the IB upper level drivers. I told you
that we'll get broken hardware that doesn't want dma mapping in the upper
level driver, and pathscale created exactly that :)

2006-05-02 14:24:22

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 5 of 13] ipath - use proper address translation routine

Christoph> Or stop doing the dma mapping in the IB upper level
Christoph> drivers. I told you that we'll get broken hardware
Christoph> that doesn't want dma mapping in the upper level
Christoph> driver, and pathscale created exactly that :)

But see my earlier mail to Arjan about RDMA -- what address can a
protocol (eg SRP initiator) put in a message that the other side will
use to initiate a remote DMA operation? It seems to me it has to be a
bus address, and that means that the protocol has to do the DMA mapping.

- R.

2006-05-02 14:27:36

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 5 of 13] ipath - use proper address translation routine

On Tue, May 02, 2006 at 07:24:18AM -0700, Roland Dreier wrote:
> Christoph> Or stop doing the dma mapping in the IB upper level
> Christoph> drivers. I told you that we'll get broken hardware
> Christoph> that doesn't want dma mapping in the upper level
> Christoph> driver, and pathscale created exactly that :)
>
> But see my earlier mail to Arjan about RDMA -- what address can a
> protocol (eg SRP initiator) put in a message that the other side will
> use to initiate a remote DMA operation? It seems to me it has to be a
> bus address, and that means that the protocol has to do the DMA mapping.

Then we're back to the discussion on why RDMA is a fundamentally flawed
approach, but we already knew that. The usual workaround is to only
allow RDMA operations to registered memory windows for which we can use
the normal dma operation. There's also the *dac* pci dma operations that
can avoid iommu overhead if you support 64bit addressing. But for all this
to work dma mapping fundamentally needs to be handled by the low level driver.

2006-05-02 14:44:33

by Alan

[permalink] [raw]
Subject: Re: [PATCH 5 of 13] ipath - use proper address translation routine

On Maw, 2006-05-02 at 07:24 -0700, Roland Dreier wrote:
> But see my earlier mail to Arjan about RDMA -- what address can a
> protocol (eg SRP initiator) put in a message that the other side will
> use to initiate a remote DMA operation? It seems to me it has to be a
> bus address, and that means that the protocol has to do the DMA mapping.

For most drivers properly, but you are making assumptions again. Why
can't a driver which is doing its own mapping not also do its own rdma
cookie handling ? You opt out of mapping being done for you, then you
get opted out of defaults for other stuff too.

Alan

2006-05-02 14:58:25

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 5 of 13] ipath - use proper address translation routine

Alan> For most drivers properly, but you are making assumptions
Alan> again. Why can't a driver which is doing its own mapping not
Alan> also do its own rdma cookie handling ? You opt out of
Alan> mapping being done for you, then you get opted out of
Alan> defaults for other stuff too.

You're right, and that was what I was driving at in my earlier message
when I talked about overriding the dma mapping operations for a
device. That would let ipath or whatever create its own RDMA cookies,
and keep track of the struct page or kernel virtual address of the
original memory, so it can do memcpy when needed.

I don't think the idea lets you push mapping down into the low-level
driver, though. Take the SRP initiator as a specific example. The
SCSI midlayer gives SRP a SCSI command to send. The SRP initiator
formats that into an SRP message, with a "memory descriptor" (address
and RDMA cookie) for the buffer associated with the SCSI command, and
tells the low-level driver to send that message to the target. The
target then performs RDMA into that buffer, sending back only the RDMA
cookie and address.

So unless you teach every low-level driver how to snoop inside SRP
messages (along with NFS/RDMA, iSER and all the other protocols), I
don't see where the low-level driver has a chance to do the mapping.

- R.