2009-04-27 19:32:04

by Steve Wise

Subject: Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.

Trond Myklebust wrote:
> On Mon, 2009-04-27 at 14:05 -0400, Trond Myklebust wrote:
>
>> It looks as though the bug is really that the IB code is using a
>> u64 to store dma handles. As an external user of the IB api, we really
>> shouldn't have to perform this sort of transformation. If it is
>> absolutely necessary, then it should be done by means of specialised
>> accessor functions to initialise/read iova_start value when given a
>> dma_addr_t.
>>
>> I'd therefore prefer the no-cast version (with eventual compiler
>> warnings), in the hope that eventually the IB folks will fix their
>> interface.
>>
>
> Translation: It looks to me as if the interface that we're using is a
> bit too corrupted with IB low level implementation grime. In the future,
> I'd like to see someone come up with a more high level interface for use
> by external code such as the sunrpc module.
>
>

Clarification: The iova_start isn't used to store dma handles. The
iova_start is the "address" base value that is advertised to a peer to
describe the base address of a memory region. The contents of that can
be more than just a dma handle...it's up to the application. For
instance, you could advertise an iova_start of zero or a kernel VA as the
rdma server does. Also, the type is u64 because that is the size used
on the wire as part of the rdma (IB and iWARP) protocols.


Steve.


2009-04-27 19:43:02

by Tom Talpey

Subject: Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.

At 03:32 PM 4/27/2009, Steve Wise wrote:
>Trond Myklebust wrote:
>> On Mon, 2009-04-27 at 14:05 -0400, Trond Myklebust wrote:
>>
>>> It looks as though the bug is really that the IB code is using a
>>> u64 to store dma handles. As an external user of the IB api, we really
>>> shouldn't have to perform this sort of transformation. If it is
>>> absolutely necessary, then it should be done by means of specialised
>>> accessor functions to initialise/read iova_start value when given a
>>> dma_addr_t.
>>>
>>> I'd therefore prefer the no-cast version (with eventual compiler
>>> warnings), in the hope that eventually the IB folks will fix their
>>> interface.
>>>
>>
>> Translation: It looks to me as if the interface that we're using is a
>> bit too corrupted with IB low level implementation grime. In the future,
>> I'd like to see someone come up with a more high level interface for use
>> by external code such as the sunrpc module.
>>
>>
>
>Clarification: The iova_start isn't used to store dma handles. The

Agreed, it's more of a hardware register that ends up on the wire as well.

I think the net of this is that the mr_dma should have a more sensible
up-cast that yields the right bits in the iova_start. Maybe a nice
machine-dependent macro, defined in the RDMA layer, would be a good
approach. Surely the other upper layers need it too.

While I have the floor, why doesn't the server have this issue? Looking
at the code, it has the same (unsigned long) cast as the client when
initializing its iova_start.
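
For illustration, here is a minimal user-space sketch of what an
(unsigned long) cast does to a wide dma address before it lands in a
u64 iova_start. It assumes a platform where unsigned long is 32 bits
and dma_addr_t is 64 bits. The sim_* types and the function names are
hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-in for a 64-bit dma_addr_t. */
typedef uint64_t sim_dma_addr_t;
/* Assumption: stand-in for unsigned long on a 32-bit platform. */
typedef uint32_t sim_ulong32_t;

/* Narrow through the 32-bit type first, then widen to the u64 field. */
static inline uint64_t iova_via_ulong_cast(sim_dma_addr_t dma)
{
	return (uint64_t)(sim_ulong32_t)dma;	/* upper 32 bits are lost */
}

/* Widen directly, keeping every bit of the dma address. */
static inline uint64_t iova_direct(sim_dma_addr_t dma)
{
	return (uint64_t)dma;
}

/* Compare both conversions for an address above 4GB. */
static inline void iova_cast_demo(void)
{
	sim_dma_addr_t dma = 0x123456789ULL;

	assert(iova_via_ulong_cast(dma) == 0x23456789ULL);
	assert(iova_direct(dma) == 0x123456789ULL);
}
```

A value that already fits in 32 bits passes through both conversions
unchanged, so the truncation only shows up for addresses above 4GB.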

Tom.

>iova_start is the "address" base value that is advertised to a peer to
>describe the base address of a memory region. The contents of that can
>be more than just a dma handle...it's up to the application. For
>instance, you could advertise an iova_start of zero or a kernel VA as the
>rdma server does. Also, the type is u64 because that is the size used
>on the wire as part of the rdma (IB and iWARP) protocols.
>
>
>Steve.
>


2009-04-27 20:47:01

by Trond Myklebust

Subject: Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.

On Mon, 2009-04-27 at 14:32 -0500, Steve Wise wrote:
> Clarification: The iova_start isn't used to store dma handles. The
> iova_start is the "address" base value that is advertised to a peer to
> describe the base address of a memory region. The contents of that can
> be more than just a dma handle...it's up to the application. For
> instance, you could advertise an iova_start of zero or a kernel VA as the
> rdma server does. Also, the type is u64 because that is the size used
> on the wire as part of the rdma (IB and iWARP) protocols.

OK, but my point is we shouldn't be having this discussion at all. I
shouldn't be required to know that the wire protocol uses a 64-bit
unsigned little-endian/big-endian integer in order to use the rdma api.

All I should need to know is that I can advertise either dma handles or
kernel VAs, and know that I can choose between two functions, say,
ib_send_wr_fastreg_dma_init() and ib_send_wr_fastreg_kva_init() to
initialise the ib_send_wr structure correctly.
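
A rough user-space sketch of what such a pair of initialisers could
look like follows. Nothing here is an existing IB interface: the
sketch_* struct, types and function names are hypothetical, and the
real ib_send_wr layout is not reproduced. The only point is that each
helper owns the conversion to the 64-bit iova_start, so the caller
never writes a cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for a 64-bit dma_addr_t. */
typedef uint64_t sketch_dma_addr_t;

/* Hypothetical, reduced stand-in for the fast-register work request. */
struct sketch_fastreg_wr {
	uint64_t iova_start;	/* base address advertised to the peer */
};

/* Initialise iova_start from a dma handle without narrowing it. */
static inline void sketch_fastreg_dma_init(struct sketch_fastreg_wr *wr,
					   sketch_dma_addr_t dma)
{
	assert(wr != NULL);
	wr->iova_start = (uint64_t)dma;
}

/* Initialise iova_start from a kernel virtual address. */
static inline void sketch_fastreg_kva_init(struct sketch_fastreg_wr *wr,
					   const void *kva)
{
	assert(wr != NULL);
	/* uintptr_t is pointer-sized, so widening to 64 bits loses nothing. */
	wr->iova_start = (uint64_t)(uintptr_t)kva;
}
```

With helpers of this shape, a caller holding a dma handle would call
sketch_fastreg_dma_init(&wr, dma) and a caller advertising a kernel VA
would call sketch_fastreg_kva_init(&wr, ptr).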

Trond