2008-03-23 21:27:13

by Roland Dreier

[permalink] [raw]
Subject: [PATCH for 2.6.25] SVCRDMA: Use only 1 RDMA read scatter entry for iWARP adapters

The iWARP protocol limits RDMA read requests to a single scatter
entry. NFS/RDMA has code in rdma_read_max_sge() that is supposed to
limit the sge_count for RDMA read requests to 1, but the code to do
that is inside an #ifdef RDMA_TRANSPORT_IWARP block. In the mainline
kernel at least, RDMA_TRANSPORT_IWARP is an enum and not a
preprocessor #define, so the #ifdef'ed code is never compiled.

In my test of a kernel build with -j8 on an NFS/RDMA mount, this
problem eventually leads to trouble starting with:

svcrdma: Error posting send = -22
svcrdma : RDMA_READ error = -22

and things go downhill from there.

The trivial fix is to delete the #ifdef guard. The check seems to be
a remnant of when the NFS/RDMA code was not merged and needed to
compile against multiple kernel versions, although I don't think it
ever worked as intended. In any case now that the code is upstream
there's no need to test whether the RDMA_TRANSPORT_IWARP constant is
defined or not.

Without this patch, my kernel build on an NFS/RDMA mount using NetEffect
adapters quickly and 100% reproducibly failed with an error like:

ld: final link failed: Software caused connection abort

With the patch applied I was able to complete a kernel build on the
same setup.

Signed-off-by: Roland Dreier <[email protected]>
---
I guess this should probably go into 2.6.25 if possible, since things
get seriously screwed up in my testing once this bug is hit.

Not sure why this doesn't trigger on Chelsio or Ammasso adapters (or
does it?), but it's easily reproducible here on Neteffect adapters
(and that driver is now upstream for 2.6.25).

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index ab54a73..9712716 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -237,14 +237,12 @@ static void rdma_set_ctxt_sge(struct svc_rdma_op_ctxt *ctxt,

static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
{
-#ifdef RDMA_TRANSPORT_IWARP
if ((RDMA_TRANSPORT_IWARP ==
rdma_node_get_transport(xprt->sc_cm_id->
device->node_type))
&& sge_count > 1)
return 1;
else
-#endif
return min_t(int, sge_count, xprt->sc_max_sge);
}



2008-03-23 23:52:51

by Tom Tucker

[permalink] [raw]
Subject: Re: [PATCH for 2.6.25] SVCRDMA: Use only 1 RDMA read scatter entry for iWARP adapters


Ack. It's actually an _ancient_ remnant when it had to compile against iWARP
vs. non-iWARP enabled OFA trees.


On 3/23/08 4:27 PM, "Roland Dreier" <[email protected]> wrote:

> The iWARP protocol limits RDMA read requests to a single scatter
> entry. NFS/RDMA has code in rdma_read_max_sge() that is supposed to
> limit the sge_count for RDMA read requests to 1, but the code to do
> that is inside an #ifdef RDMA_TRANSPORT_IWARP block. In the mainline
> kernel at least, RDMA_TRANSPORT_IWARP is an enum and not a
> preprocessor #define, so the #ifdef'ed code is never compiled.
>
> In my test of a kernel build with -j8 on an NFS/RDMA mount, this
> problem eventually leads to trouble starting with:
>
> svcrdma: Error posting send = -22
> svcrdma : RDMA_READ error = -22
>
> and things go downhill from there.
>
> The trivial fix is to delete the #ifdef guard. The check seems to be
> a remnant of when the NFS/RDMA code was not merged and needed to
> compile against multiple kernel versions, although I don't think it
> ever worked as intended. In any case now that the code is upstream
> there's no need to test whether the RDMA_TRANSPORT_IWARP constant is
> defined or not.
>
> Without this patch, my kernel build on an NFS/RDMA mount using NetEffect
> adapters quickly and 100% reproducibly failed with an error like:
>
> ld: final link failed: Software caused connection abort
>
> With the patch applied I was able to complete a kernel build on the
> same setup.
>
> Signed-off-by: Roland Dreier <[email protected]>
> ---
> I guess this should probably go into 2.6.25 if possible, since things
> get seriously screwed up in my testing once this bug is hit.
>
> Not sure why this doesn't trigger on Chelsio or Ammasso adapters (or
> does it?), but it's easily reproducible here on Neteffect adapters
> (and that driver is now upstream for 2.6.25).
>
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index ab54a73..9712716 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -237,14 +237,12 @@ static void rdma_set_ctxt_sge(struct svc_rdma_op_ctxt
> *ctxt,
>
> static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
> {
> -#ifdef RDMA_TRANSPORT_IWARP
> if ((RDMA_TRANSPORT_IWARP ==
> rdma_node_get_transport(xprt->sc_cm_id->
> device->node_type))
> && sge_count > 1)
> return 1;
> else
> -#endif
> return min_t(int, sge_count, xprt->sc_max_sge);
> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html



2008-03-24 03:06:21

by Tom Tucker

[permalink] [raw]
Subject: FW: [PATCH for 2.6.25] SVCRDMA: Use only 1 RDMA read scatter entry for iWARP adapters

Bruce:

If possible, this should go in 2.6.25. This was a merge error on my part.
Thanks to Roland for figuring this out...

Tom

------ Forwarded Message
From: Tom Tucker <[email protected]>
Date: Sun, 23 Mar 2008 18:52:49 -0500
To: Roland Dreier <[email protected]>, "J. Bruce Fields"
<[email protected]>
Cc: NeilBrown <[email protected]>, Trond Myklebust <[email protected]>,
<[email protected]>, <[email protected]>
Conversation: [PATCH for 2.6.25] SVCRDMA: Use only 1 RDMA read scatter entry
for iWARP adapters
Subject: Re: [PATCH for 2.6.25] SVCRDMA: Use only 1 RDMA read scatter entry
for iWARP adapters


Ack. It's actually an _ancient_ remnant when it had to compile against iWARP
vs. non-iWARP enabled OFA trees.


On 3/23/08 4:27 PM, "Roland Dreier" <[email protected]> wrote:

> The iWARP protocol limits RDMA read requests to a single scatter
> entry. NFS/RDMA has code in rdma_read_max_sge() that is supposed to
> limit the sge_count for RDMA read requests to 1, but the code to do
> that is inside an #ifdef RDMA_TRANSPORT_IWARP block. In the mainline
> kernel at least, RDMA_TRANSPORT_IWARP is an enum and not a
> preprocessor #define, so the #ifdef'ed code is never compiled.
>
> In my test of a kernel build with -j8 on an NFS/RDMA mount, this
> problem eventually leads to trouble starting with:
>
> svcrdma: Error posting send = -22
> svcrdma : RDMA_READ error = -22
>
> and things go downhill from there.
>
> The trivial fix is to delete the #ifdef guard. The check seems to be
> a remnant of when the NFS/RDMA code was not merged and needed to
> compile against multiple kernel versions, although I don't think it
> ever worked as intended. In any case now that the code is upstream
> there's no need to test whether the RDMA_TRANSPORT_IWARP constant is
> defined or not.
>
> Without this patch, my kernel build on an NFS/RDMA mount using NetEffect
> adapters quickly and 100% reproducibly failed with an error like:
>
> ld: final link failed: Software caused connection abort
>
> With the patch applied I was able to complete a kernel build on the
> same setup.
>
> Signed-off-by: Roland Dreier <[email protected]>
> ---
> I guess this should probably go into 2.6.25 if possible, since things
> get seriously screwed up in my testing once this bug is hit.
>
> Not sure why this doesn't trigger on Chelsio or Ammasso adapters (or
> does it?), but it's easily reproducible here on Neteffect adapters
> (and that driver is now upstream for 2.6.25).
>
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index ab54a73..9712716 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -237,14 +237,12 @@ static void rdma_set_ctxt_sge(struct svc_rdma_op_ctxt
> *ctxt,
>
> static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
> {
> -#ifdef RDMA_TRANSPORT_IWARP
> if ((RDMA_TRANSPORT_IWARP ==
> rdma_node_get_transport(xprt->sc_cm_id->
> device->node_type))
> && sge_count > 1)
> return 1;
> else
> -#endif
> return min_t(int, sge_count, xprt->sc_max_sge);
> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2008-03-24 15:44:56

by J. Bruce Fields

[permalink] [raw]
Subject: Re: FW: [PATCH for 2.6.25] SVCRDMA: Use only 1 RDMA read scatter entry for iWARP adapters

On Sun, Mar 23, 2008 at 10:06:08PM -0500, Tom Tucker wrote:
> Bruce:
>
> If possible, this should go in 2.6.25. This was a merge error on my part.
> Thanks to Roland for figuring this out...

Will do, thanks, but, as a subsequent patch, could we clean this up a
little? For example, assuming sge_count and sc_max_sge are always
positive, isn't the (sge_count > 1) check superfluous, given that the
following min_t's going to produce 1 in that case anyway?

Also, would it be possible just to ensure sc_max_sge is just set to 1
from the start in this case?

--b.

>
> Tom
>
> ------ Forwarded Message
> From: Tom Tucker <[email protected]>
> Date: Sun, 23 Mar 2008 18:52:49 -0500
> To: Roland Dreier <[email protected]>, "J. Bruce Fields"
> <[email protected]>
> Cc: NeilBrown <[email protected]>, Trond Myklebust <[email protected]>,
> <[email protected]>, <[email protected]>
> Conversation: [PATCH for 2.6.25] SVCRDMA: Use only 1 RDMA read scatter entry
> for iWARP adapters
> Subject: Re: [PATCH for 2.6.25] SVCRDMA: Use only 1 RDMA read scatter entry
> for iWARP adapters
>
>
> Ack. It's actually an _ancient_ remnant when it had to compile against iWARP
> vs. non-iWARP enabled OFA trees.
>
>
> On 3/23/08 4:27 PM, "Roland Dreier" <[email protected]> wrote:
>
> > The iWARP protocol limits RDMA read requests to a single scatter
> > entry. NFS/RDMA has code in rdma_read_max_sge() that is supposed to
> > limit the sge_count for RDMA read requests to 1, but the code to do
> > that is inside an #ifdef RDMA_TRANSPORT_IWARP block. In the mainline
> > kernel at least, RDMA_TRANSPORT_IWARP is an enum and not a
> > preprocessor #define, so the #ifdef'ed code is never compiled.
> >
> > In my test of a kernel build with -j8 on an NFS/RDMA mount, this
> > problem eventually leads to trouble starting with:
> >
> > svcrdma: Error posting send = -22
> > svcrdma : RDMA_READ error = -22
> >
> > and things go downhill from there.
> >
> > The trivial fix is to delete the #ifdef guard. The check seems to be
> > a remnant of when the NFS/RDMA code was not merged and needed to
> > compile against multiple kernel versions, although I don't think it
> > ever worked as intended. In any case now that the code is upstream
> > there's no need to test whether the RDMA_TRANSPORT_IWARP constant is
> > defined or not.
> >
> > Without this patch, my kernel build on an NFS/RDMA mount using NetEffect
> > adapters quickly and 100% reproducibly failed with an error like:
> >
> > ld: final link failed: Software caused connection abort
> >
> > With the patch applied I was able to complete a kernel build on the
> > same setup.
> >
> > Signed-off-by: Roland Dreier <[email protected]>
> > ---
> > I guess this should probably go into 2.6.25 if possible, since things
> > get seriously screwed up in my testing once this bug is hit.
> >
> > Not sure why this doesn't trigger on Chelsio or Ammasso adapters (or
> > does it?), but it's easily reproducible here on Neteffect adapters
> > (and that driver is now upstream for 2.6.25).
> >
> > diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > index ab54a73..9712716 100644
> > --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > @@ -237,14 +237,12 @@ static void rdma_set_ctxt_sge(struct svc_rdma_op_ctxt
> > *ctxt,
> >
> > static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
> > {
> > -#ifdef RDMA_TRANSPORT_IWARP
> > if ((RDMA_TRANSPORT_IWARP ==
> > rdma_node_get_transport(xprt->sc_cm_id->
> > device->node_type))
> > && sge_count > 1)
> > return 1;
> > else
> > -#endif
> > return min_t(int, sge_count, xprt->sc_max_sge);
> > }
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> ------ End of Forwarded Message
>
>

2008-03-24 16:57:08

by Roland Dreier

[permalink] [raw]
Subject: Re: FW: [PATCH for 2.6.25] SVCRDMA: Use only 1 RDMA read scatter entry for iWARP adapters

> Will do, thanks, but, as a subsequent patch, could we clean this up a
> little? For example, assuming sge_count and sc_max_sge are always
> positive, isn't the (sge_count > 1) check superfluous, given that the
> following min_t's going to produce 1 in that case anyway?

Yes, I had the same thought reading the code but I didn't want to try
and do too many things in one patch.

> Also, would it be possible just to ensure sc_max_sge is just set to 1
> from the start in this case?

I think the problem with this is that on InfiniBand, it is fine to
have multiple scatter entries for an RDMA read request, while on iWARP
only 1 scatter entry is possible for RDMA read. For both IB and
iWARP, you can eg have multiple gather entries for an RDMA write. So
sc_max_sge as it stands may want to be bigger than 1 even for iWARP devices.

One not-too-bad possibility would just be to have sc_max_sge and
sc_max_sge_rdma_read fields and initialize them properly based on IB
vs iWARP...

- R.

2008-03-24 17:05:31

by J. Bruce Fields

[permalink] [raw]
Subject: Re: FW: [PATCH for 2.6.25] SVCRDMA: Use only 1 RDMA read scatter entry for iWARP adapters

On Mon, Mar 24, 2008 at 09:57:04AM -0700, Roland Dreier wrote:
> > Will do, thanks, but, as a subsequent patch, could we clean this up a
> > little? For example, assuming sge_count and sc_max_sge are always
> > positive, isn't the (sge_count > 1) check superfluous, given that the
> > following min_t's going to produce 1 in that case anyway?
>
> Yes, I had the same thought reading the code but I didn't want to try
> and do too many things in one patch.

Sure.

>
> > Also, would it be possible just to ensure sc_max_sge is just set to 1
> > from the start in this case?
>
> I think the problem with this is that on InfiniBand, it is fine to
> have multiple scatter entries for an RDMA read request, while on iWARP
> only 1 scatter entry is possible for RDMA read. For both IB and
> iWARP, you can eg have multiple gather entries for an RDMA write. So
> sc_max_sge as it stands may want to be bigger than 1 even for iWARP devices.
>
> One not-too-bad possibility would just be to have sc_max_sge and
> sc_max_sge_rdma_read fields and initialize them properly based on IB
> vs iWARP...

I'll leave the decision to people who understand this code better than
me, but offhand that does sound like it'd be a little more
straightforward.

--b.