2015-06-03 18:40:16

by Shirley Ma

Subject: [PATCH RFC] NFS/RDMA Release resources in svcrdma when device is removed

When the underlying RDMA device is removed, rmmod hangs forever if there
are any outstanding NFS/RDMA client mounts. The outstanding NFS/RDMA counts
can also prevent the server from shutting down. Further debugging shows
that the existing connections are not torn down and their resources are not
released when the RDMA_CM_EVENT_DEVICE_REMOVAL event is received. It seems
the original code is missing a svc_xprt_put() call in the
RDMA_CM_EVENT_DEVICE_REMOVAL event handler, so svc_xprt_free() is never
invoked to release the existing connection resources.

The patch has been tested by repeatedly removing and re-adding the device
without stopping the NFS/RDMA service. It also allows a device to be
unplugged and swapped out without shutting down the NFS service.

Signed-off-by: Shirley Ma <[email protected]>
---
net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index f609c1c..2b82569 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -673,6 +673,7 @@ static int rdma_cma_handler(struct rdma_cm_id *cma_id,
 		if (xprt) {
 			set_bit(XPT_CLOSE, &xprt->xpt_flags);
 			svc_xprt_enqueue(xprt);
+			svc_xprt_put(xprt);
 		}
 		break;
 	default:

Shirley
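
The reference-count reasoning in the description above can be illustrated
with a small user-space model. This is only a sketch, not the svcrdma code:
every toy_* name is hypothetical, and it assumes the transport holds one
reference for the service layer and one on behalf of the connection, so the
release callback runs only after both have been dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a reference-counted transport.
 * None of the toy_* names are kernel APIs. */
struct toy_xprt {
	int refcount;	/* number of outstanding references */
	bool closed;	/* models the XPT_CLOSE flag */
	bool freed;	/* set once the release callback has run */
};

/* Release callback: plays the role that svc_xprt_free has in the thread. */
static void toy_xprt_free(struct toy_xprt *x)
{
	x->freed = true;
}

static void toy_xprt_get(struct toy_xprt *x)
{
	x->refcount++;
}

/* Drop one reference; the last put triggers the release callback. */
static void toy_xprt_put(struct toy_xprt *x)
{
	assert(x->refcount > 0);
	if (--x->refcount == 0)
		toy_xprt_free(x);
}

/* Models the device-removal branch of the event handler. With
 * drop_ref == false it behaves like the unpatched code: the transport is
 * marked for close but the connection's reference is kept. */
static void toy_handle_device_removal(struct toy_xprt *x, bool drop_ref)
{
	x->closed = true;
	if (drop_ref)
		toy_xprt_put(x);
}

/* Runs one removal scenario and reports whether the transport was freed. */
static bool toy_simulate(bool drop_ref)
{
	struct toy_xprt x = { .refcount = 1, .closed = false, .freed = false };

	toy_xprt_get(&x);	/* assumed reference held for the connection */
	toy_handle_device_removal(&x, drop_ref);
	if (x.closed)
		toy_xprt_put(&x);	/* service layer drops its own reference */
	return x.freed;
}
```

In this model toy_simulate(false) leaves one reference outstanding, so the
release callback never runs, which corresponds to the hang described above.
toy_simulate(true) drops the last reference and the transport is freed.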


2015-06-03 18:42:17

by Chuck Lever III

Subject: Re: [PATCH RFC] NFS/RDMA Release resources in svcrdma when device is removed


On Jun 3, 2015, at 2:40 PM, Shirley Ma <[email protected]> wrote:

> When the underlying RDMA device is removed, rmmod hangs forever if there
> are any outstanding NFS/RDMA client mounts. The outstanding NFS/RDMA counts
> can also prevent the server from shutting down. Further debugging shows
> that the existing connections are not torn down and their resources are not
> released when the RDMA_CM_EVENT_DEVICE_REMOVAL event is received. It seems
> the original code is missing a svc_xprt_put() call in the
> RDMA_CM_EVENT_DEVICE_REMOVAL event handler, so svc_xprt_free() is never
> invoked to release the existing connection resources.
>
> The patch has been tested by repeatedly removing and re-adding the device
> without stopping the NFS/RDMA service. It also allows a device to be
> unplugged and swapped out without shutting down the NFS service.
>
> Signed-off-by: Shirley Ma <[email protected]>

Reviewed-by: Chuck Lever <[email protected]>

> ---
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index f609c1c..2b82569 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -673,6 +673,7 @@ static int rdma_cma_handler(struct rdma_cm_id *cma_id,
> if (xprt) {
> set_bit(XPT_CLOSE, &xprt->xpt_flags);
> svc_xprt_enqueue(xprt);
> + svc_xprt_put(xprt);
> }
> break;
> default:
>
> Shirley

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2015-06-03 18:47:16

by Chuck Lever III

Subject: Re: [PATCH RFC] NFS/RDMA Release resources in svcrdma when device is removed


On Jun 3, 2015, at 2:44 PM, Chuck Lever <[email protected]> wrote:

>
> On Jun 3, 2015, at 2:40 PM, Shirley Ma <[email protected]> wrote:
>
>> When the underlying RDMA device is removed, rmmod hangs forever if there
>> are any outstanding NFS/RDMA client mounts. The outstanding NFS/RDMA counts
>> can also prevent the server from shutting down. Further debugging shows
>> that the existing connections are not torn down and their resources are not
>> released when the RDMA_CM_EVENT_DEVICE_REMOVAL event is received. It seems
>> the original code is missing a svc_xprt_put() call in the
>> RDMA_CM_EVENT_DEVICE_REMOVAL event handler, so svc_xprt_free() is never
>> invoked to release the existing connection resources.
>>
>> The patch has been tested by repeatedly removing and re-adding the device
>> without stopping the NFS/RDMA service. It also allows a device to be
>> unplugged and swapped out without shutting down the NFS service.

And maybe also add:

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=252

here.

>> Signed-off-by: Shirley Ma <[email protected]>
>
> Reviewed-by: Chuck Lever <[email protected]>
>
>> ---
>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> index f609c1c..2b82569 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> @@ -673,6 +673,7 @@ static int rdma_cma_handler(struct rdma_cm_id *cma_id,
>> if (xprt) {
>> set_bit(XPT_CLOSE, &xprt->xpt_flags);
>> svc_xprt_enqueue(xprt);
>> + svc_xprt_put(xprt);
>> }
>> break;
>> default:
>>
>> Shirley
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2015-06-03 21:56:57

by Shirley Ma

Subject: Re: [PATCH RFC] NFS/RDMA Release resources in svcrdma when device is removed

On 06/03/2015 11:49 AM, Chuck Lever wrote:
>
> On Jun 3, 2015, at 2:44 PM, Chuck Lever <[email protected]> wrote:
>
>>
>> On Jun 3, 2015, at 2:40 PM, Shirley Ma <[email protected]> wrote:
>>
>>> When the underlying RDMA device is removed, rmmod hangs forever if there
>>> are any outstanding NFS/RDMA client mounts. The outstanding NFS/RDMA counts
>>> can also prevent the server from shutting down. Further debugging shows
>>> that the existing connections are not torn down and their resources are not
>>> released when the RDMA_CM_EVENT_DEVICE_REMOVAL event is received. It seems
>>> the original code is missing a svc_xprt_put() call in the
>>> RDMA_CM_EVENT_DEVICE_REMOVAL event handler, so svc_xprt_free() is never
>>> invoked to release the existing connection resources.
>>>
>>> The patch has been tested by repeatedly removing and re-adding the device
>>> without stopping the NFS/RDMA service. It also allows a device to be
>>> unplugged and swapped out without shutting down the NFS service.
>
> And maybe also add:
>
> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=252

Yes, this patch addresses that bug as well. I had forgotten about it.

> here.
>
>>> Signed-off-by: Shirley Ma <[email protected]>
>>
>> Reviewed-by: Chuck Lever <[email protected]>
>>
>>> ---
>>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>> index f609c1c..2b82569 100644
>>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>> @@ -673,6 +673,7 @@ static int rdma_cma_handler(struct rdma_cm_id *cma_id,
>>> if (xprt) {
>>> set_bit(XPT_CLOSE, &xprt->xpt_flags);
>>> svc_xprt_enqueue(xprt);
>>> + svc_xprt_put(xprt);
>>> }
>>> break;
>>> default:
>>>
>>> Shirley
>>
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com
>>
>>
>>
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>