2009-01-05 03:33:55

by Tom Tucker

Subject: Re: [PATCH 3/3] SUNRPC: svc_xprt_enqueue should not refuse to enqueue 'XPT_DEAD' transports

Trond Myklebust wrote:
> On Fri, 2009-01-02 at 15:44 -0600, Tom Tucker wrote:
>> Bruce/Trond:
>>
>> This is an alternative to patches 2 and 3 from Trond's fix. I think
>> Trond's fix is correct, but I believe this approach to be simpler.
>>
>> From: Tom Tucker <[email protected]>
>> Date: Wed, 31 Dec 2008 17:18:33 -0600
>> Subject: [PATCH] svc: Clean up deferred requests on transport destruction
>>
>> A race between svc_revisit and svc_delete_xprt can result in
>> deferred requests holding references on a transport that can never be
>> recovered because dead transports are not enqueued for subsequent
>> processing.
>>
>> Check for XPT_DEAD in revisit to clean up completing deferrals on a dead
>> transport and sweep a transport's deferred queue to do the same for queued
>> but unprocessed deferrals.
>>
>> Signed-off-by: Tom Tucker <[email protected]>
>> ---
>> net/sunrpc/svc_xprt.c | 20 +++++++++++++++-----
>> 1 files changed, 15 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>> index bf5b5cd..92ca5c6 100644
>> --- a/net/sunrpc/svc_xprt.c
>> +++ b/net/sunrpc/svc_xprt.c
>> @@ -837,6 +837,11 @@ static void svc_age_temp_xprts(unsigned long closure)
>>  void svc_delete_xprt(struct svc_xprt *xprt)
>>  {
>>  	struct svc_serv *serv = xprt->xpt_server;
>> +	struct svc_deferred_req *dr;
>> +
>> +	/* Only do this once */
>> +	if (test_and_set_bit(XPT_DEAD, &xprt->xpt_flags))
>> +		return;
>>
>>  	dprintk("svc: svc_delete_xprt(%p)\n", xprt);
>>  	xprt->xpt_ops->xpo_detach(xprt);
>> @@ -851,12 +856,16 @@ void svc_delete_xprt(struct svc_xprt *xprt)
>>  	 * while still attached to a queue, the queue itself
>>  	 * is about to be destroyed (in svc_destroy).
>>  	 */
>> -	if (!test_and_set_bit(XPT_DEAD, &xprt->xpt_flags)) {
>> -		BUG_ON(atomic_read(&xprt->xpt_ref.refcount) < 2);
>> -		if (test_bit(XPT_TEMP, &xprt->xpt_flags))
>> -			serv->sv_tmpcnt--;
>> +	if (test_bit(XPT_TEMP, &xprt->xpt_flags))
>> +		serv->sv_tmpcnt--;
>> +
>> +	for (dr = svc_deferred_dequeue(xprt); dr;
>> +	     dr = svc_deferred_dequeue(xprt)) {
>>  		svc_xprt_put(xprt);
>> +		kfree(dr);
>>  	}
>> +

If there are queued deferrals that are not processed, they will be
cleaned up here and their references dropped.

>> +	svc_xprt_put(xprt);
>>  	spin_unlock_bh(&serv->sv_lock);
>>  }
>>
>> @@ -902,7 +911,8 @@ static void svc_revisit(struct cache_deferred_req *dreq, int too_many)
>>  		container_of(dreq, struct svc_deferred_req, handle);
>>  	struct svc_xprt *xprt = dr->xprt;
>>
>> -	if (too_many) {
>> +	if (too_many || test_bit(XPT_DEAD, &xprt->xpt_flags)) {

If there were references held by the cache, they will get cleaned up here.

>> +		dprintk("revisit cancelled\n");
>>  		svc_xprt_put(xprt);
>>  		kfree(dr);
>>  		return;
>>
>
> I see nothing that stops svc_delete_xprt() from setting XPT_DEAD after
> the above test in svc_revisit(), and before the test inside
> svc_xprt_enqueue(). What's preventing a race there?

Yep. I originally had a lock around the check for XPT_DEAD and the
deferred enqueue, but convinced myself (incorrectly) it wasn't
necessary. Erf.
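
For illustration only, here is a rough user-space sketch of what holding one
lock across both the XPT_DEAD check and the deferred enqueue could look like.
It is not the kernel code and not the fix to be reposted: struct xprt,
xprt_defer(), xprt_revisit(), xprt_delete() and the per-transport pthread
mutex are all hypothetical stand-ins, and the reference count is a plain
integer rather than a kref. The point is only that the delete side cannot set
the dead flag between the revisit side's test and its enqueue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are the kernel's types. */
struct deferred {
	struct deferred *next;
};

struct xprt {
	pthread_mutex_t lock;      /* protects dead, refs and deferred */
	bool dead;                 /* plays the role of XPT_DEAD */
	int refs;                  /* plays the role of the transport refcount */
	struct deferred *deferred; /* queued deferrals, one reference each */
};

static void xprt_init(struct xprt *x)
{
	pthread_mutex_init(&x->lock, NULL);
	x->dead = false;
	x->refs = 1;               /* the initial reference */
	x->deferred = NULL;
}

/* Create a deferral that holds one reference on the transport. */
static struct deferred *xprt_defer(struct xprt *x)
{
	struct deferred *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->next = NULL;
	pthread_mutex_lock(&x->lock);
	x->refs++;
	pthread_mutex_unlock(&x->lock);
	return d;
}

/*
 * Revisit a deferral. The dead check and the enqueue happen under the
 * same lock, so xprt_delete() cannot slip in between them.
 * Returns true if the deferral was queued, false if it was cancelled.
 */
static bool xprt_revisit(struct xprt *x, struct deferred *d, bool too_many)
{
	pthread_mutex_lock(&x->lock);
	if (too_many || x->dead) {
		assert(x->refs > 0);
		x->refs--;             /* drop the deferral's reference */
		pthread_mutex_unlock(&x->lock);
		free(d);
		return false;
	}
	d->next = x->deferred;     /* enqueue while still holding the lock */
	x->deferred = d;
	pthread_mutex_unlock(&x->lock);
	return true;
}

/*
 * Mark the transport dead exactly once and sweep queued deferrals.
 * Returns the number of deferrals swept, or -1 if already dead.
 */
static int xprt_delete(struct xprt *x)
{
	int swept = 0;

	pthread_mutex_lock(&x->lock);
	if (x->dead) {
		pthread_mutex_unlock(&x->lock);
		return -1;
	}
	x->dead = true;
	while (x->deferred) {
		struct deferred *d = x->deferred;

		x->deferred = d->next;
		free(d);
		x->refs--;             /* reference held by the swept deferral */
		swept++;
	}
	x->refs--;                 /* drop the initial reference */
	pthread_mutex_unlock(&x->lock);
	return swept;
}
```

With this shape, a deferral that completes before the delete is queued and
later swept, and one that completes after the delete sees the dead flag and
drops its own reference, so neither ordering leaks a reference.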

Thanks, I'll repost a fix...

>
> Trond