2006-08-09 18:14:51

by Wendy Cheng

[permalink] [raw]
Subject: [PATCH] fix recursive nlm_file_mutex deadlock

_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


Attachments:
gfs_nlm_deadlock.patch (388.00 B)
(No filename) (373.00 B)
(No filename) (140.00 B)
Download all attachments

2006-08-09 18:33:43

by Wendy Cheng

[permalink] [raw]
Subject: Re: [PATCH] fix recursive nlm_file_mutex deadlock

Wendy Cheng wrote:

>
> The attached patch seems to fix the issue - it skips (defers) the file
> removal. Eventually either nlm_gc_hosts (some time later when client
> is unmonitored) or nlmsvc_traverse_files will finish the clean up.
> Note that this is a 10-minutes work - not sure its ramification at
> this moment. Take a look ?
>
Well, I never really understand what this "signed-off" line is all
about. But if it is required, here it is:

Signed-off-by: S. Wendy Cheng ([email protected])


>
>------------------------------------------------------------------------
>
>--- linux-2/fs/lockd/svclock.c 2006-08-08 10:20:16.000000000 -0400
>+++ linux/fs/lockd/svclock.c 2006-08-09 10:28:35.000000000 -0400
>@@ -264,7 +264,9 @@ static void nlmsvc_free_block(struct kre
>
> nlmsvc_freegrantargs(block->b_call);
> nlm_release_call(block->b_call);
>- nlm_release_file(block->b_file);
>+ down(&file->f_sema);
>+ file->f_count--;
>+ up(&file->f_sema);
> kfree(block);
> }
>
>
>

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2006-08-09 21:45:48

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] fix recursive nlm_file_mutex deadlock

On Wed, 2006-08-09 at 14:13 -0400, Wendy Cheng wrote:
> I was testing NLM failover patches this morning and found the command
> hangs. Look like nlm_traverse_files(), where it grabs nlm_file_mutex
> early in the call, will have a chance to call nlm_release_file() via
> nlmsvc_free_block() inside kref_put(). The nlm_release_file() wants
> nlm_file_mutex too - this would generate a deadlock as the following:
>
> dhcp59-234 kernel: Call Trace:
> [<c02dd749>] __mutex_lock_slowpath+0x4c/0x7e
> [<c02dd78a>] .text.lock.mutex+0xf/0x14
> [<f8afeacd>] nlm_release_file+0x2b/0xdf [lockd]
> [<f8afda90>] nlmsvc_free_block+0x8c/0x9d [lockd]
> [<f8afda04>] nlmsvc_free_block+0x0/0x9d [lockd]
> [<c01be98d>] kref_put+0x4e/0x58
> [<f8afd175>] nlmsvc_traverse_blocks+0xaf/0xc6 [lockd]
> [<f8afe960>] nlm_traverse_files+0x108/0x1cd [lockd]
>
> The attached patch seems to fix the issue - it skips (defers) the file
> removal. Eventually either nlm_gc_hosts (some time later when client is
> unmonitored) or nlmsvc_traverse_files will finish the clean up. Note
> that this is a 10-minutes work - not sure its ramification at this
> moment. Take a look ?
>
> -- Wendy
>
> plain text document attachment (gfs_nlm_deadlock.patch)
> --- linux-2/fs/lockd/svclock.c 2006-08-08 10:20:16.000000000 -0400
> +++ linux/fs/lockd/svclock.c 2006-08-09 10:28:35.000000000 -0400
> @@ -264,7 +264,9 @@ static void nlmsvc_free_block(struct kre
>
> nlmsvc_freegrantargs(block->b_call);
> nlm_release_call(block->b_call);
> - nlm_release_file(block->b_file);
> + down(&file->f_sema);
> + file->f_count--;
> + up(&file->f_sema);
> kfree(block);
> }

Vetoed. The block holds a reference to the file. It _must_ call
nlm_release_file() in order to release that reference. It is in any case
a bug to grab file->f_sema without holding a reference to the file.

I suspect, rather, that the problem is due to nlmsvc_create_block()
incrementing file->f_count without holding the nlm_file_mutex. If we
convert it to an atomic_t instead, then that problem should be solved.

aside: Note also that we want to get rid of all that mark and sweep
braindamage in nlm_traverse_*() with all the silly counting of f_lock,
f_blocks, f_shares,.... and replace those variables with proper
references to the struct nlm_file by the locks, blocks (is already the
case?), and shares.

Cheers,
Trond


-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2006-08-09 22:41:22

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] fix recursive nlm_file_mutex deadlock

On Wed, 2006-08-09 at 18:07 -0400, Wendy Cheng wrote:

>
> Disagree ! :)
>
> The whole thing is about deadlock, not about reference count. Look at
> the logic ... nlm_traverse_files grabs the nlm_file_mutex, then comes
> down to nlm_release_file where it tries to get nlm_file_mutex lock
> again. I'll need to run now - we can discuss this tomorrow morning.
> Please re-read the issue and we'll discuss later.

OK. The we need to fix the borken mark/sweep crap. Trying to circumvent
refcounting rules is unacceptable.

Trond


-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2006-08-09 22:15:47

by Wendy Cheng

[permalink] [raw]
Subject: Re: [PATCH] fix recursive nlm_file_mutex deadlock

Trond Myklebust wrote:

>On Wed, 2006-08-09 at 14:13 -0400, Wendy Cheng wrote:
>
>
>>I was testing NLM failover patches this morning and found the command
>>hangs. Look like nlm_traverse_files(), where it grabs nlm_file_mutex
>>early in the call, will have a chance to call nlm_release_file() via
>>nlmsvc_free_block() inside kref_put(). The nlm_release_file() wants
>>nlm_file_mutex too - this would generate a deadlock as the following:
>>
>>dhcp59-234 kernel: Call Trace:
>>[<c02dd749>] __mutex_lock_slowpath+0x4c/0x7e
>>[<c02dd78a>] .text.lock.mutex+0xf/0x14
>>[<f8afeacd>] nlm_release_file+0x2b/0xdf [lockd]
>>[<f8afda90>] nlmsvc_free_block+0x8c/0x9d [lockd]
>>[<f8afda04>] nlmsvc_free_block+0x0/0x9d [lockd]
>>[<c01be98d>] kref_put+0x4e/0x58
>>[<f8afd175>] nlmsvc_traverse_blocks+0xaf/0xc6 [lockd]
>>[<f8afe960>] nlm_traverse_files+0x108/0x1cd [lockd]
>>
>>The attached patch seems to fix the issue - it skips (defers) the file
>>removal. Eventually either nlm_gc_hosts (some time later when client is
>>unmonitored) or nlmsvc_traverse_files will finish the clean up. Note
>>that this is a 10-minutes work - not sure its ramification at this
>>moment. Take a look ?
>>
>>-- Wendy
>>
>>plain text document attachment (gfs_nlm_deadlock.patch)
>>--- linux-2/fs/lockd/svclock.c 2006-08-08 10:20:16.000000000 -0400
>>+++ linux/fs/lockd/svclock.c 2006-08-09 10:28:35.000000000 -0400
>>@@ -264,7 +264,9 @@ static void nlmsvc_free_block(struct kre
>>
>> nlmsvc_freegrantargs(block->b_call);
>> nlm_release_call(block->b_call);
>>- nlm_release_file(block->b_file);
>>+ down(&file->f_sema);
>>+ file->f_count--;
>>+ up(&file->f_sema);
>> kfree(block);
>> }
>>
>>
>
>Vetoed. The block holds a reference to the file. It _must_ call
>nlm_release_file() in order to release that reference. It is in any case
>a bug to grab file->f_sema without holding a reference to the file.
>
>I suspect, rather, that the problem is due to nlmsvc_create_block()
>incrementing file->f_count without holding the nlm_file_mutex. If we
>convert it to an atomic_t instead, then that problem should be solved.
>
>

Disagree ! :)

The whole thing is about deadlock, not about reference count. Look at
the logic ... nlm_traverse_files grabs the nlm_file_mutex, then comes
down to nlm_release_file where it tries to get nlm_file_mutex lock
again. I'll need to run now - we can discuss this tomorrow morning.
Please re-read the issue and we'll discuss later.

-- Wendy

>aside: Note also that we want to get rid of all that mark and sweep
>braindamage in nlm_traverse_*() with all the silly counting of f_lock,
>f_blocks, f_shares,.... and replace those variables with proper
>references to the struct nlm_file by the locks, blocks (is already the
>case?), and shares.
>
>Cheers,
> Trond
>
>
>

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2006-08-10 15:40:28

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] fix recursive nlm_file_mutex deadlock

On Thu, 2006-08-10 at 11:24 -0400, Wendy Cheng wrote:

> We simply can't call nlm_release_file() as long as it requires
> nlm_file_mutex (no matter where you move the locking line) within
> nlm_traverse_files() code path. I'm ok with atomic variable though.

As long as we can guarantee that file->f_count doesn't drop to zero (and
we can make that guarantee in nlm_traverse_file()), then the code is
correct.

Trond


-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2006-08-10 15:16:35

by Wendy Cheng

[permalink] [raw]
Subject: Re: [PATCH] fix recursive nlm_file_mutex deadlock

On Wed, 2006-08-09 at 19:57 -0400, Trond Myklebust wrote:

> Here you go. Assuming this compiles, it ought to prevent the recursive
> call to mutex_lock(&nlm_file_mutex), _and_ speed up nlm_release_file()
> in the case where we know we're not going to free up the file.

Look like this issue can't be patched by any 10-minutes code (that
applies to both of us).

We simply can't call nlm_release_file() as long as it requires
nlm_file_mutex (no matter where you move the locking line) within
nlm_traverse_files() code path. I'm ok with atomic variable though.

I always think nlm_file_mutex is used to protect nlm_files *list* (which
is a list of individual nlm_file) and file->f_sema is used to protect
the *individual* nlm_file. But I can be wrong (and will go to read the
code more).

In short, you patch will deadlock again.

-- Wendy

>
> Cheers,
> Trond
>
> plain text document attachment (linux-2.6.18-009-
> make_nlm_file_f_count_atomic.dif)
> LOCKD: Make nlm_file->f_count an atomic
>
> From: Trond Myklebust <[email protected]>
>
> This allows us to grab the mutex in nlm_release_file only if we suspect
> that the file needs to be freed.
>
> The latter fixes a deadlock in which nlm_traverse_files() ends up calling
> mutex_lock(&nlm_file_mutex) recursively (the second time being by means of
> the call to nlm_release_file() in nlmsvc_free_block()).
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
>
> fs/lockd/svclock.c | 2 +-
> fs/lockd/svcsubs.c | 26 +++++++++++++++-----------
> include/linux/lockd/lockd.h | 2 +-
> 3 files changed, 17 insertions(+), 13 deletions(-)
>
> diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
> index c9d4197..57dcb2d 100644
> --- a/fs/lockd/svclock.c
> +++ b/fs/lockd/svclock.c
> @@ -207,7 +207,7 @@ nlmsvc_create_block(struct svc_rqst *rqs
> block->b_daemon = rqstp->rq_server;
> block->b_host = host;
> block->b_file = file;
> - file->f_count++;
> + atomic_inc(&file->f_count);
>
> /* Add to file's list of blocks */
> block->b_fnext = file->f_blocks;
> diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c
> index 2a4df9b..dfca9e7 100644
> --- a/fs/lockd/svcsubs.c
> +++ b/fs/lockd/svcsubs.c
> @@ -94,8 +94,10 @@ nlm_lookup_file(struct svc_rqst *rqstp,
> mutex_lock(&nlm_file_mutex);
>
> for (file = nlm_files[hash]; file; file = file->f_next)
> - if (!nfs_compare_fh(&file->f_handle, f))
> + if (!nfs_compare_fh(&file->f_handle, f)) {
> + atomic_inc(&file->f_count);
> goto found;
> + }
>
> nlm_debug_print_fh("creating file for", f);
>
> @@ -108,6 +110,7 @@ nlm_lookup_file(struct svc_rqst *rqstp,
> memcpy(&file->f_handle, f, sizeof(struct nfs_fh));
> file->f_hash = hash;
> init_MUTEX(&file->f_sema);
> + atomic_set(&file->f_count, 1);
>
> /* Open the file. Note that this must not sleep for too long, else
> * we would lock up lockd:-) So no NFS re-exports, folks.
> @@ -124,9 +127,8 @@ nlm_lookup_file(struct svc_rqst *rqstp,
> nlm_files[hash] = file;
>
> found:
> - dprintk("lockd: found file %p (count %d)\n", file, file->f_count);
> + dprintk("lockd: found file %p (count %d)\n", file, atomic_read(&file->f_count));
> *result = file;
> - file->f_count++;
> nfserr = 0;
>
> out_unlock:
> @@ -221,7 +223,7 @@ nlm_inspect_file(struct nlm_host *host,
> {
> if (action == NLM_ACT_CHECK) {
> /* Fast path for mark and sweep garbage collection */
> - if (file->f_count || file->f_blocks || file->f_shares)
> + if (atomic_read(&file->f_count) || file->f_blocks || file->f_shares)
> return 1;
> } else {
> nlmsvc_traverse_blocks(host, file, action);
> @@ -252,7 +254,7 @@ nlm_traverse_files(struct nlm_host *host
>
> /* No more references to this file. Let go of it. */
> if (!file->f_blocks && !file->f_locks
> - && !file->f_shares && !file->f_count) {
> + && !file->f_shares && !atomic_read(&file->f_count)) {
> *fp = file->f_next;
> nlmsvc_ops->fclose(file->f_file);
> kfree(file);
> @@ -278,18 +280,20 @@ void
> nlm_release_file(struct nlm_file *file)
> {
> dprintk("lockd: nlm_release_file(%p, ct = %d)\n",
> - file, file->f_count);
> -
> - /* Lock file table */
> - mutex_lock(&nlm_file_mutex);
> + file, atomic_read(&file->f_count));
>
> /* If there are no more locks etc, delete the file */
> - if(--file->f_count == 0) {
> + if (atomic_dec_and_test(&file->f_count)) {
> + /*
> + * Note! nlm_inspect_file() will check file->f_count
> + * to see if we raced while taking nlm_file_mutex
> + */
> + mutex_lock(&nlm_file_mutex);
> if(!nlm_inspect_file(NULL, file, NLM_ACT_CHECK))
> nlm_delete_file(file);
> + mutex_unlock(&nlm_file_mutex);
> }
>
> - mutex_unlock(&nlm_file_mutex);
> }
>
> /*
> diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
> index 0d92c46..59b63a1 100644
> --- a/include/linux/lockd/lockd.h
> +++ b/include/linux/lockd/lockd.h
> @@ -102,7 +102,7 @@ struct nlm_file {
> struct nlm_share * f_shares; /* DOS shares */
> struct nlm_block * f_blocks; /* blocked locks */
> unsigned int f_locks; /* guesstimate # of locks */
> - unsigned int f_count; /* reference count */
> + atomic_t f_count; /* reference count */
> struct semaphore f_sema; /* avoid concurrent access */
> int f_hash; /* hash of f_handle */
> };
> -------------------------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________ NFS maillist - [email protected] https://lists.sourceforge.net/lists/listinfo/nfs


-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs