LinuxLists.cc - NFS Kernel Bug

2014-09-18 13:07:38

Subject: NFS Kernel Bug

Good morning!

I believe we have found a bug in the NFS kernel area. The "bug" is a
leak of a file handle where the NFS client never tells the server to
close the file. The problem is very similar to one we had reported and
got a fix for previously. We are using that patch, but ran in to another
case where the client sends out an OPEN_DOWNGRADE but never sends a CLOSE.

Attached is a simple c program that we have been able to reproduce the
bug with, along with a packet capture of what we see on the wire.

To reproduce the bug:
-compile the c code
-execute the c code with:

./test ; cat testfile3 > /dev/nul

-now if we try to remove the file we get a file in use error (server is
using mandatory locking)

Things to note:

-if you just run the program without the immediate cat'ing of the file,
the bug does not happen
suggesting a timing issue
-If you alter the program so the code mimics the cat of the file, the
bug does not happen (ie, add an open, read file, close to the code).
-If you run the program as described above, and then run it again
without the "; cat testfile3 > /dev/nul", the kernel squeaks out the
file close to the server when the code does the close.

The attached packet capture is us doing:

./test ; cat testfile3 > /dev/null
rm testfile3
./test
rm testfile3

where we are denied the rm the first time, but not the second.

Thanks
James

Attachments:

test.c (450.00 B)
nfsbug.zip (5.96 kB)
Download all attachments

2014-09-19 17:20:10

by Ken Hahn

[permalink] [raw]

Subject: Re: NFS Kernel Bug

Hello,

I work with [email protected].

On 09/18/2014 11:00 AM, Trond Myklebust wrote:
>>
>> Argh. This is a situation where the client shouldn't have called
>> OPEN_DOWNGRADE, but should have done a CLOSE. The issue is that the
>> client opens the file with OPEN4_SHARE_ACCESS_BOTH, so it is not
>> allowed to downgrade to OPEN4_SHARE_ACCESS_READ. Instead it should
>> have closed the file, and then used the delegation...
>>
>
> Does the following patch help (and does it still fix your original
> bugreport)?

I tried it out, patching the Debian 3.14.15 kernel.. had to alter the
patch a tiny bit, but I don't think I messed up any real functionality.

Indeed it passes the old test, and avoids the new bug too.

Thanks!

-Ken Hahn

Here's the altered patch for what it is worth (for anyone else using the
current debian backports kernel):

--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2560,20 +2560,22 @@ static void nfs4_close_prepare(struct rpc_task
*task, void *data)
is_rdwr = test_bit(NFS_O_RDWR_STATE, &state->flags);
is_rdonly = test_bit(NFS_O_RDONLY_STATE, &state->flags);
is_wronly = test_bit(NFS_O_WRONLY_STATE, &state->flags);
- if (is_rdonly || is_rdwr)
- calldata->arg.fmode |= FMODE_READ;
- if (is_wronly || is_rdwr)
- calldata->arg.fmode |= FMODE_WRITE;
+ calldata->arg.fmode = 0;
if (state->n_rdwr == 0) {
- if (state->n_rdonly == 0) {
- call_close |= is_rdonly || is_rdwr;
- calldata->arg.fmode &= ~FMODE_READ;
- }
- if (state->n_wronly == 0) {
- call_close |= is_wronly || is_rdwr;
- calldata->arg.fmode &= ~FMODE_WRITE;
- }
- }
+ if (state->n_rdonly == 0)
+ call_close |= is_rdonly;
+ else if (is_rdonly)
+ calldata->arg.fmode |= FMODE_READ;
+ if (state->n_wronly == 0)
+ call_close |= is_wronly;
+ else if (is_wronly)
+ calldata->arg.fmode |= FMODE_WRITE;
+ } else if (is_rdwr)
+ calldata->arg.fmode |= FMODE_READ|FMODE_WRITE;
+
+ if (calldata->arg.fmode == 0)
+ call_close |= is_rdwr;
+
if (!nfs4_valid_open_stateid(state))
call_close = 0;
spin_unlock(&state->owner->so_lock);
--
1.9.3

2014-09-18 14:31:49

by Trond Myklebust

[permalink] [raw]

Subject: Re: NFS Kernel Bug

On Thu, Sep 18, 2014 at 9:02 AM, James Drews <[email protected]> wrote:
> Good morning!
>
> I believe we have found a bug in the NFS kernel area. The "bug" is a leak
> of a file handle where the NFS client never tells the server to close the
> file. The problem is very similar to one we had reported and got a fix for
> previously. We are using that patch, but ran in to another case where the
> client sends out an OPEN_DOWNGRADE but never sends a CLOSE.
>
> Attached is a simple c program that we have been able to reproduce the bug
> with, along with a packet capture of what we see on the wire.
>
> To reproduce the bug:
> -compile the c code
> -execute the c code with:
>
> ./test ; cat testfile3 > /dev/nul
>
> -now if we try to remove the file we get a file in use error (server is
> using mandatory locking)
>
> Things to note:
>
> -if you just run the program without the immediate cat'ing of the file, the
> bug does not happen
> suggesting a timing issue
> -If you alter the program so the code mimics the cat of the file, the bug
> does not happen (ie, add an open, read file, close to the code).
> -If you run the program as described above, and then run it again without
> the "; cat testfile3 > /dev/nul", the kernel squeaks out the file close to
> the server when the code does the close.
>
> The attached packet capture is us doing:
>
> ./test ; cat testfile3 > /dev/null
> rm testfile3
> ./test
> rm testfile3
>
> where we are denied the rm the first time, but not the second.
>

Argh. This is a situation where the client shouldn't have called
OPEN_DOWNGRADE, but should have done a CLOSE. The issue is that the
client opens the file with OPEN4_SHARE_ACCESS_BOTH, so it is not
allowed to downgrade to OPEN4_SHARE_ACCESS_READ. Instead it should
have closed the file, and then used the delegation...

--
Trond Myklebust

Linux NFS client maintainer, PrimaryData

[email protected]

2014-09-18 16:00:24

by Trond Myklebust

[permalink] [raw]

Subject: Re: NFS Kernel Bug

On Thu, 2014-09-18 at 10:31 -0400, Trond Myklebust wrote:
> On Thu, Sep 18, 2014 at 9:02 AM, James Drews <[email protected]> wrote:
> > Good morning!
> >
> > I believe we have found a bug in the NFS kernel area. The "bug" is a leak
> > of a file handle where the NFS client never tells the server to close the
> > file. The problem is very similar to one we had reported and got a fix for
> > previously. We are using that patch, but ran in to another case where the
> > client sends out an OPEN_DOWNGRADE but never sends a CLOSE.
> >
> > Attached is a simple c program that we have been able to reproduce the bug
> > with, along with a packet capture of what we see on the wire.
> >
> > To reproduce the bug:
> > -compile the c code
> > -execute the c code with:
> >
> > ./test ; cat testfile3 > /dev/nul
> >
> > -now if we try to remove the file we get a file in use error (server is
> > using mandatory locking)
> >
> > Things to note:
> >
> > -if you just run the program without the immediate cat'ing of the file, the
> > bug does not happen
> > suggesting a timing issue
> > -If you alter the program so the code mimics the cat of the file, the bug
> > does not happen (ie, add an open, read file, close to the code).
> > -If you run the program as described above, and then run it again without
> > the "; cat testfile3 > /dev/nul", the kernel squeaks out the file close to
> > the server when the code does the close.
> >
> > The attached packet capture is us doing:
> >
> > ./test ; cat testfile3 > /dev/null
> > rm testfile3
> > ./test
> > rm testfile3
> >
> > where we are denied the rm the first time, but not the second.
> >
>
> Argh. This is a situation where the client shouldn't have called
> OPEN_DOWNGRADE, but should have done a CLOSE. The issue is that the
> client opens the file with OPEN4_SHARE_ACCESS_BOTH, so it is not
> allowed to downgrade to OPEN4_SHARE_ACCESS_READ. Instead it should
> have closed the file, and then used the delegation...
>

Does the following patch help (and does it still fix your original
bugreport)?

Cheers
Trond
8<----------------------------------------------------------------
>From d5f56bc67514a290032ff063d837179853231acf Mon Sep 17 00:00:00 2001
From: Trond Myklebust <[email protected]>
Date: Thu, 18 Sep 2014 11:51:32 -0400
Subject: [PATCH] NFSv4: Fix another bug in the close/open_downgrade code

James Drew reports another bug whereby the NFS client is now sending
an OPEN_DOWNGRADE in a situation where it should really have sent a
CLOSE: the client is opening the file for O_RDWR, but then trying to
do a downgrade to O_RDONLY, which is not allowed by the NFSv4 spec.

Reported-by: James Drews <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Fixes: aee7af356e15 (NFSv4: Fix problems with close in the presence...)
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/nfs4proc.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index ac2dd953fc18..6ca0c8e7a945 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2618,23 +2618,23 @@ static void nfs4_close_prepare(struct rpc_task *task, void *data)
is_rdwr = test_bit(NFS_O_RDWR_STATE, &state->flags);
is_rdonly = test_bit(NFS_O_RDONLY_STATE, &state->flags);
is_wronly = test_bit(NFS_O_WRONLY_STATE, &state->flags);
- /* Calculate the current open share mode */
- calldata->arg.fmode = 0;
- if (is_rdonly || is_rdwr)
- calldata->arg.fmode |= FMODE_READ;
- if (is_wronly || is_rdwr)
- calldata->arg.fmode |= FMODE_WRITE;
/* Calculate the change in open mode */
+ calldata->arg.fmode = 0;
if (state->n_rdwr == 0) {
- if (state->n_rdonly == 0) {
- call_close |= is_rdonly || is_rdwr;
- calldata->arg.fmode &= ~FMODE_READ;
- }
- if (state->n_wronly == 0) {
- call_close |= is_wronly || is_rdwr;
- calldata->arg.fmode &= ~FMODE_WRITE;
- }
- }
+ if (state->n_rdonly == 0)
+ call_close |= is_rdonly;
+ else if (is_rdonly)
+ calldata->arg.fmode |= FMODE_READ;
+ if (state->n_wronly == 0)
+ call_close |= is_wronly;
+ else if (is_wronly)
+ calldata->arg.fmode |= FMODE_WRITE;
+ } else if (is_rdwr)
+ calldata->arg.fmode |= FMODE_READ|FMODE_WRITE;
+
+ if (calldata->arg.fmode == 0)
+ call_close |= is_rdwr;
+
if (!nfs4_valid_open_stateid(state))
call_close = 0;
spin_unlock(&state->owner->so_lock);
--
1.9.3