2013-03-29 19:57:12

by Malahal Naineni

Subject: [PATCH] NFSv3/v2: Fix data corruption with NFS short reads.

This bug seems to be present in v2.6.37 and earlier versions. The code
was re-organized in v2.6.38, which eliminated the bug; current upstream
code doesn't have it. This may be applicable to some longterm releases!

Here are the bug details:

1. In nfs_read_rpcsetup(), args.count and res.count are both set to the
number of bytes to be read. Let us assume that the request is
for 16K, so args.count = res.count = 16K.
2. nfs3_xdr_readres() conditionally sets res.count to the actual
number of bytes read. This condition is true for the first response
as res.count was set to args.count before the first request. Let us
say the server returned only 4K bytes. res.count=4K
3. Another read request is sent for the remaining data. Note that
res.count is NOT updated. It is still set to the actual amount of
bytes we got in the first response. The client will send a READ
request for the remaining 12K.
4. Assume that the server gave all 12K bytes. We still think we got only
4K bytes due to the conditional update in nfs3_xdr_readres(). The client
sends further requests if the reply doesn't indicate EOF! If the reply
does include EOF, it truncates pages beyond 4K+4K, causing data
corruption beyond 8K. The corrupted data is filled with zeros! (See the
sketch below.)
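
To make the walk-through above concrete, here is a rough user-space
sketch of the same sequence (simplified structures and a stand-in decode
function, not the actual kernel code):

/* Sketch only: simulate the pre-2.6.38 conditional update that leaves
 * res.count stale across a short-read retry.  Field and function names
 * are simplified stand-ins for the kernel's nfs_readargs/nfs_readres. */
#include <stdio.h>

struct read_args { unsigned int count; };
struct read_res  { unsigned int count; };

/* Pre-2.6.38 behaviour: res->count is only ever shrunk, never refreshed. */
static void decode_readres(struct read_res *res, unsigned int bytes_returned)
{
	if (bytes_returned < res->count)
		res->count = bytes_returned;
}

int main(void)
{
	struct read_args args = { .count = 16384 };	/* ask for 16K */
	struct read_res  res  = { .count = 16384 };	/* res.count = args.count */

	decode_readres(&res, 4096);			/* server returns 4K */
	printf("after 1st reply: res.count = %u\n", res.count);

	/* Retry path adjusts the args for the remaining 12K, but res.count
	 * is NOT refreshed (the bug described in step 3). */
	args.count -= res.count;
	printf("retry asks for %u more bytes\n", args.count);

	decode_readres(&res, 12288);			/* server returns 12K */
	printf("after 2nd reply: res.count = %u (still 4K, not 12K)\n",
	       res.count);
	/* EOF handling now truncates based on 4K+4K instead of 4K+12K. */
	return 0;
}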

Signed-off-by: Malahal Naineni <[email protected]>
---
fs/nfs/nfs2xdr.c | 3 +--
fs/nfs/nfs3xdr.c | 3 +--
2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
index 5914a19..710ca56 100644
--- a/fs/nfs/nfs2xdr.c
+++ b/fs/nfs/nfs2xdr.c
@@ -287,8 +287,7 @@ nfs_xdr_readres(struct rpc_rqst *req, __be32 *p, struct nfs_readres *res)
}

dprintk("RPC: readres OK count %u\n", count);
- if (count < res->count)
- res->count = count;
+ res->count = count;

return count;
}
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index f6cc60f..152a5e4 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -914,8 +914,7 @@ nfs3_xdr_readres(struct rpc_rqst *req, __be32 *p, struct nfs_readres *res)
res->eof = 0;
}

- if (count < res->count)
- res->count = count;
+ res->count = count;

return count;
}
--
1.7.11.7



2013-04-11 01:21:30

by NeilBrown

Subject: Re: NFSv3/v2: Fix data corruption with NFS short reads.

On Fri, 29 Mar 2013 19:57:06 -0000 malahal naineni <[email protected]> wrote:

> This bug seems to be present in v2.6.37 and earlier versions. The code
> was re-organized in v2.6.38, which eliminated the bug; current upstream
> code doesn't have it. This may be applicable to some longterm releases!
>
> Here are the bug details:
>
> 1. In nfs_read_rpcsetup(), args.count and res.count are both set to the
> number of bytes to be read. Let us assume that the request is
> for 16K, so args.count = res.count = 16K.
> 2. nfs3_xdr_readres() conditionally sets res.count to the actual
> number of bytes read. This condition is true for the first response
> as res.count was set to args.count before the first request. Let us
> say the server returned only 4K bytes. res.count=4K
> 3. Another read request is sent for the remaining data. Note that
> res.count is NOT updated. It is still set to the actual amount of
> bytes we got in the first response. The client will send a READ
> request for the remaining 12K.

This looks like a real bug, but I think the "NOT" above is the best thing
to fix.
i.e. when another read request is sent, res.count *SHOULD*BE* updated. That
makes it consistent with the original send, and consistency is good!

So this patch.

Index: linux-2.6.32-SLE11-SP1-LTSS/fs/nfs/read.c
===================================================================
--- linux-2.6.32-SLE11-SP1-LTSS.orig/fs/nfs/read.c 2013-03-20 16:24:31.426605189 +1100
+++ linux-2.6.32-SLE11-SP1-LTSS/fs/nfs/read.c 2013-04-11 11:19:57.670724540 +1000
@@ -368,6 +368,7 @@ static void nfs_readpage_retry(struct rp
argp->offset += resp->count;
argp->pgbase += resp->count;
argp->count -= resp->count;
+ resp->count = argp->count;
nfs4_restart_rpc(task, NFS_SERVER(data->inode)->nfs_client);
return;
out:


This would only affect clients with servers which would sometimes return
partial reads, and I don't think the Linux NFS server does. What server have
you seen this against?

NeilBrown



> 4. Assume that the server gave all 12K bytes. We still think we got only
> 4K bytes due to the conditional update in nfs3_xdr_readres(). The client
> sends further requests if the reply doesn't indicate EOF! If the reply
> does include EOF, it truncates pages beyond 4K+4K, causing data
> corruption beyond 8K. The corrupted data is filled with zeros!
>
> Signed-off-by: Malahal Naineni <[email protected]>
>
> ---
> fs/nfs/nfs2xdr.c | 3 +--
> fs/nfs/nfs3xdr.c | 3 +--
> 2 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
> index 5914a19..710ca56 100644
> --- a/fs/nfs/nfs2xdr.c
> +++ b/fs/nfs/nfs2xdr.c
> @@ -287,8 +287,7 @@ nfs_xdr_readres(struct rpc_rqst *req, __be32 *p, struct nfs_readres *res)
> }
>
> dprintk("RPC: readres OK count %u\n", count);
> - if (count < res->count)
> - res->count = count;
> + res->count = count;
>
> return count;
> }
> diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
> index f6cc60f..152a5e4 100644
> --- a/fs/nfs/nfs3xdr.c
> +++ b/fs/nfs/nfs3xdr.c
> @@ -914,8 +914,7 @@ nfs3_xdr_readres(struct rpc_rqst *req, __be32 *p, struct nfs_readres *res)
> res->eof = 0;
> }
>
> - if (count < res->count)
> - res->count = count;
> + res->count = count;
>
> return count;
> }



2013-04-11 03:55:11

by Malahal Naineni

Subject: Re: NFSv3/v2: Fix data corruption with NFS short reads.

NeilBrown [[email protected]] wrote:
> On Fri, 29 Mar 2013 19:57:06 -0000 malahal naineni <[email protected]> wrote:
>
> > This bug seems to be present in v2.6.37 and earlier versions. The code
> > was re-organized in v2.6.38, which eliminated the bug; current upstream
> > code doesn't have it. This may be applicable to some longterm releases!
> >
> > Here are the bug details:
> >
> > 1. In nfs_read_rpcsetup(), args.count and res.count are both set to the
> > number of bytes to be read. Let us assume that the request is
> > for 16K, so args.count = res.count = 16K.
> > 2. nfs3_xdr_readres() conditionally sets res.count to the actual
> > number of bytes read. This condition is true for the first response
> > as res.count was set to args.count before the first request. Let us
> > say the server returned only 4K bytes. res.count=4K
> > 3. Another read request is sent for the remaining data. Note that
> > res.count is NOT updated. It is still set to the actual amount of
> > bytes we got in the first response. The client will send a READ
> > request for the remaining 12K.
>
> This looks like a real bug, but I think the "NOT" above is the best thing
> to fix.

No doubt, a real bug! Easily reproduced with an instrumented nfsd.

> i.e. when another read request is sent, res.count *SHOULD*BE* updated. That
> makes it consistent with the original send, and consistency is good!

I thought about it, but resp->count is NOT really used as far as I
know. And the current upstream code unconditionally sets resp->count in
the xdr function. So I chose the upstream method! I agree, your patch
is more consistent with the existing code.

> Index: linux-2.6.32-SLE11-SP1-LTSS/fs/nfs/read.c
> ===================================================================
> --- linux-2.6.32-SLE11-SP1-LTSS.orig/fs/nfs/read.c 2013-03-20 16:24:31.426605189 +1100
> +++ linux-2.6.32-SLE11-SP1-LTSS/fs/nfs/read.c 2013-04-11 11:19:57.670724540 +1000
> @@ -368,6 +368,7 @@ static void nfs_readpage_retry(struct rp
> argp->offset += resp->count;
> argp->pgbase += resp->count;
> argp->count -= resp->count;
> + resp->count = argp->count;
> nfs4_restart_rpc(task, NFS_SERVER(data->inode)->nfs_client);
> return;
> out:

This patch should fix the bug as well.

> This would only affect clients with servers which would sometimes
> return partial reads, and I don't think the Linux NFS server does.
> What server have you seen this against?

We came across this during Ganesha development, and I was told the same
thing, that the Linux NFS server doesn't do this. I didn't bother to
post the patch then, but we have since seen the same thing with the
Linux NFS server, so I decided to post it now. The Linux NFS server
doesn't in itself create short reads; it just calls the underlying
back-end file system and sends whatever it gets to NFS clients without
any short-read "check". In other words, the short-read behaviour
actually depends on the back-end file system rather than the Linux NFS
server. We saw this bug with our GPFS file system in combination with
HSM -- a request for reading data on tape would fail until it is
brought back to disk.
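
As a rough user-space analogue of that last point (the file path below
is hypothetical): read(2) is allowed to return fewer bytes than
requested, and a caller that doesn't loop simply passes the short count
along, much like the server's read path does; the client's retry logic
is what has to make up the difference.

/* Sketch only: ordinary short-read handling a local caller of read(2)
 * has to do for itself.  Over NFS, the client's read retry path plays
 * this role, which is why the stale res.count above corrupts data. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	char buf[16384];
	ssize_t n, total = 0;
	int fd = open("/gpfs/somefile", O_RDONLY);	/* hypothetical path */

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* A single read() may legitimately return less than sizeof(buf);
	 * keep asking for the remainder until EOF or an error. */
	while (total < (ssize_t)sizeof(buf)) {
		n = read(fd, buf + total, sizeof(buf) - total);
		if (n <= 0)
			break;
		total += n;
	}
	printf("requested %zu bytes, got %zd\n", sizeof(buf), total);

	close(fd);
	return 0;
}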

Regards, Malahal.