Hi-
Here's what I came up with to address the failure of Cthon basic test
six with NFSv4. The first patch is mere clean up that I noticed
while investigating the problem; the second addresses the problem.
I've applied your lockdep splat fix from yesterday, and have not seen
a recurrence of the splat. I did not have a 100% reproducer.
---
Chuck Lever (2):
NFS: NFSv4 readdir loses entries
NFS: Micro-optimize nfs4_decode_dirent()
fs/nfs/nfs4xdr.c | 9 +++------
1 files changed, 3 insertions(+), 6 deletions(-)
--
Chuck Lever
On Jan 28, 2011, at 1:45 PM, Trond Myklebust wrote:
> On Fri, 2011-01-28 at 13:42 -0500, Bryan Schumaker wrote:
>> On 01/28/2011 12:41 PM, Chuck Lever wrote:
>>> On recent 2.6.38-rc kernels, connectathon basic test 6 fails on
>>> NFSv4 mounts of OpenSolaris with something like:
>>>
>>>> ./test6: readdir
>>>> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.12' dir entry, pass 0
>>>> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.82' dir entry, pass 0
>>>> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.164' dir entry, pass 0
>>>> ./test6: (/mnt/klimt/matisse.test) Test failed with 3 errors
>>>> basic tests failed
>>>> Tests failed, leaving /mnt/klimt mounted
>>>> [cel@matisse cthon04]$
>>>
>>> I narrowed the problem down to nfs4_decode_direct() reporting that the
>>
>> I'm guessing that should be nfs4_decode_dirent()?
>
> I've fixed that up in the version I applied to the 'bugfixes' branch.
Thanks. I make that typo nearly every time I type "decode_dirent".
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
On Fri, 2011-01-28 at 13:42 -0500, Bryan Schumaker wrote:
> On 01/28/2011 12:41 PM, Chuck Lever wrote:
> > On recent 2.6.38-rc kernels, connectathon basic test 6 fails on
> > NFSv4 mounts of OpenSolaris with something like:
> >
> >> ./test6: readdir
> >> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.12' dir entry, pass 0
> >> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.82' dir entry, pass 0
> >> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.164' dir entry, pass 0
> >> ./test6: (/mnt/klimt/matisse.test) Test failed with 3 errors
> >> basic tests failed
> >> Tests failed, leaving /mnt/klimt mounted
> >> [cel@matisse cthon04]$
> >
> > I narrowed the problem down to nfs4_decode_direct() reporting that the
>
> I'm guessing that should be nfs4_decode_dirent()?
I've fixed that up in the version I applied to the 'bugfixes' branch.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
[email protected]
http://www.netapp.com
Make the decoding of NFSv4 directory entries slightly more efficient
by:
1. Avoiding unnecessary byte swapping when checking XDR booleans,
and
2. Not bumping "p" when its value will be immediately replaced by
xdr_inline_decode()
This commit makes nfs4_decode_dirent() consistent with similar logic
in the other two decode_dirent() functions.
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfs/nfs4xdr.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 2ab8e5c..009aef9 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -6086,11 +6086,11 @@ int nfs4_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
__be32 *p = xdr_inline_decode(xdr, 4);
if (unlikely(!p))
goto out_overflow;
- if (!ntohl(*p++)) {
+ if (*p == xdr_zero) {
p = xdr_inline_decode(xdr, 4);
if (unlikely(!p))
goto out_overflow;
- if (!ntohl(*p++))
+ if (*p == xdr_zero)
return -EAGAIN;
entry->eof = 1;
return -EBADCOOKIE;
@@ -6101,7 +6101,7 @@ int nfs4_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
goto out_overflow;
entry->prev_cookie = entry->cookie;
p = xdr_decode_hyper(p, &entry->cookie);
- entry->len = ntohl(*p++);
+ entry->len = be32_to_cpup(p);
p = xdr_inline_decode(xdr, entry->len);
if (unlikely(!p))
On 01/28/2011 12:41 PM, Chuck Lever wrote:
> On recent 2.6.38-rc kernels, connectathon basic test 6 fails on
> NFSv4 mounts of OpenSolaris with something like:
>
>> ./test6: readdir
>> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.12' dir entry, pass 0
>> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.82' dir entry, pass 0
>> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.164' dir entry, pass 0
>> ./test6: (/mnt/klimt/matisse.test) Test failed with 3 errors
>> basic tests failed
>> Tests failed, leaving /mnt/klimt mounted
>> [cel@matisse cthon04]$
>
> I narrowed the problem down to nfs4_decode_direct() reporting that the
I'm guessing that should be nfs4_decode_dirent()?
> decode buffer had overflowed while decoding the entries for those
> missing files.
>
> verify_attr_len() assumes both it's pointer arguments reside on the
> same page. When these arguments point to locations on two different
> pages, verify_attr_len() can report false errors. This can happen now
> that a large NFSv4 readdir result can span pages.
>
> We have reasonably good checking in nfs4_decode_dirent() anyway, so
> it should be safe to simply remove the extra checking.
>
> At a guess, this was introduced by commit 6650239a, "NFS: Don't use
> vm_map_ram() in readdir".
>
> Cc: [email protected] [2.6.37]
> Signed-off-by: Chuck Lever <[email protected]>
> ---
>
> fs/nfs/nfs4xdr.c | 3 ---
> 1 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index 009aef9..4e2c168 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -6132,9 +6132,6 @@ int nfs4_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
> if (entry->fattr->valid & NFS_ATTR_FATTR_TYPE)
> entry->d_type = nfs_umode_to_dtype(entry->fattr->mode);
>
> - if (verify_attr_len(xdr, p, len) < 0)
> - goto out_overflow;
> -
> return 0;
>
> out_overflow:
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On recent 2.6.38-rc kernels, connectathon basic test 6 fails on
NFSv4 mounts of OpenSolaris with something like:
> ./test6: readdir
> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.12' dir entry, pass 0
> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.82' dir entry, pass 0
> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.164' dir entry, pass 0
> ./test6: (/mnt/klimt/matisse.test) Test failed with 3 errors
> basic tests failed
> Tests failed, leaving /mnt/klimt mounted
> [cel@matisse cthon04]$
I narrowed the problem down to nfs4_decode_direct() reporting that the
decode buffer had overflowed while decoding the entries for those
missing files.
verify_attr_len() assumes both it's pointer arguments reside on the
same page. When these arguments point to locations on two different
pages, verify_attr_len() can report false errors. This can happen now
that a large NFSv4 readdir result can span pages.
We have reasonably good checking in nfs4_decode_dirent() anyway, so
it should be safe to simply remove the extra checking.
At a guess, this was introduced by commit 6650239a, "NFS: Don't use
vm_map_ram() in readdir".
Cc: [email protected] [2.6.37]
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfs/nfs4xdr.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 009aef9..4e2c168 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -6132,9 +6132,6 @@ int nfs4_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
if (entry->fattr->valid & NFS_ATTR_FATTR_TYPE)
entry->d_type = nfs_umode_to_dtype(entry->fattr->mode);
- if (verify_attr_len(xdr, p, len) < 0)
- goto out_overflow;
-
return 0;
out_overflow:
On Fri, 2011-01-28 at 12:41 -0500, Chuck Lever wrote:
> On recent 2.6.38-rc kernels, connectathon basic test 6 fails on
> NFSv4 mounts of OpenSolaris with something like:
>
> > ./test6: readdir
> > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.12' dir entry, pass 0
> > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.82' dir entry, pass 0
> > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.164' dir entry, pass 0
> > ./test6: (/mnt/klimt/matisse.test) Test failed with 3 errors
> > basic tests failed
> > Tests failed, leaving /mnt/klimt mounted
> > [cel@matisse cthon04]$
>
> I narrowed the problem down to nfs4_decode_direct() reporting that the
> decode buffer had overflowed while decoding the entries for those
> missing files.
>
> verify_attr_len() assumes both it's pointer arguments reside on the
> same page. When these arguments point to locations on two different
> pages, verify_attr_len() can report false errors. This can happen now
> that a large NFSv4 readdir result can span pages.
>
> We have reasonably good checking in nfs4_decode_dirent() anyway, so
> it should be safe to simply remove the extra checking.
>
> At a guess, this was introduced by commit 6650239a, "NFS: Don't use
> vm_map_ram() in readdir".
>
> Cc: [email protected] [2.6.37]
> Signed-off-by: Chuck Lever <[email protected]>
> ---
>
> fs/nfs/nfs4xdr.c | 3 ---
> 1 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index 009aef9..4e2c168 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -6132,9 +6132,6 @@ int nfs4_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
> if (entry->fattr->valid & NFS_ATTR_FATTR_TYPE)
> entry->d_type = nfs_umode_to_dtype(entry->fattr->mode);
>
> - if (verify_attr_len(xdr, p, len) < 0)
> - goto out_overflow;
> -
> return 0;
>
> out_overflow:
>
I'm assuming that this is 'Cc: [email protected] [2.6.37]'?
--
Trond Myklebust
Linux NFS client maintainer
NetApp
[email protected]
http://www.netapp.com
On Fri, 2011-01-28 at 13:34 -0500, Trond Myklebust wrote:
> On Fri, 2011-01-28 at 12:41 -0500, Chuck Lever wrote:
> > On recent 2.6.38-rc kernels, connectathon basic test 6 fails on
> > NFSv4 mounts of OpenSolaris with something like:
> >
> > > ./test6: readdir
> > > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.12' dir entry, pass 0
> > > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.82' dir entry, pass 0
> > > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.164' dir entry, pass 0
> > > ./test6: (/mnt/klimt/matisse.test) Test failed with 3 errors
> > > basic tests failed
> > > Tests failed, leaving /mnt/klimt mounted
> > > [cel@matisse cthon04]$
> >
> > I narrowed the problem down to nfs4_decode_direct() reporting that the
> > decode buffer had overflowed while decoding the entries for those
> > missing files.
> >
> > verify_attr_len() assumes both it's pointer arguments reside on the
> > same page. When these arguments point to locations on two different
> > pages, verify_attr_len() can report false errors. This can happen now
> > that a large NFSv4 readdir result can span pages.
> >
> > We have reasonably good checking in nfs4_decode_dirent() anyway, so
> > it should be safe to simply remove the extra checking.
> >
> > At a guess, this was introduced by commit 6650239a, "NFS: Don't use
> > vm_map_ram() in readdir".
> >
> > Cc: [email protected] [2.6.37]
> > Signed-off-by: Chuck Lever <[email protected]>
> > ---
> >
> > fs/nfs/nfs4xdr.c | 3 ---
> > 1 files changed, 0 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> > index 009aef9..4e2c168 100644
> > --- a/fs/nfs/nfs4xdr.c
> > +++ b/fs/nfs/nfs4xdr.c
> > @@ -6132,9 +6132,6 @@ int nfs4_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
> > if (entry->fattr->valid & NFS_ATTR_FATTR_TYPE)
> > entry->d_type = nfs_umode_to_dtype(entry->fattr->mode);
> >
> > - if (verify_attr_len(xdr, p, len) < 0)
> > - goto out_overflow;
> > -
> > return 0;
> >
> > out_overflow:
> >
>
> I'm assuming that this is 'Cc: [email protected] [2.6.37]'?
Duh... I didn't read the s.o.b.s before replying. Sorry...
--
Trond Myklebust
Linux NFS client maintainer
NetApp
[email protected]
http://www.netapp.com