2011-03-10 18:54:35

by Dr Andrew John Hughes

[permalink] [raw]
Subject: NFS regression in 2.6.37.1 (current stable)

[Please CC me on responses as I'm not subscribed]

Hi,

I seem to have uncovered a regression in the NFS code between 2.6.37 and 2.6.37.1
caused by this changeset:

commit 55ea499d60aefa3d03a77fc8590c26b5881faa92
Author: Trond Myklebust <[email protected]>
Date: Sat Jan 8 17:45:38 2011 -0500
NFS: Don't use vm_map_ram() in readdir
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.37.y.git;a=commit;h=6650239a4b01077e80d5a4468562756d77afaa59

With this change applied, copying of files between NFS and non-NFS
mounts seems to be broken. The easiest way I've found to replicate
this myself is to use a VCS to do a clone of a tree on a NFS mount to
a directory on a non-NFS mount. I used Mercurial, as I had Mercurial
trees to hand from work on IcedTea, but I assume doing it with a git
tree such as the linux tree would also work. The idea is to do
something which involves copying over a bunch of directories and
checking the result is readable.

$ hg clone $HOME/projects/openjdk/icedtea6-hg
destination directory: icedtea6-hg
updating to branch default
abort:
data/contrib/templater/hotspot/src/cpu/CPU/vm/bytecodeInterpreter_CPU.inline.hpp.i@16d04ce16287:
no match found!

In the above, $HOME is an NFS mount and $PWD is a local reiserfs
partition. I initially hit failures doing builds with source on $HOME
and the build directory on a local reiserfs partition. In that
scenario, it would fail as not being able to find files that should
have been copied over.

Reverting the changeset fixes the issue. 2.6.37.2 still has the bug.
I haven't checked 2.6.37.3 yet but I didn't see any NFS changes in there.
--
Andrew :)

Free Java Software Engineer
Red Hat, Inc. (http://www.redhat.com)


2011-03-10 18:59:53

by Marc Kleine-Budde

[permalink] [raw]
Subject: Re: NFS regression in 2.6.37.1 (current stable)

Hello Andrew,

On 03/10/2011 07:53 PM, Dr Andrew John Hughes wrote:
> [Please CC me on responses as I'm not subscribed]
>
> Hi,
>
> I seem to have uncovered a regression in the NFS code between 2.6.37 and 2.6.37.1
> caused by this changeset:
>
> commit 55ea499d60aefa3d03a77fc8590c26b5881faa92
> Author: Trond Myklebust <[email protected]>
> Date: Sat Jan 8 17:45:38 2011 -0500
> NFS: Don't use vm_map_ram() in readdir
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.37.y.git;a=commit;h=6650239a4b01077e80d5a4468562756d77afaa59
>
> With this change applied, copying of files between NFS and non-NFS
> mounts seems to be broken. The easiest way I've found to replicate
> this myself is to use a VCS to do a clone of a tree on a NFS mount to
> a directory on a non-NFS mount. I used Mercurial, as I had Mercurial
> trees to hand from work on IcedTea, but I assume doing it with a git
> tree such as the linux tree would also work. The idea is to do
> something which involves copying over a bunch of directories and
> checking the result is readable.
>
> $ hg clone $HOME/projects/openjdk/icedtea6-hg
> destination directory: icedtea6-hg
> updating to branch default
> abort:
> data/contrib/templater/hotspot/src/cpu/CPU/vm/bytecodeInterpreter_CPU.inline.hpp.i@16d04ce16287:
> no match found!
>
> In the above, $HOME is an NFS mount and $PWD is a local reiserfs
> partition. I initially hit failures doing builds with source on $HOME
> and the build directory on a local reiserfs partition. In that
> scenario, it would fail as not being able to find files that should
> have been copied over.
>
> Reverting the changeset fixes the issue. 2.6.37.2 still has the bug.
> I haven't checked 2.6.37.3 yet but I didn't see any NFS changes in there.

Which arch are you using? As this patch is also part of the upcoming
2.6.38, can you check if the latest .38 is affected, too?

cheers, Marc

--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |


Attachments:
signature.asc (262.00 B)
OpenPGP digital signature

2011-03-10 19:35:22

by Pekka Enberg

[permalink] [raw]
Subject: Re: NFS regression in 2.6.37.1 (current stable)

On Thu, Mar 10, 2011 at 8:53 PM, Dr Andrew John Hughes
<[email protected]> wrote:
> [Please CC me on responses as I'm not subscribed]
>
> Hi,
>
> I seem to have uncovered a regression in the NFS code between 2.6.37 and 2.6.37.1
> caused by this changeset:
>
> commit 55ea499d60aefa3d03a77fc8590c26b5881faa92
> Author: Trond Myklebust <[email protected]>
> Date: ? Sat Jan 8 17:45:38 2011 -0500
> NFS: Don't use vm_map_ram() in readdir
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.37.y.git;a=commit;h=6650239a4b01077e80d5a4468562756d77afaa59
>
> With this change applied, copying of files between NFS and non-NFS
> mounts seems to be broken. ?The easiest way I've found to replicate
> this myself is to use a VCS to do a clone of a tree on a NFS mount to
> a directory on a non-NFS mount. ?I used Mercurial, as I had Mercurial
> trees to hand from work on IcedTea, but I assume doing it with a git
> tree such as the linux tree would also work. ?The idea is to do
> something which involves copying over a bunch of directories and
> checking the result is readable.
>
> $ hg clone $HOME/projects/openjdk/icedtea6-hg
> destination directory: icedtea6-hg
> updating to branch default
> abort:
> data/contrib/templater/hotspot/src/cpu/CPU/vm/bytecodeInterpreter_CPU.inline.hpp.i@16d04ce16287:
> no match found!
>
> In the above, $HOME is an NFS mount and $PWD is a local reiserfs
> partition. ?I initially hit failures doing builds with source on $HOME
> and the build directory on a local reiserfs partition. ?In that
> scenario, it would fail as not being able to find files that should
> have been copied over.
>
> Reverting the changeset fixes the issue. ?2.6.37.2 still has the bug.
> I haven't checked 2.6.37.3 yet but I didn't see any NFS changes in there.

There's some more discussion here:

https://bugs.gentoo.org/show_bug.cgi?id=357121

Greg, Trond, as this is a regression in -stable, I wonder if it's best
that we just revert the commit?

Pekka

2011-03-10 19:37:33

by Pekka Enberg

[permalink] [raw]
Subject: Re: NFS regression in 2.6.37.1 (current stable)

On Thu, Mar 10, 2011 at 8:59 PM, Marc Kleine-Budde <[email protected]> wrote:
> Hello Andrew,
>
> On 03/10/2011 07:53 PM, Dr Andrew John Hughes wrote:
>> [Please CC me on responses as I'm not subscribed]
>>
>> Hi,
>>
>> I seem to have uncovered a regression in the NFS code between 2.6.37 and 2.6.37.1
>> caused by this changeset:
>>
>> commit 55ea499d60aefa3d03a77fc8590c26b5881faa92
>> Author: Trond Myklebust <[email protected]>
>> Date: ? Sat Jan 8 17:45:38 2011 -0500
>> NFS: Don't use vm_map_ram() in readdir
>> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.37.y.git;a=commit;h=6650239a4b01077e80d5a4468562756d77afaa59
>>
>> With this change applied, copying of files between NFS and non-NFS
>> mounts seems to be broken. ?The easiest way I've found to replicate
>> this myself is to use a VCS to do a clone of a tree on a NFS mount to
>> a directory on a non-NFS mount. ?I used Mercurial, as I had Mercurial
>> trees to hand from work on IcedTea, but I assume doing it with a git
>> tree such as the linux tree would also work. ?The idea is to do
>> something which involves copying over a bunch of directories and
>> checking the result is readable.
>>
>> $ hg clone $HOME/projects/openjdk/icedtea6-hg
>> destination directory: icedtea6-hg
>> updating to branch default
>> abort:
>> data/contrib/templater/hotspot/src/cpu/CPU/vm/bytecodeInterpreter_CPU.inline.hpp.i@16d04ce16287:
>> no match found!
>>
>> In the above, $HOME is an NFS mount and $PWD is a local reiserfs
>> partition. ?I initially hit failures doing builds with source on $HOME
>> and the build directory on a local reiserfs partition. ?In that
>> scenario, it would fail as not being able to find files that should
>> have been copied over.
>>
>> Reverting the changeset fixes the issue. ?2.6.37.2 still has the bug.
>> I haven't checked 2.6.37.3 yet but I didn't see any NFS changes in there.
>
> Which arch are you using? As this patch is also part of the upcoming
> 2.6.38, can you check if the latest .38 is affected, too?

Looking at the Gentoo bug report, I think Andrew is using x86-64:

https://bugs.gentoo.org/show_bug.cgi?id=357121

2011-03-10 19:39:40

by Myklebust, Trond

[permalink] [raw]
Subject: Re: NFS regression in 2.6.37.1 (current stable)

On Thu, 2011-03-10 at 18:53 +0000, Dr Andrew John Hughes wrote:
> [Please CC me on responses as I'm not subscribed]
>
> Hi,
>
> I seem to have uncovered a regression in the NFS code between 2.6.37 and 2.6.37.1
> caused by this changeset:
>
> commit 55ea499d60aefa3d03a77fc8590c26b5881faa92
> Author: Trond Myklebust <[email protected]>
> Date: Sat Jan 8 17:45:38 2011 -0500
> NFS: Don't use vm_map_ram() in readdir
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.37.y.git;a=commit;h=6650239a4b01077e80d5a4468562756d77afaa59
>
> With this change applied, copying of files between NFS and non-NFS
> mounts seems to be broken. The easiest way I've found to replicate
> this myself is to use a VCS to do a clone of a tree on a NFS mount to
> a directory on a non-NFS mount. I used Mercurial, as I had Mercurial
> trees to hand from work on IcedTea, but I assume doing it with a git
> tree such as the linux tree would also work. The idea is to do
> something which involves copying over a bunch of directories and
> checking the result is readable.
>
> $ hg clone $HOME/projects/openjdk/icedtea6-hg
> destination directory: icedtea6-hg
> updating to branch default
> abort:
> data/contrib/templater/hotspot/src/cpu/CPU/vm/bytecodeInterpreter_CPU.inline.hpp.i@16d04ce16287:
> no match found!
>
> In the above, $HOME is an NFS mount and $PWD is a local reiserfs
> partition. I initially hit failures doing builds with source on $HOME
> and the build directory on a local reiserfs partition. In that
> scenario, it would fail as not being able to find files that should
> have been copied over.
>
> Reverting the changeset fixes the issue. 2.6.37.2 still has the bug.
> I haven't checked 2.6.37.3 yet but I didn't see any NFS changes in there.
> --
> Andrew :)
>
> Free Java Software Engineer
> Red Hat, Inc. (http://www.redhat.com)

It looks to me as if you are hitting the issue that was fixed in
mainline by commit d1205f87bbb8040c1408bbd9e0a720310b2b0b9b (NFS: NFSv4
readdir loses entries). That commit was labelled as "Cc:
[email protected]" but has still not made it into the 2.6.37 stable
series.

I've attached it below...

Cheers
Trond

8<-------------------------------------------------------------------
>From d1205f87bbb8040c1408bbd9e0a720310b2b0b9b Mon Sep 17 00:00:00 2001
From: Chuck Lever <[email protected]>
Date: Fri, 28 Jan 2011 12:41:05 -0500
Subject: [PATCH] NFS: NFSv4 readdir loses entries

On recent 2.6.38-rc kernels, connectathon basic test 6 fails on
NFSv4 mounts of OpenSolaris with something like:

> ./test6: readdir
> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.12' dir
entry, pass 0
> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.82' dir
entry, pass 0
> ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.164' dir
entry, pass 0
> ./test6: (/mnt/klimt/matisse.test) Test failed with 3 errors
> basic tests failed
> Tests failed, leaving /mnt/klimt mounted
> [cel@matisse cthon04]$

I narrowed the problem down to nfs4_decode_dirent() reporting that the
decode buffer had overflowed while decoding the entries for those
missing files.

verify_attr_len() assumes both it's pointer arguments reside on the
same page. When these arguments point to locations on two different
pages, verify_attr_len() can report false errors. This can happen now
that a large NFSv4 readdir result can span pages.

We have reasonably good checking in nfs4_decode_dirent() anyway, so
it should be safe to simply remove the extra checking.

At a guess, this was introduced by commit 6650239a, "NFS: Don't use
vm_map_ram() in readdir".

Cc: [email protected] [2.6.37]
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/nfs4xdr.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 009aef9..4e2c168 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -6132,9 +6132,6 @@ int nfs4_decode_dirent(struct xdr_stream *xdr,
struct nfs_entry *entry,
if (entry->fattr->valid & NFS_ATTR_FATTR_TYPE)
entry->d_type = nfs_umode_to_dtype(entry->fattr->mode);

- if (verify_attr_len(xdr, p, len) < 0)
- goto out_overflow;
-
return 0;

out_overflow:
--
1.7.4



--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2011-03-10 19:44:19

by Marc Kleine-Budde

[permalink] [raw]
Subject: Re: NFS regression in 2.6.37.1 (current stable)

On 03/10/2011 08:35 PM, Pekka Enberg wrote:
[...]
> There's some more discussion here:
>
> https://bugs.gentoo.org/show_bug.cgi?id=357121

They have identified the 55ea499d60aefa3d03a77fc8590c26b5881faa92 as the
offending commit, too. Used system is x86_64.

> Greg, Trond, as this is a regression in -stable, I wonder if it's best
> that we just revert the commit?

No - that would break arm < armv6 and IIRC parisc.

Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |


Attachments:
signature.asc (262.00 B)
OpenPGP digital signature

2011-03-10 20:51:35

by Greg KH

[permalink] [raw]
Subject: Re: [stable] NFS regression in 2.6.37.1 (current stable)

On Thu, Mar 10, 2011 at 02:39:25PM -0500, Trond Myklebust wrote:
> On Thu, 2011-03-10 at 18:53 +0000, Dr Andrew John Hughes wrote:
> > [Please CC me on responses as I'm not subscribed]
> >
> > Hi,
> >
> > I seem to have uncovered a regression in the NFS code between 2.6.37 and 2.6.37.1
> > caused by this changeset:
> >
> > commit 55ea499d60aefa3d03a77fc8590c26b5881faa92
> > Author: Trond Myklebust <[email protected]>
> > Date: Sat Jan 8 17:45:38 2011 -0500
> > NFS: Don't use vm_map_ram() in readdir
> > http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.37.y.git;a=commit;h=6650239a4b01077e80d5a4468562756d77afaa59
> >
> > With this change applied, copying of files between NFS and non-NFS
> > mounts seems to be broken. The easiest way I've found to replicate
> > this myself is to use a VCS to do a clone of a tree on a NFS mount to
> > a directory on a non-NFS mount. I used Mercurial, as I had Mercurial
> > trees to hand from work on IcedTea, but I assume doing it with a git
> > tree such as the linux tree would also work. The idea is to do
> > something which involves copying over a bunch of directories and
> > checking the result is readable.
> >
> > $ hg clone $HOME/projects/openjdk/icedtea6-hg
> > destination directory: icedtea6-hg
> > updating to branch default
> > abort:
> > data/contrib/templater/hotspot/src/cpu/CPU/vm/bytecodeInterpreter_CPU.inline.hpp.i@16d04ce16287:
> > no match found!
> >
> > In the above, $HOME is an NFS mount and $PWD is a local reiserfs
> > partition. I initially hit failures doing builds with source on $HOME
> > and the build directory on a local reiserfs partition. In that
> > scenario, it would fail as not being able to find files that should
> > have been copied over.
> >
> > Reverting the changeset fixes the issue. 2.6.37.2 still has the bug.
> > I haven't checked 2.6.37.3 yet but I didn't see any NFS changes in there.
> > --
> > Andrew :)
> >
> > Free Java Software Engineer
> > Red Hat, Inc. (http://www.redhat.com)
>
> It looks to me as if you are hitting the issue that was fixed in
> mainline by commit d1205f87bbb8040c1408bbd9e0a720310b2b0b9b (NFS: NFSv4
> readdir loses entries). That commit was labelled as "Cc:
> [email protected]" but has still not made it into the 2.6.37 stable
> series.
>
> I've attached it below...

That's because this patch does not apply to the 2.6.37-stable kernel
tree. I should have sent out an email saying that this is why it was
not included.

Just to be sure, I just now resent the "this doesn't apply" email.
Please feel free to backport it and send it to [email protected] and I
will be glad to queue it up.

thanks,

greg k-h

2011-03-11 00:09:18

by Dr Andrew John Hughes

[permalink] [raw]
Subject: Re: NFS regression in 2.6.37.1 (current stable)

On 19:59 Thu 10 Mar , Marc Kleine-Budde wrote:
> Hello Andrew,
>
> On 03/10/2011 07:53 PM, Dr Andrew John Hughes wrote:
> > [Please CC me on responses as I'm not subscribed]
> >
> > Hi,
> >
> > I seem to have uncovered a regression in the NFS code between 2.6.37 and 2.6.37.1
> > caused by this changeset:
> >
> > commit 55ea499d60aefa3d03a77fc8590c26b5881faa92
> > Author: Trond Myklebust <[email protected]>
> > Date: Sat Jan 8 17:45:38 2011 -0500
> > NFS: Don't use vm_map_ram() in readdir
> > http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.37.y.git;a=commit;h=6650239a4b01077e80d5a4468562756d77afaa59
> >
> > With this change applied, copying of files between NFS and non-NFS
> > mounts seems to be broken. The easiest way I've found to replicate
> > this myself is to use a VCS to do a clone of a tree on a NFS mount to
> > a directory on a non-NFS mount. I used Mercurial, as I had Mercurial
> > trees to hand from work on IcedTea, but I assume doing it with a git
> > tree such as the linux tree would also work. The idea is to do
> > something which involves copying over a bunch of directories and
> > checking the result is readable.
> >
> > $ hg clone $HOME/projects/openjdk/icedtea6-hg
> > destination directory: icedtea6-hg
> > updating to branch default
> > abort:
> > data/contrib/templater/hotspot/src/cpu/CPU/vm/bytecodeInterpreter_CPU.inline.hpp.i@16d04ce16287:
> > no match found!
> >
> > In the above, $HOME is an NFS mount and $PWD is a local reiserfs
> > partition. I initially hit failures doing builds with source on $HOME
> > and the build directory on a local reiserfs partition. In that
> > scenario, it would fail as not being able to find files that should
> > have been copied over.
> >
> > Reverting the changeset fixes the issue. 2.6.37.2 still has the bug.
> > I haven't checked 2.6.37.3 yet but I didn't see any NFS changes in there.
>
> Which arch are you using? As this patch is also part of the upcoming
> 2.6.38, can you check if the latest .38 is affected, too?
>

Sorry, I should have said; it's x86_64. I'll give .38 a go. It sounds,
from the other responses, like that has the required fix and it just needs
to be backported to the 2.6.37 tree.

> cheers, Marc
>
> --
> Pengutronix e.K. | Marc Kleine-Budde |
> Industrial Linux Solutions | Phone: +49-231-2826-924 |
> Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
> Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
>

Cheers,
--
Andrew :)

Free Java Software Engineer
Red Hat, Inc. (http://www.redhat.com)

2011-03-11 00:11:29

by Dr Andrew John Hughes

[permalink] [raw]
Subject: Re: NFS regression in 2.6.37.1 (current stable)

On 20:44 Thu 10 Mar , Marc Kleine-Budde wrote:
> On 03/10/2011 08:35 PM, Pekka Enberg wrote:
> [...]
> > There's some more discussion here:
> >
> > https://bugs.gentoo.org/show_bug.cgi?id=357121
>
> They have identified the 55ea499d60aefa3d03a77fc8590c26b5881faa92 as the
> offending commit, too. Used system is x86_64.
>

FWIW, that 'they' is me too; I reported it there first, but decided to bring
the discussion to the kernel lists when things didn't seem to be progressing
at the distro level.

> > Greg, Trond, as this is a regression in -stable, I wonder if it's best
> > that we just revert the commit?
>
> No - that would break arm < armv6 and IIRC parisc.
>
> Marc
> --
> Pengutronix e.K. | Marc Kleine-Budde |
> Industrial Linux Solutions | Phone: +49-231-2826-924 |
> Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
> Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
>



--
Andrew :)

Free Java Software Engineer
Red Hat, Inc. (http://www.redhat.com)