2012-08-27 22:02:36

by Simon Kirby

Subject: [3.6-rc3] rdirplus broken? (EBUSY)

Hello!

Something seems broken in 3.6-rc[123] which was fine in 3.5 and earlier.
This is a 3.4.1 knfsd server with ext3 and XFS-based NFS exports:

/ 192.168.13.0/24(rw,no_root_squash,no_subtree_check,async)
/pics 192.168.13.0/24(rw,no_root_squash,no_subtree_check,async)
/raid 192.168.13.0/24(rw,no_root_squash,no_subtree_check,async)

and a 3.6-rc3 client with this in fstab:

flick:/ /flick nfs rw,vers=3
flick:/raid /flick/raid nfs rw,vers=3
flick:/pics /flick/pics nfs rw,vers=3

This seems to fail now as follows:

[sroot@oof:/]# mount flick
[sroot@oof:/]# mount flick/raid
[sroot@oof:/]# mount flick/pics
[sroot@oof:/]# ls -l flick
ls: cannot access flick/pics: Device or resource busy
ls: cannot access flick/raid: Device or resource busy
total 2180
drwxr-xr-x 45 root root 4096 Jun 18 14:19 ./
drwxr-xr-x 58 root root 4096 Jul 3 22:24 ../
...
?????????? ? ? ? ? ? pics
?????????? ? ? ? ? ? raid
...
[sroot@oof:/]# cd flick/pics
flick/pics: Device or resource busy.

These mount points are now stuck and cannot be unmounted until
I reboot (umount -l fails with EBUSY).

If I mount with "nordirplus", I can't seem to get it to break. However,
sometimes it will work regardless. I can bisect this if it would help..

Simon-
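
To check for the symptom without reading "ls -l" output by eye, a small script can stat every entry in a directory and report the ones that fail with EBUSY. This is only a sketch: the function name and the stat_fn parameter are made up for illustration, and it assumes the failure shows up on lstat() the same way it does for ls.

```python
import errno
import os
import sys


def find_busy_entries(directory, stat_fn=os.lstat):
    """Return the sorted names in `directory` whose stat call fails with EBUSY.

    `stat_fn` is a hypothetical hook (defaults to os.lstat) so the check can
    be exercised without a broken NFS mount.
    """
    busy = []
    for name in sorted(os.listdir(directory)):
        try:
            stat_fn(os.path.join(directory, name))
        except OSError as exc:
            # Only EBUSY is of interest here; other errors are ignored.
            if exc.errno == errno.EBUSY:
                busy.append(name)
    return busy


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in find_busy_entries(target):
        print(f"{name}: Device or resource busy")
```

Run against the parent mount (here that would be /flick) once with and once without "nordirplus" to compare the two cases.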


2012-09-20 00:13:05

by Simon Kirby

Subject: Re: [3.6-rc3] rdirplus broken? (EBUSY)

On Wed, Sep 12, 2012 at 02:20:35PM -0700, Simon Kirby wrote:

> On Wed, Sep 12, 2012 at 08:16:13AM -0400, J. Bruce Fields wrote:
>
> > The symptoms sound similar to
> > http://marc.info/?l=linux-fsdevel&m=134738157303017&w=2
> >
> > Might be worth checking whether it's that patch?
>
> Indeed! I tried this hack:
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 8086636..649a112 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -2404,6 +2404,10 @@ out_unalias:
>  	if (likely(!d_mountpoint(alias))) {
>  		__d_move(alias, dentry);
>  		ret = alias;
> +	} else {
> +		printk(KERN_WARNING "VFS: __d_move()ing a d_mountpoint(), uh oh\n");
> +		__d_move(alias, dentry);
> +		ret = alias;
>  	}
>  out_err:
>  	spin_unlock(&inode->i_lock);
>
> With this applied, "ls -l flick" prints:
>
> [ 77.217420] VFS: __d_move()ing a d_mountpoint(), uh oh
> [ 77.222390] VFS: __d_move()ing a d_mountpoint(), uh oh
>
> ...and "pics" and "raid" then work as they did before, or with "nordirplus"
> set. So, is something broken with nordirplus or the NFS layer, or should
> __d_unalias() really move a mountpoint? With nordirplus, it works without
> complaining about moving a mountpoint.

By the way, this seems fixed in 3.6-rc6, likely due to
c3f52af3e03013db5237e339c817beaae5ec9e3a. Thanks!

Simon-

2012-09-11 19:25:24

by Simon Kirby

Subject: Re: [3.6-rc3] rdirplus broken? (EBUSY)

On Mon, Aug 27, 2012 at 02:55:10PM -0700, Simon Kirby wrote:

> Hello!
>
> Something seems broken in 3.6-rc[123] which was fine in 3.5 and earlier.
> This is a 3.4.1 knfsd server with ext3 and XFS-based NFS exports:
>
> / 192.168.13.0/24(rw,no_root_squash,no_subtree_check,async)
> /pics 192.168.13.0/24(rw,no_root_squash,no_subtree_check,async)
> /raid 192.168.13.0/24(rw,no_root_squash,no_subtree_check,async)
>
> and a 3.6-rc3 client with this in fstab:
>
> flick:/ /flick nfs rw,vers=3
> flick:/raid /flick/raid nfs rw,vers=3
> flick:/pics /flick/pics nfs rw,vers=3
>
> This seems to fail now as follows:
>
> [sroot@oof:/]# mount flick
> [sroot@oof:/]# mount flick/raid
> [sroot@oof:/]# mount flick/pics
> [sroot@oof:/]# ls -l flick
> ls: cannot access flick/pics: Device or resource busy
> ls: cannot access flick/raid: Device or resource busy
> total 2180
> drwxr-xr-x 45 root root 4096 Jun 18 14:19 ./
> drwxr-xr-x 58 root root 4096 Jul 3 22:24 ../
> ...
> ?????????? ? ? ? ? ? pics
> ?????????? ? ? ? ? ? raid
> ...
> [sroot@oof:/]# cd flick/pics
> flick/pics: Device or resource busy.
>
> These mount points are now stuck and cannot be unmounted until
> I reboot (umount -l fails with EBUSY).
>
> If I mount with "nordirplus", I can't seem to get it to break. However,
> sometimes it will work regardless. I can bisect this if it would help..

This is still the case with 3.6-rc5. I hadn't noticed any problem since
mounting with nordirplus, and it broke immediately after removing the
option again. I will bisect.

Simon-

2012-09-12 12:16:15

by J. Bruce Fields

Subject: Re: [3.6-rc3] rdirplus broken? (EBUSY)

On Tue, Sep 11, 2012 at 12:25:23PM -0700, Simon Kirby wrote:
> On Mon, Aug 27, 2012 at 02:55:10PM -0700, Simon Kirby wrote:
>
> > Hello!
> >
> > Something seems broken in 3.6-rc[123] which was fine in 3.5 and earlier.
> > This is a 3.4.1 knfsd server with ext3 and XFS-based NFS exports:
> >
> > / 192.168.13.0/24(rw,no_root_squash,no_subtree_check,async)
> > /pics 192.168.13.0/24(rw,no_root_squash,no_subtree_check,async)
> > /raid 192.168.13.0/24(rw,no_root_squash,no_subtree_check,async)
> >
> > and a 3.6-rc3 client with this in fstab:
> >
> > flick:/ /flick nfs rw,vers=3
> > flick:/raid /flick/raid nfs rw,vers=3
> > flick:/pics /flick/pics nfs rw,vers=3
> >
> > This seems to fail now as follows:
> >
> > [sroot@oof:/]# mount flick
> > [sroot@oof:/]# mount flick/raid
> > [sroot@oof:/]# mount flick/pics
> > [sroot@oof:/]# ls -l flick
> > ls: cannot access flick/pics: Device or resource busy
> > ls: cannot access flick/raid: Device or resource busy
> > total 2180
> > drwxr-xr-x 45 root root 4096 Jun 18 14:19 ./
> > drwxr-xr-x 58 root root 4096 Jul 3 22:24 ../
> > ...
> > ?????????? ? ? ? ? ? pics
> > ?????????? ? ? ? ? ? raid
> > ...
> > [sroot@oof:/]# cd flick/pics
> > flick/pics: Device or resource busy.
> >
> > These mount points are now stuck and cannot be unmounted until
> > I reboot (umount -l fails with EBUSY).
> >
> > If I mount with "nordirplus", I can't seem to get it to break. However,
> > sometimes it will work regardless. I can bisect this if it would help..
>
> This is still the case with 3.6-rc5. I hadn't noticed any problem since
> mounting with nordirplus, and it broke immediately after removing the
> option again. I will bisect.

The symptoms sound similar to
http://marc.info/?l=linux-fsdevel&m=134738157303017&w=2

Might be worth checking whether it's that patch?

--b.

2012-09-12 21:20:37

by Simon Kirby

Subject: Re: [3.6-rc3] rdirplus broken? (EBUSY)

On Wed, Sep 12, 2012 at 08:16:13AM -0400, J. Bruce Fields wrote:

> On Tue, Sep 11, 2012 at 12:25:23PM -0700, Simon Kirby wrote:
> > On Mon, Aug 27, 2012 at 02:55:10PM -0700, Simon Kirby wrote:
> >
> > > Something seems broken in 3.6-rc[123] which was fine in 3.5 and earlier.
> > > This is a 3.4.1 knfsd server with ext3 and XFS-based NFS exports:
> > >
> > > / 192.168.13.0/24(rw,no_root_squash,no_subtree_check,async)
> > > /pics 192.168.13.0/24(rw,no_root_squash,no_subtree_check,async)
> > > /raid 192.168.13.0/24(rw,no_root_squash,no_subtree_check,async)
> > >
> > > and a 3.6-rc3 client with this in fstab:
> > >
> > > flick:/ /flick nfs rw,vers=3
> > > flick:/raid /flick/raid nfs rw,vers=3
> > > flick:/pics /flick/pics nfs rw,vers=3
> > >
> > > This seems to fail now as follows:
> > >
> > > [sroot@oof:/]# mount flick
> > > [sroot@oof:/]# mount flick/raid
> > > [sroot@oof:/]# mount flick/pics
> > > [sroot@oof:/]# ls -l flick
> > > ls: cannot access flick/pics: Device or resource busy
> > > ls: cannot access flick/raid: Device or resource busy
> > > total 2180
> > > drwxr-xr-x 45 root root 4096 Jun 18 14:19 ./
> > > drwxr-xr-x 58 root root 4096 Jul 3 22:24 ../
> > > ...
> > > ?????????? ? ? ? ? ? pics
> > > ?????????? ? ? ? ? ? raid
> > > ...
> > > [sroot@oof:/]# cd flick/pics
> > > flick/pics: Device or resource busy.
> > >
> > > These mount points are now stuck and cannot be unmounted until
> > > I reboot (umount -l fails with EBUSY).
> > >
> > > If I mount with "nordirplus", I can't seem to get it to break. However,
> > > sometimes it will work regardless. I can bisect this if it would help..
> >
> > This is still the case with 3.6-rc5. I hadn't noticed any problem since
> > mounting with nordirplus, and it broke immediately after removing the
> > option again. I will bisect.
>
> The symptoms sound similar to
> http://marc.info/?l=linux-fsdevel&m=134738157303017&w=2
>
> Might be worth checking whether it's that patch?

Indeed! I tried this hack:

diff --git a/fs/dcache.c b/fs/dcache.c
index 8086636..649a112 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2404,6 +2404,10 @@ out_unalias:
 	if (likely(!d_mountpoint(alias))) {
 		__d_move(alias, dentry);
 		ret = alias;
+	} else {
+		printk(KERN_WARNING "VFS: __d_move()ing a d_mountpoint(), uh oh\n");
+		__d_move(alias, dentry);
+		ret = alias;
 	}
 out_err:
 	spin_unlock(&inode->i_lock);

With this applied, "ls -l flick" prints:

[ 77.217420] VFS: __d_move()ing a d_mountpoint(), uh oh
[ 77.222390] VFS: __d_move()ing a d_mountpoint(), uh oh

...and "pics" and "raid" then work as they did before, or with "nordirplus"
set. So, is something broken with nordirplus or the NFS layer, or should
__d_unalias() really move a mountpoint? With nordirplus, it works without
complaining about moving a mountpoint.

Simon-

2012-10-03 01:17:55

by Simon Kirby

Subject: Re: [3.6-rc3] rdirplus broken? (EBUSY)

On Wed, Sep 19, 2012 at 05:13:03PM -0700, Simon Kirby wrote:

> On Wed, Sep 12, 2012 at 02:20:35PM -0700, Simon Kirby wrote:
>
> > On Wed, Sep 12, 2012 at 08:16:13AM -0400, J. Bruce Fields wrote:
> >
> > > The symptoms sound similar to
> > > http://marc.info/?l=linux-fsdevel&m=134738157303017&w=2
> > >
> > > Might be worth checking whether it's that patch?
> >
> > Indeed! I tried this hack:
> >
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index 8086636..649a112 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -2404,6 +2404,10 @@ out_unalias:
> >  	if (likely(!d_mountpoint(alias))) {
> >  		__d_move(alias, dentry);
> >  		ret = alias;
> > +	} else {
> > +		printk(KERN_WARNING "VFS: __d_move()ing a d_mountpoint(), uh oh\n");
> > +		__d_move(alias, dentry);
> > +		ret = alias;
> >  	}
> >  out_err:
> >  	spin_unlock(&inode->i_lock);
> >
> > With this applied, "ls -l flick" prints:
> >
> > [ 77.217420] VFS: __d_move()ing a d_mountpoint(), uh oh
> > [ 77.222390] VFS: __d_move()ing a d_mountpoint(), uh oh
> >
> > ...and "pics" and "raid" then work as they did before, or with "nordirplus"
> > set. So, is something broken with nordirplus or the NFS layer, or should
> > __d_unalias() really move a mountpoint? With nordirplus, it works without
> > complaining about moving a mountpoint.
>
> By the way, this seems fixed in 3.6-rc6, likely due to
> c3f52af3e03013db5237e339c817beaae5ec9e3a. Thanks!

I confused myself with my own patch here. This still happens to me in
the 3.6 release, so I cannot use these NFS mounts unless I set
"nordirplus" or apply my call-__d_move-anyway patch above.

I'm also getting file data corruption when mounted over TCP, for some stupid
reason, even with all TSO/GSO/GRO disabled, and this goes away with UDP.
Reproducible on different client hardware, and on client kernels back to
2.6.32. Probably related to the 3.4.1 server. More debugging to do...

Simon-
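
One way to narrow down corruption like this is to read the same file through a TCP mount and a UDP mount of the same export and find where the data first diverges. The following is a rough sketch under assumptions: the helper name and the example paths are hypothetical, and it reports the start of the first differing chunk, not the exact byte.

```python
def first_mismatch(path_a, path_b, chunk_size=1 << 16):
    """Return the offset of the first chunk that differs, or None if equal.

    Both files are read in `chunk_size` pieces; a length difference also
    counts as a mismatch at the chunk where one file ends early.
    """
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                return offset
            if not chunk_a:
                # Both files ended at the same point with no difference.
                return None
            offset += len(chunk_a)


if __name__ == "__main__":
    # Hypothetical paths: the same export mounted twice, once per transport.
    print(first_mismatch("/mnt/tcp/some/file", "/mnt/udp/some/file"))
```

Cached pages on the client can hide or repeat a bad read, so results are more trustworthy after a fresh mount.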

2013-08-27 15:33:52

by J. Bruce Fields

Subject: Re: [3.6-rc3] rdirplus broken? (EBUSY)

On Tue, Oct 02, 2012 at 06:17:53PM -0700, Simon Kirby wrote:
> I'm also getting file data corruption when mounted over TCP, for some stupid
> reason, even with all TSO/GSO/GRO disabled, and this goes away with UDP.
> Reproducible on different client hardware, and on client kernels back to
> 2.6.32. Probably related to the 3.4.1 server. More debugging to do...

Is this still happening?

--b.