2013-05-10 19:55:56

by Andreas Radke

Subject: NFSv4 ACL issues since 3.0.72 leading to corrupt files (xfs)

I'm running a server with an NFSv4-shared drive that also hosts a Mail
directory, which I serve with the Dovecot IMAP server. Dovecot's ACL
plugin is disabled.

Since the 4 NFSv4 ACL related commits in kernel 3.0.72 (and all other
stable kernels), every kernel up to the most recent stable releases
gives errors like this when changing message flags:

imap(andyrtr):
Error:maildir_file_do(/mnt/share/andyrtr/Mail/.Mailinglisten.Linux-stable/cur/msg.MFVZBC:2,):
Filename keeps changing

These are file system errors: I have to run xfs_repair on the drive
and then stay on an outdated kernel.

I think this is a serious regression. Can somebody confirm it and
tell me if this should be reported somewhere to a bug tracker
against nfs/xfs/... and how to solve this?

-Andy
ArchLinux


2013-05-11 09:56:39

by Andreas Radke

Subject: Re: NFSv4 ACL issues since 3.0.72 leading to corrupt files (xfs)

On Sat, 11 May 2013 02:56:33 +0000
"Myklebust, Trond" <[email protected]> wrote:

> On Fri, 2013-05-10 at 21:55 +0200, Andreas Radke wrote:
> > I'm running a server with an NFSv4-shared drive that also hosts a
> > Mail directory, which I serve with the Dovecot IMAP server.
> > Dovecot's ACL plugin is disabled.
> >
> > Since the 4 NFSv4 ACL related commits in kernel 3.0.72 (and all
> > other stable kernels), every kernel up to the most recent stable
> > releases gives errors like this when changing message flags:
>
> Which 4 NFSv4 acl related commits?

It should be one of these commits:

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.0.y&id=7e36f505caf7882b6cc89ecedcd7f26749ef917a
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.0.y&id=2c34b4ae8f8228e1ec083be0333426eca4a31357
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.0.y&id=01b140abad66f022ff6dff7cc1307b07281035fa
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.0.y&id=c938c22b48302eeb9a6f3cc83f223f37d98ba6f7

> Have you bisected in order to figure out which one is the culprit? If
> not, how do you know that this is related to NFSv4 acl changes?

The problem started with kernel 3.0.72 and shows up in every other
stable kernel that includes these commits (my distro only shipped the
3.0.x and 3.8.x trees at that time). All other commits from 3.0.71 to
3.0.72 look unrelated to my server setup. Local clients can run
up-to-date kernels without any problems. Other people have also
reported issues with ACLs on servers since that update.

I can reproduce the issue instantly by marking lots of mails as
read/unread, which immediately leads to corrupt files on the server
drive.

Because it hosts important data I can't experiment too much with that
machine. It would be nice if you could point me to the related commit.
It will cost me a lot of time to build a kernel with the patches
reverted one by one, and I'm afraid of losing more data.

-Andy
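
Note: reverting the four patches one by one, as described above, needs
the commit ids from the cgit links. The following is only a minimal
sketch of pulling the ids out of those URLs and printing one
`git revert` command per commit. The helper name `commit_id` is
hypothetical, and the sketch assumes a checkout of the linux-3.0.y
stable branch; the order in which the commits have to be reverted is
not stated in this thread.

```python
from urllib.parse import urlparse, parse_qs

BASE = "https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/"

# The four links quoted in the message above.
urls = [
    BASE + "?h=linux-3.0.y&id=7e36f505caf7882b6cc89ecedcd7f26749ef917a",
    BASE + "?h=linux-3.0.y&id=2c34b4ae8f8228e1ec083be0333426eca4a31357",
    BASE + "?h=linux-3.0.y&id=01b140abad66f022ff6dff7cc1307b07281035fa",
    BASE + "?h=linux-3.0.y&id=c938c22b48302eeb9a6f3cc83f223f37d98ba6f7",
]


def commit_id(url: str) -> str:
    """Return the value of the 'id' query parameter of a cgit commit URL."""
    query = parse_qs(urlparse(url).query)
    return query["id"][0]


# One revert command per commit; nothing is executed here, only printed.
revert_commands = [f"git revert --no-edit {commit_id(u)}" for u in urls]
for cmd in revert_commands:
    print(cmd)
```

Printing the commands instead of running them keeps the sketch safe to
try on a machine that holds real data.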

2013-05-12 10:20:06

by Andreas Radke

Subject: Re: NFSv4 ACL issues since 3.0.72 leading to corrupt files (xfs)

On Sun, 12 May 2013 03:59:47 +0000
"Myklebust, Trond" <[email protected]> wrote:

> Only one of those is an NFS server change (the one which rejects
> negative server lengths). Unless your NFS server is also acting as an
> NFS client, it is highly unlikely that the other patches are relevant.
>
> So the relevant commit to revert should be
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.0.y&id=7e36f505caf7882b6cc89ecedcd7f26749ef917a
>

Thanks for your help. Unfortunately, even reverting all 4 commits didn't
solve my file corruption issues. Now I need to check which other
commit in 3.0.72 could cause this.

Sorry for the noise.

-Andy
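
Note: finding which other commit between 3.0.71 and 3.0.72 is
responsible is the job bisection does (in practice via `git bisect`).
The following is only a sketch of the idea, not a tool to run against
the server: the commit names c0..c9 and the `corrupts_files` check are
hypothetical placeholders for "build this kernel, boot it, and run the
read/unread reproducer". It assumes the commits are ordered oldest to
newest, that the newest one is bad, and that every commit after the
culprit is bad as well.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the last one is bad,
    and once a commit is bad every later commit is bad too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # culprit is commits[mid] or an earlier commit
        else:
            lo = mid + 1    # culprit is a later commit than commits[mid]
    return commits[lo]


# Hypothetical stand-in for the history between two releases.
history = [f"c{i}" for i in range(10)]
tested = []


def corrupts_files(commit: str) -> bool:
    # Placeholder check: pretend c6 introduced the regression.
    tested.append(commit)
    return history.index(commit) >= 6


culprit = first_bad_commit(history, corrupts_files)
print(culprit, "found after", len(tested), "test builds")
```

Each step halves the remaining range, so the number of kernels to build
and test grows with the logarithm of the number of commits rather than
with the number of commits itself.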

2013-05-11 02:56:34

by Myklebust, Trond

Subject: Re: NFSv4 ACL issues since 3.0.72 leading to corrupt files (xfs)

On Fri, 2013-05-10 at 21:55 +0200, Andreas Radke wrote:
> I'm running a server with an NFSv4-shared drive that also hosts a Mail
> directory, which I serve with the Dovecot IMAP server. Dovecot's ACL
> plugin is disabled.
>
> Since the 4 NFSv4 ACL related commits in kernel 3.0.72 (and all other
> stable kernels), every kernel up to the most recent stable releases
> gives errors like this when changing message flags:

Which 4 NFSv4 acl related commits?

Have you bisected in order to figure out which one is the culprit? If
not, how do you know that this is related to NFSv4 acl changes?

> imap(andyrtr):
> Error:maildir_file_do(/mnt/share/andyrtr/Mail/.Mailinglisten.Linux-stable/cur/msg.MFVZBC:2,):
> Filename keeps changing
>
> These are file system errors: I have to run xfs_repair on the drive
> and then stay on an outdated kernel.
>
> I think this is a serious regression. Can somebody confirm it and
> tell me if this should be reported somewhere to a bug tracker
> against nfs/xfs/... and how to solve this?
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2013-05-12 03:59:51

by Myklebust, Trond

Subject: Re: NFSv4 ACL issues since 3.0.72 leading to corrupt files (xfs)

On Sat, 2013-05-11 at 11:56 +0200, Andreas Radke wrote:
> On Sat, 11 May 2013 02:56:33 +0000
> "Myklebust, Trond" <[email protected]> wrote:
>
> > On Fri, 2013-05-10 at 21:55 +0200, Andreas Radke wrote:
> > > I'm running a server with an NFSv4-shared drive that also hosts a
> > > Mail directory, which I serve with the Dovecot IMAP server.
> > > Dovecot's ACL plugin is disabled.
> > >
> > > Since the 4 NFSv4 ACL related commits in kernel 3.0.72 (and all
> > > other stable kernels), every kernel up to the most recent stable
> > > releases gives errors like this when changing message flags:
> >
> > Which 4 NFSv4 acl related commits?
>
> It should be one of these commits:
>
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.0.y&id=7e36f505caf7882b6cc89ecedcd7f26749ef917a
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.0.y&id=2c34b4ae8f8228e1ec083be0333426eca4a31357
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.0.y&id=01b140abad66f022ff6dff7cc1307b07281035fa
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.0.y&id=c938c22b48302eeb9a6f3cc83f223f37d98ba6f7
>
> > Have you bisected in order to figure out which one is the culprit? If
> > not, how do you know that this is related to NFSv4 acl changes?
>
> The problem started with kernel 3.0.72 and shows up in every other
> stable kernel that includes these commits (my distro only shipped the
> 3.0.x and 3.8.x trees at that time). All other commits from 3.0.71 to
> 3.0.72 look unrelated to my server setup. Local clients can run
> up-to-date kernels without any problems. Other people have also
> reported issues with ACLs on servers since that update.
>
> I can reproduce the issue instantly by marking lots of mails as
> read/unread, which immediately leads to corrupt files on the server
> drive.
>
> Because it hosts important data I can't experiment too much with that
> machine. It would be nice if you could point me to the related commit.
> It will cost me a lot of time to build a kernel with the patches
> reverted one by one, and I'm afraid of losing more data.
>
> -Andy

Only one of those is an NFS server change (the one which rejects
negative server lengths). Unless your NFS server is also acting as an
NFS client, it is highly unlikely that the other patches are relevant.

So the relevant commit to revert should be
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.0.y&id=7e36f505caf7882b6cc89ecedcd7f26749ef917a

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com