2006-11-14 18:09:41

by Fredrik Lindgren

Subject: NFS corruption in 2.6.18.2?

Hello

We're running a mail system with Linux machines being served by two
NetApps. (Debian stable, our "own" kernel off kernel.org)

At present we're running 2.6.13 kernels; we had some corruption issues
before that, which were fixed in 2.6.13. However, when we tried to upgrade
to 2.6.18.2, we saw corruption again.

Before 2.6.13, the problem seemed to be that the file size was being
cached, which meant that sometimes there were blocks of NULL
characters in the files.
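
The cached-size mechanism can be reproduced on a local file as an analogy: a write that lands past the real end of file leaves a gap that reads back as NUL bytes. The sketch below assumes a hypothetical sequence in which the writer remembers an old, larger size while the file is shrunk behind its back (truncation merely stands in for whatever changed the size). It illustrates POSIX hole semantics only and is not a model of the NFS client's code; the function name and parameters are made up for illustration.

```python
import os
import tempfile


def simulate_stale_size_write(initial_len: int = 100,
                              truncated_len: int = 40,
                              payload: bytes = b"MSG") -> bytes:
    """Append at an out-of-date end-of-file offset and return the result.

    Hypothetical scenario: the writer last saw the file at initial_len
    bytes, the file is then shrunk to truncated_len behind its back, and
    the writer appends at the size it remembers instead of the real one.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "spool")
        with open(path, "wb") as f:
            f.write(b"A" * initial_len)

        cached_size = os.path.getsize(path)  # the writer's soon-stale view
        os.truncate(path, truncated_len)     # size changes behind its back

        fd = os.open(path, os.O_WRONLY)
        try:
            # Writing past the real EOF leaves a hole that reads as zeros.
            os.pwrite(fd, payload, cached_size)
        finally:
            os.close(fd)

        with open(path, "rb") as f:
            return f.read()


data = simulate_stale_size_write()
print(data[38:44])          # b'AA\x00\x00\x00\x00'
print(data.count(b"\x00"))  # 60
```

The 60 zero bytes are exactly the distance between the real size (40) and the remembered size (100), which is why a stale size shows up as a block of NULs rather than as garbled text.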

With 2.6.18.2 we see blocks of NULL chars in the data again, but this time
they are sometimes in the middle of a message. Before 2.6.13 that didn't
happen; then there were just big blocks of NULL chars between two messages.
The one consistent factor is that it only occurs when the one machine
(out of 5) running 2.6.18.2 has delivered a message to the spool file.
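
To check a spool file for the kind of damage described here, a small scanner that reports runs of NUL bytes may help. This is a minimal sketch, assuming that legitimate mail in the spool never contains long NUL runs; the 16-byte threshold is an arbitrary choice, the function names are hypothetical, and scan_spool reads the whole file into memory.

```python
import re


def find_nul_runs(data: bytes, min_len: int = 16) -> list:
    """Return (offset, length) pairs for every run of >= min_len NUL bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


def scan_spool(path: str, min_len: int = 16) -> list:
    """Read a whole spool file and report suspicious NUL runs in it."""
    with open(path, "rb") as f:
        return find_nul_runs(f.read(), min_len)


sample = b"From a@b\n" + b"\x00" * 4096 + b"Subject: hi\n"
print(find_nul_runs(sample))  # [(9, 4096)]
```

The reported offsets make it easy to tell whether a block sits between two messages (the pre-2.6.13 pattern) or inside one (the 2.6.18.2 pattern).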

I don't know if it's relevant, but when checking the NFS stats I see
the 2.6.18.2 machine doing almost exactly half as many "GetAttr"
calls as the 2.6.13 machines.

Is this something anyone else has seen?

Also, on a side note, the statistics still seem to use signed
values, so we're seeing negative numbers on some stats after some
uptime. This is true both for "nfsstat" and for "cat /proc/net/rpc/nfs".
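
A sketch for comparing GETATTR counts between machines and for undoing the sign flip. It assumes the usual layout of /proc/net/rpc/nfs, where the proc3 line holds the number of NFSv3 procedures followed by one counter per procedure in RFC 1813 order, and that the counters going negative are 32-bit values printed as signed integers. Masking with 0xFFFFFFFF recovers the unsigned value, but cannot tell how many times a counter has wrapped past 2**32. The helper name is hypothetical.

```python
# NFSv3 procedure names in procedure-number order (RFC 1813).
NFS3_PROCS = [
    "null", "getattr", "setattr", "lookup", "access", "readlink",
    "read", "write", "create", "mkdir", "symlink", "mknod",
    "remove", "rmdir", "rename", "link", "readdir", "readdirplus",
    "fsstat", "fsinfo", "pathconf", "commit",
]


def parse_proc3(stats_text: str) -> dict:
    """Map NFSv3 procedure names to call counts from /proc/net/rpc/nfs text.

    Counters printed as negative numbers are reinterpreted as unsigned
    32-bit values; this only undoes a sign flip, not a full wrap past 2**32.
    """
    for line in stats_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "proc3":
            count = int(fields[1])  # number of per-procedure counters
            values = [int(v) & 0xFFFFFFFF for v in fields[2:2 + count]]
            return dict(zip(NFS3_PROCS, values))
    raise ValueError("no proc3 line found")


sample = "rpc 10 0 0\nproc3 22 0 -2 " + " ".join(["5"] * 20) + "\n"
print(parse_proc3(sample)["getattr"])  # 4294967294
```

On a live client, reading /proc/net/rpc/nfs twice some minutes apart on each machine and subtracting the "getattr" values gives comparable rates, which avoids comparing machines with different uptimes.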

Regards,
Fredrik Lindgren





-------------------------------------------------------------------------
SF.net email is sponsored by: A Better Job is Waiting for You - Find it Now.
Check out Slashdot's new job board. Browse through tons of technical jobs
posted by companies looking to hire people just like you.
http://jobs.slashdot.org/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2006-11-25 18:27:16

by Kasparek Tomas

Subject: Re: NFS corruption in 2.6.18.2?

On Fri, Nov 24, 2006 at 02:11:55PM -0500, Trond Myklebust wrote:
> > > > On Thu, Nov 23, 2006 at 05:17:38PM +0100, Kasparek Tomas wrote:
> > > > Just update: 2.6.17.14 is ok
> > >
> > > did git bisect between 2.6.17 and 2.6.18 and found that the commit is:
> > >
> > > 44b11874ff583b6e766a05856b04f3c492c32b84
> > > NFS: Separate metadata and page cache revalidation mechanisms
> > >
> > > will verify (with and without patch) tomorrow.
> >
> > 2.6.18.3 with reversed 44b11874ff583b6e766a05856b04f3c492c32b84 is OK.
> >
> > exact patch used included.
>
> If you are not using any form of synchronisation, then your test would
> appear to be violating the close-to-open cache consistency rules (see
> http://nfs.sourceforge.net/#faq_a8).

Yes, I understand this, but with no synchronization I should get old or
somewhat 'not current' data at worst; I don't expect to get data
generated by the kernel. The test is synthetic, constructed to be
as simple as possible, but it works well on 2.6.16 (meaning no zeros; data
are sometimes overwritten or mixed from several clients as a result of no
locking, but that's OK for my application).

I think the real bug/problem/misbehaviour is somewhere other than in the
patch I mention; this patch just enables different behaviour that gives
the real bad code a chance to show up.

I will try to find this bad code next week.

Regards

--

Tomas Kasparek, PhD student E-mail: [email protected]
CVT FIT VUT Brno, BI/140a Web: http://www.fit.vutbr.cz/~kasparek
Bozetechova 2, 612 66 Fax: +420 54114-1270
Brno, Czech Republic Phone: +420 54114-1220

ICQ: 293092805 jabber: [email protected]
GPG: 2F1E 1AAF FD3B CFA3 1537 63BD DCBE 18FF A035 53BC



2006-11-23 16:17:49

by Kasparek Tomas

Subject: Re: NFS corruption in 2.6.18.2?


> > I have seen this behaviour with kernel 2.6.18 and above up to 19-rc4.
> > Reported this, but no response.
> >
> > http://lkml.org/lkml/2006/9/28/89
>
> Your test script doesn't use any form of locking. How are you ensuring
> that only one client has the file open at a time?

I use no locking at all; it's just a synthetic test. I don't expect the data to
be ordered or anything, but the problem is that blocks of zeros get inserted.
It behaves correctly with the latest 2.6.16.32; the zeros appear when the clients
run 2.6.18.3 or 2.6.19-rc4.

I'm going to do more testing in the next few days, so I hope to bring some new info.

As I wrote before, it does not depend on the server (tried with FreeBSD
server and several versions of linux 2.6.16.x and 2.6.18.x).

Regards




2006-11-23 17:25:08

by Kasparek Tomas

Subject: Re: NFS corruption in 2.6.18.2?

On Thu, Nov 23, 2006 at 05:17:38PM +0100, Kasparek Tomas wrote:
>
> > > I have seen this behaviour with kernel 2.6.18 and above up to 19-rc4.
> > > Reported this, but no response.
> > >
> > > http://lkml.org/lkml/2006/9/28/89
> >
> > Your test script doesn't use any form of locking. How are you ensuring
> > that only one client has the file open at a time?
>
> I use no locking at all; it's just a synthetic test. I don't expect the data to
> be ordered or anything, but the problem is that blocks of zeros get inserted.
> It behaves correctly with the latest 2.6.16.32; the zeros appear when the clients
> run 2.6.18.3 or 2.6.19-rc4.
>
> I'm going to do more testing in the next few days, so I hope to bring some new info.
>
> As I wrote before, it does not depend on the server (tried with FreeBSD
> server and several versions of linux 2.6.16.x and 2.6.18.x).

Just update: 2.6.17.14 is ok




2006-11-23 21:33:20

by Kasparek Tomas

Subject: Re: NFS corruption in 2.6.18.2?

On Thu, Nov 23, 2006 at 06:24:56PM +0100, Kasparek Tomas wrote:
> On Thu, Nov 23, 2006 at 05:17:38PM +0100, Kasparek Tomas wrote:
> >
> > > > I have seen this behaviour with kernel 2.6.18 and above up to 19-rc4.
> > > > Reported this, but no response.
> > > >
> > > > http://lkml.org/lkml/2006/9/28/89
> > >
> > > Your test script doesn't use any form of locking. How are you ensuring
> > > that only one client has the file open at a time?
> >
> > I use no locking at all; it's just a synthetic test. I don't expect the data to
> > be ordered or anything, but the problem is that blocks of zeros get inserted.
> > It behaves correctly with the latest 2.6.16.32; the zeros appear when the clients
> > run 2.6.18.3 or 2.6.19-rc4.
> >
> > I'm going to do more testing in the next few days, so I hope to bring some new info.
> >
> > As I wrote before, it does not depend on the server (tried with FreeBSD
> > server and several versions of linux 2.6.16.x and 2.6.18.x).
>
> Just update: 2.6.17.14 is ok

did git bisect between 2.6.17 and 2.6.18 and found that the commit is:

44b11874ff583b6e766a05856b04f3c492c32b84
NFS: Separate metadata and page cache revalidation mechanisms

will verify (with and without patch) tomorrow.

Bye




2006-11-24 19:12:16

by Trond Myklebust

Subject: Re: NFS corruption in 2.6.18.2?

On Fri, 2006-11-24 at 07:47 +0100, Kasparek Tomas wrote:
> On Thu, Nov 23, 2006 at 10:33:11PM +0100, Kasparek Tomas wrote:
> > On Thu, Nov 23, 2006 at 06:24:56PM +0100, Kasparek Tomas wrote:
> > > On Thu, Nov 23, 2006 at 05:17:38PM +0100, Kasparek Tomas wrote:
> > > >
> > > > > > I have seen this behaviour with kernel 2.6.18 and above up to 19-rc4.
> > > > > > Reported this, but no response.
> > > > > >
> > > > > > http://lkml.org/lkml/2006/9/28/89
> > > > >
> > > > > Your test script doesn't use any form of locking. How are you ensuring
> > > > > that only one client has the file open at a time?
> > > >
> > > > I use no locking at all; it's just a synthetic test. I don't expect the data to
> > > > be ordered or anything, but the problem is that blocks of zeros get inserted.
> > > > It behaves correctly with the latest 2.6.16.32; the zeros appear when the clients
> > > > run 2.6.18.3 or 2.6.19-rc4.
> > > >
> > > > I'm going to do more testing in the next few days, so I hope to bring some new info.
> > > >
> > > > As I wrote before, it does not depend on the server (tried with FreeBSD
> > > > server and several versions of linux 2.6.16.x and 2.6.18.x).
> > >
> > > Just update: 2.6.17.14 is ok
> >
> > did git bisect between 2.6.17 and 2.6.18 and found that the commit is:
> >
> > 44b11874ff583b6e766a05856b04f3c492c32b84
> > NFS: Separate metadata and page cache revalidation mechanisms
> >
> > will verify (with and without patch) tomorrow.
>
> 2.6.18.3 with reversed 44b11874ff583b6e766a05856b04f3c492c32b84 is OK.
>
> exact patch used included.

If you are not using any form of synchronisation, then your test would
appear to be violating the close-to-open cache consistency rules (see
http://nfs.sourceforge.net/#faq_a8).
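
A minimal sketch of the kind of synchronisation being asked about, assuming every writer cooperates by taking the same lock (which does not help when the application itself cannot be changed). Each append is wrapped in an exclusive POSIX record lock via fcntl.lockf; over NFSv3 such locks normally go through the NLM lock manager unless the share is mounted with nolock. The Linux NFS client is understood to revalidate its cached data when a lock is acquired and to flush dirty pages when it is released; the explicit fsync() before unlocking is an extra precaution, not a requirement taken from the FAQ. The helper name is hypothetical.

```python
import fcntl
import os


def locked_append(path: str, data: bytes) -> None:
    """Append data to path while holding an exclusive POSIX lock."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)      # blocks until the lock is granted
        try:
            view = memoryview(data)
            while view:                     # os.write may write only part
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)                    # push the data out before unlocking
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)  # let the next writer in
    finally:
        os.close(fd)
```

Because O_APPEND positions each write at the end of file as the client currently sees it, taking the lock first is what gives the client a chance to learn the up-to-date size before it writes.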

Cheers
Trond



2006-11-15 17:51:25

by Kasparek Tomas

Subject: Re: NFS corruption in 2.6.18.2?

On Tue, Nov 14, 2006 at 07:09:17PM +0100, Fredrik Lindgren wrote:
> Hello
>
> We're running a mail system with Linux machines being served by two
> NetApps. (Debian stable, our "own" kernel off kernel.org)
>
> At present we're running 2.6.13 kernels; we had some corruption issues
> before that, which were fixed in 2.6.13. However, when we tried to upgrade
> to 2.6.18.2, we saw corruption again.
>
> Before 2.6.13, the problem seemed to be that the file size was being
> cached, which meant that sometimes there were blocks of NULL
> characters in the files.
>
> With 2.6.18.2 we see blocks of NULL chars in the data again, but this time
> they are sometimes in the middle of a message. Before 2.6.13 that didn't
> happen; then there were just big blocks of NULL chars between two messages.
> The one consistent factor is that it only occurs when the one machine
> (out of 5) running 2.6.18.2 has delivered a message to the spool file.
>
> I don't know if it's relevant, but when checking the NFS stats I see
> the 2.6.18.2 machine doing almost exactly half as many "GetAttr"
> calls as the 2.6.13 machines.
>
> Is this something anyone else has seen?
>
> Also, on a side note, the statistics still seem to use signed
> values, so we're seeing negative numbers on some stats after some
> uptime. This is true both for "nfsstat" and for "cat /proc/net/rpc/nfs".

I have seen this behaviour with kernel 2.6.18 and above up to 19-rc4.
Reported this, but no response.

http://lkml.org/lkml/2006/9/28/89




2006-11-16 03:37:10

by Brad Barnett

Subject: Re: NFS corruption in 2.6.18.2?

On Wed, 15 Nov 2006 18:29:47 +0100
Kasparek Tomas <[email protected]> wrote:

> On Tue, Nov 14, 2006 at 07:09:17PM +0100, Fredrik Lindgren wrote:
> > Hello
> >
> > We're running a mail system with Linux machines being served by two
> > NetApps. (Debian stable, our "own" kernel off kernel.org)
> >
> > At present we're running 2.6.13 kernels; we had some corruption issues
> > before that, which were fixed in 2.6.13. However, when we tried to upgrade
> > to 2.6.18.2, we saw corruption again.
> >
> > Before 2.6.13, the problem seemed to be that the file size was being
> > cached, which meant that sometimes there were blocks of NULL
> > characters in the files.
> >
> > With 2.6.18.2 we see blocks of NULL chars in the data again, but this time
> > they are sometimes in the middle of a message. Before 2.6.13 that didn't
> > happen; then there were just big blocks of NULL chars between two messages.
> > The one consistent factor is that it only occurs when the one machine
> > (out of 5) running 2.6.18.2 has delivered a message to the spool file.
> >
> > I don't know if it's relevant, but when checking the NFS stats I see
> > the 2.6.18.2 machine doing almost exactly half as many "GetAttr"
> > calls as the 2.6.13 machines.
> >
> > Is this something anyone else has seen?
> >
> > Also, on a side note, the statistics still seem to use signed
> > values, so we're seeing negative numbers on some stats after some
> > uptime. This is true both for "nfsstat" and for "cat /proc/net/rpc/nfs".
>
> I have seen this behaviour with kernel 2.6.18 and above up to 19-rc4.
> Reported this, but no response.
>
> http://lkml.org/lkml/2006/9/28/89
>

I believe my previous post about NFS root filesystems, and my attempts to
debug it, have to do with this very issue.

I've recently noticed that large files copied via ssh to /tmp on the root
(/ mounted) NFS file system turn out to be corrupt and cannot be
untarred. I have no such issues on these same boxes, when not NFS root
mounted, with 2.6.9 and 2.6.8. I am in the process of compiling 2.6.8
from the same Debian-patched sources in order to do a comparison.

I will let the list know if my NFS corruption issues disappear. However,
it seems likely that this is what is causing my boxes to die....

I've seen this on the current Debian 2.6.18 and 2.6.19 kernel packages....



2006-11-16 16:09:41

by Trond Myklebust

Subject: Re: NFS corruption in 2.6.18.2?

On Wed, 2006-11-15 at 18:29 +0100, Kasparek Tomas wrote:
> I have seen this behaviour with kernel 2.6.18 and above up to 19-rc4.
> Reported this, but no response.
>
> http://lkml.org/lkml/2006/9/28/89

Your test script doesn't use any form of locking. How are you ensuring
that only one client has the file open at a time?

Trond



2006-12-22 09:39:36

by Kasparek Tomas

Subject: Re: NFS corruption in 2.6.18.2?

On Tue, Dec 12, 2006 at 11:43:05AM -0500, Trond Myklebust wrote:
> On Tue, 2006-12-12 at 14:45 +0100, Kasparek Tomas wrote:
>
> > Hi, it's still me with the same problem; sorry for being a pain in the ass, but I
> > need to solve it somehow (I'm not able to change the applications used).
> >
> > I have tried mounting the share with the 'noac' option as mentioned in the FAQ,
> > but the behaviour is the same.
> >
> > Would you have any other advice on how to get the old behaviour (for me that
> > means that of kernel 2.6.16.x)?
> >
> > (I tested some newer kernels too, 2.6.18.5 and 2.6.19.1, just in case
> > something had changed, but no success there.)
>
> Could you try 2.6.19 w/ the NFS_ALL patch from
> http://client.linux-nfs.org/Linux-2.6.x/2.6.19/ ?
>
> I know of one corruption issue that was happening there due to a race
> between invalidate_inode_pages2() and the write code.

Hi,

Sorry it took so long; I finally tried it and the results are the same:

- 2.6.19 + NFS_ALL (both with and without noac)
- 2.6.20-rc1 (both with and without noac)

I will try to do more investigation after Xmas.

Regards




2006-12-12 16:44:37

by Trond Myklebust

Subject: Re: NFS corruption in 2.6.18.2?

On Tue, 2006-12-12 at 14:45 +0100, Kasparek Tomas wrote:

> Hi, it's still me with the same problem; sorry for being a pain in the ass, but I
> need to solve it somehow (I'm not able to change the applications used).
>
> I have tried mounting the share with the 'noac' option as mentioned in the FAQ,
> but the behaviour is the same.
>
> Would you have any other advice on how to get the old behaviour (for me that
> means that of kernel 2.6.16.x)?
>
> (I tested some newer kernels too, 2.6.18.5 and 2.6.19.1, just in case
> something had changed, but no success there.)

Could you try 2.6.19 w/ the NFS_ALL patch from
http://client.linux-nfs.org/Linux-2.6.x/2.6.19/ ?

I know of one corruption issue that was happening there due to a race
between invalidate_inode_pages2() and the write code.

Cheers
Trond



2006-12-12 13:45:11

by Kasparek Tomas

Subject: Re: NFS corruption in 2.6.18.2?

On Sat, Nov 25, 2006 at 01:33:03PM -0500, Trond Myklebust wrote:
> On Sat, 2006-11-25 at 19:26 +0100, Kasparek Tomas wrote:
> > On Fri, Nov 24, 2006 at 02:11:55PM -0500, Trond Myklebust wrote:
> > > > > > On Thu, Nov 23, 2006 at 05:17:38PM +0100, Kasparek Tomas wrote:
> > > > > > Just update: 2.6.17.14 is ok
> > > > >
> > > > > did git bisect between 2.6.17 and 2.6.18 and found that the commit is:
> > > > >
> > > > > 44b11874ff583b6e766a05856b04f3c492c32b84
> > > > > NFS: Separate metadata and page cache revalidation mechanisms
> > > > >
> > > > > will verify (with and without patch) tomorrow.
> > > >
> > > > 2.6.18.3 with reversed 44b11874ff583b6e766a05856b04f3c492c32b84 is OK.
> > > >
> > > > exact patch used included.
> > >
> > > If you are not using any form of synchronisation, then your test would
> > > appear to be violating the close-to-open cache consistency rules (see
> > > http://nfs.sourceforge.net/#faq_a8).
> >
> > Yes, I understand this, but with no synchronization I should get old or
> > somewhat 'not current' data at worst; I don't expect to get data
> > generated by the kernel. The test is synthetic, constructed to be
> > as simple as possible, but it works well on 2.6.16 (meaning no zeros; data
> > are sometimes overwritten or mixed from several clients as a result of no
> > locking, but that's OK for my application).
>
> No. Extending the file on the server while your client has pending
> background writes can result in exactly the behaviour that you're
> observing.

Hi, it's still me with the same problem; sorry for being a pain in the ass, but I
need to solve it somehow (I'm not able to change the applications used).

I have tried mounting the share with the 'noac' option as mentioned in the FAQ,
but the behaviour is the same.

Would you have any other advice on how to get the old behaviour (for me that
means that of kernel 2.6.16.x)?

(I tested some newer kernels too, 2.6.18.5 and 2.6.19.1, just in case
something had changed, but no success there.)

Regards.




2006-12-05 11:09:25

by Fredrik Lindgren

Subject: Re: NFS corruption in 2.6.18.2?

> > > > > As I wrote before, it does not depend on the server
> > > > > (tried with FreeBSD server and several versions of
> > > > > linux 2.6.16.x and 2.6.18.x).
> > > >
> > > > Just update: 2.6.17.14 is ok
> > >
> > > did git bisect between 2.6.17 and 2.6.18 and found that
> > > the commit is:
> > >
> > > 44b11874ff583b6e766a05856b04f3c492c32b84
> > > NFS: Separate metadata and page cache revalidation mechanisms
> > >
> > > will verify (with and without patch) tomorrow.
> >
> > 2.6.18.3 with reversed
> > 44b11874ff583b6e766a05856b04f3c492c32b84 is OK.
> >
> > exact patch used included.
>
> If you are not using any form of synchronisation, then your test would
> appear to be violating the close-to-open cache consistency rules (see
> http://nfs.sourceforge.net/#faq_a8).

Hello,

I have tested our application (CommuniGate Pro) with 2.6.18.3 with
Mr. Kasparek's patch applied and I'm seeing no corruption after several
days of uptime. Without the patch I usually see bogus data in some
files within an hour.

Also, does anyone have any input on why 2.6.18.3 (both with and without
Mr. Kasparek's patch) makes about half as many GETATTR calls as
2.6.13 with our application? Is this an intended change?
Not that I'm complaining or anything, but it would be nice to know
that it was intentional :)

Whether CGP violates the "close-to-open cache consistency rules" or not,
I don't know, but the fact is that it works with a stock 2.6.13 and with
2.6.18.3 with commit 44b11874ff583b6e766a05856b04f3c492c32b84 reverted,
while it breaks on a stock 2.6.18.3.

Regards,
Fredrik Lindgren





2007-01-22 07:40:39

by Kasparek Tomas

Subject: Re: NFS corruption in 2.6.18.2?

On Fri, Dec 22, 2006 at 10:39:24AM +0100, Kasparek Tomas wrote:
> On Tue, Dec 12, 2006 at 11:43:05AM -0500, Trond Myklebust wrote:
> > On Tue, 2006-12-12 at 14:45 +0100, Kasparek Tomas wrote:
> >
> > > Hi, it's still me with the same problem; sorry for being a pain in the ass, but I
> > > need to solve it somehow (I'm not able to change the applications used).
> > >
> > > I have tried mounting the share with the 'noac' option as mentioned in the FAQ,
> > > but the behaviour is the same.
> > >
> > > Would you have any other advice on how to get the old behaviour (for me that
> > > means that of kernel 2.6.16.x)?
> > >
> > > (I tested some newer kernels too, 2.6.18.5 and 2.6.19.1, just in case
> > > something had changed, but no success there.)
> >
> > Could you try 2.6.19 w/ the NFS_ALL patch from
> > http://client.linux-nfs.org/Linux-2.6.x/2.6.19/ ?
> >
> > I know of one corruption issue that was happening there due to a race
> > between invalidate_inode_pages2() and the write code.
>
> Sorry it took so long; I finally tried it and the results are the same:
>
> - 2.6.19 + NFS_ALL (both with and without noac)
> - 2.6.20-rc1 (both with and without noac)
>
> I will try to do more investigation after Xmas.

I have tried other versions, with interesting results:

- 2.6.19.1 (with/out noac) - BAD
- 2.6.19.2 (without noac) - BAD
- 2.6.19.2 (with noac) - OK !!!!
- 2.6.20-rc5 (with/out noac) - BAD

So something in 2.6.19.2 corrected the behaviour when mounted with 'noac',
but it is broken again in 2.6.20-rc5.


