2005-09-13 09:05:31

by Thomas Stockheim

Subject: NFS problem - close to open cache consistency broken ?

I'm having a problem with NFS caching after updating from Redhat 8 to
Fedora. Sometimes changes to files (even when made on the server) just
do not show up. At first we suspected timing problems, but all machines
are synchronized with ntp.

Then I found out how to reproduce it:
1 server: echo xxxx > testfile
2 client: Application 1 on client opens testfile, does nothing with it -
keeps it open forever ( simple fortran program with an open and an
endless loop )
3 client: cat testfile gives xxxx
4 server: echo yyyy > testfile
5 client: cat testfile still shows xxxx
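
Step 2 can be approximated without Fortran. The following is a minimal sketch in Python, not the original program: the helper name `hold_open` and its parameters are hypothetical, and the open mode is a parameter because the steps above do not say how the Fortran program opened the file.

```python
import time


def hold_open(path, mode="r+", seconds=None, poll=1.0):
    """Open `path` and keep it open without reading or writing any data.

    seconds=None holds the file until the process is killed, like the
    endless loop in step 2. A number makes the helper return after that
    many seconds, which is convenient for testing. Returns the time held.
    """
    start = time.monotonic()
    with open(path, mode):
        while seconds is None or time.monotonic() - start < seconds:
            time.sleep(poll)
    return time.monotonic() - start
```

Running `hold_open("testfile")` on the client in one shell, and steps 3 to 5 in another, should reproduce the sequence above if the behaviour depends only on the file being held open.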

I don't think it's a file attribute caching problem: ls -l shows the
correct, updated modification time for the file. But the file contents
never get updated. Once I kill the application that was holding
testfile open, the client still shows the stale content, but any new
changes show up immediately.

At first, we tested this on kernel 2.6.11 and 2.6.9. Here, increasing
file size by 1 worked to get the updated file contents on the client,
but decreasing file size did not work.

6a server: echo zzzzz > testfile
7a client: cat testfile shows zzzzz

6b server: echo zzz > testfile
7b client: cat testfile shows xxxx

With kernel 2.6.13, any file size changes show the new content directly,
but if the file size stays constant it still does not work.

I tried all the mount options, but even turning attribute
caching totally off does not seem to help. Mounting as nfsv2 did not help
either.

I assume it's a problem with the client, not the server: when the same
exports are mounted on an old redhat 7 with kernel 2.4.2, everything works.

If I understand close-to-open cache consistency correctly, having the file
opened by a different application on the same client should not be a
problem?


Thanks, Thomas

--

************************************************************************
**
** B&B-AGEMA GmbH
**
** Thomas Stockheim
** Juelicher Strasse 338
** D-52070 Aachen
** Germany
**
** Tel. (Zentrale): ++49-(0)241-56878-0
** Tel. (Durchwahl): ++49-(0)241-56878-31
** Fax.: ++49-(0)241-56878-79
** e-mail: <[email protected]>
** Internet: http://www.bub-agema.de
**
************************************************************************


-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2005-09-14 02:22:28

by Lever, Charles

Subject: RE: NFS problem - close to open cache consistency broken ?

> I'm having a problem with NFS caching after updating from Redhat 8 to
> Fedora. Sometimes changes to files (even when made on the server) just
> do not show up. At first we suspected timing problems, but all machines
> are synchronized with ntp.
>
> Then I found out how to reproduce it:
> 1 server: echo xxxx > testfile
> 2 client: Application 1 on client opens testfile, does nothing with it -
> keeps it open forever ( simple fortran program with an open and an
> endless loop )
> 3 client: cat testfile gives xxxx
> 4 server: echo yyyy > testfile
> 5 client: cat testfile still shows xxxx
>
> I don't think it's a file attribute caching problem: ls -l shows the
> correct, updated modification time for the file. But the file contents
> never get updated. Once I kill the application that was holding
> testfile open, the client still shows the stale content, but any new
> changes show up immediately.
>
> At first, we tested this on kernel 2.6.11 and 2.6.9. Here, increasing
> file size by 1 worked to get the updated file contents on the client,
> but decreasing file size did not work.
>
> 6a server: echo zzzzz > testfile
> 7a client: cat testfile shows zzzzz
>
> 6b server: echo zzz > testfile
> 7b client: cat testfile shows xxxx
>
> With kernel 2.6.13, any file size changes show the new content directly,
> but if the file size stays constant it still does not work.
>
> I tried all the mount options, but even turning attribute
> caching totally off does not seem to help. Mounting as nfsv2 did not help
> either.
>
> I assume it's a problem with the client, not the server: when the same
> exports are mounted on an old redhat 7 with kernel 2.4.2, everything works.
>
> If I understand close-to-open cache consistency correctly, having the file
> opened by a different application on the same client should not be a
> problem?

thomas-

the client uses mtime and size to detect file changes on the server. if
the mtime doesn't change on the server, the clients won't detect any
data changes.
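
The check described above can be illustrated with a minimal sketch in Python. It only mirrors the idea of comparing mtime and size; it is not the kernel's implementation, and the names `Attrs`, `snapshot` and `looks_changed` are hypothetical.

```python
import os
from typing import NamedTuple


class Attrs(NamedTuple):
    mtime_ns: int
    size: int


def snapshot(path):
    """Read the two attributes named above: modification time and size."""
    st = os.stat(path)
    return Attrs(st.st_mtime_ns, st.st_size)


def looks_changed(cached, fresh):
    """True if either mtime or size differs from the cached attributes."""
    return cached.mtime_ns != fresh.mtime_ns or cached.size != fresh.size
```

With a check of this shape, a rewrite that keeps the file size constant is only noticed if the server's mtime actually advances.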

what physical file system are you using on the server?



2005-09-15 07:48:58

by Thomas Stockheim

Subject: Re: NFS problem - close to open cache consistency broken ?

Lever, Charles wrote:
> thomas-
>
> the client uses mtime and size to detect file changes on the server. if
> the mtime doesn't change on the server, the clients won't detect any
> data changes.
>
> what physical file system are you using on the server?
>
>

Thanks for the reply, but that's exactly my problem: mtime does change
on the server, and the client even sees that change (faster or slower
depending on caching settings).

But the data on the client never gets updated.

I tested some more, and this only happens if another application on the
client has the file open for writing.

I think it might have something to do with the data_unstable flag in
nfs_refresh_inode - but I don't understand the kernel code well enough
to see what happens exactly or how to fix it for us.

Thomas






2005-09-15 13:51:02

by Lever, Charles

Subject: RE: NFS problem - close to open cache consistency broken ?

> > the client uses mtime and size to detect file changes on the server. if
> > the mtime doesn't change on the server, the clients won't detect any
> > data changes.
> >
> > what physical file system are you using on the server?
>
> Thanks for the reply, but that's exactly my problem: mtime does change
> on the server, and the client even sees that change (faster or slower
> depending on caching settings).
>
> But the data on the client never gets updated.
>
> I tested some more, and this only happens if another application on the
> client has the file open for writing.
>
> I think it might have something to do with the data_unstable flag in
> nfs_refresh_inode - but I don't understand the kernel code well enough
> to see what happens exactly or how to fix it for us.

peter, do you happen to know how the solaris NFS client behaves when an
application holds a file open like this?

close-to-open is a convention, not a specification, so it's really up to
the client developers to implement what they think is right. i'm
guessing that the Linux client is working as designed here. i'm not
convinced it's very *convenient* behavior, though.



2005-09-15 14:19:36

by Peter Staubach

Subject: Re: NFS problem - close to open cache consistency broken ?

Lever, Charles wrote:

>
>peter, do you happen to know how the solaris NFS client behaves when an
>application holds a file open like this?
>
>close-to-open is a convention, not a specification, so it's really up to
>the client developers to implement what they think is right. i'm
>guessing that the Linux client is working as designed here. i'm not
>convinced it's very *convenient* behavior, though.
>

The Solaris client issues an over the wire GETATTR for virtually every open
call. The client also flushes data when the last reference to an open file
is closed. This is on a per file descriptor basis and not a system wide
basis. Therefore, even if the file is held open, the close-to-open semantics
are still maintained, albeit on a per file descriptor level.

The NFS client uses attributes from every call except WRITE responses to do
cache validation. The file being open for reading and/or writing is not
taken into account.
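
The behaviour described above can be sketched as a toy model in Python. This is not Solaris or Linux code: `ToyServer`, `ToyClient` and `cat` are hypothetical names, a counter stands in for mtime, and the write-back side (flushing on last close) is left out. It only shows revalidation on every open, with the open count tracked but never consulted.

```python
class ToyServer:
    """Holds one file's data and a counter standing in for mtime."""

    def __init__(self, data):
        self.data = data
        self.mtime = 1

    def write(self, data):
        self.data = data
        self.mtime += 1

    def getattr(self):
        return (self.mtime, len(self.data))


class ToyClient:
    """Caches data and revalidates it against fresh attributes on each open."""

    def __init__(self, server):
        self.server = server
        self.cached_attrs = None
        self.cached_data = None
        self.open_count = 0  # tracked, but deliberately never consulted

    def open(self):
        attrs = self.server.getattr()  # stands in for an over-the-wire GETATTR
        if attrs != self.cached_attrs:
            self.cached_data = None  # drop stale cached data
            self.cached_attrs = attrs
        self.open_count += 1

    def read(self):
        if self.cached_data is None:
            self.cached_data = self.server.data  # stands in for a READ
        return self.cached_data

    def close(self):
        self.open_count -= 1


def cat(client):
    """open + read + close, like `cat testfile` on the client."""
    client.open()
    try:
        return client.read()
    finally:
        client.close()
```

In this model, a long-lived `open()` by one application does not stop a later `cat` from seeing a same-size rewrite on the server, because each open compares fresh attributes with the cached ones.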

Thanx...

ps





2005-09-15 15:08:42

by J. Bruce Fields

Subject: Re: NFS problem - close to open cache consistency broken ?

On Thu, Sep 15, 2005 at 06:50:49AM -0700, Lever, Charles wrote:
> close-to-open is a convention, not a specification, so it's really up to
> the client developers to implement what they think is right.

You can't turn off cache revalidation checks while someone else on the
client has the file open. Wouldn't that make close-to-open
useless in a lot of cases?

--b.



2005-09-15 15:12:55

by Peter Staubach

Subject: Re: NFS problem - close to open cache consistency broken ?

J. Bruce Fields wrote:

>On Thu, Sep 15, 2005 at 06:50:49AM -0700, Lever, Charles wrote:
>
>>close-to-open is a convention, not a specification, so it's really up to
>>the client developers to implement what they think is right.
>
>You can't turn off cache revalidation checks while someone else on the
>client has the file open. Wouldn't that make close-to-open
>useless in a lot of cases?
>

Perhaps, but close-to-open still covers cases that the normal cache
revalidation does not. There are still windows possible with the
normal attribute cache which close-to-open handles. Some of this
depends upon the attribute cache timeouts...
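
The window mentioned above can be made concrete with a small sketch in Python. The function name and parameters are hypothetical, and the rule is a simplification: timeout-based revalidation trusts cached attributes until they expire, while an open under close-to-open rechecks regardless of their age.

```python
def must_revalidate(now, attrs_fetched_at, attr_timeout, is_open_call):
    """Decide whether cached attributes need to be refetched.

    Ordinary revalidation trusts cached attributes until they are at
    least attr_timeout seconds old; an open() always rechecks.
    """
    if is_open_call:
        return True
    return (now - attrs_fetched_at) >= attr_timeout
```

For example, with attributes fetched at t=9 and a 3 second timeout (an arbitrary value), a plain access at t=10 would still trust the cache, whereas an open at t=10 would not.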

ps



2005-09-15 15:15:16

by Thomas Stockheim

Subject: Re: NFS problem - close to open cache consistency broken ?

Lever, Charles wrote:
>>> peter, do you happen to know how the solaris NFS client behaves when an
>>> application holds a file open like this?
>>>
>>> close-to-open is a convention, not a specification, so it's really up to
>>> the client developers to implement what they think is right. i'm
>>> guessing that the Linux client is working as designed here. i'm not
>>> convinced it's very *convenient* behavior, though.
>>
>> Well, linux did not always do this - same directory mounted on an older
>> redhat with kernel 2.4.2 and it's working fine.
>
> those ancient kernels don't support close-to-open. that was introduced
> around 2.4.20.
>
I don't know what model they had instead, but it worked better
for us. I've been running linux clusters for 5 years or so and
never had this problem before.






2005-09-16 07:24:08

by Thomas Stockheim

Subject: Re: NFS problem - close to open cache consistency broken ?

Robert Gordon wrote:
>
> On Sep 15, 2005, at 10:14 AM, Thomas Stockheim wrote:
>
>> Lever, Charles wrote:
>>
>>>>> peter, do you happen to know how the solaris NFS client behaves when an
>>>>> application holds a file open like this?
>>>>>
>>>>> close-to-open is a convention, not a specification, so it's really up to
>>>>> the client developers to implement what they think is right. i'm
>>>>> guessing that the Linux client is working as designed here. i'm not
>>>>> convinced it's very *convenient* behavior, though.
>>>>
>>>> Well, linux did not always do this - same directory mounted on an
>>>> older redhat with kernel 2.4.2 and it's working fine.
>>>
>>> those ancient kernels don't support close-to-open. that was introduced
>>> around 2.4.20.
>>
>> I don't know what model they had instead, but it worked better
>> for us. I've been running linux clusters for 5 years or so and
>> never had this problem before.
>
> so there is the 'nocto' option (i assume) for the mount that would
> restore the old behavior -- it would prove that was the culprit
>
Nocto was one of the first mount options I tried - it did not help.
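
When testing mount options such as nocto, it can help to confirm which options are actually in effect on the client. The following is a minimal sketch in Python that parses text in the /proc/mounts layout; the function name is hypothetical, and it is an assumption to verify on a given kernel that options like nocto or noac are listed there when active.

```python
def nfs_mount_options(mounts_text):
    """Map each NFS mount point to the set of options listed for it.

    Expects text in the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] in ("nfs", "nfs4"):
            result[fields[1]] = set(fields[3].split(","))
    return result
```

On a client, the input would be the contents of /proc/mounts, for example `nfs_mount_options(open("/proc/mounts").read())`, and the check is then whether "nocto" appears in the set for the mount point in question.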


