2004-10-06 16:41:09

by James Pearson

Subject: File system cache corruptions?

Very occasionally we hit a problem whereby an application on an NFS
server fails to load on one NFS client. For example, we just had one
that fails when you try to run it:

/path/to/nfs/mounted/binary: relocation error:
/path/to/nfs/mounted/shared/object.so: symbol , version GLIBC_2.0 not
defined in file libpthread.so.0 with link time reference

I 'fixed' the problem by running a simple tool on the client that just
grabs memory - and hence flushes any cached files.

The application in question hasn't changed in months.

In this case, the client is running Fedora Core 1 with a 2.4.22 based
kernel (with Trond's NFS client patches), but we've seen similar
problems with Redhat 7.2 based machines with a variety of kernel.org
2.4.X based kernels.

Any ideas on what may be causing this?

James Pearson
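
For illustration, here is a minimal sketch of the kind of memory-grabbing
tool described above. It is not the tool used in this thread, which is not
shown; the function name grab_memory and its parameters are hypothetical.
The idea is to allocate memory and write to every page so the kernel has to
back it with real RAM, which under memory pressure pushes cached file pages
out. How much has to be grabbed to evict a particular file depends on the
machine and is not something this sketch can determine.

```python
def grab_memory(total_bytes, chunk_bytes=16 * 1024 * 1024, page_size=4096):
    """Allocate roughly total_bytes and touch every page of it.

    Hypothetical helper: returns the list of buffers so the caller
    decides how long the memory stays allocated.
    """
    chunks = []
    grabbed = 0
    while grabbed < total_bytes:
        size = min(chunk_bytes, total_bytes - grabbed)
        chunk = bytearray(size)  # zero-filled, possibly not resident yet
        # Write one byte per page so each page is actually faulted in.
        for offset in range(0, size, page_size):
            chunk[offset] = 1
        chunks.append(chunk)
        grabbed += size
    return chunks
```

Calling, say, grab_memory(512 * 1024 * 1024) and then dropping the returned
list releases the memory again; the size is an arbitrary example value.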


-------------------------------------------------------
This SF.net email is sponsored by: IT Product Guide on ITManagersJournal
Use IT products in your business? Tell us what you think of them. Give us
Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more
http://productguide.itmanagersjournal.com/guidepromo.tmpl
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-10-06 17:45:49

by Trond Myklebust

Subject: Re: File system cache corruptions?

On Wed, 06/10/2004 at 18:40, James Pearson wrote:

> Any ideas on what may be causing this?

Shared mmapped files tend not to be easy to keep consistent w.r.t. the
server, because mmap pins the pages in memory.

Instead of copying data into the mmapped file, it is better to create a
new file, then rename it onto the old one (like tools such as GNU
install do for you).

Cheers,
Trond
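
For illustration, a minimal sketch of the "create a new file, then rename
it onto the old one" approach described above. The function name
replace_file_atomically is hypothetical and this is not how GNU install is
implemented; it only shows the same general pattern. The temporary file is
created in the target's directory so that the final rename stays within one
file system, and the existing file's contents are never modified in place.

```python
import os
import shutil
import tempfile


def replace_file_atomically(source_path, target_path):
    """Install source_path as target_path via write-new-then-rename."""
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # Temporary file lives next to the target so rename does not cross
    # a file system boundary.
    fd, temp_path = tempfile.mkstemp(dir=target_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as temp_file, open(source_path, "rb") as source_file:
            shutil.copyfileobj(source_file, temp_file)
            temp_file.flush()
            os.fsync(temp_file.fileno())  # push the data out before renaming
        shutil.copymode(source_path, temp_path)  # keep permission bits
        os.replace(temp_path, target_path)  # rename over the old name
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(temp_path):
            os.unlink(temp_path)
        raise
```

After the call, the target name refers to a new inode; a process that
opened the old file beforehand on the same machine still reads the old
contents through its existing file descriptor.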




2004-10-06 17:58:31

by Lever, Charles

Subject: RE: File system cache corruptions?

Are you using soft mounts?

> -----Original Message-----
> From: James Pearson [mailto:[email protected]]
> Sent: Wednesday, October 06, 2004 12:41 PM
> To: [email protected]
> Subject: [NFS] File system cache corruptions?
>
>
> Very occasionally we get problems where by an application on an NFS
> server fails to load on one NFS client - for example just had one which
> fails when you try to run it:
>
> /path/to/nfs/mounted/binary: relocation error:
> /path/to/nfs/mounted/shared/object.so: symbol , version GLIBC_2.0 not
> defined in file libpthread.so.0 with link time reference
>
> I 'fixed' the problem by running a simple tool on the client that just
> grabs memory - and hence flushes any cached files.
>
> The application in question hasn't changed in months.
>
> In this case, the client is running Fedora Core 1 with a 2.4.22 based
> kernel (with Trond's NFS client patches), but we've seen similar
> problems with Redhat 7.2 based machines with a variety of kernel.org
> 2.4.X based kernels.
>
> Any ideas on what may be causing this?
>
> James Pearson



2004-10-07 08:59:46

by James Pearson

Subject: Re: File system cache corruptions?

Trond Myklebust wrote:
> On Wed, 06/10/2004 at 18:40, James Pearson wrote:
>
>
>>Any ideas on what may be causing this?
>
>
> Shared mmapped files tend not to be easy to keep consistent w.r.t. the
> server, because mmap pins the pages in memory.
>
> Instead of copying data into the mmapped file, it is better to create a
> new file, then rename it onto the old one (like tools such as GNU
> install do for you).

This is true (and something I do as a matter of course if I ever have to
replace/upgrade shared libraries/binaries etc.), but it is not the case
here - the executable and its required shared libraries, which are on the
NFS server, have been in place and unchanged for about 6 months.

There are over 100 clients using the exact same binary and shared
libraries - only one had this problem.

The server is hard mounted.

Thanks

James Pearson



