2002-10-28 20:06:58

by Alan Witz

[permalink] [raw]
Subject: Corrupt Data when using NFS on Linux

I work for a small software company that recently began using NFS to implement a solution built on a lesser-known database (Appgen). The problem is that we're seeing a lot of corruption in the database files that are modified over NFS. The on-line manual on linux.org contains the following passage, which I think may be relevant:

7.10. File Corruption When Using Multiple Clients
If a file has been modified within one second of its previous modification and left the same size, it will continue to generate the same inode number. Because of this, constant reads and writes to a file by multiple clients may cause file corruption. Fixing this bug requires changes deep within the filesystem layer, and therefore it is a 2.5 item.

I was wondering if someone could clarify what is meant by this. What is the relevance of the inode number? And doesn't the inode of the file stay the same even if it is being modified? Any help would be greatly appreciated. Even some direction as to where else I might look would be helpful. Thanks,

Alan Witz


2002-10-28 20:48:31

by Lever, Charles

[permalink] [raw]
Subject: RE: Corrupt Data when using NFS on Linux

hi alan-

that information is crap, and should be removed from wherever you found it.

the problem is that typical file systems used on *Linux* NFS servers (like ext2) can't
store time stamps with sub-second resolution. this is not a problem with typical
commercial NFS servers like Solaris or NetApp filers. i'm not aware of any plan to
address this specific problem in 2.5, but that doesn't mean it won't be fixed.

can you tell us more about your environment, especially which kernel is running
on your clients and what mount options you're using?



2002-10-28 20:57:50

by Trond Myklebust

[permalink] [raw]
Subject: Re: Corrupt Data when using NFS on Linux

>>>>> " " == Alan Witz <[email protected]> writes:

> *7.10. File Corruption When Using Multiple Clients*

> If a file has been modified within one second of its
> previous modification and left the same size, it will
> continue to generate the same inode number. Because of
> this, constant reads and writes to a file by multiple
> clients may cause file corruption. Fixing this bug
> requires changes deep within the filesystem layer, and
> therefore it is a 2.5 item.

That passage looks pretty confused to me. The inode number is indeed
irrelevant here.

The reason for the problem is that NFS uses the file's mtime and size
to determine whether another NFS client has written to the file since
it was last opened.

For most NFS servers, the mtime has a resolution of 1 microsecond or
better. Under Linux, however, the mtime has a resolution of 1
second. Hence if your *server* is a Linux machine, then a change by
another client that occurs within 1 second of your last change might
not cause the mtime to be updated. Unless this second change also
changes the file size, it will appear to your client as if nobody
else has touched the file, and file corruption may follow because the
two clients' file caches disagree.
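
Roughly, the client-side check amounts to something like the sketch
below (this is not the actual Linux client code; the structure and
names are invented purely for illustration):

#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Attributes the client remembered when it last validated its cache. */
struct cached_attrs {
    time_t mtime;   /* whole seconds only when the server is Linux */
    off_t  size;
};

/* Returns 1 if the cached data may still be used, 0 if it must be
 * discarded because another client appears to have changed the file. */
static int cache_still_valid(const struct cached_attrs *c,
                             const struct stat *fresh)
{
    return c->mtime == fresh->st_mtime && c->size == fresh->st_size;
}

If another client rewrites the file within the same second and leaves
the size unchanged, that check still succeeds, so the stale cached copy
is kept and the two caches silently diverge.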

Cheers,
Trond



2002-10-28 23:05:15

by Alan Witz

[permalink] [raw]
Subject: Re: Corrupt Data when using NFS on Linux

Thanks for your quick response.

The operating environment is Red Hat Linux, kernel 2.4.18-17.7.xsmp. We are running NFS version 3. We have implemented some rudimentary file locking of our own to try to work around the problem. This consists of creating a lock file that acts as a flag telling the other clients not to access the file: if the lock file exists, the other clients wait until it is removed before writing to the database file. To make sure this works properly, the lock file is created with the "ln" command, so that checking for the lock and setting it happen in a single step (this eliminates the possibility of another client setting the lock after the current client has checked for it but before it can set the lock itself). We are also running NFS in synchronous mode to try to reduce the chances of data corruption from multiple clients. The mount options are as follows:

rsize=8192,wsize=8192,noac,hard,sync,nfsvers=3
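
For reference, that corresponds to a mount invocation along these lines
(the server export and mount point here are just placeholders):

mount -t nfs -o rsize=8192,wsize=8192,noac,hard,sync,nfsvers=3 \
    server:/export/appgen /mnt/appgen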

Any thoughts would be greatly appreciated.

Alan Witz



2002-10-29 01:34:49

by Eff Norwood

[permalink] [raw]
Subject: RE: Corrupt Data when using NFS on Linux

Hi Alan,

> rsize=8192,wsize=8192,noac,hard,sync,nfsvers=3
>
> Any thoughts would be greatly appreciated.

From my own experiences with similar issues, I think you have two and maybe
three options:

1. Stick with your own Linux locking scheme until the kernel catches up. It
will eventually, as there are a lot of darn smart people working on it, like
those on this list. :)
2. Possibly use FreeBSD? I don't know if it addresses this problem or not.
Does anyone know if FreeBSD works here?
3. Get a non-free commercial solution that is known to work, such as a
Network Appliance filer.

Eff Norwood






2002-10-29 15:10:20

by pwitting

[permalink] [raw]
Subject: Re: Corrupt Data when using NFS on Linux


I'm curious now. Is this issue really in the Linux kernel, or is it
actually a filesystem issue? It would seem to be a filesystem issue,
specifically how the FS chooses to store the mtime, which could allow a
different FS (say, a non-Linux fs such as jfs or xfs) to solve the problem.

Of course, given the "file centric" design of Linux/Unix I could see it
being a kernel issue; but I wonder how quickly they'll move to resolve the
issue given the forward/backward compatibility concerns (then again, they
seem to have migrated ext2 from 32 to 64 bits pretty cleanly, so maybe).

Thanks



2002-10-29 15:36:33

by Trond Myklebust

[permalink] [raw]
Subject: Re: Corrupt Data when using NFS on Linux

>>>>> " " == pwitting <[email protected]> writes:

> I'm curious now. Is this issue really in the linux kernel or is
> this actually a Filesystem issue? It would seem to be a
> filesystem issue, specifically how the FS chose to store the
> mtime, which could allow a different FS (say, a non-linux fs
> such as jfs or xfs) to solve the problem.

The problem is that Linux lacks support for >32-bit mtimes in the VFS
layer. Even if the underlying filesystem is capable of supporting a
64-bit mtime, the kernel structures are only seeing the high 32 bits.

Cheers,
Trond



2002-10-29 17:03:17

by Daniel Forrest

[permalink] [raw]
Subject: Re: Corrupt Data when using NFS on Linux

Alan,

>> The operating environment is Red Hat Linux 2.4.18-17.7.xsmp. We
>> are running NFS version 3. We have implemented some of our own
>> rudimentary file locking techniques to try and circumvent the
>> problem. This consists of creating a lock file which acts as a
>> flag which tells the other clients not to access the file.
>> Basically, if the lock file exists then the other clients will wait
>> until the file is cleared before writing to the database file. To
>> ensure that this works properly the "lock" flag is being created
>> using the "ln" command so that the process of checking for a lock
>> and setting a lock is essentially done in one step (thus
>> eliminating the possibility of another client setting the lock
>> after the current client has checked for the lock but before it can
>> set the lock itself). We are also running NFS in synchronous mode
>> to try and reduce the chances of data corruption due to multiple
>> clients. The mount options are as follows:
>>
>> rsize=8192,wsize=8192,noac,hard,sync,nfsvers=3
>>
>> Any thoughts would be greatly appreciated.

You need to be careful when creating your lock files.

The "guaranteed" way to create a lock file over NFS:

create tempfile
link tempfile lockfile (ignore return code)
stat tempfile

If the link count is 2, then you have the lock file. Apparently, link
may return success even if the link failed or return failure even if
the link succeeded (I don't remember which). Doing the stat verifies
if you have actually created a link to the temporary file. While I
have never seen this problem, the people who do mailbox locking have
documented this as a problem over NFS.
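
In C, that recipe comes out roughly like the sketch below (error
handling trimmed; the file names are placeholders, and the temporary
file should be unique per client, e.g. named after the hostname and PID):

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if we obtained the lock, 0 otherwise. */
static int try_lock(const char *tmpfile, const char *lockfile)
{
    struct stat st;
    int fd = open(tmpfile, O_CREAT | O_EXCL | O_WRONLY, 0644);

    if (fd < 0)
        return 0;
    close(fd);

    /* Deliberately ignore link()'s return code; over NFS it can be
     * wrong in either direction. */
    (void) link(tmpfile, lockfile);

    /* The stat is what really decides: a link count of 2 proves the
     * lock file now points at our temporary file. */
    if (stat(tmpfile, &st) == 0 && st.st_nlink == 2)
        return 1;

    unlink(tmpfile);
    return 0;
}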

Also, you will have to use "fcntl" locking if you want to ensure the
data you are reading is consistent. Doing a "lockf(fd, F_LOCK, 0)"
will guarantee that data written by other clients has been written to
the file, and will invalidate the client's cache. Of course, now that
you're using "fcntl" locking for this, you can probably get rid of the
lock file.

--
+----------------------------------+----------------------------------+
| Daniel K. Forrest | Laboratory for Molecular and |
| [email protected] | Computational Genomics |
| (608)262-9479 | University of Wisconsin, Madison |
+----------------------------------+----------------------------------+



2002-10-29 17:20:21

by Bryan J. Smith

[permalink] [raw]
Subject: Re: Corrupt Data when using NFS on Linux -- This should be in the HOWTO


Quoting Daniel Forrest <[email protected]>:
> The "guaranteed" way to create a lock file over NFS:
> create tempfile
> link tempfile lockfile (ignore return code)
> stat tempfile
> If the link count is 2, then you have the lock file. Apparently, link
> may return success even if the link failed or return failure even if
> the link succeeded (I don't remember which). Doing the stat verifies
> if you have actually created a link to the temporary file.

I know the HOWTO is more for users/sysadmins, but stuff like this could really
help in an additional "common workarounds for potential gotchas" section at the
end of the HOWTO.

--
Bryan J. Smith, E.I. Contact Info: http://thebs.org
A+/i-Net+/Linux+/Network+/Server+ CCNA CIWA CNA SCSA/SCWSE/SCNA
---------------------------------------------------------------
limit guilt = { psychopath,
remorse->0 innocent }


