on den 31.08.2005 Klokka 08:55 (-0600) skreiv Rob Sims:
> We have noticed when changing from kernel 2.4.23 to 2.6.8 that
> timestamps of files are not changed if opened for a write and nothing is
> written. When using 2.4.23 timestamps are changed. When using a local
> filesystem (reiserfs) with either kernel, timestamps are changed.
> Symptoms vary with the client, not the server. See the script below.
>
> When run on a 2.4.23 machine in an NFS mounted directory, output is
> "Good." When run on a 2.6.8 or 2.6.12-rc4 machine in an NFS directory,
> output is "Error."
>
> Is this a bug? How do we revert to the 2.4/local fs behavior?
This is a consequence of 2.6 NFS clients optimising away unnecessary
truncate calls. Whereas this is correct behaviour for truncate(), it
appears to be incorrect for open(O_TRUNC).
In fact, local filesystems like xfs and ext3 appear to have the opposite
problem: they change ctime if you call ftruncate(0) on the zero-length
file, as the attached test shows.
Cheers,
Trond
VFS/NFS: Fix up behaviour w.r.t. truncate() and open(O_TRUNC)
POSIX and the SUSv3 specify that open(O_TRUNC) should always bump ctime/mtime
whereas truncate() should only do so if the file size actually changes.
Fix the behaviour of NFS, which currently is broken w.r.t. open(), and fix
the VFS truncate() so that it no enforces the POSIX rules.
Signed-off-by: Trond Myklebust <[email protected]>
---
nfs/inode.c | 5 -----
open.c | 12 ++++++++++--
2 files changed, 10 insertions(+), 7 deletions(-)
Index: linux-2.6.13/fs/nfs/inode.c
===================================================================
--- linux-2.6.13.orig/fs/nfs/inode.c
+++ linux-2.6.13/fs/nfs/inode.c
@@ -833,11 +833,6 @@ nfs_setattr(struct dentry *dentry, struc
struct nfs_fattr fattr;
int error;
- if (attr->ia_valid & ATTR_SIZE) {
- if (!S_ISREG(inode->i_mode) || attr->ia_size == i_size_read(inode))
- attr->ia_valid &= ~ATTR_SIZE;
- }
-
/* Optimization: if the end result is no change, don't RPC */
attr->ia_valid &= NFS_VALID_ATTRS;
if (attr->ia_valid == 0)
Index: linux-2.6.13/fs/open.c
===================================================================
--- linux-2.6.13.orig/fs/open.c
+++ linux-2.6.13/fs/open.c
@@ -211,6 +211,14 @@ int do_truncate(struct dentry *dentry, l
return err;
}
+static inline int do_posix_truncate(struct dentry *dentry, loff_t length)
+{
+ /* In SuS/Posix lore, truncate to the current file size is a no-op */
+ if (length == i_size_read(dentry->d_inode))
+ return 0;
+ return do_truncate(dentry, length);
+}
+
static inline long do_sys_truncate(const char __user * path, loff_t length)
{
struct nameidata nd;
@@ -261,7 +269,7 @@ static inline long do_sys_truncate(const
error = locks_verify_truncate(inode, NULL, length);
if (!error) {
DQUOT_INIT(inode);
- error = do_truncate(nd.dentry, length);
+ error = do_posix_truncate(nd.dentry, length);
}
put_write_access(inode);
@@ -313,7 +321,7 @@ static inline long do_sys_ftruncate(unsi
error = locks_verify_truncate(inode, file, length);
if (!error)
- error = do_truncate(dentry, length);
+ error = do_posix_truncate(dentry, length);
out_putf:
fput(file);
out:
Trond Myklebust <[email protected]> wrote:
>
> +static inline int do_posix_truncate(struct dentry *dentry, loff_t length)
> +{
> + /* In SuS/Posix lore, truncate to the current file size is a no-op */
> + if (length == i_size_read(dentry->d_inode))
> + return 0;
> + return do_truncate(dentry, length);
> +}
We have the same optimisation in inode_setattr()...
to den 01.09.2005 Klokka 20:45 (-0700) skreiv Andrew Morton:
> Trond Myklebust <[email protected]> wrote:
> >
> > +static inline int do_posix_truncate(struct dentry *dentry, loff_t length)
> > +{
> > + /* In SuS/Posix lore, truncate to the current file size is a no-op */
> > + if (length == i_size_read(dentry->d_inode))
> > + return 0;
> > + return do_truncate(dentry, length);
> > +}
>
> We have the same optimisation in inode_setattr()...
Look again. The two are NOT the same.
The code in inode_setattr() will cause truncate() to erroneously update
the ctime/mtime.
Cheers,
Trond
Trond Myklebust <[email protected]> wrote:
>
> to den 01.09.2005 Klokka 20:45 (-0700) skreiv Andrew Morton:
> > Trond Myklebust <[email protected]> wrote:
> > >
> > > +static inline int do_posix_truncate(struct dentry *dentry, loff_t length)
> > > +{
> > > + /* In SuS/Posix lore, truncate to the current file size is a no-op */
> > > + if (length == i_size_read(dentry->d_inode))
> > > + return 0;
> > > + return do_truncate(dentry, length);
> > > +}
> >
> > We have the same optimisation in inode_setattr()...
>
> Look again. The two are NOT the same.
>
> The code in inode_setattr() will cause truncate() to erroneously update
> the ctime/mtime.
Of course. But with your patch, the optimisation in inode_setattr() is
redundant (except for O_TRUNC, perhaps).
VFS/NFS: Fix up behaviour w.r.t. truncate() and open(O_TRUNC)
POSIX and the SUSv3 specify that open(O_TRUNC) should always bump ctime/mtime
whereas truncate() should only do so if the file size actually changes.
Fix the behaviour of NFS, which currently is broken w.r.t. open(), and fix
the VFS truncate() so that it no enforces the POSIX rules.
Signed-off-by: Trond Myklebust <[email protected]>
---
attr.c | 14 +++-----------
nfs/inode.c | 5 -----
open.c | 25 +++++++++++++++++++++++--
3 files changed, 26 insertions(+), 18 deletions(-)
Index: linux-2.6.13/fs/nfs/inode.c
===================================================================
--- linux-2.6.13.orig/fs/nfs/inode.c
+++ linux-2.6.13/fs/nfs/inode.c
@@ -833,11 +833,6 @@ nfs_setattr(struct dentry *dentry, struc
struct nfs_fattr fattr;
int error;
- if (attr->ia_valid & ATTR_SIZE) {
- if (!S_ISREG(inode->i_mode) || attr->ia_size == i_size_read(inode))
- attr->ia_valid &= ~ATTR_SIZE;
- }
-
/* Optimization: if the end result is no change, don't RPC */
attr->ia_valid &= NFS_VALID_ATTRS;
if (attr->ia_valid == 0)
Index: linux-2.6.13/fs/open.c
===================================================================
--- linux-2.6.13.orig/fs/open.c
+++ linux-2.6.13/fs/open.c
@@ -206,11 +206,32 @@ int do_truncate(struct dentry *dentry, l
newattrs.ia_valid = ATTR_SIZE | ATTR_CTIME;
down(&dentry->d_inode->i_sem);
+ /* This should be used for open(O_TRUNC) and functions that need to
+ * set mtime/ctime whether or not the size changes
+ */
+ if (length == i_size_read(dentry->d_inode))
+ newattrs.ia_valid &= ~ATTR_SIZE;
err = notify_change(dentry, &newattrs);
up(&dentry->d_inode->i_sem);
return err;
}
+static int do_posix_truncate(struct dentry *dentry, loff_t length)
+{
+ int err = 0;
+ struct iattr newattrs;
+
+ newattrs.ia_size = length;
+ newattrs.ia_valid = ATTR_SIZE;
+
+ down(&dentry->d_inode->i_sem);
+ /* In SuS/Posix lore, truncate to the current file size is a no-op */
+ if (length != i_size_read(dentry->d_inode))
+ err = do_truncate(dentry, length);
+ up(&dentry->d_inode->i_sem);
+ return err;
+}
+
static inline long do_sys_truncate(const char __user * path, loff_t length)
{
struct nameidata nd;
@@ -261,7 +282,7 @@ static inline long do_sys_truncate(const
error = locks_verify_truncate(inode, NULL, length);
if (!error) {
DQUOT_INIT(inode);
- error = do_truncate(nd.dentry, length);
+ error = do_posix_truncate(nd.dentry, length);
}
put_write_access(inode);
@@ -313,7 +334,7 @@ static inline long do_sys_ftruncate(unsi
error = locks_verify_truncate(inode, file, length);
if (!error)
- error = do_truncate(dentry, length);
+ error = do_posix_truncate(dentry, length);
out_putf:
fput(file);
out:
Index: linux-2.6.13/fs/attr.c
===================================================================
--- linux-2.6.13.orig/fs/attr.c
+++ linux-2.6.13/fs/attr.c
@@ -70,17 +70,9 @@ int inode_setattr(struct inode * inode,
int error = 0;
if (ia_valid & ATTR_SIZE) {
- if (attr->ia_size != i_size_read(inode)) {
- error = vmtruncate(inode, attr->ia_size);
- if (error || (ia_valid == ATTR_SIZE))
- goto out;
- } else {
- /*
- * We skipped the truncate but must still update
- * timestamps
- */
- ia_valid |= ATTR_MTIME|ATTR_CTIME;
- }
+ error = vmtruncate(inode, attr->ia_size);
+ if (error || (ia_valid == ATTR_SIZE))
+ goto out;
}
if (ia_valid & ATTR_UID)
VFS/NFS: Fix up behaviour w.r.t. truncate() and open(O_TRUNC)
POSIX and the SUSv3 specify that open(O_TRUNC) should always bump ctime/mtime
whereas truncate() should only do so if the file size actually changes.
Fix the behaviour of NFS, which currently is broken w.r.t. open(), and fix
the VFS truncate() so that it no enforces the POSIX rules.
Signed-off-by: Trond Myklebust <[email protected]>
---
attr.c | 14 +++-----------
nfs/inode.c | 5 -----
open.c | 25 +++++++++++++++++++++++--
3 files changed, 26 insertions(+), 18 deletions(-)
Index: linux-2.6.13/fs/nfs/inode.c
===================================================================
--- linux-2.6.13.orig/fs/nfs/inode.c
+++ linux-2.6.13/fs/nfs/inode.c
@@ -833,11 +833,6 @@ nfs_setattr(struct dentry *dentry, struc
struct nfs_fattr fattr;
int error;
- if (attr->ia_valid & ATTR_SIZE) {
- if (!S_ISREG(inode->i_mode) || attr->ia_size == i_size_read(inode))
- attr->ia_valid &= ~ATTR_SIZE;
- }
-
/* Optimization: if the end result is no change, don't RPC */
attr->ia_valid &= NFS_VALID_ATTRS;
if (attr->ia_valid == 0)
Index: linux-2.6.13/fs/open.c
===================================================================
--- linux-2.6.13.orig/fs/open.c
+++ linux-2.6.13/fs/open.c
@@ -206,11 +206,32 @@ int do_truncate(struct dentry *dentry, l
newattrs.ia_valid = ATTR_SIZE | ATTR_CTIME;
down(&dentry->d_inode->i_sem);
+ /* This should be used for open(O_TRUNC) and functions that need to
+ * set mtime/ctime whether or not the size changes
+ */
+ if (length == i_size_read(dentry->d_inode))
+ newattrs.ia_valid &= ~ATTR_SIZE;
err = notify_change(dentry, &newattrs);
up(&dentry->d_inode->i_sem);
return err;
}
+static int do_posix_truncate(struct dentry *dentry, loff_t length)
+{
+ int err = 0;
+ struct iattr newattrs;
+
+ newattrs.ia_size = length;
+ newattrs.ia_valid = ATTR_SIZE;
+
+ down(&dentry->d_inode->i_sem);
+ /* In SuS/Posix lore, truncate to the current file size is a no-op */
+ if (length != i_size_read(dentry->d_inode))
+ err = notify_change(dentry, &newattrs);
+ up(&dentry->d_inode->i_sem);
+ return err;
+}
+
static inline long do_sys_truncate(const char __user * path, loff_t length)
{
struct nameidata nd;
@@ -261,7 +282,7 @@ static inline long do_sys_truncate(const
error = locks_verify_truncate(inode, NULL, length);
if (!error) {
DQUOT_INIT(inode);
- error = do_truncate(nd.dentry, length);
+ error = do_posix_truncate(nd.dentry, length);
}
put_write_access(inode);
@@ -313,7 +334,7 @@ static inline long do_sys_ftruncate(unsi
error = locks_verify_truncate(inode, file, length);
if (!error)
- error = do_truncate(dentry, length);
+ error = do_posix_truncate(dentry, length);
out_putf:
fput(file);
out:
Index: linux-2.6.13/fs/attr.c
===================================================================
--- linux-2.6.13.orig/fs/attr.c
+++ linux-2.6.13/fs/attr.c
@@ -70,17 +70,9 @@ int inode_setattr(struct inode * inode,
int error = 0;
if (ia_valid & ATTR_SIZE) {
- if (attr->ia_size != i_size_read(inode)) {
- error = vmtruncate(inode, attr->ia_size);
- if (error || (ia_valid == ATTR_SIZE))
- goto out;
- } else {
- /*
- * We skipped the truncate but must still update
- * timestamps
- */
- ia_valid |= ATTR_MTIME|ATTR_CTIME;
- }
+ error = vmtruncate(inode, attr->ia_size);
+ if (error || (ia_valid == ATTR_SIZE))
+ goto out;
}
if (ia_valid & ATTR_UID)
Trond Myklebust <[email protected]> wrote:
>
Looks good from a quick scan.
> +static int do_posix_truncate(struct dentry *dentry, loff_t length)
> +{
> + int err = 0;
> + struct iattr newattrs;
> +
> + newattrs.ia_size = length;
> + newattrs.ia_valid = ATTR_SIZE;
> +
> + down(&dentry->d_inode->i_sem);
> + /* In SuS/Posix lore, truncate to the current file size is a no-op */
> + if (length != i_size_read(dentry->d_inode))
> + err = notify_change(dentry, &newattrs);
> + up(&dentry->d_inode->i_sem);
> + return err;
> +}
I'm not sure that we really need to bother putting the i_size test inside
i_sem. If somebody is changing the file size in parallel with truncate
then they'll get unpredictable results anyway. Needs deep thought.
On Thu, Sep 01, 2005 at 11:43:17PM -0400, Trond Myklebust wrote:
> to den 01.09.2005 Klokka 19:38 (-0400) skreiv Trond Myklebust:
> > This is a consequence of 2.6 NFS clients optimising away unnecessary
> > truncate calls. Whereas this is correct behaviour for truncate(), it
> > appears to be incorrect for open(O_TRUNC).
> > In fact, local filesystems like xfs and ext3 appear to have the opposite
> > problem: they change ctime if you call ftruncate(0) on the zero-length
> > file, as the attached test shows.
The following patch fixes the problem, at least when applied against
2.6.8:
--- inode.c.orig 2005-08-31 16:54:27.000000000 -0600
+++ inode.c 2005-08-31 17:06:52.000000000 -0600
@@ -756,7 +756,7 @@
int error;
if (attr->ia_valid & ATTR_SIZE) {
- if (!S_ISREG(inode->i_mode) || attr->ia_size == i_size_read(inode))
+ if (!S_ISREG(inode->i_mode))
attr->ia_valid &= ~ATTR_SIZE;
}
> Could you please check if the following patch fixes NFS (and also the
> local filesystems) for you?
I'll try the latest in the flood today. Thanks for all the help!
--
Rob
On Fri, Sep 02, 2005 at 12:19:07AM -0400, Trond Myklebust wrote:
> fr den 02.09.2005 Klokka 00:15 (-0400) skreiv Trond Myklebust:
>
> > Sure. The other problem is that the test is made before the i_sem is
> > grabbed. OK, so how about the appended patch instead?
>
> Doh!
>
> Trond
> VFS/NFS: Fix up behaviour w.r.t. truncate() and open(O_TRUNC)
>
> POSIX and the SUSv3 specify that open(O_TRUNC) should always bump ctime/mtime
> whereas truncate() should only do so if the file size actually changes.
>
> Fix the behaviour of NFS, which currently is broken w.r.t. open(), and fix
> the VFS truncate() so that it no enforces the POSIX rules.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> attr.c | 14 +++-----------
> nfs/inode.c | 5 -----
> open.c | 25 +++++++++++++++++++++++--
> 3 files changed, 26 insertions(+), 18 deletions(-)
This patch does not fix the original issue - timestamps are not updated
as expected.
--
Rob