2005-10-31 06:34:50

by NeilBrown

[permalink] [raw]
Subject: [PATCH against 2.6.14] truncate() or ftruncate shouldn't change mtime if size doesn't change.



According to Posix and SUS, truncate(2) and ftruncate(2) only update
ctime and mtime if the size actually changes. Linux doesn't currently
obey this.

There is no need to test the size under i_sem, as loosing any race
will not make a noticable different the mtime or ctime.

(According to SUS, truncate and ftruncate 'may' clear setuid/setgid
as well, currently we don't. Should we?
)


Signed-off-by: Neil Brown <[email protected]>

### Diffstat output
./fs/open.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff ./fs/open.c~current~ ./fs/open.c
--- ./fs/open.c~current~ 2005-10-31 16:22:44.000000000 +1100
+++ ./fs/open.c 2005-10-31 16:22:44.000000000 +1100
@@ -260,7 +260,8 @@ static inline long do_sys_truncate(const
goto dput_and_out;

error = locks_verify_truncate(inode, NULL, length);
- if (!error) {
+ if (!error &&
+ length != i_size_read(dentry->d_inode)) {
DQUOT_INIT(inode);
error = do_truncate(nd.dentry, length);
}
@@ -313,7 +314,8 @@ static inline long do_sys_ftruncate(unsi
goto out_putf;

error = locks_verify_truncate(inode, file, length);
- if (!error)
+ if (!error &&
+ length != i_size_read(dentry->d_inode))
error = do_truncate(dentry, length);
out_putf:
fput(file);


2005-10-31 06:49:11

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH against 2.6.14] truncate() or ftruncate shouldn't change mtime if size doesn't change.

NeilBrown <[email protected]> wrote:
>
>
>
> According to Posix and SUS, truncate(2) and ftruncate(2) only update
> ctime and mtime if the size actually changes. Linux doesn't currently
> obey this.
>
> There is no need to test the size under i_sem, as loosing any race
> will not make a noticable different the mtime or ctime.

Well if there's a race then the file may end up not being truncated after
this patch is applied. But that could have happened anwyay, so I don't see
a need for i_sem synchronisation either.

> (According to SUS, truncate and ftruncate 'may' clear setuid/setgid
> as well, currently we don't. Should we?
> )
>
>
> Signed-off-by: Neil Brown <[email protected]>
>
> ### Diffstat output
> ./fs/open.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff ./fs/open.c~current~ ./fs/open.c
> --- ./fs/open.c~current~ 2005-10-31 16:22:44.000000000 +1100
> +++ ./fs/open.c 2005-10-31 16:22:44.000000000 +1100
> @@ -260,7 +260,8 @@ static inline long do_sys_truncate(const
> goto dput_and_out;
>
> error = locks_verify_truncate(inode, NULL, length);
> - if (!error) {
> + if (!error &&
> + length != i_size_read(dentry->d_inode)) {

Odd code layout?

> DQUOT_INIT(inode);
> error = do_truncate(nd.dentry, length);
> }
> @@ -313,7 +314,8 @@ static inline long do_sys_ftruncate(unsi
> goto out_putf;
>
> error = locks_verify_truncate(inode, file, length);
> - if (!error)
> + if (!error &&
> + length != i_size_read(dentry->d_inode))
> error = do_truncate(dentry, length);
> out_putf:
> fput(file);

This partially obsoletes the similar optimisation in inode_setattr(). I
guess the optimisation there retains some usefulness for O_TRUNC opens of
zero-length files, but for symettry and micro-efficiency, perhaps we should
remvoe the inode_setattr() test and check for i_size==0 in may_open()?


2005-10-31 09:49:14

by Anton Altaparmakov

[permalink] [raw]
Subject: Re: [PATCH against 2.6.14] truncate() or ftruncate shouldn't change mtime if size doesn't change.

Hi,

On Sun, 2005-10-30 at 23:48 -0800, Andrew Morton wrote:
> NeilBrown <[email protected]> wrote:
> > According to Posix and SUS, truncate(2) and ftruncate(2) only update
> > ctime and mtime if the size actually changes. Linux doesn't currently
> > obey this.
> >
> > There is no need to test the size under i_sem, as loosing any race
> > will not make a noticable different the mtime or ctime.
>
> Well if there's a race then the file may end up not being truncated after
> this patch is applied. But that could have happened anwyay, so I don't see
> a need for i_sem synchronisation either.
>
> > (According to SUS, truncate and ftruncate 'may' clear setuid/setgid
> > as well, currently we don't. Should we?
> > )
> >
> >
> > Signed-off-by: Neil Brown <[email protected]>
> >
> > ### Diffstat output
> > ./fs/open.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff ./fs/open.c~current~ ./fs/open.c
> > --- ./fs/open.c~current~ 2005-10-31 16:22:44.000000000 +1100
> > +++ ./fs/open.c 2005-10-31 16:22:44.000000000 +1100
> > @@ -260,7 +260,8 @@ static inline long do_sys_truncate(const
> > goto dput_and_out;
> >
> > error = locks_verify_truncate(inode, NULL, length);
> > - if (!error) {
> > + if (!error &&
> > + length != i_size_read(dentry->d_inode)) {
>
> Odd code layout?
>
> > DQUOT_INIT(inode);
> > error = do_truncate(nd.dentry, length);
> > }
> > @@ -313,7 +314,8 @@ static inline long do_sys_ftruncate(unsi
> > goto out_putf;
> >
> > error = locks_verify_truncate(inode, file, length);
> > - if (!error)
> > + if (!error &&
> > + length != i_size_read(dentry->d_inode))
> > error = do_truncate(dentry, length);
> > out_putf:
> > fput(file);
>
> This partially obsoletes the similar optimisation in inode_setattr(). I
> guess the optimisation there retains some usefulness for O_TRUNC opens of
> zero-length files, but for symettry and micro-efficiency, perhaps we should
> remvoe the inode_setattr() test and check for i_size==0 in may_open()?

Sounds like a good idea. That does simplify inode_setattr() nicely...

Signed-off-by: Anton Altaparmakov <[email protected]>

--- linux-2.6/fs/attr.c.old 2005-10-31 09:29:38.000000000 +0000
+++ linux-2.6/fs/attr.c 2005-10-31 09:30:39.000000000 +0000
@@ -70,19 +70,10 @@ int inode_setattr(struct inode * inode,
int error = 0;

if (ia_valid & ATTR_SIZE) {
- if (attr->ia_size != i_size_read(inode)) {
- error = vmtruncate(inode, attr->ia_size);
- if (error || (ia_valid == ATTR_SIZE))
- goto out;
- } else {
- /*
- * We skipped the truncate but must still update
- * timestamps
- */
- ia_valid |= ATTR_MTIME|ATTR_CTIME;
- }
+ error = vmtruncate(inode, attr->ia_size);
+ if (error || (ia_valid == ATTR_SIZE))
+ goto out;
}
-
if (ia_valid & ATTR_UID)
inode->i_uid = attr->ia_uid;
if (ia_valid & ATTR_GID)

btw. Is it actually correct that we "goto out;" when "ia_valid ==
ATTR_SIZE"? That way we skip the mark_inode_dirty() call just before
the "out" label...

For ntfs at least that is fine because ntfs does an
"inode_update_time(inode, 1)" unconditionally in ntfs_truncate() even
when the size has not changed which calls mark_inode_dirty_sync() and
when the size changes it also does a "__mark_inode_dirty(inode,
I_DIRTY_SYNC | I_DIRTY_DATASYNC);" but I am not sure all filesystems are
fine in that respect?

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

2005-10-31 10:17:39

by Anton Altaparmakov

[permalink] [raw]
Subject: Re: [PATCH against 2.6.14] truncate() or ftruncate shouldn't change mtime if size doesn't change.

On Mon, 2005-10-31 at 09:48 +0000, Anton Altaparmakov wrote:
> On Sun, 2005-10-30 at 23:48 -0800, Andrew Morton wrote:
> > NeilBrown <[email protected]> wrote:
> > > According to Posix and SUS, truncate(2) and ftruncate(2) only update
> > > ctime and mtime if the size actually changes. Linux doesn't currently
> > > obey this.
> > >
> > > There is no need to test the size under i_sem, as loosing any race
> > > will not make a noticable different the mtime or ctime.
> >
> > Well if there's a race then the file may end up not being truncated after
> > this patch is applied. But that could have happened anwyay, so I don't see
> > a need for i_sem synchronisation either.
> >
> > > (According to SUS, truncate and ftruncate 'may' clear setuid/setgid
> > > as well, currently we don't. Should we?
> > > )
> > >
> > >
> > > Signed-off-by: Neil Brown <[email protected]>
> > >
> > > ### Diffstat output
> > > ./fs/open.c | 6 ++++--
> > > 1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff ./fs/open.c~current~ ./fs/open.c
> > > --- ./fs/open.c~current~ 2005-10-31 16:22:44.000000000 +1100
> > > +++ ./fs/open.c 2005-10-31 16:22:44.000000000 +1100
> > > @@ -260,7 +260,8 @@ static inline long do_sys_truncate(const
> > > goto dput_and_out;
> > >
> > > error = locks_verify_truncate(inode, NULL, length);
> > > - if (!error) {
> > > + if (!error &&
> > > + length != i_size_read(dentry->d_inode)) {
> >
> > Odd code layout?
> >
> > > DQUOT_INIT(inode);
> > > error = do_truncate(nd.dentry, length);
> > > }
> > > @@ -313,7 +314,8 @@ static inline long do_sys_ftruncate(unsi
> > > goto out_putf;
> > >
> > > error = locks_verify_truncate(inode, file, length);
> > > - if (!error)
> > > + if (!error &&
> > > + length != i_size_read(dentry->d_inode))
> > > error = do_truncate(dentry, length);
> > > out_putf:
> > > fput(file);
> >
> > This partially obsoletes the similar optimisation in inode_setattr(). I
> > guess the optimisation there retains some usefulness for O_TRUNC opens of
> > zero-length files, but for symettry and micro-efficiency, perhaps we should
> > remvoe the inode_setattr() test and check for i_size==0 in may_open()?
>
> Sounds like a good idea. That does simplify inode_setattr() nicely...

D'oh! I forgot to append the may_open diff! Below is full patch...

> Signed-off-by: Anton Altaparmakov <[email protected]>
>
> --- linux-2.6/fs/attr.c.old 2005-10-31 09:29:38.000000000 +0000
> +++ linux-2.6/fs/attr.c 2005-10-31 09:30:39.000000000 +0000
> @@ -70,19 +70,10 @@ int inode_setattr(struct inode * inode,
> int error = 0;
>
> if (ia_valid & ATTR_SIZE) {
> - if (attr->ia_size != i_size_read(inode)) {
> - error = vmtruncate(inode, attr->ia_size);
> - if (error || (ia_valid == ATTR_SIZE))
> - goto out;
> - } else {
> - /*
> - * We skipped the truncate but must still update
> - * timestamps
> - */
> - ia_valid |= ATTR_MTIME|ATTR_CTIME;
> - }
> + error = vmtruncate(inode, attr->ia_size);
> + if (error || (ia_valid == ATTR_SIZE))
> + goto out;
> }
> -
> if (ia_valid & ATTR_UID)
> inode->i_uid = attr->ia_uid;
> if (ia_valid & ATTR_GID)
>
> btw. Is it actually correct that we "goto out;" when "ia_valid ==
> ATTR_SIZE"? That way we skip the mark_inode_dirty() call just before
> the "out" label...
>
> For ntfs at least that is fine because ntfs does an
> "inode_update_time(inode, 1)" unconditionally in ntfs_truncate() even
> when the size has not changed which calls mark_inode_dirty_sync() and
> when the size changes it also does a "__mark_inode_dirty(inode,
> I_DIRTY_SYNC | I_DIRTY_DATASYNC);" but I am not sure all filesystems are
> fine in that respect?

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

---

From: Anton Altaparmakov <[email protected]>

In may_open() only truncate when O_TRUNC is set and the file size is not
zero.

And in inode_setattr() always do the truncate as we catch the file size
not changing case earlier. (And on a race condition the filesystem will
notice that the size is not changing...)

Signed-off-by: Anton Altaparmakov <[email protected]>

--- linux-2.6/fs/attr.c.old 2005-10-31 09:29:38.000000000 +0000
+++ linux-2.6/fs/attr.c 2005-10-31 09:30:39.000000000 +0000
@@ -70,19 +70,10 @@ int inode_setattr(struct inode * inode,
int error = 0;

if (ia_valid & ATTR_SIZE) {
- if (attr->ia_size != i_size_read(inode)) {
- error = vmtruncate(inode, attr->ia_size);
- if (error || (ia_valid == ATTR_SIZE))
- goto out;
- } else {
- /*
- * We skipped the truncate but must still update
- * timestamps
- */
- ia_valid |= ATTR_MTIME|ATTR_CTIME;
- }
+ error = vmtruncate(inode, attr->ia_size);
+ if (error || (ia_valid == ATTR_SIZE))
+ goto out;
}
-
if (ia_valid & ATTR_UID)
inode->i_uid = attr->ia_uid;
if (ia_valid & ATTR_GID)
--- linux-2.6/fs/namei.c.old 2005-10-31 09:28:38.000000000 +0000
+++ linux-2.6/fs/namei.c 2005-10-31 09:30:39.000000000 +0000
@@ -1447,7 +1447,7 @@ int may_open(struct nameidata *nd, int a
if (error)
return error;

- if (flag & O_TRUNC) {
+ if (flag & O_TRUNC && i_size_read(inode)) {
error = get_write_access(inode);
if (error)
return error;


2005-10-31 10:30:23

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH against 2.6.14] truncate() or ftruncate shouldn't change mtime if size doesn't change.

On Monday October 31, [email protected] wrote:
> >
> > This partially obsoletes the similar optimisation in inode_setattr(). I
> > guess the optimisation there retains some usefulness for O_TRUNC opens of
> > zero-length files, but for symettry and micro-efficiency, perhaps we should
> > remvoe the inode_setattr() test and check for i_size==0 in may_open()?
>
> Sounds like a good idea. That does simplify inode_setattr() nicely...
>
> Signed-off-by: Anton Altaparmakov <[email protected]>
>
> --- linux-2.6/fs/attr.c.old 2005-10-31 09:29:38.000000000 +0000
> +++ linux-2.6/fs/attr.c 2005-10-31 09:30:39.000000000 +0000
> @@ -70,19 +70,10 @@ int inode_setattr(struct inode * inode,
> int error = 0;
>
> if (ia_valid & ATTR_SIZE) {
> - if (attr->ia_size != i_size_read(inode)) {
> - error = vmtruncate(inode, attr->ia_size);
> - if (error || (ia_valid == ATTR_SIZE))
> - goto out;
> - } else {
> - /*
> - * We skipped the truncate but must still update
> - * timestamps
> - */
> - ia_valid |= ATTR_MTIME|ATTR_CTIME;
> - }
> + error = vmtruncate(inode, attr->ia_size);
> + if (error || (ia_valid == ATTR_SIZE))
> + goto out;
> }
> -
> if (ia_valid & ATTR_UID)
> inode->i_uid = attr->ia_uid;
> if (ia_valid & ATTR_GID)
>
> btw. Is it actually correct that we "goto out;" when "ia_valid ==
> ATTR_SIZE"? That way we skip the mark_inode_dirty() call just before
> the "out" label...
>
> For ntfs at least that is fine because ntfs does an
> "inode_update_time(inode, 1)" unconditionally in ntfs_truncate() even
> when the size has not changed which calls mark_inode_dirty_sync() and
> when the size changes it also does a "__mark_inode_dirty(inode,
> I_DIRTY_SYNC | I_DIRTY_DATASYNC);" but I am not sure all filesystems are
> fine in that respect?
>

I think the 'goto' is fine as presumably an error from vmtruncate
means that nothing was changed, and as there are no other attr bits,
there is no need to mark_inode_dirty.

However, there always will be other ATTR bits as whenever we set
ATTR_SIZE (do_truncate and nfsd_setattr) we also set ATTR_CTIME,
so this "optimisation" is just a waste of space.
Best make it:

> if (ia_valid & ATTR_SIZE)
> error = vmtruncate(inode, attr->ia_size);

and assume that the filesystem's ->truncate or ->setattr will still
set mtime if the size doesn't change (->truncate would have a hard
time knowing if it has changed or not, so it has to set mtime
unconditionally if it exists .. ->setattr... probably does the right
thing)

NeilBrown

2005-10-31 10:41:46

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH against 2.6.14] truncate() or ftruncate shouldn't change mtime if size doesn't change.

On Monday October 31, [email protected] wrote:
>
> In may_open() only truncate when O_TRUNC is set and the file size is not
> zero.

No, this is wrong. open( ..,O_TRUNC) needs to update the mtime, even
if the file is already size==0.

What started me looking at this is that open( O_TRUNC) over NFS
*doesn't* update the mtime on an empty file, which is inconsistent
with local file systems, and dis-obeys SUS:

http://www.opengroup.org/onlinepubs/007908799/xsh/open.html

"If O_TRUNC is set and the file did previously exist, upon
successful completion, open() will mark for update the st_ctime and
st_mtime fields of the file. "

So we DO NOT want this change to may_open (and we DO want a different
change in NFS which I have asked Trond to submit... To be honest, this
first came up a couple of months ago, and Trond suggested a patch
then, but it didn't get any further. I'm just trying to make sure the
important bits do make it into the kernel..
http://lkml.org/lkml/2005/8/31/134
)

NeilBrown



> --- linux-2.6/fs/namei.c.old 2005-10-31 09:28:38.000000000 +0000
> +++ linux-2.6/fs/namei.c 2005-10-31 09:30:39.000000000 +0000
> @@ -1447,7 +1447,7 @@ int may_open(struct nameidata *nd, int a
> if (error)
> return error;
>
> - if (flag & O_TRUNC) {
> + if (flag & O_TRUNC && i_size_read(inode)) {
> error = get_write_access(inode);
> if (error)
> return error;
>

2005-10-31 10:43:09

by Anton Altaparmakov

[permalink] [raw]
Subject: Re: [PATCH against 2.6.14] truncate() or ftruncate shouldn't change mtime if size doesn't change.

On Mon, 2005-10-31 at 21:30 +1100, Neil Brown wrote:
> On Monday October 31, [email protected] wrote:
> > >
> > > This partially obsoletes the similar optimisation in inode_setattr(). I
> > > guess the optimisation there retains some usefulness for O_TRUNC opens of
> > > zero-length files, but for symettry and micro-efficiency, perhaps we should
> > > remvoe the inode_setattr() test and check for i_size==0 in may_open()?
> >
> > Sounds like a good idea. That does simplify inode_setattr() nicely...
> >
> > Signed-off-by: Anton Altaparmakov <[email protected]>
> >
> > --- linux-2.6/fs/attr.c.old 2005-10-31 09:29:38.000000000 +0000
> > +++ linux-2.6/fs/attr.c 2005-10-31 09:30:39.000000000 +0000
> > @@ -70,19 +70,10 @@ int inode_setattr(struct inode * inode,
> > int error = 0;
> >
> > if (ia_valid & ATTR_SIZE) {
> > - if (attr->ia_size != i_size_read(inode)) {
> > - error = vmtruncate(inode, attr->ia_size);
> > - if (error || (ia_valid == ATTR_SIZE))
> > - goto out;
> > - } else {
> > - /*
> > - * We skipped the truncate but must still update
> > - * timestamps
> > - */
> > - ia_valid |= ATTR_MTIME|ATTR_CTIME;
> > - }
> > + error = vmtruncate(inode, attr->ia_size);
> > + if (error || (ia_valid == ATTR_SIZE))
> > + goto out;
> > }
> > -
> > if (ia_valid & ATTR_UID)
> > inode->i_uid = attr->ia_uid;
> > if (ia_valid & ATTR_GID)
> >
> > btw. Is it actually correct that we "goto out;" when "ia_valid ==
> > ATTR_SIZE"? That way we skip the mark_inode_dirty() call just before
> > the "out" label...
> >
> > For ntfs at least that is fine because ntfs does an
> > "inode_update_time(inode, 1)" unconditionally in ntfs_truncate() even
> > when the size has not changed which calls mark_inode_dirty_sync() and
> > when the size changes it also does a "__mark_inode_dirty(inode,
> > I_DIRTY_SYNC | I_DIRTY_DATASYNC);" but I am not sure all filesystems are
> > fine in that respect?
>
> I think the 'goto' is fine as presumably an error from vmtruncate
> means that nothing was changed, and as there are no other attr bits,
> there is no need to mark_inode_dirty.

But is it not "and" it is "or", so when there is no error, i.e. changes
have happened, and no other attr bits are set, the inode is not marked
dirty. And if a filesystem does not mark it dirty itself, the inode
will not get marked dirty at all and strange things may result...

> However, there always will be other ATTR bits as whenever we set
> ATTR_SIZE (do_truncate and nfsd_setattr) we also set ATTR_CTIME,
> so this "optimisation" is just a waste of space.
> Best make it:
>
> > if (ia_valid & ATTR_SIZE)
> > error = vmtruncate(inode, attr->ia_size);
>
> and assume that the filesystem's ->truncate or ->setattr will still
> set mtime if the size doesn't change (->truncate would have a hard
> time knowing if it has changed or not, so it has to set mtime
> unconditionally if it exists .. ->setattr... probably does the right
> thing)

That is possible. It does change the behaviour though. Now you always
get the inode marked dirty where as before you did not. Mind you I
think it is possibly wrong now so changing it as you suggest would
definitely make it right.

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

2005-10-31 12:56:12

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH against 2.6.14] truncate() or ftruncate shouldn't change mtime if size doesn't change.

On Mon, 2005-10-31 at 17:34 +1100, NeilBrown wrote:
>
> According to Posix and SUS, truncate(2) and ftruncate(2) only update
> ctime and mtime if the size actually changes. Linux doesn't currently
> obey this.
>
> There is no need to test the size under i_sem, as loosing any race
> will not make a noticable different the mtime or ctime.
>
> (According to SUS, truncate and ftruncate 'may' clear setuid/setgid
> as well, currently we don't. Should we?
> )
>
>
> Signed-off-by: Neil Brown <[email protected]>
>
> ### Diffstat output
> ./fs/open.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff ./fs/open.c~current~ ./fs/open.c
> --- ./fs/open.c~current~ 2005-10-31 16:22:44.000000000 +1100
> +++ ./fs/open.c 2005-10-31 16:22:44.000000000 +1100
> @@ -260,7 +260,8 @@ static inline long do_sys_truncate(const
> goto dput_and_out;
>
> error = locks_verify_truncate(inode, NULL, length);
> - if (!error) {
> + if (!error &&
> + length != i_size_read(dentry->d_inode)) {

This has me worried because it is putting yet another dependency on
cached attributes in the VFS.
This should normally be OK as far as NFS is concerned since we usually
end up revalidating the attribute cache in the lookup() code, but you
could imagine a networked filesystem that does not do this. I'd
therefore prefer if such checks were made in the filesystem itself.

What we can, however, do is to ensure that truncate() and ftruncate()
only set ATTR_SIZE, but ensure that may_open() sets ATTR_MTIME|
ATTR_CTIME as well.

Cheers,
Trond

2005-10-31 13:37:42

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH against 2.6.14] truncate() or ftruncate shouldn't change mtime if size doesn't change.

On Mon, 2005-10-31 at 07:51 -0500, Trond Myklebust wrote:
> This should normally be OK as far as NFS is concerned since we usually
> end up revalidating the attribute cache in the lookup() code, but you
> could imagine a networked filesystem that does not do this. I'd
> therefore prefer if such checks were made in the filesystem itself.

Actually, I'm wrong about this. ftruncate() doesn't do a
lookup/lookup_revalidate, so the current NFS code may end up using stale
attribute data.

Moving the checks to the VFS will just make that problem unfixable,
though.

Cheers,
Trond

2005-10-31 22:53:27

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH against 2.6.14] truncate() or ftruncate shouldn't change mtime if size doesn't change.

On Monday October 31, [email protected] wrote:
>
> What we can, however, do is to ensure that truncate() and ftruncate()
> only set ATTR_SIZE, but ensure that may_open() sets ATTR_MTIME|
> ATTR_CTIME as well.

Thanks. This makes lots of sense, in that it gives power to the
filesystem. We should keep the optimisations in placed where the
filesystem can over-ride them, just as inode_setattr.

So, here is a revised patch.
Comment?

Thanks,
NeilBrown


------------------
Fix some problems with truncate and mtime semantics.

SUS requires that when truncating a file to the size that it currently
is:
truncate and ftruncate should NOT modify ctime or mtime
O_TRUNC SHOULD modify ctime and mtime.

Currently mtime and ctime are always modified on most local
filesystems (side effect of ->truncate) or never modified (on NFS).

With this patch:
ATTR_CTIME|ATTR_MTIME are sent with ATTR_SIZE precisely when
an update of these times is required whether size changes or not
(via a new argument to do_truncate). This allows NFS to do
the right thing for O_TRUNC.
inode_setattr no longer forces ATTR_MTIME|ATTR_CTIME when the ATTR_SIZE
sets the size to it's current value. This allows local filesystems
to do the right thing for f?truncate.

Also, the logic in inode_setattr is changed a bit so there are two
return points. One returns the error from vmtruncate if it failed,
the other returns 0 (there can be no other failure).

Finally, if vmtruncate succeeds, and ATTR_SIZE is the only change
requested, we now fall-through and mark_inode_dirty. If a filesystem
did not have a ->truncate function, then vmtruncate will have changed
i_size, without marking the inode as 'dirty', and I think this is wrong.


Signed-off-by: Neil Brown <[email protected]>

### Diffstat output
./fs/attr.c | 22 +++++++---------------
./fs/exec.c | 2 +-
./fs/namei.c | 2 +-
./fs/open.c | 8 ++++----
./include/linux/fs.h | 2 +-
5 files changed, 14 insertions(+), 22 deletions(-)

diff ./fs/attr.c~current~ ./fs/attr.c
--- ./fs/attr.c~current~ 2005-11-01 09:42:22.000000000 +1100
+++ ./fs/attr.c 2005-11-01 09:22:40.000000000 +1100
@@ -67,20 +67,12 @@ EXPORT_SYMBOL(inode_change_ok);
int inode_setattr(struct inode * inode, struct iattr * attr)
{
unsigned int ia_valid = attr->ia_valid;
- int error = 0;

- if (ia_valid & ATTR_SIZE) {
- if (attr->ia_size != i_size_read(inode)) {
- error = vmtruncate(inode, attr->ia_size);
- if (error || (ia_valid == ATTR_SIZE))
- goto out;
- } else {
- /*
- * We skipped the truncate but must still update
- * timestamps
- */
- ia_valid |= ATTR_MTIME|ATTR_CTIME;
- }
+ if (ia_valid & ATTR_SIZE &&
+ attr->ia_size != i_size_read(inode)) {
+ int error = vmtruncate(inode, attr->ia_size);
+ if (error)
+ return error;
}

if (ia_valid & ATTR_UID)
@@ -104,8 +96,8 @@ int inode_setattr(struct inode * inode,
inode->i_mode = mode;
}
mark_inode_dirty(inode);
-out:
- return error;
+
+ return 0;
}
EXPORT_SYMBOL(inode_setattr);


diff ./fs/exec.c~current~ ./fs/exec.c
--- ./fs/exec.c~current~ 2005-11-01 09:42:22.000000000 +1100
+++ ./fs/exec.c 2005-11-01 09:19:35.000000000 +1100
@@ -1510,7 +1510,7 @@ int do_coredump(long signr, int exit_cod
goto close_fail;
if (!file->f_op->write)
goto close_fail;
- if (do_truncate(file->f_dentry, 0) != 0)
+ if (do_truncate(file->f_dentry, 0, 0) != 0)
goto close_fail;

retval = binfmt->core_dump(signr, regs, file);

diff ./fs/namei.c~current~ ./fs/namei.c
--- ./fs/namei.c~current~ 2005-11-01 09:42:22.000000000 +1100
+++ ./fs/namei.c 2005-11-01 09:24:34.000000000 +1100
@@ -1459,7 +1459,7 @@ int may_open(struct nameidata *nd, int a
if (!error) {
DQUOT_INIT(inode);

- error = do_truncate(dentry, 0);
+ error = do_truncate(dentry, 0, ATTR_MTIME|ATTR_CTIME);
}
put_write_access(inode);
if (error)

diff ./fs/open.c~current~ ./fs/open.c
--- ./fs/open.c~current~ 2005-11-01 09:42:22.000000000 +1100
+++ ./fs/open.c 2005-11-01 09:42:26.000000000 +1100
@@ -194,7 +194,7 @@ out:
return error;
}

-int do_truncate(struct dentry *dentry, loff_t length)
+int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs)
{
int err;
struct iattr newattrs;
@@ -204,7 +204,7 @@ int do_truncate(struct dentry *dentry, l
return -EINVAL;

newattrs.ia_size = length;
- newattrs.ia_valid = ATTR_SIZE | ATTR_CTIME;
+ newattrs.ia_valid = ATTR_SIZE | time_attrs;

down(&dentry->d_inode->i_sem);
err = notify_change(dentry, &newattrs);
@@ -262,7 +262,7 @@ static inline long do_sys_truncate(const
error = locks_verify_truncate(inode, NULL, length);
if (!error) {
DQUOT_INIT(inode);
- error = do_truncate(nd.dentry, length);
+ error = do_truncate(nd.dentry, length, 0);
}
put_write_access(inode);

@@ -314,7 +314,7 @@ static inline long do_sys_ftruncate(unsi

error = locks_verify_truncate(inode, file, length);
if (!error)
- error = do_truncate(dentry, length);
+ error = do_truncate(dentry, length, 0);
out_putf:
fput(file);
out:

diff ./include/linux/fs.h~current~ ./include/linux/fs.h
--- ./include/linux/fs.h~current~ 2005-11-01 09:42:22.000000000 +1100
+++ ./include/linux/fs.h 2005-11-01 09:18:35.000000000 +1100
@@ -1295,7 +1295,7 @@ static inline int break_lease(struct ino

/* fs/open.c */

-extern int do_truncate(struct dentry *, loff_t start);
+extern int do_truncate(struct dentry *, loff_t start, unsigned int time_attrs);
extern long do_sys_open(const char __user *filename, int flags, int mode);
extern struct file *filp_open(const char *, int, int);
extern struct file * dentry_open(struct dentry *, struct vfsmount *, int);

2005-11-01 00:40:37

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH against 2.6.14] truncate() or ftruncate shouldn't change mtime if size doesn't change.

On Tue, 2005-11-01 at 09:53 +1100, Neil Brown wrote:
> On Monday October 31, [email protected] wrote:
> >
> > What we can, however, do is to ensure that truncate() and ftruncate()
> > only set ATTR_SIZE, but ensure that may_open() sets ATTR_MTIME|
> > ATTR_CTIME as well.
>
> Thanks. This makes lots of sense, in that it gives power to the
> filesystem. We should keep the optimisations in placed where the
> filesystem can over-ride them, just as inode_setattr.
>
> So, here is a revised patch.
> Comment?

Just one. Don't most of the inode->i_op->truncate() methods on local
filesystems call mark_inode_dirty()?
If so, then the call at the end of inode_setattr() will be superfluous
for the case where ia_valid==ATTR_SIZE, since vmtruncate() will dirty
the inode for us.

Otherwise, this patch looks great!

Cheers,
Trond