2010-06-10 03:54:29

Subject: [PATCH v2] ocfs2: Let ocfs2_setattr use new truncate sequence.

Let ocfs2 use the new truncate sequence. The changes include:
1. Remove the extra check for inode_newsize_ok since Christoph
has moved it into inode_change_ok. So we will check it at the
beginning of ocfs2_setattr.
2. Use truncate_setsize directly since we don't implement our
own ->truncate and what we need is "update i_size and
truncate_pagecache" which truncate_setsize now does.
3. For direct write, ocfs2 actually doesn't allow a write to pass
i_size (see ocfs2_prepare_inode_for_write), so we don't have
a chance to increase i_size. So remove the bogus check.

Cc: Joel Becker <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Nick Piggin <[email protected]>
Signed-off-by: Tao Ma <[email protected]>
---
fs/ocfs2/file.c | 34 +++++-----------------------------
1 files changed, 5 insertions(+), 29 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 1fb0985..764fffb 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -983,10 +983,6 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
}

if (size_change && attr->ia_size != i_size_read(inode)) {
- status = inode_newsize_ok(inode, attr->ia_size);
- if (status)
- goto bail_unlock;
-
if (i_size_read(inode) > attr->ia_size) {
if (ocfs2_should_order_data(inode)) {
status = ocfs2_begin_ordered_truncate(inode,
@@ -1052,22 +1048,13 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
}

/*
- * This will intentionally not wind up calling truncate_setsize(),
- * since all the work for a size change has been done above.
- * Otherwise, we could get into problems with truncate as
- * ip_alloc_sem is used there to protect against i_size
- * changes.
- *
- * XXX: this means the conditional below can probably be removed.
+ * Since all the work for a size change has been done above.
+ * Call truncate_setsize directly to change size and truncate
+ * pagecache.
*/
if ((attr->ia_valid & ATTR_SIZE) &&
- attr->ia_size != i_size_read(inode)) {
- status = vmtruncate(inode, attr->ia_size);
- if (status) {
- mlog_errno(status);
- goto bail_commit;
- }
- }
+ attr->ia_size != i_size_read(inode))
+ truncate_setsize(inode, attr->ia_size);

setattr_copy(inode, attr);
mark_inode_dirty(inode);
@@ -2122,17 +2109,6 @@ relock:
written = generic_file_direct_write(iocb, iov, &nr_segs, *ppos,
ppos, count, ocount);
if (written < 0) {
- /*
- * direct write may have instantiated a few
- * blocks outside i_size. Trim these off again.
- * Don't need i_size_read because we hold i_mutex.
- *
- * XXX(truncate): this looks buggy because ocfs2 did not
- * actually implement ->truncate. Take a look at
- * the new truncate sequence and update this accordingly
- */
- if (*ppos + count > inode->i_size)
- truncate_setsize(inode, inode->i_size);
ret = written;
goto out_dio;
}
--
1.5.5

2010-06-10 04:43:04

by Nick Piggin

Subject: Re: [PATCH v2] ocfs2: Let ocfs2_setattr use new truncate sequence.

On Thu, Jun 10, 2010 at 11:53:06AM +0800, Tao Ma wrote:
> Let ocfs2 use the new truncate sequence. The changes include:
> 1. Remove the extra check for inode_newsize_ok since Christoph
> has moved it into inode_change_ok. So we will check it at the
> beginning of ocfs2_setattr.

So this deals with our questions regarding check of i_size outside
the inode cluster lock? (see fsdevel discussion)

> 2. Use truncate_setsize directly since we don't implement our
> own ->truncate and what we need is "update i_size and
> truncate_pagecache" which truncate_setsize now does.
> 3. For direct write, ocfs2 actually don't allow write to pass
> i_size(see ocfs2_prepare_inode_for_write), so we don't have
> a chance to increase i_size. So remove the bogus check.
>
> Cc: Joel Becker <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Nick Piggin <[email protected]>
> Signed-off-by: Tao Ma <[email protected]>
> ---
> fs/ocfs2/file.c | 34 +++++-----------------------------
> 1 files changed, 5 insertions(+), 29 deletions(-)
>
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 1fb0985..764fffb 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -983,10 +983,6 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
> }
>
> if (size_change && attr->ia_size != i_size_read(inode)) {
> - status = inode_newsize_ok(inode, attr->ia_size);
> - if (status)
> - goto bail_unlock;
> -
> if (i_size_read(inode) > attr->ia_size) {

While you're here, you should be able to use inode->i_size if you're
under i_mutex, no?
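
Editorial sketch of the point above. In the kernel, i_size_read() exists because a 64-bit i_size cannot be read in one step on 32-bit SMP machines, so lockless readers go through a sequence counter. A reader that already holds the lock serializing all writers does not need that. The userspace model below is only an illustration under stated assumptions: struct toy_inode and the toy_* functions are hypothetical names, the size is split into two 32-bit halves to mimic the 32-bit case, and a pthread mutex stands in for i_mutex.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct inode on a 32-bit SMP machine:
 * the 64-bit size lives in two halves that cannot be read or
 * written together in a single step. */
struct toy_inode {
    pthread_mutex_t i_mutex;   /* serializes every writer of the size */
    _Atomic uint32_t seq;      /* even: stable, odd: update in progress */
    _Atomic uint32_t size_lo;
    _Atomic uint32_t size_hi;
};

static void toy_inode_init(struct toy_inode *inode)
{
    pthread_mutex_init(&inode->i_mutex, NULL);
    atomic_init(&inode->seq, 0);
    atomic_init(&inode->size_lo, 0);
    atomic_init(&inode->size_hi, 0);
}

/* Writer side. The caller must hold i_mutex. */
static void toy_i_size_write(struct toy_inode *inode, uint64_t size)
{
    atomic_fetch_add(&inode->seq, 1);                 /* now odd */
    atomic_store(&inode->size_lo, (uint32_t)(size & 0xffffffffu));
    atomic_store(&inode->size_hi, (uint32_t)(size >> 32));
    atomic_fetch_add(&inode->seq, 1);                 /* even again */
}

/* Lockless reader: retry until both halves come from one update. */
static uint64_t toy_i_size_read(struct toy_inode *inode)
{
    uint32_t start, lo, hi;

    do {
        start = atomic_load(&inode->seq);
        lo = atomic_load(&inode->size_lo);
        hi = atomic_load(&inode->size_hi);
    } while ((start & 1u) || start != atomic_load(&inode->seq));

    return ((uint64_t)hi << 32) | lo;
}

/* Reader that already holds i_mutex: no writer can be mid-update,
 * so the retry loop is unnecessary and a plain read is enough. */
static uint64_t toy_i_size_locked(struct toy_inode *inode)
{
    return ((uint64_t)atomic_load(&inode->size_hi) << 32) |
           atomic_load(&inode->size_lo);
}
```

In this model, toy_i_size_locked() plays the role of reading inode->i_size directly, and it is only safe while the caller holds the same mutex that every writer takes.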

2010-06-10 05:07:14

by Tao Ma

Subject: Re: [PATCH v2] ocfs2: Let ocfs2_setattr use new truncate sequence.

Hi Nick,

On 06/10/2010 12:42 PM, Nick Piggin wrote:
> On Thu, Jun 10, 2010 at 11:53:06AM +0800, Tao Ma wrote:
>> Let ocfs2 use the new truncate sequence. The changes include:
>> 1. Remove the extra check for inode_newsize_ok since Christoph
>> has moved it into inode_change_ok. So we will check it at the
>> beginning of ocfs2_setattr.
>
> So this deals with our questions regarding check of i_size outside
> the inode cluster lock? (see fsdevel discussion)
Oh, I forgot about this. Yes, we should hold the cluster lock, and we
shouldn't remove this check.
>
>
>> 2. Use truncate_setsize directly since we don't implement our
>> own ->truncate and what we need is "update i_size and
>> truncate_pagecache" which truncate_setsize now does.
>> 3. For direct write, ocfs2 actually don't allow write to pass
>> i_size(see ocfs2_prepare_inode_for_write), so we don't have
>> a chance to increase i_size. So remove the bogus check.
>>
>> Cc: Joel Becker <[email protected]>
>> Cc: Christoph Hellwig <[email protected]>
>> Cc: Nick Piggin <[email protected]>
>> Signed-off-by: Tao Ma <[email protected]>
>> ---
>> fs/ocfs2/file.c | 34 +++++-----------------------------
>> 1 files changed, 5 insertions(+), 29 deletions(-)
>>
>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>> index 1fb0985..764fffb 100644
>> --- a/fs/ocfs2/file.c
>> +++ b/fs/ocfs2/file.c
>> @@ -983,10 +983,6 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
>> }
>>
>> if (size_change && attr->ia_size != i_size_read(inode)) {
>> - status = inode_newsize_ok(inode, attr->ia_size);
>> - if (status)
>> - goto bail_unlock;
>> -
>> if (i_size_read(inode) > attr->ia_size) {
>
> While you're here, you should be able to use inode->i_size if you're
> under i_mutex, no?
OK, I will change it and the corresponding part in truncate_setsize.

Regards,
Tao

2010-06-10 05:09:10

by Tao Ma

Subject: [PATCH v3] ocfs2: Let ocfs2_setattr use new truncate sequence.

Let ocfs2 use the new truncate sequence. The changes include:
1. Use truncate_setsize directly since we don't implement our
own ->truncate and what we need is "update i_size and
truncate_pagecache" which truncate_setsize now does.
2. For direct write, ocfs2 actually doesn't allow a write to pass
i_size (see ocfs2_prepare_inode_for_write), so we don't have
a chance to increase i_size. So remove the bogus check.

Cc: Joel Becker <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Nick Piggin <[email protected]>
Signed-off-by: Tao Ma <[email protected]>
---
fs/ocfs2/file.c | 32 ++++++--------------------------
1 files changed, 6 insertions(+), 26 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 1fb0985..8b5447e 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -987,7 +987,7 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
if (status)
goto bail_unlock;

- if (i_size_read(inode) > attr->ia_size) {
+ if (inode->i_size > attr->ia_size) {
if (ocfs2_should_order_data(inode)) {
status = ocfs2_begin_ordered_truncate(inode,
attr->ia_size);
@@ -1052,22 +1052,13 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
}

/*
- * This will intentionally not wind up calling truncate_setsize(),
- * since all the work for a size change has been done above.
- * Otherwise, we could get into problems with truncate as
- * ip_alloc_sem is used there to protect against i_size
- * changes.
- *
- * XXX: this means the conditional below can probably be removed.
+ * Since all the work for a size change has been done above.
+ * Call truncate_setsize directly to change size and truncate
+ * pagecache.
*/
if ((attr->ia_valid & ATTR_SIZE) &&
- attr->ia_size != i_size_read(inode)) {
- status = vmtruncate(inode, attr->ia_size);
- if (status) {
- mlog_errno(status);
- goto bail_commit;
- }
- }
+ attr->ia_size != inode->i_size)
+ truncate_setsize(inode, attr->ia_size);

setattr_copy(inode, attr);
mark_inode_dirty(inode);
@@ -2122,17 +2113,6 @@ relock:
written = generic_file_direct_write(iocb, iov, &nr_segs, *ppos,
ppos, count, ocount);
if (written < 0) {
- /*
- * direct write may have instantiated a few
- * blocks outside i_size. Trim these off again.
- * Don't need i_size_read because we hold i_mutex.
- *
- * XXX(truncate): this looks buggy because ocfs2 did not
- * actually implement ->truncate. Take a look at
- * the new truncate sequence and update this accordingly
- */
- if (*ppos + count > inode->i_size)
- truncate_setsize(inode, inode->i_size);
ret = written;
goto out_dio;
}
--
1.5.5
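
Editorial sketch of item 1 in the changelog above, which describes truncate_setsize as "update i_size and truncate_pagecache". The userspace model below only illustrates those two steps. Everything in it is an assumption made for the example: struct toy_inode, the toy_* functions, the 4096-byte page size and the fixed-size array standing in for the page cache. It also leaves out parts of the real page cache truncation, such as unmapping mapped pages and handling a partial last page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096
#define TOY_MAX_PAGES 16

/* Hypothetical inode: a size plus a flag per page saying whether
 * that page is currently held in the (toy) page cache. */
struct toy_inode {
    int64_t i_size;
    bool page_cached[TOY_MAX_PAGES];
};

/* Drop every cached page that lies entirely at or beyond newsize.
 * Assumes newsize >= 0. */
static void toy_truncate_pagecache(struct toy_inode *inode, int64_t newsize)
{
    /* Index of the first page with no byte below newsize. */
    size_t first = (size_t)((newsize + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE);

    for (size_t i = first; i < TOY_MAX_PAGES; i++)
        inode->page_cached[i] = false;
}

/* The two steps named in the changelog: update the size, then
 * truncate the page cache. There is no status to return, which
 * matches how the patch calls truncate_setsize without checking
 * a result. */
static void toy_truncate_setsize(struct toy_inode *inode, int64_t newsize)
{
    inode->i_size = newsize;
    toy_truncate_pagecache(inode, newsize);
}
```

For instance, shrinking a four-page toy file to 5000 bytes keeps pages 0 and 1 (page 1 still holds bytes 4096 to 4999) and drops pages 2 and 3. Because nothing here can fail, the error handling that surrounded the old vmtruncate call has no counterpart.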

2010-06-10 06:00:35

by Joel Becker

Subject: Re: [PATCH v3] ocfs2: Let ocfs2_setattr use new truncate sequence.

Acked-by: Joel Becker <[email protected]>

Al, can you carry this atop the truncate sequence code?

Joel

On Thu, Jun 10, 2010 at 01:08:05PM +0800, Tao Ma wrote:
> Let ocfs2 use the new truncate sequence. The changes include:
> 1. Use truncate_setsize directly since we don't implement our
> own ->truncate and what we need is "update i_size and
> truncate_pagecache" which truncate_setsize now does.
> 2. For direct write, ocfs2 actually don't allow write to pass
> i_size(see ocfs2_prepare_inode_for_write), so we don't have
> a chance to increase i_size. So remove the bogus check.
>
> Cc: Joel Becker <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Nick Piggin <[email protected]>
> Signed-off-by: Tao Ma <[email protected]>
> ---
> fs/ocfs2/file.c | 32 ++++++--------------------------
> 1 files changed, 6 insertions(+), 26 deletions(-)
>
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 1fb0985..8b5447e 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -987,7 +987,7 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
> if (status)
> goto bail_unlock;
>
> - if (i_size_read(inode) > attr->ia_size) {
> + if (inode->i_size > attr->ia_size) {
> if (ocfs2_should_order_data(inode)) {
> status = ocfs2_begin_ordered_truncate(inode,
> attr->ia_size);
> @@ -1052,22 +1052,13 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
> }
>
> /*
> - * This will intentionally not wind up calling truncate_setsize(),
> - * since all the work for a size change has been done above.
> - * Otherwise, we could get into problems with truncate as
> - * ip_alloc_sem is used there to protect against i_size
> - * changes.
> - *
> - * XXX: this means the conditional below can probably be removed.
> + * Since all the work for a size change has been done above.
> + * Call truncate_setsize directly to change size and truncate
> + * pagecache.
> */
> if ((attr->ia_valid & ATTR_SIZE) &&
> - attr->ia_size != i_size_read(inode)) {
> - status = vmtruncate(inode, attr->ia_size);
> - if (status) {
> - mlog_errno(status);
> - goto bail_commit;
> - }
> - }
> + attr->ia_size != inode->i_size)
> + truncate_setsize(inode, attr->ia_size);
>
> setattr_copy(inode, attr);
> mark_inode_dirty(inode);
> @@ -2122,17 +2113,6 @@ relock:
> written = generic_file_direct_write(iocb, iov, &nr_segs, *ppos,
> ppos, count, ocount);
> if (written < 0) {
> - /*
> - * direct write may have instantiated a few
> - * blocks outside i_size. Trim these off again.
> - * Don't need i_size_read because we hold i_mutex.
> - *
> - * XXX(truncate): this looks buggy because ocfs2 did not
> - * actually implement ->truncate. Take a look at
> - * the new truncate sequence and update this accordingly
> - */
> - if (*ppos + count > inode->i_size)
> - truncate_setsize(inode, inode->i_size);
> ret = written;
> goto out_dio;
> }
> --
> 1.5.5
>

--

"If you took all of the grains of sand in the world, and lined
them up end to end in a row, you'd be working for the government!"
- Mr. Interesting

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2010-06-10 08:27:20

by Christoph Hellwig

Subject: Re: [PATCH v3] ocfs2: Let ocfs2_setattr use new truncate sequence.

On Thu, Jun 10, 2010 at 01:08:05PM +0800, Tao Ma wrote:
> Let ocfs2 use the new truncate sequence. The changes include:
> 1. Use truncate_setsize directly since we don't implement our
> own ->truncate and what we need is "update i_size and
> truncate_pagecache" which truncate_setsize now does.
> 2. For direct write, ocfs2 actually don't allow write to pass
> i_size(see ocfs2_prepare_inode_for_write), so we don't have
> a chance to increase i_size. So remove the bogus check.

You just leave the duplicate inode_newsize_ok in, but still have
one as part of inode_change_ok. See the previous thread - we'll
need to move inode_change_ok to under the cluster locks, both
for the truncate and non-truncate case.

> /*
> + * Since all the work for a size change has been done above.
> + * Call truncate_setsize directly to change size and truncate
> + * pagecache.
> */
> if ((attr->ia_valid & ATTR_SIZE) &&
> + attr->ia_size != inode->i_size)

this could be on one line now.

> + truncate_setsize(inode, attr->ia_size);

But any reason this isn't done inside the

if (size_change && attr->ia_size != inode->i_size) {

conditional above? You'll never get size and uid/gid changes in the
same request, so there won't be any change in behaviour.

2010-06-10 08:45:16

by Tao Ma

Subject: Re: [PATCH v3] ocfs2: Let ocfs2_setattr use new truncate sequence.

On 06/10/2010 04:27 PM, Christoph Hellwig wrote:
> On Thu, Jun 10, 2010 at 01:08:05PM +0800, Tao Ma wrote:
>> Let ocfs2 use the new truncate sequence. The changes include:
>> 1. Use truncate_setsize directly since we don't implement our
>> own ->truncate and what we need is "update i_size and
>> truncate_pagecache" which truncate_setsize now does.
>> 2. For direct write, ocfs2 actually don't allow write to pass
>> i_size(see ocfs2_prepare_inode_for_write), so we don't have
>> a chance to increase i_size. So remove the bogus check.
>
> You just leave the duplicate inode_newsize_ok in, but still have
> one as part of inode_change_ok. See the previous thread - we'll
> need to move inode_change_ok to under the cluster locks, both
> for the truncate and non-truncate case.
Uh, I just didn't change the original inode_change_ok call, and maybe
you are right that we should check all of this under the cluster lock.
But it looks as if it was written this way intentionally.

Mark and Joel, do you have any opinion on why it was written like this,
or is it a bug?
>
>> /*
>> + * Since all the work for a size change has been done above.
>> + * Call truncate_setsize directly to change size and truncate
>> + * pagecache.
>> */
>> if ((attr->ia_valid & ATTR_SIZE) &&
>> + attr->ia_size != inode->i_size)
>
> this could be on one line now.
ok, I will regenerate the patch after I get the feedback from Mark and Joel.
>
>> + truncate_setsize(inode, attr->ia_size);
>
> But any reason this isn't done inside the
>
> if (size_change && attr->ia_size != inode->i_size) {
>
> conditional above? You'll never get size and uid/gid changes in the
> same request, so there won't be any change in behaviour.
Because we want the inode change to happen in a transaction. In the
conditional above, we do the truncate/extend in one transaction. After
that is done, we start a new transaction that updates the inode info.

Regards,
Tao

2010-06-10 08:48:54

by Joel Becker

Subject: Re: [PATCH v3] ocfs2: Let ocfs2_setattr use new truncate sequence.

On Thu, Jun 10, 2010 at 10:27:11AM +0200, Christoph Hellwig wrote:
> You just leave the duplicate inode_newsize_ok in, but still have
> one as part of inode_change_ok. See the previous thread - we'll
> need to move inode_change_ok to under the cluster locks, both
> for the truncate and non-truncate case.

Is your concern that the u/gid checks may be against stale ids?

> > + truncate_setsize(inode, attr->ia_size);
>
> But any reason this isn't done inside the
>
> if (size_change && attr->ia_size != inode->i_size) {
>
> conditional above? You'll never get size and uid/gid changes in the
> same request, so there won't be any change in behaviour.

I think the code exists as-is so that the i_size update only
happens after the quota transfer has been approved. Jan added the quota
bits in this location.
I can't see a standard posix op that changes size and ids at the
same time. I think we just add BUG_ON expressions that ensure such a
behavior, right?

Joel

--

"I'm living so far beyond my income that we may almost be said
to be living apart."
- e e cummings

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2010-06-10 12:09:53

by Tao Ma

Subject: Re: [PATCH v3] ocfs2: Let ocfs2_setattr use new truncate sequence.

Joel Becker wrote:
> On Thu, Jun 10, 2010 at 10:27:11AM +0200, Christoph Hellwig wrote:
>
>> You just leave the duplicate inode_newsize_ok in, but still have
>> one as part of inode_change_ok. See the previous thread - we'll
>> need to move inode_change_ok to under the cluster locks, both
>> for the truncate and non-truncate case.
>>
>
> Is your concern that the u/gid checks may be against stale ids?
>
So I think we should have one inode_change_ok before the cluster lock
and another after the cluster lock. The first one will save us a lot
of cluster lock effort if the user passes us the wrong arguments, while
the latter one will test again with the refreshed inode info.

Regards,
Tao
>
>>> + truncate_setsize(inode, attr->ia_size);
>>>
>> But any reason this isn't done inside the
>>
>> if (size_change && attr->ia_size != inode->i_size) {
>>
>> conditional above? You'll never get size and uid/gid changes in the
>> same request, so there won't be any change in behaviour.
>>
>
> I think the code exists as-is so that the i_size update only
> happens after the quota transfer has been approved. Jan added the quota
> bits in this location.
> I can't see a standard posix op that changes size and ids at the
> same time. I think we just add BUG_ON expressions that ensure such a
> behavior, right?
>
> Joel
>
>

2010-06-10 12:28:38

by Nick Piggin

Subject: Re: [PATCH v3] ocfs2: Let ocfs2_setattr use new truncate sequence.

On Thu, Jun 10, 2010 at 08:09:36PM +0800, Tao Ma wrote:
> Joel Becker wrote:
> >On Thu, Jun 10, 2010 at 10:27:11AM +0200, Christoph Hellwig wrote:
> >>You just leave the duplicate inode_newsize_ok in, but still have
> >>one as part of inode_change_ok. See the previous thread - we'll
> >>need to move inode_change_ok to under the cluster locks, both
> >>for the truncate and non-truncate case.
> >
> > Is your concern that the u/gid checks may be against stale ids?
> So I think we should have one inode_change_ok before the cluster
> lock and another after the cluster lock.
> The first one will save us a lot of cluster lock effort if the user
> pass us the wrong arguments while the later
> one will test again with the refreshed inode info.

If attributes cannot be stale, then do the checks before the cluster
lock and not again. If they can be stale, then the check outside the
cluster lock might give incorrect results so it is not harmless to do
it twice.

If you have a mix where some attributes may be stale, then why not do
the inode_change_ok check inside the lock, and do some open-coded checks
as an optimization before taking the lock?
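
Editorial sketch of the ordering suggested above: checks that depend only on the caller's arguments run before the cluster lock, and the check that depends on possibly stale inode attributes runs after the lock has refreshed them. This is a userspace model, not ocfs2 code. The toy_* names, the ownership-only permission rule, the TOY_MAX_BYTES limit and the idea that taking the lock copies the owner from a shared record are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_BYTES ((int64_t)1 << 40)  /* hypothetical fixed size limit */

/* Cluster-wide state that every node agrees on once it holds the lock. */
struct toy_disk_inode {
    unsigned uid;
};

/* Node-local in-memory inode; i_uid may be stale until refreshed. */
struct toy_inode {
    unsigned i_uid;
    unsigned i_mode;
    const struct toy_disk_inode *disk;
};

struct toy_attr {
    unsigned ia_mode;
    int64_t ia_size;
};

/* Simplified permission check: only root or the owner may change
 * attributes. The answer is only as fresh as inode->i_uid. */
static int toy_change_ok(const struct toy_inode *inode, unsigned caller_uid)
{
    if (caller_uid != 0 && caller_uid != inode->i_uid)
        return -EPERM;
    return 0;
}

/* Models taking the cluster lock: the local copy is refreshed. */
static void toy_cluster_lock(struct toy_inode *inode)
{
    inode->i_uid = inode->disk->uid;
}

static int toy_setattr(struct toy_inode *inode, unsigned caller_uid,
                       const struct toy_attr *attr)
{
    int status;

    /* Open-coded pre-check: uses only the argument and a constant,
     * so it cannot be wrong because of stale inode state. */
    if (attr->ia_size > TOY_MAX_BYTES)
        return -EFBIG;

    toy_cluster_lock(inode);

    /* Ownership check against refreshed attributes. */
    status = toy_change_ok(inode, caller_uid);
    if (status == 0)
        inode->i_mode = attr->ia_mode;

    /* A real implementation would drop the cluster lock here. */
    return status;
}
```

If another node has just made uid 1001 the owner while the local copy still says 1000, calling toy_change_ok() before the lock returns -EPERM for uid 1001, whereas toy_setattr() refreshes the owner first and succeeds. The oversized request is rejected without taking the lock at all.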

2010-06-10 18:12:23

by Joel Becker

Subject: Re: [PATCH v3] ocfs2: Let ocfs2_setattr use new truncate sequence.

On Thu, Jun 10, 2010 at 08:09:36PM +0800, Tao Ma wrote:
> Joel Becker wrote:
> > Is your concern that the u/gid checks may be against stale ids?
> So I think we should have one inode_change_ok before the cluster
> lock and another after the cluster lock.
> The first one will save us a lot of cluster lock effort if the user
> pass us the wrong arguments while the later
> one will test again with the refreshed inode info.

But what if the other node has given us permission, and then we
fail? Say the file was owned by you. On node 2, root sets it to be
owned by me. Then on node 1, I go to change the file permissions.
inode_change_ok() will fail, because the in-memory inode still thinks
you are the owner.
I guess it does need to be under the lock.

Joel

--

Bram's Law:
The easier a piece of software is to write, the worse it's
implemented in practice.

Joel Becker
Principal Software Developer
Oracle
E-mail: [email protected]
Phone: (650) 506-8127

2010-06-11 00:01:23

by Tao Ma

Subject: Re: [PATCH v3] ocfs2: Let ocfs2_setattr use new truncate sequence.

Joel Becker wrote:
> On Thu, Jun 10, 2010 at 08:09:36PM +0800, Tao Ma wrote:
>
>> Joel Becker wrote:
>>
>>> Is your concern that the u/gid checks may be against stale ids?
>>>
>> So I think we should have one inode_change_ok before the cluster
>> lock and another after the cluster lock.
>> The first one will save us a lot of cluster lock effort if the user
>> pass us the wrong arguments while the later
>> one will test again with the refreshed inode info.
>>
>
> But what if the other node has given us permission, and then we
> fail? Say the file was owned by you. On node 2, root sets it to be
> owned by me. Then on node 1, I go to change the file permissions.
> inode_change_ok() will fail, because the in-memory inode still thinks
> you are the owner.
> I guess it does need to be under the lock.
>
OK, so I will revise my patch to move it under cluster lock.

Regards,
Tao