2010-10-25 07:06:52

by Dmitry Monakhov

[permalink] [raw]
Subject: [PATCH] ext4: optimize orphan_list handling for ext4_setattr

Surprisingly chown() on ext4 is not SMP scalable operation.
Due to unconditional orphan_del(NULL, inode) in ext4_setattr()
result in significant performance overhead because of global orphan
mutex, especially in no-journal mode (where orphan_add() is noop).
It is possible to skip explicit orphan_del if possible.
Results of fchown() micro-benchmark in no-journal mode
while (1) {
iteration++;
fchown(fd, uid, gid);
fchown(fd, uid + 1, gid + 1)
}
measured: iterations per millisecond
| nr_tasks | w/o patch | with patch |
| 1 | 142 | 185 |
| 4 | 109 | 642 |

Signed-off-by: Dmitry Monakhov <[email protected]>
---
fs/ext4/inode.c | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 18eed80..a16097b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5540,6 +5540,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
{
struct inode *inode = dentry->d_inode;
int error, rc = 0;
+ int orphan = 0;
const unsigned int ia_valid = attr->ia_valid;

error = inode_change_ok(inode, attr);
@@ -5595,8 +5596,10 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
error = PTR_ERR(handle);
goto err_out;
}
-
- error = ext4_orphan_add(handle, inode);
+ if (ext4_handle_valid(handle)) {
+ error = ext4_orphan_add(handle, inode);
+ orphan = 1;
+ }
EXT4_I(inode)->i_disksize = attr->ia_size;
rc = ext4_mark_inode_dirty(handle, inode);
if (!error)
@@ -5608,6 +5611,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
attr->ia_size);
if (error) {
/* Do as much error cleanup as possible */
+ orphan = 0;
handle = ext4_journal_start(inode, 3);
if (IS_ERR(handle)) {
ext4_orphan_del(NULL, inode);
@@ -5636,7 +5640,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
* If the call to ext4_truncate failed to get a transaction handle at
* all, we need to clean up the in-core orphan list manually.
*/
- if (inode->i_nlink)
+ if (orphan && inode->i_nlink)
ext4_orphan_del(NULL, inode);

if (!rc && (ia_valid & ATTR_MODE))
--
1.6.6.1



2010-11-03 09:06:43

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] ext4: optimize orphan_list handling for ext4_setattr

On 2010-10-25, at 01:06, Dmitry Monakhov wrote:
> Surprisingly chown() on ext4 is not SMP scalable operation.
> Due to unconditional orphan_del(NULL, inode) in ext4_setattr()
> result in significant performance overhead because of global orphan
> mutex, especially in no-journal mode (where orphan_add() is noop).
> It is possible to skip explicit orphan_del if possible.
>
> Results of fchown() micro-benchmark in no-journal mode
> while (1) {
> iteration++;
> fchown(fd, uid, gid);
> fchown(fd, uid + 1, gid + 1)
> }
> measured: iterations per millisecond
> | nr_tasks | w/o patch | with patch |
> | 1 | 142 | 185 |
> | 4 | 109 | 642 |

AFAICS, both ext4_orphan_add() and ext4_orphan_del() already check right at the top of the function whether the handle is valid, so I can't really understand how this patch would make any difference for no-journal mode.

> @@ -5636,7 +5640,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr

> + if (ext4_handle_valid(handle)) {
> + error = ext4_orphan_add(handle, inode);
> + orphan = 1;
> + }
> EXT4_I(inode)->i_disksize = attr->ia_size;
> rc = ext4_mark_inode_dirty(handle, inode);
> if (!error)

> @@ -5636,7 +5640,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr
> * If the call to ext4_truncate failed to get a transaction handle at
> * all, we need to clean up the in-core orphan list manually.
> */
> - if (inode->i_nlink)
> + if (orphan && inode->i_nlink)
> ext4_orphan_del(NULL, inode);
>
> if (!rc && (ia_valid & ATTR_MODE))

Basically, the first hunk is moving the ext4_handle_valid() into the caller, and the second is avoiding a call to ext4_orphan_del->ext4_handle_valid(), which would likely be an unmeasurable performance difference. No locks are taken (or avoided) before ext4_handle_valid() is checked.

I think the actual fix would be to always set "orphan = 1" after ext4_orphan_add() is called, and only call ext4_orphan_del() in that case. This is essentially what your patch does for with-journal mode, and the non-journal mode is a red-herring due to the early exit from ext4_orphan_del().

Even better would be a no-lock check of list_empty(&ei->i_orphan()) before getting s_orphan_lock, since it shouldn't be possible to have two threads adding/removing the same inode to the orphan list (otherwise the inode may not be on the orphan list at all since add/del is not refcounted). The only reason for s_orphan_lock is to prevent corruption of the global list.

Cheers, Andreas






2010-11-03 09:31:58

by Dmitry Monakhov

[permalink] [raw]
Subject: Re: [PATCH] ext4: optimize orphan_list handling for ext4_setattr

On Wed, 3 Nov 2010 03:06:41 -0600, Andreas Dilger <[email protected]> wrote:
> On 2010-10-25, at 01:06, Dmitry Monakhov wrote:
> > Surprisingly chown() on ext4 is not SMP scalable operation.
> > Due to unconditional orphan_del(NULL, inode) in ext4_setattr()
> > result in significant performance overhead because of global orphan
> > mutex, especially in no-journal mode (where orphan_add() is noop).
> > It is possible to skip explicit orphan_del if possible.
> >
> > Results of fchown() micro-benchmark in no-journal mode
> > while (1) {
> > iteration++;
> > fchown(fd, uid, gid);
> > fchown(fd, uid + 1, gid + 1)
> > }
> > measured: iterations per millisecond
> > | nr_tasks | w/o patch | with patch |
> > | 1 | 142 | 185 |
> > | 4 | 109 | 642 |
>
> AFAICS, both ext4_orphan_add() and ext4_orphan_del() already check right at the top of the function whether the handle is valid, so I can't really understand how this patch would make any difference for no-journal mode.
>
> > @@ -5636,7 +5640,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr
>
> > + if (ext4_handle_valid(handle)) {
> > + error = ext4_orphan_add(handle, inode);
> > + orphan = 1;
> > + }
> > EXT4_I(inode)->i_disksize = attr->ia_size;
> > rc = ext4_mark_inode_dirty(handle, inode);
> > if (!error)
>
> > @@ -5636,7 +5640,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr
> > * If the call to ext4_truncate failed to get a transaction handle at
> > * all, we need to clean up the in-core orphan list manually.
> > */
> > - if (inode->i_nlink)
> > + if (orphan && inode->i_nlink)
> > ext4_orphan_del(NULL, inode);
> >
> > if (!rc && (ia_valid & ATTR_MODE))
>
> Basically, the first hunk is moving the ext4_handle_valid() into the caller, and the second is avoiding a call to ext4_orphan_del->ext4_handle_valid(), which would likely be an unmeasurable performance difference. No locks are taken (or avoided) before ext4_handle_valid() is checked.
>
> I think the actual fix would be to always set "orphan = 1" after
> ext4_orphan_add() is called,
But in case of nojournal mode it will be called with noop result.
but orphan_del(NULL, inode) will result in useless locking.
> and only call ext4_orphan_del() in that case. This is essentially what your patch does for with-journal mode, and the non-journal mode is a red-herring due to the early exit from ext4_orphan_del().
>
> Even better would be a no-lock check of list_empty(&ei->i_orphan()) before getting s_orphan_lock, since it shouldn't be possible to have two threads adding/removing the same inode to the orphan list (otherwise the inode may not be on the orphan list at all since add/del is not refcounted). The only reason for s_orphan_lock is to prevent corruption of the global list.
Yes, this is the best solution. Actually i've thought about that, but
endup with more obvious fix for nojournal mode.
With that we can avoid 1 of 3 locks on truncate in journal mode.
I'll test that solution.
>
> Cheers, Andreas
>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2010-11-03 17:35:42

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] ext4: optimize orphan_list handling for ext4_setattr

On 2010-11-03, at 03:31, Dmitry wrote:
> On Wed, 3 Nov 2010 03:06:41 -0600, Andreas Dilger <[email protected]> wrote:
>>
>> Basically, the first hunk is moving the ext4_handle_valid() into the caller, and the second is avoiding a call to ext4_orphan_del->ext4_handle_valid(), which would likely be an unmeasurable performance difference. No locks are taken (or avoided) before ext4_handle_valid() is checked.
>
> But in case of nojournal mode it will be called with noop result.
> but orphan_del(NULL, inode) will result in useless locking.

Looking at ext4_orphan_del(), I'm not sure why it checks:

/* ext4_handle_valid() assumes a valid handle_t pointer */
if (handle && !ext4_handle_valid(handle))
return 0;

when in ext4_orphan_add() it is only checks:

if (!ext4_handle_valid(handle))
return 0;

I think the extra check for "handle" in ext4_orphan_del() is not needed, despite the comment there, because none of the other callsites for ext4_handle_valid() do this extra check.

>> Even better would be a no-lock check of list_empty(&ei->i_orphan()) before getting s_orphan_lock, since it shouldn't be possible to have two threads adding/removing the same inode to the orphan list (otherwise the inode may not be on the orphan list at all since add/del is not refcounted). The only reason for s_orphan_lock is to prevent corruption of the global list.
>
> Yes, this is the best solution. Actually i've thought about that, but
> end up with more obvious fix for nojournal mode.
> With that we can avoid 1 of 3 locks on truncate in journal mode.
> I'll test that solution.

Cheers, Andreas