2011-10-03 19:01:51

by Allison Henderson

[permalink] [raw]
Subject: Plan for reducing i_mutex in ext4

Hi all,

I've been working on locating all the existing uses of i_mutex in the
current ext4 code because I know we are planning to reduce the usage of
i_mutex in ext4. So I've gone through the ext4 code and also the vfs
code and come up with a list of ext4 items that appear to be protected
under i_mutex. I'm thinking about doing a patch to replace i_mutex with
a private ext4 mutex, and I wanted to update folks on this idea and pick
up any feed back people might have.

I'm thinking maybe we can have a separate mutex for functions that only
modify meta data like ext4_ioctl and ext4_setattr to help relieve
unneeded contention. And then the rest of functions that are modifying
data can go under a data mutex (including truncate since sometimes
ext4_ioctl and ext4_setattr will call ext4_truncate if they modify i_size).

So these are ext4 functions that currently lock i_mutex:

ext4_sync_file
ext4_fallocate
ext4_move_extents via two helper routines:
mext_inode_double_lock and mext_inode_double_unlock
ext4_ioctl (for the EXT4_IOC_SETFLAGS ioctl)
ext4_quota_write
ext4_llseek
ext4_end_io_work
ext4_evict_inode (only while calling ext4_flush_completed_IO)
ext4_ind_direct_IO (only while calling ext4_flush_completed_IO)


And these are ext4 functions that have i_mutex locked by the vfs layer.
So we will need to lock the new private mutex here too if we want them
to be synchronous with the above functions.

ext4_setattr
ext4_da_writepages
ext4_rmdir
ext4_unlink
ext4_symlink
ext4_link
ext4_rename

And one unique case:
ext4_fiemap calls generic_block_fiemap and passes it a function pointer
to ext4_get_block. generic_block_fiemap will lock i_mutex before
calling the pointer. I dont think ext4_get_block needs i_mutex locked
all the time, so I think we can just make a wrapper for ext4_get_block
that locks the new private mutex and then we can pass a pointer to the
wrapper.


That's my list so far, if anyone knows of one I missed please let me
know, and also if you spot any other places where we can reduce unneeded
contention by using a separate lock. Thx!

Allison Henderson


2011-10-04 08:38:43

by Lukas Czerner

[permalink] [raw]
Subject: Re: Plan for reducing i_mutex in ext4

On Mon, 3 Oct 2011, Allison Henderson wrote:

> Hi all,
>
> I've been working on locating all the existing uses of i_mutex in the current
> ext4 code because I know we are planning to reduce the usage of i_mutex in
> ext4. So I've gone through the ext4 code and also the vfs code and come up
> with a list of ext4 items that appear to be protected under i_mutex. I'm
> thinking about doing a patch to replace i_mutex with a private ext4 mutex, and
> I wanted to update folks on this idea and pick up any feed back people might
> have.
>
> I'm thinking maybe we can have a separate mutex for functions that only modify
> meta data like ext4_ioctl and ext4_setattr to help relieve unneeded
> contention. And then the rest of functions that are modifying data can go
> under a data mutex (including truncate since sometimes ext4_ioctl and
> ext4_setattr will call ext4_truncate if they modify i_size).

Just the other day I was talking with Christoph (adding him to cc) about
this, but unfortunately I still did not have time to look at this, but I
am glad that someone did.

His suggestion was a bit more general than creating separate ext4
specific mutex. His idea was to change i_mutex to union of plain mutex
for directories and a rwlock for regular files. Then this union can be
used in other file systems as well, for example to replace xfs_iolock in
xfs.

Also it might be nice to do something smarter than just a rwlock for
regular files. It would be nice to have an structure of extent locks, so
we can use it for file system using extents, which will improve
scalability while hammering a single file from different processes.

Note that currently ext4 concurrent read/write are atomic only wrt
individual pages, but not on the system call as the whole. This might
cause read() to return data mixed from several different writes, which
is not posix conform. That could be solved with the generic rwlock for
files, or even better with the system of extent locking.

But Christoph, can probably describe hi idea a bit better.

Thanks!
-Lukas

>
> So these are ext4 functions that currently lock i_mutex:
>
> ext4_sync_file
> ext4_fallocate
> ext4_move_extents via two helper routines:
> mext_inode_double_lock and mext_inode_double_unlock
> ext4_ioctl (for the EXT4_IOC_SETFLAGS ioctl)
> ext4_quota_write
> ext4_llseek
> ext4_end_io_work
> ext4_evict_inode (only while calling ext4_flush_completed_IO)
> ext4_ind_direct_IO (only while calling ext4_flush_completed_IO)
>
>
> And these are ext4 functions that have i_mutex locked by the vfs layer. So we
> will need to lock the new private mutex here too if we want them to be
> synchronous with the above functions.
>
> ext4_setattr
> ext4_da_writepages
> ext4_rmdir
> ext4_unlink
> ext4_symlink
> ext4_link
> ext4_rename
>
> And one unique case:
> ext4_fiemap calls generic_block_fiemap and passes it a function pointer to
> ext4_get_block. generic_block_fiemap will lock i_mutex before calling the
> pointer. I dont think ext4_get_block needs i_mutex locked all the time, so I
> think we can just make a wrapper for ext4_get_block that locks the new private
> mutex and then we can pass a pointer to the wrapper.
>
>
> That's my list so far, if anyone knows of one I missed please let me know, and
> also if you spot any other places where we can reduce unneeded contention by
> using a separate lock. Thx!
>
> Allison Henderson
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--

2011-10-04 08:54:03

by Dmitry Monakhov

[permalink] [raw]
Subject: Re: Plan for reducing i_mutex in ext4

On Mon, 03 Oct 2011 12:00:00 -0700, Allison Henderson <[email protected]> wrote:
> Hi all,
>
> I've been working on locating all the existing uses of i_mutex in the
> current ext4 code because I know we are planning to reduce the usage of
> i_mutex in ext4. So I've gone through the ext4 code and also the vfs
> code and come up with a list of ext4 items that appear to be protected
> under i_mutex. I'm thinking about doing a patch to replace i_mutex with
> a private ext4 mutex, and I wanted to update folks on this idea and pick
> up any feed back people might have.
>
> I'm thinking maybe we can have a separate mutex for functions that only
> modify meta data like ext4_ioctl and ext4_setattr to help relieve
> unneeded contention.
Are you going to change vfs core locking?
> And then the rest of functions that are modifying
> data can go under a data mutex (including truncate since sometimes
> ext4_ioctl and ext4_setattr will call ext4_truncate if they modify i_size).
>
> So these are ext4 functions that currently lock i_mutex:
>
> ext4_sync_file
> ext4_fallocate
> ext4_move_extents via two helper routines:
> mext_inode_double_lock and mext_inode_double_unlock
> ext4_ioctl (for the EXT4_IOC_SETFLAGS ioctl)
> ext4_quota_write
We can easily avoid i_mutex on quota write because quota file can not
be truncated, and grows only in case of new dquot added.
I'll send you a patch.
> ext4_llseek
> ext4_end_io_work
> ext4_evict_inode (only while calling ext4_flush_completed_IO)
> ext4_ind_direct_IO (only while calling ext4_flush_completed_IO)
>
>
> And these are ext4 functions that have i_mutex locked by the vfs layer.
> So we will need to lock the new private mutex here too if we want them
> to be synchronous with the above functions.
>
> ext4_setattr
> ext4_da_writepages
> ext4_rmdir
> ext4_unlink
> ext4_symlink
> ext4_link
> ext4_rename
>
> And one unique case:
> ext4_fiemap calls generic_block_fiemap and passes it a function pointer
> to ext4_get_block. generic_block_fiemap will lock i_mutex before
> calling the pointer. I dont think ext4_get_block needs i_mutex locked
> all the time, so I think we can just make a wrapper for ext4_get_block
> that locks the new private mutex and then we can pass a pointer to the
> wrapper.
>
>
> That's my list so far, if anyone knows of one I missed please let me
> know, and also if you spot any other places where we can reduce unneeded
> contention by using a separate lock. Thx!
>
> Allison Henderson
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2011-10-04 19:18:43

by Allison Henderson

[permalink] [raw]
Subject: Re: Plan for reducing i_mutex in ext4

On 10/04/2011 01:57 AM, Dmitry Monakhov wrote:
> On Mon, 03 Oct 2011 12:00:00 -0700, Allison Henderson<[email protected]> wrote:
>> Hi all,
>>
>> I've been working on locating all the existing uses of i_mutex in the
>> current ext4 code because I know we are planning to reduce the usage of
>> i_mutex in ext4. So I've gone through the ext4 code and also the vfs
>> code and come up with a list of ext4 items that appear to be protected
>> under i_mutex. I'm thinking about doing a patch to replace i_mutex with
>> a private ext4 mutex, and I wanted to update folks on this idea and pick
>> up any feed back people might have.
>>
>> I'm thinking maybe we can have a separate mutex for functions that only
>> modify meta data like ext4_ioctl and ext4_setattr to help relieve
>> unneeded contention.
> Are you going to change vfs core locking?

Hi there,

No, I initially had only thought about adding private locks to ext4, and
removing any occurrence of i_mutex locking in ext4, but it sounds like
Christoph has some more ideas to share to make this more generic.

>> And then the rest of functions that are modifying
>> data can go under a data mutex (including truncate since sometimes
>> ext4_ioctl and ext4_setattr will call ext4_truncate if they modify i_size).
>>
>> So these are ext4 functions that currently lock i_mutex:
>>
>> ext4_sync_file
>> ext4_fallocate
>> ext4_move_extents via two helper routines:
>> mext_inode_double_lock and mext_inode_double_unlock
>> ext4_ioctl (for the EXT4_IOC_SETFLAGS ioctl)
>> ext4_quota_write
> We can easily avoid i_mutex on quota write because quota file can not
> be truncated, and grows only in case of new dquot added.
> I'll send you a patch.

Ah, alrighty then, thx! Any place we can currently remove i_mutex where
it is not needed is certainly helpful. :)

>> ext4_llseek
>> ext4_end_io_work
>> ext4_evict_inode (only while calling ext4_flush_completed_IO)
>> ext4_ind_direct_IO (only while calling ext4_flush_completed_IO)
>>
>>
>> And these are ext4 functions that have i_mutex locked by the vfs layer.
>> So we will need to lock the new private mutex here too if we want them
>> to be synchronous with the above functions.
>>
>> ext4_setattr
>> ext4_da_writepages
>> ext4_rmdir
>> ext4_unlink
>> ext4_symlink
>> ext4_link
>> ext4_rename
>>
>> And one unique case:
>> ext4_fiemap calls generic_block_fiemap and passes it a function pointer
>> to ext4_get_block. generic_block_fiemap will lock i_mutex before
>> calling the pointer. I dont think ext4_get_block needs i_mutex locked
>> all the time, so I think we can just make a wrapper for ext4_get_block
>> that locks the new private mutex and then we can pass a pointer to the
>> wrapper.
>>
>>
>> That's my list so far, if anyone knows of one I missed please let me
>> know, and also if you spot any other places where we can reduce unneeded
>> contention by using a separate lock. Thx!
>>
>> Allison Henderson
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2011-10-06 17:45:27

by Allison Henderson

[permalink] [raw]
Subject: Re: Plan for reducing i_mutex in ext4

On 10/04/2011 01:38 AM, Lukas Czerner wrote:
> On Mon, 3 Oct 2011, Allison Henderson wrote:
>
>> Hi all,
>>
>> I've been working on locating all the existing uses of i_mutex in the current
>> ext4 code because I know we are planning to reduce the usage of i_mutex in
>> ext4. So I've gone through the ext4 code and also the vfs code and come up
>> with a list of ext4 items that appear to be protected under i_mutex. I'm
>> thinking about doing a patch to replace i_mutex with a private ext4 mutex, and
>> I wanted to update folks on this idea and pick up any feed back people might
>> have.
>>
>> I'm thinking maybe we can have a separate mutex for functions that only modify
>> meta data like ext4_ioctl and ext4_setattr to help relieve unneeded
>> contention. And then the rest of functions that are modifying data can go
>> under a data mutex (including truncate since sometimes ext4_ioctl and
>> ext4_setattr will call ext4_truncate if they modify i_size).
>
> Just the other day I was talking with Christoph (adding him to cc) about
> this, but unfortunately I still did not have time to look at this, but I
> am glad that someone did.
>
> His suggestion was a bit more general than creating separate ext4
> specific mutex. His idea was to change i_mutex to union of plain mutex
> for directories and a rwlock for regular files. Then this union can be
> used in other file systems as well, for example to replace xfs_iolock in
> xfs.
>
> Also it might be nice to do something smarter than just a rwlock for
> regular files. It would be nice to have an structure of extent locks, so
> we can use it for file system using extents, which will improve
> scalability while hammering a single file from different processes.
>
> Note that currently ext4 concurrent read/write are atomic only wrt
> individual pages, but not on the system call as the whole. This might
> cause read() to return data mixed from several different writes, which
> is not posix conform. That could be solved with the generic rwlock for
> files, or even better with the system of extent locking.
>
> But Christoph, can probably describe hi idea a bit better.
>
> Thanks!
> -Lukas

Hi Lukas,

Sorry for the delay, and thanks for the response :) Alrighty, I will
have to do some prototyping and see if I can work in some of these
concepts into a solution. At the moment, Im trying to make sure I come
up with something that still provides all the existing functionality so
I dont introduce any new race problems, but there's certainly a lot of
room for optimizing too. Thx!

Allison Henderson
>
>>
>> So these are ext4 functions that currently lock i_mutex:
>>
>> ext4_sync_file
>> ext4_fallocate
>> ext4_move_extents via two helper routines:
>> mext_inode_double_lock and mext_inode_double_unlock
>> ext4_ioctl (for the EXT4_IOC_SETFLAGS ioctl)
>> ext4_quota_write
>> ext4_llseek
>> ext4_end_io_work
>> ext4_evict_inode (only while calling ext4_flush_completed_IO)
>> ext4_ind_direct_IO (only while calling ext4_flush_completed_IO)
>>
>>
>> And these are ext4 functions that have i_mutex locked by the vfs layer. So we
>> will need to lock the new private mutex here too if we want them to be
>> synchronous with the above functions.
>>
>> ext4_setattr
>> ext4_da_writepages
>> ext4_rmdir
>> ext4_unlink
>> ext4_symlink
>> ext4_link
>> ext4_rename
>>
>> And one unique case:
>> ext4_fiemap calls generic_block_fiemap and passes it a function pointer to
>> ext4_get_block. generic_block_fiemap will lock i_mutex before calling the
>> pointer. I dont think ext4_get_block needs i_mutex locked all the time, so I
>> think we can just make a wrapper for ext4_get_block that locks the new private
>> mutex and then we can pass a pointer to the wrapper.
>>
>>
>> That's my list so far, if anyone knows of one I missed please let me know, and
>> also if you spot any other places where we can reduce unneeded contention by
>> using a separate lock. Thx!
>>
>> Allison Henderson
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>