2019-12-04 23:41:43

by Theodore Ts'o

Subject: Re: [RFC] Thing 1: Shardmap for Ext4

On Wed, Dec 04, 2019 at 11:31:50AM -0700, Andreas Dilger wrote:
> One important use case that we have for Lustre that is not yet in the
> upstream ext4[*] is the ability to do parallel directory operations.
> This means we can create, lookup, and/or unlink entries in the same
> directory concurrently, to increase parallelism for large directories.
>
>
> [*] we've tried to submit the pdirops patch a couple of times, but the
> main blocker is that the VFS has a single directory mutex and couldn't
> use the added functionality without significant VFS changes.
> Patch at https://git.whamcloud.com/?p=fs/lustre-release.git;f=ldiskfs/kernel_patches/patches/rhel8/ext4-pdirop.patch;hb=HEAD
>

The XFS folks recently added support for parallel directory operations
into the VFS, for the benefit of XFS, which has this feature. So it
should be possible to adjust the patch so it will work with the
upstream kernel now.

Cheers,

- Ted



2019-12-06 01:17:12

by Dave Chinner

Subject: Re: [RFC] Thing 1: Shardmap for Ext4

On Wed, Dec 04, 2019 at 06:41:06PM -0500, Theodore Y. Ts'o wrote:
> On Wed, Dec 04, 2019 at 11:31:50AM -0700, Andreas Dilger wrote:
> > One important use case that we have for Lustre that is not yet in the
> > upstream ext4[*] is the ability to do parallel directory operations.
> > This means we can create, lookup, and/or unlink entries in the same
> > directory concurrently, to increase parallelism for large directories.
> >
> >
> > [*] we've tried to submit the pdirops patch a couple of times, but the
> > main blocker is that the VFS has a single directory mutex and couldn't
> > use the added functionality without significant VFS changes.
> > Patch at https://git.whamcloud.com/?p=fs/lustre-release.git;f=ldiskfs/kernel_patches/patches/rhel8/ext4-pdirop.patch;hb=HEAD
> >
>
> The XFS folks recently added support for parallel directory operations
> into the VFS, for the benefit of XFS, which has this feature.

The use of shared i_rwsem locking on the directory inode during
lookup/pathwalk allows for concurrent lookup/readdir operations on
a single directory. However, the parent dir i_rwsem is still held
exclusive for directory modifications like create, unlink, etc.

IOWs, the VFS doesn't allow for concurrent directory modification
right now, and that's going to be the limiting factor no matter what
you do with internal filesystem locking.

Cheers,

Dave.
--
Dave Chinner
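
To make the locking rule described above concrete, here is a minimal
user-space sketch. It is not kernel code: DirRwsem and demo() are
hypothetical names invented for this illustration, and the toy lock only
mimics the shared/exclusive semantics of a directory inode's i_rwsem. It
assumes lookups take the lock shared while create/unlink take it
exclusive, as stated in the message above.

```python
import threading


class DirRwsem:
    """Toy stand-in for a directory inode's i_rwsem (illustration only)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the two counters below
        self._readers = 0               # number of shared holders
        self._writer = False            # True while held exclusively

    def try_lock_shared(self):
        # Shared acquisition succeeds unless an exclusive holder exists.
        with self._guard:
            if self._writer:
                return False
            self._readers += 1
            return True

    def unlock_shared(self):
        with self._guard:
            self._readers -= 1

    def try_lock_exclusive(self):
        # Exclusive acquisition needs the lock to be completely free.
        with self._guard:
            if self._writer or self._readers:
                return False
            self._writer = True
            return True

    def unlock_exclusive(self):
        with self._guard:
            self._writer = False


def demo():
    """Record which operations could proceed on one directory."""
    d = DirRwsem()
    seen = {}
    seen["lookup_a"] = d.try_lock_shared()
    seen["lookup_b"] = d.try_lock_shared()             # second lookup joins
    seen["create_during_lookup"] = d.try_lock_exclusive()
    d.unlock_shared()
    d.unlock_shared()
    seen["create"] = d.try_lock_exclusive()
    seen["unlink_during_create"] = d.try_lock_exclusive()
    seen["lookup_during_create"] = d.try_lock_shared()
    d.unlock_exclusive()
    return seen
```

In this sketch the two lookups share the lock, while a second modification
cannot start until the first one drops its exclusive hold. That is the
serialization point being discussed: finer-grained locking inside the
filesystem cannot help while the caller already holds the directory
exclusively.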

2019-12-06 05:11:25

by Daniel Phillips

Subject: Re: [RFC] Thing 1: Shardmap for Ext4

On 2019-12-05 5:16 p.m., Dave Chinner wrote:
> On Wed, Dec 04, 2019 at 06:41:06PM -0500, Theodore Y. Ts'o wrote:
>> On Wed, Dec 04, 2019 at 11:31:50AM -0700, Andreas Dilger wrote:
>>> One important use case that we have for Lustre that is not yet in the
>>> upstream ext4[*] is the ability to do parallel directory operations.
>>> This means we can create, lookup, and/or unlink entries in the same
>>> directory concurrently, to increase parallelism for large directories.
>>>
>>> [*] we've tried to submit the pdirops patch a couple of times, but the
>>> main blocker is that the VFS has a single directory mutex and couldn't
>>> use the added functionality without significant VFS changes.
>>> Patch at https://git.whamcloud.com/?p=fs/lustre-release.git;f=ldiskfs/kernel_patches/patches/rhel8/ext4-pdirop.patch;hb=HEAD
>>>
>>
>> The XFS folks recently added support for parallel directory operations
>> into the VFS, for the benefit of XFS, which has this feature.
>
> The use of shared i_rwsem locking on the directory inode during
> lookup/pathwalk allows for concurrent lookup/readdir operations on
> a single directory. However, the parent dir i_rwsem is still held
> exclusive for directory modifications like create, unlink, etc.
>
> IOWs, the VFS doesn't allow for concurrent directory modification
> right now, and that's going to be the limiting factor no matter what
> you do with internal filesystem locking.

On a scale of 0 to 10, how hard do you think that would be to relax
in VFS, given the restriction of no concurrent inter-directory moves?

2019-12-08 22:44:36

by Dave Chinner

Subject: Re: [RFC] Thing 1: Shardmap for Ext4

On Thu, Dec 05, 2019 at 09:09:28PM -0800, Daniel Phillips wrote:
> On 2019-12-05 5:16 p.m., Dave Chinner wrote:
> > On Wed, Dec 04, 2019 at 06:41:06PM -0500, Theodore Y. Ts'o wrote:
> >> On Wed, Dec 04, 2019 at 11:31:50AM -0700, Andreas Dilger wrote:
> >>> One important use case that we have for Lustre that is not yet in the
> >>> upstream ext4[*] is the ability to do parallel directory operations.
> >>> This means we can create, lookup, and/or unlink entries in the same
> >>> directory concurrently, to increase parallelism for large directories.
> >>>
> >>> [*] we've tried to submit the pdirops patch a couple of times, but the
> >>> main blocker is that the VFS has a single directory mutex and couldn't
> >>> use the added functionality without significant VFS changes.
> >>> Patch at https://git.whamcloud.com/?p=fs/lustre-release.git;f=ldiskfs/kernel_patches/patches/rhel8/ext4-pdirop.patch;hb=HEAD
> >>>
> >>
> >> The XFS folks recently added support for parallel directory operations
> >> into the VFS, for the benefit of XFS, which has this feature.
> >
> > The use of shared i_rwsem locking on the directory inode during
> > lookup/pathwalk allows for concurrent lookup/readdir operations on
> > a single directory. However, the parent dir i_rwsem is still held
> > exclusive for directory modifications like create, unlink, etc.
> >
> > IOWs, the VFS doesn't allow for concurrent directory modification
> > right now, and that's going to be the limiting factor no matter what
> > you do with internal filesystem locking.
>
> On a scale of 0 to 10, how hard do you think that would be to relax
> in VFS, given the restriction of no concurrent inter-directory moves?

My initial reaction is to run away screaming in horror. Beyond that,
I have no idea what terrible dangers lurk in the dark shadows where
mortals fear to tread...

-Dave.
--
Dave Chinner