2017-08-03 11:31:05

by Wang Shilong

Subject: Re: quota: dqio_mutex design

Hello guys,

We at DDN are investigating the same issue!

Some comments:

On Thu, Aug 3, 2017 at 1:52 AM, Andrew Perepechko <[email protected]> wrote:
>> On Tue 01-08-17 15:02:42, Jan Kara wrote:
>> > Hi Andrew,
>> >
>> I've been experimenting with this today but this idea didn't bring any
>> benefit in my testing. Was your setup with multiple users or a single user?
>> Could you give some testing to my patches to see whether they bring some
>> benefit to you?
>>
>> Honza
>
> Hi Jan!
>
> My setup was with a single user. Unfortunately, it may take some time before
> I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
> we have a lot of dependencies on these kernels.
>
> The actual test we ran was mdtest.
>
> By the way, we had 15+% performance improvement in creates from the
> change that was discussed earlier in this thread:
>
> EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
> + if (test_bit(DQ_MOD_B, &dquot->dq_flags))
> + return 0;

I don't think this is right. As far as I understand, journalled quota updates need to
go together with the quota space change inside the same transaction; otherwise this
breaks consistency if a power-off happens or the filesystem goes read-only.

Here are some ideas I have thought about:

1) Switch dqio_mutex to a read/write lock. Most of the time a journalled quota
update is an in-place update, which means we don't need to change the quota tree
in memory: first try the read lock, and retry with the write lock only if there
is a real tree change (see the sketch after this list).

2) Another is similar to Andrew's workaround, but done as a correct fix: maintain
a per-transaction dirty list and guarantee that quota updates are flushed when the
transaction commits. This might be complex; I am not very familiar with the JBD2
code.
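
Just to illustrate idea 1), a rough sketch of the locking pattern only (it assumes
dqio_mutex is converted to an rw_semaphore called dqio_sem; the helper functions
below are made up and only stand in for the real quota file code):

#include <linux/rwsem.h>
#include <linux/quota.h>

/* Made-up helpers standing in for the real quota update paths. */
bool quota_tree_needs_change(struct dquot *dquot);
int quota_write_in_place(struct dquot *dquot);
int quota_write_expand_tree(struct dquot *dquot);

static int dquot_write_rwlocked(struct rw_semaphore *dqio_sem,
                                struct dquot *dquot)
{
        int ret;

        /* Common case: rewrite an existing on-disk dquot under the shared lock. */
        down_read(dqio_sem);
        if (!quota_tree_needs_change(dquot)) {
                ret = quota_write_in_place(dquot);
                up_read(dqio_sem);
                return ret;
        }
        up_read(dqio_sem);

        /*
         * Rare case: the quota tree itself must change. Take the lock
         * exclusively and recheck, since another writer may have already
         * expanded the tree while we were unlocked.
         */
        down_write(dqio_sem);
        if (quota_tree_needs_change(dquot))
                ret = quota_write_expand_tree(dquot);
        else
                ret = quota_write_in_place(dquot);
        up_write(dqio_sem);
        return ret;
}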

It would be really nice if we could fix this regression, as we see a 20% performance
regression.

Thanks,
Shilong

> dquot_mark_dquot_dirty(dquot);
> return ext4_write_dquot(dquot);
>
> The idea was that if we know that some thread is somewhere between
> mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
> since that thread will update the ondisk dquot for us.
>


2017-08-03 13:16:01

by Andrew Perepechko

Subject: Re: quota: dqio_mutex design

>
> I don't think this is right, as far as i understand, journal quota need go
> together with quota space change update inside same transaction, this will
> break consistency if power off or RO happen.
>

Hello Wang!

There is no transaction change in this case because all callers of this
function have open handles for the same transaction.

If you enter that DQ_MOD_B check, you are guaranteed to reference
the SAME transaction as the thread that's in between mark_dirty
and clear_dirty.

Thank you,
Andrew

2017-08-03 13:19:02

by Wang Shilong

Subject: Re: quota: dqio_mutex design

Hi,

On Thu, Aug 3, 2017 at 8:24 PM, Andrew Perepechko <[email protected]> wrote:
>>
>> I don't think this is right, as far as i understand, journal quota need go
>> together with quota space change update inside same transaction, this will
>> break consistency if power off or RO happen.
>>
>
> Hello Wang!
>
> There is no transaction change in this case because all callers of this
> function have open handles for the same transaction.
>
> If you enter that DQ_MOD_B check, you are guaranteed to reference
> the SAME transaction as the thread that's in between of mark_dirty
> and clear_dirty.
>

This change means that if the dquot is dirty we skip the write. That won't work,
because the quota update is then only kept in the in-core VFS dquot, and the newer
update is neither written to the journalled quota file nor wrapped into a
transaction.

This is not what journalled quota is meant to do.


Thanks,
Shilong


> Thank you,
> Andrew

2017-08-03 13:41:38

by Andrew Perepechko

Subject: Re: quota: dqio_mutex design

>
> This change mean if this dquot is dirty we skip, this
> won't work because in this way, quota update is only kept in vfs dquota
> memory and newer update is not wrote to journal file and not wrapped into
> transaction too.

That's not true.

As I explained earlier, having DQ_MOD_B set at this point means another
thread is going to write the dquot but hasn't yet started doing so. That thread
does not care whether it updates the ondisk dquot with its own data or with
fresher data which came from another thread. The in-core dquot has no indication
of whose data it contains.

As I also explained earlier, the update cannot happen in the context of
another transaction because thread A which sees DQ_MOD_B set and thread
B which is running dquot_commit() both have journal handles to the same
transaction. There's only one running transaction at a time and thread B does
not switch to another transaction.

Please read the code carefully.


>
> This is not what journal quota means to do.
>
>
> Thanks,
> Shilong
>
> > Thank you,
> > Andrew

2017-08-03 13:57:39

by Andrew Perepechko

Subject: Re: quota: dqio_mutex design

Let me put it this way:

Under file creation from different threads, ext4 will generate a series of
dquot updates (incore and then ondisk, through journal):

dquot update1
dquot update2
dquot update3
...
dquot updateN

Either with my patch or without it, the on-disk dquot update through the journal
may miss dquot update1, dquot update2, ..., dquot update{N-1}.

You can easily see that from the code of dquot_commit():

int dquot_commit(struct dquot *dquot)
{
        int ret = 0;
        struct quota_info *dqopt = sb_dqopt(dquot->dq_sb);

        mutex_lock(&dqopt->dqio_mutex);
        spin_lock(&dq_list_lock);
        if (!clear_dquot_dirty(dquot)) {
                spin_unlock(&dq_list_lock);
                goto out_sem;
        }
        ...
}


If the actual dquot_commit() wrote dquot update N, the threads committing
updates 1 through N-1 will exit immediately once they get dqio_mutex,
since the dquot will NOT be dirty.

My patch only avoids blocking on dqio_mutex when we know for sure
that another thread will NECESSARILY write the needed or a FRESHER dquot ondisk.
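
For reference, this is roughly how the discussed check sits in
ext4_mark_dquot_dirty() - paraphrased around the fragments quoted earlier in
the thread, not the exact patch:

static int ext4_mark_dquot_dirty(struct dquot *dquot)
{
        /* Are we journaling quotas? */
        if (EXT4_SB(dquot->dq_sb)->s_qf_names[USRQUOTA] ||
            EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
                /*
                 * DQ_MOD_B already set: another thread holding a handle on
                 * the same running transaction is between mark_dirty and
                 * clear_dirty and will write this (or fresher) in-core data,
                 * so we can return without blocking on dqio_mutex.
                 */
                if (test_bit(DQ_MOD_B, &dquot->dq_flags))
                        return 0;
                dquot_mark_dquot_dirty(dquot);
                return ext4_write_dquot(dquot);
        } else {
                return dquot_mark_dquot_dirty(dquot);
        }
}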

> > This change mean if this dquot is dirty we skip, this
> > won't work because in this way, quota update is only kept in vfs dquota
> > memory and newer update is not wrote to journal file and not wrapped into
> > transaction too.
>
> That's not true.
>
> As I explained earlier, having DQ_MOD_B set at this point means another
> thread is going to write dquot but hasn't yet started doing so. This thread
> does not care whether it updates the ondisk dquot with its own data or with
> fresher data which came from another thread. In-core dquot has no indication
> of whose data in contains.
>
> As I also explained earlier, the update cannot happen in the context of
> another transaction because thread A which sees DQ_MOD_B set and thread
> B which is running dquot_commit() both have journal handles to the same
> transaction. There's only one running transaction at a time and thread B
> does not switch to another transaction.
>
> Please read the code carefully.
>
> > This is not what journal quota means to do.
> >
> >
> > Thanks,
> > Shilong
> >
> > > Thank you,
> > > Andrew

2017-08-03 14:23:23

by Jan Kara

Subject: Re: quota: dqio_mutex design

On Thu 03-08-17 16:55:40, Andrew Perepechko wrote:
> Let me put it this way:
>
> Under file creation from different threads, ext4 will generate a series of
> dquot updates (incore and then ondisk, through journal):
>
> dquot update1
> dquot update2
> dquot update3
> ...
> dquot updateN
>
> Either with my patch or without it, ondisk dquot update through journal
> may miss dquot update1, dquot update2, ... dquot update{N-1}.
>
> You can easily see that from the code of dquot_commit():
>
> int dquot_commit(struct dquot *dquot)
> {
> int ret = 0;
> struct quota_info *dqopt = sb_dqopt(dquot->dq_sb);
>
> mutex_lock(&dqopt->dqio_mutex);
> spin_lock(&dq_list_lock);
> if (!clear_dquot_dirty(dquot)) {
> spin_unlock(&dq_list_lock);
> goto out_sem;
> }
> ...
> }
>
>
> If actual dquot_commit() wrote dquot update N, the threads commiting
> updates 1 through N-1 will exit immediately once they get dqio_mutex
> since the dquot will NOT be dirty.
>
> My patch only avoids blocking on dqio_mutex when we know for sure
> that another will NECESSARILY write the needed or a FRESHER dquot ondisk.

Yeah, I agree with Andrew. What they did is *almost* safe for ext4. The
only moment when it is not safe is when someone calls mark_dquot_dirty()
outside of the scope of a transaction, which happens when doing a Q_SETQUOTA
quotactl.

Another thing which is subtle about Andrew's approach is that the process
modifying quota information can return and stop its handle before the quota
data gets copied to the transaction buffer. This does not currently create any
real problem since nobody is relying on that, however it relies on intimate
details of the JBD2 transaction machinery and could bite us in the future.

Honza

> > > This change mean if this dquot is dirty we skip, this
> > > won't work because in this way, quota update is only kept in vfs dquota
> > > memory and newer update is not wrote to journal file and not wrapped into
> > > transaction too.
> >
> > That's not true.
> >
> > As I explained earlier, having DQ_MOD_B set at this point means another
> > thread is going to write dquot but hasn't yet started doing so. This thread
> > does not care whether it updates the ondisk dquot with its own data or with
> > fresher data which came from another thread. In-core dquot has no indication
> > of whose data in contains.
> >
> > As I also explained earlier, the update cannot happen in the context of
> > another transaction because thread A which sees DQ_MOD_B set and thread
> > B which is running dquot_commit() both have journal handles to the same
> > transaction. There's only one running transaction at a time and thread B
> > does not switch to another transaction.
> >
> > Please read the code carefully.
> >
> > > This is not what journal quota means to do.
> > >
> > >
> > > Thanks,
> > > Shilong
> > >
> > > > Thank you,
> > > > Andrew
>
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2017-08-03 14:37:00

by Jan Kara

Subject: Re: quota: dqio_mutex design

Hello!

On Thu 03-08-17 19:31:04, Wang Shilong wrote:
> We DDN is investigating the same issue!
>
> Some comments comes:
>
> On Thu, Aug 3, 2017 at 1:52 AM, Andrew Perepechko <[email protected]> wrote:
> >> On Tue 01-08-17 15:02:42, Jan Kara wrote:
> >> > Hi Andrew,
> >> >
> >> I've been experimenting with this today but this idea didn't bring any
> >> benefit in my testing. Was your setup with multiple users or a single user?
> >> Could you give some testing to my patches to see whether they bring some
> >> benefit to you?
> >>
> >> Honza
> >
> > Hi Jan!
> >
> > My setup was with a single user. Unfortunately, it may take some time before
> > I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
> > we have a lot of dependencies on these kernels.
> >
> > The actual test we ran was mdtest.
> >
> > By the way, we had 15+% performance improvement in creates from the
> > change that was discussed earlier in this thread:
> >
> > EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
> > + if (test_bit(DQ_MOD_B, &dquot->dq_flags))
> > + return 0;
>
> I don't think this is right, as far as i understand, journal quota need go
> together with quota space change update inside same transaction, this will
> break consistency if power off or RO happen.
>
> Here is some ideas that i have thought:
>
> 1) switch dqio_mutex to a read/write lock, especially, i think most of
> time journal quota updates is in-place update, that means we don't need
> change quota tree in memory, firstly try read lock, retry with write lock if
> there is real tree change.
>
> 2)another is similar idea of Andrew's walkaround, but we need make correct
> fix, maintain dirty list for per transaction, and gurantee quota updates are
> flushed when commit transaction, this might be complex, i am not very
> familiar with JBD2 codes.
>
> It will be really nice if we could fix this regression, as we see 20% performace
> regression.

So I have a couple of patches:

1) I convert dqio_mutex to an rw semaphore and use it in exclusive mode only
when the quota tree is going to change. We also use dq_lock to serialize writes
of a dquot - you cannot have two writes happening in parallel as that could
result in stale data being on disk. This patch brings benefit when there
are multiple users - now they don't contend on a common lock. It shows an
advantage in my testing so I plan to merge these patches. When the
contention is on a structure for a single user this change however doesn't
bring much (the performance change is in statistical noise in my testing).

2) I have patches to remove some contention on dq_list_lock by not using
dirty list for tracking dquots in ext4 (and thus avoid dq_list_lock
completely in quota modification path). This does not bring measurable
benefit in my testing even on ramdisk but lockstat data for dq_list_lock
looks much better after this - it seems lock contention just shifted to
dq_data_lock - I'll try to address that as well and see whether I'll be
able to measure some advantage.

3) I have patches to convert the dquot dirty bit to a sequence counter so that
in commit_dqblk() we can check whether the dquot state we wanted to write is
already on disk. Note that this is different from Andrew's approach in that
we do wait for the dquot to be actually written before returning. We just don't
repeat the write unnecessarily. However, this didn't bring any measurable
benefit in my testing, so unless I'm able to confirm it benefits some
workloads I won't merge this change.
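
To illustrate 3), a rough sketch of the check only (the struct and helper names
below are made up for illustration and do not match the actual patches; the real
code also waits for an in-flight write to finish before returning):

#include <linux/types.h>
#include <linux/spinlock.h>

/*
 * Every in-core modification bumps dq_mod_seq. A writer samples the counter
 * before writing and records it in dq_written_seq afterwards, so a later
 * commit can tell whether the state it wanted to write (or something newer)
 * has already reached the disk.
 */
struct dquot_seq {
        spinlock_t lock;
        u64 dq_mod_seq;         /* bumped on every in-core change */
        u64 dq_written_seq;     /* value of dq_mod_seq last written out */
};

/* Called instead of setting the dirty bit when the in-core dquot changes. */
static void dquot_seq_mark_dirty(struct dquot_seq *dq)
{
        spin_lock(&dq->lock);
        dq->dq_mod_seq++;
        spin_unlock(&dq->lock);
}

/* In commit_dqblk(): return false if the wanted state is already on disk. */
static bool dquot_seq_needs_write(struct dquot_seq *dq, u64 wanted)
{
        bool ret;

        spin_lock(&dq->lock);
        ret = dq->dq_written_seq < wanted;
        spin_unlock(&dq->lock);
        return ret;
}

/* Called after successfully writing the state sampled as 'written'. */
static void dquot_seq_note_written(struct dquot_seq *dq, u64 written)
{
        spin_lock(&dq->lock);
        if (dq->dq_written_seq < written)
                dq->dq_written_seq = written;
        spin_unlock(&dq->lock);
}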

If you can experiment with your workloads, I can send you patches. I'd be
keen on having some performance data from real setups...

Honza

>
> Thanks,
> Shilong
>
> > dquot_mark_dquot_dirty(dquot);
> > return ext4_write_dquot(dquot);
> >
> > The idea was that if we know that some thread is somewhere between
> > mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
> > since that thread will update the ondisk dquot for us.
> >
--
Jan Kara <[email protected]>
SUSE Labs, CR

2017-08-03 14:39:51

by Wang Shilong

Subject: Re: quota: dqio_mutex design

Hello Jan,


Please send me the patches, we could test them and respond to you!

Thanks,
Shilong

On Thu, Aug 3, 2017 at 10:36 PM, Jan Kara <[email protected]> wrote:
> Hello!
>
> On Thu 03-08-17 19:31:04, Wang Shilong wrote:
>> We DDN is investigating the same issue!
>>
>> Some comments comes:
>>
>> On Thu, Aug 3, 2017 at 1:52 AM, Andrew Perepechko <[email protected]> wrote:
>> >> On Tue 01-08-17 15:02:42, Jan Kara wrote:
>> >> > Hi Andrew,
>> >> >
>> >> I've been experimenting with this today but this idea didn't bring any
>> >> benefit in my testing. Was your setup with multiple users or a single user?
>> >> Could you give some testing to my patches to see whether they bring some
>> >> benefit to you?
>> >>
>> >> Honza
>> >
>> > Hi Jan!
>> >
>> > My setup was with a single user. Unfortunately, it may take some time before
>> > I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
>> > we have a lot of dependencies on these kernels.
>> >
>> > The actual test we ran was mdtest.
>> >
>> > By the way, we had 15+% performance improvement in creates from the
>> > change that was discussed earlier in this thread:
>> >
>> > EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
>> > + if (test_bit(DQ_MOD_B, &dquot->dq_flags))
>> > + return 0;
>>
>> I don't think this is right, as far as i understand, journal quota need go
>> together with quota space change update inside same transaction, this will
>> break consistency if power off or RO happen.
>>
>> Here is some ideas that i have thought:
>>
>> 1) switch dqio_mutex to a read/write lock, especially, i think most of
>> time journal quota updates is in-place update, that means we don't need
>> change quota tree in memory, firstly try read lock, retry with write lock if
>> there is real tree change.
>>
>> 2)another is similar idea of Andrew's walkaround, but we need make correct
>> fix, maintain dirty list for per transaction, and gurantee quota updates are
>> flushed when commit transaction, this might be complex, i am not very
>> familiar with JBD2 codes.
>>
>> It will be really nice if we could fix this regression, as we see 20% performace
>> regression.
>
> So I have couple of patches:
>
> 1) I convert dqio_mutex do rw semaphore and use it in exclusive mode only
> when quota tree is going to change. We also use dq_lock to serialize writes
> of dquot - you cannot have two writes happening in parallel as that could
> result in stale data being on disk. This patch brings benefit when there
> are multiple users - now they don't contend on common lock. It shows
> advantage in my testing so I plan to merge these patches. When the
> contention is on a structure for single user this change however doesn't
> bring much (the performance change is in statistical noise in my testing).
>
> 2) I have patches to remove some contention on dq_list_lock by not using
> dirty list for tracking dquots in ext4 (and thus avoid dq_list_lock
> completely in quota modification path). This does not bring measurable
> benefit in my testing even on ramdisk but lockstat data for dq_list_lock
> looks much better after this - it seems lock contention just shifted to
> dq_data_lock - I'll try to address that as well and see whether I'll be
> able to measure some advantage.
>
> 3) I have patches to convert dquot dirty bit to sequence counter so that
> in commit_dqblk() we can check whether dquot state we wanted to write is
> already on disk. Note that this is different from Andrew's approach in that
> we do wait for dquot to be actually written before returning. We just don't
> repeat the write unnecessarily. However this didn't bring any measurable
> benefit in my testing so unless I'll be able to confirm it benefits some
> workloads I won't merge this change.
>
> If you can experiment with your workloads, I can send you patches. I'd be
> keen on having some performance data from real setups...
>
> Honza
>
>>
>> Thanks,
>> Shilong
>>
>> > dquot_mark_dquot_dirty(dquot);
>> > return ext4_write_dquot(dquot);
>> >
>> > The idea was that if we know that some thread is somewhere between
>> > mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
>> > since that thread will update the ondisk dquot for us.
>> >
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2017-08-08 16:06:38

by Jan Kara

Subject: Re: quota: dqio_mutex design

Hi,

On Thu 03-08-17 22:39:51, Wang Shilong wrote:
> Please send me patches, we could test and response you!

So I finally have something which isn't obviously wrong (it survives basic
testing and gives me improvements for some workloads). I have pushed out
the patches to:

git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git quota_scaling

I'd be happy if you could share your results with my patches. I have not yet
figured out a safe way to reduce the contention on dq_lock during the update of
the on-disk structure when a lot of processes bang on a single dquot. I have an
experimental patch but it didn't bring any benefit in my testing - I'll
rebase it on top of the other patches I have and send it to you for some testing.

Honza

> On Thu, Aug 3, 2017 at 10:36 PM, Jan Kara <[email protected]> wrote:
> > Hello!
> >
> > On Thu 03-08-17 19:31:04, Wang Shilong wrote:
> >> We DDN is investigating the same issue!
> >>
> >> Some comments comes:
> >>
> >> On Thu, Aug 3, 2017 at 1:52 AM, Andrew Perepechko <[email protected]> wrote:
> >> >> On Tue 01-08-17 15:02:42, Jan Kara wrote:
> >> >> > Hi Andrew,
> >> >> >
> >> >> I've been experimenting with this today but this idea didn't bring any
> >> >> benefit in my testing. Was your setup with multiple users or a single user?
> >> >> Could you give some testing to my patches to see whether they bring some
> >> >> benefit to you?
> >> >>
> >> >> Honza
> >> >
> >> > Hi Jan!
> >> >
> >> > My setup was with a single user. Unfortunately, it may take some time before
> >> > I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
> >> > we have a lot of dependencies on these kernels.
> >> >
> >> > The actual test we ran was mdtest.
> >> >
> >> > By the way, we had 15+% performance improvement in creates from the
> >> > change that was discussed earlier in this thread:
> >> >
> >> > EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
> >> > + if (test_bit(DQ_MOD_B, &dquot->dq_flags))
> >> > + return 0;
> >>
> >> I don't think this is right, as far as i understand, journal quota need go
> >> together with quota space change update inside same transaction, this will
> >> break consistency if power off or RO happen.
> >>
> >> Here is some ideas that i have thought:
> >>
> >> 1) switch dqio_mutex to a read/write lock, especially, i think most of
> >> time journal quota updates is in-place update, that means we don't need
> >> change quota tree in memory, firstly try read lock, retry with write lock if
> >> there is real tree change.
> >>
> >> 2)another is similar idea of Andrew's walkaround, but we need make correct
> >> fix, maintain dirty list for per transaction, and gurantee quota updates are
> >> flushed when commit transaction, this might be complex, i am not very
> >> familiar with JBD2 codes.
> >>
> >> It will be really nice if we could fix this regression, as we see 20% performace
> >> regression.
> >
> > So I have couple of patches:
> >
> > 1) I convert dqio_mutex do rw semaphore and use it in exclusive mode only
> > when quota tree is going to change. We also use dq_lock to serialize writes
> > of dquot - you cannot have two writes happening in parallel as that could
> > result in stale data being on disk. This patch brings benefit when there
> > are multiple users - now they don't contend on common lock. It shows
> > advantage in my testing so I plan to merge these patches. When the
> > contention is on a structure for single user this change however doesn't
> > bring much (the performance change is in statistical noise in my testing).
> >
> > 2) I have patches to remove some contention on dq_list_lock by not using
> > dirty list for tracking dquots in ext4 (and thus avoid dq_list_lock
> > completely in quota modification path). This does not bring measurable
> > benefit in my testing even on ramdisk but lockstat data for dq_list_lock
> > looks much better after this - it seems lock contention just shifted to
> > dq_data_lock - I'll try to address that as well and see whether I'll be
> > able to measure some advantage.
> >
> > 3) I have patches to convert dquot dirty bit to sequence counter so that
> > in commit_dqblk() we can check whether dquot state we wanted to write is
> > already on disk. Note that this is different from Andrew's approach in that
> > we do wait for dquot to be actually written before returning. We just don't
> > repeat the write unnecessarily. However this didn't bring any measurable
> > benefit in my testing so unless I'll be able to confirm it benefits some
> > workloads I won't merge this change.
> >
> > If you can experiment with your workloads, I can send you patches. I'd be
> > keen on having some performance data from real setups...
> >
> > Honza
> >
> >>
> >> Thanks,
> >> Shilong
> >>
> >> > dquot_mark_dquot_dirty(dquot);
> >> > return ext4_write_dquot(dquot);
> >> >
> >> > The idea was that if we know that some thread is somewhere between
> >> > mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
> >> > since that thread will update the ondisk dquot for us.
> >> >
> > --
> > Jan Kara <[email protected]>
> > SUSE Labs, CR
--
Jan Kara <[email protected]>
SUSE Labs, CR

2017-08-14 03:24:13

by Wang Shilong

Subject: RE: quota: dqio_mutex design

Hello Jan,

We have tested your patches; in general, they helped in our case. Note that
our test case is only one user with many processes creating/removing files.


(all numbers are ops per second)

4.13.0-rc3 without any patches
            no Quota              -O quota            -O quota,project
      Creation    Unlink    Creation    Unlink    Creation    Unlink
0       93,068   296,028      86,860   285,131      85,199   189,653
1       79,501   280,921      91,079   277,349     186,279   170,982
2       79,932   299,750      90,246   274,457     133,922   191,677
3       80,146   297,525      86,416   272,160     192,354   198,869

4.13.0-rc3 with Jan Kara's patches
            no Quota              -O quota            -O quota,project
      Creation    Unlink    Creation    Unlink    Creation    Unlink
0       73,057   311,217      74,898   286,120      81,217   288,138
1       78,872   312,471      76,470   277,033      77,014   288,057
2       79,170   291,440      76,174   283,525      73,686   283,526
3       79,941   309,168      78,493   277,331      78,751   281,377

4.13.0-rc3 with https://patchwork.ozlabs.org/patch/799014/
            no Quota              -O quota            -O quota,project
      Creation    Unlink    Creation    Unlink    Creation    Unlink
0      100,319   322,746      87,480   302,579      84,569   218,969
1      728,424   299,808     312,766   293,471     219,198   199,389
2      729,410   300,930     315,590   289,664     218,283   197,871
3      727,555   298,797     316,837   289,108     213,095   213,458

4.13.0-rc3 with https://patchwork.ozlabs.org/patch/799014/ + Jan Kara's patches
            no Quota              -O quota            -O quota,project
      Creation    Unlink    Creation    Unlink    Creation    Unlink
0      100,312   324,871      87,076   267,303      86,258   288,137
1      707,524   298,892     361,963   252,493     421,919   282,492
2      707,792   298,162     363,450   264,923     397,723   283,675
3      707,420   302,552     354,013   266,638     421,537   281,763


In conclusion, your patches helped a lot in our testing. Note: please ignore the test0 run
for creation; the first run only loads the inode cache into memory, so we used tests 1-3 to compare.

With the extra patch applied, your patches improved File Creation (quota+project) 2X and File Unlink
1.5X.

Thanks,
Shilong

________________________________________
From: Jan Kara [[email protected]]
Sent: Wednesday, August 09, 2017 0:06
To: Wang Shilong
Cc: Jan Kara; Andrew Perepechko; Shuichi Ihara; Wang Shilong; Li Xi; Ext4 Developers List; [email protected]
Subject: Re: quota: dqio_mutex design

Hi,

On Thu 03-08-17 22:39:51, Wang Shilong wrote:
> Please send me patches, we could test and response you!

So I finally have something which isn't obviously wrong (it survives basic
testing and gives me improvements for some workloads). I have pushed out
the patches to:

git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git quota_scaling

I'd be happy if you can share your results with my patches. I have not yet
figured out a safe way to reduce the contention on dq_lock during update of
on-disk structure when lot of processes bang single dquot. I have
experimental patch but it didn't bring any benefit in my testing - I'll
rebase it on top of other patches I have send it to you for some testing.

Honza

> On Thu, Aug 3, 2017 at 10:36 PM, Jan Kara <[email protected]> wrote:
> > Hello!
> >
> > On Thu 03-08-17 19:31:04, Wang Shilong wrote:
> >> We DDN is investigating the same issue!
> >>
> >> Some comments comes:
> >>
> >> On Thu, Aug 3, 2017 at 1:52 AM, Andrew Perepechko <[email protected]> wrote:
> >> >> On Tue 01-08-17 15:02:42, Jan Kara wrote:
> >> >> > Hi Andrew,
> >> >> >
> >> >> I've been experimenting with this today but this idea didn't bring any
> >> >> benefit in my testing. Was your setup with multiple users or a single user?
> >> >> Could you give some testing to my patches to see whether they bring some
> >> >> benefit to you?
> >> >>
> >> >> Honza
> >> >
> >> > Hi Jan!
> >> >
> >> > My setup was with a single user. Unfortunately, it may take some time before
> >> > I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
> >> > we have a lot of dependencies on these kernels.
> >> >
> >> > The actual test we ran was mdtest.
> >> >
> >> > By the way, we had 15+% performance improvement in creates from the
> >> > change that was discussed earlier in this thread:
> >> >
> >> > EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
> >> > + if (test_bit(DQ_MOD_B, &dquot->dq_flags))
> >> > + return 0;
> >>
> >> I don't think this is right, as far as i understand, journal quota need go
> >> together with quota space change update inside same transaction, this will
> >> break consistency if power off or RO happen.
> >>
> >> Here is some ideas that i have thought:
> >>
> >> 1) switch dqio_mutex to a read/write lock, especially, i think most of
> >> time journal quota updates is in-place update, that means we don't need
> >> change quota tree in memory, firstly try read lock, retry with write lock if
> >> there is real tree change.
> >>
> >> 2)another is similar idea of Andrew's walkaround, but we need make correct
> >> fix, maintain dirty list for per transaction, and gurantee quota updates are
> >> flushed when commit transaction, this might be complex, i am not very
> >> familiar with JBD2 codes.
> >>
> >> It will be really nice if we could fix this regression, as we see 20% performace
> >> regression.
> >
> > So I have couple of patches:
> >
> > 1) I convert dqio_mutex do rw semaphore and use it in exclusive mode only
> > when quota tree is going to change. We also use dq_lock to serialize writes
> > of dquot - you cannot have two writes happening in parallel as that could
> > result in stale data being on disk. This patch brings benefit when there
> > are multiple users - now they don't contend on common lock. It shows
> > advantage in my testing so I plan to merge these patches. When the
> > contention is on a structure for single user this change however doesn't
> > bring much (the performance change is in statistical noise in my testing).
> >
> > 2) I have patches to remove some contention on dq_list_lock by not using
> > dirty list for tracking dquots in ext4 (and thus avoid dq_list_lock
> > completely in quota modification path). This does not bring measurable
> > benefit in my testing even on ramdisk but lockstat data for dq_list_lock
> > looks much better after this - it seems lock contention just shifted to
> > dq_data_lock - I'll try to address that as well and see whether I'll be
> > able to measure some advantage.
> >
> > 3) I have patches to convert dquot dirty bit to sequence counter so that
> > in commit_dqblk() we can check whether dquot state we wanted to write is
> > already on disk. Note that this is different from Andrew's approach in that
> > we do wait for dquot to be actually written before returning. We just don't
> > repeat the write unnecessarily. However this didn't bring any measurable
> > benefit in my testing so unless I'll be able to confirm it benefits some
> > workloads I won't merge this change.
> >
> > If you can experiment with your workloads, I can send you patches. I'd be
> > keen on having some performance data from real setups...
> >
> > Honza
> >
> >>
> >> Thanks,
> >> Shilong
> >>
> >> > dquot_mark_dquot_dirty(dquot);
> >> > return ext4_write_dquot(dquot);
> >> >
> >> > The idea was that if we know that some thread is somewhere between
> >> > mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
> >> > since that thread will update the ondisk dquot for us.
> >> >
> > --
> > Jan Kara <[email protected]>
> > SUSE Labs, CR
--
Jan Kara <[email protected]>
SUSE Labs, CR

2017-08-14 03:28:37

by Wang Shilong

Subject: Re: quota: dqio_mutex design

Sorry, the format did not look fine; please use the attachment.

On Mon, Aug 14, 2017 at 11:24 AM, Wang Shilong <[email protected]> wrote:
> Hello Jan,
>
> We have tested your patches, in generally, it helped in our case. Noticed,
> our test case is only one user with many process create/remove file.
>
>
> 4.13.0-rc3 without any patches
> no Quota -O quota' -O quota, project'
> File Creation File Unlink File Creation File Unlink File Creation File Unlink
> 0 93,068 296,028 86,860 285,131 85,199 189,653
> 1 79,501 280,921 91,079 277,349 186,279 170,982
> 2 79,932 299,750 90,246 274,457 133,922 191,677
> 3 80,146 297,525 86,416 272,160 192,354 198,869
>
> 4.13.0-rc3/w Jan Kara patch
> no Quota -O quota' -O quota, project'
> File Creation File Unlink File Creation File Unlink File Creation File Unlink
> 0 73,057 311,217 74,898 286,120 81,217 288,138 ops/per second
> 1 78,872 312,471 76,470 277,033 77,014 288,057
> 2 79,170 291,440 76,174 283,525 73,686 283,526
> 3 79,941 309,168 78,493 277,331 78,751 281,377
>
> 4.13.0-rc3/with https://patchwork.ozlabs.org/patch/799014/
> no Quota -O quota' -O quota, project'
> File Creation File Unlink File Creation File Unlink File Creation File Unlink
> 0 100,319 322,746 87,480 302,579 84,569 218,969
> 1 728,424 299,808 312,766 293,471 219,198 199,389
> 2 729,410 300,930 315,590 289,664 218,283 197,871
> 3 727,555 298,797 316,837 289,108 213,095 213,458
>
> 4.13.0-rc3/w https://patchwork.ozlabs.org/patch/799014/ + Jan Kara patch
> no Quota -O quota' -O quota, project'
> File Creation File Unlink File Creation File Unlink File Creation File Unlink
> 0 100,312 324,871 87,076 267,303 86,258 288,137
> 1 707,524 298,892 361,963 252,493 421,919 282,492
> 2 707,792 298,162 363,450 264,923 397,723 283,675
> 3 707,420 302,552 354,013 266,638 421,537 281,763
>
>
> In conclusion, your patches helped a lot for our testing, noticed, please ignored test0 running
> for creation, the first time testing will loaded inode cache in memory, we used test1-3 to compare.
>
> With extra patch applied, your patches improved File creation(quota+project) 2X, File unlink
> 1.5X.
>
> Thanks,
> Shilong
>
> ________________________________________
> From: Jan Kara [[email protected]]
> Sent: Wednesday, August 09, 2017 0:06
> To: Wang Shilong
> Cc: Jan Kara; Andrew Perepechko; Shuichi Ihara; Wang Shilong; Li Xi; Ext4 Developers List; [email protected]
> Subject: Re: quota: dqio_mutex design
>
> Hi,
>
> On Thu 03-08-17 22:39:51, Wang Shilong wrote:
>> Please send me patches, we could test and response you!
>
> So I finally have something which isn't obviously wrong (it survives basic
> testing and gives me improvements for some workloads). I have pushed out
> the patches to:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git quota_scaling
>
> I'd be happy if you can share your results with my patches. I have not yet
> figured out a safe way to reduce the contention on dq_lock during update of
> on-disk structure when lot of processes bang single dquot. I have
> experimental patch but it didn't bring any benefit in my testing - I'll
> rebase it on top of other patches I have send it to you for some testing.
>
> Honza
>
>> On Thu, Aug 3, 2017 at 10:36 PM, Jan Kara <[email protected]> wrote:
>> > Hello!
>> >
>> > On Thu 03-08-17 19:31:04, Wang Shilong wrote:
>> >> We DDN is investigating the same issue!
>> >>
>> >> Some comments comes:
>> >>
>> >> On Thu, Aug 3, 2017 at 1:52 AM, Andrew Perepechko <[email protected]> wrote:
>> >> >> On Tue 01-08-17 15:02:42, Jan Kara wrote:
>> >> >> > Hi Andrew,
>> >> >> >
>> >> >> I've been experimenting with this today but this idea didn't bring any
>> >> >> benefit in my testing. Was your setup with multiple users or a single user?
>> >> >> Could you give some testing to my patches to see whether they bring some
>> >> >> benefit to you?
>> >> >>
>> >> >> Honza
>> >> >
>> >> > Hi Jan!
>> >> >
>> >> > My setup was with a single user. Unfortunately, it may take some time before
>> >> > I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
>> >> > we have a lot of dependencies on these kernels.
>> >> >
>> >> > The actual test we ran was mdtest.
>> >> >
>> >> > By the way, we had 15+% performance improvement in creates from the
>> >> > change that was discussed earlier in this thread:
>> >> >
>> >> > EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
>> >> > + if (test_bit(DQ_MOD_B, &dquot->dq_flags))
>> >> > + return 0;
>> >>
>> >> I don't think this is right, as far as i understand, journal quota need go
>> >> together with quota space change update inside same transaction, this will
>> >> break consistency if power off or RO happen.
>> >>
>> >> Here is some ideas that i have thought:
>> >>
>> >> 1) switch dqio_mutex to a read/write lock, especially, i think most of
>> >> time journal quota updates is in-place update, that means we don't need
>> >> change quota tree in memory, firstly try read lock, retry with write lock if
>> >> there is real tree change.
>> >>
>> >> 2)another is similar idea of Andrew's walkaround, but we need make correct
>> >> fix, maintain dirty list for per transaction, and gurantee quota updates are
>> >> flushed when commit transaction, this might be complex, i am not very
>> >> familiar with JBD2 codes.
>> >>
>> >> It will be really nice if we could fix this regression, as we see 20% performace
>> >> regression.
>> >
>> > So I have couple of patches:
>> >
>> > 1) I convert dqio_mutex do rw semaphore and use it in exclusive mode only
>> > when quota tree is going to change. We also use dq_lock to serialize writes
>> > of dquot - you cannot have two writes happening in parallel as that could
>> > result in stale data being on disk. This patch brings benefit when there
>> > are multiple users - now they don't contend on common lock. It shows
>> > advantage in my testing so I plan to merge these patches. When the
>> > contention is on a structure for single user this change however doesn't
>> > bring much (the performance change is in statistical noise in my testing).
>> >
>> > 2) I have patches to remove some contention on dq_list_lock by not using
>> > dirty list for tracking dquots in ext4 (and thus avoid dq_list_lock
>> > completely in quota modification path). This does not bring measurable
>> > benefit in my testing even on ramdisk but lockstat data for dq_list_lock
>> > looks much better after this - it seems lock contention just shifted to
>> > dq_data_lock - I'll try to address that as well and see whether I'll be
>> > able to measure some advantage.
>> >
>> > 3) I have patches to convert dquot dirty bit to sequence counter so that
>> > in commit_dqblk() we can check whether dquot state we wanted to write is
>> > already on disk. Note that this is different from Andrew's approach in that
>> > we do wait for dquot to be actually written before returning. We just don't
>> > repeat the write unnecessarily. However this didn't bring any measurable
>> > benefit in my testing so unless I'll be able to confirm it benefits some
>> > workloads I won't merge this change.
>> >
>> > If you can experiment with your workloads, I can send you patches. I'd be
>> > keen on having some performance data from real setups...
>> >
>> > Honza
>> >
>> >>
>> >> Thanks,
>> >> Shilong
>> >>
>> >> > dquot_mark_dquot_dirty(dquot);
>> >> > return ext4_write_dquot(dquot);
>> >> >
>> >> > The idea was that if we know that some thread is somewhere between
>> >> > mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
>> >> > since that thread will update the ondisk dquot for us.
>> >> >
>> > --
>> > Jan Kara <[email protected]>
>> > SUSE Labs, CR
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR


Attachments:
mdtest-JK-patch.xlsx (27.30 kB)

2017-08-14 03:53:38

by Wang Shilong

Subject: Re: quota: dqio_mutex design

Txt format attached.

BTW, Jan, it would be cool if you could point out which patch helps the most for
our test case; since there are a lot of patches there, we want to port
some of them to RHEL7.

Thanks,
Shilong

On Mon, Aug 14, 2017 at 11:24 AM, Wang Shilong <[email protected]> wrote:
> Hello Jan,
>
> We have tested your patches, in generally, it helped in our case. Noticed,
> our test case is only one user with many process create/remove file.
>
>
> 4.13.0-rc3 without any patches
> no Quota -O quota' -O quota, project'
> File Creation File Unlink File Creation File Unlink File Creation File Unlink
> 0 93,068 296,028 86,860 285,131 85,199 189,653
> 1 79,501 280,921 91,079 277,349 186,279 170,982
> 2 79,932 299,750 90,246 274,457 133,922 191,677
> 3 80,146 297,525 86,416 272,160 192,354 198,869
>
> 4.13.0-rc3/w Jan Kara patch
> no Quota -O quota' -O quota, project'
> File Creation File Unlink File Creation File Unlink File Creation File Unlink
> 0 73,057 311,217 74,898 286,120 81,217 288,138 ops/per second
> 1 78,872 312,471 76,470 277,033 77,014 288,057
> 2 79,170 291,440 76,174 283,525 73,686 283,526
> 3 79,941 309,168 78,493 277,331 78,751 281,377
>
> 4.13.0-rc3/with https://patchwork.ozlabs.org/patch/799014/
> no Quota -O quota' -O quota, project'
> File Creation File Unlink File Creation File Unlink File Creation File Unlink
> 0 100,319 322,746 87,480 302,579 84,569 218,969
> 1 728,424 299,808 312,766 293,471 219,198 199,389
> 2 729,410 300,930 315,590 289,664 218,283 197,871
> 3 727,555 298,797 316,837 289,108 213,095 213,458
>
> 4.13.0-rc3/w https://patchwork.ozlabs.org/patch/799014/ + Jan Kara patch
> no Quota -O quota' -O quota, project'
> File Creation File Unlink File Creation File Unlink File Creation File Unlink
> 0 100,312 324,871 87,076 267,303 86,258 288,137
> 1 707,524 298,892 361,963 252,493 421,919 282,492
> 2 707,792 298,162 363,450 264,923 397,723 283,675
> 3 707,420 302,552 354,013 266,638 421,537 281,763
>
>
> In conclusion, your patches helped a lot for our testing, noticed, please ignored test0 running
> for creation, the first time testing will loaded inode cache in memory, we used test1-3 to compare.
>
> With extra patch applied, your patches improved File creation(quota+project) 2X, File unlink
> 1.5X.
>
> Thanks,
> Shilong
>
> ________________________________________
> From: Jan Kara [[email protected]]
> Sent: Wednesday, August 09, 2017 0:06
> To: Wang Shilong
> Cc: Jan Kara; Andrew Perepechko; Shuichi Ihara; Wang Shilong; Li Xi; Ext4 Developers List; [email protected]
> Subject: Re: quota: dqio_mutex design
>
> Hi,
>
> On Thu 03-08-17 22:39:51, Wang Shilong wrote:
>> Please send me patches, we could test and response you!
>
> So I finally have something which isn't obviously wrong (it survives basic
> testing and gives me improvements for some workloads). I have pushed out
> the patches to:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git quota_scaling
>
> I'd be happy if you can share your results with my patches. I have not yet
> figured out a safe way to reduce the contention on dq_lock during update of
> on-disk structure when lot of processes bang single dquot. I have
> experimental patch but it didn't bring any benefit in my testing - I'll
> rebase it on top of other patches I have send it to you for some testing.
>
> Honza
>
>> On Thu, Aug 3, 2017 at 10:36 PM, Jan Kara <[email protected]> wrote:
>> > Hello!
>> >
>> > On Thu 03-08-17 19:31:04, Wang Shilong wrote:
>> >> We DDN is investigating the same issue!
>> >>
>> >> Some comments comes:
>> >>
>> >> On Thu, Aug 3, 2017 at 1:52 AM, Andrew Perepechko <[email protected]> wrote:
>> >> >> On Tue 01-08-17 15:02:42, Jan Kara wrote:
>> >> >> > Hi Andrew,
>> >> >> >
>> >> >> I've been experimenting with this today but this idea didn't bring any
>> >> >> benefit in my testing. Was your setup with multiple users or a single user?
>> >> >> Could you give some testing to my patches to see whether they bring some
>> >> >> benefit to you?
>> >> >>
>> >> >> Honza
>> >> >
>> >> > Hi Jan!
>> >> >
>> >> > My setup was with a single user. Unfortunately, it may take some time before
>> >> > I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
>> >> > we have a lot of dependencies on these kernels.
>> >> >
>> >> > The actual test we ran was mdtest.
>> >> >
>> >> > By the way, we had 15+% performance improvement in creates from the
>> >> > change that was discussed earlier in this thread:
>> >> >
>> >> > EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
>> >> > + if (test_bit(DQ_MOD_B, &dquot->dq_flags))
>> >> > + return 0;
>> >>
>> >> I don't think this is right, as far as i understand, journal quota need go
>> >> together with quota space change update inside same transaction, this will
>> >> break consistency if power off or RO happen.
>> >>
>> >> Here is some ideas that i have thought:
>> >>
>> >> 1) switch dqio_mutex to a read/write lock, especially, i think most of
>> >> time journal quota updates is in-place update, that means we don't need
>> >> change quota tree in memory, firstly try read lock, retry with write lock if
>> >> there is real tree change.
>> >>
>> >> 2)another is similar idea of Andrew's walkaround, but we need make correct
>> >> fix, maintain dirty list for per transaction, and gurantee quota updates are
>> >> flushed when commit transaction, this might be complex, i am not very
>> >> familiar with JBD2 codes.
>> >>
>> >> It will be really nice if we could fix this regression, as we see 20% performace
>> >> regression.
>> >
>> > So I have couple of patches:
>> >
>> > 1) I convert dqio_mutex do rw semaphore and use it in exclusive mode only
>> > when quota tree is going to change. We also use dq_lock to serialize writes
>> > of dquot - you cannot have two writes happening in parallel as that could
>> > result in stale data being on disk. This patch brings benefit when there
>> > are multiple users - now they don't contend on common lock. It shows
>> > advantage in my testing so I plan to merge these patches. When the
>> > contention is on a structure for single user this change however doesn't
>> > bring much (the performance change is in statistical noise in my testing).
>> >
>> > 2) I have patches to remove some contention on dq_list_lock by not using
>> > dirty list for tracking dquots in ext4 (and thus avoid dq_list_lock
>> > completely in quota modification path). This does not bring measurable
>> > benefit in my testing even on ramdisk but lockstat data for dq_list_lock
>> > looks much better after this - it seems lock contention just shifted to
>> > dq_data_lock - I'll try to address that as well and see whether I'll be
>> > able to measure some advantage.
>> >
>> > 3) I have patches to convert dquot dirty bit to sequence counter so that
>> > in commit_dqblk() we can check whether dquot state we wanted to write is
>> > already on disk. Note that this is different from Andrew's approach in that
>> > we do wait for dquot to be actually written before returning. We just don't
>> > repeat the write unnecessarily. However this didn't bring any measurable
>> > benefit in my testing so unless I'll be able to confirm it benefits some
>> > workloads I won't merge this change.
>> >
>> > If you can experiment with your workloads, I can send you patches. I'd be
>> > keen on having some performance data from real setups...
>> >
>> > Honza
>> >
>> >>
>> >> Thanks,
>> >> Shilong
>> >>
>> >> > dquot_mark_dquot_dirty(dquot);
>> >> > return ext4_write_dquot(dquot);
>> >> >
>> >> > The idea was that if we know that some thread is somewhere between
>> >> > mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
>> >> > since that thread will update the ondisk dquot for us.
>> >> >
>> > --
>> > Jan Kara <[email protected]>
>> > SUSE Labs, CR
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR


Attachments:
quota-scaling-results.txt (1.73 kB)

2017-08-14 08:22:57

by Jan Kara

Subject: Re: quota: dqio_mutex design

Hello,

On Mon 14-08-17 11:53:37, Wang Shilong wrote:
> Txt format attched.
>
> BTW, Jan, it will be cool if you could point which patch help a lot for
> our test case, since there are a lot of patches there, we want to port
> some of patches to RHEL7.

Thanks for the test results! They are really interesting. Do you have any
explanation why without any patches the '-O quota,project' runs for 'File
Creation' are faster than runs without quota or any other runs in the test?

WRT which patches helped, I don't have a good subset for you. In my testing
each patch helped a bit. I expect that in your setup the conversion of dqio_sem
to an rwsem and then the use of dq_lock might not have that big an impact. So you
might try backporting patches from "quota: Fix possible corruption of
dqi_flags" onward.

Honza

> On Mon, Aug 14, 2017 at 11:24 AM, Wang Shilong <[email protected]> wrote:
> > Hello Jan,
> >
> > We have tested your patches, in generally, it helped in our case. Noticed,
> > our test case is only one user with many process create/remove file.
> >
> >
> > 4.13.0-rc3 without any patches
> > no Quota -O quota' -O quota, project'
> > File Creation File Unlink File Creation File Unlink File Creation File Unlink
> > 0 93,068 296,028 86,860 285,131 85,199 189,653
> > 1 79,501 280,921 91,079 277,349 186,279 170,982
> > 2 79,932 299,750 90,246 274,457 133,922 191,677
> > 3 80,146 297,525 86,416 272,160 192,354 198,869
> >
> > 4.13.0-rc3/w Jan Kara patch
> > no Quota -O quota' -O quota, project'
> > File Creation File Unlink File Creation File Unlink File Creation File Unlink
> > 0 73,057 311,217 74,898 286,120 81,217 288,138 ops/per second
> > 1 78,872 312,471 76,470 277,033 77,014 288,057
> > 2 79,170 291,440 76,174 283,525 73,686 283,526
> > 3 79,941 309,168 78,493 277,331 78,751 281,377
> >
> > 4.13.0-rc3/with https://patchwork.ozlabs.org/patch/799014/
> > no Quota -O quota' -O quota, project'
> > File Creation File Unlink File Creation File Unlink File Creation File Unlink
> > 0 100,319 322,746 87,480 302,579 84,569 218,969
> > 1 728,424 299,808 312,766 293,471 219,198 199,389
> > 2 729,410 300,930 315,590 289,664 218,283 197,871
> > 3 727,555 298,797 316,837 289,108 213,095 213,458
> >
> > 4.13.0-rc3/w https://patchwork.ozlabs.org/patch/799014/ + Jan Kara patch
> > no Quota -O quota' -O quota, project'
> > File Creation File Unlink File Creation File Unlink File Creation File Unlink
> > 0 100,312 324,871 87,076 267,303 86,258 288,137
> > 1 707,524 298,892 361,963 252,493 421,919 282,492
> > 2 707,792 298,162 363,450 264,923 397,723 283,675
> > 3 707,420 302,552 354,013 266,638 421,537 281,763
> >
> >
> > In conclusion, your patches helped a lot for our testing, noticed, please ignored test0 running
> > for creation, the first time testing will loaded inode cache in memory, we used test1-3 to compare.
> >
> > With extra patch applied, your patches improved File creation(quota+project) 2X, File unlink
> > 1.5X.
> >
> > Thanks,
> > Shilong
> >
> > ________________________________________
> > From: Jan Kara [[email protected]]
> > Sent: Wednesday, August 09, 2017 0:06
> > To: Wang Shilong
> > Cc: Jan Kara; Andrew Perepechko; Shuichi Ihara; Wang Shilong; Li Xi; Ext4 Developers List; [email protected]
> > Subject: Re: quota: dqio_mutex design
> >
> > Hi,
> >
> > On Thu 03-08-17 22:39:51, Wang Shilong wrote:
> >> Please send me patches, we could test and response you!
> >
> > So I finally have something which isn't obviously wrong (it survives basic
> > testing and gives me improvements for some workloads). I have pushed out
> > the patches to:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git quota_scaling
> >
> > I'd be happy if you can share your results with my patches. I have not yet
> > figured out a safe way to reduce the contention on dq_lock during update of
> > on-disk structure when lot of processes bang single dquot. I have
> > experimental patch but it didn't bring any benefit in my testing - I'll
> > rebase it on top of other patches I have send it to you for some testing.
> >
> > Honza
> >
> >> On Thu, Aug 3, 2017 at 10:36 PM, Jan Kara <[email protected]> wrote:
> >> > Hello!
> >> >
> >> > On Thu 03-08-17 19:31:04, Wang Shilong wrote:
> >> >> We DDN is investigating the same issue!
> >> >>
> >> >> Some comments comes:
> >> >>
> >> >> On Thu, Aug 3, 2017 at 1:52 AM, Andrew Perepechko <[email protected]> wrote:
> >> >> >> On Tue 01-08-17 15:02:42, Jan Kara wrote:
> >> >> >> > Hi Andrew,
> >> >> >> >
> >> >> >> I've been experimenting with this today but this idea didn't bring any
> >> >> >> benefit in my testing. Was your setup with multiple users or a single user?
> >> >> >> Could you give some testing to my patches to see whether they bring some
> >> >> >> benefit to you?
> >> >> >>
> >> >> >> Honza
> >> >> >
> >> >> > Hi Jan!
> >> >> >
> >> >> > My setup was with a single user. Unfortunately, it may take some time before
> >> >> > I can try a patched kernel other than RHEL6 or RHEL7 with the same test,
> >> >> > we have a lot of dependencies on these kernels.
> >> >> >
> >> >> > The actual test we ran was mdtest.
> >> >> >
> >> >> > By the way, we had 15+% performance improvement in creates from the
> >> >> > change that was discussed earlier in this thread:
> >> >> >
> >> >> > EXT4_SB(dquot->dq_sb)->s_qf_names[GRPQUOTA]) {
> >> >> > + if (test_bit(DQ_MOD_B, &dquot->dq_flags))
> >> >> > + return 0;
> >> >>
> >> >> I don't think this is right, as far as i understand, journal quota need go
> >> >> together with quota space change update inside same transaction, this will
> >> >> break consistency if power off or RO happen.
> >> >>
> >> >> Here is some ideas that i have thought:
> >> >>
> >> >> 1) switch dqio_mutex to a read/write lock, especially, i think most of
> >> >> time journal quota updates is in-place update, that means we don't need
> >> >> change quota tree in memory, firstly try read lock, retry with write lock if
> >> >> there is real tree change.
> >> >>
> >> >> 2)another is similar idea of Andrew's walkaround, but we need make correct
> >> >> fix, maintain dirty list for per transaction, and gurantee quota updates are
> >> >> flushed when commit transaction, this might be complex, i am not very
> >> >> familiar with JBD2 codes.
> >> >>
> >> >> It will be really nice if we could fix this regression, as we see 20% performace
> >> >> regression.
> >> >
> >> > So I have couple of patches:
> >> >
> >> > 1) I convert dqio_mutex do rw semaphore and use it in exclusive mode only
> >> > when quota tree is going to change. We also use dq_lock to serialize writes
> >> > of dquot - you cannot have two writes happening in parallel as that could
> >> > result in stale data being on disk. This patch brings benefit when there
> >> > are multiple users - now they don't contend on common lock. It shows
> >> > advantage in my testing so I plan to merge these patches. When the
> >> > contention is on a structure for single user this change however doesn't
> >> > bring much (the performance change is in statistical noise in my testing).
> >> >
> >> > 2) I have patches to remove some contention on dq_list_lock by not using
> >> > dirty list for tracking dquots in ext4 (and thus avoid dq_list_lock
> >> > completely in quota modification path). This does not bring measurable
> >> > benefit in my testing even on ramdisk but lockstat data for dq_list_lock
> >> > looks much better after this - it seems lock contention just shifted to
> >> > dq_data_lock - I'll try to address that as well and see whether I'll be
> >> > able to measure some advantage.
> >> >
> >> > 3) I have patches to convert dquot dirty bit to sequence counter so that
> >> > in commit_dqblk() we can check whether dquot state we wanted to write is
> >> > already on disk. Note that this is different from Andrew's approach in that
> >> > we do wait for dquot to be actually written before returning. We just don't
> >> > repeat the write unnecessarily. However this didn't bring any measurable
> >> > benefit in my testing so unless I'll be able to confirm it benefits some
> >> > workloads I won't merge this change.
> >> >
> >> > If you can experiment with your workloads, I can send you patches. I'd be
> >> > keen on having some performance data from real setups...
> >> >
> >> > Honza
> >> >
> >> >>
> >> >> Thanks,
> >> >> Shilong
> >> >>
> >> >> > dquot_mark_dquot_dirty(dquot);
> >> >> > return ext4_write_dquot(dquot);
> >> >> >
> >> >> > The idea was that if we know that some thread is somewhere between
> >> >> > mark_dirty and clear_dirty, then we can avoid blocking on dqio_mutex,
> >> >> > since that thread will update the ondisk dquot for us.
> >> >> >
> >> > --
> >> > Jan Kara <[email protected]>
> >> > SUSE Labs, CR
> > --
> > Jan Kara <[email protected]>
> > SUSE Labs, CR

> 4.13.0-rc3 without any patches
> no Quota -O quota -O quota,project
> creation unlink creation unlink creation unlink
> 0 93,068 296,028 86,860 285,131 85,199 189,653 ops/per second
> 1 79,501 280,921 91,079 277,349 186,279 170,982
> 2 79,932 299,750 90,246 274,457 133,922 191,677
> 3 80,146 297,525 86,416 272,160 192,354 198,869
>
> Jan Kara branch (quota_scaling)
> no Quota -O quota -O quota,project
> creation unlink creation unlink creation unlink
> 0 73,057 311,217 74,898 286,120 81,217 288,138
> 1 78,872 312,471 76,470 277,033 77,014 288,057
> 2 79,170 291,440 76,174 283,525 73,686 283,526
> 3 79,941 309,168 78,493 277,331 78,751 281,377
>
> 4.13.0-rc3 with v5 patch https://patchwork.ozlabs.org/patch/799014/
> no Quota -O quota -O quota,project
> creation unlink creation unlink creation unlink
> 0 100,319 322,746 87,480 302,579 84,569 218,969
> 1 728,424 299,808 312,766 293,471 219,198 199,389
> 2 729,410 300,930 315,590 289,664 218,283 197,871
> 3 727,555 298,797 316,837 289,108 213,095 213,458
>
> Jan Kara branch (quota_scaling) with v5 patch https://patchwork.ozlabs.org/patch/799014/
> no Quota -O quota -O quota,project
> creation unlink creation unlink creation unlink
> 0 100,312 324,871 87,076 267,303 86,258 288,137
> 1 707,524 298,892 361,963 252,493 421,919 282,492
> 2 707,792 298,162 363,450 264,923 397,723 283,675
> 3 707,420 302,552 354,013 266,638 421,537 281,763

--
Jan Kara <[email protected]>
SUSE Labs, CR