2010-07-07 15:22:57

by Corrado Zoccolo

Subject: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

Hi Jens,
patch 8e55063 "cfq-iosched: fix corner cases in idling logic" is
suspected of causing some regressions on high-end hardware.
The two patches from this series:
- [PATCH 1/2] cfq-iosched: fix tree-wide handling of rq_noidle
- [PATCH 2/2] cfq-iosched: RQ_NOIDLE enabled for SYNC_WORKLOAD
fix two issues that I have identified, related to how RQ_NOIDLE is
used by the upper layers.
The first patch makes sure that an RQ_NOIDLE request coming after a
sequence of possibly idling requests from the same queue on the no-idle
tree will clear the noidle_tree_requires_idle flag.
The second patch enables RQ_NOIDLE for queues in the idling tree,
restoring the pre-8e55063 behaviour.
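For reference, both patches touch the idling decision taken at request
completion; from memory (names are as in cfq-iosched.c, but this is an
approximate sketch, not the literal code), the post-8e55063 hunk looks
roughly like this:

	/* cfq_completed_request(): idling decision when the last request
	 * of a sync queue completes -- approximate sketch, not the exact
	 * source */
	else if (sync && cfqq_empty && !cfq_close_cooperator(cfqd, cfqq)) {
		/*
		 * Today the flag is only ever OR-ed in, so one
		 * idling-worthy request keeps the whole no-idle tree
		 * idling; patch 1/2 lets a later RQ_NOIDLE request clear
		 * it again.
		 */
		cfqd->noidle_tree_requires_idle |= !rq_noidle(rq);
		/*
		 * Idling is armed unconditionally for SYNC_WORKLOAD
		 * queues; patch 2/2 makes RQ_NOIDLE meaningful there too,
		 * as it was before 8e55063.
		 */
		if (cfqd->serving_type == SYNC_WORKLOAD
		    || cfqd->noidle_tree_requires_idle)
			cfq_arm_slice_timer(cfqd);
	}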

Another option to consider is a partial revert of 8e55063, if the
corner cases we are trying to handle are not frequent enough to
justify the added complexity.

Thanks,
Corrado


2010-07-07 15:56:09

by Corrado Zoccolo

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

Fixed Jens' mail address, and resending the patches based on the for-2.6.36 tree.
On Wed, Jul 7, 2010 at 5:22 PM, Corrado Zoccolo <[email protected]> wrote:
> Hi Jens,
> patch 8e55063 "cfq-iosched: fix corner cases in idling logic", is
> suspected for some regressions on high end hardware.
> The two patches from this series:
> - [PATCH 1/2] cfq-iosched: fix tree-wide handling of rq_noidle
> - [PATCH 2/2] cfq-iosched: RQ_NOIDLE enabled for SYNC_WORKLOAD
> fix two issues that I have identified, related to how RQ_NOIDLE is
> used by the upper layers.
> First patch makes sure that a RQ_NOIDLE coming after a sequence of
> possibly idling requests from the same queue on the no-idle tree will
> clear the noidle_tree_requires_idle flag.
> Second patch enables RQ_NOIDLE for queues in the idling tree,
> restoring the behaviour pre-8e55063 patch.
>
> An other option to consider is the partial revert of 8e55063, if the
> corner cases we are trying to handle are not frequent enough to
> justify this added complexity.
>
> Thanks,
> Corrado
>



--
__________________________________________________________________________

dott. Corrado Zoccolo mailto:[email protected]
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
Tales of Power - C. Castaneda

2010-07-07 17:03:18

by Jeff Moyer

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

Corrado Zoccolo <[email protected]> writes:

> Hi Jens,
> patch 8e55063 "cfq-iosched: fix corner cases in idling logic", is
> suspected for some regressions on high end hardware.
> The two patches from this series:
> - [PATCH 1/2] cfq-iosched: fix tree-wide handling of rq_noidle
> - [PATCH 2/2] cfq-iosched: RQ_NOIDLE enabled for SYNC_WORKLOAD
> fix two issues that I have identified, related to how RQ_NOIDLE is
> used by the upper layers.
> First patch makes sure that a RQ_NOIDLE coming after a sequence of
> possibly idling requests from the same queue on the no-idle tree will
> clear the noidle_tree_requires_idle flag.
> Second patch enables RQ_NOIDLE for queues in the idling tree,
> restoring the behaviour pre-8e55063 patch.

Hi, Corrado,

I ran your kernel through my tests. Here are the results, up against
vanilla, deadline, and the blk_yield patch set:

                 just    just
                fs_mark  fio        mixed
-------------------------------+--------------
deadline        529.44   151.4 | 450.0    78.2
vanilla cfq     107.88   164.4 |   6.6   137.2
blk_yield cfq   530.82   158.7 | 113.2    78.6
corrado cfq      80.82   138.1 |   4.5   130.7

fs_mark results are in files/second, fio results are in MB/s. All
results are the average of 5 runs. In order to get results for the
mixed workload for both vanilla and Corrado's kernels, I had to extend
the runtime from 30s to 300s.

So, the changes proposed in this thread actually make performance worse
across the board.

I re-ran my tests against a RHEL 5 kernel (which is based on 2.6.18),
and fs_mark performance there is much better than with stock CFQ in
2.6.35-rc3, while the mixed-workload results are much the same as they
are now (which is to say, the fs_mark process is completely starved by
the sequential reader). So, that problem has existed for a long time.

I'm still in the process of collecting data from production servers and
will report back with my findings there.

Cheers,
Jeff

2010-07-07 17:39:53

by Corrado Zoccolo

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

On Wed, Jul 7, 2010 at 7:03 PM, Jeff Moyer <[email protected]> wrote:
> Corrado Zoccolo <[email protected]> writes:
>
>> Hi Jens,
>> patch 8e55063 "cfq-iosched: fix corner cases in idling logic", is
>> suspected for some regressions on high end hardware.
>> The two patches from this series:
>> - [PATCH 1/2] cfq-iosched: fix tree-wide handling of rq_noidle
>> - [PATCH 2/2] cfq-iosched: RQ_NOIDLE enabled for SYNC_WORKLOAD
>> fix two issues that I have identified, related to how RQ_NOIDLE is
>> used by the upper layers.
>> First patch makes sure that a RQ_NOIDLE coming after a sequence of
>> possibly idling requests from the same queue on the no-idle tree will
>> clear the noidle_tree_requires_idle flag.
>> Second patch enables RQ_NOIDLE for queues in the idling tree,
>> restoring the behaviour pre-8e55063 patch.
>
> Hi, Corrado,
>
> I ran your kernel through my tests.  Here are the results, up against
> vanilla, deadline, and the blk_yield patch set:
>
Hi Jeff,
can you also add cfq with 8e55063 reverted to the testing mix?

>                 just    just
>                fs_mark  fio        mixed
> -------------------------------+--------------
> deadline        529.44   151.4 | 450.0    78.2
> vanilla cfq     107.88   164.4 |   6.6   137.2
> blk_yield cfq   530.82   158.7 | 113.2    78.6
> corrado cfq      80.82   138.1 |   4.5   130.7

So it doesn't seem to help. I wonder if other parts of that commit are
affecting those workloads.

>
> fs_mark results are in files/second, fio results are in MB/s.  All
> results are the average of 5 runs.  In order to get results for the
> mixed workload for both vanilla and Corrado's kernels, I had to extend
> the runtime from 30s to 300s.
>
> So, the changes proposed in this thread actually make performance worse
> across the board.
>
> I re-ran my tests against a RHEL 5 kernel (which is based on 2.6.18),
> and it shows that fs_mark performance is much better than stock CFQ in
> 2.6.35-rc3, and the mixed workload results are much the same as they are
> now (which is to say, the fs_mark process is completely starved by the
> sequential reader).  So, that problem has existed for a long time.
>
> I'm still in the process of collecting data from production servers and
> will report back with my findings there.

Thanks,
Corrado

>
> Cheers,
> Jeff
>



--
__________________________________________________________________________

dott. Corrado Zoccolo mailto:[email protected]
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
Tales of Power - C. Castaneda

2010-07-07 17:50:18

by Vivek Goyal

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

On Wed, Jul 07, 2010 at 01:03:08PM -0400, Jeff Moyer wrote:
> Corrado Zoccolo <[email protected]> writes:
>
> > Hi Jens,
> > patch 8e55063 "cfq-iosched: fix corner cases in idling logic", is
> > suspected for some regressions on high end hardware.
> > The two patches from this series:
> > - [PATCH 1/2] cfq-iosched: fix tree-wide handling of rq_noidle
> > - [PATCH 2/2] cfq-iosched: RQ_NOIDLE enabled for SYNC_WORKLOAD
> > fix two issues that I have identified, related to how RQ_NOIDLE is
> > used by the upper layers.
> > First patch makes sure that a RQ_NOIDLE coming after a sequence of
> > possibly idling requests from the same queue on the no-idle tree will
> > clear the noidle_tree_requires_idle flag.
> > Second patch enables RQ_NOIDLE for queues in the idling tree,
> > restoring the behaviour pre-8e55063 patch.
>
> Hi, Corrado,
>
> I ran your kernel through my tests. Here are the results, up against
> vanilla, deadline, and the blk_yield patch set:
>
>                  just    just
>                 fs_mark  fio        mixed
> -------------------------------+--------------
> deadline        529.44   151.4 | 450.0    78.2
> vanilla cfq     107.88   164.4 |   6.6   137.2
> blk_yield cfq   530.82   158.7 | 113.2    78.6
> corrado cfq      80.82   138.1 |   4.5   130.7
>
> fs_mark results are in files/second, fio results are in MB/s. All
> results are the average of 5 runs. In order to get results for the
> mixed workload for both vanilla and Corrado's kernels, I had to extend
> the runtime from 30s to 300s.
>
> So, the changes proposed in this thread actually make performance worse
> across the board.

This is really surprising. It should have at least helped in the
just-fs_mark case.

I think what is happening is that we are idling on the fsync queue
(because it is the last queue in the group). After some time the jbd
thread will submit some IO and we will not preempt the fsync thread.
That's why I had also implemented the logic of allowing preemption in
the case of group idle, and that had helped.
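Roughly, the kind of check I mean in cfq_should_preempt() (an
illustrative sketch only, not the patch I had posted):

	/* sketch: cfqq is cfqd->active_queue.  If it is only being kept
	 * alive because it is the last queue in its group (wait_busy)
	 * and has no requests left, let an incoming sync request from
	 * another queue preempt it instead of burning the idle window.
	 * The exact condition in my earlier series may differ. */
	if (cfq_cfqq_wait_busy(cfqq) && cfq_cfqq_sync(new_cfqq) &&
	    RB_EMPTY_ROOT(&cfqq->sort_list))
		return true;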

>
> I re-ran my tests against a RHEL 5 kernel (which is based on 2.6.18),
> and it shows that fs_mark performance is much better than stock CFQ in
> 2.6.35-rc3, and the mixed workload results are much the same as they are
> now (which is to say, the fs_mark process is completely starved by the
> sequential reader). So, that problem has existed for a long time.

If we just stop idling on WRITE_SYNC, we should be back to almost
2.6.18 CFQ behavior.

>
> I'm still in the process of collecting data from production servers and
> will report back with my findings there.

That would be great.

Vivek

2010-07-07 20:06:12

by Jeff Moyer

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

Corrado Zoccolo <[email protected]> writes:

> On Wed, Jul 7, 2010 at 7:03 PM, Jeff Moyer <[email protected]> wrote:
>> Corrado Zoccolo <[email protected]> writes:
>>
>>> Hi Jens,
>>> patch 8e55063 "cfq-iosched: fix corner cases in idling logic", is
>>> suspected for some regressions on high end hardware.
>>> The two patches from this series:
>>> - [PATCH 1/2] cfq-iosched: fix tree-wide handling of rq_noidle
>>> - [PATCH 2/2] cfq-iosched: RQ_NOIDLE enabled for SYNC_WORKLOAD
>>> fix two issues that I have identified, related to how RQ_NOIDLE is
>>> used by the upper layers.
>>> First patch makes sure that a RQ_NOIDLE coming after a sequence of
>>> possibly idling requests from the same queue on the no-idle tree will
>>> clear the noidle_tree_requires_idle flag.
>>> Second patch enables RQ_NOIDLE for queues in the idling tree,
>>> restoring the behaviour pre-8e55063 patch.
>>
>> Hi, Corrado,
>>
>> I ran your kernel through my tests.  Here are the results, up against
>> vanilla, deadline, and the blk_yield patch set:
>>
> Hi Jeff,
> can you also add cfq with 8e55063 reverted to the testing mix?

Sure, the results now look like this:

                 just    just
                fs_mark  fio        mixed
-------------------------------+--------------
deadline        529.44   151.4 | 450.0    78.2
vanilla cfq     107.88   164.4 |   6.6   137.2
blk_yield cfq   530.82   158.7 | 113.2    78.6
corrado cfq     110.16   220.6 |   7.0   159.8
8e55063 revert  559.66   198.9 |  16.1   153.3

I had accidentally run your patch set (corrado cfq) on ext3, so the
numbers were a bit off (everything else was run against ext4). The
corrected numbers above reflect the performance on ext4, which is much
better for the sequential reader, but still not great for the fs_mark
run. Reverting 8e55063 definitely gets us into better shape. However,
if we care about the mixed workload, then it won't be enough.

It's worth noting that I can't explain the jump from 151 MB/s for
deadline to 220 MB/s for corrado cfq. I'm not sure how results for a
single-queue-depth sequential read can vary that much at all. Those are
the averages of 5 runs, and this storage should be accessible only by
me, so I am at a loss.

Cheers,
Jeff

2010-07-08 14:35:35

by Vivek Goyal

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

On Wed, Jul 07, 2010 at 01:03:08PM -0400, Jeff Moyer wrote:
> Corrado Zoccolo <[email protected]> writes:
>
> > Hi Jens,
> > patch 8e55063 "cfq-iosched: fix corner cases in idling logic", is
> > suspected for some regressions on high end hardware.
> > The two patches from this series:
> > - [PATCH 1/2] cfq-iosched: fix tree-wide handling of rq_noidle
> > - [PATCH 2/2] cfq-iosched: RQ_NOIDLE enabled for SYNC_WORKLOAD
> > fix two issues that I have identified, related to how RQ_NOIDLE is
> > used by the upper layers.
> > First patch makes sure that a RQ_NOIDLE coming after a sequence of
> > possibly idling requests from the same queue on the no-idle tree will
> > clear the noidle_tree_requires_idle flag.
> > Second patch enables RQ_NOIDLE for queues in the idling tree,
> > restoring the behaviour pre-8e55063 patch.
>
> Hi, Corrado,
>
> I ran your kernel through my tests. Here are the results, up against
> vanilla, deadline, and the blk_yield patch set:
>
>                  just    just
>                 fs_mark  fio        mixed
> -------------------------------+--------------
> deadline        529.44   151.4 | 450.0    78.2
> vanilla cfq     107.88   164.4 |   6.6   137.2
> blk_yield cfq   530.82   158.7 | 113.2    78.6
> corrado cfq      80.82   138.1 |   4.5   130.7
>
> fs_mark results are in files/second, fio results are in MB/s. All
> results are the average of 5 runs. In order to get results for the
> mixed workload for both vanilla and Corrado's kernels, I had to extend
> the runtime from 30s to 300s.
>
> So, the changes proposed in this thread actually make performance worse
> across the board.
>
> I re-ran my tests against a RHEL 5 kernel (which is based on 2.6.18),
> and it shows that fs_mark performance is much better than stock CFQ in
> 2.6.35-rc3, and the mixed workload results are much the same as they are
> now (which is to say, the fs_mark process is completely starved by the
> sequential reader). So, that problem has existed for a long time.
>
> I'm still in the process of collecting data from production servers and
> will report back with my findings there.

Hi Jeff and all,

How about we simply get rid of idling on RQ_NOIDLE threads (as
Corrado's patch series does) and not try to solve the problem of fsync
being starved in the presence of sequential readers? It might just
be a theoretical problem that not many people are running into. That's
how CFQ has been behaving for a long, long time, and if nobody is
complaining then we probably don't have to fix it.

Thanks
Vivek

2010-07-08 14:38:25

by Corrado Zoccolo

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

On Wed, Jul 7, 2010 at 10:06 PM, Jeff Moyer <[email protected]> wrote:
> Corrado Zoccolo <[email protected]> writes:
>
>> On Wed, Jul 7, 2010 at 7:03 PM, Jeff Moyer <[email protected]> wrote:
>>> Corrado Zoccolo <[email protected]> writes:
>>>
>>>> Hi Jens,
>>>> patch 8e55063 "cfq-iosched: fix corner cases in idling logic", is
>>>> suspected for some regressions on high end hardware.
>>>> The two patches from this series:
>>>> - [PATCH 1/2] cfq-iosched: fix tree-wide handling of rq_noidle
>>>> - [PATCH 2/2] cfq-iosched: RQ_NOIDLE enabled for SYNC_WORKLOAD
>>>> fix two issues that I have identified, related to how RQ_NOIDLE is
>>>> used by the upper layers.
>>>> First patch makes sure that a RQ_NOIDLE coming after a sequence of
>>>> possibly idling requests from the same queue on the no-idle tree will
>>>> clear the noidle_tree_requires_idle flag.
>>>> Second patch enables RQ_NOIDLE for queues in the idling tree,
>>>> restoring the behaviour pre-8e55063 patch.
>>>
>>> Hi, Corrado,
>>>
>>> I ran your kernel through my tests.  Here are the results, up against
>>> vanilla, deadline, and the blk_yield patch set:
>>>
>> Hi Jeff,
>> can you also add cfq with 8e55063 reverted to the testing mix?
>
> Sure, the results now look like this:
>
>                 just    just
>                fs_mark  fio        mixed
> -------------------------------+--------------
> deadline        529.44   151.4 | 450.0    78.2
> vanilla cfq     107.88   164.4 |   6.6   137.2
> blk_yield cfq   530.82   158.7 | 113.2    78.6
> corrado cfq     110.16   220.6 |   7.0   159.8
> 8e55063 revert  559.66   198.9 |  16.1   153.3
>
> I had accidentally run your patch set (corrado cfq) on ext3, so the
> numbers were a bit off (everything else was run against ext4).  The
> corrected numbers above reflect the performance on ext4, which is much
> better for the sequential reader, but still not great for the fs_mark
> run.  Reverting 8e55063 definitely gets us into better shape.  However,
> if we care about the mixed workload, then it won't be enough.

I wonder why the 8e55063 revert gives such a big improvement on fsync ops.
Maybe, before 8e55063, we ended up not idling even when
cfq_arm_slice_timer was called, because other requests were still pending?
I think your patch that allows both async and sync requests to be in
flight at the same time could help here.

>
> It's worth noting that I can't explain that jump from 151MB/s for
> deadline vs 220MB/s for corrado cfq.  I'm not sure how you can vary
> driving a single queue depth sequential read at all.  Those are the
> averages of 5 runs and this storage should be solely accessible by me,
> so I am at a loss.

I guess ext4 tries to be smart, and issues some background reads of fs
data structures needed to keep reading the sequential file without
interruption.
Those reads will be far from the current head, so servicing them
immediately (as deadline would) can cause a degradation, while
delaying them until you are servicing other random requests (as cfq
would) can help.

>
> Cheers,
> Jeff
>



--
__________________________________________________________________________

dott. Corrado Zoccolo mailto:[email protected]
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
Tales of Power - C. Castaneda

2010-07-08 14:38:49

by Jeff Moyer

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

Vivek Goyal <[email protected]> writes:

> On Wed, Jul 07, 2010 at 01:03:08PM -0400, Jeff Moyer wrote:
>> Corrado Zoccolo <[email protected]> writes:
>>
>> > Hi Jens,
>> > patch 8e55063 "cfq-iosched: fix corner cases in idling logic", is
>> > suspected for some regressions on high end hardware.
>> > The two patches from this series:
>> > - [PATCH 1/2] cfq-iosched: fix tree-wide handling of rq_noidle
>> > - [PATCH 2/2] cfq-iosched: RQ_NOIDLE enabled for SYNC_WORKLOAD
>> > fix two issues that I have identified, related to how RQ_NOIDLE is
>> > used by the upper layers.
>> > First patch makes sure that a RQ_NOIDLE coming after a sequence of
>> > possibly idling requests from the same queue on the no-idle tree will
>> > clear the noidle_tree_requires_idle flag.
>> > Second patch enables RQ_NOIDLE for queues in the idling tree,
>> > restoring the behaviour pre-8e55063 patch.
>>
>> Hi, Corrado,
>>
>> I ran your kernel through my tests. Here are the results, up against
>> vanilla, deadline, and the blk_yield patch set:
>>
>>                  just    just
>>                 fs_mark  fio        mixed
>> -------------------------------+--------------
>> deadline        529.44   151.4 | 450.0    78.2
>> vanilla cfq     107.88   164.4 |   6.6   137.2
>> blk_yield cfq   530.82   158.7 | 113.2    78.6
>> corrado cfq      80.82   138.1 |   4.5   130.7
>>
>> fs_mark results are in files/second, fio results are in MB/s. All
>> results are the average of 5 runs. In order to get results for the
>> mixed workload for both vanilla and Corrado's kernels, I had to extend
>> the runtime from 30s to 300s.
>>
>> So, the changes proposed in this thread actually make performance worse
>> across the board.
>>
>> I re-ran my tests against a RHEL 5 kernel (which is based on 2.6.18),
>> and it shows that fs_mark performance is much better than stock CFQ in
>> 2.6.35-rc3, and the mixed workload results are much the same as they are
>> now (which is to say, the fs_mark process is completely starved by the
>> sequential reader). So, that problem has existed for a long time.
>>
>> I'm still in the process of collecting data from production servers and
>> will report back with my findings there.
>
> Hi Jeff and all,
>
> How about if we simply get rid of idling on RQ_NOIDLE threads (as
> corrado's patch series does) and not try to solve the problem of fsync
> being starved in the presence of sequential readers. I mean it might just
> be a theoritical problem and not many people are running into it. That's
> how CFQ has been behaving for long-2 time and if nobody is complaining
> then we probably don't have to fix it.

I would instead suggest we just revert that one commit, if this is the
route we're going to go. Please keep in mind, though, that folks who
may have experienced this issue may also have just switched to deadline.

Cheers,
Jeff

2010-07-08 14:45:14

by Corrado Zoccolo

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

On Thu, Jul 8, 2010 at 4:35 PM, Vivek Goyal <[email protected]> wrote:
> On Wed, Jul 07, 2010 at 01:03:08PM -0400, Jeff Moyer wrote:
>> Corrado Zoccolo <[email protected]> writes:
>>
>> > Hi Jens,
>> > patch 8e55063 "cfq-iosched: fix corner cases in idling logic", is
>> > suspected for some regressions on high end hardware.
>> > The two patches from this series:
>> > - [PATCH 1/2] cfq-iosched: fix tree-wide handling of rq_noidle
>> > - [PATCH 2/2] cfq-iosched: RQ_NOIDLE enabled for SYNC_WORKLOAD
>> > fix two issues that I have identified, related to how RQ_NOIDLE is
>> > used by the upper layers.
>> > First patch makes sure that a RQ_NOIDLE coming after a sequence of
>> > possibly idling requests from the same queue on the no-idle tree will
>> > clear the noidle_tree_requires_idle flag.
>> > Second patch enables RQ_NOIDLE for queues in the idling tree,
>> > restoring the behaviour pre-8e55063 patch.
>>
>> Hi, Corrado,
>>
>> I ran your kernel through my tests.  Here are the results, up against
>> vanilla, deadline, and the blk_yield patch set:
>>
>>                  just    just
>>                 fs_mark  fio        mixed
>> -------------------------------+--------------
>> deadline        529.44   151.4 | 450.0    78.2
>> vanilla cfq     107.88   164.4 |   6.6   137.2
>> blk_yield cfq   530.82   158.7 | 113.2    78.6
>> corrado cfq      80.82   138.1 |   4.5   130.7
>>
>> fs_mark results are in files/second, fio results are in MB/s.  All
>> results are the average of 5 runs.  In order to get results for the
>> mixed workload for both vanilla and Corrado's kernels, I had to extend
>> the runtime from 30s to 300s.
>>
>> So, the changes proposed in this thread actually make performance worse
>> across the board.
>>
>> I re-ran my tests against a RHEL 5 kernel (which is based on 2.6.18),
>> and it shows that fs_mark performance is much better than stock CFQ in
>> 2.6.35-rc3, and the mixed workload results are much the same as they are
>> now (which is to say, the fs_mark process is completely starved by the
>> sequential reader).  So, that problem has existed for a long time.
>>
>> I'm still in the process of collecting data from production servers and
>> will report back with my findings there.
>
> Hi Jeff and all,
>
> How about if we simply get rid of idling on RQ_NOIDLE threads (as
> corrado's patch series does) and not try to solve the problem of fsync
> being starved in the presence of sequential readers. I mean it might just
> be a theoritical problem and not many people are running into it. That's
> how CFQ has been behaving for long-2 time and if nobody is complaining
> then we probably don't have to fix it.

8e55063 was done to fix theoretical problems as well :)
I think, instead, that Jeff's approach of yielding the queue when
better knowledge is available is good, and this set of patches is not
intended as a replacement. It is intended just to fix some regressions
introduced by a previous commit, and I hope it can work together
with Jeff's patch.
Clearly, if RQ_NOIDLE is used only in the places that Jeff is already
handling, then it is better to completely remove RQ_NOIDLE handling,
and my patch set becomes obsolete.

Thanks,
Corrado
>
> Thanks
> Vivek
>



--
__________________________________________________________________________

dott. Corrado Zoccolo mailto:[email protected]
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
Tales of Power - C. Castaneda

2010-07-09 10:33:50

by Corrado Zoccolo

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

On Wed, Jul 7, 2010 at 10:06 PM, Jeff Moyer <[email protected]> wrote:
> Corrado Zoccolo <[email protected]> writes:
>
>> On Wed, Jul 7, 2010 at 7:03 PM, Jeff Moyer <[email protected]> wrote:
>>> Corrado Zoccolo <[email protected]> writes:
>>>
>>>> Hi Jens,
>>>> patch 8e55063 "cfq-iosched: fix corner cases in idling logic", is
>>>> suspected for some regressions on high end hardware.
>>>> The two patches from this series:
>>>> - [PATCH 1/2] cfq-iosched: fix tree-wide handling of rq_noidle
>>>> - [PATCH 2/2] cfq-iosched: RQ_NOIDLE enabled for SYNC_WORKLOAD
>>>> fix two issues that I have identified, related to how RQ_NOIDLE is
>>>> used by the upper layers.
>>>> First patch makes sure that a RQ_NOIDLE coming after a sequence of
>>>> possibly idling requests from the same queue on the no-idle tree will
>>>> clear the noidle_tree_requires_idle flag.
>>>> Second patch enables RQ_NOIDLE for queues in the idling tree,
>>>> restoring the behaviour pre-8e55063 patch.
>>>
>>> Hi, Corrado,
>>>
>>> I ran your kernel through my tests.  Here are the results, up against
>>> vanilla, deadline, and the blk_yield patch set:
>>>
>> Hi Jeff,
>> can you also add cfq with 8e55063 reverted to the testing mix?
>
> Sure, the results now look like this:
>
>                 just    just
>                fs_mark  fio        mixed
> -------------------------------+--------------
> deadline        529.44   151.4 | 450.0    78.2
> vanilla cfq     107.88   164.4 |   6.6   137.2
> blk_yield cfq   530.82   158.7 | 113.2    78.6
> corrado cfq     110.16   220.6 |   7.0   159.8
> 8e55063 revert  559.66   198.9 |  16.1   153.3
>
> I had accidentally run your patch set (corrado cfq) on ext3, so the
> numbers were a bit off (everything else was run against ext4).  The
> corrected numbers above reflect the performance on ext4, which is much
> better for the sequential reader, but still not great for the fs_mark
> run.  Reverting 8e55063 definitely gets us into better shape.  However,
> if we care about the mixed workload, then it won't be enough.

I'm wondering why deadline performs so well in the fs_mark workload. Is
it because it doesn't distinguish between sync and async writes?
Maybe we can achieve something similar by putting all sync writes
(those marked as REQ_NOIDLE) in the no-idle tree. This, coupled with
making jbd(2) perform sync writes, should make the yield automatic,
since they would all live in the same tree, for which we don't idle
between queues, and it should provide fairness relative to a
sequential reader (which lives in the other tree).
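The classification change itself is small; modulo the flag name
(cfq_cfqq_noidle() below is just a placeholder for however we end up
tracking REQ_NOIDLE on the queue), it amounts to something like:

	/* sketch: route REQ_NOIDLE writers (fsync, O_SYNC, and jbd
	 * commits once they are issued as WRITE_SYNC) onto the
	 * sync-noidle tree */
	static enum wl_type_t cfqq_type(struct cfq_queue *cfqq)
	{
		if (!cfq_cfqq_sync(cfqq))
			return ASYNC_WORKLOAD;
		if (!cfq_cfqq_idle_window(cfqq) || cfq_cfqq_noidle(cfqq))
			return SYNC_NOIDLE_WORKLOAD; /* no idling between queues */
		return SYNC_WORKLOAD;                /* the idling tree */
	}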

Can you test the attached patch, where I also added your changes to
make jbd(2) perform sync writes?

Thanks,
Corrado

>
> It's worth noting that I can't explain that jump from 151MB/s for
> deadline vs 220MB/s for corrado cfq.  I'm not sure how you can vary
> driving a single queue depth sequential read at all.  Those are the
> averages of 5 runs and this storage should be solely accessible by me,
> so I am at a loss.
>
> Cheers,
> Jeff
>



--
__________________________________________________________________________

dott. Corrado Zoccolo                          mailto:[email protected]
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
                               Tales of Power - C. Castaneda


Attachments:
0001-p.o.c.-fairness-between-seq-reader-and-sync-writers.patch (3.23 kB)

2010-07-09 13:23:56

by Vivek Goyal

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

On Fri, Jul 09, 2010 at 12:33:36PM +0200, Corrado Zoccolo wrote:
> On Wed, Jul 7, 2010 at 10:06 PM, Jeff Moyer <[email protected]> wrote:
> > Corrado Zoccolo <[email protected]> writes:
> >
> >> On Wed, Jul 7, 2010 at 7:03 PM, Jeff Moyer <[email protected]> wrote:
> >>> Corrado Zoccolo <[email protected]> writes:
> >>>
> >>>> Hi Jens,
> >>>> patch 8e55063 "cfq-iosched: fix corner cases in idling logic", is
> >>>> suspected for some regressions on high end hardware.
> >>>> The two patches from this series:
> >>>> - [PATCH 1/2] cfq-iosched: fix tree-wide handling of rq_noidle
> >>>> - [PATCH 2/2] cfq-iosched: RQ_NOIDLE enabled for SYNC_WORKLOAD
> >>>> fix two issues that I have identified, related to how RQ_NOIDLE is
> >>>> used by the upper layers.
> >>>> First patch makes sure that a RQ_NOIDLE coming after a sequence of
> >>>> possibly idling requests from the same queue on the no-idle tree will
> >>>> clear the noidle_tree_requires_idle flag.
> >>>> Second patch enables RQ_NOIDLE for queues in the idling tree,
> >>>> restoring the behaviour pre-8e55063 patch.
> >>>
> >>> Hi, Corrado,
> >>>
> >>> I ran your kernel through my tests.  Here are the results, up against
> >>> vanilla, deadline, and the blk_yield patch set:
> >>>
> >> Hi Jeff,
> >> can you also add cfq with 8e55063 reverted to the testing mix?
> >
> > Sure, the results now look like this:
> >
> >                  just    just
> >                 fs_mark  fio        mixed
> > -------------------------------+--------------
> > deadline        529.44   151.4 | 450.0    78.2
> > vanilla cfq     107.88   164.4 |   6.6   137.2
> > blk_yield cfq   530.82   158.7 | 113.2    78.6
> > corrado cfq     110.16   220.6 |   7.0   159.8
> > 8e55063 revert  559.66   198.9 |  16.1   153.3
> >
> > I had accidentally run your patch set (corrado cfq) on ext3, so the
> > numbers were a bit off (everything else was run against ext4).  The
> > corrected numbers above reflect the performance on ext4, which is much
> > better for the sequential reader, but still not great for the fs_mark
> > run.  Reverting 8e55063 definitely gets us into better shape.  However,
> > if we care about the mixed workload, then it won't be enough.
>
> Wondering why deadline performs so well in the fs_mark workload. Is it
> because it doesn't distinguish between sync and async writes?
> Maybe we can achieve something similar by putting all sync writes
> (that are marked as REQ_NOIDLE) in the noidle tree? This, coupled with
> making jbd(2) perform sync writes, should make the yield automatic,
> since they all live in the same tree for which we don't idle between
> queues, and should be able to provide fairness compared to a
> sequential reader (that lives in the other tree).
>

This makes sense conceptually, at least at the CFQ level. By putting
O_SYNC/fsync writes on the sync-noidle tree we will not be able to take
advantage of the sequential nature of the queue, but Christoph mentioned
that all sequential writes should in general be lumped together and then
sent down to CFQ, instead of issuing small writes after some delay. So
this probably is not an issue.

What I am not sure about is the impact of switching jbd thread writes
from async to sync (WRITE -> WRITE_SYNC), especially if somebody is
also journalling data (data=journal).
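(For readers following along, the jbd side of that switch is essentially
a one-liner in the commit path; this is only a sketch of the idea, the
real change is the one attached to Corrado's mail:)

	/* fs/jbd2/commit.c, jbd2_journal_commit_transaction() -- sketch:
	 * issue the commit's buffers as WRITE_SYNC so the journal thread
	 * is treated as a sync (noidle) writer by CFQ rather than as an
	 * async one */
	int write_op = WRITE_SYNC;	/* was: int write_op = WRITE; */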

But it is definitely worth trying, because then we don't have to idle on
individual WRITE_SYNC queues, and the fsync performance issue should
also be solved.

Thanks
Vivek

2010-07-09 14:07:11

by Jeff Moyer

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

Corrado Zoccolo <[email protected]> writes:

> On Wed, Jul 7, 2010 at 10:06 PM, Jeff Moyer <[email protected]> wrote:
>> Corrado Zoccolo <[email protected]> writes:
>>
>>> On Wed, Jul 7, 2010 at 7:03 PM, Jeff Moyer <[email protected]> wrote:
>>>> Corrado Zoccolo <[email protected]> writes:
>>>>
>>>>> Hi Jens,
>>>>> patch 8e55063 "cfq-iosched: fix corner cases in idling logic", is
>>>>> suspected for some regressions on high end hardware.
>>>>> The two patches from this series:
>>>>> - [PATCH 1/2] cfq-iosched: fix tree-wide handling of rq_noidle
>>>>> - [PATCH 2/2] cfq-iosched: RQ_NOIDLE enabled for SYNC_WORKLOAD
>>>>> fix two issues that I have identified, related to how RQ_NOIDLE is
>>>>> used by the upper layers.
>>>>> First patch makes sure that a RQ_NOIDLE coming after a sequence of
>>>>> possibly idling requests from the same queue on the no-idle tree will
>>>>> clear the noidle_tree_requires_idle flag.
>>>>> Second patch enables RQ_NOIDLE for queues in the idling tree,
>>>>> restoring the behaviour pre-8e55063 patch.
>>>>
>>>> Hi, Corrado,
>>>>
>>>> I ran your kernel through my tests.  Here are the results, up against
>>>> vanilla, deadline, and the blk_yield patch set:
>>>>
>>> Hi Jeff,
>>> can you also add cfq with 8e55063 reverted to the testing mix?
>>
>> Sure, the results now look like this:
>>
>>                 just    just
>>                fs_mark  fio        mixed
>> -------------------------------+--------------
>> deadline        529.44   151.4 | 450.0    78.2
>> vanilla cfq     107.88   164.4 |   6.6   137.2
>> blk_yield cfq   530.82   158.7 | 113.2    78.6
>> corrado cfq     110.16   220.6 |   7.0   159.8
>> 8e55063 revert  559.66   198.9 |  16.1   153.3
>>
>> I had accidentally run your patch set (corrado cfq) on ext3, so the
>> numbers were a bit off (everything else was run against ext4).  The
>> corrected numbers above reflect the performance on ext4, which is much
>> better for the sequential reader, but still not great for the fs_mark
>> run.  Reverting 8e55063 definitely gets us into better shape.  However,
>> if we care about the mixed workload, then it won't be enough.
>
> Wondering why deadline performs so well in the fs_mark workload. Is it
> because it doesn't distinguish between sync and async writes?

It performs well because it doesn't do any idling.

> Maybe we can achieve something similar by putting all sync writes
> (that are marked as REQ_NOIDLE) in the noidle tree? This, coupled with
> making jbd(2) perform sync writes, should make the yield automatic,
> since they all live in the same tree for which we don't idle between
> queues, and should be able to provide fairness compared to a
> sequential reader (that lives in the other tree).
>
> Can you test the attached patch, where I also added your changes to
> make jbd(2) to perform sync writes?

I'm not sure what kernel you generated that patch against. I'm working
with 2.6.35-rc3 or later, and your patch does not apply there.

Cheers,
Jeff

2010-07-09 19:45:27

by Corrado Zoccolo

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

On Fri, Jul 9, 2010 at 4:07 PM, Jeff Moyer <[email protected]> wrote:
> Corrado Zoccolo <[email protected]> writes:
>
>> On Wed, Jul 7, 2010 at 10:06 PM, Jeff Moyer <[email protected]> wrote:
>>> Corrado Zoccolo <[email protected]> writes:
>>>
>>>> On Wed, Jul 7, 2010 at 7:03 PM, Jeff Moyer <[email protected]> wrote:
>>>>> Corrado Zoccolo <[email protected]> writes:
>>>>>
>>>>>> Hi Jens,
>>>>>> patch 8e55063 "cfq-iosched: fix corner cases in idling logic", is
>>>>>> suspected for some regressions on high end hardware.
>>>>>> The two patches from this series:
>>>>>> - [PATCH 1/2] cfq-iosched: fix tree-wide handling of rq_noidle
>>>>>> - [PATCH 2/2] cfq-iosched: RQ_NOIDLE enabled for SYNC_WORKLOAD
>>>>>> fix two issues that I have identified, related to how RQ_NOIDLE is
>>>>>> used by the upper layers.
>>>>>> First patch makes sure that a RQ_NOIDLE coming after a sequence of
>>>>>> possibly idling requests from the same queue on the no-idle tree will
>>>>>> clear the noidle_tree_requires_idle flag.
>>>>>> Second patch enables RQ_NOIDLE for queues in the idling tree,
>>>>>> restoring the behaviour pre-8e55063 patch.
>>>>>
>>>>> Hi, Corrado,
>>>>>
>>>>> I ran your kernel through my tests.  Here are the results, up against
>>>>> vanilla, deadline, and the blk_yield patch set:
>>>>>
>>>> Hi Jeff,
>>>> can you also add cfq with 8e55063 reverted to the testing mix?
>>>
>>> Sure, the results now look like this:
>>>
>>>                 just    just
>>>                fs_mark  fio        mixed
>>> -------------------------------+--------------
>>> deadline        529.44   151.4 | 450.0    78.2
>>> vanilla cfq     107.88   164.4 |   6.6   137.2
>>> blk_yield cfq   530.82   158.7 | 113.2    78.6
>>> corrado cfq     110.16   220.6 |   7.0   159.8
>>> 8e55063 revert  559.66   198.9 |  16.1   153.3
>>>
>>> I had accidentally run your patch set (corrado cfq) on ext3, so the
>>> numbers were a bit off (everything else was run against ext4).  The
>>> corrected numbers above reflect the performance on ext4, which is much
>>> better for the sequential reader, but still not great for the fs_mark
>>> run.  Reverting 8e55063 definitely gets us into better shape.  However,
>>> if we care about the mixed workload, then it won't be enough.
>>
>> Wondering why deadline performs so well in the fs_mark workload. Is it
>> because it doesn't distinguish between sync and async writes?
>
> It performs well because it doesn't do any idling.
>
>> Maybe we can achieve something similar by putting all sync writes
>> (that are marked as REQ_NOIDLE) in the noidle tree? This, coupled with
>> making jbd(2) perform sync writes, should make the yield automatic,
>> since they all live in the same tree for which we don't idle between
>> queues, and should be able to provide fairness compared to a
>> sequential reader (that lives in the other tree).
>>
>> Can you test the attached patch, where I also added your changes to
>> make jbd(2) to perform sync writes?
>
> I'm not sure what kernel you generated that patch against.  I'm working
> with 2.6.35-rc3 or later, and your patch does not apply there.
It's Jens' block/for-2.6.36 tree.

>
> Cheers,
> Jeff
>



--
__________________________________________________________________________

dott. Corrado Zoccolo                          mailto:[email protected]
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
                               Tales of Power - C. Castaneda

2010-07-09 20:48:35

by Jeff Moyer

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

Corrado Zoccolo <[email protected]> writes:

>> I'm not sure what kernel you generated that patch against.  I'm working
>> with 2.6.35-rc3 or later, and your patch does not apply there.

> It's Jens' block/for-2.6.36 tree.

OK. I'll get back to you on this next week. My storage is kind of
broken right now. :(

-Jeff

2010-07-13 19:38:17

by Jeff Moyer

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

Corrado Zoccolo <[email protected]> writes:

> Can you test the attached patch, where I also added your changes to
> make jbd(2) to perform sync writes?

I got new storage, so I have new numbers. I only re-ran deadline and
vanilla cfq for the fs_mark-only test. The average of 10 runs comes out
like so:

deadline: 571.98
vanilla cfq: 107.42
patched cfq: 460.9

Mixed workload results with your suggested patch:

fs_mark: 15.65 files/sec
fio: 132.5 MB/s

So, again, not looking great for the mixed workload, but the patch
does improve the fs_mark-only case. The blktrace data shows
that the jbd2 thread preempts the fs_mark thread at all the right
times. The only thing holding throughput back is the notion that
we need to dispatch from only one queue at a time (even though the
storage is capable of serving both the reads and writes simultaneously).

I added in the patch that allows the simultaneous dispatch of both reads
and writes, and here are the results from that run:

fs_mark: 15.975 files/sec
fio: 132.4 MB/s

So, it looks like that didn't help. The reason this patch doesn't come
close to the yield patch in the mixed workload is that the yield
patch set allows the fs_mark process to continue to issue I/O. With
your patch, the fs_mark process does 64KB of I/O, the jbd2 thread does
the journal commit, and then the fio process runs again. Given that the
fs_mark process typically uses only a small fraction of its time slice,
you end up with an unfair balance.

Now, we still have to decide whether that's a problem that needs
solving. I tried to gather data from the field, but I've been unable to
conclusively say whether an application issues this sort of dependent
I/O.

As such, I am happy with this patch. If we see that we need something
like the blk_yield approach, then I'm happy to resurrect that work.

Jens, do you find that an agreeable solution? If so, you can add my
signed-off-by and tested-by to the patch that Corrado posted.

Cheers,
Jeff

2010-07-13 19:56:58

by Vivek Goyal

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

On Tue, Jul 13, 2010 at 03:38:11PM -0400, Jeff Moyer wrote:
> Corrado Zoccolo <[email protected]> writes:
>
> > Can you test the attached patch, where I also added your changes to
> > make jbd(2) to perform sync writes?
>
> I got new storage, so I have new numbers. I only re-ran deadline and
> vanilla cfq for the fs_mark only test. The average of 10 runs comes out
> like so:
>
> deadline: 571.98
> vanilla cfq: 107.42
> patched cfq: 460.9
>
> Mixed workload results with your suggested patch:
>
> fs_mark: 15.65 files/sec
> fio: 132.5 MB/s
>
> So, again, not looking great for the mixed workload, but the patch
> does improve the fs_mark only case. Looking at the blktrace data shows
> that the jbd2 thread preempts the fs_mark thread at all the right
> times. The only thing holding throughput back is the whole notion that
> we need to only dispatch from one queue (even though the storage is
> capable of serving both the reads and writes simultaneously).
>
> I added in the patch that allows the simultaneous dispatch of both reads
> and writes, and here are the results from that run:
>
> fs_mark: 15.975 files/sec
> fio: 132.4 MB/s
>
> So, it looks like that didn't help. The reason this patch doesn't come
> close to the yield patch in the mixed workload is because the yield
> patch set allows the fs_mark process to continue to issue I/O. With
> your patch, the fs_mark process does 64KB of I/O, the jbd2 thread does
> the journal commit, and then the fio process runs again. Given that the
> fs_mark process typically only uses a small fraction of its time slice,
> you end up with an unfair balance.

Hi Jeff,

This is a little strange. Given that both the fs_mark and jbd
threads are now on the sync-noidle tree, we should have idled on the
sync-noidle tree to provide fairness, and that should have made sure
that fs_mark/jbd do more IO and the slice is not lost to the fio thread.

Not sure what is happening in practice, though. Only you can look at
the traces more closely and see whether the timer is being armed or not.

Thanks
Vivek

2010-07-13 20:30:34

by Jeff Moyer

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

Vivek Goyal <[email protected]> writes:

> On Tue, Jul 13, 2010 at 03:38:11PM -0400, Jeff Moyer wrote:
>> Corrado Zoccolo <[email protected]> writes:
>>
>> > Can you test the attached patch, where I also added your changes to
>> > make jbd(2) to perform sync writes?
>>
>> I got new storage, so I have new numbers. I only re-ran deadline and
>> vanilla cfq for the fs_mark only test. The average of 10 runs comes out
>> like so:
>>
>> deadline: 571.98
>> vanilla cfq: 107.42
>> patched cfq: 460.9
>>
>> Mixed workload results with your suggested patch:
>>
>> fs_mark: 15.65 files/sec
>> fio: 132.5 MB/s
>>
>> So, again, not looking great for the mixed workload, but the patch
>> does improve the fs_mark only case. Looking at the blktrace data shows
>> that the jbd2 thread preempts the fs_mark thread at all the right
>> times. The only thing holding throughput back is the whole notion that
>> we need to only dispatch from one queue (even though the storage is
>> capable of serving both the reads and writes simultaneously).
>>
>> I added in the patch that allows the simultaneous dispatch of both reads
>> and writes, and here are the results from that run:
>>
>> fs_mark: 15.975 files/sec
>> fio: 132.4 MB/s
>>
>> So, it looks like that didn't help. The reason this patch doesn't come
>> close to the yield patch in the mixed workload is because the yield
>> patch set allows the fs_mark process to continue to issue I/O. With
>> your patch, the fs_mark process does 64KB of I/O, the jbd2 thread does
>> the journal commit, and then the fio process runs again. Given that the
>> fs_mark process typically only uses a small fraction of its time slice,
>> you end up with an unfair balance.
>
> Hi Jeff,
>
> This is little strange. Given the fact that now both fs_mark and jbd
> threads are on sync-noidle tree, we should have idled on sync-noidle
> tree to provide fairness and that should have made sure that fs_mark/jbd
> do more IO and slice is not lost to fio thread.
>
> Not sure what is happening though in practice. Only you can look at
> traces more closely and see if timer is being armed or not.

Vivek, if you want to look at traces, just ask. I'd be happy to show
them to you, upload them, whatever. I'm not sure why you think
otherwise (though I wouldn't blame you for not wanting to look at
them!).

Now, to answer your question, the jbd2 thread runs and issues a barrier,
which causes a forced dispatch of requests. After that a new queue is
selected, and since the fs_mark thread is blocked on the journal commit,
it's always the fio process that gets to run.
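To spell out the mechanism: the barrier ends up in cfq's forced dispatch
path, which (approximately, from memory) looks like the sketch below, so
the active queue is expired and queue selection starts from scratch, at
which point fio owns the only runnable sync queue:

	/* approximate sketch of cfq_forced_dispatch() */
	static int cfq_forced_dispatch(struct cfq_data *cfqd)
	{
		struct cfq_queue *cfqq;
		int dispatched = 0;

		/* expire whatever queue was active... */
		cfq_slice_expired(cfqd, 0);

		/* ...then push out every pending request from every queue */
		while ((cfqq = cfq_get_next_queue_forced(cfqd)) != NULL)
			dispatched += __cfq_forced_dispatch_cfqq(cfqq);

		return dispatched;
	}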

This, of course, raises the question of why the blk_yield patches didn't
run into the same problem. Looking back at some saved traces, I don't
see WBS (write barrier sync) requests, so I wonder if barriers weren't
supported by my last storage system.

Cheers,
Jeff

2010-07-13 20:42:41

by Vivek Goyal

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

On Tue, Jul 13, 2010 at 04:30:23PM -0400, Jeff Moyer wrote:
> Vivek Goyal <[email protected]> writes:
>
> > On Tue, Jul 13, 2010 at 03:38:11PM -0400, Jeff Moyer wrote:
> >> Corrado Zoccolo <[email protected]> writes:
> >>
> >> > Can you test the attached patch, where I also added your changes to
> >> > make jbd(2) to perform sync writes?
> >>
> >> I got new storage, so I have new numbers. I only re-ran deadline and
> >> vanilla cfq for the fs_mark only test. The average of 10 runs comes out
> >> like so:
> >>
> >> deadline: 571.98
> >> vanilla cfq: 107.42
> >> patched cfq: 460.9
> >>
> >> Mixed workload results with your suggested patch:
> >>
> >> fs_mark: 15.65 files/sec
> >> fio: 132.5 MB/s
> >>
> >> So, again, not looking great for the mixed workload, but the patch
> >> does improve the fs_mark only case. Looking at the blktrace data shows
> >> that the jbd2 thread preempts the fs_mark thread at all the right
> >> times. The only thing holding throughput back is the whole notion that
> >> we need to only dispatch from one queue (even though the storage is
> >> capable of serving both the reads and writes simultaneously).
> >>
> >> I added in the patch that allows the simultaneous dispatch of both reads
> >> and writes, and here are the results from that run:
> >>
> >> fs_mark: 15.975 files/sec
> >> fio: 132.4 MB/s
> >>
> >> So, it looks like that didn't help. The reason this patch doesn't come
> >> close to the yield patch in the mixed workload is because the yield
> >> patch set allows the fs_mark process to continue to issue I/O. With
> >> your patch, the fs_mark process does 64KB of I/O, the jbd2 thread does
> >> the journal commit, and then the fio process runs again. Given that the
> >> fs_mark process typically only uses a small fraction of its time slice,
> >> you end up with an unfair balance.
> >
> > Hi Jeff,
> >
> > This is little strange. Given the fact that now both fs_mark and jbd
> > threads are on sync-noidle tree, we should have idled on sync-noidle
> > tree to provide fairness and that should have made sure that fs_mark/jbd
> > do more IO and slice is not lost to fio thread.
> >
> > Not sure what is happening though in practice. Only you can look at
> > traces more closely and see if timer is being armed or not.
>
> Vivek, if you want to look at traces, just ask. I'd be happy to show
> them to you, upload them, whatever. I'm not sure why you think
> otherwise (though I wouldn't blame you for not wanting to look at
> them!).

I don't mind looking at traces. Do let me know where can I access those.

>
> Now, to answer your question, the jbd2 thread runs and issues a barrier,
> which causes a forced dispatch of requests. After that a new queue is
> selected, and since the fs_mark thread is blocked on the journal commit,
> it's always the fio process that gets to run.

OK, that explains it. So somehow, after the barrier, fio always wins,
as it issues its next read request before fs_mark is able to issue its
next set of writes.

>
> This, of course, raises the question of why the blk_yield patches didn't
> run into the same problem. Looking back at some saved traces, I don't
> see WBS (write barrier sync) requests, so I wonder if barriers weren't
> supported by my last storage system.

I think the blk_yield patches will also run into the same issue if
barriers are enabled.

Thanks
Vivek

2010-07-13 21:00:22

by Jeff Moyer

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

Jeff Moyer <[email protected]> writes:

> This, of course, raises the question of why the blk_yield patches didn't
> run into the same problem. Looking back at some saved traces, I don't
> see WBS (write barrier sync) requests, so I wonder if barriers weren't
> supported by my last storage system.

So, I tested Corrado's approach with -o nobarrier, and here are the
results:

fs_mark: 363.291 files/sec
fio: 38.5 MB/s

I don't have time to analyze the data right now, and it's 600MB worth of
binary output. If you want, I can upload a representative sample
somewhere; let me know.

Anyway, I'll post an analysis tomorrow.

Cheers,
Jeff

2010-07-19 16:08:32

by Jeff Moyer

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

Vivek Goyal <[email protected]> writes:

> On Tue, Jul 13, 2010 at 04:30:23PM -0400, Jeff Moyer wrote:
>> Vivek Goyal <[email protected]> writes:

> I don't mind looking at traces. Do let me know where can I access those.

Forwarded privately.

>> Now, to answer your question, the jbd2 thread runs and issues a barrier,
>> which causes a forced dispatch of requests. After that a new queue is
>> selected, and since the fs_mark thread is blocked on the journal commit,
>> it's always the fio process that gets to run.
>
> Ok, that explains it. So somehow after the barrier, fio always wins
> as issues next read request before the fs_mark is able to issue the
> next set of writes.
>
>>
>> This, of course, raises the question of why the blk_yield patches didn't
>> run into the same problem. Looking back at some saved traces, I don't
>> see WBS (write barrier sync) requests, so I wonder if barriers weren't
>> supported by my last storage system.
>
> I think that blk_yield patches will also run into the same issue if
> barriers are enabled.

Agreed.

Here are the results again with barriers disabled for Corrado's patch:

fs_mark: 348.2 files/sec
fio: 53324.6 KB/s

Remember that deadline was seeing 450 files/sec and 78 MB/s. So, in
this case, the buffered reader appears to be starved. Looking into this
further, I found that the journal thread is running with I/O priority 0,
while the fio and fs_mark processes are running at the default (4).
Because the jbd thread has a higher I/O priority, its requests are
always closer to the front of the sort list, and thus the sync-noidle
workload is chosen more often than the sync workload. This essentially
results in an elevated I/O priority for the fs_mark process as well.
While troubling, that problem is not directly related to the problem
we're looking at.
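For the curious, the selection logic in question is cfq_choose_wl(),
which (roughly, from memory) picks whichever service tree holds the
queue with the earliest rb_key; since rb_key includes a priority-based
offset, the prio-0 jbd thread keeps pulling the sync-noidle tree to the
front:

	/* approximate sketch of cfq_choose_wl() */
	static enum wl_type_t cfq_choose_wl(struct cfq_data *cfqd,
					    struct cfq_group *cfqg,
					    enum wl_prio_t prio)
	{
		unsigned long lowest_key = 0;
		enum wl_type_t cur_best = SYNC_NOIDLE_WORKLOAD;
		bool key_valid = false;
		struct cfq_queue *queue;
		int i;

		for (i = 0; i <= SYNC_WORKLOAD; ++i) {
			/* the workload whose first queue has the lowest key wins */
			queue = cfq_rb_first(service_tree_for(cfqg, prio, i));
			if (queue &&
			    (!key_valid || time_before(queue->rb_key, lowest_key))) {
				lowest_key = queue->rb_key;
				cur_best = i;
				key_valid = true;
			}
		}
		return cur_best;
	}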

So, I'm still in favor of Corrado's approach. Are there any remaining
dissenting opinions on this?

Cheers,
Jeff

2010-07-19 20:31:15

by Vivek Goyal

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

On Mon, Jul 19, 2010 at 12:08:23PM -0400, Jeff Moyer wrote:
> Vivek Goyal <[email protected]> writes:
>
> > On Tue, Jul 13, 2010 at 04:30:23PM -0400, Jeff Moyer wrote:
> >> Vivek Goyal <[email protected]> writes:
>
> > I don't mind looking at traces. Do let me know where can I access those.
>
> Forwarded privately.
>
> >> Now, to answer your question, the jbd2 thread runs and issues a barrier,
> >> which causes a forced dispatch of requests. After that a new queue is
> >> selected, and since the fs_mark thread is blocked on the journal commit,
> >> it's always the fio process that gets to run.
> >
> > Ok, that explains it. So somehow after the barrier, fio always wins
> > as issues next read request before the fs_mark is able to issue the
> > next set of writes.
> >
> >>
> >> This, of course, raises the question of why the blk_yield patches didn't
> >> run into the same problem. Looking back at some saved traces, I don't
> >> see WBS (write barrier sync) requests, so I wonder if barriers weren't
> >> supported by my last storage system.
> >
> > I think that blk_yield patches will also run into the same issue if
> > barriers are enabled.
>
> Agreed.
>
> Here are the results again with barriers disabled for Corrado's patch:
>
> fs_mark: 348.2 files/sec
> fio: 53324.6 KB/s
>
> Remember that deadline was seeing 450 files/sec and 78 MB/s. So, in
> this case, the buffered reader appears to be starved. Looking into this
> further, I found that the journal thread is running with I/O priority 0,
> while the fio and fs_mark processes are running at the default (4).
> Because the jbd thread has a higher I/O priority, its requests are
> always closer to the front of the sort list, and thus the sync-noidle
> workload is chosen more often than the sync workload. This essentially
> results in an elevated I/O priority for the fs_mark process as well.
> While troubling, that problem is not directly related to the problem
> we're looking at.
>
> So, I'm still in favor of Corrado's approach. Are there any remaining
> dissenting opinions on this?

Nope. I am fine with moving all WRITE_SYNC requests with RQ_NOIDLE to
the sync-noidle tree and also marking jbd writes as WRITE_SYNC. By
bringing dependent threads onto a single service tree, we don't have to
worry about slice yielding.

Acked-by: Vivek Goyal <[email protected]>

Thanks
Vivek

2010-07-20 14:03:07

by Jeff Moyer

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

Vivek Goyal <[email protected]> writes:

> On Mon, Jul 19, 2010 at 12:08:23PM -0400, Jeff Moyer wrote:
>> So, I'm still in favor of Corrado's approach. Are there any remaining
>> dissenting opinions on this?
>
> Nope. I am fine with moving all WRITE_SYNC with RQ_NOIDLE to sync-noidle
> tree and also marking jbd writes as WRITE_SYNC. By bringing dependent
> threads on single service tree, we don't have to worry about slice
> yielding.
>
> Acked-by: Vivek Goyal <[email protected]>

Corrado, would you mind reposting the patches and adding:

Reviewed-by: Jeff Moyer <[email protected]>
Tested-by: Jeff Moyer <[email protected]>

Thanks!
Jeff

2010-07-20 14:11:12

by Christoph Hellwig

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

Didn't you guys have a previous iteration of the fixes that got
rid of REQ_NOIDLE by improving the heuristics inside cfq? That
would be much, much preferred from the filesystem point of view.

2010-07-20 14:26:56

by Vivek Goyal

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

On Tue, Jul 20, 2010 at 10:11:03AM -0400, Christoph Hellwig wrote:
> Didn't you guys have a previous iteration of the fixes that gets
> rid of REQ_NODILE by improving the heuristics inside cfq? That
> would be much, much preffered from the filesystem point of view.

Actually, with this patch, I was thinking we can probably get rid of
the RQ_NOIDLE flag and just check for WRITE_SYNC, so that any WRITE_SYNC
queue gets served on the sync-noidle tree. I am wondering whether we
will not face the jbd thread issues with direct writes as well. If yes,
then not special-casing direct IO writes and treating them the same as
O_SYNC writes will make sense.

I really wish we had blktraces of some standard workloads stored
somewhere that we could simply replay using "btreplay" to come
to some kind of conclusion whenever we are faced with such
decisions.

Thanks
Vivek

2010-07-20 19:11:00

by Corrado Zoccolo

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

On Tue, Jul 20, 2010 at 4:26 PM, Vivek Goyal <[email protected]> wrote:
> On Tue, Jul 20, 2010 at 10:11:03AM -0400, Christoph Hellwig wrote:
>> Didn't you guys have a previous iteration of the fixes that gets
>> rid of REQ_NODILE by improving the heuristics inside cfq?  That
>> would be much, much preffered from the filesystem point of view.
I think the previous iteration required more complex heuristics, while
this one uses existing ones to handle one more class of problems.
I understand that you still see the complexity from the fs side, but
Vivek's proposal may help there as well. It only needs to be tested thoroughly.

>
> Actually in this patch, I was thinking we can probably get rid of
> RQ_NOIDLE flag and just check for WRITE_SYNC. Any WRITE_SYNC queue
> gets served on sync-noidle tree. I am wondering will we not face jbd
> thread issues with direct writes also? If yes, then not special casing
> direct IO writes and treat them same as O_SYNC writes will make sense.

It is probably better to submit this first, since it is already
tested, and then have a different patch that finishes the work.
This will also help when bisecting for possible regressions, since I'm
not sure why the other writes are not already marked with RQ_NOIDLE
(maybe the distinction between the two sets was introduced for some
good reason, and we won't know unless we find the workload where it
helped).
I'll resend the current patch with Jeff's Reviewed-by and Tested-by tags.

Corrado

>
> I really wished that we had some blktrace of some standard workloads
> stored somewhere which we could simply replay using "btreplay" and come
> to some kind of conclusion whenever we are faced with taking such
> decisions.
>
> Thanks
> Vivek
>
>



--
__________________________________________________________________________

dott. Corrado Zoccolo                          mailto:[email protected]
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
The self-confidence of a warrior is not the self-confidence of the average
man. The average man seeks certainty in the eyes of the onlooker and calls
that self-confidence. The warrior seeks impeccability in his own eyes and
calls that humbleness.
                               Tales of Power - C. Castaneda

2010-07-20 19:32:36

by Vivek Goyal

Subject: Re: [PATCH 0/2] cfq-iosched: fixing RQ_NOIDLE handling.

On Tue, Jul 20, 2010 at 09:10:56PM +0200, Corrado Zoccolo wrote:
> On Tue, Jul 20, 2010 at 4:26 PM, Vivek Goyal <[email protected]> wrote:
> > On Tue, Jul 20, 2010 at 10:11:03AM -0400, Christoph Hellwig wrote:
> >> Didn't you guys have a previous iteration of the fixes that gets
> >> rid of REQ_NODILE by improving the heuristics inside cfq?  That
> >> would be much, much preffered from the filesystem point of view.
> I think the previous iteration required more complex heuristics, while
> this one uses existing ones to handle one more class of problems.
> I understand that you still see the complexity from the fs side, but
> Vivek's proposal may help also there. It only needs to be tested thoroughly.
>
> >
> > Actually in this patch, I was thinking we can probably get rid of
> > RQ_NOIDLE flag and just check for WRITE_SYNC. Any WRITE_SYNC queue
> > gets served on sync-noidle tree. I am wondering will we not face jbd
> > thread issues with direct writes also? If yes, then not special casing
> > direct IO writes and treat them same as O_SYNC writes will make sense.
>
> Probably it is better to submit this first, since it is already
> tested, and then have a different patch that can finish the work
> This will help when bisecting for possible regressions, since I'm not
> sure why the other writes are not already marked with RQ_NOIDLE (maybe
> it was introduced for some good reason to distinguish the two sets,
> and we won't know unless we find the workload where it helped).
> I'll resend the current patch with Jeff's reviewed and tested tags.
>

I am fine with pushing this patch as-is first; then, once we have an
answer to the question of whether the direct IO path and the
O_SYNC/fsync path need the same or different treatment in the IO
scheduler, we can fix the RQ_NOIDLE flag issue as well.

Vivek