2011-06-27 20:19:30

by Vivek Goyal

Subject: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

Hi,

Konstantin reported that fsync is very slow with ext4 if the fsyncing process
is in a separate cgroup and one is using the CFQ IO scheduler.

https://lkml.org/lkml/2011/6/23/269

The issue seems to be that the fsync process is in a separate cgroup while the
journalling thread is in the root cgroup. After every IO from fsync, CFQ idles
on the fsync process's queue waiting for more requests to come. But the fsync
process is now waiting for IO to finish from the journalling thread. After
waiting for 8ms, fsync's queue gives way to jbd's queue. Then we start idling
on the jbd thread, while new IO from fsync sits in a separate queue in a
separate group.

Bottom line, after every IO we end up idling on the fsync and jbd threads so
much that if somebody is doing an fsync after every 4K of IO, throughput
nose-dives.

A similar issue had come up within the same cgroup when the "fsync" and "jbd"
threads were being queued on different service trees and idling was killing
throughput. At that time two solutions were proposed: one from Jeff Moyer
and one from Corrado Zoccolo.

Jeff came up with the idea of a block layer API to yield the queue when
explicitly told to by the file system, hence cutting down on idling.

https://lkml.org/lkml/2010/7/2/277

Corrado came up with a simpler approach of keeping the jbd and fsync processes
on the same service tree based on the RQ_NOIDLE flag. By queuing on the same
service tree, one queue can preempt the other, hence cutting down on idling
time. Upstream went ahead with the simpler approach to fix the issue.

commit 749ef9f8423054e326f3a246327ed2db4b6d395f
Author: Corrado Zoccolo <[email protected]>
Date: Mon Sep 20 15:24:50 2010 +0200

cfq: improve fsync performance for small files


Now with cgroups, the same problem resurfaces, but this time we can not queue
both processes on the same service tree and take advantage of preemption, as
separate cgroups have separate service trees and the two processes belong to
separate cgroups. We do not allow cross-cgroup preemption as that will break
down the isolation between groups.

So this patch series resurrects Jeff's solution of the file system specifying
the IO dependencies between threads explicitly to the block layer/IO scheduler.
Once the IO scheduler knows that the queue we are currently idling on is
dependent on IO from some other queue, CFQ allows dispatch of requests from
that other queue in the context of the currently active queue.

So if the fsync thread specifies a dependency on the journalling thread, then
while the fsync thread's time slice is running, CFQ allows dispatch from jbd
within that slice, hence cutting down on idling.
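
To make the intended usage concrete, here is a minimal caller-side sketch,
modelled on what patches 2/3 and 3/3 do for ext4/ext3. The wrapper name
fs_wait_on_journal_commit() is made up for illustration; the queue and task
pointers are taken from the journal as in the actual patches.

/*
 * Sketch only: bracket the wait for a journal commit with the new hints
 * so CFQ can dispatch the journal thread's IO in our time slice instead
 * of idling. Mirrors the hunks in patches 2/3 and 3/3.
 */
static int fs_wait_on_journal_commit(journal_t *journal, tid_t commit_tid)
{
	struct request_queue *q = journal->j_dev->bd_disk->queue;
	int ret;

	/* current (the fsync caller) now depends on the journal task's IO */
	blk_set_depends_on_task(q, journal->j_task);
	ret = jbd2_log_wait_commit(journal, commit_tid);
	/* the dependency is over, tear down the mapping */
	blk_reset_depends_on_task(q);

	return ret;
}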

This patch series seems to be working for me. I did testing for ext4 only.
The series is based on the for-3.1/core branch of Jens' block tree.
Konstantin, can you please give it a try and see if it fixes your
issue?

Any feedback on how to solve this issue is appreciated.

Thanks
Vivek


Vivek Goyal (3):
block: A new interface for specifying IO dependencies among tasks
ext4: Explicitly specify fsync dependency on journaling thread
ext3: Explicitly specify fsync dependency on journaling thread

block/blk-core.c | 42 ++++++++
block/cfq-iosched.c | 236 ++++++++++++++++++++++++++++++++++++++++++---
block/elevator.c | 16 +++
fs/ext3/fsync.c | 3 +
fs/ext4/fsync.c | 3 +
include/linux/blkdev.h | 8 ++-
include/linux/elevator.h | 6 +
7 files changed, 297 insertions(+), 17 deletions(-)

--
1.7.4.4


2011-06-27 20:19:14

by Vivek Goyal

Subject: [PATCH 1/3] block: A new interface for specifying IO dependencies among tasks

This patch introduces two new block layer interfaces which allow file
systems to specify IO dependencies among processes. For example, the fsync
process is dependent on IO completion from the journalling thread in ext3
and ext4.

blk_set_depends_on_task(struct request_queue *q, struct task_struct *tsk)
blk_reset_depends_on_task(struct request_queue *q)

The first one allows one to say that the current process is dependent on
IO completions from "tsk". This call will let CFQ know about the
dependency, and CFQ will allow "tsk"'s IO to be dispatched in the
allocated time slice of the calling process.

Once the dependency is over, one needs to tear down the mapping
by calling the reset function.

Once the dependency is set up, if CFQ decides to idle on a queue
because there are no more requests, it also checks for requests in
the dependent cfqq; if requests are available, instead of idling,
it dispatches requests from the dependent queue.
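
As a simplified illustration of that decision (the real changes are in
cfq_select_queue()/cfq_dispatch_request() in the diff below), the dispatch
path conceptually becomes the following; dispatch_own_request() and
arm_idle_timer() are just placeholders for the existing paths:

/* Sketch only: condensed view of the new dispatch decision */
if (!RB_EMPTY_ROOT(&cfqq->sort_list)) {
	/* our own queue still has requests, dispatch as usual */
	dispatch_own_request(cfqd, cfqq);
} else if (cfqq->depends_on &&
	   !RB_EMPTY_ROOT(&cfqq->depends_on->sort_list)) {
	/*
	 * No requests of our own, but the queue we depend on has some:
	 * pull one over and dispatch it in our slice instead of idling.
	 */
	cfq_set_dispatch_dependent_request(cfqd, cfqq);
} else {
	/* nothing to do anywhere, fall back to the usual idling logic */
	arm_idle_timer(cfqd, cfqq);
}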

Signed-off-by: Vivek Goyal <[email protected]>
---
block/blk-core.c | 42 ++++++++
block/cfq-iosched.c | 236 ++++++++++++++++++++++++++++++++++++++++++---
block/elevator.c | 16 +++
include/linux/blkdev.h | 8 ++-
include/linux/elevator.h | 6 +
5 files changed, 291 insertions(+), 17 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d2f8f40..c6ae665 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -398,6 +398,19 @@ static int blk_init_free_list(struct request_queue *q)
return 0;
}

+static void
+generic_set_depends_on_task(struct request_queue *q, struct task_struct *tsk)
+{
+ if (q->elevator)
+ elv_set_depends_on_task(q, tsk);
+}
+
+static void generic_reset_depends_on_task(struct request_queue *q)
+{
+ if (q->elevator)
+ elv_reset_depends_on_task(q);
+}
+
struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
{
return blk_alloc_queue_node(gfp_mask, -1);
@@ -534,6 +547,8 @@ blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
q->prep_rq_fn = NULL;
q->unprep_rq_fn = NULL;
q->queue_flags = QUEUE_FLAG_DEFAULT;
+ q->set_depends_on_task_fn = generic_set_depends_on_task;
+ q->reset_depends_on_task_fn = generic_reset_depends_on_task;

/* Override internal queue lock with supplied lock pointer */
if (lock)
@@ -2766,6 +2781,33 @@ void blk_finish_plug(struct blk_plug *plug)
}
EXPORT_SYMBOL(blk_finish_plug);

+/*
+ * Give a hint to CFQ that current task is dependent on IO coming from
+ * "tsk". In such cases, CFQ will allow dispatch from "tsk" queue in
+ * the time slice of "current" process and this can cut down on
+ * unnecessarily queue idling.
+ *
+ * This function will return with interrupts disabled in case of CFQ.
+ * (task_lock()/task_unlock() pair).
+ */
+void blk_set_depends_on_task(struct request_queue *q, struct task_struct *tsk)
+{
+ if (q->set_depends_on_task_fn)
+ q->set_depends_on_task_fn(q, tsk);
+}
+EXPORT_SYMBOL(blk_set_depends_on_task);
+
+/*
+ * Tear down any dependent task mapping previously set up by the current
+ * task.
+ */
+void blk_reset_depends_on_task(struct request_queue *q)
+{
+ if (q->reset_depends_on_task_fn)
+ q->reset_depends_on_task_fn(q);
+}
+EXPORT_SYMBOL(blk_reset_depends_on_task);
+
int __init blk_dev_init(void)
{
BUILD_BUG_ON(__REQ_NR_BITS > 8 *
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 3d403a1..6ff0b4c 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -75,6 +75,8 @@ static DEFINE_IDA(cic_index_ida);
#define sample_valid(samples) ((samples) > 80)
#define rb_entry_cfqg(node) rb_entry((node), struct cfq_group, rb_node)

+static void cfq_put_request(struct request *rq);
+
/*
* Most of our rbtree usage is for sorting with min extraction, so
* if we cache the leftmost node we don't have to walk down the tree
@@ -148,6 +150,12 @@ struct cfq_queue {
struct cfq_group *cfqg;
/* Number of sectors dispatched from queue in single dispatch round */
unsigned long nr_sectors;
+
+ /*
+ * This cfqq's further IO is dependent on IO queued in the other
+ * cfqq pointed to by depends_on.
+ */
+ struct cfq_queue *depends_on;
};

/*
@@ -447,7 +455,7 @@ static inline int cfqg_busy_async_queues(struct cfq_data *cfqd,
+ cfqg->service_trees[BE_WORKLOAD][ASYNC_WORKLOAD].count;
}

-static void cfq_dispatch_insert(struct request_queue *, struct request *);
+static void cfq_dispatch_insert(struct request_queue *, struct request *, bool);
static struct cfq_queue *cfq_get_queue(struct cfq_data *, bool,
struct io_context *, gfp_t);
static struct cfq_io_context *cfq_cic_lookup(struct cfq_data *,
@@ -2043,16 +2051,23 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)

/*
* Move request from internal lists to the request queue dispatch list.
+ * @rq_dequeued: rq to be dispatched has already been removed from associated
+ * cfqq. This is useful when rq from dependent queue is being
+ * dispatched in current queue context.
*/
-static void cfq_dispatch_insert(struct request_queue *q, struct request *rq)
+static void cfq_dispatch_insert(struct request_queue *q, struct request *rq,
+ bool rq_dequeued)
{
struct cfq_data *cfqd = q->elevator->elevator_data;
struct cfq_queue *cfqq = RQ_CFQQ(rq);

cfq_log_cfqq(cfqd, cfqq, "dispatch_insert");

- cfqq->next_rq = cfq_find_next_rq(cfqd, cfqq, rq);
- cfq_remove_request(rq);
+ if (!rq_dequeued) {
+ cfqq->next_rq = cfq_find_next_rq(cfqd, cfqq, rq);
+ cfq_remove_request(rq);
+ }
+
cfqq->dispatched++;
(RQ_CFQG(rq))->dispatched++;
elv_dispatch_sort(q, rq);
@@ -2332,6 +2347,10 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
if (!RB_EMPTY_ROOT(&cfqq->sort_list))
goto keep_queue;

+ /* There are requests in the cfqq we depend on. Allow dispatch */
+ if (cfqq->depends_on && !RB_EMPTY_ROOT(&cfqq->depends_on->sort_list))
+ goto keep_queue;
+
/*
* If another queue has a request waiting within our mean seek
* distance, let it run. The expire code will check for close
@@ -2402,7 +2421,7 @@ static int __cfq_forced_dispatch_cfqq(struct cfq_queue *cfqq)
int dispatched = 0;

while (cfqq->next_rq) {
- cfq_dispatch_insert(cfqq->cfqd->queue, cfqq->next_rq);
+ cfq_dispatch_insert(cfqq->cfqd->queue, cfqq->next_rq, false);
dispatched++;
}

@@ -2534,6 +2553,77 @@ static bool cfq_may_dispatch(struct cfq_data *cfqd, struct cfq_queue *cfqq)
}

/*
+ * This queue was not active and we might expire it because its request got
+ * dispatched in some other queue's context and it is an empty queue now
+ */
+static void
+cfq_expire_inactive_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq)
+{
+ /*
+ * If this cfqq is shared between multiple processes, check to
+ * make sure that those processes are still issuing I/Os within
+ * the mean seek distance. If not, it may be time to break the
+ * queues apart again.
+ */
+ if (cfq_cfqq_coop(cfqq) && CFQQ_SEEKY(cfqq))
+ cfq_mark_cfqq_split_coop(cfqq);
+
+ cfq_del_cfqq_rr(cfqd, cfqq);
+}
+
+static void cfq_set_dispatch_dependent_request(struct cfq_data *cfqd,
+ struct cfq_queue *cfqq)
+{
+ struct cfq_queue *dep_cfqq;
+ int rw;
+ struct request *rq;
+
+ dep_cfqq = cfqq->depends_on;
+
+ cfq_log_cfqq(cfqd, cfqq, "dispatch from dependent"
+ " queue pid=%d\n", dep_cfqq->pid);
+ /*
+ * Select a request from the queue we are dependent on. Dequeue
+ * the request from other queue and make rq belong to this
+ * queue and dispatch in this queue's context
+ */
+ rq = cfq_check_fifo(dep_cfqq);
+ if (!rq)
+ rq = dep_cfqq->next_rq;
+ cfq_remove_request(rq);
+ if (RB_EMPTY_ROOT(&dep_cfqq->sort_list))
+ cfq_expire_inactive_queue(cfqd, dep_cfqq);
+ /*
+ * Change the identity of request to belong to current cfqq
+ * cfqg and cic. Drop references to old cfqq, cfqg and cic.
+ */
+ cfqq->ref++;
+ rw = rq_data_dir(rq);
+ cfqq->allocated[rw]++;
+
+ /*
+ * If we are here that means we are idling on the queue and we
+ * must have dispatched at least one request and that must have
+ * set the cfqd->active_cic. Use that
+ */
+ BUG_ON(!cfqd->active_cic);
+ atomic_long_inc(&cfqd->active_cic->ioc->refcount);
+
+ cfq_put_request(rq);
+
+ rq->elevator_private[0] = cfqd->active_cic;
+ rq->elevator_private[1] = cfqq;
+ rq->elevator_private[2] = cfq_ref_get_cfqg(cfqq->cfqg);
+
+ if (cfq_cfqq_wait_request(cfqq))
+ cfq_del_timer(cfqd, cfqq);
+
+ cfq_clear_cfqq_wait_request(cfqq);
+
+ cfq_dispatch_insert(cfqd->queue, rq, true);
+}
+
+/*
* Dispatch a request from cfqq, moving them to the request queue
* dispatch list.
*/
@@ -2541,22 +2631,26 @@ static bool cfq_dispatch_request(struct cfq_data *cfqd, struct cfq_queue *cfqq)
{
struct request *rq;

- BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list));
+ BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list) && !cfqq->depends_on);

if (!cfq_may_dispatch(cfqd, cfqq))
return false;

- /*
- * follow expired path, else get first next available
- */
- rq = cfq_check_fifo(cfqq);
- if (!rq)
- rq = cfqq->next_rq;
+ if (RB_EMPTY_ROOT(&cfqq->sort_list))
+ cfq_set_dispatch_dependent_request(cfqd, cfqq);
+ else {
+ /*
+ * follow expired path, else get first next available
+ */
+ rq = cfq_check_fifo(cfqq);
+ if (!rq)
+ rq = cfqq->next_rq;

- /*
- * insert request into driver dispatch list
- */
- cfq_dispatch_insert(cfqd->queue, rq);
+ /*
+ * insert request into driver dispatch list
+ */
+ cfq_dispatch_insert(cfqd->queue, rq, false);
+ }

if (!cfqd->active_cic) {
struct cfq_io_context *cic = RQ_CIC(rq);
@@ -2640,6 +2734,11 @@ static void cfq_put_queue(struct cfq_queue *cfqq)
}

BUG_ON(cfq_cfqq_on_rr(cfqq));
+
+ /* This cfqq is going away. If there is a dependent queue, drop ref */
+ if (cfqq->depends_on)
+ cfq_put_queue(cfqq->depends_on);
+
kmem_cache_free(cfq_pool, cfqq);
cfq_put_cfqg(cfqg);
}
@@ -3670,6 +3769,109 @@ static int cfq_may_queue(struct request_queue *q, int rw)
}

/*
+ * Calling task depends on "tsk" for further IO. It is caller's responsibility
+ * to make sure tsk pointer is valid during the execution of call
+ */
+static void
+cfq_set_depends_on_task(struct request_queue *q, struct task_struct *tsk)
+{
+ struct cfq_io_context *cic, *tsk_cic;
+ struct cfq_data *cfqd = q->elevator->elevator_data;
+ struct cfq_queue *cfqq, *tsk_cfqq, *__cfqq;
+
+ if (unlikely(!tsk))
+ return;
+
+ if (!tsk->io_context || !current->io_context)
+ return;
+
+ /*
+ * If two processes belong to same cgroup, no need to do this as
+ * both journalling thread and fsync process will go on same
+ * service tree and be able to preempt each other
+ */
+ rcu_read_lock();
+ if (task_blkio_cgroup(current) == task_blkio_cgroup(tsk))
+ return;
+ rcu_read_unlock();
+
+ cic = cfq_cic_lookup(cfqd, current->io_context);
+ if (!cic)
+ return;
+
+ task_lock(tsk);
+ spin_lock_irq(q->queue_lock);
+
+ cfqq = cic_to_cfqq(cic, 1);
+ if (!cfqq)
+ goto out_unlock;
+
+ /* If cfqq already has a dependent queue, ignore this new queue */
+ if (cfqq->depends_on) {
+ cfq_log_cfqq(cfqd, cfqq, "depends on queue already set"
+ " old_pid=%d", cfqq->depends_on->pid);
+ goto out_unlock;
+ }
+
+ if (!tsk->io_context)
+ goto out_unlock;
+
+ tsk_cic = cfq_cic_lookup(cfqd, tsk->io_context);
+ if (!tsk_cic)
+ goto out_unlock;
+
+ tsk_cfqq = cic_to_cfqq(tsk_cic, 1);
+
+ if (!tsk_cfqq)
+ goto out_unlock;
+
+ /* Don't allow circular dependency among a group of queues */
+ __cfqq = tsk_cfqq;
+
+ while((__cfqq = __cfqq->depends_on)) {
+ if (__cfqq == cfqq)
+ goto out_unlock;
+ }
+
+ /* Take reference on tasks' cfqq */
+ tsk_cfqq->ref++;
+ cfqq->depends_on = tsk_cfqq;
+
+ cfq_log_cfqq(cfqd, cfqq, "set depends on queue pid= %d", tsk->pid);
+
+out_unlock:
+ spin_unlock_irq(q->queue_lock);
+ task_unlock(tsk);
+}
+
+static void cfq_reset_depends_on_task(struct request_queue *q)
+{
+ struct cfq_io_context *cic;
+ struct cfq_data *cfqd = q->elevator->elevator_data;
+ struct cfq_queue *cfqq;
+
+ cic = cfq_cic_lookup(cfqd, current->io_context);
+ if (!cic)
+ return;
+
+ spin_lock_irq(q->queue_lock);
+
+ cfqq = cic_to_cfqq(cic, 1);
+ if (!cfqq || !cfqq->depends_on) {
+ spin_unlock_irq(q->queue_lock);
+ return;
+ }
+
+ cfq_log_cfqq(cfqd, cfqq, "reset depends on queue");
+
+ /* Drop reference on task's cfqq */
+ cfq_put_queue(cfqq->depends_on);
+ cfqq->depends_on = NULL;
+ spin_unlock_irq(q->queue_lock);
+
+}
+
+/*
* queue lock held here
*/
static void cfq_put_request(struct request *rq)
@@ -4206,6 +4408,8 @@ static struct elevator_type iosched_cfq = {
.elevator_set_req_fn = cfq_set_request,
.elevator_put_req_fn = cfq_put_request,
.elevator_may_queue_fn = cfq_may_queue,
+ .elevator_set_depends_on_task_fn = cfq_set_depends_on_task,
+ .elevator_reset_depends_on_task_fn = cfq_reset_depends_on_task,
.elevator_init_fn = cfq_init_queue,
.elevator_exit_fn = cfq_exit_queue,
.trim = cfq_free_io_context,
diff --git a/block/elevator.c b/block/elevator.c
index a3b64bc..6780914 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -783,6 +783,22 @@ int elv_may_queue(struct request_queue *q, int rw)
return ELV_MQUEUE_MAY;
}

+void elv_set_depends_on_task(struct request_queue *q, struct task_struct *tsk)
+{
+ struct elevator_queue *e = q->elevator;
+
+ if (e->ops->elevator_set_depends_on_task_fn)
+ e->ops->elevator_set_depends_on_task_fn(q, tsk);
+}
+
+void elv_reset_depends_on_task(struct request_queue *q)
+{
+ struct elevator_queue *e = q->elevator;
+
+ if (e->ops->elevator_reset_depends_on_task_fn)
+ e->ops->elevator_reset_depends_on_task_fn(q);
+}
+
void elv_abort_queue(struct request_queue *q)
{
struct request *rq;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4ce6e68..f8d27f9 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -209,6 +209,8 @@ typedef int (merge_bvec_fn) (struct request_queue *, struct bvec_merge_data *,
typedef void (softirq_done_fn)(struct request *);
typedef int (dma_drain_needed_fn)(struct request *);
typedef int (lld_busy_fn) (struct request_queue *q);
+typedef void (set_depends_on_task_fn) (struct request_queue *q, struct task_struct *tsk);
+typedef void (reset_depends_on_task_fn) (struct request_queue *q);

enum blk_eh_timer_return {
BLK_EH_NOT_HANDLED,
@@ -283,7 +285,8 @@ struct request_queue
rq_timed_out_fn *rq_timed_out_fn;
dma_drain_needed_fn *dma_drain_needed;
lld_busy_fn *lld_busy_fn;
-
+ set_depends_on_task_fn *set_depends_on_task_fn;
+ reset_depends_on_task_fn *reset_depends_on_task_fn;
/*
* Dispatch queue sorting
*/
@@ -895,6 +898,9 @@ static inline bool blk_needs_flush_plug(struct task_struct *tsk)
return plug && (!list_empty(&plug->list) || !list_empty(&plug->cb_list));
}

+extern void blk_set_depends_on_task(struct request_queue *q, struct task_struct *tsk);
+extern void blk_reset_depends_on_task(struct request_queue *q);
+
/*
* tag stuff
*/
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index d800d51..0bc82f8 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -23,6 +23,8 @@ typedef void (elevator_add_req_fn) (struct request_queue *, struct request *);
typedef struct request *(elevator_request_list_fn) (struct request_queue *, struct request *);
typedef void (elevator_completed_req_fn) (struct request_queue *, struct request *);
typedef int (elevator_may_queue_fn) (struct request_queue *, int);
+typedef void (elevator_set_depends_on_task_fn) (struct request_queue *, struct task_struct *);
+typedef void (elevator_reset_depends_on_task_fn) (struct request_queue *);

typedef int (elevator_set_req_fn) (struct request_queue *, struct request *, gfp_t);
typedef void (elevator_put_req_fn) (struct request *);
@@ -54,6 +56,8 @@ struct elevator_ops
elevator_put_req_fn *elevator_put_req_fn;

elevator_may_queue_fn *elevator_may_queue_fn;
+ elevator_set_depends_on_task_fn *elevator_set_depends_on_task_fn;
+ elevator_reset_depends_on_task_fn *elevator_reset_depends_on_task_fn;

elevator_init_fn *elevator_init_fn;
elevator_exit_fn *elevator_exit_fn;
@@ -114,6 +118,8 @@ extern struct request *elv_latter_request(struct request_queue *, struct request
extern int elv_register_queue(struct request_queue *q);
extern void elv_unregister_queue(struct request_queue *q);
extern int elv_may_queue(struct request_queue *, int);
+extern void elv_set_depends_on_task(struct request_queue *q, struct task_struct *tsk);
+extern void elv_reset_depends_on_task(struct request_queue *q);
extern void elv_abort_queue(struct request_queue *);
extern void elv_completed_request(struct request_queue *, struct request *);
extern int elv_set_request(struct request_queue *, struct request *, gfp_t);
--
1.7.4.4

2011-06-27 20:19:05

by Vivek Goyal

Subject: [PATCH 2/3] ext4: Explicitly specify fsync dependency on journaling thread

Set/reset fsync's dependency on the journalling thread. This allows
CFQ to dispatch IO from the journalling thread in fsync's time slice.
Otherwise, lots of cfq queue idling takes place if fsync is running
in a separate cgroup and the journalling thread is running in the root group.

Signed-off-by: Vivek Goyal <[email protected]>
---
fs/ext4/fsync.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index ce66d2f..8b16a90 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -216,7 +216,10 @@ int ext4_sync_file(struct file *file, int datasync)
!jbd2_trans_will_send_data_barrier(journal, commit_tid))
needs_barrier = true;
jbd2_log_start_commit(journal, commit_tid);
+ blk_set_depends_on_task(journal->j_dev->bd_disk->queue,
+ journal->j_task);
ret = jbd2_log_wait_commit(journal, commit_tid);
+ blk_reset_depends_on_task(journal->j_dev->bd_disk->queue);
if (needs_barrier)
blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
out:
--
1.7.4.4

2011-06-27 20:19:23

by Vivek Goyal

Subject: [PATCH 3/3] ext3: Explicitly specify fsync dependency on journaling thread

Set/reset fsync's dependency on the journalling thread. This allows
CFQ to dispatch IO from the journalling thread in fsync's time slice.
Otherwise, lots of cfq queue idling takes place if fsync is running
in a separate cgroup and the journalling thread is running in the root group.

Signed-off-by: Vivek Goyal <[email protected]>
---
fs/ext3/fsync.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/ext3/fsync.c b/fs/ext3/fsync.c
index 09b13bb..1e8de9c 100644
--- a/fs/ext3/fsync.c
+++ b/fs/ext3/fsync.c
@@ -82,7 +82,10 @@ int ext3_sync_file(struct file *file, int datasync)
!journal_trans_will_send_data_barrier(journal, commit_tid))
needs_barrier = 1;
log_start_commit(journal, commit_tid);
+ blk_set_depends_on_task(journal->j_dev->bd_disk->queue,
+ journal->j_task);
ret = log_wait_commit(journal, commit_tid);
+ blk_reset_depends_on_task(journal->j_dev->bd_disk->queue);

/*
* In case we didn't commit a transaction, we have to flush
--
1.7.4.4

2011-06-28 01:19:13

by Shaohua Li

Subject: Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

On Tue, 2011-06-28 at 04:17 +0800, Vivek Goyal wrote:
> Hi,
>
> Konstantin reported that fsync is very slow with ext4 if fsyncing process
> is in a separate cgroup and one is using CFQ IO scheduler.
>
> https://lkml.org/lkml/2011/6/23/269
>
> Issue seems to be that fsync process is in a separate cgroup and journalling
> thread is in root cgroup. After every IO from fsync, CFQ idles on fysnc
> process queue waiting for more requests to come. But this process is now
> waiting for IO to finish from journaling thread. After waiting for 8ms
> fsync's queue gives way to jbd's queue. Then we start idling on jbd
> thread and new IO from fsync is sitting in a separate queue in a separate
> group.
>
> Bottom line, that after every IO we end up idling on fysnc and jbd thread
> so much that if somebody is doing fsync after every 4K of IO, throughput
> nose dives.
>
> Similar issue had issue come up with-in same cgroup also when "fsync"
> and "jbd" thread were being queued on differnt service trees and idling
> was killing. At that point of time two solutions were proposed. One
> from Jeff Moyer and one from Corrado Zoccolo.
>
> Jeff came up with the idea of coming with block layer API to yield the
> queue if explicitly told by file system, hence cutting down on idling.
>
> https://lkml.org/lkml/2010/7/2/277
>
> Corrado, came up with a simpler approach of keeping jbd and fsync processes
> on same service tree by parsing RQ_NOIDLE flag. By queuing on same service
> tree, one queue preempts other queue hence cutting down on idling time.
> Upstream went ahead with simpler approach to fix the issue.
>
> commit 749ef9f8423054e326f3a246327ed2db4b6d395f
> Author: Corrado Zoccolo <[email protected]>
> Date: Mon Sep 20 15:24:50 2010 +0200
>
> cfq: improve fsync performance for small files
>
>
> Now with cgroups, same problem resurfaces but this time we can not queue
> both the processes on same service tree and take advantage of preemption
> as separate cgroups have separate service trees and both processes
> belong to separate cgroups. We do not allow cross cgroup preemption
> as that wil break down the isolation between groups.
>
> So this patch series resurrects Jeff's solution of file system specifying
> the IO dependencies between threads explicitly to the block layer/ioscheduler.
> One ioscheduler knows that current queue we are idling on is dependent on
> IO from some other queue, CFQ allows dispatch of requests from that other
> queue in the context of current active queue.
>
> So if fysnc thread specifies the dependency on journalling thread, then
> when time slice of fsync thread is running, it allows dispatch from
> jbd in the time slice of fsync thread. Hence cutting down on idling.
>
> This patch series seems to be working for me. I did testing for ext4 only.
> This series is based on for-3.1/core branch of Jen's block tree.
> Konstantin, can you please give it a try and see if it fixes your
> issue.
>
> Any feedback on how to solve this issue is appreciated.
Hi Vivek,
can we introduce a group think time check in CFQ? Say the last queue
backed for the group is a non-idle queue; if the group think time is big,
we don't allow group idle and preemption can happen. The fsync thread is a
non-idle queue with Corrado's patch, so this allows a fast group switch.

Thanks,
Shaohua

2011-06-28 01:41:23

by Vivek Goyal

Subject: Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

On Tue, Jun 28, 2011 at 09:18:52AM +0800, Shaohua Li wrote:
> On Tue, 2011-06-28 at 04:17 +0800, Vivek Goyal wrote:
> > Hi,
> >
> > Konstantin reported that fsync is very slow with ext4 if fsyncing process
> > is in a separate cgroup and one is using CFQ IO scheduler.
> >
> > https://lkml.org/lkml/2011/6/23/269
> >
> > Issue seems to be that fsync process is in a separate cgroup and journalling
> > thread is in root cgroup. After every IO from fsync, CFQ idles on fysnc
> > process queue waiting for more requests to come. But this process is now
> > waiting for IO to finish from journaling thread. After waiting for 8ms
> > fsync's queue gives way to jbd's queue. Then we start idling on jbd
> > thread and new IO from fsync is sitting in a separate queue in a separate
> > group.
> >
> > Bottom line, that after every IO we end up idling on fysnc and jbd thread
> > so much that if somebody is doing fsync after every 4K of IO, throughput
> > nose dives.
> >
> > Similar issue had issue come up with-in same cgroup also when "fsync"
> > and "jbd" thread were being queued on differnt service trees and idling
> > was killing. At that point of time two solutions were proposed. One
> > from Jeff Moyer and one from Corrado Zoccolo.
> >
> > Jeff came up with the idea of coming with block layer API to yield the
> > queue if explicitly told by file system, hence cutting down on idling.
> >
> > https://lkml.org/lkml/2010/7/2/277
> >
> > Corrado, came up with a simpler approach of keeping jbd and fsync processes
> > on same service tree by parsing RQ_NOIDLE flag. By queuing on same service
> > tree, one queue preempts other queue hence cutting down on idling time.
> > Upstream went ahead with simpler approach to fix the issue.
> >
> > commit 749ef9f8423054e326f3a246327ed2db4b6d395f
> > Author: Corrado Zoccolo <[email protected]>
> > Date: Mon Sep 20 15:24:50 2010 +0200
> >
> > cfq: improve fsync performance for small files
> >
> >
> > Now with cgroups, same problem resurfaces but this time we can not queue
> > both the processes on same service tree and take advantage of preemption
> > as separate cgroups have separate service trees and both processes
> > belong to separate cgroups. We do not allow cross cgroup preemption
> > as that wil break down the isolation between groups.
> >
> > So this patch series resurrects Jeff's solution of file system specifying
> > the IO dependencies between threads explicitly to the block layer/ioscheduler.
> > One ioscheduler knows that current queue we are idling on is dependent on
> > IO from some other queue, CFQ allows dispatch of requests from that other
> > queue in the context of current active queue.
> >
> > So if fysnc thread specifies the dependency on journalling thread, then
> > when time slice of fsync thread is running, it allows dispatch from
> > jbd in the time slice of fsync thread. Hence cutting down on idling.
> >
> > This patch series seems to be working for me. I did testing for ext4 only.
> > This series is based on for-3.1/core branch of Jen's block tree.
> > Konstantin, can you please give it a try and see if it fixes your
> > issue.
> >
> > Any feedback on how to solve this issue is appreciated.
> Hi Vivek,
> can we introduce a group think time check in cfq? say in a group the
> last queue is backed for the group and the queue is a non-idle queue, if
> the group think time is big, we don't allow the group idle and preempt
> could happen. The fsync thread is a non-idle queue with Corrado's patch,
> this allows fast group switch.

In this case regular queue idling is kicking in, not group idling. So some
kind of think time stats might be useful for the group idle check,
but not necessarily for queue idling.

Secondly, for this case the think time will change. If you stop idling on
the fsync and jbd threads, both will be dispatching IOs fast and both will
have a small think time. We will see that the think time is small, so we
will enable idling. Then their think time will increase as both
get blocked behind each other, and then we will remove idling again. So
it looks like we will be oscillating between enabling and disabling
idling based on think time.

If we don't allow idling on sync-no-idle queues, then basic CFQ will
be broken.

Actually in your statement there are multiple suggestions and I might
have actually missed the gist of it.

Thanks
Vivek

2011-06-28 02:05:31

by Shaohua Li

Subject: Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

On Tue, 2011-06-28 at 09:40 +0800, Vivek Goyal wrote:
> On Tue, Jun 28, 2011 at 09:18:52AM +0800, Shaohua Li wrote:
> > On Tue, 2011-06-28 at 04:17 +0800, Vivek Goyal wrote:
> > > Hi,
> > >
> > > Konstantin reported that fsync is very slow with ext4 if fsyncing process
> > > is in a separate cgroup and one is using CFQ IO scheduler.
> > >
> > > https://lkml.org/lkml/2011/6/23/269
> > >
> > > Issue seems to be that fsync process is in a separate cgroup and journalling
> > > thread is in root cgroup. After every IO from fsync, CFQ idles on fysnc
> > > process queue waiting for more requests to come. But this process is now
> > > waiting for IO to finish from journaling thread. After waiting for 8ms
> > > fsync's queue gives way to jbd's queue. Then we start idling on jbd
> > > thread and new IO from fsync is sitting in a separate queue in a separate
> > > group.
> > >
> > > Bottom line, that after every IO we end up idling on fysnc and jbd thread
> > > so much that if somebody is doing fsync after every 4K of IO, throughput
> > > nose dives.
> > >
> > > Similar issue had issue come up with-in same cgroup also when "fsync"
> > > and "jbd" thread were being queued on differnt service trees and idling
> > > was killing. At that point of time two solutions were proposed. One
> > > from Jeff Moyer and one from Corrado Zoccolo.
> > >
> > > Jeff came up with the idea of coming with block layer API to yield the
> > > queue if explicitly told by file system, hence cutting down on idling.
> > >
> > > https://lkml.org/lkml/2010/7/2/277
> > >
> > > Corrado, came up with a simpler approach of keeping jbd and fsync processes
> > > on same service tree by parsing RQ_NOIDLE flag. By queuing on same service
> > > tree, one queue preempts other queue hence cutting down on idling time.
> > > Upstream went ahead with simpler approach to fix the issue.
> > >
> > > commit 749ef9f8423054e326f3a246327ed2db4b6d395f
> > > Author: Corrado Zoccolo <[email protected]>
> > > Date: Mon Sep 20 15:24:50 2010 +0200
> > >
> > > cfq: improve fsync performance for small files
> > >
> > >
> > > Now with cgroups, same problem resurfaces but this time we can not queue
> > > both the processes on same service tree and take advantage of preemption
> > > as separate cgroups have separate service trees and both processes
> > > belong to separate cgroups. We do not allow cross cgroup preemption
> > > as that wil break down the isolation between groups.
> > >
> > > So this patch series resurrects Jeff's solution of file system specifying
> > > the IO dependencies between threads explicitly to the block layer/ioscheduler.
> > > One ioscheduler knows that current queue we are idling on is dependent on
> > > IO from some other queue, CFQ allows dispatch of requests from that other
> > > queue in the context of current active queue.
> > >
> > > So if fysnc thread specifies the dependency on journalling thread, then
> > > when time slice of fsync thread is running, it allows dispatch from
> > > jbd in the time slice of fsync thread. Hence cutting down on idling.
> > >
> > > This patch series seems to be working for me. I did testing for ext4 only.
> > > This series is based on for-3.1/core branch of Jen's block tree.
> > > Konstantin, can you please give it a try and see if it fixes your
> > > issue.
> > >
> > > Any feedback on how to solve this issue is appreciated.
> > Hi Vivek,
> > can we introduce a group think time check in cfq? say in a group the
> > last queue is backed for the group and the queue is a non-idle queue, if
> > the group think time is big, we don't allow the group idle and preempt
> > could happen. The fsync thread is a non-idle queue with Corrado's patch,
> > this allows fast group switch.
>
> In this case regular queue idle is hitting and not group idle. So some
> kind of think time stats probably might be useful for group idle check
> but not necessarily for queue idle.
I thought your problem is a group idle issue. fsync uses WRITE_SYNC, which
will make the queue sync-non-idle because REQ_NOIDLE is set. This is
exactly what Corrado's patch is for. A fsync queue itself doesn't idle unless
it's the last queue in a group. Am I missing anything?

> Secondly, for this case think time will change. If you stop idling on
> fsync and jbd threads, both will be dispatching IOs fast and both will
> have small thinktime. We will think that thinktime is small so we
> will enable idle. Then there think time will increase as both will
> get blocked behind each other. And then we will removing idling. So
> looks like we will be oscillating between enabling and disabling
> think time.
That is possible; the think time check (even for queues) always has such an
issue. Not sure how severe the issue is. I assume jbd will dispatch
several requests and this will make the fsync thread's think time big.

> If we don't allow idling on sync-no-idle queues, then basic CFQ will
> be broken.
Hmm, CFQ only allows idling on sync queues; a sync-no-idle queue isn't
allowed to idle.

Thanks,
Shaohua

2011-06-28 02:49:18

by Dave Chinner

Subject: Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

On Mon, Jun 27, 2011 at 04:17:41PM -0400, Vivek Goyal wrote:
> Hi,
>
> Konstantin reported that fsync is very slow with ext4 if fsyncing process
> is in a separate cgroup and one is using CFQ IO scheduler.
>
> https://lkml.org/lkml/2011/6/23/269
>
> Issue seems to be that fsync process is in a separate cgroup and journalling
> thread is in root cgroup. After every IO from fsync, CFQ idles on fysnc
> process queue waiting for more requests to come. But this process is now
> waiting for IO to finish from journaling thread. After waiting for 8ms
> fsync's queue gives way to jbd's queue. Then we start idling on jbd
> thread and new IO from fsync is sitting in a separate queue in a separate
> group.
>
> Bottom line, that after every IO we end up idling on fysnc and jbd thread
> so much that if somebody is doing fsync after every 4K of IO, throughput
> nose dives.
>
> Similar issue had issue come up with-in same cgroup also when "fsync"
> and "jbd" thread were being queued on differnt service trees and idling
> was killing. At that point of time two solutions were proposed. One
> from Jeff Moyer and one from Corrado Zoccolo.
>
> Jeff came up with the idea of coming with block layer API to yield the
> queue if explicitly told by file system, hence cutting down on idling.
>
> https://lkml.org/lkml/2010/7/2/277
>
> Corrado, came up with a simpler approach of keeping jbd and fsync processes
> on same service tree by parsing RQ_NOIDLE flag. By queuing on same service
> tree, one queue preempts other queue hence cutting down on idling time.
> Upstream went ahead with simpler approach to fix the issue.
>
> commit 749ef9f8423054e326f3a246327ed2db4b6d395f
> Author: Corrado Zoccolo <[email protected]>
> Date: Mon Sep 20 15:24:50 2010 +0200
>
> cfq: improve fsync performance for small files
>
>
> Now with cgroups, same problem resurfaces but this time we can not queue
> both the processes on same service tree and take advantage of preemption
> as separate cgroups have separate service trees and both processes
> belong to separate cgroups. We do not allow cross cgroup preemption
> as that wil break down the isolation between groups.
>
> So this patch series resurrects Jeff's solution of file system specifying
> the IO dependencies between threads explicitly to the block layer/ioscheduler.
> One ioscheduler knows that current queue we are idling on is dependent on
> IO from some other queue, CFQ allows dispatch of requests from that other
> queue in the context of current active queue.

Vivek, I'm not sure this is a general solution. If we hand journal
IO off to a workqueue, then we've got no idea what the "dependent
task" is.

I bring this up as I have a current patchset that moves all the XFS
journal IO out of process context into a workqueue to solve
process-visible operation latency (e.g. 1000 mkdir syscalls run at
1ms each, the 1001st triggers a journal checkpoint and takes 500ms)
and background checkpoint submission races. This effectively means
that XFS will trigger the same bad CFQ behaviour on fsync, but have
no means of avoiding it because we don't have a specific task to
yield to.

And FWIW, we're going to be using workqueues more and more in XFS
for asynchronous processing of operations. I'm looking to use WQs
for speculative readahead of inodes, all our delayed metadata
writeback, log IO submission, free space allocation requests,
background inode allocation, background inode freeing, background
EOF truncation, etc to process as much work asynchronously outside
syscall context as possible (let's use all those CPU cores we
have!).

All of these things will push potentially dependent IO operations
outside of the bounds of the process actually doing the operation,
so some general solution to the "dependent IO in an undefined thread
context" problem really needs to be solved sooner rather than
later...

As it is, I don't have any good ideas of how to solve this, but I
thought it is worth bringing to your attention while you are trying
to solve a similar issue.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2011-06-28 11:02:14

by Konstantin Khlebnikov

Subject: Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

Vivek Goyal wrote:
> This patch series seems to be working for me. I did testing for ext4 only.
> This series is based on for-3.1/core branch of Jen's block tree.
> Konstantin, can you please give it a try and see if it fixes your
> issue.

It works for me too, for ext3 and ext4, on top of 3.0-rc5, after these trivial fixes:

--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1511,7 +1519,7 @@ static void cfq_add_rq_rb(struct request *rq)
* if that happens, put the alias on the dispatch list
*/
while ((__alias = elv_rb_add(&cfqq->sort_list, rq)) != NULL)
- cfq_dispatch_insert(cfqd->queue, __alias);
+ cfq_dispatch_insert(cfqd->queue, __alias, false);

if (!cfq_cfqq_on_rr(cfqq))
cfq_add_cfqq_rr(cfqd, cfqq);
@@ -3797,12 +3797,11 @@ cfq_set_depends_on_task(struct request_queue *q, struct task_struct *tsk)
*/
rcu_read_lock();
if (task_blkio_cgroup(current) == task_blkio_cgroup(tsk))
- return;
- rcu_read_unlock();
+ goto out_unlock_rcu;

cic = cfq_cic_lookup(cfqd, current->io_context);
if (!cic)
- return;
+ goto out_unlock_rcu;

task_lock(tsk);
spin_lock_irq(q->queue_lock);
@@ -3847,6 +3846,8 @@ cfq_set_depends_on_task(struct request_queue *q, struct task_struct *tsk)
out_unlock:
spin_unlock_irq(q->queue_lock);
task_unlock(tsk);
+out_unlock_rcu:
+ rcu_read_unlock();
}

2011-06-28 13:06:17

by Vivek Goyal

Subject: Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

On Tue, Jun 28, 2011 at 10:03:54AM +0800, Shaohua Li wrote:

[..]
> > > > Any feedback on how to solve this issue is appreciated.
> > > Hi Vivek,
> > > can we introduce a group think time check in cfq? say in a group the
> > > last queue is backed for the group and the queue is a non-idle queue, if
> > > the group think time is big, we don't allow the group idle and preempt
> > > could happen. The fsync thread is a non-idle queue with Corrado's patch,
> > > this allows fast group switch.
> >
> > In this case regular queue idle is hitting and not group idle. So some
> > kind of think time stats probably might be useful for group idle check
> > but not necessarily for queue idle.
> I thought your problem is group idle issue. fsync uses WRITE_SYNC, which
> will make the queue be sync-non-idle because REQ_NOIDLE is set. This is
> exactly what Corrado's patch for. a fsync queue itself isn't idle unless
> it's the last queue in a group. Am I missing anything?

We idle on the last queue on the sync-noidle tree. So we idle on the fsync
queue as it is the last queue on the sync-noidle tree. That's how we provide
protection to all sync-noidle queues against sync-idle queues. Instead of
idling on individual queues we do the idling as a group, on the service tree.

>
> > Secondly, for this case think time will change. If you stop idling on
> > fsync and jbd threads, both will be dispatching IOs fast and both will
> > have small thinktime. We will think that thinktime is small so we
> > will enable idle. Then there think time will increase as both will
> > get blocked behind each other. And then we will removing idling. So
> > looks like we will be oscillating between enabling and disabling
> > think time.
> That is possible, the think time check (even for queues) always has such
> issue. Not sure how severe the issue is. Assume jbd will dispatch
> several requests and this will make fsync thread think time big.
>
> > If we don't allow idling on sync-no-idle queues, then basic CFQ will
> > be broken.
> Hmm, CFQ only allows idling on sync queues, sync-no-idle queue isn't
> allowed idling.

See above.

Thanks
Vivek

2011-06-28 13:37:21

by Vivek Goyal

Subject: Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

On Tue, Jun 28, 2011 at 12:47:38PM +1000, Dave Chinner wrote:
>
> Vivek, I'm not sure this is a general solution. If we hand journal
> IO off to a workqueue, then we've got no idea what the "dependent
> task" is.
>
> I bring this up as I have a current patchset that moves all the XFS
> journal IO out of process context into a workqueue to solve
> process-visible operation latency (e.g. 1000 mkdir syscalls run at
> 1ms each, the 1001st triggers a journal checkpoint and takes 500ms)
> and background checkpoint submission races. This effectively means
> that XFS will trigger the same bad CFQ behaviour on fsync, but have
> no means of avoiding it because we don't have a specific task to
> yield to.
>
> And FWIW, we're going to be using workqueues more and more in XFS
> for asynchronous processing of operations. I'm looking to use WQs
> for speculative readahead of inodes, all our delayed metadata
> writeback, log IO submission, free space allocation requests,
> background inode allocation, background inode freeing, background
> EOF truncation, etc to process as much work asynchronously outside
> syscall context as possible (let's use all those CPU cores we
> have!).
>
> All of these things will push potentially dependent IO operations
> outside of the bounds of the process actually doing the operation,
> so some general solution to the "dependent IO in an undefined thread
> context" problem really needs to be solved sooner rather than
> later...
>
> As it is, I don't have any good ideas of how to solve this, but I
> thought it is worth bringing to your attention while you are trying
> to solve a similar issue.

Dave,

Couple of thoughts.

- We can introduce another block layer call where dependencies are set up
from worker thread context. So when the process schedules the work, it can
save the task information somewhere, and when the worker thread actually
calls the specified function, that function can set up the dependency
between the worker thread and the submitting task (a rough sketch of this
option follows below).

Probably the original process can tear down the dependency connection
when the IO is done. I am assuming that the IO submitting process is waiting
for all the IO to finish.

In the current framework one can specify multiple processes being dependent
on one thread but not vice versa. I think we should be able to
handle that by maintaining a linked list of dependent queues instead
of a single pointer. So if a process submits a bunch of jobs with the help
of a bunch of worker threads from multiple cpus, I think that case is
manageable with some extension to the current patches.

- Or we can try something more exotic: when we schedule a work item, one
should be able to tell which cgroup the worker should run in.
When the worker actually runs, it can migrate itself to the destination
cgroup and submit IO. This does not take care of cases like the
journalling thread, where multiple processes are dependent on a single
kernel thread. In that case the dependent queue solution above should work
well.

So I think the above API can be extended to also handle the case of work
queues, or we could look into migrating the worker into a user-specified
cgroup if that turns out to be a better solution.
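
Purely as a strawman of the first option above (none of these helpers exist
in the posted patches), the work item could carry the submitting task's
pointer and the worker could register the reverse dependency when it runs;
fs_io_work, fs_io_work_fn(), submit_journal_io() and
blk_set_task_depends_on_current() are all hypothetical names:

/* Hypothetical sketch only, illustrating the suggestion above */
struct fs_io_work {
	struct work_struct	work;
	struct request_queue	*q;
	struct task_struct	*submitter;	/* saved when the work is queued */
};

static void fs_io_work_fn(struct work_struct *work)
{
	struct fs_io_work *w = container_of(work, struct fs_io_work, work);

	/*
	 * Reverse of blk_set_depends_on_task(): tell the elevator that the
	 * submitter's queue depends on IO issued by this worker thread.
	 */
	blk_set_task_depends_on_current(w->q, w->submitter);

	submit_journal_io(w);	/* the actual IO work */

	/*
	 * As noted above, the submitting task would tear the dependency down
	 * itself (blk_reset_depends_on_task()) once its wait completes.
	 */
}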

Thanks
Vivek

2011-06-28 13:45:52

by Vivek Goyal

Subject: Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

On Tue, Jun 28, 2011 at 03:00:39PM +0400, Konstantin Khlebnikov wrote:
> Vivek Goyal wrote:
> >This patch series seems to be working for me. I did testing for ext4 only.
> >This series is based on for-3.1/core branch of Jen's block tree.
> >Konstantin, can you please give it a try and see if it fixes your
> >issue.
>
> It works for me too, for ext3 and ext4, on top 3.0-rc5, after these trivial fixes:
>
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -1511,7 +1519,7 @@ static void cfq_add_rq_rb(struct request *rq)
> * if that happens, put the alias on the dispatch list
> */
> while ((__alias = elv_rb_add(&cfqq->sort_list, rq)) != NULL)
> - cfq_dispatch_insert(cfqd->queue, __alias);
> + cfq_dispatch_insert(cfqd->queue, __alias, false);
>
> if (!cfq_cfqq_on_rr(cfqq))
> cfq_add_cfqq_rr(cfqd, cfqq);
> @@ -3797,12 +3797,11 @@ cfq_set_depends_on_task(struct request_queue *q, struct task_struct *tsk)
> */
> rcu_read_lock();
> if (task_blkio_cgroup(current) == task_blkio_cgroup(tsk))
> - return;
> - rcu_read_unlock();
> + goto out_unlock_rcu;
>
> cic = cfq_cic_lookup(cfqd, current->io_context);
> if (!cic)
> - return;
> + goto out_unlock_rcu;

You have done this change because you want to keep cfq_cic_lookup() also
inside the rcu read-side critical section? I am assuming that it works even
without this, though keeping it under rcu is probably more correct as
cic objects are freed in an rcu manner.

Thanks
Vivek

2011-06-28 14:44:00

by Konstantin Khlebnikov

Subject: Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

Vivek Goyal wrote:
> On Tue, Jun 28, 2011 at 03:00:39PM +0400, Konstantin Khlebnikov wrote:
>> Vivek Goyal wrote:
>>> This patch series seems to be working for me. I did testing for ext4 only.
>>> This series is based on for-3.1/core branch of Jen's block tree.
>>> Konstantin, can you please give it a try and see if it fixes your
>>> issue.
>>
>> It works for me too, for ext3 and ext4, on top 3.0-rc5, after these trivial fixes:
>>
>> --- a/block/cfq-iosched.c
>> +++ b/block/cfq-iosched.c
>> @@ -1511,7 +1519,7 @@ static void cfq_add_rq_rb(struct request *rq)
>> * if that happens, put the alias on the dispatch list
>> */
>> while ((__alias = elv_rb_add(&cfqq->sort_list, rq)) != NULL)
>> - cfq_dispatch_insert(cfqd->queue, __alias);
>> + cfq_dispatch_insert(cfqd->queue, __alias, false);
>>
>> if (!cfq_cfqq_on_rr(cfqq))
>> cfq_add_cfqq_rr(cfqd, cfqq);
>> @@ -3797,12 +3797,11 @@ cfq_set_depends_on_task(struct request_queue *q, struct task_struct *tsk)
>> */
>> rcu_read_lock();
>> if (task_blkio_cgroup(current) == task_blkio_cgroup(tsk))
>> - return;
>> - rcu_read_unlock();
>> + goto out_unlock_rcu;
>>
>> cic = cfq_cic_lookup(cfqd, current->io_context);
>> if (!cic)
>> - return;
>> + goto out_unlock_rcu;
>
> You have done this change because you want to keep cfq_cic_lookup() also
> in rcu read side critical section? I am assuming that it works even
> without this. Though keeping it under rcu is probably more correct as
> cic objects are freed in rcu manner.

No, you just forgot to release the rcu lock in the case task_blkio_cgroup(current) == task_blkio_cgroup(tsk).

>
> Thanks
> Vivek

2011-06-28 14:50:17

by Vivek Goyal

Subject: Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

On Tue, Jun 28, 2011 at 06:42:57PM +0400, Konstantin Khlebnikov wrote:
> Vivek Goyal wrote:
> >On Tue, Jun 28, 2011 at 03:00:39PM +0400, Konstantin Khlebnikov wrote:
> >>Vivek Goyal wrote:
> >>>This patch series seems to be working for me. I did testing for ext4 only.
> >>>This series is based on for-3.1/core branch of Jen's block tree.
> >>>Konstantin, can you please give it a try and see if it fixes your
> >>>issue.
> >>
> >>It works for me too, for ext3 and ext4, on top 3.0-rc5, after these trivial fixes:
> >>
> >>--- a/block/cfq-iosched.c
> >>+++ b/block/cfq-iosched.c
> >>@@ -1511,7 +1519,7 @@ static void cfq_add_rq_rb(struct request *rq)
> >> * if that happens, put the alias on the dispatch list
> >> */
> >> while ((__alias = elv_rb_add(&cfqq->sort_list, rq)) != NULL)
> >>- cfq_dispatch_insert(cfqd->queue, __alias);
> >>+ cfq_dispatch_insert(cfqd->queue, __alias, false);
> >>
> >> if (!cfq_cfqq_on_rr(cfqq))
> >> cfq_add_cfqq_rr(cfqd, cfqq);
> >>@@ -3797,12 +3797,11 @@ cfq_set_depends_on_task(struct request_queue *q, struct task_struct *tsk)
> >> */
> >> rcu_read_lock();
> >> if (task_blkio_cgroup(current) == task_blkio_cgroup(tsk))
> >>- return;
> >>- rcu_read_unlock();
> >>+ goto out_unlock_rcu;
> >>
> >> cic = cfq_cic_lookup(cfqd, current->io_context);
> >> if (!cic)
> >>- return;
> >>+ goto out_unlock_rcu;
> >
> >You have done this change because you want to keep cfq_cic_lookup() also
> >in rcu read side critical section? I am assuming that it works even
> >without this. Though keeping it under rcu is probably more correct as
> >cic objects are freed in rcu manner.
>
> No, your just forgot to relese rcu in case task_blkio_cgroup(current) == task_blkio_cgroup(tsk)

Oh, that's right. Thanks for catching this. I will fix it.

Thanks
Vivek

2011-06-28 21:20:41

by Vivek Goyal

Subject: Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

On Tue, Jun 28, 2011 at 10:47:44AM -0400, Vivek Goyal wrote:
[..]
> > >> rcu_read_lock();
> > >> if (task_blkio_cgroup(current) == task_blkio_cgroup(tsk))
> > >>- return;
> > >>- rcu_read_unlock();
> > >>+ goto out_unlock_rcu;
> > >>
> > >> cic = cfq_cic_lookup(cfqd, current->io_context);
> > >> if (!cic)
> > >>- return;
> > >>+ goto out_unlock_rcu;
> > >
> > >You have done this change because you want to keep cfq_cic_lookup() also
> > >in rcu read side critical section? I am assuming that it works even
> > >without this. Though keeping it under rcu is probably more correct as
> > >cic objects are freed in rcu manner.
> >
> > No, your just forgot to relese rcu in case task_blkio_cgroup(current) == task_blkio_cgroup(tsk)
>
> Oh, that's right. Thanks for catching this. I will fix it.

Here is the fixed version of the patch. I am now calling rcu_read_unlock()
before returning if the cgroups are the same.

block: A new interface for specifying IO dependencies among tasks

This patch introduces two new block layer interfaces which allow file
systems to specify IO dependencies among processes. For example, the fsync
process is dependent on IO completion from the journalling thread in ext3
and ext4.

blk_set_depends_on_task(struct request_queue *q, struct task_struct *tsk)
blk_reset_depends_on_task(struct request_queue *q)

The first one allows one to say that the current process is dependent on
IO completions from "tsk". This call will let CFQ know about the
dependency, and CFQ will allow "tsk"'s IO to be dispatched in the
allocated time slice of the calling process.

Once the dependency is over, one needs to tear down the mapping
by calling reset function.

Signed-off-by: Vivek Goyal <[email protected]>
---
block/blk-core.c | 42 ++++++++
block/cfq-iosched.c | 238 +++++++++++++++++++++++++++++++++++++++++++----
block/elevator.c | 16 +++
include/linux/blkdev.h | 8 +
include/linux/elevator.h | 6 +
5 files changed, 293 insertions(+), 17 deletions(-)

Index: linux-2.6-block/block/cfq-iosched.c
===================================================================
--- linux-2.6-block.orig/block/cfq-iosched.c 2011-06-28 16:32:45.672242961 -0400
+++ linux-2.6-block/block/cfq-iosched.c 2011-06-28 16:35:35.571928038 -0400
@@ -75,6 +75,8 @@ static DEFINE_IDA(cic_index_ida);
#define sample_valid(samples) ((samples) > 80)
#define rb_entry_cfqg(node) rb_entry((node), struct cfq_group, rb_node)

+static void cfq_put_request(struct request *rq);
+
/*
* Most of our rbtree usage is for sorting with min extraction, so
* if we cache the leftmost node we don't have to walk down the tree
@@ -148,6 +150,12 @@ struct cfq_queue {
struct cfq_group *cfqg;
/* Number of sectors dispatched from queue in single dispatch round */
unsigned long nr_sectors;
+
+ /*
+ * This cfqq's further IO is dependent on IO queued in the other
+ * cfqq pointed to by depends_on.
+ */
+ struct cfq_queue *depends_on;
};

/*
@@ -447,7 +455,7 @@ static inline int cfqg_busy_async_queues
+ cfqg->service_trees[BE_WORKLOAD][ASYNC_WORKLOAD].count;
}

-static void cfq_dispatch_insert(struct request_queue *, struct request *);
+static void cfq_dispatch_insert(struct request_queue *, struct request *, bool);
static struct cfq_queue *cfq_get_queue(struct cfq_data *, bool,
struct io_context *, gfp_t);
static struct cfq_io_context *cfq_cic_lookup(struct cfq_data *,
@@ -2043,16 +2051,23 @@ static void cfq_arm_slice_timer(struct c

/*
* Move request from internal lists to the request queue dispatch list.
+ * @rq_dequeued: rq to be dispatched has already been removed from associated
+ * cfqq. This is useful when rq from dependent queue is being
+ * dispatched in current queue context.
*/
-static void cfq_dispatch_insert(struct request_queue *q, struct request *rq)
+static void cfq_dispatch_insert(struct request_queue *q, struct request *rq,
+ bool rq_dequeued)
{
struct cfq_data *cfqd = q->elevator->elevator_data;
struct cfq_queue *cfqq = RQ_CFQQ(rq);

cfq_log_cfqq(cfqd, cfqq, "dispatch_insert");

- cfqq->next_rq = cfq_find_next_rq(cfqd, cfqq, rq);
- cfq_remove_request(rq);
+ if (!rq_dequeued) {
+ cfqq->next_rq = cfq_find_next_rq(cfqd, cfqq, rq);
+ cfq_remove_request(rq);
+ }
+
cfqq->dispatched++;
(RQ_CFQG(rq))->dispatched++;
elv_dispatch_sort(q, rq);
@@ -2332,6 +2347,10 @@ static struct cfq_queue *cfq_select_queu
if (!RB_EMPTY_ROOT(&cfqq->sort_list))
goto keep_queue;

+ /* There are requests in the cfqq we depend on. Allow dispatch */
+ if (cfqq->depends_on && !RB_EMPTY_ROOT(&cfqq->depends_on->sort_list))
+ goto keep_queue;
+
/*
* If another queue has a request waiting within our mean seek
* distance, let it run. The expire code will check for close
@@ -2402,7 +2421,7 @@ static int __cfq_forced_dispatch_cfqq(st
int dispatched = 0;

while (cfqq->next_rq) {
- cfq_dispatch_insert(cfqq->cfqd->queue, cfqq->next_rq);
+ cfq_dispatch_insert(cfqq->cfqd->queue, cfqq->next_rq, false);
dispatched++;
}

@@ -2534,6 +2553,77 @@ static bool cfq_may_dispatch(struct cfq_
}

/*
+ * This queue was not active and we might expire it because its request got
+ * dispatched in some other queue's context and it is an empty queue now
+ */
+static void
+cfq_expire_inactive_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq)
+{
+ /*
+ * If this cfqq is shared between multiple processes, check to
+ * make sure that those processes are still issuing I/Os within
+ * the mean seek distance. If not, it may be time to break the
+ * queues apart again.
+ */
+ if (cfq_cfqq_coop(cfqq) && CFQQ_SEEKY(cfqq))
+ cfq_mark_cfqq_split_coop(cfqq);
+
+ cfq_del_cfqq_rr(cfqd, cfqq);
+}
+
+static void cfq_set_dispatch_dependent_request(struct cfq_data *cfqd,
+ struct cfq_queue *cfqq)
+{
+ struct cfq_queue *dep_cfqq;
+ int rw;
+ struct request *rq;
+
+ dep_cfqq = cfqq->depends_on;
+
+ cfq_log_cfqq(cfqd, cfqq, "dispatch from dependent"
+ " queue pid=%d\n", dep_cfqq->pid);
+ /*
+ * Select a request from the queue we are dependent on. Dequeue
+ * the request from other queue and make rq belong to this
+ * queue and dispatch in this queue's context
+ */
+ rq = cfq_check_fifo(dep_cfqq);
+ if (!rq)
+ rq = dep_cfqq->next_rq;
+ cfq_remove_request(rq);
+ if (RB_EMPTY_ROOT(&dep_cfqq->sort_list))
+ cfq_expire_inactive_queue(cfqd, dep_cfqq);
+ /*
+ * Change the identity of request to belong to current cfqq
+ * cfqg and cic. Drop references to old cfqq, cfqg and cic.
+ */
+ cfqq->ref++;
+ rw = rq_data_dir(rq);
+ cfqq->allocated[rw]++;
+
+ /*
+ * If we are here that means we are idling on the queue and we
+ * must have dispatched at least one request and that must have
+ * set the cfqd->active_cic. Use that
+ */
+ BUG_ON(!cfqd->active_cic);
+ atomic_long_inc(&cfqd->active_cic->ioc->refcount);
+
+ cfq_put_request(rq);
+
+ rq->elevator_private[0] = cfqd->active_cic;
+ rq->elevator_private[1] = cfqq;
+ rq->elevator_private[2] = cfq_ref_get_cfqg(cfqq->cfqg);
+
+ if (cfq_cfqq_wait_request(cfqq))
+ cfq_del_timer(cfqd, cfqq);
+
+ cfq_clear_cfqq_wait_request(cfqq);
+
+ cfq_dispatch_insert(cfqd->queue, rq, true);
+}
+
+/*
* Dispatch a request from cfqq, moving them to the request queue
* dispatch list.
*/
@@ -2541,22 +2631,26 @@ static bool cfq_dispatch_request(struct
{
struct request *rq;

- BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list));
+ BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list) && !cfqq->depends_on);

if (!cfq_may_dispatch(cfqd, cfqq))
return false;

- /*
- * follow expired path, else get first next available
- */
- rq = cfq_check_fifo(cfqq);
- if (!rq)
- rq = cfqq->next_rq;
+ if (RB_EMPTY_ROOT(&cfqq->sort_list))
+ cfq_set_dispatch_dependent_request(cfqd, cfqq);
+ else {
+ /*
+ * follow expired path, else get first next available
+ */
+ rq = cfq_check_fifo(cfqq);
+ if (!rq)
+ rq = cfqq->next_rq;

- /*
- * insert request into driver dispatch list
- */
- cfq_dispatch_insert(cfqd->queue, rq);
+ /*
+ * insert request into driver dispatch list
+ */
+ cfq_dispatch_insert(cfqd->queue, rq, false);
+ }

if (!cfqd->active_cic) {
struct cfq_io_context *cic = RQ_CIC(rq);
@@ -2640,6 +2734,11 @@ static void cfq_put_queue(struct cfq_que
}

BUG_ON(cfq_cfqq_on_rr(cfqq));
+
+ /* This cfqq is going away. If there is a dependent queue, drop ref */
+ if (cfqq->depends_on)
+ cfq_put_queue(cfqq->depends_on);
+
kmem_cache_free(cfq_pool, cfqq);
cfq_put_cfqg(cfqg);
}
@@ -3670,6 +3769,111 @@ static int cfq_may_queue(struct request_
}

/*
+ * Calling task depends on "tsk" for further IO. It is the caller's responsibility
+ * to make sure the tsk pointer stays valid for the duration of this call
+ */
+static void
+cfq_set_depends_on_task(struct request_queue *q, struct task_struct *tsk)
+{
+ struct cfq_io_context *cic, *tsk_cic;
+ struct cfq_data *cfqd = q->elevator->elevator_data;
+ struct cfq_queue *cfqq, *tsk_cfqq, *__cfqq;
+
+ if (unlikely(!tsk))
+ return;
+
+ if (!tsk->io_context || !current->io_context)
+ return;
+
+ /*
+ * If two processes belong to same cgroup, no need to do this as
+ * both journalling thread and fsync process will go on same
+ * service tree and be able to preempt each other
+ */
+ rcu_read_lock();
+ if (task_blkio_cgroup(current) == task_blkio_cgroup(tsk)) {
+ rcu_read_unlock();
+ return;
+ }
+ rcu_read_unlock();
+
+ cic = cfq_cic_lookup(cfqd, current->io_context);
+ if (!cic)
+ return;
+
+ task_lock(tsk);
+ spin_lock_irq(q->queue_lock);
+
+ cfqq = cic_to_cfqq(cic, 1);
+ if (!cfqq)
+ goto out_unlock;
+
+ /* If cfqq already has a dependent queue, ignore this new queue */
+ if (cfqq->depends_on) {
+ cfq_log_cfqq(cfqd, cfqq, "depends on queue already set"
+ " old_pid=%d", cfqq->depends_on->pid);
+ goto out_unlock;
+ }
+
+ if (!tsk->io_context)
+ goto out_unlock;
+
+ tsk_cic = cfq_cic_lookup(cfqd, tsk->io_context);
+ if (!tsk_cic)
+ goto out_unlock;
+
+ tsk_cfqq = cic_to_cfqq(tsk_cic, 1);
+
+ if (!tsk_cfqq)
+ goto out_unlock;
+
+ /* Don't allow circular dependency among a group of queues */
+ __cfqq = tsk_cfqq;
+
+ while((__cfqq = __cfqq->depends_on)) {
+ if (__cfqq == cfqq)
+ goto out_unlock;
+ }
+
+ /* Take reference on task's cfqq */
+ tsk_cfqq->ref++;
+ cfqq->depends_on = tsk_cfqq;
+
+ cfq_log_cfqq(cfqd, cfqq, "set depends on queue pid= %d", tsk->pid);
+
+out_unlock:
+ spin_unlock_irq(q->queue_lock);
+ task_unlock(tsk);
+}
+
+static void cfq_reset_depends_on_task(struct request_queue *q)
+{
+ struct cfq_io_context *cic;
+ struct cfq_data *cfqd = q->elevator->elevator_data;
+ struct cfq_queue *cfqq;
+
+ cic = cfq_cic_lookup(cfqd, current->io_context);
+ if (!cic)
+ return;
+
+ spin_lock_irq(q->queue_lock);
+
+ cfqq = cic_to_cfqq(cic, 1);
+ if (!cfqq || !cfqq->depends_on) {
+ spin_unlock_irq(q->queue_lock);
+ return;
+ }
+
+ cfq_log_cfqq(cfqd, cfqq, "reset depends on queue");
+
+ /* Drop reference on task's cfqq */
+ cfq_put_queue(cfqq->depends_on);
+ cfqq->depends_on = NULL;
+ spin_unlock_irq(q->queue_lock);
+
+}
+
+/*
* queue lock held here
*/
static void cfq_put_request(struct request *rq)
@@ -4206,6 +4410,8 @@ static struct elevator_type iosched_cfq
.elevator_set_req_fn = cfq_set_request,
.elevator_put_req_fn = cfq_put_request,
.elevator_may_queue_fn = cfq_may_queue,
+ .elevator_set_depends_on_task_fn = cfq_set_depends_on_task,
+ .elevator_reset_depends_on_task_fn = cfq_reset_depends_on_task,
.elevator_init_fn = cfq_init_queue,
.elevator_exit_fn = cfq_exit_queue,
.trim = cfq_free_io_context,
Index: linux-2.6-block/block/blk-core.c
===================================================================
--- linux-2.6-block.orig/block/blk-core.c 2011-06-28 10:02:23.544839889 -0400
+++ linux-2.6-block/block/blk-core.c 2011-06-28 16:33:13.703491049 -0400
@@ -398,6 +398,19 @@ static int blk_init_free_list(struct req
return 0;
}

+static void
+generic_set_depends_on_task(struct request_queue *q, struct task_struct *tsk)
+{
+ if (q->elevator)
+ elv_set_depends_on_task(q, tsk);
+}
+
+static void generic_reset_depends_on_task(struct request_queue *q)
+{
+ if (q->elevator)
+ elv_reset_depends_on_task(q);
+}
+
struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
{
return blk_alloc_queue_node(gfp_mask, -1);
@@ -534,6 +547,8 @@ blk_init_allocated_queue_node(struct req
q->prep_rq_fn = NULL;
q->unprep_rq_fn = NULL;
q->queue_flags = QUEUE_FLAG_DEFAULT;
+ q->set_depends_on_task_fn = generic_set_depends_on_task;
+ q->reset_depends_on_task_fn = generic_reset_depends_on_task;

/* Override internal queue lock with supplied lock pointer */
if (lock)
@@ -2766,6 +2781,33 @@ void blk_finish_plug(struct blk_plug *pl
}
EXPORT_SYMBOL(blk_finish_plug);

+/*
+ * Give a hint to CFQ that current task is dependent on IO coming from
+ * "tsk". In such cases, CFQ will allow dispatch from "tsk" queue in
+ * the time slice of "current" process and this can cut down on
+ * unnecessary queue idling.
+ *
+ * Note: the CFQ implementation takes task_lock(tsk) and the queue lock (with
+ * interrupts disabled) internally, and releases both before returning.
+ */
+void blk_set_depends_on_task(struct request_queue *q, struct task_struct *tsk)
+{
+ if (q->set_depends_on_task_fn)
+ q->set_depends_on_task_fn(q, tsk);
+}
+EXPORT_SYMBOL(blk_set_depends_on_task);
+
+/*
+ * Tear down any dependent task mapping previously set up by the current
+ * task.
+ */
+void blk_reset_depends_on_task(struct request_queue *q)
+{
+ if (q->reset_depends_on_task_fn)
+ q->reset_depends_on_task_fn(q);
+}
+EXPORT_SYMBOL(blk_reset_depends_on_task);
+
int __init blk_dev_init(void)
{
BUILD_BUG_ON(__REQ_NR_BITS > 8 *
Index: linux-2.6-block/block/elevator.c
===================================================================
--- linux-2.6-block.orig/block/elevator.c 2011-06-28 16:32:45.673243005 -0400
+++ linux-2.6-block/block/elevator.c 2011-06-28 16:33:13.705491137 -0400
@@ -783,6 +783,22 @@ int elv_may_queue(struct request_queue *
return ELV_MQUEUE_MAY;
}

+void elv_set_depends_on_task(struct request_queue *q, struct task_struct *tsk)
+{
+ struct elevator_queue *e = q->elevator;
+
+ if (e->ops->elevator_set_depends_on_task_fn)
+ e->ops->elevator_set_depends_on_task_fn(q, tsk);
+}
+
+void elv_reset_depends_on_task(struct request_queue *q)
+{
+ struct elevator_queue *e = q->elevator;
+
+ if (e->ops->elevator_reset_depends_on_task_fn)
+ e->ops->elevator_reset_depends_on_task_fn(q);
+}
+
void elv_abort_queue(struct request_queue *q)
{
struct request *rq;
Index: linux-2.6-block/include/linux/blkdev.h
===================================================================
--- linux-2.6-block.orig/include/linux/blkdev.h 2011-06-28 16:32:46.020258283 -0400
+++ linux-2.6-block/include/linux/blkdev.h 2011-06-28 16:33:13.706491181 -0400
@@ -209,6 +209,8 @@ typedef int (merge_bvec_fn) (struct requ
typedef void (softirq_done_fn)(struct request *);
typedef int (dma_drain_needed_fn)(struct request *);
typedef int (lld_busy_fn) (struct request_queue *q);
+typedef void (set_depends_on_task_fn) (struct request_queue *q, struct task_struct *tsk);
+typedef void (reset_depends_on_task_fn) (struct request_queue *q);

enum blk_eh_timer_return {
BLK_EH_NOT_HANDLED,
@@ -283,7 +285,8 @@ struct request_queue
rq_timed_out_fn *rq_timed_out_fn;
dma_drain_needed_fn *dma_drain_needed;
lld_busy_fn *lld_busy_fn;
-
+ set_depends_on_task_fn *set_depends_on_task_fn;
+ reset_depends_on_task_fn *reset_depends_on_task_fn;
/*
* Dispatch queue sorting
*/
@@ -895,6 +898,9 @@ static inline bool blk_needs_flush_plug(
return plug && (!list_empty(&plug->list) || !list_empty(&plug->cb_list));
}

+extern void blk_set_depends_on_task(struct request_queue *q, struct task_struct *tsk);
+extern void blk_reset_depends_on_task(struct request_queue *q);
+
/*
* tag stuff
*/
Index: linux-2.6-block/include/linux/elevator.h
===================================================================
--- linux-2.6-block.orig/include/linux/elevator.h 2011-06-28 16:32:46.022258371 -0400
+++ linux-2.6-block/include/linux/elevator.h 2011-06-28 16:33:13.707491225 -0400
@@ -23,6 +23,8 @@ typedef void (elevator_add_req_fn) (stru
typedef struct request *(elevator_request_list_fn) (struct request_queue *, struct request *);
typedef void (elevator_completed_req_fn) (struct request_queue *, struct request *);
typedef int (elevator_may_queue_fn) (struct request_queue *, int);
+typedef void (elevator_set_depends_on_task_fn) (struct request_queue *, struct task_struct *);
+typedef void (elevator_reset_depends_on_task_fn) (struct request_queue *);

typedef int (elevator_set_req_fn) (struct request_queue *, struct request *, gfp_t);
typedef void (elevator_put_req_fn) (struct request *);
@@ -54,6 +56,8 @@ struct elevator_ops
elevator_put_req_fn *elevator_put_req_fn;

elevator_may_queue_fn *elevator_may_queue_fn;
+ elevator_set_depends_on_task_fn *elevator_set_depends_on_task_fn;
+ elevator_reset_depends_on_task_fn *elevator_reset_depends_on_task_fn;

elevator_init_fn *elevator_init_fn;
elevator_exit_fn *elevator_exit_fn;
@@ -114,6 +118,8 @@ extern struct request *elv_latter_reques
extern int elv_register_queue(struct request_queue *q);
extern void elv_unregister_queue(struct request_queue *q);
extern int elv_may_queue(struct request_queue *, int);
+extern void elv_set_depends_on_task(struct request_queue *q, struct task_struct *tsk);
+extern void elv_reset_depends_on_task(struct request_queue *q);
extern void elv_abort_queue(struct request_queue *);
extern void elv_completed_request(struct request_queue *, struct request *);
extern int elv_set_request(struct request_queue *, struct request *, gfp_t);

2011-06-29 01:05:04

by Shaohua Li

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

On Tue, 2011-06-28 at 21:04 +0800, Vivek Goyal wrote:
> On Tue, Jun 28, 2011 at 10:03:54AM +0800, Shaohua Li wrote:
>
> [..]
> > > > > Any feedback on how to solve this issue is appreciated.
> > > > Hi Vivek,
> > > > can we introduce a group think time check in cfq? say in a group the
> > > > last queue is backed for the group and the queue is a non-idle queue, if
> > > > the group think time is big, we don't allow the group idle and preempt
> > > > could happen. The fsync thread is a non-idle queue with Corrado's patch,
> > > > this allows fast group switch.
> > >
> > > In this case regular queue idle is hitting and not group idle. So some
> > > kind of think time stats probably might be useful for group idle check
> > > but not necessarily for queue idle.
> > I thought your problem is group idle issue. fsync uses WRITE_SYNC, which
> > will make the queue be sync-non-idle because REQ_NOIDLE is set. This is
> > exactly what Corrado's patch for. a fsync queue itself isn't idle unless
> > it's the last queue in a group. Am I missing anything?
>
> We idle on last queue on sync-noidle tree. So we idle on fsync queue as
> it is last queue on sync-noidle tree. That's how we provide protection
> to all sync-noidle queues against sync-idle queues. Instead of idling
> on individual queues we do idling in group and that is on service tree.
OK, but this looks silly. We are idling on a noidle service tree or a
group (backed by the last queue of the tree or group) because we assume
the tree or group can dispatch a request soon. But if the think time of
the tree or group is big, that assumption isn't true, and idling here is
blind. I think we can extend the think time check to both the service
tree and the group.
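
Roughly something like the sketch below (illustrative only, assuming the
service tree kept think time stats analogous to the per-cic
ttime_mean/ttime_samples; those fields don't exist on the tree today):

static bool cfq_st_should_idle(struct cfq_data *cfqd, struct cfq_rb_root *st)
{
	/*
	 * Hypothetical per-service-tree think time stats, updated the
	 * same way the per-cic ones are.
	 */
	if (!sample_valid(st->ttime_samples))
		return true;	/* not enough history, keep idling */

	/* recent gaps between IOs exceed the idle window: don't idle */
	if (st->ttime_mean > cfqd->cfq_slice_idle)
		return false;

	return true;
}

cfq_arm_slice_timer() could then skip arming the idle timer when this
returns false, and the same kind of check could drive the group idle
decision.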

Thanks,
Shaohua

2011-06-29 01:30:04

by Vivek Goyal

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

On Wed, Jun 29, 2011 at 09:04:55AM +0800, Shaohua Li wrote:

[..]
> > We idle on last queue on sync-noidle tree. So we idle on fsync queue as
> > it is last queue on sync-noidle tree. That's how we provide protection
> > to all sync-noidle queues against sync-idle queues. Instead of idling
> > on individual queues we do idling in group and that is on service tree.
> OK, but this looks silly. We are idling on a noidle service tree or a
> group (backed by the last queue of the tree or group) because we assume
> the tree or group can dispatch a request soon. But if the think time of
> the tree or group is big, that assumption isn't true, and idling here is
> blind. I think we can extend the think time check to both the service
> tree and the group.

We can implement the thinktime for noidle service tree and group idle as
well. That's not a problem, though I am yet to be convinced that thinktime
still makes sense for the group. I guess it will just mean that in the
past the group has done a bunch of IO with gaps between IOs of less than 8ms. If
yes, then we expect it to do more IO in the future. Frankly speaking, I am
not too sure how well the past IO pattern predicts the future IO pattern
of the group.

But anyway, the point is, even if we implement it, it will not solve
the fsync issue at hand, for the reason I explained in the previous mail. We
will be oscillating between high think time and low thinktime depending
on whether we are idling or not. There is no correlation between think
time of fsync thread and idling here.

I think you are banking on the fact that after fsync, journaling thread
IO can take more than 8ms, hence delaying the next IO from the fsync thread and
pushing its thinktime beyond 8ms, so we will not idle on the fsync thread at
all. But that is just one corner case, and I think it is broken in multiple
cases.

- If filesystem barriers are disabled or the backend storage has battery
backup, then journal IO will most likely go into the cache and barriers
will be ignored. In that case the write will finish almost instantly
and we will get the next IO from the fsync thread very soon, pushing
down the thinktime of the fsync thread, which will enable idling and we will
be back to the problem we are trying to solve.

- The fsync thread might be submitting a string of IOs (say 10-12) before it
moves to the journal thread to commit metadata. In that case we might
have lowered the thinktime of fsync enough to enable idling.

So implementing think time for the service tree/group might be a good idea
in general, but it will not solve this IO dependency issue across cgroups.

Thanks
Vivek

2011-06-30 00:29:46

by Shaohua Li

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] block: Fix fsync slowness with CFQ cgroups

On Wed, Jun 29, 2011 at 09:29:55AM +0800, Vivek Goyal wrote:
> On Wed, Jun 29, 2011 at 09:04:55AM +0800, Shaohua Li wrote:
>
> [..]
> > > We idle on last queue on sync-noidle tree. So we idle on fsync queue as
> > > it is last queue on sync-noidle tree. That's how we provide protection
> > > to all sync-noidle queues against sync-idle queues. Instead of idling
> > > on individual queues we do idling in group and that is on service tree.
> > OK, but this looks silly. We are idling on a noidle service tree or a
> > group (backed by the last queue of the tree or group) because we assume
> > the tree or group can dispatch a request soon. But if the think time of
> > the tree or group is big, that assumption isn't true, and idling here is
> > blind. I think we can extend the think time check to both the service
> > tree and the group.
>
> We can implement the thinktime for noidle service tree and group idle as
> well. That's not a problem, though I am yet to be convinced that thinktime
> still makes sense for the group. I guess it will just mean that in the
> past the group has done a bunch of IO with gaps between IOs of less than 8ms. If
> yes, then we expect it to do more IO in the future. Frankly speaking, I am
> not too sure how well the past IO pattern predicts the future IO pattern
> of the group.
>
> But anyway, the point is, even if we implement it, it will not solve
> the fsync issue at hand, for the reason I explained in the previous mail. We
> will be oscillating between high think time and low thinktime depending
> on whether we are idling or not. There is no correlation between think
> time of fsync thread and idling here.
>
> I think you are banking on the fact that after fsync, journaling thread
> IO can take more than 8ms, hence delaying the next IO from the fsync thread and
> pushing its thinktime beyond 8ms, so we will not idle on the fsync thread at
> all. But that is just one corner case, and I think it is broken in multiple
> cases.
>
> - If filesystem barriers are disabled or the backend storage has battery
> backup, then journal IO will most likely go into the cache and barriers
> will be ignored. In that case the write will finish almost instantly
> and we will get the next IO from the fsync thread very soon, pushing
> down the thinktime of the fsync thread, which will enable idling and we will
> be back to the problem we are trying to solve.
>
> - The fsync thread might be submitting a string of IOs (say 10-12) before it
> moves to the journal thread to commit metadata. In that case we might
> have lowered the thinktime of fsync enough to enable idling.
>
> So implementing think time for the service tree/group might be a good idea
> in general, but it will not solve this IO dependency issue across cgroups.
OK, fair enough. I'll give it a try and check how things change with the fsync workload.

Thanks,
Shaohua