Message-ID: <4B25A4F0.60407@cn.fujitsu.com>
Date: Mon, 14 Dec 2009 10:37:36 +0800
From: Gui Jianfeng
To: Vivek Goyal, Corrado Zoccolo
CC: Jens Axboe, linux kernel mailing list
Subject: Re: [PATCH] cfq: Take whether cfq group is changed into account when choosing service tree
References: <4B21D252.1060902@cn.fujitsu.com> <20091211150727.GB2756@redhat.com> <4e5e476b0912111001h3c0b9798u2a2b25c9fcc39504@mail.gmail.com> <20091211184630.GA7066@redhat.com>
In-Reply-To: <20091211184630.GA7066@redhat.com>

Vivek Goyal wrote:
> On Fri, Dec 11, 2009 at 07:01:14PM +0100, Corrado Zoccolo wrote:
>> Hi guys,
>> On Fri, Dec 11, 2009 at 4:07 PM, Vivek Goyal wrote:
>>> On Fri, Dec 11, 2009 at 01:02:10PM +0800, Gui Jianfeng wrote:
>>>> Currently, with the IO controller introduced, CFQ chooses the cfq group
>>>> at the top, and then chooses the service tree. So we need to take
>>>> whether the cfq group has changed into account when deciding whether
>>>> we should start choosing the service tree from scratch.
>>>>
>>> I am not able to understand the need/purpose of this patch. Once we
>>> have switched groups during scheduling, why should we reset the order
>>> of workloads within the group?
>> I understand it, and in fact I was thinking about this.
>> The idea is the same as with priorities. If we have not serviced a group
>> for a while, we want to start with the no-idle workload to reduce its
>> latency.
>>
>> Unfortunately, a group may have too small a share, and that could cause
>> some workloads to be starved, as Vivek correctly points out; we should
>> avoid this. It should be easily reproducible by testing a "many groups
>> with mixed workloads" scenario with group_isolation=1.
>>
>> Moreover, even if the approach is groups on top, when group isolation
>> is not set (i.e. the default), the non-root groups will only contain
>> the sync-idle queues, so logically it is much more similar to a
>> workload on top than it appears from the code.
>>
>> I think the net result of this patch is, when group isolation is not
>> set, to select the no-idle workload first only when entering the root
>> group, thus slightly penalizing the async workload.
>
> Also penalization of the sync-idle workload in the root group.
>
> The higher latencies for the sync-noidle workload within a group will be
> observed only if group_isolation=1 and the group has a low weight. I think
> in general this problem should be solved by bumping up the weight of the
> group, otherwise just live with it to ensure fairness for the sync-idle
> workload in the group.
>
> With group_isolation=0, the problem should be less visible as the root
> group runs with the maximum weight in the system (1000). In general I will
> assume that one will choose system weights in such a manner that the root
> group does not starve.
>
>> Gui, if you observed improvements with this patch, probably you can obtain
>> them without the starvation drawback by making it conditional on
>> !group_isolation.
>>
>> BTW, since you always use cfqg_changed and prio_changed OR-ed together, you
>> can have just one argument, called e.g. boost_no_idle, and you pass
>> (!group_isolation && cfqg_changed) || prio_changed.
>>
>
> Because it will impact the share of the sync-idle workload in the root group
> and the async workload system-wide, I am not too keen on making this change.
> I would rather rely on the root group having a higher weight.
>
> So far this reset of the workload order happens only if we decide to switch
> to a higher prio class. So if there is some RT activity happening in the
> system, we will constantly be resetting the workload order for the BE class
> and keep on servicing the sync-noidle workload while starving the sync-idle
> and async workloads.
>
> So it is probably still ok for priority class switching, but I am not too
> keen on doing this at group switching events. Such events will be too
> frequent if there is a significant number of groups in the system, and we
> don't want to starve the root group's sync-idle workload and the system-wide
> async workload.
>
> If somebody is not happy with the latencies of the sync-noidle workload, the
> easier way to solve that issue is to readjust the weights of the groups.
>
> Thanks
> Vivek

Hi Vivek, Corrado,

Thanks for Corrado's explanation; I have the same concern.

Consider the following code:

	prio_changed = (cfqd->serving_prio != previous_prio);
	st = service_tree_for(cfqg, cfqd->serving_prio,
				cfqd->serving_type, cfqd);
	count = st->count;

	/*
	 * If priority didn't change, check workload expiration,
	 * and that we still have other queues ready
	 */
	if (!prio_changed && count &&
	    !time_after(jiffies, cfqd->workload_expires))
		return;

One more reason I made this change is that, currently, if serving_prio has
not changed we start from where we left off. But with io groups introduced,
cfqd->serving_prio and previous_prio might come from different groups, so I
don't think "prio_changed" still makes sense once the group has changed.
Likewise, cfqd->workload_expires might have been set by a workload in another
group when we decide whether to keep serving cfqd->serving_type. A rough
sketch of the check I have in mind is below my signature.

Thanks,
Gui
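
To make the idea concrete, here is a rough, userspace-only sketch of the
check along the lines of Corrado's boost_no_idle suggestion. This is not the
actual cfq-iosched code and not my patch; the struct and field names below
are purely illustrative stand-ins for the corresponding cfqd state.

	/*
	 * Standalone model (not kernel code) of the "keep current workload
	 * or reset?" decision in choose_service_tree(), with cfqg_changed
	 * and prio_changed folded into one boost_no_idle flag as Corrado
	 * suggested. A group switch only forces a reset when
	 * group_isolation is off.
	 */
	#include <stdbool.h>
	#include <stdio.h>

	struct choose_ctx {
		bool group_isolation;   /* stand-in for the sysfs tunable */
		bool cfqg_changed;      /* did we just switch cfq groups? */
		bool prio_changed;      /* did the serving prio class change? */
		bool workload_expired;  /* time_after(jiffies, workload_expires) */
		int  queues_ready;      /* stand-in for st->count */
	};

	/* Return true if we should keep servicing the current workload type. */
	static bool keep_current_workload(const struct choose_ctx *c)
	{
		bool boost_no_idle = (!c->group_isolation && c->cfqg_changed) ||
				     c->prio_changed;

		/*
		 * Mirrors the early return above: if nothing forces a reset,
		 * the slice has not expired, and queues are still ready,
		 * continue where we left off.
		 */
		return !boost_no_idle && c->queues_ready > 0 &&
		       !c->workload_expired;
	}

	int main(void)
	{
		struct choose_ctx c = {
			.group_isolation = false,
			.cfqg_changed = true,   /* group switch, isolation off */
			.prio_changed = false,
			.workload_expired = false,
			.queues_ready = 2,
		};

		/* Prints "no": reset and start from the sync-noidle workload. */
		printf("keep current workload: %s\n",
		       keep_current_workload(&c) ? "yes" : "no");
		return 0;
	}

The point of gating the reset on !group_isolation is exactly the trade-off
discussed above: with isolation off, only the root group carries sync-noidle
and async queues, so the reset mostly helps latency there, while with
isolation on an unconditional reset could starve the sync-idle and async
workloads that Vivek is worried about.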