Subject: Re: [BUG] cpu controller can't provide fair CPU time for each group
From: Peter Zijlstra
To: Yasunori Goto
Cc: Miao Xie, Linux-Kernel, containers, Ingo Molnar
Date: Wed, 11 Nov 2009 08:20:07 +0100
Message-ID: <1257924007.23203.18.camel@twins>

On Wed, 2009-11-11 at 15:21 +0900, Yasunori Goto wrote:
> When users use cpuset/cpu affinity, they would like to control cpu affinity.
> Not CPU time.

What are people using affinity for? The only use of affinity is to
restrict or disable the load-balancer. Don't complain that the
load-balancer doesn't work when you're taking active steps to hinder its
work.

If you don't want things load-balanced, turn it off; if you want the
load-balancer to work on smaller groups of cpus, use cpusets.

Anyway, I said something needs to be done, because the interaction
between cpusets and the cpu-controller is utter crap; they never should
have been separated like they are.

> To be honest, I don't have any good idea because I'm not familiar with
> the scheduler's code. But I have one question.
>
> static int tg_shares_up(struct task_group *tg, void *data)
> {
>         unsigned long weight, rq_weight = 0, shares = 0;
>
> (snip)
>
>         for_each_cpu(i, sched_domain_span(sd)) {
>                 weight = tg->cfs_rq[i]->load.weight;
>                 usd->rq_weight[i] = weight;
>
>                 /*
>                  * If there are currently no tasks on the cpu pretend there
>                  * is one of average load so that when a new task gets to
>                  * run here it will not get delayed by group starvation.
>                  */
>                 if (!weight)
>                         weight = NICE_0_LOAD;      ---------(*)
>
> I heard from the test team that when (*) was removed, 1) didn't occur.
>
> The comment says (*) is there to avoid a starvation condition.
> However, I don't understand why NICE_0_LOAD must be specified.
> Could you tell me why a small value (like 2 or 3) is not used for (*)?
> What is the side effect?

Exactly what the comment says: it will get delayed, because the group
won't get scheduled on that cpu until all the group weights get
re-adjusted again, which can take much longer than the typical runtimes
of the workload in question.

Regular weights are NICE_0_LOAD; if you stick a 3 next to that, it'll
not get run much -> starvation.