Message-ID: <447A2853.2080901@vilain.net>
Date: Mon, 29 May 2006 10:46:43 +1200
From: Sam Vilain
To: Björn Steinbrink
Cc: Mike Galbraith, Peter Williams, Con Kolivas, Linux Kernel,
    Kingsley Cheung, Ingo Molnar, Rene Herman, Herbert Poetzl,
    Kirill Korotaev
Subject: Re: [RFC 0/5] sched: Add CPU rate caps
In-Reply-To: <20060526161148.GA23680@atjola.homenet>

Björn Steinbrink wrote:

>> The killer problem I see with this approach is that it doesn't address
>> the divide and conquer problem. If a task is capped, and forks off
>> workers, each worker inherits the total cap, effectively extending same.

Yes, although the current thinking is that you need to set a special
clone() flag (which may be restricted via capabilities such as
CAP_SYS_RESOURCE) to set up a new CPU scheduling namespace, so the
workers will inherit the same scheduling namespace and therefore be
accounted against the one resource.

Sorry if I'm replying out of context; I'll catch up on this thread
shortly. Thanks for including me.
>> IMHO, per task resource management is too severely limited in its
>> usefulness, because jobs are what need managing, and they're seldom
>> single threaded. In order to use per task limits to manage any given
>> job, you have to both know the number of components, and manually
>> distribute resources to each component of the job. If a job has a
>> dynamic number of components, it becomes impossible to manage.
>
> Linux-VServer uses a token bucket scheduler (TBS) to limit CPU resources
> for processes in a "context". All processes in a context share one token
> bucket, which has a set of parameters to tune scheduling behaviour.
> As the token bucket is shared by a group of processes, and inherited by
> child processes/threads, management is quite easy. And the parameters
> can be tuned to allow different scheduling behaviours, like allowing a
> process group to burst, i.e. use as much CPU time as is available after
> being idle for some time, while being limited to X% CPU time on average.

This is correct. Basically I read LARTC.org (which explains the Linux
network schedulers, among other things), and its description of the
Token Bucket Scheduler inspired me to write the same thing for CPU
resources. It was originally developed for the 2.4 Alan Cox series
kernels.

The primary design guarantee of the scheduler is a low total performance
impact and maximum CPU utilisation; prioritisation and fairness are
secondary concerns. It was built with the idea that people wanting
different sorts of scheduling policies could at least get a set of
userland controls to implement their approach, to the limit of the
effectiveness of task priorities.

I most recently described this at http://lkml.org/lkml/2006/3/29/59; a
lot of that thread is probably worth catching up on. It would be nice if
we could somehow re-use the scheduling algorithms from the network space
here, if that doesn't impact on performance.
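To make the mechanism concrete, here is a minimal user-space sketch in C
of the token bucket idea described above; all names and parameters are
illustrative, not VServer's actual in-kernel interface:

```c
/* Hypothetical per-context token bucket state. */
struct tb_state {
    int tokens;      /* current fill level */
    int tokens_max;  /* bucket capacity (allows bursting after idle) */
    int fill_rate;   /* tokens added per refill interval */
    int interval;    /* timer ticks between refills */
};

/* Called once per timer tick while the context has runnable tasks:
 * refill every `interval` ticks, then try to consume one token.
 * Returns 1 if the context may use this tick, 0 if the bucket is
 * empty - at which point a soft cap would merely deprioritise the
 * context, while a hard cap would leave the CPU idle instead. */
static int tb_tick(struct tb_state *tb, unsigned long tick)
{
    if (tick % tb->interval == 0) {
        tb->tokens += tb->fill_rate;
        if (tb->tokens > tb->tokens_max)
            tb->tokens = tb->tokens_max;
    }
    if (tb->tokens > 0) {
        tb->tokens--;
        return 1;
    }
    return 0;
}
```

Over the long run a busy context gets roughly fill_rate/interval of the
CPU, while the bucket depth tokens_max lets it burst after being idle,
matching the behaviour described above.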
For instance, the "CBQ" network scheduler takes the same approach as
OpenVZ's CPU scheduler, and the classful Token Bucket Filter is the
approach used in VServer. The "sched_prio" and "sched_hard" distinction
in VServer could probably be compared to ingress policing, where
available CPU that could run a process instead sits idle - similar to
the network world, where data that has already arrived is dropped to try
to convince the application to throttle its network activity.

As in the network space (http://lkml.org/lkml/2006/5/19/216), in this
space we have a continuous scale of possible implementations, marked by
a highly efficient solution akin to "binding" at one end, and
virtualisation at the other. Each delivers guarantees most applicable to
certain users or workloads.

Sam.

> I'm CC'ing Herbert and Sam on this as they can explain the whole thing
> a lot better, and I'm not familiar with the implementation details.
>
> Regards
> Björn

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/