Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965032AbWH2Pcq (ORCPT ); Tue, 29 Aug 2006 11:32:46 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S965033AbWH2Pcq (ORCPT ); Tue, 29 Aug 2006 11:32:46 -0400 Received: from mailhub.sw.ru ([195.214.233.200]:52509 "EHLO relay.sw.ru") by vger.kernel.org with ESMTP id S965032AbWH2Pcq (ORCPT ); Tue, 29 Aug 2006 11:32:46 -0400 Message-ID: <44F45ED7.3050708@sw.ru> Date: Tue, 29 Aug 2006 19:35:51 +0400 From: Kirill Korotaev User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.13) Gecko/20060417 X-Accept-Language: en-us, en, ru MIME-Version: 1.0 To: Andrew Morton CC: Linux Kernel Mailing List , Alan Cox , Christoph Hellwig , Pavel Emelianov , Andrey Savochkin , devel@openvz.org, Rik van Riel , Andi Kleen , Greg KH , Oleg Nesterov , Matt Helsley , Rohit Seth , Chandra Seetharaman Subject: Re: [PATCH] BC: resource beancounters (v2) References: <44EC31FB.2050002@sw.ru> <20060823100532.459da50a.akpm@osdl.org> <44EEE3BB.10303@sw.ru> <20060825073003.e6b5ae16.akpm@osdl.org> In-Reply-To: <20060825073003.e6b5ae16.akpm@osdl.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3892 Lines: 96 Andrew Morton wrote: > On Fri, 25 Aug 2006 15:49:15 +0400 > Kirill Korotaev wrote: > > >>>We need to go over this work before we can commit to the BC >>>core. Last time I looked at the VM accounting patch it >>>seemed rather unpleasing from a maintainability POV. >> >>hmmm... in which regard? > > > Little changes all over the MM code which might get accidentally broken. > > >>>And, if I understand it correctly, the only response to a job >>>going over its VM limits is to kill it, rather than trimming >>>it. Which sounds like a big problem? >> >>No, UBC virtual memory management refuses occur on mmap()'s. > > > That's worse, isn't it? Firstly it rules out big sparse mappings and secondly 1) if mappings are private then yes, you can not mmap too much. This is logical, since this whole mappings are potentially allocatable and there is no way to control it later except for SIGKILL. 2) if mappings are shared file mappings (shmem is handled in similar way) then you can mmap as much as you want, since these pages can be reclaimed. > mmap_and_use(80% of container size) > fork_and_immediately_exec(/bin/true) > > will fail at the fork? yes, it will fail on fork() or exec() in case of much private (1) mappings. fail on fork() and exec() is much friendlier then SIGKILL, don't you think so? private mappings parameter which is limited by UBC is a kind of upper estimation of container RSS. From our experience such an estimation is ~ 5-20% higher then a real physical memory used (with real-life applications). >>Andrey Savochkin wrote already a brief summary on vm resource management: >> >>------------- cut ---------------- >>The task of limiting a container to 4.5GB of memory bottles down to the >>question: what to do when the container starts to use more than assigned >>4.5GB of memory? >> >>At this moment there are only 3 viable alternatives. >> >>A) Have separate memory management for each container, >> with separate buddy allocator, lru lists, page replacement mechanism. >> That implies a considerable overhead, and the main challenge there >> is sharing of pages between these separate memory managers. >> >>B) Return errors on extension of mappings, but not on page faults, where >> memory is actually consumed. >> In this case it makes sense to take into account not only the size of used >> memory, but the size of created mappings as well. >> This is approximately what "privvmpages" accounting/limiting provides in >> UBC. >> >>C) Rely on OOM killer. >> This is a fall-back method in UBC, for the case "privvmpages" limits >> still leave the possibility to overload the system. >> > > > D) Virtual scan of mm's in the over-limit container > > E) Modify existing physical scanner to be able to skip pages which > belong to not-over-limit containers. > > F) Something else ;) We fully agree that other possible algorithms can and should exist. My idea only is that any of them would need accounting anyway (which is the most part of beancounters). Throtling, modified scanners etc. can be implemented as a separate BC parameters. Thus, an administrator will be able to select which policy should be applied to the container which is near its limit. So the patches I'm trying to send are a step-by-step accounting of all the resources and their simple limitations. More comprehensive limitation policy will be built on top of it later. BTW, UBC page beancounters allow to distinguish pages used by only one container and pages which are shared. So scanner can try to reclaim container private pages first, thus not influencing other containers. Thanks, Kirill - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/