Subject: Re: [PATCH 3/3] sched: Implement interface for cgroup unified hierarchy
From: Kamezawa Hiroyuki
To: Paul Turner, Tejun Heo
Cc: Austin S Hemmelgarn, Peter Zijlstra, Ingo Molnar, Johannes Weiner,
    lizefan@huawei.com, cgroups, LKML, kernel-team, Linus Torvalds,
    Andrew Morton
Date: Tue, 25 Aug 2015 11:36:25 +0900
Message-ID: <55DBD4A9.7080603@jp.fujitsu.com>

On 2015/08/25 8:15, Paul Turner wrote:
> On Mon, Aug 24, 2015 at 3:49 PM, Tejun Heo wrote:
>> Hello,
>>
>> On Mon, Aug 24, 2015 at 03:03:05PM -0700, Paul Turner wrote:
>>>> Hmm... I was hoping for actual configurations and usage scenarios.
>>>> Preferably something people can set up and play with.
>>>
>>> This is much easier to set up and play with synthetically. Just
>>> create the 10 threads and 100 threads above, then experiment with
>>> configurations designed to guarantee the set of 100 threads
>>> relatively uniform throughput regardless of how many are active. I
>>> don't think trying to run a VM stack adds anything except complexity
>>> of reproduction here.
>>
>> Well, but that loses most of the details and why such use cases
>> matter to begin with. We can imagine up stuff to induce an arbitrary
>> set of requirements.
>
> All that's being proved or disproved here is that it's difficult to
> coordinate the consumption of asymmetric thread pools using nice. The
> constraints here were drawn from a real-world example.
>
>>>> I take it that the CPU-intensive helper threads are usually IO
>>>> workers? Is this the scenario where the VM is set up with a lot of
>>>> IO devices and different ones may consume large amounts of CPU
>>>> cycles at any given point?
>>>
>>> Yes, generally speaking there are a few major classes of IO (flash,
>>> disk, network) that a guest may invoke. Each of these backends is
>>> separate and chooses its own threading.
>>
>> Hmmm... if that's the case, would limiting iops on those IO devices
>> (or classes of them) work? qemu already implements an IO limit
>> mechanism after all.
>
> No.
>
> 1) They should proceed at the maximum rate they can that is still
> within their provisioning budget.
> 2) The cost/IO is both inconsistent and changes over time. Attempting
> to micro-optimize every backend for this is infeasible; this is
> exactly the type of problem that the scheduler can usefully help
> arbitrate.
> 3) Even pretending (2) is fixable, dynamically dividing these
> right-to-work tokens between different IO device backends is
> extremely complex.
>

I think I should explain my customer's use cases of qemu + cpuset/cpu
(via libvirt).

(1) Isolating hypervisor threads.
    As already discussed, hypervisor threads are isolated with cpuset.
    The purpose is to avoid _latency_ spikes caused by hypervisor
    activity, so "nice" cannot be a solution, as already discussed.
    (A rough sketch of this configuration follows below.)

(2) Fixed-rate vcpu service.
    Using the cpu controller's quota/period feature, my customer
    creates vcpu models such as Low (1GHz), Mid (2GHz), and High (3GHz)
    for an IaaS system. To do this, each vcpu must be quota-limited
    independently, with per-thread cpu control. (See the second sketch
    below.)

In particular, method (1) is used by several enterprise customers to
stabilize their systems. Sub-process control should be provided in
some way.
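
To make (1) concrete, here is a minimal sketch of the kind of cpuset
setup libvirt drives for us. This is not libvirt's actual code; the
mount point, CPU ranges, and the TID are illustrative, and it assumes
a cgroup v1 cpuset hierarchy:

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

static void cg_write(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		exit(1);
	}
	fputs(val, f);
	fclose(f);
}

int main(void)
{
	/* Assumes cgroup v1 cpuset mounted at /sys/fs/cgroup/cpuset;
	 * mkdir() errors (e.g. already exists) are ignored for brevity. */
	mkdir("/sys/fs/cgroup/cpuset/vm1", 0755);
	cg_write("/sys/fs/cgroup/cpuset/vm1/cpuset.cpus", "0-7");
	cg_write("/sys/fs/cgroup/cpuset/vm1/cpuset.mems", "0");

	mkdir("/sys/fs/cgroup/cpuset/vm1/emulator", 0755);
	/* Keep hypervisor (emulator) threads off the vcpu CPUs 2-7. */
	cg_write("/sys/fs/cgroup/cpuset/vm1/emulator/cpuset.cpus", "0-1");
	cg_write("/sys/fs/cgroup/cpuset/vm1/emulator/cpuset.mems", "0");

	/* Writing a TID to "tasks" moves that single thread; 12345
	 * stands in for a real emulator TID. */
	cg_write("/sys/fs/cgroup/cpuset/vm1/emulator/tasks", "12345");
	return 0;
}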
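
And a matching sketch for (2), turning the Low/Mid/High models into
per-vcpu quotas. The 3GHz host clock, the paths, and the TIDs are
assumptions for illustration only:

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

#define HOST_MHZ	3000	/* assumed host clock: 3GHz */
#define PERIOD_US	100000	/* default CFS period */

static void cg_write(const char *dir, const char *file, long val)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "%s/%s", dir, file);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		exit(1);
	}
	fprintf(f, "%ld", val);
	fclose(f);
}

int main(void)
{
	int model_mhz[] = { 1000, 2000, 3000 };	/* Low / Mid / High */
	char dir[128];
	int i;

	for (i = 0; i < 3; i++) {
		snprintf(dir, sizeof(dir), "/sys/fs/cgroup/cpu/vm1/vcpu%d", i);
		mkdir(dir, 0755);

		cg_write(dir, "cpu.cfs_period_us", PERIOD_US);
		/* A "1GHz" vcpu on the 3GHz host gets 1/3 of a core:
		 * quota = period * model / host = 33333us for vcpu0. */
		cg_write(dir, "cpu.cfs_quota_us",
			 (long)PERIOD_US * model_mhz[i] / HOST_MHZ);

		/* Hypothetical TIDs; each vcpu thread is moved
		 * individually via the per-thread "tasks" file. */
		cg_write(dir, "tasks", 1000L + i);
	}
	return 0;
}

Note that both sketches depend on the per-thread "tasks" file; that is
exactly the interface that disappears without sub-process control.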

Thanks,
-Kame

>> Anyways, a point here is that threads of the same process competing
>> isn't a new problem. There are many ways to make those threads play
>> nice, as the application itself often has to be involved anyway,
>> especially for something like qemu which is heavily involved in
>> provisioning resources.
>
> It's certainly not a new problem, but it's a real one, and it's
> _hard_. You're proposing removing the best known solution.
>
>> cgroups can be a nice brute-force add-on which lets sysadmins do wild
>> things, but it's inherently hacky and incomplete for coordinating
>> threads. For example, what is it gonna do if qemu cloned vcpus and IO
>> helpers dynamically off of the same parent thread?
>
> We're talking about sub-process usage here. This is the application
> coordinating itself, NOT the sysadmin. Processes are becoming larger
> and larger; we need many of the same controls within them that we have
> between them.
>
>> It requires the application's cooperation anyway, but at the same
>> time is painful for those applications to actually interact with.
>
> As discussed elsewhere on the thread, this is really not a problem if
> you define consistent rules with respect to which parts are managed by
> whom. The argument of potential interference is no different from
> messing with an application's on-disk configuration behind its back.
> Alternate strawmen which greatly improve this from where we are today
> have also been proposed.
>
>> Thanks.
>>
>> --
>> tejun