Message-ID: <534DEC7B.3070300@huawei.com>
Date: Wed, 16 Apr 2014 10:35:39 +0800
From: Li Zefan
To: Tejun Heo
Subject: Re: [PATCHSET cgroup/for-3.16] cgroup: implement unified hierarchy, v2
In-Reply-To: <1397511430-2673-1-git-send-email-tj@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2014/4/15 5:36, Tejun Heo wrote:
> Hello,
>
> This is v2 of the unified hierarchy patchset. Changes from v1[1] are,
>
> * Rebased on top of v3.15-rc1
>
> * Interface file "cgroup.controllers" which was only available in the
>   root is now available in all cgroups. This allows, e.g., a
>   sub-manager in charge of a subtree to tell which controllers are
>   available to it.
>
> cgroup currently allows creating arbitrary number of hierarchies and
> any number of controllers may be associated with a given tree. This
> allows for huge amount of variance how tasks are associated with
> various cgroups and controllers; unfortunately, the variance is
> extreme to the extent that it unnecessarily complicates capabilities
> which can otherwise be straight-forward and hinders implementation of
> features which can benefit from coordination among different
> controllers.
>
> Here are some of the issues which we're facing with the current
> multiple hierarchies.
>
> * cgroup membership of a task can't be described in finite number of
>   paths. As there can be arbitrary number of hierarchies, the key
>   describing a task's cgroup membership can be arbitrarily long. This
>   is painful when userland or other parts of the kernel needs to take
>   cgroup membership into account and leads to proliferation of
>   controllers which are just there to identify membership rather than
>   actually control resources, which in turn exacerbates the problem.
>
> * Different controllers may or may not reside on the same hierarchy.
>   Features or optimizations which can benefit from sharing the
>   hierarchical organization either can't be implemented or becomes
>   overly complicated.
>
> * Tasks of a process may belong to different cgroups, which doesn't
>   make any sense for some controllers. Those controllers end up
>   ignoring such configurations in their own ways leading to
>   inconsistent behavior. In addition, in-process resource control
>   fundamentally isn't something which belongs to cgroup. As it has to
>   be visible to the binary for the process, it must be part of the
>   stable programming interface which is easily accessible to the
>   process proper in an easy race-free way.
>
> * The current cgroup allows cgroups which have child cgroups to have
>   tasks in it. This means that the child cgroups end up competing
>   against the internal tasks. This introduces inherent ambiguity as
>   the two are separate types of entities and the latter doesn't have
>   the same control knobs assigned to them.
>
>   Different controllers are dealing with the issue in different ways.
>   cpu treats internal tasks and child cgroups as equivalents, which
>   makes giving a child cgroup a given ratio of the parent's cpu time
>   difficult as the number of competing entities may fluctuate without
>   any indication.
>   blkio, in my misguided attempt to deal with the
>   issue, introduced a whole duplicate set of knobs for internal tasks
>   and deal with them as if they belong to a separate child cgroup
>   making the interface and implementation a mess. memcg seems
>   somewhat ambiguous on the issue but there are attempts to introduce
>   ad-hoc modifications to tilt the way it's handled to suit specific
>   use cases.
>
>   This is an inherent problem. All of the solutions that different
>   controllers came up with are unsatisfactory, the different behaviors
>   greatly increases the level of inconsistency and complicates the
>   controller implementations.
>
> This patchset finally implements the default unified hierarchy. The
> goal is providing enough flexibility while enforcing stricter common
> structure where appropriate to address the above listed issues.
>
> Controllers which aren't bound to other hierarchies are
> automatically attached to the unified hierarchy, which is different in
> that controllers are enabled explicitly for each subtree.
> "cgroup.subtree_control" controls which controllers are enabled on the
> child cgroups. Let's assume a hierarchy like the following.
>
>   root - A - B - C
>                \ D
>
> root's "cgroup.subtree_control" determines which controllers are
> enabled on A. A's on B. B's on C and D. This coincides with the
> fact that controllers on the immediate sub-level are used to
> distribute the resources of the parent. In fact, it's natural to
> assume that resource control knobs of a child belong to its parent.
> Enabling a controller in "cgroup.subtree_control" declares that
> distribution of the respective resources of the cgroup will be
> controlled. Note that this means that controller enable states are
> shared among siblings.
>
> The default hierarchy has an extra restriction - only cgroups which
> don't contain any task may have controllers enabled in
> "cgroup.subtree_control".
> Combined with the other properties of the
> default hierarchy, this guarantees that, from the view point of
> controllers, tasks are only on the leaf cgroups. In other words, only
> leaf csses may contain tasks. This rules out situations where child
> cgroups compete against internal tasks of the parent.
>
> This patchset contains the following twelve patches.
>
>  0001-cgroup-update-cgroup-subsys_mask-to-child_subsys_mas.patch
>  0002-cgroup-introduce-effective-cgroup_subsys_state.patch
>  0003-cgroup-implement-cgroup-e_csets.patch
>  0004-cgroup-make-css_next_child-skip-missing-csses.patch
>  0005-cgroup-reorganize-css_task_iter.patch
>  0006-cgroup-teach-css_task_iter-about-effective-csses.patch
>  0007-cgroup-cgroup-subsys-should-be-cleared-after-the-css.patch
>  0008-cgroup-allow-cgroup-creation-and-suppress-automatic-.patch
>  0009-cgroup-add-css_set-dfl_cgrp.patch
>  0010-cgroup-update-subsystem-rebind-restrictions.patch
>  0011-cgroup-prepare-migration-path-for-unified-hierarchy.patch
>  0012-cgroup-implement-dynamic-subtree-controller-enable-d.patch
>

Acked-by: Li Zefan
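
For anyone following along, the "cgroup.subtree_control" rules quoted
above can be sketched as a small toy model. This is only an
illustration in Python, not kernel code and not how the patchset
implements it. The Cgroup class, its method names and the task id are
hypothetical. The model also assumes that a cgroup can only hand down
controllers which are enabled on itself, and it rejects attaching tasks
to a cgroup that has controllers enabled for its children; the latter
is simply one way to keep the "tasks only on leaf cgroups" guarantee.

```python
class Cgroup:
    """Toy model of one cgroup on the unified hierarchy (hypothetical, not kernel code)."""

    def __init__(self, name, parent=None, root_controllers=()):
        self.name = name
        self.parent = parent
        self.tasks = set()
        # Controllers this cgroup enables on its children.
        self.subtree_control = set()
        # Only meaningful for the root: controllers attached to the hierarchy.
        self.root_controllers = set(root_controllers)

    def controllers(self):
        """Controllers enabled on this cgroup, i.e. granted by its parent."""
        if self.parent is None:
            return set(self.root_controllers)
        return set(self.parent.subtree_control)

    def enable(self, controller):
        """Model enabling a controller in this cgroup's cgroup.subtree_control."""
        if self.tasks:
            # Restriction from the mail: only task-free cgroups may enable controllers.
            raise ValueError(f"{self.name}: contains tasks, cannot enable {controller}")
        if controller not in self.controllers():
            # Assumption of this sketch: a cgroup can only pass on what it has.
            raise ValueError(f"{self.name}: {controller} is not available here")
        self.subtree_control.add(controller)

    def attach(self, task):
        """Model moving a task into this cgroup."""
        if self.subtree_control:
            # Assumption of this sketch: keeps tasks on leaves from the controllers' view.
            raise ValueError(f"{self.name}: controllers enabled for children, cannot hold tasks")
        self.tasks.add(task)


# The hierarchy from the mail: root - A - B - C
#                                          \ D
root = Cgroup("root", root_controllers={"cpu", "memory"})
a = Cgroup("A", parent=root)
b = Cgroup("B", parent=a)
c = Cgroup("C", parent=b)
d = Cgroup("D", parent=b)

root.enable("cpu")      # root's subtree_control decides what A gets
root.enable("memory")
a.enable("cpu")         # A's decides what B gets
b.enable("cpu")         # B's decides what C and D get, as siblings
c.attach(1234)          # hypothetical task id on a leaf

for cg in (a, b, c, d):
    print(cg.name, sorted(cg.controllers()))
```

In this model A ends up with cpu and memory, while B, C and D only have
cpu. C and D always report the same set, because the enable state lives
in B. Calling c.enable("cpu") or b.attach(...) at this point raises
ValueError, which mirrors the restriction that internal tasks and
controlled children can't coexist.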