From: "Michael Kerrisk (man-pages)"
To: Tejun Heo
Cc: mtk.manpages@gmail.com, "Serge E. Hallyn", lkml, linux-man,
    cgroups@vger.kernel.org
Subject: Re: cgroups(7): documenting the nsdelegate mount option
Date: Tue, 9 Jan 2018 00:26:28 +0100
Message-ID: <4768da37-ba3e-7d0c-841a-1b026c558bdd@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello Tejun,

Here is my attempt to document cgroup v2 delegation using
'nsdelegate'. Could you please take a look at the text and let me
know if anything needs fixing?

   Cgroups v2 delegation: nsdelegate and cgroup namespaces
       Starting with Linux 4.13, there is a second way to perform
       cgroup delegation. This is done by mounting the cgroup v2
       filesystem with the nsdelegate mount option:

           $ mount -t cgroup2 -o nsdelegate none /sys/fs/cgroup/unified

       The effect of this option is to cause cgroup namespaces to
       automatically become delegation boundaries. More specifically,
       the following restrictions apply for processes inside the
       cgroup namespace:

       *  Writes to controller interface files in the root directory
          will fail with the error EPERM. Processes inside the cgroup
          namespace can still write to delegatable files such as
          cgroup.procs and cgroup.subtree_control, and can create a
          subhierarchy underneath the root directory of the cgroup
          namespace.

       *  Attempts to migrate processes across the namespace boundary
          are denied (with the error ENOENT). Processes inside the
          cgroup namespace can still (subject to the containment rules
          described below) move processes between cgroups within the
          subhierarchy under the namespace root.

       The ability to define cgroup namespaces as delegation
       boundaries makes cgroup namespaces more useful. To understand
       why, suppose that we already have one cgroup hierarchy that has
       been delegated to a nonprivileged user, cecilia, using the
       older delegation technique described above. Suppose that
       cecilia also wanted to delegate a subhierarchy under the
       existing delegated hierarchy. (For example, the delegated
       hierarchy might be associated with an unprivileged container
       run by cecilia.) Even if a cgroup namespace was employed,
       because both hierarchies are owned by the unprivileged user
       cecilia, the following illegitimate actions could be performed:

       *  A process in the inferior hierarchy could change the
          resource controller settings in the root directory of that
          hierarchy. (These resource controller settings are intended
          to allow control to be exercised from the parent cgroup; a
          process inside the child cgroup should not be allowed to
          modify them.)

       *  A process inside the inferior hierarchy could move processes
          into and out of the inferior hierarchy if the cgroups in the
          superior hierarchy were somehow visible.

       Employing the nsdelegate mount option prevents both of these
       possibilities.

       The nsdelegate mount option has an effect only when the mount
       is performed in the initial mount namespace; in other mount
       namespaces, the option is silently ignored.
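Because the option is silently ignored outside the initial mount
namespace, it may be useful to check whether a cgroup v2 mount actually
carries it. The following is only a minimal sketch, not part of the
proposed man-page text. It assumes that the option shows up as
"nsdelegate" in the options field of /proc/mounts-format text, and the
function name cgroup2_nsdelegate_mounts is hypothetical:

```python
def cgroup2_nsdelegate_mounts(mounts_text):
    """Map each cgroup2 mount point to True if 'nsdelegate' is set.

    mounts_text is text in /proc/mounts format, one mount per line:
        device mount_point fstype options dump pass
    Assumption: the kernel reports the option as 'nsdelegate' in the
    comma-separated options field.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and every filesystem other than cgroup2.
        if len(fields) < 4 or fields[2] != "cgroup2":
            continue
        options = fields[3].split(",")
        result[fields[1]] = "nsdelegate" in options
    return result
```

On a live system, the function could be given the contents of
/proc/mounts; a mount point that maps to False was mounted without the
option, or had it ignored.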

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/