Date: Mon, 2 Nov 2020 18:30:24 +0100
From: Alexey Gladkov
To: Jann Horn
Cc: LKML, Linux Containers, Kernel Hardening, "Eric W . Biederman", Kees Cook, Christian Brauner
Subject: Re: [RFC PATCH v1 4/4] Allow to change the user namespace in which user rlimits are counted
Message-ID: <20201102173024.oflzudkq6cnolqyr@comp-core-i7-2640m-0182e6>
References: <2718f7b13189dfd159414efb68e3533552593140.1604335819.git.gladkov.alexey@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 02, 2020 at 06:10:06PM +0100, Jann Horn wrote:
> On Mon, Nov 2, 2020 at 5:52 PM Alexey Gladkov wrote:
> > Add a new prctl to change the user namespace in which the process
> > counter is located. A pointer to the user namespace is in cred struct
> > to be inherited by all child processes.
> [...]
> > +	case PR_SET_RLIMIT_USER_NAMESPACE:
> > +		if (!capable(CAP_SYS_RESOURCE))
> > +			return -EPERM;
> > +
> > +		switch (arg2) {
> > +		case PR_RLIMIT_BIND_GLOBAL_USERNS:
> > +			error = set_rlimit_ns(&init_user_ns);
> > +			break;
> > +		case PR_RLIMIT_BIND_CURRENT_USERNS:
> > +			error = set_rlimit_ns(current_user_ns());
> > +			break;
> > +		default:
> > +			error = -EINVAL;
> > +		}
> > +		break;
>
> I don't see how this can work. capable() requires that
> current_user_ns()==&init_user_ns, so you can't use this API to bind
> rlimits to any other user namespace.
>
> Fundamentally, if it requires CAP_SYS_RESOURCE, this probably can't be
> done as an API that a process uses to change its own rlimit scope. In
> that case I would implement this as part of clone3() instead of
> prctl(). (Then init_user_ns can set it if the caller has
> CAP_SYS_RESOURCE. If you want to have support for doing the same thing
> with nested namespaces, you'd also need a flag that the first-level
> clone3() can set on the namespace to say "further rlimit splitting
> should be allowed".)
>
> Or alternatively, we could say that CAP_SYS_RESOURCE doesn't matter,
> and instead you're allowed to move the rlimit scope if your current
> hard rlimit is INFINITY. That might make more sense? Maybe?

I think you are right. CAP_SYS_RESOURCE is not needed here since you still
cannot exceed the rlimit in the parent user namespace.

-- 
Rgrds, legion
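
For illustration only, the alternative rule suggested in the quoted text (allow moving the rlimit scope when the current hard rlimit is INFINITY) can be modelled in userspace C. This is a rough sketch under assumptions, not code from the patch: both helper names are hypothetical, RLIMIT_NPROC is assumed because the patch description talks about the "process counter", and a real check would sit in the kernel next to set_rlimit_ns() rather than in a getrlimit() caller.

```c
#include <stdbool.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not part of the patch): models the rule
 * "the rlimit scope may be moved only if the hard limit is unlimited".
 */
static bool rlimit_scope_move_allowed(const struct rlimit *rl)
{
	return rl->rlim_max == RLIM_INFINITY;
}

/*
 * Hypothetical wrapper: applies the rule to the calling process.
 * RLIMIT_NPROC is an assumption; the thread only mentions the
 * "process counter".
 * Returns 1 if allowed, 0 if not, -1 if getrlimit() fails.
 */
static int current_nproc_scope_move_allowed(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NPROC, &rl) != 0)
		return -1;

	return rlimit_scope_move_allowed(&rl) ? 1 : 0;
}
```

A caller would treat a return value of 1 as "scope may be moved" and anything else as a refusal. Unlike the capable(CAP_SYS_RESOURCE) test, this condition does not depend on which user namespace the caller is in.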