From: Jann Horn
Date: Mon, 2 Nov 2020 18:10:06 +0100
Subject: Re: [RFC PATCH v1 4/4] Allow to change the user namespace in which user rlimits are counted
To: Alexey Gladkov
Cc: LKML, Linux Containers, Kernel Hardening, Alexey Gladkov, "Eric W. Biederman", Kees Cook, Christian Brauner
In-Reply-To: <2718f7b13189dfd159414efb68e3533552593140.1604335819.git.gladkov.alexey@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 2, 2020 at 5:52 PM Alexey Gladkov wrote:
> Add a new prctl to change the user namespace in which the process
> counter is located. A pointer to the user namespace is in cred struct
> to be inherited by all child processes.
[...]
> +	case PR_SET_RLIMIT_USER_NAMESPACE:
> +		if (!capable(CAP_SYS_RESOURCE))
> +			return -EPERM;
> +
> +		switch (arg2) {
> +		case PR_RLIMIT_BIND_GLOBAL_USERNS:
> +			error = set_rlimit_ns(&init_user_ns);
> +			break;
> +		case PR_RLIMIT_BIND_CURRENT_USERNS:
> +			error = set_rlimit_ns(current_user_ns());
> +			break;
> +		default:
> +			error = -EINVAL;
> +		}
> +		break;

I don't see how this can work.
capable() requires that current_user_ns() == &init_user_ns, so you can't use this API to bind rlimits to any other user namespace.

Fundamentally, if it requires CAP_SYS_RESOURCE, this probably can't be done as an API that a process uses to change its own rlimit scope. In that case I would implement this as part of clone3() instead of prctl(). (Then a caller in init_user_ns can set it if it has CAP_SYS_RESOURCE. If you want to support doing the same thing with nested namespaces, you'd also need a flag that the first-level clone3() can set on the namespace to say "further rlimit splitting should be allowed".)

Or alternatively, we could say that CAP_SYS_RESOURCE doesn't matter, and instead you're allowed to move the rlimit scope if your current hard rlimit is RLIM_INFINITY. That might make more sense? Maybe?
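To make the capable() point concrete, here is a heavily simplified userspace model of how the kernel's cap_capable() walks the user namespace hierarchy. The struct layouts and every model_* name are hypothetical stand-ins, and LSM hooks, securebits and the real kuid_t types are ignored; it sketches the rule, it is not kernel code. The property that matters: a task holds capabilities in its own namespace, and full capabilities in a child namespace owned by its euid (and that child's descendants), but never in an ancestor, and init_user_ns is everyone's ancestor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct user_namespace and
 * struct cred; these are NOT the kernel's real layouts. */
struct model_user_ns {
	const struct model_user_ns *parent; /* NULL only for the initial ns */
	int level;                          /* 0 for the initial ns */
	unsigned int owner_euid;            /* euid (in parent) of the creator */
};

struct model_cred {
	const struct model_user_ns *user_ns; /* namespace the task lives in */
	unsigned int euid;
	unsigned long long cap_effective;    /* bit n set => capability n held */
};

/* Root of the hierarchy, playing the role of init_user_ns. */
static const struct model_user_ns model_init_user_ns = { NULL, 0, 0 };

/* Same value as CAP_SYS_RESOURCE in <linux/capability.h>. */
#define MODEL_CAP_SYS_RESOURCE 24

/* Does `cred` hold capability `cap` over namespace `target`?
 * Follows the shape of the loop in the kernel's cap_capable(). */
static bool model_ns_capable(const struct model_cred *cred,
			     const struct model_user_ns *target, int cap)
{
	assert(cred != NULL && target != NULL);
	for (;;) {
		/* In the task's own namespace: check the capability bit. */
		if (target == cred->user_ns)
			return (cred->cap_effective >> cap) & 1ULL;
		/* Target is at or above the task's level, i.e. an ancestor
		 * or unrelated namespace: no capability there. */
		if (target->level <= cred->user_ns->level)
			return false;
		/* The task's euid owns this direct child namespace. */
		if (target->parent == cred->user_ns &&
		    target->owner_euid == cred->euid)
			return true;
		/* Capabilities in a parent extend to its children. */
		target = target->parent;
	}
}

/* capable(cap) behaves like ns_capable(&init_user_ns, cap). */
static bool model_capable(const struct model_cred *cred, int cap)
{
	return model_ns_capable(cred, &model_init_user_ns, cap);
}
```

In this model, a task that is root with every capability inside a child namespace still fails model_capable(), so a capable(CAP_SYS_RESOURCE) gate only admits callers whose current_user_ns() is init_user_ns, and for them PR_RLIMIT_BIND_CURRENT_USERNS and PR_RLIMIT_BIND_GLOBAL_USERNS select the same namespace.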