From: Alexey Gladkov
To: LKML, Linux Containers, Kernel Hardening
Cc: Alexey Gladkov, "Eric W. Biederman", Kees Cook, Christian Brauner
Subject: [RFC PATCH v1 0/4] Per user namespace rlimits
Date: Mon, 2 Nov 2020 17:50:29 +0100
X-Mailing-List: linux-kernel@vger.kernel.org

Preface
-------
These patches are for binding the rlimits to a user in the user namespace.
This patch set can be applied on top of:

git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.8-2-g43e210d68200

Problem
-------
Some rlimits are set per user: RLIMIT_NPROC, RLIMIT_MEMLOCK, RLIMIT_SIGPENDING,
RLIMIT_MSGQUEUE. When several containers are created by the same user, the
processes inside the containers influence each other.

Eric W. Biederman mentioned this issue [1][2][3].

Introduced changes
------------------
To fix this problem, the counters for these rlimits can be bound to the user
within the user namespace. By default, to preserve backward compatibility, only
the initial user namespace is used. This patch set adds one more prctl
parameter to change the binding to the user namespace.

This does not let the user take more resources than allowed in the parent user
namespace, because it only virtualizes the rlimit counter. Limits in all parent
user namespaces are taken into account.

For example, this allows us to run multiple containers as the same user and set
RLIMIT_NPROC to 1 inside each of them.

ToDo
----
* RLIMIT_MEMLOCK, RLIMIT_SIGPENDING and RLIMIT_MSGQUEUE are not implemented.
* No documentation.
* No tests.

[1] https://lore.kernel.org/containers/87imd2incs.fsf@x220.int.ebiederm.org/
[2] https://lists.linuxfoundation.org/pipermail/containers/2020-August/042096.html
[3] https://lists.linuxfoundation.org/pipermail/containers/2020-October/042524.html

Changelog
---------
v1:
* After discussion with Eric W. Biederman, I increased the size of ucounts to
  atomic_long_t.
* Added ucount_max to avoid the fork bomb.
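
The rule stated under "Introduced changes", that limits in all parent user
namespaces are taken into account, can be illustrated with a small userspace
model. This is only a sketch and not the code from these patches: the names
struct ns_counter, charge_process() and uncharge_process() are hypothetical,
and the model ignores locking and atomics. It charges a new process at every
level of the namespace tree and undoes the partial charges if any level is
already at its limit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a per-user process counter in one user namespace. */
struct ns_counter {
	struct ns_counter *parent;	/* NULL for the initial user namespace */
	long count;			/* processes currently charged at this level */
	long max;			/* RLIMIT_NPROC-like limit at this level */
};

/*
 * Charge one process at every level from @ns up to the root.
 * Returns true on success. On failure nothing stays charged.
 */
bool charge_process(struct ns_counter *ns)
{
	struct ns_counter *iter, *failed = NULL;

	for (iter = ns; iter; iter = iter->parent) {
		if (iter->count >= iter->max) {
			failed = iter;
			break;
		}
		iter->count++;
	}
	if (!failed)
		return true;

	/* Roll back the levels that were charged before the failing one. */
	for (iter = ns; iter != failed; iter = iter->parent)
		iter->count--;
	return false;
}

/* Release one process at every level from @ns up to the root. */
void uncharge_process(struct ns_counter *ns)
{
	for (; ns; ns = ns->parent)
		ns->count--;
}
```

In this model, two child namespaces that share one parent and each have a
limit of 1 can each hold one process without affecting the other, while the
parent's own limit still caps the total.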
--

Alexey Gladkov (4):
  Increase size of ucounts to atomic_long_t
  Move the user's process counter to ucounts
  Do not allow fork if RLIMIT_NPROC is exceeded in the user namespace tree
  Allow to change the user namespace in which user rlimits are counted

 fs/exec.c                      | 13 ++++++---
 fs/io-wq.c                     | 25 +++++++++++++-----
 fs/io-wq.h                     |  1 +
 fs/io_uring.c                  |  1 +
 include/linux/cred.h           |  8 ++++++
 include/linux/sched.h          |  3 +++
 include/linux/sched/user.h     |  1 -
 include/linux/user_namespace.h | 12 +++++++--
 include/uapi/linux/prctl.h     |  5 ++++
 kernel/cred.c                  | 44 ++++++++++++++++++++++++-------
 kernel/exit.c                  |  2 +-
 kernel/fork.c                  | 13 ++++++---
 kernel/sys.c                   | 26 ++++++++++++++++--
 kernel/ucount.c                | 48 +++++++++++++++++++++++++++++-----
 kernel/user.c                  |  3 ++-
 kernel/user_namespace.c        |  3 +++
 16 files changed, 171 insertions(+), 37 deletions(-)

-- 
2.25.4