Message-ID: <0988dfae-69d0-4fbf-b145-15f6e853cbcc@infradead.org>
Date: Mon, 10 Jun 2024 19:20:48 -0700
Subject: Re: [PATCH v1 1/1] mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
To: jeffxu@chromium.org
Cc: akpm@linux-foundation.org, cyphar@cyphar.com, david@readahead.eu,
 dmitry.torokhov@gmail.com, dverkamp@chromium.org, hughd@google.com,
 jeffxu@google.com, jorgelo@chromium.org, keescook@chromium.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-mm@kvack.org, pobrn@protonmail.com, skhan@linuxfoundation.org,
 stable@vger.kernel.org
References: <20240607203543.2151433-1-jeffxu@google.com>
 <20240607203543.2151433-2-jeffxu@google.com>
From: Randy Dunlap
In-Reply-To: <20240607203543.2151433-2-jeffxu@google.com>

Hi--

On 6/7/24 1:35 PM, jeffxu@chromium.org wrote:
> From: Jeff Xu
>
> Add documentation for memfd_create flags: FMD_NOEXEC_SEAL

s/FMD/MFD/

> and MFD_EXEC
>
> Signed-off-by: Jeff Xu
> ---
>  Documentation/userspace-api/index.rst | 1 +
>  Documentation/userspace-api/mfd_noexec.rst | 86 ++++++++++++++++++++++
>  2 files changed, 87 insertions(+)
>  create mode 100644 Documentation/userspace-api/mfd_noexec.rst
>
> diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> index 5926115ec0ed..8a251d71fa6e 100644
> --- a/Documentation/userspace-api/index.rst
> +++ b/Documentation/userspace-api/index.rst
> @@ -32,6 +32,7 @@ Security-related interfaces
>     seccomp_filter
>     landlock
>     lsm
> +   mfd_noexec
>     spec_ctrl
>     tee
>
> diff --git a/Documentation/userspace-api/mfd_noexec.rst b/Documentation/userspace-api/mfd_noexec.rst
> new file mode 100644
> index 000000000000..0d2c840f37e1
> --- /dev/null
> +++ b/Documentation/userspace-api/mfd_noexec.rst
> @@ -0,0 +1,86 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==================================
> +Introduction of non executable mfd

                   non-executable mfd

> +==================================
> +:Author:
> +    Daniel Verkamp
> +    Jeff Xu
> +
> +:Contributor:
> +    Aleksa Sarai
> +
> +Since Linux introduced the memfd feature, memfd have always had their

                                             memfds
i.e., plural

> +execute bit set, and the memfd_create() syscall doesn't allow setting
> +it differently.
> +
> +However, in a secure by default system, such as ChromeOS, (where all

                   secure-by-default

> +executables should come from the rootfs, which is protected by Verified
> +boot), this executable nature of memfd opens a door for NoExec bypass
> +and enables “confused deputy attack”. E.g, in VRP bug [1]: cros_vm
> +process created a memfd to share the content with an external process,
> +however the memfd is overwritten and used for executing arbitrary code
> +and root escalation. [2] lists more VRP in this kind.

                                             of this kind.

> +
> +On the other hand, executable memfd has its legit use, runc uses memfd’s

                                                     use:

> +seal and executable feature to copy the contents of the binary then
> +execute them, for such system, we need a solution to differentiate runc's

           them. For such a system,

> +use of executable memfds and an attacker's [3].
> +
> +To address those above.

                    above:

> + - Let memfd_create() set X bit at creation time.
> + - Let memfd be sealed for modifying X bit when NX is set.
> + - A new pid namespace sysctl: vm.memfd_noexec to help applications to

     - Add a new                                          applications in

> +   migrating and enforcing non-executable MFD.
> +
> +User API
> +========
> +``int memfd_create(const char *name, unsigned int flags)``
> +
> +``MFD_NOEXEC_SEAL``
> +    When MFD_NOEXEC_SEAL bit is set in the ``flags``, memfd is created
> +    with NX. F_SEAL_EXEC is set and the memfd can't be modified to
> +    add X later. MFD_ALLOW_SEALING is also implied.
> +    This is the most common case for the application to use memfd.
> +
> +``MFD_EXEC``
> +    When MFD_EXEC bit is set in the ``flags``, memfd is created with X.
> +
> +Note:
> +    ``MFD_NOEXEC_SEAL`` implies ``MFD_ALLOW_SEALING``. In case that
> +    app doesn't want sealing, it can add F_SEAL_SEAL after creation.

       an app

> +
> +
> +Sysctl:
> +========
> +``pid namespaced sysctl vm.memfd_noexec``
> +
> +The new pid namespaced sysctl vm.memfd_noexec has 3 values:
> +
> + - 0: MEMFD_NOEXEC_SCOPE_EXEC
> +    memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
> +    MFD_EXEC was set.
> +
> + - 1: MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
> +    memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
> +    MFD_NOEXEC_SEAL was set.
> +
> + - 2: MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
> +    memfd_create() without MFD_NOEXEC_SEAL will be rejected.
> +
> +The sysctl allows finer control of memfd_create for old-software that

                                                     old software

> +doesn't set the executable bit, for example, a container with

                               bit;

> +vm.memfd_noexec=1 means the old-software will create non-executable memfd

                             old software

> +by default while new-software can create executable memfd by setting

                  new software

> +MFD_EXEC.
> +
> +The value of vm.memfd_noexec is passed to child namespace at creation
> +time, in addition, the setting is hierarchical, i.e. during memfd_create,

   time. In addition,

> +we will search from current ns to root ns and use the most restrictive
> +setting.
> +
> +[1] https://crbug.com/1305267
> +
> +[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20memfd%20escalation&can=1
> +
> +[3] https://lwn.net/Articles/781013/

-- 
~Randy
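
For illustration only, the sysctl rules quoted above (three levels, defaulting of unflagged calls, and "most restrictive setting from current ns to root ns") could be modeled in plain userspace C roughly as below. This is a sketch under stated assumptions, not the kernel's code: the names effective_scope(), resolve_flags(), the SCOPE_* enum and the MODEL_MFD_* bits are hypothetical stand-ins, and the flag values are not taken from the uapi headers. It also assumes that an unflagged call at level 2 receives the level-1 default first and is therefore accepted, which is one reading of the quoted text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the numeric values follow the 0/1/2 levels in the text. */
enum noexec_scope {
    SCOPE_EXEC = 0,            /* unflagged memfd_create() acts like MFD_EXEC */
    SCOPE_NOEXEC_SEAL = 1,     /* unflagged memfd_create() acts like MFD_NOEXEC_SEAL */
    SCOPE_NOEXEC_ENFORCED = 2, /* calls without MFD_NOEXEC_SEAL are rejected */
};

/* Stand-in flag bits for this model only (assumption, not the real uapi values). */
#define MODEL_MFD_NOEXEC_SEAL 0x1u
#define MODEL_MFD_EXEC        0x2u

/*
 * chain[0] is the current namespace's setting, chain[depth - 1] the root's.
 * The effective level is the most restrictive (largest) value on the path.
 */
int effective_scope(const int *chain, size_t depth)
{
    int scope = SCOPE_EXEC;

    assert(chain != NULL || depth == 0);
    for (size_t i = 0; i < depth; i++)
        if (chain[i] > scope)
            scope = chain[i];
    return scope;
}

/*
 * Apply the per-level defaulting to *flags.
 * Returns 0 if the call would proceed, -1 if it would be rejected.
 */
int resolve_flags(int scope, unsigned int *flags)
{
    /* Neither flag given: pick the default for the effective level. */
    if (!(*flags & (MODEL_MFD_EXEC | MODEL_MFD_NOEXEC_SEAL))) {
        if (scope >= SCOPE_NOEXEC_SEAL)
            *flags |= MODEL_MFD_NOEXEC_SEAL;
        else
            *flags |= MODEL_MFD_EXEC;
    }

    /* At the enforced level, anything that is not NOEXEC_SEAL is refused. */
    if (scope >= SCOPE_NOEXEC_ENFORCED && !(*flags & MODEL_MFD_NOEXEC_SEAL))
        return -1;
    return 0;
}
```

With a namespace chain of {0, 2, 1} (current, parent, root), effective_scope() picks level 2, so a caller passing only the executable flag would be refused even though its own namespace is set to 0. That is the hierarchical behavior the last quoted paragraph describes.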