From: Jeff Xu <[email protected]>
By default, memfd_create() creates a non-sealable MFD, unless the
MFD_ALLOW_SEALING flag is set.
When the MFD_NOEXEC_SEAL flag is initially introduced, the MFD created
with that flag is sealable, even though MFD_ALLOW_SEALING is not set.
This patch changes MFD_NOEXEC_SEAL to be non-sealable by default,
unless MFD_ALLOW_SEALING is explicitly set.
This is a non-backward compatible change. However, as MFD_NOEXEC_SEAL
is new, we expect not many applications will rely on the nature of
MFD_NOEXEC_SEAL being sealable. In most cases, the application already
sets MFD_ALLOW_SEALING if they need a sealable MFD.
Additionally, this enhances the useability of pid namespace sysctl
vm.memfd_noexec. When vm.memfd_noexec equals 1 or 2, the kernel will
add MFD_NOEXEC_SEAL if mfd_create does not specify MFD_EXEC or
MFD_NOEXEC_SEAL, and the addition of MFD_NOEXEC_SEAL enables the MFD
to be sealable. This means, any application that does not desire this
behavior will be unable to utilize vm.memfd_noexec = 1 or 2 to
migrate/enforce non-executable MFD. This adjustment ensures that
applications can anticipate that the sealable characteristic will
remain unmodified by vm.memfd_noexec.
This patch was initially developed by Barnabás Pőcze, and Barnabás
used Debian Code Search and GitHub to try to find potential breakages
and could only find a single one. Dbus-broker's memfd_create() wrapper
is aware of this implicit `MFD_ALLOW_SEALING` behavior, and tries to
work around it [1]. This workaround will break. Luckily, this only
affects the test suite, it does not affect
the normal operations of dbus-broker. There is a PR with a fix[2]. In
addition, David Rheinsberg also raised similar fix in [3]
[1]: https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a8784cb/src/util/misc.c#L114
[2]: https://github.com/bus1/dbus-broker/pull/366
[3]: https://lore.kernel.org/lkml/[email protected]/
History
======
V2:
update commit message.
add testcase for vm.memfd_noexec
add documentation.
V1:
https://lore.kernel.org/lkml/[email protected]/
Jeff Xu (2):
memfd: fix MFD_NOEXEC_SEAL to be non-sealable by default
memfd:add MEMFD_NOEXEC_SEAL documentation
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/mfd_noexec.rst | 90 ++++++++++++++++++++++
mm/memfd.c | 9 +--
tools/testing/selftests/memfd/memfd_test.c | 26 ++++++-
4 files changed, 120 insertions(+), 6 deletions(-)
create mode 100644 Documentation/userspace-api/mfd_noexec.rst
--
2.45.1.288.g0e0cd299f1-goog
From: Jeff Xu <[email protected]>
Add documentation for MFD_NOEXEC_SEAL and MFD_EXEC
Cc: [email protected]
Signed-off-by: Jeff Xu <[email protected]>
---
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/mfd_noexec.rst | 90 ++++++++++++++++++++++
2 files changed, 91 insertions(+)
create mode 100644 Documentation/userspace-api/mfd_noexec.rst
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index 5926115ec0ed..8a251d71fa6e 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -32,6 +32,7 @@ Security-related interfaces
seccomp_filter
landlock
lsm
+ mfd_noexec
spec_ctrl
tee
diff --git a/Documentation/userspace-api/mfd_noexec.rst b/Documentation/userspace-api/mfd_noexec.rst
new file mode 100644
index 000000000000..6f11ad86b076
--- /dev/null
+++ b/Documentation/userspace-api/mfd_noexec.rst
@@ -0,0 +1,90 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+Introduction of non executable mfd
+==================================
+:Author:
+ Daniel Verkamp <[email protected]>
+ Jeff Xu <[email protected]>
+
+:Contributor:
+ Aleksa Sarai <[email protected]>
+ Barnabás Pőcze <[email protected]>
+ David Rheinsberg <[email protected]>
+
+Since Linux introduced the memfd feature, memfd have always had their
+execute bit set, and the memfd_create() syscall doesn't allow setting
+it differently.
+
+However, in a secure by default system, such as ChromeOS, (where all
+executables should come from the rootfs, which is protected by Verified
+boot), this executable nature of memfd opens a door for NoExec bypass
+and enables “confused deputy attack”. E.g, in VRP bug [1]: cros_vm
+process created a memfd to share the content with an external process,
+however the memfd is overwritten and used for executing arbitrary code
+and root escalation. [2] lists more VRP in this kind.
+
+On the other hand, executable memfd has its legit use, runc uses memfd’s
+seal and executable feature to copy the contents of the binary then
+execute them, for such system, we need a solution to differentiate runc's
+use of executable memfds and an attacker's [3].
+
+To address those above.
+ - Let memfd_create() set X bit at creation time.
+ - Let memfd be sealed for modifying X bit when NX is set.
+ - A new pid namespace sysctl: vm.memfd_noexec to help applications to
+ migrating and enforcing non-executable MFD.
+
+User API
+========
+``int memfd_create(const char *name, unsigned int flags)``
+
+``MFD_NOEXEC_SEAL``
+ When MFD_NOEXEC_SEAL bit is set in the ``flags``, memfd is created
+ with NX. F_SEAL_EXEC is set and the memfd can't be modified to
+ add X later.
+ This is the most common case for the application to use memfd.
+
+``MFD_EXEC``
+ When MFD_EXEC bit is set in the ``flags``, memfd is created with X.
+
+Note:
+ ``MFD_NOEXEC_SEAL`` and ``MFD_EXEC`` doesn't change the sealable
+ characteristic of memfd, which is controlled by ``MFD_ALLOW_SEALING``.
+
+
+Sysctl:
+========
+``pid namespaced sysctl vm.memfd_noexec``
+
+The new pid namespaced sysctl vm.memfd_noexec has 3 values:
+
+ - 0: MEMFD_NOEXEC_SCOPE_EXEC
+ memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
+ MFD_EXEC was set.
+
+ - 1: MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
+ memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
+ MFD_NOEXEC_SEAL was set.
+
+ - 2: MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
+ memfd_create() without MFD_NOEXEC_SEAL will be rejected.
+
+The sysctl allows finer control of memfd_create for old-software that
+doesn't set the executable bit, for example, a container with
+vm.memfd_noexec=1 means the old-software will create non-executable memfd
+by default while new-software can create executable memfd by setting
+MFD_EXEC.
+
+The value of memfd_noexec is passed to child namespace at creation time,
+in addition, the setting is hierarchical, i.e. during memfd_create,
+we will search from current ns to root ns and use the most restrictive
+setting.
+
+Reference:
+==========
+[1] https://crbug.com/1305267
+
+[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20memfd%20escalation&can=1
+
+[3] https://lwn.net/Articles/781013/
--
2.45.1.288.g0e0cd299f1-goog
From: Jeff Xu <[email protected]>
By default, memfd_create() creates a non-sealable MFD, unless the
MFD_ALLOW_SEALING flag is set.
When the MFD_NOEXEC_SEAL flag is initially introduced, the MFD created
with that flag is sealable, even though MFD_ALLOW_SEALING is not set.
This patch changes MFD_NOEXEC_SEAL to be non-sealable by default,
unless MFD_ALLOW_SEALING is explicitly set.
This is a non-backward compatible change. However, as MFD_NOEXEC_SEAL
is new, we expect not many applications will rely on the nature of
MFD_NOEXEC_SEAL being sealable. In most cases, the application already
sets MFD_ALLOW_SEALING if they need a sealable MFD.
Additionally, this enhances the useability of pid namespace sysctl
vm.memfd_noexec. When vm.memfd_noexec equals 1 or 2, the kernel will
add MFD_NOEXEC_SEAL if mfd_create does not specify MFD_EXEC or
MFD_NOEXEC_SEAL, and the addition of MFD_NOEXEC_SEAL enables the MFD
to be sealable. This means, any application that does not desire this
behavior will be unable to utilize vm.memfd_noexec = 1 or 2 to
migrate/enforce non-executable MFD. This adjustment ensures that
applications can anticipate that the sealable characteristic will
remain unmodified by vm.memfd_noexec.
This patch was initially developed by Barnabás Pőcze, and Barnabás
used Debian Code Search and GitHub to try to find potential breakages
and could only find a single one. Dbus-broker's memfd_create() wrapper
is aware of this implicit `MFD_ALLOW_SEALING` behavior, and tries to
work around it [1]. This workaround will break. Luckily, this only
affects the test suite, it does not affect
the normal operations of dbus-broker. There is a PR with a fix[2]. In
addition, David Rheinsberg also raised similar fix in [3]
[1]: https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a8784cb/src/util/misc.c#L114
[2]: https://github.com/bus1/dbus-broker/pull/366
[3]: https://lore.kernel.org/lkml/[email protected]/
Cc: [email protected]
Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Barnabás Pőcze <[email protected]>
Signed-off-by: Jeff Xu <[email protected]>
Reviewed-by: David Rheinsberg <[email protected]>
---
mm/memfd.c | 9 ++++----
tools/testing/selftests/memfd/memfd_test.c | 26 +++++++++++++++++++++-
2 files changed, 29 insertions(+), 6 deletions(-)
diff --git a/mm/memfd.c b/mm/memfd.c
index 7d8d3ab3fa37..8b7f6afee21d 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -356,12 +356,11 @@ SYSCALL_DEFINE2(memfd_create,
inode->i_mode &= ~0111;
file_seals = memfd_file_seals_ptr(file);
- if (file_seals) {
- *file_seals &= ~F_SEAL_SEAL;
+ if (file_seals)
*file_seals |= F_SEAL_EXEC;
- }
- } else if (flags & MFD_ALLOW_SEALING) {
- /* MFD_EXEC and MFD_ALLOW_SEALING are set */
+ }
+
+ if (flags & MFD_ALLOW_SEALING) {
file_seals = memfd_file_seals_ptr(file);
if (file_seals)
*file_seals &= ~F_SEAL_SEAL;
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index 95af2d78fd31..8579a93d006b 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -1151,7 +1151,7 @@ static void test_noexec_seal(void)
mfd_def_size,
MFD_CLOEXEC | MFD_NOEXEC_SEAL);
mfd_assert_mode(fd, 0666);
- mfd_assert_has_seals(fd, F_SEAL_EXEC);
+ mfd_assert_has_seals(fd, F_SEAL_SEAL | F_SEAL_EXEC);
mfd_fail_chmod(fd, 0777);
close(fd);
}
@@ -1169,6 +1169,14 @@ static void test_sysctl_sysctl0(void)
mfd_assert_has_seals(fd, 0);
mfd_assert_chmod(fd, 0644);
close(fd);
+
+ fd = mfd_assert_new("kern_memfd_sysctl_0_dfl",
+ mfd_def_size,
+ MFD_CLOEXEC);
+ mfd_assert_mode(fd, 0777);
+ mfd_assert_has_seals(fd, F_SEAL_SEAL);
+ mfd_assert_chmod(fd, 0644);
+ close(fd);
}
static void test_sysctl_set_sysctl0(void)
@@ -1206,6 +1214,14 @@ static void test_sysctl_sysctl1(void)
mfd_assert_has_seals(fd, F_SEAL_EXEC);
mfd_fail_chmod(fd, 0777);
close(fd);
+
+ fd = mfd_assert_new("kern_memfd_sysctl_1_noexec_nosealable",
+ mfd_def_size,
+ MFD_CLOEXEC | MFD_NOEXEC_SEAL);
+ mfd_assert_mode(fd, 0666);
+ mfd_assert_has_seals(fd, F_SEAL_EXEC | F_SEAL_SEAL);
+ mfd_fail_chmod(fd, 0777);
+ close(fd);
}
static void test_sysctl_set_sysctl1(void)
@@ -1238,6 +1254,14 @@ static void test_sysctl_sysctl2(void)
mfd_assert_has_seals(fd, F_SEAL_EXEC);
mfd_fail_chmod(fd, 0777);
close(fd);
+
+ fd = mfd_assert_new("kern_memfd_sysctl_2_noexec_notsealable",
+ mfd_def_size,
+ MFD_CLOEXEC | MFD_NOEXEC_SEAL);
+ mfd_assert_mode(fd, 0666);
+ mfd_assert_has_seals(fd, F_SEAL_EXEC | F_SEAL_SEAL);
+ mfd_fail_chmod(fd, 0777);
+ close(fd);
}
static void test_sysctl_set_sysctl2(void)
--
2.45.1.288.g0e0cd299f1-goog
Hi Aleksa
On Thu, May 23, 2024 at 8:39 PM <[email protected]> wrote:
>
> From: Jeff Xu <[email protected]>
>
> Add documentation for MFD_NOEXEC_SEAL and MFD_EXEC
>
> Cc: [email protected]
> Signed-off-by: Jeff Xu <[email protected]>
> ---
> Documentation/userspace-api/index.rst | 1 +
> Documentation/userspace-api/mfd_noexec.rst | 90 ++++++++++++++++++++++
> 2 files changed, 91 insertions(+)
> create mode 100644 Documentation/userspace-api/mfd_noexec.rst
>
> diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
> index 5926115ec0ed..8a251d71fa6e 100644
> --- a/Documentation/userspace-api/index.rst
> +++ b/Documentation/userspace-api/index.rst
> @@ -32,6 +32,7 @@ Security-related interfaces
> seccomp_filter
> landlock
> lsm
> + mfd_noexec
> spec_ctrl
> tee
>
> diff --git a/Documentation/userspace-api/mfd_noexec.rst b/Documentation/userspace-api/mfd_noexec.rst
> new file mode 100644
> index 000000000000..6f11ad86b076
> --- /dev/null
> +++ b/Documentation/userspace-api/mfd_noexec.rst
> @@ -0,0 +1,90 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==================================
> +Introduction of non executable mfd
> +==================================
> +:Author:
> + Daniel Verkamp <[email protected]>
> + Jeff Xu <[email protected]>
> +
> +:Contributor:
> + Aleksa Sarai <[email protected]>
> + Barnabás Pőcze <[email protected]>
> + David Rheinsberg <[email protected]>
> +
> +Since Linux introduced the memfd feature, memfd have always had their
> +execute bit set, and the memfd_create() syscall doesn't allow setting
> +it differently.
> +
> +However, in a secure by default system, such as ChromeOS, (where all
> +executables should come from the rootfs, which is protected by Verified
> +boot), this executable nature of memfd opens a door for NoExec bypass
> +and enables “confused deputy attack”. E.g, in VRP bug [1]: cros_vm
> +process created a memfd to share the content with an external process,
> +however the memfd is overwritten and used for executing arbitrary code
> +and root escalation. [2] lists more VRP in this kind.
> +
> +On the other hand, executable memfd has its legit use, runc uses memfd’s
> +seal and executable feature to copy the contents of the binary then
> +execute them, for such system, we need a solution to differentiate runc's
> +use of executable memfds and an attacker's [3].
> +
> +To address those above.
> + - Let memfd_create() set X bit at creation time.
> + - Let memfd be sealed for modifying X bit when NX is set.
> + - A new pid namespace sysctl: vm.memfd_noexec to help applications to
> + migrating and enforcing non-executable MFD.
> +
> +User API
> +========
> +``int memfd_create(const char *name, unsigned int flags)``
> +
> +``MFD_NOEXEC_SEAL``
> + When MFD_NOEXEC_SEAL bit is set in the ``flags``, memfd is created
> + with NX. F_SEAL_EXEC is set and the memfd can't be modified to
> + add X later.
> + This is the most common case for the application to use memfd.
> +
> +``MFD_EXEC``
> + When MFD_EXEC bit is set in the ``flags``, memfd is created with X.
> +
> +Note:
> + ``MFD_NOEXEC_SEAL`` and ``MFD_EXEC`` doesn't change the sealable
> + characteristic of memfd, which is controlled by ``MFD_ALLOW_SEALING``.
> +
> +
> +Sysctl:
> +========
> +``pid namespaced sysctl vm.memfd_noexec``
> +
> +The new pid namespaced sysctl vm.memfd_noexec has 3 values:
> +
> + - 0: MEMFD_NOEXEC_SCOPE_EXEC
> + memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
> + MFD_EXEC was set.
> +
> + - 1: MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
> + memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
> + MFD_NOEXEC_SEAL was set.
> +
> + - 2: MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED
> + memfd_create() without MFD_NOEXEC_SEAL will be rejected.
> +
> +The sysctl allows finer control of memfd_create for old-software that
> +doesn't set the executable bit, for example, a container with
> +vm.memfd_noexec=1 means the old-software will create non-executable memfd
> +by default while new-software can create executable memfd by setting
> +MFD_EXEC.
> +
> +The value of memfd_noexec is passed to child namespace at creation time,
> +in addition, the setting is hierarchical, i.e. during memfd_create,
> +we will search from current ns to root ns and use the most restrictive
> +setting.
> +
Can you please help to review the sysctl part to check if I captured
your change correctly ?
Thanks
-Jeff
> +Reference:
> +==========
> +[1] https://crbug.com/1305267
> +
> +[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20memfd%20escalation&can=1
> +
> +[3] https://lwn.net/Articles/781013/
> --
> 2.45.1.288.g0e0cd299f1-goog
>
Hi
On Fri, May 24, 2024, at 5:39 AM, [email protected] wrote:
> From: Jeff Xu <[email protected]>
>
> By default, memfd_create() creates a non-sealable MFD, unless the
> MFD_ALLOW_SEALING flag is set.
>
> When the MFD_NOEXEC_SEAL flag is initially introduced, the MFD created
> with that flag is sealable, even though MFD_ALLOW_SEALING is not set.
> This patch changes MFD_NOEXEC_SEAL to be non-sealable by default,
> unless MFD_ALLOW_SEALING is explicitly set.
>
> This is a non-backward compatible change. However, as MFD_NOEXEC_SEAL
> is new, we expect not many applications will rely on the nature of
> MFD_NOEXEC_SEAL being sealable. In most cases, the application already
> sets MFD_ALLOW_SEALING if they need a sealable MFD.
This does not really reflect the effort that went into this. Shouldn't this be something along the lines of:
This is a non-backward compatible change. However, MFD_NOEXEC_SEAL
was only recently introduced and a codesearch revealed no breaking
users apart from dbus-broker unit-tests (which have a patch pending
and explicitly support this change).
> Additionally, this enhances the useability of pid namespace sysctl
> vm.memfd_noexec. When vm.memfd_noexec equals 1 or 2, the kernel will
> add MFD_NOEXEC_SEAL if mfd_create does not specify MFD_EXEC or
> MFD_NOEXEC_SEAL, and the addition of MFD_NOEXEC_SEAL enables the MFD
> to be sealable. This means, any application that does not desire this
> behavior will be unable to utilize vm.memfd_noexec = 1 or 2 to
> migrate/enforce non-executable MFD. This adjustment ensures that
> applications can anticipate that the sealable characteristic will
> remain unmodified by vm.memfd_noexec.
>
> This patch was initially developed by Barnabás Pőcze, and Barnabás
> used Debian Code Search and GitHub to try to find potential breakages
> and could only find a single one. Dbus-broker's memfd_create() wrapper
> is aware of this implicit `MFD_ALLOW_SEALING` behavior, and tries to
> work around it [1]. This workaround will break. Luckily, this only
> affects the test suite, it does not affect
> the normal operations of dbus-broker. There is a PR with a fix[2]. In
> addition, David Rheinsberg also raised similar fix in [3]
>
> [1]:
> https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a8784cb/src/util/misc.c#L114
> [2]: https://github.com/bus1/dbus-broker/pull/366
> [3]:
> https://lore.kernel.org/lkml/[email protected]/
>
> Cc: [email protected]
> Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
> Signed-off-by: Barnabás Pőcze <[email protected]>
> Signed-off-by: Jeff Xu <[email protected]>
> Reviewed-by: David Rheinsberg <[email protected]>
Looks good! Thanks!
David
Hi David and Barnabás
On Fri, May 24, 2024 at 7:15 AM David Rheinsberg <[email protected]> wrote:
>
> Hi
>
> On Fri, May 24, 2024, at 5:39 AM, [email protected] wrote:
> > From: Jeff Xu <[email protected]>
> >
> > By default, memfd_create() creates a non-sealable MFD, unless the
> > MFD_ALLOW_SEALING flag is set.
> >
> > When the MFD_NOEXEC_SEAL flag is initially introduced, the MFD created
> > with that flag is sealable, even though MFD_ALLOW_SEALING is not set.
> > This patch changes MFD_NOEXEC_SEAL to be non-sealable by default,
> > unless MFD_ALLOW_SEALING is explicitly set.
> >
> > This is a non-backward compatible change. However, as MFD_NOEXEC_SEAL
> > is new, we expect not many applications will rely on the nature of
> > MFD_NOEXEC_SEAL being sealable. In most cases, the application already
> > sets MFD_ALLOW_SEALING if they need a sealable MFD.
>
> This does not really reflect the effort that went into this. Shouldn't this be something along the lines of:
>
> This is a non-backward compatible change. However, MFD_NOEXEC_SEAL
> was only recently introduced and a codesearch revealed no breaking
> users apart from dbus-broker unit-tests (which have a patch pending
> and explicitly support this change).
>
Actually, I think we might need to hold on to this change. With debian
code search, I found more codes that already use MFD_NOEXEC_SEAL
without MFD_ALLOW_SEALING. e.g. systemd [1], [2] [3]
I'm not sure if this will break more applications not-knowingly that
have started relying on MFD_NOEXEC_SEAL being sealable. The feature
has been out for more than a year.
Would you consider my augments in [4] to make MFD to be sealable by default ?
At this moment, I'm willing to add a document to clarify that
MFD_NOEXEC_SEAL is sealable by default, and that an app that needs
non-sealable MFD can set SEAL_SEAL. Because both MFD_NOEXEC_SEAL
and vm.memfd_noexec are new, I don't think it breaks the existing
ABI, and vm.memfd_noexec=0 is there for backward compatibility
reasons. Besides, I honestly think there is little reason that MFD
needs to be non-sealable by default. There might be few rare cases,
but the majority of apps don't need that. On the flip side, the fact
that MFD is set up to be sealable by default is a nice bonus for an
app - it makes it easier for apps to use the sealing feature.
What do you think ?
Thanks
-Jeff
[1] https://codesearch.debian.net/search?q=MFD_NOEXEC_SEAL
[2] https://codesearch.debian.net/show?file=systemd_256~rc3-5%2Fsrc%2Fhome%2Fhomed-home.c&line=1274
[3] https://sources.debian.org/src/elogind/255.5-1debian1/src/shared/serialize.c/?hl=558#L558
[4] https://lore.kernel.org/lkml/CALmYWFuPBEM2DE97mQvB2eEgSO9Dvt=uO9OewMhGfhGCY66Hbw@mail.gmail.com/
> > Additionally, this enhances the useability of pid namespace sysctl
> > vm.memfd_noexec. When vm.memfd_noexec equals 1 or 2, the kernel will
> > add MFD_NOEXEC_SEAL if mfd_create does not specify MFD_EXEC or
> > MFD_NOEXEC_SEAL, and the addition of MFD_NOEXEC_SEAL enables the MFD
> > to be sealable. This means, any application that does not desire this
> > behavior will be unable to utilize vm.memfd_noexec = 1 or 2 to
> > migrate/enforce non-executable MFD. This adjustment ensures that
> > applications can anticipate that the sealable characteristic will
> > remain unmodified by vm.memfd_noexec.
> >
> > This patch was initially developed by Barnabás Pőcze, and Barnabás
> > used Debian Code Search and GitHub to try to find potential breakages
> > and could only find a single one. Dbus-broker's memfd_create() wrapper
> > is aware of this implicit `MFD_ALLOW_SEALING` behavior, and tries to
> > work around it [1]. This workaround will break. Luckily, this only
> > affects the test suite, it does not affect
> > the normal operations of dbus-broker. There is a PR with a fix[2]. In
> > addition, David Rheinsberg also raised similar fix in [3]
> >
> > [1]:
> > https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a8784cb/src/util/misc.c#L114
> > [2]: https://github.com/bus1/dbus-broker/pull/366
> > [3]:
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > Cc: [email protected]
> > Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
> > Signed-off-by: Barnabás Pőcze <[email protected]>
> > Signed-off-by: Jeff Xu <[email protected]>
> > Reviewed-by: David Rheinsberg <[email protected]>
>
> Looks good! Thanks!
> David
Hi Barnabás
On Fri, May 31, 2024 at 11:56 AM Barnabás Pőcze <[email protected]> wrote:
>
> 2024. május 30., csütörtök 0:24 keltezéssel, Jeff Xu <[email protected]> írta:
>
> > On Wed, May 29, 2024 at 2:46 PM Barnabás Pőcze <[email protected]> wrote:
> > >
> > > Hi
> > >
> > >
> > > 2024. május 29., szerda 23:30 keltezéssel, Jeff Xu <[email protected]> írta:
> > >
> > > > Hi David and Barnabás
> > > >
> > > > On Fri, May 24, 2024 at 7:15 AM David Rheinsberg <[email protected]> wrote:
> > > > >
> > > > > Hi
> > > > >
> > > > > On Fri, May 24, 2024, at 5:39 AM, [email protected] wrote:
> > > > > > From: Jeff Xu <[email protected]>
> > > > > >
> > > > > > By default, memfd_create() creates a non-sealable MFD, unless the
> > > > > > MFD_ALLOW_SEALING flag is set.
> > > > > >
> > > > > > When the MFD_NOEXEC_SEAL flag is initially introduced, the MFD created
> > > > > > with that flag is sealable, even though MFD_ALLOW_SEALING is not set.
> > > > > > This patch changes MFD_NOEXEC_SEAL to be non-sealable by default,
> > > > > > unless MFD_ALLOW_SEALING is explicitly set.
> > > > > >
> > > > > > This is a non-backward compatible change. However, as MFD_NOEXEC_SEAL
> > > > > > is new, we expect not many applications will rely on the nature of
> > > > > > MFD_NOEXEC_SEAL being sealable. In most cases, the application already
> > > > > > sets MFD_ALLOW_SEALING if they need a sealable MFD.
> > > > >
> > > > > This does not really reflect the effort that went into this. Shouldn't this be something along the lines of:
> > > > >
> > > > > This is a non-backward compatible change. However, MFD_NOEXEC_SEAL
> > > > > was only recently introduced and a codesearch revealed no breaking
> > > > > users apart from dbus-broker unit-tests (which have a patch pending
> > > > > and explicitly support this change).
> > > > >
> > > > Actually, I think we might need to hold on to this change. With debian
> > > > code search, I found more codes that already use MFD_NOEXEC_SEAL
> > > > without MFD_ALLOW_SEALING. e.g. systemd [1], [2] [3]
> > >
> > > Yes, I have looked at those as well, and as far as I could tell,
> > > they are not affected. Have I missed something?
> > >
> > In the example, the MFD was created then passed into somewhere else
> > (safe_fork_full, open_serialization_fd, etc.), the scope and usage of
> > mfd isn't that clear to me, you might have checked all the user cases.
> > In addition, MFD_NOEXEC_SEAL exists in libc and rust and go lib. I
> > don't know if debian code search is sufficient to cover enough apps .
> > There is a certain risk.
> >
> > Fundamentally, I'm not convinced that making MFD default-non-sealable
> > has meaningful benefit, especially when MFD_NOEXEC_SEAL is new.
>
> Certainly, there is always a risk, I did not mean to imply that there isn't.
> However, I believe this risk is low enough to at least warrant an attempt at
> eliminating this inconsistency. It can always be reverted if it turns out that
> the effects have been vastly underestimated by me.
>
> So I would still like to see this change merged.
>
The MFD_NOEXEC_SEAL is a new flag, technically, ABI is not broken.
The sysctl vm.memfd_noexec=1 or 2, is meant to help
migration/enforcement of MFD_NOEXEC_SEAL, so it will break application
if it is used pre-maturely, that is by-design.
I think the main problem here is lack of documentation, instead of a code bug.
ABI change shouldn't be treated lightly, given the risk, I would like
to keep the API the same and add the documentation instead. I think
that is the best route forward.
Best Regards,
-Jeff
>
> Regards,
> Barnabás Pőcze
>
>
> >
> >
> > >
> > > Regards,
> > > Barnabás
> > >
> > >
> > > >
> > > > I'm not sure if this will break more applications not-knowingly that
> > > > have started relying on MFD_NOEXEC_SEAL being sealable. The feature
> > > > has been out for more than a year.
> > > >
> > > > Would you consider my augments in [4] to make MFD to be sealable by default ?
> > > >
> > > > At this moment, I'm willing to add a document to clarify that
> > > > MFD_NOEXEC_SEAL is sealable by default, and that an app that needs
> > > > non-sealable MFD can set SEAL_SEAL. Because both MFD_NOEXEC_SEAL
> > > > and vm.memfd_noexec are new, I don't think it breaks the existing
> > > > ABI, and vm.memfd_noexec=0 is there for backward compatibility
> > > > reasons. Besides, I honestly think there is little reason that MFD
> > > > needs to be non-sealable by default. There might be few rare cases,
> > > > but the majority of apps don't need that. On the flip side, the fact
> > > > that MFD is set up to be sealable by default is a nice bonus for an
> > > > app - it makes it easier for apps to use the sealing feature.
> > > >
> > > > What do you think ?
> > > >
> > > > Thanks
> > > > -Jeff
> > > >
> > > > [1] https://codesearch.debian.net/search?q=MFD_NOEXEC_SEAL
> > > > [2] https://codesearch.debian.net/show?file=systemd_256~rc3-5%2Fsrc%2Fhome%2Fhomed-home.c&line=1274
> > > > [3] https://sources.debian.org/src/elogind/255.5-1debian1/src/shared/serialize.c/?hl=558#L558
> > > > [4] https://lore.kernel.org/lkml/CALmYWFuPBEM2DE97mQvB2eEgSO9Dvt=uO9OewMhGfhGCY66Hbw@mail.gmail.com/
> > > >
> > > >
> > > > > > Additionally, this enhances the useability of pid namespace sysctl
> > > > > > vm.memfd_noexec. When vm.memfd_noexec equals 1 or 2, the kernel will
> > > > > > add MFD_NOEXEC_SEAL if mfd_create does not specify MFD_EXEC or
> > > > > > MFD_NOEXEC_SEAL, and the addition of MFD_NOEXEC_SEAL enables the MFD
> > > > > > to be sealable. This means, any application that does not desire this
> > > > > > behavior will be unable to utilize vm.memfd_noexec = 1 or 2 to
> > > > > > migrate/enforce non-executable MFD. This adjustment ensures that
> > > > > > applications can anticipate that the sealable characteristic will
> > > > > > remain unmodified by vm.memfd_noexec.
> > > > > >
> > > > > > This patch was initially developed by Barnabás Pőcze, and Barnabás
> > > > > > used Debian Code Search and GitHub to try to find potential breakages
> > > > > > and could only find a single one. Dbus-broker's memfd_create() wrapper
> > > > > > is aware of this implicit `MFD_ALLOW_SEALING` behavior, and tries to
> > > > > > work around it [1]. This workaround will break. Luckily, this only
> > > > > > affects the test suite, it does not affect
> > > > > > the normal operations of dbus-broker. There is a PR with a fix[2]. In
> > > > > > addition, David Rheinsberg also raised similar fix in [3]
> > > > > >
> > > > > > [1]:
> > > > > > https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a8784cb/src/util/misc.c#L114
> > > > > > [2]: https://github.com/bus1/dbus-broker/pull/366
> > > > > > [3]:
> > > > > > https://lore.kernel.org/lkml/[email protected]/
> > > > > >
> > > > > > Cc: [email protected]
> > > > > > Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
> > > > > > Signed-off-by: Barnabás Pőcze <[email protected]>
> > > > > > Signed-off-by: Jeff Xu <[email protected]>
> > > > > > Reviewed-by: David Rheinsberg <[email protected]>
> > > > >
> > > > > Looks good! Thanks!
> > > > > David
> > > >