Joey recently introduced a Memory-Deny-Write-Executable (MDWE) prctl which tags
current with a flag that prevents pages that were previously not executable from
becoming executable.
This tag always gets inherited by children tasks. (it's in MMF_INIT_MASK)
At Google, we've been using a somewhat similar downstream patch for a few years
now. To make the adoption of this feature easier, we've had it support a mode in
which the W^X flag does not propagate to children. For example, this is handy if
a C process which wants W^X protection suspects it could start children
processes that would use a JIT.
I'd like to align our features with the upstream prctl. This series proposes a
new NO_INHERIT flag to the MDWE prctl to make this kind of adoption easier. It
sets a different flag in current that is not in MMF_INIT_MASK and which does not
propagate.
As part of looking into MDWE, I also fixed a couple of things in the MDWE test.
This series applies on the mm-everything-2023-05-16-23-30 tag of the mm tree:
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/
Diff since v2:
- Turned the MMF_INIT_FLAGS macro into a mmf_init_flags function as suggested by
David Hildenbrand
- Removed the ability to transition from to PR_MDWE_REFUSE_EXEC_GAIN from
(PR_MDWE_REFUSE_EXEC_GAIN | PR_MDWE_NO_INHERIT) which also significantly
simplifies the prctl_set_mdwe logic
- Cc-ed -stable on patch 3 as suggested by Alexey Izbyshev
- Added a handful of Reviewed-by/Acked-by trailers
Diff since v1:
- MMF_HAS_MDWE_NO_INHERIT clears MMF_HAS_MDWE in the fork path as part of a
MMF_INIT_FLAGS macro (suggested by Catalin)
- PR_MDWE_* are defined as unsigned long rather than int (suggested by Andrey)
Florent Revest (5):
kselftest: vm: Fix tabs/spaces inconsistency in the mdwe test
kselftest: vm: Fix mdwe's mmap_FIXED test case
mm: Make PR_MDWE_REFUSE_EXEC_GAIN an unsigned long
mm: Add a NO_INHERIT flag to the PR_SET_MDWE prctl
kselftest: vm: Add tests for no-inherit memory-deny-write-execute
include/linux/sched/coredump.h | 10 +++
include/uapi/linux/prctl.h | 3 +-
kernel/fork.c | 2 +-
kernel/sys.c | 32 +++++--
tools/include/uapi/linux/prctl.h | 3 +-
tools/testing/selftests/mm/mdwe_test.c | 113 +++++++++++++++++++++----
6 files changed, 139 insertions(+), 24 deletions(-)
--
2.41.0.255.g8b1d071c50-goog
Add some tests to cover the new PR_MDWE_NO_INHERIT flag of the
PR_SET_MDWE prctl.
Check that:
- it can't be set without PR_SET_MDWE
- MDWE flags can't be unset
- when set, PR_SET_MDWE doesn't propagate to children
Signed-off-by: Florent Revest <[email protected]>
---
tools/testing/selftests/mm/mdwe_test.c | 98 ++++++++++++++++++++++++--
1 file changed, 92 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/mm/mdwe_test.c b/tools/testing/selftests/mm/mdwe_test.c
index 91aa9c3099e7..7bfc98bf9baa 100644
--- a/tools/testing/selftests/mm/mdwe_test.c
+++ b/tools/testing/selftests/mm/mdwe_test.c
@@ -22,6 +22,8 @@
TEST(prctl_flags)
{
+ EXPECT_LT(prctl(PR_SET_MDWE, PR_MDWE_NO_INHERIT, 0L, 0L, 7L), 0);
+
EXPECT_LT(prctl(PR_SET_MDWE, 7L, 0L, 0L, 0L), 0);
EXPECT_LT(prctl(PR_SET_MDWE, 0L, 7L, 0L, 0L), 0);
EXPECT_LT(prctl(PR_SET_MDWE, 0L, 0L, 7L, 0L), 0);
@@ -33,6 +35,69 @@ TEST(prctl_flags)
EXPECT_LT(prctl(PR_GET_MDWE, 0L, 0L, 0L, 7L), 0);
}
+FIXTURE(consecutive_prctl_flags) {};
+FIXTURE_SETUP(consecutive_prctl_flags) {}
+FIXTURE_TEARDOWN(consecutive_prctl_flags) {}
+
+FIXTURE_VARIANT(consecutive_prctl_flags)
+{
+ unsigned long first_flags;
+ unsigned long second_flags;
+ bool should_work;
+};
+
+FIXTURE_VARIANT_ADD(consecutive_prctl_flags, same)
+{
+ .first_flags = PR_MDWE_REFUSE_EXEC_GAIN,
+ .second_flags = PR_MDWE_REFUSE_EXEC_GAIN,
+ .should_work = true,
+};
+
+FIXTURE_VARIANT_ADD(consecutive_prctl_flags, cant_disable_mdwe)
+{
+ .first_flags = PR_MDWE_REFUSE_EXEC_GAIN,
+ .second_flags = 0,
+ .should_work = false,
+};
+
+FIXTURE_VARIANT_ADD(consecutive_prctl_flags, cant_disable_mdwe_no_inherit)
+{
+ .first_flags = PR_MDWE_REFUSE_EXEC_GAIN | PR_MDWE_NO_INHERIT,
+ .second_flags = 0,
+ .should_work = false,
+};
+
+FIXTURE_VARIANT_ADD(consecutive_prctl_flags, cant_disable_no_inherit)
+{
+ .first_flags = PR_MDWE_REFUSE_EXEC_GAIN | PR_MDWE_NO_INHERIT,
+ .second_flags = PR_MDWE_REFUSE_EXEC_GAIN,
+ .should_work = false,
+};
+
+FIXTURE_VARIANT_ADD(consecutive_prctl_flags, cant_enable_no_inherit)
+{
+ .first_flags = PR_MDWE_REFUSE_EXEC_GAIN,
+ .second_flags = PR_MDWE_REFUSE_EXEC_GAIN | PR_MDWE_NO_INHERIT,
+ .should_work = false,
+};
+
+TEST_F(consecutive_prctl_flags, two_prctls)
+{
+ int ret;
+
+ EXPECT_EQ(prctl(PR_SET_MDWE, variant->first_flags, 0L, 0L, 0L), 0);
+
+ ret = prctl(PR_SET_MDWE, variant->second_flags, 0L, 0L, 0L);
+ if (variant->should_work) {
+ EXPECT_EQ(ret, 0);
+
+ ret = prctl(PR_GET_MDWE, 0L, 0L, 0L, 0L);
+ ASSERT_EQ(ret, variant->second_flags);
+ } else {
+ EXPECT_NE(ret, 0);
+ }
+}
+
FIXTURE(mdwe)
{
void *p;
@@ -45,28 +110,45 @@ FIXTURE_VARIANT(mdwe)
{
bool enabled;
bool forked;
+ bool inherit;
};
FIXTURE_VARIANT_ADD(mdwe, stock)
{
.enabled = false,
.forked = false,
+ .inherit = false,
};
FIXTURE_VARIANT_ADD(mdwe, enabled)
{
.enabled = true,
.forked = false,
+ .inherit = true,
};
-FIXTURE_VARIANT_ADD(mdwe, forked)
+FIXTURE_VARIANT_ADD(mdwe, inherited)
{
.enabled = true,
.forked = true,
+ .inherit = true,
};
+FIXTURE_VARIANT_ADD(mdwe, not_inherited)
+{
+ .enabled = true,
+ .forked = true,
+ .inherit = false,
+};
+
+static bool executable_map_should_fail(const FIXTURE_VARIANT(mdwe) *variant)
+{
+ return variant->enabled && (!variant->forked || variant->inherit);
+}
+
FIXTURE_SETUP(mdwe)
{
+ unsigned long mdwe_flags;
int ret, status;
self->p = NULL;
@@ -76,13 +158,17 @@ FIXTURE_SETUP(mdwe)
if (!variant->enabled)
return;
- ret = prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0L, 0L, 0L);
+ mdwe_flags = PR_MDWE_REFUSE_EXEC_GAIN;
+ if (!variant->inherit)
+ mdwe_flags |= PR_MDWE_NO_INHERIT;
+
+ ret = prctl(PR_SET_MDWE, mdwe_flags, 0L, 0L, 0L);
ASSERT_EQ(ret, 0) {
TH_LOG("PR_SET_MDWE failed or unsupported");
}
ret = prctl(PR_GET_MDWE, 0L, 0L, 0L, 0L);
- ASSERT_EQ(ret, 1);
+ ASSERT_EQ(ret, mdwe_flags);
if (variant->forked) {
self->pid = fork();
@@ -113,7 +199,7 @@ TEST_F(mdwe, mmap_READ_EXEC)
TEST_F(mdwe, mmap_WRITE_EXEC)
{
self->p = mmap(NULL, self->size, PROT_WRITE | PROT_EXEC, self->flags, 0, 0);
- if (variant->enabled) {
+ if (executable_map_should_fail(variant)) {
EXPECT_EQ(self->p, MAP_FAILED);
} else {
EXPECT_NE(self->p, MAP_FAILED);
@@ -139,7 +225,7 @@ TEST_F(mdwe, mprotect_add_EXEC)
ASSERT_NE(self->p, MAP_FAILED);
ret = mprotect(self->p, self->size, PROT_READ | PROT_EXEC);
- if (variant->enabled) {
+ if (executable_map_should_fail(variant)) {
EXPECT_LT(ret, 0);
} else {
EXPECT_EQ(ret, 0);
@@ -154,7 +240,7 @@ TEST_F(mdwe, mprotect_WRITE_EXEC)
ASSERT_NE(self->p, MAP_FAILED);
ret = mprotect(self->p, self->size, PROT_WRITE | PROT_EXEC);
- if (variant->enabled) {
+ if (executable_map_should_fail(variant)) {
EXPECT_LT(ret, 0);
} else {
EXPECT_EQ(ret, 0);
--
2.41.0.255.g8b1d071c50-goog
This extends the current PR_SET_MDWE prctl arg with a bit to indicate
that the process doesn't want MDWE protection to propagate to children.
To implement this no-inherit mode, the tag in current->mm->flags must be
absent from MMF_INIT_MASK. This means that the encoding for "MDWE but
without inherit" is different in the prctl than in the mm flags. This
leads to a bit of bit-mangling in the prctl implementation.
Signed-off-by: Florent Revest <[email protected]>
---
include/linux/sched/coredump.h | 10 ++++++++++
include/uapi/linux/prctl.h | 1 +
kernel/fork.c | 2 +-
kernel/sys.c | 32 ++++++++++++++++++++++++++------
tools/include/uapi/linux/prctl.h | 1 +
5 files changed, 39 insertions(+), 7 deletions(-)
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 0ee96ea7a0e9..1b37fa8fc723 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -91,4 +91,14 @@ static inline int get_dumpable(struct mm_struct *mm)
MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)
#define MMF_VM_MERGE_ANY 29
+#define MMF_HAS_MDWE_NO_INHERIT 30
+
+static inline unsigned long mmf_init_flags(unsigned long flags)
+{
+ if (flags & (1UL << MMF_HAS_MDWE_NO_INHERIT))
+ flags &= ~((1UL << MMF_HAS_MDWE) |
+ (1UL << MMF_HAS_MDWE_NO_INHERIT));
+ return flags & MMF_INIT_MASK;
+}
+
#endif /* _LINUX_SCHED_COREDUMP_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 6e9af6cbc950..dacbe824e7c3 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -284,6 +284,7 @@ struct prctl_mm_map {
/* Memory deny write / execute */
#define PR_SET_MDWE 65
# define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)
+# define PR_MDWE_NO_INHERIT (1UL << 1)
#define PR_GET_MDWE 66
diff --git a/kernel/fork.c b/kernel/fork.c
index d17995934eb4..bc3c762d378f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1284,7 +1284,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
hugetlb_count_init(mm);
if (current->mm) {
- mm->flags = current->mm->flags & MMF_INIT_MASK;
+ mm->flags = mmf_init_flags(current->mm->flags);
mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
} else {
mm->flags = default_dump_filter;
diff --git a/kernel/sys.c b/kernel/sys.c
index 339fee3eff6a..1a2dc3da43ea 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2362,19 +2362,41 @@ static int prctl_set_vma(unsigned long opt, unsigned long start,
}
#endif /* CONFIG_ANON_VMA_NAME */
+static inline unsigned long get_current_mdwe(void)
+{
+ unsigned long ret = 0;
+
+ if (test_bit(MMF_HAS_MDWE, ¤t->mm->flags))
+ ret |= PR_MDWE_REFUSE_EXEC_GAIN;
+ if (test_bit(MMF_HAS_MDWE_NO_INHERIT, ¤t->mm->flags))
+ ret |= PR_MDWE_NO_INHERIT;
+
+ return ret;
+}
+
static inline int prctl_set_mdwe(unsigned long bits, unsigned long arg3,
unsigned long arg4, unsigned long arg5)
{
+ unsigned long current_bits;
+
if (arg3 || arg4 || arg5)
return -EINVAL;
- if (bits & ~(PR_MDWE_REFUSE_EXEC_GAIN))
+ if (bits & ~(PR_MDWE_REFUSE_EXEC_GAIN | PR_MDWE_NO_INHERIT))
+ return -EINVAL;
+
+ /* NO_INHERIT only makes sense with REFUSE_EXEC_GAIN */
+ if (bits & PR_MDWE_NO_INHERIT && !(bits & PR_MDWE_REFUSE_EXEC_GAIN))
return -EINVAL;
+ current_bits = get_current_mdwe();
+ if (current_bits && current_bits != bits)
+ return -EPERM; /* Cannot unset the flags */
+
+ if (bits & PR_MDWE_NO_INHERIT)
+ set_bit(MMF_HAS_MDWE_NO_INHERIT, ¤t->mm->flags);
if (bits & PR_MDWE_REFUSE_EXEC_GAIN)
set_bit(MMF_HAS_MDWE, ¤t->mm->flags);
- else if (test_bit(MMF_HAS_MDWE, ¤t->mm->flags))
- return -EPERM; /* Cannot unset the flag */
return 0;
}
@@ -2384,9 +2406,7 @@ static inline int prctl_get_mdwe(unsigned long arg2, unsigned long arg3,
{
if (arg2 || arg3 || arg4 || arg5)
return -EINVAL;
-
- return test_bit(MMF_HAS_MDWE, ¤t->mm->flags) ?
- PR_MDWE_REFUSE_EXEC_GAIN : 0;
+ return (int)get_current_mdwe();
}
static int prctl_get_auxv(void __user *addr, unsigned long len)
diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h
index 6e9af6cbc950..dacbe824e7c3 100644
--- a/tools/include/uapi/linux/prctl.h
+++ b/tools/include/uapi/linux/prctl.h
@@ -284,6 +284,7 @@ struct prctl_mm_map {
/* Memory deny write / execute */
#define PR_SET_MDWE 65
# define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)
+# define PR_MDWE_NO_INHERIT (1UL << 1)
#define PR_GET_MDWE 66
--
2.41.0.255.g8b1d071c50-goog
I checked with the original author, the mmap_FIXED test case wasn't
properly tested and fails. Currently, it maps two consecutive (non
overlapping) pages and expects the second mapping to be denied by MDWE
but these two pages have nothing to do with each other so MDWE is
actually out of the picture here.
What the test actually intended to do was to remap a virtual address
using MAP_FIXED. However, this operation unmaps the existing mapping and
creates a new one so the va is backed by a new page and MDWE is again
out of the picture, all remappings should succeed.
This patch keeps the test case to make it clear that this situation is
expected to work.
Signed-off-by: Florent Revest <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Fixes: 4cf1fe34fd18 ("kselftest: vm: add tests for memory-deny-write-execute")
---
tools/testing/selftests/mm/mdwe_test.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/mm/mdwe_test.c b/tools/testing/selftests/mm/mdwe_test.c
index d0954c657feb..91aa9c3099e7 100644
--- a/tools/testing/selftests/mm/mdwe_test.c
+++ b/tools/testing/selftests/mm/mdwe_test.c
@@ -168,13 +168,10 @@ TEST_F(mdwe, mmap_FIXED)
self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0);
ASSERT_NE(self->p, MAP_FAILED);
- p = mmap(self->p + self->size, self->size, PROT_READ | PROT_EXEC,
+ /* MAP_FIXED unmaps the existing page before mapping which is allowed */
+ p = mmap(self->p, self->size, PROT_READ | PROT_EXEC,
self->flags | MAP_FIXED, 0, 0);
- if (variant->enabled) {
- EXPECT_EQ(p, MAP_FAILED);
- } else {
- EXPECT_EQ(p, self->p);
- }
+ EXPECT_EQ(p, self->p);
}
TEST_F(mdwe, arm64_BTI)
--
2.41.0.255.g8b1d071c50-goog
On 04/07/2023 16:36, Florent Revest wrote:
> I checked with the original author, the mmap_FIXED test case wasn't
> properly tested and fails. Currently, it maps two consecutive (non
> overlapping) pages and expects the second mapping to be denied by MDWE
> but these two pages have nothing to do with each other so MDWE is
> actually out of the picture here.
>
> What the test actually intended to do was to remap a virtual address
> using MAP_FIXED. However, this operation unmaps the existing mapping and
> creates a new one so the va is backed by a new page and MDWE is again
> out of the picture, all remappings should succeed.
>
> This patch keeps the test case to make it clear that this situation is
> expected to work.
>
> Signed-off-by: Florent Revest <[email protected]>
> Reviewed-by: David Hildenbrand <[email protected]>
> Fixes: 4cf1fe34fd18 ("kselftest: vm: add tests for memory-deny-write-execute")
I'm currently working to get all the mm selftests running (and ideally passing!)
on arm64 and saw these same spurious failures. Thanks for the patch!
Reviewed-by: Ryan Roberts <[email protected]>
Tested-by: Ryan Roberts <[email protected]>
> ---
> tools/testing/selftests/mm/mdwe_test.c | 9 +++------
> 1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/mdwe_test.c b/tools/testing/selftests/mm/mdwe_test.c
> index d0954c657feb..91aa9c3099e7 100644
> --- a/tools/testing/selftests/mm/mdwe_test.c
> +++ b/tools/testing/selftests/mm/mdwe_test.c
> @@ -168,13 +168,10 @@ TEST_F(mdwe, mmap_FIXED)
> self->p = mmap(NULL, self->size, PROT_READ, self->flags, 0, 0);
> ASSERT_NE(self->p, MAP_FAILED);
>
> - p = mmap(self->p + self->size, self->size, PROT_READ | PROT_EXEC,
> + /* MAP_FIXED unmaps the existing page before mapping which is allowed */
> + p = mmap(self->p, self->size, PROT_READ | PROT_EXEC,
> self->flags | MAP_FIXED, 0, 0);
> - if (variant->enabled) {
> - EXPECT_EQ(p, MAP_FAILED);
> - } else {
> - EXPECT_EQ(p, self->p);
> - }
> + EXPECT_EQ(p, self->p);
> }
>
> TEST_F(mdwe, arm64_BTI)