From: Jeff Xu <[email protected]>
This is the first set of Memory mapping (VMA) protection patches using PKU.
* * *
Background:
As discussed previously in the kernel mailing list [1], V8 CFI [2] uses
PKU to protect memory, and Stephen Röttger proposes to extend the PKU to
memory mapping [3].
We're using PKU for in-process isolation to enforce control-flow integrity
for a JIT compiler. In our threat model, an attacker exploits a
vulnerability and has arbitrary read/write access to the whole process
space concurrently to other threads being executed. This attacker can
manipulate some arguments to syscalls from some threads.
Under such a powerful attack, we want to create a “safe/isolated”
thread environment. We assign dedicated PKEYs to this thread
and use those PKEYs to protect the thread’s runtime environment.
The thread has exclusive access to its run-time memory, including
the right to modify the protection of a memory mapping or to
munmap it after use. Other threads cannot access the memory or
modify the memory mapping (VMA) belonging to the thread.
* * *
Proposed changes:
This patch introduces a new flag, PKEY_ENFORCE_API, to the pkey_alloc()
function. When a PKEY is created with this flag, it is enforced that any
thread that wants to make changes to the memory mapping (such as mprotect)
of the memory must have write access to the PKEY. PKEYs created without
this flag will continue to work as they do now, for backwards
compatibility.
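To make the rule above concrete, here is a minimal user-space model of it. It assumes the x86 PKRU layout (two bits per key: access-disable in the low bit, write-disable in the high bit). The helper may_change_mapping() and its parameters are hypothetical names used only for illustration; they are not part of the patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 PKRU layout: two bits per protection key. */
#define PKRU_AD_BIT		0x1u	/* access disable */
#define PKRU_WD_BIT		0x2u	/* write disable */
#define PKRU_BITS_PER_PKEY	2

/* A key is writable only if neither AD nor WD is set for it. */
static bool pkru_allows_write(uint32_t pkru, int pkey)
{
	uint32_t bits = pkru >> (pkey * PKRU_BITS_PER_PKEY);

	return !(bits & (PKRU_AD_BIT | PKRU_WD_BIT));
}

/*
 * Hypothetical model of the proposed policy: a mapping tagged with
 * 'pkey' may be changed freely unless the key was allocated with
 * PKEY_ENFORCE_API, in which case the caller needs write access.
 */
static bool may_change_mapping(uint32_t pkru, int pkey, bool enforce_api)
{
	if (!enforce_api)
		return true;	/* legacy behaviour is unchanged */

	return pkru_allows_write(pkru, pkey);
}
```

With pkey 1 write-disabled (PKRU value 0x8), the model rejects a change only when the key carries the enforce flag, which matches the backwards-compatibility statement above.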
Only PKEYs created from user space can have the new flag set; PKEYs
allocated internally by the kernel will not have it. In other words,
ARCH_DEFAULT_PKEY(0) and execute_only_pkey won’t have this flag set
and will continue to work as they do today.
This flag is checked only at syscall entry, such as mprotect/munmap in
this set of patches. It will not apply to other call paths. In other
words, if the kernel wants to change the attributes of a VMA for some
reason, it is free to do so and is not affected by this new flag.
This set of patches covers mprotect/munmap; I plan to work on other
syscalls after this.
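The "checked only at syscall entry" rule can be sketched as follows. This is an illustrative model, not the kernel code: the patch history only says that munmap gains an "enum caller_origin", so the enumerator names and the function below are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical enumerators; the real names are not shown in this thread. */
enum origin_model {
	ORIGIN_MODEL_KERNEL,	/* internal kernel call path */
	ORIGIN_MODEL_SYSCALL,	/* mprotect()/munmap() entered from user space */
};

/*
 * Returns 0 if the mapping change may proceed, -EACCES otherwise.
 * Only the syscall entry path consults the enforce flag.
 */
static int check_mapping_change(enum origin_model origin,
				bool pkey_enforced, bool thread_can_write)
{
	if (origin != ORIGIN_MODEL_SYSCALL)
		return 0;	/* kernel-internal changes are never blocked */

	if (pkey_enforced && !thread_can_write)
		return -EACCES;

	return 0;
}
```

Passing the origin explicitly keeps the policy decision in one place, so internal callers never need to reason about the calling thread's PKRU state.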
* * *
Testing:
I have tested this patch on Linux kernels 5.15, 6.1, and 6.4-rc1.
A new selftest is added in pkey_enforce_api.c.
* * *
Discussion:
We believe that this patch provides a valuable security feature.
It allows us to create “safe/isolated” thread environments that are
protected from attackers with arbitrary read/write access to
the process space.
We believe that the interface change and the patch don't
introduce backwards compatibility risk.
We would like to discuss this patch with the Linux kernel community
for feedback and support.
* * *
Reference:
[1]https://lore.kernel.org/all/202208221331.71C50A6F@keescook/
[2]https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXgeaRHo/edit?usp=sharing
[3]https://docs.google.com/document/d/1qqVoVfRiF2nRylL3yjZyCQvzQaej1HRPh3f5wj1AS9I/edit
* * *
Current status:
There are ongoing discussions about the threat model and io_uring; we will continue them in the v0 thread.
* * *
PATCH history:
v1: update code per review comments:
mprotect.c:
remove syscall from do_mprotect_pkey()
remove pr_warn_ratelimited
munmap.c:
change syscall to enum caller_origin
remove pr_warn_ratelimited
v0:
https://lore.kernel.org/linux-mm/[email protected]/
Best Regards,
-Jeff Xu
Jeff Xu (6):
PKEY: Introduce PKEY_ENFORCE_API flag
PKEY: Add arch_check_pkey_enforce_api()
PKEY: Apply PKEY_ENFORCE_API to mprotect
PKEY:selftest pkey_enforce_api for mprotect
PKEY: Apply PKEY_ENFORCE_API to munmap
PKEY:selftest pkey_enforce_api for munmap
arch/powerpc/include/asm/pkeys.h | 19 +-
arch/x86/include/asm/mmu.h | 7 +
arch/x86/include/asm/pkeys.h | 92 +-
arch/x86/mm/pkeys.c | 2 +-
include/linux/mm.h | 8 +-
include/linux/pkeys.h | 18 +-
include/uapi/linux/mman.h | 5 +
mm/mmap.c | 31 +-
mm/mprotect.c | 17 +-
mm/mremap.c | 6 +-
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/pkey_enforce_api.c | 1312 +++++++++++++++++
12 files changed, 1499 insertions(+), 19 deletions(-)
create mode 100644 tools/testing/selftests/mm/pkey_enforce_api.c
base-commit: ba0ad6ed89fd5dada3b7b65ef2b08e95d449d4ab
--
2.40.1.606.ga4b1b128d6-goog
From: Jeff Xu <[email protected]>
Add selftest for pkey_enforce_api for munmap
Signed-off-by: Jeff Xu <[email protected]>
---
tools/testing/selftests/mm/pkey_enforce_api.c | 437 ++++++++++++++++++
1 file changed, 437 insertions(+)
diff --git a/tools/testing/selftests/mm/pkey_enforce_api.c b/tools/testing/selftests/mm/pkey_enforce_api.c
index 23663c89bc9c..92aa29248e1f 100644
--- a/tools/testing/selftests/mm/pkey_enforce_api.c
+++ b/tools/testing/selftests/mm/pkey_enforce_api.c
@@ -833,6 +833,429 @@ void test_mprotect_child_thread(bool enforce)
clean_single_address_with_pkey(pkey, ptr, size);
}
+// mmap one address with one page.
+// assign PKEY to the address.
+// munmap on the address is protected.
+void test_munmap_single_address(bool enforce)
+{
+ int pkey;
+ int ret;
+ void *ptr;
+ int size = PAGE_SIZE;
+
+ LOG_TEST_ENTER(enforce);
+
+ setup_single_address_with_pkey(enforce, size, &pkey, &ptr);
+
+ // disable write access.
+ pkey_write_deny(pkey);
+
+ ret = munmap(ptr, size);
+ if (enforce)
+ assert(ret < 0);
+ else
+ assert(!ret);
+
+ pkey_write_allow(pkey);
+
+ if (enforce) {
+ ret = munmap(ptr, size);
+ assert(!ret);
+ }
+
+ clean_single_address_with_pkey(pkey, ptr, size);
+}
+
+// mmap two addresses (two contiguous pages).
+// assign PKEY to them with one mprotect_pkey call (merged address).
+// munmap both addresses in one call (merged address).
+void test_munmap_two_address_merge(bool enforce)
+{
+ int pkey;
+ int ret;
+ void *ptr;
+ void *ptr2;
+ int size = PAGE_SIZE;
+
+ LOG_TEST_ENTER(enforce);
+
+ setup_two_continues_fixed_address_with_pkey(enforce, size, &pkey, &ptr,
+ &ptr2);
+
+ // disable write.
+ pkey_write_deny(pkey);
+
+ // munmap on both addresses with one call (merged).
+ ret = munmap(ptr, size * 2);
+ if (enforce)
+ assert(ret < 0);
+ else
+ assert(!ret);
+
+ pkey_write_allow(pkey);
+
+ if (enforce) {
+ ret = munmap(ptr, size * 2);
+ assert(!ret);
+ }
+
+ ret = sys_pkey_free(pkey);
+ assert(ret == 0);
+}
+
+// mmap two addresses (two contiguous pages).
+// assign PKEY to the second address.
+// munmap on the second address is protected.
+void test_munmap_two_address_deny_second(bool enforce)
+{
+ int pkey;
+ int ret;
+ void *ptr;
+ void *ptr2;
+ int size = PAGE_SIZE;
+
+ LOG_TEST_ENTER(enforce);
+
+ setup_two_continues_fixed_address_protect_second_with_pkey(
+ enforce, size, &pkey, &ptr, &ptr2);
+
+ // disable write through pkey.
+ pkey_write_deny(pkey);
+
+ ret = munmap(ptr2, size);
+ if (enforce)
+ assert(ret < 0);
+ else
+ assert(!ret);
+
+ ret = munmap(ptr, size);
+ assert(!ret);
+
+ pkey_write_allow(pkey);
+
+ if (enforce) {
+ ret = munmap(ptr2, size);
+ assert(!ret);
+ }
+
+ ret = sys_pkey_free(pkey);
+ assert(ret == 0);
+}
+
+// mmap two addresses (two contiguous pages).
+// assign PKEY to the second address.
+// munmap on the range that includes the second address.
+void test_munmap_two_address_deny_range(bool enforce)
+{
+ int pkey;
+ int ret;
+ void *ptr;
+ void *ptr2;
+ int size = PAGE_SIZE;
+
+ LOG_TEST_ENTER(enforce);
+
+ setup_two_continues_fixed_address_protect_second_with_pkey(
+ enforce, size, &pkey, &ptr, &ptr2);
+
+ // disable write through pkey.
+ pkey_write_deny(pkey);
+
+ ret = munmap(ptr, size * 2);
+ if (enforce)
+ assert(ret < 0);
+ else
+ assert(!ret);
+
+ pkey_write_allow(pkey);
+
+ if (enforce) {
+ ret = munmap(ptr, size * 2);
+ assert(!ret);
+ }
+
+ ret = sys_pkey_free(pkey);
+ assert(ret == 0);
+}
+
+// mmap one address with 4 pages.
+// assign PKEY to the second page only.
+// munmap on a memory range that includes the second page is protected.
+void test_munmap_vma_middle_addr(bool enforce)
+{
+ int pkey;
+ int ret;
+ void *ptr, *ptr2, *ptr3;
+ int size = PAGE_SIZE;
+
+ LOG_TEST_ENTER(enforce);
+
+ setup_4pages_fixed_protect_second_page(enforce, size, &pkey, &ptr,
+ &ptr2, &ptr3);
+
+ // disable write through pkey.
+ pkey_write_deny(pkey);
+
+ // munmap supports merged ranges; make sure we don't regress.
+ ret = munmap(ptr, size * 4);
+ if (enforce)
+ assert(ret < 0);
+ else
+ assert(!ret);
+
+ pkey_write_allow(pkey);
+
+ if (enforce) {
+ ret = munmap(ptr, size * 4);
+ assert(!ret);
+ }
+
+ ret = sys_pkey_free(pkey);
+ assert(ret == 0);
+}
+
+// mmap one address with 4 pages.
+// assign PKEY to the second page only.
+// munmap from 2nd page.
+void test_munmap_shrink(bool enforce)
+{
+ int pkey;
+ int ret;
+ void *ptr, *ptr2, *ptr3;
+ int size = PAGE_SIZE;
+
+ LOG_TEST_ENTER(enforce);
+
+ setup_4pages_fixed_protect_second_page(enforce, size, &pkey, &ptr,
+ &ptr2, &ptr3);
+
+ // disable write through pkey.
+ pkey_write_deny(pkey);
+
+ // munmap supports merged ranges; make sure we don't regress.
+ ret = munmap(ptr2, size * 3);
+ if (enforce)
+ assert(ret < 0);
+ else
+ assert(!ret);
+
+ pkey_write_allow(pkey);
+
+ if (enforce) {
+ ret = munmap(ptr2, size * 3);
+ assert(!ret);
+ }
+
+ ret = munmap(ptr, size);
+ assert(!ret);
+
+ ret = sys_pkey_free(pkey);
+ assert(ret == 0);
+}
+
+// mmap one address with 4 pages.
+// assign PKEY to the second page only.
+// munmap from 2nd page but size is less than one page
+void test_munmap_unaligned(bool enforce)
+{
+ int pkey;
+ int ret;
+ void *ptr, *ptr2, *ptr3;
+ int size = PAGE_SIZE;
+
+ LOG_TEST_ENTER(enforce);
+
+ setup_4pages_fixed_protect_second_page(enforce, size, &pkey, &ptr,
+ &ptr2, &ptr3);
+
+ // disable write through pkey.
+ pkey_write_deny(pkey);
+
+ // munmap supports merged ranges; make sure we don't regress.
+ ret = munmap(ptr2, size - 1);
+ if (enforce)
+ assert(ret < 0);
+ else
+ assert(!ret);
+
+ pkey_write_allow(pkey);
+
+ if (enforce) {
+ ret = munmap(ptr2, size - 1);
+ assert(!ret);
+ }
+
+ ret = munmap(ptr, size * 4);
+ assert(!ret);
+
+ ret = sys_pkey_free(pkey);
+ assert(ret == 0);
+}
+
+// mmap one address with 4 pages.
+// assign PKEY to the second page only.
+// munmap from 2nd page but size is slightly more than one page
+void test_munmap_unaligned2(bool enforce)
+{
+ int pkey;
+ int ret;
+ void *ptr, *ptr2, *ptr3;
+ int size = PAGE_SIZE;
+
+ LOG_TEST_ENTER(enforce);
+
+ setup_4pages_fixed_protect_second_page(enforce, size, &pkey, &ptr,
+ &ptr2, &ptr3);
+
+ // disable write through pkey.
+ pkey_write_deny(pkey);
+
+ // munmap supports merged ranges; make sure we don't regress.
+ ret = munmap(ptr2, size + 1);
+ if (enforce)
+ assert(ret < 0);
+ else
+ assert(!ret);
+
+ pkey_write_allow(pkey);
+
+ if (enforce) {
+ ret = munmap(ptr2, size + 1);
+ assert(!ret);
+ }
+
+ ret = munmap(ptr, size * 4);
+ assert(!ret);
+
+ ret = sys_pkey_free(pkey);
+ assert(ret == 0);
+}
+
+// mmap one address with one page.
+// assign PKEY to the address.
+// munmap on the address but with a size of 4 pages (should be OK).
+void test_munmap_outbound_addr(bool enforce)
+{
+ int pkey;
+ int ret;
+ void *ptr;
+ int size = PAGE_SIZE;
+
+ LOG_TEST_ENTER(enforce);
+
+ setup_single_fixed_address_with_pkey(enforce, size, &pkey, &ptr);
+
+ // disable write through pkey.
+ pkey_write_deny(pkey);
+
+ // Interestingly, this is allowed even though the other 3 pages are not allocated.
+ ret = munmap(addr1, size * 4);
+ if (enforce)
+ assert(ret < 0);
+ else
+ assert(!ret);
+
+ pkey_write_allow(pkey);
+
+ if (enforce) {
+ ret = munmap(addr1, size * 4);
+ assert(!ret);
+ }
+
+ clean_single_address_with_pkey(pkey, ptr, size);
+}
+// mmap two addresses, with a page gap between the two.
+// assign pkeys to both addresses.
+// disable write access to the second address.
+// munmap from the start of address 1 to the end of address 2;
+// unlike mprotect, munmap does not fail because of the gap in the range.
+void test_munmap_gapped_address_with_two_pkeys(bool enforce)
+{
+ int pkey, pkey2;
+ int ret;
+ void *ptr, *ptr2;
+ int size = PAGE_SIZE;
+
+ LOG_TEST_ENTER(enforce);
+
+ setup_address_with_gap_two_pkeys(enforce, size, &pkey, &pkey2, &ptr,
+ &ptr2);
+
+ // disable write access.
+ pkey_write_deny(pkey2);
+
+ // Interestingly, this is allowed even though there is a gap between address 1 and 2.
+ ret = munmap(addr1, size * 3);
+ if (enforce)
+ assert(ret < 0);
+ else
+ assert(!ret);
+
+ pkey_write_allow(pkey2);
+ if (enforce) {
+ ret = munmap(addr1, size * 3);
+ assert(!ret);
+ }
+}
+
+// use a write-deny pkey and see if the program can exit properly.
+// This is a manual test; run it at the end if needed.
+void test_exit_munmap_disable_write(void)
+{
+ int pkey;
+ int ret;
+ void *ptr;
+ int size = PAGE_SIZE;
+
+ pkey = sys_pkey_alloc(PKEY_ENFORCE_API, 0);
+ assert(pkey > 0);
+
+ // allocate 1 page.
+ ptr = mmap(addr1, size, PROT_READ,
+ MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
+ assert(ptr == addr1);
+
+ // assign pkey to the first address.
+ ret = sys_mprotect_pkey(ptr, size, PROT_READ | PROT_WRITE | PROT_EXEC,
+ pkey);
+ assert(!ret);
+
+ // disable write through pkey.
+ pkey_write_deny(pkey);
+
+ ret = munmap(ptr, size);
+ assert(ret < 0);
+}
+
+// use a disable-all pkey and see if the program can exit properly.
+// This is a manual test; run it at the end if needed.
+void test_exit_munmap_disable_all(void)
+{
+ int pkey;
+ int ret;
+ void *ptr;
+ int size = PAGE_SIZE;
+
+ pkey = sys_pkey_alloc(PKEY_ENFORCE_API, 0);
+ assert(pkey > 0);
+
+ // allocate 1 page.
+ ptr = mmap(addr2, size, PROT_READ,
+ MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
+ assert(ptr == addr2);
+
+ // assign pkey to the first address.
+ ret = sys_mprotect_pkey(ptr, size, PROT_READ | PROT_WRITE | PROT_EXEC,
+ pkey);
+ assert(!ret);
+
+ // disable all access through pkey.
+ pkey_access_deny(pkey);
+
+ ret = munmap(ptr, size);
+ assert(ret < 0);
+}
+
void test_enforce_api(void)
{
for (int i = 0; i < 2; i++) {
@@ -848,7 +1271,21 @@ void test_enforce_api(void)
test_mprotect_unaligned2(enforce);
test_mprotect_child_thread(enforce);
test_mprotect_gapped_address_with_two_pkeys(enforce);
+
+ test_munmap_single_address(enforce);
+ test_munmap_two_address_merge(enforce);
+ test_munmap_two_address_deny_second(enforce);
+ test_munmap_two_address_deny_range(enforce);
+ test_munmap_vma_middle_addr(enforce);
+ test_munmap_outbound_addr(enforce);
+ test_munmap_shrink(enforce);
+ test_munmap_unaligned(enforce);
+ test_munmap_unaligned2(enforce);
+ test_munmap_gapped_address_with_two_pkeys(enforce);
}
+
+ test_exit_munmap_disable_write();
+ test_exit_munmap_disable_all();
}
int main(void)
--
2.40.1.606.ga4b1b128d6-goog
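The expectations encoded in the munmap selftests above can be summarised with a small page-granular model. This is a sketch under stated assumptions, not the selftest itself: proc_model, munmap_model() and the scenario helpers are hypothetical, and the model assumes the whole range is checked before anything is unmapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGES	8
#define MODEL_PKEYS	16
#define NO_PKEY		(-1)

struct page_model {
	bool mapped;
	int pkey;	/* NO_PKEY or an index into the tables below */
};

struct proc_model {
	struct page_model pages[MODEL_PAGES];
	bool enforce[MODEL_PKEYS];	 /* key allocated with PKEY_ENFORCE_API */
	bool write_allowed[MODEL_PKEYS]; /* calling thread's write right */
};

/* Returns 0 on success, -1 if an enforced key denies the caller. */
static int munmap_model(struct proc_model *p, size_t first, size_t count)
{
	size_t i;

	if (first + count > MODEL_PAGES)
		return -1;

	/* Pass 1: check every mapped page; gaps are simply skipped. */
	for (i = first; i < first + count; i++) {
		const struct page_model *pg = &p->pages[i];

		if (!pg->mapped || pg->pkey == NO_PKEY)
			continue;
		if (p->enforce[pg->pkey] && !p->write_allowed[pg->pkey])
			return -1;
	}

	/* Pass 2: nothing was denied, so unmap the whole range. */
	for (i = first; i < first + count; i++)
		p->pages[i].mapped = false;
	return 0;
}

static void map_page(struct proc_model *p, size_t idx, int pkey)
{
	p->pages[idx].mapped = true;
	p->pages[idx].pkey = pkey;
}

/* Two contiguous pages, the second keyed and write-denied; unmap both. */
static int deny_range_result(bool enforce)
{
	struct proc_model p = { 0 };

	map_page(&p, 0, NO_PKEY);
	map_page(&p, 1, 1);
	p.enforce[1] = enforce;
	p.write_allowed[1] = false;
	return munmap_model(&p, 0, 2);
}

/* Pages 0 and 2 mapped with keys 1 and 2, page 1 is a gap. */
static int gapped_result(bool enforce)
{
	struct proc_model p = { 0 };

	map_page(&p, 0, 1);
	map_page(&p, 2, 2);
	p.enforce[1] = p.enforce[2] = enforce;
	p.write_allowed[1] = true;
	p.write_allowed[2] = false;
	return munmap_model(&p, 0, 3);
}
```

In both scenarios the model fails the call only when the key is enforced, and the gap by itself never causes a failure, which mirrors the `if (enforce) assert(ret < 0); else assert(!ret);` pattern in the tests.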
This version addresses the code review comments from v0.
There are ongoing discussions about the threat model and io_uring,
for which we can keep using the v0 thread.
On Thu, May 18, 2023 at 6:19 PM <[email protected]> wrote:
From: Jeff Xu <[email protected]>
This patch adds arch_check_pkey_enforce_api(), which checks whether the
calling thread's PKRU grants write access to the PKEY of every VMA in a
given range of memory. If a VMA in the range is protected by a PKEY
allocated with PKEY_ENFORCE_API, the thread must have write access to
that PKEY in order to make changes to the memory mapping (such as
mprotect/munmap). x86 implements the check; powerpc and the generic
fallback allow the operation by default.
This function is used by the kernel to enforce the
PKEY_ENFORCE_API flag.
Signed-off-by: Jeff Xu <[email protected]>
---
arch/powerpc/include/asm/pkeys.h | 8 +++++
arch/x86/include/asm/pkeys.h | 50 ++++++++++++++++++++++++++++++++
include/linux/pkeys.h | 9 ++++++
3 files changed, 67 insertions(+)
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 943333ac0fee..24c481e5e95b 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -177,5 +177,13 @@ static inline bool arch_check_pkey_alloc_flags(unsigned long flags)
return true;
}
+static inline int arch_check_pkey_enforce_api(struct mm_struct *mm,
+ unsigned long start,
+ unsigned long end)
+{
+ /* Allow by default */
+ return 0;
+}
+
extern void pkey_mm_init(struct mm_struct *mm);
#endif /*_ASM_POWERPC_KEYS_H */
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index ecadf04a8251..8b94ffc4ca32 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -161,4 +161,54 @@ static inline bool arch_check_pkey_alloc_flags(unsigned long flags)
return true;
}
+
+static inline int __arch_check_vma_pkey_for_write(struct vm_area_struct *vma)
+{
+ int pkey = vma_pkey(vma);
+
+ if (mm_pkey_enforce_api(vma->vm_mm, pkey)) {
+ if (!__pkru_allows_write(read_pkru(), pkey))
+ return -EACCES;
+ }
+
+ return 0;
+}
+
+/*
+ * arch_check_pkey_enforce_api is used by the kernel to enforce the
+ * PKEY_ENFORCE_API flag.
+ * It checks whether the calling thread's PKRU grants write access to
+ * the pkey of every VMA in the given range of memory. If a VMA in the
+ * range uses a pkey allocated with PKEY_ENFORCE_API, the thread must
+ * have write access to that pkey in order to make changes to the
+ * memory mapping, such as mprotect/munmap.
+ */
+static inline int arch_check_pkey_enforce_api(struct mm_struct *mm,
+ unsigned long start,
+ unsigned long end)
+{
+ int error;
+ struct vm_area_struct *vma;
+
+ if (!arch_pkeys_enabled())
+ return 0;
+
+ while (true) {
+ vma = find_vma_intersection(mm, start, end);
+ if (!vma)
+ break;
+
+ error = __arch_check_vma_pkey_for_write(vma);
+ if (error)
+ return error;
+
+ if (vma->vm_end >= end)
+ break;
+
+ start = vma->vm_end;
+ }
+
+ return 0;
+}
+
#endif /*_ASM_X86_PKEYS_H */
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 81a482c3e051..7b00689e1c24 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -53,6 +53,15 @@ static inline bool arch_check_pkey_alloc_flags(unsigned long flags)
return false;
return true;
}
+
+static inline int arch_check_pkey_enforce_api(struct mm_struct *mm,
+ unsigned long start,
+ unsigned long end)
+{
+ /* Allow by default */
+ return 0;
+}
+
#endif /* ! CONFIG_ARCH_HAS_PKEYS */
#endif /* _LINUX_PKEYS_H */
--
2.40.1.606.ga4b1b128d6-goog
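The range walk in the x86 arch_check_pkey_enforce_api() above can be modelled in user space as shown below. This is a sketch only: vma_model, mm_model, find_intersection() and demo_check() are hypothetical stand-ins for the kernel's VMA structures and find_vma_intersection(), and the VMA array is assumed to be sorted and non-overlapping.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PKEYS 16

struct vma_model {
	unsigned long start, end;	/* half-open range [start, end) */
	int pkey;
};

struct mm_model {
	const struct vma_model *vmas;	/* sorted by address, no overlap */
	size_t nr;
	bool enforce[MODEL_PKEYS];	/* key allocated with PKEY_ENFORCE_API */
};

/* x86 PKRU layout: bit 0 of each 2-bit field is AD, bit 1 is WD. */
static bool pkru_allows_write(unsigned int pkru, int pkey)
{
	return !((pkru >> (pkey * 2)) & 0x3u);
}

/* First VMA overlapping [start, end), or NULL if there is none. */
static const struct vma_model *
find_intersection(const struct mm_model *mm, unsigned long start,
		  unsigned long end)
{
	for (size_t i = 0; i < mm->nr; i++) {
		const struct vma_model *v = &mm->vmas[i];

		if (v->start < end && start < v->end)
			return v;
	}
	return NULL;
}

/* Walk every VMA in the range; the first denied VMA fails the call. */
static int check_range(const struct mm_model *mm, unsigned int pkru,
		       unsigned long start, unsigned long end)
{
	while (true) {
		const struct vma_model *v = find_intersection(mm, start, end);

		if (!v)
			break;		/* rest of the range is unmapped */
		if (mm->enforce[v->pkey] && !pkru_allows_write(pkru, v->pkey))
			return -EACCES;
		if (v->end >= end)
			break;		/* range fully covered */
		start = v->end;		/* continue after this VMA */
	}
	return 0;
}

/* Two VMAs with a one-page gap; only pkey 2 is enforced. */
static int demo_check(unsigned int pkru)
{
	static const struct vma_model vmas[] = {
		{ 0x1000, 0x2000, 1 },
		{ 0x3000, 0x4000, 2 },
	};
	struct mm_model mm = { .vmas = vmas, .nr = 2 };

	mm.enforce[2] = true;
	return check_range(&mm, pkru, 0x1000, 0x4000);
}
```

Because the loop restarts the lookup from the end of each VMA it has checked, an unmapped gap does not stop the walk early; a later VMA with an enforced key can still deny the request.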