2021-05-14 05:29:24

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH v19 0/8] mm: introduce memfd_secret system call to create "secret" memory areas

From: Mike Rapoport <[email protected]>

Hi,

@Andrew, this is based on v5.13-rc1, I can rebase whatever way you prefer.

This is an implementation of "secret" mappings backed by a file descriptor.

The file descriptor backing secret memory mappings is created using a
dedicated memfd_secret system call The desired protection mode for the
memory is configured using flags parameter of the system call. The mmap()
of the file descriptor created with memfd_secret() will create a "secret"
memory mapping. The pages in that mapping will be marked as not present in
the direct map and will be present only in the page table of the owning mm.

Although normally Linux userspace mappings are protected from other users,
such secret mappings are useful for environments where a hostile tenant is
trying to trick the kernel into giving them access to other tenants
mappings.

It's designed to provide the following protections:

* Enhanced protection (in conjunction with all the other in-kernel
attack prevention systems) against ROP attacks. Seceretmem makes "simple"
ROP insufficient to perform exfiltration, which increases the required
complexity of the attack. Along with other protections like the kernel
stack size limit and address space layout randomization which make finding
gadgets is really hard, absence of any in-kernel primitive for accessing
secret memory means the one gadget ROP attack can't work. Since the only
way to access secret memory is to reconstruct the missing mapping entry,
the attacker has to recover the physical page and insert a PTE pointing to
it in the kernel and then retrieve the contents. That takes at least three
gadgets which is a level of difficulty beyond most standard attacks.

* Prevent cross-process secret userspace memory exposures. Once the secret
memory is allocated, the user can't accidentally pass it into the kernel to
be transmitted somewhere. The secreremem pages cannot be accessed via the
direct map and they are disallowed in GUP.

* Harden against exploited kernel flaws. In order to access secretmem, a
kernel-side attack would need to either walk the page tables and create new
ones, or spawn a new privileged uiserspace process to perform secrets
exfiltration using ptrace.

In the future the secret mappings may be used as a mean to protect guest memory
in a virtual machine host.

For demonstration of secret memory usage we've created a userspace library

https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloader.git

that does two things: the first is act as a preloader for openssl to
redirect all the OPENSSL_malloc calls to secret memory meaning any secret
keys get automatically protected this way and the other thing it does is
expose the API to the user who needs it. We anticipate that a lot of the
use cases would be like the openssl one: many toolkits that deal with
secret keys already have special handling for the memory to try to give
them greater protection, so this would simply be pluggable into the
toolkits without any need for user application modification.

Hiding secret memory mappings behind an anonymous file allows usage of
the page cache for tracking pages allocated for the "secret" mappings as
well as using address_space_operations for e.g. page migration callbacks.

The anonymous file may be also used implicitly, like hugetlb files, to
implement mmap(MAP_SECRET) and use the secret memory areas with "native" mm
ABIs in the future.

Removing of the pages from the direct map may cause its fragmentation on
architectures that use large pages to map the physical memory which affects
the system performance. However, the original Kconfig text for
CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can
improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736
("x86: add gbpages switches")) and the recent report [1] showed that "...
although 1G mappings are a good default choice, there is no compelling
evidence that it must be the only choice". Hence, it is sufficient to have
secretmem disabled by default with the ability of a system administrator to
enable it at boot time.

In addition, there is also a long term goal to improve management of the
direct map.

[1] https://lore.kernel.org/linux-mm/[email protected]/

v19:
* block /dev/mem mmap access, per David
* disallow mmap/mprotect with PROT_EXEC, per Kees
* simplify return in page_is_secretmem(), per Matthew
* use unsigned int for syscall falgs, per Yury

v18: https://lore.kernel.org/lkml/[email protected]
* rebase on v5.12-rc1
* merge kfence fix into the original patch
* massage commit message of the patch introducing the memfd_secret syscall

v17: https://lore.kernel.org/lkml/[email protected]
* Remove pool of large pages backing secretmem allocations, per Michal Hocko
* Add secretmem pages to unevictable LRU, per Michal Hocko
* Use GFP_HIGHUSER as secretmem mapping mask, per Michal Hocko
* Make secretmem an opt-in feature that is disabled by default

v16: https://lore.kernel.org/lkml/[email protected]
* Fix memory leak intorduced in v15
* Clean the data left from previous page user before handing the page to
the userspace

v15: https://lore.kernel.org/lkml/[email protected]
* Add riscv/Kconfig update to disable set_memory operations for nommu
builds (patch 3)
* Update the code around add_to_page_cache() per Matthew's comments
(patches 6,7)
* Add fixups for build/checkpatch errors discovered by CI systems

Older history:
v14: https://lore.kernel.org/lkml/[email protected]
v13: https://lore.kernel.org/lkml/[email protected]
v12: https://lore.kernel.org/lkml/[email protected]
v11: https://lore.kernel.org/lkml/[email protected]
v10: https://lore.kernel.org/lkml/[email protected]
v9: https://lore.kernel.org/lkml/[email protected]
v8: https://lore.kernel.org/lkml/[email protected]
v7: https://lore.kernel.org/lkml/[email protected]
v6: https://lore.kernel.org/lkml/[email protected]
v5: https://lore.kernel.org/lkml/[email protected]
v4: https://lore.kernel.org/lkml/[email protected]
v3: https://lore.kernel.org/lkml/[email protected]
v2: https://lore.kernel.org/lkml/[email protected]
v1: https://lore.kernel.org/lkml/[email protected]
rfc-v2: https://lore.kernel.org/lkml/[email protected]/
rfc-v1: https://lore.kernel.org/lkml/20200130162340.GA14232@rapoport-lnx/
rfc-v0: https://lore.kernel.org/lkml/[email protected]/

Mike Rapoport (8):
mmap: make mlock_future_check() global
riscv/Kconfig: make direct map manipulation options depend on MMU
set_memory: allow set_direct_map_*_noflush() for multiple pages
set_memory: allow querying whether set_direct_map_*() is actually enabled
mm: introduce memfd_secret system call to create "secret" memory areas
PM: hibernate: disable when there are active secretmem users
arch, mm: wire up memfd_secret system call where relevant
secretmem: test: add basic selftest for memfd_secret(2)

arch/arm64/include/asm/Kbuild | 1 -
arch/arm64/include/asm/cacheflush.h | 6 -
arch/arm64/include/asm/kfence.h | 2 +-
arch/arm64/include/asm/set_memory.h | 17 ++
arch/arm64/include/uapi/asm/unistd.h | 1 +
arch/arm64/kernel/machine_kexec.c | 1 +
arch/arm64/mm/mmu.c | 6 +-
arch/arm64/mm/pageattr.c | 23 +-
arch/riscv/Kconfig | 4 +-
arch/riscv/include/asm/set_memory.h | 4 +-
arch/riscv/include/asm/unistd.h | 1 +
arch/riscv/mm/pageattr.c | 8 +-
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/x86/include/asm/set_memory.h | 4 +-
arch/x86/mm/pat/set_memory.c | 8 +-
drivers/char/mem.c | 4 +
include/linux/secretmem.h | 54 ++++
include/linux/set_memory.h | 16 +-
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 7 +-
include/uapi/linux/magic.h | 1 +
kernel/power/hibernate.c | 5 +-
kernel/power/snapshot.c | 4 +-
kernel/sys_ni.c | 2 +
mm/Kconfig | 4 +
mm/Makefile | 1 +
mm/gup.c | 12 +
mm/internal.h | 3 +
mm/mlock.c | 3 +-
mm/mmap.c | 5 +-
mm/secretmem.c | 254 +++++++++++++++++++
mm/vmalloc.c | 5 +-
scripts/checksyscalls.sh | 4 +
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 3 +-
tools/testing/selftests/vm/memfd_secret.c | 296 ++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests.sh | 17 ++
38 files changed, 744 insertions(+), 46 deletions(-)
create mode 100644 arch/arm64/include/asm/set_memory.h
create mode 100644 include/linux/secretmem.h
create mode 100644 mm/secretmem.c
create mode 100644 tools/testing/selftests/vm/memfd_secret.c


base-commit: 6efb943b8616ec53a5e444193dccf1af9ad627b5
--
2.28.0



2021-05-14 05:29:35

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH v19 1/8] mmap: make mlock_future_check() global

From: Mike Rapoport <[email protected]>

It will be used by the upcoming secret memory implementation.

Signed-off-by: Mike Rapoport <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Elena Reshetova <[email protected]>
Cc: Hagen Paul Pfeifer <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rick Edgecombe <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tycho Andersen <[email protected]>
Cc: Will Deacon <[email protected]>
---
mm/internal.h | 3 +++
mm/mmap.c | 5 ++---
2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 54bd0dc2c23c..46eb82eaa195 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -373,6 +373,9 @@ static inline void munlock_vma_pages_all(struct vm_area_struct *vma)
extern void mlock_vma_page(struct page *page);
extern unsigned int munlock_vma_page(struct page *page);

+extern int mlock_future_check(struct mm_struct *mm, unsigned long flags,
+ unsigned long len);
+
/*
* Clear the page's PageMlocked(). This can be useful in a situation where
* we want to unconditionally remove a page from the pagecache -- e.g.,
diff --git a/mm/mmap.c b/mm/mmap.c
index 0584e540246e..81f5595a8490 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1352,9 +1352,8 @@ static inline unsigned long round_hint_to_min(unsigned long hint)
return hint;
}

-static inline int mlock_future_check(struct mm_struct *mm,
- unsigned long flags,
- unsigned long len)
+int mlock_future_check(struct mm_struct *mm, unsigned long flags,
+ unsigned long len)
{
unsigned long locked, lock_limit;

--
2.28.0


2021-05-14 05:32:06

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH v19 6/8] PM: hibernate: disable when there are active secretmem users

From: Mike Rapoport <[email protected]>

It is unsafe to allow saving of secretmem areas to the hibernation
snapshot as they would be visible after the resume and this essentially
will defeat the purpose of secret memory mappings.

Prevent hibernation whenever there are active secret memory users.

Signed-off-by: Mike Rapoport <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Elena Reshetova <[email protected]>
Cc: Hagen Paul Pfeifer <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rick Edgecombe <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tycho Andersen <[email protected]>
Cc: Will Deacon <[email protected]>
---
include/linux/secretmem.h | 6 ++++++
kernel/power/hibernate.c | 5 ++++-
mm/secretmem.c | 15 +++++++++++++++
3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
index e617b4afcc62..21c3771e6a56 100644
--- a/include/linux/secretmem.h
+++ b/include/linux/secretmem.h
@@ -30,6 +30,7 @@ static inline bool page_is_secretmem(struct page *page)
}

bool vma_is_secretmem(struct vm_area_struct *vma);
+bool secretmem_active(void);

#else

@@ -43,6 +44,11 @@ static inline bool page_is_secretmem(struct page *page)
return false;
}

+static inline bool secretmem_active(void)
+{
+ return false;
+}
+
#endif /* CONFIG_SECRETMEM */

#endif /* _LINUX_SECRETMEM_H */
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index da0b41914177..559acef3fddb 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -31,6 +31,7 @@
#include <linux/genhd.h>
#include <linux/ktime.h>
#include <linux/security.h>
+#include <linux/secretmem.h>
#include <trace/events/power.h>

#include "power.h"
@@ -81,7 +82,9 @@ void hibernate_release(void)

bool hibernation_available(void)
{
- return nohibernate == 0 && !security_locked_down(LOCKDOWN_HIBERNATION);
+ return nohibernate == 0 &&
+ !security_locked_down(LOCKDOWN_HIBERNATION) &&
+ !secretmem_active();
}

/**
diff --git a/mm/secretmem.c b/mm/secretmem.c
index 1ae50089adf1..7c2499e4de22 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -40,6 +40,13 @@ module_param_named(enable, secretmem_enable, bool, 0400);
MODULE_PARM_DESC(secretmem_enable,
"Enable secretmem and memfd_secret(2) system call");

+static atomic_t secretmem_users;
+
+bool secretmem_active(void)
+{
+ return !!atomic_read(&secretmem_users);
+}
+
static vm_fault_t secretmem_fault(struct vm_fault *vmf)
{
struct address_space *mapping = vmf->vma->vm_file->f_mapping;
@@ -94,6 +101,12 @@ static const struct vm_operations_struct secretmem_vm_ops = {
.fault = secretmem_fault,
};

+static int secretmem_release(struct inode *inode, struct file *file)
+{
+ atomic_dec(&secretmem_users);
+ return 0;
+}
+
static int secretmem_mmap(struct file *file, struct vm_area_struct *vma)
{
unsigned long len = vma->vm_end - vma->vm_start;
@@ -116,6 +129,7 @@ bool vma_is_secretmem(struct vm_area_struct *vma)
}

static const struct file_operations secretmem_fops = {
+ .release = secretmem_release,
.mmap = secretmem_mmap,
};

@@ -202,6 +216,7 @@ SYSCALL_DEFINE1(memfd_secret, unsigned int, flags)
file->f_flags |= O_LARGEFILE;

fd_install(fd, file);
+ atomic_inc(&secretmem_users);
return fd;

err_put_fd:
--
2.28.0


2021-05-14 05:32:48

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH v19 7/8] arch, mm: wire up memfd_secret system call where relevant

From: Mike Rapoport <[email protected]>

Wire up memfd_secret system call on architectures that define
ARCH_HAS_SET_DIRECT_MAP, namely arm64, risc-v and x86.

Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Elena Reshetova <[email protected]>
Cc: Hagen Paul Pfeifer <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rick Edgecombe <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tycho Andersen <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/uapi/asm/unistd.h | 1 +
arch/riscv/include/asm/unistd.h | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 7 ++++++-
scripts/checksyscalls.sh | 4 ++++
7 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/uapi/asm/unistd.h b/arch/arm64/include/uapi/asm/unistd.h
index f83a70e07df8..ce2ee8f1e361 100644
--- a/arch/arm64/include/uapi/asm/unistd.h
+++ b/arch/arm64/include/uapi/asm/unistd.h
@@ -20,5 +20,6 @@
#define __ARCH_WANT_SET_GET_RLIMIT
#define __ARCH_WANT_TIME32_SYSCALLS
#define __ARCH_WANT_SYS_CLONE3
+#define __ARCH_WANT_MEMFD_SECRET

#include <asm-generic/unistd.h>
diff --git a/arch/riscv/include/asm/unistd.h b/arch/riscv/include/asm/unistd.h
index 977ee6181dab..6c316093a1e5 100644
--- a/arch/riscv/include/asm/unistd.h
+++ b/arch/riscv/include/asm/unistd.h
@@ -9,6 +9,7 @@
*/

#define __ARCH_WANT_SYS_CLONE
+#define __ARCH_WANT_MEMFD_SECRET

#include <uapi/asm/unistd.h>

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 28a1423ce32e..e44519020a43 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -451,3 +451,4 @@
444 i386 landlock_create_ruleset sys_landlock_create_ruleset
445 i386 landlock_add_rule sys_landlock_add_rule
446 i386 landlock_restrict_self sys_landlock_restrict_self
+447 i386 memfd_secret sys_memfd_secret
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index ecd551b08d05..a06f16106f24 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -368,6 +368,7 @@
444 common landlock_create_ruleset sys_landlock_create_ruleset
445 common landlock_add_rule sys_landlock_add_rule
446 common landlock_restrict_self sys_landlock_restrict_self
+447 common memfd_secret sys_memfd_secret

#
# Due to a historical design error, certain syscalls are numbered differently
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 050511e8f1f8..1a1b5d724497 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1050,6 +1050,7 @@ asmlinkage long sys_landlock_create_ruleset(const struct landlock_ruleset_attr _
asmlinkage long sys_landlock_add_rule(int ruleset_fd, enum landlock_rule_type rule_type,
const void __user *rule_attr, __u32 flags);
asmlinkage long sys_landlock_restrict_self(int ruleset_fd, __u32 flags);
+asmlinkage long sys_memfd_secret(unsigned int flags);

/*
* Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 6de5a7fc066b..28b388368cf6 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -873,8 +873,13 @@ __SYSCALL(__NR_landlock_add_rule, sys_landlock_add_rule)
#define __NR_landlock_restrict_self 446
__SYSCALL(__NR_landlock_restrict_self, sys_landlock_restrict_self)

+#ifdef __ARCH_WANT_MEMFD_SECRET
+#define __NR_memfd_secret 447
+__SYSCALL(__NR_memfd_secret, sys_memfd_secret)
+#endif
+
#undef __NR_syscalls
-#define __NR_syscalls 447
+#define __NR_syscalls 448

/*
* 32 bit systems traditionally used different
diff --git a/scripts/checksyscalls.sh b/scripts/checksyscalls.sh
index a18b47695f55..b7609958ee36 100755
--- a/scripts/checksyscalls.sh
+++ b/scripts/checksyscalls.sh
@@ -40,6 +40,10 @@ cat << EOF
#define __IGNORE_setrlimit /* setrlimit */
#endif

+#ifndef __ARCH_WANT_MEMFD_SECRET
+#define __IGNORE_memfd_secret
+#endif
+
/* Missing flags argument */
#define __IGNORE_renameat /* renameat2 */

--
2.28.0


2021-05-14 06:31:49

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH v19 2/8] riscv/Kconfig: make direct map manipulation options depend on MMU

From: Mike Rapoport <[email protected]>

ARCH_HAS_SET_DIRECT_MAP and ARCH_HAS_SET_MEMORY configuration options have
no meaning when CONFIG_MMU is disabled and there is no point to enable
them for the nommu case.

Add an explicit dependency on MMU for these options.

Signed-off-by: Mike Rapoport <[email protected]>
Reported-by: kernel test robot <[email protected]>
---
arch/riscv/Kconfig | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index a8ad8eb76120..c426e7d20907 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -26,8 +26,8 @@ config RISCV
select ARCH_HAS_KCOV
select ARCH_HAS_MMIOWB
select ARCH_HAS_PTE_SPECIAL
- select ARCH_HAS_SET_DIRECT_MAP
- select ARCH_HAS_SET_MEMORY
+ select ARCH_HAS_SET_DIRECT_MAP if MMU
+ select ARCH_HAS_SET_MEMORY if MMU
select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
--
2.28.0


2021-05-14 06:32:14

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH v19 4/8] set_memory: allow querying whether set_direct_map_*() is actually enabled

From: Mike Rapoport <[email protected]>

On arm64, set_direct_map_*() functions may return 0 without actually
changing the linear map. This behaviour can be controlled using kernel
parameters, so we need a way to determine at runtime whether calls to
set_direct_map_invalid_noflush() and set_direct_map_default_noflush() have
any effect.

Extend set_memory API with can_set_direct_map() function that allows
checking if calling set_direct_map_*() will actually change the page
table, replace several occurrences of open coded checks in arm64 with the
new function and provide a generic stub for architectures that always
modify page tables upon calls to set_direct_map APIs.

[[email protected]: arm64: kfence: fix header inclusion ]

Signed-off-by: Mike Rapoport <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Elena Reshetova <[email protected]>
Cc: Hagen Paul Pfeifer <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rick Edgecombe <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tycho Andersen <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/Kbuild | 1 -
arch/arm64/include/asm/cacheflush.h | 6 ------
arch/arm64/include/asm/kfence.h | 2 +-
arch/arm64/include/asm/set_memory.h | 17 +++++++++++++++++
arch/arm64/kernel/machine_kexec.c | 1 +
arch/arm64/mm/mmu.c | 6 +++---
arch/arm64/mm/pageattr.c | 13 +++++++++----
include/linux/set_memory.h | 12 ++++++++++++
8 files changed, 43 insertions(+), 15 deletions(-)
create mode 100644 arch/arm64/include/asm/set_memory.h

diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 07ac208edc89..73aa25843f65 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -3,5 +3,4 @@ generic-y += early_ioremap.h
generic-y += mcs_spinlock.h
generic-y += qrwlock.h
generic-y += qspinlock.h
-generic-y += set_memory.h
generic-y += user.h
diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index ace2c3d7ae7e..4e3c13799735 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -131,12 +131,6 @@ static __always_inline void __flush_icache_all(void)
dsb(ish);
}

-int set_memory_valid(unsigned long addr, int numpages, int enable);
-
-int set_direct_map_invalid_noflush(struct page *page, int numpages);
-int set_direct_map_default_noflush(struct page *page, int numpages);
-bool kernel_page_present(struct page *page);
-
#include <asm-generic/cacheflush.h>

#endif /* __ASM_CACHEFLUSH_H */
diff --git a/arch/arm64/include/asm/kfence.h b/arch/arm64/include/asm/kfence.h
index d061176d57ea..aa855c6a0ae6 100644
--- a/arch/arm64/include/asm/kfence.h
+++ b/arch/arm64/include/asm/kfence.h
@@ -8,7 +8,7 @@
#ifndef __ASM_KFENCE_H
#define __ASM_KFENCE_H

-#include <asm/cacheflush.h>
+#include <asm/set_memory.h>

static inline bool arch_kfence_init_pool(void) { return true; }

diff --git a/arch/arm64/include/asm/set_memory.h b/arch/arm64/include/asm/set_memory.h
new file mode 100644
index 000000000000..ecb6b0f449ab
--- /dev/null
+++ b/arch/arm64/include/asm/set_memory.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_ARM64_SET_MEMORY_H
+#define _ASM_ARM64_SET_MEMORY_H
+
+#include <asm-generic/set_memory.h>
+
+bool can_set_direct_map(void);
+#define can_set_direct_map can_set_direct_map
+
+int set_memory_valid(unsigned long addr, int numpages, int enable);
+
+int set_direct_map_invalid_noflush(struct page *page, int numpages);
+int set_direct_map_default_noflush(struct page *page, int numpages);
+bool kernel_page_present(struct page *page);
+
+#endif /* _ASM_ARM64_SET_MEMORY_H */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 90a335c74442..0ec94e718724 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -11,6 +11,7 @@
#include <linux/kernel.h>
#include <linux/kexec.h>
#include <linux/page-flags.h>
+#include <linux/set_memory.h>
#include <linux/smp.h>

#include <asm/cacheflush.h>
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 6dd9369e3ea0..e42aeff6c344 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -22,6 +22,7 @@
#include <linux/io.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
+#include <linux/set_memory.h>

#include <asm/barrier.h>
#include <asm/cputype.h>
@@ -515,7 +516,7 @@ static void __init map_mem(pgd_t *pgdp)
*/
BUILD_BUG_ON(pgd_index(direct_map_end - 1) == pgd_index(direct_map_end));

- if (rodata_full || crash_mem_map || debug_pagealloc_enabled())
+ if (can_set_direct_map() || crash_mem_map)
flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;

/*
@@ -1483,8 +1484,7 @@ int arch_add_memory(int nid, u64 start, u64 size,
* KFENCE requires linear map to be mapped at page granularity, so that
* it is possible to protect/unprotect single pages in the KFENCE pool.
*/
- if (rodata_full || debug_pagealloc_enabled() ||
- IS_ENABLED(CONFIG_KFENCE))
+ if (can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE))
flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;

__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index b53ef37bf95a..d505172265b0 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -19,6 +19,11 @@ struct page_change_data {

bool rodata_full __ro_after_init = IS_ENABLED(CONFIG_RODATA_FULL_DEFAULT_ENABLED);

+bool can_set_direct_map(void)
+{
+ return rodata_full || debug_pagealloc_enabled();
+}
+
static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
{
struct page_change_data *cdata = data;
@@ -156,7 +161,7 @@ int set_direct_map_invalid_noflush(struct page *page, int numpages)
};
unsigned long size = PAGE_SIZE * numpages;

- if (!debug_pagealloc_enabled() && !rodata_full)
+ if (!can_set_direct_map())
return 0;

return apply_to_page_range(&init_mm,
@@ -172,7 +177,7 @@ int set_direct_map_default_noflush(struct page *page, int numpages)
};
unsigned long size = PAGE_SIZE * numpages;

- if (!debug_pagealloc_enabled() && !rodata_full)
+ if (!can_set_direct_map())
return 0;

return apply_to_page_range(&init_mm,
@@ -183,7 +188,7 @@ int set_direct_map_default_noflush(struct page *page, int numpages)
#ifdef CONFIG_DEBUG_PAGEALLOC
void __kernel_map_pages(struct page *page, int numpages, int enable)
{
- if (!debug_pagealloc_enabled() && !rodata_full)
+ if (!can_set_direct_map())
return;

set_memory_valid((unsigned long)page_address(page), numpages, enable);
@@ -208,7 +213,7 @@ bool kernel_page_present(struct page *page)
pte_t *ptep;
unsigned long addr = (unsigned long)page_address(page);

- if (!debug_pagealloc_enabled() && !rodata_full)
+ if (!can_set_direct_map())
return true;

pgdp = pgd_offset_k(addr);
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index c650f82db813..7b4b6626032d 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -28,7 +28,19 @@ static inline bool kernel_page_present(struct page *page)
{
return true;
}
+#else /* CONFIG_ARCH_HAS_SET_DIRECT_MAP */
+/*
+ * Some architectures, e.g. ARM64 can disable direct map modifications at
+ * boot time. Let them overrive this query.
+ */
+#ifndef can_set_direct_map
+static inline bool can_set_direct_map(void)
+{
+ return true;
+}
+#define can_set_direct_map can_set_direct_map
#endif
+#endif /* CONFIG_ARCH_HAS_SET_DIRECT_MAP */

#ifndef set_mce_nospec
static inline int set_mce_nospec(unsigned long pfn, bool unmap)
--
2.28.0


2021-05-14 06:34:28

by Mike Rapoport

[permalink] [raw]
Subject: [PATCH v19 8/8] secretmem: test: add basic selftest for memfd_secret(2)

From: Mike Rapoport <[email protected]>

The test verifies that file descriptor created with memfd_secret does not
allow read/write operations, that secret memory mappings respect
RLIMIT_MEMLOCK and that remote accesses with process_vm_read() and
ptrace() to the secret memory fail.

Signed-off-by: Mike Rapoport <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Elena Reshetova <[email protected]>
Cc: Hagen Paul Pfeifer <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rick Edgecombe <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tycho Andersen <[email protected]>
Cc: Will Deacon <[email protected]>
---
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 3 +-
tools/testing/selftests/vm/memfd_secret.c | 296 ++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests.sh | 17 ++
4 files changed, 316 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/vm/memfd_secret.c

diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore
index 1f651e85ed60..da92ded5a27c 100644
--- a/tools/testing/selftests/vm/.gitignore
+++ b/tools/testing/selftests/vm/.gitignore
@@ -21,5 +21,6 @@ va_128TBswitch
map_fixed_noreplace
write_to_hugetlbfs
hmm-tests
+memfd_secret
local_config.*
split_huge_page_test
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 73e1cc96d7c2..266580ea938c 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -34,6 +34,7 @@ TEST_GEN_FILES += khugepaged
TEST_GEN_FILES += map_fixed_noreplace
TEST_GEN_FILES += map_hugetlb
TEST_GEN_FILES += map_populate
+TEST_GEN_FILES += memfd_secret
TEST_GEN_FILES += mlock-random-test
TEST_GEN_FILES += mlock2-tests
TEST_GEN_FILES += mremap_dontunmap
@@ -134,7 +135,7 @@ warn_32bit_failure:
endif
endif

-$(OUTPUT)/mlock-random-test: LDLIBS += -lcap
+$(OUTPUT)/mlock-random-test $(OUTPUT)/memfd_secret: LDLIBS += -lcap

$(OUTPUT)/gup_test: ../../../../mm/gup_test.h

diff --git a/tools/testing/selftests/vm/memfd_secret.c b/tools/testing/selftests/vm/memfd_secret.c
new file mode 100644
index 000000000000..2462f52e9c96
--- /dev/null
+++ b/tools/testing/selftests/vm/memfd_secret.c
@@ -0,0 +1,296 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright IBM Corporation, 2020
+ *
+ * Author: Mike Rapoport <[email protected]>
+ */
+
+#define _GNU_SOURCE
+#include <sys/uio.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <sys/types.h>
+#include <sys/ptrace.h>
+#include <sys/syscall.h>
+#include <sys/resource.h>
+#include <sys/capability.h>
+
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include <stdio.h>
+
+#include "../kselftest.h"
+
+#define fail(fmt, ...) ksft_test_result_fail(fmt, ##__VA_ARGS__)
+#define pass(fmt, ...) ksft_test_result_pass(fmt, ##__VA_ARGS__)
+#define skip(fmt, ...) ksft_test_result_skip(fmt, ##__VA_ARGS__)
+
+#ifdef __NR_memfd_secret
+
+#define PATTERN 0x55
+
+static const int prot = PROT_READ | PROT_WRITE;
+static const int mode = MAP_SHARED;
+
+static unsigned long page_size;
+static unsigned long mlock_limit_cur;
+static unsigned long mlock_limit_max;
+
+static int memfd_secret(unsigned int flags)
+{
+ return syscall(__NR_memfd_secret, flags);
+}
+
+static void test_file_apis(int fd)
+{
+ char buf[64];
+
+ if ((read(fd, buf, sizeof(buf)) >= 0) ||
+ (write(fd, buf, sizeof(buf)) >= 0) ||
+ (pread(fd, buf, sizeof(buf), 0) >= 0) ||
+ (pwrite(fd, buf, sizeof(buf), 0) >= 0))
+ fail("unexpected file IO\n");
+ else
+ pass("file IO is blocked as expected\n");
+}
+
+static void test_mlock_limit(int fd)
+{
+ size_t len;
+ char *mem;
+
+ len = mlock_limit_cur;
+ mem = mmap(NULL, len, prot, mode, fd, 0);
+ if (mem == MAP_FAILED) {
+ fail("unable to mmap secret memory\n");
+ return;
+ }
+ munmap(mem, len);
+
+ len = mlock_limit_max * 2;
+ mem = mmap(NULL, len, prot, mode, fd, 0);
+ if (mem != MAP_FAILED) {
+ fail("unexpected mlock limit violation\n");
+ munmap(mem, len);
+ return;
+ }
+
+ pass("mlock limit is respected\n");
+}
+
+static void try_process_vm_read(int fd, int pipefd[2])
+{
+ struct iovec liov, riov;
+ char buf[64];
+ char *mem;
+
+ if (read(pipefd[0], &mem, sizeof(mem)) < 0) {
+ fail("pipe write: %s\n", strerror(errno));
+ exit(KSFT_FAIL);
+ }
+
+ liov.iov_len = riov.iov_len = sizeof(buf);
+ liov.iov_base = buf;
+ riov.iov_base = mem;
+
+ if (process_vm_readv(getppid(), &liov, 1, &riov, 1, 0) < 0) {
+ if (errno == ENOSYS)
+ exit(KSFT_SKIP);
+ exit(KSFT_PASS);
+ }
+
+ exit(KSFT_FAIL);
+}
+
+static void try_ptrace(int fd, int pipefd[2])
+{
+ pid_t ppid = getppid();
+ int status;
+ char *mem;
+ long ret;
+
+ if (read(pipefd[0], &mem, sizeof(mem)) < 0) {
+ perror("pipe write");
+ exit(KSFT_FAIL);
+ }
+
+ ret = ptrace(PTRACE_ATTACH, ppid, 0, 0);
+ if (ret) {
+ perror("ptrace_attach");
+ exit(KSFT_FAIL);
+ }
+
+ ret = waitpid(ppid, &status, WUNTRACED);
+ if ((ret != ppid) || !(WIFSTOPPED(status))) {
+ fprintf(stderr, "weird waitppid result %ld stat %x\n",
+ ret, status);
+ exit(KSFT_FAIL);
+ }
+
+ if (ptrace(PTRACE_PEEKDATA, ppid, mem, 0))
+ exit(KSFT_PASS);
+
+ exit(KSFT_FAIL);
+}
+
+static void check_child_status(pid_t pid, const char *name)
+{
+ int status;
+
+ waitpid(pid, &status, 0);
+
+ if (WIFEXITED(status) && WEXITSTATUS(status) == KSFT_SKIP) {
+ skip("%s is not supported\n", name);
+ return;
+ }
+
+ if ((WIFEXITED(status) && WEXITSTATUS(status) == KSFT_PASS) ||
+ WIFSIGNALED(status)) {
+ pass("%s is blocked as expected\n", name);
+ return;
+ }
+
+ fail("%s: unexpected memory access\n", name);
+}
+
+static void test_remote_access(int fd, const char *name,
+ void (*func)(int fd, int pipefd[2]))
+{
+ int pipefd[2];
+ pid_t pid;
+ char *mem;
+
+ if (pipe(pipefd)) {
+ fail("pipe failed: %s\n", strerror(errno));
+ return;
+ }
+
+ pid = fork();
+ if (pid < 0) {
+ fail("fork failed: %s\n", strerror(errno));
+ return;
+ }
+
+ if (pid == 0) {
+ func(fd, pipefd);
+ return;
+ }
+
+ mem = mmap(NULL, page_size, prot, mode, fd, 0);
+ if (mem == MAP_FAILED) {
+ fail("Unable to mmap secret memory\n");
+ return;
+ }
+
+ ftruncate(fd, page_size);
+ memset(mem, PATTERN, page_size);
+
+ if (write(pipefd[1], &mem, sizeof(mem)) < 0) {
+ fail("pipe write: %s\n", strerror(errno));
+ return;
+ }
+
+ check_child_status(pid, name);
+}
+
+static void test_process_vm_read(int fd)
+{
+ test_remote_access(fd, "process_vm_read", try_process_vm_read);
+}
+
+static void test_ptrace(int fd)
+{
+ test_remote_access(fd, "ptrace", try_ptrace);
+}
+
+static int set_cap_limits(rlim_t max)
+{
+ struct rlimit new;
+ cap_t cap = cap_init();
+
+ new.rlim_cur = max;
+ new.rlim_max = max;
+ if (setrlimit(RLIMIT_MEMLOCK, &new)) {
+ perror("setrlimit() returns error");
+ return -1;
+ }
+
+ /* drop capabilities including CAP_IPC_LOCK */
+ if (cap_set_proc(cap)) {
+ perror("cap_set_proc() returns error");
+ return -2;
+ }
+
+ return 0;
+}
+
+static void prepare(void)
+{
+ struct rlimit rlim;
+
+ page_size = sysconf(_SC_PAGE_SIZE);
+ if (!page_size)
+ ksft_exit_fail_msg("Failed to get page size %s\n",
+ strerror(errno));
+
+ if (getrlimit(RLIMIT_MEMLOCK, &rlim))
+ ksft_exit_fail_msg("Unable to detect mlock limit: %s\n",
+ strerror(errno));
+
+ mlock_limit_cur = rlim.rlim_cur;
+ mlock_limit_max = rlim.rlim_max;
+
+ printf("page_size: %ld, mlock.soft: %ld, mlock.hard: %ld\n",
+ page_size, mlock_limit_cur, mlock_limit_max);
+
+ if (page_size > mlock_limit_cur)
+ mlock_limit_cur = page_size;
+ if (page_size > mlock_limit_max)
+ mlock_limit_max = page_size;
+
+ if (set_cap_limits(mlock_limit_max))
+ ksft_exit_fail_msg("Unable to set mlock limit: %s\n",
+ strerror(errno));
+}
+
+#define NUM_TESTS 4
+
+int main(int argc, char *argv[])
+{
+ int fd;
+
+ prepare();
+
+ ksft_print_header();
+ ksft_set_plan(NUM_TESTS);
+
+ fd = memfd_secret(0);
+ if (fd < 0) {
+ if (errno == ENOSYS)
+ ksft_exit_skip("memfd_secret is not supported\n");
+ else
+ ksft_exit_fail_msg("memfd_secret failed: %s\n",
+ strerror(errno));
+ }
+
+ test_mlock_limit(fd);
+ test_file_apis(fd);
+ test_process_vm_read(fd);
+ test_ptrace(fd);
+
+ close(fd);
+
+ ksft_exit(!ksft_get_fail_cnt());
+}
+
+#else /* __NR_memfd_secret */
+
+int main(int argc, char *argv[])
+{
+ printf("skip: skipping memfd_secret test (missing __NR_memfd_secret)\n");
+ return KSFT_SKIP;
+}
+
+#endif /* __NR_memfd_secret */
diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index e953f3cd9664..95a67382f132 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -346,4 +346,21 @@ else
exitcode=1
fi

+echo "running memfd_secret test"
+echo "------------------------------------"
+./memfd_secret
+ret_val=$?
+
+if [ $ret_val -eq 0 ]; then
+ echo "[PASS]"
+elif [ $ret_val -eq $ksft_skip ]; then
+ echo "[SKIP]"
+ exitcode=$ksft_skip
+else
+ echo "[FAIL]"
+ exitcode=1
+fi
+
+exit $exitcode
+
exit $exitcode
--
2.28.0


2021-05-14 10:24:59

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v19 1/8] mmap: make mlock_future_check() global

On 13.05.21 20:47, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> It will be used by the upcoming secret memory implementation.
>
> Signed-off-by: Mike Rapoport <[email protected]>
> Cc: Alexander Viro <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Christopher Lameter <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Cc: Elena Reshetova <[email protected]>
> Cc: Hagen Paul Pfeifer <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: James Bottomley <[email protected]>
> Cc: "Kirill A. Shutemov" <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Michael Kerrisk <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>
> Cc: Paul Walmsley <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Rick Edgecombe <[email protected]>
> Cc: Roman Gushchin <[email protected]>
> Cc: Shakeel Butt <[email protected]>
> Cc: Shuah Khan <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Tycho Andersen <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> mm/internal.h | 3 +++
> mm/mmap.c | 5 ++---
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 54bd0dc2c23c..46eb82eaa195 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -373,6 +373,9 @@ static inline void munlock_vma_pages_all(struct vm_area_struct *vma)
> extern void mlock_vma_page(struct page *page);
> extern unsigned int munlock_vma_page(struct page *page);
>
> +extern int mlock_future_check(struct mm_struct *mm, unsigned long flags,
> + unsigned long len);
> +
> /*
> * Clear the page's PageMlocked(). This can be useful in a situation where
> * we want to unconditionally remove a page from the pagecache -- e.g.,
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 0584e540246e..81f5595a8490 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1352,9 +1352,8 @@ static inline unsigned long round_hint_to_min(unsigned long hint)
> return hint;
> }
>
> -static inline int mlock_future_check(struct mm_struct *mm,
> - unsigned long flags,
> - unsigned long len)
> +int mlock_future_check(struct mm_struct *mm, unsigned long flags,
> + unsigned long len)
> {
> unsigned long locked, lock_limit;
>
>

Reviewed-by: David Hildenbrand <[email protected]>

--
Thanks,

David / dhildenb


2021-05-14 10:40:30

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v19 7/8] arch, mm: wire up memfd_secret system call where relevant

On 13.05.21 20:47, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> Wire up memfd_secret system call on architectures that define
> ARCH_HAS_SET_DIRECT_MAP, namely arm64, risc-v and x86.
>
> Signed-off-by: Mike Rapoport <[email protected]>
> Acked-by: Palmer Dabbelt <[email protected]>
> Acked-by: Arnd Bergmann <[email protected]>
> Acked-by: Catalin Marinas <[email protected]>
> Cc: Alexander Viro <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Christopher Lameter <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Cc: Elena Reshetova <[email protected]>
> Cc: Hagen Paul Pfeifer <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: James Bottomley <[email protected]>
> Cc: "Kirill A. Shutemov" <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Michael Kerrisk <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>
> Cc: Paul Walmsley <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Rick Edgecombe <[email protected]>
> Cc: Roman Gushchin <[email protected]>
> Cc: Shakeel Butt <[email protected]>
> Cc: Shuah Khan <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Tycho Andersen <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> arch/arm64/include/uapi/asm/unistd.h | 1 +
> arch/riscv/include/asm/unistd.h | 1 +
> arch/x86/entry/syscalls/syscall_32.tbl | 1 +
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> include/linux/syscalls.h | 1 +
> include/uapi/asm-generic/unistd.h | 7 ++++++-
> scripts/checksyscalls.sh | 4 ++++
> 7 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/uapi/asm/unistd.h b/arch/arm64/include/uapi/asm/unistd.h
> index f83a70e07df8..ce2ee8f1e361 100644
> --- a/arch/arm64/include/uapi/asm/unistd.h
> +++ b/arch/arm64/include/uapi/asm/unistd.h
> @@ -20,5 +20,6 @@
> #define __ARCH_WANT_SET_GET_RLIMIT
> #define __ARCH_WANT_TIME32_SYSCALLS
> #define __ARCH_WANT_SYS_CLONE3
> +#define __ARCH_WANT_MEMFD_SECRET
>
> #include <asm-generic/unistd.h>
> diff --git a/arch/riscv/include/asm/unistd.h b/arch/riscv/include/asm/unistd.h
> index 977ee6181dab..6c316093a1e5 100644
> --- a/arch/riscv/include/asm/unistd.h
> +++ b/arch/riscv/include/asm/unistd.h
> @@ -9,6 +9,7 @@
> */
>
> #define __ARCH_WANT_SYS_CLONE
> +#define __ARCH_WANT_MEMFD_SECRET
>
> #include <uapi/asm/unistd.h>
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index 28a1423ce32e..e44519020a43 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -451,3 +451,4 @@
> 444 i386 landlock_create_ruleset sys_landlock_create_ruleset
> 445 i386 landlock_add_rule sys_landlock_add_rule
> 446 i386 landlock_restrict_self sys_landlock_restrict_self
> +447 i386 memfd_secret sys_memfd_secret
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index ecd551b08d05..a06f16106f24 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -368,6 +368,7 @@
> 444 common landlock_create_ruleset sys_landlock_create_ruleset
> 445 common landlock_add_rule sys_landlock_add_rule
> 446 common landlock_restrict_self sys_landlock_restrict_self
> +447 common memfd_secret sys_memfd_secret
>
> #
> # Due to a historical design error, certain syscalls are numbered differently
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 050511e8f1f8..1a1b5d724497 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -1050,6 +1050,7 @@ asmlinkage long sys_landlock_create_ruleset(const struct landlock_ruleset_attr _
> asmlinkage long sys_landlock_add_rule(int ruleset_fd, enum landlock_rule_type rule_type,
> const void __user *rule_attr, __u32 flags);
> asmlinkage long sys_landlock_restrict_self(int ruleset_fd, __u32 flags);
> +asmlinkage long sys_memfd_secret(unsigned int flags);
>
> /*
> * Architecture-specific system calls
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index 6de5a7fc066b..28b388368cf6 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -873,8 +873,13 @@ __SYSCALL(__NR_landlock_add_rule, sys_landlock_add_rule)
> #define __NR_landlock_restrict_self 446
> __SYSCALL(__NR_landlock_restrict_self, sys_landlock_restrict_self)
>
> +#ifdef __ARCH_WANT_MEMFD_SECRET
> +#define __NR_memfd_secret 447
> +__SYSCALL(__NR_memfd_secret, sys_memfd_secret)
> +#endif
> +
> #undef __NR_syscalls
> -#define __NR_syscalls 447
> +#define __NR_syscalls 448
>
> /*
> * 32 bit systems traditionally used different
> diff --git a/scripts/checksyscalls.sh b/scripts/checksyscalls.sh
> index a18b47695f55..b7609958ee36 100755
> --- a/scripts/checksyscalls.sh
> +++ b/scripts/checksyscalls.sh
> @@ -40,6 +40,10 @@ cat << EOF
> #define __IGNORE_setrlimit /* setrlimit */
> #endif
>
> +#ifndef __ARCH_WANT_MEMFD_SECRET
> +#define __IGNORE_memfd_secret
> +#endif
> +
> /* Missing flags argument */
> #define __IGNORE_renameat /* renameat2 */
>
>

Acked-by: David Hildenbrand <[email protected]>

--
Thanks,

David / dhildenb


2021-05-14 14:22:24

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v19 2/8] riscv/Kconfig: make direct map manipulation options depend on MMU

On 13.05.21 20:47, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> ARCH_HAS_SET_DIRECT_MAP and ARCH_HAS_SET_MEMORY configuration options have
> no meaning when CONFIG_MMU is disabled and there is no point to enable
> them for the nommu case.
>
> Add an explicit dependency on MMU for these options.
>
> Signed-off-by: Mike Rapoport <[email protected]>
> Reported-by: kernel test robot <[email protected]>
> ---
> arch/riscv/Kconfig | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index a8ad8eb76120..c426e7d20907 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -26,8 +26,8 @@ config RISCV
> select ARCH_HAS_KCOV
> select ARCH_HAS_MMIOWB
> select ARCH_HAS_PTE_SPECIAL
> - select ARCH_HAS_SET_DIRECT_MAP
> - select ARCH_HAS_SET_MEMORY
> + select ARCH_HAS_SET_DIRECT_MAP if MMU
> + select ARCH_HAS_SET_MEMORY if MMU
> select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
> select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL
> select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>

Reviewed-by: David Hildenbrand <[email protected]>

--
Thanks,

David / dhildenb


2021-05-14 15:29:45

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v19 6/8] PM: hibernate: disable when there are active secretmem users

On 13.05.21 20:47, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> It is unsafe to allow saving of secretmem areas to the hibernation
> snapshot as they would be visible after the resume and this essentially
> will defeat the purpose of secret memory mappings.
>
> Prevent hibernation whenever there are active secret memory users.
>
> Signed-off-by: Mike Rapoport <[email protected]>
> Cc: Alexander Viro <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Christopher Lameter <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Cc: Elena Reshetova <[email protected]>
> Cc: Hagen Paul Pfeifer <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: James Bottomley <[email protected]>
> Cc: "Kirill A. Shutemov" <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Michael Kerrisk <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>
> Cc: Paul Walmsley <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Rick Edgecombe <[email protected]>
> Cc: Roman Gushchin <[email protected]>
> Cc: Shakeel Butt <[email protected]>
> Cc: Shuah Khan <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Tycho Andersen <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> include/linux/secretmem.h | 6 ++++++
> kernel/power/hibernate.c | 5 ++++-
> mm/secretmem.c | 15 +++++++++++++++
> 3 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
> index e617b4afcc62..21c3771e6a56 100644
> --- a/include/linux/secretmem.h
> +++ b/include/linux/secretmem.h
> @@ -30,6 +30,7 @@ static inline bool page_is_secretmem(struct page *page)
> }
>
> bool vma_is_secretmem(struct vm_area_struct *vma);
> +bool secretmem_active(void);
>
> #else
>
> @@ -43,6 +44,11 @@ static inline bool page_is_secretmem(struct page *page)
> return false;
> }
>
> +static inline bool secretmem_active(void)
> +{
> + return false;
> +}
> +
> #endif /* CONFIG_SECRETMEM */
>
> #endif /* _LINUX_SECRETMEM_H */
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index da0b41914177..559acef3fddb 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -31,6 +31,7 @@
> #include <linux/genhd.h>
> #include <linux/ktime.h>
> #include <linux/security.h>
> +#include <linux/secretmem.h>
> #include <trace/events/power.h>
>
> #include "power.h"
> @@ -81,7 +82,9 @@ void hibernate_release(void)
>
> bool hibernation_available(void)
> {
> - return nohibernate == 0 && !security_locked_down(LOCKDOWN_HIBERNATION);
> + return nohibernate == 0 &&
> + !security_locked_down(LOCKDOWN_HIBERNATION) &&
> + !secretmem_active();
> }
>
> /**
> diff --git a/mm/secretmem.c b/mm/secretmem.c
> index 1ae50089adf1..7c2499e4de22 100644
> --- a/mm/secretmem.c
> +++ b/mm/secretmem.c
> @@ -40,6 +40,13 @@ module_param_named(enable, secretmem_enable, bool, 0400);
> MODULE_PARM_DESC(secretmem_enable,
> "Enable secretmem and memfd_secret(2) system call");
>
> +static atomic_t secretmem_users;
> +
> +bool secretmem_active(void)
> +{
> + return !!atomic_read(&secretmem_users);
> +}
> +
> static vm_fault_t secretmem_fault(struct vm_fault *vmf)
> {
> struct address_space *mapping = vmf->vma->vm_file->f_mapping;
> @@ -94,6 +101,12 @@ static const struct vm_operations_struct secretmem_vm_ops = {
> .fault = secretmem_fault,
> };
>
> +static int secretmem_release(struct inode *inode, struct file *file)
> +{
> + atomic_dec(&secretmem_users);
> + return 0;
> +}
> +
> static int secretmem_mmap(struct file *file, struct vm_area_struct *vma)
> {
> unsigned long len = vma->vm_end - vma->vm_start;
> @@ -116,6 +129,7 @@ bool vma_is_secretmem(struct vm_area_struct *vma)
> }
>
> static const struct file_operations secretmem_fops = {
> + .release = secretmem_release,
> .mmap = secretmem_mmap,
> };
>
> @@ -202,6 +216,7 @@ SYSCALL_DEFINE1(memfd_secret, unsigned int, flags)
> file->f_flags |= O_LARGEFILE;
>
> fd_install(fd, file);
> + atomic_inc(&secretmem_users);
> return fd;
>
> err_put_fd:
>

It looks a bit racy, but I guess we don't really care about these corner
cases.

Acked-by: David Hildenbrand <[email protected]>

--
Thanks,

David / dhildenb


2021-05-14 15:39:57

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v19 8/8] secretmem: test: add basic selftest for memfd_secret(2)

On 13.05.21 20:47, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> The test verifies that file descriptor created with memfd_secret does not
> allow read/write operations, that secret memory mappings respect
> RLIMIT_MEMLOCK and that remote accesses with process_vm_read() and
> ptrace() to the secret memory fail.
>

[...]

> @@ -0,0 +1,296 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright IBM Corporation, 2020

2021 ?


--
Thanks,

David / dhildenb


2021-05-19 17:48:49

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v19 6/8] PM: hibernate: disable when there are active secretmem users

On Thu, May 13, 2021 at 09:47:32PM +0300, Mike Rapoport wrote:
> From: Mike Rapoport <[email protected]>
>
> It is unsafe to allow saving of secretmem areas to the hibernation
> snapshot as they would be visible after the resume and this essentially
> will defeat the purpose of secret memory mappings.
>
> Prevent hibernation whenever there are active secret memory users.

Have we thought about how this is going to work in practice, e.g. on
mobile systems? It seems to me that there are a variety of common
applications which might want to use this which people don't expect to
inhibit hibernate (e.g. authentication agents, web browsers).

Are we happy to say that any userspace application can incidentally
inhibit hibernate?

Thanks,
Mark.

> Signed-off-by: Mike Rapoport <[email protected]>
> Cc: Alexander Viro <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Christopher Lameter <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Cc: Elena Reshetova <[email protected]>
> Cc: Hagen Paul Pfeifer <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: James Bottomley <[email protected]>
> Cc: "Kirill A. Shutemov" <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Michael Kerrisk <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>
> Cc: Paul Walmsley <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Rick Edgecombe <[email protected]>
> Cc: Roman Gushchin <[email protected]>
> Cc: Shakeel Butt <[email protected]>
> Cc: Shuah Khan <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Tycho Andersen <[email protected]>
> Cc: Will Deacon <[email protected]>
> ---
> include/linux/secretmem.h | 6 ++++++
> kernel/power/hibernate.c | 5 ++++-
> mm/secretmem.c | 15 +++++++++++++++
> 3 files changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
> index e617b4afcc62..21c3771e6a56 100644
> --- a/include/linux/secretmem.h
> +++ b/include/linux/secretmem.h
> @@ -30,6 +30,7 @@ static inline bool page_is_secretmem(struct page *page)
> }
>
> bool vma_is_secretmem(struct vm_area_struct *vma);
> +bool secretmem_active(void);
>
> #else
>
> @@ -43,6 +44,11 @@ static inline bool page_is_secretmem(struct page *page)
> return false;
> }
>
> +static inline bool secretmem_active(void)
> +{
> + return false;
> +}
> +
> #endif /* CONFIG_SECRETMEM */
>
> #endif /* _LINUX_SECRETMEM_H */
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index da0b41914177..559acef3fddb 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -31,6 +31,7 @@
> #include <linux/genhd.h>
> #include <linux/ktime.h>
> #include <linux/security.h>
> +#include <linux/secretmem.h>
> #include <trace/events/power.h>
>
> #include "power.h"
> @@ -81,7 +82,9 @@ void hibernate_release(void)
>
> bool hibernation_available(void)
> {
> - return nohibernate == 0 && !security_locked_down(LOCKDOWN_HIBERNATION);
> + return nohibernate == 0 &&
> + !security_locked_down(LOCKDOWN_HIBERNATION) &&
> + !secretmem_active();
> }
>
> /**
> diff --git a/mm/secretmem.c b/mm/secretmem.c
> index 1ae50089adf1..7c2499e4de22 100644
> --- a/mm/secretmem.c
> +++ b/mm/secretmem.c
> @@ -40,6 +40,13 @@ module_param_named(enable, secretmem_enable, bool, 0400);
> MODULE_PARM_DESC(secretmem_enable,
> "Enable secretmem and memfd_secret(2) system call");
>
> +static atomic_t secretmem_users;
> +
> +bool secretmem_active(void)
> +{
> + return !!atomic_read(&secretmem_users);
> +}
> +
> static vm_fault_t secretmem_fault(struct vm_fault *vmf)
> {
> struct address_space *mapping = vmf->vma->vm_file->f_mapping;
> @@ -94,6 +101,12 @@ static const struct vm_operations_struct secretmem_vm_ops = {
> .fault = secretmem_fault,
> };
>
> +static int secretmem_release(struct inode *inode, struct file *file)
> +{
> + atomic_dec(&secretmem_users);
> + return 0;
> +}
> +
> static int secretmem_mmap(struct file *file, struct vm_area_struct *vma)
> {
> unsigned long len = vma->vm_end - vma->vm_start;
> @@ -116,6 +129,7 @@ bool vma_is_secretmem(struct vm_area_struct *vma)
> }
>
> static const struct file_operations secretmem_fops = {
> + .release = secretmem_release,
> .mmap = secretmem_mmap,
> };
>
> @@ -202,6 +216,7 @@ SYSCALL_DEFINE1(memfd_secret, unsigned int, flags)
> file->f_flags |= O_LARGEFILE;
>
> fd_install(fd, file);
> + atomic_inc(&secretmem_users);
> return fd;
>
> err_put_fd:
> --
> 2.28.0
>

2021-05-19 17:49:41

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v19 6/8] PM: hibernate: disable when there are active secretmem users

On 18.05.21 12:24, Mark Rutland wrote:
> On Thu, May 13, 2021 at 09:47:32PM +0300, Mike Rapoport wrote:
>> From: Mike Rapoport <[email protected]>
>>
>> It is unsafe to allow saving of secretmem areas to the hibernation
>> snapshot as they would be visible after the resume and this essentially
>> will defeat the purpose of secret memory mappings.
>>
>> Prevent hibernation whenever there are active secret memory users.
>
> Have we thought about how this is going to work in practice, e.g. on
> mobile systems? It seems to me that there are a variety of common
> applications which might want to use this which people don't expect to
> inhibit hibernate (e.g. authentication agents, web browsers).
>
> Are we happy to say that any userspace application can incidentally
> inhibit hibernate?

It's worth noting that secretmem has to be explicitly enabled by the
admin to even work.

--
Thanks,

David / dhildenb


2021-05-19 19:01:36

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH v19 6/8] PM: hibernate: disable when there are active secretmem users

On Tue, 2021-05-18 at 11:24 +0100, Mark Rutland wrote:
> On Thu, May 13, 2021 at 09:47:32PM +0300, Mike Rapoport wrote:
> > From: Mike Rapoport <[email protected]>
> >
> > It is unsafe to allow saving of secretmem areas to the hibernation
> > snapshot as they would be visible after the resume and this
> > essentially will defeat the purpose of secret memory mappings.
> >
> > Prevent hibernation whenever there are active secret memory users.
>
> Have we thought about how this is going to work in practice, e.g. on
> mobile systems? It seems to me that there are a variety of common
> applications which might want to use this which people don't expect
> to inhibit hibernate (e.g. authentication agents, web browsers).

If mobile systems require hibernate, then the choice is to disable this
functionality or implement a secure hibernation store. I also thought
most mobile hibernation was basically equivalent to S3, in which case
there's no actual writing of ram into storage, in which case there's no
security barrier and likely the inhibition needs to be made a bit more
specific to the suspend to disk case?

> Are we happy to say that any userspace application can incidentally
> inhibit hibernate?

Well, yes, for the laptop use case because we don't want suspend to
disk to be able to compromise the secret area. You can disable this
for mobile if you like, or work out how to implement hibernate securely
if you're really suspending to disk.

James



2021-05-19 19:02:33

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v19 6/8] PM: hibernate: disable when there are active secretmem users

On Tue, May 18, 2021 at 6:33 PM James Bottomley <[email protected]> wrote:
>
> On Tue, 2021-05-18 at 11:24 +0100, Mark Rutland wrote:
> > On Thu, May 13, 2021 at 09:47:32PM +0300, Mike Rapoport wrote:
> > > From: Mike Rapoport <[email protected]>
> > >
> > > It is unsafe to allow saving of secretmem areas to the hibernation
> > > snapshot as they would be visible after the resume and this
> > > essentially will defeat the purpose of secret memory mappings.
> > >
> > > Prevent hibernation whenever there are active secret memory users.
> >
> > Have we thought about how this is going to work in practice, e.g. on
> > mobile systems? It seems to me that there are a variety of common
> > applications which might want to use this which people don't expect
> > to inhibit hibernate (e.g. authentication agents, web browsers).
>
> If mobile systems require hibernate, then the choice is to disable this
> functionality or implement a secure hibernation store. I also thought
> most mobile hibernation was basically equivalent to S3, in which case
> there's no actual writing of ram into storage, in which case there's no
> security barrier and likely the inhibition needs to be made a bit more
> specific to the suspend to disk case?
>
> > Are we happy to say that any userspace application can incidentally
> > inhibit hibernate?
>
> Well, yes, for the laptop use case because we don't want suspend to
> disk to be able to compromise the secret area. You can disable this
> for mobile if you like, or work out how to implement hibernate securely
> if you're really suspending to disk.

Forgive me if this was already asked and answered. Why not document
that secretmem is ephemeral in the case of hibernate and push the
problem to userspace to disable hibernation? In other words
hibernation causes applications to need to reload their secretmem, it
will be destroyed on the way down and SIGBUS afterwards. That at least
gives a system the flexibility to either sacrifice hibernate for
secretmem (with a userspace controlled policy), or sacrifice secretmem
using processes for hibernate.

2021-05-19 19:09:48

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH v19 6/8] PM: hibernate: disable when there are active secretmem users

On Tue, 2021-05-18 at 18:49 -0700, Dan Williams wrote:
> On Tue, May 18, 2021 at 6:33 PM James Bottomley <[email protected]>
> wrote:
> > On Tue, 2021-05-18 at 11:24 +0100, Mark Rutland wrote:
> > > On Thu, May 13, 2021 at 09:47:32PM +0300, Mike Rapoport wrote:
> > > > From: Mike Rapoport <[email protected]>
> > > >
> > > > It is unsafe to allow saving of secretmem areas to the
> > > > hibernation snapshot as they would be visible after the resume
> > > > and this essentially will defeat the purpose of secret memory
> > > > mappings.
> > > >
> > > > Prevent hibernation whenever there are active secret memory
> > > > users.
> > >
> > > Have we thought about how this is going to work in practice, e.g.
> > > on mobile systems? It seems to me that there are a variety of
> > > common applications which might want to use this which people
> > > don't expect to inhibit hibernate (e.g. authentication agents,
> > > web browsers).
> >
> > If mobile systems require hibernate, then the choice is to disable
> > this functionality or implement a secure hibernation store. I
> > also thought most mobile hibernation was basically equivalent to
> > S3, in which case there's no actual writing of ram into storage, in
> > which case there's no security barrier and likely the inhibition
> > needs to be made a bit more specific to the suspend to disk case?
> >
> > > Are we happy to say that any userspace application can
> > > incidentally inhibit hibernate?
> >
> > Well, yes, for the laptop use case because we don't want suspend to
> > disk to be able to compromise the secret area. You can disable
> > this for mobile if you like, or work out how to implement hibernate
> > securely if you're really suspending to disk.
>
> Forgive me if this was already asked and answered. Why not document
> that secretmem is ephemeral in the case of hibernate and push the
> problem to userspace to disable hibernation? In other words
> hibernation causes applications to need to reload their secretmem, it
> will be destroyed on the way down and SIGBUS afterwards. That at
> least gives a system the flexibility to either sacrifice hibernate
> for secretmem (with a userspace controlled policy), or sacrifice
> secretmem using processes for hibernate.

Well, realistically, there are many possibilities for embedded if it
wants to use secret memory. However, not really having much of an
interest in the use cases, it's not really for Mike or me to be acting
as armchair fly half. I think the best we can do is demonstrate the
system for our use cases and let embedded kick the tyres for theirs if
they care, and if not they can disable the feature.

James