From: "Madhavan T. Venkataraman" <[email protected]>
Introduction
============
Dynamic code is used in many different user applications. It is often
generated at runtime, but it can also be a pre-defined sequence of machine
instructions in a data buffer. Examples of dynamic code are trampolines,
JIT code, DBT (dynamic binary translation) code, etc.
Dynamic code is placed either in a data page or in a stack page. In order
to execute dynamic code, the page it resides in needs to be mapped with
execute permissions. Writable pages with execute permissions provide an
attack surface: attackers can use them to inject malicious code, modify
existing code or do other harm.
To mitigate this, LSMs such as SELinux implement W^X. That is, they may not
allow pages to have both write and execute permissions. This prevents
dynamic code from executing and blocks applications that use it. To allow
legitimate applications to run, exceptions have to be made for them (by
setting execmem, etc.), which opens the door to security issues.
The W^X implementation today is not complete. There exist many user-level
tricks that can be used to load and execute dynamic code. E.g.,
- Load the code into a file and map the file with R-X.
- Load the code in an RW- page. Change the permissions to R--. Then,
change the permissions to R-X.
- Load the code in an RW- page. Remap the page with R-X to get a separate
mapping to the same underlying physical page.
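As a rough illustration of the second trick in the list above, the sequence
of calls might look like the following sketch. The helper name
rw_then_rx_permitted() is made up for this example. It only reports whether
the kernel and the active LSM policy allow the permission changes; it never
jumps to the buffer.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on strict C library modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy "code" into an anonymous RW- page, then
 * change the page to R-- and finally to R-X.
 * Returns 0 if every step was permitted, -1 otherwise.
 */
int rw_then_rx_permitted(const void *code, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;
	int ret = -1;

	if (page <= 0 || len > (size_t)page)
		return -1;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memcpy(p, code, len);			/* load the code: RW- */

	if (mprotect(p, (size_t)page, PROT_READ) == 0 &&		/* R-- */
	    mprotect(p, (size_t)page, PROT_READ | PROT_EXEC) == 0)	/* R-X */
		ret = 0;

	munmap(p, (size_t)page);
	return ret;
}
```

On a system whose policy allows this sequence, the helper returns 0. A
policy that requires a file identity or signature for executable mappings
would be expected to reject the final mprotect().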
IMO, these are all security holes as an attacker can exploit them to inject
their own code.
In the future, these holes will definitely be closed. For instance, LSMs
(such as the IPE proposal [1]) may only allow code in properly signed object
files to be mapped with execute permissions. This will do two things:
- user level tricks using anonymous pages will fail as anonymous
pages have no file identity
- loading the code in a temporary file and mapping it with R-X
will fail as the temporary file would not have a signature
We need a way to execute such code without making security exceptions.
Trampolines are a good example of dynamic code. A couple of examples
of trampolines are given below. My first use case for this RFC is
libffi.
Examples of trampolines
=======================
libffi (A Portable Foreign Function Interface Library):
libffi allows a user to define functions with an arbitrary list of
arguments and return value through a feature called "Closures".
Closures use trampolines to jump to ABI handlers that handle calling
conventions and call a target function. libffi is used by a lot
of different applications. To name a few:
- Python
- Java
- JavaScript
- Ruby FFI
- Lisp
- Objective-C
GCC nested functions:
GCC has traditionally used trampolines for implementing nested
functions. The trampoline is placed on the user stack. So, the stack
needs to be executable.
Currently available solution
============================
One solution that has been proposed to allow trampolines to be executed
without making security exceptions is Trampoline Emulation. See:
https://pax.grsecurity.net/docs/emutramp.txt
In this solution, the kernel recognizes certain sequences of instructions
as "well-known" trampolines. When such a trampoline is executed, a page
fault happens because the trampoline page does not have execute permission.
The kernel recognizes the trampoline and emulates it. Basically, the
kernel does the work of the trampoline on behalf of the application.
Currently, the emulated trampolines are the ones used in libffi and GCC
nested functions. To my knowledge, only X86 is supported at this time.
As noted in emutramp.txt, this is not a generic solution. For every new
trampoline that needs to be supported, new instruction sequences need to
be recognized by the kernel and emulated. And this has to be done for
every architecture that needs to be supported.
emutramp.txt notes the following:
"... the real solution is not in emulation but by designing a kernel API
for runtime code generation and modifying userland to make use of it."
Solution proposed in this RFC
=============================
From this RFC's perspective, there are two scenarios for dynamic code:
Scenario 1
----------
We know what code we need only at runtime. For instance, JIT code generated
for frequently executed Java methods. Only at runtime do we know what
methods need to be JIT compiled. Such code cannot be statically defined. It
has to be generated at runtime.
Scenario 2
----------
We know what code we need in advance. User trampolines are a good example of
this. It is possible to define such code statically with some help from the
kernel.
This RFC addresses (2). (1) needs a general purpose trusted code generator
and is out of scope for this RFC.
For (2), the solution is to convert dynamic code to static code and place it
in a source file. The binary generated from the source can be signed. The
kernel can use signature verification to authenticate the binary and
allow the code to be mapped and executed.
The problem is that the static code has to be able to find the data that it
needs when it executes. For functions, the ABI defines the way to pass
parameters. But, for arbitrary dynamic code, there isn't a standard
ABI-compliant way to pass data to the code for most architectures. Each
instance of dynamic code defines its own way. For instance, co-location of
code and data and PC-relative data referencing are used in cases where the
ISA supports it.
We need one standard way that would work for all architectures and ABIs.
The solution proposed here is:
1. Write the static code assuming that the data needed by the code is already
pointed to by a designated register.
2. Get the kernel to supply a small universal trampoline that does the
following:
- Load the address of the data in a designated register
- Load the address of the static code in a designated register
- Jump to the static code
User code would use a kernel supplied API to create and map the trampoline.
The address values would be baked into the code so that no special ISA
features are needed.
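To make "baked into the code" concrete, here is a minimal sketch of how a
64-bit address can be encoded as immediates in an AArch64 movz/movk
sequence. It uses the same opcode constants as the ARM64 patch in this
series, but the function name emit_mov_imm64() is hypothetical, and the
sketch extracts the halfwords with shifts so that it does not depend on the
byte order of the machine it runs on.

```c
#include <stdint.h>

/*
 * Emit four instructions that load the 64-bit constant "imm" into
 * AArch64 register X<reg> (reg in 0..30):
 *
 *	movz	Xreg, imm[15:0]
 *	movk	Xreg, imm[31:16], lsl 16
 *	movk	Xreg, imm[47:32], lsl 32
 *	movk	Xreg, imm[63:48], lsl 48
 *
 * Returns a pointer to the next free instruction slot.
 */
static uint32_t *emit_mov_imm64(uint32_t *ins, unsigned int reg, uint64_t imm)
{
	unsigned int hw;

	for (hw = 0; hw < 4; hw++) {
		uint32_t imm16 = (uint32_t)(imm >> (16 * hw)) & 0xFFFF;
		/* movz for the first halfword, movk for the others */
		uint32_t opcode = hw ? 0xf2800000u : 0xd2800000u;

		*ins++ = opcode | (hw << 21) | (imm16 << 5) | reg;
	}
	return ins;
}
```

Because the full address is carried in the instruction immediates, the
generated code needs no literal pool and no PC-relative load.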
To conserve memory, the kernel will pack as many trampolines as possible in
a page and provide a trampoline table to user code. The table itself is
managed by the user.
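As a worked example of the packing, the number of trampolines per page is
the page size divided by the per-trampoline code size. The sketch below
assumes a 4096-byte page and takes the sizes from the TRAMPFD_CODE_32_SIZE
and TRAMPFD_CODE_64_SIZE constants in the x86 and ARM64 patches of this
series. The enum and function names are made up for the illustration.

```c
#include <stddef.h>

/* Per-trampoline code sizes in bytes, copied from the arch patches. */
enum {
	X86_32_TRAMP_SIZE	= 24,	/* x86, 32-bit process */
	X86_64_TRAMP_SIZE	= 40,	/* x86, 64-bit process */
	ARM64_COMPAT_TRAMP_SIZE	= 28,	/* arm64, 32-bit (compat) process */
	ARM64_TRAMP_SIZE	= 48,	/* arm64, 64-bit process */
};

/* Number of whole trampolines that fit in one page. */
static size_t trampolines_per_page(size_t page_size, size_t code_size)
{
	return code_size ? page_size / code_size : 0;
}
```

With these inputs, a 4096-byte page holds 170 trampolines for a 32-bit x86
process, 102 for a 64-bit x86 process, 146 for an arm64 compat process and
85 for a 64-bit arm64 process.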
Trampoline File Descriptor (trampfd)
====================================
I am proposing a kernel API using anonymous file descriptors that can be
used to create the trampolines. The API is described in patch 1/4 of this
patchset. I provide a summary here:
- Create a trampoline file object
- Write a code descriptor into the trampoline file and specify:
    - the number of trampolines desired
    - the name of the code register
    - user pointer to a table of code addresses, one address
      per trampoline
- Write a data descriptor into the trampoline file and specify:
    - the name of the data register
    - user pointer to a table of data addresses, one address
      per trampoline
- mmap() the trampoline file. The kernel generates a table of
  trampolines in a page and returns the trampoline table address
- munmap() a trampoline file mapping
- Close the trampoline file
Each mmap() will only map a single base page. Large pages are not supported.
A trampoline file can only be mapped once in an address space.
Trampoline file mappings cannot be shared across address spaces. So,
sending the trampoline file descriptor over a unix domain socket and
mapping it in another process will not work.
It is recommended that the code descriptor and the code table be placed
in the .rodata section so an attacker cannot modify them.
Trampoline use and reuse
========================
The code for trampoline X in the trampoline table is:
    load    &code_table[X], code_reg
    load    (code_reg), code_reg
    load    &data_table[X], data_reg
    load    (data_reg), data_reg
    jump    code_reg
The addresses &code_table[X] and &data_table[X] are baked into the
trampoline code. So, PC-relative data references are not needed. The user
can modify code_table[X] and data_table[X] dynamically.
For instance, within libffi, the same trampoline X can be used for different
closures at different times by setting:
    data_table[X] = closure;
    code_table[X] = ABI handling code;
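The reuse pattern can be modelled in plain C. This is only a sketch of the
indirection, not the generated trampoline: a real trampoline jumps to the
code with the data pointer in the designated register, whereas the model
below passes it as a function argument. All names (NTRAMP, handler_fn,
trampoline_invoke(), trampoline_bind() and the two handlers) are
hypothetical.

```c
#include <stddef.h>

#define NTRAMP 4	/* hypothetical table size for the sketch */

typedef long (*handler_fn)(void *data);

static handler_fn code_table[NTRAMP];
static void *data_table[NTRAMP];

/*
 * C model of trampoline X: load code_table[X], load data_table[X],
 * then transfer control to the code with the data as its context.
 */
static long trampoline_invoke(size_t x)
{
	handler_fn code = code_table[x];
	void *data = data_table[x];

	return code(data);
}

struct closure {
	long value;
};

static long handler_double(void *data)
{
	return 2 * ((struct closure *)data)->value;
}

static long handler_negate(void *data)
{
	return -((struct closure *)data)->value;
}

/* Rebind slot X to a different closure and handler. */
static void trampoline_bind(size_t x, handler_fn code, void *data)
{
	data_table[x] = data;
	code_table[x] = code;
}
```

Binding slot 1 to one closure, invoking it, and then rebinding the same slot
to another closure changes what the slot does without touching the
trampoline code itself; only the two table entries change.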
Advantages of the Trampoline File Descriptor approach
=====================================================
- Using this support from the kernel, dynamic code can be converted to
static code with a little effort so applications and libraries can move to
a more secure model. In the simplest cases such as libffi, dynamic code can
even be eliminated.
- This initial work is targeted towards X86 and ARM. But it can be supported
easily on all architectures. We don't need any special ISA features such
as PC-relative data referencing.
- The only code generation needed is for this small, universal trampoline.
- The kernel does not have to deal with any ABI issues in the generation of
this trampoline.
- The kernel provides a trampoline table to conserve memory.
- An SELinux setting called "exectramp" can be implemented along the
lines of "execmem", "execstack" and "execheap" to selectively allow the
use of trampolines on a per application basis.
- In version 1, a trip to the kernel was required to execute the trampoline.
In version 2, that is not required. So, there are no performance
concerns in this approach.
libffi
======
I have implemented my solution for libffi and provided the changes for
X86 and ARM, 32-bit and 64-bit. Here is the reference patch:
http://linux.microsoft.com/~madvenka/libffi/libffi.v2.txt
If the trampfd patchset gets accepted, I will send the libffi changes
to the maintainers for a review. BTW, I have also successfully executed
the libffi self tests.
Work that is pending
====================
- I am working on implementing the SELinux setting - "exectramp".
- I have a test program to test the kernel API. I am working on adding it
to selftests.
References
==========
[1] https://microsoft.github.io/ipe/
---
Changelog:
v1
Introduced the Trampfd feature.
v2
- Changed the system call. Version 2 does not support different
trampoline types and their associated type structures. It only
supports a kernel generated trampoline.
The system call now returns information to the user that is
used to define trampoline descriptors. E.g., the maximum
number of trampolines that can be packed in a single page.
- Removed all the trampoline contexts such as register contexts
and stack contexts. This is based on the feedback that the kernel
should not have to worry about ABI issues and H/W features that
may deal with the context of a process.
- Removed the need to make a trip into the kernel on trampoline
invocation. This is based on the feedback about performance.
- Removed the ability to share trampolines across address spaces.
This would have made sense for different trampoline types based
on their semantics. But since I support only one specific
trampoline, sharing does not make sense.
- Added calls to specify trampoline descriptors that the kernel
uses to generate trampolines.
- Added architecture-specific code to generate the small, universal
trampoline for X86 32 and 64-bit, ARM 32 and 64-bit.
- Implemented the trampoline table in a page.
Madhavan T. Venkataraman (4):
Implement the kernel API for the trampoline file descriptor.
Implement i386 and X86 support for the trampoline file descriptor.
Implement ARM64 support for the trampoline file descriptor.
Implement ARM support for the trampoline file descriptor.
arch/arm/include/uapi/asm/ptrace.h | 21 +++
arch/arm/kernel/Makefile | 1 +
arch/arm/kernel/trampfd.c | 124 +++++++++++++
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/arm64/include/uapi/asm/ptrace.h | 59 ++++++
arch/arm64/kernel/Makefile | 2 +
arch/arm64/kernel/trampfd.c | 244 +++++++++++++++++++++++++
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/x86/include/uapi/asm/ptrace.h | 38 ++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/trampfd.c | 238 ++++++++++++++++++++++++
fs/Makefile | 1 +
fs/trampfd/Makefile | 5 +
fs/trampfd/trampfd_fops.c | 241 ++++++++++++++++++++++++
fs/trampfd/trampfd_map.c | 142 ++++++++++++++
include/linux/syscalls.h | 2 +
include/linux/trampfd.h | 49 +++++
include/uapi/asm-generic/unistd.h | 4 +-
include/uapi/linux/trampfd.h | 184 +++++++++++++++++++
init/Kconfig | 7 +
kernel/sys_ni.c | 3 +
24 files changed, 1371 insertions(+), 2 deletions(-)
create mode 100644 arch/arm/kernel/trampfd.c
create mode 100644 arch/arm64/kernel/trampfd.c
create mode 100644 arch/x86/kernel/trampfd.c
create mode 100644 fs/trampfd/Makefile
create mode 100644 fs/trampfd/trampfd_fops.c
create mode 100644 fs/trampfd/trampfd_map.c
create mode 100644 include/linux/trampfd.h
create mode 100644 include/uapi/linux/trampfd.h
--
2.17.1
From: "Madhavan T. Venkataraman" <[email protected]>
- Define architecture specific register names
- Architecture specific functions for:
- system call init
- code descriptor check
- data descriptor check
- Fill a page with a trampoline table for:
- 32-bit user process
- 64-bit user process
Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/arm64/include/uapi/asm/ptrace.h | 59 +++++++
arch/arm64/kernel/Makefile | 2 +
arch/arm64/kernel/trampfd.c | 244 +++++++++++++++++++++++++++
5 files changed, 308 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/kernel/trampfd.c
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 3b859596840d..b3b2019f8d16 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -38,7 +38,7 @@
#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
#define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)
-#define __NR_compat_syscalls 440
+#define __NR_compat_syscalls 441
#endif
#define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 6d95d0c8bf2f..c0493c5322d9 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -885,6 +885,8 @@ __SYSCALL(__NR_openat2, sys_openat2)
__SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd)
#define __NR_faccessat2 439
__SYSCALL(__NR_faccessat2, sys_faccessat2)
+#define __NR_trampfd 440
+__SYSCALL(__NR_trampfd, sys_trampfd)
/*
* Please add new compat syscalls above this comment and update
diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h
index 42cbe34d95ce..2778789c1cbe 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -88,6 +88,65 @@ struct user_pt_regs {
__u64 pstate;
};
+/*
+ * These register names are to be used by 32-bit applications.
+ */
+enum reg_32_name {
+ arm_min,
+ arm_r0 = arm_min,
+ arm_r1,
+ arm_r2,
+ arm_r3,
+ arm_r4,
+ arm_r5,
+ arm_r6,
+ arm_r7,
+ arm_r8,
+ arm_r9,
+ arm_r10,
+ arm_r11,
+ arm_r12,
+ arm_max,
+};
+
+/*
+ * These register names are to be used by 64-bit applications.
+ */
+enum reg_64_name {
+ arm64_min = arm_max,
+ arm64_r0 = arm64_min,
+ arm64_r1,
+ arm64_r2,
+ arm64_r3,
+ arm64_r4,
+ arm64_r5,
+ arm64_r6,
+ arm64_r7,
+ arm64_r8,
+ arm64_r9,
+ arm64_r10,
+ arm64_r11,
+ arm64_r12,
+ arm64_r13,
+ arm64_r14,
+ arm64_r15,
+ arm64_r16,
+ arm64_r17,
+ arm64_r18,
+ arm64_r19,
+ arm64_r20,
+ arm64_r21,
+ arm64_r22,
+ arm64_r23,
+ arm64_r24,
+ arm64_r25,
+ arm64_r26,
+ arm64_r27,
+ arm64_r28,
+ arm64_r29,
+ arm64_max,
+};
+
struct user_fpsimd_state {
__uint128_t vregs[32];
__u32 fpsr;
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index a561cbb91d4d..18d373fb1208 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -71,3 +71,5 @@ extra-y += $(head-y) vmlinux.lds
ifeq ($(CONFIG_DEBUG_EFI),y)
AFLAGS_head.o += -DVMLINUX_PATH="\"$(realpath $(objtree)/vmlinux)\""
endif
+
+obj-$(CONFIG_TRAMPFD) += trampfd.o
diff --git a/arch/arm64/kernel/trampfd.c b/arch/arm64/kernel/trampfd.c
new file mode 100644
index 000000000000..3b40ebb12907
--- /dev/null
+++ b/arch/arm64/kernel/trampfd.c
@@ -0,0 +1,244 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - ARM64 support.
+ *
+ * Author: Madhavan T. Venkataraman ([email protected])
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+
+#include <linux/thread_info.h>
+#include <asm/compat.h>
+#include <linux/trampfd.h>
+
+#define TRAMPFD_CODE_32_SIZE 28
+#define TRAMPFD_CODE_64_SIZE 48
+
+static inline bool is_compat(void)
+{
+ return is_compat_thread(task_thread_info(current));
+}
+
+/*
+ * trampfd syscall.
+ */
+void trampfd_arch(struct trampfd_info *info)
+{
+ if (is_compat())
+ info->code_size = TRAMPFD_CODE_32_SIZE;
+ else
+ info->code_size = TRAMPFD_CODE_64_SIZE;
+ info->ntrampolines = PAGE_SIZE / info->code_size;
+ info->code_offset = TRAMPFD_CODE_PGOFF << PAGE_SHIFT;
+ info->reserved = 0;
+}
+
+/*
+ * trampfd code descriptor check.
+ */
+int trampfd_code_arch(struct trampfd_code *code)
+{
+ int ntrampolines;
+ int min, max;
+
+ if (is_compat()) {
+ min = arm_min;
+ max = arm_max;
+ ntrampolines = PAGE_SIZE / TRAMPFD_CODE_32_SIZE;
+ } else {
+ min = arm64_min;
+ max = arm64_max;
+ ntrampolines = PAGE_SIZE / TRAMPFD_CODE_64_SIZE;
+ }
+
+ if (code->reg < min || code->reg >= max)
+ return -EINVAL;
+
+ if (!code->ntrampolines || code->ntrampolines > ntrampolines)
+ return -EINVAL;
+ return 0;
+}
+
+/*
+ * trampfd data descriptor check.
+ */
+int trampfd_data_arch(struct trampfd_data *data)
+{
+ int min, max;
+
+ if (is_compat()) {
+ min = arm_min;
+ max = arm_max;
+ } else {
+ min = arm64_min;
+ max = arm64_max;
+ }
+
+ if (data->reg < min || data->reg >= max)
+ return -EINVAL;
+ return 0;
+}
+
+#define MOVARM(ins, reg, imm32) \
+{ \
+ u16 *_imm16 = (u16 *) &(imm32); /* little endian */ \
+ int _hw, _opcode; \
+ \
+ for (_hw = 0; _hw < 2; _hw++) { \
+ /* movw or movt */ \
+ _opcode = _hw ? 0xe3400000 : 0xe3000000; \
+ *ins++ = _opcode | (_imm16[_hw] >> 12) << 16 | \
+ (reg) << 12 | (_imm16[_hw] & 0xFFF); \
+ } \
+}
+
+#define LDRARM(ins, reg) \
+{ \
+ *ins++ = 0xe5900000 | (reg) << 16 | (reg) << 12; \
+}
+
+#define BXARM(ins, reg) \
+{ \
+ *ins++ = 0xe12fff10 | (reg); \
+}
+
+static void trampfd_code_fill_32(struct trampfd *trampfd, char *addr)
+{
+ char *eaddr = addr + PAGE_SIZE;
+ int creg = trampfd->code_reg - arm_min;
+ int dreg = trampfd->data_reg - arm_min;
+ u32 *code = trampfd->code;
+ u32 *data = trampfd->data;
+ u32 *instruction = (u32 *) addr;
+ int i;
+
+ for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+ /*
+ * movw creg, code & 0xFFFF
+ * movt creg, code >> 16
+ */
+ MOVARM(instruction, creg, code);
+
+ /*
+ * ldr creg, [creg]
+ */
+ LDRARM(instruction, creg);
+
+ /*
+ * movw dreg, data & 0xFFFF
+ * movt dreg, data >> 16
+ */
+ MOVARM(instruction, dreg, data);
+
+ /*
+ * ldr dreg, [dreg]
+ */
+ LDRARM(instruction, dreg);
+
+ /*
+ * bx creg
+ */
+ BXARM(instruction, creg);
+ }
+ addr = (char *) instruction;
+ memset(addr, 0, eaddr - addr);
+}
+
+#define MOVQ(ins, reg, imm64) \
+{ \
+ u16 *_imm16 = (u16 *) &(imm64); /* little endian */ \
+ int _hw, _opcode; \
+ \
+ for (_hw = 0; _hw < 4; _hw++) { \
+ /* movz or movk */ \
+ _opcode = _hw ? 0xf2800000 : 0xd2800000; \
+ *ins++ = _opcode | _hw << 21 | _imm16[_hw] << 5 | (reg);\
+ } \
+}
+
+#define LDR(ins, reg) \
+{ \
+ *ins++ = 0xf9400000 | (reg) << 5 | (reg); \
+}
+
+#define BR(ins, reg) \
+{ \
+ *ins++ = 0xd61f0000 | (reg) << 5; \
+}
+
+#define PAD(ins) \
+{ \
+ while ((uintptr_t) ins & 7) \
+ *ins++ = 0; \
+}
+
+static void trampfd_code_fill_64(struct trampfd *trampfd, char *addr)
+{
+ char *eaddr = addr + PAGE_SIZE;
+ int creg = trampfd->code_reg - arm64_min;
+ int dreg = trampfd->data_reg - arm64_min;
+ u64 *code = trampfd->code;
+ u64 *data = trampfd->data;
+ u32 *instruction = (u32 *) addr;
+ int i;
+
+ for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+ /*
+ * Pseudo instruction:
+ *
+ * movq creg, code
+ *
+ * Actual instructions:
+ *
+ * movz creg, code & 0xFFFF
+ * movk creg, (code >> 16) & 0xFFFF, lsl 16
+ * movk creg, (code >> 32) & 0xFFFF, lsl 32
+ * movk creg, (code >> 48) & 0xFFFF, lsl 48
+ */
+ MOVQ(instruction, creg, code);
+
+ /*
+ * ldr creg, [creg]
+ */
+ LDR(instruction, creg);
+
+ /*
+ * Pseudo instruction:
+ *
+ * movq dreg, data
+ *
+ * Actual instructions:
+ *
+ * movz dreg, data & 0xFFFF
+ * movk dreg, (data >> 16) & 0xFFFF, lsl 16
+ * movk dreg, (data >> 32) & 0xFFFF, lsl 32
+ * movk dreg, (data >> 48) & 0xFFFF, lsl 48
+ */
+ MOVQ(instruction, dreg, data);
+
+ /*
+ * ldr dreg, [dreg]
+ */
+ LDR(instruction, dreg);
+
+ /*
+ * br creg
+ */
+ BR(instruction, creg);
+
+ /*
+ * Pad to 8-byte boundary
+ */
+ PAD(instruction);
+ }
+ addr = (char *) instruction;
+ memset(addr, 0, eaddr - addr);
+}
+
+void trampfd_code_fill(struct trampfd *trampfd, char *addr)
+{
+ if (is_compat())
+ trampfd_code_fill_32(trampfd, addr);
+ else
+ trampfd_code_fill_64(trampfd, addr);
+}
--
2.17.1
From: "Madhavan T. Venkataraman" <[email protected]>
- Define architecture specific register names
- Architecture specific functions for:
- system call init
- code descriptor check
- data descriptor check
- Fill a page with a trampoline table for:
- 32-bit user process
- 64-bit user process
Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/x86/include/uapi/asm/ptrace.h | 38 ++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/trampfd.c | 238 +++++++++++++++++++++++++
5 files changed, 279 insertions(+)
create mode 100644 arch/x86/kernel/trampfd.c
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index d8f8a1a69ed1..d4f17806c9ab 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -443,3 +443,4 @@
437 i386 openat2 sys_openat2
438 i386 pidfd_getfd sys_pidfd_getfd
439 i386 faccessat2 sys_faccessat2
+440 i386 trampfd sys_trampfd
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 78847b32e137..91b37bc4b6f0 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -360,6 +360,7 @@
437 common openat2 sys_openat2
438 common pidfd_getfd sys_pidfd_getfd
439 common faccessat2 sys_faccessat2
+440 common trampfd sys_trampfd
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/x86/include/uapi/asm/ptrace.h b/arch/x86/include/uapi/asm/ptrace.h
index 85165c0edafc..b4be362929b3 100644
--- a/arch/x86/include/uapi/asm/ptrace.h
+++ b/arch/x86/include/uapi/asm/ptrace.h
@@ -9,6 +9,44 @@
#ifndef __ASSEMBLY__
+/*
+ * These register names are to be used by 32-bit applications.
+ */
+enum reg_32_name {
+ x32_min = 0,
+ x32_eax = x32_min,
+ x32_ebx,
+ x32_ecx,
+ x32_edx,
+ x32_esi,
+ x32_edi,
+ x32_ebp,
+ x32_max,
+};
+
+/*
+ * These register names are to be used by 64-bit applications.
+ */
+enum reg_64_name {
+ x64_min = x32_max,
+ x64_rax = x64_min,
+ x64_rbx,
+ x64_rcx,
+ x64_rdx,
+ x64_rsi,
+ x64_rdi,
+ x64_rbp,
+ x64_r8,
+ x64_r9,
+ x64_r10,
+ x64_r11,
+ x64_r12,
+ x64_r13,
+ x64_r14,
+ x64_r15,
+ x64_max,
+};
+
#ifdef __i386__
/* this struct defines the way the registers are stored on the
stack during a system call. */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index e77261db2391..feb7f4f311fd 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -157,3 +157,4 @@ ifeq ($(CONFIG_X86_64),y)
endif
obj-$(CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT) += ima_arch.o
+obj-$(CONFIG_TRAMPFD) += trampfd.o
diff --git a/arch/x86/kernel/trampfd.c b/arch/x86/kernel/trampfd.c
new file mode 100644
index 000000000000..7b812c200d01
--- /dev/null
+++ b/arch/x86/kernel/trampfd.c
@@ -0,0 +1,238 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Trampoline FD - X86 support.
+ *
+ * Author: Madhavan T. Venkataraman ([email protected])
+ *
+ * Copyright (c) 2020, Microsoft Corporation.
+ */
+
+#include <linux/thread_info.h>
+#include <linux/trampfd.h>
+
+#define TRAMPFD_CODE_32_SIZE 24
+#define TRAMPFD_CODE_64_SIZE 40
+
+static inline bool is_compat(void)
+{
+ return (IS_ENABLED(CONFIG_X86_32) ||
+ (IS_ENABLED(CONFIG_COMPAT) && test_thread_flag(TIF_ADDR32)));
+}
+
+/*
+ * trampfd syscall.
+ */
+void trampfd_arch(struct trampfd_info *info)
+{
+ if (is_compat())
+ info->code_size = TRAMPFD_CODE_32_SIZE;
+ else
+ info->code_size = TRAMPFD_CODE_64_SIZE;
+ info->ntrampolines = PAGE_SIZE / info->code_size;
+ info->code_offset = TRAMPFD_CODE_PGOFF << PAGE_SHIFT;
+ info->reserved = 0;
+}
+
+/*
+ * trampfd code descriptor check.
+ */
+int trampfd_code_arch(struct trampfd_code *code)
+{
+ int ntrampolines;
+ int min, max;
+
+ if (is_compat()) {
+ min = x32_min;
+ max = x32_max;
+ ntrampolines = PAGE_SIZE / TRAMPFD_CODE_32_SIZE;
+ } else {
+ min = x64_min;
+ max = x64_max;
+ ntrampolines = PAGE_SIZE / TRAMPFD_CODE_64_SIZE;
+ }
+
+ if (code->reg < min || code->reg >= max)
+ return -EINVAL;
+
+ if (!code->ntrampolines || code->ntrampolines > ntrampolines)
+ return -EINVAL;
+ return 0;
+}
+
+/*
+ * trampfd data descriptor check.
+ */
+int trampfd_data_arch(struct trampfd_data *data)
+{
+ int min, max;
+
+ if (is_compat()) {
+ min = x32_min;
+ max = x32_max;
+ } else {
+ min = x64_min;
+ max = x64_max;
+ }
+
+ if (data->reg < min || data->reg >= max)
+ return -EINVAL;
+ return 0;
+}
+
+/*
+ * X32 register encodings.
+ */
+static unsigned char reg_32[] = {
+ 0, /* x32_eax */
+ 3, /* x32_ebx */
+ 1, /* x32_ecx */
+ 2, /* x32_edx */
+ 6, /* x32_esi */
+ 7, /* x32_edi */
+ 5, /* x32_ebp */
+};
+
+static void trampfd_code_fill_32(struct trampfd *trampfd, char *addr)
+{
+ char *eaddr = addr + PAGE_SIZE;
+ int creg = trampfd->code_reg - x32_min;
+ int dreg = trampfd->data_reg - x32_min;
+ u32 *code = trampfd->code;
+ u32 *data = trampfd->data;
+ int i;
+
+ for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+ /* endbr32 */
+ addr[0] = 0xf3;
+ addr[1] = 0x0f;
+ addr[2] = 0x1e;
+ addr[3] = 0xfb;
+
+ /* mov code, %creg */
+ addr[4] = 0xB8 | reg_32[creg]; /* opcode+reg */
+ memcpy(&addr[5], &code, sizeof(u32)); /* imm32 */
+
+ /* mov (%creg), %creg */
+ addr[9] = 0x8B; /* opcode */
+ addr[10] = 0x00 | /* MODRM.mode */
+ reg_32[creg] << 3 | /* MODRM.reg */
+ reg_32[creg]; /* MODRM.r/m */
+
+ /* mov data, %dreg */
+ addr[11] = 0xB8 | reg_32[dreg]; /* opcode+reg */
+ memcpy(&addr[12], &data, sizeof(u32)); /* imm32 */
+
+ /* mov (%dreg), %dreg */
+ addr[16] = 0x8B; /* opcode */
+ addr[17] = 0x00 | /* MODRM.mode */
+ reg_32[dreg] << 3 | /* MODRM.reg */
+ reg_32[dreg]; /* MODRM.r/m */
+
+ /* jmp *%creg */
+ addr[18] = 0xff; /* opcode */
+ addr[19] = 0xe0 | reg_32[creg]; /* MODRM.r/m */
+
+ /* nopl (%eax) */
+ addr[20] = 0x0f;
+ addr[21] = 0x1f;
+ addr[22] = 0x00;
+
+ /* pad to 4-byte boundary */
+ memset(&addr[23], 0, TRAMPFD_CODE_32_SIZE - 23);
+ addr += TRAMPFD_CODE_32_SIZE;
+ }
+ memset(addr, 0, eaddr - addr);
+}
+
+/*
+ * X64 register encodings.
+ */
+static unsigned char reg_64[] = {
+ 0, /* x64_rax */
+ 3, /* x64_rbx */
+ 1, /* x64_rcx */
+ 2, /* x64_rdx */
+ 6, /* x64_rsi */
+ 7, /* x64_rdi */
+ 5, /* x64_rbp */
+ 8, /* x64_r8 */
+ 9, /* x64_r9 */
+ 10, /* x64_r10 */
+ 11, /* x64_r11 */
+ 12, /* x64_r12 */
+ 13, /* x64_r13 */
+ 14, /* x64_r14 */
+ 15, /* x64_r15 */
+};
+
+static void trampfd_code_fill_64(struct trampfd *trampfd, char *addr)
+{
+ char *eaddr = addr + PAGE_SIZE;
+ int creg = trampfd->code_reg - x64_min;
+ int dreg = trampfd->data_reg - x64_min;
+ u64 *code = trampfd->code;
+ u64 *data = trampfd->data;
+ int i;
+
+ for (i = 0; i < trampfd->ntrampolines; i++, code++, data++) {
+ /* endbr64 */
+ addr[0] = 0xf3;
+ addr[1] = 0x0f;
+ addr[2] = 0x1e;
+ addr[3] = 0xfa;
+
+ /* movabs code, %creg */
+ addr[4] = 0x48 | /* REX.W */
+ ((reg_64[creg] & 0x8) >> 3); /* REX.B */
+ addr[5] = 0xB8 | (reg_64[creg] & 0x7); /* opcode+reg */
+ memcpy(&addr[6], &code, sizeof(u64)); /* imm64 */
+
+ /* movq (%creg), %creg */
+ addr[14] = 0x48 | /* REX.W */
+ ((reg_64[creg] & 0x8) >> 1) | /* REX.R */
+ ((reg_64[creg] & 0x8) >> 3); /* REX.B */
+ addr[15] = 0x8B; /* opcode */
+ addr[16] = 0x00 | /* MODRM.mode */
+ ((reg_64[creg] & 0x7)) << 3 | /* MODRM.reg */
+ ((reg_64[creg] & 0x7)); /* MODRM.r/m */
+
+ /* movabs data, %dreg */
+ addr[17] = 0x48 | /* REX.W */
+ ((reg_64[dreg] & 0x8) >> 3); /* REX.B */
+ addr[18] = 0xB8 | (reg_64[dreg] & 0x7); /* opcode+reg */
+ memcpy(&addr[19], &data, sizeof(u64)); /* imm64 */
+
+ /* movq (%dreg), %dreg */
+ addr[27] = 0x48 | /* REX.W */
+ ((reg_64[dreg] & 0x8) >> 1) | /* REX.R */
+ ((reg_64[dreg] & 0x8) >> 3); /* REX.B */
+ addr[28] = 0x8B; /* opcode */
+ addr[29] = 0x00 | /* MODRM.mode */
+ ((reg_64[dreg] & 0x7)) << 3 | /* MODRM.reg */
+ ((reg_64[dreg] & 0x7)); /* MODRM.r/m */
+
+ /* jmpq *%creg */
+ addr[30] = 0x40 | /* REX */
+ ((reg_64[creg] & 0x8) >> 3); /* REX.B */
+ addr[31] = 0xff; /* opcode */
+ addr[32] = 0xe0 | (reg_64[creg] & 0x7); /* MODRM.r/m */
+
+ /* nopl (%rax) */
+ addr[33] = 0x0f;
+ addr[34] = 0x1f;
+ addr[35] = 0x00;
+
+ /* pad to 8-byte boundary */
+ memset(&addr[36], 0, TRAMPFD_CODE_64_SIZE - 36);
+ addr += TRAMPFD_CODE_64_SIZE;
+ }
+ memset(addr, 0, eaddr - addr);
+}
+
+void trampfd_code_fill(struct trampfd *trampfd, char *addr)
+{
+ if (is_compat())
+ trampfd_code_fill_32(trampfd, addr);
+ else
+ trampfd_code_fill_64(trampfd, addr);
+}
--
2.17.1
I just resent the trampfd v2 RFC. I forgot to CC the reviewers who provided comments before.
So sorry.
Madhavan
On 9/22/20 4:53 PM, [email protected] wrote:
> From: "Madhavan T. Venkataraman" <[email protected]>
>
> Introduction
> ============
>
> Dynamic code is used in many different user applications. Dynamic code is
> often generated at runtime. Dynamic code can also just be a pre-defined
> sequence of machine instructions in a data buffer. Examples of dynamic
> code are trampolines, JIT code, DBT code, etc.
>
> Dynamic code is placed either in a data page or in a stack page. In order
> to execute dynamic code, the page it resides in needs to be mapped with
> execute permissions. Writable pages with execute permissions provide an
> attack surface for hackers. Attackers can use this to inject malicious
> code, modify existing code or do other harm.
>
> To mitigate this, LSMs such as SELinux implement W^X. That is, they may not
> allow pages to have both write and execute permissions. This prevents
> dynamic code from executing and blocks applications that use it. To allow
> genuine applications to run, exceptions have to be made for them (by setting
> execmem, etc) which opens the door to security issues.
>
> The W^X implementation today is not complete. There exist many user level
> tricks that can be used to load and execute dynamic code. E.g.,
>
> - Load the code into a file and map the file with R-X.
>
> - Load the code in an RW- page. Change the permissions to R--. Then,
> change the permissions to R-X.
>
> - Load the code in an RW- page. Remap the page with R-X to get a separate
> mapping to the same underlying physical page.
>
> IMO, these are all security holes as an attacker can exploit them to inject
> his own code.
>
> In the future, these holes will definitely be closed. For instance, LSMs
> (such as the IPE proposal [1]) may only allow code in properly signed object
> files to be mapped with execute permissions. This will do two things:
>
> - user level tricks using anonymous pages will fail as anonymous
> pages have no file identity
>
> - loading the code in a temporary file and mapping it with R-X
> will fail as the temporary file would not have a signature
>
> We need a way to execute such code without making security exceptions.
> Trampolines are a good example of dynamic code. A couple of examples
> of trampolines are given below. My first use case for this RFC is
> libffi.
>
> Examples of trampolines
> =======================
>
> libffi (A Portable Foreign Function Interface Library):
>
> libffi allows a user to define functions with an arbitrary list of
> arguments and return value through a feature called "Closures".
> Closures use trampolines to jump to ABI handlers that handle calling
> conventions and call a target function. libffi is used by a lot
> of different applications. To name a few:
>
> - Python
> - Java
> - Javascript
> - Ruby FFI
> - Lisp
> - Objective C
>
> GCC nested functions:
>
> GCC has traditionally used trampolines for implementing nested
> functions. The trampoline is placed on the user stack. So, the stack
> needs to be executable.
>
> Currently available solution
> ============================
>
> One solution that has been proposed to allow trampolines to be executed
> without making security exceptions is Trampoline Emulation. See:
>
> https://pax.grsecurity.net/docs/emutramp.txt
>
> In this solution, the kernel recognizes certain sequences of instructions
> as "well-known" trampolines. When such a trampoline is executed, a page
> fault happens because the trampoline page does not have execute permission.
> The kernel recognizes the trampoline and emulates it. Basically, the
> kernel does the work of the trampoline on behalf of the application.
>
> Currently, the emulated trampolines are the ones used in libffi and GCC
> nested functions. To my knowledge, only X86 is supported at this time.
>
> As noted in emutramp.txt, this is not a generic solution. For every new
> trampoline that needs to be supported, new instruction sequences need to
> be recognized by the kernel and emulated. And this has to be done for
> every architecture that needs to be supported.
>
> emutramp.txt notes the following:
>
> "... the real solution is not in emulation but by designing a kernel API
> for runtime code generation and modifying userland to make use of it."
>
> Solution proposed in this RFC
> =============================
>
> From this RFC's perspective, there are two scenarios for dynamic code:
>
> Scenario 1
> ----------
>
> We know what code we need only at runtime. For instance, JIT code generated
> for frequently executed Java methods. Only at runtime do we know what
> methods need to be JIT compiled. Such code cannot be statically defined. It
> has to be generated at runtime.
>
> Scenario 2
> ----------
>
> We know what code we need in advance. User trampolines are a good example of
> this. It is possible to define such code statically with some help from the
> kernel.
>
> This RFC addresses (2). (1) needs a general purpose trusted code generator
> and is out of scope for this RFC.
>
> For (2), the solution is to convert dynamic code to static code and place it
> in a source file. The binary generated from the source can be signed. The
> kernel can use signature verification to authenticate the binary and
> allow the code to be mapped and executed.
>
> The problem is that the static code has to be able to find the data that it
> needs when it executes. For functions, the ABI defines the way to pass
> parameters. But, for arbitrary dynamic code, there isn't a standard ABI
> compliant way to pass data to the code for most architectures. Each instance
> of dynamic code defines its own way. For instance, co-location of code and
> data and PC-relative data referencing are used in cases where the ISA
> supports it.
>
> We need one standard way that would work for all architectures and ABIs.
>
> The solution proposed here is:
>
> 1. Write the static code assuming that the data needed by the code is already
> pointed to by a designated register.
>
> 2. Get the kernel to supply a small universal trampoline that does the
> following:
>
> - Load the address of the data in a designated register
> - Load the address of the static code in a designated register
> - Jump to the static code
>
> User code would use a kernel supplied API to create and map the trampoline.
> The address values would be baked into the code so that no special ISA
> features are needed.
>
> To conserve memory, the kernel will pack as many trampolines as possible in
> a page and provide a trampoline table to user code. The table itself is
> managed by the user.
>
> Trampoline File Descriptor (trampfd)
> ====================================
>
> I am proposing a kernel API using anonymous file descriptors that can be
> used to create the trampolines. The API is described in patch 1/4 of this
> patchset. I provide a summary here:
>
> - Create a trampoline file object
>
> - Write a code descriptor into the trampoline file and specify:
>
> - the number of trampolines desired
> - the name of the code register
> - user pointer to a table of code addresses, one address
> per trampoline
>
> - Write a data descriptor into the trampoline file and specify:
>
> - the name of the data register
> - user pointer to a table of data addresses, one address
> per trampoline
>
> - mmap() the trampoline file. The kernel generates a table of
> trampolines in a page and returns the trampoline table address
>
> - munmap() a trampoline file mapping
>
> - Close the trampoline file
>
> Each mmap() will only map a single base page. Large pages are not supported.
>
> A trampoline file can only be mapped once in an address space.
>
> Trampoline file mappings cannot be shared across address spaces. So,
> sending the trampoline file descriptor over a unix domain socket and
> mapping it in another process will not work.
>
> It is recommended that the code descriptor and the code table be placed
> in the .rodata section so an attacker cannot modify them.
>
> Trampoline use and reuse
> ========================
>
> The code for trampoline X in the trampoline table is:
>
> load &code_table[X], code_reg
> load (code_reg), code_reg
> load &data_table[X], data_reg
> load (data_reg), data_reg
> jump code_reg
>
> The addresses &code_table[X] and &data_table[X] are baked into the
> trampoline code. So, PC-relative data references are not needed. The user
> can modify code_table[X] and data_table[X] dynamically.
>
> For instance, within libffi, the same trampoline X can be used for different
> closures at different times by setting:
>
> data_table[X] = closure;
> code_table[X] = ABI handling code;
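To make the indirection concrete, here is a minimal user-space C model of the
trampoline table described above. It is only a sketch: invoke_trampoline()
stands in for the kernel-generated trampoline, the data pointer is passed as
an ordinary argument rather than in a designated register, and NTRAMP,
struct closure, abi_handler() and bind_and_call() are hypothetical names that
do not come from the patchset.

```c
#include <assert.h>
#include <stddef.h>

#define NTRAMP 4   /* hypothetical table size for this sketch */

/* Static code takes its data pointer as an argument here; in the RFC it
 * would arrive in the designated data register instead. */
typedef long (*static_code_fn)(void *data);

/* User-managed tables, one slot per trampoline. */
static static_code_fn code_table[NTRAMP];
static void *data_table[NTRAMP];

/* Models trampoline X: load code_table[X], load data_table[X], jump. */
static long invoke_trampoline(size_t x)
{
    static_code_fn code = code_table[x];
    void *data = data_table[x];
    return code(data);
}

/* Hypothetical stand-in for a libffi closure. */
struct closure {
    long addend;
};

/* Hypothetical stand-in for the ABI handling code. */
static long abi_handler(void *data)
{
    const struct closure *c = data;
    return c->addend + 1;
}

/* Rebind slot x to a closure, then go through the trampoline. */
static long bind_and_call(size_t x, struct closure *c)
{
    data_table[x] = c;
    code_table[x] = abi_handler;
    return invoke_trampoline(x);
}
```

Calling bind_and_call(2, &a) and later bind_and_call(2, &b) reuses slot 2 for
two different closures by changing table entries only, which is the reuse
property the cover letter describes.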
>
> Advantages of the Trampoline File Descriptor approach
> =====================================================
>
> - Using this support from the kernel, dynamic code can be converted to
> static code with a little effort so applications and libraries can move to
> a more secure model. In the simplest cases such as libffi, dynamic code can
> even be eliminated.
>
> - This initial work is targeted towards X86 and ARM. But it can be supported
> easily on all architectures. We don't need any special ISA features such
> as PC-relative data referencing.
>
> - The only code generation needed is for this small, universal trampoline.
>
> - The kernel does not have to deal with any ABI issues in the generation of
> this trampoline.
>
> - The kernel provides a trampoline table to conserve memory.
>
> - An SELinux setting called "exectramp" can be implemented along the
> lines of "execmem", "execstack" and "execheap" to selectively allow the
> use of trampolines on a per application basis.
>
> - In version 1, a trip to the kernel was required to execute the trampoline.
> In version 2, that is not required. So, there are no performance
> concerns in this approach.
>
> libffi
> ======
>
> I have implemented my solution for libffi and provided the changes for
> X86 and ARM, 32-bit and 64-bit. Here is the reference patch:
>
> http://linux.microsoft.com/~madvenka/libffi/libffi.v2.txt
>
> If the trampfd patchset gets accepted, I will send the libffi changes
> to the maintainers for a review. BTW, I have also successfully executed
> the libffi self tests.
>
> Work that is pending
> ====================
>
> - I am working on implementing the SELinux setting - "exectramp".
>
> - I have a test program to test the kernel API. I am working on adding it
> to selftests.
>
> References
> ==========
>
> [1] https://microsoft.github.io/ipe/
> ---
>
> Changelog:
>
> v1
> Introduced the Trampfd feature.
>
> v2
> - Changed the system call. Version 2 does not support different
> trampoline types and their associated type structures. It only
> supports a kernel generated trampoline.
>
> The system call now returns information to the user that is
> used to define trampoline descriptors. E.g., the maximum
> number of trampolines that can be packed in a single page.
>
> - Removed all the trampoline contexts such as register contexts
> and stack contexts. This is based on the feedback that the kernel
> should not have to worry about ABI issues and H/W features that
> may deal with the context of a process.
>
> - Removed the need to make a trip into the kernel on trampoline
> invocation. This is based on the feedback about performance.
>
> - Removed the ability to share trampolines across address spaces.
> This would have made sense to different trampoline types based
> on their semantics. But since I support only one specific
> trampoline, sharing does not make sense.
>
> - Added calls to specify trampoline descriptors that the kernel
> uses to generate trampolines.
>
> - Added architecture-specific code to generate the small, universal
> trampoline for X86 32 and 64-bit, ARM 32 and 64-bit.
>
> - Implemented the trampoline table in a page.
> Madhavan T. Venkataraman (4):
> Implement the kernel API for the trampoline file descriptor.
> Implement i386 and X86 support for the trampoline file descriptor.
> Implement ARM64 support for the trampoline file descriptor.
> Implement ARM support for the trampoline file descriptor.
>
> arch/arm/include/uapi/asm/ptrace.h | 21 +++
> arch/arm/kernel/Makefile | 1 +
> arch/arm/kernel/trampfd.c | 124 +++++++++++++
> arch/arm/tools/syscall.tbl | 1 +
> arch/arm64/include/asm/unistd.h | 2 +-
> arch/arm64/include/asm/unistd32.h | 2 +
> arch/arm64/include/uapi/asm/ptrace.h | 59 ++++++
> arch/arm64/kernel/Makefile | 2 +
> arch/arm64/kernel/trampfd.c | 244 +++++++++++++++++++++++++
> arch/x86/entry/syscalls/syscall_32.tbl | 1 +
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> arch/x86/include/uapi/asm/ptrace.h | 38 ++++
> arch/x86/kernel/Makefile | 1 +
> arch/x86/kernel/trampfd.c | 238 ++++++++++++++++++++++++
> fs/Makefile | 1 +
> fs/trampfd/Makefile | 5 +
> fs/trampfd/trampfd_fops.c | 241 ++++++++++++++++++++++++
> fs/trampfd/trampfd_map.c | 142 ++++++++++++++
> include/linux/syscalls.h | 2 +
> include/linux/trampfd.h | 49 +++++
> include/uapi/asm-generic/unistd.h | 4 +-
> include/uapi/linux/trampfd.h | 184 +++++++++++++++++++
> init/Kconfig | 7 +
> kernel/sys_ni.c | 3 +
> 24 files changed, 1371 insertions(+), 2 deletions(-)
> create mode 100644 arch/arm/kernel/trampfd.c
> create mode 100644 arch/arm64/kernel/trampfd.c
> create mode 100644 arch/x86/kernel/trampfd.c
> create mode 100644 fs/trampfd/Makefile
> create mode 100644 fs/trampfd/trampfd_fops.c
> create mode 100644 fs/trampfd/trampfd_map.c
> create mode 100644 include/linux/trampfd.h
> create mode 100644 include/uapi/linux/trampfd.h
>
Hi!
> Introduction
> ============
>
> Dynamic code is used in many different user applications. Dynamic code is
> often generated at runtime. Dynamic code can also just be a pre-defined
> sequence of machine instructions in a data buffer. Examples of dynamic
> code are trampolines, JIT code, DBT code, etc.
>
> Dynamic code is placed either in a data page or in a stack page. In order
> to execute dynamic code, the page it resides in needs to be mapped with
> execute permissions. Writable pages with execute permissions provide an
> attack surface for hackers. Attackers can use this to inject malicious
> code, modify existing code or do other harm.
>
> To mitigate this, LSMs such as SELinux implement W^X. That is, they may not
> allow pages to have both write and execute permissions. This prevents
> dynamic code from executing and blocks applications that use it. To allow
> genuine applications to run, exceptions have to be made for them (by setting
> execmem, etc) which opens the door to security issues.
>
> The W^X implementation today is not complete. There exist many user level
> tricks that can be used to load and execute dynamic code. E.g.,
>
> - Load the code into a file and map the file with R-X.
>
> - Load the code in an RW- page. Change the permissions to R--. Then,
> change the permissions to R-X.
>
> - Load the code in an RW- page. Remap the page with R-X to get a separate
> mapping to the same underlying physical page.
>
> IMO, these are all security holes as an attacker can exploit them to inject
> his own code.
IMO, you are smoking crack^H^H very seriously misunderstanding what
W^X is supposed to protect from.
W^X is not supposed to protect you from attackers that can already do
system calls. So loading code into a file then mapping the file as R-X
is in no way a security hole in W^X.
If you want to provide protection from attackers that _can_ do system
calls, fine, but please don't talk about W^X and please specify what
types of attacks you want to prevent and why that's a good thing.
Hint: an attacker that can "Load the code into a file and map the file
with R-X." can probably also load the code into /foo and
os.system("/usr/bin/python /foo").
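For reference, the file-based trick under discussion needs nothing beyond
ordinary system calls. The following C sketch (map_file_rx() and the /tmp
path are made up for illustration) writes some bytes to a temporary file and
asks for a private R-X mapping of it. It never jumps to the mapping, and
whether the mmap() succeeds depends on the system: a noexec mount or an LSM
policy may refuse it.

```c
#define _POSIX_C_SOURCE 200809L   /* for mkstemp() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if the bytes could be mapped R-X from a file,
 * or a negative errno value otherwise. */
static int map_file_rx(const unsigned char *code, size_t len)
{
    char path[] = "/tmp/dyncode-XXXXXX";   /* hypothetical location */
    int rc = 0;
    int fd = mkstemp(path);

    if (fd < 0)
        return -errno;
    unlink(path);                          /* file lives on via the fd */

    ssize_t n = write(fd, code, len);
    if (n < 0) {
        rc = -errno;
    } else if ((size_t)n != len) {
        rc = -EIO;                         /* short write */
    } else {
        /* The step under discussion: a file mapping with R-X. */
        void *p = mmap(NULL, len, PROT_READ | PROT_EXEC,
                       MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            rc = -errno;
        } else {
            /* Only read the mapping back; never execute it. */
            if (memcmp(p, code, len) != 0)
                rc = -EIO;
            munmap(p, len);
        }
    }
    close(fd);
    return rc;
}
```

A return value of 0 means the kernel handed out the R-X mapping; -EPERM or
-EACCES means some policy on the system refused it.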
This is not the first crazy patch from your company. Perhaps you should
have a person with strong Unix/Linux experience performing a "straight
face test" on outgoing patches?
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Hi!
> Solution proposed in this RFC
> =============================
>
> From this RFC's perspective, there are two scenarios for dynamic code:
>
> Scenario 1
> ----------
>
> We know what code we need only at runtime. For instance, JIT code generated
> for frequently executed Java methods. Only at runtime do we know what
> methods need to be JIT compiled. Such code cannot be statically defined. It
> has to be generated at runtime.
>
> Scenario 2
> ----------
>
> We know what code we need in advance. User trampolines are a good example of
> this. It is possible to define such code statically with some help from the
> kernel.
>
> This RFC addresses (2). (1) needs a general purpose trusted code generator
> and is out of scope for this RFC.
This is slightly less crazy talk than the introduction talking about holes
in W^X. But it is very, very far from a normal Unix system, where you
have a selection of interpreters to run your malware on (sh, python,
awk, emacs, ...) and often you can even compile malware from sources.
And as you noted, we don't have "a general purpose trusted code
generator" for our systems.
I believe you should simply delete the confusing "introduction" and
provide details of the super-secure system where your patches would be
useful, instead.
Best regards,
Pavel
On Wed, Sep 23, 2020 at 10:14:26AM +0200, Pavel Machek wrote:
> > Introduction
> > ============
> >
> > Dynamic code is used in many different user applications. Dynamic code is
> > often generated at runtime. Dynamic code can also just be a pre-defined
> > sequence of machine instructions in a data buffer. Examples of dynamic
> > code are trampolines, JIT code, DBT code, etc.
> >
> > Dynamic code is placed either in a data page or in a stack page. In order
> > to execute dynamic code, the page it resides in needs to be mapped with
> > execute permissions. Writable pages with execute permissions provide an
> > attack surface for hackers. Attackers can use this to inject malicious
> > code, modify existing code or do other harm.
> >
> > To mitigate this, LSMs such as SELinux implement W^X. That is, they may not
> > allow pages to have both write and execute permissions. This prevents
> > dynamic code from executing and blocks applications that use it. To allow
> > genuine applications to run, exceptions have to be made for them (by setting
> > execmem, etc) which opens the door to security issues.
> >
> > The W^X implementation today is not complete. There exist many user level
> > tricks that can be used to load and execute dynamic code. E.g.,
> >
> > - Load the code into a file and map the file with R-X.
> >
> > - Load the code in an RW- page. Change the permissions to R--. Then,
> > change the permissions to R-X.
> >
> > - Load the code in an RW- page. Remap the page with R-X to get a separate
> > mapping to the same underlying physical page.
> >
> > IMO, these are all security holes as an attacker can exploit them to inject
> > his own code.
>
> IMO, you are smoking crack^H^H very seriously misunderstanding what
> W^X is supposed to protect from.
>
> W^X is not supposed to protect you from attackers that can already do
> system calls. So loading code into a file then mapping the file as R-X
> is in no way a security hole in W^X.
>
> If you want to provide protection from attackers that _can_ do system
> calls, fine, but please don't talk about W^X and please specify what
> types of attacks you want to prevent and why that's a good thing.
On one hand, Pavel is absolutely right. It is ridiculous to say that
"these are all security holes as an attacker can exploit them to inject
his own code."
On the other hand, "what W^X is supposed to protect from" depends on how
the term W^X is defined (historically, by PaX and OpenBSD). It may be
that W^X is partially not a feature to defeat attacks per se, but also a
policy enforcement feature preventing use of dangerous techniques (JIT).
Such policy might or might not make sense. It might make sense for ease
of reasoning, e.g. "I've flipped this setting, and now I'm certain the
system doesn't have JIT within a process (can still have it through
dynamically creating and invoking an entire new program), so there are
no opportunities for an attacker to inject code nor generate previously
non-existing ROP gadgets into an executable mapping within a process."
I do find it questionable whether such policy and such reasoning make
sense beyond academia.
Then, there might be even more ways in which W^X is not perfect enough
to enable such reasoning. What about using ptrace(2) to inject code?
Should enabling W^X also disable ability to debug programs by non-root?
We already have Yama ptrace_scope, which can achieve that at the highest
setting, although that's rather inconvenient and is probably unexpected
by most to be a requirement for having (ridiculously?) full W^X allowing
for the academic reasoning.
Personally, I am for policies that make more practical sense. For
example, years ago I advocated here on kernel-hardening that we should
have a mode where ELF flags enabling/disabling executable stack are
ignored, and non-executable stack is always enforced. This should also
be extended to default (at program startup) permissions on more than
just stack (but also on .bss, typical libcs' heap allocations, etc.)
However, I am not convinced there's enough value in extending the policy
to restricting explicit uses of mprotect(2).
Yes, PaX did that, and its emutramp.txt said "runtime code generation is
by its nature incompatible with PaX's PAGEEXEC/SEGMEXEC and MPROTECT
features, therefore the real solution is not in emulation but by
designing a kernel API for runtime code generation and modifying
userland to make use of it." However, not being convinced in the
MPROTECT feature having enough practical value, I am also not convinced
"a kernel API for runtime code generation and modifying userland to make
use of it" is the way to go.
Having static instead of dynamically-generated trampolines in userland
code where possible (and making other userland/ABI changes to make that
possible in more/all cases) is an obvious improvement, and IMO should be
a priority over the above.
While I share my opinion here, I don't mean that to block Madhavan's
work. I'd rather defer to people more knowledgeable in current userland
and ABI issues/limitations and plans on dealing with those, especially
to Florian Weimer. I haven't seen Florian say anything specific for or
against Madhavan's proposal, and I'd like to. (Have I missed that?)
It'd be wrong to introduce a kernel API that userland doesn't need, and
it'd be right to introduce one that userland actually intends to use.
I've also added Rich Felker to CC here, for musl libc and its possible
intent to use the proposed API. (My guess is there's no such need, and
thus no intent, but Rich might want to confirm that or correct me.)
Alexander
On Wed, Sep 23, 2020 at 11:14:56AM +0200, Solar Designer wrote:
> On Wed, Sep 23, 2020 at 10:14:26AM +0200, Pavel Machek wrote:
> > > Introduction
> > > ============
> > >
> > > Dynamic code is used in many different user applications. Dynamic code is
> > > often generated at runtime. Dynamic code can also just be a pre-defined
> > > sequence of machine instructions in a data buffer. Examples of dynamic
> > > code are trampolines, JIT code, DBT code, etc.
> > >
> > > Dynamic code is placed either in a data page or in a stack page. In order
> > > to execute dynamic code, the page it resides in needs to be mapped with
> > > execute permissions. Writable pages with execute permissions provide an
> > > attack surface for hackers. Attackers can use this to inject malicious
> > > code, modify existing code or do other harm.
> > >
> > > To mitigate this, LSMs such as SELinux implement W^X. That is, they may not
> > > allow pages to have both write and execute permissions. This prevents
> > > dynamic code from executing and blocks applications that use it. To allow
> > > genuine applications to run, exceptions have to be made for them (by setting
> > > execmem, etc) which opens the door to security issues.
> > >
> > > The W^X implementation today is not complete. There exist many user level
> > > tricks that can be used to load and execute dynamic code. E.g.,
> > >
> > > - Load the code into a file and map the file with R-X.
> > >
> > > - Load the code in an RW- page. Change the permissions to R--. Then,
> > > change the permissions to R-X.
> > >
> > > - Load the code in an RW- page. Remap the page with R-X to get a separate
> > > mapping to the same underlying physical page.
> > >
> > > IMO, these are all security holes as an attacker can exploit them to inject
> > > his own code.
> >
> > IMO, you are smoking crack^H^H very seriously misunderstanding what
> > W^X is supposed to protect from.
> >
> > W^X is not supposed to protect you from attackers that can already do
> > system calls. So loading code into a file then mapping the file as R-X
> > is in no way a security hole in W^X.
> >
> > If you want to provide protection from attackers that _can_ do system
> > calls, fine, but please don't talk about W^X and please specify what
> > types of attacks you want to prevent and why that's a good thing.
>
> On one hand, Pavel is absolutely right. It is ridiculous to say that
> "these are all security holes as an attacker can exploit them to inject
> his own code."
I stand corrected, due to Brad's tweet and follow-ups here:
https://twitter.com/spendergrsec/status/1308728284390318082
It sure does make sense to combine ret2libc/ROP to mprotect() with one's
own injected shellcode. Compared to doing everything from ROP, this is
easier and more reliable across versions/builds if the desired payload
is non-trivial. My own example: invoking a shell in a local attack on
Linux is trivial enough to do via ret2libc only, but a connect-back
shell in a remote attack might be easier and more reliably done via
mprotect() + shellcode.
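As an illustration of the step such a chain relies on, here is a rough C
sketch of the permission flip itself: bytes are placed in an anonymous RW-
page, which is then changed to R-- and finally to R-X. The function name
flip_rw_to_rx() is made up for this example, the bytes are never executed,
and the outcome is system dependent: with MPROTECT-style restrictions or an
SELinux policy denying execmem, the final mprotect() is expected to fail.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 if an anonymous page could be taken from RW- through R--
 * to R-X, or a negative errno value otherwise. */
static int flip_rw_to_rx(const unsigned char *code, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    int rc = 0;

    if (page <= 0 || len > (size_t)page)
        return -EINVAL;

    /* Start with a writable, non-executable anonymous page. */
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -errno;

    memcpy(p, code, len);

    /* Drop write permission first, then ask for execute permission.
     * At no point is the page writable and executable at once. */
    if (mprotect(p, (size_t)page, PROT_READ) != 0 ||
        mprotect(p, (size_t)page, PROT_READ | PROT_EXEC) != 0)
        rc = -errno;

    munmap(p, (size_t)page);
    return rc;
}
```

On a system without such restrictions the function returns 0, which is the
behavior further hardening would change.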
Per the follow-ups, this was an established technique on Windows and iOS
until further hardening prevented it. So it does make sense for Linux
to do the same (as an option because of it breaking existing stuff), and
not so much as policy enforcement for the sake of it and ease of
reasoning, but mostly to force real-world exploits to be more complex
and less reliable.
> On the other hand, "what W^X is supposed to protect from" depends on how
> the term W^X is defined (historically, by PaX and OpenBSD). It may be
> that W^X is partially not a feature to defeat attacks per se, but also a
> policy enforcement feature preventing use of dangerous techniques (JIT).
>
> Such policy might or might not make sense. It might make sense for ease
> of reasoning, e.g. "I've flipped this setting, and now I'm certain the
> system doesn't have JIT within a process (can still have it through
> dynamically creating and invoking an entire new program), so there are
> no opportunities for an attacker to inject code nor generate previously
> non-existing ROP gadgets into an executable mapping within a process."
>
> I do find it questionable whether such policy and such reasoning make
> sense beyond academia.
I was wrong in the above, focusing on the wrong thing.
> Then, there might be even more ways in which W^X is not perfect enough
> to enable such reasoning. What about using ptrace(2) to inject code?
> Should enabling W^X also disable ability to debug programs by non-root?
> We already have Yama ptrace_scope, which can achieve that at the highest
> setting, although that's rather inconvenient and is probably unexpected
> by most to be a requirement for having (ridiculously?) full W^X allowing
> for the academic reasoning.
Thinking out loud:
Technically, ptrace() is also usable from a ROP chain. It might be too
cumbersome to bother using to get a shellcode going, but OTOH it's just
one function to be invoked in a similar fashion multiple times, so might
be more reliable than having a ROP chain depend on multiple actually
needed functions directly (moving that dependency into the shellcode).
> Personally, I am for policies that make more practical sense. For
> example, years ago I advocated here on kernel-hardening that we should
> have a mode where ELF flags enabling/disabling executable stack are
> ignored, and non-executable stack is always enforced. This should also
> be extended to default (at program startup) permissions on more than
> just stack (but also on .bss, typical libcs' heap allocations, etc.)
> However, I am not convinced there's enough value in extending the policy
> to restricting explicit uses of mprotect(2).
>
> Yes, PaX did that, and its emutramp.txt said "runtime code generation is
> by its nature incompatible with PaX's PAGEEXEC/SEGMEXEC and MPROTECT
> features, therefore the real solution is not in emulation but by
> designing a kernel API for runtime code generation and modifying
> userland to make use of it." However, not being convinced in the
> MPROTECT feature having enough practical value,
I am convinced now, however:
> I am also not convinced
> "a kernel API for runtime code generation and modifying userland to make
> use of it" is the way to go.
doesn't automatically follow from the above, because:
> Having static instead of dynamically-generated trampolines in userland
> code where possible (and making other userland/ABI changes to make that
> possible in more/all cases) is an obvious improvement, and IMO should be
> a priority over the above.
>
> While I share my opinion here, I don't mean that to block Madhavan's
> work. I'd rather defer to people more knowledgeable in current userland
> and ABI issues/limitations and plans on dealing with those, especially
> to Florian Weimer. I haven't seen Florian say anything specific for or
> against Madhavan's proposal, and I'd like to. (Have I missed that?)
> It'd be wrong to introduce a kernel API that userland doesn't need, and
> it'd be right to introduce one that userland actually intends to use.
>
> I've also added Rich Felker to CC here, for musl libc and its possible
> intent to use the proposed API. (My guess is there's no such need, and
> thus no intent, but Rich might want to confirm that or correct me.)
So need to hear more from the userland folks, I guess.
Alexander
* Solar Designer:
> While I share my opinion here, I don't mean that to block Madhavan's
> work. I'd rather defer to people more knowledgeable in current userland
> and ABI issues/limitations and plans on dealing with those, especially
> to Florian Weimer. I haven't seen Florian say anything specific for or
> against Madhavan's proposal, and I'd like to. (Have I missed that?)
There was a previous discussion, where I provided feedback (not much
different from the feedback here, given that the mechanism is mostly the
same).
I think it's unnecessary for the libffi use case. Precompiled code can
be loaded from disk because the libffi trampolines are so regular. On
most architectures, it's not even the code that's patched, but some of
the data driving it, which happens to be located on the same page due to
a libffi quirk.
The libffi use case is a bit strange anyway: its trampolines are
type-generic, and the per-call adjustment is data-driven. This means
that once you have libffi in the process, you have a generic
data-to-function-call mechanism available that can be abused (it's even
fully CET compatible in recent versions). And then you need to look at
the processes that use libffi. A lot of them contain bytecode
interpreters, and those enable data-driven arbitrary code execution as
well. I know that there are efforts under way to harden Python, but
it's going to be tough to get to the point where things are still
difficult for an attacker once they have the ability to make mprotect
calls.
It was pointed out to me that libffi is doing things wrong, and the
trampolines should not be type-generic, but generated so that they match
the function being called. That is, the marshal/unmarshal code would be
open-coded in the trampoline, rather than using some generic mechanism
plus run-time dispatch on data tables describing the function type.
That is a very different design (and typically used by compilers (JIT or
not JIT) to implement native calls). Mapping some code page with a
repeating pattern would no longer work to defeat anti-JIT measures
because it's closer to real JIT. I don't know if kernel support could
make sense in this context, but it would be a completely different
patch.
Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
Hi!
> > > > The W^X implementation today is not complete. There exist many user level
> > > > tricks that can be used to load and execute dynamic code. E.g.,
> > > >
> > > > - Load the code into a file and map the file with R-X.
> > > >
> > > > - Load the code in an RW- page. Change the permissions to R--. Then,
> > > > change the permissions to R-X.
> > > >
> > > > - Load the code in an RW- page. Remap the page with R-X to get a separate
> > > > mapping to the same underlying physical page.
> > > >
> > > > IMO, these are all security holes as an attacker can exploit them to inject
> > > > his own code.
> > >
> > > IMO, you are smoking crack^H^H very seriously misunderstanding what
> > > W^X is supposed to protect from.
> > >
> > > W^X is not supposed to protect you from attackers that can already do
> > > system calls. So loading code into a file then mapping the file as R-X
> > > is in no way a security hole in W^X.
> > >
> > > If you want to provide protection from attackers that _can_ do system
> > > calls, fine, but please don't talk about W^X and please specify what
> > > types of attacks you want to prevent and why that's a good thing.
> >
> > On one hand, Pavel is absolutely right. It is ridiculous to say that
> > "these are all security holes as an attacker can exploit them to inject
> > his own code."
>
> I stand corrected, due to Brad's tweet and follow-ups here:
>
> https://twitter.com/spendergrsec/status/1308728284390318082
>
> It sure does make sense to combine ret2libc/ROP to mprotect() with one's
> own injected shellcode. Compared to doing everything from ROP, this is
> easier and more reliable across versions/builds if the desired
> payload
Ok, so this starts to be a bit confusing.
I thought W^X is to protect from attackers that have overflowed a buffer
somewhere, but cannot do arbitrary syscalls yet.
You are saying that there's an important class of attackers that can do
some syscalls but not arbitrary ones.
I'd like to see a definition of that attacker (and perhaps a description
of the system the protection is expected to be useful on -- if it is
not close to common Linux distros).
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Wed, Sep 23, 2020 at 05:18:35PM +0200, Pavel Machek wrote:
> > It sure does make sense to combine ret2libc/ROP to mprotect() with one's
> > own injected shellcode. Compared to doing everything from ROP, this is
> > easier and more reliable across versions/builds if the desired
> > payload
>
> Ok, so this starts to be a bit confusing.
>
> I thought W^X is to protect from attackers that have overflowed buffer
> somewhere, but can not to do arbitrary syscalls, yet.
>
> You are saying that there's important class of attackers that can do
> some syscalls but not arbitrary ones.
They might be able to do many, most, or all arbitrary syscalls via
ret2libc or such. The crucial detail is that each time they do that,
they risk incompatibility with the given target system (version, build,
maybe ASLR if gadgets from multiple libraries are involved). By using
mprotect(), they only take this risk once (need to get the address of an
mprotect() gadget and of what to change protections on right), and then
they can invoke multiple syscalls from their shellcode more reliably.
So for doing a lot of work, mprotect() combined with injected code can
be easier and more reliable. It is also an extra option an attacker can
use, in addition to doing everything via borrowed code. More
flexibility for the attacker means the attacker may choose whichever
approach works better in a given case (or try several).
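For readers following along, the second trick from the introduction (load code into an RW- page, then change it to R-X) comes down to one mmap() and one mprotect() call, which is the same single mprotect() step described above. The following is a minimal sketch, assuming a Linux system; the function name try_rw_then_rx is made up for illustration. The copied bytes are never jumped to. The function only reports whether the kernel and any active LSM policy allowed the permission change.

```c
#define _DEFAULT_SOURCE        /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy `len` bytes into a fresh RW- anonymous page,
 * then ask for R-X on that page.
 *
 * Returns 0 if the RW- -> R-X change was allowed, the errno value from
 * mprotect() if it was refused, or -1 if the page could not be mapped.
 * The bytes are never executed.
 */
int try_rw_then_rx(const unsigned char *code, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && len <= (size_t)page);

    /* Step 1: writable, non-executable anonymous mapping. */
    void *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Step 2: "load the code" while the page is still writable. */
    memcpy(buf, code, len);

    /* Step 3: drop write, add execute. A strict policy may refuse this. */
    int rc = 0;
    if (mprotect(buf, (size_t)page, PROT_READ | PROT_EXEC) != 0)
        rc = errno;

    munmap(buf, (size_t)page);
    return rc;
}
```

On a system where SELinux denies execmem to the process, the mprotect() call would be expected to fail (typically with EACCES), so a nonzero return there is the policy doing its job rather than a bug in the sketch.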
I am embarrassed for not thinking/recalling this when I first posted
earlier today. It's actually obvious. I'm just getting old and rusty.
> I'd like to see definition of that attacker (and perhaps description
> of the system the protection is expected to be useful on -- if it is
> not close to common Linux distros).
There's nothing unusual about that attacker and the system.
A couple of other things Brad kindly pointed out:
SELinux already has similar protections (execmem, execmod):
http://lkml.iu.edu/hypermail/linux/kernel/0508.2/0194.html
https://danwalsh.livejournal.com/6117.html
PaX MPROTECT is implemented in a way or at a layer that covers ptrace()
abuse that I mentioned. (At least that's how I understood Brad.)
Alexander
P.S. Meanwhile, Twitter locked my account "for security purposes". Fun.
I'll just let it be for now.
On Wed, Sep 23, 2020 at 7:39 AM Florian Weimer <[email protected]> wrote:
>
> * Solar Designer:
>
> > While I share my opinion here, I don't mean that to block Madhavan's
> > work. I'd rather defer to people more knowledgeable in current userland
> > and ABI issues/limitations and plans on dealing with those, especially
> > to Florian Weimer. I haven't seen Florian say anything specific for or
> > against Madhavan's proposal, and I'd like to. (Have I missed that?)
>
> There was a previous discussion, where I provided feedback (not much
> different from the feedback here, given that the mechanism is mostly the
> same).
>
> I think it's unnecessary for the libffi use case. Precompiled code can
> be loaded from disk because the libffi trampolines are so regular. On
> most architectures, it's not even the code that's patched, but some of
> the data driving it, which happens to be located on the same page due to
> a libffi quirk.
>
> The libffi use case is a bit strange anyway: its trampolines are
> type-generic, and the per-call adjustment is data-driven. This means
> that once you have libffi in the process, you have a generic
> data-to-function-call mechanism available that can be abused (it's even
> fully CET compatible in recent versions). And then you need to look at
> the processes that use libffi. A lot of them contain bytecode
> interpreters, and those enable data-driven arbitrary code execution as
> well. I know that there are efforts under way to harden Python, but
> it's going to be tough to get to the point where things are still
> difficult for an attacker once they have the ability to make mprotect
> calls.
>
> It was pointed out to me that libffi is doing things wrong, and the
> trampolines should not be type-generic, but generated so that they match
> the function being called. That is, the marshal/unmarshal code would be
> open-coded in the trampoline, rather than using some generic mechanism
> plus run-time dispatch on data tables describing the function type.
> That is a very different design (and typically used by compilers (JIT or
> not JIT) to implement native calls). Mapping some code page with a
> repeating pattern would no longer work to defeat anti-JIT measures
> because it's closer to real JIT. I don't know if kernel support could
> make sense in this context, but it would be a completely different
> patch.
I would very much like to see a well-designed kernel facility for
helping userspace do JIT in a safer manner, but designing such a thing
is likely to be distinctly nontrivial. To throw a half-baked idea
out there, suppose a program could pre-declare a list of JIT
verifiers:
    static bool ffi_trampoline_verifier(void *target_address, size_t target_size,
                                        void *source_data, void *context);

    struct jit_verifier my_verifier
            __attribute__((section("something special here?"))) = {
        .magic = 0xMAGIC_HERE,
        .verifier = ffi_trampoline_verifier,
    };

and then a system call something like:

    instantiate_jit_code(target, source, size, &my_verifier, context);
The idea being that even an attacker that can force a call to
instantiate_jit_code() can only create code that passes verification
by one of the pre-declared verifiers in the process.
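The idea above can be mocked up entirely in userspace to see how the check would flow. The following is only a sketch under loud assumptions: mock_instantiate_jit_code() is an ordinary C function standing in for the proposed system call, the declared[] table stands in for the special section, MOCK_MAGIC is an arbitrary value, and the verifier's policy (a fixed 4-byte prefix followed by 8 free bytes) is invented purely for illustration. None of this is an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef bool (*verifier_fn)(void *target_address, size_t target_size,
                            void *source_data, void *context);

struct jit_verifier {
    unsigned long magic;     /* hypothetical tag; the thread leaves it open */
    verifier_fn verifier;
};

#define MOCK_MAGIC 0x4a495456UL   /* arbitrary illustrative value */

/* Toy policy: a fixed 4-byte prefix, then 8 bytes of per-instance data. */
static const unsigned char tmpl_prefix[] = { 0x01, 0x02, 0x03, 0x04 };
#define TEMPLATE_LEN (sizeof tmpl_prefix + 8)

static bool ffi_trampoline_verifier(void *target_address, size_t target_size,
                                    void *source_data, void *context)
{
    (void)target_address;
    (void)context;
    return target_size == TEMPLATE_LEN &&
           memcmp(source_data, tmpl_prefix, sizeof tmpl_prefix) == 0;
}

/* Stand-in for the list of verifiers the program pre-declares. */
static const struct jit_verifier declared[] = {
    { MOCK_MAGIC, ffi_trampoline_verifier },
};

/*
 * Mock of the proposed call. Returns 0 on success, -1 if `v` is not one
 * of the pre-declared verifiers, -2 if the verifier rejects the source.
 */
int mock_instantiate_jit_code(void *target, void *source, size_t size,
                              const struct jit_verifier *v, void *context)
{
    bool known = false;
    for (size_t i = 0; i < sizeof declared / sizeof declared[0]; i++)
        if (v == &declared[i] && v->magic == MOCK_MAGIC)
            known = true;
    if (!known)
        return -1;              /* attacker-supplied verifier: refuse */
    if (!v->verifier(target, size, source, context))
        return -2;              /* code does not pass verification */
    memcpy(target, source, size);
    return 0;
}
```

The point of the lookup loop is that a caller cannot bring its own verifier: a struct with the right magic that lives outside declared[] is still rejected. In the real proposal that check would have to be done by the kernel against a section it trusts.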
On Wed, Sep 23, 2020 at 04:39:31PM +0200, Florian Weimer wrote:
> * Solar Designer:
>
> > While I share my opinion here, I don't mean that to block Madhavan's
> > work. I'd rather defer to people more knowledgeable in current userland
> > and ABI issues/limitations and plans on dealing with those, especially
> > to Florian Weimer. I haven't seen Florian say anything specific for or
> > against Madhavan's proposal, and I'd like to. (Have I missed that?)
[...]
> I think it's unnecessary for the libffi use case.
[...]
> I don't know if kernel support could
> make sense in this context, but it would be a completely different
> patch.
Thanks. Are there currently relevant use cases where the proposed
trampfd would be useful and likely actually made use of by userland -
e.g., specific userland project developers saying they'd use it, or
Madhavan intending to develop and contribute userland patches?
Alexander
On Wed, Sep 23, 2020 at 08:00:07PM +0200, Solar Designer wrote:
> A couple of other things Brad kindly pointed out:
>
> SELinux already has similar protections (execmem, execmod):
>
> http://lkml.iu.edu/hypermail/linux/kernel/0508.2/0194.html
> https://danwalsh.livejournal.com/6117.html
Actually, that's right in Madhavan's "Introduction": "LSMs such as
SELinux implement W^X" and "The W^X implementation today is not
complete." I'm sorry I jumped into this thread out of context.
Alexander
...
>> The W^X implementation today is not complete. There exist many user level
>> tricks that can be used to load and execute dynamic code. E.g.,
>>
>> - Load the code into a file and map the file with R-X.
>>
>> - Load the code in an RW- page. Change the permissions to R--. Then,
>> change the permissions to R-X.
>>
>> - Load the code in an RW- page. Remap the page with R-X to get a separate
>> mapping to the same underlying physical page.
>>
>> IMO, these are all security holes as an attacker can exploit them to inject
>> his own code.
>
> IMO, you are smoking crack^H^H very seriously misunderstanding what
> W^X is supposed to protect from.
>
> W^X is not supposed to protect you from attackers that can already do
> system calls. So loading code into a file then mapping the file as R-X
> is in no way security hole in W^X.
>
> If you want to provide protection from attackers that _can_ do system
> calls, fine, but please don't talk about W^X and please specify what
> types of attacks you want to prevent and why that's good thing.
>
There are two things here - the idea behind W^X and the current realization
of that idea in actual implementation. The idea behind W^X, as I understand,
is to prevent a user from loading arbitrary code into a page and getting it
to execute. If the user code contains a vulnerability, an attacker can
exploit it to potentially inject his own code and get it to execute. This
cannot be denied.
From that perspective, all of the above tricks I have mentioned are tricks
that user code can use to load arbitrary code into a page and get it to
execute.
Now, I don't want the discussion to get stuck on a mere name. If what I am
suggesting needs a name other than "W^X" in the opinion of the reviewers,
that is fine with me. But I don't believe there is any disagreement that
the above user tricks are security holes.
Madhavan
On Wed, 23 Sep 2020, Pavel Machek wrote:
> This is not first crazy patch from your company. Perhaps you should
> have a person with strong Unix/Linux experience performing "straight
> face test" on outgoing patches?
Just for the record: the author of the code has 30+ years experience in
SunOS, Solaris, Unixware, Realtime, SVR4, and Linux.
--
James Morris
<[email protected]>
On Wed, Sep 23, 2020 at 08:11:36PM +0200, Solar Designer wrote:
> On Wed, Sep 23, 2020 at 04:39:31PM +0200, Florian Weimer wrote:
> > * Solar Designer:
> >
> > > While I share my opinion here, I don't mean that to block Madhavan's
> > > work. I'd rather defer to people more knowledgeable in current userland
> > > and ABI issues/limitations and plans on dealing with those, especially
> > > to Florian Weimer. I haven't seen Florian say anything specific for or
> > > against Madhavan's proposal, and I'd like to. (Have I missed that?)
>
> [...]
> > I think it's unnecessary for the libffi use case.
> [...]
>
> > I don't know if kernel support could
> > make sense in this context, but it would be a completely different
> > patch.
>
> Thanks. Are there currently relevant use cases where the proposed
> trampfd would be useful and likely actually made use of by userland -
> e.g., specific userland project developers saying they'd use it, or
> Madhavan intending to develop and contribute userland patches?
>
> Alexander
The trampoline it provides in this version can be implemented completely
in userspace. The kernel part of it is essentially just providing a way
to do text relocations without needing a WX mapping, but the text
relocations would be unnecessary in the first place if the trampoline
was position-independent code.
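To make the code-versus-data distinction concrete, here is a minimal sketch in which the executable part is ordinary static code and only a data slot changes per instance. The names tramp_slot, invoke_slot and add_bias are hypothetical, and the slot is passed explicitly for simplicity; a real position-independent trampoline would instead locate its own slot relative to its code address.

```c
#include <assert.h>
#include <stddef.h>

/* Per-instance data: what to call and with which context. No code here. */
struct tramp_slot {
    long (*target)(void *ctx, long arg);
    void *ctx;
};

/*
 * The "trampoline" body. It is compiled once, lives in the normal text
 * segment, and is never patched. All per-instance variation comes from
 * the slot it is handed.
 */
static long invoke_slot(const struct tramp_slot *slot, long arg)
{
    return slot->target(slot->ctx, arg);
}

/* Example target: adds a per-instance bias stored in the context. */
static long add_bias(void *ctx, long arg)
{
    return arg + *(long *)ctx;
}
```

A caller would fill in a struct tramp_slot with add_bias and a pointer to its own bias value, then call invoke_slot(). Creating a new "trampoline" is then just writing a new slot into a plain RW- page, which needs neither a WX mapping nor a text relocation.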
On 9/23/20 3:42 AM, Pavel Machek wrote:
> Hi!
>
>> Solution proposed in this RFC
>> =============================
>>
>> From this RFC's perspective, there are two scenarios for dynamic code:
>>
>> Scenario 1
>> ----------
>>
>> We know what code we need only at runtime. For instance, JIT code generated
>> for frequently executed Java methods. Only at runtime do we know what
>> methods need to be JIT compiled. Such code cannot be statically defined. It
>> has to be generated at runtime.
>>
>> Scenario 2
>> ----------
>>
>> We know what code we need in advance. User trampolines are a good example of
>> this. It is possible to define such code statically with some help from the
>> kernel.
>>
>> This RFC addresses (2). (1) needs a general purpose trusted code generator
>> and is out of scope for this RFC.
>
> This is slightly less crazy talk than introduction talking about holes
> in W^X. But it is very, very far from normal Unix system, where you
> have selection of interpreters to run your malware on (sh, python,
> awk, emacs, ...) and often you can even compile malware from sources.
>
> And as you noted, we don't have "a general purpose trusted code
> generator" for our systems.
>
> I believe you should simply delete confusing "introduction" and
> provide details of super-secure system where your patches would be
> useful, instead.
>
> Best regards,
> Pavel
>
This RFC talks about converting dynamic code (which cannot be authenticated)
to static code that can be authenticated using signature verification. That
is the scope of this RFC.
If I have not been clear before, by dynamic code, I mean machine code that is
dynamic in nature. Scripts are beyond the scope of this RFC.
Also, malware compiled from sources is not dynamic code. That is orthogonal
to this RFC. If such malware has a valid signature, so that the kernel permits
its execution, we have a systemic problem.
I am not saying that script authentication or compiled malware are not problems.
I am just saying that this RFC is not trying to solve all of the security problems.
It is trying to define one way to convert dynamic code to static code to address
one class of problems.
Madhavan
On 9/23/20 4:14 AM, Solar Designer wrote:
>>> The W^X implementation today is not complete. There exist many user level
>>> tricks that can be used to load and execute dynamic code. E.g.,
>>>
>>> - Load the code into a file and map the file with R-X.
>>>
>>> - Load the code in an RW- page. Change the permissions to R--. Then,
>>> change the permissions to R-X.
>>>
>>> - Load the code in an RW- page. Remap the page with R-X to get a separate
>>> mapping to the same underlying physical page.
>>>
>>> IMO, these are all security holes as an attacker can exploit them to inject
>>> his own code.
>> IMO, you are smoking crack^H^H very seriously misunderstanding what
>> W^X is supposed to protect from.
>>
>> W^X is not supposed to protect you from attackers that can already do
>> system calls. So loading code into a file then mapping the file as R-X
>> is in no way security hole in W^X.
>>
>> If you want to provide protection from attackers that _can_ do system
>> calls, fine, but please don't talk about W^X and please specify what
>> types of attacks you want to prevent and why that's good thing.
> On one hand, Pavel is absolutely right. It is ridiculous to say that
> "these are all security holes as an attacker can exploit them to inject
> his own code."
>
Why? Isn't it possible that an attacker can exploit some vulnerability such
as a buffer overflow and overwrite the buffer that contains the dynamic code?
> On the other hand, "what W^X is supposed to protect from" depends on how
> the term W^X is defined (historically, by PaX and OpenBSD). It may be
> that W^X is partially not a feature to defeat attacks per se, but also a
> policy enforcement feature preventing use of dangerous techniques (JIT).
>
> Such policy might or might not make sense. It might make sense for ease
> of reasoning, e.g. "I've flipped this setting, and now I'm certain the
> system doesn't have JIT within a process (can still have it through
> dynamically creating and invoking an entire new program), so there are
> no opportunities for an attacker to inject code nor generate previously
> non-existing ROP gadgets into an executable mapping within a process."
>
> I do find it questionable whether such policy and such reasoning make
> sense beyond academia.
>
> Then, there might be even more ways in which W^X is not perfect enough
> to enable such reasoning. What about using ptrace(2) to inject code?
> Should enabling W^X also disable ability to debug programs by non-root?
> We already have Yama ptrace_scope, which can achieve that at the highest
> setting, although that's rather inconvenient and is probably unexpected
> by most to be a requirement for having (ridiculously?) full W^X allowing
> for the academic reasoning.
>
I am not suggesting that W^X be fixed. That is up to the maintainers of that
code. I am saying that if the security subsystem is enhanced in the future with
policies and settings that prevent the user tricks I mentioned, then it becomes
impossible to execute dynamic code except by making security exceptions on a
case-by-case basis.
As an alternative to making security exceptions, one could convert dynamic code
to static code which can then be authenticated.
> Personally, I am for policies that make more practical sense. For
> example, years ago I advocated here on kernel-hardening that we should
> have a mode where ELF flags enabling/disabling executable stack are
> ignored, and non-executable stack is always enforced. This should also
> be extended to default (at program startup) permissions on more than
> just stack (but also on .bss, typical libcs' heap allocations, etc.)
> However, I am not convinced there's enough value in extending the policy
> to restricting explicit uses of mprotect(2).
>
> Yes, PaX did that, and its emutramp.txt said "runtime code generation is
> by its nature incompatible with PaX's PAGEEXEC/SEGMEXEC and MPROTECT
> features, therefore the real solution is not in emulation but by
> designing a kernel API for runtime code generation and modifying
> userland to make use of it." However, not being convinced in the
> MPROTECT feature having enough practical value, I am also not convinced
> "a kernel API for runtime code generation and modifying userland to make
> use of it" is the way to go.
>
In a separate email, I will try to answer this and provide justification
for why it is better to do it in the kernel.
> Having static instead of dynamically-generated trampolines in userland
> code where possible (and making other userland/ABI changes to make that
> possible in more/all cases) is an obvious improvement, and IMO should be
> a priority over the above.
>
> While I share my opinion here, I don't mean that to block Madhavan's
> work. I'd rather defer to people more knowledgeable in current userland
> and ABI issues/limitations and plans on dealing with those, especially
> to Florian Weimer. I haven't seen Florian say anything specific for or
> against Madhavan's proposal, and I'd like to. (Have I missed that?)
> It'd be wrong to introduce a kernel API that userland doesn't need, and
> it'd be right to introduce one that userland actually intends to use.
>
> I've also added Rich Felker to CC here, for musl libc and its possible
> intent to use the proposed API. (My guess is there's no such need, and
> thus no intent, but Rich might want to confirm that or correct me.)
>
> Alexander
Madhavan
Hi!
> >> Scenario 2
> >> ----------
> >>
> >> We know what code we need in advance. User trampolines are a good example of
> >> this. It is possible to define such code statically with some help from the
> >> kernel.
> >>
> >> This RFC addresses (2). (1) needs a general purpose trusted code generator
> >> and is out of scope for this RFC.
> >
> > This is slightly less crazy talk than introduction talking about holes
> > in W^X. But it is very, very far from normal Unix system, where you
> > > have selection of interpreters to run your malware on (sh, python,
> > awk, emacs, ...) and often you can even compile malware from sources.
> >
> > And as you noted, we don't have "a general purpose trusted code
> > generator" for our systems.
> >
> > I believe you should simply delete confusing "introduction" and
> > provide details of super-secure system where your patches would be
> > useful, instead.
>
> This RFC talks about converting dynamic code (which cannot be authenticated)
> to static code that can be authenticated using signature verification. That
> is the scope of this RFC.
>
> If I have not been clear before, by dynamic code, I mean machine code that is
> dynamic in nature. Scripts are beyond the scope of this RFC.
>
> Also, malware compiled from sources is not dynamic code. That is orthogonal
> to this RFC. If such malware has a valid signature that the kernel permits its
> execution, we have a systemic problem.
>
> I am not saying that script authentication or compiled malware are not problems.
> I am just saying that this RFC is not trying to solve all of the security problems.
> It is trying to define one way to convert dynamic code to static code to address
> one class of problems.
Well, you don't have to solve all problems at once.
But solutions have to exist, and AFAIK in this case they don't. You
are armoring doors, but ignoring open windows.
Or very probably you are thinking about something different than
normal desktop distros (Debian 10). Because on my systems, I have
python, gdb and gcc...
It would be nice to specify what other pieces need to be present for
this to make sense -- because it makes no sense on Debian 10.
Best regards,
Pavel
On 9/23/20 3:51 PM, Pavel Machek wrote:
> Hi!
>
>>>> Scenario 2
>>>> ----------
>>>>
>>>> We know what code we need in advance. User trampolines are a good example of
>>>> this. It is possible to define such code statically with some help from the
>>>> kernel.
>>>>
>>>> This RFC addresses (2). (1) needs a general purpose trusted code generator
>>>> and is out of scope for this RFC.
>>>
>>> This is slightly less crazy talk than introduction talking about holes
>>> in W^X. But it is very, very far from normal Unix system, where you
>>> have selection of interpreters to run your malware on (sh, python,
>>> awk, emacs, ...) and often you can even compile malware from sources.
>>>
>>> And as you noted, we don't have "a general purpose trusted code
>>> generator" for our systems.
>>>
>>> I believe you should simply delete confusing "introduction" and
>>> provide details of super-secure system where your patches would be
>>> useful, instead.
>>
>> This RFC talks about converting dynamic code (which cannot be authenticated)
>> to static code that can be authenticated using signature verification. That
>> is the scope of this RFC.
>>
>> If I have not been clear before, by dynamic code, I mean machine code that is
>> dynamic in nature. Scripts are beyond the scope of this RFC.
>>
>> Also, malware compiled from sources is not dynamic code. That is orthogonal
>> to this RFC. If such malware has a valid signature that the kernel permits its
>> execution, we have a systemic problem.
>>
>> I am not saying that script authentication or compiled malware are not problems.
>> I am just saying that this RFC is not trying to solve all of the security problems.
>> It is trying to define one way to convert dynamic code to static code to address
>> one class of problems.
>
> Well, you don't have to solve all problems at once.
>
> But solutions have to exist, and AFAIK in this case they don't. You
> are armoring doors, but ignoring open windows.
>
I am afraid I don't agree that the other open security issues must be
addressed for this RFC to make sense. If you think that any of those
issues actually has a bad interaction/intersection with this RFC,
let me know how and I will address it.
> Or very probably you are thinking about something different than
> normal desktop distros (Debian 10). Because on my systems, I have
> python, gdb and gcc...
>
> It would be nice to specify what other pieces need to be present for
> this to make sense -- because it makes no sense on Debian 10.
>
Since this RFC pertains to converting dynamic machine code to static
code, it has nothing to do with the other items you have mentioned.
I am not disagreeing that the other items need to be addressed. But
they are orthogonal.
Madhavan
On 9/23/20 9:39 AM, Florian Weimer wrote:
> * Solar Designer:
>
>> While I share my opinion here, I don't mean that to block Madhavan's
>> work. I'd rather defer to people more knowledgeable in current userland
>> and ABI issues/limitations and plans on dealing with those, especially
>> to Florian Weimer. I haven't seen Florian say anything specific for or
>> against Madhavan's proposal, and I'd like to. (Have I missed that?)
>
> There was a previous discussion, where I provided feedback (not much
> different from the feedback here, given that the mechanism is mostly the
> same).
>
> I think it's unnecessary for the libffi use case. Precompiled code can
> be loaded from disk because the libffi trampolines are so regular. On
> most architectures, it's not even the code that's patched, but some of
> the data driving it, which happens to be located on the same page due to
> a libffi quirk.
>
> The libffi use case is a bit strange anyway: its trampolines are
> type-generic, and the per-call adjustment is data-driven. This means
> that once you have libffi in the process, you have a generic
> data-to-function-call mechanism available that can be abused (it's even
> fully CET compatible in recent versions). And then you need to look at
> the processes that use libffi. A lot of them contain bytecode
> interpreters, and those enable data-driven arbitrary code execution as
> well. I know that there are efforts under way to harden Python, but
> it's going to be tough to get to the point where things are still
> difficult for an attacker once they have the ability to make mprotect
> calls.
>
> It was pointed out to me that libffi is doing things wrong, and the
> trampolines should not be type-generic, but generated so that they match
> the function being called. That is, the marshal/unmarshal code would be
> open-coded in the trampoline, rather than using some generic mechanism
> plus run-time dispatch on data tables describing the function type.
> That is a very different design (and typically used by compilers (JIT or
> not JIT) to implement native calls). Mapping some code page with a
> repeating pattern would no longer work to defeat anti-JIT measures
> because it's closer to real JIT. I don't know if kernel support could
> make sense in this context, but it would be a completely different
> patch.
>
> Thanks,
> Florian
>
Hi Florian,
I am making myself familiar with anti-JIT measures before I can respond
to this comment. Bear with me. I will also respond to the above
libffi comment.
Madhavan
On 23/09/2020 22:51, Pavel Machek wrote:
> Hi!
>
>>>> Scenario 2
>>>> ----------
>>>>
>>>> We know what code we need in advance. User trampolines are a good example of
>>>> this. It is possible to define such code statically with some help from the
>>>> kernel.
>>>>
>>>> This RFC addresses (2). (1) needs a general purpose trusted code generator
>>>> and is out of scope for this RFC.
>>>
>>> This is slightly less crazy talk than introduction talking about holes
>>> in W^X. But it is very, very far from normal Unix system, where you
>>> have selection of interpreters to run your malware on (sh, python,
>>> awk, emacs, ...) and often you can even compile malware from sources.
>>>
>>> And as you noted, we don't have "a general purpose trusted code
>>> generator" for our systems.
>>>
>>> I believe you should simply delete confusing "introduction" and
>>> provide details of super-secure system where your patches would be
>>> useful, instead.
>>
>> This RFC talks about converting dynamic code (which cannot be authenticated)
>> to static code that can be authenticated using signature verification. That
>> is the scope of this RFC.
>>
>> If I have not been clear before, by dynamic code, I mean machine code that is
>> dynamic in nature. Scripts are beyond the scope of this RFC.
>>
>> Also, malware compiled from sources is not dynamic code. That is orthogonal
>> to this RFC. If such malware has a valid signature that the kernel permits its
>> execution, we have a systemic problem.
>>
>> I am not saying that script authentication or compiled malware are not problems.
>> I am just saying that this RFC is not trying to solve all of the security problems.
>> It is trying to define one way to convert dynamic code to static code to address
>> one class of problems.
>
> Well, you don't have to solve all problems at once.
>
> But solutions have to exist, and AFAIK in this case they don't. You
> are armoring doors, but ignoring open windows.
FYI, script execution is being addressed (for the kernel part) by this
patch series:
https://lore.kernel.org/lkml/[email protected]/
>
> Or very probably you are thinking about something different than
> normal desktop distros (Debian 10). Because on my systems, I have
> python, gdb and gcc...
It doesn't make sense for a tailored security system to leave all these
tools available to an attacker.
>
> It would be nice to specify what other pieces need to be present for
> this to make sense -- because it makes no sense on Debian 10.
Not all kernel features make sense for a generic/undefined usage,
especially specific security mechanisms (e.g. SELinux, Smack, Tomoyo,
SafeSetID, LoadPin, IMA, IPE, secure/trusted boot, lockdown, etc.), but
they can definitely still be useful.
>
> Best regards,
> Pavel
>
Hi!
> >>> I believe you should simply delete confusing "introduction" and
> >>> provide details of super-secure system where your patches would be
> >>> useful, instead.
> >>
> >> This RFC talks about converting dynamic code (which cannot be authenticated)
> >> to static code that can be authenticated using signature verification. That
> >> is the scope of this RFC.
> >>
> >> If I have not been clear before, by dynamic code, I mean machine code that is
> >> dynamic in nature. Scripts are beyond the scope of this RFC.
> >>
> >> Also, malware compiled from sources is not dynamic code. That is orthogonal
> >> to this RFC. If such malware has a valid signature that the kernel permits its
> >> execution, we have a systemic problem.
> >>
> >> I am not saying that script authentication or compiled malware are not problems.
> >> I am just saying that this RFC is not trying to solve all of the security problems.
> >> It is trying to define one way to convert dynamic code to static code to address
> >> one class of problems.
> >
> > Well, you don't have to solve all problems at once.
> >
> > But solutions have to exist, and AFAIK in this case they don't. You
> > are armoring doors, but ignoring open windows.
>
> FYI, script execution is being addressed (for the kernel part) by this
> patch series:
> https://lore.kernel.org/lkml/[email protected]/
Ok.
> > Or very probably you are thinking about something different than
> > normal desktop distros (Debian 10). Because on my systems, I have
> > python, gdb and gcc...
>
> It doesn't make sense for a tailored security system to leave all these
> tools available to an attacker.
And it also does not make sense to use "trampoline file descriptor" on
a generic system... while W^X should make sense there.
> > It would be nice to specify what other pieces need to be present for
> > this to make sense -- because it makes no sense on Debian 10.
>
> Not all kernel features make sense for a generic/undefined usage,
> especially specific security mechanisms (e.g. SELinux, Smack, Tomoyo,
> SafeSetID, LoadPin, IMA, IPE, secure/trusted boot, lockdown, etc.), but
> they can still be definitely useful.
Yep... so... I'd expect something like... "so you have a single-purpose
system with all script interpreters removed, IMA hashing all the files
to make sure they are not modified, and W^X enabled. Attacker can
still execute code after buffer overflow by .... and trampoline file
descriptor addresses that"... so that people running generic systems
can stop reading after the first sentence.
Best regards,
Pavel
On 25/09/2020 00:05, Pavel Machek wrote:
> Hi!
>
>>>>> I believe you should simply delete confusing "introduction" and
>>>>> provide details of super-secure system where your patches would be
>>>>> useful, instead.
>>>>
>>>> This RFC talks about converting dynamic code (which cannot be authenticated)
>>>> to static code that can be authenticated using signature verification. That
>>>> is the scope of this RFC.
>>>>
>>>> If I have not been clear before, by dynamic code, I mean machine code that is
>>>> dynamic in nature. Scripts are beyond the scope of this RFC.
>>>>
>>>> Also, malware compiled from sources is not dynamic code. That is orthogonal
>>>> to this RFC. If such malware has a valid signature, so that the kernel permits
>>>> its execution, we have a systemic problem.
>>>>
>>>> I am not saying that script authentication or compiled malware are not problems.
>>>> I am just saying that this RFC is not trying to solve all of the security problems.
>>>> It is trying to define one way to convert dynamic code to static code to address
>>>> one class of problems.
>>>
>>> Well, you don't have to solve all problems at once.
>>>
>>> But solutions have to exist, and AFAIK in this case they don't. You
>>> are armoring doors, but ignoring open windows.
>>
>> FYI, script execution is being addressed (for the kernel part) by this
>> patch series:
>> https://lore.kernel.org/lkml/[email protected]/
>
> Ok.
>
>>> Or very probably you are thinking about something different than
>>> normal desktop distros (Debian 10). Because on my systems, I have
>>> python, gdb and gcc...
>>
>> It doesn't make sense for a tailored security system to leave all these
>> tools available to an attacker.
>
> And it also does not make sense to use "trampoline file descriptor" on
> a generic system... while W^X should make sense there.
Well, as said before, (full/original/system-wide) W^X may require
trampfd (as well as other building-blocks).
I guess most Linux deployments are not on "generic systems"
anyway (even if they may be based on generic distros), and W^X
contradicts the fact that users/attackers can do whatever they want on
the system.
>
>>> It would be nice to specify what other pieces need to be present for
>>> this to make sense -- because it makes no sense on Debian 10.
>>
>> Not all kernel features make sense for a generic/undefined usage,
>> especially specific security mechanisms (e.g. SELinux, Smack, Tomoyo,
>> SafeSetID, LoadPin, IMA, IPE, secure/trusted boot, lockdown, etc.), but
>> they can still be definitely useful.
>
> Yep... so... I'd expect something like... "so you have a single-purpose
> system
No one talked about a single-purpose system.
> with all script interpreters removed,
Not necessarily with the patch series I pointed out just before.
> IMA hashing all the files
> to make sure they are not modified, and W^X enabled.
System-wide W^X is not only for memory, and as Madhavan said: "this RFC
pertains to converting dynamic [writable] machine code to static
[non-writable] code".
> An attacker can
> still execute code after a buffer overflow by .... and the trampoline file
> descriptor addresses that"... so that people running generic systems
> can stop reading after the first sentence.
Are you proposing to add a
"[feature-not-useful-without-a-proper-system-configuration]" tag in
subjects? :)