From: Petr Tesarik <[email protected]>
This patch series implements x86_64 arch hooks for the generic SandBox
Mode infrastructure.
SandBox Mode on x86_64 is implemented as follows:
* The target function runs with CPL 3 (same as user mode) within its
own virtual address space.
* Interrupt entry/exit paths are modified to let the interrupt handlers
always run with kernel CR3 and restore sandbox CR3 when returning to
sandbox mode.
* To avoid undesirable user-mode processing (FPU state, signals, etc.),
the saved pt_regs->cs value is temporarily adjusted so that the context
appears to come from kernel mode.
* On a CPU fault, execution stops immediately, returning -EFAULT to
the caller.
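For context, the hooks below are driven by the generic SandBox Mode core
posted separately. A rough sketch of the expected call sequence (the
run_sandboxed() wrapper is purely illustrative; only the arch_sbm_*() hooks
are part of this series):

static int run_sandboxed(struct sbm *sbm, sbm_func func, void *args,
			 const struct sbm_buf *input,
			 const struct sbm_buf *output)
{
	int err;

	err = arch_sbm_init(sbm);		/* build sandbox page tables */
	if (err)
		goto out;
	err = arch_sbm_map_readonly(sbm, input);  /* visible, not writable */
	if (err)
		goto out;
	err = arch_sbm_map_writable(sbm, output); /* writable by the sandbox */
	if (err)
		goto out;
	err = arch_sbm_exec(sbm, func, args);	/* drop to CPL 3 and run func */
out:
	arch_sbm_destroy(sbm);			/* free page tables and stacks */
	return err;
}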
Petr Tesarik (8):
sbm: x86: page table arch hooks
sbm: x86: execute target function on sandbox mode stack
sbm: x86: map system data structures into the sandbox
sbm: x86: allocate and map an exception stack
sbm: x86: handle sandbox mode faults
sbm: x86: switch to sandbox mode pages in arch_sbm_exec()
sbm: documentation of the x86-64 SandBox Mode implementation
sbm: x86: lazy TLB flushing
Documentation/security/sandbox-mode.rst | 25 ++
arch/x86/Kconfig | 1 +
arch/x86/entry/entry_64.S | 123 ++++++
arch/x86/include/asm/page_64_types.h | 1 +
arch/x86/include/asm/ptrace.h | 21 +
arch/x86/include/asm/sbm.h | 83 ++++
arch/x86/include/asm/segment.h | 7 +
arch/x86/include/asm/thread_info.h | 3 +
arch/x86/kernel/Makefile | 2 +
arch/x86/kernel/asm-offsets.c | 10 +
arch/x86/kernel/sbm/Makefile | 16 +
arch/x86/kernel/sbm/call_64.S | 95 +++++
arch/x86/kernel/sbm/core.c | 499 ++++++++++++++++++++++++
arch/x86/kernel/traps.c | 14 +-
arch/x86/mm/fault.c | 6 +
15 files changed, 905 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/include/asm/sbm.h
create mode 100644 arch/x86/kernel/sbm/Makefile
create mode 100644 arch/x86/kernel/sbm/call_64.S
create mode 100644 arch/x86/kernel/sbm/core.c
--
2.34.1
From: Petr Tesarik <[email protected]>
Map CPU system data structures (GDT, TSS, IDT) read-only into every sandbox
instance. Map interrupt stacks read-write.
The TSS mappings may look confusing. The trick is that TSS pages are mapped
twice in the kernel address space: once read-only and once read-write. The
GDT entry for the TR register uses the read-only address, but since __pa()
does not work for virtual addresses in this range (cpu_entry_area), use the
read-write mapping to get the TSS physical address.
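For illustration only (both symbols already exist in the kernel and are the
ones used by the hunk below):

	/* RO alias in the cpu_entry_area; the TR entry in the GDT points here */
	struct tss_struct *tss_ro = &get_cpu_entry_area(cpu)->tss;
	/* RW per-CPU alias; __pa() works here and yields the same physical pages */
	phys_addr_t tss_pa = __pa(&per_cpu(cpu_tss_rw, cpu));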
Signed-off-by: Petr Tesarik <[email protected]>
---
arch/x86/include/asm/page_64_types.h | 1 +
arch/x86/kernel/sbm/core.c | 74 ++++++++++++++++++++++++++++
2 files changed, 75 insertions(+)
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 06ef25411d62..62f6e40b3361 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -29,6 +29,7 @@
#define IST_INDEX_DB 2
#define IST_INDEX_MCE 3
#define IST_INDEX_VC 4
+#define IST_INDEX_NUM 7
/*
* Set __PAGE_OFFSET to the most negative possible address +
diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
index de6986801148..f3a123d64afc 100644
--- a/arch/x86/kernel/sbm/core.c
+++ b/arch/x86/kernel/sbm/core.c
@@ -7,9 +7,13 @@
* SandBox Mode (SBM) implementation for the x86 architecture.
*/
+#include <asm/cpu_entry_area.h>
+#include <asm/desc.h>
#include <asm/pgtable.h>
+#include <asm/page.h>
#include <asm/sbm.h>
#include <asm/sections.h>
+#include <linux/cpumask.h>
#include <linux/mm.h>
#include <linux/sbm.h>
#include <linux/sched/task_stack.h>
@@ -155,6 +159,72 @@ static int map_kernel(struct x86_sbm_state *state)
return 0;
}
+/**
+ * map_cpu_data() - map CPU system data structures into a sandbox instance
+ * @state: Target sandbox instance.
+ *
+ * Create sandbox page tables for:
+ * * Global Descriptor Table (GDT)
+ * * Task State Segment (TSS)
+ * * Interrupt Descriptor Table (IDT).
+ *
+ * Return: Zero on success, negative error code on failure.
+ */
+static int map_cpu_data(struct x86_sbm_state *state)
+{
+ unsigned long off;
+ phys_addr_t paddr;
+ unsigned int ist;
+ void *vaddr;
+ int cpu;
+ int err;
+
+ for_each_possible_cpu(cpu) {
+ struct cpu_entry_area *cea;
+ struct tss_struct *tss;
+
+ err = map_page(state, (unsigned long)get_cpu_gdt_ro(cpu),
+ PHYS_PFN(get_cpu_gdt_paddr(cpu)),
+ PAGE_KERNEL_RO);
+ if (err)
+ return err;
+
+ cea = get_cpu_entry_area(cpu);
+
+ tss = &cea->tss;
+ paddr = __pa(&per_cpu(cpu_tss_rw, cpu));
+ for (off = 0; off < sizeof(cpu_tss_rw); off += PAGE_SIZE) {
+ err = map_page(state, (unsigned long)tss + off,
+ PHYS_PFN(paddr + off), PAGE_KERNEL_RO);
+ if (err)
+ return err;
+ }
+
+ paddr = slow_virt_to_phys(&cea->entry_stack_page);
+ err = map_page(state, (unsigned long)&cea->entry_stack_page,
+ PHYS_PFN(paddr), PAGE_KERNEL);
+ if (err)
+ return err;
+
+ for (ist = 0; ist < IST_INDEX_NUM; ++ist) {
+ vaddr = (void *)tss->x86_tss.ist[ist];
+ if (!vaddr)
+ continue;
+
+ for (off = EXCEPTION_STKSZ; off; off -= PAGE_SIZE) {
+ paddr = slow_virt_to_phys(vaddr - off);
+ err = map_page(state, (unsigned long)vaddr - off,
+ PHYS_PFN(paddr), PAGE_KERNEL);
+ if (err)
+ return err;
+ }
+ }
+ }
+
+ paddr = slow_virt_to_phys((void *)CPU_ENTRY_AREA_RO_IDT);
+ return map_page(state, CPU_ENTRY_AREA_RO_IDT, PHYS_PFN(paddr),
+ PAGE_KERNEL_RO);
+}
+
int arch_sbm_init(struct sbm *sbm)
{
struct x86_sbm_state *state;
@@ -194,6 +264,10 @@ int arch_sbm_init(struct sbm *sbm)
if (err)
return err;
+ err = map_cpu_data(state);
+ if (err)
+ return err;
+
return 0;
}
--
2.34.1
From: Petr Tesarik <[email protected]>
Provide a fault handler for sandbox mode. Set the sandbox mode instance
error code, abort the sandbox and return to the caller. To allow graceful
return from a fatal fault, save all callee-saved registers (including the
stack pointer) just before passing control to the target function.
Modify the handlers for the #PF, #DF and #GP CPU exceptions to call this
handler when coming from sandbox mode. The check is based on the saved CS
register, which the interrupt entry path sets to a value that cannot occur
otherwise (__SBM_CS).
For the page fault handler, make sure that the sandbox mode check is placed
before do_kern_addr_fault(). That function calls spurious_kernel_fault(),
which implements lazy TLB invalidation of kernel pages and assumes that the
faulting instruction ran with kernel-mode page tables; it would produce
false positives for sandbox mode.
Signed-off-by: Petr Tesarik <[email protected]>
---
arch/x86/include/asm/ptrace.h | 21 +++++++++++++++++++++
arch/x86/include/asm/sbm.h | 24 ++++++++++++++++++++++++
arch/x86/include/asm/segment.h | 7 +++++++
arch/x86/kernel/asm-offsets.c | 5 +++++
arch/x86/kernel/sbm/call_64.S | 21 +++++++++++++++++++++
arch/x86/kernel/sbm/core.c | 26 ++++++++++++++++++++++++++
arch/x86/kernel/traps.c | 11 +++++++++++
arch/x86/mm/fault.c | 6 ++++++
8 files changed, 121 insertions(+)
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index f4db78b09c8f..f66f16f037b0 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -164,6 +164,27 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
#endif
}
+/*
+ * sandbox_mode() - did a register set come from SandBox Mode?
+ * @regs: register set
+ */
+static inline bool sandbox_mode(struct pt_regs *regs)
+{
+#ifdef CONFIG_X86_64
+#ifdef CONFIG_SANDBOX_MODE
+ /*
+ * SandBox Mode always runs in 64-bit and it is not implemented
+ * on paravirt systems, so this is the only possible value.
+ */
+ return regs->cs == __SBM_CS;
+#else /* !CONFIG_SANDBOX_MODE */
+ return false;
+#endif
+#else /* !CONFIG_X86_64 */
+ return false;
+#endif
+}
+
/*
* Determine whether the register set came from any context that is running in
* 64-bit mode.
diff --git a/arch/x86/include/asm/sbm.h b/arch/x86/include/asm/sbm.h
index ca4741b449e8..229b1ac3bbd4 100644
--- a/arch/x86/include/asm/sbm.h
+++ b/arch/x86/include/asm/sbm.h
@@ -11,23 +11,29 @@
#include <asm/processor.h>
+struct pt_regs;
+
#if defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE)
#include <asm/pgtable_types.h>
/**
* struct x86_sbm_state - Run-time state of the environment.
+ * @sbm: Link back to the SBM instance.
* @pgd: Sandbox mode page global directory.
* @stack: Sandbox mode stack.
* @exc_stack: Exception and IRQ stack.
+ * @return_sp: Stack pointer for returning to kernel mode.
*
* One instance of this union is allocated for each sandbox and stored as SBM
* instance private data.
*/
struct x86_sbm_state {
+ struct sbm *sbm;
pgd_t *pgd;
unsigned long stack;
unsigned long exc_stack;
+ unsigned long return_sp;
};
/**
@@ -43,6 +49,18 @@ static inline unsigned long top_of_intr_stack(void)
return current_top_of_stack();
}
+/**
+ * handle_sbm_fault() - Handle a CPU fault in sandbox mode.
+ * @regs: Saved registers at fault.
+ * @error_code: CPU error code.
+ * @address: Fault address (CR2 register).
+ *
+ * Handle a sandbox mode fault. The caller should use sandbox_mode() to
+ * check that @regs came from sandbox mode before calling this function.
+ */
+void handle_sbm_fault(struct pt_regs *regs, unsigned long error_code,
+ unsigned long address);
+
#else /* defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) */
static inline unsigned long top_of_intr_stack(void)
@@ -50,6 +68,12 @@ static inline unsigned long top_of_intr_stack(void)
return current_top_of_stack();
}
+static inline void handle_sbm_fault(struct pt_regs *regs,
+ unsigned long error_code,
+ unsigned long address)
+{
+}
+
#endif /* defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) */
#endif /* __ASM_SBM_H */
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index 9d6411c65920..966831385d18 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -217,6 +217,13 @@
#define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS*8 + 3)
#define __CPUNODE_SEG (GDT_ENTRY_CPUNODE*8 + 3)
+/*
+ * Sandbox runs with __USER_CS, but the interrupt entry code sets the RPL
+ * in the saved selector to zero to avoid user-mode processing (FPU, signal
+ * delivery, etc.). This is the resulting pseudo-CS.
+ */
+#define __SBM_CS (GDT_ENTRY_DEFAULT_USER_CS*8)
+
#endif
#define IDT_ENTRIES 256
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 6913b372ccf7..44d4f0a0cb19 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -20,6 +20,7 @@
#include <asm/suspend.h>
#include <asm/tlbflush.h>
#include <asm/tdx.h>
+#include <asm/sbm.h>
#ifdef CONFIG_XEN
#include <xen/interface/xen.h>
@@ -120,4 +121,8 @@ static void __used common(void)
OFFSET(ARIA_CTX_rounds, aria_ctx, rounds);
#endif
+#if defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE)
+ COMMENT("SandBox Mode");
+ OFFSET(SBM_return_sp, x86_sbm_state, return_sp);
+#endif
}
diff --git a/arch/x86/kernel/sbm/call_64.S b/arch/x86/kernel/sbm/call_64.S
index 1b232c8d15b7..6a615b4f6047 100644
--- a/arch/x86/kernel/sbm/call_64.S
+++ b/arch/x86/kernel/sbm/call_64.S
@@ -22,6 +22,17 @@
* rcx .. top of sandbox stack
*/
SYM_FUNC_START(x86_sbm_exec)
+ /* save all callee-saved registers */
+ push %rbp
+ push %rbx
+ push %r12
+ push %r13
+ push %r14
+ push %r15
+
+ /* to be used by sandbox abort */
+ mov %rsp, SBM_return_sp(%rdi)
+
/*
* Set up the sandbox stack:
* 1. Store the old stack pointer at the top of the sandbox stack,
@@ -37,5 +48,15 @@ SYM_FUNC_START(x86_sbm_exec)
pop %rsp
+SYM_INNER_LABEL(x86_sbm_return, SYM_L_GLOBAL)
+ ANNOTATE_NOENDBR // IRET target via x86_sbm_fault()
+
+ /* restore callee-saved registers and return */
+ pop %r15
+ pop %r14
+ pop %r13
+ pop %r12
+ pop %rbx
+ pop %rbp
RET
SYM_FUNC_END(x86_sbm_exec)
diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
index 81f1b0093537..d4c378847e93 100644
--- a/arch/x86/kernel/sbm/core.c
+++ b/arch/x86/kernel/sbm/core.c
@@ -13,6 +13,8 @@
#include <asm/page.h>
#include <asm/sbm.h>
#include <asm/sections.h>
+#include <asm/segment.h>
+#include <asm/trap_pf.h>
#include <linux/cpumask.h>
#include <linux/mm.h>
#include <linux/sbm.h>
@@ -23,6 +25,7 @@
asmlinkage int x86_sbm_exec(struct x86_sbm_state *state, sbm_func func,
void *args, unsigned long sbm_tos);
+extern char x86_sbm_return[];
static inline phys_addr_t page_to_ptval(struct page *page)
{
@@ -343,6 +346,8 @@ int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args)
struct x86_sbm_state *state = sbm->private;
int err;
+ state->sbm = sbm;
+
/* let interrupt handlers use the sandbox state page */
barrier();
WRITE_ONCE(current_thread_info()->sbm_state, state);
@@ -354,3 +359,24 @@ int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args)
return err;
}
+
+void handle_sbm_fault(struct pt_regs *regs, unsigned long error_code,
+ unsigned long address)
+{
+ struct x86_sbm_state *state = current_thread_info()->sbm_state;
+
+ /*
+ * Force -EFAULT unless the fault was due to a user-mode instruction
+ * fetch from the designated return address.
+ */
+ if (error_code != (X86_PF_PROT | X86_PF_USER | X86_PF_INSTR) ||
+ address != (unsigned long)x86_sbm_return)
+ state->sbm->error = -EFAULT;
+
+ /* modify IRET frame to exit from sandbox */
+ regs->ip = (unsigned long)x86_sbm_return;
+ regs->cs = __KERNEL_CS;
+ regs->flags = X86_EFLAGS_IF;
+ regs->sp = state->return_sp;
+ regs->ss = __KERNEL_DS;
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b9c9c74314e7..8fc5b17b8fb4 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -416,6 +416,12 @@ DEFINE_IDTENTRY_DF(exc_double_fault)
irqentry_nmi_enter(regs);
instrumentation_begin();
+
+ if (sandbox_mode(regs)) {
+ handle_sbm_fault(regs, error_code, 0);
+ return;
+ }
+
notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);
tsk->thread.error_code = error_code;
@@ -675,6 +681,11 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
goto exit;
}
+ if (sandbox_mode(regs)) {
+ handle_sbm_fault(regs, error_code, 0);
+ return;
+ }
+
if (gp_try_fixup_and_notify(regs, X86_TRAP_GP, error_code, desc, 0))
goto exit;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 679b09cfe241..f223b258e53f 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -34,6 +34,7 @@
#include <asm/kvm_para.h> /* kvm_handle_async_pf */
#include <asm/vdso.h> /* fixup_vdso_exception() */
#include <asm/irq_stack.h>
+#include <asm/sbm.h>
#define CREATE_TRACE_POINTS
#include <asm/trace/exceptions.h>
@@ -1500,6 +1501,11 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
if (unlikely(kmmio_fault(regs, address)))
return;
+ if (sandbox_mode(regs)) {
+ handle_sbm_fault(regs, error_code, address);
+ return;
+ }
+
/* Was the fault on kernel-controlled part of the address space? */
if (unlikely(fault_in_kernel_space(address))) {
do_kern_addr_fault(regs, error_code, address);
--
2.34.1
From: Petr Tesarik <[email protected]>
Change CR3 to the sandbox page tables while running in sandbox mode. Since
interrupts remain enabled but interrupt handlers cannot run in sandbox mode,
modify the interrupt entry and exit paths to leave and re-enter sandbox mode
as needed. For now, these modifications are not implemented for Xen PV, so
add a conflict with CONFIG_XEN_PV.
For interrupt entry, save the kernel-mode CR3 value in a dynamically
allocated state page and map this page to a fixed virtual address in
sandbox mode, so it can be found without relying on any CPU state other
than paging. In kernel mode, this address maps to a zero-filled page in the
kernel BSS section.
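A simplified C view of this trick (the actual definition and mapping are in
the core.c hunk below):

	/* One page-aligned, page-sized object in kernel BSS. */
	union {
		struct x86_sbm_state state;
		char page[PAGE_SIZE];
	} x86_sbm_state __page_aligned_bss;

	/*
	 * Kernel page tables: the address resolves to the zero-filled BSS
	 * page, so SBM_kernel_cr3 reads as 0 ("no sandbox active").
	 * Sandbox page tables: the same address is remapped to the
	 * per-instance state page, so the interrupt entry code can load the
	 * saved kernel CR3 from it using nothing but the current paging mode.
	 */
	map_page(state, (unsigned long)&x86_sbm_state,
		 PHYS_PFN(__pa(state)), PAGE_KERNEL);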
Special care is needed to flush the TLB when entering and leaving sandbox
mode, because it changes the mapping of kernel addresses. Kernel page table
entries are marked as global and thus normally not flushed when a new value
is written to CR3. To flush them, turn off the PGE bit in CR4 when entering
sandbox mode (and restore CR4.PGE when leaving sandbox mode). Albeit not
very efficient, this method is simple and works.
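In C terms, entering the sandbox does roughly the following (illustrative
only; the real switch is done in assembly in call_64.S with interrupts
disabled, and the entry code performs the inverse on the way back):

	unsigned long cr4 = native_read_cr4();

	local_irq_disable();
	native_write_cr4(cr4 & ~X86_CR4_PGE);	/* clearing PGE flushes global TLB entries */
	write_cr3(state->sbm_cr3);		/* switch to the sandbox page tables */
	/* ... IRET to the target function with CPL 3 selectors ... */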
Signed-off-by: Petr Tesarik <[email protected]>
---
arch/x86/Kconfig | 2 +-
arch/x86/entry/entry_64.S | 136 ++++++++++++++++++++++++++++++++++
arch/x86/include/asm/sbm.h | 4 +
arch/x86/kernel/asm-offsets.c | 5 ++
arch/x86/kernel/sbm/call_64.S | 48 ++++++++++--
arch/x86/kernel/sbm/core.c | 23 +++++-
6 files changed, 209 insertions(+), 9 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 090d46c7ee7c..e6ee1d3a273b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -188,7 +188,7 @@ config X86
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT
select HAVE_ARCH_COMPAT_MMAP_BASES if MMU && COMPAT
select HAVE_ARCH_PREL32_RELOCATIONS
- select HAVE_ARCH_SBM if X86_64
+ select HAVE_ARCH_SBM if X86_64 && !XEN_PV
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_STACKLEAK
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index c40f89ab1b4c..e1364115408a 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -623,6 +623,23 @@ SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL)
ud2
1:
#endif
+
+#ifdef CONFIG_SANDBOX_MODE
+ /* Restore CR3 if the exception came from sandbox mode. */
+ cmpw $__SBM_CS, CS(%rsp)
+ jne .Lreturn_cr3_done
+
+ movq PER_CPU_VAR(pcpu_hot + X86_current_task), %rcx
+ movq TASK_sbm_state(%rcx), %rcx
+ movq SBM_sbm_cr3(%rcx), %rcx
+ movq %cr4, %rax
+ andb $~X86_CR4_PGE, %al
+ movq %rax, %cr4
+ movq %rcx, %cr3
+ orb $3, CS(%rsp)
+#endif
+
+.Lreturn_cr3_done:
POP_REGS
addq $8, %rsp /* skip regs->orig_ax */
/*
@@ -867,6 +884,27 @@ SYM_CODE_START(paranoid_entry)
PUSH_AND_CLEAR_REGS save_ret=1
ENCODE_FRAME_POINTER 8
+#ifdef CONFIG_SANDBOX_MODE
+ /*
+ * If sandbox mode was active, adjust the saved CS, switch to
+ * kernel CR3 and skip non-sandbox CR3 handling. Save old CR3
+ * in %r14 even if not using PAGE_TABLE_ISOLATION. This is
+ * needed during transition to sandbox mode, when CR3 is already
+ * set, but CS is still __KERNEL_CS.
+ */
+ movq x86_sbm_state + SBM_kernel_cr3, %rcx
+ jrcxz .Lparanoid_switch_to_kernel
+
+ movq %cr3, %r14
+ andb $~3, CS+8(%rsp)
+ movq %cr4, %rax
+ orb $X86_CR4_PGE, %al
+ movq %rax, %cr4
+ movq %rcx, %cr3
+ jmp .Lparanoid_gsbase
+#endif
+
+.Lparanoid_switch_to_kernel:
/*
* Always stash CR3 in %r14. This value will be restored,
* verbatim, at exit. Needed if paranoid_entry interrupted
@@ -884,6 +922,7 @@ SYM_CODE_START(paranoid_entry)
*/
SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14
+.Lparanoid_gsbase:
/*
* Handling GSBASE depends on the availability of FSGSBASE.
*
@@ -967,6 +1006,22 @@ SYM_CODE_START_LOCAL(paranoid_exit)
*/
IBRS_EXIT save_reg=%r15
+#ifdef CONFIG_SANDBOX_MODE
+ /*
+ * When returning to sandbox mode, the sandbox CR3 is restored in
+ * restore_regs_and_return_to_kernel. When returning to kernel mode,
+ * but sandbox mode is active, restore CR3 from %r14 here.
+ */
+ cmpw $__SBM_CS, CS(%rsp)
+ je .Lparanoid_exit_fsgs
+ movq PER_CPU_VAR(pcpu_hot + X86_current_task), %rax
+ cmpq $0, TASK_sbm_state(%rax)
+ je .Lparanoid_exit_no_sbm
+ movq %r14, %cr3
+ jmp .Lparanoid_exit_fsgs
+#endif
+
+.Lparanoid_exit_no_sbm:
/*
* The order of operations is important. RESTORE_CR3 requires
* kernel GSBASE.
@@ -977,6 +1032,7 @@ SYM_CODE_START_LOCAL(paranoid_exit)
*/
RESTORE_CR3 scratch_reg=%rax save_reg=%r14
+.Lparanoid_exit_fsgs:
/* Handle the three GSBASE cases */
ALTERNATIVE "jmp .Lparanoid_exit_checkgs", "", X86_FEATURE_FSGSBASE
@@ -1007,6 +1063,24 @@ SYM_CODE_START(error_entry)
testb $3, CS+8(%rsp)
jz .Lerror_kernelspace
+#ifdef CONFIG_SANDBOX_MODE
+ /*
+ * If sandbox mode was active, adjust the saved CS,
+ * unconditionally switch to kernel CR3 and continue
+ * as if the interrupt was from kernel space.
+ */
+ movq x86_sbm_state + SBM_kernel_cr3, %rcx
+ jrcxz .Lerror_swapgs
+
+ andb $~3, CS+8(%rsp)
+ movq %cr4, %rax
+ orb $X86_CR4_PGE, %al
+ movq %rax, %cr4
+ movq %rcx, %cr3
+ jmp .Lerror_entry_done_lfence
+#endif
+
+.Lerror_swapgs:
/*
* We entered from user mode or we're pretending to have entered
* from user mode due to an IRET fault.
@@ -1149,6 +1223,11 @@ SYM_CODE_START(asm_exc_nmi)
testb $3, CS-RIP+8(%rsp)
jz .Lnmi_from_kernel
+#ifdef CONFIG_SANDBOX_MODE
+ cmpq $0, x86_sbm_state + SBM_kernel_cr3
+ jne .Lnmi_from_sbm
+#endif
+
/*
* NMI from user mode. We need to run on the thread stack, but we
* can't go through the normal entry paths: NMIs are masked, and
@@ -1194,6 +1273,47 @@ SYM_CODE_START(asm_exc_nmi)
*/
jmp swapgs_restore_regs_and_return_to_usermode
+#ifdef CONFIG_SANDBOX_MODE
+.Lnmi_from_sbm:
+ /*
+ * If NMI from sandbox mode, this cannot be a nested NMI. Adjust
+ * saved CS, load kernel CR3 and continue on the sandbox exception
+ * stack. The code is similar to NMI from user mode.
+ */
+ andb $~3, CS-RIP+8(%rsp)
+ movq %cr4, %rdx
+ orb $X86_CR4_PGE, %dl
+ movq %rdx, %cr4
+ movq x86_sbm_state + SBM_kernel_cr3, %rdx
+ movq %rdx, %cr3
+
+ movq PER_CPU_VAR(pcpu_hot + X86_current_task), %rdx
+ movq TASK_sbm_state(%rdx), %rdx
+ movq SBM_exc_stack(%rdx), %rdx
+ addq $EXCEPTION_STKSZ, %rdx
+ xchgq %rsp, %rdx
+ UNWIND_HINT_IRET_REGS base=%rdx offset=8
+ pushq 5*8(%rdx) /* pt_regs->ss */
+ pushq 4*8(%rdx) /* pt_regs->rsp */
+ pushq 3*8(%rdx) /* pt_regs->flags */
+ pushq 2*8(%rdx) /* pt_regs->cs */
+ pushq 1*8(%rdx) /* pt_regs->rip */
+ UNWIND_HINT_IRET_REGS
+ pushq $-1 /* pt_regs->orig_ax */
+ PUSH_AND_CLEAR_REGS rdx=(%rdx)
+ ENCODE_FRAME_POINTER
+
+ FENCE_SWAPGS_KERNEL_ENTRY
+ movq %rsp, %rdi
+ call exc_nmi
+
+ /*
+ * Take the kernel return path. This will take care of restoring
+ * CR3 and return CS.
+ */
+ jmp restore_regs_and_return_to_kernel
+#endif
+
.Lnmi_from_kernel:
/*
* Here's what our stack frame will look like:
@@ -1404,6 +1524,22 @@ end_repeat_nmi:
/* Always restore stashed SPEC_CTRL value (see paranoid_entry) */
IBRS_EXIT save_reg=%r15
+#ifdef CONFIG_SANDBOX_MODE
+ /*
+ * Always restore saved CR3 when sandbox mode is active. This is
+ * needed if an NMI occurs during transition to sandbox mode.
+ */
+ movq PER_CPU_VAR(pcpu_hot + X86_current_task), %rcx
+ movq TASK_sbm_state(%rcx), %rcx
+ jrcxz nmi_no_sbm
+
+ movq %cr4, %rax
+ andb $~X86_CR4_PGE, %al
+ movq %rax, %cr4
+ movq %r14, %cr3
+#endif
+
+nmi_no_sbm:
/* Always restore stashed CR3 value (see paranoid_entry) */
RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
diff --git a/arch/x86/include/asm/sbm.h b/arch/x86/include/asm/sbm.h
index 229b1ac3bbd4..e38dcf9a8017 100644
--- a/arch/x86/include/asm/sbm.h
+++ b/arch/x86/include/asm/sbm.h
@@ -24,6 +24,8 @@ struct pt_regs;
* @stack: Sandbox mode stack.
* @exc_stack: Exception and IRQ stack.
* @return_sp: Stack pointer for returning to kernel mode.
+ * @kernel_cr3: Kernel mode CR3 value.
+ * @sbm_cr3: Sandbox mode CR3 value.
*
* One instance of this union is allocated for each sandbox and stored as SBM
* instance private data.
@@ -34,6 +36,8 @@ struct x86_sbm_state {
unsigned long stack;
unsigned long exc_stack;
unsigned long return_sp;
+ unsigned long kernel_cr3;
+ unsigned long sbm_cr3;
};
/**
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 44d4f0a0cb19..cc2751822532 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -17,6 +17,7 @@
#include <asm/thread_info.h>
#include <asm/sigframe.h>
#include <asm/bootparam.h>
+#include <asm/sbm.h>
#include <asm/suspend.h>
#include <asm/tlbflush.h>
#include <asm/tdx.h>
@@ -123,6 +124,10 @@ static void __used common(void)
#if defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE)
COMMENT("SandBox Mode");
+ OFFSET(TASK_sbm_state, task_struct, thread_info.sbm_state);
+ OFFSET(SBM_exc_stack, x86_sbm_state, exc_stack);
OFFSET(SBM_return_sp, x86_sbm_state, return_sp);
+ OFFSET(SBM_kernel_cr3, x86_sbm_state, kernel_cr3);
+ OFFSET(SBM_sbm_cr3, x86_sbm_state, sbm_cr3);
#endif
}
diff --git a/arch/x86/kernel/sbm/call_64.S b/arch/x86/kernel/sbm/call_64.S
index 6a615b4f6047..8b2b524c5b46 100644
--- a/arch/x86/kernel/sbm/call_64.S
+++ b/arch/x86/kernel/sbm/call_64.S
@@ -10,6 +10,8 @@
#include <linux/linkage.h>
#include <asm/nospec-branch.h>
#include <asm/percpu.h>
+#include <asm/processor-flags.h>
+#include <asm/segment.h>
.code64
.section .entry.text, "ax"
@@ -20,6 +22,7 @@
* rsi .. func
* rdx .. args
* rcx .. top of sandbox stack
+ * r8 .. top of exception stack
*/
SYM_FUNC_START(x86_sbm_exec)
/* save all callee-saved registers */
@@ -29,6 +32,7 @@ SYM_FUNC_START(x86_sbm_exec)
push %r13
push %r14
push %r15
+ UNWIND_HINT_SAVE
/* to be used by sandbox abort */
mov %rsp, SBM_return_sp(%rdi)
@@ -38,18 +42,50 @@ SYM_FUNC_START(x86_sbm_exec)
* 1. Store the old stack pointer at the top of the sandbox stack,
* where various unwinders can find it and link back to the
* kernel stack.
+ * 2. Put a return address at the top of the sandbox stack. Although
+ * the target code is not executable in sandbox mode, the page
+ * fault handler can check the fault address to know that the
+ * target function returned.
*/
- sub $8, %rcx
- mov %rsp, (%rcx)
- mov %rcx, %rsp
+ sub $8 * 2, %rcx
+ mov %rsp, 8(%rcx)
+ movq $x86_sbm_return, (%rcx)
- mov %rdx, %rdi /* args */
- CALL_NOSPEC rsi
+ /*
+ * Move to the sandbox exception stack.
+ * This stack is mapped as writable supervisor pages both in kernel
+ * mode and in sandbox mode, so it survives a CR3 change.
+ */
+ sub $8, %r8
+ mov %rsp, (%r8)
+ mov %r8, %rsp
+
+ /* set up the IRET frame */
+ pushq $__USER_DS
+ push %rcx
+ pushfq
+ pushq $__USER_CS
+ push %rsi
- pop %rsp
+ /*
+ * Switch to sandbox address space. Interrupt handlers cannot cope
+ * with sandbox CR3 in kernel mode. Disable interrupts before setting
+ * CR4, because if this task gets preempted, global pages would stay
+ * disabled, which is really bad for performance.
+ * The NMI handler takes extra care to restore CR3 and CR4.
+ */
+ mov SBM_sbm_cr3(%rdi), %r11
+ mov %cr4, %rax
+ and $~X86_CR4_PGE, %al
+ mov %rdx, %rdi /* args */
+ cli
+ mov %rax, %cr4
+ mov %r11, %cr3
+ iretq
SYM_INNER_LABEL(x86_sbm_return, SYM_L_GLOBAL)
ANNOTATE_NOENDBR // IRET target via x86_sbm_fault()
+ UNWIND_HINT_RESTORE
/* restore callee-saved registers and return */
pop %r15
diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
index d4c378847e93..0ea193550a83 100644
--- a/arch/x86/kernel/sbm/core.c
+++ b/arch/x86/kernel/sbm/core.c
@@ -24,9 +24,15 @@
#define PGD_ORDER get_order(sizeof(pgd_t) * PTRS_PER_PGD)
asmlinkage int x86_sbm_exec(struct x86_sbm_state *state, sbm_func func,
- void *args, unsigned long sbm_tos);
+ void *args, unsigned long sbm_tos,
+ unsigned long exc_tos);
extern char x86_sbm_return[];
+union {
+ struct x86_sbm_state state;
+ char page[PAGE_SIZE];
+} x86_sbm_state __page_aligned_bss;
+
static inline phys_addr_t page_to_ptval(struct page *page)
{
return PFN_PHYS(page_to_pfn(page)) | _PAGE_TABLE;
@@ -279,6 +285,12 @@ int arch_sbm_init(struct sbm *sbm)
if (err)
return err;
+ BUILD_BUG_ON(sizeof(x86_sbm_state) != PAGE_SIZE);
+ err = map_page(state, (unsigned long)&x86_sbm_state,
+ PHYS_PFN(__pa(state)), PAGE_KERNEL);
+ if (err < 0)
+ return err;
+
return 0;
}
@@ -348,11 +360,18 @@ int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args)
state->sbm = sbm;
+ /* save current (kernel) CR3 for interrupt entry path */
+ state->kernel_cr3 = __read_cr3();
+
+ /* CR3 while running in sandbox */
+ state->sbm_cr3 = __sme_pa(state->pgd);
+
/* let interrupt handlers use the sandbox state page */
barrier();
WRITE_ONCE(current_thread_info()->sbm_state, state);
- err = x86_sbm_exec(state, func, args, state->stack + THREAD_SIZE);
+ err = x86_sbm_exec(state, func, args, state->stack + THREAD_SIZE,
+ state->exc_stack + EXCEPTION_STKSZ);
/* NULLify the state page pointer before it becomes stale */
WRITE_ONCE(current_thread_info()->sbm_state, NULL);
--
2.34.1
From: Petr Tesarik <[email protected]>
Implement lazy TLB flushing in sandbox mode and keep CR4.PGE enabled.
For the transition from sandbox mode to kernel mode:
1. All user page translations (sandbox code and data) are flushed from the
TLB, because their page protection bits do not include _PAGE_GLOBAL.
2. Any kernel page translations remain valid after the transition. The SBM
state page is an exception; map it without _PAGE_GLOBAL.
For the transition from kernel mode to sandbox mode:
1. Kernel page translations become stale. However, any access by code
running in sandbox mode (with CPL 3) causes a protection violation.
Handle the spurious page faults from such accesses, lazily replacing
entries in the TLB.
2. If the TLB contains any user page translations before the switch to
sandbox mode, they are flushed, because their page protection bits do
not include _PAGE_GLOBAL. This ensures that sandbox mode cannot access
user mode pages.
Note that the TLB may keep kernel page translations for addresses which are
never accessed by sandbox mode. They remain valid after returning to kernel
mode.
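The lazy part is driven from the sandbox fault handler; conceptually
(annotated excerpt of the handle_sbm_fault() change below):

	/*
	 * A stale global kernel translation trips a CPL 3 protection fault.
	 * If the sandbox page tables do permit the access, the fault is
	 * spurious: returning lets the CPU re-walk the page tables and cache
	 * a fresh entry for the retried access.
	 */
	if (spurious_sbm_fault(state, error_code, address))
		return;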
Signed-off-by: Petr Tesarik <[email protected]>
---
arch/x86/entry/entry_64.S | 17 +-----
arch/x86/kernel/sbm/call_64.S | 5 +-
arch/x86/kernel/sbm/core.c | 100 +++++++++++++++++++++++++++++++++-
3 files changed, 102 insertions(+), 20 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e1364115408a..4ba3eea38102 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -632,10 +632,8 @@ SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL)
movq PER_CPU_VAR(pcpu_hot + X86_current_task), %rcx
movq TASK_sbm_state(%rcx), %rcx
movq SBM_sbm_cr3(%rcx), %rcx
- movq %cr4, %rax
- andb $~X86_CR4_PGE, %al
- movq %rax, %cr4
movq %rcx, %cr3
+ invlpg x86_sbm_state
orb $3, CS(%rsp)
#endif
@@ -897,9 +895,6 @@ SYM_CODE_START(paranoid_entry)
movq %cr3, %r14
andb $~3, CS+8(%rsp)
- movq %cr4, %rax
- orb $X86_CR4_PGE, %al
- movq %rax, %cr4
movq %rcx, %cr3
jmp .Lparanoid_gsbase
#endif
@@ -1073,9 +1068,6 @@ SYM_CODE_START(error_entry)
jrcxz .Lerror_swapgs
andb $~3, CS+8(%rsp)
- movq %cr4, %rax
- orb $X86_CR4_PGE, %al
- movq %rax, %cr4
movq %rcx, %cr3
jmp .Lerror_entry_done_lfence
#endif
@@ -1281,9 +1273,6 @@ SYM_CODE_START(asm_exc_nmi)
* stack. The code is similar to NMI from user mode.
*/
andb $~3, CS-RIP+8(%rsp)
- movq %cr4, %rdx
- orb $X86_CR4_PGE, %dl
- movq %rdx, %cr4
movq x86_sbm_state + SBM_kernel_cr3, %rdx
movq %rdx, %cr3
@@ -1533,10 +1522,8 @@ end_repeat_nmi:
movq TASK_sbm_state(%rcx), %rcx
jrcxz nmi_no_sbm
- movq %cr4, %rax
- andb $~X86_CR4_PGE, %al
- movq %rax, %cr4
movq %r14, %cr3
+ invlpg x86_sbm_state
#endif
nmi_no_sbm:
diff --git a/arch/x86/kernel/sbm/call_64.S b/arch/x86/kernel/sbm/call_64.S
index 8b2b524c5b46..21edce5666bc 100644
--- a/arch/x86/kernel/sbm/call_64.S
+++ b/arch/x86/kernel/sbm/call_64.S
@@ -10,7 +10,6 @@
#include <linux/linkage.h>
#include <asm/nospec-branch.h>
#include <asm/percpu.h>
-#include <asm/processor-flags.h>
#include <asm/segment.h>
.code64
@@ -75,12 +74,10 @@ SYM_FUNC_START(x86_sbm_exec)
* The NMI handler takes extra care to restore CR3 and CR4.
*/
mov SBM_sbm_cr3(%rdi), %r11
- mov %cr4, %rax
- and $~X86_CR4_PGE, %al
mov %rdx, %rdi /* args */
cli
- mov %rax, %cr4
mov %r11, %cr3
+ invlpg x86_sbm_state
iretq
SYM_INNER_LABEL(x86_sbm_return, SYM_L_GLOBAL)
diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
index 0ea193550a83..296f1fde3c22 100644
--- a/arch/x86/kernel/sbm/core.c
+++ b/arch/x86/kernel/sbm/core.c
@@ -33,6 +33,11 @@ union {
char page[PAGE_SIZE];
} x86_sbm_state __page_aligned_bss;
+static inline pgprot_t pgprot_nonglobal(pgprot_t prot)
+{
+ return __pgprot(pgprot_val(prot) & ~_PAGE_GLOBAL);
+}
+
static inline phys_addr_t page_to_ptval(struct page *page)
{
return PFN_PHYS(page_to_pfn(page)) | _PAGE_TABLE;
@@ -287,7 +292,7 @@ int arch_sbm_init(struct sbm *sbm)
BUILD_BUG_ON(sizeof(x86_sbm_state) != PAGE_SIZE);
err = map_page(state, (unsigned long)&x86_sbm_state,
- PHYS_PFN(__pa(state)), PAGE_KERNEL);
+ PHYS_PFN(__pa(state)), pgprot_nonglobal(PAGE_KERNEL));
if (err < 0)
return err;
@@ -379,11 +384,104 @@ int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args)
return err;
}
+static bool spurious_sbm_fault_check(unsigned long error_code, pte_t *pte)
+{
+ if ((error_code & X86_PF_WRITE) && !pte_write(*pte))
+ return false;
+
+ if ((error_code & X86_PF_INSTR) && !pte_exec(*pte))
+ return false;
+
+ return true;
+}
+
+/*
+ * Handle a spurious fault caused by a stale TLB entry.
+ *
+ * This allows us to lazily refresh the TLB when increasing the
+ * permissions of a kernel page (RO -> RW or NX -> X). Doing it
+ * eagerly is very expensive since that implies doing a full
+ * cross-processor TLB flush, even if no stale TLB entries exist
+ * on other processors.
+ *
+ * Spurious faults may only occur if the TLB contains an entry with
+ * fewer permission than the page table entry. Non-present (P = 0)
+ * and reserved bit (R = 1) faults are never spurious.
+ *
+ * There are no security implications to leaving a stale TLB when
+ * increasing the permissions on a page.
+ *
+ * Returns true if a spurious fault was handled, false otherwise.
+ *
+ * See Intel Developer's Manual Vol 3 Section 4.10.4.3, bullet 3
+ * (Optional Invalidation).
+ */
+static bool
+spurious_sbm_fault(struct x86_sbm_state *state, unsigned long error_code,
+ unsigned long address)
+{
+ pgd_t *pgd;
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+ bool ret;
+
+ if ((error_code & ~(X86_PF_WRITE | X86_PF_INSTR)) !=
+ (X86_PF_USER | X86_PF_PROT))
+ return false;
+
+ pgd = __va(state->sbm_cr3 & CR3_ADDR_MASK) + pgd_index(address);
+ if (!pgd_present(*pgd))
+ return false;
+
+ p4d = p4d_offset(pgd, address);
+ if (!p4d_present(*p4d))
+ return false;
+
+ if (p4d_large(*p4d))
+ return spurious_sbm_fault_check(error_code, (pte_t *)p4d);
+
+ pud = pud_offset(p4d, address);
+ if (!pud_present(*pud))
+ return false;
+
+ if (pud_large(*pud))
+ return spurious_sbm_fault_check(error_code, (pte_t *)pud);
+
+ pmd = pmd_offset(pud, address);
+ if (!pmd_present(*pmd))
+ return false;
+
+ if (pmd_large(*pmd))
+ return spurious_sbm_fault_check(error_code, (pte_t *)pmd);
+
+ pte = pte_offset_kernel(pmd, address);
+ if (!pte_present(*pte))
+ return false;
+
+ ret = spurious_sbm_fault_check(error_code, pte);
+ if (!ret)
+ return false;
+
+ /*
+ * Make sure we have permissions in PMD.
+ * If not, then there's a bug in the page tables:
+ */
+ ret = spurious_sbm_fault_check(error_code, (pte_t *)pmd);
+ WARN_ONCE(!ret, "PMD has incorrect permission bits\n");
+
+ return ret;
+}
+
void handle_sbm_fault(struct pt_regs *regs, unsigned long error_code,
unsigned long address)
{
struct x86_sbm_state *state = current_thread_info()->sbm_state;
+ if (spurious_sbm_fault(state, error_code, address))
+ return;
+
/*
* Force -EFAULT unless the fault was due to a user-mode instruction
* fetch from the designated return address.
--
2.34.1
From: Petr Tesarik <[email protected]>
Add arch hooks for the x86 architecture and select CONFIG_HAVE_ARCH_SBM.
Implement arch_sbm_init(): Allocate an arch-specific state page and store
it as SBM instance private data. Set up mappings for kernel text, static
data, the current task and the current thread stack in the sandbox page
tables.
Implement arch_sbm_map_readonly() and arch_sbm_map_writable(): Set the PTE
value, allocating additional page tables as necessary.
Implement arch_sbm_destroy(): Walk the page table hierarchy and free all
page tables, including the page global directory.
Provide a trivial implementation of arch_sbm_exec() to avoid build
failures, but do not switch to the constructed page tables yet.
Signed-off-by: Petr Tesarik <[email protected]>
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/sbm.h | 29 ++++
arch/x86/kernel/Makefile | 2 +
arch/x86/kernel/sbm/Makefile | 10 ++
arch/x86/kernel/sbm/core.c | 248 +++++++++++++++++++++++++++++++++++
5 files changed, 290 insertions(+)
create mode 100644 arch/x86/include/asm/sbm.h
create mode 100644 arch/x86/kernel/sbm/Makefile
create mode 100644 arch/x86/kernel/sbm/core.c
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5edec175b9bf..41fa4ab84c15 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -188,6 +188,7 @@ config X86
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT
select HAVE_ARCH_COMPAT_MMAP_BASES if MMU && COMPAT
select HAVE_ARCH_PREL32_RELOCATIONS
+ select HAVE_ARCH_SBM
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_STACKLEAK
diff --git a/arch/x86/include/asm/sbm.h b/arch/x86/include/asm/sbm.h
new file mode 100644
index 000000000000..01c8d357550b
--- /dev/null
+++ b/arch/x86/include/asm/sbm.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023-2024 Huawei Technologies Duesseldorf GmbH
+ *
+ * Author: Petr Tesarik <[email protected]>
+ *
+ * SandBox Mode (SBM) declarations for the x86 architecture.
+ */
+#ifndef __ASM_SBM_H
+#define __ASM_SBM_H
+
+#if defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE)
+
+#include <asm/pgtable_types.h>
+
+/**
+ * struct x86_sbm_state - Run-time state of the environment.
+ * @pgd: Sandbox mode page global directory.
+ *
+ * One instance of this union is allocated for each sandbox and stored as SBM
+ * instance private data.
+ */
+struct x86_sbm_state {
+ pgd_t *pgd;
+};
+
+#endif /* defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) */
+
+#endif /* __ASM_SBM_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0000325ab98f..4ad63b7d13ee 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -150,6 +150,8 @@ obj-$(CONFIG_X86_CET) += cet.o
obj-$(CONFIG_X86_USER_SHADOW_STACK) += shstk.o
+obj-$(CONFIG_SANDBOX_MODE) += sbm/
+
###
# 64 bit specific files
ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/sbm/Makefile b/arch/x86/kernel/sbm/Makefile
new file mode 100644
index 000000000000..92d368b526cd
--- /dev/null
+++ b/arch/x86/kernel/sbm/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2023-2024 Huawei Technologies Duesseldorf GmbH
+#
+# Author: Petr Tesarik <[email protected]>
+#
+# Makefile for the x86 SandBox Mode (SBM) implementation.
+#
+
+obj-y := core.o
diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
new file mode 100644
index 000000000000..b775e3b387b1
--- /dev/null
+++ b/arch/x86/kernel/sbm/core.c
@@ -0,0 +1,248 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023-2024 Huawei Technologies Duesseldorf GmbH
+ *
+ * Author: Petr Tesarik <[email protected]>
+ *
+ * SandBox Mode (SBM) implementation for the x86 architecture.
+ */
+
+#include <asm/pgtable.h>
+#include <asm/sbm.h>
+#include <asm/sections.h>
+#include <linux/mm.h>
+#include <linux/sbm.h>
+#include <linux/sched/task_stack.h>
+
+#define GFP_SBM_PGTABLE (GFP_KERNEL | __GFP_ZERO)
+#define PGD_ORDER get_order(sizeof(pgd_t) * PTRS_PER_PGD)
+
+static inline phys_addr_t page_to_ptval(struct page *page)
+{
+ return PFN_PHYS(page_to_pfn(page)) | _PAGE_TABLE;
+}
+
+static int map_page(struct x86_sbm_state *state, unsigned long addr,
+ unsigned long pfn, pgprot_t prot)
+{
+ struct page *page;
+ pgd_t *pgdp;
+ p4d_t *p4dp;
+ pud_t *pudp;
+ pmd_t *pmdp;
+ pte_t *ptep;
+
+ pgdp = pgd_offset_pgd(state->pgd, addr);
+ if (pgd_none(*pgdp)) {
+ page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!page)
+ return -ENOMEM;
+ set_pgd(pgdp, __pgd(page_to_ptval(page)));
+ p4dp = (p4d_t *)page_address(page) + p4d_index(addr);
+ } else
+ p4dp = p4d_offset(pgdp, addr);
+
+ if (p4d_none(*p4dp)) {
+ page = alloc_page(GFP_SBM_PGTABLE);
+ if (!page)
+ return -ENOMEM;
+ set_p4d(p4dp, __p4d(page_to_ptval(page)));
+ pudp = (pud_t *)page_address(page) + pud_index(addr);
+ } else
+ pudp = pud_offset(p4dp, addr);
+
+ if (pud_none(*pudp)) {
+ page = alloc_page(GFP_SBM_PGTABLE);
+ if (!page)
+ return -ENOMEM;
+ set_pud(pudp, __pud(page_to_ptval(page)));
+ pmdp = (pmd_t *)page_address(page) + pmd_index(addr);
+ } else
+ pmdp = pmd_offset(pudp, addr);
+
+ if (pmd_none(*pmdp)) {
+ page = alloc_page(GFP_SBM_PGTABLE);
+ if (!page)
+ return -ENOMEM;
+ set_pmd(pmdp, __pmd(page_to_ptval(page)));
+ ptep = (pte_t *)page_address(page) + pte_index(addr);
+ } else
+ ptep = pte_offset_kernel(pmdp, addr);
+
+ set_pte(ptep, pfn_pte(pfn, prot));
+ return 0;
+}
+
+static int map_range(struct x86_sbm_state *state, unsigned long start,
+ unsigned long end, pgprot_t prot)
+{
+ unsigned long pfn;
+ int err;
+
+ start = PAGE_ALIGN_DOWN(start);
+ while (start < end) {
+ if (is_vmalloc_or_module_addr((void *)start))
+ pfn = vmalloc_to_pfn((void *)start);
+ else
+ pfn = PHYS_PFN(__pa(start));
+ err = map_page(state, start, pfn, prot);
+ if (err)
+ return err;
+ start += PAGE_SIZE;
+ }
+
+ return 0;
+}
+
+int arch_sbm_map_readonly(struct sbm *sbm, const struct sbm_buf *buf)
+{
+ return map_range(sbm->private, (unsigned long)buf->sbm_ptr,
+ (unsigned long)buf->sbm_ptr + buf->size,
+ PAGE_READONLY);
+}
+
+int arch_sbm_map_writable(struct sbm *sbm, const struct sbm_buf *buf)
+{
+ return map_range(sbm->private, (unsigned long)buf->sbm_ptr,
+ (unsigned long)buf->sbm_ptr + buf->size,
+ PAGE_SHARED);
+}
+
+/* Map kernel text, data, rodata, BSS and static per-cpu sections. */
+static int map_kernel(struct x86_sbm_state *state)
+{
+ int __maybe_unused cpu;
+ int err;
+
+ err = map_range(state, (unsigned long)_stext, (unsigned long)_etext,
+ PAGE_READONLY_EXEC);
+ if (err)
+ return err;
+
+ err = map_range(state, (unsigned long)__entry_text_start,
+ (unsigned long)__entry_text_end, PAGE_KERNEL_ROX);
+ if (err)
+ return err;
+
+ err = map_range(state, (unsigned long)_sdata, (unsigned long)_edata,
+ PAGE_READONLY);
+ if (err)
+ return err;
+ err = map_range(state, (unsigned long)__bss_start,
+ (unsigned long)__bss_stop, PAGE_READONLY);
+ if (err)
+ return err;
+ err = map_range(state, (unsigned long)__start_rodata,
+ (unsigned long)__end_rodata, PAGE_READONLY);
+ if (err)
+ return err;
+
+#ifdef CONFIG_SMP
+ for_each_possible_cpu(cpu) {
+ unsigned long off = per_cpu_offset(cpu);
+
+ err = map_range(state, (unsigned long)__per_cpu_start + off,
+ (unsigned long)__per_cpu_end + off,
+ PAGE_READONLY);
+ if (err)
+ return err;
+ }
+#endif
+
+ return 0;
+}
+
+int arch_sbm_init(struct sbm *sbm)
+{
+ struct x86_sbm_state *state;
+ unsigned long stack;
+ int err;
+
+ BUILD_BUG_ON(sizeof(*state) > PAGE_SIZE);
+ state = (struct x86_sbm_state *)__get_free_page(GFP_KERNEL);
+ if (!state)
+ return -ENOMEM;
+ sbm->private = state;
+
+ state->pgd = (pgd_t *)__get_free_pages(GFP_SBM_PGTABLE, PGD_ORDER);
+ if (!state->pgd)
+ return -ENOMEM;
+
+ err = map_kernel(state);
+ if (err)
+ return err;
+
+ err = map_range(state, (unsigned long)current,
+ (unsigned long)(current + 1), PAGE_READONLY);
+ if (err)
+ return err;
+
+ stack = (unsigned long)task_stack_page(current);
+ err = map_range(state, stack, stack + THREAD_SIZE, PAGE_READONLY);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+static void free_pmd(pmd_t *pmd)
+{
+ pmd_t *pmdp;
+
+ for (pmdp = pmd; pmdp < pmd + PTRS_PER_PMD; ++pmdp)
+ if (!pmd_none(*pmdp))
+ free_page(pmd_page_vaddr(*pmdp));
+ if (PTRS_PER_PMD > 1)
+ free_page((unsigned long)pmd);
+}
+
+static void free_pud(pud_t *pud)
+{
+ pud_t *pudp;
+
+ for (pudp = pud; pudp < pud + PTRS_PER_PUD; ++pudp)
+ if (!pud_none(*pudp))
+ free_pmd(pmd_offset(pudp, 0));
+ if (PTRS_PER_PUD > 1)
+ free_page((unsigned long)pud);
+}
+
+static void free_p4d(p4d_t *p4d)
+{
+ p4d_t *p4dp;
+
+ for (p4dp = p4d; p4dp < p4d + PTRS_PER_P4D; ++p4dp)
+ if (!p4d_none(*p4dp))
+ free_pud(pud_offset(p4dp, 0));
+ if (PTRS_PER_P4D > 1)
+ free_page((unsigned long)p4d);
+}
+
+static void free_pgd(pgd_t *pgd)
+{
+ pgd_t *pgdp;
+
+ for (pgdp = pgd; pgdp < pgd + PTRS_PER_PGD; ++pgdp)
+ if (!pgd_none(*pgdp))
+ free_p4d(p4d_offset(pgdp, 0));
+}
+
+void arch_sbm_destroy(struct sbm *sbm)
+{
+ struct x86_sbm_state *state = sbm->private;
+
+ if (!state)
+ return;
+
+ if (state->pgd) {
+ free_pgd(state->pgd);
+ free_pages((unsigned long)state->pgd, PGD_ORDER);
+ }
+ free_page((unsigned long)state);
+ sbm->private = NULL;
+}
+
+int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args)
+{
+ return func(args);
+}
--
2.34.1
From: Petr Tesarik <[email protected]>
Allocate and map a separate stack for sandbox mode in arch_sbm_init().
Switch to this stack in arch_sbm_exec(). Store the address of the stack as
arch-specific state.
On X86_64, RSP is never used to locate thread-specific data, so it is safe
to change its value. If the sandbox is preempted by an interrupt, RSP is
saved by switch_to() and restored when the sandbox task is scheduled
again. The original kernel stack pointer is restored when the sandbox
function returns.
Since the stack switch mechanism is implemented only for 64-bit, make
CONFIG_HAVE_ARCH_SBM depend on X86_64 for now. Leave it under "config X86",
because it would be possible to implement a 32-bit variant.
Signed-off-by: Petr Tesarik <[email protected]>
---
arch/x86/Kconfig | 2 +-
arch/x86/include/asm/sbm.h | 2 ++
arch/x86/kernel/sbm/Makefile | 6 ++++++
arch/x86/kernel/sbm/call_64.S | 40 +++++++++++++++++++++++++++++++++++
arch/x86/kernel/sbm/core.c | 17 ++++++++++++++-
5 files changed, 65 insertions(+), 2 deletions(-)
create mode 100644 arch/x86/kernel/sbm/call_64.S
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 41fa4ab84c15..090d46c7ee7c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -188,7 +188,7 @@ config X86
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT
select HAVE_ARCH_COMPAT_MMAP_BASES if MMU && COMPAT
select HAVE_ARCH_PREL32_RELOCATIONS
- select HAVE_ARCH_SBM
+ select HAVE_ARCH_SBM if X86_64
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_STACKLEAK
diff --git a/arch/x86/include/asm/sbm.h b/arch/x86/include/asm/sbm.h
index 01c8d357550b..ed214c17af06 100644
--- a/arch/x86/include/asm/sbm.h
+++ b/arch/x86/include/asm/sbm.h
@@ -16,12 +16,14 @@
/**
* struct x86_sbm_state - Run-time state of the environment.
* @pgd: Sandbox mode page global directory.
+ * @stack: Sandbox mode stack.
*
* One instance of this union is allocated for each sandbox and stored as SBM
* instance private data.
*/
struct x86_sbm_state {
pgd_t *pgd;
+ unsigned long stack;
};
#endif /* defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) */
diff --git a/arch/x86/kernel/sbm/Makefile b/arch/x86/kernel/sbm/Makefile
index 92d368b526cd..62c3e85c14a4 100644
--- a/arch/x86/kernel/sbm/Makefile
+++ b/arch/x86/kernel/sbm/Makefile
@@ -8,3 +8,9 @@
#
obj-y := core.o
+
+###
+# 64 bit specific files
+ifeq ($(CONFIG_X86_64),y)
+ obj-y += call_64.o
+endif
diff --git a/arch/x86/kernel/sbm/call_64.S b/arch/x86/kernel/sbm/call_64.S
new file mode 100644
index 000000000000..245d0dddce73
--- /dev/null
+++ b/arch/x86/kernel/sbm/call_64.S
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023-2024 Huawei Technologies Duesseldorf GmbH
+ *
+ * Author: Petr Tesarik <[email protected]>
+ *
+ * SandBox Mode (SBM) low-level x86_64 assembly.
+ */
+
+#include <linux/linkage.h>
+#include <asm/nospec-branch.h>
+
+.code64
+.section .entry.text, "ax"
+
+/*
+ * arguments:
+ * rdi .. SBM state (kernel address)
+ * rsi .. func
+ * rdx .. args
+ * rcx .. top of sandbox stack
+ */
+SYM_FUNC_START(x86_sbm_exec)
+ /*
+ * Set up the sandbox stack:
+ * 1. Store the old stack pointer at the top of the sandbox stack,
+ * where various unwinders can find it and link back to the
+ * kernel stack.
+ */
+ sub $8, %rcx
+ mov %rsp, (%rcx)
+ mov %rcx, %rsp
+
+ mov %rdx, %rdi /* args */
+ CALL_NOSPEC rsi
+
+ pop %rsp
+
+ RET
+SYM_FUNC_END(x86_sbm_exec)
diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
index b775e3b387b1..de6986801148 100644
--- a/arch/x86/kernel/sbm/core.c
+++ b/arch/x86/kernel/sbm/core.c
@@ -17,6 +17,9 @@
#define GFP_SBM_PGTABLE (GFP_KERNEL | __GFP_ZERO)
#define PGD_ORDER get_order(sizeof(pgd_t) * PTRS_PER_PGD)
+asmlinkage int x86_sbm_exec(struct x86_sbm_state *state, sbm_func func,
+ void *args, unsigned long sbm_tos);
+
static inline phys_addr_t page_to_ptval(struct page *page)
{
return PFN_PHYS(page_to_pfn(page)) | _PAGE_TABLE;
@@ -182,6 +185,15 @@ int arch_sbm_init(struct sbm *sbm)
if (err)
return err;
+ state->stack = __get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER);
+ if (!state->stack)
+ return -ENOMEM;
+
+ err = map_range(state, state->stack, state->stack + THREAD_SIZE,
+ PAGE_SHARED);
+ if (err)
+ return err;
+
return 0;
}
@@ -238,11 +250,14 @@ void arch_sbm_destroy(struct sbm *sbm)
free_pgd(state->pgd);
free_pages((unsigned long)state->pgd, PGD_ORDER);
}
+ free_pages(state->stack, THREAD_SIZE_ORDER);
free_page((unsigned long)state);
sbm->private = NULL;
}
int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args)
{
- return func(args);
+ struct x86_sbm_state *state = sbm->private;
+
+ return x86_sbm_exec(state, func, args, state->stack + THREAD_SIZE);
}
--
2.34.1
From: Petr Tesarik <[email protected]>
Sandbox mode runs with CPL 3 and is treated as user mode by the CPU, so
non-IST interrupts load RSP from the TSS. This is the trampoline stack,
which is fine. However, the interrupt entry code then moves to the thread
stack, assuming that it cannot currently be in use because the task was
executing in user mode. This assumption is not valid for sandbox mode: the
sandbox was entered from kernel mode, and the thread stack still contains
live data of the sandbox mode caller.
Allocate a separate exception stack for sandbox and use it instead of the
thread stack in interrupt handlers while sandbox mode is active. To find
the sandbox exception stack from interrupt entry, store a pointer to the
state page in struct thread_info. This pointer is non-NULL if the current
task is running in sandbox mode. It is also non-NULL during the transition
from/to sandbox mode. The sandbox exception stack is valid in either case.
Signed-off-by: Petr Tesarik <[email protected]>
---
arch/x86/include/asm/sbm.h | 24 ++++++++++++++++++++++++
arch/x86/include/asm/thread_info.h | 3 +++
arch/x86/kernel/sbm/call_64.S | 1 +
arch/x86/kernel/sbm/core.c | 21 ++++++++++++++++++++-
arch/x86/kernel/traps.c | 3 ++-
5 files changed, 50 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/sbm.h b/arch/x86/include/asm/sbm.h
index ed214c17af06..ca4741b449e8 100644
--- a/arch/x86/include/asm/sbm.h
+++ b/arch/x86/include/asm/sbm.h
@@ -9,6 +9,8 @@
#ifndef __ASM_SBM_H
#define __ASM_SBM_H
+#include <asm/processor.h>
+
#if defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE)
#include <asm/pgtable_types.h>
@@ -17,6 +19,7 @@
* struct x86_sbm_state - Run-time state of the environment.
* @pgd: Sandbox mode page global directory.
* @stack: Sandbox mode stack.
+ * @exc_stack: Exception and IRQ stack.
*
* One instance of this union is allocated for each sandbox and stored as SBM
* instance private data.
@@ -24,8 +27,29 @@
struct x86_sbm_state {
pgd_t *pgd;
unsigned long stack;
+ unsigned long exc_stack;
};
+/**
+ * top_of_intr_stack() - Get the top of the current interrupt stack.
+ *
+ * Return: Top of the sandbox exception stack if sandbox mode is active,
+ * otherwise the top of the current thread stack.
+ */
+static inline unsigned long top_of_intr_stack(void)
+{
+ struct x86_sbm_state *sbm = current_thread_info()->sbm_state;
+
+ if (sbm)
+ return sbm->exc_stack + EXCEPTION_STKSZ;
+ return current_top_of_stack();
+}
+
+#else /* defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) */
+
+static inline unsigned long top_of_intr_stack(void)
+{
+ return current_top_of_stack();
+}
+
#endif /* defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) */
#endif /* __ASM_SBM_H */
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d63b02940747..95b1acffb78a 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -60,6 +60,9 @@ struct thread_info {
#ifdef CONFIG_SMP
u32 cpu; /* current CPU */
#endif
+#ifdef CONFIG_SANDBOX_MODE
+ struct x86_sbm_state *sbm_state; /* SandBox mode state page */
+#endif
};
#define INIT_THREAD_INFO(tsk) \
diff --git a/arch/x86/kernel/sbm/call_64.S b/arch/x86/kernel/sbm/call_64.S
index 245d0dddce73..1b232c8d15b7 100644
--- a/arch/x86/kernel/sbm/call_64.S
+++ b/arch/x86/kernel/sbm/call_64.S
@@ -9,6 +9,7 @@
#include <linux/linkage.h>
#include <asm/nospec-branch.h>
+#include <asm/percpu.h>
.code64
.section .entry.text, "ax"
diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
index f3a123d64afc..81f1b0093537 100644
--- a/arch/x86/kernel/sbm/core.c
+++ b/arch/x86/kernel/sbm/core.c
@@ -264,6 +264,14 @@ int arch_sbm_init(struct sbm *sbm)
if (err)
return err;
+ state->exc_stack = __get_free_pages(GFP_KERNEL, EXCEPTION_STACK_ORDER);
+ if (!state->exc_stack)
+ return -ENOMEM;
+ err = map_range(state, state->exc_stack,
+ state->exc_stack + EXCEPTION_STKSZ, PAGE_KERNEL);
+ if (err)
+ return err;
+
err = map_cpu_data(state);
if (err)
return err;
@@ -324,6 +332,7 @@ void arch_sbm_destroy(struct sbm *sbm)
free_pgd(state->pgd);
free_pages((unsigned long)state->pgd, PGD_ORDER);
}
+ free_pages(state->exc_stack, EXCEPTION_STACK_ORDER);
free_pages(state->stack, THREAD_SIZE_ORDER);
free_page((unsigned long)state);
sbm->private = NULL;
@@ -332,6 +341,16 @@ void arch_sbm_destroy(struct sbm *sbm)
int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args)
{
struct x86_sbm_state *state = sbm->private;
+ int err;
+
+ /* let interrupt handlers use the sandbox state page */
+ barrier();
+ WRITE_ONCE(current_thread_info()->sbm_state, state);
+
+ err = x86_sbm_exec(state, func, args, state->stack + THREAD_SIZE);
+
+ /* NULLify the state page pointer before it becomes stale */
+ WRITE_ONCE(current_thread_info()->sbm_state, NULL);
- return x86_sbm_exec(state, func, args, state->stack + THREAD_SIZE);
+ return err;
}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c3b2f863acf0..b9c9c74314e7 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -66,6 +66,7 @@
#include <asm/vdso.h>
#include <asm/tdx.h>
#include <asm/cfi.h>
+#include <asm/sbm.h>
#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -773,7 +774,7 @@ DEFINE_IDTENTRY_RAW(exc_int3)
*/
asmlinkage __visible noinstr struct pt_regs *sync_regs(struct pt_regs *eregs)
{
- struct pt_regs *regs = (struct pt_regs *)this_cpu_read(pcpu_hot.top_of_stack) - 1;
+ struct pt_regs *regs = (struct pt_regs *)top_of_intr_stack() - 1;
if (regs != eregs)
*regs = *eregs;
return regs;
--
2.34.1
From: Petr Tesarik <[email protected]>
Add a section about the x86-64 implementation.
Signed-off-by: Petr Tesarik <[email protected]>
---
Documentation/security/sandbox-mode.rst | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/Documentation/security/sandbox-mode.rst b/Documentation/security/sandbox-mode.rst
index 4405b8858c4a..84816b6b68de 100644
--- a/Documentation/security/sandbox-mode.rst
+++ b/Documentation/security/sandbox-mode.rst
@@ -111,6 +111,31 @@ These hooks must be implemented to select HAVE_ARCH_SBM.
:identifiers: arch_sbm_init arch_sbm_destroy arch_sbm_exec
arch_sbm_map_readonly arch_sbm_map_writable
+X86_64 Implementation
+=====================
+
+The x86_64 implementation provides strong isolation and recovery from CPU
+exceptions.
+
+Sandbox mode runs in protection ring 3 (same as user mode). This means that:
+
+* sandbox code cannot execute privileged CPU instructions,
+* memory accesses are treated as user accesses.
+
+The thread stack is readable in sandbox mode, because an on-stack data
+structure is used by call helpers and thunks to pass target function
+arguments. However, it is not writable, and sandbox code runs on its own
+stack. The thread stack is not used by interrupt handlers either. Non-IST
+interrupt handlers run on a separate sandbox exception stack.
+
+The interrupt entry path modifies the saved pt_regs to make it appear that
+the interrupt came from kernel mode. The CR3 register is then switched to
+the kernel page tables. The interrupt exit path restores the actual pt_regs
+and switches CR3 back to its sandbox mode value, overriding the CR3 handling
+done for page table isolation.
+
+Support for paravirtualized kernels is not (yet) provided.
+
Current Limitations
===================
--
2.34.1
On 2/14/24 03:35, Petr Tesarik wrote:
> This patch series implements x86_64 arch hooks for the generic SandBox
> Mode infrastructure.
I think I'm missing a bit of context here. What does one _do_ with
SandBox Mode? Why is it useful?
On February 14, 2024 6:52:53 AM PST, Dave Hansen <[email protected]> wrote:
>On 2/14/24 03:35, Petr Tesarik wrote:
>> This patch series implements x86_64 arch hooks for the generic SandBox
>> Mode infrastructure.
>
>I think I'm missing a bit of context here. What does one _do_ with
>SandBox Mode? Why is it useful?
Seriously. On the surface it looks like a really bad idea – basically an ad hoc, *more* privileged version of user space.
On Wed, 14 Feb 2024 07:28:35 -0800
"H. Peter Anvin" <[email protected]> wrote:
> On February 14, 2024 6:52:53 AM PST, Dave Hansen <[email protected]> wrote:
> >On 2/14/24 03:35, Petr Tesarik wrote:
> >> This patch series implements x86_64 arch hooks for the generic SandBox
> >> Mode infrastructure.
> >
> >I think I'm missing a bit of context here. What does one _do_ with
> >SandBox Mode? Why is it useful?
>
> Seriously. On the surface it looks like a really bad idea – basically an ad hoc, *more* privileged version of user space.
Hi hpa,
I agree that it kind of tries to do "user mode without user mode".
There are some differences from actual user mode:
First, from a process management POV, sandbox mode appears to be
running in kernel mode. So, there is no way to use ptrace(2), send
malicious signals or otherwise interact with the sandbox. In fact,
the process can have three independent contexts: user mode, kernel mode
and sandbox mode.
Second, a sandbox can run unmodified kernel code and interact directly
with other parts of the kernel. It's not really possible with this
initial patch series, but the plan is that sandbox mode can share locks
with the kernel.
Third, sandbox code can be trusted for operations like parsing keys for
the trusted keychain if the kernel is locked down, i.e. when even a
process with UID 0 is not on the same trust level as kernel mode.
HTH
Petr T
On February 14, 2024 8:41:43 AM PST, "Petr Tesařík" <[email protected]> wrote:
>On Wed, 14 Feb 2024 07:28:35 -0800
>"H. Peter Anvin" <[email protected]> wrote:
>
>> On February 14, 2024 6:52:53 AM PST, Dave Hansen <[email protected]> wrote:
>> >On 2/14/24 03:35, Petr Tesarik wrote:
>> >> This patch series implements x86_64 arch hooks for the generic SandBox
>> >> Mode infrastructure.
>> >
>> >I think I'm missing a bit of context here. What does one _do_ with
>> >SandBox Mode? Why is it useful?
>>
>> Seriously. On the surface it looks like a really bad idea – basically an ad hoc, *more* privileged version of user space.
>
>Hi hpa,
>
>I agree that it kind of tries to do "user mode without user mode".
>There are some differences from actual user mode:
>
>First, from a process management POV, sandbox mode appears to be
>running in kernel mode. So, there is no way to use ptrace(2), send
>malicious signals or otherwise interact with the sandbox. In fact,
>the process can have three independent contexts: user mode, kernel mode
>and sandbox mode.
>
>Second, a sandbox can run unmodified kernel code and interact directly
>with other parts of the kernel. It's not really possible with this
>initial patch series, but the plan is that sandbox mode can share locks
>with the kernel.
>
>Third, sandbox code can be trusted for operations like parsing keys for
>the trusted keychain if the kernel is locked down, i.e. when even a
>process with UID 0 is not on the same trust level as kernel mode.
>
>HTH
>Petr T
>
This, to me, seems like "all the downsides of a microkernel without the upsides." Furthermore, it breaks security-hardening features like LASS and (to a lesser degree) SMAP. Not to mention dropping global pages?
All in all, I cannot see this as anything other than an enormous step in the wrong direction, and it isn't even in the sense of "it is harmless if no one uses it" – you are introducing architectural changes that are most definitely *very* harmful both to maintainers and users.
To me, this feels like paravirtualization all over again. 20 years later we still have not been able to undo all the damage that did.
On Wed, 14 Feb 2024 06:52:53 -0800
Dave Hansen <[email protected]> wrote:
> On 2/14/24 03:35, Petr Tesarik wrote:
> > This patch series implements x86_64 arch hooks for the generic SandBox
> > Mode infrastructure.
>
> I think I'm missing a bit of context here. What does one _do_ with
> SandBox Mode? Why is it useful?
I see, I split the patch series into the base infrastructure and the
x86_64 implementation, but I forgot to merge the two recipient lists.
:-(
Anyway, in the long term I would like to work on gradual decomposition
of the kernel into a core part and many self-contained components.
Sandbox mode is a useful tool to enforce isolation.
In its current form, sandbox mode is too limited for that, but I'm
trying to find some balance between "publish early" and reaching a
feature level where some concrete examples can be shown. I'd rather
fail fast than maintain hundreds of patches in an out-of-tree branch
before submitting (and failing anyway).
Petr T
On 2/14/2024 3:35 AM, Petr Tesarik wrote:
> From: Petr Tesarik <[email protected]>
>
> Add a section about the x86-64 implementation.
>
> Signed-off-by: Petr Tesarik <[email protected]>
> ---
> Documentation/security/sandbox-mode.rst | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)
>
> diff --git a/Documentation/security/sandbox-mode.rst b/Documentation/security/sandbox-mode.rst
> index 4405b8858c4a..84816b6b68de 100644
> --- a/Documentation/security/sandbox-mode.rst
> +++ b/Documentation/security/sandbox-mode.rst
where is this file?
I assumed it's newly added, but your patch doesn't say so.
> @@ -111,6 +111,31 @@ These hooks must be implemented to select HAVE_ARCH_SBM.
> :identifiers: arch_sbm_init arch_sbm_destroy arch_sbm_exec
> arch_sbm_map_readonly arch_sbm_map_writable
>
> +X86_64 Implementation
> +=====================
> +
> +The x86_64 implementation provides strong isolation and recovery from CPU
> +exceptions.
> +
> +Sandbox mode runs in protection ring 3 (same as user mode). This means that:
> +
> +* sandbox code cannot execute privileged CPU instructions,
> +* memory accesses are treated as user accesses.
> +
> +The thread stack is readable in sandbox mode, because an on-stack data
> +structure is used by call helpers and thunks to pass target function
> +arguments. However, it is not writable, and sandbox code runs on its own
> +stack. The thread stack is not used by interrupt handlers either. Non-IST
> +interrupt handlers run on a separate sandbox exception stack.
> +
> +The interrupt entry path modifies the saved pt_regs to make it appear as
> +coming from kernel mode. The CR3 register is then switched to kernel mode.
> +The interrupt exit path is modified to restore actual pt_regs and switch the
> +CR3 register back to its sandbox mode value, overriding CR3 changes for page
> +table isolation.
> +
> +Support for paravirtualized kernels is not (yet) provided.
> +
> Current Limitations
> ===================
>
On 2/14/24 10:22, Petr Tesařík wrote:
> Anyway, in the long term I would like to work on gradual decomposition
> of the kernel into a core part and many self-contained components.
> Sandbox mode is a useful tool to enforce isolation.
I'd want to see at least a few examples of how this decomposition would
work and how much of a burden it is on each site that deployed it.
But I'm skeptical that this could ever work. Ring-0 execution really is
special and it's _increasingly_ so. Think of LASS or SMAP or SMEP.
We're even seeing hardware designers add hardware security defenses to
ring-0 that are not applied to ring-3.
In other words, ring-3 isn't just a deprivileged ring-0, it's more
exposed to attacks.
> I'd rather fail fast than maintain hundreds of patches in an
> out-of-tree branch before submitting (and failing anyway).
I don't see any remotely feasible path forward for this approach.
(+Cc Kees)
On Wed, 14 Feb 2024 18:14:49 +0000
"Edgecombe, Rick P" <[email protected]> wrote:
> On Wed, 2024-02-14 at 17:41 +0100, Petr Tesařík wrote:
> > Second, a sandbox can run unmodified kernel code and interact
> > directly
> > with other parts of the kernel. It's not really possible with this
> > initial patch series, but the plan is that sandbox mode can share
> > locks
> > with the kernel.
> >
> > Third, sandbox code can be trusted for operations like parsing keys
> > for
> > the trusted keychain if the kernel is locked down, i.e. when even a
> > process with UID 0 is not on the same trust level as kernel mode.
>
> What use case needs to have the sandbox both protected from the kernel
> (trusted operations) and non-privileged (the kernel protected from it
> via CPL3)? It seems like opposite things.
I think I have mentioned one: parsing keys for the trusted keyring. The
parser is complex enough to be potentially buggy, but the security
folks have already dismissed the idea to run it as a user mode helper.
Petr T
On 2/14/2024 10:22 AM, Petr Tesařík wrote:
> On Wed, 14 Feb 2024 06:52:53 -0800
> Dave Hansen <[email protected]> wrote:
>
>> On 2/14/24 03:35, Petr Tesarik wrote:
>>> This patch series implements x86_64 arch hooks for the generic SandBox
>>> Mode infrastructure.
>>
>> I think I'm missing a bit of context here. What does one _do_ with
>> SandBox Mode? Why is it useful?
>
> I see, I split the patch series into the base infrastructure and the
> x86_64 implementation, but I forgot to merge the two recipient lists.
> :-(
>
> Anyway, in the long term I would like to work on gradual decomposition
> of the kernel into a core part and many self-contained components.
> Sandbox mode is a useful tool to enforce isolation.
>
> In its current form, sandbox mode is too limited for that, but I'm
> trying to find some balance between "publish early" and reaching a
> feature level where some concrete examples can be shown. I'd rather
> fail fast than maintain hundreds of patches in an out-of-tree branch
> before submitting (and failing anyway).
>
> Petr T
>
What you're proposing sounds a gigantic thing, which could potentially
impact all subsystems. Unless you prove it has big advantages with real
world usages, I guess nobody even wants to look into the patches.
BTW, this seems another attempt to get the idea of micro-kernel into
Linux.
--
Thanks!
Xin
On Wed, 2024-02-14 at 17:41 +0100, Petr Tesařík wrote:
> Second, a sandbox can run unmodified kernel code and interact
> directly
> with other parts of the kernel. It's not really possible with this
> initial patch series, but the plan is that sandbox mode can share
> locks
> with the kernel.
>
> Third, sandbox code can be trusted for operations like parsing keys
> for
> the trusted keychain if the kernel is locked down, i.e. when even a
> process with UID 0 is not on the same trust level as kernel mode.
What use case needs to have the sandbox both protected from the kernel
(trusted operations) and non-privileged (the kernel protected from it
via CPL3)? It seems like opposite things.
On Wed, 2024-02-14 at 19:32 +0100, Petr Tesařík wrote:
> > What use case needs to have the sandbox both protected from the
> > kernel
> > (trusted operations) and non-privileged (the kernel protected from
> > it
> > via CPL3)? It seems like opposite things.
>
> I think I have mentioned one: parsing keys for the trusted keyring.
> The
> parser is complex enough to be potentially buggy, but the security
> folks have already dismissed the idea to run it as a user mode
> helper.
Ah, I didn't realize the kernel needed to be protected from the key
parsing part because you called it out as a trusted operation. So on
the protect-the-kernel-side it's similar to the microkernel security
reasoning.
Did I get the other part wrong - that you want to protect the sandbox
from the rest of kernel as well?
On Wed, 14 Feb 2024 10:42:57 -0800
Dave Hansen <[email protected]> wrote:
> On 2/14/24 10:22, Petr Tesařík wrote:
> > Anyway, in the long term I would like to work on gradual decomposition
> > of the kernel into a core part and many self-contained components.
> > Sandbox mode is a useful tool to enforce isolation.
>
> I'd want to see at least a few examples of how this decomposition would
> work and how much of a burden it is on each site that deployed it.
Got it. Are you okay with a couple of examples to illustrate the
concept? Because if you want patches that have been acked by the
respective maintainers, it somehow becomes a chicken-and-egg kind of
problem...
> But I'm skeptical that this could ever work. Ring-0 execution really is
> special and it's _increasingly_ so. Think of LASS or SMAP or SMEP.
I have just answered a similar concern by hpa. In short, I don't think
these features are relevant, because by definition sandbox mode does
not share anything with user mode address space.
> We're even seeing hardware designers add hardware security defenses to
> ring-0 that are not applied to ring-3.
>
> In other words, ring-3 isn't just a deprivileged ring-0, it's more
> exposed to attacks.
>
> > I'd rather fail fast than maintain hundreds of patches in an
> > out-of-tree branch before submitting (and failing anyway).
>
> I don't see any remotely feasible path forward for this approach.
I can live with such decision. But first, I want to make sure that the
concept has been understood correctly. So far, at least some concerns
suggest an understanding that is not quite accurate.
Is this sandbox idea a bit too much out-of-the-box?
Petr T
On Wed, 14 Feb 2024 19:19:27 +0000
"Edgecombe, Rick P" <[email protected]> wrote:
> On Wed, 2024-02-14 at 19:32 +0100, Petr Tesařík wrote:
> > > What use case needs to have the sandbox both protected from the
> > > kernel
> > > (trusted operations) and non-privileged (the kernel protected from
> > > it
> > > via CPL3)? It seems like opposite things.
> >
> > I think I have mentioned one: parsing keys for the trusted keyring.
> > The
> > parser is complex enough to be potentially buggy, but the security
> > folks have already dismissed the idea to run it as a user mode
> > helper.
>
> Ah, I didn't realize the kernel needed to be protected from the key
> parsing part because you called it out as a trusted operation. So on
> the protect-the-kernel-side it's similar to the microkernel security
> reasoning.
>
> Did I get the other part wrong - that you want to protect the sandbox
> from the rest of kernel as well?
Protecting the sandbox from the rest of the kernel is out of scope.
However, different sandboxes should be protected from each other.
Petr T
On Wed, 14 Feb 2024 09:29:06 -0800
"H. Peter Anvin" <[email protected]> wrote:
> On February 14, 2024 8:41:43 AM PST, "Petr Tesařík" <[email protected]> wrote:
> >On Wed, 14 Feb 2024 07:28:35 -0800
> >"H. Peter Anvin" <[email protected]> wrote:
> >
> >> On February 14, 2024 6:52:53 AM PST, Dave Hansen <[email protected]> wrote:
> >> >On 2/14/24 03:35, Petr Tesarik wrote:
> >> >> This patch series implements x86_64 arch hooks for the generic SandBox
> >> >> Mode infrastructure.
> >> >
> >> >I think I'm missing a bit of context here. What does one _do_ with
> >> >SandBox Mode? Why is it useful?
> >>
> >> Seriously. On the surface it looks like a really bad idea – basically an ad hoc, *more* privileged version of user space.
> >
> >Hi hpa,
> >
> >I agree that it kind of tries to do "user mode without user mode".
> >There are some differences from actual user mode:
> >
> >First, from a process management POV, sandbox mode appears to be
> >running in kernel mode. So, there is no way to use ptrace(2), send
> >malicious signals or otherwise interact with the sandbox. In fact,
> >the process can have three independent contexts: user mode, kernel mode
> >and sandbox mode.
> >
> >Second, a sandbox can run unmodified kernel code and interact directly
> >with other parts of the kernel. It's not really possible with this
> >initial patch series, but the plan is that sandbox mode can share locks
> >with the kernel.
> >
> >Third, sandbox code can be trusted for operations like parsing keys for
> >the trusted keychain if the kernel is locked down, i.e. when even a
> >process with UID 0 is not on the same trust level as kernel mode.
> >
> >HTH
> >Petr T
> >
>
> This, to me, seems like "all the downsides of a microkernel without the upsides." Furthermore, it breaks security-hardening features like LASS and (to a lesser degree) SMAP. Not to mention dropping global pages?
I must be missing something... But I am always open to learning something new.
I don't see how it breaks SMAP. Sandbox mode runs in its own address
space which does not contain any user-mode pages. While running in
sandbox mode, user pages belong to the sandboxed code, kernel pages are
used to enter/exit kernel mode. Bottom half of the PGD is empty, all
user page translations are removed from TLB.
For a similar reason, I don't see right now how it breaks linear
address space separation. Even if it did, I believe I can take care of
it in the entry/exit path. Anyway, which branch contains the LASS
patches now, so I can test?
As for dropping global pages, that's only part of the story. Indeed,
patch 6/8 of the series sets CR4.PGE to zero to have a known-good
working state, but that code is removed again by patch 8/8. I wanted to
implement lazy TLB flushing separately, so it can be easily reverted if
it is suspected to cause an issue.
Plus, each sandbox mode can use PCID to reduce TLB flushing even more.
I haven't done it, because it would be a waste of time if the whole
concept is scratched.
I believe that only those global pages which are actually accessed by
the sandbox need to be flushed. Yes, some parts of the necessary logic
are missing in the current patch series. I can add them in a v2 series
if you wish.
> All in all, I cannot see this as anything other than an enormous step in the wrong direction, and it isn't even in the sense of "it is harmless if no one uses it" – you are introducing architectural changes that are most definitely *very* harmful both to maintainers and users.
I agree that it adds some burden. After all, that's why the ultimate
decision is up to you, the maintainers. To defend my cause, I hope you
have noticed that if CONFIG_SANDBOX_MODE is not set:
1. literally nothing changes in entry_64.
2. sandbox_mode() always evaluates to false, so the added conditionals in fault.c and traps.c are never executed
3. top_of_intr_stack() always returns current_top_of_stack(), which is equivalent to the code it replaces, namely this_cpu_read(pcpu_hot.top_of_stack); see the sketch below
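A sketch of what point 3 means in practice (not the actual patch; the
sbm_exc_stack field name is invented here for illustration):

static __always_inline unsigned long top_of_intr_stack(void)
{
	/*
	 * With CONFIG_SANDBOX_MODE=n, sandbox_mode() is constant false,
	 * so this compiles down to the old
	 * this_cpu_read(pcpu_hot.top_of_stack) value.
	 */
	if (sandbox_mode())
		/* non-IST interrupts use the per-instance exception stack */
		return current_thread_info()->sbm_exc_stack;
	return current_top_of_stack();
}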
So, all the interesting stuff is under arch/x86/kernel/sbm/. Shall I
add a corresponding entry with my name to MAINTAINERS?
> To me, this feels like paravirtualization all over again. 20 years later we still have not been able to undo all the damage that did.
OK, I can follow you here. Indeed, there is some similarity with Xen PV
(running kernel code with CPL 3), but I don't think there's more than
this.
Petr T
On Wed, 14 Feb 2024 10:37:19 -0800
Xin Li <[email protected]> wrote:
> On 2/14/2024 3:35 AM, Petr Tesarik wrote:
> > From: Petr Tesarik <[email protected]>
> >
> > Add a section about the x86-64 implementation.
> >
> > Signed-off-by: Petr Tesarik <[email protected]>
> > ---
> > Documentation/security/sandbox-mode.rst | 25 +++++++++++++++++++++++++
> > 1 file changed, 25 insertions(+)
> >
> > diff --git a/Documentation/security/sandbox-mode.rst b/Documentation/security/sandbox-mode.rst
> > index 4405b8858c4a..84816b6b68de 100644
> > --- a/Documentation/security/sandbox-mode.rst
> > +++ b/Documentation/security/sandbox-mode.rst
>
> where is this file?
>
> I assumed it's newly added, but your patch doesn't say so.
Ah, right, for people who are not on the Cc list of my arch-independent
series, this series is entirely out of context. FWIW the first part is here:
https://lore.kernel.org/lkml/[email protected]/
I wonder, would it be better to resend it with a more complete list of
recipients?
Petr T
On 2/14/24 11:33, Petr Tesařík wrote:
>> I'd want to see at least a few examples of how this decomposition would
>> work and how much of a burden it is on each site that deployed it.
> Got it. Are you okay with a couple of examples to illustrate the
> concept? Because if you want patches that have been acked by the
> respective maintainers, it somehow becomes a chicken-and-egg kind of
> problem...
I'd be happy to look at a patch or two that demonstrate the concept,
just to make sure I'm not missing something. But I'm still quite skeptical.
On Wed, 14 Feb 2024 10:52:47 -0800
Xin Li <[email protected]> wrote:
> On 2/14/2024 10:22 AM, Petr Tesařík wrote:
> > On Wed, 14 Feb 2024 06:52:53 -0800
> > Dave Hansen <[email protected]> wrote:
> >
> >> On 2/14/24 03:35, Petr Tesarik wrote:
> >>> This patch series implements x86_64 arch hooks for the generic SandBox
> >>> Mode infrastructure.
> >>
> >> I think I'm missing a bit of context here. What does one _do_ with
> >> SandBox Mode? Why is it useful?
> >
> > I see, I split the patch series into the base infrastructure and the
> > x86_64 implementation, but I forgot to merge the two recipient lists.
> > :-(
> >
> > Anyway, in the long term I would like to work on gradual decomposition
> > of the kernel into a core part and many self-contained components.
> > Sandbox mode is a useful tool to enforce isolation.
> >
> > In its current form, sandbox mode is too limited for that, but I'm
> > trying to find some balance between "publish early" and reaching a
> > feature level where some concrete examples can be shown. I'd rather
> > fail fast than maintain hundreds of patches in an out-of-tree branch
> > before submitting (and failing anyway).
> >
> > Petr T
> >
>
> What you're proposing sounds a gigantic thing, which could potentially
> impact all subsystems.
True. Luckily, sandbox mode allows me to move gradually, one component
at a time.
> Unless you prove it has big advantages with real
> world usages, I guess nobody even wants to look into the patches.
>
> BTW, this seems another attempt to get the idea of micro-kernel into
> Linux.
We know it's not feasible to convert Linux to a micro-kernel. AFAICS
that would require some kind of big switch, affecting all subsystems at
once.
But with a growing code base and more or less constant bug-per-LOC rate,
people will continue to come up with some ideas how to limit the
potential impact of each bug. Logically, one of the concepts that come
to mind is decomposition.
If my attempt helps to clarify how such decomposition should be done to
be acceptable, it is worthwhile. If nothing else, I can summarize the
situation and ask Jonathan if he would kindly accept it as a LWN
article...
Petr T
On February 14, 2024 10:59:32 PM PST, "Petr Tesařík" <[email protected]> wrote:
>On Wed, 14 Feb 2024 10:52:47 -0800
>Xin Li <[email protected]> wrote:
>
>> On 2/14/2024 10:22 AM, Petr Tesařík wrote:
>> > On Wed, 14 Feb 2024 06:52:53 -0800
>> > Dave Hansen <[email protected]> wrote:
>> >
>> >> On 2/14/24 03:35, Petr Tesarik wrote:
>> >>> This patch series implements x86_64 arch hooks for the generic SandBox
>> >>> Mode infrastructure.
>> >>
>> >> I think I'm missing a bit of context here. What does one _do_ with
>> >> SandBox Mode? Why is it useful?
>> >
>> > I see, I split the patch series into the base infrastructure and the
>> > x86_64 implementation, but I forgot to merge the two recipient lists.
>> > :-(
>> >
>> > Anyway, in the long term I would like to work on gradual decomposition
>> > of the kernel into a core part and many self-contained components.
>> > Sandbox mode is a useful tool to enforce isolation.
>> >
>> > In its current form, sandbox mode is too limited for that, but I'm
>> > trying to find some balance between "publish early" and reaching a
>> > feature level where some concrete examples can be shown. I'd rather
>> > fail fast than maintain hundreds of patches in an out-of-tree branch
>> > before submitting (and failing anyway).
>> >
>> > Petr T
>> >
>>
>> What you're proposing sounds a gigantic thing, which could potentially
>> impact all subsystems.
>
>True. Luckily, sandbox mode allows me to move gradually, one component
>at a time.
>
>> Unless you prove it has big advantages with real
>> world usages, I guess nobody even wants to look into the patches.
>>
>> BTW, this seems another attempt to get the idea of micro-kernel into
>> Linux.
>
>We know it's not feasible to convert Linux to a micro-kernel. AFAICS
>that would require some kind of big switch, affecting all subsystems at
>once.
>
>But with a growing code base and more or less constant bug-per-LOC rate,
>people will continue to come up with some ideas how to limit the
>potential impact of each bug. Logically, one of the concepts that come
>to mind is decomposition.
>
>If my attempt helps to clarify how such decomposition should be done to
>be acceptable, it is worthwhile. If nothing else, I can summarize the
>situation and ask Jonathan if he would kindly accept it as a LWN
>article...
>
>Petr T
>
I have been thinking more about this, and I'm more than ever convinced that exposing kernel memory to *any* kind of user space is a really, really bad idea. It is not a door we ever want to open; once that line gets muddled, the attack surface opens up dramatically.
And, in fact, we already have a sandbox mode in the kernel – it is called eBPF.
On Thu, 15 Feb 2024 00:16:13 -0800
"H. Peter Anvin" <[email protected]> wrote:
> On February 14, 2024 10:59:32 PM PST, "Petr Tesařík" <[email protected]> wrote:
> >On Wed, 14 Feb 2024 10:52:47 -0800
> >Xin Li <[email protected]> wrote:
> >
> >> On 2/14/2024 10:22 AM, Petr Tesařík wrote:
> >> > On Wed, 14 Feb 2024 06:52:53 -0800
> >> > Dave Hansen <[email protected]> wrote:
> >> >
> >> >> On 2/14/24 03:35, Petr Tesarik wrote:
> >> >>> This patch series implements x86_64 arch hooks for the generic SandBox
> >> >>> Mode infrastructure.
> >> >>
> >> >> I think I'm missing a bit of context here. What does one _do_ with
> >> >> SandBox Mode? Why is it useful?
> >> >
> >> > I see, I split the patch series into the base infrastructure and the
> >> > x86_64 implementation, but I forgot to merge the two recipient lists.
> >> > :-(
> >> >
> >> > Anyway, in the long term I would like to work on gradual decomposition
> >> > of the kernel into a core part and many self-contained components.
> >> > Sandbox mode is a useful tool to enforce isolation.
> >> >
> >> > In its current form, sandbox mode is too limited for that, but I'm
> >> > trying to find some balance between "publish early" and reaching a
> >> > feature level where some concrete examples can be shown. I'd rather
> >> > fail fast than maintain hundreds of patches in an out-of-tree branch
> >> > before submitting (and failing anyway).
> >> >
> >> > Petr T
> >> >
> >>
> >> What you're proposing sounds a gigantic thing, which could potentially
> >> impact all subsystems.
> >
> >True. Luckily, sandbox mode allows me to move gradually, one component
> >at a time.
> >
> >> Unless you prove it has big advantages with real
> >> world usages, I guess nobody even wants to look into the patches.
> >>
> >> BTW, this seems another attempt to get the idea of micro-kernel into
> >> Linux.
> >
> >We know it's not feasible to convert Linux to a micro-kernel. AFAICS
> >that would require some kind of big switch, affecting all subsystems at
> >once.
> >
> >But with a growing code base and more or less constant bug-per-LOC rate,
> >people will continue to come up with some ideas how to limit the
> >potential impact of each bug. Logically, one of the concepts that come
> >to mind is decomposition.
> >
> >If my attempt helps to clarify how such decomposition should be done to
> >be acceptable, it is worthwhile. If nothing else, I can summarize the
> >situation and ask Jonathan if he would kindly accept it as a LWN
> >article...
> >
> >Petr T
> >
>
> I have been thinking more about this, and I'm more than ever convinced that exposing kernel memory to *any* kind of user space is a really, really bad idea. It is not a door we ever want to open; once that line gets muddled, the attack surface opens up dramatically.
Would you mind elaborating on this a bit more?
For one thing, sandbox mode is *not* user mode. Sure, my proposed
x86-64 implementation runs with the same CPU privilege level as user
mode, but it is isolated from user mode with just as strong mechanisms
as any two user mode processes are isolated from each other. Are you
saying that process isolation in Linux is not all that strong after all?
Don't get me wrong. I'm honestly trying to understand what exactly
makes the idea so bad. I have apparently not considered something that
you have, and I would be glad if you could reveal it.
> And, in fact, we already have a sandbox mode in the kernel – it is called eBPF.
Sure. The difference is that eBPF is a platform of its own (with its
own consistency model, machine code etc.). Rewriting code for eBPF may
need a bit more effort.
Besides, Roberto wrote a PGP key parser as an eBPF program at some
point, and I believe it was rejected for that reason. So, it seems
there are situations where eBPF is not an alternative.
Roberto, can you remember and share some details?
Petr T
On Thu, 2024-02-15 at 10:30 +0100, Petr Tesařík wrote:
> On Thu, 15 Feb 2024 00:16:13 -0800
> "H. Peter Anvin" <[email protected]> wrote:
>
> > On February 14, 2024 10:59:32 PM PST, "Petr Tesařík" <[email protected]> wrote:
> > > On Wed, 14 Feb 2024 10:52:47 -0800
> > > Xin Li <[email protected]> wrote:
> > >
> > > > On 2/14/2024 10:22 AM, Petr Tesařík wrote:
> > > > > On Wed, 14 Feb 2024 06:52:53 -0800
> > > > > Dave Hansen <[email protected]> wrote:
> > > > >
> > > > > > On 2/14/24 03:35, Petr Tesarik wrote:
> > > > > > > This patch series implements x86_64 arch hooks for the generic SandBox
> > > > > > > Mode infrastructure.
> > > > > >
> > > > > > I think I'm missing a bit of context here. What does one _do_ with
> > > > > > SandBox Mode? Why is it useful?
> > > > >
> > > > > I see, I split the patch series into the base infrastructure and the
> > > > > x86_64 implementation, but I forgot to merge the two recipient lists.
> > > > > :-(
> > > > >
> > > > > Anyway, in the long term I would like to work on gradual decomposition
> > > > > of the kernel into a core part and many self-contained components.
> > > > > Sandbox mode is a useful tool to enforce isolation.
> > > > >
> > > > > In its current form, sandbox mode is too limited for that, but I'm
> > > > > trying to find some balance between "publish early" and reaching a
> > > > > feature level where some concrete examples can be shown. I'd rather
> > > > > fail fast than maintain hundreds of patches in an out-of-tree branch
> > > > > before submitting (and failing anyway).
> > > > >
> > > > > Petr T
> > > > >
> > > >
> > > > What you're proposing sounds a gigantic thing, which could potentially
> > > > impact all subsystems.
> > >
> > > True. Luckily, sandbox mode allows me to move gradually, one component
> > > at a time.
> > >
> > > > Unless you prove it has big advantages with real
> > > > world usages, I guess nobody even wants to look into the patches.
> > > >
> > > > BTW, this seems another attempt to get the idea of micro-kernel into
> > > > Linux.
> > >
> > > We know it's not feasible to convert Linux to a micro-kernel. AFAICS
> > > that would require some kind of big switch, affecting all subsystems at
> > > once.
> > >
> > > But with a growing code base and more or less constant bug-per-LOC rate,
> > > people will continue to come up with some ideas how to limit the
> > > potential impact of each bug. Logically, one of the concepts that come
> > > to mind is decomposition.
> > >
> > > If my attempt helps to clarify how such decomposition should be done to
> > > be acceptable, it is worthwile. If nothing else, I can summarize the
> > > situation and ask Jonathan if he would kindly accept it as a LWN
> > > article...
> > >
> > > Petr T
> > >
> >
> > I have been thinking more about this, and I'm more than ever convinced that exposing kernel memory to *any* kind of user space is a really, really bad idea. It is not a door we ever want to open; once that line gets muddled, the attack surface opens up dramatically.
>
> Would you mind elaborating on this a bit more?
>
> For one thing, sandbox mode is *not* user mode. Sure, my proposed
> x86-64 implementation runs with the same CPU privilege level as user
> mode, but it is isolated from user mode with just as strong mechanisms
> as any two user mode processes are isolated from each other. Are you
> saying that process isolation in Linux is not all that strong after all?
>
> Don't get me wrong. I'm honestly trying to understand what exactly
> makes the idea so bad. I have apparently not considered something that
> you have, and I would be glad if you could reveal it.
>
> > And, in fact, we already have a sandbox mode in the kernel – it is called eBPF.
>
> Sure. The difference is that eBPF is a platform of its own (with its
> own consistency model, machine code etc.). Rewriting code for eBPF may
> need a bit more effort.
>
> Besides, Roberto wrote a PGP key parser as an eBPF program at some
> point, and I believe it was rejected for that reason. So, it seems
> there are situations where eBPF is not an alternative.
>
> Roberto, can you remember and share some details?
eBPF programs are not signed.
And I struggled to have some security bugs fixed, so I gave up.
Roberto
From: Petr Tesarik <[email protected]>
While I started working on my development branch to illustrate how
SandBox Mode could be enhanced to allow dynamic memory allocation and
other features necessary to convert some existing code, my colleague
Roberto Sassu set out and adapted a PGP key parser to run in a sandbox.
Disclaimer:
The code had to be rearranged in order to avoid memory allocations
and crypto operations in the sandbox. The code might contain errors.
David Howells (4):
PGPLIB: PGP definitions (RFC 4880)
PGPLIB: Basic packet parser
PGPLIB: Signature parser
KEYS: PGP data parser
Roberto Sassu (4):
mpi: Introduce mpi_key_length()
rsa: add parser of raw format
KEYS: Run PGP key parser in a sandbox
KEYS: Add intentional fault injection
crypto/asymmetric_keys/Kconfig | 17 +
crypto/asymmetric_keys/Makefile | 20 +
crypto/asymmetric_keys/pgp.h | 206 +++++++++
crypto/asymmetric_keys/pgp_library.c | 556 ++++++++++++++++++++++++
crypto/asymmetric_keys/pgp_parser.h | 18 +
crypto/asymmetric_keys/pgp_public_key.c | 441 +++++++++++++++++++
crypto/asymmetric_keys/pgplib.h | 58 +++
crypto/rsa.c | 14 +-
crypto/rsa_helper.c | 69 +++
include/crypto/internal/rsa.h | 6 +
include/linux/mpi.h | 2 +
lib/crypto/mpi/mpicoder.c | 33 +-
12 files changed, 1429 insertions(+), 11 deletions(-)
create mode 100644 crypto/asymmetric_keys/pgp.h
create mode 100644 crypto/asymmetric_keys/pgp_library.c
create mode 100644 crypto/asymmetric_keys/pgp_parser.h
create mode 100644 crypto/asymmetric_keys/pgp_public_key.c
create mode 100644 crypto/asymmetric_keys/pgplib.h
--
2.34.1
From: David Howells <[email protected]>
Provide some useful PGP definitions from RFC 4880. These describe details
of public key crypto as used by crypto keys for things like signature
verification.
Signed-off-by: David Howells <[email protected]>
Co-developed-by: Roberto Sassu <[email protected]>
Signed-off-by: Roberto Sassu <[email protected]>
---
crypto/asymmetric_keys/pgp.h | 206 +++++++++++++++++++++++++++++++++++
1 file changed, 206 insertions(+)
create mode 100644 crypto/asymmetric_keys/pgp.h
diff --git a/crypto/asymmetric_keys/pgp.h b/crypto/asymmetric_keys/pgp.h
new file mode 100644
index 000000000000..5eb4f4222090
--- /dev/null
+++ b/crypto/asymmetric_keys/pgp.h
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* PGP definitions (RFC 4880)
+ *
+ * Copyright (C) 2011 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <linux/types.h>
+
+struct pgp_key_ID {
+ u8 id[8];
+} __packed;
+
+struct pgp_time {
+ u8 time[4];
+} __packed;
+
+/*
+ * PGP public-key algorithm identifiers [RFC4880: 9.1]
+ */
+enum pgp_pubkey_algo {
+ PGP_PUBKEY_RSA_ENC_OR_SIG = 1,
+ PGP_PUBKEY_RSA_ENC_ONLY = 2,
+ PGP_PUBKEY_RSA_SIG_ONLY = 3,
+ PGP_PUBKEY_ELGAMAL = 16,
+ PGP_PUBKEY_DSA = 17,
+ PGP_PUBKEY__LAST
+};
+
+/*
+ * PGP symmetric-key algorithm identifiers [RFC4880: 9.2]
+ */
+enum pgp_symkey_algo {
+ PGP_SYMKEY_PLAINTEXT = 0,
+ PGP_SYMKEY_IDEA = 1,
+ PGP_SYMKEY_3DES = 2,
+ PGP_SYMKEY_CAST5 = 3,
+ PGP_SYMKEY_BLOWFISH = 4,
+ PGP_SYMKEY_AES_128KEY = 7,
+ PGP_SYMKEY_AES_192KEY = 8,
+ PGP_SYMKEY_AES_256KEY = 9,
+ PGP_SYMKEY_TWOFISH_256KEY = 10,
+};
+
+/*
+ * PGP compression algorithm identifiers [RFC4880: 9.3]
+ */
+enum pgp_compr_algo {
+ PGP_COMPR_UNCOMPRESSED = 0,
+ PGP_COMPR_ZIP = 1,
+ PGP_COMPR_ZLIB = 2,
+ PGP_COMPR_BZIP2 = 3,
+};
+
+/*
+ * PGP hash algorithm identifiers [RFC4880: 9.4]
+ */
+enum pgp_hash_algo {
+ PGP_HASH_MD5 = 1,
+ PGP_HASH_SHA1 = 2,
+ PGP_HASH_RIPE_MD_160 = 3,
+ PGP_HASH_SHA256 = 8,
+ PGP_HASH_SHA384 = 9,
+ PGP_HASH_SHA512 = 10,
+ PGP_HASH_SHA224 = 11,
+ PGP_HASH__LAST
+};
+
+extern const char *const pgp_hash_algorithms[PGP_HASH__LAST];
+
+/*
+ * PGP packet type tags [RFC4880: 4.3].
+ */
+enum pgp_packet_tag {
+ PGP_PKT_RESERVED = 0,
+ PGP_PKT_PUBKEY_ENC_SESSION_KEY = 1,
+ PGP_PKT_SIGNATURE = 2,
+ PGP_PKT_SYMKEY_ENC_SESSION_KEY = 3,
+ PGP_PKT_ONEPASS_SIGNATURE = 4,
+ PGP_PKT_SECRET_KEY = 5,
+ PGP_PKT_PUBLIC_KEY = 6,
+ PGP_PKT_SECRET_SUBKEY = 7,
+ PGP_PKT_COMPRESSED_DATA = 8,
+ PGP_PKT_SYM_ENC_DATA = 9,
+ PGP_PKT_MARKER = 10,
+ PGP_PKT_LITERAL_DATA = 11,
+ PGP_PKT_TRUST = 12,
+ PGP_PKT_USER_ID = 13,
+ PGP_PKT_PUBLIC_SUBKEY = 14,
+ PGP_PKT_USER_ATTRIBUTE = 17,
+ PGP_PKT_SYM_ENC_AND_INTEG_DATA = 18,
+ PGP_PKT_MODIFY_DETECT_CODE = 19,
+ PGP_PKT_PRIVATE_0 = 60,
+ PGP_PKT_PRIVATE_3 = 63,
+ PGP_PKT__HIGHEST = 63
+};
+
+/*
+ * Signature (tag 2) packet [RFC4880: 5.2].
+ */
+enum pgp_signature_version {
+ PGP_SIG_VERSION_3 = 3,
+ PGP_SIG_VERSION_4 = 4,
+};
+
+enum pgp_signature_type {
+ PGP_SIG_BINARY_DOCUMENT_SIG = 0x00,
+ PGP_SIG_CANONICAL_TEXT_DOCUMENT_SIG = 0x01,
+ PGP_SIG_STANDALONE_SIG = 0x02,
+ PGP_SIG_GENERAL_CERT_OF_UID_PUBKEY = 0x10,
+ PGP_SIG_PERSONAL_CERT_OF_UID_PUBKEY = 0x11,
+ PGP_SIG_CASUAL_CERT_OF_UID_PUBKEY = 0x12,
+ PGP_SIG_POSTITIVE_CERT_OF_UID_PUBKEY = 0x13,
+ PGP_SIG_SUBKEY_BINDING_SIG = 0x18,
+ PGP_SIG_PRIMARY_KEY_BINDING_SIG = 0x19,
+ PGP_SIG_DIRECTLY_ON_KEY = 0x1F,
+ PGP_SIG_KEY_REVOCATION_SIG = 0x20,
+ PGP_SIG_SUBKEY_REVOCATION_SIG = 0x28,
+ PGP_SIG_CERT_REVOCATION_SIG = 0x30,
+ PGP_SIG_TIMESTAMP_SIG = 0x40,
+ PGP_SIG_THIRD_PARTY_CONFIRM_SIG = 0x50,
+};
+
+struct pgp_signature_v3_packet {
+ enum pgp_signature_version version : 8; /* == PGP_SIG_VERSION_3 */
+ u8 length_of_hashed; /* == 5 */
+ struct {
+ enum pgp_signature_type signature_type : 8;
+ struct pgp_time creation_time;
+ } __packed hashed;
+ struct pgp_key_ID issuer;
+ enum pgp_pubkey_algo pubkey_algo : 8;
+ enum pgp_hash_algo hash_algo : 8;
+} __packed;
+
+struct pgp_signature_v4_packet {
+ enum pgp_signature_version version : 8; /* == PGP_SIG_VERSION_4 */
+ enum pgp_signature_type signature_type : 8;
+ enum pgp_pubkey_algo pubkey_algo : 8;
+ enum pgp_hash_algo hash_algo : 8;
+} __packed;
+
+/*
+ * V4 signature subpacket types [RFC4880: 5.2.3.1].
+ */
+enum pgp_sig_subpkt_type {
+ PGP_SIG_CREATION_TIME = 2,
+ PGP_SIG_EXPIRATION_TIME = 3,
+ PGP_SIG_EXPORTABLE_CERT = 4,
+ PGP_SIG_TRUST_SIG = 5,
+ PGP_SIG_REGEXP = 6,
+ PGP_SIG_REVOCABLE = 7,
+ PGP_SIG_KEY_EXPIRATION_TIME = 9,
+ PGP_SIG_PREF_SYM_ALGO = 11,
+ PGP_SIG_REVOCATION_KEY = 12,
+ PGP_SIG_ISSUER = 16,
+ PGP_SIG_NOTATION_DATA = 20,
+ PGP_SIG_PREF_HASH_ALGO = 21,
+ PGP_SIG_PREF_COMPR_ALGO = 22,
+ PGP_SIG_KEY_SERVER_PREFS = 23,
+ PGP_SIG_PREF_KEY_SERVER = 24,
+ PGP_SIG_PRIMARY_USER_ID = 25,
+ PGP_SIG_POLICY_URI = 26,
+ PGP_SIG_KEY_FLAGS = 27,
+ PGP_SIG_SIGNERS_USER_ID = 28,
+ PGP_SIG_REASON_FOR_REVOCATION = 29,
+ PGP_SIG_FEATURES = 30,
+ PGP_SIG_TARGET = 31,
+ PGP_SIG_EMBEDDED_SIG = 32,
+ PGP_SIG__LAST
+};
+
+#define PGP_SIG_SUBPKT_TYPE_CRITICAL_MASK 0x80
+
+/*
+ * Key (tag 5, 6, 7 and 14) packet
+ */
+enum pgp_key_version {
+ PGP_KEY_VERSION_2 = 2,
+ PGP_KEY_VERSION_3 = 3,
+ PGP_KEY_VERSION_4 = 4,
+};
+
+struct pgp_key_v3_packet {
+ enum pgp_key_version version : 8;
+ struct pgp_time creation_time;
+ u8 expiry[2]; /* 0 or time in days till expiry */
+ enum pgp_pubkey_algo pubkey_algo : 8;
+ u8 key_material[0];
+} __packed;
+
+struct pgp_key_v4_packet {
+ enum pgp_key_version version : 8;
+ struct pgp_time creation_time;
+ enum pgp_pubkey_algo pubkey_algo : 8;
+ u8 key_material[0];
+} __packed;
+
+/*
+ * Literal Data (tag 11) packet
+ */
+enum pgp_literal_data_format {
+ PGP_LIT_FORMAT_BINARY = 0x62,
+ PGP_LIT_FORMAT_TEXT = 0x74,
+ PGP_LIT_FORMAT_TEXT_UTF8 = 0x75,
+};
--
2.34.1
From: David Howells <[email protected]>
Implement a PGP data parser for the crypto key type to use when
instantiating a key.
This parser attempts to parse the instantiation data as a PGP packet
sequence (RFC 4880) and if it parses okay, attempts to extract a public-key
algorithm key or subkey from it.
If it finds such a key, it will set up a public_key subtype payload with
appropriate handler routines (RSA) and attach it to the key.
Thanks to Tetsuo Handa <[email protected]> for pointing
out some errors.
Signed-off-by: David Howells <[email protected]>
Co-developed-by: Roberto Sassu <[email protected]>
Signed-off-by: Roberto Sassu <[email protected]>
---
crypto/asymmetric_keys/Kconfig | 11 +
crypto/asymmetric_keys/Makefile | 4 +
crypto/asymmetric_keys/pgp_parser.h | 18 +
crypto/asymmetric_keys/pgp_public_key.c | 416 ++++++++++++++++++++++++
4 files changed, 449 insertions(+)
create mode 100644 crypto/asymmetric_keys/pgp_parser.h
create mode 100644 crypto/asymmetric_keys/pgp_public_key.c
diff --git a/crypto/asymmetric_keys/Kconfig b/crypto/asymmetric_keys/Kconfig
index ebe9dc88d975..ebde5ef5d65f 100644
--- a/crypto/asymmetric_keys/Kconfig
+++ b/crypto/asymmetric_keys/Kconfig
@@ -92,4 +92,15 @@ config PGP_LIBRARY
This option enables a library that provides a number of simple
utility functions for parsing PGP (RFC 4880) packet-based messages.
+config PGP_KEY_PARSER
+ tristate "PGP key parser"
+ depends on ASYMMETRIC_PUBLIC_KEY_SUBTYPE
+ select PGP_LIBRARY
+ select MD5 # V3 fingerprint generation
+ select SHA1 # V4 fingerprint generation
+ help
+ This option provides support for parsing PGP (RFC 4880) format blobs
+ for key data and provides the ability to instantiate a crypto key
+ from a public key packet found inside the blob.
+
endif # ASYMMETRIC_KEY_TYPE
diff --git a/crypto/asymmetric_keys/Makefile b/crypto/asymmetric_keys/Makefile
index f7e5ee59857f..36a27cf2daff 100644
--- a/crypto/asymmetric_keys/Makefile
+++ b/crypto/asymmetric_keys/Makefile
@@ -93,3 +93,7 @@ $(obj)/tpm.asn1.o: $(obj)/tpm.asn1.c $(obj)/tpm.asn1.h
# PGP handling
#
obj-$(CONFIG_PGP_LIBRARY) += pgp_library.o
+
+obj-$(CONFIG_PGP_KEY_PARSER) += pgp_key_parser.o
+pgp_key_parser-y := \
+ pgp_public_key.o
diff --git a/crypto/asymmetric_keys/pgp_parser.h b/crypto/asymmetric_keys/pgp_parser.h
new file mode 100644
index 000000000000..1a560ce32415
--- /dev/null
+++ b/crypto/asymmetric_keys/pgp_parser.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* PGP crypto data parser internal definitions
+ *
+ * Copyright (C) 2011 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include "pgplib.h"
+
+#define kenter(FMT, ...) \
+ pr_devel("==> %s("FMT")\n", __func__, ##__VA_ARGS__)
+#define kleave(FMT, ...) \
+ pr_devel("<== %s()"FMT"\n", __func__, ##__VA_ARGS__)
+
+/*
+ * pgp_public_key.c
+ */
+extern const char *pgp_to_public_key_algo[PGP_PUBKEY__LAST];
diff --git a/crypto/asymmetric_keys/pgp_public_key.c b/crypto/asymmetric_keys/pgp_public_key.c
new file mode 100644
index 000000000000..0529c8ce2d43
--- /dev/null
+++ b/crypto/asymmetric_keys/pgp_public_key.c
@@ -0,0 +1,416 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Instantiate a public key crypto key from PGP format data [RFC 4880]
+ *
+ * Copyright (C) 2011 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#define pr_fmt(fmt) "PGP: "fmt
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/mpi.h>
+#include <keys/asymmetric-subtype.h>
+#include <keys/asymmetric-parser.h>
+#include <crypto/hash.h>
+#include <crypto/public_key.h>
+
+#include "pgp_parser.h"
+
+#define MAX_MPI 5
+#define KEYCTL_SUPPORTS_ENCDEC \
+ (KEYCTL_SUPPORTS_ENCRYPT | KEYCTL_SUPPORTS_DECRYPT)
+#define KEYCTL_SUPPORTS_SIGVER (KEYCTL_SUPPORTS_SIGN | KEYCTL_SUPPORTS_VERIFY)
+
+MODULE_LICENSE("GPL");
+
+const char *pgp_to_public_key_algo[PGP_PUBKEY__LAST] = {
+ [PGP_PUBKEY_RSA_ENC_OR_SIG] = "rsa",
+ [PGP_PUBKEY_RSA_ENC_ONLY] = "rsa",
+ [PGP_PUBKEY_RSA_SIG_ONLY] = "rsa",
+ [PGP_PUBKEY_ELGAMAL] = NULL,
+ [PGP_PUBKEY_DSA] = NULL,
+};
+
+static const int pgp_key_algo_p_num_mpi[PGP_PUBKEY__LAST] = {
+ [PGP_PUBKEY_RSA_ENC_OR_SIG] = 2,
+ [PGP_PUBKEY_RSA_ENC_ONLY] = 2,
+ [PGP_PUBKEY_RSA_SIG_ONLY] = 2,
+ [PGP_PUBKEY_ELGAMAL] = 3,
+ [PGP_PUBKEY_DSA] = 4,
+};
+
+static const u8 pgp_public_key_capabilities[PGP_PUBKEY__LAST] = {
+ [PGP_PUBKEY_RSA_ENC_OR_SIG] = KEYCTL_SUPPORTS_ENCDEC |
+ KEYCTL_SUPPORTS_SIGVER,
+ [PGP_PUBKEY_RSA_ENC_ONLY] = KEYCTL_SUPPORTS_ENCDEC,
+ [PGP_PUBKEY_RSA_SIG_ONLY] = KEYCTL_SUPPORTS_SIGVER,
+ [PGP_PUBKEY_ELGAMAL] = 0,
+ [PGP_PUBKEY_DSA] = 0,
+};
+
+struct pgp_key_data_parse_context {
+ struct pgp_parse_context pgp;
+ u8 key[1024];
+ size_t keylen;
+ u8 keyid_buf[1024];
+ size_t keyid_buf_len;
+ char user_id[512];
+ size_t user_id_len;
+ const char *algo;
+ u8 raw_fingerprint[HASH_MAX_DIGESTSIZE];
+ size_t raw_fingerprint_len;
+ unsigned int version;
+};
+
+static inline void write_keyid_buf_char(struct pgp_key_data_parse_context *ctx,
+ uint8_t ch)
+{
+ memcpy(&ctx->keyid_buf[ctx->keyid_buf_len++], &ch, 1);
+}
+
+/*
+ * Build buffer to calculate the public key ID (RFC4880 12.2)
+ */
+static int pgp_build_pkey_keyid_buf(struct pgp_key_data_parse_context *ctx,
+ struct pgp_parse_pubkey *pgp)
+{
+ unsigned int nb[MAX_MPI];
+ unsigned int nn[MAX_MPI];
+ unsigned int n;
+ size_t keylen = ctx->keylen;
+ u8 *key_ptr = ctx->key;
+ u8 *pp[MAX_MPI];
+ u32 a32;
+ int npkey = pgp_key_algo_p_num_mpi[pgp->pubkey_algo];
+ int i, ret;
+
+ kenter("");
+
+ n = (pgp->version < PGP_KEY_VERSION_4) ? 8 : 6;
+ for (i = 0; i < npkey; i++) {
+ ret = mpi_key_length(key_ptr, keylen, nb + i, nn + i);
+ if (ret < 0) {
+ kleave(" = %d", ret);
+ return ret;
+ }
+
+ if (keylen < 2 + nn[i])
+ break;
+
+ pp[i] = key_ptr + 2;
+ key_ptr += 2 + nn[i];
+ keylen -= 2 + nn[i];
+ n += 2 + nn[i];
+ }
+
+ if (keylen != 0) {
+ pr_debug("excess %zu\n", keylen);
+ kleave(" = -EBADMSG");
+ return -EBADMSG;
+ }
+
+ write_keyid_buf_char(ctx, 0x99); /* ctb */
+ write_keyid_buf_char(ctx, n >> 8); /* 16-bit header length */
+ write_keyid_buf_char(ctx, n);
+
+ write_keyid_buf_char(ctx, pgp->version);
+
+ a32 = pgp->creation_time;
+ write_keyid_buf_char(ctx, a32 >> 24);
+ write_keyid_buf_char(ctx, a32 >> 16);
+ write_keyid_buf_char(ctx, a32 >> 8);
+ write_keyid_buf_char(ctx, a32 >> 0);
+
+ if (pgp->version < PGP_KEY_VERSION_4) {
+ u16 a16;
+
+ if (pgp->expires_at)
+ a16 = (pgp->expires_at - pgp->creation_time) / 86400UL;
+ else
+ a16 = 0;
+ write_keyid_buf_char(ctx, a16 >> 8);
+ write_keyid_buf_char(ctx, a16 >> 0);
+ }
+
+ write_keyid_buf_char(ctx, pgp->pubkey_algo);
+
+ for (i = 0; i < npkey; i++) {
+ write_keyid_buf_char(ctx, nb[i] >> 8);
+ write_keyid_buf_char(ctx, nb[i]);
+ memcpy(&ctx->keyid_buf[ctx->keyid_buf_len], pp[i], nn[i]);
+ ctx->keyid_buf_len += nn[i];
+ }
+
+ kleave(" = 0");
+ return 0;
+}
+
+/*
+ * Extract a public key or public subkey from the PGP stream.
+ */
+static int pgp_process_public_key(struct pgp_parse_context *context,
+ enum pgp_packet_tag type,
+ u8 headerlen,
+ const u8 *data,
+ size_t datalen)
+{
+ struct pgp_key_data_parse_context *ctx =
+ container_of(context, struct pgp_key_data_parse_context, pgp);
+ struct pgp_parse_pubkey pgp;
+ u8 capabilities;
+ int ret;
+
+ kenter(",%u,%u,,%zu", type, headerlen, datalen);
+
+ if (type == PGP_PKT_USER_ID) {
+ if (!ctx->user_id_len) {
+ if (datalen > sizeof(ctx->user_id)) {
+ kleave(" = -E2BIG");
+ return -E2BIG;
+ }
+
+ memcpy(ctx->user_id, data, datalen);
+ ctx->user_id_len = datalen;
+ }
+ kleave(" = 0 [user ID]");
+ return 0;
+ }
+
+ if (ctx->keyid_buf_len) {
+ kleave(" = -EBADMSG");
+ return -EBADMSG;
+ }
+
+ ret = pgp_parse_public_key(&data, &datalen, &pgp);
+ if (ret < 0) {
+ kleave(" = %d", ret);
+ return ret;
+ }
+
+ ctx->version = pgp.version;
+
+ if (pgp.pubkey_algo < PGP_PUBKEY__LAST)
+ ctx->algo = pgp_to_public_key_algo[pgp.pubkey_algo];
+
+ if (!ctx->algo) {
+ pr_debug("Unsupported public key algorithm %u\n",
+ pgp.pubkey_algo);
+ kleave(" = -ENOPKG");
+ return -ENOPKG;
+ }
+
+ /*
+ * It's the public half of a key, so that only gives us encrypt and
+ * verify capabilities.
+ */
+ capabilities = pgp_public_key_capabilities[pgp.pubkey_algo] &
+ (KEYCTL_SUPPORTS_ENCRYPT | KEYCTL_SUPPORTS_VERIFY);
+ /*
+ * Capabilities are not stored anymore in the public key, store only
+ * those that allow signature verification.
+ */
+ if (!(capabilities & KEYCTL_SUPPORTS_VERIFY)) {
+ pr_debug("Public key cannot be used for verification\n");
+ kleave(" = -ENOPKG");
+ return -ENOPKG;
+ }
+
+ if (datalen > sizeof(ctx->key)) {
+ kleave(" = -E2BIG");
+ return -E2BIG;
+ }
+
+ memcpy(ctx->key, data, datalen);
+ ctx->keylen = datalen;
+
+ ret = pgp_build_pkey_keyid_buf(ctx, &pgp);
+
+ kleave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * Calculate the public key ID fingerprint
+ */
+static int pgp_generate_fingerprint(struct pgp_key_data_parse_context *ctx)
+{
+ struct crypto_shash *tfm;
+ struct shash_desc *digest;
+ char fingerprint[HASH_MAX_DIGESTSIZE * 2 + 1] = { 0 };
+ size_t offset;
+ int ret;
+
+ ret = -ENOMEM;
+ tfm = crypto_alloc_shash(ctx->version < PGP_KEY_VERSION_4 ?
+ "md5" : "sha1", 0, 0);
+ if (IS_ERR(tfm))
+ goto cleanup;
+
+ digest = kmalloc(sizeof(*digest) + crypto_shash_descsize(tfm),
+ GFP_KERNEL);
+ if (!digest)
+ goto cleanup_tfm;
+
+ digest->tfm = tfm;
+ crypto_shash_set_flags(digest->tfm, CRYPTO_TFM_REQ_MAY_SLEEP);
+ ret = crypto_shash_init(digest);
+ if (ret < 0)
+ goto cleanup_hash;
+
+ crypto_shash_update(digest, ctx->keyid_buf, ctx->keyid_buf_len);
+
+ ctx->raw_fingerprint_len = crypto_shash_digestsize(tfm);
+
+ ret = crypto_shash_final(digest, ctx->raw_fingerprint);
+ if (ret < 0)
+ goto cleanup_hash;
+
+ offset = ctx->raw_fingerprint_len - 8;
+ pr_debug("offset %lu/%lu\n", offset, ctx->raw_fingerprint_len);
+
+ bin2hex(fingerprint, ctx->raw_fingerprint, ctx->raw_fingerprint_len);
+ pr_debug("fingerprint %s\n", fingerprint);
+
+ ret = 0;
+cleanup_hash:
+ kfree(digest);
+cleanup_tfm:
+ crypto_free_shash(tfm);
+cleanup:
+ return ret;
+}
+
+static struct asymmetric_key_ids *pgp_key_generate_id(
+ struct pgp_key_data_parse_context *ctx)
+{
+ struct asymmetric_key_ids *kids;
+ struct asymmetric_key_id *kid;
+
+ kids = kzalloc(sizeof(struct asymmetric_key_ids), GFP_KERNEL);
+ if (!kids)
+ return kids;
+
+ kid = asymmetric_key_generate_id(ctx->raw_fingerprint,
+ ctx->raw_fingerprint_len, NULL, 0);
+ if (IS_ERR(kid))
+ goto error;
+
+ kids->id[0] = kid;
+ kids->id[1] = kmemdup(kid, sizeof(*kid) + ctx->raw_fingerprint_len,
+ GFP_KERNEL);
+ if (!kids->id[1])
+ goto error;
+
+ return kids;
+error:
+ kfree(kids->id[0]);
+ kfree(kids);
+
+ return NULL;
+}
+
+/*
+ * Attempt to parse the instantiation data blob for a key as a PGP packet
+ * message holding a key.
+ */
+static int pgp_key_parse(struct key_preparsed_payload *prep)
+{
+ struct pgp_key_data_parse_context *ctx;
+ struct public_key *pub = NULL;
+ int ret;
+
+ kenter("");
+
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+ if (!ctx) {
+ kleave(" = -ENOMEM");
+ return -ENOMEM;
+ }
+
+ ctx->pgp.types_of_interest = (1 << PGP_PKT_PUBLIC_KEY) |
+ (1 << PGP_PKT_USER_ID);
+ ctx->pgp.process_packet = pgp_process_public_key;
+
+ ret = pgp_parse_packets(prep->data, prep->datalen, &ctx->pgp);
+ if (ret < 0)
+ goto error;
+
+ ret = pgp_generate_fingerprint(ctx);
+ if (ret < 0)
+ goto error;
+
+ pub = kzalloc(sizeof(struct public_key), GFP_KERNEL);
+ if (!pub) {
+ ret = -ENOMEM;
+ goto error;
+ }
+
+ pub->key = kmemdup(ctx->key, ctx->keylen, GFP_KERNEL);
+ if (!pub->key) {
+ ret = -ENOMEM;
+ goto error;
+ }
+
+ pub->keylen = ctx->keylen;
+ pub->id_type = "PGP";
+ pub->pkey_algo = ctx->algo;
+
+ if (ctx->user_id && ctx->user_id_len > 0) {
+ /*
+ * Propose a description for the key (user ID without the
+ * comment).
+ */
+ size_t ulen = ctx->user_id_len;
+
+ if (ulen > 255 - 9)
+ ulen = 255 - 9;
+ prep->description = kmalloc(ulen + 1 + 8 + 1, GFP_KERNEL);
+ ret = -ENOMEM;
+ if (!prep->description)
+ goto error;
+ memcpy(prep->description, ctx->user_id, ulen);
+ prep->description[ulen] = ' ';
+ bin2hex(prep->description + ulen + 1,
+ ctx->raw_fingerprint + ctx->raw_fingerprint_len - 4, 4);
+ prep->description[ulen + 9] = '\0';
+ pr_debug("desc '%s'\n", prep->description);
+ }
+
+ /* We're pinning the module by being linked against it */
+ __module_get(public_key_subtype.owner);
+ prep->payload.data[asym_subtype] = &public_key_subtype;
+ prep->payload.data[asym_key_ids] = pgp_key_generate_id(ctx);
+ prep->payload.data[asym_crypto] = pub;
+ prep->quotalen = 100;
+ kfree(ctx);
+ return 0;
+
+error:
+ public_key_free(pub);
+ kfree(ctx);
+ kleave(" = %d", ret);
+ return ret;
+}
+
+static struct asymmetric_key_parser pgp_key_parser = {
+ .owner = THIS_MODULE,
+ .name = "pgp",
+ .parse = pgp_key_parse,
+};
+
+/*
+ * Module stuff
+ */
+static int __init pgp_key_init(void)
+{
+ return register_asymmetric_key_parser(&pgp_key_parser);
+}
+
+static void __exit pgp_key_exit(void)
+{
+ unregister_asymmetric_key_parser(&pgp_key_parser);
+}
+
+module_init(pgp_key_init);
+module_exit(pgp_key_exit);
--
2.34.1
From: Roberto Sassu <[email protected]>
Parse the RSA key in RAW format if the ASN.1 parser returns an error.
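For illustration (not part of the patch itself), the RAW format expected
here is simply a concatenation of MPIs in the order n, e (public key) or
n, e, d (private key), each prefixed with a 2-byte big-endian bit count.
The new helpers fill struct rsa_key with pointers into the supplied blob,
just like the ASN.1 parsers do:

	struct rsa_key raw_key = {};
	int err;

	/* blob/blob_len: raw public key material, e.g. from a PGP packet */
	err = rsa_parse_pub_key_raw(&raw_key, blob, blob_len);
	if (err)
		return err;
	/* raw_key.n/raw_key.n_sz and raw_key.e/raw_key.e_sz now point
	 * into blob */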
Signed-off-by: Roberto Sassu <[email protected]>
---
crypto/rsa.c | 14 +++++--
crypto/rsa_helper.c | 69 +++++++++++++++++++++++++++++++++++
include/crypto/internal/rsa.h | 6 +++
3 files changed, 85 insertions(+), 4 deletions(-)
diff --git a/crypto/rsa.c b/crypto/rsa.c
index b9cd11fb7d36..a26cfda7d7c9 100644
--- a/crypto/rsa.c
+++ b/crypto/rsa.c
@@ -244,8 +244,11 @@ static int rsa_set_pub_key(struct crypto_akcipher *tfm, const void *key,
rsa_free_mpi_key(mpi_key);
ret = rsa_parse_pub_key(&raw_key, key, keylen);
- if (ret)
- return ret;
+ if (ret) {
+ ret = rsa_parse_pub_key_raw(&raw_key, key, keylen);
+ if (ret)
+ return ret;
+ }
mpi_key->e = mpi_read_raw_data(raw_key.e, raw_key.e_sz);
if (!mpi_key->e)
@@ -283,8 +286,11 @@ static int rsa_set_priv_key(struct crypto_akcipher *tfm, const void *key,
rsa_free_mpi_key(mpi_key);
ret = rsa_parse_priv_key(&raw_key, key, keylen);
- if (ret)
- return ret;
+ if (ret) {
+ ret = rsa_parse_priv_key_raw(&raw_key, key, keylen);
+ if (ret)
+ return ret;
+ }
mpi_key->d = mpi_read_raw_data(raw_key.d, raw_key.d_sz);
if (!mpi_key->d)
diff --git a/crypto/rsa_helper.c b/crypto/rsa_helper.c
index 94266f29049c..fb9443df8f0b 100644
--- a/crypto/rsa_helper.c
+++ b/crypto/rsa_helper.c
@@ -9,6 +9,7 @@
#include <linux/export.h>
#include <linux/err.h>
#include <linux/fips.h>
+#include <linux/mpi.h>
#include <crypto/internal/rsa.h>
#include "rsapubkey.asn1.h"
#include "rsaprivkey.asn1.h"
@@ -148,6 +149,32 @@ int rsa_get_qinv(void *context, size_t hdrlen, unsigned char tag,
return 0;
}
+typedef int (*rsa_get_func)(void *, size_t, unsigned char,
+ const void *, size_t);
+
+static int rsa_parse_key_raw(struct rsa_key *rsa_key,
+ const void *key, unsigned int key_len,
+ rsa_get_func *func, int n_func)
+{
+ unsigned int nbytes, len = key_len;
+ const void *key_ptr = key;
+ int ret, i;
+
+ for (i = 0; i < n_func; i++) {
+ ret = mpi_key_length(key_ptr, len, NULL, &nbytes);
+ if (ret < 0)
+ return ret;
+
+ ret = func[i](rsa_key, 0, 0, key_ptr + 2, nbytes);
+ if (ret < 0)
+ return ret;
+
+ key_ptr += nbytes + 2;
+ }
+
+ return (key_ptr == key + key_len) ? 0 : -EINVAL;
+}
+
/**
* rsa_parse_pub_key() - decodes the BER encoded buffer and stores in the
* provided struct rsa_key, pointers to the raw key as is,
@@ -166,6 +193,27 @@ int rsa_parse_pub_key(struct rsa_key *rsa_key, const void *key,
}
EXPORT_SYMBOL_GPL(rsa_parse_pub_key);
+/**
+ * rsa_parse_pub_key_raw() - parse the RAW key and store in the provided struct
+ * rsa_key, pointers to the raw key as is, so that
+ * the caller can copy it or MPI parse it, etc.
+ *
+ * @rsa_key: struct rsa_key key representation
+ * @key: key in RAW format
+ * @key_len: length of key
+ *
+ * Return: 0 on success or error code in case of error
+ */
+int rsa_parse_pub_key_raw(struct rsa_key *rsa_key, const void *key,
+ unsigned int key_len)
+{
+ rsa_get_func pub_func[] = {rsa_get_n, rsa_get_e};
+
+ return rsa_parse_key_raw(rsa_key, key, key_len,
+ pub_func, ARRAY_SIZE(pub_func));
+}
+EXPORT_SYMBOL_GPL(rsa_parse_pub_key_raw);
+
/**
* rsa_parse_priv_key() - decodes the BER encoded buffer and stores in the
* provided struct rsa_key, pointers to the raw key
@@ -184,3 +232,24 @@ int rsa_parse_priv_key(struct rsa_key *rsa_key, const void *key,
return asn1_ber_decoder(&rsaprivkey_decoder, rsa_key, key, key_len);
}
EXPORT_SYMBOL_GPL(rsa_parse_priv_key);
+
+/**
+ * rsa_parse_priv_key_raw() - parse the RAW key and store in the provided struct
+ * rsa_key, pointers to the raw key as is, so that
+ * the caller can copy it or MPI parse it, etc.
+ *
+ * @rsa_key: struct rsa_key key representation
+ * @key: key in RAW format
+ * @key_len: length of key
+ *
+ * Return: 0 on success or error code in case of error
+ */
+int rsa_parse_priv_key_raw(struct rsa_key *rsa_key, const void *key,
+ unsigned int key_len)
+{
+ rsa_get_func priv_func[] = {rsa_get_n, rsa_get_e, rsa_get_d};
+
+ return rsa_parse_key_raw(rsa_key, key, key_len,
+ priv_func, ARRAY_SIZE(priv_func));
+}
+EXPORT_SYMBOL_GPL(rsa_parse_priv_key_raw);
diff --git a/include/crypto/internal/rsa.h b/include/crypto/internal/rsa.h
index e870133f4b77..7141e806ceea 100644
--- a/include/crypto/internal/rsa.h
+++ b/include/crypto/internal/rsa.h
@@ -50,8 +50,14 @@ struct rsa_key {
int rsa_parse_pub_key(struct rsa_key *rsa_key, const void *key,
unsigned int key_len);
+int rsa_parse_pub_key_raw(struct rsa_key *rsa_key, const void *key,
+ unsigned int key_len);
+
int rsa_parse_priv_key(struct rsa_key *rsa_key, const void *key,
unsigned int key_len);
+int rsa_parse_priv_key_raw(struct rsa_key *rsa_key, const void *key,
+ unsigned int key_len);
+
extern struct crypto_template rsa_pkcs1pad_tmpl;
#endif
--
2.34.1
From: Roberto Sassu <[email protected]>
Introduce a new function to get the number of bits and bytes of an MPI stored in a raw buffer.
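For illustration, a caller that needs to walk a buffer of concatenated MPIs
(as the raw RSA key parser added earlier in this series does) could use it
roughly like this; count_mpis() is a made-up helper, not part of this patch:

static int count_mpis(const void *data, unsigned int len)
{
        const void *p = data;
        unsigned int nbytes;
        int ret, count = 0;

        /* Each MPI starts with a 2-octet big-endian bit count. */
        while (len >= 2) {
                ret = mpi_key_length(p, len, NULL, &nbytes);
                if (ret < 0)
                        return ret;
                if (nbytes + 2 > len)
                        return -EINVAL;
                p += nbytes + 2;
                len -= nbytes + 2;
                count++;
        }
        return len ? -EINVAL : count;
}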
Signed-off-by: Roberto Sassu <[email protected]>
---
include/linux/mpi.h | 2 ++
lib/crypto/mpi/mpicoder.c | 33 ++++++++++++++++++++++++++-------
2 files changed, 28 insertions(+), 7 deletions(-)
diff --git a/include/linux/mpi.h b/include/linux/mpi.h
index eb0d1c1db208..a7dd4c9d8120 100644
--- a/include/linux/mpi.h
+++ b/include/linux/mpi.h
@@ -90,6 +90,8 @@ enum gcry_mpi_format {
};
MPI mpi_read_raw_data(const void *xbuffer, size_t nbytes);
+int mpi_key_length(const void *xbuffer, unsigned int ret_nread,
+ unsigned int *nbits_arg, unsigned int *nbytes_arg);
MPI mpi_read_from_buffer(const void *buffer, unsigned *ret_nread);
int mpi_fromstr(MPI val, const char *str);
MPI mpi_scanval(const char *string);
diff --git a/lib/crypto/mpi/mpicoder.c b/lib/crypto/mpi/mpicoder.c
index 3cb6bd148fa9..92447a1c8bf9 100644
--- a/lib/crypto/mpi/mpicoder.c
+++ b/lib/crypto/mpi/mpicoder.c
@@ -79,22 +79,41 @@ MPI mpi_read_raw_data(const void *xbuffer, size_t nbytes)
}
EXPORT_SYMBOL_GPL(mpi_read_raw_data);
-MPI mpi_read_from_buffer(const void *xbuffer, unsigned *ret_nread)
+int mpi_key_length(const void *xbuffer, unsigned int ret_nread,
+ unsigned int *nbits_arg, unsigned int *nbytes_arg)
{
const uint8_t *buffer = xbuffer;
- unsigned int nbits, nbytes;
- MPI val;
+ unsigned int nbits;
- if (*ret_nread < 2)
- return ERR_PTR(-EINVAL);
+ if (ret_nread < 2)
+ return -EINVAL;
nbits = buffer[0] << 8 | buffer[1];
if (nbits > MAX_EXTERN_MPI_BITS) {
pr_info("MPI: mpi too large (%u bits)\n", nbits);
- return ERR_PTR(-EINVAL);
+ return -EINVAL;
}
- nbytes = DIV_ROUND_UP(nbits, 8);
+ if (nbits_arg)
+ *nbits_arg = nbits;
+ if (nbytes_arg)
+ *nbytes_arg = DIV_ROUND_UP(nbits, 8);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(mpi_key_length);
+
+MPI mpi_read_from_buffer(const void *xbuffer, unsigned int *ret_nread)
+{
+ const uint8_t *buffer = xbuffer;
+ unsigned int nbytes;
+ MPI val;
+ int ret;
+
+ ret = mpi_key_length(xbuffer, *ret_nread, NULL, &nbytes);
+ if (ret < 0)
+ return ERR_PTR(ret);
+
if (nbytes + 2 > *ret_nread) {
pr_info("MPI: mpi larger than buffer nbytes=%u ret_nread=%u\n",
nbytes, *ret_nread);
--
2.34.1
From: David Howells <[email protected]>
Provide a simple parser that extracts the packets from a PGP packet blob
and passes the desired ones to the given processor function:
struct pgp_parse_context {
u64 types_of_interest;
int (*process_packet)(struct pgp_parse_context *context,
enum pgp_packet_tag type,
u8 headerlen,
const u8 *data,
size_t datalen);
};
int pgp_parse_packets(const u8 *data, size_t datalen,
struct pgp_parse_context *ctx);
This is enabled with CONFIG_PGP_LIBRARY.
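As a rough usage sketch (not part of this patch; uid_count_ctx and the
helper names are made up, and PGP_PKT_USER_ID is one of the packet tags
from pgp.h), a caller that only cares about user ID packets could do:

struct uid_count_ctx {
        struct pgp_parse_context pgp;
        unsigned int nr_uids;
};

static int count_user_ids(struct pgp_parse_context *context,
                          enum pgp_packet_tag type, u8 headerlen,
                          const u8 *data, size_t datalen)
{
        struct uid_count_ctx *ctx =
                container_of(context, struct uid_count_ctx, pgp);

        /* Only called for the packet types selected below. */
        ctx->nr_uids++;
        return 0;
}

static int count_user_ids_in_blob(const u8 *data, size_t datalen)
{
        struct uid_count_ctx ctx = {
                .pgp.types_of_interest = 1 << PGP_PKT_USER_ID,
                .pgp.process_packet = count_user_ids,
        };
        int ret;

        ret = pgp_parse_packets(data, datalen, &ctx.pgp);
        return ret < 0 ? ret : ctx.nr_uids;
}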
Signed-off-by: David Howells <[email protected]>
Co-developed-by: Roberto Sassu <[email protected]>
Signed-off-by: Roberto Sassu <[email protected]>
---
crypto/asymmetric_keys/Kconfig | 6 +
crypto/asymmetric_keys/Makefile | 16 ++
crypto/asymmetric_keys/pgp_library.c | 272 +++++++++++++++++++++++++++
crypto/asymmetric_keys/pgplib.h | 33 ++++
4 files changed, 327 insertions(+)
create mode 100644 crypto/asymmetric_keys/pgp_library.c
create mode 100644 crypto/asymmetric_keys/pgplib.h
diff --git a/crypto/asymmetric_keys/Kconfig b/crypto/asymmetric_keys/Kconfig
index 59ec726b7c77..ebe9dc88d975 100644
--- a/crypto/asymmetric_keys/Kconfig
+++ b/crypto/asymmetric_keys/Kconfig
@@ -86,4 +86,10 @@ config FIPS_SIGNATURE_SELFTEST
depends on PKCS7_MESSAGE_PARSER=X509_CERTIFICATE_PARSER
depends on X509_CERTIFICATE_PARSER
+config PGP_LIBRARY
+ tristate "PGP parsing library"
+ help
+ This option enables a library that provides a number of simple
+ utility functions for parsing PGP (RFC 4880) packet-based messages.
+
endif # ASYMMETRIC_KEY_TYPE
diff --git a/crypto/asymmetric_keys/Makefile b/crypto/asymmetric_keys/Makefile
index 1a273d6df3eb..f7e5ee59857f 100644
--- a/crypto/asymmetric_keys/Makefile
+++ b/crypto/asymmetric_keys/Makefile
@@ -77,3 +77,19 @@ verify_signed_pefile-y := \
$(obj)/mscode_parser.o: $(obj)/mscode.asn1.h $(obj)/mscode.asn1.h
$(obj)/mscode.asn1.o: $(obj)/mscode.asn1.c $(obj)/mscode.asn1.h
+
+#
+# TPM private key parsing
+#
+obj-$(CONFIG_TPM_KEY_PARSER) += tpm_key_parser.o
+tpm_key_parser-y := \
+ tpm.asn1.o \
+ tpm_parser.o
+
+$(obj)/tpm_parser.o: $(obj)/tpm.asn1.h
+$(obj)/tpm.asn1.o: $(obj)/tpm.asn1.c $(obj)/tpm.asn1.h
+
+#
+# PGP handling
+#
+obj-$(CONFIG_PGP_LIBRARY) += pgp_library.o
diff --git a/crypto/asymmetric_keys/pgp_library.c b/crypto/asymmetric_keys/pgp_library.c
new file mode 100644
index 000000000000..d2c3149983d5
--- /dev/null
+++ b/crypto/asymmetric_keys/pgp_library.c
@@ -0,0 +1,272 @@
+// SPDX-License-Identifier: GPL-2.0
+/* PGP packet parser (RFC 4880)
+ *
+ * Copyright (C) 2011 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#define pr_fmt(fmt) "PGPL: "fmt
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+#include "pgplib.h"
+
+MODULE_LICENSE("GPL");
+
+const char *const pgp_hash_algorithms[PGP_HASH__LAST] = {
+ [PGP_HASH_MD5] = "md5",
+ [PGP_HASH_SHA1] = "sha1",
+ [PGP_HASH_RIPE_MD_160] = "rmd160",
+ [PGP_HASH_SHA256] = "sha256",
+ [PGP_HASH_SHA384] = "sha384",
+ [PGP_HASH_SHA512] = "sha512",
+ [PGP_HASH_SHA224] = "sha224",
+};
+EXPORT_SYMBOL_GPL(pgp_hash_algorithms);
+
+/**
+ * pgp_parse_packet_header - Parse a PGP packet header
+ * @_data: Start of the PGP packet (updated to PGP packet data)
+ * @_datalen: Amount of data remaining in buffer (decreased)
+ * @_type: Where the packet type will be returned
+ * @_headerlen: Where the header length will be returned
+ *
+ * Parse a set of PGP packet header [RFC 4880: 4.2].
+ *
+ * Return: packet data size on success; a negative value on error. If
+ * successful, *_data and *_datalen will have been updated and *_headerlen
+ * will be set to hold the length of the packet header.
+ */
+static ssize_t pgp_parse_packet_header(const u8 **_data, size_t *_datalen,
+ enum pgp_packet_tag *_type,
+ u8 *_headerlen)
+{
+ enum pgp_packet_tag type;
+ const u8 *data = *_data;
+ size_t size, datalen = *_datalen;
+
+ pr_devel("-->%s(,%zu,,)\n", __func__, datalen);
+
+ if (datalen < 2)
+ goto short_packet;
+
+ pr_devel("pkthdr %02x, %02x\n", data[0], data[1]);
+
+ type = *data++;
+ datalen--;
+ if (!(type & 0x80)) {
+ pr_debug("Packet type does not have MSB set\n");
+ return -EBADMSG;
+ }
+ type &= ~0x80;
+
+ if (type & 0x40) {
+ /* New packet length format */
+ type &= ~0x40;
+ pr_devel("new format: t=%u\n", type);
+ switch (data[0]) {
+ case 0x00 ... 0xbf:
+ /* One-byte length */
+ size = data[0];
+ data++;
+ datalen--;
+ *_headerlen = 2;
+ break;
+ case 0xc0 ... 0xdf:
+ /* Two-byte length */
+ if (datalen < 2)
+ goto short_packet;
+ size = (data[0] - 192) * 256;
+ size += data[1] + 192;
+ data += 2;
+ datalen -= 2;
+ *_headerlen = 3;
+ break;
+ case 0xff:
+ /* Five-byte length */
+ if (datalen < 5)
+ goto short_packet;
+ size = data[1] << 24;
+ size |= data[2] << 16;
+ size |= data[3] << 8;
+ size |= data[4];
+ data += 5;
+ datalen -= 5;
+ *_headerlen = 6;
+ break;
+ default:
+ pr_debug("Partial body length packet not supported\n");
+ return -EBADMSG;
+ }
+ } else {
+ /* Old packet length format */
+ u8 length_type = type & 0x03;
+
+ type >>= 2;
+ pr_devel("old format: t=%u lt=%u\n", type, length_type);
+
+ switch (length_type) {
+ case 0:
+ /* One-byte length */
+ size = data[0];
+ data++;
+ datalen--;
+ *_headerlen = 2;
+ break;
+ case 1:
+ /* Two-byte length */
+ if (datalen < 2)
+ goto short_packet;
+ size = data[0] << 8;
+ size |= data[1];
+ data += 2;
+ datalen -= 2;
+ *_headerlen = 3;
+ break;
+ case 2:
+ /* Four-byte length */
+ if (datalen < 4)
+ goto short_packet;
+ size = data[0] << 24;
+ size |= data[1] << 16;
+ size |= data[2] << 8;
+ size |= data[3];
+ data += 4;
+ datalen -= 4;
+ *_headerlen = 5;
+ break;
+ default:
+ pr_debug("Indefinite length packet not supported\n");
+ return -EBADMSG;
+ }
+ }
+
+ pr_devel("datalen=%zu size=%zu\n", datalen, size);
+ if (datalen < size)
+ goto short_packet;
+ if (size > INT_MAX)
+ goto too_big;
+
+ *_data = data;
+ *_datalen = datalen;
+ *_type = type;
+ pr_devel("Found packet type=%u size=%zd\n", type, size);
+ return size;
+
+short_packet:
+ pr_debug("Attempt to parse short packet\n");
+ return -EBADMSG;
+too_big:
+ pr_debug("Signature subpacket size >2G\n");
+ return -EMSGSIZE;
+}
+
+/**
+ * pgp_parse_packets - Parse a set of PGP packets
+ * @data: Data to be parsed (updated)
+ * @datalen: Amount of data (updated)
+ * @ctx: Parsing context
+ *
+ * Parse a set of PGP packets [RFC 4880: 4].
+ *
+ * Return: 0 on successful parsing, a negative value otherwise
+ */
+int pgp_parse_packets(const u8 *data, size_t datalen,
+ struct pgp_parse_context *ctx)
+{
+ enum pgp_packet_tag type;
+ ssize_t pktlen;
+ u8 headerlen;
+ int ret;
+
+ while (datalen > 2) {
+ pktlen = pgp_parse_packet_header(&data, &datalen, &type,
+ &headerlen);
+ if (pktlen < 0)
+ return pktlen;
+
+ if ((ctx->types_of_interest >> type) & 1) {
+ ret = ctx->process_packet(ctx, type, headerlen,
+ data, pktlen);
+ if (ret < 0)
+ return ret;
+ }
+ data += pktlen;
+ datalen -= pktlen;
+ }
+
+ if (datalen != 0) {
+ pr_debug("Excess octets in packet stream\n");
+ return -EBADMSG;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(pgp_parse_packets);
+
+/**
+ * pgp_parse_public_key - Parse the common part of a PGP pubkey packet
+ * @_data: Content of packet (updated)
+ * @_datalen: Length of packet remaining (updated)
+ * @pk: Public key data
+ *
+ * Parse the common data struct for a PGP pubkey packet [RFC 4880: 5.5.2].
+ *
+ * Return: 0 on successful parsing, a negative value otherwise
+ */
+int pgp_parse_public_key(const u8 **_data, size_t *_datalen,
+ struct pgp_parse_pubkey *pk)
+{
+ const u8 *data = *_data;
+ size_t datalen = *_datalen;
+ unsigned int tmp;
+
+ if (datalen < 12) {
+ pr_debug("Public key packet too short\n");
+ return -EBADMSG;
+ }
+
+ pk->version = *data++;
+ switch (pk->version) {
+ case PGP_KEY_VERSION_2:
+ case PGP_KEY_VERSION_3:
+ case PGP_KEY_VERSION_4:
+ break;
+ default:
+ pr_debug("Public key packet with unhandled version %d\n",
+ pk->version);
+ return -EBADMSG;
+ }
+
+ tmp = *data++ << 24;
+ tmp |= *data++ << 16;
+ tmp |= *data++ << 8;
+ tmp |= *data++;
+ pk->creation_time = tmp;
+ if (pk->version == PGP_KEY_VERSION_4) {
+ pk->expires_at = 0; /* Have to get it from the selfsignature */
+ } else {
+ unsigned short ndays;
+
+ ndays = *data++ << 8;
+ ndays |= *data++;
+ if (ndays)
+ pk->expires_at = pk->creation_time + ndays * 86400UL;
+ else
+ pk->expires_at = 0;
+ datalen -= 2;
+ }
+
+ pk->pubkey_algo = *data++;
+ datalen -= 6;
+
+ pr_devel("%x,%x,%lx,%lx\n",
+ pk->version, pk->pubkey_algo, pk->creation_time,
+ pk->expires_at);
+
+ *_data = data;
+ *_datalen = datalen;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(pgp_parse_public_key);
diff --git a/crypto/asymmetric_keys/pgplib.h b/crypto/asymmetric_keys/pgplib.h
new file mode 100644
index 000000000000..d82b84179433
--- /dev/null
+++ b/crypto/asymmetric_keys/pgplib.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* PGP library definitions (RFC 4880)
+ *
+ * Copyright (C) 2012 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include "pgp.h"
+
+/*
+ * PGP library packet parser
+ */
+struct pgp_parse_context {
+ u64 types_of_interest;
+ int (*process_packet)(struct pgp_parse_context *context,
+ enum pgp_packet_tag type,
+ u8 headerlen,
+ const u8 *data,
+ size_t datalen);
+};
+
+extern int pgp_parse_packets(const u8 *data, size_t datalen,
+ struct pgp_parse_context *ctx);
+
+struct pgp_parse_pubkey {
+ enum pgp_key_version version : 8;
+ enum pgp_pubkey_algo pubkey_algo : 8;
+ __kernel_old_time_t creation_time;
+ __kernel_old_time_t expires_at;
+};
+
+extern int pgp_parse_public_key(const u8 **_data, size_t *_datalen,
+ struct pgp_parse_pubkey *pk);
--
2.34.1
From: David Howells <[email protected]>
Provide some PGP signature parsing helpers:
(1) A function to parse V4 signature subpackets and pass the desired ones
to a processor function:
int pgp_parse_sig_subpkts(const u8 *data, size_t datalen,
struct pgp_parse_sig_context *ctx);
(2) A function to parse out basic signature parameters from any PGP
signature such that the algorithms and public key can be selected:
int pgp_parse_sig_params(const u8 **_data, size_t *_datalen,
struct pgp_sig_parameters *p);
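For illustration, a callback that is handed the body of a signature packet
could extract the parameters roughly like this (sketch only, not part of
this patch; check_sig_packet() is a made-up name):

static int check_sig_packet(const u8 *data, size_t datalen)
{
        struct pgp_sig_parameters params;
        int ret;

        ret = pgp_parse_sig_params(&data, &datalen, &params);
        if (ret < 0)
                return ret;

        /*
         * data now points at the 16-bit hash check value followed by the
         * signature MPIs; params identifies the issuer key ID, the public
         * key algorithm and the hash algorithm.
         */
        pr_debug("issuer %08x%08x pubkey_algo %u hash_algo %u\n",
                 be32_to_cpu(params.issuer32[0]),
                 be32_to_cpu(params.issuer32[1]),
                 params.pubkey_algo, params.hash_algo);
        return 0;
}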
Signed-off-by: David Howells <[email protected]>
Co-developed-by: Roberto Sassu <[email protected]>
Signed-off-by: Roberto Sassu <[email protected]>
---
crypto/asymmetric_keys/pgp_library.c | 284 +++++++++++++++++++++++++++
crypto/asymmetric_keys/pgplib.h | 25 +++
2 files changed, 309 insertions(+)
diff --git a/crypto/asymmetric_keys/pgp_library.c b/crypto/asymmetric_keys/pgp_library.c
index d2c3149983d5..de4a748db9be 100644
--- a/crypto/asymmetric_keys/pgp_library.c
+++ b/crypto/asymmetric_keys/pgp_library.c
@@ -270,3 +270,287 @@ int pgp_parse_public_key(const u8 **_data, size_t *_datalen,
return 0;
}
EXPORT_SYMBOL_GPL(pgp_parse_public_key);
+
+/**
+ * pgp_parse_sig_subpkt_header - Parse a PGP V4 signature subpacket header
+ * @_data: Start of the subpacket (updated to subpacket data)
+ * @_datalen: Amount of data remaining in buffer (decreased)
+ * @_type: Where the subpacket type will be returned
+ *
+ * Parse a PGP V4 signature subpacket header [RFC 4880: 5.2.3.1].
+ *
+ * Return: subpacket data size on success; a negative value on error. If
+ * successful, *_data and *_datalen will have been updated and *_type will
+ * be set to the subpacket type.
+ */
+static ssize_t pgp_parse_sig_subpkt_header(const u8 **_data, size_t *_datalen,
+ enum pgp_sig_subpkt_type *_type)
+{
+ enum pgp_sig_subpkt_type type;
+ const u8 *data = *_data;
+ size_t size, datalen = *_datalen;
+
+ pr_devel("-->%s(,%zu,,)\n", __func__, datalen);
+
+ if (datalen < 2)
+ goto short_subpacket;
+
+ pr_devel("subpkt hdr %02x, %02x\n", data[0], data[1]);
+
+ switch (data[0]) {
+ case 0x00 ... 0xbf:
+ /* One-byte length */
+ size = data[0];
+ data++;
+ datalen--;
+ break;
+ case 0xc0 ... 0xfe:
+ /* Two-byte length */
+ if (datalen < 3)
+ goto short_subpacket;
+ size = (data[0] - 192) * 256;
+ size += data[1] + 192;
+ data += 2;
+ datalen -= 2;
+ break;
+ case 0xff:
+ if (datalen < 6)
+ goto short_subpacket;
+ size = data[1] << 24;
+ size |= data[2] << 16;
+ size |= data[3] << 8;
+ size |= data[4];
+ data += 5;
+ datalen -= 5;
+ break;
+ }
+
+ /* The type octet is included in the size */
+ pr_devel("datalen=%zu size=%zu\n", datalen, size);
+ if (datalen < size)
+ goto short_subpacket;
+ if (size == 0)
+ goto very_short_subpacket;
+ if (size > INT_MAX)
+ goto too_big;
+
+ type = *data++ & ~PGP_SIG_SUBPKT_TYPE_CRITICAL_MASK;
+ datalen--;
+ size--;
+
+ *_data = data;
+ *_datalen = datalen;
+ *_type = type;
+ pr_devel("Found subpkt type=%u size=%zd\n", type, size);
+ return size;
+
+very_short_subpacket:
+ pr_debug("Signature subpacket size can't be zero\n");
+ return -EBADMSG;
+short_subpacket:
+ pr_debug("Attempt to parse short signature subpacket\n");
+ return -EBADMSG;
+too_big:
+ pr_debug("Signature subpacket size >2G\n");
+ return -EMSGSIZE;
+}
+
+/**
+ * pgp_parse_sig_subpkts - Parse a set of PGP V4 signature subpackets
+ * @data: Data to be parsed (updated)
+ * @datalen: Amount of data (updated)
+ * @ctx: Parsing context
+ *
+ * Parse a set of PGP signature subpackets [RFC 4880: 5.2.3].
+ *
+ * Return: 0 on successful parsing, an error value otherwise
+ */
+static int pgp_parse_sig_subpkts(const u8 *data, size_t datalen,
+ struct pgp_parse_sig_context *ctx)
+{
+ enum pgp_sig_subpkt_type type;
+ ssize_t pktlen;
+ int ret;
+
+ pr_devel("-->%s(,%zu,,)\n", __func__, datalen);
+
+ while (datalen > 2) {
+ pktlen = pgp_parse_sig_subpkt_header(&data, &datalen, &type);
+ if (pktlen < 0)
+ return pktlen;
+ if (test_bit(type, ctx->types_of_interest)) {
+ ret = ctx->process_packet(ctx, type, data, pktlen);
+ if (ret < 0)
+ return ret;
+ }
+ data += pktlen;
+ datalen -= pktlen;
+ }
+
+ if (datalen != 0) {
+ pr_debug("Excess octets in signature subpacket stream\n");
+ return -EBADMSG;
+ }
+
+ return 0;
+}
+
+struct pgp_parse_sig_params_ctx {
+ struct pgp_parse_sig_context base;
+ struct pgp_sig_parameters *params;
+ bool got_the_issuer;
+};
+
+/*
+ * Process a V4 signature subpacket.
+ */
+static int pgp_process_sig_params_subpkt(struct pgp_parse_sig_context *context,
+ enum pgp_sig_subpkt_type type,
+ const u8 *data,
+ size_t datalen)
+{
+ struct pgp_parse_sig_params_ctx *ctx =
+ container_of(context, struct pgp_parse_sig_params_ctx, base);
+
+ if (ctx->got_the_issuer) {
+ pr_debug("V4 signature packet has multiple issuers\n");
+ return -EBADMSG;
+ }
+
+ if (datalen != 8) {
+ pr_debug("V4 signature issuer subpkt not 8 long (%zu)\n",
+ datalen);
+ return -EBADMSG;
+ }
+
+ memcpy(&ctx->params->issuer, data, 8);
+ ctx->got_the_issuer = true;
+ return 0;
+}
+
+/**
+ * pgp_parse_sig_params - Parse basic parameters from a PGP signature packet
+ * @_data: Content of packet (updated)
+ * @_datalen: Length of packet remaining (updated)
+ * @p: The basic parameters
+ *
+ * Parse the basic parameters from a PGP signature packet [RFC 4880: 5.2] that
+ * are needed to start off a signature verification operation. The only ones
+ * actually necessary are the signature type (which affects how the data is
+ * transformed) and the hash algorithm.
+ *
+ * We also extract the public key algorithm and the issuer's key ID as we'll
+ * need those to determine if we actually have the public key available. If
+ * not, then we can't verify the signature anyway.
+ *
+ * Return: 0 if successful or a negative error code. *_data and *_datalen are
+ * updated to point to the 16-bit subset of the hash value and the set of MPIs.
+ */
+int pgp_parse_sig_params(const u8 **_data, size_t *_datalen,
+ struct pgp_sig_parameters *p)
+{
+ const u8 *data = *_data;
+ size_t datalen = *_datalen;
+ int ret;
+
+ pr_devel("-->%s(,%zu,,)\n", __func__, datalen);
+
+ if (datalen < 1)
+ return -EBADMSG;
+ p->version = *data;
+
+ if (p->version == PGP_SIG_VERSION_3) {
+ const struct pgp_signature_v3_packet *v3 = (const void *)data;
+
+ if (datalen < sizeof(*v3)) {
+ pr_debug("Short V3 signature packet\n");
+ return -EBADMSG;
+ }
+ datalen -= sizeof(*v3);
+ data += sizeof(*v3);
+
+ /* V3 has everything we need in the header */
+ p->signature_type = v3->hashed.signature_type;
+ memcpy(&p->issuer, &v3->issuer, 8);
+ p->pubkey_algo = v3->pubkey_algo;
+ p->hash_algo = v3->hash_algo;
+
+ } else if (p->version == PGP_SIG_VERSION_4) {
+ const struct pgp_signature_v4_packet *v4 = (const void *)data;
+ struct pgp_parse_sig_params_ctx ctx = {
+ .base.process_packet = pgp_process_sig_params_subpkt,
+ .params = p,
+ .got_the_issuer = false,
+ };
+ size_t subdatalen;
+
+ if (datalen < sizeof(*v4) + 2 + 2 + 2) {
+ pr_debug("Short V4 signature packet\n");
+ return -EBADMSG;
+ }
+ datalen -= sizeof(*v4);
+ data += sizeof(*v4);
+
+ /* V4 has most things in the header... */
+ p->signature_type = v4->signature_type;
+ p->pubkey_algo = v4->pubkey_algo;
+ p->hash_algo = v4->hash_algo;
+
+ /*
+ * ... but we have to get the key ID from the subpackets, of
+ * which there are two sets.
+ */
+ __set_bit(PGP_SIG_ISSUER, ctx.base.types_of_interest);
+
+ subdatalen = *data++ << 8;
+ subdatalen |= *data++;
+ datalen -= 2;
+ if (subdatalen) {
+ /* Hashed subpackets */
+ pr_devel("hashed data: %zu (after %zu)\n",
+ subdatalen, sizeof(*v4));
+ if (subdatalen > datalen + 2 + 2) {
+ pr_debug("Short V4 signature packet [hdata]\n");
+ return -EBADMSG;
+ }
+ ret = pgp_parse_sig_subpkts(data, subdatalen,
+ &ctx.base);
+ if (ret < 0)
+ return ret;
+ data += subdatalen;
+ datalen -= subdatalen;
+ }
+
+ subdatalen = *data++ << 8;
+ subdatalen |= *data++;
+ datalen -= 2;
+ if (subdatalen) {
+ /* Unhashed subpackets */
+ pr_devel("unhashed data: %zu\n", subdatalen);
+ if (subdatalen > datalen + 2) {
+ pr_debug("Short V4 signature packet [udata]\n");
+ return -EBADMSG;
+ }
+ ret = pgp_parse_sig_subpkts(data, subdatalen,
+ &ctx.base);
+ if (ret < 0)
+ return ret;
+ data += subdatalen;
+ datalen -= subdatalen;
+ }
+
+ if (!ctx.got_the_issuer) {
+ pr_debug("V4 signature packet lacks issuer\n");
+ return -EBADMSG;
+ }
+ } else {
+ pr_debug("Signature packet with unhandled version %d\n",
+ p->version);
+ return -EBADMSG;
+ }
+
+ *_data = data;
+ *_datalen = datalen;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(pgp_parse_sig_params);
diff --git a/crypto/asymmetric_keys/pgplib.h b/crypto/asymmetric_keys/pgplib.h
index d82b84179433..967e2853186d 100644
--- a/crypto/asymmetric_keys/pgplib.h
+++ b/crypto/asymmetric_keys/pgplib.h
@@ -31,3 +31,28 @@ struct pgp_parse_pubkey {
extern int pgp_parse_public_key(const u8 **_data, size_t *_datalen,
struct pgp_parse_pubkey *pk);
+
+struct pgp_parse_sig_context {
+ unsigned long types_of_interest[128 / BITS_PER_LONG];
+ int (*process_packet)(struct pgp_parse_sig_context *context,
+ enum pgp_sig_subpkt_type type,
+ const u8 *data,
+ size_t datalen);
+};
+
+extern int pgp_parse_sig_packets(const u8 *data, size_t datalen,
+ struct pgp_parse_sig_context *ctx);
+
+struct pgp_sig_parameters {
+ enum pgp_signature_version version : 8;
+ enum pgp_signature_type signature_type : 8;
+ enum pgp_pubkey_algo pubkey_algo : 8;
+ enum pgp_hash_algo hash_algo : 8;
+ union {
+ struct pgp_key_ID issuer;
+ __be32 issuer32[2];
+ };
+};
+
+extern int pgp_parse_sig_params(const u8 **_data, size_t *_datalen,
+ struct pgp_sig_parameters *p);
--
2.34.1
From: Roberto Sassu <[email protected]>
Test it with:
gpg --dearmor < <PGP key> | keyctl padd asymmetric "" @u
Signed-off-by: Roberto Sassu <[email protected]>
---
crypto/asymmetric_keys/pgp_public_key.c | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/crypto/asymmetric_keys/pgp_public_key.c b/crypto/asymmetric_keys/pgp_public_key.c
index 0529c8ce2d43..876bb83abdd5 100644
--- a/crypto/asymmetric_keys/pgp_public_key.c
+++ b/crypto/asymmetric_keys/pgp_public_key.c
@@ -10,6 +10,7 @@
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/mpi.h>
+#include <linux/sbm.h>
#include <keys/asymmetric-subtype.h>
#include <keys/asymmetric-parser.h>
#include <crypto/hash.h>
@@ -310,6 +311,12 @@ static struct asymmetric_key_ids *pgp_key_generate_id(
return NULL;
}
+static SBM_DEFINE_FUNC(parse_key, const unsigned char *, data, size_t, datalen,
+ struct pgp_key_data_parse_context *, ctx)
+{
+ return pgp_parse_packets(data, datalen, &ctx->pgp);
+}
+
/*
* Attempt to parse the instantiation data blob for a key as a PGP packet
* message holding a key.
@@ -318,6 +325,7 @@ static int pgp_key_parse(struct key_preparsed_payload *prep)
{
struct pgp_key_data_parse_context *ctx;
struct public_key *pub = NULL;
+ struct sbm sbm;
int ret;
kenter("");
@@ -332,7 +340,16 @@ static int pgp_key_parse(struct key_preparsed_payload *prep)
(1 << PGP_PKT_USER_ID);
ctx->pgp.process_packet = pgp_process_public_key;
- ret = pgp_parse_packets(prep->data, prep->datalen, &ctx->pgp);
+ sbm_init(&sbm);
+ ret = sbm_call(&sbm, parse_key,
+ SBM_COPY_IN(&sbm, prep->data, prep->datalen),
+ prep->datalen, SBM_COPY_INOUT(&sbm, ctx, sizeof(*ctx)));
+ sbm_destroy(&sbm);
+
+ if (ret < 0)
+ goto error;
+
+ ret = sbm_error(&sbm);
if (ret < 0)
goto error;
--
2.34.1
From: Roberto Sassu <[email protected]>
Test the intentional fault injection (triggered by using "fault" as the key description) with:
gpg --dearmor < <PGP key> | keyctl padd asymmetric "fault" @u
Signed-off-by: Roberto Sassu <[email protected]>
---
crypto/asymmetric_keys/pgp_public_key.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/crypto/asymmetric_keys/pgp_public_key.c b/crypto/asymmetric_keys/pgp_public_key.c
index 876bb83abdd5..aa9f92d423c3 100644
--- a/crypto/asymmetric_keys/pgp_public_key.c
+++ b/crypto/asymmetric_keys/pgp_public_key.c
@@ -62,6 +62,7 @@ struct pgp_key_data_parse_context {
u8 raw_fingerprint[HASH_MAX_DIGESTSIZE];
size_t raw_fingerprint_len;
unsigned int version;
+ bool fault;
};
static inline void write_keyid_buf_char(struct pgp_key_data_parse_context *ctx,
@@ -189,6 +190,9 @@ static int pgp_process_public_key(struct pgp_parse_context *context,
return ret;
}
+ if (ctx->fault)
+ ctx->key[16384] = '\0';
+
ctx->version = pgp.version;
if (pgp.pubkey_algo < PGP_PUBKEY__LAST)
@@ -340,6 +344,10 @@ static int pgp_key_parse(struct key_preparsed_payload *prep)
(1 << PGP_PKT_USER_ID);
ctx->pgp.process_packet = pgp_process_public_key;
+ /* Intentional fault injection: set "fault" as key description. */
+ if (prep->orig_description && !strcmp(prep->orig_description, "fault"))
+ ctx->fault = true;
+
sbm_init(&sbm);
ret = sbm_call(&sbm, parse_key,
SBM_COPY_IN(&sbm, prep->data, prep->datalen),
--
2.34.1
On 2/16/24 07:24, Petr Tesarik wrote:
> While I started working on my development branch to illustrate how
> SandBox Mode could be enhanced to allow dynamic memory allocation and
> other features necessary to convert some existing code, my colleague
> Roberto Sassu set out and adapted a PGP key parser to run in a sandbox.
>
> Disclaimer:
>
> The code had to be rearranged in order to avoid memory allocations
> and crypto operations in the sandbox. The code might contain errors.
I'm confused by this. The kernel doesn't (appear to) have a PGP parser
today. So are you saying that it *should* have one and it's only
feasible if it's confined in a sandbox?
A much more powerful example would be to take something that the kernel
has already and put it in a sandbox. That would show us how difficult
it is to sandbox something versus just doing it _normally_ in the kernel.
As it stands, I fear this was just the largest chunk of sandbox code
that was laying around and it seemed like a good idea to just chuck
~1400 lines of code over the wall at a huge cc list.
I'm not sure I want to see any more SandBox mode filling up my inbox.
On Fri, 16 Feb 2024 07:38:30 -0800
Dave Hansen <[email protected]> wrote:
> On 2/16/24 07:24, Petr Tesarik wrote:
> > While I started working on my development branch to illustrate how
> > SandBox Mode could be enhanced to allow dynamic memory allocation and
> > other features necessary to convert some existing code, my colleague
> > Roberto Sassu set out and adapted a PGP key parser to run in a sandbox.
> >
> > Disclaimer:
> >
> > The code had to be rearranged in order to avoid memory allocations
> > and crypto operations in the sandbox. The code might contain errors.
>
> I'm confused by this. The kernel doesn't (appear to) have a PGP parser
> today. So are you saying that it *should* have one and it's only
> feasible if its confined in a sandbox?
I'm sorry if this is confusing. Yes, your understanding is correct.
This patch series demonstrates that SBM (even in the initial version
that was submitted) allows writing a PGP parser which can survive
memory safety bugs without compromising the rest of the kernel.
> A much more powerful example would be to take something that the kernel
> has already and put it in a sandbox. That would show us how difficult
> it is to sandbox something versus just doing it _normally_ in the kernel.
That's what I have also tested as a PoC with an earlier version of my
patch series and a few quick hacks on top. As it happens, that code
on top needs to be adapted for the current patch series, so I cannot
post it just yet. Please, stay tuned.
> As it stands, I fear this was just the largest chunk of sandbox code
> that was laying around and it seemed like a good idea to just chuck
> ~1400 lines of code over the wall at a huge cc list.
You asked for some real-world scenarios, and this should be one.
Another is on the way, but see above why it takes a bit of time.
I am not trying to claim I'm the smartest person on this planet. I can
accept it if there is a fundamental flaw in my approach. Yet, I'd be
glad if somebody could do me a favor and at least hint at what exactly
is the issue with it. I have to admit this thing is still not quite
clear to me. I would be sad if I couldn't learn from this experience.
Petr T
On Fri, 2024-02-16 at 16:44 +0000, Matthew Wilcox wrote:
> On Fri, Feb 16, 2024 at 04:24:33PM +0100, Petr Tesarik wrote:
> > From: David Howells <[email protected]>
> >
> > Implement a PGP data parser for the crypto key type to use when
> > instantiating a key.
> >
> > This parser attempts to parse the instantiation data as a PGP packet
> > sequence (RFC 4880) and if it parses okay, attempts to extract a public-key
> > algorithm key or subkey from it.
>
> I don't understand why we want to do this in-kernel instead of in
> userspace and then pass in the actual key.
Sigh, this is a long discussion.
PGP keys would be used as a system-wide trust anchor to verify RPM
package headers, which already contain file digests that can be used as
reference values for kernel-enforced integrity appraisal.
With the assumptions that:
- In a locked-down system the kernel has more privileges than root
- The kernel cannot offload this task to a user space process due to
insufficient isolation
the only available option is to do it in the kernel (that is what I got
as suggestion).
Roberto
On February 16, 2024 8:53:01 AM PST, Roberto Sassu <[email protected]> wrote:
>On Fri, 2024-02-16 at 16:44 +0000, Matthew Wilcox wrote:
>> On Fri, Feb 16, 2024 at 04:24:33PM +0100, Petr Tesarik wrote:
>> > From: David Howells <[email protected]>
>> >
>> > Implement a PGP data parser for the crypto key type to use when
>> > instantiating a key.
>> >
>> > This parser attempts to parse the instantiation data as a PGP packet
>> > sequence (RFC 4880) and if it parses okay, attempts to extract a public-key
>> > algorithm key or subkey from it.
>>
>> I don't understand why we want to do this in-kernel instead of in
>> userspace and then pass in the actual key.
>
>Sigh, this is a long discussion.
>
>PGP keys would be used as a system-wide trust anchor to verify RPM
>package headers, which already contain file digests that can be used as
>reference values for kernel-enforced integrity appraisal.
>
>With the assumptions that:
>
>- In a locked-down system the kernel has more privileges than root
>- The kernel cannot offload this task to an user space process due to
> insufficient isolation
>
>the only available option is to do it in the kernel (that is what I got
>as suggestion).
>
>Roberto
>
>
Ok, at least one of those assumptions is false, and *definitely* this approach seems to be a solution in search of a problem.
On Fri, 2024-02-16 at 09:08 -0800, H. Peter Anvin wrote:
> On February 16, 2024 8:53:01 AM PST, Roberto Sassu <[email protected]> wrote:
> > On Fri, 2024-02-16 at 16:44 +0000, Matthew Wilcox wrote:
> > > On Fri, Feb 16, 2024 at 04:24:33PM +0100, Petr Tesarik wrote:
> > > > From: David Howells <[email protected]>
> > > >
> > > > Implement a PGP data parser for the crypto key type to use when
> > > > instantiating a key.
> > > >
> > > > This parser attempts to parse the instantiation data as a PGP packet
> > > > sequence (RFC 4880) and if it parses okay, attempts to extract a public-key
> > > > algorithm key or subkey from it.
> > >
> > > I don't understand why we want to do this in-kernel instead of in
> > > userspace and then pass in the actual key.
> >
> > Sigh, this is a long discussion.
> >
> > PGP keys would be used as a system-wide trust anchor to verify RPM
> > package headers, which already contain file digests that can be used as
> > reference values for kernel-enforced integrity appraisal.
> >
> > With the assumptions that:
> >
> > - In a locked-down system the kernel has more privileges than root
> > - The kernel cannot offload this task to an user space process due to
> > insufficient isolation
> >
> > the only available option is to do it in the kernel (that is what I got
> > as suggestion).
> >
> > Roberto
> >
> >
>
> Ok, at least one of those assumptions is false, and *definitely* this approach seems to be a solution in search of a problem.
I have been looking for a solution to this for a long time. Could you please
explain?
Thanks
Roberto
Petr Tesařík <[email protected]> writes:
> On Fri, 16 Feb 2024 07:38:30 -0800
> Dave Hansen <[email protected]> wrote:
>> I'm confused by this. The kernel doesn't (appear to) have a PGP parser
>> today. So are you saying that it *should* have one and it's only
>> feasible if its confined in a sandbox?
>
> I'm sorry if this is confusing. Yes, your understanding is correct.
> This patch series demonstrates that SBM (even in the initial version
> that was submitted) allows to write a PGP parser which can survive
> memory safety bugs withoug compromising the rest of the kernel.
So I have a different question: some years ago we added the "usermode
blob" feature for just this kind of use case - parsing firewall rules at
the time. It has never been used for that, but it's still there in
kernel/usermode_driver.c. Is there a reason why this existing
functionality can't be used for tasks like PGP parsing as well?
Thanks,
jon
On 2/16/2024 6:21 PM, Jonathan Corbet wrote:
> Petr Tesařík <[email protected]> writes:
>
>> On Fri, 16 Feb 2024 07:38:30 -0800
>> Dave Hansen <[email protected]> wrote:
>>> I'm confused by this. The kernel doesn't (appear to) have a PGP parser
>>> today. So are you saying that it *should* have one and it's only
>>> feasible if its confined in a sandbox?
>>
>> I'm sorry if this is confusing. Yes, your understanding is correct.
>> This patch series demonstrates that SBM (even in the initial version
>> that was submitted) allows to write a PGP parser which can survive
>> memory safety bugs withoug compromising the rest of the kernel.
>
> So I have a different question: some years ago we added the "usermode
> blob" feature for just this kind of use case - parsing firewall rules at
> the time. It has never been used for that, but it's still there in
> kernel/usermode_driver.c. Is there a reason why this existing
> functionality can't be used for tasks like PGP parsing as well?
Yes, it was an option I explored last year (I briefly talked about it in a
BoF at LSS NA 2023).
You are right, there is such a feature, and it seemed to fit well: a user
space blob embedded in a kernel module, and therefore signed, with the user
space process connected to the kernel only through a pipe.
I even went ahead and created a framework:
https://lore.kernel.org/linux-kernel/[email protected]/
so that anyone can implement similar use cases.
The next step was: how do I ensure that the process launched by the
kernel is not attacked by root (which I assume to be less powerful than
the kernel in a locked-down system)?
I handled this in both directions:
- The process launched by the kernel is under a seccomp strict profile,
  and can only read/write through the pipe created by the kernel (and
  call a few other system calls, such as exit()); otherwise it is killed.
  It cannot create any new communication channel. (A minimal sketch of
  this profile follows the list below.)
- I created an LSM that denies any attempt to ptrace or signal the
  process launched by the kernel. Jann Horn also suggested making the
  process non-swappable.
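For reference, the first point boils down to something like this in the
usermode blob, before it starts serving requests over the pipe (a minimal
sketch; whether the real framework uses SECCOMP_MODE_STRICT or an
equivalent BPF filter, the effect is the same):

#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <unistd.h>

static void enter_strict_sandbox(void)
{
        /*
         * From here on only read(), write(), _exit() and sigreturn() are
         * permitted; any other system call kills the process.
         */
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) < 0)
                _exit(1);
}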
However, despite these attempts, security people don't feel confident about
offloading a kernel workload outside the kernel.
This is why this work started.
Roberto
On Fri, Feb 16, 2024 at 05:53:01PM +0100, Roberto Sassu wrote:
> On Fri, 2024-02-16 at 16:44 +0000, Matthew Wilcox wrote:
> > On Fri, Feb 16, 2024 at 04:24:33PM +0100, Petr Tesarik wrote:
> > > From: David Howells <[email protected]>
> > >
> > > Implement a PGP data parser for the crypto key type to use when
> > > instantiating a key.
> > >
> > > This parser attempts to parse the instantiation data as a PGP packet
> > > sequence (RFC 4880) and if it parses okay, attempts to extract a public-key
> > > algorithm key or subkey from it.
> >
> > I don't understand why we want to do this in-kernel instead of in
> > userspace and then pass in the actual key.
>
> Sigh, this is a long discussion.
Well, yes. When you don't lay out why this is of value, it turns into a
long discussion. This isn't fun for me either.
> PGP keys would be used as a system-wide trust anchor to verify RPM
> package headers, which already contain file digests that can be used as
> reference values for kernel-enforced integrity appraisal.
The one example we have of usage comes in patch 7 of this series and is:
gpg --dearmor < <PGP key> | keyctl padd asymmetric "" @u
And you're already using two userspace programs there. Why not a third?
gpg --dearmor < <PGP key> | ./scripts/parse-pgp-packets | keyctl padd asymmetric "" @u
> With the assumptions that:
>
> - In a locked-down system the kernel has more privileges than root
> - The kernel cannot offload this task to an user space process due to
> insufficient isolation
>
> the only available option is to do it in the kernel (that is what I got
> as suggestion).
This sounds like there's some other way of getting the key into the
kernel which doesn't rely on userspace. Or are you assuming that nobody
bothered to trojan 'cat'?
On Fri, Feb 16, 2024 at 04:24:33PM +0100, Petr Tesarik wrote:
> From: David Howells <[email protected]>
>
> Implement a PGP data parser for the crypto key type to use when
> instantiating a key.
>
> This parser attempts to parse the instantiation data as a PGP packet
> sequence (RFC 4880) and if it parses okay, attempts to extract a public-key
> algorithm key or subkey from it.
I don't understand why we want to do this in-kernel instead of in
userspace and then pass in the actual key.
On 2/16/2024 7:44 PM, Matthew Wilcox wrote:
> On Fri, Feb 16, 2024 at 05:53:01PM +0100, Roberto Sassu wrote:
>> On Fri, 2024-02-16 at 16:44 +0000, Matthew Wilcox wrote:
>>> On Fri, Feb 16, 2024 at 04:24:33PM +0100, Petr Tesarik wrote:
>>>> From: David Howells <[email protected]>
>>>>
>>>> Implement a PGP data parser for the crypto key type to use when
>>>> instantiating a key.
>>>>
>>>> This parser attempts to parse the instantiation data as a PGP packet
>>>> sequence (RFC 4880) and if it parses okay, attempts to extract a public-key
>>>> algorithm key or subkey from it.
>>>
>>> I don't understand why we want to do this in-kernel instead of in
>>> userspace and then pass in the actual key.
>>
>> Sigh, this is a long discussion.
>
> Well, yes. When you don't lay out why this is of value, it turns into a
> long discussion. This isn't fun for me either.
>
>> PGP keys would be used as a system-wide trust anchor to verify RPM
>> package headers, which already contain file digests that can be used as
>> reference values for kernel-enforced integrity appraisal.
>
> The one example we have of usage comes in patch 7 of this series and is:
>
> gpg --dearmor < <PGP key> | keyctl padd asymmetric "" @u
>
> And you're already using two userspace programs there. Why not a third?
I think this is very easy to answer. Why not extract the public key
from an x509 certificate in user space, send it to the kernel, and
use it for kernel module verification?
> gpg --dearmor < <PGP key> | ./scripts/parse-pgp-packets | keyctl padd asymmetric "" @u
>
>> With the assumptions that:
>>
>> - In a locked-down system the kernel has more privileges than root
>> - The kernel cannot offload this task to an user space process due to
>> insufficient isolation
>>
>> the only available option is to do it in the kernel (that is what I got
>> as suggestion).
>
> This sounds like there's some other way of getting the key into the
> kernel which doesn't rely on userspace. Or are you assuming that nobody
> bothered to trojan 'cat'?
Apologies for not providing the full information at once. I'm worried
that would be too long, and pieces could get lost along the way. If it is
not a problem, I will clarify on request.
Ok, so I'm not going to use cat to upload the PGP keys. They will be
embedded in the kernel image when the Linux distribution vendors build
their kernels.
This works for both secure boot and trusted boot, since the kernel image
can be measured/verified by the boot loader.
Another source for keys is the MOK database, since users might want the
ability to verify their own software, which does not come from the Linux
distribution.
I briefly outlined the full picture earlier, but let me spell it out more explicitly.
The kernel, with the embedded PGP keys, will be able to verify the
signature of the RPM package headers.
A component that I recently developed, the digest_cache LSM, has the
ability to extract file digests from RPM headers and provide a simple
interface for IMA, IPE, BPF LSM and any other component to query the
calculated digest of files being accessed, and allow/deny access to them
depending on whether the query is successful or not.
I already anticipate the question: if parsing PGP keys is a problem, why
isn't parsing RPM package headers a problem as well?
I started looking for a solution before this became available, and the only
alternative I found was to formally verify my code. So I took Frama-C,
wrote the assertions, and verified not only that the code is functionally
correct for well-formed sequences of bytes, but also that there is no
illegal memory access for any arbitrary sequence (unfortunately, I can only
prove this for a small buffer size).
So I will probably do the same for the PGP parser if this does not fly.
But we were very optimistic that this could be a valid alternative!
Roberto
On February 20, 2024 2:55:12 AM PST, Petr Tesarik <[email protected]> wrote:
>On 2/16/2024 6:08 PM, H. Peter Anvin wrote:
>> On February 16, 2024 8:53:01 AM PST, Roberto Sassu <[email protected]> wrote:
>>> On Fri, 2024-02-16 at 16:44 +0000, Matthew Wilcox wrote:
>>>> On Fri, Feb 16, 2024 at 04:24:33PM +0100, Petr Tesarik wrote:
>>>>> From: David Howells <[email protected]>
>>>>>
>>>>> Implement a PGP data parser for the crypto key type to use when
>>>>> instantiating a key.
>>>>>
>>>>> This parser attempts to parse the instantiation data as a PGP packet
>>>>> sequence (RFC 4880) and if it parses okay, attempts to extract a public-key
>>>>> algorithm key or subkey from it.
>>>>
>>>> I don't understand why we want to do this in-kernel instead of in
>>>> userspace and then pass in the actual key.
>>>
>>> Sigh, this is a long discussion.
>>>
>>> PGP keys would be used as a system-wide trust anchor to verify RPM
>>> package headers, which already contain file digests that can be used as
>>> reference values for kernel-enforced integrity appraisal.
>>>
>>> With the assumptions that:
>>>
>>> - In a locked-down system the kernel has more privileges than root
>>> - The kernel cannot offload this task to an user space process due to
>>> insufficient isolation
>>>
>>> the only available option is to do it in the kernel (that is what I got
>>> as suggestion).
>>>
>>> Roberto
>>>
>>>
>>
>> Ok, at least one of those assumptions is false, and *definitely* this approach seems to be a solution in search of a problem.
>
>As a matter of fact, there is some truth to this observation.
>
>The frustrating story of Roberto's PGP parser sparked the idea, but it
>would clearly be overkill to add all this code just for this one parser.
>I started looking around if there are other potential uses of a sandbox
>mode, which might justify the effort. I quickly found out that it is
>difficult to find a self-contained part of the kernel.
>
>Now I believe that these dependencies among different parts of the
>kernel present an issue, both to kernel security and to maintainability
>of the source code. Even if sandbox mode as such is rejected (hopefully
>with an explanation of the reasons), I believe that it is good to split
>the kernel into smaller parts and reduce their interdependencies. In
>this sense, sandbox mode is a way to express and enforce the remaining
>dependencies.
>
>Petr T
Congratulations. You just reinvented the microkernel.
On Wed, 21 Feb 2024 06:02:30 -0800
"H. Peter Anvin" <[email protected]> wrote:
> On February 20, 2024 2:55:12 AM PST, Petr Tesarik <[email protected]> wrote:
> >On 2/16/2024 6:08 PM, H. Peter Anvin wrote:
> >> On February 16, 2024 8:53:01 AM PST, Roberto Sassu <[email protected]> wrote:
> >>> On Fri, 2024-02-16 at 16:44 +0000, Matthew Wilcox wrote:
> >>>> On Fri, Feb 16, 2024 at 04:24:33PM +0100, Petr Tesarik wrote:
> >>>>> From: David Howells <[email protected]>
> >>>>>
> >>>>> Implement a PGP data parser for the crypto key type to use when
> >>>>> instantiating a key.
> >>>>>
> >>>>> This parser attempts to parse the instantiation data as a PGP packet
> >>>>> sequence (RFC 4880) and if it parses okay, attempts to extract a public-key
> >>>>> algorithm key or subkey from it.
> >>>>
> >>>> I don't understand why we want to do this in-kernel instead of in
> >>>> userspace and then pass in the actual key.
> >>>
> >>> Sigh, this is a long discussion.
> >>>
> >>> PGP keys would be used as a system-wide trust anchor to verify RPM
> >>> package headers, which already contain file digests that can be used as
> >>> reference values for kernel-enforced integrity appraisal.
> >>>
> >>> With the assumptions that:
> >>>
> >>> - In a locked-down system the kernel has more privileges than root
> >>> - The kernel cannot offload this task to an user space process due to
> >>> insufficient isolation
> >>>
> >>> the only available option is to do it in the kernel (that is what I got
> >>> as suggestion).
> >>>
> >>> Roberto
> >>>
> >>>
> >>
> >> Ok, at least one of those assumptions is false, and *definitely* this approach seems to be a solution in search of a problem.
> >
> >As a matter of fact, there is some truth to this observation.
> >
> >The frustrating story of Roberto's PGP parser sparked the idea, but it
> >would clearly be overkill to add all this code just for this one parser.
> >I started looking around if there are other potential uses of a sandbox
> >mode, which might justify the effort. I quickly found out that it is
> >difficult to find a self-contained part of the kernel.
> >
> >Now I believe that these dependencies among different parts of the
> >kernel present an issue, both to kernel security and to maintainability
> >of the source code. Even if sandbox mode as such is rejected (hopefully
> >with an explanation of the reasons), I believe that it is good to split
> >the kernel into smaller parts and reduce their interdependencies. In
> >this sense, sandbox mode is a way to express and enforce the remaining
> >dependencies.
> >
> >Petr T
>
> Congratulations. You just reinvented the microkernel.
Oh, I have never claimed that the idea is completely new. There is a
lot of prior research in this field; the most advanced project is
probably Lightweight Execution Domains (LXDs), presented at USENIX ATC
in 2019 [1]. This one even adds a full-blown microkernel...
However, these projects have not gone anywhere, for some reason. I tried
to understand why, and it seems to me that the problem is not the
underlying concept. I believe the main issue is that they would require
extremely intrusive changes in the overall design of the kernel. For
example, LXDs run their microkernel on a dedicated CPU core and use IDL
to generate the glue code which passes data between Linux and this newly
introduced microkernel...
Our development process is more suited to incremental changes. This is
reflected by SandBox Mode. It allows you to start small, keep the existing
overall design, and chip off only a few selected parts.
In short, SBM does borrow some ideas from microkernels but it does not
turn Linux into a microkernel. OTOH it enhances your freedom of
choice. If you change your mind and decide to make a Linux microkernel
after all, SBM will be able to help you during the transition.
Petr T
[1] https://www.usenix.org/system/files/atc19-narayanan.pdf
From: Petr Tesarik <[email protected]>
[ For people newly added to Cc, this RFC is a reply to subsystem
maintainers who asked for a real-world demonstration of how
SandBox Mode could be used in practice. SandBox Mode itself
was proposed in these two series (generic and x86):
* https://lore.kernel.org/lkml/[email protected]/T/
* https://lore.kernel.org/lkml/[email protected]/T/
]
This patch series provides an example of running existing kernel code in
a sandbox. It also adds some fixes and infrastructure to the base series.
If you only want to see what the conversion itself might look like, skip
straight to patch 5/5.
Patches 1 and 2 amend the base patch series. Patches 3 and 4 are ported
from my earlier proof of concept and adapted to work without adding too
much other code. I am sending a complete WIP patch series so you can
actually build and run the code.
Disclaimer: This code is not ready for submission. It is incomplete and
may contain bugs. It is provided here for the sole purpose of demonstrating
how existing kernel code would be modified to run in a sandbox.
PATCH 1/5 is a bug fix discovered after sending patch series v1.
PATCH 2/5 allows mapping a buffer into the sandbox at its kernel address.
PATCH 3/5 is required to intercept calls to pre-selected kernel functions.
PATCH 4/5 implements dynamic allocation in sandbox mode.
PATCH 5/5 demonstrates how to convert existing kernel code to use SBM.
Petr Tesarik (5):
sbm: x86: fix SBM error entry path
sbm: enhance buffer mapping API
sbm: x86: infrastructure to fix up sandbox faults
sbm: fix up calls to dynamic memory allocators
apparmor: parse profiles in sandbox mode
arch/x86/entry/entry_64.S | 10 ++-
arch/x86/kernel/sbm/call_64.S | 20 +++++
arch/x86/kernel/sbm/core.c | 161 +++++++++++++++++++++++++++++++++-
arch/x86/kernel/vmlinux.lds.S | 9 ++
include/linux/sbm.h | 77 ++++++++++++++++
kernel/sbm.c | 34 +++++++
mm/slab_common.c | 3 +-
mm/slub.c | 17 ++--
mm/vmalloc.c | 11 +--
security/apparmor/crypto.c | 7 +-
security/apparmor/policy.c | 29 ++++--
security/apparmor/secid.c | 3 +-
12 files changed, 352 insertions(+), 29 deletions(-)
--
2.34.1
From: Petr Tesarik <[email protected]>
Normal interrupt entry from SBM should generally be treated as entry from
kernel mode (no swapgs, no speculation mitigations), but since there is a
CPL change, the interrupt handler runs on the trampoline stack, which may
get reused if the current task is re-scheduled.
Make sure to switch to the SBM exception stack.
Signed-off-by: Petr Tesarik <[email protected]>
---
arch/x86/entry/entry_64.S | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4ba3eea38102..96830591302d 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1062,14 +1062,20 @@ SYM_CODE_START(error_entry)
/*
* If sandbox mode was active, adjust the saved CS,
* unconditionally switch to kernel CR3 and continue
- * as if the interrupt was from kernel space.
+ * as if the interrupt was from kernel space, but
+ * switch away from the trampoline stack.
*/
movq x86_sbm_state + SBM_kernel_cr3, %rcx
jrcxz .Lerror_swapgs
andb $~3, CS+8(%rsp)
movq %rcx, %cr3
- jmp .Lerror_entry_done_lfence
+
+ FENCE_SWAPGS_KERNEL_ENTRY
+ CALL_DEPTH_ACCOUNT
+ leaq 8(%rsp), %rdi
+ /* Put us onto the SBM exception stack. */
+ jmp sync_regs
#endif
.Lerror_swapgs:
--
2.34.1
From: Petr Tesarik <[email protected]>
Add SBM_MAP_READONLY() and SBM_MAP_WRITABLE() to the public API to allow
mapping kernel buffers directly into the sandbox with no copying.
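As an example of the intended use, the PGP key parser from the earlier
series could map its input blob into the sandbox instead of copying it
(sketch only, mirroring the existing sbm_call() in pgp_key_parse(); the
input is only read, so a read-only mapping is sufficient):

        sbm_init(&sbm);
        ret = sbm_call(&sbm, parse_key,
                       SBM_MAP_READONLY(&sbm, prep->data, prep->datalen),
                       prep->datalen,
                       SBM_COPY_INOUT(&sbm, ctx, sizeof(*ctx)));
        sbm_destroy(&sbm);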
Signed-off-by: Petr Tesarik <[email protected]>
---
include/linux/sbm.h | 71 +++++++++++++++++++++++++++++++++++++++++++++
kernel/sbm.c | 34 ++++++++++++++++++++++
2 files changed, 105 insertions(+)
diff --git a/include/linux/sbm.h b/include/linux/sbm.h
index 98fd27cd58d0..dbdc0781349f 100644
--- a/include/linux/sbm.h
+++ b/include/linux/sbm.h
@@ -181,6 +181,31 @@ static inline void *sbm_add_buf(struct sbm *sbm, struct sbm_buf **list,
#define SBM_COPY_INOUT(sbm, buf, size) \
((typeof(({buf; })))sbm_add_buf((sbm), &(sbm)->io, (buf), (size)))
+/**
+ * sbm_map_readonly() - Map memory for reading.
+ * @sbm: SBM instance.
+ * @ptr: Starting virtual address.
+ * @size: Size in bytes.
+ *
+ * Make the specified virtual address range readable in sandbox code.
+ *
+ * Return: Address of the buffer, or %NULL on error.
+ */
+void *sbm_map_readonly(struct sbm *sbm, const void *ptr, size_t size);
+
+/**
+ * sbm_map_writable() - Map memory for reading and writing.
+ * @sbm: SBM instance.
+ * @ptr: Starting virtual address.
+ * @size: Size in bytes.
+ *
+ * Make the specified virtual address range readable and writable in sandbox
+ * code.
+ *
+ * Return: Address of the buffer, or %NULL on error.
+ */
+void *sbm_map_writable(struct sbm *sbm, const void *ptr, size_t size);
+
#ifdef CONFIG_HAVE_ARCH_SBM
/**
@@ -303,8 +328,54 @@ static inline int sbm_exec(struct sbm *sbm, sbm_func func, void *data)
#define SBM_COPY_OUT(sbm, buf, size) __SBM_EVAL(buf)
#define SBM_COPY_INOUT(sbm, buf, size) __SBM_EVAL(buf)
+static inline void *sbm_map_readonly(struct sbm *sbm, const void *ptr,
+ size_t size)
+{
+ return (void *)ptr;
+}
+
+static inline void *sbm_map_writable(struct sbm *sbm, const void *ptr,
+ size_t size)
+{
+ return (void *)ptr;
+}
+
#endif /* CONFIG_SANDBOX_MODE */
+/**
+ * SBM_MAP_READONLY() - Map an input buffer into SBM.
+ * @sbm: SBM instance.
+ * @buf: Buffer virtual address.
+ * @size: Size of the buffer.
+ *
+ * Make a read-only mapping of buffer in sandbox mode.
+ *
+ * This works with page granularity. If the buffer is not page-aligned,
+ * some data before and/or after the buffer is also mapped into the sandbox.
+ * The mapping does not provide guard pages either.
+ *
+ * Return: Buffer address in sandbox mode (same as kernel mode).
+ */
+#define SBM_MAP_READONLY(sbm, buf, size) \
+ ((typeof(({buf; })))sbm_map_readonly((sbm), (buf), (size)))
+
+/**
+ * SBM_MAP_WRITABLE() - Map an input/output buffer into SBM.
+ * @sbm: SBM instance.
+ * @buf: Buffer virtual address.
+ * @size: Size of the buffer.
+ *
+ * Make a writable mapping of buffer in sandbox mode.
+ *
+ * This works with page granularity. If the buffer is not page-aligned,
+ * some data before and/or after the buffer is also mapped into the sandbox.
+ * The mapping does not provide guard pages either.
+ *
+ * Return: Buffer address in sandbox mode (same as kernel mode).
+ */
+#define SBM_MAP_WRITABLE(sbm, buf, size) \
+ ((typeof(({buf; })))sbm_map_writable((sbm), (buf), (size)))
+
/**
* __SBM_MAP() - Convert parameters to comma-separated expressions.
* @m: Macro used to convert each pair.
diff --git a/kernel/sbm.c b/kernel/sbm.c
index df57184f5d87..c832808b538e 100644
--- a/kernel/sbm.c
+++ b/kernel/sbm.c
@@ -71,6 +71,40 @@ void sbm_destroy(struct sbm *sbm)
}
EXPORT_SYMBOL(sbm_destroy);
+void *sbm_map_readonly(struct sbm *sbm, const void *ptr, size_t size)
+{
+ struct sbm_buf buf;
+
+ if (sbm->error)
+ return NULL;
+
+ buf.sbm_ptr = (void *)ptr;
+ buf.size = size;
+ sbm->error = arch_sbm_map_readonly(sbm, &buf);
+ if (sbm->error)
+ return NULL;
+
+ return buf.sbm_ptr;
+}
+EXPORT_SYMBOL(sbm_map_readonly);
+
+void *sbm_map_writable(struct sbm *sbm, const void *ptr, size_t size)
+{
+ struct sbm_buf buf;
+
+ if (sbm->error)
+ return NULL;
+
+ buf.sbm_ptr = (void *)ptr;
+ buf.size = size;
+ sbm->error = arch_sbm_map_writable(sbm, &buf);
+ if (sbm->error)
+ return NULL;
+
+ return buf.sbm_ptr;
+}
+EXPORT_SYMBOL(sbm_map_writable);
+
/* Copy input buffers into a sandbox. */
static int sbm_copy_in(struct sbm *sbm)
{
--
2.34.1
From: Petr Tesarik <[email protected]>
Since sandbox mode cannot modify kernel data, much of the core API cannot
be used directly. Provide a method to call a known subset of kernel
functions from the sandbox fault handler on behalf of the sandbox code.
Since SBM permissions have page granularity, the code of an intercepted
function must not be in the same page as another function running in
sandbox mode. Provide a __nosbm marker to move the intercepted functions
into a special ELF section, align it to page boundaries and map it so that
it is not executable in sandbox mode. To minimize alignment padding, merge
the __nosbm section with the kernel entry code.
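As an illustration (the fixup table added by this patch is intentionally
left empty; real entries for the memory allocators are added by the next
patch), a fixup for a made-up helper would look roughly like this, assuming
__nosbm expands to a section attribute that places the function into
.text.nosbm:

/*
 * Calling this from sandbox mode faults, because the .text.nosbm pages are
 * not executable there; handle_sbm_fault() then matches the faulting
 * address against fixups[] and re-issues the call via x86_sbm_proxy_call().
 */
static void *__nosbm sandbox_alloc(size_t size)
{
        return kmalloc(size, GFP_KERNEL);
}

static const struct sbm_fixup fixups[] =
{
        { sandbox_alloc, x86_sbm_proxy_call },
        { }
};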
Signed-off-by: Petr Tesarik <[email protected]>
---
arch/x86/kernel/sbm/call_64.S | 20 +++++++++++
arch/x86/kernel/sbm/core.c | 65 +++++++++++++++++++++++++++++++++--
arch/x86/kernel/vmlinux.lds.S | 9 +++++
include/linux/sbm.h | 6 ++++
4 files changed, 98 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/sbm/call_64.S b/arch/x86/kernel/sbm/call_64.S
index 21edce5666bc..6d8ae30a0984 100644
--- a/arch/x86/kernel/sbm/call_64.S
+++ b/arch/x86/kernel/sbm/call_64.S
@@ -93,3 +93,23 @@ SYM_INNER_LABEL(x86_sbm_return, SYM_L_GLOBAL)
pop %rbp
RET
SYM_FUNC_END(x86_sbm_exec)
+
+.text
+
+/*
+ * arguments:
+ * rdi .. state (ignored)
+ * rsi .. target function
+ * rdx .. struct pt_regs
+ */
+SYM_FUNC_START(x86_sbm_proxy_call)
+ mov %rdx, %r10
+ mov %rsi, %r11
+ mov pt_regs_di(%r10), %rdi
+ mov pt_regs_si(%r10), %rsi
+ mov pt_regs_dx(%r10), %rdx
+ mov pt_regs_cx(%r10), %rcx
+ mov pt_regs_r8(%r10), %r8
+ mov pt_regs_r9(%r10), %r9
+ JMP_NOSPEC r11
+SYM_FUNC_END(x86_sbm_proxy_call)
diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
index 296f1fde3c22..c8ac7ecb08cc 100644
--- a/arch/x86/kernel/sbm/core.c
+++ b/arch/x86/kernel/sbm/core.c
@@ -28,6 +28,60 @@ asmlinkage int x86_sbm_exec(struct x86_sbm_state *state, sbm_func func,
unsigned long exc_tos);
extern char x86_sbm_return[];
+extern char __nosbm_text_start[], __nosbm_text_end[];
+
+/*************************************************************
+ * HACK: PROOF-OF-CONCEPT FIXUP CODE STARTS HERE
+ */
+
+typedef unsigned long (*sbm_proxy_call_fn)(struct x86_sbm_state *,
+ unsigned long func,
+ struct pt_regs *);
+
+asmlinkage unsigned long x86_sbm_proxy_call(struct x86_sbm_state *state,
+ unsigned long func,
+ struct pt_regs *regs);
+
+/**
+ * struct sbm_fixup - Describe a sandbox fault fixup.
+ * @target: Target function to be called.
+ * @proxy: Proxy call function.
+ */
+struct sbm_fixup {
+ void *target;
+ sbm_proxy_call_fn proxy;
+};
+
+static const struct sbm_fixup fixups[] =
+{
+ { }
+};
+
+/* Fix up a page fault if it is one of the known exceptions. */
+static bool fixup_sbm_call(struct x86_sbm_state *state,
+ struct pt_regs *regs, unsigned long address)
+{
+ const struct sbm_fixup *fixup;
+
+ for (fixup = fixups; fixup->target; ++fixup) {
+ if (address == (unsigned long)fixup->target) {
+ regs->ax = fixup->proxy(state, address, regs);
+ return true;
+ }
+ }
+
+ return false;
+}
+
+/* Execution in sandbox mode continues here after fixup. */
+static void x86_sbm_continue(void)
+{
+}
+
+/*
+ * HACK: PROOF-OF-CONCEPT FIXUP CODE ENDS HERE
+ *************************************************************/
+
union {
struct x86_sbm_state state;
char page[PAGE_SIZE];
@@ -140,8 +194,8 @@ static int map_kernel(struct x86_sbm_state *state)
if (err)
return err;
- err = map_range(state, (unsigned long)__entry_text_start,
- (unsigned long)__entry_text_end, PAGE_KERNEL_ROX);
+ err = map_range(state, (unsigned long)__nosbm_text_start,
+ (unsigned long)__nosbm_text_end, PAGE_KERNEL_ROX);
if (err)
return err;
@@ -482,6 +536,13 @@ void handle_sbm_fault(struct pt_regs *regs, unsigned long error_code,
if (spurious_sbm_fault(state, error_code, address))
return;
+ if ((error_code & ~X86_PF_PROT) == (X86_PF_USER | X86_PF_INSTR) &&
+ fixup_sbm_call(state, regs, address)) {
+ /* Return back to sandbox... */
+ regs->ip = (unsigned long)x86_sbm_continue;
+ return;
+ }
+
/*
* Force -EFAULT unless the fault was due to a user-mode instruction
* fetch from the designated return address.
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index a349dbfc6d5a..c530a7faaa9a 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -139,8 +139,17 @@ SECTIONS
STATIC_CALL_TEXT
ALIGN_ENTRY_TEXT_BEGIN
+#ifdef CONFIG_SANDBOX_MODE
+ . = ALIGN(PAGE_SIZE);
+ __nosbm_text_start = .;
+#endif
*(.text..__x86.rethunk_untrain)
ENTRY_TEXT
+#ifdef CONFIG_SANDBOX_MODE
+ *(.text.nosbm)
+ . = ALIGN(PAGE_SIZE);
+ __nosbm_text_end = .;
+#endif
#ifdef CONFIG_CPU_SRSO
/*
diff --git a/include/linux/sbm.h b/include/linux/sbm.h
index dbdc0781349f..9d7eb525e489 100644
--- a/include/linux/sbm.h
+++ b/include/linux/sbm.h
@@ -267,6 +267,8 @@ int arch_sbm_map_writable(struct sbm *sbm, const struct sbm_buf *buf);
*/
int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *data);
+#define __nosbm __section(".text.nosbm")
+
#else /* !CONFIG_HAVE_ARCH_SBM */
static inline int arch_sbm_init(struct sbm *sbm)
@@ -295,6 +297,8 @@ static inline int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *data)
return func(data);
}
+#define __nosbm
+
#endif /* CONFIG_HAVE_ARCH_SBM */
#else /* !CONFIG_SANDBOX_MODE */
@@ -340,6 +344,8 @@ static inline void *sbm_map_writable(struct sbm *sbm, const void *ptr,
return (void *)ptr;
}
+#define __nosbm
+
#endif /* CONFIG_SANDBOX_MODE */
/**
--
2.34.1
From: Petr Tesarik <[email protected]>
Add fixup functions to call kmalloc(), vmalloc() and friends on behalf of
the sandbox code.
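From the sandboxed function's point of view nothing changes; allocations are
written as ordinary kernel calls. A purely illustrative sketch (the function
name is made up):

  static int sandboxed_func(void *data)
  {
          /* faults in sandbox mode; the fault handler proxies the call to
           * the kernel allocator and post_alloc() maps the new object */
          char *buf = kmalloc(64, GFP_KERNEL);

          if (!buf)
                  return -ENOMEM;
          buf[0] = 1;     /* writable: post_alloc() maps it PAGE_SHARED */
          kfree(buf);     /* intercepted and proxied as well */
          return 0;
  }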
Signed-off-by: Petr Tesarik <[email protected]>
---
arch/x86/kernel/sbm/core.c | 81 ++++++++++++++++++++++++++++++++++++++
mm/slab_common.c | 3 +-
mm/slub.c | 17 ++++----
mm/vmalloc.c | 11 +++---
4 files changed, 98 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
index c8ac7ecb08cc..3cf3842292b9 100644
--- a/arch/x86/kernel/sbm/core.c
+++ b/arch/x86/kernel/sbm/core.c
@@ -20,6 +20,12 @@
#include <linux/sbm.h>
#include <linux/sched/task_stack.h>
+/*
+ * FIXME: Remove these includes when there is proper API for defining
+ * which functions can be called from sandbox mode.
+ */
+#include <linux/vmalloc.h>
+
#define GFP_SBM_PGTABLE (GFP_KERNEL | __GFP_ZERO)
#define PGD_ORDER get_order(sizeof(pgd_t) * PTRS_PER_PGD)
@@ -52,8 +58,83 @@ struct sbm_fixup {
sbm_proxy_call_fn proxy;
};
+static int map_range(struct x86_sbm_state *state, unsigned long start,
+ unsigned long end, pgprot_t prot);
+
+/* Map the newly allocated dynamic memory region. */
+static unsigned long post_alloc(struct x86_sbm_state *state,
+ unsigned long objp, size_t size)
+{
+ int err;
+
+ if (!objp)
+ return objp;
+
+ err = map_range(state, objp, objp + size, PAGE_SHARED);
+ if (err) {
+ kfree((void *)objp);
+ return 0UL;
+ }
+ return objp;
+}
+
+/* Allocation proxy handler if size is the 1st parameter. */
+static unsigned long proxy_alloc1(struct x86_sbm_state *state,
+ unsigned long func, struct pt_regs *regs)
+{
+ unsigned long objp;
+
+ objp = x86_sbm_proxy_call(state, func, regs);
+ return post_alloc(state, objp, regs->di);
+}
+
+/* Allocation proxy handler if size is the 2nd parameter. */
+static unsigned long proxy_alloc2(struct x86_sbm_state *state,
+ unsigned long func, struct pt_regs *regs)
+{
+ unsigned long objp;
+
+ objp = x86_sbm_proxy_call(state, func, regs);
+ return post_alloc(state, objp, regs->si);
+}
+
+/* Allocation proxy handler if size is the 3rd parameter. */
+static unsigned long proxy_alloc3(struct x86_sbm_state *state,
+ unsigned long func, struct pt_regs *regs)
+{
+ unsigned long objp;
+
+ objp = x86_sbm_proxy_call(state, func, regs);
+ return post_alloc(state, objp, regs->dx);
+}
+
+/* Proxy handler to free previously allocated memory. */
+static unsigned long proxy_free(struct x86_sbm_state *state,
+ unsigned long func, struct pt_regs *regs)
+{
+ /* TODO: unmap allocated addresses from sandbox! */
+ return x86_sbm_proxy_call(state, func, regs);
+}
+
static const struct sbm_fixup fixups[] =
{
+ /* kmalloc() and friends */
+ { kmalloc_trace, proxy_alloc3 },
+ { __kmalloc, proxy_alloc1 },
+ { __kmalloc_node, proxy_alloc1 },
+ { __kmalloc_node_track_caller, proxy_alloc1 },
+ { kmalloc_large, proxy_alloc1 },
+ { kmalloc_large_node, proxy_alloc1 },
+ { krealloc, proxy_alloc2 },
+ { kfree, proxy_free },
+
+ /* vmalloc() and friends */
+ { vmalloc, proxy_alloc1 },
+ { __vmalloc, proxy_alloc1 },
+ { __vmalloc_node, proxy_alloc1 },
+ { vzalloc, proxy_alloc1 },
+ { vfree, proxy_free },
+
{ }
};
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 238293b1dbe1..2b72118d9bfa 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -28,6 +28,7 @@
#include <asm/page.h>
#include <linux/memcontrol.h>
#include <linux/stackdepot.h>
+#include <linux/sbm.h>
#include "internal.h"
#include "slab.h"
@@ -1208,7 +1209,7 @@ __do_krealloc(const void *p, size_t new_size, gfp_t flags)
*
* Return: pointer to the allocated memory or %NULL in case of error
*/
-void *krealloc(const void *p, size_t new_size, gfp_t flags)
+void * __nosbm krealloc(const void *p, size_t new_size, gfp_t flags)
{
void *ret;
diff --git a/mm/slub.c b/mm/slub.c
index 2ef88bbf56a3..5f2290fe4df0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -42,6 +42,7 @@
#include <kunit/test.h>
#include <kunit/test-bug.h>
#include <linux/sort.h>
+#include <linux/sbm.h>
#include <linux/debugfs.h>
#include <trace/events/kmem.h>
@@ -3913,7 +3914,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_node);
* directly to the page allocator. We use __GFP_COMP, because we will need to
* know the allocation order to free the pages properly in kfree.
*/
-static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
+static void * __nosbm __kmalloc_large_node(size_t size, gfp_t flags, int node)
{
struct folio *folio;
void *ptr = NULL;
@@ -3938,7 +3939,7 @@ static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
return ptr;
}
-void *kmalloc_large(size_t size, gfp_t flags)
+void * __nosbm kmalloc_large(size_t size, gfp_t flags)
{
void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);
@@ -3983,26 +3984,26 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node,
return ret;
}
-void *__kmalloc_node(size_t size, gfp_t flags, int node)
+void * __nosbm __kmalloc_node(size_t size, gfp_t flags, int node)
{
return __do_kmalloc_node(size, flags, node, _RET_IP_);
}
EXPORT_SYMBOL(__kmalloc_node);
-void *__kmalloc(size_t size, gfp_t flags)
+void * __nosbm __kmalloc(size_t size, gfp_t flags)
{
return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
}
EXPORT_SYMBOL(__kmalloc);
-void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
- int node, unsigned long caller)
+void * __nosbm __kmalloc_node_track_caller(size_t size, gfp_t flags,
+ int node, unsigned long caller)
{
return __do_kmalloc_node(size, flags, node, caller);
}
EXPORT_SYMBOL(__kmalloc_node_track_caller);
-void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
+void * __nosbm kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE,
_RET_IP_, size);
@@ -4386,7 +4387,7 @@ static void free_large_kmalloc(struct folio *folio, void *object)
*
* If @object is NULL, no operation is performed.
*/
-void kfree(const void *object)
+void __nosbm kfree(const void *object)
{
struct folio *folio;
struct slab *slab;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d12a17fc0c17..d7a5b715ac03 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -40,6 +40,7 @@
#include <linux/pgtable.h>
#include <linux/hugetlb.h>
#include <linux/sched/mm.h>
+#include <linux/sbm.h>
#include <asm/tlbflush.h>
#include <asm/shmparam.h>
@@ -2804,7 +2805,7 @@ void vfree_atomic(const void *addr)
* if we have CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG, but making the calling
* conventions for vfree() arch-dependent would be a really bad idea).
*/
-void vfree(const void *addr)
+void __nosbm vfree(const void *addr)
{
struct vm_struct *vm;
int i;
@@ -3379,7 +3380,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *__vmalloc_node(unsigned long size, unsigned long align,
+void * __nosbm __vmalloc_node(unsigned long size, unsigned long align,
gfp_t gfp_mask, int node, const void *caller)
{
return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
@@ -3394,7 +3395,7 @@ void *__vmalloc_node(unsigned long size, unsigned long align,
EXPORT_SYMBOL_GPL(__vmalloc_node);
#endif
-void *__vmalloc(unsigned long size, gfp_t gfp_mask)
+void * __nosbm __vmalloc(unsigned long size, gfp_t gfp_mask)
{
return __vmalloc_node(size, 1, gfp_mask, NUMA_NO_NODE,
__builtin_return_address(0));
@@ -3413,7 +3414,7 @@ EXPORT_SYMBOL(__vmalloc);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vmalloc(unsigned long size)
+void * __nosbm vmalloc(unsigned long size)
{
return __vmalloc_node(size, 1, GFP_KERNEL, NUMA_NO_NODE,
__builtin_return_address(0));
@@ -3453,7 +3454,7 @@ EXPORT_SYMBOL_GPL(vmalloc_huge);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vzalloc(unsigned long size)
+void * __nosbm vzalloc(unsigned long size)
{
return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE,
__builtin_return_address(0));
--
2.34.1
From: Petr Tesarik <[email protected]>
Run aa_unpack() in sandbox mode. Map the input data read-only and then walk
the resulting list created in sandbox mode.
Hashes are calculated in kernel mode, because crypto routines are not
sandboxed. The fixups should sanitize the parameters of AppArmor functions
and they should be defined close to the target functions. Both require
extending the generic API and adding some more arch hooks, which would grow
this patch series too much.
For demonstration purposes, the fixups blindly trust the input from sandbox
mode and are hard-wired in the arch code.
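As a reading aid, the pattern applied to aa_replace_profiles() below boils
down to the following condensed sketch (error handling omitted; the required
SBM_DEFINE_CALL()/SBM_DEFINE_THUNK() declarations are in the hunk itself):

  struct sbm sbm;
  struct list_head lh, *sbm_lh;

  sbm_init(&sbm);
  SBM_MAP_READONLY(&sbm, udata->data, udata->size);  /* untrusted input */
  sbm_lh = SBM_COPY_OUT(&sbm, &lh, sizeof(lh));      /* result list in SBM */
  INIT_LIST_HEAD(sbm_lh);
  error = sbm_call(&sbm, aa_unpack,
                   SBM_COPY_INOUT(&sbm, udata, sizeof(*udata)), sbm_lh,
                   SBM_COPY_OUT(&sbm, &ns_name, sizeof(ns_name)));
  /* walk sbm_lh in kernel mode, calculating hashes as needed */
  sbm_destroy(&sbm);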
Signed-off-by: Petr Tesarik <[email protected]>
---
arch/x86/kernel/sbm/core.c | 15 +++++++++++++++
security/apparmor/crypto.c | 7 ++++---
security/apparmor/policy.c | 29 ++++++++++++++++++++++-------
security/apparmor/secid.c | 3 ++-
4 files changed, 43 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
index 3cf3842292b9..3268c00da873 100644
--- a/arch/x86/kernel/sbm/core.c
+++ b/arch/x86/kernel/sbm/core.c
@@ -40,6 +40,16 @@ extern char __nosbm_text_start[], __nosbm_text_end[];
* HACK: PROOF-OF-CONCEPT FIXUP CODE STARTS HERE
*/
+/*
+ * HACK: These declarations are needed to make a proxy call, but in the
+ * final version, AppArmor itself will define the proxies. At least, do
+ * not make the functions callable from here. All we need are their
+ * entry point addresses.
+ */
+extern char aa_alloc_secid[];
+extern char aa_calc_hash[];
+extern char aa_calc_profile_hash[];
+
typedef unsigned long (*sbm_proxy_call_fn)(struct x86_sbm_state *,
unsigned long func,
struct pt_regs *);
@@ -135,6 +145,11 @@ static const struct sbm_fixup fixups[] =
{ vzalloc, proxy_alloc1 },
{ vfree, proxy_free },
+ /* AppArmor */
+ { aa_alloc_secid, x86_sbm_proxy_call },
+ { aa_calc_hash, x86_sbm_proxy_call },
+ { aa_calc_profile_hash, x86_sbm_proxy_call },
+
{ }
};
diff --git a/security/apparmor/crypto.c b/security/apparmor/crypto.c
index aad486b2fca6..db349cd4e467 100644
--- a/security/apparmor/crypto.c
+++ b/security/apparmor/crypto.c
@@ -12,6 +12,7 @@
*/
#include <crypto/hash.h>
+#include <linux/sbm.h>
#include "include/apparmor.h"
#include "include/crypto.h"
@@ -25,7 +26,7 @@ unsigned int aa_hash_size(void)
return apparmor_hash_size;
}
-char *aa_calc_hash(void *data, size_t len)
+char * __nosbm aa_calc_hash(void *data, size_t len)
{
SHASH_DESC_ON_STACK(desc, apparmor_tfm);
char *hash;
@@ -58,8 +59,8 @@ char *aa_calc_hash(void *data, size_t len)
return ERR_PTR(error);
}
-int aa_calc_profile_hash(struct aa_profile *profile, u32 version, void *start,
- size_t len)
+int __nosbm aa_calc_profile_hash(struct aa_profile *profile, u32 version, void *start,
+ size_t len)
{
SHASH_DESC_ON_STACK(desc, apparmor_tfm);
int error;
diff --git a/security/apparmor/policy.c b/security/apparmor/policy.c
index 957654d253dd..f2b9bf851be0 100644
--- a/security/apparmor/policy.c
+++ b/security/apparmor/policy.c
@@ -74,6 +74,7 @@
#include <linux/cred.h>
#include <linux/rculist.h>
#include <linux/user_namespace.h>
+#include <linux/sbm.h>
#include "include/apparmor.h"
#include "include/capability.h"
@@ -1040,6 +1041,11 @@ static struct aa_profile *update_to_newest_parent(struct aa_profile *new)
return newest;
}
+static SBM_DEFINE_CALL(aa_unpack, struct aa_loaddata *, udata,
+ struct list_head *, lh, const char **, ns);
+static SBM_DEFINE_THUNK(aa_unpack, struct aa_loaddata *, udata,
+ struct list_head *, lh, const char **, ns);
+
/**
* aa_replace_profiles - replace profile(s) on the profile list
* @policy_ns: namespace load is occurring on
@@ -1063,12 +1069,20 @@ ssize_t aa_replace_profiles(struct aa_ns *policy_ns, struct aa_label *label,
struct aa_loaddata *rawdata_ent;
const char *op;
ssize_t count, error;
- LIST_HEAD(lh);
+ struct list_head lh, *sbm_lh;
+ struct sbm sbm;
op = mask & AA_MAY_REPLACE_POLICY ? OP_PROF_REPL : OP_PROF_LOAD;
aa_get_loaddata(udata);
/* released below */
- error = aa_unpack(udata, &lh, &ns_name);
+ sbm_init(&sbm);
+ SBM_MAP_READONLY(&sbm, udata->data, udata->size);
+ /* TODO: Handling of list heads could be improved */
+ sbm_lh = SBM_COPY_OUT(&sbm, &lh, sizeof(lh));
+ INIT_LIST_HEAD(sbm_lh);
+ error = sbm_call(&sbm, aa_unpack,
+ SBM_COPY_INOUT(&sbm, udata, sizeof(*udata)), sbm_lh,
+ SBM_COPY_OUT(&sbm, &ns_name, sizeof(ns_name)));
if (error)
goto out;
@@ -1078,7 +1092,7 @@ ssize_t aa_replace_profiles(struct aa_ns *policy_ns, struct aa_label *label,
* fail. Sort ent list and take ns locks in hierarchy order
*/
count = 0;
- list_for_each_entry(ent, &lh, list) {
+ list_for_each_entry(ent, sbm_lh, list) {
if (ns_name) {
if (ent->ns_name &&
strcmp(ent->ns_name, ns_name) != 0) {
@@ -1128,7 +1142,7 @@ ssize_t aa_replace_profiles(struct aa_ns *policy_ns, struct aa_label *label,
}
}
/* setup parent and ns info */
- list_for_each_entry(ent, &lh, list) {
+ list_for_each_entry(ent, sbm_lh, list) {
struct aa_policy *policy;
struct aa_profile *p;
@@ -1159,7 +1173,7 @@ ssize_t aa_replace_profiles(struct aa_ns *policy_ns, struct aa_label *label,
policy = __lookup_parent(ns, ent->new->base.hname);
if (!policy) {
/* first check for parent in the load set */
- p = __list_lookup_parent(&lh, ent->new);
+ p = __list_lookup_parent(sbm_lh, ent->new);
if (!p) {
/*
* fill in missing parent with null
@@ -1198,7 +1212,7 @@ ssize_t aa_replace_profiles(struct aa_ns *policy_ns, struct aa_label *label,
goto fail_lock;
}
}
- list_for_each_entry(ent, &lh, list) {
+ list_for_each_entry(ent, sbm_lh, list) {
if (!ent->old) {
struct dentry *parent;
if (rcu_access_pointer(ent->new->parent)) {
@@ -1220,7 +1234,7 @@ ssize_t aa_replace_profiles(struct aa_ns *policy_ns, struct aa_label *label,
__aa_bump_ns_revision(ns);
if (aa_g_export_binary)
__aa_loaddata_update(udata, ns->revision);
- list_for_each_entry_safe(ent, tmp, &lh, list) {
+ list_for_each_entry_safe(ent, tmp, sbm_lh, list) {
list_del_init(&ent->list);
op = (!ent->old && !ent->rename) ? OP_PROF_LOAD : OP_PROF_REPL;
@@ -1265,6 +1279,7 @@ ssize_t aa_replace_profiles(struct aa_ns *policy_ns, struct aa_label *label,
mutex_unlock(&ns->lock);
out:
+ sbm_destroy(&sbm);
aa_put_ns(ns);
aa_put_loaddata(udata);
kfree(ns_name);
diff --git a/security/apparmor/secid.c b/security/apparmor/secid.c
index 83d3d1e6d9dc..4190666d2dee 100644
--- a/security/apparmor/secid.c
+++ b/security/apparmor/secid.c
@@ -16,6 +16,7 @@
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/xarray.h>
+#include <linux/sbm.h>
#include "include/cred.h"
#include "include/lib.h"
@@ -116,7 +117,7 @@ void apparmor_release_secctx(char *secdata, u32 seclen)
* Returns: 0 with @label->secid initialized
* <0 returns error with @label->secid set to AA_SECID_INVALID
*/
-int aa_alloc_secid(struct aa_label *label, gfp_t gfp)
+int __nosbm aa_alloc_secid(struct aa_label *label, gfp_t gfp)
{
unsigned long flags;
int ret;
--
2.34.1
On 2/22/24 05:12, Petr Tesarik wrote:
> static const struct sbm_fixup fixups[] =
> {
> + /* kmalloc() and friends */
> + { kmalloc_trace, proxy_alloc3 },
> + { __kmalloc, proxy_alloc1 },
> + { __kmalloc_node, proxy_alloc1 },
> + { __kmalloc_node_track_caller, proxy_alloc1 },
> + { kmalloc_large, proxy_alloc1 },
> + { kmalloc_large_node, proxy_alloc1 },
> + { krealloc, proxy_alloc2 },
> + { kfree, proxy_free },
> +
> + /* vmalloc() and friends */
> + { vmalloc, proxy_alloc1 },
> + { __vmalloc, proxy_alloc1 },
> + { __vmalloc_node, proxy_alloc1 },
> + { vzalloc, proxy_alloc1 },
> + { vfree, proxy_free },
> +
> { }
> };
Petr, thanks for sending this. This _is_ a pretty concise example of
what it means to convert kernel code to run in your sandbox mode. But,
from me, it's still "no thanks".
Establishing and maintaining this proxy list will be painful. Folks
will change the code to call something new and break this *constantly*.
That goes for infrastructure like the allocators and for individual
sandbox instances like apparmor.
It's also telling that sandboxing a bit of apparmor took four fixups.
That tells me we're probably still only looking at the tip of the iceberg
if we were to convert a bunch more sites.
That's on top of everything I was concerned about before.
On 2/22/24 09:57, Petr Tesařík wrote:
> * Hardware designers are adding (other) hardware security defenses to
> ring-0 that are not applied to ring-3.
>
> Could you give an example of these other security defenses, please?
Here's one example:
> https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/data-dependent-prefetcher.html
"DDP is neither trained by nor triggered by supervisor-mode accesses."
But seriously, this is going to be my last message on this topic. I
appreciate your enthusiasm, but I don't see any viable way forward for
this approach.
On Thu, 22 Feb 2024 07:51:00 -0800
Dave Hansen <[email protected]> wrote:
> On 2/22/24 05:12, Petr Tesarik wrote:
> > static const struct sbm_fixup fixups[] =
> > {
> > + /* kmalloc() and friends */
> > + { kmalloc_trace, proxy_alloc3 },
> > + { __kmalloc, proxy_alloc1 },
> > + { __kmalloc_node, proxy_alloc1 },
> > + { __kmalloc_node_track_caller, proxy_alloc1 },
> > + { kmalloc_large, proxy_alloc1 },
> > + { kmalloc_large_node, proxy_alloc1 },
> > + { krealloc, proxy_alloc2 },
> > + { kfree, proxy_free },
> > +
> > + /* vmalloc() and friends */
> > + { vmalloc, proxy_alloc1 },
> > + { __vmalloc, proxy_alloc1 },
> > + { __vmalloc_node, proxy_alloc1 },
> > + { vzalloc, proxy_alloc1 },
> > + { vfree, proxy_free },
> > +
> > { }
> > };
>
> Petr, thanks for sending this. This _is_ a pretty concise example of
> what it means to convert kernel code to run in your sandbox mode. But,
> from me, it's still "no thanks".
>
> Establishing and maintaining this proxy list will be painful. Folks
> will change the code to call something new and break this *constantly*.
>
> That goes for infrastructure like the allocators and for individual
> sandbox instances like apparmor.
Understood.
OTOH the proxy list is here for the PoC so I could send something that
builds and runs without making it an overly big patch series. As
explained in patch 5/5, the goal is not to make a global list. Instead,
each instance should define what it needs and that way define its
specific policy of interfacing with the rest of the kernel.
To give an example, these AppArmor fixups would be added only to the
sandbox which runs aa_unpack(), but not to the one which runs
unpack_to_rootfs(), which is another PoC I did (but required porting
more patches).
If more fixups are needed after you change your code, you know you've
just added a new dependency. It's then up to you to decide if it was
intentional.
> It's also telling that sandboxing a bit of apparmor took four fixups.
> That tells me we're probably still only looking at the tip of the iceberg
> if we were to convert a bunch more sites.
Yes, it is the cost paid for getting code and data flows under control.
In your opinion, this kind of memory safety is not worth the effort of
explicitly defining the interface between a sandboxed component and the
rest of the kernel, because it increases maintenance costs. Correct?
> That's on top of everything I was concerned about before.
Good, I think I can understand the new concern, but regarding
everything you were concerned about before, this part is still not
quite clear to me. I'll try to summarize the points:
* Running code in ring-0 is inherently safer than running code in
ring-3.
Since what I'm trying to do is protect kernel data structures
from memory safety bugs in another part of the kernel, it roughly
translates to: "Kernel data structures are better protected from
rogue kernel modules than from userspace applications." This cannot
possibly be what you are trying to say.
* SMAP, SMEP and/or LASS can somehow protect one part of the kernel
from memory safety bugs in another part of the kernel.
I somehow can't see how that is the case. I have always thought that:
* SMEP prevents the kernel from executing code in user pages.
* SMAP prevents the kernel from reading from or writing to user pages.
* LASS does pretty much the same job as SMEP+SMAP, but instead of
using page table protection bits, it uses the highest bit of the
virtual address because that's much faster.
* Hardware designers are adding (other) hardware security defenses to
ring-0 that are not applied to ring-3.
Could you give an example of these other security defenses, please?
* Ring-3 is more exposed to attacks.
This statement sounds a bit too vague on its own. What attack vectors
are we talking about? The primary attack vector that SBM is trying to
address is the exploitation of kernel code vulnerabilities triggered by data
from sources outside the kernel (boot loader, userspace, etc.).
H. Peter Anvin added a few other points:
* SBM has all the downsides of a microkernel without the upsides.
I can only guess what the downsides and upsides would be...
One notorious downside is performance. Agreed, there is some overhead.
I'm not promoting SBM for time-critical operations. But compared to
user-mode helpers (which was suggested as an alternative for one of
the proposed scenarios), the overhead of SBM is at least an order of
magnitude less.
IPC and the need to define how servers interact with each other is
another downside I can think of. Yes, there is a bit of it in SBM, as
you have correctly noted above.
* SBM introduces architectural changes that are most definitely *very*
harmful both to maintainers and users.
It is very difficult to learn anything from this statement. Could
you give some examples of how SBM harms either group, please?
* SBM feels like paravirtualization all over again.
All right, hpa, you've had lots of pain with paravirtualization. I
feel with you, I've had my part of it too. Can you imagine how much
trouble I could have spared myself for the libkdumpfile project if I
didn't have to deal with the difference between "physical addresses"
and "machine addresses"?
However, this is hardly a relevant point. The Linux kernel community
is respected for making decisions based on facts, not feelings.
* SBM exposes kernel memory to user space.
This is a misunderstanding. Sandbox mode does not share anything at
all with user mode. It does share some CPU state with kernel mode,
but not with user mode. If "user space" was intended to mean "Ring-3",
then it doesn't explain how that is a really bad idea.
* SBM is not needed, because there is already eBPF.
Well, yes, but I believe they work on a different level. For example,
eBPF needs a verifier to ensure memory safety. If you run eBPF code
itself in a sandbox instead, that verifier is not needed, because
memory safety is enforced by CPU hardware.
When hpa says that SandBox Mode is "an enormous step in the wrong
direction", I want to understand why this direction is wrong, so I can
take a step in the right direction next time.
So far there has been only one objective concern: the need to track code
(and data) dependencies explicitly. AFAICS this is an inherent drawback
of any kind of program decomposition.
Is decomposition considered harmful?
Petr T
On Fri, 2024-02-16 at 20:54 +0100, Roberto Sassu wrote:
> On 2/16/2024 7:44 PM, Matthew Wilcox wrote:
> > On Fri, Feb 16, 2024 at 05:53:01PM +0100, Roberto Sassu wrote:
> > > On Fri, 2024-02-16 at 16:44 +0000, Matthew Wilcox wrote:
> > > > On Fri, Feb 16, 2024 at 04:24:33PM +0100, Petr Tesarik wrote:
> > > > > From: David Howells <[email protected]>
> > > > >
> > > > > Implement a PGP data parser for the crypto key type to use when
> > > > > instantiating a key.
> > > > >
> > > > > This parser attempts to parse the instantiation data as a PGP packet
> > > > > sequence (RFC 4880) and if it parses okay, attempts to extract a public-key
> > > > > algorithm key or subkey from it.
> > > >
> > > > I don't understand why we want to do this in-kernel instead of in
> > > > userspace and then pass in the actual key.
> > >
> > > Sigh, this is a long discussion.
> >
> > Well, yes. When you don't lay out why this is of value, it turns into a
> > long discussion. This isn't fun for me either.
> >
> > > PGP keys would be used as a system-wide trust anchor to verify RPM
> > > package headers, which already contain file digests that can be used as
> > > reference values for kernel-enforced integrity appraisal.
> >
> > The one example we have of usage comes in patch 7 of this series and is:
> >
> > gpg --dearmor < <PGP key> | keyctl padd asymmetric "" @u
> >
> > And you're already using two userspace programs there. Why not a third?
>
> I think this is very easy to answer. Why not extract the public key
> from an x509 certificate in user space, send it to the kernel, and
> use it for kernel module verification?
>
> > gpg --dearmor < <PGP key> | ./scripts/parse-pgp-packets | keyctl padd asymmetric "" @u
> >
> > > With the assumptions that:
> > >
> > > - In a locked-down system the kernel has more privileges than root
> > > - The kernel cannot offload this task to a user space process due to
> > > insufficient isolation
> > >
> > > the only available option is to do it in the kernel (that is what I got
> > > as suggestion).
> >
> > This sounds like there's some other way of getting the key into the
> > kernel which doesn't rely on userspace. Or are you assuming that nobody
> > bothered to trojan 'cat'?
>
> Apologies for not providing the full information at once. I'm worried
> that would be too long, and pieces could be lost along the way. If it is not
> a problem, I'm going to clarify on request.
>
> Ok, so, I'm not going to use cat to upload the PGP keys. These will be
> embedded in the kernel image, when the Linux distribution vendors build
> their kernel.
>
> This works for both secure boot and trusted boot, since the kernel image
> can be measured/verified by the boot loader.
>
> Another source for keys is the MOK database, since users might want the
> ability to verify their own software, which does not come from the Linux
> distribution.
>
> I briefly anticipated the full picture, but I will tell it more explicitly.
>
> The kernel, with the embedded PGP keys, will be able to verify the
> signature of the RPM package headers.
>
> A component that I recently developed, the digest_cache LSM, has the
> ability to extract file digests from RPM headers and provide a simple
> interface for IMA, IPE, BPF LSM and any other component to query the
> calculated digest of files being accessed, and allow/deny access to them
> depending on whether the query is successful or not.
Not the proper thread, but since we talked about it...
I finally put together the PGP key and signature parser, the
digest_cache LSM and the IMA integration patch sets, and built three
openSUSE Tumbleweed packages (kernel, digest-cache-tools, dracut),
which basically enable the integrity features that the kernel (IMA)
supports:
- IMA measurement list with RPM headers and (eventually) unknown files
that are not from Tumbleweed;
- Ability to obtain a deterministic TPM PCR value, suitable for sealing
of TPM keys
- Out of the box integrity enforcement on executable code, based on the
provenance from openSUSE Tumbleweed; nothing else is required other
than those three packages
An introduction and a guide with configuration steps can be found at:
https://github.com/linux-integrity/digest-cache-tools
I would also appreciate your comments.
Thanks
Roberto
> I already anticipate the question: if parsing PGP keys is a problem,
> why isn't parsing RPM package headers a problem as well?
>
> I started finding a solution before this became available, and the only
> alternative I found was to formally verify my code. So, I took Frama-C,
> wrote the assertions, and verified not only that the code is
> functionally correct for valid sequences of bytes, but also that there is
> no illegal memory access for any arbitrary sequence (unfortunately, I
> can only prove this for a small buffer size).
>
> So, I'm probably going to do the same for the PGP parser, if this does
> not fly. But, we were very optimistic that this could be a valid
> alternative!
>
> Roberto