Subject: Re: [PATCH] MIPS: usercopy: Implement stack frame object validation
To: Kees Cook
CC: Ralf Baechle, Linux MIPS Mailing List, kernel-hardening@lists.openwall.com,
    LKML, James Hogan, Paul Burton, Josh Poimboeuf
References: <1502195022-18161-1-git-send-email-matt.redfearn@imgtec.com>
From: Matt Redfearn
Message-ID: <28ab1c38-f8a7-3fca-7a5a-e44248bec69f@imgtec.com>
Date: Thu, 10 Aug 2017 09:24:38 +0100

Hi Kees,

On 08/08/17 20:11, Kees Cook wrote:
> On Tue, Aug 8, 2017 at 5:23 AM, Matt Redfearn wrote:
>> This implements arch_within_stack_frames() for MIPS, which validates
>> that an object is wholly contained by a single kernel stack frame.
>>
>> With CONFIG_HARDENED_USERCOPY enabled, MIPS now passes the LKDTM tests
>> USERCOPY_STACK_FRAME_TO, USERCOPY_STACK_FRAME_FROM and
>> USERCOPY_STACK_BEYOND on a Creator Ci40.
>>
>> Since the MIPS kernel does not use frame pointers, we re-use the MIPS
>> kernel's stack frame unwinder, which uses instruction inspection to
>> deduce the stack frame size. As such it introduces a larger performance
>> penalty than on arches which use the frame pointer.
> Hmm, given x86's plans to drop the frame pointer, I wonder if the
> inter-frame checking code should be gated by a CONFIG. This (3%) is a
> rather high performance hit to take for a relatively small protection
> (it's mainly about catching too-large reads, since most too-large
> writes will be caught by the stack canary).
>
> What do you think?

If x86 is going to move to a more expensive stack-unwinding method than
the frame pointer, then I guess it may end up seeing a similar
performance hit to what we see on MIPS. In that case it might make sense
to add a CONFIG for this, such that only those who wish to trade
performance for the added protection need enable it.

Thanks,
Matt

> -Kees
>
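Concretely, such a gate could collapse the per-frame check to the "unable
to determine" case when disabled, leaving only the generic stack-extents
check. A minimal, self-contained sketch of the compile-time pattern;
HARDENED_USERCOPY_STACK_FRAMES is a hypothetical symbol invented for
illustration (nothing in this patch defines it), and the enum values are
demo stand-ins for the return codes documented in the patch below:

#include <stdio.h>

/* Demo stand-ins for the return codes described in the patch below. */
enum { NOT_STACK = 0, GOOD_FRAME = 1 };

/*
 * Build with -DHARDENED_USERCOPY_STACK_FRAMES to select the (stubbed)
 * expensive per-frame walk; without it, the cheap default applies.
 */
#ifdef HARDENED_USERCOPY_STACK_FRAMES
static int within_stack_frames_demo(void)
{
	/* The instruction-inspecting frame walk would run here. */
	return GOOD_FRAME;
}
#else
static int within_stack_frames_demo(void)
{
	/* Unable to determine: callers fall back to coarser checks. */
	return NOT_STACK;
}
#endif

int main(void)
{
	printf("frame check result: %d\n", within_stack_frames_demo());
	return 0;
}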
>> On qemu, before this patch, hackbench gives:
>> Running with 10*40 (== 400) tasks.
>> Time: 5.484
>> Running with 10*40 (== 400) tasks.
>> Time: 4.039
>> Running with 10*40 (== 400) tasks.
>> Time: 3.908
>> Running with 10*40 (== 400) tasks.
>> Time: 3.955
>> Running with 10*40 (== 400) tasks.
>> Time: 4.185
>> Running with 10*40 (== 400) tasks.
>> Time: 4.497
>> Running with 10*40 (== 400) tasks.
>> Time: 3.980
>> Running with 10*40 (== 400) tasks.
>> Time: 4.078
>> Running with 10*40 (== 400) tasks.
>> Time: 4.219
>> Running with 10*40 (== 400) tasks.
>> Time: 4.026
>>
>> Giving an average of 4.2371
>>
>> With this patch, hackbench gives:
>> Running with 10*40 (== 400) tasks.
>> Time: 5.671
>> Running with 10*40 (== 400) tasks.
>> Time: 4.282
>> Running with 10*40 (== 400) tasks.
>> Time: 4.101
>> Running with 10*40 (== 400) tasks.
>> Time: 4.040
>> Running with 10*40 (== 400) tasks.
>> Time: 4.683
>> Running with 10*40 (== 400) tasks.
>> Time: 4.387
>> Running with 10*40 (== 400) tasks.
>> Time: 4.289
>> Running with 10*40 (== 400) tasks.
>> Time: 4.027
>> Running with 10*40 (== 400) tasks.
>> Time: 4.048
>> Running with 10*40 (== 400) tasks.
>> Time: 4.079
>>
>> Giving an average of 4.3607
>>
>> This indicates an additional 3% overhead for inspecting the kernel stack
>> when CONFIG_HARDENED_USERCOPY is enabled.
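The quoted figures check out: the two means are 4.2371 and 4.3607, and
(4.3607 - 4.2371) / 4.2371 is about 2.9%, i.e. the ~3% stated above. A
throwaway verification (plain C, nothing kernel-specific):

#include <stdio.h>

/* Hackbench times copied from the runs quoted above. */
static const double before[] = { 5.484, 4.039, 3.908, 3.955, 4.185,
				 4.497, 3.980, 4.078, 4.219, 4.026 };
static const double after[]  = { 5.671, 4.282, 4.101, 4.040, 4.683,
				 4.387, 4.289, 4.027, 4.048, 4.079 };

static double mean(const double *v, int n)
{
	double sum = 0.0;

	for (int i = 0; i < n; i++)
		sum += v[i];
	return sum / n;
}

int main(void)
{
	double b = mean(before, 10);	/* 4.2371 */
	double a = mean(after, 10);	/* 4.3607 */

	printf("before %.4f, after %.4f, overhead %.1f%%\n",
	       b, a, 100.0 * (a - b) / b);
	return 0;
}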
>> This patch is based on Linux v4.13-rc4, and for correct operation on
>> microMIPS depends on my series "MIPS: Further microMIPS stack unwinding
>> fixes".
>>
>> Signed-off-by: Matt Redfearn
>> Reviewed-by: James Hogan
>> ---
>>
>>  arch/mips/Kconfig                   |  1 +
>>  arch/mips/include/asm/thread_info.h | 74 +++++++++++++++++++++++++++++++++++++
>>  2 files changed, 75 insertions(+)
>>
>> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
>> index 8dd20358464f..6cbf2d525c8d 100644
>> --- a/arch/mips/Kconfig
>> +++ b/arch/mips/Kconfig
>> @@ -35,6 +35,7 @@ config MIPS
>>  	select HAVE_ARCH_SECCOMP_FILTER
>>  	select HAVE_ARCH_TRACEHOOK
>>  	select HAVE_ARCH_TRANSPARENT_HUGEPAGE if CPU_SUPPORTS_HUGEPAGES && 64BIT
>> +	select HAVE_ARCH_WITHIN_STACK_FRAMES if KALLSYMS
>>  	select HAVE_CBPF_JIT if (!64BIT && !CPU_MICROMIPS)
>>  	select HAVE_EBPF_JIT if (64BIT && !CPU_MICROMIPS)
>>  	select HAVE_CC_STACKPROTECTOR
>> diff --git a/arch/mips/include/asm/thread_info.h b/arch/mips/include/asm/thread_info.h
>> index b439e512792b..931652460393 100644
>> --- a/arch/mips/include/asm/thread_info.h
>> +++ b/arch/mips/include/asm/thread_info.h
>> @@ -14,6 +14,80 @@
>>  
>>  #include <asm/processor.h>
>>  
>> +#ifdef CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES
>> +
>> +/*
>> + * Walks up the stack frames to make sure that the specified object is
>> + * entirely contained by a single stack frame.
>> + *
>> + * Returns:
>> + *	GOOD_FRAME	if within a frame
>> + *	BAD_STACK	if placed across a frame boundary (or outside stack)
>> + *	NOT_STACK	unable to determine
>> + */
>> +static inline int arch_within_stack_frames(const void *const stack,
>> +					   const void *const stackend,
>> +					   const void *obj, unsigned long len)
>> +{
>> +	/* Avoid header recursion by just declaring this here */
>> +	extern unsigned long unwind_stack_by_address(
>> +					unsigned long stack_page,
>> +					unsigned long *sp,
>> +					unsigned long pc,
>> +					unsigned long *ra);
>> +	unsigned long sp, lastsp, ra, pc;
>> +	int skip_frames;
>> +
>> +	/* Get this frame's details */
>> +	sp = (unsigned long)__builtin_frame_address(0);
>> +	pc = (unsigned long)current_text_addr();
>> +
>> +	/*
>> +	 * Skip initial frames to get back to the function requesting the
>> +	 * copy. Unwind the frames of:
>> +	 *	arch_within_stack_frames (inlined into check_stack_object)
>> +	 *	__check_object_size
>> +	 * This leaves sp & pc in the frame associated with
>> +	 * copy_{to,from}_user() (inlined into do_usercopy_stack)
>> +	 */
>> +	for (skip_frames = 0; skip_frames < 2; skip_frames++) {
>> +		pc = unwind_stack_by_address((unsigned long)stack, &sp,
>> +					     pc, &ra);
>> +		if (!pc)
>> +			return BAD_STACK;
>> +	}
>> +
>> +	if ((unsigned long)obj < sp) {
>> +		/* obj is not in the frame of the requestor or its callers */
>> +		return BAD_STACK;
>> +	}
>> +
>> +	/*
>> +	 * low ---------------------------------------> high
>> +	 * [local vars][saved regs][ra][local vars']
>> +	 *  ^                          ^
>> +	 *  lastsp                     sp
>> +	 *  ^-------------------------^
>> +	 *   allow copies only within here
>> +	 */
>> +	do {
>> +		lastsp = sp;
>> +		pc = unwind_stack_by_address((unsigned long)stack, &sp,
>> +					     pc, &ra);
>> +		if ((((unsigned long)obj) >= lastsp) &&
>> +		    (((unsigned long)obj + len) <= (sp - sizeof(void *)))) {
>> +			/* obj is entirely within this stack frame */
>> +			return GOOD_FRAME;
>> +		}
>> +	} while (pc);
>> +
>> +	/*
>> +	 * We can't unwind any further. If we haven't found the object
>> +	 * entirely within one of our callers' frames, it must be a bad
>> +	 * object.
>> +	 */
>> +	return BAD_STACK;
>> +}
>> +
>> +#endif /* CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES */
>> +
>>  /*
>>   * low level task data that entry.S needs immediate access to
>>   * - this struct should fit entirely inside of one cache line
>> --
>> 2.7.4
>>
>
>
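As a footnote, the containment predicate at the heart of the do/while
loop above can be exercised stand-alone. A minimal user-space sketch,
with frame boundaries supplied by hand rather than by
unwind_stack_by_address(), and all addresses invented for the example:

#include <stdio.h>

/*
 * An object [obj, obj + len) is accepted only if it lies entirely
 * between the callee's stack pointer (lastsp) and the caller's stack
 * pointer (sp) minus one pointer-sized slot, mirroring the check in
 * the patch above.
 */
static int object_within_frame(unsigned long obj, unsigned long len,
			       unsigned long lastsp, unsigned long sp)
{
	return obj >= lastsp && obj + len <= sp - sizeof(void *);
}

int main(void)
{
	unsigned long lastsp = 0x1000;	/* invented callee sp */
	unsigned long sp = 0x1080;	/* invented caller sp */

	/* Entirely inside the frame: accepted (prints 1). */
	printf("%d\n", object_within_frame(0x1010, 0x20, lastsp, sp));
	/* Crosses into the caller's frame: rejected (prints 0). */
	printf("%d\n", object_within_frame(0x1060, 0x40, lastsp, sp));
	return 0;
}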