Date: Fri, 19 Jul 2019 15:28:17 +0200
Message-Id: <20190719132818.40258-1-elver@google.com>
Mime-Version: 1.0
X-Mailer: git-send-email 2.22.0.657.g960e92d24f-goog
Subject: [PATCH 1/2] kernel/fork: Add support for stack-end guard page
From: Marco Elver <elver@google.com>
To: elver@google.com
Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, "H. Peter Anvin", Andrey Ryabinin,
	Alexander Potapenko, Dmitry Vyukov, Andrey Konovalov,
	Mark Rutland, Peter Zijlstra, x86@kernel.org,
	kasan-dev@googlegroups.com
Content-Type: text/plain; charset="UTF-8"

Enabling STACK_GUARD_PAGE helps catch kernel stack overflows immediately
rather than causing difficult-to-diagnose corruption. Note that, unlike
virtually-mapped kernel stacks, this will effectively waste an entire page
of memory; however, this feature may provide extra protection in cases
that cannot use virtually-mapped kernel stacks, at the cost of a page.

The motivation for this patch is that KASAN cannot use virtually-mapped
kernel stacks to detect stack overflows. An alternative would be to
implement support for vmapped stacks in KASAN, but that would add
significant extra complexity. While the stack-end guard page approach here
wastes a page, it is significantly simpler than the alternative. We assume
that the extra cost of a page can be justified in the cases where
STACK_GUARD_PAGE would be enabled.

Note that an earlier prototype of this patch used the 'set_memory_{ro,rw}'
functions, which flush the TLBs. This, however, turned out to be
unacceptably expensive, especially when run with fuzzers such as
Syzkaller, as the kernel would encounter frequent RCU timeouts. The
current approach of not flushing the TLB is therefore best-effort, but
works in the test cases considered -- any comments on better alternatives
or improvements are welcome.

Signed-off-by: Marco Elver <elver@google.com>
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: "H. Peter Anvin"
Cc: Andrey Ryabinin
Cc: Alexander Potapenko
Cc: Dmitry Vyukov
Cc: Andrey Konovalov
Cc: Mark Rutland
Cc: Peter Zijlstra
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kasan-dev@googlegroups.com
---
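Notes, kept out of the commit message: to make the layout change concrete,
here is a minimal userspace sketch of the arithmetic above, assuming the
x86-64 defaults this patch touches (4 KiB pages, THREAD_SIZE_ORDER == 2
with KASAN_STACK_ORDER == 0). The macros mirror the diff; the 'alloc' and
'stack' locals are illustration only, not kernel code:

  #include <assert.h>
  #include <stdio.h>

  #define PAGE_SIZE         4096UL
  #define THREAD_SIZE_ORDER 2
  #define STACK_GUARD_SIZE  PAGE_SIZE /* CONFIG_STACK_GUARD_PAGE=y */
  #define THREAD_SIZE       ((PAGE_SIZE << THREAD_SIZE_ORDER) - STACK_GUARD_SIZE)

  int main(void)
  {
          /* Stand-in for page_address(page) of the order-2 allocation. */
          unsigned long alloc = 0x1000000UL;
          /* task_stack_page()/end_of_stack() now skip the guard page. */
          unsigned long stack = alloc + STACK_GUARD_SIZE;

          /* The guard page occupies the first page of the allocation... */
          assert(stack - alloc == PAGE_SIZE);
          /* ...so the usable stack shrinks by one page while its top stays
           * at the end of the four-page allocation. */
          assert(stack + THREAD_SIZE == alloc + (PAGE_SIZE << THREAD_SIZE_ORDER));

          printf("guard: [%#lx,%#lx) stack: [%#lx,%#lx)\n",
                 alloc, stack, stack, stack + THREAD_SIZE);
          return 0;
  }

And a hypothetical userspace analogue of the guard-page behaviour itself,
with mprotect(PROT_NONE) standing in for set_direct_map_invalid_noflush();
again illustration only, not kernel code. It assumes PTHREAD_STACK_MIN <=
16 KiB and should be built with -O0 so the recursion really grows the
stack. The thread dies with SIGSEGV the moment the overflow touches the
guard page, which is the immediate failure mode this patch wants for
kernel stacks:

  #include <pthread.h>
  #include <string.h>
  #include <sys/mman.h>

  #define STACK_SZ (4 * 4096) /* mirrors the order-2 stack allocation */

  static void *overflow(void *arg)
  {
          volatile char frame[512]; /* keep each call frame real */

          memset((void *)frame, 0, sizeof(frame));
          return overflow(arg); /* recurse until we hit the guard page */
  }

  int main(void)
  {
          pthread_attr_t attr;
          pthread_t t;
          void *stack = mmap(NULL, STACK_SZ, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          if (stack == MAP_FAILED)
                  return 1;
          /* The lowest page of the stack becomes the guard page. */
          if (mprotect(stack, 4096, PROT_NONE))
                  return 1;

          pthread_attr_init(&attr);
          pthread_attr_setstack(&attr, stack, STACK_SZ);
          pthread_create(&t, &attr, overflow, NULL);
          pthread_join(t, NULL); /* never reached: SIGSEGV on the guard */
          return 0;
  }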
 arch/Kconfig                         | 15 +++++++++++++++
 arch/x86/include/asm/page_64_types.h |  8 +++++++-
 include/linux/sched/task_stack.h     | 11 +++++++++--
 kernel/fork.c                        | 22 +++++++++++++++++++++-
 4 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index e8d19c3cb91f..cca3258fff1f 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -935,6 +935,21 @@ config LOCK_EVENT_COUNTS
 	  the chance of application behavior change because of timing
 	  differences. The counts are reported via debugfs.
 
+config STACK_GUARD_PAGE
+	default n
+	bool "Use stack-end page as guard page"
+	depends on !VMAP_STACK && ARCH_HAS_SET_DIRECT_MAP && THREAD_INFO_IN_TASK && !STACK_GROWSUP
+	help
+	  Enable this if you want to use the stack-end page as a guard page.
+	  This causes kernel stack overflows to be caught immediately rather
+	  than causing difficult-to-diagnose corruption. Note that, unlike
+	  virtually-mapped kernel stacks, this will effectively waste an
+	  entire page of memory; however, this feature may provide extra
+	  protection in cases that cannot use virtually-mapped kernel stacks,
+	  at the cost of a page. Note that this option does not implicitly
+	  increase the default stack size. The main use-case is for KASAN to
+	  avoid reporting misleading bugs due to stack overflow.
+
 source "kernel/gcov/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 288b065955b7..b218b5713c02 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -12,8 +12,14 @@
 #define KASAN_STACK_ORDER 0
 #endif
 
+#ifdef CONFIG_STACK_GUARD_PAGE
+#define STACK_GUARD_SIZE PAGE_SIZE
+#else
+#define STACK_GUARD_SIZE 0
+#endif
+
 #define THREAD_SIZE_ORDER (2 + KASAN_STACK_ORDER)
-#define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)
+#define THREAD_SIZE ((PAGE_SIZE << THREAD_SIZE_ORDER) - STACK_GUARD_SIZE)
 
 #define EXCEPTION_STACK_ORDER (0 + KASAN_STACK_ORDER)
 #define EXCEPTION_STKSZ (PAGE_SIZE << EXCEPTION_STACK_ORDER)
diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_stack.h
index 2413427e439c..7ee86ad0a282 100644
--- a/include/linux/sched/task_stack.h
+++ b/include/linux/sched/task_stack.h
@@ -11,6 +11,13 @@
 
 #ifdef CONFIG_THREAD_INFO_IN_TASK
 
+#ifndef STACK_GUARD_SIZE
+#ifdef CONFIG_STACK_GUARD_PAGE
+#error "Architecture not compatible with STACK_GUARD_PAGE"
+#endif
+#define STACK_GUARD_SIZE 0
+#endif
+
 /*
  * When accessing the stack of a non-current task that might exit, use
  * try_get_task_stack() instead. task_stack_page will return a pointer
@@ -18,14 +25,14 @@
  */
 static inline void *task_stack_page(const struct task_struct *task)
 {
-	return task->stack;
+	return task->stack + STACK_GUARD_SIZE;
 }
 
 #define setup_thread_stack(new,old)	do { } while(0)
 
 static inline unsigned long *end_of_stack(const struct task_struct *task)
 {
-	return task->stack;
+	return task->stack + STACK_GUARD_SIZE;
 }
 
 #elif !defined(__HAVE_THREAD_FUNCTIONS)
diff --git a/kernel/fork.c b/kernel/fork.c
index d8ae0f1b4148..22033b03f7da 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -94,6 +94,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <linux/set_memory.h>
 
 #include <...>
 #include <...>
@@ -249,6 +250,14 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
 					     THREAD_SIZE_ORDER);
 
 	if (likely(page)) {
+		if (IS_ENABLED(CONFIG_STACK_GUARD_PAGE)) {
+			/*
+			 * Best effort: do not flush TLB to avoid the overhead
+			 * of flushing all TLBs.
+			 */
+			set_direct_map_invalid_noflush(page);
+		}
+
 		tsk->stack = page_address(page);
 		return tsk->stack;
 	}
@@ -258,6 +267,7 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node)
 
 static inline void free_thread_stack(struct task_struct *tsk)
 {
+	struct page *stack_page;
 #ifdef CONFIG_VMAP_STACK
 	struct vm_struct *vm = task_stack_vm_area(tsk);
 
@@ -285,7 +295,17 @@ static inline void free_thread_stack(struct task_struct *tsk)
 	}
 #endif
 
-	__free_pages(virt_to_page(tsk->stack), THREAD_SIZE_ORDER);
+	stack_page = virt_to_page(tsk->stack);
+
+	if (IS_ENABLED(CONFIG_STACK_GUARD_PAGE)) {
+		/*
+		 * Avoid flushing TLBs, and instead rely on spurious fault
+		 * detection of stale TLBs.
+		 */
+		set_direct_map_default_noflush(stack_page);
+	}
+
+	__free_pages(stack_page, THREAD_SIZE_ORDER);
 }
 # else
 static struct kmem_cache *thread_stack_cache;
-- 
2.22.0.657.g960e92d24f-goog