Date: Mon, 9 May 2022 00:27:22 +0300
Reply-To: alex.popov@linux.com
Subject: Re: [PATCH v2 06/13] stackleak: rework stack high bound handling
To: Mark Rutland, linux-arm-kernel@lists.infradead.org
Cc: akpm@linux-foundation.org, catalin.marinas@arm.com, keescook@chromium.org,
    linux-kernel@vger.kernel.org, luto@kernel.org, will@kernel.org
References: <20220427173128.2603085-1-mark.rutland@arm.com>
 <20220427173128.2603085-7-mark.rutland@arm.com>
From: Alexander Popov
In-Reply-To: <20220427173128.2603085-7-mark.rutland@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 27.04.2022 20:31, Mark Rutland wrote:
> Prior to returning to userspace, we reset current->lowest_stack to a
> reasonable high bound.
> Currently we do this by subtracting the arbitrary value
> `THREAD_SIZE/64` from the top of the stack, for reasons lost to
> history.
>
> Looking at configurations today:
>
> * On i386 where THREAD_SIZE is 8K, the bound will be 128 bytes. The
>   pt_regs at the top of the stack is 68 bytes (with 0 to 16 bytes of
>   padding above), and so this covers an additional portion of 44 to 60
>   bytes.
>
> * On x86_64 where THREAD_SIZE is at least 16K (up to 32K with KASAN) the
>   bound will be at least 256 bytes (up to 512 with KASAN). The pt_regs
>   at the top of the stack is 168 bytes, and so this covers an additional
>   88 bytes of stack (up to 344 with KASAN).
>
> * On arm64 where THREAD_SIZE is at least 16K (up to 64K with 64K pages
>   and VMAP_STACK), the bound will be at least 256 bytes (up to 1024 with
>   KASAN). The pt_regs at the top of the stack is 336 bytes, so this can
>   fall within the pt_regs, or can cover an additional 688 bytes of
>   stack.
>
> Clearly the `THREAD_SIZE/64` value doesn't make much sense -- in the
> worst case, this will cause more than 600 bytes of stack to be erased
> for every syscall, even if actual stack usage were substantially
> smaller.
>
> This patch makes this slightly less nonsensical by consistently
> resetting current->lowest_stack to the base of the task pt_regs. For
> clarity and for consistency with the handling of the low bound, the
> generation of the high bound is split into a helper with commentary
> explaining why.
>
> Since the pt_regs at the top of the stack will be clobbered upon the
> next exception entry, we don't need to poison these at exception exit.
> By using task_pt_regs() as the high stack boundary instead of
> current_top_of_stack() we avoid some redundant poisoning, and the
> compiler can share the address generation between the poisoning and
> resetting of `current->lowest_stack`, making the generated code more
> optimal.
>
> It's not clear to me whether the existing `THREAD_SIZE/64` offset was a
> dodgy heuristic to skip the pt_regs, or whether it was attempting to
> minimize the number of times stackleak_check_stack() would have to
> update `current->lowest_stack` when stack usage was shallow at the cost
> of unconditionally poisoning a small portion of the stack for every exit
> to userspace.

I inherited this 'THREAD_SIZE/64' logic from the original grsecurity
patch. As I mentioned, originally this was written in asm.

For x86_64:

    mov TASK_thread_sp0(%r11), %rdi
    sub $256, %rdi
    mov %rdi, TASK_lowest_stack(%r11)

For x86_32:

    mov TASK_thread_sp0(%ebp), %edi
    sub $128, %edi
    mov %edi, TASK_lowest_stack(%ebp)

256 bytes for x86_64 and 128 bytes for x86_32 are exactly THREAD_SIZE/64.
I think this value was chosen as optimal for minimizing poison scanning.

It's possible that stackleak_track_stack() is not called during the
syscall because all the called functions have small stack frames.

> For now I've simply removed the offset, and if we need/want to minimize
> updates for shallow stack usage it should be easy to add a better
> heuristic atop, with appropriate commentary so we know what's going on.

I like your idea to erase the thread stack up to pt_regs if we call the
stackleak erasing from the trampoline stack.

But here I don't understand where task_pt_regs() points to...
> Signed-off-by: Mark Rutland
> Cc: Alexander Popov
> Cc: Andrew Morton
> Cc: Andy Lutomirski
> Cc: Kees Cook
> ---
>  include/linux/stackleak.h | 14 ++++++++++++++
>  kernel/stackleak.c        | 19 ++++++++++++++-----
>  2 files changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/stackleak.h b/include/linux/stackleak.h
> index 67430faa5c518..467661aeb4136 100644
> --- a/include/linux/stackleak.h
> +++ b/include/linux/stackleak.h
> @@ -28,6 +28,20 @@ stackleak_task_low_bound(const struct task_struct *tsk)
>  	return (unsigned long)end_of_stack(tsk) + sizeof(unsigned long);
>  }
>
> +/*
> + * The address immediately after the highest address on tsk's stack which we
> + * can plausibly erase.
> + */
> +static __always_inline unsigned long
> +stackleak_task_high_bound(const struct task_struct *tsk)
> +{
> +	/*
> +	 * The task's pt_regs lives at the top of the task stack and will be
> +	 * overwritten by exception entry, so there's no need to erase them.
> +	 */
> +	return (unsigned long)task_pt_regs(tsk);
> +}
> +
>  static inline void stackleak_task_init(struct task_struct *t)
>  {
>  	t->lowest_stack = stackleak_task_low_bound(t);
> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
> index d5f684dc0a2d9..ba346d46218f5 100644
> --- a/kernel/stackleak.c
> +++ b/kernel/stackleak.c
> @@ -73,6 +73,7 @@ late_initcall(stackleak_sysctls_init);
>  static __always_inline void __stackleak_erase(void)
>  {
>  	const unsigned long task_stack_low = stackleak_task_low_bound(current);
> +	const unsigned long task_stack_high = stackleak_task_high_bound(current);
>  	unsigned long erase_low = current->lowest_stack;
>  	unsigned long erase_high;
>  	unsigned int poison_count = 0;
> @@ -93,14 +94,22 @@ static __always_inline void __stackleak_erase(void)
>  #endif
>
>  	/*
> -	 * Now write the poison value to the kernel stack between 'erase_low'
> -	 * and 'erase_high'. We assume that the stack pointer doesn't change
> -	 * when we write poison.
> +	 * Write poison to the task's stack between 'erase_low' and
> +	 * 'erase_high'.
> +	 *
> +	 * If we're running on a different stack (e.g. an entry trampoline
> +	 * stack) we can erase everything below the pt_regs at the top of the
> +	 * task stack.
> +	 *
> +	 * If we're running on the task stack itself, we must not clobber any
> +	 * stack used by this function and its caller. We assume that this
> +	 * function has a fixed-size stack frame, and the current stack pointer
> +	 * doesn't change while we write poison.
>  	 */
>  	if (on_thread_stack())
>  		erase_high = current_stack_pointer;
>  	else
> -		erase_high = current_top_of_stack();
> +		erase_high = task_stack_high;
>
>  	while (erase_low < erase_high) {
>  		*(unsigned long *)erase_low = STACKLEAK_POISON;
>  		erase_low += sizeof(unsigned long);
> @@ -108,7 +117,7 @@ static __always_inline void __stackleak_erase(void)
>  	}
>
>  	/* Reset the 'lowest_stack' value for the next syscall */
> -	current->lowest_stack = current_top_of_stack() - THREAD_SIZE/64;
> +	current->lowest_stack = task_stack_high;
>  }
>
>  asmlinkage void noinstr stackleak_erase(void)