Shutemov" , joao.moreira@intel.com, John Allen , kcc@google.com, eranian@google.com Cc: Yu-cheng Yu References: <20220130211838.8382-1-rick.p.edgecombe@intel.com> <20220130211838.8382-19-rick.p.edgecombe@intel.com> From: Dave Hansen Subject: Re: [PATCH 18/35] mm: Add guard pages around a shadow stack. In-Reply-To: <20220130211838.8382-19-rick.p.edgecombe@intel.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,NICE_REPLY_A,RDNS_NONE,SPF_HELO_NONE, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 1/30/22 13:18, Rick Edgecombe wrote: > INCSSP(Q/D) increments shadow stack pointer and 'pops and discards' the > first and the last elements in the range, effectively touches those memory > areas. This is a pretty close copy of the instruction reference text for INCSSP. I'm feeling rather dense today, but that's just not making any sense. The pseudocode is more sensible in the SDM. I think this needs a better explanation: The INCSSP instruction increments the shadow stack pointer. It is the shadow stack analog of an instruction like: addq $0x80, %rsp However, there is one important difference between an ADD on %rsp and INCSSP. In addition to modifying SSP, INCSSP also reads from the memory of the first and last elements that were "popped". You can think of it as acting like this: READ_ONCE(ssp); // read+discard top element on stack ssp += nr_to_pop * 8; // move the shadow stack READ_ONCE(ssp-8); // read+discard last popped stack element > The maximum moving distance by INCSSPQ is 255 * 8 = 2040 bytes and > 255 * 4 = 1020 bytes by INCSSPD. Both ranges are far from PAGE_SIZE. ... That maximum distance, combined with an a guard pages at the end of a shadow stack ensures that INCSSP will fault before it is able to move across an entire guard page. > Thus, putting a gap page on both ends of a shadow stack prevents INCSSP, > CALL, and RET from going beyond. > > diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h > index a506a411474d..e1533fdc08b4 100644 > --- a/arch/x86/include/asm/page_types.h > +++ b/arch/x86/include/asm/page_types.h > @@ -73,6 +73,13 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn); > > extern void initmem_init(void); > > +#define vm_start_gap vm_start_gap > +struct vm_area_struct; > +extern unsigned long vm_start_gap(struct vm_area_struct *vma); > + > +#define vm_end_gap vm_end_gap > +extern unsigned long vm_end_gap(struct vm_area_struct *vma); > + > #endif /* !__ASSEMBLY__ */ > > #endif /* _ASM_X86_PAGE_DEFS_H */ > diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c > index f3f52c5e2fd6..81f9325084d3 100644 > --- a/arch/x86/mm/mmap.c > +++ b/arch/x86/mm/mmap.c > @@ -250,3 +250,49 @@ bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot) > return false; > return true; > } > + > +/* > + * Shadow stack pointer is moved by CALL, RET, and INCSSP(Q/D). INCSSPQ > + * moves shadow stack pointer up to 255 * 8 = ~2 KB (~1KB for INCSSPD) and > + * touches the first and the last element in the range, which triggers a > + * page fault if the range is not in a shadow stack. 
> Thus, putting a gap page on both ends of a shadow stack prevents INCSSP,
> CALL, and RET from going beyond.
>
> diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
> index a506a411474d..e1533fdc08b4 100644
> --- a/arch/x86/include/asm/page_types.h
> +++ b/arch/x86/include/asm/page_types.h
> @@ -73,6 +73,13 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
>
>  extern void initmem_init(void);
>
> +#define vm_start_gap vm_start_gap
> +struct vm_area_struct;
> +extern unsigned long vm_start_gap(struct vm_area_struct *vma);
> +
> +#define vm_end_gap vm_end_gap
> +extern unsigned long vm_end_gap(struct vm_area_struct *vma);
> +
>  #endif /* !__ASSEMBLY__ */
>
>  #endif /* _ASM_X86_PAGE_DEFS_H */
> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
> index f3f52c5e2fd6..81f9325084d3 100644
> --- a/arch/x86/mm/mmap.c
> +++ b/arch/x86/mm/mmap.c
> @@ -250,3 +250,49 @@ bool pfn_modify_allowed(unsigned long pfn, pgprot_t prot)
>  		return false;
>  	return true;
>  }
> +
> +/*
> + * Shadow stack pointer is moved by CALL, RET, and INCSSP(Q/D). INCSSPQ
> + * moves shadow stack pointer up to 255 * 8 = ~2 KB (~1KB for INCSSPD) and
> + * touches the first and the last element in the range, which triggers a
> + * page fault if the range is not in a shadow stack. Because of this,
> + * creating 4-KB guard pages around a shadow stack prevents these
> + * instructions from going beyond.
> + */
> +#define SHADOW_STACK_GUARD_GAP PAGE_SIZE
> +
> +unsigned long vm_start_gap(struct vm_area_struct *vma)
> +{
> +	unsigned long vm_start = vma->vm_start;
> +	unsigned long gap = 0;
> +
> +	if (vma->vm_flags & VM_GROWSDOWN)
> +		gap = stack_guard_gap;
> +	else if (vma->vm_flags & VM_SHADOW_STACK)
> +		gap = SHADOW_STACK_GUARD_GAP;
> +
> +	if (gap != 0) {
> +		vm_start -= gap;
> +		if (vm_start > vma->vm_start)
> +			vm_start = 0;
> +	}
> +	return vm_start;
> +}
> +
> +unsigned long vm_end_gap(struct vm_area_struct *vma)
> +{
> +	unsigned long vm_end = vma->vm_end;
> +	unsigned long gap = 0;
> +
> +	if (vma->vm_flags & VM_GROWSUP)
> +		gap = stack_guard_gap;
> +	else if (vma->vm_flags & VM_SHADOW_STACK)
> +		gap = SHADOW_STACK_GUARD_GAP;
> +
> +	if (gap != 0) {
> +		vm_end += gap;
> +		if (vm_end < vma->vm_end)
> +			vm_end = -PAGE_SIZE;
> +	}
> +	return vm_end;
> +}

First of all, __weak would be a lot better than these #ifdefs.

Second, I have the same basic objection to this as the maybe_mkwrite()
mess.  This is a forked copy of the code.  Instead of refactoring, it's
just copied-pasted-and-#ifdef'd.  Not so nice.

Isn't this just a matter of overriding 'stack_guard_gap' for
VM_SHADOW_STACK?  Why don't we just do this:

unsigned long stack_guard_gap(struct vm_area_struct *vma)
{
	if (vma->vm_flags & VM_SHADOW_STACK)
		return SHADOW_STACK_GUARD_GAP;
	return __stack_guard_gap;
}

Or, worst-case, if people don't want 2 easily compiled-out lines added
to generic code, define:

unsigned long __weak stack_guard_gap(struct vm_area_struct *vma)
{
	return __stack_guard_gap;
}

in generic code, and put the top definition in arch/x86.
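Spelled out, the generic side might then look something like this -- a
sketch only, not a tested patch.  It assumes today's 'stack_guard_gap'
variable is renamed to '__stack_guard_gap' (its upstream default is
256UL << PAGE_SHIFT), and that a VM_SHADOW_STACK check next to
VM_GROWSDOWN/VM_GROWSUP is the pair of compiled-out lines mentioned
above:

/* mm/mmap.c -- illustrative sketch, following the naming above */

unsigned long __stack_guard_gap = 256UL << PAGE_SHIFT;

/*
 * Generic fallback; arch/x86 would override this with the
 * VM_SHADOW_STACK-aware version.
 */
unsigned long __weak stack_guard_gap(struct vm_area_struct *vma)
{
	return __stack_guard_gap;
}

unsigned long vm_start_gap(struct vm_area_struct *vma)
{
	unsigned long vm_start = vma->vm_start;

	if (vma->vm_flags & (VM_GROWSDOWN | VM_SHADOW_STACK)) {
		vm_start -= stack_guard_gap(vma);
		if (vm_start > vma->vm_start)	/* underflow */
			vm_start = 0;
	}
	return vm_start;
}

unsigned long vm_end_gap(struct vm_area_struct *vma)
{
	unsigned long vm_end = vma->vm_end;

	if (vma->vm_flags & (VM_GROWSUP | VM_SHADOW_STACK)) {
		vm_end += stack_guard_gap(vma);
		if (vm_end < vma->vm_end)	/* overflow */
			vm_end = -PAGE_SIZE;
	}
	return vm_end;
}

That keeps a single copy of the underflow/overflow logic and confines
the shadow-stack special case to the gap-size helper.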