Message-ID: <1532019963.16711.61.camel@intel.com>
Subject: Re: [RFC PATCH v2 16/27] mm: Modify can_follow_write_pte/pmd for shadow stack
From: Yu-cheng Yu
To: Dave Hansen, x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org,
 linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Arnd Bergmann,
 Andy Lutomirski, Balbir Singh, Cyrill Gorcunov, Florian Weimer, "H.J. Lu",
 Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
 Oleg Nesterov, Pavel Machek, Peter Zijlstra, "Ravi V. Shankar",
 Vedvyas Shanbhogue
Date: Thu, 19 Jul 2018 10:06:03 -0700
References: <20180710222639.8241-1-yu-cheng.yu@intel.com>
 <20180710222639.8241-17-yu-cheng.yu@intel.com>
 <1531328731.15351.3.camel@intel.com>
 <45a85b01-e005-8cb6-af96-b23ce9b5fca7@linux.intel.com>
 <1531868610.3541.21.camel@intel.com>
 <1531944882.10738.1.camel@intel.com>
 <3f158401-f0b6-7bf7-48ab-2958354b28ad@linux.intel.com>
 <1531955428.12385.30.camel@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2018-07-18 at 17:06 -0700, Dave Hansen wrote:
> > > > -static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
> > > > +static inline bool can_follow_write(pte_t pte, unsigned int flags,
> > > > +				    struct vm_area_struct *vma)
> > > >  {
> > > > -	return pte_write(pte) ||
> > > > -		((flags & FOLL_FORCE) && (flags & FOLL_COW) && pte_dirty(pte));
> > > > +	if (!is_shstk_mapping(vma->vm_flags)) {
> > > > +		if (pte_write(pte))
> > > > +			return true;
> > > Let me see if I can say this another way.
> > >
> > > The bigger issue is that these patches change the semantics of
> > > pte_write().  Before these patches, it meant that you *MUST* have this
> > > bit set to write to the page controlled by the PTE.  Now, it means: you
> > > can write if this bit is set *OR* the shadowstack bit combination is
> > > set.
> > Here, we only figure out (1) if the page is pointed to by a writable
> > PTE; or (2) if the page is pointed to by a RO PTE (data or SHSTK) and
> > it has been copied and it still exists.  We are not trying to
> > determine if the SHSTK PTE is writable (we know it is not).
> Please think about the big picture.  I'm not just talking about this
> patch, but about every use of pte_write() in the kernel.
>
> > > That's the fundamental problem.  We need some code in the kernel that
> > > logically represents the concept of "is this PTE a shadowstack PTE or a
> > > PTE with the write bit set", and we will call that pte_write(), or maybe
> > > pte_writable().
> > >
> > > You *have* to somehow rectify this situation.  We can absolutely not
> > > leave pte_write() in its current, ambiguous state where it has no real
> > > meaning or where it is used to mean _both_ things depending on context.
> > True, the processor can always write to a page through a shadow stack
> > PTE, but it must do that with a CALL instruction.  Can we define a
> > write operation as: MOV r1, *(r2)?  Then we don't have any doubt about
> > pte_write() any more.
> No, we can't just move the target. :)
>
> You can define it this way, but then you also need to go to every spot
> in the kernel that calls pte_write() (and _PAGE_RW in fact) and audit it
> to ensure it means "mov ..." and not push.

Which pte_write() do you think is right?

bool is_shstk_pte(pte)
{
	return (_PAGE_RW not set) && (_PAGE_DIRTY_HW set);
}

int pte_write_1(pte)
{
	return (_PAGE_RW set) && !is_shstk_pte(pte);
}

int pte_write_2(pte)
{
	return (_PAGE_RW set) || is_shstk_pte(pte);
}

Yu-cheng
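The three pte_write() candidates in the pseudo-code can be compared mechanically. What follows is a minimal user-space sketch, not kernel code. It assumes a PTE is a bare 64-bit value and uses hypothetical MODEL_* names as stand-ins for _PAGE_RW (x86 bit 1) and _PAGE_DIRTY_HW, which is taken here to be the hardware Dirty bit (x86 bit 6).

One consequence falls out of the model. is_shstk_pte() requires R/W to be clear, so pte_write_1() reduces to a plain R/W test, which is the meaning pte_write() already has. pte_write_2() differs from it only on shadow-stack PTEs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space model only: a PTE is a bare 64-bit value here. */
typedef uint64_t model_pte_t;

/* Hypothetical stand-ins for _PAGE_RW and _PAGE_DIRTY_HW (x86 bits 1 and 6). */
#define MODEL_PAGE_RW       (UINT64_C(1) << 1)
#define MODEL_PAGE_DIRTY_HW (UINT64_C(1) << 6)

/* Shadow-stack encoding: R/W clear and hardware Dirty set. */
static bool is_shstk_pte(model_pte_t pte)
{
	return !(pte & MODEL_PAGE_RW) && (pte & MODEL_PAGE_DIRTY_HW);
}

/* Candidate 1: writable by ordinary stores; shadow-stack PTEs excluded. */
static bool pte_write_1(model_pte_t pte)
{
	return (pte & MODEL_PAGE_RW) && !is_shstk_pte(pte);
}

/* Candidate 2: writable by any means, shadow-stack PTEs included. */
static bool pte_write_2(model_pte_t pte)
{
	return (pte & MODEL_PAGE_RW) || is_shstk_pte(pte);
}

/* Cross-checks, for one PTE value, how each candidate relates to R/W. */
static void check_pte_write_candidates(model_pte_t pte)
{
	bool rw = (pte & MODEL_PAGE_RW) != 0;

	/* Candidate 1 is exactly the plain R/W test. */
	assert(pte_write_1(pte) == rw);
	/* Candidate 2 is the R/W test plus shadow-stack PTEs. */
	assert(pte_write_2(pte) == (rw || is_shstk_pte(pte)));
}
```

Running check_pte_write_candidates() over the R/W and Dirty combinations shows that only the shadow-stack case (Dirty set, R/W clear) separates the two candidates. The choice between them is therefore really the choice Dave describes: keep pte_write() meaning "writable by MOV" or widen it to "writable by any means".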
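The quoted diff replaces the pre-patch can_follow_write_pte(), whose full body is visible on its "-" lines. The sketch below reproduces only that pre-patch logic, in the same user-space model. Its assumptions are:

- The MODEL_FOLL_* flag values are hypothetical, not the kernel's FOLL_FORCE/FOLL_COW values.
- The model pte_dirty() checks only the hardware Dirty bit.

It illustrates one reading of why shadow-stack mappings get separate handling. A shadow-stack PTE has Dirty set by construction, so in this model it satisfies the FOLL_FORCE && FOLL_COW && pte_dirty() clause regardless of whether a COW copy is actually in place. The patched function body is cut off in the quote and is not reconstructed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space model only: a PTE is a bare 64-bit value here. */
typedef uint64_t model_pte_t;

/* Hypothetical stand-ins for _PAGE_RW and the hardware Dirty bit (x86 bits 1 and 6). */
#define MODEL_PAGE_RW       (UINT64_C(1) << 1)
#define MODEL_PAGE_DIRTY_HW (UINT64_C(1) << 6)

/* Hypothetical gup flag values; not the kernel's FOLL_FORCE/FOLL_COW values. */
#define MODEL_FOLL_FORCE (1u << 0)
#define MODEL_FOLL_COW   (1u << 1)

static bool model_pte_write(model_pte_t pte)
{
	return (pte & MODEL_PAGE_RW) != 0;
}

/* Simplified: checks only the hardware Dirty bit. */
static bool model_pte_dirty(model_pte_t pte)
{
	return (pte & MODEL_PAGE_DIRTY_HW) != 0;
}

/* Same logic as the pre-patch can_follow_write_pte() on the "-" lines. */
static bool model_can_follow_write_pte(model_pte_t pte, unsigned int flags)
{
	return model_pte_write(pte) ||
	       ((flags & MODEL_FOLL_FORCE) && (flags & MODEL_FOLL_COW) &&
		model_pte_dirty(pte));
}

/*
 * A shadow-stack PTE (R/W clear, Dirty set) is not writable by this test,
 * yet it passes the FOLL_FORCE && FOLL_COW && dirty clause on its encoding
 * alone.
 */
static void show_shstk_passes_cow_clause(void)
{
	const model_pte_t shstk = MODEL_PAGE_DIRTY_HW;

	assert(!model_pte_write(shstk));
	assert(model_can_follow_write_pte(shstk, MODEL_FOLL_FORCE | MODEL_FOLL_COW));
}
```

In this model, the Dirty bit on its own can no longer serve as evidence that the forced COW copy still exists once shadow-stack PTEs are in play. That ambiguity is the same one raised in the thread about pte_write().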