From: Rick Edgecombe
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, linux-arch@vger.kernel.org,
	linux-api@vger.kernel.org, Arnd Bergmann, Andy Lutomirski,
	Balbir Singh, Borislav Petkov, Cyrill Gorcunov, Dave Hansen,
	Eugene Syromiatnikov, Florian Weimer, "H. J. Lu", Jann Horn,
	Jonathan Corbet, Kees Cook, Mike Kravetz, Nadav Amit,
	Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
	"Ravi V. Shankar", Weijiang Yang, "Kirill A. Shutemov",
	joao.moreira@intel.com, John Allen, kcc@google.com,
	eranian@google.com, rppt@kernel.org, jamorris@linux.microsoft.com,
	dethoma@microsoft.com
Cc: rick.p.edgecombe@intel.com, Yu-cheng Yu
Subject: [PATCH v2 17/39] mm: Fixup places that call pte_mkwrite() directly
Date: Thu, 29 Sep 2022 15:29:14 -0700
Message-Id: <20220929222936.14584-18-rick.p.edgecombe@intel.com>
In-Reply-To: <20220929222936.14584-1-rick.p.edgecombe@intel.com>
References: <20220929222936.14584-1-rick.p.edgecombe@intel.com>

From: Yu-cheng Yu

With the introduction of shadow stack memory there are two ways a pte
can be writable: regular writable memory and shadow
stack memory.

In past patches, maybe_mkwrite() has been updated to apply pte_mkwrite()
or pte_mkwrite_shstk() depending on the VMA flag. This covers most cases
where a PTE is made writable. However, there are places where
pte_mkwrite() is called directly and the logic should now also create a
shadow stack PTE in the case of a shadow stack VMA.

 - do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE
   directly and call pte_mkwrite(), which is the same as maybe_mkwrite()
   in logic and intention. Just change them to maybe_mkwrite().

 - When userfaultfd is creating a PTE after userspace handles the fault
   it calls pte_mkwrite() directly. Teach it about pte_mkwrite_shstk().

In other cases where pte_mkwrite() is called directly, the VMA will not
be VM_SHADOW_STACK, and so shadow stack memory should not be created:

 - In the case of pte_savedwrite(), shadow stack VMAs are excluded.

 - In the case of the "dirty_accountable" optimization in mprotect(),
   shadow stack VMAs won't be VM_SHARED, so it is not necessary.

Signed-off-by: Yu-cheng Yu
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Cc: Kees Cook
---

v2:
 - Updated commit log with comments from Dave Hansen.
 - Dave also suggested (as I understood it) maybe tweaking
   vm_get_page_prot() to avoid having to call maybe_mkwrite(). After
   playing around with this I opted to *not* do it. Shadow stack memory
   is effectively writable, so having the default permissions be
   writable ended up mapping the zero page as writable, and caused other
   surprises. So creating shadow stack memory needs to be done with
   manual logic like pte_mkwrite().
 - Drop the change in change_pte_range() because it couldn't actually
   trigger for shadow stack VMAs.
 - Clarify the reasoning for the skipped cases of pte_mkwrite().

Yu-cheng v25:
 - Apply the same changes to do_huge_pmd_numa_page() as to
   do_numa_page().
 mm/migrate_device.c |  3 +--
 mm/userfaultfd.c    | 10 +++++++---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 27fb37d65476..eba3164736b3 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -606,8 +606,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
 			goto abort;
 		}
 		entry = mk_pte(page, vma->vm_page_prot);
-		if (vma->vm_flags & VM_WRITE)
-			entry = pte_mkwrite(pte_mkdirty(entry));
+		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
 	}
 
 	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 7327b2573f7c..b49372c7de41 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -63,6 +63,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	int ret;
 	pte_t _dst_pte, *dst_pte;
 	bool writable = dst_vma->vm_flags & VM_WRITE;
+	bool shstk = dst_vma->vm_flags & VM_SHADOW_STACK;
 	bool vm_shared = dst_vma->vm_flags & VM_SHARED;
 	bool page_in_cache = page->mapping;
 	spinlock_t *ptl;
@@ -83,9 +84,12 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 		writable = false;
 	}
 
-	if (writable)
-		_dst_pte = pte_mkwrite(_dst_pte);
-	else
+	if (writable) {
+		if (shstk)
+			_dst_pte = pte_mkwrite_shstk(_dst_pte);
+		else
+			_dst_pte = pte_mkwrite(_dst_pte);
+	} else
 		/*
 		 * We need this to make sure write bit removed; as mk_pte()
 		 * could return a pte with write bit set.
-- 
2.17.1