Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp3313893rwd; Mon, 22 May 2023 11:45:22 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ49TMV5yOoa9a7Y2urlN+rP6wZTOZlGeSroIO7/678FVeB9Vyx8i1ULCtJwhIFYb+XX+TSL X-Received: by 2002:a05:6a00:14c4:b0:647:e45f:1a49 with SMTP id w4-20020a056a0014c400b00647e45f1a49mr15263710pfu.4.1684781122373; Mon, 22 May 2023 11:45:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684781122; cv=none; d=google.com; s=arc-20160816; b=PmMgkexcGO9b0um3oLAN9MATDZ1zd+nwuYvEItjlxc79tbOGt7RlqOZ2lCLX/8C9Q6 c5nY+Fd/qXRloMT/y75kQDzEh3YkPi4nqhKwyD6ra7VY/tsa312lFMjrj0aqz2tDqXMF QXUFW7tYvjuO8j8FEb/XPoSjIbl9VQMdkvXF8NIayqfjvAPGCrpT1yEF2YClXrZ/amSt RUhlZ5NoGuhDMkkyEYPK160AsFHWaN/58Bl30aMFbj1ZSayDWPeMgzkmAoJOutRco5Q0 QGRXmkI+/hyAf/tORhidiAyU9W1MI9A83H4KJxfXT6/uCZD7CHnZI+BLKGF9SlOeuF2y shjQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=ncyyvzCwaBiOGfdAK57eDSyGsVrCv17gynkqmLbIh7Q=; b=PpXs2s/sfBU/iSQ9A5aUMOc1T8c5z4wMWhce6amj0wmAupHGIa6IuaeZ6ObsmU7nb0 47hSKlpnZJI1fqJoZef/oV328Z7cw3uCeWDw84z1kmqMTX+U+PMmei7gpj+vZquTMD7d lchmqRGt1QlmsMEpMNqcu0+J7kmOL3c1/rGcO/pW6DmgqX8V/wExyrSdviWSdvEX/V9A b9KVfcLhZb5G6Mm++IiRYIQ9E/tm57jyCTD/zXBNCDHrwm5/nmmL6pS1TS9ZYRzeRWxI eKFLOHsaQ5yYAbJBdNjdY5P0HXRsQwCc3JfwnT9zDI/ww5RZkRcZqEd6Xkvt7H8GWwOV Yv6g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=cGPQFY6F; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j27-20020a633c1b000000b0053422305c20si235362pga.14.2023.05.22.11.45.01; Mon, 22 May 2023 11:45:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=cGPQFY6F; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232411AbjEVSen (ORCPT + 99 others); Mon, 22 May 2023 14:34:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46658 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229916AbjEVSel (ORCPT ); Mon, 22 May 2023 14:34:41 -0400 Received: from mail-pl1-x649.google.com (mail-pl1-x649.google.com [IPv6:2607:f8b0:4864:20::649]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 40C0DB7 for ; Mon, 22 May 2023 11:34:40 -0700 (PDT) Received: by mail-pl1-x649.google.com with SMTP id d9443c01a7336-1afba64045aso6533335ad.0 for ; Mon, 22 May 2023 11:34:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684780480; x=1687372480; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=ncyyvzCwaBiOGfdAK57eDSyGsVrCv17gynkqmLbIh7Q=; b=cGPQFY6F+JsGLicyWmQ7+nx3BT8FDTky0Wpm3XMm47C67teffHRLLERrhhaXB78/CL ntljpUrXT68Ppi5/Br2bFbLfK6HZXVLGLYn6zwBMkZqCqOHJE8c6nTf4PHZT3kOdALRD WYT++yPJw8eQcaZW/JCptRL2KmXZBG59VSW+dbF+1yGNjbzEvGw3WCB6+aR/o2UoFfMD 3Q48kdKs3/9fpMluVeCEMRk7evs7OTjc7hyrDlbNEi7WL2d2onFS7my6C8fDbmseK36M m4pArVT+6V0srD5bk1rZT2WYLS9kqpm+kH9gzo69MoI63Q2Bq2DN4xM3T0ZgHxCuj8pL CMdQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684780480; x=1687372480; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ncyyvzCwaBiOGfdAK57eDSyGsVrCv17gynkqmLbIh7Q=; b=lvmk5draHmfIaiG2n6+OBm5WUdBmVBU7BzPFX5rt4Kj2UH7GDKomhNRj4WkE4UV3Q7 S2qzJXe6cfekhK6t6weiPeLKhPyBwlSnEdT4IdNKyjqev4Bb8p8G9r3v4+4jRkNZDM9k sEHAUurlUKXLowOdcgXnqxDkFLbSfgaxnIUTwfbt3uoIc6GAQegqhcQacUu8nZLX95TH kjhj5Pqa2OApE+Gfr55JsVYI4Lf+kmQZwFlgtYwfG1+IPunn9BIT3SKTUxPA9+r+b6kF IEbsafCMIkkriVa1on2T6aGKonEDgi9EJKvymyEHTkj867JSBGvE4nyNjyQ4nQAa2kxM Jh+g== X-Gm-Message-State: AC+VfDynl7Rglkm9LfjiUyw9+XWA+iJq5LltLaMLTHF72uDp+Ze4QQoR 8JU6cbnj3xcOfeq8M1uBqVZn05dLUr0= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a17:902:cad2:b0:1ad:ec0b:42f9 with SMTP id y18-20020a170902cad200b001adec0b42f9mr2629780pld.6.1684780479772; Mon, 22 May 2023 11:34:39 -0700 (PDT) Date: Mon, 22 May 2023 11:34:38 -0700 In-Reply-To: <20230522063725.284686-1-apopple@nvidia.com> Mime-Version: 1.0 References: <20230522063725.284686-1-apopple@nvidia.com> Message-ID: Subject: Re: [PATCH] mmu_notifiers: Notify on pte permission upgrades From: Sean Christopherson To: Alistair Popple Cc: Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, robin.murphy@arm.com, will@kernel.org, nicolinc@nvidia.com, linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu, jgg@nvidia.com, John Hubbard Content-Type: text/plain; charset="us-ascii" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 22, 2023, Alistair Popple wrote: > Some architectures, specifically ARM and perhaps Sparc and IA64, > require TLB invalidates when upgrading pte permission from read-only > to read-write. > > The current mmu_notifier implementation assumes that upgrades do not > need notifications. Typically though mmu_notifiers are used to > implement TLB invalidations for secondary MMUs that comply with the > main CPU architecture. > > Therefore if the main CPU architecture requires an invalidation for > permission upgrade the secondary MMU will as well and an mmu_notifier > should be sent for the upgrade. > > Currently CPU invalidations for permission upgrade occur in > ptep_set_access_flags(). Unfortunately MMU notifiers cannot be called > directly from this architecture specific code as the notifier > callbacks can sleep, and ptep_set_access_flags() is usually called > whilst holding the PTL spinlock. Therefore add the notifier calls > after the PTL is dropped and only if the PTE actually changed. This > will allow secondary MMUs to obtain an updated PTE with appropriate > permissions. > > This problem was discovered during testing of an ARM SMMU > implementation that does not support broadcast TLB maintenance > (BTM). In this case the SMMU driver uses notifiers to issue TLB > invalidates. For read-only to read-write pte upgrades the SMMU > continually returned a read-only PTE to the device, even though the > CPU had a read-write PTE installed. > > Sending a mmu notifier event to the SMMU driver fixes the problem by > flushing secondary TLB entries. A new notifier event type is added so > drivers may filter out these invalidations if not required. Note a > driver should never upgrade or install a PTE in response to this mmu > notifier event as it is not synchronised against other PTE operations. > > Signed-off-by: Alistair Popple > --- > include/linux/mmu_notifier.h | 6 +++++ > mm/hugetlb.c | 24 ++++++++++++++++++- > mm/memory.c | 45 ++++++++++++++++++++++++++++++++++-- > 3 files changed, 72 insertions(+), 3 deletions(-) > > diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h > index d6c06e140277..f14d68f119d8 100644 > --- a/include/linux/mmu_notifier.h > +++ b/include/linux/mmu_notifier.h > @@ -31,6 +31,11 @@ struct mmu_interval_notifier; > * pages in the range so to mirror those changes the user must inspect the CPU > * page table (from the end callback). > * > + * @MMU_NOTIFY_PROTECTION_UPGRAGE: update is due to a change from read-only to > + * read-write for pages in the range. This must not be used to upgrade > + * permissions on secondary PTEs, rather it should only be used to invalidate > + * caches such as secondary TLBs that may cache old read-only entries. This is a poor fit for invalidate_range_{start,end}(). All other uses bookend the primary MMU update, i.e. call start() _before_ changing PTEs. The comments are somewhat stale as they talk only about "unmapped", but the contract between the primary MMU and the secondary MMU is otherwise quite clear on when the primary MMU will invoke start() and end(). * invalidate_range_start() is called when all pages in the * range are still mapped and have at least a refcount of one. * * invalidate_range_end() is called when all pages in the * range have been unmapped and the pages have been freed by * the VM. I'm also confused as to how this actually fixes ARM's SMMU. Unless I'm looking at the wrong SMMU implementation, the SMMU implemenents only invalidate_range(), not the start()/end() variants. static const struct mmu_notifier_ops arm_smmu_mmu_notifier_ops = { .invalidate_range = arm_smmu_mm_invalidate_range, .release = arm_smmu_mm_release, .free_notifier = arm_smmu_mmu_notifier_free, }; Again from include/linux/mmu_notifier.h, not implementing the start()/end() hooks is perfectly valid. And AFAICT, the existing invalidate_range() hook is pretty much a perfect fit for what you want to achieve. * If invalidate_range() is used to manage a non-CPU TLB with * shared page-tables, it not necessary to implement the * invalidate_range_start()/end() notifiers, as * invalidate_range() already catches the points in time when an * external TLB range needs to be flushed. For more in depth * discussion on this see Documentation/mm/mmu_notifier.rst Even worse, this change may silently regress performance for secondary MMUs that haven't yet taken advantage of the event type, e.g. KVM will zap all of KVM's PTEs in response to the upgrade, instead of waiting until the guest actually tries to utilize the new protections.