Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp35990039rwd; Mon, 10 Jul 2023 16:10:44 -0700 (PDT) X-Google-Smtp-Source: APBJJlEdszEgHeQexE3nMkWPCn6OxSWXwC9EEvLrEg1ivM4DChMSQKQdudrbxcMbEtm+aHSQoHGj X-Received: by 2002:a05:6a00:234d:b0:676:399f:346b with SMTP id j13-20020a056a00234d00b00676399f346bmr15540868pfj.1.1689030644458; Mon, 10 Jul 2023 16:10:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689030644; cv=none; d=google.com; s=arc-20160816; b=h7Kvm53wJsIgPYpp9Bh5USGBFFriO0jXTIpDL4oyrRpvLef5fyA9lQkMxkTmT8K8vB 0uXdS4d/ZFgNc7+PcOH3BovLJmCtcHZDq9eBIIyXBOX6SVA521mHVZJHdVrdnySq0rO5 g5hUWEccqnUG8/0fF/INCYCWNHLj8TH35c//a/diCO6ZsL/SeGhtS+Cg/JvoudjdMFOf ti5OaGdf5kv7+XXnKttxvzF+LWIOzbbobloFCVUI7hmEDES8w7bfsTSbSdXUInf76Vba ANEi8MeU89lf9bU3WJ158444jrJo1EB/uud4j0Xba30UlqSDRxqt+tOS9BNAtUlhUxtn iMaQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=QfZzCSrgvGz9BGtS4wsvtCeE34B7/lj4xCbe4JnIP2Q=; fh=1SQAG30xfrpHPUlSiWGJ8RXbXVqt7lVKN7YMZT+bjck=; b=SeaNM9oAOc6C4k2T1V/gmcUMRHyHqTtzFs7ch6LhxvqZMoGVE3Nyq+q5iHEVvFGevG vacgb1rLmVILC8X1ULt7RWT8b/1CkulcZEdHS3h/DEcJaBbEQtMpX6gwxqn/s6pdfuL5 OWQx22/O6gUW2sBlzTVNV4FMUShTyZd5Mrmpb1sIFWd0IN6M6g1mvQUtvcxNEb5xC+eZ SO5OrQRcDdXCylDMxYAjK4O/FTGiUVVRYk+kSYD1U74j1RJDQxvyMkkqu1jx/ZiRRdvr ZP2BCI6J9jTKxSaUU3vJXcbwldkqK6TjfAl21JFUZSBaBNW5AWuWUWwnkTv50mTtDyBO tGaw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=IELdvlh9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id o10-20020a056a0015ca00b0068208d19b23si422976pfu.311.2023.07.10.16.10.32; Mon, 10 Jul 2023 16:10:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=IELdvlh9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230052AbjGJWAg (ORCPT + 99 others); Mon, 10 Jul 2023 18:00:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49302 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229669AbjGJWAe (ORCPT ); Mon, 10 Jul 2023 18:00:34 -0400 Received: from mail-ej1-x632.google.com (mail-ej1-x632.google.com [IPv6:2a00:1450:4864:20::632]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 47D77DB for ; Mon, 10 Jul 2023 15:00:33 -0700 (PDT) Received: by mail-ej1-x632.google.com with SMTP id a640c23a62f3a-9939fbb7191so995847366b.0 for ; Mon, 10 Jul 2023 15:00:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1689026432; x=1691618432; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=QfZzCSrgvGz9BGtS4wsvtCeE34B7/lj4xCbe4JnIP2Q=; b=IELdvlh9d5ofdqTbS2z8RhDc+X1hOJTuBZf0ZbiY48jlt9lPfkJVZe/Wb+ANJ63yBp 3tbyTl8zoOZLeyBV+rFCAm3x940PyUDWPt8floL10TItqyH+FrB/lAGZJtOMKZtZt+LE AXebzx74RigJZLsDEDj+71KrNZ7AIkM3jSIZhPTcrbtV9iT+zovNNmAfibmyJQAcAG3s gar9gMyoH8kJeZoRZEbQRzmJOZpL3w6TvRyJ05iXZGAzAKumLDVIkbWrwobqCp/EnIhf WQ1hrqXvH6LHXODojzsxf3hq+oeedejJX94Hat/Qve2IHSIsFzVJsuq7ZMf2QnUCPaxw LZVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689026432; x=1691618432; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=QfZzCSrgvGz9BGtS4wsvtCeE34B7/lj4xCbe4JnIP2Q=; b=Hc5iC/Tj1uxg9anbjU7cLHfLHILJ30sQZ92DIlEfdTpHTH+w/EL3LCkk4fRe4LqGfg eGjAOlCg+SHqKfAK7xQ34bGVzddyPPnbbpRwlHvSGPvKpgusGjNbYpdTx2seUD8ikcNW 01NQoBJ0pzMc4Y0efmE9dLQXXgj1lyHbrw/YsR1RLyE62b3bpQqT6l+D88Bhdnu3ihaY +eoLjEV6R2hpwsU2pt9gzvz1zoBkujy9A6JEavQhwmgyanOGlyAYorRrfbCYbVr7OvOX D4hLdONFU/1EMGvNwo5GRXtikgG6amSiJoU870SLbyRg01/G6N34efl5GDGd+CUWmubD 7KPQ== X-Gm-Message-State: ABy/qLb95bRRDly03gcREdgf3wtZexzZbzceYIW66UwDwlpKuuGnmx1W 1iFmvL4R92kZ7ZoQn4t67iZSAgSnfzie5eC0Gajs5A== X-Received: by 2002:a17:907:c29:b0:993:e85c:4ad6 with SMTP id ga41-20020a1709070c2900b00993e85c4ad6mr13335087ejc.7.1689026431632; Mon, 10 Jul 2023 15:00:31 -0700 (PDT) MIME-Version: 1.0 References: <20230707215540.2324998-1-axelrasmussen@google.com> <20230707215540.2324998-2-axelrasmussen@google.com> <20230708180850.bc938ab49fbfb38b83c367c8@linux-foundation.org> In-Reply-To: From: Axel Rasmussen Date: Mon, 10 Jul 2023 14:59:55 -0700 Message-ID: Subject: Re: [PATCH v4 1/8] mm: make PTE_MARKER_SWAPIN_ERROR more general To: Andrew Morton Cc: Alexander Viro , Brian Geffon , Christian Brauner , David Hildenbrand , Gaosheng Cui , Huang Ying , Hugh Dickins , James Houghton , "Jan Alexander Steffens (heftig)" , Jiaqi Yan , Jonathan Corbet , Kefeng Wang , "Liam R. Howlett" , Miaohe Lin , Mike Kravetz , "Mike Rapoport (IBM)" , Muchun Song , Nadav Amit , Naoya Horiguchi , Peter Xu , Ryan Roberts , Shuah Khan , Suleiman Souhlal , Suren Baghdasaryan , "T.J. Alumbaugh" , Yu Zhao , ZhangPeng , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jul 10, 2023 at 10:19=E2=80=AFAM Axel Rasmussen wrote: > > On Sat, Jul 8, 2023 at 6:08=E2=80=AFPM Andrew Morton wrote: > > > > On Fri, 7 Jul 2023 14:55:33 -0700 Axel Rasmussen wrote: > > > > > Future patches will re-use PTE_MARKER_SWAPIN_ERROR to implement > > > UFFDIO_POISON, so make some various preparations for that: > > > > > > First, rename it to just PTE_MARKER_POISONED. The "SWAPIN" can be > > > confusing since we're going to re-use it for something not really > > > related to swap. This can be particularly confusing for things like > > > hugetlbfs, which doesn't support swap whatsoever. Also rename some > > > various helper functions. > > > > > > Next, fix pte marker copying for hugetlbfs. Previously, it would WARN= on > > > seeing a PTE_MARKER_SWAPIN_ERROR, since hugetlbfs doesn't support swa= p. > > > But, since we're going to re-use it, we want it to go ahead and copy = it > > > just like non-hugetlbfs memory does today. Since the code to do this = is > > > more complicated now, pull it out into a helper which can be re-used = in > > > both places. While we're at it, also make it slightly more explicit i= n > > > its handling of e.g. uffd wp markers. > > > > > > For non-hugetlbfs page faults, instead of returning VM_FAULT_SIGBUS f= or > > > an error entry, return VM_FAULT_HWPOISON. For most cases this change > > > doesn't matter, e.g. a userspace program would receive a SIGBUS eithe= r > > > way. But for UFFDIO_POISON, this change will let KVM guests get an MC= E > > > out of the box, instead of giving a SIGBUS to the hypervisor and > > > requiring it to somehow inject an MCE. > > > > > > Finally, for hugetlbfs faults, handle PTE_MARKER_POISONED, and return > > > VM_FAULT_HWPOISON_LARGE in such cases. Note that this can't happen to= day > > > because the lack of swap support means we'll never end up with such a > > > PTE anyway, but this behavior will be needed once such entries *can* > > > show up via UFFDIO_POISON. > > > > > > --- a/include/linux/mm_inline.h > > > +++ b/include/linux/mm_inline.h > > > @@ -523,6 +523,25 @@ static inline bool mm_tlb_flush_nested(struct mm= _struct *mm) > > > return atomic_read(&mm->tlb_flush_pending) > 1; > > > } > > > > > > +/* > > > + * Computes the pte marker to copy from the given source entry into = dst_vma. > > > + * If no marker should be copied, returns 0. > > > + * The caller should insert a new pte created with make_pte_marker()= . > > > + */ > > > +static inline pte_marker copy_pte_marker( > > > + swp_entry_t entry, struct vm_area_struct *dst_vma) > > > +{ > > > + pte_marker srcm =3D pte_marker_get(entry); > > > + /* Always copy error entries. */ > > > + pte_marker dstm =3D srcm & PTE_MARKER_POISONED; > > > + > > > + /* Only copy PTE markers if UFFD register matches. */ > > > + if ((srcm & PTE_MARKER_UFFD_WP) && userfaultfd_wp(dst_vma)) > > > + dstm |=3D PTE_MARKER_UFFD_WP; > > > + > > > + return dstm; > > > +} > > > > Breaks the build with CONFIG_MMU=3Dn (arm allnoconfig). pte_marker isn= 't > > defined. > > > > I'll slap #ifdef CONFIG_MMU around this function, but probably somethng= more > > fine-grained could be used, like CONFIG_PTE_MARKER_UFFD_WP. Please > > consider. > > Whoops, sorry about this. This function "ought" to be in > include/linux/swapops.h where it would be inside a #ifdef CONFIG_MMU > anyway, but it can't be because it uses userfaultfd_wp() so there'd be > a circular include. I think just wrapping it in CONFIG_MMU is the > right way. > > But, this has also made me realize we need to not advertise > UFFDIO_POISON as supported unless we have CONFIG_MMU. I don't want > HAVE_ARCH_USERFAULTFD_WP for that, because it's only enabled on > x86_64, whereas I want to support at least arm64 as well. I don't see > a strong reason not to just use CONFIG_MMU for this too; this feature > depends on the API in swapops.h, which uses that ifdef, so I don't see > a lot of value out of creating a new but equivalent config option. Actually, I'm being silly. CONFIG_USERFAULTFD depends on CONFIG_MMU, so we don't need to worry about most of this. Andrew's fix to just wrap the helper in CONFIG_MMU is enough. > > I'll make the needed changes (and also address Peter's comment above) > and send out a v5. > > > > > btw, both copy_pte_marker() and pte_install_uffd_wp_if_needed() look > > far too large to justify inlining. Please review the desirability of > > this. As far as inlining goes, I'm not opposed to un-inlining this, I was mainly copying that pattern from existing helpers in swapops.h. One question is, if it weren't inline, where should it go? There is no mm/swapops.c which I would say is otherwise the proper place for it. I don't see any other good place for the functions to go. The one I'm introducing isn't userfaultfd-specific so userfaultfd.c seems wrong. > > > >