Date: Tue, 6 Jul 2021 11:35:18 -0400
From: Peter Xu
To: Alistair Popple
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Mike Kravetz, "Kirill A .
 Shutemov", Jason Gunthorpe, Hugh Dickins, Matthew Wilcox, Andrew Morton,
 Miaohe Lin, Jerome Glisse, Nadav Amit, Axel Rasmussen, Andrea Arcangeli,
 Mike Rapoport
Subject: Re: [PATCH v3 11/27] shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
References: <20210527201927.29586-1-peterx@redhat.com> <1857347.At2d1zFpmm@nvdebian> <3895609.yFXQBJUcoq@nvdebian>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <3895609.yFXQBJUcoq@nvdebian>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 06, 2021 at 03:40:42PM +1000, Alistair Popple wrote:
> > > > > > > >  struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> > > > > > > >                               pte_t pte);
> > > > > > > >  struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
> > > > > > > > diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> > > > > > > > index 355ea1ee32bd..c29a6ef3a642 100644
> > > > > > > > --- a/include/linux/mm_inline.h
> > > > > > > > +++ b/include/linux/mm_inline.h
> > > > > > > > @@ -4,6 +4,8 @@
> > > > > > > >
> > > > > > > >  #include
> > > > > > > >  #include
> > > > > > > > +#include
> > > > > > > > +#include
> > > > > > > >
> > > > > > > >  /**
> > > > > > > >   * page_is_file_lru - should the page be on a file LRU or anon LRU?
> > > > > > > > @@ -104,4 +106,45 @@ static __always_inline void del_page_from_lru_list(struct page *page,
> > > > > > > >          update_lru_size(lruvec, page_lru(page), page_zonenum(page),
> > > > > > > >                          -thp_nr_pages(page));
> > > > > > > >  }
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * If this pte is wr-protected by uffd-wp in any form, arm the special pte to
> > > > > > > > + * replace a none pte. NOTE! This should only be called when *pte is already
> > > > > > > > + * cleared so we will never accidentally replace something valuable.
> > > > > > > > + * Meanwhile none pte also means we are not demoting the pte so if tlb flushed then we
> > > > > > > > + * don't need to do it again; otherwise if tlb flush is postponed then it's
> > > > > > > > + * even better.
> > > > > > > > + *
> > > > > > > > + * Must be called with pgtable lock held.
> > > > > > > > + */
> > > > > > > > +static inline void
> > > > > > > > +pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
> > > > > > > > +                              pte_t *pte, pte_t pteval)
> > > > > > > > +{
> > > > > > > > +#ifdef CONFIG_USERFAULTFD
> > > > > > > > +        bool arm_uffd_pte = false;
> > > > > > > > +
> > > > > > > > +        /* The current status of the pte should be "cleared" before calling */
> > > > > > > > +        WARN_ON_ONCE(!pte_none(*pte));
> > > > > > > > +
> > > > > > > > +        if (vma_is_anonymous(vma))
> > > > > > > > +                return;
> > > > > > > > +
> > > > > > > > +        /* A uffd-wp wr-protected normal pte */
> > > > > > > > +        if (unlikely(pte_present(pteval) && pte_uffd_wp(pteval)))
> > > > > > > > +                arm_uffd_pte = true;
> > > > > > > > +
> > > > > > > > +        /*
> > > > > > > > +         * A uffd-wp wr-protected swap pte. Note: this should even work for
> > > > > > > > +         * pte_swp_uffd_wp_special() too.
> > > > > > > > +         */
> > > > > > >
> > > > > > > I'm probably missing something but when can we actually have this case and why
> > > > > > > would we want to leave a special pte behind? From what I can tell this is
> > > > > > > called from try_to_unmap_one() where this won't be true or from zap_pte_range()
> > > > > > > when not skipping swap pages.
> > > > > >
> > > > > > Yes this is a good question..
> > > > > >
> > > > > > Initially I made this function make sure I cover all forms of uffd-wp bit, that
> > > > > > contains both swap and present ptes; imho that's pretty safe. However for
> > > > > > !anonymous cases we don't keep swap entry inside pte even if swapped out, as
> > > > > > they should reside in shmem page cache indeed.
> > > > > > The only missing piece seems to
> > > > > > be the device private entries as you also spotted below.
> > > > >
> > > > > Yes, I think it's *probably* safe although I don't yet have a strong opinion
> > > > > here ...
> > > > >
> > > > > > > > +        if (unlikely(is_swap_pte(pteval) && pte_swp_uffd_wp(pteval)))
> > > > >
> > > > > ... however if this can never happen would a WARN_ON() be better? It would also
> > > > > mean you could remove arm_uffd_pte.
> > > >
> > > > Hmm, after a second thought I think we can't make it a WARN_ON_ONCE().. this
> > > > can still be useful for private mapping of shmem files: in that case we'll have
> > > > swap entry stored in pte not page cache, so after page reclaim it will contain
> > > > a valid swap entry, while it's still "!anonymous". [1]
> > >
> > > There's something (probably obvious) I must still be missing here. During
> > > reclaim won't a private shmem mapping still have a present pteval here?
> > > Therefore it won't trigger this case - the uffd wp bit is set when the swap
> > > entry is established further down in try_to_unmap_one() right?
> >
> > I agree if it's at the point when it get reclaimed, however what if we zap a
> > pte of a page already got reclaimed? It should have the swap pte installed,
> > imho, which will have "is_swap_pte(pteval) && pte_swp_uffd_wp(pteval)"==true.
>
> Apologies for the delay getting back to this, I hope to find some more time
> to look at this again this week.

No problem, please take your time on reviewing the series.

>
> I guess what I am missing is why we care about a swap pte for a reclaimed page
> getting zapped. I thought that would imply the mapping was getting torn down,
> although I suppose in that case you still want the uffd-wp to apply in case a
> new mapping appears there?

For the torn down case it'll always have ZAP_FLAG_DROP_FILE_UFFD_WP set, so
pte_install_uffd_wp_if_needed() won't be called, as zap_drop_file_uffd_wp()
will return true:

    static inline void
    zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
                                  unsigned long addr, pte_t *pte,
                                  struct zap_details *details, pte_t pteval)
    {
            if (zap_drop_file_uffd_wp(details))
                    return;

            pte_install_uffd_wp_if_needed(vma, addr, pte, pteval);
    }

As you can see, it's non-trivial to fully digest all the caller stacks of it.
What I wanted to do with pte_install_uffd_wp_if_needed() is simply to provide a
helper that can convert any form of uffd-wp pte into a pte marker before it is
set as a none pte. Since uffd-wp can exist in two forms (either present, or
swap), covering both forms (and for the swap form also covering the uffd-wp
special pte itself) is a very clear idea and easy for me to understand.

I don't even need to worry about who is calling it, or which case can be a swap
pte and which case must not - we just call it when we want to persist the
uffd-wp bit (after a pte got cleared).

That's why in all cases I still prefer to keep it as is, as it just makes
things straightforward to me.

Thanks,

-- 
Peter Xu
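
For illustration only, the decision discussed in this thread can be modelled in plain userspace C roughly as below. This is a sketch, not kernel code: every model_* name and type is a hypothetical stand-in, the drop_file_uffd_wp field merely models ZAP_FLAG_DROP_FILE_UFFD_WP, and treating a NULL details pointer as "flag not set" is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of these are kernel types. */
enum model_pte_kind { MODEL_PTE_NONE, MODEL_PTE_PRESENT, MODEL_PTE_SWAP };

struct model_pte {
    enum model_pte_kind kind;
    bool uffd_wp;               /* uffd-wp bit, in present or swap form */
};

struct model_vma {
    bool anonymous;             /* models vma_is_anonymous(vma) */
};

struct model_zap_details {
    bool drop_file_uffd_wp;     /* models ZAP_FLAG_DROP_FILE_UFFD_WP */
};

/*
 * Models the check in pte_install_uffd_wp_if_needed(): returns true when
 * the old pte value carried uffd-wp in either form on a non-anonymous vma,
 * i.e. when a marker should replace the cleared pte.
 */
bool model_should_arm_marker(const struct model_vma *vma, struct model_pte old)
{
    if (vma->anonymous)
        return false;

    /* A uffd-wp wr-protected present pte */
    if (old.kind == MODEL_PTE_PRESENT && old.uffd_wp)
        return true;

    /* A uffd-wp wr-protected swap pte */
    if (old.kind == MODEL_PTE_SWAP && old.uffd_wp)
        return true;

    return false;
}

/*
 * Models zap_install_uffd_wp_if_needed(): when the mapping is being torn
 * down the drop flag is set and nothing is persisted. A NULL details is
 * treated as "flag not set" (assumption of this sketch).
 */
bool model_zap_should_arm_marker(const struct model_vma *vma,
                                 const struct model_zap_details *details,
                                 struct model_pte old)
{
    if (details && details->drop_file_uffd_wp)
        return false;

    return model_should_arm_marker(vma, old);
}
```

The point of keeping both branches in the inner helper is the one made above: the caller does not need to know whether a swap-form pte can reach it, it only asks whether the uffd-wp bit has to be persisted.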