Received: by 2002:a05:6358:1087:b0:cb:c9d3:cd90 with SMTP id j7csp3042083rwi; Fri, 21 Oct 2022 10:45:31 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6u0jMvb7DBuJ1ISETlvatq/ou+8S35N006qLwBnFM/PQtfHjqIWbexZaD+xVkdSEJ52E2K X-Received: by 2002:a17:90a:9381:b0:20a:79b7:766a with SMTP id q1-20020a17090a938100b0020a79b7766amr60319499pjo.33.1666374321400; Fri, 21 Oct 2022 10:45:21 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666374321; cv=none; d=google.com; s=arc-20160816; b=0ebuvYq6vMfX8jZkX2iYejtCQ5LEgzcgnw2E+rIi4iouO2rpjCAsarU8vWf3dsDYIE 9N4IbrKxRb4r13A/PsGqNvxG4JiJ02NuTh0Kxn7EKwz6o/g6sIvHCrDnkGAtx2R8WkXo RsmRIOF8zGTa9wVLZTiNgMkW2TCILJtj7c4zRT3WfbZpjcmyaHlSgO0YCuRv+ShzwPA1 81H3/2jwDY44t3Cem6HLoG0+M6urcoaUVG8/Gs+0QycT/75Boe0PvdOYMWdDtcfiCDgQ 3TANIL3r6gH9Z4H+ewZV5Drb1XTHYzAV8sKTkDI9n3PYGtVxDISTA6wUfF5jwXOQyNEP iJPQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:subject:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:cc:to:from:dkim-signature; bh=OC676xm4n+4quUWve6+HSNFg0hwcXHADJsbuIB5KOEM=; b=Z2FkW1wVJJ1U/9Bt7y3l5x8pY++eoEUMx+WsR8Qsx/PjZ1AdtlfjKxHUBHcuYVA2Pf zNXvPQFUMveYq3JQqZvN2F6wxCNg802oSFECPZSfT0K8uILub5j7IxHsNhAtBejlyosZ Mw4PDVanSCmqP/dfyl3Zz21oQbemP35y3mJqj4pf7APYgX4E1pNFOSr263p/yaYBlxFf FZE4ln7bgQn9amTY2XY1Bm+61VMmig1UDOoWjairx1Npj1SABoQTSeeBzOGMlhm2FL9y qdWIiVEX6g93CPzyVcclLk2R1BalpNaQk0IuE6Z8f2YQorxvZpL2nQeQWjwtYULMekFe sWdw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@deltatee.com header.s=20200525 header.b=Wsxq4q38; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=deltatee.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u13-20020a17090341cd00b00178a33f8bb4si30434901ple.328.2022.10.21.10.45.09; Fri, 21 Oct 2022 10:45:21 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@deltatee.com header.s=20200525 header.b=Wsxq4q38; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=deltatee.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230364AbiJURlf (ORCPT + 99 others); Fri, 21 Oct 2022 13:41:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59770 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229592AbiJURl3 (ORCPT ); Fri, 21 Oct 2022 13:41:29 -0400 Received: from ale.deltatee.com (ale.deltatee.com [204.191.154.188]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F3ED824AE35; Fri, 21 Oct 2022 10:41:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=deltatee.com; s=20200525; h=Subject:MIME-Version:References:In-Reply-To: Message-Id:Date:Cc:To:From:content-disposition; bh=OC676xm4n+4quUWve6+HSNFg0hwcXHADJsbuIB5KOEM=; b=Wsxq4q38mUEjvwfxvFF14IWzOU 5VlOWiPn8393f1dAxy8toKeXn6XeHrR/IUTe8OpAg2VL4FONatGB0R5rvYnwFWE1BTojlUpH2WfJW 8Z6THGFo+tIqoM5h4LswsjbKqa5dblgAGUcLcNNggZITwNoKaKq9alo493IKjoWM+/9ZiXHbXPtJY LAA7yWQZpTmCk1DJ2q0L2uYXuCNRSifxnQTC8AFv5l/wsP3k9oTTSC9OAUUZPtZWy3uAjMtFP1FCF tQWboeFYCJfBd8htw5d0X8tTLHSMTbe/6E3W00wy5pRWsT/5+JCkf47fjtvYawICHeGbAdpz+t893 wKG7s7/g==; Received: from cgy1-donard.priv.deltatee.com ([172.16.1.31]) by ale.deltatee.com with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1olw1I-00DoHz-T8; Fri, 21 Oct 2022 11:41:21 -0600 Received: from gunthorp by cgy1-donard.priv.deltatee.com with local (Exim 4.94.2) (envelope-from ) id 1olw1F-0001t9-OW; Fri, 21 Oct 2022 11:41:17 -0600 From: Logan Gunthorpe To: linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, linux-pci@vger.kernel.org, linux-mm@kvack.org Cc: Christoph Hellwig , Greg Kroah-Hartman , Dan Williams , Jason Gunthorpe , =?UTF-8?q?Christian=20K=C3=B6nig?= , John Hubbard , Don Dutile , Matthew Wilcox , Daniel Vetter , Minturn Dave B , Jason Ekstrand , Dave Hansen , Xiong Jianxin , Bjorn Helgaas , Ira Weiny , Robin Murphy , Martin Oliveira , Chaitanya Kulkarni , Ralph Campbell , Stephen Bates , Logan Gunthorpe Date: Fri, 21 Oct 2022 11:41:09 -0600 Message-Id: <20221021174116.7200-3-logang@deltatee.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221021174116.7200-1-logang@deltatee.com> References: <20221021174116.7200-1-logang@deltatee.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-SA-Exim-Connect-IP: 172.16.1.31 X-SA-Exim-Rcpt-To: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-pci@vger.kernel.org, linux-mm@kvack.org, hch@lst.de, gregkh@linuxfoundation.org, jgg@ziepe.ca, christian.koenig@amd.com, ddutile@redhat.com, willy@infradead.org, daniel.vetter@ffwll.ch, jason@jlekstrand.net, dave.hansen@linux.intel.com, helgaas@kernel.org, dan.j.williams@intel.com, dave.b.minturn@intel.com, jianxin.xiong@intel.com, ira.weiny@intel.com, robin.murphy@arm.com, martin.oliveira@eideticom.com, ckulkarnilinux@gmail.com, jhubbard@nvidia.com, rcampbell@nvidia.com, sbates@raithlin.com, logang@deltatee.com X-SA-Exim-Mail-From: gunthorp@deltatee.com X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_PASS,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 Subject: [PATCH v11 2/9] mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages X-SA-Exim-Version: 4.2.1 (built Sat, 13 Feb 2021 17:57:42 +0000) X-SA-Exim-Scanned: Yes (on ale.deltatee.com) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org GUP Callers that expect PCI P2PDMA pages can now set FOLL_PCI_P2PDMA to allow obtaining P2PDMA pages. If GUP is called without the flag and a P2PDMA page is found, it will return an error in try_grab_page() or try_grab_folio(). The check is safe to do before taking the reference to the page in both cases seeing the page should be protected by either the appropriate ptl or mmap_lock; or the gup fast guarantees preventing TLB flushes. try_grab_folio() has one call site that WARNs on failure and cannot actually deal with the failure of this function (it seems it will get into an infinite loop). Expand the comment there to document a couple more conditions on why it will not fail. FOLL_PCI_P2PDMA cannot be set if FOLL_LONGTERM is set. This is to copy fsdax until pgmap refcounts are fixed (see the link below for more information). Link: https://lkml.kernel.org/r/Yy4Ot5MoOhsgYLTQ@ziepe.ca Signed-off-by: Logan Gunthorpe --- include/linux/mm.h | 1 + mm/gup.c | 19 ++++++++++++++++++- mm/hugetlb.c | 6 ++++-- 3 files changed, 23 insertions(+), 3 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 62a91dc1272b..6b081a8dcf88 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2958,6 +2958,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ #define FOLL_PIN 0x40000 /* pages must be released via unpin_user_page */ #define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */ +#define FOLL_PCI_P2PDMA 0x100000 /* allow returning PCI P2PDMA pages */ /* * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each diff --git a/mm/gup.c b/mm/gup.c index e2f447446384..29e28f020f0b 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -123,6 +123,9 @@ static inline struct folio *try_get_folio(struct page *page, int refs) */ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) { + if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page))) + return NULL; + if (flags & FOLL_GET) return try_get_folio(page, refs); else if (flags & FOLL_PIN) { @@ -216,6 +219,9 @@ int __must_check try_grab_page(struct page *page, unsigned int flags) if (WARN_ON_ONCE(folio_ref_count(folio) <= 0)) return -ENOMEM; + if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page))) + return -EREMOTEIO; + if (flags & FOLL_GET) folio_ref_inc(folio); else if (flags & FOLL_PIN) { @@ -631,6 +637,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, page = ERR_PTR(ret); goto out; } + /* * We need to make the page accessible if and only if we are going * to access its content (the FOLL_PIN case). Please see @@ -1060,6 +1067,9 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma)) return -EOPNOTSUPP; + if ((gup_flags & FOLL_LONGTERM) && (gup_flags & FOLL_PCI_P2PDMA)) + return -EOPNOTSUPP; + if (vma_is_secretmem(vma)) return -EFAULT; @@ -2536,6 +2546,12 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr, undo_dev_pagemap(nr, nr_start, flags, pages); break; } + + if (!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)) { + undo_dev_pagemap(nr, nr_start, flags, pages); + break; + } + SetPageReferenced(page); pages[*nr] = page; if (unlikely(try_grab_page(page, flags))) { @@ -3020,7 +3036,8 @@ static int internal_get_user_pages_fast(unsigned long start, if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM | FOLL_FORCE | FOLL_PIN | FOLL_GET | - FOLL_FAST_ONLY | FOLL_NOFAULT))) + FOLL_FAST_ONLY | FOLL_NOFAULT | + FOLL_PCI_P2PDMA))) return -EINVAL; if (gup_flags & FOLL_PIN) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e8d01a19ce46..a55adfbacedb 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6342,8 +6342,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, * tables. If the huge page is present, then the tail * pages must also be present. The ptl prevents the * head page and tail pages from being rearranged in - * any way. So this page must be available at this - * point, unless the page refcount overflowed: + * any way. As this is hugetlb, the pages will never + * be p2pdma or not longterm pinable. So this page + * must be available at this point, unless the page + * refcount overflowed: */ if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs, flags))) { -- 2.30.2