2022-09-22 17:13:25

by Logan Gunthorpe

Subject: [PATCH v10 0/8] Userspace P2PDMA with O_DIRECT NVMe devices

Hi,

This is the latest P2PDMA userspace patch set. This version includes
some cleanup based on feedback from the last posting [1].

This patch set enables userspace P2PDMA by allowing userspace to mmap()
allocated chunks of the CMB (the NVMe Controller Memory Buffer). The
resulting VMA can be passed only to O_DIRECT IO on NVMe-backed files or
block devices. A flag is added to GUP() in Patch 1, then Patches 2 through
6 wire this flag up based on whether the block queue indicates P2PDMA
support. Patch 7 creates the sysfs resource that can hand out the VMAs, and
Patch 8 adds brief documentation for the new interface.
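From userspace, the flow described above might look roughly like the following. This is a minimal sketch, not code from the series: it assumes the allocator is exposed as a p2pmem/allocate attribute in the PCI device's sysfs directory (going by the Patch 8 title), that each mmap() of that file at offset 0 hands out a fresh chunk, and that len and off satisfy the device's O_DIRECT alignment rules. The function name p2p_direct_read() and every path used with it are hypothetical.

```c
#define _GNU_SOURCE /* needed for O_DIRECT with glibc */
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one chunk of P2P memory from the (assumed)
 * p2pmem/allocate sysfs attribute and fill it with an O_DIRECT read from
 * an NVMe block device or file. Returns the number of bytes read, or -1
 * with errno set. Assumes len is a multiple of the page size and that len
 * and off meet the device's O_DIRECT alignment requirements.
 */
static ssize_t p2p_direct_read(const char *alloc_path, const char *blk_path,
			       size_t len, off_t off)
{
	void *buf = MAP_FAILED;
	ssize_t ret = -1;
	int bfd = -1;
	int saved_errno;
	int afd;

	afd = open(alloc_path, O_RDWR);
	if (afd < 0)
		return -1;

	/* Assumption: each mmap() of this file allocates a new P2P chunk. */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, afd, 0);
	if (buf == MAP_FAILED)
		goto out;

	/* Per the cover letter, the VMA is only usable for O_DIRECT I/O. */
	bfd = open(blk_path, O_RDONLY | O_DIRECT);
	if (bfd < 0)
		goto out;

	/* The device DMAs into the mapped P2P memory rather than host RAM. */
	ret = pread(bfd, buf, len, off);

out:
	saved_errno = errno;	/* keep close()/munmap() from clobbering errno */
	if (bfd >= 0)
		close(bfd);
	if (buf != MAP_FAILED)
		munmap(buf, len);
	close(afd);
	errno = saved_errno;
	return ret;
}
```

For example, p2p_direct_read("/sys/bus/pci/devices/0000:03:00.0/p2pmem/allocate", "/dev/nvme0n1", 1 << 21, 0) would, under these assumptions, read 2 MiB from the start of the namespace into the mapped chunk; the PCI address and device node are placeholders.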

Feedback welcome.

This series is based on v6.0-rc6. A git branch is available here:

https://github.com/sbates130272/linux-p2pmem/ p2pdma_user_cmb_v10

Thanks,

Logan

[1] https://lkml.kernel.org/r/[email protected]

--

Changes since v8:
- Rebased onto v6.0-rc6
- Reworked the iov_iter changes to better reuse code and to name the
functions without the _flags() suffix (per Christoph)
- Renamed a number of flags variables to gup_flags (per John)
- Minor fixups to the last documentation patch (from Greg and John)

Changes since v7:
- Rebased onto v6.0-rc2, which included reworking the iov_iter patch
due to changes there
- Dropped the char device mmap implementation in favour of a
sysfs-based interface (per Christoph)

Changes since v6:
- Rebased onto v5.19-rc1
- Reworked how the pages are stored in the VMA (per Jason's suggestion)

Changes since v5:
- Rebased onto v5.18-rc1, which includes Christoph's cleanup to
free_zone_device_page() (similar to Ralph's patch).
- Fix bug with concurrent first calls to pci_p2pdma_vma_fault()
that caused a double allocation and lost p2p memory. Noticed
by Andrew Maier.
- Collected a Reviewed-by tag from Chaitanya.
- Numerous minor fixes to commit messages

--

Logan Gunthorpe (8):
mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages
iov_iter: introduce iov_iter_get_pages_[alloc_]flags()
block: add check when merging zone device pages
lib/scatterlist: add check when merging zone device pages
block: set FOLL_PCI_P2PDMA in __bio_iov_iter_get_pages()
block: set FOLL_PCI_P2PDMA in bio_map_user_iov()
PCI/P2PDMA: Allow userspace VMA allocations through sysfs
ABI: sysfs-bus-pci: add documentation for p2pmem allocate

Documentation/ABI/testing/sysfs-bus-pci | 10 ++
block/bio.c | 11 ++-
block/blk-map.c | 7 +-
drivers/pci/p2pdma.c | 124 ++++++++++++++++++++++++
include/linux/mm.h | 1 +
include/linux/mmzone.h | 24 +++++
include/linux/uio.h | 6 ++
lib/iov_iter.c | 32 ++++--
lib/scatterlist.c | 25 +++--
mm/gup.c | 22 ++++-
10 files changed, 240 insertions(+), 22 deletions(-)


base-commit: 521a547ced6477c54b4b0cc206000406c221b4d6
--
2.30.2


2022-09-22 17:13:29

by Logan Gunthorpe

Subject: [PATCH v10 3/8] block: add check when merging zone device pages

Consecutive zone device pages should not be merged into the same sgl
or bvec segment with other types of pages or if they belong to different
pgmaps. Otherwise, getting the pgmap of a given segment is not possible
without scanning the entire segment.

Add a helper, zone_device_pages_have_same_pgmap(), to determine if zone
device pages are mergeable and use it in page_is_mergeable(). The helper
returns true if either neither page is a zone device page or both pages
are zone device pages with the same pgmap.

Signed-off-by: Logan Gunthorpe <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
---
block/bio.c | 2 ++
include/linux/mmzone.h | 24 ++++++++++++++++++++++++
2 files changed, 26 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 3d3a2678fea2..969607bc1f4d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -865,6 +865,8 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
 		return false;
 	if (xen_domain() && !xen_biovec_phys_mergeable(bv, page))
 		return false;
+	if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
+		return false;
 
 	*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
 	if (*same_page)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e24b40c52468..2c31915b057e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -794,6 +794,25 @@ static inline bool is_zone_device_page(const struct page *page)
 {
 	return page_zonenum(page) == ZONE_DEVICE;
 }
+
+/*
+ * Consecutive zone device pages should not be merged into the same sgl
+ * or bvec segment with other types of pages or if they belong to different
+ * pgmaps. Otherwise getting the pgmap of a given segment is not possible
+ * without scanning the entire segment. This helper returns true if either
+ * neither page is a zone device page or both pages are zone device pages
+ * with the same pgmap.
+ */
+static inline bool zone_device_pages_have_same_pgmap(const struct page *a,
+						     const struct page *b)
+{
+	if (is_zone_device_page(a) != is_zone_device_page(b))
+		return false;
+	if (!is_zone_device_page(a))
+		return true;
+	return a->pgmap == b->pgmap;
+}
+
 extern void memmap_init_zone_device(struct zone *, unsigned long,
 				    unsigned long, struct dev_pagemap *);
 #else
@@ -801,6 +820,11 @@ static inline bool is_zone_device_page(const struct page *page)
 {
 	return false;
 }
+static inline bool zone_device_pages_have_same_pgmap(const struct page *a,
+						     const struct page *b)
+{
+	return true;
+}
 #endif
 
 static inline bool folio_is_zone_device(const struct folio *folio)
--
2.30.2
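The merge rule in this patch can be illustrated with a small userspace model. It is a sketch only: struct mock_page, struct mock_pgmap, mock_same_pgmap() and count_segments() are hypothetical stand-ins for struct page, struct dev_pagemap, zone_device_pages_have_same_pgmap() and the bvec merge loop, and the model assumes every page is physically contiguous with its predecessor so that the pgmap check is the only thing that splits segments. Each new page is compared with the first page of the current segment, mirroring how page_is_mergeable() passes bv->bv_page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct dev_pagemap and struct page. */
struct mock_pgmap {
	int id;
};

struct mock_page {
	bool zone_device;		/* models is_zone_device_page() */
	const struct mock_pgmap *pgmap;	/* only meaningful for zone device pages */
};

/* Same decision logic as zone_device_pages_have_same_pgmap() in the patch. */
static bool mock_same_pgmap(const struct mock_page *a,
			    const struct mock_page *b)
{
	if (a->zone_device != b->zone_device)
		return false;
	if (!a->zone_device)
		return true;
	return a->pgmap == b->pgmap;
}

/*
 * Count the segments produced by greedily merging n physically contiguous
 * pages, starting a new segment whenever the pgmap check fails against the
 * first page of the current segment.
 */
static size_t count_segments(const struct mock_page *pages, size_t n)
{
	size_t segs = 0;
	size_t start = 0;

	for (size_t i = 0; i < n; i++) {
		if (i == 0 || !mock_same_pgmap(&pages[start], &pages[i])) {
			segs++;
			start = i;
		}
	}
	return segs;
}
```

With pages {normal, normal, pgmap A, pgmap A, pgmap B, normal}, count_segments() returns 4. Because every page in a segment shares the same pgmap class, looking at the first page is enough to know the pgmap of the whole segment, which is the point of the check.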

2022-09-23 06:14:45

by Christoph Hellwig

Subject: Re: [PATCH v10 0/8] Userspace P2PDMA with O_DIRECT NVMe devices

Thanks, the entire series looks good to me now:

Reviewed-by: Christoph Hellwig <[email protected]>

Given that this is spread all over, what tree do we want to take it
through?

2022-09-23 08:27:47

by Greg KH

Subject: Re: [PATCH v10 0/8] Userspace P2PDMA with O_DIRECT NVMe devices

On Thu, Sep 22, 2022 at 10:39:18AM -0600, Logan Gunthorpe wrote:
> [...]

Looks good to me, thanks for sticking with it.

greg k-h

2022-09-23 16:18:45

by Logan Gunthorpe

Subject: Re: [PATCH v10 0/8] Userspace P2PDMA with O_DIRECT NVMe devices

On 2022-09-23 00:01, Christoph Hellwig wrote:
> Thanks, the entire series looks good to me now:
>
> Reviewed-by: Christoph Hellwig <[email protected]>
>
> Given that this is spread all over, what tree do we want to take it
> through?

Yes, while this is ostensibly a feature for NVMe, it turns out we didn't
need to touch any NVMe code at all.

In my mind, the patch most likely to have conflicts is the iov_iter
patch, as there has been a lot of churn there in the last few cycles and
discussions are ongoing.

There are 2 PCI patches, but Bjorn's aware of them and has acked them.
I'm also fairly confident this shouldn't conflict with anything in his tree.

Besides that, there is one mm/gup patch, which is the next most likely to
conflict, plus one scatterlist patch and three block layer patches, which
have largely been stable when I've done rebases.

Logan