2022-07-23 02:05:15

by Nicolin Chen

[permalink] [raw]
Subject: [PATCH v4 00/10] cover-letter: Update vfio_pin/unpin_pages API

This is a preparatory series for IOMMUFD v2 patches. It prepares for
replacing vfio_iommu_type1 implementations of vfio_pin/unpin_pages()
with IOMMUFD version.

There's a gap between these two versions: the vfio_iommu_type1 version
inputs a non-contiguous PFN list and outputs another PFN list for the
pinned physical page list, while the IOMMUFD version only supports a
contiguous address input by accepting the starting IO virtual address
of a set of pages to pin and by outputting to a physical page list.

The nature of existing callers mostly aligns with the IOMMUFD version,
except s390's vfio_ccw_cp code where some additional change is needed
along with this series. Overall, updating to "iova" and "phys_page"
does improve the caller side to some extent.

Also fix a misuse of physical address and virtual address in the s390's
crypto code. And update the input naming at the adjacent vfio_dma_rw().

This is on github:
https://github.com/nicolinc/iommufd/commits/vfio_pin_pages-v4

Terrence has tested this series on i915; Eric has tested on s390.

Thanks!

Changelog
v4:
* Dropped double-shifting at two gvt_unpin_guest_page calls, fixing
a bug that's discovered by Alex
* Added Reviewed-by from Anthony Krowiak
* Rebased on top of linux-vfio's next
v3: https://lore.kernel.org/kvm/[email protected]/
* Added a patch to replace roundup with DIV_ROUND_UP in i915 gvt
* Dropped the "driver->ops->unpin_pages" and NULL checks in PATCH-1
* Changed to use WARN_ON and separate into lines in PATCH-1
* Replaced "guest" words with "user" and fix typo in PATCH-5
* Updated commit log of PATCH-1, PATCH-6, and PATCH-10
* Added Reviewed/Acked-by from Christoph, Jason, Kirti, Kevin and Eric
* Added Tested-by from Terrence (i915) and Eric (s390)
v2: https://lore.kernel.org/kvm/[email protected]/
* Added a patch to make vfio_unpin_pages return void
* Added two patches to remove PFN list from two s390 callers
* Renamed "phys_page" parameter to "pages" for vfio_pin_pages
* Updated commit log of kmap_local_page() patch
* Added Harald's "Reviewed-by" to pa_ind patch
* Rebased on top of Alex's extern removal path
v1: https://lore.kernel.org/kvm/[email protected]/

Nicolin Chen (10):
vfio: Make vfio_unpin_pages() return void
drm/i915/gvt: Replace roundup with DIV_ROUND_UP
vfio/ap: Pass in physical address of ind to ap_aqic()
vfio/ccw: Only pass in contiguous pages
vfio: Pass in starting IOVA to vfio_pin/unpin_pages API
vfio/ap: Change saved_pfn to saved_iova
vfio/ccw: Change pa_pfn list to pa_iova list
vfio: Rename user_iova of vfio_dma_rw()
vfio/ccw: Add kmap_local_page() for memcpy
vfio: Replace phys_pfn with pages for vfio_pin_pages()

.../driver-api/vfio-mediated-device.rst | 6 +-
arch/s390/include/asm/ap.h | 6 +-
drivers/gpu/drm/i915/gvt/kvmgt.c | 45 ++--
drivers/s390/cio/vfio_ccw_cp.c | 195 +++++++++++-------
drivers/s390/crypto/ap_queue.c | 2 +-
drivers/s390/crypto/vfio_ap_ops.c | 54 +++--
drivers/s390/crypto/vfio_ap_private.h | 4 +-
drivers/vfio/vfio.c | 54 ++---
drivers/vfio/vfio.h | 8 +-
drivers/vfio/vfio_iommu_type1.c | 45 ++--
include/linux/vfio.h | 9 +-
11 files changed, 213 insertions(+), 215 deletions(-)

--
2.17.1


2022-07-23 02:05:28

by Nicolin Chen

[permalink] [raw]
Subject: [PATCH v4 01/10] vfio: Make vfio_unpin_pages() return void

There's only one caller that checks its return value with a WARN_ON_ONCE,
while all other callers don't check the return value at all. Above that,
an undo function should not fail. So, simplify the API to return void by
embedding similar WARN_ONs.

Also for users to pinpoint which condition fails, separate WARN_ON lines,
yet remove the "driver->ops->unpin_pages" check, since it's unreasonable
for callers to unpin on something totally random that wasn't even pinned.
And remove NULL pointer checks for they would trigger oops vs. warnings.
Note that npage is already validated in the vfio core, thus drop the same
check in the type1 code.

Suggested-by: Christoph Hellwig <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kirti Wankhede <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
.../driver-api/vfio-mediated-device.rst | 2 +-
drivers/gpu/drm/i915/gvt/kvmgt.c | 5 +----
drivers/vfio/vfio.c | 21 +++++++------------
drivers/vfio/vfio.h | 2 +-
drivers/vfio/vfio_iommu_type1.c | 15 ++++++-------
include/linux/vfio.h | 4 ++--
6 files changed, 18 insertions(+), 31 deletions(-)

diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
index 1c57815619fd..b0fdf76b339a 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -265,7 +265,7 @@ driver::
int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
int npage, int prot, unsigned long *phys_pfn);

- int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
+ void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
int npage);

These functions call back into the back-end IOMMU module by using the pin_pages
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index ecd5bb37b63a..4d32a2748958 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -231,18 +231,15 @@ static void intel_gvt_cleanup_vgpu_type_groups(struct intel_gvt *gvt)
static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
unsigned long size)
{
- struct drm_i915_private *i915 = vgpu->gvt->gt->i915;
int total_pages;
int npage;
- int ret;

total_pages = roundup(size, PAGE_SIZE) / PAGE_SIZE;

for (npage = 0; npage < total_pages; npage++) {
unsigned long cur_gfn = gfn + npage;

- ret = vfio_unpin_pages(&vgpu->vfio_device, &cur_gfn, 1);
- drm_WARN_ON(&i915->drm, ret != 1);
+ vfio_unpin_pages(&vgpu->vfio_device, &cur_gfn, 1);
}
}

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index b3ce8073cfb1..92b10aafae28 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1983,31 +1983,24 @@ EXPORT_SYMBOL(vfio_pin_pages);
* PFNs should not be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
* @npage [in] : count of elements in user_pfn array. This count should not
* be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
- * Return error or number of pages unpinned.
*/
-int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
- int npage)
+void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
+ int npage)
{
struct vfio_container *container;
struct vfio_iommu_driver *driver;
- int ret;

- if (!user_pfn || !npage || !vfio_assert_device_open(device))
- return -EINVAL;
+ if (WARN_ON(npage <= 0 || npage > VFIO_PIN_PAGES_MAX_ENTRIES))
+ return;

- if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
- return -E2BIG;
+ if (WARN_ON(!vfio_assert_device_open(device)))
+ return;

/* group->container cannot change while a vfio device is open */
container = device->group->container;
driver = container->iommu_driver;
- if (likely(driver && driver->ops->unpin_pages))
- ret = driver->ops->unpin_pages(container->iommu_data, user_pfn,
- npage);
- else
- ret = -ENOTTY;

- return ret;
+ driver->ops->unpin_pages(container->iommu_data, user_pfn, npage);
}
EXPORT_SYMBOL(vfio_unpin_pages);

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 4a7db1f3c33e..6a8424b407c7 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -53,7 +53,7 @@ struct vfio_iommu_driver_ops {
unsigned long *user_pfn,
int npage, int prot,
unsigned long *phys_pfn);
- int (*unpin_pages)(void *iommu_data,
+ void (*unpin_pages)(void *iommu_data,
unsigned long *user_pfn, int npage);
void (*register_device)(void *iommu_data,
struct vfio_device *vdev);
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 026a1d2553a2..e49fbe9968ef 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -949,20 +949,16 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
return ret;
}

-static int vfio_iommu_type1_unpin_pages(void *iommu_data,
- unsigned long *user_pfn,
- int npage)
+static void vfio_iommu_type1_unpin_pages(void *iommu_data,
+ unsigned long *user_pfn, int npage)
{
struct vfio_iommu *iommu = iommu_data;
bool do_accounting;
int i;

- if (!iommu || !user_pfn || npage <= 0)
- return -EINVAL;
-
/* Supported for v2 version only */
- if (!iommu->v2)
- return -EACCES;
+ if (WARN_ON(!iommu->v2))
+ return;

mutex_lock(&iommu->lock);

@@ -980,7 +976,8 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
}

mutex_unlock(&iommu->lock);
- return i > 0 ? i : -EINVAL;
+
+ WARN_ON(i != npage);
}

static long vfio_sync_unpin(struct vfio_dma *dma, struct vfio_domain *domain,
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 19cefbaa3d06..9f7d74c24925 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -163,8 +163,8 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device);

int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
int npage, int prot, unsigned long *phys_pfn);
-int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
- int npage);
+void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
+ int npage);
int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova,
void *data, size_t len, bool write);

--
2.17.1

2022-07-23 02:05:38

by Nicolin Chen

[permalink] [raw]
Subject: [PATCH v4 03/10] vfio/ap: Pass in physical address of ind to ap_aqic()

The ap_aqic() is called by vfio_ap_irq_enable() where it passes in a
virt value that's casted from a physical address "h_nib". Inside the
ap_aqic(), it does virt_to_phys() again.

Since ap_aqic() needs a physical address, let's just pass in a pa of
ind directly. So change the "ind" to "pa_ind".

Reviewed-by: Harald Freudenberger <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Tested-by: Eric Farman <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
arch/s390/include/asm/ap.h | 6 +++---
drivers/s390/crypto/ap_queue.c | 2 +-
drivers/s390/crypto/vfio_ap_ops.c | 7 ++++---
3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/ap.h b/arch/s390/include/asm/ap.h
index b515cfa62bd9..f508f5025e38 100644
--- a/arch/s390/include/asm/ap.h
+++ b/arch/s390/include/asm/ap.h
@@ -227,13 +227,13 @@ struct ap_qirq_ctrl {
* ap_aqic(): Control interruption for a specific AP.
* @qid: The AP queue number
* @qirqctrl: struct ap_qirq_ctrl (64 bit value)
- * @ind: The notification indicator byte
+ * @pa_ind: Physical address of the notification indicator byte
*
* Returns AP queue status.
*/
static inline struct ap_queue_status ap_aqic(ap_qid_t qid,
struct ap_qirq_ctrl qirqctrl,
- void *ind)
+ phys_addr_t pa_ind)
{
unsigned long reg0 = qid | (3UL << 24); /* fc 3UL is AQIC */
union {
@@ -241,7 +241,7 @@ static inline struct ap_queue_status ap_aqic(ap_qid_t qid,
struct ap_qirq_ctrl qirqctrl;
struct ap_queue_status status;
} reg1;
- unsigned long reg2 = virt_to_phys(ind);
+ unsigned long reg2 = pa_ind;

reg1.qirqctrl = qirqctrl;

diff --git a/drivers/s390/crypto/ap_queue.c b/drivers/s390/crypto/ap_queue.c
index c48b0db824e3..a32457b4cbb8 100644
--- a/drivers/s390/crypto/ap_queue.c
+++ b/drivers/s390/crypto/ap_queue.c
@@ -34,7 +34,7 @@ static int ap_queue_enable_irq(struct ap_queue *aq, void *ind)

qirqctrl.ir = 1;
qirqctrl.isc = AP_ISC;
- status = ap_aqic(aq->qid, qirqctrl, ind);
+ status = ap_aqic(aq->qid, qirqctrl, virt_to_phys(ind));
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
case AP_RESPONSE_OTHERWISE_CHANGED:
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index bb1a1677c5c2..5781059d3ed2 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -154,7 +154,7 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
int retries = 5;

do {
- status = ap_aqic(q->apqn, aqic_gisa, NULL);
+ status = ap_aqic(q->apqn, aqic_gisa, 0);
switch (status.response_code) {
case AP_RESPONSE_OTHERWISE_CHANGED:
case AP_RESPONSE_NORMAL:
@@ -245,7 +245,8 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
struct kvm_s390_gisa *gisa;
int nisc;
struct kvm *kvm;
- unsigned long h_nib, g_pfn, h_pfn;
+ unsigned long g_pfn, h_pfn;
+ phys_addr_t h_nib;
int ret;

/* Verify that the notification indicator byte address is valid */
@@ -290,7 +291,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
aqic_gisa.ir = 1;
aqic_gisa.gisa = (uint64_t)gisa >> 4;

- status = ap_aqic(q->apqn, aqic_gisa, (void *)h_nib);
+ status = ap_aqic(q->apqn, aqic_gisa, h_nib);
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
/* See if we did clear older IRQ configuration */
--
2.17.1

2022-07-23 02:05:53

by Nicolin Chen

[permalink] [raw]
Subject: [PATCH v4 04/10] vfio/ccw: Only pass in contiguous pages

This driver is the only caller of vfio_pin/unpin_pages that might pass
in a non-contiguous PFN list, but in many cases it has a contiguous PFN
list to process. So letting VFIO API handle a non-contiguous PFN list
is actually counterproductive.

Add a pair of simple loops to pass in contiguous PFNs only, to have an
efficient implementation in VFIO.

Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Eric Farman <[email protected]>
Tested-by: Eric Farman <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/s390/cio/vfio_ccw_cp.c | 70 +++++++++++++++++++++++++++-------
1 file changed, 56 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 0c2be9421ab7..3b94863ad24e 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -90,6 +90,38 @@ static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
return 0;
}

+/*
+ * pfn_array_unpin() - Unpin user pages in memory
+ * @pa: pfn_array on which to perform the operation
+ * @vdev: the vfio device to perform the operation
+ * @pa_nr: number of user pages to unpin
+ *
+ * Only unpin if any pages were pinned to begin with, i.e. pa_nr > 0,
+ * otherwise only clear pa->pa_nr
+ */
+static void pfn_array_unpin(struct pfn_array *pa,
+ struct vfio_device *vdev, int pa_nr)
+{
+ int unpinned = 0, npage = 1;
+
+ while (unpinned < pa_nr) {
+ unsigned long *first = &pa->pa_iova_pfn[unpinned];
+ unsigned long *last = &first[npage];
+
+ if (unpinned + npage < pa_nr &&
+ *first + npage == *last) {
+ npage++;
+ continue;
+ }
+
+ vfio_unpin_pages(vdev, first, npage);
+ unpinned += npage;
+ npage = 1;
+ }
+
+ pa->pa_nr = 0;
+}
+
/*
* pfn_array_pin() - Pin user pages in memory
* @pa: pfn_array on which to perform the operation
@@ -101,34 +133,44 @@ static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
*/
static int pfn_array_pin(struct pfn_array *pa, struct vfio_device *vdev)
{
+ int pinned = 0, npage = 1;
int ret = 0;

- ret = vfio_pin_pages(vdev, pa->pa_iova_pfn, pa->pa_nr,
- IOMMU_READ | IOMMU_WRITE, pa->pa_pfn);
+ while (pinned < pa->pa_nr) {
+ unsigned long *first = &pa->pa_iova_pfn[pinned];
+ unsigned long *last = &first[npage];

- if (ret < 0) {
- goto err_out;
- } else if (ret > 0 && ret != pa->pa_nr) {
- vfio_unpin_pages(vdev, pa->pa_iova_pfn, ret);
- ret = -EINVAL;
- goto err_out;
+ if (pinned + npage < pa->pa_nr &&
+ *first + npage == *last) {
+ npage++;
+ continue;
+ }
+
+ ret = vfio_pin_pages(vdev, first, npage,
+ IOMMU_READ | IOMMU_WRITE,
+ &pa->pa_pfn[pinned]);
+ if (ret < 0) {
+ goto err_out;
+ } else if (ret > 0 && ret != npage) {
+ pinned += ret;
+ ret = -EINVAL;
+ goto err_out;
+ }
+ pinned += npage;
+ npage = 1;
}

return ret;

err_out:
- pa->pa_nr = 0;
-
+ pfn_array_unpin(pa, vdev, pinned);
return ret;
}

/* Unpin the pages before releasing the memory. */
static void pfn_array_unpin_free(struct pfn_array *pa, struct vfio_device *vdev)
{
- /* Only unpin if any pages were pinned to begin with */
- if (pa->pa_nr)
- vfio_unpin_pages(vdev, pa->pa_iova_pfn, pa->pa_nr);
- pa->pa_nr = 0;
+ pfn_array_unpin(pa, vdev, pa->pa_nr);
kfree(pa->pa_iova_pfn);
}

--
2.17.1

2022-07-23 02:05:55

by Nicolin Chen

[permalink] [raw]
Subject: [PATCH v4 02/10] drm/i915/gvt: Replace roundup with DIV_ROUND_UP

It's a bit redundant for the maths here using roundup.

Suggested-by: Jason Gunthorpe <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/gpu/drm/i915/gvt/kvmgt.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 4d32a2748958..2fee5695515a 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -231,11 +231,9 @@ static void intel_gvt_cleanup_vgpu_type_groups(struct intel_gvt *gvt)
static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
unsigned long size)
{
- int total_pages;
+ int total_pages = DIV_ROUND_UP(size, PAGE_SIZE);
int npage;

- total_pages = roundup(size, PAGE_SIZE) / PAGE_SIZE;
-
for (npage = 0; npage < total_pages; npage++) {
unsigned long cur_gfn = gfn + npage;

@@ -247,12 +245,11 @@ static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
unsigned long size, struct page **page)
{
+ int total_pages = DIV_ROUND_UP(size, PAGE_SIZE);
unsigned long base_pfn = 0;
- int total_pages;
int npage;
int ret;

- total_pages = roundup(size, PAGE_SIZE) / PAGE_SIZE;
/*
* We pin the pages one-by-one to avoid allocating a big arrary
* on stack to hold pfns.
--
2.17.1

2022-07-23 02:06:43

by Nicolin Chen

[permalink] [raw]
Subject: [PATCH v4 05/10] vfio: Pass in starting IOVA to vfio_pin/unpin_pages API

The vfio_pin/unpin_pages() so far accepted arrays of PFNs of user IOVA.
Among all three callers, there was only one caller possibly passing in
a non-contiguous PFN list, which is now ensured to have contiguous PFN
inputs too.

Pass in the starting address with "iova" alone to simplify things, so
callers no longer need to maintain a PFN list or to pin/unpin one page
at a time. This also allows VFIO to use more efficient implementations
of pin/unpin_pages.

For now, also update vfio_iommu_type1 to fit this new parameter too,
while keeping its input intact (being user_iova) since we don't want
to spend too much effort swapping its parameters and local variables
at that level.

Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed by: Kirti Wankhede <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Tony Krowiak <[email protected]>
Acked-by: Eric Farman <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Tested-by: Eric Farman <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
.../driver-api/vfio-mediated-device.rst | 4 +--
drivers/gpu/drm/i915/gvt/kvmgt.c | 18 +++++--------
drivers/s390/cio/vfio_ccw_cp.c | 4 +--
drivers/s390/crypto/vfio_ap_ops.c | 9 +++----
drivers/vfio/vfio.c | 27 +++++++++----------
drivers/vfio/vfio.h | 4 +--
drivers/vfio/vfio_iommu_type1.c | 15 +++++------
include/linux/vfio.h | 5 ++--
8 files changed, 37 insertions(+), 49 deletions(-)

diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
index b0fdf76b339a..ea32a0f13ddb 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -262,10 +262,10 @@ Translation APIs for Mediated Devices
The following APIs are provided for translating user pfn to host pfn in a VFIO
driver::

- int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
+ int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
int npage, int prot, unsigned long *phys_pfn);

- void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
+ void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
int npage);

These functions call back into the back-end IOMMU module by using the pin_pages
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 2fee5695515a..8be75c282611 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -231,14 +231,8 @@ static void intel_gvt_cleanup_vgpu_type_groups(struct intel_gvt *gvt)
static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
unsigned long size)
{
- int total_pages = DIV_ROUND_UP(size, PAGE_SIZE);
- int npage;
-
- for (npage = 0; npage < total_pages; npage++) {
- unsigned long cur_gfn = gfn + npage;
-
- vfio_unpin_pages(&vgpu->vfio_device, &cur_gfn, 1);
- }
+ vfio_unpin_pages(&vgpu->vfio_device, gfn << PAGE_SHIFT,
+ DIV_ROUND_UP(size, PAGE_SIZE));
}

/* Pin a normal or compound guest page for dma. */
@@ -255,14 +249,14 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
* on stack to hold pfns.
*/
for (npage = 0; npage < total_pages; npage++) {
- unsigned long cur_gfn = gfn + npage;
+ dma_addr_t cur_iova = (gfn + npage) << PAGE_SHIFT;
unsigned long pfn;

- ret = vfio_pin_pages(&vgpu->vfio_device, &cur_gfn, 1,
+ ret = vfio_pin_pages(&vgpu->vfio_device, cur_iova, 1,
IOMMU_READ | IOMMU_WRITE, &pfn);
if (ret != 1) {
- gvt_vgpu_err("vfio_pin_pages failed for gfn 0x%lx, ret %d\n",
- cur_gfn, ret);
+ gvt_vgpu_err("vfio_pin_pages failed for iova %pad, ret %d\n",
+ &cur_iova, ret);
goto err;
}

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 3b94863ad24e..a739262f988d 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -114,7 +114,7 @@ static void pfn_array_unpin(struct pfn_array *pa,
continue;
}

- vfio_unpin_pages(vdev, first, npage);
+ vfio_unpin_pages(vdev, *first << PAGE_SHIFT, npage);
unpinned += npage;
npage = 1;
}
@@ -146,7 +146,7 @@ static int pfn_array_pin(struct pfn_array *pa, struct vfio_device *vdev)
continue;
}

- ret = vfio_pin_pages(vdev, first, npage,
+ ret = vfio_pin_pages(vdev, *first << PAGE_SHIFT, npage,
IOMMU_READ | IOMMU_WRITE,
&pa->pa_pfn[pinned]);
if (ret < 0) {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 5781059d3ed2..70f484668ca0 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -124,7 +124,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
q->saved_isc = VFIO_AP_ISC_INVALID;
}
if (q->saved_pfn && !WARN_ON(!q->matrix_mdev)) {
- vfio_unpin_pages(&q->matrix_mdev->vdev, &q->saved_pfn, 1);
+ vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_pfn << PAGE_SHIFT, 1);
q->saved_pfn = 0;
}
}
@@ -258,7 +258,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
return status;
}

- ret = vfio_pin_pages(&q->matrix_mdev->vdev, &g_pfn, 1,
+ ret = vfio_pin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1,
IOMMU_READ | IOMMU_WRITE, &h_pfn);
switch (ret) {
case 1:
@@ -301,7 +301,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
break;
case AP_RESPONSE_OTHERWISE_CHANGED:
/* We could not modify IRQ setings: clear new configuration */
- vfio_unpin_pages(&q->matrix_mdev->vdev, &g_pfn, 1);
+ vfio_unpin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1);
kvm_s390_gisc_unregister(kvm, isc);
break;
default:
@@ -1232,9 +1232,8 @@ static void vfio_ap_mdev_dma_unmap(struct vfio_device *vdev, u64 iova,
{
struct ap_matrix_mdev *matrix_mdev =
container_of(vdev, struct ap_matrix_mdev, vdev);
- unsigned long g_pfn = iova >> PAGE_SHIFT;

- vfio_unpin_pages(&matrix_mdev->vdev, &g_pfn, 1);
+ vfio_unpin_pages(&matrix_mdev->vdev, iova, 1);
}

/**
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 92b10aafae28..ffd1a492eea9 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1934,17 +1934,17 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);

/*
- * Pin a set of guest PFNs and return their associated host PFNs for local
+ * Pin contiguous user pages and return their associated host pages for local
* domain only.
* @device [in] : device
- * @user_pfn [in]: array of user/guest PFNs to be pinned.
- * @npage [in] : count of elements in user_pfn array. This count should not
- * be greater VFIO_PIN_PAGES_MAX_ENTRIES.
+ * @iova [in] : starting IOVA of user pages to be pinned.
+ * @npage [in] : count of pages to be pinned. This count should not
+ * be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
* @prot [in] : protection flags
* @phys_pfn[out]: array of host PFNs
* Return error or number of pages pinned.
*/
-int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
+int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
int npage, int prot, unsigned long *phys_pfn)
{
struct vfio_container *container;
@@ -1952,8 +1952,7 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
struct vfio_iommu_driver *driver;
int ret;

- if (!user_pfn || !phys_pfn || !npage ||
- !vfio_assert_device_open(device))
+ if (!phys_pfn || !npage || !vfio_assert_device_open(device))
return -EINVAL;

if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
@@ -1967,7 +1966,7 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
driver = container->iommu_driver;
if (likely(driver && driver->ops->pin_pages))
ret = driver->ops->pin_pages(container->iommu_data,
- group->iommu_group, user_pfn,
+ group->iommu_group, iova,
npage, prot, phys_pfn);
else
ret = -ENOTTY;
@@ -1977,15 +1976,13 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
EXPORT_SYMBOL(vfio_pin_pages);

/*
- * Unpin set of host PFNs for local domain only.
+ * Unpin contiguous host pages for local domain only.
* @device [in] : device
- * @user_pfn [in]: array of user/guest PFNs to be unpinned. Number of user/guest
- * PFNs should not be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
- * @npage [in] : count of elements in user_pfn array. This count should not
+ * @iova [in] : starting address of user pages to be unpinned.
+ * @npage [in] : count of pages to be unpinned. This count should not
* be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
*/
-void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
- int npage)
+void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage)
{
struct vfio_container *container;
struct vfio_iommu_driver *driver;
@@ -2000,7 +1997,7 @@ void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
container = device->group->container;
driver = container->iommu_driver;

- driver->ops->unpin_pages(container->iommu_data, user_pfn, npage);
+ driver->ops->unpin_pages(container->iommu_data, iova, npage);
}
EXPORT_SYMBOL(vfio_unpin_pages);

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 6a8424b407c7..e9767e13f00f 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -50,11 +50,11 @@ struct vfio_iommu_driver_ops {
struct iommu_group *group);
int (*pin_pages)(void *iommu_data,
struct iommu_group *group,
- unsigned long *user_pfn,
+ dma_addr_t user_iova,
int npage, int prot,
unsigned long *phys_pfn);
void (*unpin_pages)(void *iommu_data,
- unsigned long *user_pfn, int npage);
+ dma_addr_t user_iova, int npage);
void (*register_device)(void *iommu_data,
struct vfio_device *vdev);
void (*unregister_device)(void *iommu_data,
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e49fbe9968ef..e629e059118c 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -829,7 +829,7 @@ static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova,

static int vfio_iommu_type1_pin_pages(void *iommu_data,
struct iommu_group *iommu_group,
- unsigned long *user_pfn,
+ dma_addr_t user_iova,
int npage, int prot,
unsigned long *phys_pfn)
{
@@ -841,7 +841,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
bool do_accounting;
dma_addr_t iova;

- if (!iommu || !user_pfn || !phys_pfn)
+ if (!iommu || !phys_pfn)
return -EINVAL;

/* Supported for v2 version only */
@@ -857,7 +857,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
again:
if (iommu->vaddr_invalid_count) {
for (i = 0; i < npage; i++) {
- iova = user_pfn[i] << PAGE_SHIFT;
+ iova = user_iova + PAGE_SIZE * i;
ret = vfio_find_dma_valid(iommu, iova, PAGE_SIZE, &dma);
if (ret < 0)
goto pin_done;
@@ -882,7 +882,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
for (i = 0; i < npage; i++) {
struct vfio_pfn *vpfn;

- iova = user_pfn[i] << PAGE_SHIFT;
+ iova = user_iova + PAGE_SIZE * i;
dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
if (!dma) {
ret = -EINVAL;
@@ -939,7 +939,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
for (j = 0; j < i; j++) {
dma_addr_t iova;

- iova = user_pfn[j] << PAGE_SHIFT;
+ iova = user_iova + PAGE_SIZE * j;
dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
vfio_unpin_page_external(dma, iova, do_accounting);
phys_pfn[j] = 0;
@@ -950,7 +950,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
}

static void vfio_iommu_type1_unpin_pages(void *iommu_data,
- unsigned long *user_pfn, int npage)
+ dma_addr_t user_iova, int npage)
{
struct vfio_iommu *iommu = iommu_data;
bool do_accounting;
@@ -964,10 +964,9 @@ static void vfio_iommu_type1_unpin_pages(void *iommu_data,

do_accounting = list_empty(&iommu->domain_list);
for (i = 0; i < npage; i++) {
+ dma_addr_t iova = user_iova + PAGE_SIZE * i;
struct vfio_dma *dma;
- dma_addr_t iova;

- iova = user_pfn[i] << PAGE_SHIFT;
dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
if (!dma)
break;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 9f7d74c24925..9e3b6abcf890 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -161,10 +161,9 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device);

#define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long))

-int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
+int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
int npage, int prot, unsigned long *phys_pfn);
-void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
- int npage);
+void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage);
int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova,
void *data, size_t len, bool write);

--
2.17.1

2022-07-23 02:07:42

by Nicolin Chen

[permalink] [raw]
Subject: [PATCH v4 06/10] vfio/ap: Change saved_pfn to saved_iova

The vfio_ap_ops code maintains both nib address and its PFN, which
is redundant, merely because vfio_pin/unpin_pages API wanted pfn.
Since vfio_pin/unpin_pages() now accept "iova", change "saved_pfn"
to "saved_iova" and remove pfn in the vfio_ap_validate_nib().

Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Tony Krowiak <[email protected]>
Tested-by: Eric Farman <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 42 +++++++++++----------------
drivers/s390/crypto/vfio_ap_private.h | 4 +--
2 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 70f484668ca0..d7c38c82f694 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -112,7 +112,7 @@ static void vfio_ap_wait_for_irqclear(int apqn)
*
* Unregisters the ISC in the GIB when the saved ISC not invalid.
* Unpins the guest's page holding the NIB when it exists.
- * Resets the saved_pfn and saved_isc to invalid values.
+ * Resets the saved_iova and saved_isc to invalid values.
*/
static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
{
@@ -123,9 +123,9 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc);
q->saved_isc = VFIO_AP_ISC_INVALID;
}
- if (q->saved_pfn && !WARN_ON(!q->matrix_mdev)) {
- vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_pfn << PAGE_SHIFT, 1);
- q->saved_pfn = 0;
+ if (q->saved_iova && !WARN_ON(!q->matrix_mdev)) {
+ vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_iova, 1);
+ q->saved_iova = 0;
}
}

@@ -189,27 +189,19 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
*
* @vcpu: the object representing the vcpu executing the PQAP(AQIC) instruction.
* @nib: the location for storing the nib address.
- * @g_pfn: the location for storing the page frame number of the page containing
- * the nib.
*
* When the PQAP(AQIC) instruction is executed, general register 2 contains the
* address of the notification indicator byte (nib) used for IRQ notification.
- * This function parses the nib from gr2 and calculates the page frame
- * number for the guest of the page containing the nib. The values are
- * stored in @nib and @g_pfn respectively.
- *
- * The g_pfn of the nib is then validated to ensure the nib address is valid.
+ * This function parses and validates the nib from gr2.
*
* Return: returns zero if the nib address is a valid; otherwise, returns
* -EINVAL.
*/
-static int vfio_ap_validate_nib(struct kvm_vcpu *vcpu, unsigned long *nib,
- unsigned long *g_pfn)
+static int vfio_ap_validate_nib(struct kvm_vcpu *vcpu, dma_addr_t *nib)
{
*nib = vcpu->run->s.regs.gprs[2];
- *g_pfn = *nib >> PAGE_SHIFT;

- if (kvm_is_error_hva(gfn_to_hva(vcpu->kvm, *g_pfn)))
+ if (kvm_is_error_hva(gfn_to_hva(vcpu->kvm, *nib >> PAGE_SHIFT)))
return -EINVAL;

return 0;
@@ -239,34 +231,34 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
int isc,
struct kvm_vcpu *vcpu)
{
- unsigned long nib;
struct ap_qirq_ctrl aqic_gisa = {};
struct ap_queue_status status = {};
struct kvm_s390_gisa *gisa;
int nisc;
struct kvm *kvm;
- unsigned long g_pfn, h_pfn;
+ unsigned long h_pfn;
phys_addr_t h_nib;
+ dma_addr_t nib;
int ret;

/* Verify that the notification indicator byte address is valid */
- if (vfio_ap_validate_nib(vcpu, &nib, &g_pfn)) {
- VFIO_AP_DBF_WARN("%s: invalid NIB address: nib=%#lx, g_pfn=%#lx, apqn=%#04x\n",
- __func__, nib, g_pfn, q->apqn);
+ if (vfio_ap_validate_nib(vcpu, &nib)) {
+ VFIO_AP_DBF_WARN("%s: invalid NIB address: nib=%pad, apqn=%#04x\n",
+ __func__, &nib, q->apqn);

status.response_code = AP_RESPONSE_INVALID_ADDRESS;
return status;
}

- ret = vfio_pin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1,
+ ret = vfio_pin_pages(&q->matrix_mdev->vdev, nib, 1,
IOMMU_READ | IOMMU_WRITE, &h_pfn);
switch (ret) {
case 1:
break;
default:
VFIO_AP_DBF_WARN("%s: vfio_pin_pages failed: rc=%d,"
- "nib=%#lx, g_pfn=%#lx, apqn=%#04x\n",
- __func__, ret, nib, g_pfn, q->apqn);
+ "nib=%pad, apqn=%#04x\n",
+ __func__, ret, &nib, q->apqn);

status.response_code = AP_RESPONSE_INVALID_ADDRESS;
return status;
@@ -296,12 +288,12 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
case AP_RESPONSE_NORMAL:
/* See if we did clear older IRQ configuration */
vfio_ap_free_aqic_resources(q);
- q->saved_pfn = g_pfn;
+ q->saved_iova = nib;
q->saved_isc = isc;
break;
case AP_RESPONSE_OTHERWISE_CHANGED:
/* We could not modify IRQ setings: clear new configuration */
- vfio_unpin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1);
+ vfio_unpin_pages(&q->matrix_mdev->vdev, nib, 1);
kvm_s390_gisc_unregister(kvm, isc);
break;
default:
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index abb59d59f81b..d912487175d3 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -99,13 +99,13 @@ struct ap_matrix_mdev {
* struct vfio_ap_queue - contains the data associated with a queue bound to the
* vfio_ap device driver
* @matrix_mdev: the matrix mediated device
- * @saved_pfn: the guest PFN pinned for the guest
+ * @saved_iova: the notification indicator byte (nib) address
* @apqn: the APQN of the AP queue device
* @saved_isc: the guest ISC registered with the GIB interface
*/
struct vfio_ap_queue {
struct ap_matrix_mdev *matrix_mdev;
- unsigned long saved_pfn;
+ dma_addr_t saved_iova;
int apqn;
#define VFIO_AP_ISC_INVALID 0xff
unsigned char saved_isc;
--
2.17.1

2022-07-23 02:08:55

by Nicolin Chen

[permalink] [raw]
Subject: [PATCH v4 10/10] vfio: Replace phys_pfn with pages for vfio_pin_pages()

Most of the callers of vfio_pin_pages() want "struct page *" and the
low-level mm code to pin pages returns a list of "struct page *" too.
So there's no gain in converting "struct page *" to PFN in between.

Replace the output parameter "phys_pfn" list with a "pages" list, to
simplify callers. This also allows us to replace the vfio_iommu_type1
implementation with a more efficient one.

And drop the pfn_valid check in the gvt code, as there is no need to
do such a check at a page-backed struct page pointer.

For now, also update vfio_iommu_type1 to fit this new parameter too.

Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Kirti Wankhede <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Eric Farman <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Tested-by: Eric Farman <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
.../driver-api/vfio-mediated-device.rst | 2 +-
drivers/gpu/drm/i915/gvt/kvmgt.c | 19 ++++++-------------
drivers/s390/cio/vfio_ccw_cp.c | 19 +++++++++----------
drivers/s390/crypto/vfio_ap_ops.c | 6 +++---
drivers/vfio/vfio.c | 8 ++++----
drivers/vfio/vfio.h | 2 +-
drivers/vfio/vfio_iommu_type1.c | 19 +++++++++++--------
include/linux/vfio.h | 2 +-
8 files changed, 36 insertions(+), 41 deletions(-)

diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
index ea32a0f13ddb..ba5fefcdae1a 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -263,7 +263,7 @@ The following APIs are provided for translating user pfn to host pfn in a VFIO
driver::

int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
- int npage, int prot, unsigned long *phys_pfn);
+ int npage, int prot, struct page **pages);

void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
int npage);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 8be75c282611..e3cd58946477 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -240,7 +240,7 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
unsigned long size, struct page **page)
{
int total_pages = DIV_ROUND_UP(size, PAGE_SIZE);
- unsigned long base_pfn = 0;
+ struct page *base_page = NULL;
int npage;
int ret;

@@ -250,26 +250,19 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
*/
for (npage = 0; npage < total_pages; npage++) {
dma_addr_t cur_iova = (gfn + npage) << PAGE_SHIFT;
- unsigned long pfn;
+ struct page *cur_page;

ret = vfio_pin_pages(&vgpu->vfio_device, cur_iova, 1,
- IOMMU_READ | IOMMU_WRITE, &pfn);
+ IOMMU_READ | IOMMU_WRITE, &cur_page);
if (ret != 1) {
gvt_vgpu_err("vfio_pin_pages failed for iova %pad, ret %d\n",
&cur_iova, ret);
goto err;
}

- if (!pfn_valid(pfn)) {
- gvt_vgpu_err("pfn 0x%lx is not mem backed\n", pfn);
- npage++;
- ret = -EFAULT;
- goto err;
- }
-
if (npage == 0)
- base_pfn = pfn;
- else if (base_pfn + npage != pfn) {
+ base_page = cur_page;
+ else if (base_page + npage != cur_page) {
gvt_vgpu_err("The pages are not continuous\n");
ret = -EINVAL;
npage++;
@@ -277,7 +270,7 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
}
}

- *page = pfn_to_page(base_pfn);
+ *page = base_page;
return 0;
err:
gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index cd4ec4f6d6ff..8963f452f963 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -22,8 +22,8 @@
struct page_array {
/* Array that stores pages need to pin. */
dma_addr_t *pa_iova;
- /* Array that receives PFNs of the pages pinned. */
- unsigned long *pa_pfn;
+ /* Array that receives the pinned pages. */
+ struct page **pa_page;
/* Number of pages pinned from @pa_iova. */
int pa_nr;
};
@@ -68,19 +68,19 @@ static int page_array_alloc(struct page_array *pa, u64 iova, unsigned int len)
return -EINVAL;

pa->pa_iova = kcalloc(pa->pa_nr,
- sizeof(*pa->pa_iova) + sizeof(*pa->pa_pfn),
+ sizeof(*pa->pa_iova) + sizeof(*pa->pa_page),
GFP_KERNEL);
if (unlikely(!pa->pa_iova)) {
pa->pa_nr = 0;
return -ENOMEM;
}
- pa->pa_pfn = (unsigned long *)&pa->pa_iova[pa->pa_nr];
+ pa->pa_page = (struct page **)&pa->pa_iova[pa->pa_nr];

pa->pa_iova[0] = iova;
- pa->pa_pfn[0] = -1ULL;
+ pa->pa_page[0] = NULL;
for (i = 1; i < pa->pa_nr; i++) {
pa->pa_iova[i] = pa->pa_iova[i - 1] + PAGE_SIZE;
- pa->pa_pfn[i] = -1ULL;
+ pa->pa_page[i] = NULL;
}

return 0;
@@ -144,7 +144,7 @@ static int page_array_pin(struct page_array *pa, struct vfio_device *vdev)

ret = vfio_pin_pages(vdev, *first, npage,
IOMMU_READ | IOMMU_WRITE,
- &pa->pa_pfn[pinned]);
+ &pa->pa_page[pinned]);
if (ret < 0) {
goto err_out;
} else if (ret > 0 && ret != npage) {
@@ -195,7 +195,7 @@ static inline void page_array_idal_create_words(struct page_array *pa,
*/

for (i = 0; i < pa->pa_nr; i++)
- idaws[i] = pa->pa_pfn[i] << PAGE_SHIFT;
+ idaws[i] = page_to_phys(pa->pa_page[i]);

/* Adjust the first IDAW, since it may not start on a page boundary */
idaws[0] += pa->pa_iova[0] & (PAGE_SIZE - 1);
@@ -246,8 +246,7 @@ static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,

l = n;
for (i = 0; i < pa.pa_nr; i++) {
- struct page *page = pfn_to_page(pa.pa_pfn[i]);
- void *from = kmap_local_page(page);
+ void *from = kmap_local_page(pa.pa_page[i]);

m = PAGE_SIZE;
if (i == 0) {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index d7c38c82f694..75cd92c291e3 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -234,9 +234,9 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
struct ap_qirq_ctrl aqic_gisa = {};
struct ap_queue_status status = {};
struct kvm_s390_gisa *gisa;
+ struct page *h_page;
int nisc;
struct kvm *kvm;
- unsigned long h_pfn;
phys_addr_t h_nib;
dma_addr_t nib;
int ret;
@@ -251,7 +251,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
}

ret = vfio_pin_pages(&q->matrix_mdev->vdev, nib, 1,
- IOMMU_READ | IOMMU_WRITE, &h_pfn);
+ IOMMU_READ | IOMMU_WRITE, &h_page);
switch (ret) {
case 1:
break;
@@ -267,7 +267,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
kvm = q->matrix_mdev->kvm;
gisa = kvm->arch.gisa_int.origin;

- h_nib = (h_pfn << PAGE_SHIFT) | (nib & ~PAGE_MASK);
+ h_nib = page_to_phys(h_page) | (nib & ~PAGE_MASK);
aqic_gisa.gisc = isc;

nisc = kvm_s390_gisc_register(kvm, isc);
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 606a20b605ba..8e23ca59ceed 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1941,18 +1941,18 @@ EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
* @npage [in] : count of pages to be pinned. This count should not
* be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
* @prot [in] : protection flags
- * @phys_pfn[out]: array of host PFNs
+ * @pages[out] : array of host pages
* Return error or number of pages pinned.
*/
int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
- int npage, int prot, unsigned long *phys_pfn)
+ int npage, int prot, struct page **pages)
{
struct vfio_container *container;
struct vfio_group *group = device->group;
struct vfio_iommu_driver *driver;
int ret;

- if (!phys_pfn || !npage || !vfio_assert_device_open(device))
+ if (!pages || !npage || !vfio_assert_device_open(device))
return -EINVAL;

if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
@@ -1967,7 +1967,7 @@ int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
if (likely(driver && driver->ops->pin_pages))
ret = driver->ops->pin_pages(container->iommu_data,
group->iommu_group, iova,
- npage, prot, phys_pfn);
+ npage, prot, pages);
else
ret = -ENOTTY;

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index e9767e13f00f..503bea6c843d 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -52,7 +52,7 @@ struct vfio_iommu_driver_ops {
struct iommu_group *group,
dma_addr_t user_iova,
int npage, int prot,
- unsigned long *phys_pfn);
+ struct page **pages);
void (*unpin_pages)(void *iommu_data,
dma_addr_t user_iova, int npage);
void (*register_device)(void *iommu_data,
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e629e059118c..db516c90a977 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -831,7 +831,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
struct iommu_group *iommu_group,
dma_addr_t user_iova,
int npage, int prot,
- unsigned long *phys_pfn)
+ struct page **pages)
{
struct vfio_iommu *iommu = iommu_data;
struct vfio_iommu_group *group;
@@ -841,7 +841,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
bool do_accounting;
dma_addr_t iova;

- if (!iommu || !phys_pfn)
+ if (!iommu || !pages)
return -EINVAL;

/* Supported for v2 version only */
@@ -880,6 +880,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
do_accounting = list_empty(&iommu->domain_list);

for (i = 0; i < npage; i++) {
+ unsigned long phys_pfn;
struct vfio_pfn *vpfn;

iova = user_iova + PAGE_SIZE * i;
@@ -896,23 +897,25 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,

vpfn = vfio_iova_get_vfio_pfn(dma, iova);
if (vpfn) {
- phys_pfn[i] = vpfn->pfn;
+ pages[i] = pfn_to_page(vpfn->pfn);
continue;
}

remote_vaddr = dma->vaddr + (iova - dma->iova);
- ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn[i],
+ ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn,
do_accounting);
if (ret)
goto pin_unwind;

- ret = vfio_add_to_pfn_list(dma, iova, phys_pfn[i]);
+ ret = vfio_add_to_pfn_list(dma, iova, phys_pfn);
if (ret) {
- if (put_pfn(phys_pfn[i], dma->prot) && do_accounting)
+ if (put_pfn(phys_pfn, dma->prot) && do_accounting)
vfio_lock_acct(dma, -1, true);
goto pin_unwind;
}

+ pages[i] = pfn_to_page(phys_pfn);
+
if (iommu->dirty_page_tracking) {
unsigned long pgshift = __ffs(iommu->pgsize_bitmap);

@@ -935,14 +938,14 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
goto pin_done;

pin_unwind:
- phys_pfn[i] = 0;
+ pages[i] = NULL;
for (j = 0; j < i; j++) {
dma_addr_t iova;

iova = user_iova + PAGE_SIZE * j;
dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
vfio_unpin_page_external(dma, iova, do_accounting);
- phys_pfn[j] = 0;
+ pages[j] = NULL;
}
pin_done:
mutex_unlock(&iommu->lock);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index acefd663e63b..e05ddc6fe6a5 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -162,7 +162,7 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
#define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long))

int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
- int npage, int prot, unsigned long *phys_pfn);
+ int npage, int prot, struct page **pages);
void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage);
int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova,
void *data, size_t len, bool write);
--
2.17.1

2022-07-23 02:16:39

by Nicolin Chen

[permalink] [raw]
Subject: [PATCH v4 09/10] vfio/ccw: Add kmap_local_page() for memcpy

A PFN is not secure enough to promise that the memory is not IO. And
direct access via memcpy() that only handles CPU memory will crash on
S390 if the PFN is an IO PFN, as we have to use the memcpy_to/fromio()
that uses the special S390 IO access instructions. On the other hand,
a "struct page *" is always a CPU coherent thing that fits memcpy().

Also, casting a PFN to "void *" for memcpy() is not a proper practice,
kmap_local_page() is the correct API to call here, though S390 doesn't
use highmem, which means kmap_local_page() is a NOP.

There's a following patch changing the vfio_pin_pages() API to return
a list of "struct page *" instead of PFNs. It will block any IO memory
from ever getting into this call path, for such a security purpose. In
this patch, add kmap_local_page() to prepare for that.

Suggested-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Acked-by: Eric Farman <[email protected]>
Tested-by: Eric Farman <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/s390/cio/vfio_ccw_cp.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 3854c3d573f5..cd4ec4f6d6ff 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -11,6 +11,7 @@
#include <linux/ratelimit.h>
#include <linux/mm.h>
#include <linux/slab.h>
+#include <linux/highmem.h>
#include <linux/iommu.h>
#include <linux/vfio.h>
#include <asm/idals.h>
@@ -230,7 +231,6 @@ static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,
unsigned long n)
{
struct page_array pa = {0};
- u64 from;
int i, ret;
unsigned long l, m;

@@ -246,7 +246,9 @@ static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,

l = n;
for (i = 0; i < pa.pa_nr; i++) {
- from = pa.pa_pfn[i] << PAGE_SHIFT;
+ struct page *page = pfn_to_page(pa.pa_pfn[i]);
+ void *from = kmap_local_page(page);
+
m = PAGE_SIZE;
if (i == 0) {
from += iova & (PAGE_SIZE - 1);
@@ -254,7 +256,8 @@ static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,
}

m = min(l, m);
- memcpy(to + (n - l), (void *)from, m);
+ memcpy(to + (n - l), from, m);
+ kunmap_local(from);

l -= m;
if (l == 0)
--
2.17.1

2022-07-23 02:16:39

by Nicolin Chen

[permalink] [raw]
Subject: [PATCH v4 08/10] vfio: Rename user_iova of vfio_dma_rw()

Following the updated vfio_pin/unpin_pages(), use the simpler "iova".

Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Tested-by: Eric Farman <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/vfio/vfio.c | 6 +++---
include/linux/vfio.h | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index ffd1a492eea9..606a20b605ba 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -2012,13 +2012,13 @@ EXPORT_SYMBOL(vfio_unpin_pages);
* not a real device DMA, it is not necessary to pin the user space memory.
*
* @device [in] : VFIO device
- * @user_iova [in] : base IOVA of a user space buffer
+ * @iova [in] : base IOVA of a user space buffer
* @data [in] : pointer to kernel buffer
* @len [in] : kernel buffer length
* @write : indicate read or write
* Return error code on failure or 0 on success.
*/
-int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, void *data,
+int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data,
size_t len, bool write)
{
struct vfio_container *container;
@@ -2034,7 +2034,7 @@ int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, void *data,

if (likely(driver && driver->ops->dma_rw))
ret = driver->ops->dma_rw(container->iommu_data,
- user_iova, data, len, write);
+ iova, data, len, write);
else
ret = -ENOTTY;
return ret;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 9e3b6abcf890..acefd663e63b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -164,7 +164,7 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
int npage, int prot, unsigned long *phys_pfn);
void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage);
-int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova,
+int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova,
void *data, size_t len, bool write);

/*
--
2.17.1

2022-07-23 02:31:14

by Nicolin Chen

[permalink] [raw]
Subject: [PATCH v4 07/10] vfio/ccw: Change pa_pfn list to pa_iova list

The vfio_ccw_cp code maintains both iova and its PFN list because the
vfio_pin/unpin_pages API wanted pfn list. Since vfio_pin/unpin_pages()
now accept "iova", change to maintain only pa_iova list and rename all
"pfn_array" strings to "page_array", so as to simplify the code.

Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Eric Farman <[email protected]>
Tested-by: Eric Farman <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/s390/cio/vfio_ccw_cp.c | 135 ++++++++++++++++-----------------
1 file changed, 64 insertions(+), 71 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index a739262f988d..3854c3d573f5 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -18,11 +18,9 @@
#include "vfio_ccw_cp.h"
#include "vfio_ccw_private.h"

-struct pfn_array {
- /* Starting guest physical I/O address. */
- unsigned long pa_iova;
- /* Array that stores PFNs of the pages need to pin. */
- unsigned long *pa_iova_pfn;
+struct page_array {
+ /* Array that stores pages need to pin. */
+ dma_addr_t *pa_iova;
/* Array that receives PFNs of the pages pinned. */
unsigned long *pa_pfn;
/* Number of pages pinned from @pa_iova. */
@@ -37,53 +35,50 @@ struct ccwchain {
/* Count of the valid ccws in chain. */
int ch_len;
/* Pinned PAGEs for the original data. */
- struct pfn_array *ch_pa;
+ struct page_array *ch_pa;
};

/*
- * pfn_array_alloc() - alloc memory for PFNs
- * @pa: pfn_array on which to perform the operation
+ * page_array_alloc() - alloc memory for page array
+ * @pa: page_array on which to perform the operation
* @iova: target guest physical address
* @len: number of bytes that should be pinned from @iova
*
- * Attempt to allocate memory for PFNs.
+ * Attempt to allocate memory for page array.
*
- * Usage of pfn_array:
- * We expect (pa_nr == 0) and (pa_iova_pfn == NULL), any field in
+ * Usage of page_array:
+ * We expect (pa_nr == 0) and (pa_iova == NULL), any field in
* this structure will be filled in by this function.
*
* Returns:
- * 0 if PFNs are allocated
- * -EINVAL if pa->pa_nr is not initially zero, or pa->pa_iova_pfn is not NULL
+ * 0 if page array is allocated
+ * -EINVAL if pa->pa_nr is not initially zero, or pa->pa_iova is not NULL
* -ENOMEM if alloc failed
*/
-static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
+static int page_array_alloc(struct page_array *pa, u64 iova, unsigned int len)
{
int i;

- if (pa->pa_nr || pa->pa_iova_pfn)
+ if (pa->pa_nr || pa->pa_iova)
return -EINVAL;

- pa->pa_iova = iova;
-
pa->pa_nr = ((iova & ~PAGE_MASK) + len + (PAGE_SIZE - 1)) >> PAGE_SHIFT;
if (!pa->pa_nr)
return -EINVAL;

- pa->pa_iova_pfn = kcalloc(pa->pa_nr,
- sizeof(*pa->pa_iova_pfn) +
- sizeof(*pa->pa_pfn),
- GFP_KERNEL);
- if (unlikely(!pa->pa_iova_pfn)) {
+ pa->pa_iova = kcalloc(pa->pa_nr,
+ sizeof(*pa->pa_iova) + sizeof(*pa->pa_pfn),
+ GFP_KERNEL);
+ if (unlikely(!pa->pa_iova)) {
pa->pa_nr = 0;
return -ENOMEM;
}
- pa->pa_pfn = pa->pa_iova_pfn + pa->pa_nr;
+ pa->pa_pfn = (unsigned long *)&pa->pa_iova[pa->pa_nr];

- pa->pa_iova_pfn[0] = pa->pa_iova >> PAGE_SHIFT;
+ pa->pa_iova[0] = iova;
pa->pa_pfn[0] = -1ULL;
for (i = 1; i < pa->pa_nr; i++) {
- pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;
+ pa->pa_iova[i] = pa->pa_iova[i - 1] + PAGE_SIZE;
pa->pa_pfn[i] = -1ULL;
}

@@ -91,30 +86,30 @@ static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
}

/*
- * pfn_array_unpin() - Unpin user pages in memory
- * @pa: pfn_array on which to perform the operation
+ * page_array_unpin() - Unpin user pages in memory
+ * @pa: page_array on which to perform the operation
* @vdev: the vfio device to perform the operation
* @pa_nr: number of user pages to unpin
*
* Only unpin if any pages were pinned to begin with, i.e. pa_nr > 0,
* otherwise only clear pa->pa_nr
*/
-static void pfn_array_unpin(struct pfn_array *pa,
- struct vfio_device *vdev, int pa_nr)
+static void page_array_unpin(struct page_array *pa,
+ struct vfio_device *vdev, int pa_nr)
{
int unpinned = 0, npage = 1;

while (unpinned < pa_nr) {
- unsigned long *first = &pa->pa_iova_pfn[unpinned];
- unsigned long *last = &first[npage];
+ dma_addr_t *first = &pa->pa_iova[unpinned];
+ dma_addr_t *last = &first[npage];

if (unpinned + npage < pa_nr &&
- *first + npage == *last) {
+ *first + npage * PAGE_SIZE == *last) {
npage++;
continue;
}

- vfio_unpin_pages(vdev, *first << PAGE_SHIFT, npage);
+ vfio_unpin_pages(vdev, *first, npage);
unpinned += npage;
npage = 1;
}
@@ -123,30 +118,30 @@ static void pfn_array_unpin(struct pfn_array *pa,
}

/*
- * pfn_array_pin() - Pin user pages in memory
- * @pa: pfn_array on which to perform the operation
+ * page_array_pin() - Pin user pages in memory
+ * @pa: page_array on which to perform the operation
* @mdev: the mediated device to perform pin operations
*
* Returns number of pages pinned upon success.
* If the pin request partially succeeds, or fails completely,
* all pages are left unpinned and a negative error value is returned.
*/
-static int pfn_array_pin(struct pfn_array *pa, struct vfio_device *vdev)
+static int page_array_pin(struct page_array *pa, struct vfio_device *vdev)
{
int pinned = 0, npage = 1;
int ret = 0;

while (pinned < pa->pa_nr) {
- unsigned long *first = &pa->pa_iova_pfn[pinned];
- unsigned long *last = &first[npage];
+ dma_addr_t *first = &pa->pa_iova[pinned];
+ dma_addr_t *last = &first[npage];

if (pinned + npage < pa->pa_nr &&
- *first + npage == *last) {
+ *first + npage * PAGE_SIZE == *last) {
npage++;
continue;
}

- ret = vfio_pin_pages(vdev, *first << PAGE_SHIFT, npage,
+ ret = vfio_pin_pages(vdev, *first, npage,
IOMMU_READ | IOMMU_WRITE,
&pa->pa_pfn[pinned]);
if (ret < 0) {
@@ -163,32 +158,30 @@ static int pfn_array_pin(struct pfn_array *pa, struct vfio_device *vdev)
return ret;

err_out:
- pfn_array_unpin(pa, vdev, pinned);
+ page_array_unpin(pa, vdev, pinned);
return ret;
}

/* Unpin the pages before releasing the memory. */
-static void pfn_array_unpin_free(struct pfn_array *pa, struct vfio_device *vdev)
+static void page_array_unpin_free(struct page_array *pa, struct vfio_device *vdev)
{
- pfn_array_unpin(pa, vdev, pa->pa_nr);
- kfree(pa->pa_iova_pfn);
+ page_array_unpin(pa, vdev, pa->pa_nr);
+ kfree(pa->pa_iova);
}

-static bool pfn_array_iova_pinned(struct pfn_array *pa, unsigned long iova)
+static bool page_array_iova_pinned(struct page_array *pa, unsigned long iova)
{
- unsigned long iova_pfn = iova >> PAGE_SHIFT;
int i;

for (i = 0; i < pa->pa_nr; i++)
- if (pa->pa_iova_pfn[i] == iova_pfn)
+ if (pa->pa_iova[i] == iova)
return true;

return false;
}
-/* Create the list of IDAL words for a pfn_array. */
-static inline void pfn_array_idal_create_words(
- struct pfn_array *pa,
- unsigned long *idaws)
+/* Create the list of IDAL words for a page_array. */
+static inline void page_array_idal_create_words(struct page_array *pa,
+ unsigned long *idaws)
{
int i;

@@ -204,7 +197,7 @@ static inline void pfn_array_idal_create_words(
idaws[i] = pa->pa_pfn[i] << PAGE_SHIFT;

/* Adjust the first IDAW, since it may not start on a page boundary */
- idaws[0] += pa->pa_iova & (PAGE_SIZE - 1);
+ idaws[0] += pa->pa_iova[0] & (PAGE_SIZE - 1);
}

static void convert_ccw0_to_ccw1(struct ccw1 *source, unsigned long len)
@@ -236,18 +229,18 @@ static void convert_ccw0_to_ccw1(struct ccw1 *source, unsigned long len)
static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,
unsigned long n)
{
- struct pfn_array pa = {0};
+ struct page_array pa = {0};
u64 from;
int i, ret;
unsigned long l, m;

- ret = pfn_array_alloc(&pa, iova, n);
+ ret = page_array_alloc(&pa, iova, n);
if (ret < 0)
return ret;

- ret = pfn_array_pin(&pa, vdev);
+ ret = page_array_pin(&pa, vdev);
if (ret < 0) {
- pfn_array_unpin_free(&pa, vdev);
+ page_array_unpin_free(&pa, vdev);
return ret;
}

@@ -268,7 +261,7 @@ static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,
break;
}

- pfn_array_unpin_free(&pa, vdev);
+ page_array_unpin_free(&pa, vdev);

return l;
}
@@ -371,7 +364,7 @@ static struct ccwchain *ccwchain_alloc(struct channel_program *cp, int len)
chain->ch_ccw = (struct ccw1 *)data;

data = (u8 *)(chain->ch_ccw) + sizeof(*chain->ch_ccw) * len;
- chain->ch_pa = (struct pfn_array *)data;
+ chain->ch_pa = (struct page_array *)data;

chain->ch_len = len;

@@ -555,7 +548,7 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
struct vfio_device *vdev =
&container_of(cp, struct vfio_ccw_private, cp)->vdev;
struct ccw1 *ccw;
- struct pfn_array *pa;
+ struct page_array *pa;
u64 iova;
unsigned long *idaws;
int ret;
@@ -589,13 +582,13 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
}

/*
- * Allocate an array of pfn's for pages to pin/translate.
+ * Allocate an array of pages to pin/translate.
* The number of pages is actually the count of the idaws
* required for the data transfer, since we only only support
* 4K IDAWs today.
*/
pa = chain->ch_pa + idx;
- ret = pfn_array_alloc(pa, iova, bytes);
+ ret = page_array_alloc(pa, iova, bytes);
if (ret < 0)
goto out_free_idaws;

@@ -606,21 +599,21 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
goto out_unpin;

/*
- * Copy guest IDAWs into pfn_array, in case the memory they
+ * Copy guest IDAWs into page_array, in case the memory they
* occupy is not contiguous.
*/
for (i = 0; i < idaw_nr; i++)
- pa->pa_iova_pfn[i] = idaws[i] >> PAGE_SHIFT;
+ pa->pa_iova[i] = idaws[i];
} else {
/*
- * No action is required here; the iova addresses in pfn_array
- * were initialized sequentially in pfn_array_alloc() beginning
+ * No action is required here; the iova addresses in page_array
+ * were initialized sequentially in page_array_alloc() beginning
* with the contents of ccw->cda.
*/
}

if (ccw_does_data_transfer(ccw)) {
- ret = pfn_array_pin(pa, vdev);
+ ret = page_array_pin(pa, vdev);
if (ret < 0)
goto out_unpin;
} else {
@@ -630,13 +623,13 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
ccw->cda = (__u32) virt_to_phys(idaws);
ccw->flags |= CCW_FLAG_IDA;

- /* Populate the IDAL with pinned/translated addresses from pfn */
- pfn_array_idal_create_words(pa, idaws);
+ /* Populate the IDAL with pinned/translated addresses from page */
+ page_array_idal_create_words(pa, idaws);

return 0;

out_unpin:
- pfn_array_unpin_free(pa, vdev);
+ page_array_unpin_free(pa, vdev);
out_free_idaws:
kfree(idaws);
out_init:
@@ -742,7 +735,7 @@ void cp_free(struct channel_program *cp)
cp->initialized = false;
list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next) {
for (i = 0; i < chain->ch_len; i++) {
- pfn_array_unpin_free(chain->ch_pa + i, vdev);
+ page_array_unpin_free(chain->ch_pa + i, vdev);
ccwchain_cda_free(chain, i);
}
ccwchain_free(chain);
@@ -918,7 +911,7 @@ bool cp_iova_pinned(struct channel_program *cp, u64 iova)

list_for_each_entry(chain, &cp->ccwchain_list, next) {
for (i = 0; i < chain->ch_len; i++)
- if (pfn_array_iova_pinned(chain->ch_pa + i, iova))
+ if (page_array_iova_pinned(chain->ch_pa + i, iova))
return true;
}

--
2.17.1

2022-07-25 10:35:30

by Xu, Terrence

[permalink] [raw]
Subject: RE: [PATCH v4 00/10] cover-letter: Update vfio_pin/unpin_pages API

> -----Original Message-----
> From: intel-gvt-dev <[email protected]> On
> Behalf Of Nicolin Chen
>
> This is a preparatory series for IOMMUFD v2 patches. It prepares for
> replacing vfio_iommu_type1 implementations of vfio_pin/unpin_pages()
> with IOMMUFD version.
>
> There's a gap between these two versions: the vfio_iommu_type1 version
> inputs a non-contiguous PFN list and outputs another PFN list for the
> pinned physical page list, while the IOMMUFD version only supports a
> contiguous address input by accepting the starting IO virtual address of a
> set of pages to pin and by outputting to a physical page list.
>
> The nature of existing callers mostly aligns with the IOMMUFD version,
> except s390's vfio_ccw_cp code where some additional change is needed
> along with this series. Overall, updating to "iova" and "phys_page"
> does improve the caller side to some extent.
>
> Also fix a misuse of physical address and virtual address in the s390's crypto
> code. And update the input naming at the adjacent vfio_dma_rw().
>
> This is on github:
> https://github.com/nicolinc/iommufd/commits/vfio_pin_pages-v4
>
> Terrence has tested this series on i915; Eric has tested on s390.
>
> Thanks!
>
> Changelog
> v4:
> * Dropped double-shifting at two gvt_unpin_guest_page calls, fixing
> a bug that's discovered by Alex
> * Added Reviewed-by from Anthony Krowiak
> * Rebased on top of linux-vfio's next
> v3: https://lore.kernel.org/kvm/20220708224427.1245-1-
> [email protected]/
> * Added a patch to replace roundup with DIV_ROUND_UP in i915 gvt
> * Dropped the "driver->ops->unpin_pages" and NULL checks in PATCH-1
> * Changed to use WARN_ON and separate into lines in PATCH-1
> * Replaced "guest" words with "user" and fix typo in PATCH-5
> * Updated commit log of PATCH-1, PATCH-6, and PATCH-10
> * Added Reviewed/Acked-by from Christoph, Jason, Kirti, Kevin and Eric
> * Added Tested-by from Terrence (i915) and Eric (s390)
> v2: https://lore.kernel.org/kvm/20220706062759.24946-1-
> [email protected]/
> * Added a patch to make vfio_unpin_pages return void
> * Added two patches to remove PFN list from two s390 callers
> * Renamed "phys_page" parameter to "pages" for vfio_pin_pages
> * Updated commit log of kmap_local_page() patch
> * Added Harald's "Reviewed-by" to pa_ind patch
> * Rebased on top of Alex's extern removal path
> v1: https://lore.kernel.org/kvm/20220616235212.15185-1-
> [email protected]/
>
> Nicolin Chen (10):
> vfio: Make vfio_unpin_pages() return void
> drm/i915/gvt: Replace roundup with DIV_ROUND_UP
> vfio/ap: Pass in physical address of ind to ap_aqic()
> vfio/ccw: Only pass in contiguous pages
> vfio: Pass in starting IOVA to vfio_pin/unpin_pages API
> vfio/ap: Change saved_pfn to saved_iova
> vfio/ccw: Change pa_pfn list to pa_iova list
> vfio: Rename user_iova of vfio_dma_rw()
> vfio/ccw: Add kmap_local_page() for memcpy
> vfio: Replace phys_pfn with pages for vfio_pin_pages()
>
> .../driver-api/vfio-mediated-device.rst | 6 +-
> arch/s390/include/asm/ap.h | 6 +-
> drivers/gpu/drm/i915/gvt/kvmgt.c | 45 ++--
> drivers/s390/cio/vfio_ccw_cp.c | 195 +++++++++++-------
> drivers/s390/crypto/ap_queue.c | 2 +-
> drivers/s390/crypto/vfio_ap_ops.c | 54 +++--
> drivers/s390/crypto/vfio_ap_private.h | 4 +-
> drivers/vfio/vfio.c | 54 ++---
> drivers/vfio/vfio.h | 8 +-
> drivers/vfio/vfio_iommu_type1.c | 45 ++--
> include/linux/vfio.h | 9 +-
> 11 files changed, 213 insertions(+), 215 deletions(-)

Verified the Intel KVMGT feature, no regression be introduced by v4 patch series, the previous Call Trace issue already gone.
Tested-by: Terrence Xu <[email protected]>
> --
> 2.17.1

2022-07-25 21:20:55

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v4 00/10] cover-letter: Update vfio_pin/unpin_pages API

On Fri, 22 Jul 2022 19:02:46 -0700
Nicolin Chen <[email protected]> wrote:

> This is a preparatory series for IOMMUFD v2 patches. It prepares for
> replacing vfio_iommu_type1 implementations of vfio_pin/unpin_pages()
> with IOMMUFD version.
>
> There's a gap between these two versions: the vfio_iommu_type1 version
> inputs a non-contiguous PFN list and outputs another PFN list for the
> pinned physical page list, while the IOMMUFD version only supports a
> contiguous address input by accepting the starting IO virtual address
> of a set of pages to pin and by outputting to a physical page list.
>
> The nature of existing callers mostly aligns with the IOMMUFD version,
> except s390's vfio_ccw_cp code where some additional change is needed
> along with this series. Overall, updating to "iova" and "phys_page"
> does improve the caller side to some extent.
>
> Also fix a misuse of physical address and virtual address in the s390's
> crypto code. And update the input naming at the adjacent vfio_dma_rw().
>
> This is on github:
> https://github.com/nicolinc/iommufd/commits/vfio_pin_pages-v4
>
> Terrence has tested this series on i915; Eric has tested on s390.
>
> Thanks!
>
> Changelog
> v4:
> * Dropped double-shifting at two gvt_unpin_guest_page calls, fixing
> a bug that's discovered by Alex
> * Added Reviewed-by from Anthony Krowiak
> * Rebased on top of linux-vfio's next
> v3: https://lore.kernel.org/kvm/[email protected]/
> * Added a patch to replace roundup with DIV_ROUND_UP in i915 gvt
> * Dropped the "driver->ops->unpin_pages" and NULL checks in PATCH-1
> * Changed to use WARN_ON and separate into lines in PATCH-1
> * Replaced "guest" words with "user" and fix typo in PATCH-5
> * Updated commit log of PATCH-1, PATCH-6, and PATCH-10
> * Added Reviewed/Acked-by from Christoph, Jason, Kirti, Kevin and Eric
> * Added Tested-by from Terrence (i915) and Eric (s390)
> v2: https://lore.kernel.org/kvm/[email protected]/
> * Added a patch to make vfio_unpin_pages return void
> * Added two patches to remove PFN list from two s390 callers
> * Renamed "phys_page" parameter to "pages" for vfio_pin_pages
> * Updated commit log of kmap_local_page() patch
> * Added Harald's "Reviewed-by" to pa_ind patch
> * Rebased on top of Alex's extern removal path
> v1: https://lore.kernel.org/kvm/[email protected]/
>
> Nicolin Chen (10):
> vfio: Make vfio_unpin_pages() return void
> drm/i915/gvt: Replace roundup with DIV_ROUND_UP
> vfio/ap: Pass in physical address of ind to ap_aqic()
> vfio/ccw: Only pass in contiguous pages
> vfio: Pass in starting IOVA to vfio_pin/unpin_pages API
> vfio/ap: Change saved_pfn to saved_iova
> vfio/ccw: Change pa_pfn list to pa_iova list
> vfio: Rename user_iova of vfio_dma_rw()
> vfio/ccw: Add kmap_local_page() for memcpy
> vfio: Replace phys_pfn with pages for vfio_pin_pages()
>
> .../driver-api/vfio-mediated-device.rst | 6 +-
> arch/s390/include/asm/ap.h | 6 +-
> drivers/gpu/drm/i915/gvt/kvmgt.c | 45 ++--
> drivers/s390/cio/vfio_ccw_cp.c | 195 +++++++++++-------
> drivers/s390/crypto/ap_queue.c | 2 +-
> drivers/s390/crypto/vfio_ap_ops.c | 54 +++--
> drivers/s390/crypto/vfio_ap_private.h | 4 +-
> drivers/vfio/vfio.c | 54 ++---
> drivers/vfio/vfio.h | 8 +-
> drivers/vfio/vfio_iommu_type1.c | 45 ++--
> include/linux/vfio.h | 9 +-
> 11 files changed, 213 insertions(+), 215 deletions(-)
>

Applied to vfio next branch for v5.20. Thanks,

Alex