2022-07-06 06:29:56

by Nicolin Chen

[permalink] [raw]
Subject: [RFT][PATCH v2 0/9] Update vfio_pin/unpin_pages API

This is a preparatory series for IOMMUFD v2 patches. It prepares for
replacing vfio_iommu_type1 implementations of vfio_pin/unpin_pages()
with IOMMUFD version.

There's a gap between these two versions: the vfio_iommu_type1 version
inputs a non-contiguous PFN list and outputs another PFN list for the
pinned physical page list, while the IOMMUFD version only supports a
contiguous address input by accepting the starting IO virtual address
of a set of pages to pin and by outputting to a physical page list.

The nature of existing callers mostly aligns with the IOMMUFD version,
except s390's vfio_ccw_cp code where some additional change is needed
along with this series. Overall, updating to "iova" and "phys_page"
does improve the caller side to some extent.

Also fix a misuse of physical address and virtual address in the s390's
crypto code. And update the input naming at the adjacent vfio_dma_rw().

This is on github:
https://github.com/nicolinc/iommufd/commits/vfio_pin_pages

Request for testing: I only did build for s390 and i915 code, so it'd
be nice to have people who have environment to run sanity accordingly.

Thanks!

Changelog
v2:
* Added a patch to make vfio_unpin_pages return void
* Added two patches to remove PFN list from two s390 callers
* Renamed "phys_page" parameter to "pages" for vfio_pin_pages
* Updated commit log of kmap_local_page() patch
* Added Harald's "Reviewed-by" to pa_ind patch
* Rebased on top of Alex's extern removal path
v1: https://lore.kernel.org/kvm/[email protected]/

Nicolin Chen (9):
vfio: Make vfio_unpin_pages() return void
vfio/ap: Pass in physical address of ind to ap_aqic()
vfio/ccw: Only pass in contiguous pages
vfio: Pass in starting IOVA to vfio_pin/unpin_pages API
vfio/ap: Remove redundant pfn
vfio/ccw: Change pa_pfn list to pa_iova list
vfio: Rename user_iova of vfio_dma_rw()
vfio/ccw: Add kmap_local_page() for memcpy
vfio: Replace phys_pfn with pages for vfio_pin_pages()

.../driver-api/vfio-mediated-device.rst | 6 +-
arch/s390/include/asm/ap.h | 6 +-
drivers/gpu/drm/i915/gvt/kvmgt.c | 46 ++---
drivers/s390/cio/vfio_ccw_cp.c | 195 +++++++++++-------
drivers/s390/crypto/ap_queue.c | 2 +-
drivers/s390/crypto/vfio_ap_ops.c | 54 +++--
drivers/s390/crypto/vfio_ap_private.h | 4 +-
drivers/vfio/vfio.c | 55 +++--
drivers/vfio/vfio.h | 8 +-
drivers/vfio/vfio_iommu_type1.c | 46 +++--
include/linux/vfio.h | 9 +-
11 files changed, 218 insertions(+), 213 deletions(-)

--
2.17.1


2022-07-06 06:29:57

by Nicolin Chen

[permalink] [raw]
Subject: [RFT][PATCH v2 1/9] vfio: Make vfio_unpin_pages() return void

There's only one caller that checks its return value with a WARN_ON_ONCE,
while all other callers do not check return value at all. So simplify the
API to return void by embedding similar WARN_ON_ONCEs.

Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
.../driver-api/vfio-mediated-device.rst | 2 +-
drivers/gpu/drm/i915/gvt/kvmgt.c | 5 +---
drivers/vfio/vfio.c | 24 ++++++++-----------
drivers/vfio/vfio.h | 2 +-
drivers/vfio/vfio_iommu_type1.c | 16 ++++++-------
include/linux/vfio.h | 4 ++--
6 files changed, 23 insertions(+), 30 deletions(-)

diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
index 1c57815619fd..b0fdf76b339a 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -265,7 +265,7 @@ driver::
int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
int npage, int prot, unsigned long *phys_pfn);

- int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
+ void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
int npage);

These functions call back into the back-end IOMMU module by using the pin_pages
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index e2f6c56ab342..8c67c9aba82d 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -231,18 +231,15 @@ static void intel_gvt_cleanup_vgpu_type_groups(struct intel_gvt *gvt)
static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
unsigned long size)
{
- struct drm_i915_private *i915 = vgpu->gvt->gt->i915;
int total_pages;
int npage;
- int ret;

total_pages = roundup(size, PAGE_SIZE) / PAGE_SIZE;

for (npage = 0; npage < total_pages; npage++) {
unsigned long cur_gfn = gfn + npage;

- ret = vfio_unpin_pages(&vgpu->vfio_device, &cur_gfn, 1);
- drm_WARN_ON(&i915->drm, ret != 1);
+ vfio_unpin_pages(&vgpu->vfio_device, &cur_gfn, 1);
}
}

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 61e71c1154be..01f45ec70a3d 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1959,31 +1959,27 @@ EXPORT_SYMBOL(vfio_pin_pages);
* PFNs should not be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
* @npage [in] : count of elements in user_pfn array. This count should not
* be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
- * Return error or number of pages unpinned.
*/
-int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
- int npage)
+void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
+ int npage)
{
struct vfio_container *container;
struct vfio_iommu_driver *driver;
- int ret;

- if (!user_pfn || !npage || !vfio_assert_device_open(device))
- return -EINVAL;
+ if (WARN_ON_ONCE(!user_pfn || !npage || !vfio_assert_device_open(device)))
+ return;

- if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
- return -E2BIG;
+ if (WARN_ON_ONCE(npage > VFIO_PIN_PAGES_MAX_ENTRIES))
+ return;

/* group->container cannot change while a vfio device is open */
container = device->group->container;
driver = container->iommu_driver;
- if (likely(driver && driver->ops->unpin_pages))
- ret = driver->ops->unpin_pages(container->iommu_data, user_pfn,
- npage);
- else
- ret = -ENOTTY;

- return ret;
+ if (WARN_ON_ONCE(unlikely(!driver || !driver->ops->unpin_pages)))
+ return;
+
+ driver->ops->unpin_pages(container->iommu_data, user_pfn, npage);
}
EXPORT_SYMBOL(vfio_unpin_pages);

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index a67130221151..bef4edf58138 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -53,7 +53,7 @@ struct vfio_iommu_driver_ops {
unsigned long *user_pfn,
int npage, int prot,
unsigned long *phys_pfn);
- int (*unpin_pages)(void *iommu_data,
+ void (*unpin_pages)(void *iommu_data,
unsigned long *user_pfn, int npage);
int (*register_notifier)(void *iommu_data,
unsigned long *events,
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c13b9290e357..08613edaf722 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -948,20 +948,19 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
return ret;
}

-static int vfio_iommu_type1_unpin_pages(void *iommu_data,
- unsigned long *user_pfn,
- int npage)
+static void vfio_iommu_type1_unpin_pages(void *iommu_data,
+ unsigned long *user_pfn, int npage)
{
struct vfio_iommu *iommu = iommu_data;
bool do_accounting;
int i;

- if (!iommu || !user_pfn || npage <= 0)
- return -EINVAL;
+ if (WARN_ON_ONCE(!iommu || !user_pfn || npage <= 0))
+ return;

/* Supported for v2 version only */
- if (!iommu->v2)
- return -EACCES;
+ if (WARN_ON_ONCE(!iommu->v2))
+ return;

mutex_lock(&iommu->lock);

@@ -979,7 +978,8 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
}

mutex_unlock(&iommu->lock);
- return i > 0 ? i : -EINVAL;
+
+ WARN_ON_ONCE(i != npage);
}

static long vfio_sync_unpin(struct vfio_dma *dma, struct vfio_domain *domain,
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 49580fa2073a..d0844ecdc961 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -149,8 +149,8 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device);

int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
int npage, int prot, unsigned long *phys_pfn);
-int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
- int npage);
+void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
+ int npage);
int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova,
void *data, size_t len, bool write);

--
2.17.1

2022-07-06 06:29:56

by Nicolin Chen

[permalink] [raw]
Subject: [RFT][PATCH v2 2/9] vfio/ap: Pass in physical address of ind to ap_aqic()

The ap_aqic() is called by vfio_ap_irq_enable() where it passes in a
virt value that's casted from a physical address "h_nib". Inside the
ap_aqic(), it does virt_to_phys() again.

Since ap_aqic() needs a physical address, let's just pass in a pa of
ind directly. So change the "ind" to "pa_ind".

Reviewed-by: Harald Freudenberger <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
arch/s390/include/asm/ap.h | 6 +++---
drivers/s390/crypto/ap_queue.c | 2 +-
drivers/s390/crypto/vfio_ap_ops.c | 7 ++++---
3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/ap.h b/arch/s390/include/asm/ap.h
index b515cfa62bd9..f508f5025e38 100644
--- a/arch/s390/include/asm/ap.h
+++ b/arch/s390/include/asm/ap.h
@@ -227,13 +227,13 @@ struct ap_qirq_ctrl {
* ap_aqic(): Control interruption for a specific AP.
* @qid: The AP queue number
* @qirqctrl: struct ap_qirq_ctrl (64 bit value)
- * @ind: The notification indicator byte
+ * @pa_ind: Physical address of the notification indicator byte
*
* Returns AP queue status.
*/
static inline struct ap_queue_status ap_aqic(ap_qid_t qid,
struct ap_qirq_ctrl qirqctrl,
- void *ind)
+ phys_addr_t pa_ind)
{
unsigned long reg0 = qid | (3UL << 24); /* fc 3UL is AQIC */
union {
@@ -241,7 +241,7 @@ static inline struct ap_queue_status ap_aqic(ap_qid_t qid,
struct ap_qirq_ctrl qirqctrl;
struct ap_queue_status status;
} reg1;
- unsigned long reg2 = virt_to_phys(ind);
+ unsigned long reg2 = pa_ind;

reg1.qirqctrl = qirqctrl;

diff --git a/drivers/s390/crypto/ap_queue.c b/drivers/s390/crypto/ap_queue.c
index c48b0db824e3..a32457b4cbb8 100644
--- a/drivers/s390/crypto/ap_queue.c
+++ b/drivers/s390/crypto/ap_queue.c
@@ -34,7 +34,7 @@ static int ap_queue_enable_irq(struct ap_queue *aq, void *ind)

qirqctrl.ir = 1;
qirqctrl.isc = AP_ISC;
- status = ap_aqic(aq->qid, qirqctrl, ind);
+ status = ap_aqic(aq->qid, qirqctrl, virt_to_phys(ind));
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
case AP_RESPONSE_OTHERWISE_CHANGED:
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index a7d2a95796d3..bb869b28cebd 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -154,7 +154,7 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
int retries = 5;

do {
- status = ap_aqic(q->apqn, aqic_gisa, NULL);
+ status = ap_aqic(q->apqn, aqic_gisa, 0);
switch (status.response_code) {
case AP_RESPONSE_OTHERWISE_CHANGED:
case AP_RESPONSE_NORMAL:
@@ -245,7 +245,8 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
struct kvm_s390_gisa *gisa;
int nisc;
struct kvm *kvm;
- unsigned long h_nib, g_pfn, h_pfn;
+ unsigned long g_pfn, h_pfn;
+ phys_addr_t h_nib;
int ret;

/* Verify that the notification indicator byte address is valid */
@@ -290,7 +291,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
aqic_gisa.ir = 1;
aqic_gisa.gisa = (uint64_t)gisa >> 4;

- status = ap_aqic(q->apqn, aqic_gisa, (void *)h_nib);
+ status = ap_aqic(q->apqn, aqic_gisa, h_nib);
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
/* See if we did clear older IRQ configuration */
--
2.17.1

2022-07-06 06:30:04

by Nicolin Chen

[permalink] [raw]
Subject: [RFT][PATCH v2 3/9] vfio/ccw: Only pass in contiguous pages

This driver is the only caller of vfio_pin/unpin_pages that might pass
in a non-contiguous PFN list, but in many cases it has a contiguous PFN
list to process. So letting VFIO API handle a non-contiguous PFN list
is actually counterproductive.

Add a pair of simple loops to pass in contiguous PFNs only, to have an
efficient implementation in VFIO.

Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/s390/cio/vfio_ccw_cp.c | 70 +++++++++++++++++++++++++++-------
1 file changed, 56 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 0c2be9421ab7..3b94863ad24e 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -90,6 +90,38 @@ static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
return 0;
}

+/*
+ * pfn_array_unpin() - Unpin user pages in memory
+ * @pa: pfn_array on which to perform the operation
+ * @vdev: the vfio device to perform the operation
+ * @pa_nr: number of user pages to unpin
+ *
+ * Only unpin if any pages were pinned to begin with, i.e. pa_nr > 0,
+ * otherwise only clear pa->pa_nr
+ */
+static void pfn_array_unpin(struct pfn_array *pa,
+ struct vfio_device *vdev, int pa_nr)
+{
+ int unpinned = 0, npage = 1;
+
+ while (unpinned < pa_nr) {
+ unsigned long *first = &pa->pa_iova_pfn[unpinned];
+ unsigned long *last = &first[npage];
+
+ if (unpinned + npage < pa_nr &&
+ *first + npage == *last) {
+ npage++;
+ continue;
+ }
+
+ vfio_unpin_pages(vdev, first, npage);
+ unpinned += npage;
+ npage = 1;
+ }
+
+ pa->pa_nr = 0;
+}
+
/*
* pfn_array_pin() - Pin user pages in memory
* @pa: pfn_array on which to perform the operation
@@ -101,34 +133,44 @@ static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
*/
static int pfn_array_pin(struct pfn_array *pa, struct vfio_device *vdev)
{
+ int pinned = 0, npage = 1;
int ret = 0;

- ret = vfio_pin_pages(vdev, pa->pa_iova_pfn, pa->pa_nr,
- IOMMU_READ | IOMMU_WRITE, pa->pa_pfn);
+ while (pinned < pa->pa_nr) {
+ unsigned long *first = &pa->pa_iova_pfn[pinned];
+ unsigned long *last = &first[npage];

- if (ret < 0) {
- goto err_out;
- } else if (ret > 0 && ret != pa->pa_nr) {
- vfio_unpin_pages(vdev, pa->pa_iova_pfn, ret);
- ret = -EINVAL;
- goto err_out;
+ if (pinned + npage < pa->pa_nr &&
+ *first + npage == *last) {
+ npage++;
+ continue;
+ }
+
+ ret = vfio_pin_pages(vdev, first, npage,
+ IOMMU_READ | IOMMU_WRITE,
+ &pa->pa_pfn[pinned]);
+ if (ret < 0) {
+ goto err_out;
+ } else if (ret > 0 && ret != npage) {
+ pinned += ret;
+ ret = -EINVAL;
+ goto err_out;
+ }
+ pinned += npage;
+ npage = 1;
}

return ret;

err_out:
- pa->pa_nr = 0;
-
+ pfn_array_unpin(pa, vdev, pinned);
return ret;
}

/* Unpin the pages before releasing the memory. */
static void pfn_array_unpin_free(struct pfn_array *pa, struct vfio_device *vdev)
{
- /* Only unpin if any pages were pinned to begin with */
- if (pa->pa_nr)
- vfio_unpin_pages(vdev, pa->pa_iova_pfn, pa->pa_nr);
- pa->pa_nr = 0;
+ pfn_array_unpin(pa, vdev, pa->pa_nr);
kfree(pa->pa_iova_pfn);
}

--
2.17.1

2022-07-06 06:30:20

by Nicolin Chen

[permalink] [raw]
Subject: [RFT][PATCH v2 7/9] vfio: Rename user_iova of vfio_dma_rw()

Following the updated vfio_pin/unpin_pages(), use the simpler "iova".

Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/vfio/vfio.c | 6 +++---
include/linux/vfio.h | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 813b517c973e..b95ec2d78966 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1991,13 +1991,13 @@ EXPORT_SYMBOL(vfio_unpin_pages);
* not a real device DMA, it is not necessary to pin the user space memory.
*
* @device [in] : VFIO device
- * @user_iova [in] : base IOVA of a user space buffer
+ * @iova [in] : base IOVA of a user space buffer
* @data [in] : pointer to kernel buffer
* @len [in] : kernel buffer length
* @write : indicate read or write
* Return error code on failure or 0 on success.
*/
-int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, void *data,
+int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data,
size_t len, bool write)
{
struct vfio_container *container;
@@ -2013,7 +2013,7 @@ int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, void *data,

if (likely(driver && driver->ops->dma_rw))
ret = driver->ops->dma_rw(container->iommu_data,
- user_iova, data, len, write);
+ iova, data, len, write);
else
ret = -ENOTTY;
return ret;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index c3e441c0bf4b..9108de34a79b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -150,7 +150,7 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
int npage, int prot, unsigned long *phys_pfn);
void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage);
-int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova,
+int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova,
void *data, size_t len, bool write);

/* each type has independent events */
--
2.17.1

2022-07-06 06:30:25

by Nicolin Chen

[permalink] [raw]
Subject: [RFT][PATCH v2 9/9] vfio: Replace phys_pfn with pages for vfio_pin_pages()

Most of the callers of vfio_pin_pages() want "struct page *" and the
low-level mm code to pin pages returns a list of "struct page *" too.
So there's no gain in converting "struct page *" to PFN in between.

Replace the output parameter "phys_pfn" list with a "pages" list, to
simplify callers. This also allows us to replace the vfio_iommu_type1
implementation with a more efficient one.

For now, also update vfio_iommu_type1 to fit this new parameter too.

Signed-off-by: Nicolin Chen <[email protected]>
---
.../driver-api/vfio-mediated-device.rst | 2 +-
drivers/gpu/drm/i915/gvt/kvmgt.c | 19 ++++++-------------
drivers/s390/cio/vfio_ccw_cp.c | 19 +++++++++----------
drivers/s390/crypto/vfio_ap_ops.c | 6 +++---
drivers/vfio/vfio.c | 8 ++++----
drivers/vfio/vfio.h | 2 +-
drivers/vfio/vfio_iommu_type1.c | 19 +++++++++++--------
include/linux/vfio.h | 2 +-
8 files changed, 36 insertions(+), 41 deletions(-)

diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
index ea32a0f13ddb..ba5fefcdae1a 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -263,7 +263,7 @@ The following APIs are provided for translating user pfn to host pfn in a VFIO
driver::

int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
- int npage, int prot, unsigned long *phys_pfn);
+ int npage, int prot, struct page **pages);

void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
int npage);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index ea6041fa48ac..3a49471dcc16 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -239,7 +239,7 @@ static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
unsigned long size, struct page **page)
{
- unsigned long base_pfn = 0;
+ struct page *base_page = NULL;
int total_pages;
int npage;
int ret;
@@ -251,26 +251,19 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
*/
for (npage = 0; npage < total_pages; npage++) {
dma_addr_t cur_iova = (gfn + npage) << PAGE_SHIFT;
- unsigned long pfn;
+ struct page *cur_page;

ret = vfio_pin_pages(&vgpu->vfio_device, cur_iova, 1,
- IOMMU_READ | IOMMU_WRITE, &pfn);
+ IOMMU_READ | IOMMU_WRITE, &cur_page);
if (ret != 1) {
gvt_vgpu_err("vfio_pin_pages failed for iova %pad, ret %d\n",
&cur_iova, ret);
goto err;
}

- if (!pfn_valid(pfn)) {
- gvt_vgpu_err("pfn 0x%lx is not mem backed\n", pfn);
- npage++;
- ret = -EFAULT;
- goto err;
- }
-
if (npage == 0)
- base_pfn = pfn;
- else if (base_pfn + npage != pfn) {
+ base_page = cur_page;
+ else if (base_page + npage != cur_page) {
gvt_vgpu_err("The pages are not continuous\n");
ret = -EINVAL;
npage++;
@@ -278,7 +271,7 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
}
}

- *page = pfn_to_page(base_pfn);
+ *page = base_page;
return 0;
err:
gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index cd4ec4f6d6ff..8963f452f963 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -22,8 +22,8 @@
struct page_array {
/* Array that stores pages need to pin. */
dma_addr_t *pa_iova;
- /* Array that receives PFNs of the pages pinned. */
- unsigned long *pa_pfn;
+ /* Array that receives the pinned pages. */
+ struct page **pa_page;
/* Number of pages pinned from @pa_iova. */
int pa_nr;
};
@@ -68,19 +68,19 @@ static int page_array_alloc(struct page_array *pa, u64 iova, unsigned int len)
return -EINVAL;

pa->pa_iova = kcalloc(pa->pa_nr,
- sizeof(*pa->pa_iova) + sizeof(*pa->pa_pfn),
+ sizeof(*pa->pa_iova) + sizeof(*pa->pa_page),
GFP_KERNEL);
if (unlikely(!pa->pa_iova)) {
pa->pa_nr = 0;
return -ENOMEM;
}
- pa->pa_pfn = (unsigned long *)&pa->pa_iova[pa->pa_nr];
+ pa->pa_page = (struct page **)&pa->pa_iova[pa->pa_nr];

pa->pa_iova[0] = iova;
- pa->pa_pfn[0] = -1ULL;
+ pa->pa_page[0] = NULL;
for (i = 1; i < pa->pa_nr; i++) {
pa->pa_iova[i] = pa->pa_iova[i - 1] + PAGE_SIZE;
- pa->pa_pfn[i] = -1ULL;
+ pa->pa_page[i] = NULL;
}

return 0;
@@ -144,7 +144,7 @@ static int page_array_pin(struct page_array *pa, struct vfio_device *vdev)

ret = vfio_pin_pages(vdev, *first, npage,
IOMMU_READ | IOMMU_WRITE,
- &pa->pa_pfn[pinned]);
+ &pa->pa_page[pinned]);
if (ret < 0) {
goto err_out;
} else if (ret > 0 && ret != npage) {
@@ -195,7 +195,7 @@ static inline void page_array_idal_create_words(struct page_array *pa,
*/

for (i = 0; i < pa->pa_nr; i++)
- idaws[i] = pa->pa_pfn[i] << PAGE_SHIFT;
+ idaws[i] = page_to_phys(pa->pa_page[i]);

/* Adjust the first IDAW, since it may not start on a page boundary */
idaws[0] += pa->pa_iova[0] & (PAGE_SIZE - 1);
@@ -246,8 +246,7 @@ static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,

l = n;
for (i = 0; i < pa.pa_nr; i++) {
- struct page *page = pfn_to_page(pa.pa_pfn[i]);
- void *from = kmap_local_page(page);
+ void *from = kmap_local_page(pa.pa_page[i]);

m = PAGE_SIZE;
if (i == 0) {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 6945e0ddc08c..b0972ca0dfa3 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -234,9 +234,9 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
struct ap_qirq_ctrl aqic_gisa = {};
struct ap_queue_status status = {};
struct kvm_s390_gisa *gisa;
+ struct page *h_page;
int nisc;
struct kvm *kvm;
- unsigned long h_pfn;
phys_addr_t h_nib;
dma_addr_t nib;
int ret;
@@ -251,7 +251,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
}

ret = vfio_pin_pages(&q->matrix_mdev->vdev, nib, 1,
- IOMMU_READ | IOMMU_WRITE, &h_pfn);
+ IOMMU_READ | IOMMU_WRITE, &h_page);
switch (ret) {
case 1:
break;
@@ -267,7 +267,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
kvm = q->matrix_mdev->kvm;
gisa = kvm->arch.gisa_int.origin;

- h_nib = (h_pfn << PAGE_SHIFT) | (nib & ~PAGE_MASK);
+ h_nib = page_to_phys(h_page) | (nib & ~PAGE_MASK);
aqic_gisa.gisc = isc;

nisc = kvm_s390_gisc_register(kvm, isc);
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index b95ec2d78966..96b758e06c4a 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1917,18 +1917,18 @@ EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
* @npage [in] : count of pages to be pinned. This count should not
* be greater VFIO_PIN_PAGES_MAX_ENTRIES.
* @prot [in] : protection flags
- * @phys_pfn[out]: array of host PFNs
+ * @pages[out] : array of host pages
* Return error or number of pages pinned.
*/
int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
- int npage, int prot, unsigned long *phys_pfn)
+ int npage, int prot, struct page **pages)
{
struct vfio_container *container;
struct vfio_group *group = device->group;
struct vfio_iommu_driver *driver;
int ret;

- if (!phys_pfn || !npage || !vfio_assert_device_open(device))
+ if (!pages || !npage || !vfio_assert_device_open(device))
return -EINVAL;

if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
@@ -1943,7 +1943,7 @@ int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
if (likely(driver && driver->ops->pin_pages))
ret = driver->ops->pin_pages(container->iommu_data,
group->iommu_group, iova,
- npage, prot, phys_pfn);
+ npage, prot, pages);
else
ret = -ENOTTY;

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index dbcd0e8c031b..dbfad8e20581 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -52,7 +52,7 @@ struct vfio_iommu_driver_ops {
struct iommu_group *group,
dma_addr_t user_iova,
int npage, int prot,
- unsigned long *phys_pfn);
+ struct page **pages);
void (*unpin_pages)(void *iommu_data,
dma_addr_t user_iova, int npage);
int (*register_notifier)(void *iommu_data,
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index f10d0c4b1f26..de342d82b154 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -830,7 +830,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
struct iommu_group *iommu_group,
dma_addr_t user_iova,
int npage, int prot,
- unsigned long *phys_pfn)
+ struct page **pages)
{
struct vfio_iommu *iommu = iommu_data;
struct vfio_iommu_group *group;
@@ -840,7 +840,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
bool do_accounting;
dma_addr_t iova;

- if (!iommu || !phys_pfn)
+ if (!iommu || !pages)
return -EINVAL;

/* Supported for v2 version only */
@@ -879,6 +879,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
do_accounting = list_empty(&iommu->domain_list);

for (i = 0; i < npage; i++) {
+ unsigned long phys_pfn;
struct vfio_pfn *vpfn;

iova = user_iova + PAGE_SIZE * i;
@@ -895,23 +896,25 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,

vpfn = vfio_iova_get_vfio_pfn(dma, iova);
if (vpfn) {
- phys_pfn[i] = vpfn->pfn;
+ pages[i] = pfn_to_page(vpfn->pfn);
continue;
}

remote_vaddr = dma->vaddr + (iova - dma->iova);
- ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn[i],
+ ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn,
do_accounting);
if (ret)
goto pin_unwind;

- ret = vfio_add_to_pfn_list(dma, iova, phys_pfn[i]);
+ ret = vfio_add_to_pfn_list(dma, iova, phys_pfn);
if (ret) {
- if (put_pfn(phys_pfn[i], dma->prot) && do_accounting)
+ if (put_pfn(phys_pfn, dma->prot) && do_accounting)
vfio_lock_acct(dma, -1, true);
goto pin_unwind;
}

+ pages[i] = pfn_to_page(phys_pfn);
+
if (iommu->dirty_page_tracking) {
unsigned long pgshift = __ffs(iommu->pgsize_bitmap);

@@ -934,14 +937,14 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
goto pin_done;

pin_unwind:
- phys_pfn[i] = 0;
+ pages[i] = NULL;
for (j = 0; j < i; j++) {
dma_addr_t iova;

iova = user_iova + PAGE_SIZE * j;
dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
vfio_unpin_page_external(dma, iova, do_accounting);
- phys_pfn[j] = 0;
+ pages[j] = NULL;
}
pin_done:
mutex_unlock(&iommu->lock);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 9108de34a79b..c76a2c492bbd 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -148,7 +148,7 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
#define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long))

int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
- int npage, int prot, unsigned long *phys_pfn);
+ int npage, int prot, struct page **pages);
void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage);
int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova,
void *data, size_t len, bool write);
--
2.17.1

2022-07-06 06:30:44

by Nicolin Chen

[permalink] [raw]
Subject: [RFT][PATCH v2 8/9] vfio/ccw: Add kmap_local_page() for memcpy

A PFN is not secure enough to promise that the memory is not IO. And
direct access via memcpy() that only handles CPU memory will crash on
S390 if the PFN is an IO PFN, as we have to use the memcpy_to/fromio()
that uses the special S390 IO access instructions. On the other hand,
a "struct page *" is always a CPU coherent thing that fits memcpy().

Also, casting a PFN to "void *" for memcpy() is not a proper practice,
kmap_local_page() is the correct API to call here, though S390 doesn't
use highmem, which means kmap_local_page() is a NOP.

There's a following patch changing the vfio_pin_pages() API to return
a list of "struct page *" instead of PFNs. It will block any IO memory
from ever getting into this call path, for such a security purpose. In
this patch, add kmap_local_page() to prepare for that.

Suggested-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/s390/cio/vfio_ccw_cp.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 3854c3d573f5..cd4ec4f6d6ff 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -11,6 +11,7 @@
#include <linux/ratelimit.h>
#include <linux/mm.h>
#include <linux/slab.h>
+#include <linux/highmem.h>
#include <linux/iommu.h>
#include <linux/vfio.h>
#include <asm/idals.h>
@@ -230,7 +231,6 @@ static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,
unsigned long n)
{
struct page_array pa = {0};
- u64 from;
int i, ret;
unsigned long l, m;

@@ -246,7 +246,9 @@ static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,

l = n;
for (i = 0; i < pa.pa_nr; i++) {
- from = pa.pa_pfn[i] << PAGE_SHIFT;
+ struct page *page = pfn_to_page(pa.pa_pfn[i]);
+ void *from = kmap_local_page(page);
+
m = PAGE_SIZE;
if (i == 0) {
from += iova & (PAGE_SIZE - 1);
@@ -254,7 +256,8 @@ static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,
}

m = min(l, m);
- memcpy(to + (n - l), (void *)from, m);
+ memcpy(to + (n - l), from, m);
+ kunmap_local(from);

l -= m;
if (l == 0)
--
2.17.1

2022-07-06 06:41:33

by Nicolin Chen

[permalink] [raw]
Subject: [RFT][PATCH v2 5/9] vfio/ap: Remove redundant pfn

The vfio_ap_ops code maintains both nib address and its PFN, which
is redundant, merely because vfio_pin/unpin_pages API wanted pfn.
Since vfio_pin/unpin_pages() now accept "iova", remove duplicated
pfn code in their callers too.

Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 42 +++++++++++----------------
drivers/s390/crypto/vfio_ap_private.h | 4 +--
2 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 8a2018ab3cf0..6945e0ddc08c 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -112,7 +112,7 @@ static void vfio_ap_wait_for_irqclear(int apqn)
*
* Unregisters the ISC in the GIB when the saved ISC not invalid.
* Unpins the guest's page holding the NIB when it exists.
- * Resets the saved_pfn and saved_isc to invalid values.
+ * Resets the saved_iova and saved_isc to invalid values.
*/
static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
{
@@ -123,9 +123,9 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc);
q->saved_isc = VFIO_AP_ISC_INVALID;
}
- if (q->saved_pfn && !WARN_ON(!q->matrix_mdev)) {
- vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_pfn << PAGE_SHIFT, 1);
- q->saved_pfn = 0;
+ if (q->saved_iova && !WARN_ON(!q->matrix_mdev)) {
+ vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_iova, 1);
+ q->saved_iova = 0;
}
}

@@ -189,27 +189,19 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
*
* @vcpu: the object representing the vcpu executing the PQAP(AQIC) instruction.
* @nib: the location for storing the nib address.
- * @g_pfn: the location for storing the page frame number of the page containing
- * the nib.
*
* When the PQAP(AQIC) instruction is executed, general register 2 contains the
* address of the notification indicator byte (nib) used for IRQ notification.
- * This function parses the nib from gr2 and calculates the page frame
- * number for the guest of the page containing the nib. The values are
- * stored in @nib and @g_pfn respectively.
- *
- * The g_pfn of the nib is then validated to ensure the nib address is valid.
+ * This function parses and validate the nib from gr2.
*
* Return: returns zero if the nib address is a valid; otherwise, returns
* -EINVAL.
*/
-static int vfio_ap_validate_nib(struct kvm_vcpu *vcpu, unsigned long *nib,
- unsigned long *g_pfn)
+static int vfio_ap_validate_nib(struct kvm_vcpu *vcpu, dma_addr_t *nib)
{
*nib = vcpu->run->s.regs.gprs[2];
- *g_pfn = *nib >> PAGE_SHIFT;

- if (kvm_is_error_hva(gfn_to_hva(vcpu->kvm, *g_pfn)))
+ if (kvm_is_error_hva(gfn_to_hva(vcpu->kvm, *nib >> PAGE_SHIFT)))
return -EINVAL;

return 0;
@@ -239,34 +231,34 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
int isc,
struct kvm_vcpu *vcpu)
{
- unsigned long nib;
struct ap_qirq_ctrl aqic_gisa = {};
struct ap_queue_status status = {};
struct kvm_s390_gisa *gisa;
int nisc;
struct kvm *kvm;
- unsigned long g_pfn, h_pfn;
+ unsigned long h_pfn;
phys_addr_t h_nib;
+ dma_addr_t nib;
int ret;

/* Verify that the notification indicator byte address is valid */
- if (vfio_ap_validate_nib(vcpu, &nib, &g_pfn)) {
- VFIO_AP_DBF_WARN("%s: invalid NIB address: nib=%#lx, g_pfn=%#lx, apqn=%#04x\n",
- __func__, nib, g_pfn, q->apqn);
+ if (vfio_ap_validate_nib(vcpu, &nib)) {
+ VFIO_AP_DBF_WARN("%s: invalid NIB address: nib=%pad, apqn=%#04x\n",
+ __func__, &nib, q->apqn);

status.response_code = AP_RESPONSE_INVALID_ADDRESS;
return status;
}

- ret = vfio_pin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1,
+ ret = vfio_pin_pages(&q->matrix_mdev->vdev, nib, 1,
IOMMU_READ | IOMMU_WRITE, &h_pfn);
switch (ret) {
case 1:
break;
default:
VFIO_AP_DBF_WARN("%s: vfio_pin_pages failed: rc=%d,"
- "nib=%#lx, g_pfn=%#lx, apqn=%#04x\n",
- __func__, ret, nib, g_pfn, q->apqn);
+ "nib=%pad, apqn=%#04x\n",
+ __func__, ret, &nib, q->apqn);

status.response_code = AP_RESPONSE_INVALID_ADDRESS;
return status;
@@ -296,12 +288,12 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
case AP_RESPONSE_NORMAL:
/* See if we did clear older IRQ configuration */
vfio_ap_free_aqic_resources(q);
- q->saved_pfn = g_pfn;
+ q->saved_iova = nib;
q->saved_isc = isc;
break;
case AP_RESPONSE_OTHERWISE_CHANGED:
/* We could not modify IRQ setings: clear new configuration */
- vfio_unpin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1);
+ vfio_unpin_pages(&q->matrix_mdev->vdev, nib, 1);
kvm_s390_gisc_unregister(kvm, isc);
break;
default:
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index a26efd804d0d..479b205179bd 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -102,13 +102,13 @@ struct ap_matrix_mdev {
* struct vfio_ap_queue - contains the data associated with a queue bound to the
* vfio_ap device driver
* @matrix_mdev: the matrix mediated device
- * @saved_pfn: the guest PFN pinned for the guest
+ * @saved_iova: the notification indicator byte (nib) address
* @apqn: the APQN of the AP queue device
* @saved_isc: the guest ISC registered with the GIB interface
*/
struct vfio_ap_queue {
struct ap_matrix_mdev *matrix_mdev;
- unsigned long saved_pfn;
+ dma_addr_t saved_iova;
int apqn;
#define VFIO_AP_ISC_INVALID 0xff
unsigned char saved_isc;
--
2.17.1

2022-07-06 06:47:43

by Nicolin Chen

[permalink] [raw]
Subject: [RFT][PATCH v2 4/9] vfio: Pass in starting IOVA to vfio_pin/unpin_pages API

The vfio_pin/unpin_pages() so far accepted arrays of PFNs of user IOVA.
Among all three callers, there was only one caller possibly passing in
a non-contiguous PFN list, which is now ensured to have contiguous PFN
inputs too.

Pass in the starting address with "iova" alone to simplify things, so
callers no longer need to maintain a PFN list or to pin/unpin one page
at a time. This also allows VFIO to use more efficient implementations
of pin/unpin_pages.

For now, also update vfio_iommu_type1 to fit this new parameter too,
while keeping its input intact (being user_iova) since we don't want
to spend too much effort swapping its parameters and local variables
at that level.

Signed-off-by: Nicolin Chen <[email protected]>
---
.../driver-api/vfio-mediated-device.rst | 4 +--
drivers/gpu/drm/i915/gvt/kvmgt.c | 24 ++++++-----------
drivers/s390/cio/vfio_ccw_cp.c | 4 +--
drivers/s390/crypto/vfio_ap_ops.c | 9 +++----
drivers/vfio/vfio.c | 27 +++++++++----------
drivers/vfio/vfio.h | 4 +--
drivers/vfio/vfio_iommu_type1.c | 17 ++++++------
include/linux/vfio.h | 5 ++--
8 files changed, 40 insertions(+), 54 deletions(-)

diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
index b0fdf76b339a..ea32a0f13ddb 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -262,10 +262,10 @@ Translation APIs for Mediated Devices
The following APIs are provided for translating user pfn to host pfn in a VFIO
driver::

- int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
+ int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
int npage, int prot, unsigned long *phys_pfn);

- void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
+ void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
int npage);

These functions call back into the back-end IOMMU module by using the pin_pages
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 8c67c9aba82d..ea6041fa48ac 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -231,16 +231,8 @@ static void intel_gvt_cleanup_vgpu_type_groups(struct intel_gvt *gvt)
static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
unsigned long size)
{
- int total_pages;
- int npage;
-
- total_pages = roundup(size, PAGE_SIZE) / PAGE_SIZE;
-
- for (npage = 0; npage < total_pages; npage++) {
- unsigned long cur_gfn = gfn + npage;
-
- vfio_unpin_pages(&vgpu->vfio_device, &cur_gfn, 1);
- }
+ vfio_unpin_pages(&vgpu->vfio_device, gfn << PAGE_SHIFT,
+ roundup(size, PAGE_SIZE) / PAGE_SIZE);
}

/* Pin a normal or compound guest page for dma. */
@@ -258,14 +250,14 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
* on stack to hold pfns.
*/
for (npage = 0; npage < total_pages; npage++) {
- unsigned long cur_gfn = gfn + npage;
+ dma_addr_t cur_iova = (gfn + npage) << PAGE_SHIFT;
unsigned long pfn;

- ret = vfio_pin_pages(&vgpu->vfio_device, &cur_gfn, 1,
+ ret = vfio_pin_pages(&vgpu->vfio_device, cur_iova, 1,
IOMMU_READ | IOMMU_WRITE, &pfn);
if (ret != 1) {
- gvt_vgpu_err("vfio_pin_pages failed for gfn 0x%lx, ret %d\n",
- cur_gfn, ret);
+ gvt_vgpu_err("vfio_pin_pages failed for iova %pad, ret %d\n",
+ &cur_iova, ret);
goto err;
}

@@ -309,7 +301,7 @@ static int gvt_dma_map_page(struct intel_vgpu *vgpu, unsigned long gfn,
if (dma_mapping_error(dev, *dma_addr)) {
gvt_vgpu_err("DMA mapping failed for pfn 0x%lx, ret %d\n",
page_to_pfn(page), ret);
- gvt_unpin_guest_page(vgpu, gfn, size);
+ gvt_unpin_guest_page(vgpu, gfn << PAGE_SHIFT, size);
return -ENOMEM;
}

@@ -322,7 +314,7 @@ static void gvt_dma_unmap_page(struct intel_vgpu *vgpu, unsigned long gfn,
struct device *dev = vgpu->gvt->gt->i915->drm.dev;

dma_unmap_page(dev, dma_addr, size, DMA_BIDIRECTIONAL);
- gvt_unpin_guest_page(vgpu, gfn, size);
+ gvt_unpin_guest_page(vgpu, gfn << PAGE_SHIFT, size);
}

static struct gvt_dma *__gvt_cache_find_dma_addr(struct intel_vgpu *vgpu,
diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 3b94863ad24e..a739262f988d 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -114,7 +114,7 @@ static void pfn_array_unpin(struct pfn_array *pa,
continue;
}

- vfio_unpin_pages(vdev, first, npage);
+ vfio_unpin_pages(vdev, *first << PAGE_SHIFT, npage);
unpinned += npage;
npage = 1;
}
@@ -146,7 +146,7 @@ static int pfn_array_pin(struct pfn_array *pa, struct vfio_device *vdev)
continue;
}

- ret = vfio_pin_pages(vdev, first, npage,
+ ret = vfio_pin_pages(vdev, *first << PAGE_SHIFT, npage,
IOMMU_READ | IOMMU_WRITE,
&pa->pa_pfn[pinned]);
if (ret < 0) {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index bb869b28cebd..8a2018ab3cf0 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -124,7 +124,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
q->saved_isc = VFIO_AP_ISC_INVALID;
}
if (q->saved_pfn && !WARN_ON(!q->matrix_mdev)) {
- vfio_unpin_pages(&q->matrix_mdev->vdev, &q->saved_pfn, 1);
+ vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_pfn << PAGE_SHIFT, 1);
q->saved_pfn = 0;
}
}
@@ -258,7 +258,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
return status;
}

- ret = vfio_pin_pages(&q->matrix_mdev->vdev, &g_pfn, 1,
+ ret = vfio_pin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1,
IOMMU_READ | IOMMU_WRITE, &h_pfn);
switch (ret) {
case 1:
@@ -301,7 +301,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
break;
case AP_RESPONSE_OTHERWISE_CHANGED:
/* We could not modify IRQ setings: clear new configuration */
- vfio_unpin_pages(&q->matrix_mdev->vdev, &g_pfn, 1);
+ vfio_unpin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1);
kvm_s390_gisc_unregister(kvm, isc);
break;
default:
@@ -1248,9 +1248,8 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,

if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
struct vfio_iommu_type1_dma_unmap *unmap = data;
- unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;

- vfio_unpin_pages(&matrix_mdev->vdev, &g_pfn, 1);
+ vfio_unpin_pages(&matrix_mdev->vdev, unmap->iova, 1);
return NOTIFY_OK;
}

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 01f45ec70a3d..813b517c973e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1910,17 +1910,17 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);

/*
- * Pin a set of guest PFNs and return their associated host PFNs for local
+ * Pin contiguous guest pages and return their associated host pages for local
* domain only.
* @device [in] : device
- * @user_pfn [in]: array of user/guest PFNs to be pinned.
- * @npage [in] : count of elements in user_pfn array. This count should not
+ * @iova [in] : starting IOVA of user/guest pages to be pinned.
+ * @npage [in] : count of pages to be pinned. This count should not
* be greater VFIO_PIN_PAGES_MAX_ENTRIES.
* @prot [in] : protection flags
* @phys_pfn[out]: array of host PFNs
* Return error or number of pages pinned.
*/
-int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
+int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
int npage, int prot, unsigned long *phys_pfn)
{
struct vfio_container *container;
@@ -1928,8 +1928,7 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
struct vfio_iommu_driver *driver;
int ret;

- if (!user_pfn || !phys_pfn || !npage ||
- !vfio_assert_device_open(device))
+ if (!phys_pfn || !npage || !vfio_assert_device_open(device))
return -EINVAL;

if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
@@ -1943,7 +1942,7 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
driver = container->iommu_driver;
if (likely(driver && driver->ops->pin_pages))
ret = driver->ops->pin_pages(container->iommu_data,
- group->iommu_group, user_pfn,
+ group->iommu_group, iova,
npage, prot, phys_pfn);
else
ret = -ENOTTY;
@@ -1953,20 +1952,18 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
EXPORT_SYMBOL(vfio_pin_pages);

/*
- * Unpin set of host PFNs for local domain only.
+ * Unpin contiguous host pages for local domain only.
* @device [in] : device
- * @user_pfn [in]: array of user/guest PFNs to be unpinned. Number of user/guest
- * PFNs should not be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
- * @npage [in] : count of elements in user_pfn array. This count should not
+ * @iova [in] : starting address of user/guest pages to be unpinned.
+ * @npage [in] : count of pages to be unpinned. This count should not
* be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
*/
-void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
- int npage)
+void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage)
{
struct vfio_container *container;
struct vfio_iommu_driver *driver;

- if (WARN_ON_ONCE(!user_pfn || !npage || !vfio_assert_device_open(device)))
+ if (WARN_ON_ONCE(!npage || !vfio_assert_device_open(device)))
return;

if (WARN_ON_ONCE(npage > VFIO_PIN_PAGES_MAX_ENTRIES))
@@ -1979,7 +1976,7 @@ void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
if (WARN_ON_ONCE(unlikely(!driver || !driver->ops->unpin_pages)))
return;

- driver->ops->unpin_pages(container->iommu_data, user_pfn, npage);
+ driver->ops->unpin_pages(container->iommu_data, iova, npage);
}
EXPORT_SYMBOL(vfio_unpin_pages);

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index bef4edf58138..dbcd0e8c031b 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -50,11 +50,11 @@ struct vfio_iommu_driver_ops {
struct iommu_group *group);
int (*pin_pages)(void *iommu_data,
struct iommu_group *group,
- unsigned long *user_pfn,
+ dma_addr_t user_iova,
int npage, int prot,
unsigned long *phys_pfn);
void (*unpin_pages)(void *iommu_data,
- unsigned long *user_pfn, int npage);
+ dma_addr_t user_iova, int npage);
int (*register_notifier)(void *iommu_data,
unsigned long *events,
struct notifier_block *nb);
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 08613edaf722..f10d0c4b1f26 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -828,7 +828,7 @@ static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova,

static int vfio_iommu_type1_pin_pages(void *iommu_data,
struct iommu_group *iommu_group,
- unsigned long *user_pfn,
+ dma_addr_t user_iova,
int npage, int prot,
unsigned long *phys_pfn)
{
@@ -840,7 +840,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
bool do_accounting;
dma_addr_t iova;

- if (!iommu || !user_pfn || !phys_pfn)
+ if (!iommu || !phys_pfn)
return -EINVAL;

/* Supported for v2 version only */
@@ -856,7 +856,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
again:
if (iommu->vaddr_invalid_count) {
for (i = 0; i < npage; i++) {
- iova = user_pfn[i] << PAGE_SHIFT;
+ iova = user_iova + PAGE_SIZE * i;
ret = vfio_find_dma_valid(iommu, iova, PAGE_SIZE, &dma);
if (ret < 0)
goto pin_done;
@@ -881,7 +881,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
for (i = 0; i < npage; i++) {
struct vfio_pfn *vpfn;

- iova = user_pfn[i] << PAGE_SHIFT;
+ iova = user_iova + PAGE_SIZE * i;
dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
if (!dma) {
ret = -EINVAL;
@@ -938,7 +938,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
for (j = 0; j < i; j++) {
dma_addr_t iova;

- iova = user_pfn[j] << PAGE_SHIFT;
+ iova = user_iova + PAGE_SIZE * j;
dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
vfio_unpin_page_external(dma, iova, do_accounting);
phys_pfn[j] = 0;
@@ -949,13 +949,13 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
}

static void vfio_iommu_type1_unpin_pages(void *iommu_data,
- unsigned long *user_pfn, int npage)
+ dma_addr_t user_iova, int npage)
{
struct vfio_iommu *iommu = iommu_data;
bool do_accounting;
int i;

- if (WARN_ON_ONCE(!iommu || !user_pfn || npage <= 0))
+ if (WARN_ON_ONCE(!iommu || npage <= 0))
return;

/* Supported for v2 version only */
@@ -966,10 +966,9 @@ static void vfio_iommu_type1_unpin_pages(void *iommu_data,

do_accounting = list_empty(&iommu->domain_list);
for (i = 0; i < npage; i++) {
+ dma_addr_t iova = user_iova + PAGE_SIZE * i;
struct vfio_dma *dma;
- dma_addr_t iova;

- iova = user_pfn[i] << PAGE_SHIFT;
dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
if (!dma)
break;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index d0844ecdc961..c3e441c0bf4b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -147,10 +147,9 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device);

#define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long))

-int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
+int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
int npage, int prot, unsigned long *phys_pfn);
-void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
- int npage);
+void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage);
int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova,
void *data, size_t len, bool write);

--
2.17.1

2022-07-06 06:50:51

by Nicolin Chen

[permalink] [raw]
Subject: [RFT][PATCH v2 6/9] vfio/ccw: Change pa_pfn list to pa_iova list

The vfio_ccw_cp code maintains both iova and its PFN list because the
vfio_pin/unpin_pages API wanted pfn list. Since vfio_pin/unpin_pages()
now accept "iova", change to maintain only pa_iova list and rename all
"pfn_array" strings to "page_array", so as to simplify the code.

Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/s390/cio/vfio_ccw_cp.c | 135 ++++++++++++++++-----------------
1 file changed, 64 insertions(+), 71 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index a739262f988d..3854c3d573f5 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -18,11 +18,9 @@
#include "vfio_ccw_cp.h"
#include "vfio_ccw_private.h"

-struct pfn_array {
- /* Starting guest physical I/O address. */
- unsigned long pa_iova;
- /* Array that stores PFNs of the pages need to pin. */
- unsigned long *pa_iova_pfn;
+struct page_array {
+ /* Array that stores pages need to pin. */
+ dma_addr_t *pa_iova;
/* Array that receives PFNs of the pages pinned. */
unsigned long *pa_pfn;
/* Number of pages pinned from @pa_iova. */
@@ -37,53 +35,50 @@ struct ccwchain {
/* Count of the valid ccws in chain. */
int ch_len;
/* Pinned PAGEs for the original data. */
- struct pfn_array *ch_pa;
+ struct page_array *ch_pa;
};

/*
- * pfn_array_alloc() - alloc memory for PFNs
- * @pa: pfn_array on which to perform the operation
+ * page_array_alloc() - alloc memory for page array
+ * @pa: page_array on which to perform the operation
* @iova: target guest physical address
* @len: number of bytes that should be pinned from @iova
*
- * Attempt to allocate memory for PFNs.
+ * Attempt to allocate memory for page array.
*
- * Usage of pfn_array:
- * We expect (pa_nr == 0) and (pa_iova_pfn == NULL), any field in
+ * Usage of page_array:
+ * We expect (pa_nr == 0) and (pa_iova == NULL), any field in
* this structure will be filled in by this function.
*
* Returns:
- * 0 if PFNs are allocated
- * -EINVAL if pa->pa_nr is not initially zero, or pa->pa_iova_pfn is not NULL
+ * 0 if page array is allocated
+ * -EINVAL if pa->pa_nr is not initially zero, or pa->pa_iova is not NULL
* -ENOMEM if alloc failed
*/
-static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
+static int page_array_alloc(struct page_array *pa, u64 iova, unsigned int len)
{
int i;

- if (pa->pa_nr || pa->pa_iova_pfn)
+ if (pa->pa_nr || pa->pa_iova)
return -EINVAL;

- pa->pa_iova = iova;
-
pa->pa_nr = ((iova & ~PAGE_MASK) + len + (PAGE_SIZE - 1)) >> PAGE_SHIFT;
if (!pa->pa_nr)
return -EINVAL;

- pa->pa_iova_pfn = kcalloc(pa->pa_nr,
- sizeof(*pa->pa_iova_pfn) +
- sizeof(*pa->pa_pfn),
- GFP_KERNEL);
- if (unlikely(!pa->pa_iova_pfn)) {
+ pa->pa_iova = kcalloc(pa->pa_nr,
+ sizeof(*pa->pa_iova) + sizeof(*pa->pa_pfn),
+ GFP_KERNEL);
+ if (unlikely(!pa->pa_iova)) {
pa->pa_nr = 0;
return -ENOMEM;
}
- pa->pa_pfn = pa->pa_iova_pfn + pa->pa_nr;
+ pa->pa_pfn = (unsigned long *)&pa->pa_iova[pa->pa_nr];

- pa->pa_iova_pfn[0] = pa->pa_iova >> PAGE_SHIFT;
+ pa->pa_iova[0] = iova;
pa->pa_pfn[0] = -1ULL;
for (i = 1; i < pa->pa_nr; i++) {
- pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;
+ pa->pa_iova[i] = pa->pa_iova[i - 1] + PAGE_SIZE;
pa->pa_pfn[i] = -1ULL;
}

@@ -91,30 +86,30 @@ static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
}

/*
- * pfn_array_unpin() - Unpin user pages in memory
- * @pa: pfn_array on which to perform the operation
+ * page_array_unpin() - Unpin user pages in memory
+ * @pa: page_array on which to perform the operation
* @vdev: the vfio device to perform the operation
* @pa_nr: number of user pages to unpin
*
* Only unpin if any pages were pinned to begin with, i.e. pa_nr > 0,
* otherwise only clear pa->pa_nr
*/
-static void pfn_array_unpin(struct pfn_array *pa,
- struct vfio_device *vdev, int pa_nr)
+static void page_array_unpin(struct page_array *pa,
+ struct vfio_device *vdev, int pa_nr)
{
int unpinned = 0, npage = 1;

while (unpinned < pa_nr) {
- unsigned long *first = &pa->pa_iova_pfn[unpinned];
- unsigned long *last = &first[npage];
+ dma_addr_t *first = &pa->pa_iova[unpinned];
+ dma_addr_t *last = &first[npage];

if (unpinned + npage < pa_nr &&
- *first + npage == *last) {
+ *first + npage * PAGE_SIZE == *last) {
npage++;
continue;
}

- vfio_unpin_pages(vdev, *first << PAGE_SHIFT, npage);
+ vfio_unpin_pages(vdev, *first, npage);
unpinned += npage;
npage = 1;
}
@@ -123,30 +118,30 @@ static void pfn_array_unpin(struct pfn_array *pa,
}

/*
- * pfn_array_pin() - Pin user pages in memory
- * @pa: pfn_array on which to perform the operation
+ * page_array_pin() - Pin user pages in memory
+ * @pa: page_array on which to perform the operation
* @mdev: the mediated device to perform pin operations
*
* Returns number of pages pinned upon success.
* If the pin request partially succeeds, or fails completely,
* all pages are left unpinned and a negative error value is returned.
*/
-static int pfn_array_pin(struct pfn_array *pa, struct vfio_device *vdev)
+static int page_array_pin(struct page_array *pa, struct vfio_device *vdev)
{
int pinned = 0, npage = 1;
int ret = 0;

while (pinned < pa->pa_nr) {
- unsigned long *first = &pa->pa_iova_pfn[pinned];
- unsigned long *last = &first[npage];
+ dma_addr_t *first = &pa->pa_iova[pinned];
+ dma_addr_t *last = &first[npage];

if (pinned + npage < pa->pa_nr &&
- *first + npage == *last) {
+ *first + npage * PAGE_SIZE == *last) {
npage++;
continue;
}

- ret = vfio_pin_pages(vdev, *first << PAGE_SHIFT, npage,
+ ret = vfio_pin_pages(vdev, *first, npage,
IOMMU_READ | IOMMU_WRITE,
&pa->pa_pfn[pinned]);
if (ret < 0) {
@@ -163,32 +158,30 @@ static int pfn_array_pin(struct pfn_array *pa, struct vfio_device *vdev)
return ret;

err_out:
- pfn_array_unpin(pa, vdev, pinned);
+ page_array_unpin(pa, vdev, pinned);
return ret;
}

/* Unpin the pages before releasing the memory. */
-static void pfn_array_unpin_free(struct pfn_array *pa, struct vfio_device *vdev)
+static void page_array_unpin_free(struct page_array *pa, struct vfio_device *vdev)
{
- pfn_array_unpin(pa, vdev, pa->pa_nr);
- kfree(pa->pa_iova_pfn);
+ page_array_unpin(pa, vdev, pa->pa_nr);
+ kfree(pa->pa_iova);
}

-static bool pfn_array_iova_pinned(struct pfn_array *pa, unsigned long iova)
+static bool page_array_iova_pinned(struct page_array *pa, unsigned long iova)
{
- unsigned long iova_pfn = iova >> PAGE_SHIFT;
int i;

for (i = 0; i < pa->pa_nr; i++)
- if (pa->pa_iova_pfn[i] == iova_pfn)
+ if (pa->pa_iova[i] == iova)
return true;

return false;
}
-/* Create the list of IDAL words for a pfn_array. */
-static inline void pfn_array_idal_create_words(
- struct pfn_array *pa,
- unsigned long *idaws)
+/* Create the list of IDAL words for a page_array. */
+static inline void page_array_idal_create_words(struct page_array *pa,
+ unsigned long *idaws)
{
int i;

@@ -204,7 +197,7 @@ static inline void pfn_array_idal_create_words(
idaws[i] = pa->pa_pfn[i] << PAGE_SHIFT;

/* Adjust the first IDAW, since it may not start on a page boundary */
- idaws[0] += pa->pa_iova & (PAGE_SIZE - 1);
+ idaws[0] += pa->pa_iova[0] & (PAGE_SIZE - 1);
}

static void convert_ccw0_to_ccw1(struct ccw1 *source, unsigned long len)
@@ -236,18 +229,18 @@ static void convert_ccw0_to_ccw1(struct ccw1 *source, unsigned long len)
static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,
unsigned long n)
{
- struct pfn_array pa = {0};
+ struct page_array pa = {0};
u64 from;
int i, ret;
unsigned long l, m;

- ret = pfn_array_alloc(&pa, iova, n);
+ ret = page_array_alloc(&pa, iova, n);
if (ret < 0)
return ret;

- ret = pfn_array_pin(&pa, vdev);
+ ret = page_array_pin(&pa, vdev);
if (ret < 0) {
- pfn_array_unpin_free(&pa, vdev);
+ page_array_unpin_free(&pa, vdev);
return ret;
}

@@ -268,7 +261,7 @@ static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,
break;
}

- pfn_array_unpin_free(&pa, vdev);
+ page_array_unpin_free(&pa, vdev);

return l;
}
@@ -371,7 +364,7 @@ static struct ccwchain *ccwchain_alloc(struct channel_program *cp, int len)
chain->ch_ccw = (struct ccw1 *)data;

data = (u8 *)(chain->ch_ccw) + sizeof(*chain->ch_ccw) * len;
- chain->ch_pa = (struct pfn_array *)data;
+ chain->ch_pa = (struct page_array *)data;

chain->ch_len = len;

@@ -555,7 +548,7 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
struct vfio_device *vdev =
&container_of(cp, struct vfio_ccw_private, cp)->vdev;
struct ccw1 *ccw;
- struct pfn_array *pa;
+ struct page_array *pa;
u64 iova;
unsigned long *idaws;
int ret;
@@ -589,13 +582,13 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
}

/*
- * Allocate an array of pfn's for pages to pin/translate.
+ * Allocate an array of pages to pin/translate.
* The number of pages is actually the count of the idaws
* required for the data transfer, since we only only support
* 4K IDAWs today.
*/
pa = chain->ch_pa + idx;
- ret = pfn_array_alloc(pa, iova, bytes);
+ ret = page_array_alloc(pa, iova, bytes);
if (ret < 0)
goto out_free_idaws;

@@ -606,21 +599,21 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
goto out_unpin;

/*
- * Copy guest IDAWs into pfn_array, in case the memory they
+ * Copy guest IDAWs into page_array, in case the memory they
* occupy is not contiguous.
*/
for (i = 0; i < idaw_nr; i++)
- pa->pa_iova_pfn[i] = idaws[i] >> PAGE_SHIFT;
+ pa->pa_iova[i] = idaws[i];
} else {
/*
- * No action is required here; the iova addresses in pfn_array
- * were initialized sequentially in pfn_array_alloc() beginning
+ * No action is required here; the iova addresses in page_array
+ * were initialized sequentially in page_array_alloc() beginning
* with the contents of ccw->cda.
*/
}

if (ccw_does_data_transfer(ccw)) {
- ret = pfn_array_pin(pa, vdev);
+ ret = page_array_pin(pa, vdev);
if (ret < 0)
goto out_unpin;
} else {
@@ -630,13 +623,13 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
ccw->cda = (__u32) virt_to_phys(idaws);
ccw->flags |= CCW_FLAG_IDA;

- /* Populate the IDAL with pinned/translated addresses from pfn */
- pfn_array_idal_create_words(pa, idaws);
+ /* Populate the IDAL with pinned/translated addresses from page */
+ page_array_idal_create_words(pa, idaws);

return 0;

out_unpin:
- pfn_array_unpin_free(pa, vdev);
+ page_array_unpin_free(pa, vdev);
out_free_idaws:
kfree(idaws);
out_init:
@@ -742,7 +735,7 @@ void cp_free(struct channel_program *cp)
cp->initialized = false;
list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next) {
for (i = 0; i < chain->ch_len; i++) {
- pfn_array_unpin_free(chain->ch_pa + i, vdev);
+ page_array_unpin_free(chain->ch_pa + i, vdev);
ccwchain_cda_free(chain, i);
}
ccwchain_free(chain);
@@ -918,7 +911,7 @@ bool cp_iova_pinned(struct channel_program *cp, u64 iova)

list_for_each_entry(chain, &cp->ccwchain_list, next) {
for (i = 0; i < chain->ch_len; i++)
- if (pfn_array_iova_pinned(chain->ch_pa + i, iova))
+ if (page_array_iova_pinned(chain->ch_pa + i, iova))
return true;
}

--
2.17.1

2022-07-06 06:57:58

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 1/9] vfio: Make vfio_unpin_pages() return void

> +void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> + int npage)
> {
> struct vfio_container *container;
> struct vfio_iommu_driver *driver;
> - int ret;
>
> - if (!user_pfn || !npage || !vfio_assert_device_open(device))
> - return -EINVAL;
> + if (WARN_ON_ONCE(!user_pfn || !npage || !vfio_assert_device_open(device)))

This adds an overly long line. Note that I think in general it is
better to have an individual WARN_ON per condition anyway, as that
allows to directly pinpoint what went wrong when it triggers.

> + if (WARN_ON_ONCE(unlikely(!driver || !driver->ops->unpin_pages)))
> + return;

I'd just skip this check an let it crash. If someone calls unpin
on something totally random that wasn't even pinned we don't need to
handle that gracefully.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2022-07-06 06:58:45

by Christoph Hellwig

[permalink] [raw]

2022-07-06 07:14:03

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 4/9] vfio: Pass in starting IOVA to vfio_pin/unpin_pages API

> - vfio_unpin_pages(&q->matrix_mdev->vdev, &q->saved_pfn, 1);
> + vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_pfn << PAGE_SHIFT, 1);

Overly long line here.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2022-07-06 16:03:27

by Nicolin Chen

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 4/9] vfio: Pass in starting IOVA to vfio_pin/unpin_pages API

On Tue, Jul 05, 2022 at 11:56:25PM -0700, Christoph Hellwig wrote:
> > - vfio_unpin_pages(&q->matrix_mdev->vdev, &q->saved_pfn, 1);
> > + vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_pfn << PAGE_SHIFT, 1);
>
> Overly long line here.

The following PATCH-5 drops the "<< PAGE_SHIFT", addressing this.

To not wrapping and unwrapping a line back and forth, I think we
are fine here?

> Reviewed-by: Christoph Hellwig <[email protected]>

Will add to v3. Thanks for the review!

2022-07-06 16:04:53

by Nicolin Chen

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 1/9] vfio: Make vfio_unpin_pages() return void

On Tue, Jul 05, 2022 at 11:54:50PM -0700, Christoph Hellwig wrote:
> > +void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> > + int npage)
> > {
> > struct vfio_container *container;
> > struct vfio_iommu_driver *driver;
> > - int ret;
> >
> > - if (!user_pfn || !npage || !vfio_assert_device_open(device))
> > - return -EINVAL;
> > + if (WARN_ON_ONCE(!user_pfn || !npage || !vfio_assert_device_open(device)))
>
> This adds an overly long line. Note that I think in general it is
> better to have an individual WARN_ON per condition anyway, as that
> allows to directly pinpoint what went wrong when it triggers.

Following patches are touching this line too. And it'll be shrunk
to a shorter line eventually by the end of PATCH-9.

Yet, I can separate them as you pointed out.

> > + if (WARN_ON_ONCE(unlikely(!driver || !driver->ops->unpin_pages)))
> > + return;
>
> I'd just skip this check an let it crash. If someone calls unpin
> on something totally random that wasn't even pinned we don't need to
> handle that gracefully.

Makes sense. I can drop that in next version.

> Reviewed-by: Christoph Hellwig <[email protected]>

Will add to v3. Thanks for the review!

2022-07-06 16:52:33

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 2/9] vfio/ap: Pass in physical address of ind to ap_aqic()

On Tue, Jul 05, 2022 at 11:27:52PM -0700, Nicolin Chen wrote:
> The ap_aqic() is called by vfio_ap_irq_enable() where it passes in a
> virt value that's casted from a physical address "h_nib". Inside the
> ap_aqic(), it does virt_to_phys() again.
>
> Since ap_aqic() needs a physical address, let's just pass in a pa of
> ind directly. So change the "ind" to "pa_ind".
>
> Reviewed-by: Harald Freudenberger <[email protected]>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> arch/s390/include/asm/ap.h | 6 +++---
> drivers/s390/crypto/ap_queue.c | 2 +-
> drivers/s390/crypto/vfio_ap_ops.c | 7 ++++---
> 3 files changed, 8 insertions(+), 7 deletions(-)

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2022-07-06 17:10:23

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 1/9] vfio: Make vfio_unpin_pages() return void

On Tue, Jul 05, 2022 at 11:27:51PM -0700, Nicolin Chen wrote:
> There's only one caller that checks its return value with a WARN_ON_ONCE,
> while all other callers do not check return value at all. So simplify the
> API to return void by embedding similar WARN_ON_ONCEs.
>
> Suggested-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> .../driver-api/vfio-mediated-device.rst | 2 +-
> drivers/gpu/drm/i915/gvt/kvmgt.c | 5 +---
> drivers/vfio/vfio.c | 24 ++++++++-----------
> drivers/vfio/vfio.h | 2 +-
> drivers/vfio/vfio_iommu_type1.c | 16 ++++++-------
> include/linux/vfio.h | 4 ++--
> 6 files changed, 23 insertions(+), 30 deletions(-)

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2022-07-06 17:23:03

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 3/9] vfio/ccw: Only pass in contiguous pages

On Tue, Jul 05, 2022 at 11:27:53PM -0700, Nicolin Chen wrote:
> This driver is the only caller of vfio_pin/unpin_pages that might pass
> in a non-contiguous PFN list, but in many cases it has a contiguous PFN
> list to process. So letting VFIO API handle a non-contiguous PFN list
> is actually counterproductive.
>
> Add a pair of simple loops to pass in contiguous PFNs only, to have an
> efficient implementation in VFIO.
>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> drivers/s390/cio/vfio_ccw_cp.c | 70 +++++++++++++++++++++++++++-------
> 1 file changed, 56 insertions(+), 14 deletions(-)

I think this is fine as-is for this series, but someone who knows and
can test ccw should go in and fix things so that pfn_array_alloc()
doesn't exist. Allocating memory and filling it with consecutive
integers is kind of silly given we can just call vfio_pin_pages() with
pa_nr directly.

pa->pa_iova_pfn[0] = pa->pa_iova >> PAGE_SHIFT;
pa->pa_pfn[0] = -1ULL;
for (i = 1; i < pa->pa_nr; i++) {
pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;

It looks like only the 'ccw_is_idal' flow can actually create
non-continuities. Also the loop in copy_from_iova() should ideally be
using the much faster 'rw' interface, and not a pin/unpin cycle just
to memcpy.

If I guess right these changes would significantly speed this driver
up.

Anyhow,

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2022-07-06 18:01:38

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 4/9] vfio: Pass in starting IOVA to vfio_pin/unpin_pages API

On Tue, Jul 05, 2022 at 11:27:54PM -0700, Nicolin Chen wrote:

> These functions call back into the back-end IOMMU module by using the pin_pages
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 8c67c9aba82d..ea6041fa48ac 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -231,16 +231,8 @@ static void intel_gvt_cleanup_vgpu_type_groups(struct intel_gvt *gvt)
> static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> unsigned long size)
> {
> - int total_pages;
> - int npage;
> -
> - total_pages = roundup(size, PAGE_SIZE) / PAGE_SIZE;
> -
> - for (npage = 0; npage < total_pages; npage++) {
> - unsigned long cur_gfn = gfn + npage;
> -
> - vfio_unpin_pages(&vgpu->vfio_device, &cur_gfn, 1);
> - }
> + vfio_unpin_pages(&vgpu->vfio_device, gfn << PAGE_SHIFT,
> + roundup(size, PAGE_SIZE) / PAGE_SIZE);

These maths are DIV_ROUND_UP()

Otherwise,

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2022-07-06 18:02:20

by Kirti Wankhede

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 1/9] vfio: Make vfio_unpin_pages() return void


Reviewed-by: Kirti Wankhede <[email protected]>

On 7/6/2022 11:57 AM, Nicolin Chen wrote:
> There's only one caller that checks its return value with a WARN_ON_ONCE,
> while all other callers do not check return value at all. So simplify the
> API to return void by embedding similar WARN_ON_ONCEs.
>
> Suggested-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> .../driver-api/vfio-mediated-device.rst | 2 +-
> drivers/gpu/drm/i915/gvt/kvmgt.c | 5 +---
> drivers/vfio/vfio.c | 24 ++++++++-----------
> drivers/vfio/vfio.h | 2 +-
> drivers/vfio/vfio_iommu_type1.c | 16 ++++++-------
> include/linux/vfio.h | 4 ++--
> 6 files changed, 23 insertions(+), 30 deletions(-)
>
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
> index 1c57815619fd..b0fdf76b339a 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -265,7 +265,7 @@ driver::
> int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
> int npage, int prot, unsigned long *phys_pfn);
>
> - int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> + void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> int npage);
>
> These functions call back into the back-end IOMMU module by using the pin_pages
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index e2f6c56ab342..8c67c9aba82d 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -231,18 +231,15 @@ static void intel_gvt_cleanup_vgpu_type_groups(struct intel_gvt *gvt)
> static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> unsigned long size)
> {
> - struct drm_i915_private *i915 = vgpu->gvt->gt->i915;
> int total_pages;
> int npage;
> - int ret;
>
> total_pages = roundup(size, PAGE_SIZE) / PAGE_SIZE;
>
> for (npage = 0; npage < total_pages; npage++) {
> unsigned long cur_gfn = gfn + npage;
>
> - ret = vfio_unpin_pages(&vgpu->vfio_device, &cur_gfn, 1);
> - drm_WARN_ON(&i915->drm, ret != 1);
> + vfio_unpin_pages(&vgpu->vfio_device, &cur_gfn, 1);
> }
> }
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 61e71c1154be..01f45ec70a3d 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1959,31 +1959,27 @@ EXPORT_SYMBOL(vfio_pin_pages);
> * PFNs should not be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
> * @npage [in] : count of elements in user_pfn array. This count should not
> * be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
> - * Return error or number of pages unpinned.
> */
> -int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> - int npage)
> +void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> + int npage)
> {
> struct vfio_container *container;
> struct vfio_iommu_driver *driver;
> - int ret;
>
> - if (!user_pfn || !npage || !vfio_assert_device_open(device))
> - return -EINVAL;
> + if (WARN_ON_ONCE(!user_pfn || !npage || !vfio_assert_device_open(device)))
> + return;
>
> - if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
> - return -E2BIG;
> + if (WARN_ON_ONCE(npage > VFIO_PIN_PAGES_MAX_ENTRIES))
> + return;
>
> /* group->container cannot change while a vfio device is open */
> container = device->group->container;
> driver = container->iommu_driver;
> - if (likely(driver && driver->ops->unpin_pages))
> - ret = driver->ops->unpin_pages(container->iommu_data, user_pfn,
> - npage);
> - else
> - ret = -ENOTTY;
>
> - return ret;
> + if (WARN_ON_ONCE(unlikely(!driver || !driver->ops->unpin_pages)))
> + return;
> +
> + driver->ops->unpin_pages(container->iommu_data, user_pfn, npage);
> }
> EXPORT_SYMBOL(vfio_unpin_pages);
>
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index a67130221151..bef4edf58138 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -53,7 +53,7 @@ struct vfio_iommu_driver_ops {
> unsigned long *user_pfn,
> int npage, int prot,
> unsigned long *phys_pfn);
> - int (*unpin_pages)(void *iommu_data,
> + void (*unpin_pages)(void *iommu_data,
> unsigned long *user_pfn, int npage);
> int (*register_notifier)(void *iommu_data,
> unsigned long *events,
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index c13b9290e357..08613edaf722 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -948,20 +948,19 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> return ret;
> }
>
> -static int vfio_iommu_type1_unpin_pages(void *iommu_data,
> - unsigned long *user_pfn,
> - int npage)
> +static void vfio_iommu_type1_unpin_pages(void *iommu_data,
> + unsigned long *user_pfn, int npage)
> {
> struct vfio_iommu *iommu = iommu_data;
> bool do_accounting;
> int i;
>
> - if (!iommu || !user_pfn || npage <= 0)
> - return -EINVAL;
> + if (WARN_ON_ONCE(!iommu || !user_pfn || npage <= 0))
> + return;
>
> /* Supported for v2 version only */
> - if (!iommu->v2)
> - return -EACCES;
> + if (WARN_ON_ONCE(!iommu->v2))
> + return;
>
> mutex_lock(&iommu->lock);
>
> @@ -979,7 +978,8 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
> }
>
> mutex_unlock(&iommu->lock);
> - return i > 0 ? i : -EINVAL;
> +
> + WARN_ON_ONCE(i != npage);
> }
>
> static long vfio_sync_unpin(struct vfio_dma *dma, struct vfio_domain *domain,
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 49580fa2073a..d0844ecdc961 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -149,8 +149,8 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
>
> int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
> int npage, int prot, unsigned long *phys_pfn);
> -int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> - int npage);
> +void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> + int npage);
> int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova,
> void *data, size_t len, bool write);
>

2022-07-06 18:02:26

by Kirti Wankhede

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 9/9] vfio: Replace phys_pfn with pages for vfio_pin_pages()


Reviewed-by: Kirti Wankhede <[email protected]>

On 7/6/2022 11:57 AM, Nicolin Chen wrote:
> Most of the callers of vfio_pin_pages() want "struct page *" and the
> low-level mm code to pin pages returns a list of "struct page *" too.
> So there's no gain in converting "struct page *" to PFN in between.
>
> Replace the output parameter "phys_pfn" list with a "pages" list, to
> simplify callers. This also allows us to replace the vfio_iommu_type1
> implementation with a more efficient one.
>
> For now, also update vfio_iommu_type1 to fit this new parameter too.
>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> .../driver-api/vfio-mediated-device.rst | 2 +-
> drivers/gpu/drm/i915/gvt/kvmgt.c | 19 ++++++-------------
> drivers/s390/cio/vfio_ccw_cp.c | 19 +++++++++----------
> drivers/s390/crypto/vfio_ap_ops.c | 6 +++---
> drivers/vfio/vfio.c | 8 ++++----
> drivers/vfio/vfio.h | 2 +-
> drivers/vfio/vfio_iommu_type1.c | 19 +++++++++++--------
> include/linux/vfio.h | 2 +-
> 8 files changed, 36 insertions(+), 41 deletions(-)
>
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
> index ea32a0f13ddb..ba5fefcdae1a 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -263,7 +263,7 @@ The following APIs are provided for translating user pfn to host pfn in a VFIO
> driver::
>
> int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> - int npage, int prot, unsigned long *phys_pfn);
> + int npage, int prot, struct page **pages);
>
> void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
> int npage);
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index ea6041fa48ac..3a49471dcc16 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -239,7 +239,7 @@ static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> unsigned long size, struct page **page)
> {
> - unsigned long base_pfn = 0;
> + struct page *base_page = NULL;
> int total_pages;
> int npage;
> int ret;
> @@ -251,26 +251,19 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> */
> for (npage = 0; npage < total_pages; npage++) {
> dma_addr_t cur_iova = (gfn + npage) << PAGE_SHIFT;
> - unsigned long pfn;
> + struct page *cur_page;
>
> ret = vfio_pin_pages(&vgpu->vfio_device, cur_iova, 1,
> - IOMMU_READ | IOMMU_WRITE, &pfn);
> + IOMMU_READ | IOMMU_WRITE, &cur_page);
> if (ret != 1) {
> gvt_vgpu_err("vfio_pin_pages failed for iova %pad, ret %d\n",
> &cur_iova, ret);
> goto err;
> }
>
> - if (!pfn_valid(pfn)) {
> - gvt_vgpu_err("pfn 0x%lx is not mem backed\n", pfn);
> - npage++;
> - ret = -EFAULT;
> - goto err;
> - }
> -
> if (npage == 0)
> - base_pfn = pfn;
> - else if (base_pfn + npage != pfn) {
> + base_page = cur_page;
> + else if (base_page + npage != cur_page) {
> gvt_vgpu_err("The pages are not continuous\n");
> ret = -EINVAL;
> npage++;
> @@ -278,7 +271,7 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> }
> }
>
> - *page = pfn_to_page(base_pfn);
> + *page = base_page;
> return 0;
> err:
> gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index cd4ec4f6d6ff..8963f452f963 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -22,8 +22,8 @@
> struct page_array {
> /* Array that stores pages need to pin. */
> dma_addr_t *pa_iova;
> - /* Array that receives PFNs of the pages pinned. */
> - unsigned long *pa_pfn;
> + /* Array that receives the pinned pages. */
> + struct page **pa_page;
> /* Number of pages pinned from @pa_iova. */
> int pa_nr;
> };
> @@ -68,19 +68,19 @@ static int page_array_alloc(struct page_array *pa, u64 iova, unsigned int len)
> return -EINVAL;
>
> pa->pa_iova = kcalloc(pa->pa_nr,
> - sizeof(*pa->pa_iova) + sizeof(*pa->pa_pfn),
> + sizeof(*pa->pa_iova) + sizeof(*pa->pa_page),
> GFP_KERNEL);
> if (unlikely(!pa->pa_iova)) {
> pa->pa_nr = 0;
> return -ENOMEM;
> }
> - pa->pa_pfn = (unsigned long *)&pa->pa_iova[pa->pa_nr];
> + pa->pa_page = (struct page **)&pa->pa_iova[pa->pa_nr];
>
> pa->pa_iova[0] = iova;
> - pa->pa_pfn[0] = -1ULL;
> + pa->pa_page[0] = NULL;
> for (i = 1; i < pa->pa_nr; i++) {
> pa->pa_iova[i] = pa->pa_iova[i - 1] + PAGE_SIZE;
> - pa->pa_pfn[i] = -1ULL;
> + pa->pa_page[i] = NULL;
> }
>
> return 0;
> @@ -144,7 +144,7 @@ static int page_array_pin(struct page_array *pa, struct vfio_device *vdev)
>
> ret = vfio_pin_pages(vdev, *first, npage,
> IOMMU_READ | IOMMU_WRITE,
> - &pa->pa_pfn[pinned]);
> + &pa->pa_page[pinned]);
> if (ret < 0) {
> goto err_out;
> } else if (ret > 0 && ret != npage) {
> @@ -195,7 +195,7 @@ static inline void page_array_idal_create_words(struct page_array *pa,
> */
>
> for (i = 0; i < pa->pa_nr; i++)
> - idaws[i] = pa->pa_pfn[i] << PAGE_SHIFT;
> + idaws[i] = page_to_phys(pa->pa_page[i]);
>
> /* Adjust the first IDAW, since it may not start on a page boundary */
> idaws[0] += pa->pa_iova[0] & (PAGE_SIZE - 1);
> @@ -246,8 +246,7 @@ static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,
>
> l = n;
> for (i = 0; i < pa.pa_nr; i++) {
> - struct page *page = pfn_to_page(pa.pa_pfn[i]);
> - void *from = kmap_local_page(page);
> + void *from = kmap_local_page(pa.pa_page[i]);
>
> m = PAGE_SIZE;
> if (i == 0) {
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 6945e0ddc08c..b0972ca0dfa3 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -234,9 +234,9 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
> struct ap_qirq_ctrl aqic_gisa = {};
> struct ap_queue_status status = {};
> struct kvm_s390_gisa *gisa;
> + struct page *h_page;
> int nisc;
> struct kvm *kvm;
> - unsigned long h_pfn;
> phys_addr_t h_nib;
> dma_addr_t nib;
> int ret;
> @@ -251,7 +251,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
> }
>
> ret = vfio_pin_pages(&q->matrix_mdev->vdev, nib, 1,
> - IOMMU_READ | IOMMU_WRITE, &h_pfn);
> + IOMMU_READ | IOMMU_WRITE, &h_page);
> switch (ret) {
> case 1:
> break;
> @@ -267,7 +267,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
> kvm = q->matrix_mdev->kvm;
> gisa = kvm->arch.gisa_int.origin;
>
> - h_nib = (h_pfn << PAGE_SHIFT) | (nib & ~PAGE_MASK);
> + h_nib = page_to_phys(h_page) | (nib & ~PAGE_MASK);
> aqic_gisa.gisc = isc;
>
> nisc = kvm_s390_gisc_register(kvm, isc);
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index b95ec2d78966..96b758e06c4a 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1917,18 +1917,18 @@ EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
> * @npage [in] : count of pages to be pinned. This count should not
> * be greater VFIO_PIN_PAGES_MAX_ENTRIES.
> * @prot [in] : protection flags
> - * @phys_pfn[out]: array of host PFNs
> + * @pages[out] : array of host pages
> * Return error or number of pages pinned.
> */
> int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> - int npage, int prot, unsigned long *phys_pfn)
> + int npage, int prot, struct page **pages)
> {
> struct vfio_container *container;
> struct vfio_group *group = device->group;
> struct vfio_iommu_driver *driver;
> int ret;
>
> - if (!phys_pfn || !npage || !vfio_assert_device_open(device))
> + if (!pages || !npage || !vfio_assert_device_open(device))
> return -EINVAL;
>
> if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
> @@ -1943,7 +1943,7 @@ int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> if (likely(driver && driver->ops->pin_pages))
> ret = driver->ops->pin_pages(container->iommu_data,
> group->iommu_group, iova,
> - npage, prot, phys_pfn);
> + npage, prot, pages);
> else
> ret = -ENOTTY;
>
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index dbcd0e8c031b..dbfad8e20581 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -52,7 +52,7 @@ struct vfio_iommu_driver_ops {
> struct iommu_group *group,
> dma_addr_t user_iova,
> int npage, int prot,
> - unsigned long *phys_pfn);
> + struct page **pages);
> void (*unpin_pages)(void *iommu_data,
> dma_addr_t user_iova, int npage);
> int (*register_notifier)(void *iommu_data,
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index f10d0c4b1f26..de342d82b154 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -830,7 +830,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> struct iommu_group *iommu_group,
> dma_addr_t user_iova,
> int npage, int prot,
> - unsigned long *phys_pfn)
> + struct page **pages)
> {
> struct vfio_iommu *iommu = iommu_data;
> struct vfio_iommu_group *group;
> @@ -840,7 +840,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> bool do_accounting;
> dma_addr_t iova;
>
> - if (!iommu || !phys_pfn)
> + if (!iommu || !pages)
> return -EINVAL;
>
> /* Supported for v2 version only */
> @@ -879,6 +879,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> do_accounting = list_empty(&iommu->domain_list);
>
> for (i = 0; i < npage; i++) {
> + unsigned long phys_pfn;
> struct vfio_pfn *vpfn;
>
> iova = user_iova + PAGE_SIZE * i;
> @@ -895,23 +896,25 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
>
> vpfn = vfio_iova_get_vfio_pfn(dma, iova);
> if (vpfn) {
> - phys_pfn[i] = vpfn->pfn;
> + pages[i] = pfn_to_page(vpfn->pfn);
> continue;
> }
>
> remote_vaddr = dma->vaddr + (iova - dma->iova);
> - ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn[i],
> + ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn,
> do_accounting);
> if (ret)
> goto pin_unwind;
>
> - ret = vfio_add_to_pfn_list(dma, iova, phys_pfn[i]);
> + ret = vfio_add_to_pfn_list(dma, iova, phys_pfn);
> if (ret) {
> - if (put_pfn(phys_pfn[i], dma->prot) && do_accounting)
> + if (put_pfn(phys_pfn, dma->prot) && do_accounting)
> vfio_lock_acct(dma, -1, true);
> goto pin_unwind;
> }
>
> + pages[i] = pfn_to_page(phys_pfn);
> +
> if (iommu->dirty_page_tracking) {
> unsigned long pgshift = __ffs(iommu->pgsize_bitmap);
>
> @@ -934,14 +937,14 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> goto pin_done;
>
> pin_unwind:
> - phys_pfn[i] = 0;
> + pages[i] = NULL;
> for (j = 0; j < i; j++) {
> dma_addr_t iova;
>
> iova = user_iova + PAGE_SIZE * j;
> dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
> vfio_unpin_page_external(dma, iova, do_accounting);
> - phys_pfn[j] = 0;
> + pages[j] = NULL;
> }
> pin_done:
> mutex_unlock(&iommu->lock);
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 9108de34a79b..c76a2c492bbd 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -148,7 +148,7 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
> #define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long))
>
> int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> - int npage, int prot, unsigned long *phys_pfn);
> + int npage, int prot, struct page **pages);
> void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage);
> int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova,
> void *data, size_t len, bool write);

2022-07-06 18:08:44

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 6/9] vfio/ccw: Change pa_pfn list to pa_iova list

On Tue, Jul 05, 2022 at 11:27:56PM -0700, Nicolin Chen wrote:
> The vfio_ccw_cp code maintains both iova and its PFN list because the
> vfio_pin/unpin_pages API wanted pfn list. Since vfio_pin/unpin_pages()
> now accept "iova", change to maintain only pa_iova list and rename all
> "pfn_array" strings to "page_array", so as to simplify the code.
>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> drivers/s390/cio/vfio_ccw_cp.c | 135 ++++++++++++++++-----------------
> 1 file changed, 64 insertions(+), 71 deletions(-)

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2022-07-06 18:09:38

by Nicolin Chen

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 4/9] vfio: Pass in starting IOVA to vfio_pin/unpin_pages API

On Wed, Jul 06, 2022 at 02:49:23PM -0300, Jason Gunthorpe wrote:
> On Tue, Jul 05, 2022 at 11:27:54PM -0700, Nicolin Chen wrote:
>
> > These functions call back into the back-end IOMMU module by using the pin_pages
> > diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > index 8c67c9aba82d..ea6041fa48ac 100644
> > --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> > +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > @@ -231,16 +231,8 @@ static void intel_gvt_cleanup_vgpu_type_groups(struct intel_gvt *gvt)
> > static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> > unsigned long size)
> > {
> > - int total_pages;
> > - int npage;
> > -
> > - total_pages = roundup(size, PAGE_SIZE) / PAGE_SIZE;
> > -
> > - for (npage = 0; npage < total_pages; npage++) {
> > - unsigned long cur_gfn = gfn + npage;
> > -
> > - vfio_unpin_pages(&vgpu->vfio_device, &cur_gfn, 1);
> > - }
> > + vfio_unpin_pages(&vgpu->vfio_device, gfn << PAGE_SHIFT,
> > + roundup(size, PAGE_SIZE) / PAGE_SIZE);
>
> These maths are DIV_ROUND_UP()

Will change in v3.

> Reviewed-by: Jason Gunthorpe <[email protected]>

Thanks!

2022-07-06 18:16:10

by Kirti Wankhede

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 4/9] vfio: Pass in starting IOVA to vfio_pin/unpin_pages API


Reviewed by: Kirti Wankhede <[email protected]>


On 7/6/2022 11:57 AM, Nicolin Chen wrote:
> The vfio_pin/unpin_pages() so far accepted arrays of PFNs of user IOVA.
> Among all three callers, there was only one caller possibly passing in
> a non-contiguous PFN list, which is now ensured to have contiguous PFN
> inputs too.
>
> Pass in the starting address with "iova" alone to simplify things, so
> callers no longer need to maintain a PFN list or to pin/unpin one page
> at a time. This also allows VFIO to use more efficient implementations
> of pin/unpin_pages.
>
> For now, also update vfio_iommu_type1 to fit this new parameter too,
> while keeping its input intact (being user_iova) since we don't want
> to spend too much effort swapping its parameters and local variables
> at that level.
>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> .../driver-api/vfio-mediated-device.rst | 4 +--
> drivers/gpu/drm/i915/gvt/kvmgt.c | 24 ++++++-----------
> drivers/s390/cio/vfio_ccw_cp.c | 4 +--
> drivers/s390/crypto/vfio_ap_ops.c | 9 +++----
> drivers/vfio/vfio.c | 27 +++++++++----------
> drivers/vfio/vfio.h | 4 +--
> drivers/vfio/vfio_iommu_type1.c | 17 ++++++------
> include/linux/vfio.h | 5 ++--
> 8 files changed, 40 insertions(+), 54 deletions(-)
>
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
> index b0fdf76b339a..ea32a0f13ddb 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -262,10 +262,10 @@ Translation APIs for Mediated Devices
> The following APIs are provided for translating user pfn to host pfn in a VFIO
> driver::
>
> - int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
> + int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> int npage, int prot, unsigned long *phys_pfn);
>
> - void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> + void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
> int npage);
>
> These functions call back into the back-end IOMMU module by using the pin_pages
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 8c67c9aba82d..ea6041fa48ac 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -231,16 +231,8 @@ static void intel_gvt_cleanup_vgpu_type_groups(struct intel_gvt *gvt)
> static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> unsigned long size)
> {
> - int total_pages;
> - int npage;
> -
> - total_pages = roundup(size, PAGE_SIZE) / PAGE_SIZE;
> -
> - for (npage = 0; npage < total_pages; npage++) {
> - unsigned long cur_gfn = gfn + npage;
> -
> - vfio_unpin_pages(&vgpu->vfio_device, &cur_gfn, 1);
> - }
> + vfio_unpin_pages(&vgpu->vfio_device, gfn << PAGE_SHIFT,
> + roundup(size, PAGE_SIZE) / PAGE_SIZE);
> }
>
> /* Pin a normal or compound guest page for dma. */
> @@ -258,14 +250,14 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> * on stack to hold pfns.
> */
> for (npage = 0; npage < total_pages; npage++) {
> - unsigned long cur_gfn = gfn + npage;
> + dma_addr_t cur_iova = (gfn + npage) << PAGE_SHIFT;
> unsigned long pfn;
>
> - ret = vfio_pin_pages(&vgpu->vfio_device, &cur_gfn, 1,
> + ret = vfio_pin_pages(&vgpu->vfio_device, cur_iova, 1,
> IOMMU_READ | IOMMU_WRITE, &pfn);
> if (ret != 1) {
> - gvt_vgpu_err("vfio_pin_pages failed for gfn 0x%lx, ret %d\n",
> - cur_gfn, ret);
> + gvt_vgpu_err("vfio_pin_pages failed for iova %pad, ret %d\n",
> + &cur_iova, ret);
> goto err;
> }
>
> @@ -309,7 +301,7 @@ static int gvt_dma_map_page(struct intel_vgpu *vgpu, unsigned long gfn,
> if (dma_mapping_error(dev, *dma_addr)) {
> gvt_vgpu_err("DMA mapping failed for pfn 0x%lx, ret %d\n",
> page_to_pfn(page), ret);
> - gvt_unpin_guest_page(vgpu, gfn, size);
> + gvt_unpin_guest_page(vgpu, gfn << PAGE_SHIFT, size);
> return -ENOMEM;
> }
>
> @@ -322,7 +314,7 @@ static void gvt_dma_unmap_page(struct intel_vgpu *vgpu, unsigned long gfn,
> struct device *dev = vgpu->gvt->gt->i915->drm.dev;
>
> dma_unmap_page(dev, dma_addr, size, DMA_BIDIRECTIONAL);
> - gvt_unpin_guest_page(vgpu, gfn, size);
> + gvt_unpin_guest_page(vgpu, gfn << PAGE_SHIFT, size);
> }
>
> static struct gvt_dma *__gvt_cache_find_dma_addr(struct intel_vgpu *vgpu,
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index 3b94863ad24e..a739262f988d 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -114,7 +114,7 @@ static void pfn_array_unpin(struct pfn_array *pa,
> continue;
> }
>
> - vfio_unpin_pages(vdev, first, npage);
> + vfio_unpin_pages(vdev, *first << PAGE_SHIFT, npage);
> unpinned += npage;
> npage = 1;
> }
> @@ -146,7 +146,7 @@ static int pfn_array_pin(struct pfn_array *pa, struct vfio_device *vdev)
> continue;
> }
>
> - ret = vfio_pin_pages(vdev, first, npage,
> + ret = vfio_pin_pages(vdev, *first << PAGE_SHIFT, npage,
> IOMMU_READ | IOMMU_WRITE,
> &pa->pa_pfn[pinned]);
> if (ret < 0) {
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index bb869b28cebd..8a2018ab3cf0 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -124,7 +124,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
> q->saved_isc = VFIO_AP_ISC_INVALID;
> }
> if (q->saved_pfn && !WARN_ON(!q->matrix_mdev)) {
> - vfio_unpin_pages(&q->matrix_mdev->vdev, &q->saved_pfn, 1);
> + vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_pfn << PAGE_SHIFT, 1);
> q->saved_pfn = 0;
> }
> }
> @@ -258,7 +258,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
> return status;
> }
>
> - ret = vfio_pin_pages(&q->matrix_mdev->vdev, &g_pfn, 1,
> + ret = vfio_pin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1,
> IOMMU_READ | IOMMU_WRITE, &h_pfn);
> switch (ret) {
> case 1:
> @@ -301,7 +301,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
> break;
> case AP_RESPONSE_OTHERWISE_CHANGED:
> /* We could not modify IRQ setings: clear new configuration */
> - vfio_unpin_pages(&q->matrix_mdev->vdev, &g_pfn, 1);
> + vfio_unpin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1);
> kvm_s390_gisc_unregister(kvm, isc);
> break;
> default:
> @@ -1248,9 +1248,8 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>
> if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
> struct vfio_iommu_type1_dma_unmap *unmap = data;
> - unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
>
> - vfio_unpin_pages(&matrix_mdev->vdev, &g_pfn, 1);
> + vfio_unpin_pages(&matrix_mdev->vdev, unmap->iova, 1);
> return NOTIFY_OK;
> }
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 01f45ec70a3d..813b517c973e 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1910,17 +1910,17 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
> EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
>
> /*
> - * Pin a set of guest PFNs and return their associated host PFNs for local
> + * Pin contiguous guest pages and return their associated host pages for local
> * domain only.
> * @device [in] : device
> - * @user_pfn [in]: array of user/guest PFNs to be pinned.
> - * @npage [in] : count of elements in user_pfn array. This count should not
> + * @iova [in] : starting IOVA of user/guest pages to be pinned.
> + * @npage [in] : count of pages to be pinned. This count should not
> * be greater VFIO_PIN_PAGES_MAX_ENTRIES.
> * @prot [in] : protection flags
> * @phys_pfn[out]: array of host PFNs
> * Return error or number of pages pinned.
> */
> -int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
> +int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> int npage, int prot, unsigned long *phys_pfn)
> {
> struct vfio_container *container;
> @@ -1928,8 +1928,7 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
> struct vfio_iommu_driver *driver;
> int ret;
>
> - if (!user_pfn || !phys_pfn || !npage ||
> - !vfio_assert_device_open(device))
> + if (!phys_pfn || !npage || !vfio_assert_device_open(device))
> return -EINVAL;
>
> if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
> @@ -1943,7 +1942,7 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
> driver = container->iommu_driver;
> if (likely(driver && driver->ops->pin_pages))
> ret = driver->ops->pin_pages(container->iommu_data,
> - group->iommu_group, user_pfn,
> + group->iommu_group, iova,
> npage, prot, phys_pfn);
> else
> ret = -ENOTTY;
> @@ -1953,20 +1952,18 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
> EXPORT_SYMBOL(vfio_pin_pages);
>
> /*
> - * Unpin set of host PFNs for local domain only.
> + * Unpin contiguous host pages for local domain only.
> * @device [in] : device
> - * @user_pfn [in]: array of user/guest PFNs to be unpinned. Number of user/guest
> - * PFNs should not be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
> - * @npage [in] : count of elements in user_pfn array. This count should not
> + * @iova [in] : starting address of user/guest pages to be unpinned.
> + * @npage [in] : count of pages to be unpinned. This count should not
> * be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
> */
> -void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> - int npage)
> +void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage)
> {
> struct vfio_container *container;
> struct vfio_iommu_driver *driver;
>
> - if (WARN_ON_ONCE(!user_pfn || !npage || !vfio_assert_device_open(device)))
> + if (WARN_ON_ONCE(!npage || !vfio_assert_device_open(device)))
> return;
>
> if (WARN_ON_ONCE(npage > VFIO_PIN_PAGES_MAX_ENTRIES))
> @@ -1979,7 +1976,7 @@ void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> if (WARN_ON_ONCE(unlikely(!driver || !driver->ops->unpin_pages)))
> return;
>
> - driver->ops->unpin_pages(container->iommu_data, user_pfn, npage);
> + driver->ops->unpin_pages(container->iommu_data, iova, npage);
> }
> EXPORT_SYMBOL(vfio_unpin_pages);
>
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index bef4edf58138..dbcd0e8c031b 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -50,11 +50,11 @@ struct vfio_iommu_driver_ops {
> struct iommu_group *group);
> int (*pin_pages)(void *iommu_data,
> struct iommu_group *group,
> - unsigned long *user_pfn,
> + dma_addr_t user_iova,
> int npage, int prot,
> unsigned long *phys_pfn);
> void (*unpin_pages)(void *iommu_data,
> - unsigned long *user_pfn, int npage);
> + dma_addr_t user_iova, int npage);
> int (*register_notifier)(void *iommu_data,
> unsigned long *events,
> struct notifier_block *nb);
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 08613edaf722..f10d0c4b1f26 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -828,7 +828,7 @@ static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova,
>
> static int vfio_iommu_type1_pin_pages(void *iommu_data,
> struct iommu_group *iommu_group,
> - unsigned long *user_pfn,
> + dma_addr_t user_iova,
> int npage, int prot,
> unsigned long *phys_pfn)
> {
> @@ -840,7 +840,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> bool do_accounting;
> dma_addr_t iova;
>
> - if (!iommu || !user_pfn || !phys_pfn)
> + if (!iommu || !phys_pfn)
> return -EINVAL;
>
> /* Supported for v2 version only */
> @@ -856,7 +856,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> again:
> if (iommu->vaddr_invalid_count) {
> for (i = 0; i < npage; i++) {
> - iova = user_pfn[i] << PAGE_SHIFT;
> + iova = user_iova + PAGE_SIZE * i;
> ret = vfio_find_dma_valid(iommu, iova, PAGE_SIZE, &dma);
> if (ret < 0)
> goto pin_done;
> @@ -881,7 +881,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> for (i = 0; i < npage; i++) {
> struct vfio_pfn *vpfn;
>
> - iova = user_pfn[i] << PAGE_SHIFT;
> + iova = user_iova + PAGE_SIZE * i;
> dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
> if (!dma) {
> ret = -EINVAL;
> @@ -938,7 +938,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> for (j = 0; j < i; j++) {
> dma_addr_t iova;
>
> - iova = user_pfn[j] << PAGE_SHIFT;
> + iova = user_iova + PAGE_SIZE * j;
> dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
> vfio_unpin_page_external(dma, iova, do_accounting);
> phys_pfn[j] = 0;
> @@ -949,13 +949,13 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> }
>
> static void vfio_iommu_type1_unpin_pages(void *iommu_data,
> - unsigned long *user_pfn, int npage)
> + dma_addr_t user_iova, int npage)
> {
> struct vfio_iommu *iommu = iommu_data;
> bool do_accounting;
> int i;
>
> - if (WARN_ON_ONCE(!iommu || !user_pfn || npage <= 0))
> + if (WARN_ON_ONCE(!iommu || npage <= 0))
> return;
>
> /* Supported for v2 version only */
> @@ -966,10 +966,9 @@ static void vfio_iommu_type1_unpin_pages(void *iommu_data,
>
> do_accounting = list_empty(&iommu->domain_list);
> for (i = 0; i < npage; i++) {
> + dma_addr_t iova = user_iova + PAGE_SIZE * i;
> struct vfio_dma *dma;
> - dma_addr_t iova;
>
> - iova = user_pfn[i] << PAGE_SHIFT;
> dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
> if (!dma)
> break;
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index d0844ecdc961..c3e441c0bf4b 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -147,10 +147,9 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
>
> #define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long))
>
> -int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
> +int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> int npage, int prot, unsigned long *phys_pfn);
> -void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> - int npage);
> +void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage);
> int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova,
> void *data, size_t len, bool write);
>

2022-07-06 18:20:42

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 5/9] vfio/ap: Remove redundant pfn

On Tue, Jul 05, 2022 at 11:27:55PM -0700, Nicolin Chen wrote:
> The vfio_ap_ops code maintains both nib address and its PFN, which
> is redundant, merely because vfio_pin/unpin_pages API wanted pfn.
> Since vfio_pin/unpin_pages() now accept "iova", remove duplicated
> pfn code in their callers too.

I would describe this as renaming saved_pfn to saved_iova

> *
> * @vcpu: the object representing the vcpu executing the PQAP(AQIC) instruction.
> * @nib: the location for storing the nib address.
> - * @g_pfn: the location for storing the page frame number of the page containing
> - * the nib.
> *
> * When the PQAP(AQIC) instruction is executed, general register 2 contains the
> * address of the notification indicator byte (nib) used for IRQ notification.
> - * This function parses the nib from gr2 and calculates the page frame
> - * number for the guest of the page containing the nib. The values are
> - * stored in @nib and @g_pfn respectively.
> - *
> - * The g_pfn of the nib is then validated to ensure the nib address is valid.
> + * This function parses and validate the nib from gr2.
> *
> * Return: returns zero if the nib address is a valid; otherwise, returns
> * -EINVAL.
> */
> -static int vfio_ap_validate_nib(struct kvm_vcpu *vcpu, unsigned long *nib,
> - unsigned long *g_pfn)
> +static int vfio_ap_validate_nib(struct kvm_vcpu *vcpu, dma_addr_t *nib)
> {
> *nib = vcpu->run->s.regs.gprs[2];
> - *g_pfn = *nib >> PAGE_SHIFT;
>
> - if (kvm_is_error_hva(gfn_to_hva(vcpu->kvm, *g_pfn)))
> + if (kvm_is_error_hva(gfn_to_hva(vcpu->kvm, *nib >> PAGE_SHIFT)))
> return -EINVAL;

This existing code is fishy. nib is either an IOVA passed to
vfio_pin_pages() or a GFN passed to gfn_to_hva(). These are not the
same thing and are not interchangable - writing code like this assumes
that the guest is running with iommu=pt or no iommu.

Someone should look at it..

Otherwise it looks OK

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2022-07-06 18:24:43

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 8/9] vfio/ccw: Add kmap_local_page() for memcpy

On Tue, Jul 05, 2022 at 11:27:58PM -0700, Nicolin Chen wrote:
> A PFN is not secure enough to promise that the memory is not IO. And
> direct access via memcpy() that only handles CPU memory will crash on
> S390 if the PFN is an IO PFN, as we have to use the memcpy_to/fromio()
> that uses the special S390 IO access instructions. On the other hand,
> a "struct page *" is always a CPU coherent thing that fits memcpy().
>
> Also, casting a PFN to "void *" for memcpy() is not a proper practice,
> kmap_local_page() is the correct API to call here, though S390 doesn't
> use highmem, which means kmap_local_page() is a NOP.
>
> There's a following patch changing the vfio_pin_pages() API to return
> a list of "struct page *" instead of PFNs. It will block any IO memory
> from ever getting into this call path, for such a security purpose. In
> this patch, add kmap_local_page() to prepare for that.
>
> Suggested-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> drivers/s390/cio/vfio_ccw_cp.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2022-07-06 18:26:02

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 9/9] vfio: Replace phys_pfn with pages for vfio_pin_pages()

On Tue, Jul 05, 2022 at 11:27:59PM -0700, Nicolin Chen wrote:
> Most of the callers of vfio_pin_pages() want "struct page *" and the
> low-level mm code to pin pages returns a list of "struct page *" too.
> So there's no gain in converting "struct page *" to PFN in between.
>
> Replace the output parameter "phys_pfn" list with a "pages" list, to
> simplify callers. This also allows us to replace the vfio_iommu_type1
> implementation with a more efficient one.
>
> For now, also update vfio_iommu_type1 to fit this new parameter too.
>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> .../driver-api/vfio-mediated-device.rst | 2 +-
> drivers/gpu/drm/i915/gvt/kvmgt.c | 19 ++++++-------------
> drivers/s390/cio/vfio_ccw_cp.c | 19 +++++++++----------
> drivers/s390/crypto/vfio_ap_ops.c | 6 +++---
> drivers/vfio/vfio.c | 8 ++++----
> drivers/vfio/vfio.h | 2 +-
> drivers/vfio/vfio_iommu_type1.c | 19 +++++++++++--------
> include/linux/vfio.h | 2 +-
> 8 files changed, 36 insertions(+), 41 deletions(-)

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2022-07-06 18:36:15

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 7/9] vfio: Rename user_iova of vfio_dma_rw()

On Tue, Jul 05, 2022 at 11:27:57PM -0700, Nicolin Chen wrote:
> Following the updated vfio_pin/unpin_pages(), use the simpler "iova".
>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> drivers/vfio/vfio.c | 6 +++---
> include/linux/vfio.h | 2 +-
> 2 files changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2022-07-06 19:43:18

by Nicolin Chen

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 4/9] vfio: Pass in starting IOVA to vfio_pin/unpin_pages API

On Wed, Jul 06, 2022 at 02:49:23PM -0300, Jason Gunthorpe wrote:
> On Tue, Jul 05, 2022 at 11:27:54PM -0700, Nicolin Chen wrote:
>
> > These functions call back into the back-end IOMMU module by using the pin_pages
> > diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > index 8c67c9aba82d..ea6041fa48ac 100644
> > --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> > +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c

> > + vfio_unpin_pages(&vgpu->vfio_device, gfn << PAGE_SHIFT,
> > + roundup(size, PAGE_SIZE) / PAGE_SIZE);
>
> These maths are DIV_ROUND_UP()

Actually I see two places in this file doing the same roundup.
So I am going to add a prep patch in v3 to fix them separately.

2022-07-07 06:37:07

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFT][PATCH v2 0/9] Update vfio_pin/unpin_pages API

> From: Nicolin Chen
> Sent: Wednesday, July 6, 2022 2:28 PM
>
> This is a preparatory series for IOMMUFD v2 patches. It prepares for
> replacing vfio_iommu_type1 implementations of vfio_pin/unpin_pages()
> with IOMMUFD version.
>
> There's a gap between these two versions: the vfio_iommu_type1 version
> inputs a non-contiguous PFN list and outputs another PFN list for the
> pinned physical page list, while the IOMMUFD version only supports a
> contiguous address input by accepting the starting IO virtual address
> of a set of pages to pin and by outputting to a physical page list.
>
> The nature of existing callers mostly aligns with the IOMMUFD version,
> except s390's vfio_ccw_cp code where some additional change is needed
> along with this series. Overall, updating to "iova" and "phys_page"
> does improve the caller side to some extent.
>
> Also fix a misuse of physical address and virtual address in the s390's
> crypto code. And update the input naming at the adjacent vfio_dma_rw().
>
> This is on github:
> https://github.com/nicolinc/iommufd/commits/vfio_pin_pages
>
> Request for testing: I only did build for s390 and i915 code, so it'd
> be nice to have people who have environment to run sanity accordingly.
>

+Terrence who is testing it for i915 now...

2022-07-07 06:46:17

by Nicolin Chen

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 0/9] Update vfio_pin/unpin_pages API

On Thu, Jul 07, 2022 at 06:08:45AM +0000, Tian, Kevin wrote:

> > Request for testing: I only did build for s390 and i915 code, so it'd
> > be nice to have people who have environment to run sanity accordingly.
> >
>
> +Terrence who is testing it for i915 now...

Hi Terrence, would it be possible for you to pull v3 to test on?
https://github.com/nicolinc/iommufd/commits/dev/vfio_pin_pages-v3

They are basically same but there's a new DIV_ROUND_UP change,
which shouldn't result in any functional difference, IMHO. If
v3 passes, I can simply add your Tested-by when I respin it.

Thanks
Nic

2022-07-07 08:47:51

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFT][PATCH v2 1/9] vfio: Make vfio_unpin_pages() return void

> From: Nicolin Chen <[email protected]>
> Sent: Wednesday, July 6, 2022 2:28 PM
>
> There's only one caller that checks its return value with a WARN_ON_ONCE,
> while all other callers do not check return value at all. So simplify the
> API to return void by embedding similar WARN_ON_ONCEs.

While this change keeps the similar effect as before it leads to different
policy for same type of errors between pin and unpin paths:

e.g.

vfio_unpin_pages():
if (WARN_ON_ONCE(!user_pfn || !npage || !vfio_assert_device_open(device)))
return;

vfio_pin_pages():
if (!user_pfn || !phys_pfn || !npage ||
!vfio_assert_device_open(device))
return -EINVAL;

It sounds a bit weird when reading related code...

2022-07-07 08:49:11

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFT][PATCH v2 4/9] vfio: Pass in starting IOVA to vfio_pin/unpin_pages API

> From: Nicolin Chen
> Sent: Wednesday, July 6, 2022 2:28 PM
> /*
> - * Pin a set of guest PFNs and return their associated host PFNs for local
> + * Pin contiguous guest pages and return their associated host pages for
> local

can we replace 'guest' with 'user'?

> * domain only.
> * @device [in] : device
> - * @user_pfn [in]: array of user/guest PFNs to be pinned.
> - * @npage [in] : count of elements in user_pfn array. This count should
> not
> + * @iova [in] : starting IOVA of user/guest pages to be pinned.

remove 'guest'.

> + * @npage [in] : count of pages to be pinned. This count should not
> * be greater VFIO_PIN_PAGES_MAX_ENTRIES.

greater 'than' ...

otherwise looks good:

Reviewed-by: Kevin Tian <[email protected]>

2022-07-07 08:49:53

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFT][PATCH v2 7/9] vfio: Rename user_iova of vfio_dma_rw()

> From: Nicolin Chen <[email protected]>
> Sent: Wednesday, July 6, 2022 2:28 PM
>
> Following the updated vfio_pin/unpin_pages(), use the simpler "iova".
>
> Signed-off-by: Nicolin Chen <[email protected]>

Reviewed-by: Kevin Tian <[email protected]>

2022-07-07 09:14:34

by Tian, Kevin

[permalink] [raw]
Subject: RE: [RFT][PATCH v2 9/9] vfio: Replace phys_pfn with pages for vfio_pin_pages()

> From: Nicolin Chen <[email protected]>
> Sent: Wednesday, July 6, 2022 2:28 PM
>
> Most of the callers of vfio_pin_pages() want "struct page *" and the
> low-level mm code to pin pages returns a list of "struct page *" too.
> So there's no gain in converting "struct page *" to PFN in between.
>
> Replace the output parameter "phys_pfn" list with a "pages" list, to
> simplify callers. This also allows us to replace the vfio_iommu_type1
> implementation with a more efficient one.

worth mentioning that vfio pin is only for struct page * hence the
pfn_valid() check in gvt can be removed.

>
> For now, also update vfio_iommu_type1 to fit this new parameter too.
>
> Signed-off-by: Nicolin Chen <[email protected]>

Reviewed-by: Kevin Tian <[email protected]>

2022-07-07 17:25:27

by Nicolin Chen

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 4/9] vfio: Pass in starting IOVA to vfio_pin/unpin_pages API

On Thu, Jul 07, 2022 at 08:46:12AM +0000, Tian, Kevin wrote:
> External email: Use caution opening links or attachments
>
>
> > From: Nicolin Chen
> > Sent: Wednesday, July 6, 2022 2:28 PM
> > /*
> > - * Pin a set of guest PFNs and return their associated host PFNs for local
> > + * Pin contiguous guest pages and return their associated host pages for
> > local
>
> can we replace 'guest' with 'user'?
>
> > * domain only.
> > * @device [in] : device
> > - * @user_pfn [in]: array of user/guest PFNs to be pinned.
> > - * @npage [in] : count of elements in user_pfn array. This count should
> > not
> > + * @iova [in] : starting IOVA of user/guest pages to be pinned.
>
> remove 'guest'.
>
> > + * @npage [in] : count of pages to be pinned. This count should not
> > * be greater VFIO_PIN_PAGES_MAX_ENTRIES.
>
> greater 'than' ...

Will update them in v3. Thanks

2022-07-07 17:26:36

by Nicolin Chen

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 1/9] vfio: Make vfio_unpin_pages() return void

On Thu, Jul 07, 2022 at 08:42:28AM +0000, Tian, Kevin wrote:
> External email: Use caution opening links or attachments
>
>
> > From: Nicolin Chen <[email protected]>
> > Sent: Wednesday, July 6, 2022 2:28 PM
> >
> > There's only one caller that checks its return value with a WARN_ON_ONCE,
> > while all other callers do not check return value at all. So simplify the
> > API to return void by embedding similar WARN_ON_ONCEs.
>
> While this change keeps the similar effect as before it leads to different
> policy for same type of errors between pin and unpin paths:

I think it's because of the policy that an undo function should not
fail. Meanwhile, indulging faulty inputs isn't good either.

> e.g.
>
> vfio_unpin_pages():
> if (WARN_ON_ONCE(!user_pfn || !npage || !vfio_assert_device_open(device)))
> return;
>
> vfio_pin_pages():
> if (!user_pfn || !phys_pfn || !npage ||
> !vfio_assert_device_open(device))
> return -EINVAL;
>
> It sounds a bit weird when reading related code...

Any better way to handle this?

2022-07-07 17:44:36

by Nicolin Chen

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 9/9] vfio: Replace phys_pfn with pages for vfio_pin_pages()

On Thu, Jul 07, 2022 at 08:49:28AM +0000, Tian, Kevin wrote:
> External email: Use caution opening links or attachments
>
>
> > From: Nicolin Chen <[email protected]>
> > Sent: Wednesday, July 6, 2022 2:28 PM
> >
> > Most of the callers of vfio_pin_pages() want "struct page *" and the
> > low-level mm code to pin pages returns a list of "struct page *" too.
> > So there's no gain in converting "struct page *" to PFN in between.
> >
> > Replace the output parameter "phys_pfn" list with a "pages" list, to
> > simplify callers. This also allows us to replace the vfio_iommu_type1
> > implementation with a more efficient one.
>
> worth mentioning that vfio pin is only for struct page * hence the
> pfn_valid() check in gvt can be removed.

Will add that.

> >
> > For now, also update vfio_iommu_type1 to fit this new parameter too.
> >
> > Signed-off-by: Nicolin Chen <[email protected]>
>
> Reviewed-by: Kevin Tian <[email protected]>

Thanks for the review!

2022-07-07 19:39:23

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 1/9] vfio: Make vfio_unpin_pages() return void

On Thu, Jul 07, 2022 at 10:12:41AM -0700, Nicolin Chen wrote:
> On Thu, Jul 07, 2022 at 08:42:28AM +0000, Tian, Kevin wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > > From: Nicolin Chen <[email protected]>
> > > Sent: Wednesday, July 6, 2022 2:28 PM
> > >
> > > There's only one caller that checks its return value with a WARN_ON_ONCE,
> > > while all other callers do not check return value at all. So simplify the
> > > API to return void by embedding similar WARN_ON_ONCEs.
> >
> > While this change keeps the similar effect as before it leads to different
> > policy for same type of errors between pin and unpin paths:
>
> I think it's because of the policy that an undo function should not
> fail. Meanwhile, indulging faulty inputs isn't good either.
>
> > e.g.
> >
> > vfio_unpin_pages():
> > if (WARN_ON_ONCE(!user_pfn || !npage || !vfio_assert_device_open(device)))
> > return;
> >
> > vfio_pin_pages():
> > if (!user_pfn || !phys_pfn || !npage ||
> > !vfio_assert_device_open(device))
> > return -EINVAL;
> >
> > It sounds a bit weird when reading related code...
>
> Any better way to handle this?

They should all be WARN_ON's, that is the standard pattern to assert
that function arguments must be correctly formed.

I would also drop the tests that obviously will oops on their on
anyone, like NULL pointer checks. This is a semi-performance path.

Jason

2022-07-07 19:45:33

by Nicolin Chen

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 1/9] vfio: Make vfio_unpin_pages() return void

On Thu, Jul 07, 2022 at 04:22:10PM -0300, Jason Gunthorpe wrote:
> On Thu, Jul 07, 2022 at 10:12:41AM -0700, Nicolin Chen wrote:
> > On Thu, Jul 07, 2022 at 08:42:28AM +0000, Tian, Kevin wrote:
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > > From: Nicolin Chen <[email protected]>
> > > > Sent: Wednesday, July 6, 2022 2:28 PM
> > > >
> > > > There's only one caller that checks its return value with a WARN_ON_ONCE,
> > > > while all other callers do not check return value at all. So simplify the
> > > > API to return void by embedding similar WARN_ON_ONCEs.
> > >
> > > While this change keeps the similar effect as before it leads to different
> > > policy for same type of errors between pin and unpin paths:
> >
> > I think it's because of the policy that an undo function should not
> > fail. Meanwhile, indulging faulty inputs isn't good either.
> >
> > > e.g.
> > >
> > > vfio_unpin_pages():
> > > if (WARN_ON_ONCE(!user_pfn || !npage || !vfio_assert_device_open(device)))
> > > return;
> > >
> > > vfio_pin_pages():
> > > if (!user_pfn || !phys_pfn || !npage ||
> > > !vfio_assert_device_open(device))
> > > return -EINVAL;
> > >
> > > It sounds a bit weird when reading related code...
> >
> > Any better way to handle this?
>
> They should all be WARN_ON's, that is the standard pattern to assert
> that function arguments must be correctly formed.

OK. I can change that. I assume that, not confined to arguments,
we might want to have a WARN_ON for the return value check also.

> I would also drop the tests that obviously will oops on their on
> anyone, like NULL pointer checks. This is a semi-performance path.

OK. I will simply remove those NULL pointer checks. Actually,
that !user_pfn check is gone anyway in the following patch, as
user_pfn is replaced with iova.

2022-07-08 07:57:06

by Xu, Terrence

[permalink] [raw]
Subject: RE: [RFT][PATCH v2 0/9] Update vfio_pin/unpin_pages API

> -----Original Message-----
> From: intel-gvt-dev <[email protected]> On Behalf Of
> On Thu, Jul 07, 2022 at 06:08:45AM +0000, Tian, Kevin wrote:
>
> > > Request for testing: I only did build for s390 and i915 code, so
> > > it'd be nice to have people who have environment to run sanity accordingly.
> > >
> >
> > +Terrence who is testing it for i915 now...
>
> Hi Terrence, would it be possible for you to pull v3 to test on?
> https://github.com/nicolinc/iommufd/commits/dev/vfio_pin_pages-v3
>
> They are basically same but there's a new DIV_ROUND_UP change, which
> shouldn't result in any functional difference, IMHO. If
> v3 passes, I can simply add your Tested-by when I respin it.

Hi Nicolin, I already completed KVMGT key feature testing based on your v3 repo, VM booted up successfully and run smoothly, but there is a call trace during each time VM booting up, as the attachment.


Attachments:
call_trace.log (4.19 kB)
call_trace.log

2022-07-08 20:33:29

by Eric Farman

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 8/9] vfio/ccw: Add kmap_local_page() for memcpy

On Tue, 2022-07-05 at 23:27 -0700, Nicolin Chen wrote:
> A PFN is not secure enough to promise that the memory is not IO. And
> direct access via memcpy() that only handles CPU memory will crash on
> S390 if the PFN is an IO PFN, as we have to use the
> memcpy_to/fromio()
> that uses the special S390 IO access instructions. On the other hand,
> a "struct page *" is always a CPU coherent thing that fits memcpy().
>
> Also, casting a PFN to "void *" for memcpy() is not a proper
> practice,
> kmap_local_page() is the correct API to call here, though S390
> doesn't
> use highmem, which means kmap_local_page() is a NOP.
>
> There's a following patch changing the vfio_pin_pages() API to return
> a list of "struct page *" instead of PFNs. It will block any IO
> memory
> from ever getting into this call path, for such a security purpose.
> In
> this patch, add kmap_local_page() to prepare for that.

This all sounds like it's conflating vfio-ccw with vfio-pci, and
configuration-wise I have a hard time picturing the situation described
above. But in the interest of the change in the next patch, I suppose
it's fine.

Acked-by: Eric Farman <[email protected]>

>
> Suggested-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> drivers/s390/cio/vfio_ccw_cp.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c
> b/drivers/s390/cio/vfio_ccw_cp.c
> index 3854c3d573f5..cd4ec4f6d6ff 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -11,6 +11,7 @@
> #include <linux/ratelimit.h>
> #include <linux/mm.h>
> #include <linux/slab.h>
> +#include <linux/highmem.h>
> #include <linux/iommu.h>
> #include <linux/vfio.h>
> #include <asm/idals.h>
> @@ -230,7 +231,6 @@ static long copy_from_iova(struct vfio_device
> *vdev, void *to, u64 iova,
> unsigned long n)
> {
> struct page_array pa = {0};
> - u64 from;
> int i, ret;
> unsigned long l, m;
>
> @@ -246,7 +246,9 @@ static long copy_from_iova(struct vfio_device
> *vdev, void *to, u64 iova,
>
> l = n;
> for (i = 0; i < pa.pa_nr; i++) {
> - from = pa.pa_pfn[i] << PAGE_SHIFT;
> + struct page *page = pfn_to_page(pa.pa_pfn[i]);
> + void *from = kmap_local_page(page);
> +
> m = PAGE_SIZE;
> if (i == 0) {
> from += iova & (PAGE_SIZE - 1);
> @@ -254,7 +256,8 @@ static long copy_from_iova(struct vfio_device
> *vdev, void *to, u64 iova,
> }
>
> m = min(l, m);
> - memcpy(to + (n - l), (void *)from, m);
> + memcpy(to + (n - l), from, m);
> + kunmap_local(from);
>
> l -= m;
> if (l == 0)

2022-07-08 20:33:34

by Eric Farman

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 0/9] Update vfio_pin/unpin_pages API

On Tue, 2022-07-05 at 23:27 -0700, Nicolin Chen wrote:
> This is a preparatory series for IOMMUFD v2 patches. It prepares for
> replacing vfio_iommu_type1 implementations of vfio_pin/unpin_pages()
> with IOMMUFD version.
>
> There's a gap between these two versions: the vfio_iommu_type1
> version
> inputs a non-contiguous PFN list and outputs another PFN list for the
> pinned physical page list, while the IOMMUFD version only supports a
> contiguous address input by accepting the starting IO virtual address
> of a set of pages to pin and by outputting to a physical page list.
>
> The nature of existing callers mostly aligns with the IOMMUFD
> version,
> except s390's vfio_ccw_cp code where some additional change is needed
> along with this series. Overall, updating to "iova" and "phys_page"
> does improve the caller side to some extent.
>
> Also fix a misuse of physical address and virtual address in the
> s390's
> crypto code. And update the input naming at the adjacent
> vfio_dma_rw().
>
> This is on github:
> https://github.com/nicolinc/iommufd/commits/vfio_pin_pages
>
> Request for testing: I only did build for s390 and i915 code, so it'd
> be nice to have people who have environment to run sanity
> accordingly.

Tested-by: Eric Farman <[email protected]> # s390

>
> Thanks!
>
> Changelog
> v2:
> * Added a patch to make vfio_unpin_pages return void
> * Added two patches to remove PFN list from two s390 callers
> * Renamed "phys_page" parameter to "pages" for vfio_pin_pages
> * Updated commit log of kmap_local_page() patch
> * Added Harald's "Reviewed-by" to pa_ind patch
> * Rebased on top of Alex's extern removal path
> v1:
> https://lore.kernel.org/kvm/[email protected]/
>
> Nicolin Chen (9):
> vfio: Make vfio_unpin_pages() return void
> vfio/ap: Pass in physical address of ind to ap_aqic()
> vfio/ccw: Only pass in contiguous pages
> vfio: Pass in starting IOVA to vfio_pin/unpin_pages API
> vfio/ap: Remove redundant pfn
> vfio/ccw: Change pa_pfn list to pa_iova list
> vfio: Rename user_iova of vfio_dma_rw()
> vfio/ccw: Add kmap_local_page() for memcpy
> vfio: Replace phys_pfn with pages for vfio_pin_pages()
>
> .../driver-api/vfio-mediated-device.rst | 6 +-
> arch/s390/include/asm/ap.h | 6 +-
> drivers/gpu/drm/i915/gvt/kvmgt.c | 46 ++---
> drivers/s390/cio/vfio_ccw_cp.c | 195 +++++++++++-----
> --
> drivers/s390/crypto/ap_queue.c | 2 +-
> drivers/s390/crypto/vfio_ap_ops.c | 54 +++--
> drivers/s390/crypto/vfio_ap_private.h | 4 +-
> drivers/vfio/vfio.c | 55 +++--
> drivers/vfio/vfio.h | 8 +-
> drivers/vfio/vfio_iommu_type1.c | 46 +++--
> include/linux/vfio.h | 9 +-
> 11 files changed, 218 insertions(+), 213 deletions(-)
>

2022-07-08 20:44:20

by Nicolin Chen

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 0/9] Update vfio_pin/unpin_pages API

On Fri, Jul 08, 2022 at 07:24:30AM +0000, Xu, Terrence wrote:
> External email: Use caution opening links or attachments
>
>
> > -----Original Message-----
> > From: intel-gvt-dev <[email protected]> On Behalf Of
> > On Thu, Jul 07, 2022 at 06:08:45AM +0000, Tian, Kevin wrote:
> >
> > > > Request for testing: I only did build for s390 and i915 code, so
> > > > it'd be nice to have people who have environment to run sanity accordingly.
> > > >
> > >
> > > +Terrence who is testing it for i915 now...
> >
> > Hi Terrence, would it be possible for you to pull v3 to test on?
> > https://github.com/nicolinc/iommufd/commits/dev/vfio_pin_pages-v3
> >
> > They are basically same but there's a new DIV_ROUND_UP change, which
> > shouldn't result in any functional difference, IMHO. If
> > v3 passes, I can simply add your Tested-by when I respin it.
>
> Hi Nicolin, I already completed KVMGT key feature testing based on
> your v3 repo, VM booted up successfully and run smoothly, but there
> is a call trace during each time VM booting up, as the attachment.

Nice! Thank you for the testing. I will add your Tested-by in v3.

2022-07-08 20:51:18

by Eric Farman

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 3/9] vfio/ccw: Only pass in contiguous pages

On Tue, 2022-07-05 at 23:27 -0700, Nicolin Chen wrote:
> This driver is the only caller of vfio_pin/unpin_pages that might
> pass
> in a non-contiguous PFN list, but in many cases it has a contiguous
> PFN
> list to process. So letting VFIO API handle a non-contiguous PFN list
> is actually counterproductive.
>
> Add a pair of simple loops to pass in contiguous PFNs only, to have
> an
> efficient implementation in VFIO.
>
> Signed-off-by: Nicolin Chen <[email protected]>

Reviewed-by: Eric Farman <[email protected]>

> ---
> drivers/s390/cio/vfio_ccw_cp.c | 70 +++++++++++++++++++++++++++-----
> --
> 1 file changed, 56 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c
> b/drivers/s390/cio/vfio_ccw_cp.c
> index 0c2be9421ab7..3b94863ad24e 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -90,6 +90,38 @@ static int pfn_array_alloc(struct pfn_array *pa,
> u64 iova, unsigned int len)
> return 0;
> }
>
> +/*
> + * pfn_array_unpin() - Unpin user pages in memory
> + * @pa: pfn_array on which to perform the operation
> + * @vdev: the vfio device to perform the operation
> + * @pa_nr: number of user pages to unpin
> + *
> + * Only unpin if any pages were pinned to begin with, i.e. pa_nr >
> 0,
> + * otherwise only clear pa->pa_nr
> + */
> +static void pfn_array_unpin(struct pfn_array *pa,
> + struct vfio_device *vdev, int pa_nr)
> +{
> + int unpinned = 0, npage = 1;
> +
> + while (unpinned < pa_nr) {
> + unsigned long *first = &pa->pa_iova_pfn[unpinned];
> + unsigned long *last = &first[npage];
> +
> + if (unpinned + npage < pa_nr &&
> + *first + npage == *last) {
> + npage++;
> + continue;
> + }
> +
> + vfio_unpin_pages(vdev, first, npage);
> + unpinned += npage;
> + npage = 1;
> + }
> +
> + pa->pa_nr = 0;
> +}
> +
> /*
> * pfn_array_pin() - Pin user pages in memory
> * @pa: pfn_array on which to perform the operation
> @@ -101,34 +133,44 @@ static int pfn_array_alloc(struct pfn_array
> *pa, u64 iova, unsigned int len)
> */
> static int pfn_array_pin(struct pfn_array *pa, struct vfio_device
> *vdev)
> {
> + int pinned = 0, npage = 1;
> int ret = 0;
>
> - ret = vfio_pin_pages(vdev, pa->pa_iova_pfn, pa->pa_nr,
> - IOMMU_READ | IOMMU_WRITE, pa->pa_pfn);
> + while (pinned < pa->pa_nr) {
> + unsigned long *first = &pa->pa_iova_pfn[pinned];
> + unsigned long *last = &first[npage];
>
> - if (ret < 0) {
> - goto err_out;
> - } else if (ret > 0 && ret != pa->pa_nr) {
> - vfio_unpin_pages(vdev, pa->pa_iova_pfn, ret);
> - ret = -EINVAL;
> - goto err_out;
> + if (pinned + npage < pa->pa_nr &&
> + *first + npage == *last) {
> + npage++;
> + continue;
> + }
> +
> + ret = vfio_pin_pages(vdev, first, npage,
> + IOMMU_READ | IOMMU_WRITE,
> + &pa->pa_pfn[pinned]);
> + if (ret < 0) {
> + goto err_out;
> + } else if (ret > 0 && ret != npage) {
> + pinned += ret;
> + ret = -EINVAL;
> + goto err_out;
> + }
> + pinned += npage;
> + npage = 1;
> }
>
> return ret;
>
> err_out:
> - pa->pa_nr = 0;
> -
> + pfn_array_unpin(pa, vdev, pinned);
> return ret;
> }
>
> /* Unpin the pages before releasing the memory. */
> static void pfn_array_unpin_free(struct pfn_array *pa, struct
> vfio_device *vdev)
> {
> - /* Only unpin if any pages were pinned to begin with */
> - if (pa->pa_nr)
> - vfio_unpin_pages(vdev, pa->pa_iova_pfn, pa->pa_nr);
> - pa->pa_nr = 0;
> + pfn_array_unpin(pa, vdev, pa->pa_nr);
> kfree(pa->pa_iova_pfn);
> }
>

2022-07-08 20:51:53

by Eric Farman

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 3/9] vfio/ccw: Only pass in contiguous pages

On Wed, 2022-07-06 at 14:05 -0300, Jason Gunthorpe wrote:
> On Tue, Jul 05, 2022 at 11:27:53PM -0700, Nicolin Chen wrote:
> > This driver is the only caller of vfio_pin/unpin_pages that might
> > pass
> > in a non-contiguous PFN list, but in many cases it has a contiguous
> > PFN
> > list to process. So letting VFIO API handle a non-contiguous PFN
> > list
> > is actually counterproductive.
> >
> > Add a pair of simple loops to pass in contiguous PFNs only, to have
> > an
> > efficient implementation in VFIO.
> >
> > Signed-off-by: Nicolin Chen <[email protected]>
> > ---
> > drivers/s390/cio/vfio_ccw_cp.c | 70 +++++++++++++++++++++++++++---
> > ----
> > 1 file changed, 56 insertions(+), 14 deletions(-)
>
> I think this is fine as-is for this series, but someone who knows and
> can test ccw should go in and fix things so that pfn_array_alloc()
> doesn't exist. Allocating memory and filling it with consecutive
> integers is kind of silly given we can just call vfio_pin_pages()
> with
> pa_nr directly.
>
> pa->pa_iova_pfn[0] = pa->pa_iova >> PAGE_SHIFT;
> pa->pa_pfn[0] = -1ULL;
> for (i = 1; i < pa->pa_nr; i++) {
> pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;
>
> It looks like only the 'ccw_is_idal' flow can actually create
> non-continuities.

This code is certainly not my favorite, but you're right that it's the
IDAL flow that generates the non-contiguous requests and the code you
reference is simply an initialization for the !IDAL case. As I have a
todo in this code anyway, I'll register your suggestion to see if they
can be untangled.

> Also the loop in copy_from_iova() should ideally be
> using the much faster 'rw' interface, and not a pin/unpin cycle just
> to memcpy.

I guess I missed when that was added. This looks like low hanging fruit
for some ollllld code regardless of the above. Will get to this once
I'm back. Thank you!

Eric

>
> If I guess right these changes would significantly speed this driver
> up.
>
> Anyhow,
>
> Reviewed-by: Jason Gunthorpe <[email protected]>
>
> Jason

2022-07-08 20:52:13

by Eric Farman

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 9/9] vfio: Replace phys_pfn with pages for vfio_pin_pages()

On Tue, 2022-07-05 at 23:27 -0700, Nicolin Chen wrote:
> Most of the callers of vfio_pin_pages() want "struct page *" and the
> low-level mm code to pin pages returns a list of "struct page *" too.
> So there's no gain in converting "struct page *" to PFN in between.
>
> Replace the output parameter "phys_pfn" list with a "pages" list, to
> simplify callers. This also allows us to replace the vfio_iommu_type1
> implementation with a more efficient one.
>
> For now, also update vfio_iommu_type1 to fit this new parameter too.
>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> .../driver-api/vfio-mediated-device.rst | 2 +-
> drivers/gpu/drm/i915/gvt/kvmgt.c | 19 ++++++-----------
> --
> drivers/s390/cio/vfio_ccw_cp.c | 19 +++++++++------
> ----

Acked-by: Eric Farman <[email protected]>

> drivers/s390/crypto/vfio_ap_ops.c | 6 +++---
> drivers/vfio/vfio.c | 8 ++++----
> drivers/vfio/vfio.h | 2 +-
> drivers/vfio/vfio_iommu_type1.c | 19 +++++++++++----
> ----
> include/linux/vfio.h | 2 +-
> 8 files changed, 36 insertions(+), 41 deletions(-)
>
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst
> b/Documentation/driver-api/vfio-mediated-device.rst
> index ea32a0f13ddb..ba5fefcdae1a 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -263,7 +263,7 @@ The following APIs are provided for translating
> user pfn to host pfn in a VFIO
> driver::
>
> int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> - int npage, int prot, unsigned long
> *phys_pfn);
> + int npage, int prot, struct page
> **pages);
>
> void vfio_unpin_pages(struct vfio_device *device, dma_addr_t
> iova,
> int npage);
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c
> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index ea6041fa48ac..3a49471dcc16 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -239,7 +239,7 @@ static void gvt_unpin_guest_page(struct
> intel_vgpu *vgpu, unsigned long gfn,
> static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long
> gfn,
> unsigned long size, struct page **page)
> {
> - unsigned long base_pfn = 0;
> + struct page *base_page = NULL;
> int total_pages;
> int npage;
> int ret;
> @@ -251,26 +251,19 @@ static int gvt_pin_guest_page(struct intel_vgpu
> *vgpu, unsigned long gfn,
> */
> for (npage = 0; npage < total_pages; npage++) {
> dma_addr_t cur_iova = (gfn + npage) << PAGE_SHIFT;
> - unsigned long pfn;
> + struct page *cur_page;
>
> ret = vfio_pin_pages(&vgpu->vfio_device, cur_iova, 1,
> - IOMMU_READ | IOMMU_WRITE, &pfn);
> + IOMMU_READ | IOMMU_WRITE,
> &cur_page);
> if (ret != 1) {
> gvt_vgpu_err("vfio_pin_pages failed for iova
> %pad, ret %d\n",
> &cur_iova, ret);
> goto err;
> }
>
> - if (!pfn_valid(pfn)) {
> - gvt_vgpu_err("pfn 0x%lx is not mem backed\n",
> pfn);
> - npage++;
> - ret = -EFAULT;
> - goto err;
> - }
> -
> if (npage == 0)
> - base_pfn = pfn;
> - else if (base_pfn + npage != pfn) {
> + base_page = cur_page;
> + else if (base_page + npage != cur_page) {
> gvt_vgpu_err("The pages are not continuous\n");
> ret = -EINVAL;
> npage++;
> @@ -278,7 +271,7 @@ static int gvt_pin_guest_page(struct intel_vgpu
> *vgpu, unsigned long gfn,
> }
> }
>
> - *page = pfn_to_page(base_pfn);
> + *page = base_page;
> return 0;
> err:
> gvt_unpin_guest_page(vgpu, gfn, npage * PAGE_SIZE);
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c
> b/drivers/s390/cio/vfio_ccw_cp.c
> index cd4ec4f6d6ff..8963f452f963 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -22,8 +22,8 @@
> struct page_array {
> /* Array that stores pages need to pin. */
> dma_addr_t *pa_iova;
> - /* Array that receives PFNs of the pages pinned. */
> - unsigned long *pa_pfn;
> + /* Array that receives the pinned pages. */
> + struct page **pa_page;
> /* Number of pages pinned from @pa_iova. */
> int pa_nr;
> };
> @@ -68,19 +68,19 @@ static int page_array_alloc(struct page_array
> *pa, u64 iova, unsigned int len)
> return -EINVAL;
>
> pa->pa_iova = kcalloc(pa->pa_nr,
> - sizeof(*pa->pa_iova) + sizeof(*pa-
> >pa_pfn),
> + sizeof(*pa->pa_iova) + sizeof(*pa-
> >pa_page),
> GFP_KERNEL);
> if (unlikely(!pa->pa_iova)) {
> pa->pa_nr = 0;
> return -ENOMEM;
> }
> - pa->pa_pfn = (unsigned long *)&pa->pa_iova[pa->pa_nr];
> + pa->pa_page = (struct page **)&pa->pa_iova[pa->pa_nr];
>
> pa->pa_iova[0] = iova;
> - pa->pa_pfn[0] = -1ULL;
> + pa->pa_page[0] = NULL;
> for (i = 1; i < pa->pa_nr; i++) {
> pa->pa_iova[i] = pa->pa_iova[i - 1] + PAGE_SIZE;
> - pa->pa_pfn[i] = -1ULL;
> + pa->pa_page[i] = NULL;
> }
>
> return 0;
> @@ -144,7 +144,7 @@ static int page_array_pin(struct page_array *pa,
> struct vfio_device *vdev)
>
> ret = vfio_pin_pages(vdev, *first, npage,
> IOMMU_READ | IOMMU_WRITE,
> - &pa->pa_pfn[pinned]);
> + &pa->pa_page[pinned]);
> if (ret < 0) {
> goto err_out;
> } else if (ret > 0 && ret != npage) {
> @@ -195,7 +195,7 @@ static inline void
> page_array_idal_create_words(struct page_array *pa,
> */
>
> for (i = 0; i < pa->pa_nr; i++)
> - idaws[i] = pa->pa_pfn[i] << PAGE_SHIFT;
> + idaws[i] = page_to_phys(pa->pa_page[i]);
>
> /* Adjust the first IDAW, since it may not start on a page
> boundary */
> idaws[0] += pa->pa_iova[0] & (PAGE_SIZE - 1);
> @@ -246,8 +246,7 @@ static long copy_from_iova(struct vfio_device
> *vdev, void *to, u64 iova,
>
> l = n;
> for (i = 0; i < pa.pa_nr; i++) {
> - struct page *page = pfn_to_page(pa.pa_pfn[i]);
> - void *from = kmap_local_page(page);
> + void *from = kmap_local_page(pa.pa_page[i]);
>
> m = PAGE_SIZE;
> if (i == 0) {
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
> b/drivers/s390/crypto/vfio_ap_ops.c
> index 6945e0ddc08c..b0972ca0dfa3 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -234,9 +234,9 @@ static struct ap_queue_status
> vfio_ap_irq_enable(struct vfio_ap_queue *q,
> struct ap_qirq_ctrl aqic_gisa = {};
> struct ap_queue_status status = {};
> struct kvm_s390_gisa *gisa;
> + struct page *h_page;
> int nisc;
> struct kvm *kvm;
> - unsigned long h_pfn;
> phys_addr_t h_nib;
> dma_addr_t nib;
> int ret;
> @@ -251,7 +251,7 @@ static struct ap_queue_status
> vfio_ap_irq_enable(struct vfio_ap_queue *q,
> }
>
> ret = vfio_pin_pages(&q->matrix_mdev->vdev, nib, 1,
> - IOMMU_READ | IOMMU_WRITE, &h_pfn);
> + IOMMU_READ | IOMMU_WRITE, &h_page);
> switch (ret) {
> case 1:
> break;
> @@ -267,7 +267,7 @@ static struct ap_queue_status
> vfio_ap_irq_enable(struct vfio_ap_queue *q,
> kvm = q->matrix_mdev->kvm;
> gisa = kvm->arch.gisa_int.origin;
>
> - h_nib = (h_pfn << PAGE_SHIFT) | (nib & ~PAGE_MASK);
> + h_nib = page_to_phys(h_page) | (nib & ~PAGE_MASK);
> aqic_gisa.gisc = isc;
>
> nisc = kvm_s390_gisc_register(kvm, isc);
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index b95ec2d78966..96b758e06c4a 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1917,18 +1917,18 @@
> EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
> * @npage [in] : count of pages to be pinned. This count should
> not
> * be greater VFIO_PIN_PAGES_MAX_ENTRIES.
> * @prot [in] : protection flags
> - * @phys_pfn[out]: array of host PFNs
> + * @pages[out] : array of host pages
> * Return error or number of pages pinned.
> */
> int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> - int npage, int prot, unsigned long *phys_pfn)
> + int npage, int prot, struct page **pages)
> {
> struct vfio_container *container;
> struct vfio_group *group = device->group;
> struct vfio_iommu_driver *driver;
> int ret;
>
> - if (!phys_pfn || !npage || !vfio_assert_device_open(device))
> + if (!pages || !npage || !vfio_assert_device_open(device))
> return -EINVAL;
>
> if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
> @@ -1943,7 +1943,7 @@ int vfio_pin_pages(struct vfio_device *device,
> dma_addr_t iova,
> if (likely(driver && driver->ops->pin_pages))
> ret = driver->ops->pin_pages(container->iommu_data,
> group->iommu_group, iova,
> - npage, prot, phys_pfn);
> + npage, prot, pages);
> else
> ret = -ENOTTY;
>
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index dbcd0e8c031b..dbfad8e20581 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -52,7 +52,7 @@ struct vfio_iommu_driver_ops {
> struct iommu_group *group,
> dma_addr_t user_iova,
> int npage, int prot,
> - unsigned long *phys_pfn);
> + struct page **pages);
> void (*unpin_pages)(void *iommu_data,
> dma_addr_t user_iova, int
> npage);
> int (*register_notifier)(void *iommu_data,
> diff --git a/drivers/vfio/vfio_iommu_type1.c
> b/drivers/vfio/vfio_iommu_type1.c
> index f10d0c4b1f26..de342d82b154 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -830,7 +830,7 @@ static int vfio_iommu_type1_pin_pages(void
> *iommu_data,
> struct iommu_group *iommu_group,
> dma_addr_t user_iova,
> int npage, int prot,
> - unsigned long *phys_pfn)
> + struct page **pages)
> {
> struct vfio_iommu *iommu = iommu_data;
> struct vfio_iommu_group *group;
> @@ -840,7 +840,7 @@ static int vfio_iommu_type1_pin_pages(void
> *iommu_data,
> bool do_accounting;
> dma_addr_t iova;
>
> - if (!iommu || !phys_pfn)
> + if (!iommu || !pages)
> return -EINVAL;
>
> /* Supported for v2 version only */
> @@ -879,6 +879,7 @@ static int vfio_iommu_type1_pin_pages(void
> *iommu_data,
> do_accounting = list_empty(&iommu->domain_list);
>
> for (i = 0; i < npage; i++) {
> + unsigned long phys_pfn;
> struct vfio_pfn *vpfn;
>
> iova = user_iova + PAGE_SIZE * i;
> @@ -895,23 +896,25 @@ static int vfio_iommu_type1_pin_pages(void
> *iommu_data,
>
> vpfn = vfio_iova_get_vfio_pfn(dma, iova);
> if (vpfn) {
> - phys_pfn[i] = vpfn->pfn;
> + pages[i] = pfn_to_page(vpfn->pfn);
> continue;
> }
>
> remote_vaddr = dma->vaddr + (iova - dma->iova);
> - ret = vfio_pin_page_external(dma, remote_vaddr,
> &phys_pfn[i],
> + ret = vfio_pin_page_external(dma, remote_vaddr,
> &phys_pfn,
> do_accounting);
> if (ret)
> goto pin_unwind;
>
> - ret = vfio_add_to_pfn_list(dma, iova, phys_pfn[i]);
> + ret = vfio_add_to_pfn_list(dma, iova, phys_pfn);
> if (ret) {
> - if (put_pfn(phys_pfn[i], dma->prot) &&
> do_accounting)
> + if (put_pfn(phys_pfn, dma->prot) &&
> do_accounting)
> vfio_lock_acct(dma, -1, true);
> goto pin_unwind;
> }
>
> + pages[i] = pfn_to_page(phys_pfn);
> +
> if (iommu->dirty_page_tracking) {
> unsigned long pgshift = __ffs(iommu-
> >pgsize_bitmap);
>
> @@ -934,14 +937,14 @@ static int vfio_iommu_type1_pin_pages(void
> *iommu_data,
> goto pin_done;
>
> pin_unwind:
> - phys_pfn[i] = 0;
> + pages[i] = NULL;
> for (j = 0; j < i; j++) {
> dma_addr_t iova;
>
> iova = user_iova + PAGE_SIZE * j;
> dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
> vfio_unpin_page_external(dma, iova, do_accounting);
> - phys_pfn[j] = 0;
> + pages[j] = NULL;
> }
> pin_done:
> mutex_unlock(&iommu->lock);
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 9108de34a79b..c76a2c492bbd 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -148,7 +148,7 @@ bool vfio_file_has_dev(struct file *file, struct
> vfio_device *device);
> #define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned
> long))
>
> int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> - int npage, int prot, unsigned long *phys_pfn);
> + int npage, int prot, struct page **pages);
> void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
> int npage);
> int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova,
> void *data, size_t len, bool write);

2022-07-08 20:59:52

by Eric Farman

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 4/9] vfio: Pass in starting IOVA to vfio_pin/unpin_pages API

On Tue, 2022-07-05 at 23:27 -0700, Nicolin Chen wrote:
> The vfio_pin/unpin_pages() so far accepted arrays of PFNs of user
> IOVA.
> Among all three callers, there was only one caller possibly passing
> in
> a non-contiguous PFN list, which is now ensured to have contiguous
> PFN
> inputs too.
>
> Pass in the starting address with "iova" alone to simplify things, so
> callers no longer need to maintain a PFN list or to pin/unpin one
> page
> at a time. This also allows VFIO to use more efficient
> implementations
> of pin/unpin_pages.
>
> For now, also update vfio_iommu_type1 to fit this new parameter too,
> while keeping its input intact (being user_iova) since we don't want
> to spend too much effort swapping its parameters and local variables
> at that level.
>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> .../driver-api/vfio-mediated-device.rst | 4 +--
> drivers/gpu/drm/i915/gvt/kvmgt.c | 24 ++++++-----------
> drivers/s390/cio/vfio_ccw_cp.c | 4 +--

Acked-by: Eric Farman <[email protected]>

> drivers/s390/crypto/vfio_ap_ops.c | 9 +++----
> drivers/vfio/vfio.c | 27 +++++++++------
> ----
> drivers/vfio/vfio.h | 4 +--
> drivers/vfio/vfio_iommu_type1.c | 17 ++++++------
> include/linux/vfio.h | 5 ++--
> 8 files changed, 40 insertions(+), 54 deletions(-)
>
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst
> b/Documentation/driver-api/vfio-mediated-device.rst
> index b0fdf76b339a..ea32a0f13ddb 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -262,10 +262,10 @@ Translation APIs for Mediated Devices
> The following APIs are provided for translating user pfn to host pfn
> in a VFIO
> driver::
>
> - int vfio_pin_pages(struct vfio_device *device, unsigned long
> *user_pfn,
> + int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> int npage, int prot, unsigned long
> *phys_pfn);
>
> - void vfio_unpin_pages(struct vfio_device *device, unsigned long
> *user_pfn,
> + void vfio_unpin_pages(struct vfio_device *device, dma_addr_t
> iova,
> int npage);
>
> These functions call back into the back-end IOMMU module by using
> the pin_pages
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c
> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 8c67c9aba82d..ea6041fa48ac 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -231,16 +231,8 @@ static void
> intel_gvt_cleanup_vgpu_type_groups(struct intel_gvt *gvt)
> static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned
> long gfn,
> unsigned long size)
> {
> - int total_pages;
> - int npage;
> -
> - total_pages = roundup(size, PAGE_SIZE) / PAGE_SIZE;
> -
> - for (npage = 0; npage < total_pages; npage++) {
> - unsigned long cur_gfn = gfn + npage;
> -
> - vfio_unpin_pages(&vgpu->vfio_device, &cur_gfn, 1);
> - }
> + vfio_unpin_pages(&vgpu->vfio_device, gfn << PAGE_SHIFT,
> + roundup(size, PAGE_SIZE) / PAGE_SIZE);
> }
>
> /* Pin a normal or compound guest page for dma. */
> @@ -258,14 +250,14 @@ static int gvt_pin_guest_page(struct intel_vgpu
> *vgpu, unsigned long gfn,
> * on stack to hold pfns.
> */
> for (npage = 0; npage < total_pages; npage++) {
> - unsigned long cur_gfn = gfn + npage;
> + dma_addr_t cur_iova = (gfn + npage) << PAGE_SHIFT;
> unsigned long pfn;
>
> - ret = vfio_pin_pages(&vgpu->vfio_device, &cur_gfn, 1,
> + ret = vfio_pin_pages(&vgpu->vfio_device, cur_iova, 1,
> IOMMU_READ | IOMMU_WRITE, &pfn);
> if (ret != 1) {
> - gvt_vgpu_err("vfio_pin_pages failed for gfn
> 0x%lx, ret %d\n",
> - cur_gfn, ret);
> + gvt_vgpu_err("vfio_pin_pages failed for iova
> %pad, ret %d\n",
> + &cur_iova, ret);
> goto err;
> }
>
> @@ -309,7 +301,7 @@ static int gvt_dma_map_page(struct intel_vgpu
> *vgpu, unsigned long gfn,
> if (dma_mapping_error(dev, *dma_addr)) {
> gvt_vgpu_err("DMA mapping failed for pfn 0x%lx, ret
> %d\n",
> page_to_pfn(page), ret);
> - gvt_unpin_guest_page(vgpu, gfn, size);
> + gvt_unpin_guest_page(vgpu, gfn << PAGE_SHIFT, size);
> return -ENOMEM;
> }
>
> @@ -322,7 +314,7 @@ static void gvt_dma_unmap_page(struct intel_vgpu
> *vgpu, unsigned long gfn,
> struct device *dev = vgpu->gvt->gt->i915->drm.dev;
>
> dma_unmap_page(dev, dma_addr, size, DMA_BIDIRECTIONAL);
> - gvt_unpin_guest_page(vgpu, gfn, size);
> + gvt_unpin_guest_page(vgpu, gfn << PAGE_SHIFT, size);
> }
>
> static struct gvt_dma *__gvt_cache_find_dma_addr(struct intel_vgpu
> *vgpu,
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c
> b/drivers/s390/cio/vfio_ccw_cp.c
> index 3b94863ad24e..a739262f988d 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -114,7 +114,7 @@ static void pfn_array_unpin(struct pfn_array *pa,
> continue;
> }
>
> - vfio_unpin_pages(vdev, first, npage);
> + vfio_unpin_pages(vdev, *first << PAGE_SHIFT, npage);
> unpinned += npage;
> npage = 1;
> }
> @@ -146,7 +146,7 @@ static int pfn_array_pin(struct pfn_array *pa,
> struct vfio_device *vdev)
> continue;
> }
>
> - ret = vfio_pin_pages(vdev, first, npage,
> + ret = vfio_pin_pages(vdev, *first << PAGE_SHIFT, npage,
> IOMMU_READ | IOMMU_WRITE,
> &pa->pa_pfn[pinned]);
> if (ret < 0) {
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c
> b/drivers/s390/crypto/vfio_ap_ops.c
> index bb869b28cebd..8a2018ab3cf0 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -124,7 +124,7 @@ static void vfio_ap_free_aqic_resources(struct
> vfio_ap_queue *q)
> q->saved_isc = VFIO_AP_ISC_INVALID;
> }
> if (q->saved_pfn && !WARN_ON(!q->matrix_mdev)) {
> - vfio_unpin_pages(&q->matrix_mdev->vdev, &q->saved_pfn,
> 1);
> + vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_pfn <<
> PAGE_SHIFT, 1);
> q->saved_pfn = 0;
> }
> }
> @@ -258,7 +258,7 @@ static struct ap_queue_status
> vfio_ap_irq_enable(struct vfio_ap_queue *q,
> return status;
> }
>
> - ret = vfio_pin_pages(&q->matrix_mdev->vdev, &g_pfn, 1,
> + ret = vfio_pin_pages(&q->matrix_mdev->vdev, g_pfn <<
> PAGE_SHIFT, 1,
> IOMMU_READ | IOMMU_WRITE, &h_pfn);
> switch (ret) {
> case 1:
> @@ -301,7 +301,7 @@ static struct ap_queue_status
> vfio_ap_irq_enable(struct vfio_ap_queue *q,
> break;
> case AP_RESPONSE_OTHERWISE_CHANGED:
> /* We could not modify IRQ setings: clear new
> configuration */
> - vfio_unpin_pages(&q->matrix_mdev->vdev, &g_pfn, 1);
> + vfio_unpin_pages(&q->matrix_mdev->vdev, g_pfn <<
> PAGE_SHIFT, 1);
> kvm_s390_gisc_unregister(kvm, isc);
> break;
> default:
> @@ -1248,9 +1248,8 @@ static int vfio_ap_mdev_iommu_notifier(struct
> notifier_block *nb,
>
> if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
> struct vfio_iommu_type1_dma_unmap *unmap = data;
> - unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
>
> - vfio_unpin_pages(&matrix_mdev->vdev, &g_pfn, 1);
> + vfio_unpin_pages(&matrix_mdev->vdev, unmap->iova, 1);
> return NOTIFY_OK;
> }
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 01f45ec70a3d..813b517c973e 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1910,17 +1910,17 @@ int vfio_set_irqs_validate_and_prepare(struct
> vfio_irq_set *hdr, int num_irqs,
> EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
>
> /*
> - * Pin a set of guest PFNs and return their associated host PFNs for
> local
> + * Pin contiguous guest pages and return their associated host pages
> for local
> * domain only.
> * @device [in] : device
> - * @user_pfn [in]: array of user/guest PFNs to be pinned.
> - * @npage [in] : count of elements in user_pfn array. This count
> should not
> + * @iova [in] : starting IOVA of user/guest pages to be pinned.
> + * @npage [in] : count of pages to be pinned. This count should
> not
> * be greater VFIO_PIN_PAGES_MAX_ENTRIES.
> * @prot [in] : protection flags
> * @phys_pfn[out]: array of host PFNs
> * Return error or number of pages pinned.
> */
> -int vfio_pin_pages(struct vfio_device *device, unsigned long
> *user_pfn,
> +int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> int npage, int prot, unsigned long *phys_pfn)
> {
> struct vfio_container *container;
> @@ -1928,8 +1928,7 @@ int vfio_pin_pages(struct vfio_device *device,
> unsigned long *user_pfn,
> struct vfio_iommu_driver *driver;
> int ret;
>
> - if (!user_pfn || !phys_pfn || !npage ||
> - !vfio_assert_device_open(device))
> + if (!phys_pfn || !npage || !vfio_assert_device_open(device))
> return -EINVAL;
>
> if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
> @@ -1943,7 +1942,7 @@ int vfio_pin_pages(struct vfio_device *device,
> unsigned long *user_pfn,
> driver = container->iommu_driver;
> if (likely(driver && driver->ops->pin_pages))
> ret = driver->ops->pin_pages(container->iommu_data,
> - group->iommu_group,
> user_pfn,
> + group->iommu_group, iova,
> npage, prot, phys_pfn);
> else
> ret = -ENOTTY;
> @@ -1953,20 +1952,18 @@ int vfio_pin_pages(struct vfio_device
> *device, unsigned long *user_pfn,
> EXPORT_SYMBOL(vfio_pin_pages);
>
> /*
> - * Unpin set of host PFNs for local domain only.
> + * Unpin contiguous host pages for local domain only.
> * @device [in] : device
> - * @user_pfn [in]: array of user/guest PFNs to be unpinned. Number
> of user/guest
> - * PFNs should not be greater than
> VFIO_PIN_PAGES_MAX_ENTRIES.
> - * @npage [in] : count of elements in user_pfn array. This count
> should not
> + * @iova [in] : starting address of user/guest pages to be
> unpinned.
> + * @npage [in] : count of pages to be unpinned. This count should
> not
> * be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
> */
> -void vfio_unpin_pages(struct vfio_device *device, unsigned long
> *user_pfn,
> - int npage)
> +void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
> int npage)
> {
> struct vfio_container *container;
> struct vfio_iommu_driver *driver;
>
> - if (WARN_ON_ONCE(!user_pfn || !npage ||
> !vfio_assert_device_open(device)))
> + if (WARN_ON_ONCE(!npage || !vfio_assert_device_open(device)))
> return;
>
> if (WARN_ON_ONCE(npage > VFIO_PIN_PAGES_MAX_ENTRIES))
> @@ -1979,7 +1976,7 @@ void vfio_unpin_pages(struct vfio_device
> *device, unsigned long *user_pfn,
> if (WARN_ON_ONCE(unlikely(!driver || !driver->ops-
> >unpin_pages)))
> return;
>
> - driver->ops->unpin_pages(container->iommu_data, user_pfn,
> npage);
> + driver->ops->unpin_pages(container->iommu_data, iova, npage);
> }
> EXPORT_SYMBOL(vfio_unpin_pages);
>
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index bef4edf58138..dbcd0e8c031b 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -50,11 +50,11 @@ struct vfio_iommu_driver_ops {
> struct iommu_group *group);
> int (*pin_pages)(void *iommu_data,
> struct iommu_group *group,
> - unsigned long *user_pfn,
> + dma_addr_t user_iova,
> int npage, int prot,
> unsigned long *phys_pfn);
> void (*unpin_pages)(void *iommu_data,
> - unsigned long *user_pfn, int
> npage);
> + dma_addr_t user_iova, int
> npage);
> int (*register_notifier)(void *iommu_data,
> unsigned long *events,
> struct notifier_block
> *nb);
> diff --git a/drivers/vfio/vfio_iommu_type1.c
> b/drivers/vfio/vfio_iommu_type1.c
> index 08613edaf722..f10d0c4b1f26 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -828,7 +828,7 @@ static int vfio_unpin_page_external(struct
> vfio_dma *dma, dma_addr_t iova,
>
> static int vfio_iommu_type1_pin_pages(void *iommu_data,
> struct iommu_group *iommu_group,
> - unsigned long *user_pfn,
> + dma_addr_t user_iova,
> int npage, int prot,
> unsigned long *phys_pfn)
> {
> @@ -840,7 +840,7 @@ static int vfio_iommu_type1_pin_pages(void
> *iommu_data,
> bool do_accounting;
> dma_addr_t iova;
>
> - if (!iommu || !user_pfn || !phys_pfn)
> + if (!iommu || !phys_pfn)
> return -EINVAL;
>
> /* Supported for v2 version only */
> @@ -856,7 +856,7 @@ static int vfio_iommu_type1_pin_pages(void
> *iommu_data,
> again:
> if (iommu->vaddr_invalid_count) {
> for (i = 0; i < npage; i++) {
> - iova = user_pfn[i] << PAGE_SHIFT;
> + iova = user_iova + PAGE_SIZE * i;
> ret = vfio_find_dma_valid(iommu, iova,
> PAGE_SIZE, &dma);
> if (ret < 0)
> goto pin_done;
> @@ -881,7 +881,7 @@ static int vfio_iommu_type1_pin_pages(void
> *iommu_data,
> for (i = 0; i < npage; i++) {
> struct vfio_pfn *vpfn;
>
> - iova = user_pfn[i] << PAGE_SHIFT;
> + iova = user_iova + PAGE_SIZE * i;
> dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
> if (!dma) {
> ret = -EINVAL;
> @@ -938,7 +938,7 @@ static int vfio_iommu_type1_pin_pages(void
> *iommu_data,
> for (j = 0; j < i; j++) {
> dma_addr_t iova;
>
> - iova = user_pfn[j] << PAGE_SHIFT;
> + iova = user_iova + PAGE_SIZE * j;
> dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
> vfio_unpin_page_external(dma, iova, do_accounting);
> phys_pfn[j] = 0;
> @@ -949,13 +949,13 @@ static int vfio_iommu_type1_pin_pages(void
> *iommu_data,
> }
>
> static void vfio_iommu_type1_unpin_pages(void *iommu_data,
> - unsigned long *user_pfn, int
> npage)
> + dma_addr_t user_iova, int
> npage)
> {
> struct vfio_iommu *iommu = iommu_data;
> bool do_accounting;
> int i;
>
> - if (WARN_ON_ONCE(!iommu || !user_pfn || npage <= 0))
> + if (WARN_ON_ONCE(!iommu || npage <= 0))
> return;
>
> /* Supported for v2 version only */
> @@ -966,10 +966,9 @@ static void vfio_iommu_type1_unpin_pages(void
> *iommu_data,
>
> do_accounting = list_empty(&iommu->domain_list);
> for (i = 0; i < npage; i++) {
> + dma_addr_t iova = user_iova + PAGE_SIZE * i;
> struct vfio_dma *dma;
> - dma_addr_t iova;
>
> - iova = user_pfn[i] << PAGE_SHIFT;
> dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
> if (!dma)
> break;
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index d0844ecdc961..c3e441c0bf4b 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -147,10 +147,9 @@ bool vfio_file_has_dev(struct file *file, struct
> vfio_device *device);
>
> #define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned
> long))
>
> -int vfio_pin_pages(struct vfio_device *device, unsigned long
> *user_pfn,
> +int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> int npage, int prot, unsigned long *phys_pfn);
> -void vfio_unpin_pages(struct vfio_device *device, unsigned long
> *user_pfn,
> - int npage);
> +void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
> int npage);
> int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova,
> void *data, size_t len, bool write);
>

2022-07-08 21:01:07

by Eric Farman

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 6/9] vfio/ccw: Change pa_pfn list to pa_iova list

On Tue, 2022-07-05 at 23:27 -0700, Nicolin Chen wrote:
> The vfio_ccw_cp code maintains both iova and its PFN list because the
> vfio_pin/unpin_pages API wanted pfn list. Since
> vfio_pin/unpin_pages()
> now accept "iova", change to maintain only pa_iova list and rename
> all
> "pfn_array" strings to "page_array", so as to simplify the code.
>
> Signed-off-by: Nicolin Chen <[email protected]>

Reviewed-by: Eric Farman <[email protected]>

> ---
> drivers/s390/cio/vfio_ccw_cp.c | 135 ++++++++++++++++---------------
> --
> 1 file changed, 64 insertions(+), 71 deletions(-)
>
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c
> b/drivers/s390/cio/vfio_ccw_cp.c
> index a739262f988d..3854c3d573f5 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -18,11 +18,9 @@
> #include "vfio_ccw_cp.h"
> #include "vfio_ccw_private.h"
>
> -struct pfn_array {
> - /* Starting guest physical I/O address. */
> - unsigned long pa_iova;
> - /* Array that stores PFNs of the pages need to pin. */
> - unsigned long *pa_iova_pfn;
> +struct page_array {
> + /* Array that stores pages need to pin. */
> + dma_addr_t *pa_iova;
> /* Array that receives PFNs of the pages pinned. */
> unsigned long *pa_pfn;
> /* Number of pages pinned from @pa_iova. */
> @@ -37,53 +35,50 @@ struct ccwchain {
> /* Count of the valid ccws in chain. */
> int ch_len;
> /* Pinned PAGEs for the original data. */
> - struct pfn_array *ch_pa;
> + struct page_array *ch_pa;
> };
>
> /*
> - * pfn_array_alloc() - alloc memory for PFNs
> - * @pa: pfn_array on which to perform the operation
> + * page_array_alloc() - alloc memory for page array
> + * @pa: page_array on which to perform the operation
> * @iova: target guest physical address
> * @len: number of bytes that should be pinned from @iova
> *
> - * Attempt to allocate memory for PFNs.
> + * Attempt to allocate memory for page array.
> *
> - * Usage of pfn_array:
> - * We expect (pa_nr == 0) and (pa_iova_pfn == NULL), any field in
> + * Usage of page_array:
> + * We expect (pa_nr == 0) and (pa_iova == NULL), any field in
> * this structure will be filled in by this function.
> *
> * Returns:
> - * 0 if PFNs are allocated
> - * -EINVAL if pa->pa_nr is not initially zero, or pa->pa_iova_pfn
> is not NULL
> + * 0 if page array is allocated
> + * -EINVAL if pa->pa_nr is not initially zero, or pa->pa_iova is
> not NULL
> * -ENOMEM if alloc failed
> */
> -static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned
> int len)
> +static int page_array_alloc(struct page_array *pa, u64 iova,
> unsigned int len)
> {
> int i;
>
> - if (pa->pa_nr || pa->pa_iova_pfn)
> + if (pa->pa_nr || pa->pa_iova)
> return -EINVAL;
>
> - pa->pa_iova = iova;
> -
> pa->pa_nr = ((iova & ~PAGE_MASK) + len + (PAGE_SIZE - 1)) >>
> PAGE_SHIFT;
> if (!pa->pa_nr)
> return -EINVAL;
>
> - pa->pa_iova_pfn = kcalloc(pa->pa_nr,
> - sizeof(*pa->pa_iova_pfn) +
> - sizeof(*pa->pa_pfn),
> - GFP_KERNEL);
> - if (unlikely(!pa->pa_iova_pfn)) {
> + pa->pa_iova = kcalloc(pa->pa_nr,
> + sizeof(*pa->pa_iova) + sizeof(*pa-
> >pa_pfn),
> + GFP_KERNEL);
> + if (unlikely(!pa->pa_iova)) {
> pa->pa_nr = 0;
> return -ENOMEM;
> }
> - pa->pa_pfn = pa->pa_iova_pfn + pa->pa_nr;
> + pa->pa_pfn = (unsigned long *)&pa->pa_iova[pa->pa_nr];
>
> - pa->pa_iova_pfn[0] = pa->pa_iova >> PAGE_SHIFT;
> + pa->pa_iova[0] = iova;
> pa->pa_pfn[0] = -1ULL;
> for (i = 1; i < pa->pa_nr; i++) {
> - pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;
> + pa->pa_iova[i] = pa->pa_iova[i - 1] + PAGE_SIZE;
> pa->pa_pfn[i] = -1ULL;
> }
>
> @@ -91,30 +86,30 @@ static int pfn_array_alloc(struct pfn_array *pa,
> u64 iova, unsigned int len)
> }
>
> /*
> - * pfn_array_unpin() - Unpin user pages in memory
> - * @pa: pfn_array on which to perform the operation
> + * page_array_unpin() - Unpin user pages in memory
> + * @pa: page_array on which to perform the operation
> * @vdev: the vfio device to perform the operation
> * @pa_nr: number of user pages to unpin
> *
> * Only unpin if any pages were pinned to begin with, i.e. pa_nr >
> 0,
> * otherwise only clear pa->pa_nr
> */
> -static void pfn_array_unpin(struct pfn_array *pa,
> - struct vfio_device *vdev, int pa_nr)
> +static void page_array_unpin(struct page_array *pa,
> + struct vfio_device *vdev, int pa_nr)
> {
> int unpinned = 0, npage = 1;
>
> while (unpinned < pa_nr) {
> - unsigned long *first = &pa->pa_iova_pfn[unpinned];
> - unsigned long *last = &first[npage];
> + dma_addr_t *first = &pa->pa_iova[unpinned];
> + dma_addr_t *last = &first[npage];
>
> if (unpinned + npage < pa_nr &&
> - *first + npage == *last) {
> + *first + npage * PAGE_SIZE == *last) {
> npage++;
> continue;
> }
>
> - vfio_unpin_pages(vdev, *first << PAGE_SHIFT, npage);
> + vfio_unpin_pages(vdev, *first, npage);
> unpinned += npage;
> npage = 1;
> }
> @@ -123,30 +118,30 @@ static void pfn_array_unpin(struct pfn_array
> *pa,
> }
>
> /*
> - * pfn_array_pin() - Pin user pages in memory
> - * @pa: pfn_array on which to perform the operation
> + * page_array_pin() - Pin user pages in memory
> + * @pa: page_array on which to perform the operation
> * @mdev: the mediated device to perform pin operations
> *
> * Returns number of pages pinned upon success.
> * If the pin request partially succeeds, or fails completely,
> * all pages are left unpinned and a negative error value is
> returned.
> */
> -static int pfn_array_pin(struct pfn_array *pa, struct vfio_device
> *vdev)
> +static int page_array_pin(struct page_array *pa, struct vfio_device
> *vdev)
> {
> int pinned = 0, npage = 1;
> int ret = 0;
>
> while (pinned < pa->pa_nr) {
> - unsigned long *first = &pa->pa_iova_pfn[pinned];
> - unsigned long *last = &first[npage];
> + dma_addr_t *first = &pa->pa_iova[pinned];
> + dma_addr_t *last = &first[npage];
>
> if (pinned + npage < pa->pa_nr &&
> - *first + npage == *last) {
> + *first + npage * PAGE_SIZE == *last) {
> npage++;
> continue;
> }
>
> - ret = vfio_pin_pages(vdev, *first << PAGE_SHIFT, npage,
> + ret = vfio_pin_pages(vdev, *first, npage,
> IOMMU_READ | IOMMU_WRITE,
> &pa->pa_pfn[pinned]);
> if (ret < 0) {
> @@ -163,32 +158,30 @@ static int pfn_array_pin(struct pfn_array *pa,
> struct vfio_device *vdev)
> return ret;
>
> err_out:
> - pfn_array_unpin(pa, vdev, pinned);
> + page_array_unpin(pa, vdev, pinned);
> return ret;
> }
>
> /* Unpin the pages before releasing the memory. */
> -static void pfn_array_unpin_free(struct pfn_array *pa, struct
> vfio_device *vdev)
> +static void page_array_unpin_free(struct page_array *pa, struct
> vfio_device *vdev)
> {
> - pfn_array_unpin(pa, vdev, pa->pa_nr);
> - kfree(pa->pa_iova_pfn);
> + page_array_unpin(pa, vdev, pa->pa_nr);
> + kfree(pa->pa_iova);
> }
>
> -static bool pfn_array_iova_pinned(struct pfn_array *pa, unsigned
> long iova)
> +static bool page_array_iova_pinned(struct page_array *pa, unsigned
> long iova)
> {
> - unsigned long iova_pfn = iova >> PAGE_SHIFT;
> int i;
>
> for (i = 0; i < pa->pa_nr; i++)
> - if (pa->pa_iova_pfn[i] == iova_pfn)
> + if (pa->pa_iova[i] == iova)
> return true;
>
> return false;
> }
> -/* Create the list of IDAL words for a pfn_array. */
> -static inline void pfn_array_idal_create_words(
> - struct pfn_array *pa,
> - unsigned long *idaws)
> +/* Create the list of IDAL words for a page_array. */
> +static inline void page_array_idal_create_words(struct page_array
> *pa,
> + unsigned long *idaws)
> {
> int i;
>
> @@ -204,7 +197,7 @@ static inline void pfn_array_idal_create_words(
> idaws[i] = pa->pa_pfn[i] << PAGE_SHIFT;
>
> /* Adjust the first IDAW, since it may not start on a page
> boundary */
> - idaws[0] += pa->pa_iova & (PAGE_SIZE - 1);
> + idaws[0] += pa->pa_iova[0] & (PAGE_SIZE - 1);
> }
>
> static void convert_ccw0_to_ccw1(struct ccw1 *source, unsigned long
> len)
> @@ -236,18 +229,18 @@ static void convert_ccw0_to_ccw1(struct ccw1
> *source, unsigned long len)
> static long copy_from_iova(struct vfio_device *vdev, void *to, u64
> iova,
> unsigned long n)
> {
> - struct pfn_array pa = {0};
> + struct page_array pa = {0};
> u64 from;
> int i, ret;
> unsigned long l, m;
>
> - ret = pfn_array_alloc(&pa, iova, n);
> + ret = page_array_alloc(&pa, iova, n);
> if (ret < 0)
> return ret;
>
> - ret = pfn_array_pin(&pa, vdev);
> + ret = page_array_pin(&pa, vdev);
> if (ret < 0) {
> - pfn_array_unpin_free(&pa, vdev);
> + page_array_unpin_free(&pa, vdev);
> return ret;
> }
>
> @@ -268,7 +261,7 @@ static long copy_from_iova(struct vfio_device
> *vdev, void *to, u64 iova,
> break;
> }
>
> - pfn_array_unpin_free(&pa, vdev);
> + page_array_unpin_free(&pa, vdev);
>
> return l;
> }
> @@ -371,7 +364,7 @@ static struct ccwchain *ccwchain_alloc(struct
> channel_program *cp, int len)
> chain->ch_ccw = (struct ccw1 *)data;
>
> data = (u8 *)(chain->ch_ccw) + sizeof(*chain->ch_ccw) * len;
> - chain->ch_pa = (struct pfn_array *)data;
> + chain->ch_pa = (struct page_array *)data;
>
> chain->ch_len = len;
>
> @@ -555,7 +548,7 @@ static int ccwchain_fetch_direct(struct ccwchain
> *chain,
> struct vfio_device *vdev =
> &container_of(cp, struct vfio_ccw_private, cp)->vdev;
> struct ccw1 *ccw;
> - struct pfn_array *pa;
> + struct page_array *pa;
> u64 iova;
> unsigned long *idaws;
> int ret;
> @@ -589,13 +582,13 @@ static int ccwchain_fetch_direct(struct
> ccwchain *chain,
> }
>
> /*
> - * Allocate an array of pfn's for pages to pin/translate.
> + * Allocate an array of pages to pin/translate.
> * The number of pages is actually the count of the idaws
> * required for the data transfer, since we only only support
> * 4K IDAWs today.
> */
> pa = chain->ch_pa + idx;
> - ret = pfn_array_alloc(pa, iova, bytes);
> + ret = page_array_alloc(pa, iova, bytes);
> if (ret < 0)
> goto out_free_idaws;
>
> @@ -606,21 +599,21 @@ static int ccwchain_fetch_direct(struct
> ccwchain *chain,
> goto out_unpin;
>
> /*
> - * Copy guest IDAWs into pfn_array, in case the memory
> they
> + * Copy guest IDAWs into page_array, in case the memory
> they
> * occupy is not contiguous.
> */
> for (i = 0; i < idaw_nr; i++)
> - pa->pa_iova_pfn[i] = idaws[i] >> PAGE_SHIFT;
> + pa->pa_iova[i] = idaws[i];
> } else {
> /*
> - * No action is required here; the iova addresses in
> pfn_array
> - * were initialized sequentially in pfn_array_alloc()
> beginning
> + * No action is required here; the iova addresses in
> page_array
> + * were initialized sequentially in page_array_alloc()
> beginning
> * with the contents of ccw->cda.
> */
> }
>
> if (ccw_does_data_transfer(ccw)) {
> - ret = pfn_array_pin(pa, vdev);
> + ret = page_array_pin(pa, vdev);
> if (ret < 0)
> goto out_unpin;
> } else {
> @@ -630,13 +623,13 @@ static int ccwchain_fetch_direct(struct
> ccwchain *chain,
> ccw->cda = (__u32) virt_to_phys(idaws);
> ccw->flags |= CCW_FLAG_IDA;
>
> - /* Populate the IDAL with pinned/translated addresses from pfn
> */
> - pfn_array_idal_create_words(pa, idaws);
> + /* Populate the IDAL with pinned/translated addresses from page
> */
> + page_array_idal_create_words(pa, idaws);
>
> return 0;
>
> out_unpin:
> - pfn_array_unpin_free(pa, vdev);
> + page_array_unpin_free(pa, vdev);
> out_free_idaws:
> kfree(idaws);
> out_init:
> @@ -742,7 +735,7 @@ void cp_free(struct channel_program *cp)
> cp->initialized = false;
> list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next)
> {
> for (i = 0; i < chain->ch_len; i++) {
> - pfn_array_unpin_free(chain->ch_pa + i, vdev);
> + page_array_unpin_free(chain->ch_pa + i, vdev);
> ccwchain_cda_free(chain, i);
> }
> ccwchain_free(chain);
> @@ -918,7 +911,7 @@ bool cp_iova_pinned(struct channel_program *cp,
> u64 iova)
>
> list_for_each_entry(chain, &cp->ccwchain_list, next) {
> for (i = 0; i < chain->ch_len; i++)
> - if (pfn_array_iova_pinned(chain->ch_pa + i,
> iova))
> + if (page_array_iova_pinned(chain->ch_pa + i,
> iova))
> return true;
> }
>

2022-07-08 21:09:29

by Nicolin Chen

[permalink] [raw]
Subject: Re: [RFT][PATCH v2 0/9] Update vfio_pin/unpin_pages API

On Fri, Jul 08, 2022 at 04:30:32PM -0400, Eric Farman wrote:
> External email: Use caution opening links or attachments
>
>
> On Tue, 2022-07-05 at 23:27 -0700, Nicolin Chen wrote:
> > This is a preparatory series for IOMMUFD v2 patches. It prepares for
> > replacing vfio_iommu_type1 implementations of vfio_pin/unpin_pages()
> > with IOMMUFD version.
> >
> > There's a gap between these two versions: the vfio_iommu_type1
> > version
> > inputs a non-contiguous PFN list and outputs another PFN list for the
> > pinned physical page list, while the IOMMUFD version only supports a
> > contiguous address input by accepting the starting IO virtual address
> > of a set of pages to pin and by outputting to a physical page list.
> >
> > The nature of existing callers mostly aligns with the IOMMUFD
> > version,
> > except s390's vfio_ccw_cp code where some additional change is needed
> > along with this series. Overall, updating to "iova" and "phys_page"
> > does improve the caller side to some extent.
> >
> > Also fix a misuse of physical address and virtual address in the
> > s390's
> > crypto code. And update the input naming at the adjacent
> > vfio_dma_rw().
> >
> > This is on github:
> > https://github.com/nicolinc/iommufd/commits/vfio_pin_pages
> >
> > Request for testing: I only did build for s390 and i915 code, so it'd
> > be nice to have people who have environment to run sanity
> > accordingly.
>
> Tested-by: Eric Farman <[email protected]> # s390

Thank you for the review and test!