This is a preparatory series for IOMMUFD v2 patches. It prepares for
replacing vfio_iommu_type1 implementations of vfio_pin/unpin_pages()
with IOMMUFD version.
There's a gap between these two versions: the vfio_iommu_type1 version
inputs a non-contiguous PFN list and outputs another PFN list for the
pinned physical page list, while the IOMMUFD version only supports a
contiguous address input by accepting the starting IO virtual address
of a set of pages to pin and by outputting to a physical page list.
The nature of existing callers mostly aligns with the IOMMUFD version,
except s390's vfio_ccw_cp code where some additional change is needed
along with this series. Overall, updating to "iova" and "phys_page"
does improve the caller side to some extent.
Also fix a misuse of physical address and virtual address in the s390's
crypto code. And update the input naming at the adjacent vfio_dma_rw().
This is on github:
https://github.com/nicolinc/iommufd/commits/vfio_pin_pages
Terrence has tested this series on i915; Eric has tested on s390.
Thanks!
Changelog
v3:
* Added a patch to replace roundup with DIV_ROUND_UP in i915 gvt
* Dropped the "driver->ops->unpin_pages" and NULL checks in PATCH-1
* Changed to use WARN_ON and separate into lines in PATCH-1
* Replaced "guest" words with "user" and fix typo in PATCH-5
* Updated commit log of PATCH-1, PATCH-6, and PATCH-10
* Added Reviewed/Acked-by from Christoph, Jason, Kirti, Kevin and Eric
* Added Tested-by from Terrence (i915) and Eric (s390)
v2: https://lore.kernel.org/kvm/[email protected]/
* Added a patch to make vfio_unpin_pages return void
* Added two patches to remove PFN list from two s390 callers
* Renamed "phys_page" parameter to "pages" for vfio_pin_pages
* Updated commit log of kmap_local_page() patch
* Added Harald's "Reviewed-by" to pa_ind patch
* Rebased on top of Alex's extern removal path
v1: https://lore.kernel.org/kvm/[email protected]/
Nicolin Chen (10):
vfio: Make vfio_unpin_pages() return void
drm/i915/gvt: Replace roundup with DIV_ROUND_UP
vfio/ap: Pass in physical address of ind to ap_aqic()
vfio/ccw: Only pass in contiguous pages
vfio: Pass in starting IOVA to vfio_pin/unpin_pages API
vfio/ap: Change saved_pfn to saved_iova
vfio/ccw: Change pa_pfn list to pa_iova list
vfio: Rename user_iova of vfio_dma_rw()
vfio/ccw: Add kmap_local_page() for memcpy
vfio: Replace phys_pfn with pages for vfio_pin_pages()
.../driver-api/vfio-mediated-device.rst | 6 +-
arch/s390/include/asm/ap.h | 6 +-
drivers/gpu/drm/i915/gvt/kvmgt.c | 49 ++---
drivers/s390/cio/vfio_ccw_cp.c | 195 +++++++++++-------
drivers/s390/crypto/ap_queue.c | 2 +-
drivers/s390/crypto/vfio_ap_ops.c | 54 +++--
drivers/s390/crypto/vfio_ap_private.h | 4 +-
drivers/vfio/vfio.c | 54 ++---
drivers/vfio/vfio.h | 8 +-
drivers/vfio/vfio_iommu_type1.c | 45 ++--
include/linux/vfio.h | 9 +-
11 files changed, 215 insertions(+), 217 deletions(-)
--
2.17.1
The vfio_ap_ops code maintains both nib address and its PFN, which
is redundant, merely because vfio_pin/unpin_pages API wanted pfn.
Since vfio_pin/unpin_pages() now accept "iova", change "saved_pfn"
to "saved_iova" and remove pfn in the vfio_ap_validate_nib().
Reviewed-by: Jason Gunthorpe <[email protected]>
Tested-by: Eric Farman <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/s390/crypto/vfio_ap_ops.c | 42 +++++++++++----------------
drivers/s390/crypto/vfio_ap_private.h | 4 +--
2 files changed, 19 insertions(+), 27 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 8a2018ab3cf0..e8856a7e151c 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -112,7 +112,7 @@ static void vfio_ap_wait_for_irqclear(int apqn)
*
* Unregisters the ISC in the GIB when the saved ISC not invalid.
* Unpins the guest's page holding the NIB when it exists.
- * Resets the saved_pfn and saved_isc to invalid values.
+ * Resets the saved_iova and saved_isc to invalid values.
*/
static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
{
@@ -123,9 +123,9 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc);
q->saved_isc = VFIO_AP_ISC_INVALID;
}
- if (q->saved_pfn && !WARN_ON(!q->matrix_mdev)) {
- vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_pfn << PAGE_SHIFT, 1);
- q->saved_pfn = 0;
+ if (q->saved_iova && !WARN_ON(!q->matrix_mdev)) {
+ vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_iova, 1);
+ q->saved_iova = 0;
}
}
@@ -189,27 +189,19 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
*
* @vcpu: the object representing the vcpu executing the PQAP(AQIC) instruction.
* @nib: the location for storing the nib address.
- * @g_pfn: the location for storing the page frame number of the page containing
- * the nib.
*
* When the PQAP(AQIC) instruction is executed, general register 2 contains the
* address of the notification indicator byte (nib) used for IRQ notification.
- * This function parses the nib from gr2 and calculates the page frame
- * number for the guest of the page containing the nib. The values are
- * stored in @nib and @g_pfn respectively.
- *
- * The g_pfn of the nib is then validated to ensure the nib address is valid.
+ * This function parses and validates the nib from gr2.
*
* Return: returns zero if the nib address is a valid; otherwise, returns
* -EINVAL.
*/
-static int vfio_ap_validate_nib(struct kvm_vcpu *vcpu, unsigned long *nib,
- unsigned long *g_pfn)
+static int vfio_ap_validate_nib(struct kvm_vcpu *vcpu, dma_addr_t *nib)
{
*nib = vcpu->run->s.regs.gprs[2];
- *g_pfn = *nib >> PAGE_SHIFT;
- if (kvm_is_error_hva(gfn_to_hva(vcpu->kvm, *g_pfn)))
+ if (kvm_is_error_hva(gfn_to_hva(vcpu->kvm, *nib >> PAGE_SHIFT)))
return -EINVAL;
return 0;
@@ -239,34 +231,34 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
int isc,
struct kvm_vcpu *vcpu)
{
- unsigned long nib;
struct ap_qirq_ctrl aqic_gisa = {};
struct ap_queue_status status = {};
struct kvm_s390_gisa *gisa;
int nisc;
struct kvm *kvm;
- unsigned long g_pfn, h_pfn;
+ unsigned long h_pfn;
phys_addr_t h_nib;
+ dma_addr_t nib;
int ret;
/* Verify that the notification indicator byte address is valid */
- if (vfio_ap_validate_nib(vcpu, &nib, &g_pfn)) {
- VFIO_AP_DBF_WARN("%s: invalid NIB address: nib=%#lx, g_pfn=%#lx, apqn=%#04x\n",
- __func__, nib, g_pfn, q->apqn);
+ if (vfio_ap_validate_nib(vcpu, &nib)) {
+ VFIO_AP_DBF_WARN("%s: invalid NIB address: nib=%pad, apqn=%#04x\n",
+ __func__, &nib, q->apqn);
status.response_code = AP_RESPONSE_INVALID_ADDRESS;
return status;
}
- ret = vfio_pin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1,
+ ret = vfio_pin_pages(&q->matrix_mdev->vdev, nib, 1,
IOMMU_READ | IOMMU_WRITE, &h_pfn);
switch (ret) {
case 1:
break;
default:
VFIO_AP_DBF_WARN("%s: vfio_pin_pages failed: rc=%d,"
- "nib=%#lx, g_pfn=%#lx, apqn=%#04x\n",
- __func__, ret, nib, g_pfn, q->apqn);
+ "nib=%pad, apqn=%#04x\n",
+ __func__, ret, &nib, q->apqn);
status.response_code = AP_RESPONSE_INVALID_ADDRESS;
return status;
@@ -296,12 +288,12 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
case AP_RESPONSE_NORMAL:
/* See if we did clear older IRQ configuration */
vfio_ap_free_aqic_resources(q);
- q->saved_pfn = g_pfn;
+ q->saved_iova = nib;
q->saved_isc = isc;
break;
case AP_RESPONSE_OTHERWISE_CHANGED:
/* We could not modify IRQ setings: clear new configuration */
- vfio_unpin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1);
+ vfio_unpin_pages(&q->matrix_mdev->vdev, nib, 1);
kvm_s390_gisc_unregister(kvm, isc);
break;
default:
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index a26efd804d0d..479b205179bd 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -102,13 +102,13 @@ struct ap_matrix_mdev {
* struct vfio_ap_queue - contains the data associated with a queue bound to the
* vfio_ap device driver
* @matrix_mdev: the matrix mediated device
- * @saved_pfn: the guest PFN pinned for the guest
+ * @saved_iova: the notification indicator byte (nib) address
* @apqn: the APQN of the AP queue device
* @saved_isc: the guest ISC registered with the GIB interface
*/
struct vfio_ap_queue {
struct ap_matrix_mdev *matrix_mdev;
- unsigned long saved_pfn;
+ dma_addr_t saved_iova;
int apqn;
#define VFIO_AP_ISC_INVALID 0xff
unsigned char saved_isc;
--
2.17.1
The vfio_ccw_cp code maintains both iova and its PFN list because the
vfio_pin/unpin_pages API wanted pfn list. Since vfio_pin/unpin_pages()
now accept "iova", change to maintain only pa_iova list and rename all
"pfn_array" strings to "page_array", so as to simplify the code.
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Eric Farman <[email protected]>
Tested-by: Eric Farman <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/s390/cio/vfio_ccw_cp.c | 135 ++++++++++++++++-----------------
1 file changed, 64 insertions(+), 71 deletions(-)
diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index a739262f988d..3854c3d573f5 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -18,11 +18,9 @@
#include "vfio_ccw_cp.h"
#include "vfio_ccw_private.h"
-struct pfn_array {
- /* Starting guest physical I/O address. */
- unsigned long pa_iova;
- /* Array that stores PFNs of the pages need to pin. */
- unsigned long *pa_iova_pfn;
+struct page_array {
+ /* Array that stores pages need to pin. */
+ dma_addr_t *pa_iova;
/* Array that receives PFNs of the pages pinned. */
unsigned long *pa_pfn;
/* Number of pages pinned from @pa_iova. */
@@ -37,53 +35,50 @@ struct ccwchain {
/* Count of the valid ccws in chain. */
int ch_len;
/* Pinned PAGEs for the original data. */
- struct pfn_array *ch_pa;
+ struct page_array *ch_pa;
};
/*
- * pfn_array_alloc() - alloc memory for PFNs
- * @pa: pfn_array on which to perform the operation
+ * page_array_alloc() - alloc memory for page array
+ * @pa: page_array on which to perform the operation
* @iova: target guest physical address
* @len: number of bytes that should be pinned from @iova
*
- * Attempt to allocate memory for PFNs.
+ * Attempt to allocate memory for page array.
*
- * Usage of pfn_array:
- * We expect (pa_nr == 0) and (pa_iova_pfn == NULL), any field in
+ * Usage of page_array:
+ * We expect (pa_nr == 0) and (pa_iova == NULL), any field in
* this structure will be filled in by this function.
*
* Returns:
- * 0 if PFNs are allocated
- * -EINVAL if pa->pa_nr is not initially zero, or pa->pa_iova_pfn is not NULL
+ * 0 if page array is allocated
+ * -EINVAL if pa->pa_nr is not initially zero, or pa->pa_iova is not NULL
* -ENOMEM if alloc failed
*/
-static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
+static int page_array_alloc(struct page_array *pa, u64 iova, unsigned int len)
{
int i;
- if (pa->pa_nr || pa->pa_iova_pfn)
+ if (pa->pa_nr || pa->pa_iova)
return -EINVAL;
- pa->pa_iova = iova;
-
pa->pa_nr = ((iova & ~PAGE_MASK) + len + (PAGE_SIZE - 1)) >> PAGE_SHIFT;
if (!pa->pa_nr)
return -EINVAL;
- pa->pa_iova_pfn = kcalloc(pa->pa_nr,
- sizeof(*pa->pa_iova_pfn) +
- sizeof(*pa->pa_pfn),
- GFP_KERNEL);
- if (unlikely(!pa->pa_iova_pfn)) {
+ pa->pa_iova = kcalloc(pa->pa_nr,
+ sizeof(*pa->pa_iova) + sizeof(*pa->pa_pfn),
+ GFP_KERNEL);
+ if (unlikely(!pa->pa_iova)) {
pa->pa_nr = 0;
return -ENOMEM;
}
- pa->pa_pfn = pa->pa_iova_pfn + pa->pa_nr;
+ pa->pa_pfn = (unsigned long *)&pa->pa_iova[pa->pa_nr];
- pa->pa_iova_pfn[0] = pa->pa_iova >> PAGE_SHIFT;
+ pa->pa_iova[0] = iova;
pa->pa_pfn[0] = -1ULL;
for (i = 1; i < pa->pa_nr; i++) {
- pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;
+ pa->pa_iova[i] = pa->pa_iova[i - 1] + PAGE_SIZE;
pa->pa_pfn[i] = -1ULL;
}
@@ -91,30 +86,30 @@ static int pfn_array_alloc(struct pfn_array *pa, u64 iova, unsigned int len)
}
/*
- * pfn_array_unpin() - Unpin user pages in memory
- * @pa: pfn_array on which to perform the operation
+ * page_array_unpin() - Unpin user pages in memory
+ * @pa: page_array on which to perform the operation
* @vdev: the vfio device to perform the operation
* @pa_nr: number of user pages to unpin
*
* Only unpin if any pages were pinned to begin with, i.e. pa_nr > 0,
* otherwise only clear pa->pa_nr
*/
-static void pfn_array_unpin(struct pfn_array *pa,
- struct vfio_device *vdev, int pa_nr)
+static void page_array_unpin(struct page_array *pa,
+ struct vfio_device *vdev, int pa_nr)
{
int unpinned = 0, npage = 1;
while (unpinned < pa_nr) {
- unsigned long *first = &pa->pa_iova_pfn[unpinned];
- unsigned long *last = &first[npage];
+ dma_addr_t *first = &pa->pa_iova[unpinned];
+ dma_addr_t *last = &first[npage];
if (unpinned + npage < pa_nr &&
- *first + npage == *last) {
+ *first + npage * PAGE_SIZE == *last) {
npage++;
continue;
}
- vfio_unpin_pages(vdev, *first << PAGE_SHIFT, npage);
+ vfio_unpin_pages(vdev, *first, npage);
unpinned += npage;
npage = 1;
}
@@ -123,30 +118,30 @@ static void pfn_array_unpin(struct pfn_array *pa,
}
/*
- * pfn_array_pin() - Pin user pages in memory
- * @pa: pfn_array on which to perform the operation
+ * page_array_pin() - Pin user pages in memory
+ * @pa: page_array on which to perform the operation
* @mdev: the mediated device to perform pin operations
*
* Returns number of pages pinned upon success.
* If the pin request partially succeeds, or fails completely,
* all pages are left unpinned and a negative error value is returned.
*/
-static int pfn_array_pin(struct pfn_array *pa, struct vfio_device *vdev)
+static int page_array_pin(struct page_array *pa, struct vfio_device *vdev)
{
int pinned = 0, npage = 1;
int ret = 0;
while (pinned < pa->pa_nr) {
- unsigned long *first = &pa->pa_iova_pfn[pinned];
- unsigned long *last = &first[npage];
+ dma_addr_t *first = &pa->pa_iova[pinned];
+ dma_addr_t *last = &first[npage];
if (pinned + npage < pa->pa_nr &&
- *first + npage == *last) {
+ *first + npage * PAGE_SIZE == *last) {
npage++;
continue;
}
- ret = vfio_pin_pages(vdev, *first << PAGE_SHIFT, npage,
+ ret = vfio_pin_pages(vdev, *first, npage,
IOMMU_READ | IOMMU_WRITE,
&pa->pa_pfn[pinned]);
if (ret < 0) {
@@ -163,32 +158,30 @@ static int pfn_array_pin(struct pfn_array *pa, struct vfio_device *vdev)
return ret;
err_out:
- pfn_array_unpin(pa, vdev, pinned);
+ page_array_unpin(pa, vdev, pinned);
return ret;
}
/* Unpin the pages before releasing the memory. */
-static void pfn_array_unpin_free(struct pfn_array *pa, struct vfio_device *vdev)
+static void page_array_unpin_free(struct page_array *pa, struct vfio_device *vdev)
{
- pfn_array_unpin(pa, vdev, pa->pa_nr);
- kfree(pa->pa_iova_pfn);
+ page_array_unpin(pa, vdev, pa->pa_nr);
+ kfree(pa->pa_iova);
}
-static bool pfn_array_iova_pinned(struct pfn_array *pa, unsigned long iova)
+static bool page_array_iova_pinned(struct page_array *pa, unsigned long iova)
{
- unsigned long iova_pfn = iova >> PAGE_SHIFT;
int i;
for (i = 0; i < pa->pa_nr; i++)
- if (pa->pa_iova_pfn[i] == iova_pfn)
+ if (pa->pa_iova[i] == iova)
return true;
return false;
}
-/* Create the list of IDAL words for a pfn_array. */
-static inline void pfn_array_idal_create_words(
- struct pfn_array *pa,
- unsigned long *idaws)
+/* Create the list of IDAL words for a page_array. */
+static inline void page_array_idal_create_words(struct page_array *pa,
+ unsigned long *idaws)
{
int i;
@@ -204,7 +197,7 @@ static inline void pfn_array_idal_create_words(
idaws[i] = pa->pa_pfn[i] << PAGE_SHIFT;
/* Adjust the first IDAW, since it may not start on a page boundary */
- idaws[0] += pa->pa_iova & (PAGE_SIZE - 1);
+ idaws[0] += pa->pa_iova[0] & (PAGE_SIZE - 1);
}
static void convert_ccw0_to_ccw1(struct ccw1 *source, unsigned long len)
@@ -236,18 +229,18 @@ static void convert_ccw0_to_ccw1(struct ccw1 *source, unsigned long len)
static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,
unsigned long n)
{
- struct pfn_array pa = {0};
+ struct page_array pa = {0};
u64 from;
int i, ret;
unsigned long l, m;
- ret = pfn_array_alloc(&pa, iova, n);
+ ret = page_array_alloc(&pa, iova, n);
if (ret < 0)
return ret;
- ret = pfn_array_pin(&pa, vdev);
+ ret = page_array_pin(&pa, vdev);
if (ret < 0) {
- pfn_array_unpin_free(&pa, vdev);
+ page_array_unpin_free(&pa, vdev);
return ret;
}
@@ -268,7 +261,7 @@ static long copy_from_iova(struct vfio_device *vdev, void *to, u64 iova,
break;
}
- pfn_array_unpin_free(&pa, vdev);
+ page_array_unpin_free(&pa, vdev);
return l;
}
@@ -371,7 +364,7 @@ static struct ccwchain *ccwchain_alloc(struct channel_program *cp, int len)
chain->ch_ccw = (struct ccw1 *)data;
data = (u8 *)(chain->ch_ccw) + sizeof(*chain->ch_ccw) * len;
- chain->ch_pa = (struct pfn_array *)data;
+ chain->ch_pa = (struct page_array *)data;
chain->ch_len = len;
@@ -555,7 +548,7 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
struct vfio_device *vdev =
&container_of(cp, struct vfio_ccw_private, cp)->vdev;
struct ccw1 *ccw;
- struct pfn_array *pa;
+ struct page_array *pa;
u64 iova;
unsigned long *idaws;
int ret;
@@ -589,13 +582,13 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
}
/*
- * Allocate an array of pfn's for pages to pin/translate.
+ * Allocate an array of pages to pin/translate.
* The number of pages is actually the count of the idaws
* required for the data transfer, since we only only support
* 4K IDAWs today.
*/
pa = chain->ch_pa + idx;
- ret = pfn_array_alloc(pa, iova, bytes);
+ ret = page_array_alloc(pa, iova, bytes);
if (ret < 0)
goto out_free_idaws;
@@ -606,21 +599,21 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
goto out_unpin;
/*
- * Copy guest IDAWs into pfn_array, in case the memory they
+ * Copy guest IDAWs into page_array, in case the memory they
* occupy is not contiguous.
*/
for (i = 0; i < idaw_nr; i++)
- pa->pa_iova_pfn[i] = idaws[i] >> PAGE_SHIFT;
+ pa->pa_iova[i] = idaws[i];
} else {
/*
- * No action is required here; the iova addresses in pfn_array
- * were initialized sequentially in pfn_array_alloc() beginning
+ * No action is required here; the iova addresses in page_array
+ * were initialized sequentially in page_array_alloc() beginning
* with the contents of ccw->cda.
*/
}
if (ccw_does_data_transfer(ccw)) {
- ret = pfn_array_pin(pa, vdev);
+ ret = page_array_pin(pa, vdev);
if (ret < 0)
goto out_unpin;
} else {
@@ -630,13 +623,13 @@ static int ccwchain_fetch_direct(struct ccwchain *chain,
ccw->cda = (__u32) virt_to_phys(idaws);
ccw->flags |= CCW_FLAG_IDA;
- /* Populate the IDAL with pinned/translated addresses from pfn */
- pfn_array_idal_create_words(pa, idaws);
+ /* Populate the IDAL with pinned/translated addresses from page */
+ page_array_idal_create_words(pa, idaws);
return 0;
out_unpin:
- pfn_array_unpin_free(pa, vdev);
+ page_array_unpin_free(pa, vdev);
out_free_idaws:
kfree(idaws);
out_init:
@@ -742,7 +735,7 @@ void cp_free(struct channel_program *cp)
cp->initialized = false;
list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next) {
for (i = 0; i < chain->ch_len; i++) {
- pfn_array_unpin_free(chain->ch_pa + i, vdev);
+ page_array_unpin_free(chain->ch_pa + i, vdev);
ccwchain_cda_free(chain, i);
}
ccwchain_free(chain);
@@ -918,7 +911,7 @@ bool cp_iova_pinned(struct channel_program *cp, u64 iova)
list_for_each_entry(chain, &cp->ccwchain_list, next) {
for (i = 0; i < chain->ch_len; i++)
- if (pfn_array_iova_pinned(chain->ch_pa + i, iova))
+ if (page_array_iova_pinned(chain->ch_pa + i, iova))
return true;
}
--
2.17.1
The vfio_pin/unpin_pages() so far accepted arrays of PFNs of user IOVA.
Among all three callers, there was only one caller possibly passing in
a non-contiguous PFN list, which is now ensured to have contiguous PFN
inputs too.
Pass in the starting address with "iova" alone to simplify things, so
callers no longer need to maintain a PFN list or to pin/unpin one page
at a time. This also allows VFIO to use more efficient implementations
of pin/unpin_pages.
For now, also update vfio_iommu_type1 to fit this new parameter too,
while keeping its input intact (being user_iova) since we don't want
to spend too much effort swapping its parameters and local variables
at that level.
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed by: Kirti Wankhede <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Acked-by: Eric Farman <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Tested-by: Eric Farman <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
.../driver-api/vfio-mediated-device.rst | 4 +--
drivers/gpu/drm/i915/gvt/kvmgt.c | 22 ++++++---------
drivers/s390/cio/vfio_ccw_cp.c | 4 +--
drivers/s390/crypto/vfio_ap_ops.c | 9 +++----
drivers/vfio/vfio.c | 27 +++++++++----------
drivers/vfio/vfio.h | 4 +--
drivers/vfio/vfio_iommu_type1.c | 15 +++++------
include/linux/vfio.h | 5 ++--
8 files changed, 39 insertions(+), 51 deletions(-)
diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
index 4307421dcaa0..af31eaf836e8 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -260,10 +260,10 @@ Translation APIs for Mediated Devices
The following APIs are provided for translating user pfn to host pfn in a VFIO
driver::
- int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
+ int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
int npage, int prot, unsigned long *phys_pfn);
- void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
+ void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
int npage);
These functions call back into the back-end IOMMU module by using the pin_pages
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 7ce7b09aa5b2..d3ac8383d759 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -231,14 +231,8 @@ static void intel_gvt_cleanup_vgpu_type_groups(struct intel_gvt *gvt)
static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
unsigned long size)
{
- int total_pages = DIV_ROUND_UP(size, PAGE_SIZE);
- int npage;
-
- for (npage = 0; npage < total_pages; npage++) {
- unsigned long cur_gfn = gfn + npage;
-
- vfio_unpin_pages(&vgpu->vfio_device, &cur_gfn, 1);
- }
+ vfio_unpin_pages(&vgpu->vfio_device, gfn << PAGE_SHIFT,
+ DIV_ROUND_UP(size, PAGE_SIZE));
}
/* Pin a normal or compound guest page for dma. */
@@ -255,14 +249,14 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
* on stack to hold pfns.
*/
for (npage = 0; npage < total_pages; npage++) {
- unsigned long cur_gfn = gfn + npage;
+ dma_addr_t cur_iova = (gfn + npage) << PAGE_SHIFT;
unsigned long pfn;
- ret = vfio_pin_pages(&vgpu->vfio_device, &cur_gfn, 1,
+ ret = vfio_pin_pages(&vgpu->vfio_device, cur_iova, 1,
IOMMU_READ | IOMMU_WRITE, &pfn);
if (ret != 1) {
- gvt_vgpu_err("vfio_pin_pages failed for gfn 0x%lx, ret %d\n",
- cur_gfn, ret);
+ gvt_vgpu_err("vfio_pin_pages failed for iova %pad, ret %d\n",
+ &cur_iova, ret);
goto err;
}
@@ -306,7 +300,7 @@ static int gvt_dma_map_page(struct intel_vgpu *vgpu, unsigned long gfn,
if (dma_mapping_error(dev, *dma_addr)) {
gvt_vgpu_err("DMA mapping failed for pfn 0x%lx, ret %d\n",
page_to_pfn(page), ret);
- gvt_unpin_guest_page(vgpu, gfn, size);
+ gvt_unpin_guest_page(vgpu, gfn << PAGE_SHIFT, size);
return -ENOMEM;
}
@@ -319,7 +313,7 @@ static void gvt_dma_unmap_page(struct intel_vgpu *vgpu, unsigned long gfn,
struct device *dev = vgpu->gvt->gt->i915->drm.dev;
dma_unmap_page(dev, dma_addr, size, DMA_BIDIRECTIONAL);
- gvt_unpin_guest_page(vgpu, gfn, size);
+ gvt_unpin_guest_page(vgpu, gfn << PAGE_SHIFT, size);
}
static struct gvt_dma *__gvt_cache_find_dma_addr(struct intel_vgpu *vgpu,
diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 3b94863ad24e..a739262f988d 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -114,7 +114,7 @@ static void pfn_array_unpin(struct pfn_array *pa,
continue;
}
- vfio_unpin_pages(vdev, first, npage);
+ vfio_unpin_pages(vdev, *first << PAGE_SHIFT, npage);
unpinned += npage;
npage = 1;
}
@@ -146,7 +146,7 @@ static int pfn_array_pin(struct pfn_array *pa, struct vfio_device *vdev)
continue;
}
- ret = vfio_pin_pages(vdev, first, npage,
+ ret = vfio_pin_pages(vdev, *first << PAGE_SHIFT, npage,
IOMMU_READ | IOMMU_WRITE,
&pa->pa_pfn[pinned]);
if (ret < 0) {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index bb869b28cebd..8a2018ab3cf0 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -124,7 +124,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
q->saved_isc = VFIO_AP_ISC_INVALID;
}
if (q->saved_pfn && !WARN_ON(!q->matrix_mdev)) {
- vfio_unpin_pages(&q->matrix_mdev->vdev, &q->saved_pfn, 1);
+ vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_pfn << PAGE_SHIFT, 1);
q->saved_pfn = 0;
}
}
@@ -258,7 +258,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
return status;
}
- ret = vfio_pin_pages(&q->matrix_mdev->vdev, &g_pfn, 1,
+ ret = vfio_pin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1,
IOMMU_READ | IOMMU_WRITE, &h_pfn);
switch (ret) {
case 1:
@@ -301,7 +301,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
break;
case AP_RESPONSE_OTHERWISE_CHANGED:
/* We could not modify IRQ setings: clear new configuration */
- vfio_unpin_pages(&q->matrix_mdev->vdev, &g_pfn, 1);
+ vfio_unpin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1);
kvm_s390_gisc_unregister(kvm, isc);
break;
default:
@@ -1248,9 +1248,8 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
struct vfio_iommu_type1_dma_unmap *unmap = data;
- unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
- vfio_unpin_pages(&matrix_mdev->vdev, &g_pfn, 1);
+ vfio_unpin_pages(&matrix_mdev->vdev, unmap->iova, 1);
return NOTIFY_OK;
}
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index ad90adbfddc8..ee4d45c2f210 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1915,17 +1915,17 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
/*
- * Pin a set of guest PFNs and return their associated host PFNs for local
+ * Pin contiguous user pages and return their associated host pages for local
* domain only.
* @device [in] : device
- * @user_pfn [in]: array of user/guest PFNs to be pinned.
- * @npage [in] : count of elements in user_pfn array. This count should not
- * be greater VFIO_PIN_PAGES_MAX_ENTRIES.
+ * @iova [in] : starting IOVA of user pages to be pinned.
+ * @npage [in] : count of pages to be pinned. This count should not
+ * be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
* @prot [in] : protection flags
* @phys_pfn[out]: array of host PFNs
* Return error or number of pages pinned.
*/
-int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
+int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
int npage, int prot, unsigned long *phys_pfn)
{
struct vfio_container *container;
@@ -1933,8 +1933,7 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
struct vfio_iommu_driver *driver;
int ret;
- if (!user_pfn || !phys_pfn || !npage ||
- !vfio_assert_device_open(device))
+ if (!phys_pfn || !npage || !vfio_assert_device_open(device))
return -EINVAL;
if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
@@ -1948,7 +1947,7 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
driver = container->iommu_driver;
if (likely(driver && driver->ops->pin_pages))
ret = driver->ops->pin_pages(container->iommu_data,
- group->iommu_group, user_pfn,
+ group->iommu_group, iova,
npage, prot, phys_pfn);
else
ret = -ENOTTY;
@@ -1958,15 +1957,13 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
EXPORT_SYMBOL(vfio_pin_pages);
/*
- * Unpin set of host PFNs for local domain only.
+ * Unpin contiguous host pages for local domain only.
* @device [in] : device
- * @user_pfn [in]: array of user/guest PFNs to be unpinned. Number of user/guest
- * PFNs should not be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
- * @npage [in] : count of elements in user_pfn array. This count should not
+ * @iova [in] : starting address of user pages to be unpinned.
+ * @npage [in] : count of pages to be unpinned. This count should not
* be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
*/
-void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
- int npage)
+void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage)
{
struct vfio_container *container;
struct vfio_iommu_driver *driver;
@@ -1981,7 +1978,7 @@ void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
container = device->group->container;
driver = container->iommu_driver;
- driver->ops->unpin_pages(container->iommu_data, user_pfn, npage);
+ driver->ops->unpin_pages(container->iommu_data, iova, npage);
}
EXPORT_SYMBOL(vfio_unpin_pages);
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index bef4edf58138..dbcd0e8c031b 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -50,11 +50,11 @@ struct vfio_iommu_driver_ops {
struct iommu_group *group);
int (*pin_pages)(void *iommu_data,
struct iommu_group *group,
- unsigned long *user_pfn,
+ dma_addr_t user_iova,
int npage, int prot,
unsigned long *phys_pfn);
void (*unpin_pages)(void *iommu_data,
- unsigned long *user_pfn, int npage);
+ dma_addr_t user_iova, int npage);
int (*register_notifier)(void *iommu_data,
unsigned long *events,
struct notifier_block *nb);
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index cfeea4efd625..c5ff9970edc4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -828,7 +828,7 @@ static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova,
static int vfio_iommu_type1_pin_pages(void *iommu_data,
struct iommu_group *iommu_group,
- unsigned long *user_pfn,
+ dma_addr_t user_iova,
int npage, int prot,
unsigned long *phys_pfn)
{
@@ -840,7 +840,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
bool do_accounting;
dma_addr_t iova;
- if (!iommu || !user_pfn || !phys_pfn)
+ if (!iommu || !phys_pfn)
return -EINVAL;
/* Supported for v2 version only */
@@ -856,7 +856,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
again:
if (iommu->vaddr_invalid_count) {
for (i = 0; i < npage; i++) {
- iova = user_pfn[i] << PAGE_SHIFT;
+ iova = user_iova + PAGE_SIZE * i;
ret = vfio_find_dma_valid(iommu, iova, PAGE_SIZE, &dma);
if (ret < 0)
goto pin_done;
@@ -881,7 +881,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
for (i = 0; i < npage; i++) {
struct vfio_pfn *vpfn;
- iova = user_pfn[i] << PAGE_SHIFT;
+ iova = user_iova + PAGE_SIZE * i;
dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
if (!dma) {
ret = -EINVAL;
@@ -938,7 +938,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
for (j = 0; j < i; j++) {
dma_addr_t iova;
- iova = user_pfn[j] << PAGE_SHIFT;
+ iova = user_iova + PAGE_SIZE * j;
dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
vfio_unpin_page_external(dma, iova, do_accounting);
phys_pfn[j] = 0;
@@ -949,7 +949,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
}
static void vfio_iommu_type1_unpin_pages(void *iommu_data,
- unsigned long *user_pfn, int npage)
+ dma_addr_t user_iova, int npage)
{
struct vfio_iommu *iommu = iommu_data;
bool do_accounting;
@@ -963,10 +963,9 @@ static void vfio_iommu_type1_unpin_pages(void *iommu_data,
do_accounting = list_empty(&iommu->domain_list);
for (i = 0; i < npage; i++) {
+ dma_addr_t iova = user_iova + PAGE_SIZE * i;
struct vfio_dma *dma;
- dma_addr_t iova;
- iova = user_pfn[i] << PAGE_SHIFT;
dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
if (!dma)
break;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 5348ef353029..2cefb63751f9 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -157,10 +157,9 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
#define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long))
-int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
+int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
int npage, int prot, unsigned long *phys_pfn);
-void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
- int npage);
+void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage);
int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova,
void *data, size_t len, bool write);
--
2.17.1
Following the updated vfio_pin/unpin_pages(), use the simpler "iova".
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Tested-by: Terrence Xu <[email protected]>
Tested-by: Eric Farman <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
---
drivers/vfio/vfio.c | 6 +++---
include/linux/vfio.h | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index ee4d45c2f210..153e5a7bce6a 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1993,13 +1993,13 @@ EXPORT_SYMBOL(vfio_unpin_pages);
* not a real device DMA, it is not necessary to pin the user space memory.
*
* @device [in] : VFIO device
- * @user_iova [in] : base IOVA of a user space buffer
+ * @iova [in] : base IOVA of a user space buffer
* @data [in] : pointer to kernel buffer
* @len [in] : kernel buffer length
* @write : indicate read or write
* Return error code on failure or 0 on success.
*/
-int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, void *data,
+int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data,
size_t len, bool write)
{
struct vfio_container *container;
@@ -2015,7 +2015,7 @@ int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, void *data,
if (likely(driver && driver->ops->dma_rw))
ret = driver->ops->dma_rw(container->iommu_data,
- user_iova, data, len, write);
+ iova, data, len, write);
else
ret = -ENOTTY;
return ret;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 2cefb63751f9..1d4febc4f711 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -160,7 +160,7 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
int npage, int prot, unsigned long *phys_pfn);
void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage);
-int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova,
+int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova,
void *data, size_t len, bool write);
/* each type has independent events */
--
2.17.1
Reviewed-by: Tony Krowiak <[email protected]>
On 7/8/22 6:44 PM, Nicolin Chen wrote:
> The vfio_ap_ops code maintains both nib address and its PFN, which
> is redundant, merely because vfio_pin/unpin_pages API wanted pfn.
> Since vfio_pin/unpin_pages() now accept "iova", change "saved_pfn"
> to "saved_iova" and remove pfn in the vfio_ap_validate_nib().
>
> Reviewed-by: Jason Gunthorpe <[email protected]>
> Tested-by: Eric Farman <[email protected]>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> drivers/s390/crypto/vfio_ap_ops.c | 42 +++++++++++----------------
> drivers/s390/crypto/vfio_ap_private.h | 4 +--
> 2 files changed, 19 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 8a2018ab3cf0..e8856a7e151c 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -112,7 +112,7 @@ static void vfio_ap_wait_for_irqclear(int apqn)
> *
> * Unregisters the ISC in the GIB when the saved ISC not invalid.
> * Unpins the guest's page holding the NIB when it exists.
> - * Resets the saved_pfn and saved_isc to invalid values.
> + * Resets the saved_iova and saved_isc to invalid values.
> */
> static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
> {
> @@ -123,9 +123,9 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
> kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc);
> q->saved_isc = VFIO_AP_ISC_INVALID;
> }
> - if (q->saved_pfn && !WARN_ON(!q->matrix_mdev)) {
> - vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_pfn << PAGE_SHIFT, 1);
> - q->saved_pfn = 0;
> + if (q->saved_iova && !WARN_ON(!q->matrix_mdev)) {
> + vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_iova, 1);
> + q->saved_iova = 0;
> }
> }
>
> @@ -189,27 +189,19 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
> *
> * @vcpu: the object representing the vcpu executing the PQAP(AQIC) instruction.
> * @nib: the location for storing the nib address.
> - * @g_pfn: the location for storing the page frame number of the page containing
> - * the nib.
> *
> * When the PQAP(AQIC) instruction is executed, general register 2 contains the
> * address of the notification indicator byte (nib) used for IRQ notification.
> - * This function parses the nib from gr2 and calculates the page frame
> - * number for the guest of the page containing the nib. The values are
> - * stored in @nib and @g_pfn respectively.
> - *
> - * The g_pfn of the nib is then validated to ensure the nib address is valid.
> + * This function parses and validates the nib from gr2.
> *
> * Return: returns zero if the nib address is a valid; otherwise, returns
> * -EINVAL.
> */
> -static int vfio_ap_validate_nib(struct kvm_vcpu *vcpu, unsigned long *nib,
> - unsigned long *g_pfn)
> +static int vfio_ap_validate_nib(struct kvm_vcpu *vcpu, dma_addr_t *nib)
> {
> *nib = vcpu->run->s.regs.gprs[2];
> - *g_pfn = *nib >> PAGE_SHIFT;
>
> - if (kvm_is_error_hva(gfn_to_hva(vcpu->kvm, *g_pfn)))
> + if (kvm_is_error_hva(gfn_to_hva(vcpu->kvm, *nib >> PAGE_SHIFT)))
> return -EINVAL;
>
> return 0;
> @@ -239,34 +231,34 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
> int isc,
> struct kvm_vcpu *vcpu)
> {
> - unsigned long nib;
> struct ap_qirq_ctrl aqic_gisa = {};
> struct ap_queue_status status = {};
> struct kvm_s390_gisa *gisa;
> int nisc;
> struct kvm *kvm;
> - unsigned long g_pfn, h_pfn;
> + unsigned long h_pfn;
> phys_addr_t h_nib;
> + dma_addr_t nib;
> int ret;
>
> /* Verify that the notification indicator byte address is valid */
> - if (vfio_ap_validate_nib(vcpu, &nib, &g_pfn)) {
> - VFIO_AP_DBF_WARN("%s: invalid NIB address: nib=%#lx, g_pfn=%#lx, apqn=%#04x\n",
> - __func__, nib, g_pfn, q->apqn);
> + if (vfio_ap_validate_nib(vcpu, &nib)) {
> + VFIO_AP_DBF_WARN("%s: invalid NIB address: nib=%pad, apqn=%#04x\n",
> + __func__, &nib, q->apqn);
>
> status.response_code = AP_RESPONSE_INVALID_ADDRESS;
> return status;
> }
>
> - ret = vfio_pin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1,
> + ret = vfio_pin_pages(&q->matrix_mdev->vdev, nib, 1,
> IOMMU_READ | IOMMU_WRITE, &h_pfn);
> switch (ret) {
> case 1:
> break;
> default:
> VFIO_AP_DBF_WARN("%s: vfio_pin_pages failed: rc=%d,"
> - "nib=%#lx, g_pfn=%#lx, apqn=%#04x\n",
> - __func__, ret, nib, g_pfn, q->apqn);
> + "nib=%pad, apqn=%#04x\n",
> + __func__, ret, &nib, q->apqn);
>
> status.response_code = AP_RESPONSE_INVALID_ADDRESS;
> return status;
> @@ -296,12 +288,12 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
> case AP_RESPONSE_NORMAL:
> /* See if we did clear older IRQ configuration */
> vfio_ap_free_aqic_resources(q);
> - q->saved_pfn = g_pfn;
> + q->saved_iova = nib;
> q->saved_isc = isc;
> break;
> case AP_RESPONSE_OTHERWISE_CHANGED:
> /* We could not modify IRQ setings: clear new configuration */
> - vfio_unpin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1);
> + vfio_unpin_pages(&q->matrix_mdev->vdev, nib, 1);
> kvm_s390_gisc_unregister(kvm, isc);
> break;
> default:
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index a26efd804d0d..479b205179bd 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -102,13 +102,13 @@ struct ap_matrix_mdev {
> * struct vfio_ap_queue - contains the data associated with a queue bound to the
> * vfio_ap device driver
> * @matrix_mdev: the matrix mediated device
> - * @saved_pfn: the guest PFN pinned for the guest
> + * @saved_iova: the notification indicator byte (nib) address
> * @apqn: the APQN of the AP queue device
> * @saved_isc: the guest ISC registered with the GIB interface
> */
> struct vfio_ap_queue {
> struct ap_matrix_mdev *matrix_mdev;
> - unsigned long saved_pfn;
> + dma_addr_t saved_iova;
> int apqn;
> #define VFIO_AP_ISC_INVALID 0xff
> unsigned char saved_isc;
Reviewed-by: Tony Krowiak <[email protected]>
On 7/8/22 6:44 PM, Nicolin Chen wrote:
> The vfio_pin/unpin_pages() so far accepted arrays of PFNs of user IOVA.
> Among all three callers, there was only one caller possibly passing in
> a non-contiguous PFN list, which is now ensured to have contiguous PFN
> inputs too.
>
> Pass in the starting address with "iova" alone to simplify things, so
> callers no longer need to maintain a PFN list or to pin/unpin one page
> at a time. This also allows VFIO to use more efficient implementations
> of pin/unpin_pages.
>
> For now, also update vfio_iommu_type1 to fit this new parameter too,
> while keeping its input intact (being user_iova) since we don't want
> to spend too much effort swapping its parameters and local variables
> at that level.
>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Reviewed by: Kirti Wankhede <[email protected]>
> Reviewed-by: Jason Gunthorpe <[email protected]>
> Reviewed-by: Kevin Tian <[email protected]>
> Acked-by: Eric Farman <[email protected]>
> Tested-by: Terrence Xu <[email protected]>
> Tested-by: Eric Farman <[email protected]>
> Signed-off-by: Nicolin Chen <[email protected]>
> ---
> .../driver-api/vfio-mediated-device.rst | 4 +--
> drivers/gpu/drm/i915/gvt/kvmgt.c | 22 ++++++---------
> drivers/s390/cio/vfio_ccw_cp.c | 4 +--
> drivers/s390/crypto/vfio_ap_ops.c | 9 +++----
> drivers/vfio/vfio.c | 27 +++++++++----------
> drivers/vfio/vfio.h | 4 +--
> drivers/vfio/vfio_iommu_type1.c | 15 +++++------
> include/linux/vfio.h | 5 ++--
> 8 files changed, 39 insertions(+), 51 deletions(-)
>
> diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
> index 4307421dcaa0..af31eaf836e8 100644
> --- a/Documentation/driver-api/vfio-mediated-device.rst
> +++ b/Documentation/driver-api/vfio-mediated-device.rst
> @@ -260,10 +260,10 @@ Translation APIs for Mediated Devices
> The following APIs are provided for translating user pfn to host pfn in a VFIO
> driver::
>
> - int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
> + int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> int npage, int prot, unsigned long *phys_pfn);
>
> - void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> + void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova,
> int npage);
>
> These functions call back into the back-end IOMMU module by using the pin_pages
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 7ce7b09aa5b2..d3ac8383d759 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -231,14 +231,8 @@ static void intel_gvt_cleanup_vgpu_type_groups(struct intel_gvt *gvt)
> static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> unsigned long size)
> {
> - int total_pages = DIV_ROUND_UP(size, PAGE_SIZE);
> - int npage;
> -
> - for (npage = 0; npage < total_pages; npage++) {
> - unsigned long cur_gfn = gfn + npage;
> -
> - vfio_unpin_pages(&vgpu->vfio_device, &cur_gfn, 1);
> - }
> + vfio_unpin_pages(&vgpu->vfio_device, gfn << PAGE_SHIFT,
> + DIV_ROUND_UP(size, PAGE_SIZE));
> }
>
> /* Pin a normal or compound guest page for dma. */
> @@ -255,14 +249,14 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> * on stack to hold pfns.
> */
> for (npage = 0; npage < total_pages; npage++) {
> - unsigned long cur_gfn = gfn + npage;
> + dma_addr_t cur_iova = (gfn + npage) << PAGE_SHIFT;
> unsigned long pfn;
>
> - ret = vfio_pin_pages(&vgpu->vfio_device, &cur_gfn, 1,
> + ret = vfio_pin_pages(&vgpu->vfio_device, cur_iova, 1,
> IOMMU_READ | IOMMU_WRITE, &pfn);
> if (ret != 1) {
> - gvt_vgpu_err("vfio_pin_pages failed for gfn 0x%lx, ret %d\n",
> - cur_gfn, ret);
> + gvt_vgpu_err("vfio_pin_pages failed for iova %pad, ret %d\n",
> + &cur_iova, ret);
> goto err;
> }
>
> @@ -306,7 +300,7 @@ static int gvt_dma_map_page(struct intel_vgpu *vgpu, unsigned long gfn,
> if (dma_mapping_error(dev, *dma_addr)) {
> gvt_vgpu_err("DMA mapping failed for pfn 0x%lx, ret %d\n",
> page_to_pfn(page), ret);
> - gvt_unpin_guest_page(vgpu, gfn, size);
> + gvt_unpin_guest_page(vgpu, gfn << PAGE_SHIFT, size);
> return -ENOMEM;
> }
>
> @@ -319,7 +313,7 @@ static void gvt_dma_unmap_page(struct intel_vgpu *vgpu, unsigned long gfn,
> struct device *dev = vgpu->gvt->gt->i915->drm.dev;
>
> dma_unmap_page(dev, dma_addr, size, DMA_BIDIRECTIONAL);
> - gvt_unpin_guest_page(vgpu, gfn, size);
> + gvt_unpin_guest_page(vgpu, gfn << PAGE_SHIFT, size);
> }
>
> static struct gvt_dma *__gvt_cache_find_dma_addr(struct intel_vgpu *vgpu,
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index 3b94863ad24e..a739262f988d 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -114,7 +114,7 @@ static void pfn_array_unpin(struct pfn_array *pa,
> continue;
> }
>
> - vfio_unpin_pages(vdev, first, npage);
> + vfio_unpin_pages(vdev, *first << PAGE_SHIFT, npage);
> unpinned += npage;
> npage = 1;
> }
> @@ -146,7 +146,7 @@ static int pfn_array_pin(struct pfn_array *pa, struct vfio_device *vdev)
> continue;
> }
>
> - ret = vfio_pin_pages(vdev, first, npage,
> + ret = vfio_pin_pages(vdev, *first << PAGE_SHIFT, npage,
> IOMMU_READ | IOMMU_WRITE,
> &pa->pa_pfn[pinned]);
> if (ret < 0) {
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index bb869b28cebd..8a2018ab3cf0 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -124,7 +124,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
> q->saved_isc = VFIO_AP_ISC_INVALID;
> }
> if (q->saved_pfn && !WARN_ON(!q->matrix_mdev)) {
> - vfio_unpin_pages(&q->matrix_mdev->vdev, &q->saved_pfn, 1);
> + vfio_unpin_pages(&q->matrix_mdev->vdev, q->saved_pfn << PAGE_SHIFT, 1);
> q->saved_pfn = 0;
> }
> }
> @@ -258,7 +258,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
> return status;
> }
>
> - ret = vfio_pin_pages(&q->matrix_mdev->vdev, &g_pfn, 1,
> + ret = vfio_pin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1,
> IOMMU_READ | IOMMU_WRITE, &h_pfn);
> switch (ret) {
> case 1:
> @@ -301,7 +301,7 @@ static struct ap_queue_status vfio_ap_irq_enable(struct vfio_ap_queue *q,
> break;
> case AP_RESPONSE_OTHERWISE_CHANGED:
> /* We could not modify IRQ setings: clear new configuration */
> - vfio_unpin_pages(&q->matrix_mdev->vdev, &g_pfn, 1);
> + vfio_unpin_pages(&q->matrix_mdev->vdev, g_pfn << PAGE_SHIFT, 1);
> kvm_s390_gisc_unregister(kvm, isc);
> break;
> default:
> @@ -1248,9 +1248,8 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>
> if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
> struct vfio_iommu_type1_dma_unmap *unmap = data;
> - unsigned long g_pfn = unmap->iova >> PAGE_SHIFT;
>
> - vfio_unpin_pages(&matrix_mdev->vdev, &g_pfn, 1);
> + vfio_unpin_pages(&matrix_mdev->vdev, unmap->iova, 1);
> return NOTIFY_OK;
> }
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index ad90adbfddc8..ee4d45c2f210 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1915,17 +1915,17 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
> EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
>
> /*
> - * Pin a set of guest PFNs and return their associated host PFNs for local
> + * Pin contiguous user pages and return their associated host pages for local
> * domain only.
> * @device [in] : device
> - * @user_pfn [in]: array of user/guest PFNs to be pinned.
> - * @npage [in] : count of elements in user_pfn array. This count should not
> - * be greater VFIO_PIN_PAGES_MAX_ENTRIES.
> + * @iova [in] : starting IOVA of user pages to be pinned.
> + * @npage [in] : count of pages to be pinned. This count should not
> + * be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
> * @prot [in] : protection flags
> * @phys_pfn[out]: array of host PFNs
> * Return error or number of pages pinned.
> */
> -int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
> +int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> int npage, int prot, unsigned long *phys_pfn)
> {
> struct vfio_container *container;
> @@ -1933,8 +1933,7 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
> struct vfio_iommu_driver *driver;
> int ret;
>
> - if (!user_pfn || !phys_pfn || !npage ||
> - !vfio_assert_device_open(device))
> + if (!phys_pfn || !npage || !vfio_assert_device_open(device))
> return -EINVAL;
>
> if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
> @@ -1948,7 +1947,7 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
> driver = container->iommu_driver;
> if (likely(driver && driver->ops->pin_pages))
> ret = driver->ops->pin_pages(container->iommu_data,
> - group->iommu_group, user_pfn,
> + group->iommu_group, iova,
> npage, prot, phys_pfn);
> else
> ret = -ENOTTY;
> @@ -1958,15 +1957,13 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
> EXPORT_SYMBOL(vfio_pin_pages);
>
> /*
> - * Unpin set of host PFNs for local domain only.
> + * Unpin contiguous host pages for local domain only.
> * @device [in] : device
> - * @user_pfn [in]: array of user/guest PFNs to be unpinned. Number of user/guest
> - * PFNs should not be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
> - * @npage [in] : count of elements in user_pfn array. This count should not
> + * @iova [in] : starting address of user pages to be unpinned.
> + * @npage [in] : count of pages to be unpinned. This count should not
> * be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
> */
> -void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> - int npage)
> +void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage)
> {
> struct vfio_container *container;
> struct vfio_iommu_driver *driver;
> @@ -1981,7 +1978,7 @@ void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> container = device->group->container;
> driver = container->iommu_driver;
>
> - driver->ops->unpin_pages(container->iommu_data, user_pfn, npage);
> + driver->ops->unpin_pages(container->iommu_data, iova, npage);
> }
> EXPORT_SYMBOL(vfio_unpin_pages);
>
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index bef4edf58138..dbcd0e8c031b 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -50,11 +50,11 @@ struct vfio_iommu_driver_ops {
> struct iommu_group *group);
> int (*pin_pages)(void *iommu_data,
> struct iommu_group *group,
> - unsigned long *user_pfn,
> + dma_addr_t user_iova,
> int npage, int prot,
> unsigned long *phys_pfn);
> void (*unpin_pages)(void *iommu_data,
> - unsigned long *user_pfn, int npage);
> + dma_addr_t user_iova, int npage);
> int (*register_notifier)(void *iommu_data,
> unsigned long *events,
> struct notifier_block *nb);
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index cfeea4efd625..c5ff9970edc4 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -828,7 +828,7 @@ static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova,
>
> static int vfio_iommu_type1_pin_pages(void *iommu_data,
> struct iommu_group *iommu_group,
> - unsigned long *user_pfn,
> + dma_addr_t user_iova,
> int npage, int prot,
> unsigned long *phys_pfn)
> {
> @@ -840,7 +840,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> bool do_accounting;
> dma_addr_t iova;
>
> - if (!iommu || !user_pfn || !phys_pfn)
> + if (!iommu || !phys_pfn)
> return -EINVAL;
>
> /* Supported for v2 version only */
> @@ -856,7 +856,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> again:
> if (iommu->vaddr_invalid_count) {
> for (i = 0; i < npage; i++) {
> - iova = user_pfn[i] << PAGE_SHIFT;
> + iova = user_iova + PAGE_SIZE * i;
> ret = vfio_find_dma_valid(iommu, iova, PAGE_SIZE, &dma);
> if (ret < 0)
> goto pin_done;
> @@ -881,7 +881,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> for (i = 0; i < npage; i++) {
> struct vfio_pfn *vpfn;
>
> - iova = user_pfn[i] << PAGE_SHIFT;
> + iova = user_iova + PAGE_SIZE * i;
> dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
> if (!dma) {
> ret = -EINVAL;
> @@ -938,7 +938,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> for (j = 0; j < i; j++) {
> dma_addr_t iova;
>
> - iova = user_pfn[j] << PAGE_SHIFT;
> + iova = user_iova + PAGE_SIZE * j;
> dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
> vfio_unpin_page_external(dma, iova, do_accounting);
> phys_pfn[j] = 0;
> @@ -949,7 +949,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
> }
>
> static void vfio_iommu_type1_unpin_pages(void *iommu_data,
> - unsigned long *user_pfn, int npage)
> + dma_addr_t user_iova, int npage)
> {
> struct vfio_iommu *iommu = iommu_data;
> bool do_accounting;
> @@ -963,10 +963,9 @@ static void vfio_iommu_type1_unpin_pages(void *iommu_data,
>
> do_accounting = list_empty(&iommu->domain_list);
> for (i = 0; i < npage; i++) {
> + dma_addr_t iova = user_iova + PAGE_SIZE * i;
> struct vfio_dma *dma;
> - dma_addr_t iova;
>
> - iova = user_pfn[i] << PAGE_SHIFT;
> dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
> if (!dma)
> break;
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 5348ef353029..2cefb63751f9 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -157,10 +157,9 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
>
> #define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long))
>
> -int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
> +int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
> int npage, int prot, unsigned long *phys_pfn);
> -void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
> - int npage);
> +void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage);
> int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova,
> void *data, size_t len, bool write);
>
On Fri, 8 Jul 2022 15:44:18 -0700
Nicolin Chen <[email protected]> wrote:
> This is a preparatory series for IOMMUFD v2 patches. It prepares for
> replacing vfio_iommu_type1 implementations of vfio_pin/unpin_pages()
> with IOMMUFD version.
>
> There's a gap between these two versions: the vfio_iommu_type1 version
> inputs a non-contiguous PFN list and outputs another PFN list for the
> pinned physical page list, while the IOMMUFD version only supports a
> contiguous address input by accepting the starting IO virtual address
> of a set of pages to pin and by outputting to a physical page list.
>
> The nature of existing callers mostly aligns with the IOMMUFD version,
> except s390's vfio_ccw_cp code where some additional change is needed
> along with this series. Overall, updating to "iova" and "phys_page"
> does improve the caller side to some extent.
>
> Also fix a misuse of physical address and virtual address in the s390's
> crypto code. And update the input naming at the adjacent vfio_dma_rw().
>
> This is on github:
> https://github.com/nicolinc/iommufd/commits/vfio_pin_pages
>
> Terrence has tested this series on i915; Eric has tested on s390.
>
> Thanks!
>
> Changelog
> v3:
> * Added a patch to replace roundup with DIV_ROUND_UP in i915 gvt
> * Dropped the "driver->ops->unpin_pages" and NULL checks in PATCH-1
> * Changed to use WARN_ON and separate into lines in PATCH-1
> * Replaced "guest" words with "user" and fix typo in PATCH-5
> * Updated commit log of PATCH-1, PATCH-6, and PATCH-10
> * Added Reviewed/Acked-by from Christoph, Jason, Kirti, Kevin and Eric
> * Added Tested-by from Terrence (i915) and Eric (s390)
> v2: https://lore.kernel.org/kvm/[email protected]/
> * Added a patch to make vfio_unpin_pages return void
> * Added two patches to remove PFN list from two s390 callers
> * Renamed "phys_page" parameter to "pages" for vfio_pin_pages
> * Updated commit log of kmap_local_page() patch
> * Added Harald's "Reviewed-by" to pa_ind patch
> * Rebased on top of Alex's extern removal path
> v1: https://lore.kernel.org/kvm/[email protected]/
>
> Nicolin Chen (10):
> vfio: Make vfio_unpin_pages() return void
> drm/i915/gvt: Replace roundup with DIV_ROUND_UP
> vfio/ap: Pass in physical address of ind to ap_aqic()
> vfio/ccw: Only pass in contiguous pages
> vfio: Pass in starting IOVA to vfio_pin/unpin_pages API
> vfio/ap: Change saved_pfn to saved_iova
> vfio/ccw: Change pa_pfn list to pa_iova list
> vfio: Rename user_iova of vfio_dma_rw()
> vfio/ccw: Add kmap_local_page() for memcpy
> vfio: Replace phys_pfn with pages for vfio_pin_pages()
>
> .../driver-api/vfio-mediated-device.rst | 6 +-
> arch/s390/include/asm/ap.h | 6 +-
> drivers/gpu/drm/i915/gvt/kvmgt.c | 49 ++---
> drivers/s390/cio/vfio_ccw_cp.c | 195 +++++++++++-------
> drivers/s390/crypto/ap_queue.c | 2 +-
> drivers/s390/crypto/vfio_ap_ops.c | 54 +++--
> drivers/s390/crypto/vfio_ap_private.h | 4 +-
> drivers/vfio/vfio.c | 54 ++---
> drivers/vfio/vfio.h | 8 +-
> drivers/vfio/vfio_iommu_type1.c | 45 ++--
> include/linux/vfio.h | 9 +-
> 11 files changed, 215 insertions(+), 217 deletions(-)
>
GVT-g explodes for me with this series on my Broadwell test system,
continuously spewing the following:
[ 47.344126] ------------[ cut here ]------------
[ 47.348778] WARNING: CPU: 3 PID: 501 at drivers/vfio/vfio_iommu_type1.c:978 vfio_iommu_type1_unpin_pages+0x7b/0x100 [vfio_iommu_type1]
[ 47.360871] Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun bridge stp llc rfkill sunrpc vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iTCO_wdt at24 mei_wdt mei_hdcp intel_pmc_bxt mei_pxp rapl iTCO_vendor_support intel_cstate pcspkr e1000e mei_me intel_uncore i2c_i801 mei lpc_ich i2c_smbus acpi_pad fuse zram ip_tables kvmgt mdev vfio_iommu_type1 vfio kvm irqbypass i915 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pinctrl_lynxpoint i2c_algo_bit drm_buddy video drm_display_helper drm_kms_helper cec ttm drm
[ 47.423398] CPU: 3 PID: 501 Comm: gvt:rcs0 Tainted: G W 5.19.0-rc4+ #3
[ 47.431228] Hardware name: /NUC5i5MYBE, BIOS MYBDWi5v.86A.0054.2019.0520.1531 05/20/2019
[ 47.439408] RIP: 0010:vfio_iommu_type1_unpin_pages+0x7b/0x100 [vfio_iommu_type1]
[ 47.446818] Code: 10 00 00 45 31 ed 48 8b 7b 40 48 85 ff 74 12 48 8b 47 18 49 39 c6 77 23 48 8b 7f 10 48 85 ff 75 ee 48 8b 3c 24 e8 45 57 92 e4 <0f> 0b 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 03 47 28 49
[ 47.465573] RSP: 0018:ffff9ac5806cfbe0 EFLAGS: 00010246
[ 47.470807] RAX: ffff8cb42f4c5180 RBX: ffff8cb4145c03c0 RCX: 0000000000000000
[ 47.477948] RDX: 0000000000000000 RSI: 0000163802000000 RDI: ffff8cb4145c03e0
[ 47.485088] RBP: 0000000000000001 R08: 0000000000000000 R09: ffff9ac581aed000
[ 47.492230] R10: ffff9ac5806cfc58 R11: 00000001b2202000 R12: 0000000000000001
[ 47.499370] R13: 0000000000000000 R14: 0000163802001000 R15: 0000163802000000
[ 47.506513] FS: 0000000000000000(0000) GS:ffff8cb776d80000(0000) knlGS:0000000000000000
[ 47.514608] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 47.520361] CR2: ffffdc0933f76192 CR3: 0000000118118003 CR4: 00000000003726e0
[ 47.527510] Call Trace:
[ 47.529976] <TASK>
[ 47.532091] intel_gvt_dma_unmap_guest_page+0xd5/0x110 [kvmgt]
[ 47.537948] ppgtt_invalidate_spt+0x323/0x340 [kvmgt]
[ 47.543017] ppgtt_invalidate_spt+0x173/0x340 [kvmgt]
[ 47.548088] ppgtt_invalidate_spt+0x173/0x340 [kvmgt]
[ 47.553159] ppgtt_invalidate_spt+0x173/0x340 [kvmgt]
[ 47.558228] invalidate_ppgtt_mm+0x5f/0x110 [kvmgt]
[ 47.563124] _intel_vgpu_mm_release+0xd6/0xe0 [kvmgt]
[ 47.568193] intel_vgpu_destroy_workload+0x1b7/0x1e0 [kvmgt]
[ 47.573872] workload_thread+0xa4c/0x19a0 [kvmgt]
[ 47.578613] ? _raw_spin_rq_lock_irqsave+0x20/0x20
[ 47.583422] ? dequeue_task_stop+0x70/0x70
[ 47.587530] ? _raw_spin_lock_irqsave+0x24/0x50
[ 47.592072] ? intel_vgpu_reset_submission+0x40/0x40 [kvmgt]
[ 47.597746] kthread+0xe7/0x110
[ 47.600902] ? kthread_complete_and_exit+0x20/0x20
[ 47.605702] ret_from_fork+0x22/0x30
[ 47.609293] </TASK>
[ 47.611503] ---[ end trace 0000000000000000 ]---
Line 978 is the WARN_ON(i != npage) line. For the cases where we don't
find a matching vfio_dma, I'm seeing addresses that look maybe like
we're shifting a value that's already an iova by PAGE_SHIFT somewhere.
Thanks,
Alex
On Fri, Jul 22, 2022 at 04:11:29PM -0600, Alex Williamson wrote:
> GVT-g explodes for me with this series on my Broadwell test system,
> continuously spewing the following:
Thank you for running additional tests.
> [ 47.348778] WARNING: CPU: 3 PID: 501 at drivers/vfio/vfio_iommu_type1.c:978 vfio_iommu_type1_unpin_pages+0x7b/0x100 [vfio_iommu_type1]
> Line 978 is the WARN_ON(i != npage) line. For the cases where we don't
> find a matching vfio_dma, I'm seeing addresses that look maybe like
> we're shifting a value that's already an iova by PAGE_SHIFT somewhere.
Hmm..I don't understand the PAGE_SHIFT part. Do you mind clarifying?
And GVT code initiated an unpin request from gvt_unpin_guest_pag()
that is currently unpinning one page at a time on a contiguous IOVA
range, prior to this series. After this series, it leaves the per-
page routine to the internal loop of vfio_iommu_type1_unpin_pages(),
which is supposed to do the same.
So, either resulted from the npage input being wrong or some other
factor weighed in that invoked a vfio_remove_dma on those iovas?
Thanks
Nic
On Fri, 22 Jul 2022 16:12:19 -0700
Nicolin Chen <[email protected]> wrote:
> On Fri, Jul 22, 2022 at 04:11:29PM -0600, Alex Williamson wrote:
>
> > GVT-g explodes for me with this series on my Broadwell test system,
> > continuously spewing the following:
>
> Thank you for running additional tests.
>
> > [ 47.348778] WARNING: CPU: 3 PID: 501 at drivers/vfio/vfio_iommu_type1.c:978 vfio_iommu_type1_unpin_pages+0x7b/0x100 [vfio_iommu_type1]
>
> > Line 978 is the WARN_ON(i != npage) line. For the cases where we don't
> > find a matching vfio_dma, I'm seeing addresses that look maybe like
> > we're shifting a value that's already an iova by PAGE_SHIFT somewhere.
>
> Hmm..I don't understand the PAGE_SHIFT part. Do you mind clarifying?
The iova was a very large address for a 4GB VM with a lot of zeros on
the low order bits, ex. 0x162459000000. Thanks,
Alex
> And GVT code initiated an unpin request from gvt_unpin_guest_pag()
> that is currently unpinning one page at a time on a contiguous IOVA
> range, prior to this series. After this series, it leaves the per-
> page routine to the internal loop of vfio_iommu_type1_unpin_pages(),
> which is supposed to do the same.
>
> So, either resulted from the npage input being wrong or some other
> factor weighed in that invoked a vfio_remove_dma on those iovas?
>
> Thanks
> Nic
>
On Fri, Jul 22, 2022 at 06:18:00PM -0600, Alex Williamson wrote:
> External email: Use caution opening links or attachments
>
>
> On Fri, 22 Jul 2022 16:12:19 -0700
> Nicolin Chen <[email protected]> wrote:
>
> > On Fri, Jul 22, 2022 at 04:11:29PM -0600, Alex Williamson wrote:
> >
> > > GVT-g explodes for me with this series on my Broadwell test system,
> > > continuously spewing the following:
> >
> > Thank you for running additional tests.
> >
> > > [ 47.348778] WARNING: CPU: 3 PID: 501 at drivers/vfio/vfio_iommu_type1.c:978 vfio_iommu_type1_unpin_pages+0x7b/0x100 [vfio_iommu_type1]
> >
> > > Line 978 is the WARN_ON(i != npage) line. For the cases where we don't
> > > find a matching vfio_dma, I'm seeing addresses that look maybe like
> > > we're shifting a value that's already an iova by PAGE_SHIFT somewhere.
> >
> > Hmm..I don't understand the PAGE_SHIFT part. Do you mind clarifying?
>
> The iova was a very large address for a 4GB VM with a lot of zeros on
> the low order bits, ex. 0x162459000000. Thanks,
Ah! Thanks for the hint. The following commit did a double shifting:
"vfio: Pass in starting IOVA to vfio_pin/unpin_pages AP"
And the following change should fix:
-------------------
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 481dd2aeb40e..4790c7f35b88 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -293,7 +293,7 @@ static int gvt_dma_map_page(struct intel_vgpu *vgpu, unsigned long gfn,
if (dma_mapping_error(dev, *dma_addr)) {
gvt_vgpu_err("DMA mapping failed for pfn 0x%lx, ret %d\n",
page_to_pfn(page), ret);
- gvt_unpin_guest_page(vgpu, gfn << PAGE_SHIFT, size);
+ gvt_unpin_guest_page(vgpu, gfn, size);
return -ENOMEM;
}
@@ -306,7 +306,7 @@ static void gvt_dma_unmap_page(struct intel_vgpu *vgpu, unsigned long gfn,
struct device *dev = vgpu->gvt->gt->i915->drm.dev;
dma_unmap_page(dev, dma_addr, size, DMA_BIDIRECTIONAL);
- gvt_unpin_guest_page(vgpu, gfn << PAGE_SHIFT, size);
+ gvt_unpin_guest_page(vgpu, gfn, size);
}
static struct gvt_dma *__gvt_cache_find_dma_addr(struct intel_vgpu *vgpu,
-------------------
So, I think that I should send a v4, given that the patches aren't
officially applied?
On Fri, 22 Jul 2022 17:38:25 -0700
Nicolin Chen <[email protected]> wrote:
> On Fri, Jul 22, 2022 at 06:18:00PM -0600, Alex Williamson wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Fri, 22 Jul 2022 16:12:19 -0700
> > Nicolin Chen <[email protected]> wrote:
> >
> > > On Fri, Jul 22, 2022 at 04:11:29PM -0600, Alex Williamson wrote:
> > >
> > > > GVT-g explodes for me with this series on my Broadwell test system,
> > > > continuously spewing the following:
> > >
> > > Thank you for running additional tests.
> > >
> > > > [ 47.348778] WARNING: CPU: 3 PID: 501 at drivers/vfio/vfio_iommu_type1.c:978 vfio_iommu_type1_unpin_pages+0x7b/0x100 [vfio_iommu_type1]
> > >
> > > > Line 978 is the WARN_ON(i != npage) line. For the cases where we don't
> > > > find a matching vfio_dma, I'm seeing addresses that look maybe like
> > > > we're shifting a value that's already an iova by PAGE_SHIFT somewhere.
> > >
> > > Hmm..I don't understand the PAGE_SHIFT part. Do you mind clarifying?
> >
> > The iova was a very large address for a 4GB VM with a lot of zeros on
> > the low order bits, ex. 0x162459000000. Thanks,
>
> Ah! Thanks for the hint. The following commit did a double shifting:
> "vfio: Pass in starting IOVA to vfio_pin/unpin_pages AP"
>
> And the following change should fix:
> -------------------
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 481dd2aeb40e..4790c7f35b88 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -293,7 +293,7 @@ static int gvt_dma_map_page(struct intel_vgpu *vgpu, unsigned long gfn,
> if (dma_mapping_error(dev, *dma_addr)) {
> gvt_vgpu_err("DMA mapping failed for pfn 0x%lx, ret %d\n",
> page_to_pfn(page), ret);
> - gvt_unpin_guest_page(vgpu, gfn << PAGE_SHIFT, size);
> + gvt_unpin_guest_page(vgpu, gfn, size);
> return -ENOMEM;
> }
>
> @@ -306,7 +306,7 @@ static void gvt_dma_unmap_page(struct intel_vgpu *vgpu, unsigned long gfn,
> struct device *dev = vgpu->gvt->gt->i915->drm.dev;
>
> dma_unmap_page(dev, dma_addr, size, DMA_BIDIRECTIONAL);
> - gvt_unpin_guest_page(vgpu, gfn << PAGE_SHIFT, size);
> + gvt_unpin_guest_page(vgpu, gfn, size);
> }
>
> static struct gvt_dma *__gvt_cache_find_dma_addr(struct intel_vgpu *vgpu,
> -------------------
Looks likely. Not sure how Terrance was able to test this successfully
though.
> So, I think that I should send a v4, given that the patches aren't
> officially applied?
Yep, please rebase on current vfio next branch. Thanks,
Alex
On Fri, Jul 22, 2022 at 07:09:01PM -0600, Alex Williamson wrote:
> > So, I think that I should send a v4, given that the patches aren't
> > officially applied?
>
> Yep, please rebase on current vfio next branch. Thanks,
Sent. And they are on Github, basing on linux-vfio next too:
https://github.com/nicolinc/iommufd/commits/vfio_pin_pages-v4
Thanks!
Nic