2024-03-08 18:22:27

by Brett Creeley

[permalink] [raw]
Subject: [PATCH vfio 0/2] vfio/pds: Reset related fixes/improvements

This mini-series adds a fix for a patch that was recently accepted that
I missed during initial testing. The link can be found here:
https://lore.kernel.org/kvm/[email protected]/

Also, remove the unnecessary deferred reset logic.

Brett Creeley (2):
vfio/pds: Make sure migration file isn't accessed after reset
vfio/pds: Refactor/simplify reset logic

drivers/vfio/pci/pds/dirty.c | 6 ++---
drivers/vfio/pci/pds/lm.c | 13 ++++++++++
drivers/vfio/pci/pds/lm.h | 1 +
drivers/vfio/pci/pds/pci_drv.c | 27 ++++----------------
drivers/vfio/pci/pds/vfio_dev.c | 45 +++++++--------------------------
drivers/vfio/pci/pds/vfio_dev.h | 8 ++----
6 files changed, 33 insertions(+), 67 deletions(-)

--
2.17.1



2024-03-08 18:22:47

by Brett Creeley

[permalink] [raw]
Subject: [PATCH vfio 1/2] vfio/pds: Make sure migration file isn't accessed after reset

It's possible the migration file is accessed after reset when it has
been cleaned up, especially when it's initiated by the device. This is
because the driver doesn't rip out the filep when cleaning up it only
frees the related page structures and sets its local struct
pds_vfio_lm_file pointer to NULL. This can cause a NULL pointer
dereference, which is shown in the example below during a restore after
a device initiated reset:

BUG: kernel NULL pointer dereference, address: 000000000000000c
PF: supervisor read access in kernel mode
PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:pds_vfio_get_file_page+0x5d/0xf0 [pds_vfio_pci]
[...]
Call Trace:
<TASK>
pds_vfio_restore_write+0xf6/0x160 [pds_vfio_pci]
vfs_write+0xc9/0x3f0
? __fget_light+0xc9/0x110
ksys_write+0xb5/0xf0
__x64_sys_write+0x1a/0x20
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[...]

Add a disabled flag to the driver's struct pds_vfio_lm_file that gets
set during cleanup. Then make sure to check the flag when the migration
file is accessed via its file_operations. By default this flag will be
false as the memory for struct pds_vfio_lm_file is kzalloc'd, which means
the struct pds_vfio_lm_file is enabled and accessible. Also, since the
file_operations and driver's migration file cleanup happen under the
protection of the same pds_vfio_lm_file.lock, using this flag is thread
safe.

Fixes: 8512ed256334 ("vfio/pds: Always clear the save/restore FDs on reset")
Reviewed-by: Shannon Nelson <[email protected]>
Signed-off-by: Brett Creeley <[email protected]>
---
drivers/vfio/pci/pds/lm.c | 13 +++++++++++++
drivers/vfio/pci/pds/lm.h | 1 +
2 files changed, 14 insertions(+)

diff --git a/drivers/vfio/pci/pds/lm.c b/drivers/vfio/pci/pds/lm.c
index 79fe2e66bb49..6b94cc0bf45b 100644
--- a/drivers/vfio/pci/pds/lm.c
+++ b/drivers/vfio/pci/pds/lm.c
@@ -92,8 +92,10 @@ static void pds_vfio_put_lm_file(struct pds_vfio_lm_file *lm_file)
{
mutex_lock(&lm_file->lock);

+ lm_file->disabled = true;
lm_file->size = 0;
lm_file->alloc_size = 0;
+ lm_file->filep->f_pos = 0;

/* Free scatter list of file pages */
sg_free_table(&lm_file->sg_table);
@@ -183,6 +185,12 @@ static ssize_t pds_vfio_save_read(struct file *filp, char __user *buf,
pos = &filp->f_pos;

mutex_lock(&lm_file->lock);
+
+ if (lm_file->disabled) {
+ done = -ENODEV;
+ goto out_unlock;
+ }
+
if (*pos > lm_file->size) {
done = -EINVAL;
goto out_unlock;
@@ -283,6 +291,11 @@ static ssize_t pds_vfio_restore_write(struct file *filp, const char __user *buf,

mutex_lock(&lm_file->lock);

+ if (lm_file->disabled) {
+ done = -ENODEV;
+ goto out_unlock;
+ }
+
while (len) {
size_t page_offset;
struct page *page;
diff --git a/drivers/vfio/pci/pds/lm.h b/drivers/vfio/pci/pds/lm.h
index 13be893198b7..9511b1afc6a1 100644
--- a/drivers/vfio/pci/pds/lm.h
+++ b/drivers/vfio/pci/pds/lm.h
@@ -27,6 +27,7 @@ struct pds_vfio_lm_file {
struct scatterlist *last_offset_sg; /* Iterator */
unsigned int sg_last_entry;
unsigned long last_offset;
+ bool disabled;
};

struct pds_vfio_pci_device;
--
2.17.1


2024-03-08 18:23:02

by Brett Creeley

[permalink] [raw]
Subject: [PATCH vfio 2/2] vfio/pds: Refactor/simplify reset logic

The current logic for handling resets is more complicated than it needs
to be. The deferred_reset flag is used to indicate a reset is needed
and the deferred_reset_state is the requested, post-reset, state.

Also, the deferred_reset logic was added to vfio migration drivers to
prevent a circular locking dependency with respect to mm_lock and state
mutex. This is mainly because of the copy_to/from_user() functions(which
takes mm_lock) invoked under state mutex.

Remove all of the deferred reset logic and just pass the requested
next state to pds_vfio_reset() so it can be used for VMM and DSC
initiated resets.

This removes the need for pds_vfio_state_mutex_lock(), so remove that
and replace its use with a simple mutex_unlock().

Also, remove the reset_mutex as it's no longer needed since the
state_mutex can be the driver's primary protector.

Suggested-by: Shameer Kolothum <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>
Signed-off-by: Brett Creeley <[email protected]>
---
drivers/vfio/pci/pds/dirty.c | 6 ++---
drivers/vfio/pci/pds/pci_drv.c | 27 ++++----------------
drivers/vfio/pci/pds/vfio_dev.c | 45 +++++++--------------------------
drivers/vfio/pci/pds/vfio_dev.h | 8 ++----
4 files changed, 19 insertions(+), 67 deletions(-)

diff --git a/drivers/vfio/pci/pds/dirty.c b/drivers/vfio/pci/pds/dirty.c
index 8ddf4346fcd5..68e8f006dfdb 100644
--- a/drivers/vfio/pci/pds/dirty.c
+++ b/drivers/vfio/pci/pds/dirty.c
@@ -607,7 +607,7 @@ int pds_vfio_dma_logging_report(struct vfio_device *vdev, unsigned long iova,

mutex_lock(&pds_vfio->state_mutex);
err = pds_vfio_dirty_sync(pds_vfio, dirty, iova, length);
- pds_vfio_state_mutex_unlock(pds_vfio);
+ mutex_unlock(&pds_vfio->state_mutex);

return err;
}
@@ -624,7 +624,7 @@ int pds_vfio_dma_logging_start(struct vfio_device *vdev,
mutex_lock(&pds_vfio->state_mutex);
pds_vfio_send_host_vf_lm_status_cmd(pds_vfio, PDS_LM_STA_IN_PROGRESS);
err = pds_vfio_dirty_enable(pds_vfio, ranges, nnodes, page_size);
- pds_vfio_state_mutex_unlock(pds_vfio);
+ mutex_unlock(&pds_vfio->state_mutex);

return err;
}
@@ -637,7 +637,7 @@ int pds_vfio_dma_logging_stop(struct vfio_device *vdev)

mutex_lock(&pds_vfio->state_mutex);
pds_vfio_dirty_disable(pds_vfio, true);
- pds_vfio_state_mutex_unlock(pds_vfio);
+ mutex_unlock(&pds_vfio->state_mutex);

return 0;
}
diff --git a/drivers/vfio/pci/pds/pci_drv.c b/drivers/vfio/pci/pds/pci_drv.c
index a34dda516629..16e93b11ab1b 100644
--- a/drivers/vfio/pci/pds/pci_drv.c
+++ b/drivers/vfio/pci/pds/pci_drv.c
@@ -21,16 +21,13 @@

static void pds_vfio_recovery(struct pds_vfio_pci_device *pds_vfio)
{
- bool deferred_reset_needed = false;
-
/*
* Documentation states that the kernel migration driver must not
* generate asynchronous device state transitions outside of
* manipulation by the user or the VFIO_DEVICE_RESET ioctl.
*
* Since recovery is an asynchronous event received from the device,
- * initiate a deferred reset. Issue a deferred reset in the following
- * situations:
+ * initiate a reset in the following situations:
* 1. Migration is in progress, which will cause the next step of
* the migration to fail.
* 2. If the device is in a state that will be set to
@@ -42,24 +39,8 @@ static void pds_vfio_recovery(struct pds_vfio_pci_device *pds_vfio)
pds_vfio->state != VFIO_DEVICE_STATE_ERROR) ||
(pds_vfio->state == VFIO_DEVICE_STATE_RUNNING &&
pds_vfio_dirty_is_enabled(pds_vfio)))
- deferred_reset_needed = true;
+ pds_vfio_reset(pds_vfio, VFIO_DEVICE_STATE_ERROR);
mutex_unlock(&pds_vfio->state_mutex);
-
- /*
- * On the next user initiated state transition, the device will
- * transition to the VFIO_DEVICE_STATE_ERROR. At this point it's the user's
- * responsibility to reset the device.
- *
- * If a VFIO_DEVICE_RESET is requested post recovery and before the next
- * state transition, then the deferred reset state will be set to
- * VFIO_DEVICE_STATE_RUNNING.
- */
- if (deferred_reset_needed) {
- mutex_lock(&pds_vfio->reset_mutex);
- pds_vfio->deferred_reset = true;
- pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_ERROR;
- mutex_unlock(&pds_vfio->reset_mutex);
- }
}

static int pds_vfio_pci_notify_handler(struct notifier_block *nb,
@@ -185,7 +166,9 @@ static void pds_vfio_pci_aer_reset_done(struct pci_dev *pdev)
{
struct pds_vfio_pci_device *pds_vfio = pds_vfio_pci_drvdata(pdev);

- pds_vfio_reset(pds_vfio);
+ mutex_lock(&pds_vfio->state_mutex);
+ pds_vfio_reset(pds_vfio, VFIO_DEVICE_STATE_RUNNING);
+ mutex_unlock(&pds_vfio->state_mutex);
}

static const struct pci_error_handlers pds_vfio_pci_err_handlers = {
diff --git a/drivers/vfio/pci/pds/vfio_dev.c b/drivers/vfio/pci/pds/vfio_dev.c
index a286ebcc7112..76a80ae7087b 100644
--- a/drivers/vfio/pci/pds/vfio_dev.c
+++ b/drivers/vfio/pci/pds/vfio_dev.c
@@ -26,37 +26,14 @@ struct pds_vfio_pci_device *pds_vfio_pci_drvdata(struct pci_dev *pdev)
vfio_coredev);
}

-void pds_vfio_state_mutex_unlock(struct pds_vfio_pci_device *pds_vfio)
+void pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio,
+ enum vfio_device_mig_state state)
{
-again:
- mutex_lock(&pds_vfio->reset_mutex);
- if (pds_vfio->deferred_reset) {
- pds_vfio->deferred_reset = false;
- pds_vfio_put_restore_file(pds_vfio);
- pds_vfio_put_save_file(pds_vfio);
- if (pds_vfio->state == VFIO_DEVICE_STATE_ERROR) {
- pds_vfio_dirty_disable(pds_vfio, false);
- }
- pds_vfio->state = pds_vfio->deferred_reset_state;
- pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_RUNNING;
- mutex_unlock(&pds_vfio->reset_mutex);
- goto again;
- }
- mutex_unlock(&pds_vfio->state_mutex);
- mutex_unlock(&pds_vfio->reset_mutex);
-}
-
-void pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio)
-{
- mutex_lock(&pds_vfio->reset_mutex);
- pds_vfio->deferred_reset = true;
- pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_RUNNING;
- if (!mutex_trylock(&pds_vfio->state_mutex)) {
- mutex_unlock(&pds_vfio->reset_mutex);
- return;
- }
- mutex_unlock(&pds_vfio->reset_mutex);
- pds_vfio_state_mutex_unlock(pds_vfio);
+ pds_vfio_put_restore_file(pds_vfio);
+ pds_vfio_put_save_file(pds_vfio);
+ if (state == VFIO_DEVICE_STATE_ERROR)
+ pds_vfio_dirty_disable(pds_vfio, false);
+ pds_vfio->state = state;
}

static struct file *
@@ -97,8 +74,7 @@ pds_vfio_set_device_state(struct vfio_device *vdev,
break;
}
}
- pds_vfio_state_mutex_unlock(pds_vfio);
- /* still waiting on a deferred_reset */
+ mutex_unlock(&pds_vfio->state_mutex);
if (pds_vfio->state == VFIO_DEVICE_STATE_ERROR)
res = ERR_PTR(-EIO);

@@ -114,7 +90,7 @@ static int pds_vfio_get_device_state(struct vfio_device *vdev,

mutex_lock(&pds_vfio->state_mutex);
*current_state = pds_vfio->state;
- pds_vfio_state_mutex_unlock(pds_vfio);
+ mutex_unlock(&pds_vfio->state_mutex);
return 0;
}

@@ -156,7 +132,6 @@ static int pds_vfio_init_device(struct vfio_device *vdev)
pds_vfio->vf_id = vf_id;

mutex_init(&pds_vfio->state_mutex);
- mutex_init(&pds_vfio->reset_mutex);

vdev->migration_flags = VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P;
vdev->mig_ops = &pds_vfio_lm_ops;
@@ -178,7 +153,6 @@ static void pds_vfio_release_device(struct vfio_device *vdev)
vfio_coredev.vdev);

mutex_destroy(&pds_vfio->state_mutex);
- mutex_destroy(&pds_vfio->reset_mutex);
vfio_pci_core_release_dev(vdev);
}

@@ -194,7 +168,6 @@ static int pds_vfio_open_device(struct vfio_device *vdev)
return err;

pds_vfio->state = VFIO_DEVICE_STATE_RUNNING;
- pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_RUNNING;

vfio_pci_core_finish_enable(&pds_vfio->vfio_coredev);

diff --git a/drivers/vfio/pci/pds/vfio_dev.h b/drivers/vfio/pci/pds/vfio_dev.h
index e7b01080a1ec..803d99d69c73 100644
--- a/drivers/vfio/pci/pds/vfio_dev.h
+++ b/drivers/vfio/pci/pds/vfio_dev.h
@@ -18,20 +18,16 @@ struct pds_vfio_pci_device {
struct pds_vfio_dirty dirty;
struct mutex state_mutex; /* protect migration state */
enum vfio_device_mig_state state;
- struct mutex reset_mutex; /* protect reset_done flow */
- u8 deferred_reset;
- enum vfio_device_mig_state deferred_reset_state;
struct notifier_block nb;

int vf_id;
u16 client_id;
};

-void pds_vfio_state_mutex_unlock(struct pds_vfio_pci_device *pds_vfio);
-
const struct vfio_device_ops *pds_vfio_ops_info(void);
struct pds_vfio_pci_device *pds_vfio_pci_drvdata(struct pci_dev *pdev);
-void pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio);
+void pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio,
+ enum vfio_device_mig_state state);

struct pci_dev *pds_vfio_to_pci_dev(struct pds_vfio_pci_device *pds_vfio);
struct device *pds_vfio_to_dev(struct pds_vfio_pci_device *pds_vfio);
--
2.17.1


2024-03-11 20:23:36

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH vfio 0/2] vfio/pds: Reset related fixes/improvements

On Fri, 8 Mar 2024 10:21:47 -0800
Brett Creeley <[email protected]> wrote:

> This mini-series adds a fix for a patch that was recently accepted that
> I missed during initial testing. The link can be found here:
> https://lore.kernel.org/kvm/[email protected]/
>
> Also, remove the unnecessary deferred reset logic.
>
> Brett Creeley (2):
> vfio/pds: Make sure migration file isn't accessed after reset
> vfio/pds: Refactor/simplify reset logic
>
> drivers/vfio/pci/pds/dirty.c | 6 ++---
> drivers/vfio/pci/pds/lm.c | 13 ++++++++++
> drivers/vfio/pci/pds/lm.h | 1 +
> drivers/vfio/pci/pds/pci_drv.c | 27 ++++----------------
> drivers/vfio/pci/pds/vfio_dev.c | 45 +++++++--------------------------
> drivers/vfio/pci/pds/vfio_dev.h | 8 ++----
> 6 files changed, 33 insertions(+), 67 deletions(-)
>

Applied to vfio next branch for v6.9. Thanks,

Alex