2022-02-28 17:54:48

by Jason Gunthorpe

Subject: Re: [PATCH v6 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration

On Mon, Feb 28, 2022 at 09:01:20AM +0000, Shameer Kolothum wrote:

> +static int hisi_acc_vf_stop_copy(struct hisi_acc_vf_core_device *hisi_acc_vdev,
> +				 struct hisi_acc_vf_migration_file *migf)
> +{
> +	struct acc_vf_data *vf_data = &migf->vf_data;

This now needs to hold the migf->lock
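A minimal userspace sketch of why the lock matters, assuming the saving FD can now be read or queried while stop_copy fills in the data: the snapshot and its length are published together under one lock, and readers take the same lock. A pthread mutex stands in for migf->lock; `struct fake_migf`, `stop_copy_fill()` and `migf_total_length()` are hypothetical names, not the driver's:

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the driver's migration data blob. */
struct fake_vf_data {
	unsigned char regs[64];
};

/* Hypothetical stand-in for struct hisi_acc_vf_migration_file. */
struct fake_migf {
	pthread_mutex_t lock;        /* plays the role of migf->lock */
	struct fake_vf_data vf_data;
	size_t total_length;         /* bytes valid in vf_data */
};

/*
 * Sketch of the stop_copy step: the snapshot and total_length are
 * updated together under the lock, so a reader that also takes the
 * lock never sees new data with a stale length (or vice versa).
 */
static void stop_copy_fill(struct fake_migf *migf,
			   const unsigned char *src, size_t len)
{
	if (len > sizeof(migf->vf_data.regs))
		len = sizeof(migf->vf_data.regs);

	pthread_mutex_lock(&migf->lock);
	memcpy(migf->vf_data.regs, src, len);
	migf->total_length = len;
	pthread_mutex_unlock(&migf->lock);
}

/* Reader side (e.g. read() or the precopy query): same lock. */
static size_t migf_total_length(struct fake_migf *migf)
{
	size_t len;

	pthread_mutex_lock(&migf->lock);
	len = migf->total_length;
	pthread_mutex_unlock(&migf->lock);
	return len;
}
```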

> +
> +	if ((cur == VFIO_DEVICE_STATE_STOP || cur == VFIO_DEVICE_STATE_PRE_COPY) &&
> +	    new == VFIO_DEVICE_STATE_RUNNING) {
> +		hisi_acc_vf_start_device(hisi_acc_vdev);

This should be two stanzas. STOP->RUNNING should do start_device.

PRE_COPY->RUNNING should do disable_fds, and presumably nothing
else; the device was never stopped.
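The split can be sketched as a toy state machine. This is an illustration only, assuming hypothetical `struct fake_dev`, `start_device()`, `disable_fds()` and `step_state()` in place of the driver's helpers; each arc gets its own stanza and only STOP->RUNNING restarts the device:

```c
#include <stdbool.h>

/* Hypothetical subset of the migration states named in the thread. */
enum mig_state { ST_STOP, ST_RUNNING, ST_PRE_COPY, ST_STOP_COPY };

/* Hypothetical device model that records which side effects ran. */
struct fake_dev {
	bool running;           /* device engine active */
	int start_calls;        /* times start_device() ran */
	int disable_fds_calls;  /* times disable_fds() ran */
};

static void start_device(struct fake_dev *d)
{
	d->running = true;
	d->start_calls++;
}

static void disable_fds(struct fake_dev *d)
{
	d->disable_fds_calls++;
}

/*
 * One stanza per arc:
 *   STOP -> RUNNING:     the device was stopped, so restart it.
 *   PRE_COPY -> RUNNING: the device never stopped; only tear down
 *                        the migration FD.
 * Returns 0 on success, -1 for arcs this sketch does not model.
 */
static int step_state(struct fake_dev *d, enum mig_state cur,
		      enum mig_state new)
{
	if (cur == ST_STOP && new == ST_RUNNING) {
		start_device(d);
		return 0;
	}
	if (cur == ST_PRE_COPY && new == ST_RUNNING) {
		disable_fds(d);
		return 0;
	}
	return -1;
}
```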


> +	} else if (cmd == VFIO_DEVICE_MIG_PRECOPY) {
> +		struct vfio_device_mig_precopy precopy;
> +		enum vfio_device_mig_state curr_state;
> +		unsigned long minsz;
> +		int ret;
> +
> +		minsz = offsetofend(struct vfio_device_mig_precopy, dirty_bytes);
> +
> +		if (copy_from_user(&precopy, (void __user *)arg, minsz))
> +			return -EFAULT;
> +		if (precopy.argsz < minsz)
> +			return -EINVAL;
> +
> +		ret = hisi_acc_vfio_pci_get_device_state(core_vdev, &curr_state);
> +		if (!ret && curr_state == VFIO_DEVICE_STATE_PRE_COPY) {
> +			precopy.initial_bytes = QM_MATCH_SIZE;
> +			precopy.dirty_bytes = QM_MATCH_SIZE;

dirty_bytes should be 0

initial_bytes should be calculated based on the current file
descriptor offset.

The use of curr_state should be eliminated

This ioctl should be on the saving file_operations, not here

+ * This ioctl is used on the migration data FD in the precopy phase of the
+ * migration data transfer. It returns an estimate of the current data sizes
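A userspace sketch of the sizing rule, assuming the saving file tracks a `total_length` and a file position: dirty_bytes is always 0, and initial_bytes is whatever remains between the position and the end of the saved data. `struct fake_precopy` and `precopy_sizes()` are hypothetical; note that no device state lookup is needed:

```c
#include <errno.h>
#include <stdint.h>

/* Mirrors the two size fields of struct vfio_device_mig_precopy. */
struct fake_precopy {
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
};

/*
 * Derive the pre-copy estimate from the file position alone.
 * A position past the end of the data is rejected.
 */
static int precopy_sizes(int64_t pos, uint64_t total_length,
			 struct fake_precopy *out)
{
	if (pos < 0 || (uint64_t)pos > total_length)
		return -EINVAL;

	out->initial_bytes = total_length - (uint64_t)pos;
	out->dirty_bytes = 0; /* per the review, always 0 here */
	return 0;
}
```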

I see there is a bug in the qemu version:

@@ -215,12 +218,13 @@ static void vfio_save_precopy_pending(QEMUFile *f, void *>
                                       uint64_t *res_postcopy_only)
 {
     VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
     struct vfio_device_mig_precopy precopy = {
         .argsz = sizeof(precopy),
     };
     int ret;
 
-    ret = ioctl(vbasedev->fd, VFIO_DEVICE_MIG_PRECOPY, &precopy);
+    ret = ioctl(migration->data_fd, VFIO_DEVICE_MIG_PRECOPY, &precopy);
     if (ret) {
         return;
     }

I'll update my github.

Jason


Subject: RE: [PATCH v6 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration



> -----Original Message-----
> From: Jason Gunthorpe [mailto:[email protected]]
> Sent: 28 February 2022 14:58
> To: Shameerali Kolothum Thodi <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; Linuxarm
> <[email protected]>; liulongfang <[email protected]>; Zengtao (B)
> <[email protected]>; Jonathan Cameron
> <[email protected]>; Wangzhou (B) <[email protected]>
> Subject: Re: [PATCH v6 09/10] hisi_acc_vfio_pci: Add support for VFIO live
> migration
>
> On Mon, Feb 28, 2022 at 09:01:20AM +0000, Shameer Kolothum wrote:
>
> > +static int hisi_acc_vf_stop_copy(struct hisi_acc_vf_core_device *hisi_acc_vdev,
> > +				 struct hisi_acc_vf_migration_file *migf)
> > +{
> > +	struct acc_vf_data *vf_data = &migf->vf_data;
>
> This now needs to hold the migf->lock
>
> > +
> > +	if ((cur == VFIO_DEVICE_STATE_STOP || cur == VFIO_DEVICE_STATE_PRE_COPY) &&
> > +	    new == VFIO_DEVICE_STATE_RUNNING) {
> > +		hisi_acc_vf_start_device(hisi_acc_vdev);
>
> This should be two stanzas. STOP->RUNNING should do start_device.
>
> PRE_COPY->RUNNING should do disable_fds, and presumably nothing
> else; the device was never stopped.
>

Ok. I will take care of all the above.

> > +	} else if (cmd == VFIO_DEVICE_MIG_PRECOPY) {
> > +		struct vfio_device_mig_precopy precopy;
> > +		enum vfio_device_mig_state curr_state;
> > +		unsigned long minsz;
> > +		int ret;
> > +
> > +		minsz = offsetofend(struct vfio_device_mig_precopy, dirty_bytes);
> > +
> > +		if (copy_from_user(&precopy, (void __user *)arg, minsz))
> > +			return -EFAULT;
> > +		if (precopy.argsz < minsz)
> > +			return -EINVAL;
> > +
> > +		ret = hisi_acc_vfio_pci_get_device_state(core_vdev, &curr_state);
> > +		if (!ret && curr_state == VFIO_DEVICE_STATE_PRE_COPY) {
> > +			precopy.initial_bytes = QM_MATCH_SIZE;
> > +			precopy.dirty_bytes = QM_MATCH_SIZE;
>
> dirty_bytes should be 0
>
> initial_bytes should be calculated based on the current file
> descriptor offset.
>
> The use of curr_state should be eliminated
>
> This ioctl should be on the saving file_operations, not here
>
> + * This ioctl is used on the migration data FD in the precopy phase of the
> + * migration data transfer. It returns an estimate of the current data sizes
>
> I see there is a bug in the qemu version:
>
> @@ -215,12 +218,13 @@ static void vfio_save_precopy_pending(QEMUFile *f, void *>
>                                        uint64_t *res_postcopy_only)
>  {
>      VFIODevice *vbasedev = opaque;
> +    VFIOMigration *migration = vbasedev->migration;
>      struct vfio_device_mig_precopy precopy = {
>          .argsz = sizeof(precopy),
>      };
>      int ret;
>
> -    ret = ioctl(vbasedev->fd, VFIO_DEVICE_MIG_PRECOPY, &precopy);
> +    ret = ioctl(migration->data_fd, VFIO_DEVICE_MIG_PRECOPY, &precopy);
>      if (ret) {
>          return;
>      }
>
> I'll update my github.

Ok. Thanks for that.

For the VFIO_DEVICE_MIG_PRECOPY ioctl, this is what I have now:

+static long hisi_acc_vf_save_unl_ioctl(struct file *filp,
+				       unsigned int cmd, unsigned long arg)
+{
+	struct hisi_acc_vf_migration_file *migf = filp->private_data;
+	loff_t *pos = &filp->f_pos;
+	struct vfio_device_mig_precopy precopy;
+	unsigned long minsz;
+
+	if (cmd != VFIO_DEVICE_MIG_PRECOPY)
+		return -EINVAL;
+
+	minsz = offsetofend(struct vfio_device_mig_precopy, dirty_bytes);
+
+	if (copy_from_user(&precopy, (void __user *)arg, minsz))
+		return -EFAULT;
+	if (precopy.argsz < minsz)
+		return -EINVAL;
+
+	mutex_lock(&migf->lock);
+	if (*pos > migf->total_length) {
+		mutex_unlock(&migf->lock);
+		return -EINVAL;
+	}
+
+	precopy.dirty_bytes = 0;
+	precopy.initial_bytes = migf->total_length - *pos;
+	mutex_unlock(&migf->lock);
+	return copy_to_user((void __user *)arg, &precopy, minsz) ? -EFAULT : 0;
+}
+
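For reference, a userspace sketch of the argsz/minsz handshake used in the handler, with memcpy() standing in for copy_from_user()/copy_to_user(). The struct layout (including the `flags` word) is an assumption, not the uAPI as posted; `offsetofend` is re-implemented because the kernel helper is not available in userspace, and `precopy_query()` is hypothetical:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace equivalent of the kernel's offsetofend() helper. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Assumed layout: argsz, an assumed flags word, then the sizes. */
struct fake_mig_precopy {
	uint32_t argsz;
	uint32_t flags;
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
};

/*
 * Read the fixed prefix the handler understands (minsz bytes),
 * reject callers whose argsz is smaller, fill in the outputs and
 * write back only minsz bytes.
 */
static int precopy_query(void *user_arg, uint64_t initial, uint64_t dirty)
{
	struct fake_mig_precopy precopy;
	size_t minsz = offsetofend(struct fake_mig_precopy, dirty_bytes);

	memcpy(&precopy, user_arg, minsz);      /* copy_from_user() */
	if (precopy.argsz < minsz)
		return -EINVAL;

	precopy.initial_bytes = initial;
	precopy.dirty_bytes = dirty;
	memcpy(user_arg, &precopy, minsz);      /* copy_to_user() */
	return 0;
}
```

This follows the usual VFIO extensibility pattern: minsz stays pinned to the original fields, so a later, larger struct can be passed with a bigger argsz without breaking the check.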

I had a quick run with the above QEMU changes and it looks OK. Please let me know what you think.

Thanks,
Shameer

2022-02-28 19:17:12

by Jason Gunthorpe

Subject: Re: [PATCH v6 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration

On Mon, Feb 28, 2022 at 06:01:44PM +0000, Shameerali Kolothum Thodi wrote:

> +static long hisi_acc_vf_save_unl_ioctl(struct file *filp,
> +				       unsigned int cmd, unsigned long arg)
> +{
> +	struct hisi_acc_vf_migration_file *migf = filp->private_data;
> +	loff_t *pos = &filp->f_pos;
> +	struct vfio_device_mig_precopy precopy;
> +	unsigned long minsz;
> +
> +	if (cmd != VFIO_DEVICE_MIG_PRECOPY)
> +		return -EINVAL;

This should return -ENOTTY, not -EINVAL.
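ENOTTY ("inappropriate ioctl for device") is the conventional return for a command the file does not implement, which leaves -EINVAL for a recognized command with bad arguments. A minimal sketch of that split; the command numbers and `save_fd_ioctl()` are hypothetical:

```c
#include <errno.h>

/* Hypothetical command numbers, for illustration only. */
#define FAKE_MIG_PRECOPY 0x3e01u
#define FAKE_OTHER_CMD   0x3e02u

/*
 * Unknown command: -ENOTTY, so userspace can tell the FD simply
 * does not support the request. Known command with a too-small
 * argsz: -EINVAL.
 */
static long save_fd_ioctl(unsigned int cmd, unsigned int argsz,
			  unsigned int minsz)
{
	if (cmd != FAKE_MIG_PRECOPY)
		return -ENOTTY;
	if (argsz < minsz)
		return -EINVAL;
	return 0;
}
```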

> +
> +	minsz = offsetofend(struct vfio_device_mig_precopy, dirty_bytes);
> +
> +	if (copy_from_user(&precopy, (void __user *)arg, minsz))
> +		return -EFAULT;
> +	if (precopy.argsz < minsz)
> +		return -EINVAL;
> +
> +	mutex_lock(&migf->lock);
> +	if (*pos > migf->total_length) {
> +		mutex_unlock(&migf->lock);
> +		return -EINVAL;
> +	}
> +
> +	precopy.dirty_bytes = 0;
> +	precopy.initial_bytes = migf->total_length - *pos;
> +	mutex_unlock(&migf->lock);
> +	return copy_to_user((void __user *)arg, &precopy, minsz) ? -EFAULT : 0;
> +}

Yes

I also noticed this doesn't include the ENOMSG handling: read() should
return -ENOMSG when it reaches the end of stream (EOS) during pre-copy:

+ * During pre-copy the migration data FD has a temporary "end of stream" that is
+ * reached when both initial_bytes and dirty_bytes are zero. For instance, this
+ * may indicate that the device is idle and not currently dirtying any internal
+ * state. When read() is done on this temporary end of stream the kernel driver
+ * should return ENOMSG from read(). Userspace can wait for more data (which may
+ * never come) by using poll.
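A minimal userspace model of that read() behaviour, assuming dirty_bytes is always 0 so the temporary end of stream is simply position == total_length while in pre-copy. `struct fake_save_file` and `save_read()` are hypothetical; in this model, once the device leaves pre-copy the same condition returns 0 as a normal end of file:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical saving-file model. */
struct fake_save_file {
	const unsigned char *data;
	size_t total_length;    /* bytes produced so far */
	bool pre_copy;          /* device still in PRE_COPY */
};

/*
 * Copy out whatever is available. With nothing left, return
 * -ENOMSG while in pre-copy (temporary end of stream, more data
 * may arrive later) and 0 otherwise (final end of stream).
 */
static ssize_t save_read(struct fake_save_file *f, unsigned char *buf,
			 size_t len, size_t *pos)
{
	size_t avail;

	if (*pos > f->total_length)
		return -EINVAL;

	avail = f->total_length - *pos;
	if (avail == 0)
		return f->pre_copy ? -ENOMSG : 0;

	if (len > avail)
		len = avail;
	memcpy(buf, f->data + *pos, len);
	*pos += len;
	return (ssize_t)len;
}
```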

Jason