2022-03-03 13:29:05

by Jason Gunthorpe

Subject: Re: [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration

On Thu, Mar 03, 2022 at 12:57:29PM +0000, Shameerali Kolothum Thodi wrote:
>
>
> > From: Jason Gunthorpe [mailto:[email protected]]
> > Sent: 03 March 2022 00:22
> > To: Shameerali Kolothum Thodi <[email protected]>
> > Cc: [email protected]; [email protected];
> > [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; Linuxarm <[email protected]>; liulongfang
> > <[email protected]>; Zengtao (B) <[email protected]>;
> > Jonathan Cameron <[email protected]>; Wangzhou (B)
> > <[email protected]>
> > Subject: Re: [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live
> > migration
> >
> > On Wed, Mar 02, 2022 at 05:29:02PM +0000, Shameer Kolothum wrote:
> > > +static long hisi_acc_vf_save_unl_ioctl(struct file *filp,
> > > + unsigned int cmd, unsigned long arg)
> > > +{
> > > + struct hisi_acc_vf_migration_file *migf = filp->private_data;
> > > + struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(migf,
> > > + struct hisi_acc_vf_core_device, saving_migf);
> > > + loff_t *pos = &filp->f_pos;
> > > + struct vfio_precopy_info info;
> > > + unsigned long minsz;
> > > + int ret;
> > > +
> > > + if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
> > > + return -ENOTTY;
> > > +
> > > + minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
> > > +
> > > + if (copy_from_user(&info, (void __user *)arg, minsz))
> > > + return -EFAULT;
> > > + if (info.argsz < minsz)
> > > + return -EINVAL;
> > > +
> > > + mutex_lock(&hisi_acc_vdev->state_mutex);
> > > + if (hisi_acc_vdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY) {
> > > + mutex_unlock(&hisi_acc_vdev->state_mutex);
> > > + return -EINVAL;
> > > + }
> >
> > IMHO it is easier just to check the total_length and not grab this
> > other lock
>
> The problem with checking the total_length here is that it is possible that
> in STOP_COPY the dev is not ready and there are no more data to be transferred
> and the total_length remains at QM_MATCH_SIZE.

There is a scenario that transfers only QM_MATCH_SIZE in stop_copy?
This doesn't seem like a good idea, I think you should transfer a
positive indication 'this device is not ready' instead of truncating
the stream. A truncated stream should not be a valid stream.

ie always transfer the whole struct.
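
To illustrate the point about truncated streams, here is a minimal userspace sketch of a receiver-side check. It is not the driver's code: the record layout, the dev_ready flag and the mig_check_stream() helper are all hypothetical names made up for the example. The only idea taken from the discussion is that a short stream is rejected, and "not ready" is carried as an explicit field inside a full-size record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical migration record; field names and sizes are illustrative. */
struct mig_record {
	uint32_t match_data[4];	/* stands in for the QM_MATCH_SIZE prefix */
	uint32_t dev_ready;	/* positive indication: 1 = ready, 0 = not ready */
	uint32_t dev_state[8];	/* meaningful only when dev_ready != 0 */
};

enum mig_check {
	MIG_OK_READY,
	MIG_OK_NOT_READY,
	MIG_ERR_BAD_LENGTH,
};

/*
 * Classify a received stream by its length and the ready flag.
 * Anything that is not exactly one full record is invalid, so a
 * truncated stream can never be mistaken for "device not ready".
 */
static enum mig_check mig_check_stream(const struct mig_record *rec, size_t len)
{
	assert(rec != NULL);

	if (len != sizeof(*rec))
		return MIG_ERR_BAD_LENGTH;

	return rec->dev_ready ? MIG_OK_READY : MIG_OK_NOT_READY;
}
```

With this shape the receiver never has to infer device state from how many bytes arrived; the length check and the state check are separate.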

> Looks like setting the total_length = 0 in STOP_COPY is a better solution (if there are
> no other issues with that) as it will avoid grabbing the state_mutex as you
> mentioned above.

That seems really weird, I wouldn't recommend doing that..

Jason


Subject: RE: [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live migration



> -----Original Message-----
> From: Jason Gunthorpe [mailto:[email protected]]
> Sent: 03 March 2022 13:04
> To: Shameerali Kolothum Thodi <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; Linuxarm <[email protected]>; liulongfang
> <[email protected]>; Zengtao (B) <[email protected]>;
> Jonathan Cameron <[email protected]>; Wangzhou (B)
> <[email protected]>
> Subject: Re: [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live
> migration
>
> On Thu, Mar 03, 2022 at 12:57:29PM +0000, Shameerali Kolothum Thodi
> wrote:
> >
> >
> > > From: Jason Gunthorpe [mailto:[email protected]]
> > > Sent: 03 March 2022 00:22
> > > To: Shameerali Kolothum Thodi
> <[email protected]>
> > > Cc: [email protected]; [email protected];
> > > [email protected]; [email protected];
> > > [email protected]; [email protected];
> [email protected];
> > > [email protected]; Linuxarm <[email protected]>; liulongfang
> > > <[email protected]>; Zengtao (B) <[email protected]>;
> > > Jonathan Cameron <[email protected]>; Wangzhou (B)
> > > <[email protected]>
> > > Subject: Re: [PATCH v7 09/10] hisi_acc_vfio_pci: Add support for VFIO live
> > > migration
> > >
> > > On Wed, Mar 02, 2022 at 05:29:02PM +0000, Shameer Kolothum wrote:
> > > > +static long hisi_acc_vf_save_unl_ioctl(struct file *filp,
> > > > + unsigned int cmd, unsigned long arg)
> > > > +{
> > > > + struct hisi_acc_vf_migration_file *migf = filp->private_data;
> > > > + struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(migf,
> > > > + struct hisi_acc_vf_core_device, saving_migf);
> > > > + loff_t *pos = &filp->f_pos;
> > > > + struct vfio_precopy_info info;
> > > > + unsigned long minsz;
> > > > + int ret;
> > > > +
> > > > + if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
> > > > + return -ENOTTY;
> > > > +
> > > > + minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
> > > > +
> > > > + if (copy_from_user(&info, (void __user *)arg, minsz))
> > > > + return -EFAULT;
> > > > + if (info.argsz < minsz)
> > > > + return -EINVAL;
> > > > +
> > > > + mutex_lock(&hisi_acc_vdev->state_mutex);
> > > > + if (hisi_acc_vdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY) {
> > > > + mutex_unlock(&hisi_acc_vdev->state_mutex);
> > > > + return -EINVAL;
> > > > + }
> > >
> > > IMHO it is easier just to check the total_length and not grab this
> > > other lock
> >
> > The problem with checking the total_length here is that it is possible that
> > in STOP_COPY the dev is not ready and there are no more data to be
> transferred
> > and the total_length remains at QM_MATCH_SIZE.
>
> There is a scenario that transfers only QM_MATCH_SIZE in stop_copy?
> This doesn't seem like a good idea, I think you should transfer a
> positive indication 'this device is not ready' instead of truncating
> the stream. A truncated stream should not be a valid stream.
>
> ie always transfer the whole struct.

We could add a 'qm_state' and return the whole struct. But the rest
of the struct is basically invalid if qm_state = QM_NOT_READY.
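
A rough userspace sketch of what the save side could look like with such a field, assuming a made-up struct vf_data_sketch and made-up SKETCH_QM_* values (none of these names come from the patch). The record is always written at full size; when the device is not ready, the remaining fields are zeroed rather than left as stale data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state values, chosen only for this example. */
enum {
	SKETCH_QM_NOT_READY = 0,
	SKETCH_QM_READY = 1,
};

/* Hypothetical record: a state word followed by device registers. */
struct vf_data_sketch {
	uint32_t qm_state;
	uint32_t regs[8];	/* valid only when qm_state == SKETCH_QM_READY */
};

/*
 * Fill a full-size record and return the number of bytes to transfer.
 * The return value is always sizeof(*out), ready or not.
 */
static size_t vf_save_sketch(struct vf_data_sketch *out, int ready,
			     const uint32_t *regs)
{
	assert(out != NULL);

	/* Zero everything first so a not-ready record carries no stale data. */
	memset(out, 0, sizeof(*out));

	if (!ready || regs == NULL) {
		out->qm_state = SKETCH_QM_NOT_READY;
		return sizeof(*out);
	}

	out->qm_state = SKETCH_QM_READY;
	memcpy(out->regs, regs, sizeof(out->regs));
	return sizeof(*out);
}
```

The cost is copying a fixed-size record even when only the state word is meaningful, which is the trade-off raised below.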

>
> > Looks like setting the total_length = 0 in STOP_COPY is a better
> > solution (if there are no other issues with that) as it will avoid
> > grabbing the state_mutex as you mentioned above.
>
> That seems really weird, I wouldn't recommend doing that..

Does that mean we don't support a zero data transfer in STOP_COPY?
The concern is if we always transfer the whole struct, we end up reading
and writing the whole thing even if most of the data is invalid.

Thanks,
Shameer