Subject: [PATCH] hisi_acc_vfio_pci: Update migration data pointer correctly on saving/resume

When the optional PRE_COPY support was added to speed up the device
compatibility check, it failed to update the saving/resuming data
pointers based on the fd offset. This results in migration data
corruption, and when the device is started on the destination the
following error is reported in some cases:

[ 478.907684] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
[ 478.913691] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000310200000010
[ 478.919603] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000002088000007f
[ 478.925515] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000
[ 478.931425] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000
[ 478.947552] hisi_zip 0000:31:00.0: qm_axi_rresp [error status=0x1] found
[ 478.955930] hisi_zip 0000:31:00.0: qm_db_timeout [error status=0x400] found
[ 478.955944] hisi_zip 0000:31:00.0: qm sq doorbell timeout in function 2

Fixes: d9a871e4a143 ("hisi_acc_vfio_pci: Introduce support for PRE_COPY state transitions")
Signed-off-by: Shameer Kolothum <[email protected]>
---
drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index b2f9778c8366..4d27465c8f1a 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -694,6 +694,7 @@ static ssize_t hisi_acc_vf_resume_write(struct file *filp, const char __user *bu
 					size_t len, loff_t *pos)
 {
 	struct hisi_acc_vf_migration_file *migf = filp->private_data;
+	u8 *vf_data = (u8 *)&migf->vf_data;
 	loff_t requested_length;
 	ssize_t done = 0;
 	int ret;
@@ -715,7 +716,7 @@ static ssize_t hisi_acc_vf_resume_write(struct file *filp, const char __user *bu
 		goto out_unlock;
 	}
 
-	ret = copy_from_user(&migf->vf_data, buf, len);
+	ret = copy_from_user(vf_data + *pos, buf, len);
 	if (ret) {
 		done = -EFAULT;
 		goto out_unlock;
@@ -835,7 +836,9 @@ static ssize_t hisi_acc_vf_save_read(struct file *filp, char __user *buf, size_t

 	len = min_t(size_t, migf->total_length - *pos, len);
 	if (len) {
-		ret = copy_to_user(buf, &migf->vf_data, len);
+		u8 *vf_data = (u8 *)&migf->vf_data;
+
+		ret = copy_to_user(buf, vf_data + *pos, len);
 		if (ret) {
 			done = -EFAULT;
 			goto out_unlock;
--
2.34.1
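
For readers who want to see the failure mode in isolation, the following is a
minimal userspace sketch, not the driver code. It assumes a small fixed-size
blob and models the save path only; `demo_migf`, `demo_read_buggy`,
`demo_read_fixed` and `demo_drain` are hypothetical names made up for this
illustration, and `memcpy` stands in for `copy_to_user`. The buggy reader
always copies from the start of the blob, so any read that begins at a
non-zero offset returns the wrong bytes; the fixed reader adds the offset
first, as the patch does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the migration file; not the real driver struct. */
struct demo_migf {
	unsigned char vf_data[16];	/* device state blob */
	size_t total_length;		/* number of valid bytes in vf_data */
};

/* Clamp a request so it never runs past the end of the blob. */
static size_t demo_clamp(const struct demo_migf *migf, size_t len, size_t pos)
{
	size_t remaining;

	if (pos >= migf->total_length)
		return 0;
	remaining = migf->total_length - pos;
	return len < remaining ? len : remaining;
}

/* Pre-fix behaviour: the source pointer ignores the current offset. */
static size_t demo_read_buggy(const struct demo_migf *migf, unsigned char *buf,
			      size_t len, size_t *pos)
{
	len = demo_clamp(migf, len, *pos);
	memcpy(buf, migf->vf_data, len);	/* always starts at byte 0 */
	*pos += len;
	return len;
}

/* Post-fix behaviour: the source pointer is advanced by the offset. */
static size_t demo_read_fixed(const struct demo_migf *migf, unsigned char *buf,
			      size_t len, size_t *pos)
{
	len = demo_clamp(migf, len, *pos);
	memcpy(buf, migf->vf_data + *pos, len);
	*pos += len;
	return len;
}

typedef size_t (*demo_reader)(const struct demo_migf *, unsigned char *,
			      size_t, size_t *);

/*
 * Read the whole blob in chunk-sized pieces, the way a caller that issues
 * several partial reads on the same fd would. out must hold total_length bytes.
 */
static size_t demo_drain(const struct demo_migf *migf, demo_reader fn,
			 unsigned char *out, size_t chunk)
{
	size_t pos = 0, total = 0, n;

	while ((n = fn(migf, out + total, chunk, &pos)) > 0)
		total += n;
	return total;
}
```

With a 16-byte blob holding the values 0 to 15 and a chunk size of 4, draining
through `demo_read_buggy` repeats the first four bytes four times, while
`demo_read_fixed` reproduces the blob exactly. A single read of the full
length at offset 0 behaves identically in both versions, which is consistent
with the problem only appearing once the data is transferred in more than one
piece.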


2023-11-20 14:29:43

by Jason Gunthorpe

Subject: Re: [PATCH] hisi_acc_vfio_pci: Update migration data pointer correctly on saving/resume

On Mon, Nov 20, 2023 at 09:14:06AM +0000, Shameer Kolothum wrote:
> When the optional PRE_COPY support was added to speed up the device
> compatibility check, it failed to update the saving/resuming data
> pointers based on the fd offset. This results in migration data
> corruption and when the device gets started on the destination the
> following error is reported in some cases,
>
> [ 478.907684] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
> [ 478.913691] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000310200000010
> [ 478.919603] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000002088000007f
> [ 478.925515] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000
> [ 478.931425] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000
> [ 478.947552] hisi_zip 0000:31:00.0: qm_axi_rresp [error status=0x1] found
> [ 478.955930] hisi_zip 0000:31:00.0: qm_db_timeout [error status=0x400] found
> [ 478.955944] hisi_zip 0000:31:00.0: qm sq doorbell timeout in function 2
>
> Fixes: d9a871e4a143 ("hisi_acc_vfio_pci: Introduce support for PRE_COPY state transitions")
> Signed-off-by: Shameer Kolothum <[email protected]>
> ---
> drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

Subject: RE: [PATCH] hisi_acc_vfio_pci: Update migration data pointer correctly on saving/resume

Hi Alex,

Just a gentle ping on this.

Thanks,
Shameer

> -----Original Message-----
> From: Jason Gunthorpe <[email protected]>
> Sent: Monday, November 20, 2023 2:29 PM
> To: Shameerali Kolothum Thodi <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> Linuxarm <[email protected]>; liulongfang <[email protected]>
> Subject: Re: [PATCH] hisi_acc_vfio_pci: Update migration data pointer correctly
> on saving/resume
>
> On Mon, Nov 20, 2023 at 09:14:06AM +0000, Shameer Kolothum wrote:
> > When the optional PRE_COPY support was added to speed up the device
> > compatibility check, it failed to update the saving/resuming data
> > pointers based on the fd offset. This results in migration data
> > corruption and when the device gets started on the destination the
> > following error is reported in some cases,
> >
> > [ 478.907684] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
> > [ 478.913691] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000310200000010
> > [ 478.919603] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000002088000007f
> > [ 478.925515] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000
> > [ 478.931425] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000
> > [ 478.947552] hisi_zip 0000:31:00.0: qm_axi_rresp [error status=0x1] found
> > [ 478.955930] hisi_zip 0000:31:00.0: qm_db_timeout [error status=0x400] found
> > [ 478.955944] hisi_zip 0000:31:00.0: qm sq doorbell timeout in function 2
> >
> > Fixes: d9a871e4a143 ("hisi_acc_vfio_pci: Introduce support for PRE_COPY state transitions")
> > Signed-off-by: Shameer Kolothum <[email protected]>
> > ---
> > drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
>
> Reviewed-by: Jason Gunthorpe <[email protected]>
>
> Jason

2024-01-05 16:30:51

by Alex Williamson

Subject: Re: [PATCH] hisi_acc_vfio_pci: Update migration data pointer correctly on saving/resume

On Fri, 5 Jan 2024 15:56:09 +0000
Shameerali Kolothum Thodi <[email protected]> wrote:

> Hi Alex,
>
> Just a gentle ping on this.

Thanks for the ping, it seems to have slipped under my radar. Applied
to vfio next branch for v6.8. Thanks,

Alex

> > -----Original Message-----
> > From: Jason Gunthorpe <[email protected]>
> > Sent: Monday, November 20, 2023 2:29 PM
> > To: Shameerali Kolothum Thodi <[email protected]>
> > Cc: [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > Linuxarm <[email protected]>; liulongfang <[email protected]>
> > Subject: Re: [PATCH] hisi_acc_vfio_pci: Update migration data pointer correctly
> > on saving/resume
> >
> > On Mon, Nov 20, 2023 at 09:14:06AM +0000, Shameer Kolothum wrote:
> > > When the optional PRE_COPY support was added to speed up the device
> > > compatibility check, it failed to update the saving/resuming data
> > > pointers based on the fd offset. This results in migration data
> > > corruption and when the device gets started on the destination the
> > > following error is reported in some cases,
> > >
> > > [ 478.907684] arm-smmu-v3 arm-smmu-v3.2.auto: event 0x10 received:
> > > [ 478.913691] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000310200000010
> > > [ 478.919603] arm-smmu-v3 arm-smmu-v3.2.auto: 0x000002088000007f
> > > [ 478.925515] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000
> > > [ 478.931425] arm-smmu-v3 arm-smmu-v3.2.auto: 0x0000000000000000
> > > [ 478.947552] hisi_zip 0000:31:00.0: qm_axi_rresp [error status=0x1] found
> > > [ 478.955930] hisi_zip 0000:31:00.0: qm_db_timeout [error status=0x400] found
> > > [ 478.955944] hisi_zip 0000:31:00.0: qm sq doorbell timeout in function 2
> > >
> > > Fixes: d9a871e4a143 ("hisi_acc_vfio_pci: Introduce support for PRE_COPY state transitions")
> > > Signed-off-by: Shameer Kolothum <[email protected]>
> > > ---
> > > drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > Reviewed-by: Jason Gunthorpe <[email protected]>
> >
> > Jason
>