2023-09-20 11:48:54

by Zhu, Lingshan

Subject: Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg



On 9/20/2023 2:58 PM, Parav Pandit wrote:
>> From: Chen, Jiqian <[email protected]>
>> Sent: Wednesday, September 20, 2023 12:03 PM
>> If driver write 0 to reset device, can the SUSPEND bit be cleared?
> It must as reset operation, resets everything else and so the suspend too.
>
>> (pci_pm_resume->virtio_pci_restore->virtio_device_restore-
>>> virtio_reset_device)
>> If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if
>> the reset request is from guest restore process or not, and then I can't change
>> the reset behavior.
> Reset should not be influenced by suspend.
> Suspend should do the work of suspend and reset to do the reset.
>
> The problem to overcome in [1] is, resume operation needs to be synchronous as it involves large part of context to resume back, and hence just asynchronously setting DRIVER_OK is not enough.
> The sw must verify back that device has resumed the operation and ready to answer requests.
This is not live migration. All device status and other information
still stays in the device, so there is no need to "resume" context; the device just resumes running.

Like resuming from a failed LM.
>
> This is slightly different flow than setting the DRIVER_OK for the first time device initialization sequence as it does not involve large restoration.
>
> So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver should clear the SUSPEND bit and verify that it is out of SUSPEND.
>
> Because driver is still in _OK_ driving the device flipping the SUSPEND bit.
Please read the spec, it says:
The driver MUST NOT clear a device status bit



2023-09-20 14:10:31

by Parav Pandit

Subject: RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg


> From: Zhu, Lingshan <[email protected]>
> Sent: Wednesday, September 20, 2023 12:37 PM

> > The problem to overcome in [1] is, resume operation needs to be synchronous
> as it involves large part of context to resume back, and hence just
> asynchronously setting DRIVER_OK is not enough.
> > The sw must verify back that device has resumed the operation and ready to
> answer requests.
> this is not live migration, all device status and other information still stay in the
> device, no need to "resume" context, just resume running.
>
I am aware that it is not live migration. :)

"Just resuming" involves a lot of device setup tasks. The device implementation does not know how long a device has been suspended.
For example, if a VM is suspended for 6 hours, the device context could be saved to a slow disk.
Hence, when the resume is done, the device needs to set things up again, and the driver has to verify that before accessing anything more from the device.

> Like resume from a failed LM.
> >
> > This is slightly different flow than setting the DRIVER_OK for the first time
> device initialization sequence as it does not involve large restoration.
> >
> > So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver
> should clear the SUSPEND bit and verify that it is out of SUSPEND.
> >
> > Because driver is still in _OK_ driving the device flipping the SUSPEND bit.
> Please read the spec, it says:
> The driver MUST NOT clear a device status bit
>
Yes, this is why either DRIVER_OK validation by the driver is needed, or Jiqian's new synchronous register.
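
To make the "driver verifies that the device is out of suspend" point concrete, here is a minimal sketch in C against a simulated device. It is not taken from any patch in this thread. The SUSPEND bit value, the `sim_*` helpers and `driver_wait_resumed()` are hypothetical names chosen for illustration; only the DRIVER_OK value (4) comes from the virtio spec. How the resume is actually requested (writing DRIVER_OK, or a new register) is deliberately left abstract.

```c
#include <stdbool.h>
#include <stdint.h>

/* DRIVER_OK as defined by the virtio specification. */
#define VIRTIO_STATUS_DRIVER_OK 0x04u
/* Assumption: SUSPEND bit value picked only for this illustration. */
#define VIRTIO_STATUS_SUSPEND_HYPOTHETICAL 0x10u

/* Simulated device standing in for a real device_status register. */
struct sim_device {
    uint8_t status;               /* value the driver reads back */
    bool resume_requested;        /* set once the driver asks to resume */
    unsigned polls_until_resumed; /* models a slow context restore */
};

/* Hypothetical resume request; the real mechanism is what the thread debates. */
static void sim_request_resume(struct sim_device *dev, unsigned polls_needed)
{
    dev->resume_requested = true;
    dev->polls_until_resumed = polls_needed;
}

/* Each status read advances the simulated restore by one step. */
static uint8_t sim_read_status(struct sim_device *dev)
{
    if (dev->resume_requested &&
        (dev->status & VIRTIO_STATUS_SUSPEND_HYPOTHETICAL)) {
        if (dev->polls_until_resumed == 0) {
            /* Restore finished: the device itself drops SUSPEND. */
            dev->status = (uint8_t)((dev->status &
                          ~VIRTIO_STATUS_SUSPEND_HYPOTHETICAL) |
                          VIRTIO_STATUS_DRIVER_OK);
        } else {
            dev->polls_until_resumed--;
        }
    }
    return dev->status;
}

/*
 * Driver side: poll until the device reports that it has left SUSPEND
 * and is in DRIVER_OK, or give up after max_polls reads.
 */
static bool driver_wait_resumed(struct sim_device *dev, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        uint8_t s = sim_read_status(dev);

        if (!(s & VIRTIO_STATUS_SUSPEND_HYPOTHETICAL) &&
            (s & VIRTIO_STATUS_DRIVER_OK))
            return true;
    }
    return false;
}
```

In this sketch the driver never clears a status bit itself; it only reads the status back until the device reports that it has resumed, and a real driver would treat a `false` return as a timeout before touching the queues.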

2023-09-20 14:51:45

by Jiqian Chen

Subject: Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

Hi Lingshan,
It seems you replied to the wrong email thread. These comments are not related to my patch.

On 2023/9/20 15:06, Zhu, Lingshan wrote:
>
>
> On 9/20/2023 2:58 PM, Parav Pandit wrote:
>>> From: Chen, Jiqian <[email protected]>
>>> Sent: Wednesday, September 20, 2023 12:03 PM
>>> If driver write 0 to reset device, can the SUSPEND bit be cleared?
>> It must as reset operation, resets everything else and so the suspend too.
>>
>>> (pci_pm_resume->virtio_pci_restore->virtio_device_restore-
>>>> virtio_reset_device)
>>> If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if
>>> the reset request is from guest restore process or not, and then I can't change
>>> the reset behavior.
>> Reset should not be influenced by suspend.
>> Suspend should do the work of suspend and reset to do the reset.
>>
>> The problem to overcome in [1] is, resume operation needs to be synchronous as it involves large part of context to resume back, and hence just asynchronously setting DRIVER_OK is not enough.
>> The sw must verify back that device has resumed the operation and ready to answer requests.
> this is not live migration, all device status and other information still stay in the device, no need to "resume" context, just resume running.
>
> Like resume from a failed LM.
>>
>> This is slightly different flow than setting the DRIVER_OK for the first time device initialization sequence as it does not involve large restoration.
>>
>> So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver should clear the SUSPEND bit and verify that it is out of SUSPEND.
>>
>> Because driver is still in _OK_ driving the device flipping the SUSPEND bit.
> Please read the spec, it says:
> The driver MUST NOT clear a device status bit
>
>

--
Best regards,
Jiqian Chen.