2023-09-20 10:39:37

by Jiqian Chen

Subject: Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

Hi Lingshan,

On 2023/9/20 13:59, Zhu, Lingshan wrote:
>
>
> On 9/19/2023 8:31 PM, Michael S. Tsirkin wrote:
>> On Tue, Sep 19, 2023 at 07:42:42PM +0800, Jiqian Chen wrote:
>>> When a guest VM enters S3, QEMU resets and clears some state of virtio
>>> devices, but the guest is not aware of that, which can cause problems.
>>> For example, QEMU calls virtio_reset->virtio_gpu_gl_reset when the guest
>>> resumes, and that function destroys the render resources of virtio-gpu. As
>>> a result, after the guest resumes, the display does not come back and we
>>> only see a black screen. Because the guest can't re-create all the
>>> resources, we need QEMU not to destroy them during S3.
>>>
>>> For this purpose, we need a mechanism that allows guests and QEMU to
>>> negotiate their reset behavior. So this patch adds a new parameter
>>> named freeze_mode to struct virtio_pci_common_cfg. When the guest
>>> suspends, it can write FREEZE_S3 to freeze_mode, and then virtio
>>> devices can change their reset behavior on the QEMU side according to
>>> freeze_mode. Moreover, freeze_mode can be used by all virtio
>>> devices to affect the behavior of QEMU, not just the virtio-gpu device.
> Hi Jiqian,
>
> Have you seen this series: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
> https://lore.kernel.org/all/[email protected]/T/
>
> We introduced a SUSPEND bit in the device status. When VIRTIO_F_SUSPEND is
> negotiated, the driver can set SUSPEND in the device status to suspend the
> device.
>
> When SUSPEND is set, the device should pause its operations and preserve its
> configuration in its configuration space.
>
> The driver rewrites DRIVER_OK to clear SUSPEND, so the device resumes running.
>
> This was originally meant to serve live migration, but I think it can also meet your needs.
I noticed your series, but I am not sure it also meets my needs.
If the driver writes 0 to reset the device, will the SUSPEND bit be cleared? (pci_pm_resume->virtio_pci_restore->virtio_device_restore->virtio_reset_device)
If SUSPEND is cleared, then during the reset process in QEMU I can't tell whether the reset request comes from the guest's restore path, so I can't change the reset behavior.
Can you send me links to your patches on the kernel and QEMU sides? I will take a closer look.

>
> Thanks,
> Zhu Lingshan
>>>
>>> Signed-off-by: Jiqian Chen <[email protected]>
>>> ---
>>>   transport-pci.tex | 7 +++++++
>>>   1 file changed, 7 insertions(+)
>>>
>>> diff --git a/transport-pci.tex b/transport-pci.tex
>>> index a5c6719..2543536 100644
>>> --- a/transport-pci.tex
>>> +++ b/transport-pci.tex
>>> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>>           le64 queue_desc;                /* read-write */
>>>           le64 queue_driver;              /* read-write */
>>>           le64 queue_device;              /* read-write */
>>> +        le16 freeze_mode;               /* read-write */
>>>           le16 queue_notif_config_data;   /* read-only for driver */
>>>           le16 queue_reset;               /* read-write */
>>>
>> we can't add fields in the middle of the structure like this -
>> offset of queue_notif_config_data and queue_reset changes.
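
To illustrate the offset concern above, here is a minimal C sketch. It models only the tail of the structure that is visible in the diff (the earlier fields are omitted), so the offsets are relative to queue_desc, not the real offsets in the spec. The struct names, the helper functions and the "append at the end" variant are assumptions made for illustration; they are not taken from the patch or from the specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tail of the structure as it stands before the patch
 * (only the fields visible in the diff context). */
struct cfg_tail_current {
    uint64_t queue_desc;
    uint64_t queue_driver;
    uint64_t queue_device;
    uint16_t queue_notif_config_data;
    uint16_t queue_reset;
};

/* Tail as proposed in v5: freeze_mode inserted in the middle. */
struct cfg_tail_v5 {
    uint64_t queue_desc;
    uint64_t queue_driver;
    uint64_t queue_device;
    uint16_t freeze_mode;
    uint16_t queue_notif_config_data;
    uint16_t queue_reset;
};

/* Hypothetical alternative: the new field is appended after the
 * existing ones, so no existing field moves. */
struct cfg_tail_appended {
    uint64_t queue_desc;
    uint64_t queue_driver;
    uint64_t queue_device;
    uint16_t queue_notif_config_data;
    uint16_t queue_reset;
    uint16_t freeze_mode;
};

/* Number of bytes by which queue_reset moves with the v5 layout. */
size_t queue_reset_shift_v5(void)
{
    size_t before = offsetof(struct cfg_tail_current, queue_reset);
    size_t after = offsetof(struct cfg_tail_v5, queue_reset);

    assert(after >= before);
    return after - before;
}

/* Number of bytes by which queue_reset moves when appending instead. */
size_t queue_reset_shift_appended(void)
{
    size_t before = offsetof(struct cfg_tail_current, queue_reset);
    size_t after = offsetof(struct cfg_tail_appended, queue_reset);

    assert(after >= before);
    return after - before;
}
```

With the v5 layout both queue_notif_config_data and queue_reset move by the size of the new le16 field, so an existing driver would read the wrong register; with the appended variant the existing offsets are unchanged.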
>>
>>   
>>> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>>   \item[\field{queue_device}]
>>>           The driver writes the physical address of Device Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
>>>   +\item[\field{freeze_mode}]
>>> +        The driver writes this to set the freeze mode of virtio pci.
>>> +        VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
>>> +        VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and virtio-pci enters S3 suspension;
>>> +        Other values are reserved for future use, like S4, etc.
>>> +
>> we need to specify these values then.
>>
>> we also need
>> - feature bit to detect support for S3
>> - conformance statements documenting behaviour under S3
>>
>>
>>>   \item[\field{queue_notif_config_data}]
>>>           This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated.
>>>           The driver will use this value when driver sends available buffer
>>> -- 
>>> 2.34.1
>>
>>
>
>
>

--
Best regards,
Jiqian Chen.


2023-09-20 12:33:24

by Parav Pandit

Subject: RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg


> From: Chen, Jiqian <[email protected]>
> Sent: Wednesday, September 20, 2023 12:03 PM

> If driver write 0 to reset device, can the SUSPEND bit be cleared?
It must, as the reset operation resets everything else, and so the suspend state too.

> (pci_pm_resume->virtio_pci_restore->virtio_device_restore-
> >virtio_reset_device)
> If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if
> the reset request is from guest restore process or not, and then I can't change
> the reset behavior.
Reset should not be influenced by suspend.
Suspend should do the work of suspending, and reset should do the reset.

The problem to overcome in [1] is that the resume operation needs to be synchronous, as it involves restoring a large part of the context; hence just asynchronously setting DRIVER_OK is not enough.
The software must read back and verify that the device has resumed operation and is ready to answer requests.

This is a slightly different flow from setting DRIVER_OK during the first-time device initialization sequence, as that does not involve a large restoration.

So, to merge the two ideas: instead of writing DRIVER_OK to resume, the driver should clear the SUSPEND bit and verify that the device is out of SUSPEND.

This is because the driver is still in the _OK_ state, driving the device, while it flips the SUSPEND bit.
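
A rough sketch of that resume flow, written against a mock device, not real virtio-pci accessors. The SUSPEND bit value (0x10), the mock_dev structure and all function names are assumptions for illustration only; the series under discussion defines the actual bit. The mock keeps SUSPEND visible until its (simulated) restoration is finished, which is why the driver has to poll and not just write once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_DRIVER_OK 0x04u /* DRIVER_OK as defined in the virtio spec */
#define STATUS_SUSPEND   0x10u /* placeholder value, assumed for this sketch */

/* Mock device: reports SUSPEND until its restoration has completed. */
struct mock_dev {
    uint8_t status;
    bool resume_requested;
    unsigned reads_until_resumed; /* simulated restoration time, in status reads */
};

void mock_write_status(struct mock_dev *d, uint8_t val)
{
    if (val == 0) {
        /* Reset clears everything, including SUSPEND. */
        d->status = 0;
        d->resume_requested = false;
        return;
    }
    if ((d->status & STATUS_SUSPEND) && !(val & STATUS_SUSPEND)) {
        /* Resume starts now, but SUSPEND stays set until it is done. */
        d->resume_requested = true;
        val |= STATUS_SUSPEND;
    }
    d->status = val;
}

uint8_t mock_read_status(struct mock_dev *d)
{
    if (d->resume_requested) {
        if (d->reads_until_resumed == 0) {
            d->status &= (uint8_t)~STATUS_SUSPEND;
            d->resume_requested = false;
        } else {
            d->reads_until_resumed--;
        }
    }
    return d->status;
}

/* Driver side: clear SUSPEND, then read back until the device confirms.
 * Returns true once SUSPEND reads as clear, false if max_polls is exhausted. */
bool driver_resume(struct mock_dev *d, unsigned max_polls)
{
    assert(d != NULL);

    uint8_t status = mock_read_status(d);
    mock_write_status(d, (uint8_t)(status & ~STATUS_SUSPEND));

    for (unsigned i = 0; i < max_polls; i++) {
        if (!(mock_read_status(d) & STATUS_SUSPEND))
            return true;
    }
    return false;
}
```

Note that DRIVER_OK stays set throughout; only the SUSPEND bit is flipped and read back. A real driver would also need a sensible timeout and an error path, which this sketch reduces to a poll count.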