Hi Jason,
Adding timeout to the cvq is a great idea IMO.
> - /* Spin for a response, the kick causes an ioport write, trapping
> - * into the hypervisor, so the request should be handled immediately.
> - */
> - while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> - !virtqueue_is_broken(vi->cvq))
> - cpu_relax();
> + virtqueue_wait_for_used(vi->cvq, &tmp);
Do you think that we should continue like nothing happened in case of a timeout?
Shouldn't we reset the device?
What happens if a device completes the control command after timeout?
Thanks
Alvaro
Hi Alvaro:
On Thu, Dec 22, 2022 at 2:44 PM Alvaro Karsz <[email protected]> wrote:
>
> Hi Jason,
>
> Adding timeout to the cvq is a great idea IMO.
>
> > - /* Spin for a response, the kick causes an ioport write, trapping
> > - * into the hypervisor, so the request should be handled immediately.
> > - */
> > - while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > - !virtqueue_is_broken(vi->cvq))
> > - cpu_relax();
> > + virtqueue_wait_for_used(vi->cvq, &tmp);
>
> Do you think that we should continue like nothing happened in case of a timeout?
We could, but we should not depend on a device to do this since it's
not reliable. More below.
> Shouldn't we reset the device?
We can't depend on device, there's probably another loop in reset():
E.g in vp_reset() we had:
while (vp_modern_get_status(mdev))
msleep(1);
> What happens if a device completes the control command after timeout?
Maybe we could have a BAD_RING() here in this case (and more check in
vq->broken in this case).
Thanks
>
> Thanks
>
> Alvaro
>
My point is that the device may complete the control command after the timeout,
so, if I'm not mistaken, next time we send a control command and call
virtqueue_wait_for_used we'll get the previous response.
On Thu, Dec 22, 2022 at 11:55 PM Alvaro Karsz
<[email protected]> wrote:
>
> My point is that the device may complete the control command after the timeout,
This needs to be proposed to the virtio spec first. And actually we
need more than this:
1) we still need a way to deal with the device without this feature
2) driver can't depend solely on what is advertised by the device (e.g
device can choose to advertise a very long timeout)
> so, if I'm not mistaken, next time we send a control command and call
> virtqueue_wait_for_used we'll get the previous response.
>
In the next version, I will first put BAD_RING() to prevent future
requests for cvq.
Note that the patch can't fix all the issues, we need more things on
top. But it's a good step and it will behave much better than the
current code.
Thanks
> This needs to be proposed to the virtio spec first. And actually we
> need more than this:
>
> 1) we still need a way to deal with the device without this feature
> 2) driver can't depend solely on what is advertised by the device (e.g
> device can choose to advertise a very long timeout)
I think that I wasn't clear, sorry.
I'm not talking about a new virtio feature, I'm talking about a situation when:
* virtio_net issues a control command.
* the device gets the command, but somehow, completes the command
after timeout.
* virtio_net assumes that the command failed (timeout), and issues a
different control command.
* virtio_net will then call virtqueue_wait_for_used, and will
immediately get the previous response (If I'm not wrong).
So, this is not a new feature that I'm proposing, just a situation
that may occur due to cvq timeouts.
Anyhow, your solution calling BAD_RING if we reach a timeout should
prevent this situation.
Thanks
On Fri, Dec 23, 2022 at 3:39 PM Alvaro Karsz <[email protected]> wrote:
>
> > This needs to be proposed to the virtio spec first. And actually we
> > need more than this:
> >
> > 1) we still need a way to deal with the device without this feature
> > 2) driver can't depend solely on what is advertised by the device (e.g
> > device can choose to advertise a very long timeout)
>
> I think that I wasn't clear, sorry.
> I'm not talking about a new virtio feature, I'm talking about a situation when:
> * virtio_net issues a control command.
> * the device gets the command, but somehow, completes the command
> after timeout.
> * virtio_net assumes that the command failed (timeout), and issues a
> different control command.
> * virtio_net will then call virtqueue_wait_for_used, and will
> immediately get the previous response (If I'm not wrong).
>
> So, this is not a new feature that I'm proposing, just a situation
> that may occur due to cvq timeouts.
>
> Anyhow, your solution calling BAD_RING if we reach a timeout should
> prevent this situation.
Right, that is the goal.
Thanks
>
> Thanks
>