2022-12-22 06:51:53

by Alvaro Karsz

[permalink] [raw]
Subject: Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command

Hi Jason,

Adding timeout to the cvq is a great idea IMO.

> - /* Spin for a response, the kick causes an ioport write, trapping
> - * into the hypervisor, so the request should be handled immediately.
> - */
> - while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> - !virtqueue_is_broken(vi->cvq))
> - cpu_relax();
> + virtqueue_wait_for_used(vi->cvq, &tmp);

Do you think that we should continue like nothing happened in case of a timeout?
Shouldn't we reset the device?
What happens if a device completes the control command after timeout?

Thanks

Alvaro


2022-12-22 08:53:55

by Jason Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command

Hi Alvaro:

On Thu, Dec 22, 2022 at 2:44 PM Alvaro Karsz <[email protected]> wrote:
>
> Hi Jason,
>
> Adding timeout to the cvq is a great idea IMO.
>
> > - /* Spin for a response, the kick causes an ioport write, trapping
> > - * into the hypervisor, so the request should be handled immediately.
> > - */
> > - while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > - !virtqueue_is_broken(vi->cvq))
> > - cpu_relax();
> > + virtqueue_wait_for_used(vi->cvq, &tmp);
>
> Do you think that we should continue like nothing happened in case of a timeout?

We could, but we should not depend on a device to do this since it's
not reliable. More below.

> Shouldn't we reset the device?

We can't depend on device, there's probably another loop in reset():

E.g in vp_reset() we had:

while (vp_modern_get_status(mdev))
msleep(1);

> What happens if a device completes the control command after timeout?

Maybe we could have a BAD_RING() here in this case (and more check in
vq->broken in this case).

Thanks

>
> Thanks
>
> Alvaro
>

2022-12-22 16:07:59

by Alvaro Karsz

[permalink] [raw]
Subject: Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command

My point is that the device may complete the control command after the timeout,
so, if I'm not mistaken, next time we send a control command and call
virtqueue_wait_for_used we'll get the previous response.

2022-12-23 04:23:55

by Jason Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command

On Thu, Dec 22, 2022 at 11:55 PM Alvaro Karsz
<[email protected]> wrote:
>
> My point is that the device may complete the control command after the timeout,

This needs to be proposed to the virtio spec first. And actually we
need more than this:

1) we still need a way to deal with the device without this feature
2) driver can't depend solely on what is advertised by the device (e.g
device can choose to advertise a very long timeout)

> so, if I'm not mistaken, next time we send a control command and call
> virtqueue_wait_for_used we'll get the previous response.
>

In the next version, I will first put BAD_RING() to prevent future
requests for cvq.

Note that the patch can't fix all the issues, we need more things on
top. But it's a good step and it will behave much better than the
current code.

Thanks

2022-12-23 08:09:49

by Alvaro Karsz

[permalink] [raw]
Subject: Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command

> This needs to be proposed to the virtio spec first. And actually we
> need more than this:
>
> 1) we still need a way to deal with the device without this feature
> 2) driver can't depend solely on what is advertised by the device (e.g
> device can choose to advertise a very long timeout)

I think that I wasn't clear, sorry.
I'm not talking about a new virtio feature, I'm talking about a situation when:
* virtio_net issues a control command.
* the device gets the command, but somehow, completes the command
after timeout.
* virtio_net assumes that the command failed (timeout), and issues a
different control command.
* virtio_net will then call virtqueue_wait_for_used, and will
immediately get the previous response (If I'm not wrong).

So, this is not a new feature that I'm proposing, just a situation
that may occur due to cvq timeouts.

Anyhow, your solution calling BAD_RING if we reach a timeout should
prevent this situation.

Thanks

2022-12-26 04:20:23

by Jason Wang

[permalink] [raw]
Subject: Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command

On Fri, Dec 23, 2022 at 3:39 PM Alvaro Karsz <[email protected]> wrote:
>
> > This needs to be proposed to the virtio spec first. And actually we
> > need more than this:
> >
> > 1) we still need a way to deal with the device without this feature
> > 2) driver can't depend solely on what is advertised by the device (e.g
> > device can choose to advertise a very long timeout)
>
> I think that I wasn't clear, sorry.
> I'm not talking about a new virtio feature, I'm talking about a situation when:
> * virtio_net issues a control command.
> * the device gets the command, but somehow, completes the command
> after timeout.
> * virtio_net assumes that the command failed (timeout), and issues a
> different control command.
> * virtio_net will then call virtqueue_wait_for_used, and will
> immediately get the previous response (If I'm not wrong).
>
> So, this is not a new feature that I'm proposing, just a situation
> that may occur due to cvq timeouts.
>
> Anyhow, your solution calling BAD_RING if we reach a timeout should
> prevent this situation.

Right, that is the goal.

Thanks

>
> Thanks
>