Check vring size and fail probe if a transmit/receive vring size is
smaller than MAX_SKB_FRAGS + 2.
At the moment, any vring size is accepted. This is problematic because
it may result in attempting to transmit a packet with more fragments
than there are descriptors in the ring.
Furthermore, it leads to an immediate bug:
The condition: (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) in
virtnet_poll_cleantx and virtnet_poll_tx always evaluates to false,
so netif_tx_wake_queue is not called, leading to TX timeouts.
Signed-off-by: Alvaro Karsz <[email protected]>
---
drivers/net/virtio_net.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2396c28c012..59676252c5c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3745,6 +3745,26 @@ static int init_vqs(struct virtnet_info *vi)
return ret;
}
+static int virtnet_validate_vqs(struct virtnet_info *vi)
+{
+ u32 i, min_size = roundup_pow_of_two(MAX_SKB_FRAGS + 2);
+
+ /* Transmit/Receive vring size must be at least MAX_SKB_FRAGS + 2
+ * (fragments + linear part + virtio header)
+ */
+ for (i = 0; i < vi->max_queue_pairs; i++) {
+ if (virtqueue_get_vring_size(vi->sq[i].vq) < min_size ||
+ virtqueue_get_vring_size(vi->rq[i].vq) < min_size) {
+ dev_warn(&vi->vdev->dev,
+ "Transmit/Receive virtqueue vring size must be at least %u\n",
+ min_size);
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
#ifdef CONFIG_SYSFS
static ssize_t mergeable_rx_buffer_size_show(struct netdev_rx_queue *queue,
char *buf)
@@ -4056,6 +4076,10 @@ static int virtnet_probe(struct virtio_device *vdev)
if (err)
goto free;
+ err = virtnet_validate_vqs(vi);
+ if (err)
+ goto free_vqs;
+
#ifdef CONFIG_SYSFS
if (vi->mergeable_rx_bufs)
dev->sysfs_rx_queue_group = &virtio_net_mrg_rx_group;
--
2.34.1
After further consideration, other virtio drivers need a minimum limit to the vring size too.
Maybe this can be more general, for example a new virtio_driver callback that is called (if implemented) during virtio_dev_probe, before drv->probe.
What do you think?
Thanks,
Alvaro
On Sun, Apr 16, 2023 at 10:46:07AM +0300, Alvaro Karsz wrote:
> Check vring size and fail probe if a transmit/receive vring size is
> smaller than MAX_SKB_FRAGS + 2.
>
> At the moment, any vring size is accepted. This is problematic because
> it may result in attempting to transmit a packet with more fragments
> than there are descriptors in the ring.
>
> Furthermore, it leads to an immediate bug:
>
> The condition: (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) in
> virtnet_poll_cleantx and virtnet_poll_tx always evaluates to false,
> so netif_tx_wake_queue is not called, leading to TX timeouts.
>
> Signed-off-by: Alvaro Karsz <[email protected]>
> ---
> drivers/net/virtio_net.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2396c28c012..59676252c5c 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3745,6 +3745,26 @@ static int init_vqs(struct virtnet_info *vi)
> return ret;
> }
>
> +static int virtnet_validate_vqs(struct virtnet_info *vi)
> +{
> + u32 i, min_size = roundup_pow_of_two(MAX_SKB_FRAGS + 2);
why power of two?
> +
> + /* Transmit/Receive vring size must be at least MAX_SKB_FRAGS + 2
> + * (fragments + linear part + virtio header)
> + */
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + if (virtqueue_get_vring_size(vi->sq[i].vq) < min_size ||
> + virtqueue_get_vring_size(vi->rq[i].vq) < min_size) {
> + dev_warn(&vi->vdev->dev,
> + "Transmit/Receive virtqueue vring size must be at least %u\n",
> + min_size);
> + return -EINVAL;
> + }
> + }
> +
> + return 0;
> +}
> +
> #ifdef CONFIG_SYSFS
> static ssize_t mergeable_rx_buffer_size_show(struct netdev_rx_queue *queue,
> char *buf)
> @@ -4056,6 +4076,10 @@ static int virtnet_probe(struct virtio_device *vdev)
> if (err)
> goto free;
>
> + err = virtnet_validate_vqs(vi);
> + if (err)
> + goto free_vqs;
> +
> #ifdef CONFIG_SYSFS
> if (vi->mergeable_rx_bufs)
> dev->sysfs_rx_queue_group = &virtio_net_mrg_rx_group;
> --
> 2.34.1
On Sun, Apr 16, 2023 at 04:54:57PM +0000, Alvaro Karsz wrote:
> After further consideration, other virtio drivers need a minimum limit to the vring size too.
>
> Maybe this can be more general, for example a new virtio_driver callback that is called (if implemented) during virtio_dev_probe, before drv->probe.
>
> What do you think?
>
> Thanks,
> Alvaro
Let's start with what you did here, when more than 2 drivers do it we'll
move it to core.
--
MST
On Sun, 16 Apr 2023 10:46:07 +0300, Alvaro Karsz <[email protected]> wrote:
> Check vring size and fail probe if a transmit/receive vring size is
> smaller than MAX_SKB_FRAGS + 2.
>
> At the moment, any vring size is accepted. This is problematic because
> it may result in attempting to transmit a packet with more fragments
> than there are descriptors in the ring.
So, why we check the rx ring?
Thanks.
>
> Furthermore, it leads to an immediate bug:
>
> The condition: (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) in
> virtnet_poll_cleantx and virtnet_poll_tx always evaluates to false,
> so netif_tx_wake_queue is not called, leading to TX timeouts.
>
> Signed-off-by: Alvaro Karsz <[email protected]>
> ---
> drivers/net/virtio_net.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2396c28c012..59676252c5c 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3745,6 +3745,26 @@ static int init_vqs(struct virtnet_info *vi)
> return ret;
> }
>
> +static int virtnet_validate_vqs(struct virtnet_info *vi)
> +{
> + u32 i, min_size = roundup_pow_of_two(MAX_SKB_FRAGS + 2);
> +
> + /* Transmit/Receive vring size must be at least MAX_SKB_FRAGS + 2
> + * (fragments + linear part + virtio header)
> + */
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + if (virtqueue_get_vring_size(vi->sq[i].vq) < min_size ||
> + virtqueue_get_vring_size(vi->rq[i].vq) < min_size) {
> + dev_warn(&vi->vdev->dev,
> + "Transmit/Receive virtqueue vring size must be at least %u\n",
> + min_size);
> + return -EINVAL;
> + }
> + }
> +
> + return 0;
> +}
> +
> #ifdef CONFIG_SYSFS
> static ssize_t mergeable_rx_buffer_size_show(struct netdev_rx_queue *queue,
> char *buf)
> @@ -4056,6 +4076,10 @@ static int virtnet_probe(struct virtio_device *vdev)
> if (err)
> goto free;
>
> + err = virtnet_validate_vqs(vi);
> + if (err)
> + goto free_vqs;
> +
> #ifdef CONFIG_SYSFS
> if (vi->mergeable_rx_bufs)
> dev->sysfs_rx_queue_group = &virtio_net_mrg_rx_group;
> --
> 2.34.1
>
On Mon, Apr 17, 2023 at 4:45 AM Michael S. Tsirkin <[email protected]> wrote:
>
> On Sun, Apr 16, 2023 at 04:54:57PM +0000, Alvaro Karsz wrote:
> > After further consideration, other virtio drivers need a minimum limit to the vring size too.
> >
> > Maybe this can be more general, for example a new virtio_driver callback that is called (if implemented) during virtio_dev_probe, before drv->probe.
> >
> > What do you think?
> >
> > Thanks,
> > Alvaro
>
> Let's start with what you did here, when more than 2 drivers do it we'll
> move it to core.
I wonder how hard it is to let virtio support small vring size?
Thanks
>
> --
> MST
>
On Sun, 16 Apr 2023 10:46:07 +0300, Alvaro Karsz <[email protected]> wrote:
> Check vring size and fail probe if a transmit/receive vring size is
> smaller than MAX_SKB_FRAGS + 2.
>
> At the moment, any vring size is accepted. This is problematic because
> it may result in attempting to transmit a packet with more fragments
> than there are descriptors in the ring.
>
> Furthermore, it leads to an immediate bug:
>
> The condition: (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) in
> virtnet_poll_cleantx and virtnet_poll_tx always evaluates to false,
> so netif_tx_wake_queue is not called, leading to TX timeouts.
>
> Signed-off-by: Alvaro Karsz <[email protected]>
> ---
> drivers/net/virtio_net.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2396c28c012..59676252c5c 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3745,6 +3745,26 @@ static int init_vqs(struct virtnet_info *vi)
> return ret;
> }
>
> +static int virtnet_validate_vqs(struct virtnet_info *vi)
> +{
> + u32 i, min_size = roundup_pow_of_two(MAX_SKB_FRAGS + 2);
> +
> + /* Transmit/Receive vring size must be at least MAX_SKB_FRAGS + 2
> + * (fragments + linear part + virtio header)
> + */
> + for (i = 0; i < vi->max_queue_pairs; i++) {
> + if (virtqueue_get_vring_size(vi->sq[i].vq) < min_size ||
> + virtqueue_get_vring_size(vi->rq[i].vq) < min_size) {
> + dev_warn(&vi->vdev->dev,
> + "Transmit/Receive virtqueue vring size must be at least %u\n",
> + min_size);
> + return -EINVAL;
> + }
> + }
> +
> + return 0;
> +}
> +
> #ifdef CONFIG_SYSFS
> static ssize_t mergeable_rx_buffer_size_show(struct netdev_rx_queue *queue,
> char *buf)
> @@ -4056,6 +4076,10 @@ static int virtnet_probe(struct virtio_device *vdev)
> if (err)
> goto free;
>
> + err = virtnet_validate_vqs(vi);
> + if (err)
> + goto free_vqs;
> +
I wonder whether is better moving this to virtnet_find_vqs?
Thanks
> #ifdef CONFIG_SYSFS
> if (vi->mergeable_rx_bufs)
> dev->sysfs_rx_queue_group = &virtio_net_mrg_rx_group;
> --
> 2.34.1
>
On Mon, Apr 17, 2023 at 11:24:16AM +0800, Jason Wang wrote:
> On Mon, Apr 17, 2023 at 4:45 AM Michael S. Tsirkin <[email protected]> wrote:
> >
> > On Sun, Apr 16, 2023 at 04:54:57PM +0000, Alvaro Karsz wrote:
> > > After further consideration, other virtio drivers need a minimum limit to the vring size too.
> > >
> > > Maybe this can be more general, for example a new virtio_driver callback that is called (if implemented) during virtio_dev_probe, before drv->probe.
> > >
> > > What do you think?
> > >
> > > Thanks,
> > > Alvaro
> >
> > Let's start with what you did here, when more than 2 drivers do it we'll
> > move it to core.
>
> I wonder how hard it is to let virtio support small vring size?
>
> Thanks
Actually, I think that all you need to do is disable NETIF_F_SG,
and things will work, no?
Alvaro, can you try?
> >
> > --
> > MST
> >
> Actually, I think that all you need to do is disable NETIF_F_SG,
> and things will work, no?
I think that this is not so simple, if I understand correctly, by disabling NETIF_F_SG we will never receive a chained skbs to transmit, but we still have more functionality to address, for example:
* The TX timeouts.
* Guest GSO/big MTU (without VIRTIO_NET_F_MRG_RXBUF?), we can't chain page size buffers anymore.
> Alvaro, can you try?
It won't matter at the moment, we'll get TX timeout after the first tx packet, we need to address this part as well.
On Mon, Apr 17, 2023 at 06:38:43AM +0000, Alvaro Karsz wrote:
> > Actually, I think that all you need to do is disable NETIF_F_SG,
> > and things will work, no?
>
> I think that this is not so simple, if I understand correctly, by disabling NETIF_F_SG we will never receive a chained skbs to transmit, but we still have more functionality to address, for example:
> * The TX timeouts.
I don't get it. With a linear skb we can transmit it as long as there's
space for 2 entries in the vq: header and data. What's the source of the
timeouts?
> * Guest GSO/big MTU (without VIRTIO_NET_F_MRG_RXBUF?), we can't chain page size buffers anymore.
I think we can. mergeable_min_buf_len will just be large.
> > Alvaro, can you try?
>
> It won't matter at the moment, we'll get TX timeout after the first tx packet, we need to address this part as well.
> > +static int virtnet_validate_vqs(struct virtnet_info *vi)
> > +{
> > + u32 i, min_size = roundup_pow_of_two(MAX_SKB_FRAGS + 2);
>
> why power of two?
The ring size is always a power of 2, so checking against MAX_SKB_FRAGS + 2 or against roundup_pow_of_two will result in the same, and I think that printing the warning with the actual min value is more helpful.
I can check the condition against MAX_SKB_FRAGS + 2, and print the rounded value in case of an error.
On Mon, 17 Apr 2023 06:38:43 +0000, Alvaro Karsz <[email protected]> wrote:
> > Actually, I think that all you need to do is disable NETIF_F_SG,
> > and things will work, no?
>
> I think that this is not so simple, if I understand correctly, by disabling NETIF_F_SG we will never receive a chained skbs to transmit, but we still have more functionality to address, for example:
> * The TX timeouts.
Why tx timeout without frags?
> * Guest GSO/big MTU (without VIRTIO_NET_F_MRG_RXBUF?), we can't chain page size buffers anymore.
Or, we disable the GUEST_GSO, HOST_GSO......
Thanks.
>
> > Alvaro, can you try?
>
> It won't matter at the moment, we'll get TX timeout after the first tx packet, we need to address this part as well.
> > Check vring size and fail probe if a transmit/receive vring size is
> > smaller than MAX_SKB_FRAGS + 2.
> >
> > At the moment, any vring size is accepted. This is problematic because
> > it may result in attempting to transmit a packet with more fragments
> > than there are descriptors in the ring.
>
> So, why we check the rx ring?
>
You're right, the rx check should be a little more complicated.
It depends on the negotiated features, like VIRTIO_NET_F_MTU, any guest GSO, VIRTIO_NET_F_MRG_RXBUF.
But MAX_SKB_FRAGS + 2 covers all the rx scenarios.
We may be able to accept smaller rx rings if for example none of the above features are negotiated.
If you think that this is necessary, we can do a more complex rx check.
> > > Actually, I think that all you need to do is disable NETIF_F_SG,
> > > and things will work, no?
> >
> > I think that this is not so simple, if I understand correctly, by disabling NETIF_F_SG we will never receive a chained skbs to transmit, but we still have more functionality to address, for example:
> > * The TX timeouts.
>
> I don't get it. With a linear skb we can transmit it as long as there's
> space for 2 entries in the vq: header and data. What's the source of the
> timeouts?
>
I'm not saying that this is not possible, I meant that we need more changes to virtio-net.
The source of the timeouts is from the current implementation of virtnet_poll_tx.
if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
netif_tx_wake_queue(txq);
> > * Guest GSO/big MTU (without VIRTIO_NET_F_MRG_RXBUF?), we can't chain page size buffers anymore.
>
> I think we can. mergeable_min_buf_len will just be large.
>
I meant that we can't just by clearing NETIF_F_SG, we'll need to change virtio-net a little bit more, for example, the virtnet_set_big_packets function.
> Why tx timeout without frags?
Please see my response to Michael.
> > * Guest GSO/big MTU (without VIRTIO_NET_F_MRG_RXBUF?), we can't chain page size buffers anymore.
>
>
> Or, we disable the GUEST_GSO, HOST_GSO......
>
And disable VIRTIO_NET_F_MTU, quoting the spec:
"A driver SHOULD negotiate VIRTIO_NET_F_MTU if the device offers it."
We can find a way around using buffers bigger than a page size like Michael implied.
On Mon, Apr 17, 2023 at 07:03:52AM +0000, Alvaro Karsz wrote:
> > > > Actually, I think that all you need to do is disable NETIF_F_SG,
> > > > and things will work, no?
> > >
> > > I think that this is not so simple, if I understand correctly, by disabling NETIF_F_SG we will never receive a chained skbs to transmit, but we still have more functionality to address, for example:
> > > * The TX timeouts.
> >
> > I don't get it. With a linear skb we can transmit it as long as there's
> > space for 2 entries in the vq: header and data. What's the source of the
> > timeouts?
> >
>
> I'm not saying that this is not possible, I meant that we need more changes to virtio-net.
> The source of the timeouts is from the current implementation of virtnet_poll_tx.
>
> if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> netif_tx_wake_queue(txq);
Oh right. So this should check NETIF_F_SG then.
BTW both ring size and s/g can be tweaked by ethtool, also
needs handling.
>
> > > * Guest GSO/big MTU (without VIRTIO_NET_F_MRG_RXBUF?), we can't chain page size buffers anymore.
> >
> > I think we can. mergeable_min_buf_len will just be large.
> >
>
> I meant that we can't just by clearing NETIF_F_SG, we'll need to change virtio-net a little bit more, for example, the virtnet_set_big_packets function.
>
Right - for RX, big_packets_num_skbfrags ignores ring size and that's
probably a bug if mtu is very large.
--
MST
On Mon, Apr 17, 2023 at 07:07:59AM +0000, Alvaro Karsz wrote:
> > Why tx timeout without frags?
>
> Please see my response to Michael.
>
> > > * Guest GSO/big MTU (without VIRTIO_NET_F_MRG_RXBUF?), we can't chain page size buffers anymore.
> >
> >
> > Or, we disable the GUEST_GSO, HOST_GSO......
> >
> And disable VIRTIO_NET_F_MTU, quoting the spec:
> "A driver SHOULD negotiate VIRTIO_NET_F_MTU if the device offers it."
If you don't the nic can still get jumbo packets it will just drop them ...
> We can find a way around using buffers bigger than a page size like Michael implied.
> > > > > Actually, I think that all you need to do is disable NETIF_F_SG,
> > > > > and things will work, no?
> > > >
> > > > I think that this is not so simple, if I understand correctly, by disabling NETIF_F_SG we will never receive a chained skbs to transmit, but we still have more functionality to address, for example:
> > > > * The TX timeouts.
> > >
> > > I don't get it. With a linear skb we can transmit it as long as there's
> > > space for 2 entries in the vq: header and data. What's the source of the
> > > timeouts?
> > >
> >
> > I'm not saying that this is not possible, I meant that we need more changes to virtio-net.
> > The source of the timeouts is from the current implementation of virtnet_poll_tx.
> >
> > if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> > netif_tx_wake_queue(txq);
>
> Oh right. So this should check NETIF_F_SG then.
> BTW both ring size and s/g can be tweaked by ethtool, also
> needs handling.
>
Good point.
> >
> > > > * Guest GSO/big MTU (without VIRTIO_NET_F_MRG_RXBUF?), we can't chain page size buffers anymore.
> > >
> > > I think we can. mergeable_min_buf_len will just be large.
> > >
> >
> > I meant that we can't just by clearing NETIF_F_SG, we'll need to change virtio-net a little bit more, for example, the virtnet_set_big_packets function.
> >
>
> Right - for RX, big_packets_num_skbfrags ignores ring size and that's
> probably a bug if mtu is very large.
>
So, what do you think, we should fix virtio-net to work with smaller rings? we should fail probe?
I think that since this never came up until now, there is no big demand to such small rings.
On Mon, Apr 17, 2023 at 07:33:58AM +0000, Alvaro Karsz wrote:
> > > > > > Actually, I think that all you need to do is disable NETIF_F_SG,
> > > > > > and things will work, no?
> > > > >
> > > > > I think that this is not so simple, if I understand correctly, by disabling NETIF_F_SG we will never receive a chained skbs to transmit, but we still have more functionality to address, for example:
> > > > > * The TX timeouts.
> > > >
> > > > I don't get it. With a linear skb we can transmit it as long as there's
> > > > space for 2 entries in the vq: header and data. What's the source of the
> > > > timeouts?
> > > >
> > >
> > > I'm not saying that this is not possible, I meant that we need more changes to virtio-net.
> > > The source of the timeouts is from the current implementation of virtnet_poll_tx.
> > >
> > > if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> > > netif_tx_wake_queue(txq);
> >
> > Oh right. So this should check NETIF_F_SG then.
> > BTW both ring size and s/g can be tweaked by ethtool, also
> > needs handling.
> >
>
> Good point.
>
> > >
> > > > > * Guest GSO/big MTU (without VIRTIO_NET_F_MRG_RXBUF?), we can't chain page size buffers anymore.
> > > >
> > > > I think we can. mergeable_min_buf_len will just be large.
> > > >
> > >
> > > I meant that we can't just by clearing NETIF_F_SG, we'll need to change virtio-net a little bit more, for example, the virtnet_set_big_packets function.
> > >
> >
> > Right - for RX, big_packets_num_skbfrags ignores ring size and that's
> > probably a bug if mtu is very large.
> >
>
> So, what do you think, we should fix virtio-net to work with smaller rings? we should fail probe?
>
> I think that since this never came up until now, there is no big demand to such small rings.
The worry is that once we start failing probe there's just a tiny chance
hosts begin to rely on us failing probe then we won't be able to fix it.
So it depends on the size of the patch I think. So far it seems small enough
that wasting code on failing probe isn't worth it.
--
MST
> > So, what do you think, we should fix virtio-net to work with smaller rings? we should fail probe?
> >
> > I think that since this never came up until now, there is no big demand to such small rings.
>
> The worry is that once we start failing probe there's just a tiny chance
> hosts begin to rely on us failing probe then we won't be able to fix it.
> So it depends on the size of the patch I think. So far it seems small enough
> that wasting code on failing probe isn't worth it.
>
I see your point.
Regardless, we'll need to fail probe in some cases.
ring size of 1 for example (if I'm not mistaken)
control vq even needs a bigger ring.
Maybe we can fix virtnet to allow smaller rings + fail probe in some cases, all in the same patch/patchset.
> > I see your point.
> > Regardless, we'll need to fail probe in some cases.
> > ring size of 1 for example (if I'm not mistaken)
>
> Hmm. We can make it work if we increase hard header size, then
> there will always be room for vnet header.
>
> > control vq even needs a bigger ring.
>
> Why does it?
At the moment, most of the commands chain 3 descriptors:
1 - class + command
2 - command specific
3 - ack
We could merge 1 and 2 into a single one, both are read only for the device, so I take it back, it won't need a bigger ring.
But it will need 2 descriptors at least(1 read only for the device and 1 write only for the device), so we still need to fail probe sometimes.
On Mon, Apr 17, 2023 at 10:04:56AM +0000, Alvaro Karsz wrote:
> > > So, what do you think, we should fix virtio-net to work with smaller rings? we should fail probe?
> > >
> > > I think that since this never came up until now, there is no big demand to such small rings.
> >
> > The worry is that once we start failing probe there's just a tiny chance
> > hosts begin to rely on us failing probe then we won't be able to fix it.
> > So it depends on the size of the patch I think. So far it seems small enough
> > that wasting code on failing probe isn't worth it.
> >
>
> I see your point.
> Regardless, we'll need to fail probe in some cases.
> ring size of 1 for example (if I'm not mistaken)
Hmm. We can make it work if we increase hard header size, then
there will always be room for vnet header.
> control vq even needs a bigger ring.
Why does it?
> Maybe we can fix virtnet to allow smaller rings + fail probe in some cases, all in the same patch/patchset.
If we can't make it work then yes.
--
MST
On Mon, Apr 17, 2023 at 11:51:22AM +0000, Alvaro Karsz wrote:
> > > I see your point.
> > > Regardless, we'll need to fail probe in some cases.
> > > ring size of 1 for example (if I'm not mistaken)
> >
> > Hmm. We can make it work if we increase hard header size, then
> > there will always be room for vnet header.
> >
> > > control vq even needs a bigger ring.
> >
> > Why does it?
>
> At the moment, most of the commands chain 3 descriptors:
> 1 - class + command
> 2 - command specific
> 3 - ack
>
> We could merge 1 and 2 into a single one, both are read only for the device, so I take it back, it won't need a bigger ring.
> But it will need 2 descriptors at least(1 read only for the device and 1 write only for the device), so we still need to fail probe sometimes.
>
Yes that makes sense, it's architetural. We can disable ctrl vq though.
--
MST
On Sun, Apr 23, 2023 at 06:51:46AM +0000, Alvaro Karsz wrote:
> > Yes that makes sense, it's architetural. We can disable ctrl vq though.
>
> The problem here is that we know the vring size after calling virtnet_find_vqs, so the number of VQs already includes the control VQ.
>
> Actually, many variables/settings that are initialized before we call virtnet_find_vqs may need modifications if we use small vrings.
> For example has_rss_hash_report, has_rss, hdr_len etc..
>
> We could have a fixup function to fix everything after we discover that we are using small vrings, but, honestly, I think that this will be hard to maintain in the future, and I don't like this approach much.
>
> The ideal thing will be to discover if we use small vrings in probe's beginning.
>
> I'm looking for a way at the moment.
Hmm. I was wrong. There is no way to disable CVQ feature bit.
1. Reset the device.
2. Set the ACKNOWLEDGE status bit: the guest OS has notice the device.
3. Set the DRIVER status bit: the guest OS knows how to drive the device.
4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the
device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration
fields to check that it can support the device before accepting it.
5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step.
6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not
support our subset of features and the device is unusable.
7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup,
reading and possibly writing the device’s virtio configuration space, and population of virtqueues.
8. Set the DRIVER_OK status bit. At this point the device is “live”.
So features are confirmed before find vqs.
The rest of stuff can probably just be moved to after find_vqs without
much pain.
So if cvq is too small we can either
- probe but avoid using cvq
or
- fail probe
--
MST
> Yes that makes sense, it's architetural. We can disable ctrl vq though.
The problem here is that we know the vring size after calling virtnet_find_vqs, so the number of VQs already includes the control VQ.
Actually, many variables/settings that are initialized before we call virtnet_find_vqs may need modifications if we use small vrings.
For example has_rss_hash_report, has_rss, hdr_len etc..
We could have a fixup function to fix everything after we discover that we are using small vrings, but, honestly, I think that this will be hard to maintain in the future, and I don't like this approach much.
The ideal thing will be to discover if we use small vrings in probe's beginning.
I'm looking for a way at the moment.
> Hmm. I was wrong. There is no way to disable CVQ feature bit.
>
> 1. Reset the device.
> 2. Set the ACKNOWLEDGE status bit: the guest OS has notice the device.
> 3. Set the DRIVER status bit: the guest OS knows how to drive the device.
> 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the
> device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration
> fields to check that it can support the device before accepting it.
> 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step.
> 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not
> support our subset of features and the device is unusable.
> 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-> bus setup,
> reading and possibly writing the device?s virtio configuration space, and population of virtqueues.
> 8. Set the DRIVER_OK status bit. At this point the device is ?live?.
>
>
> So features are confirmed before find vqs.
>
> The rest of stuff can probably just be moved to after find_vqs without
> much pain.
>
Actually, I think that with a little bit of pain :)
If we use small vrings and a GRO feature bit is set, Linux will need to allocate 64KB of continuous memory for every receive descriptor..
Instead of failing probe if GRO/CVQ are set, can we just reset the device if we discover small vrings and start over?
Can we remember that this device uses small vrings, and then just avoid negotiating the features that we cannot support?
We could add a new virtio_config_ops: peek_vqs.
We can call it during virtnet_validate, and then fixup the features in case of small vrings.
If peek_vqs is not implemented by the transport, we can just fail probe later in case of small vrings.
On Sun, Apr 23, 2023 at 07:52:10AM +0000, Alvaro Karsz wrote:
> > Hmm. I was wrong. There is no way to disable CVQ feature bit.
> >
> > 1. Reset the device.
> > 2. Set the ACKNOWLEDGE status bit: the guest OS has notice the device.
> > 3. Set the DRIVER status bit: the guest OS knows how to drive the device.
> > 4. Read device feature bits, and write the subset of feature bits understood by the OS and driver to the
> > device. During this step the driver MAY read (but MUST NOT write) the device-specific configuration
> > fields to check that it can support the device before accepting it.
> > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits after this step.
> > 6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, the device does not
> > support our subset of features and the device is unusable.
> > 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-> bus setup,
> > reading and possibly writing the device’s virtio configuration space, and population of virtqueues.
> > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> >
> >
> > So features are confirmed before find vqs.
> >
> > The rest of stuff can probably just be moved to after find_vqs without
> > much pain.
> >
> Actually, I think that with a little bit of pain :)
> If we use small vrings and a GRO feature bit is set, Linux will need to allocate 64KB of continuous memory for every receive descriptor..
Oh right. Hmm. Well this is same as big packets though, isn't it?
> Instead of failing probe if GRO/CVQ are set, can we just reset the device if we discover small vrings and start over?
> Can we remember that this device uses small vrings, and then just avoid negotiating the features that we cannot support?
We technically can of course. I am just not sure supporting CVQ with just 1 s/g entry will
ever be viable.
--
MST
On Sun, Apr 23, 2023 at 08:01:35AM +0000, Alvaro Karsz wrote:
> We could add a new virtio_config_ops: peek_vqs.
> We can call it during virtnet_validate, and then fixup the features in case of small vrings.
>
> If peek_vqs is not implemented by the transport, we can just fail probe later in case of small vrings.
>
Nope, we can't. Driver is not supposed to discover vqs before
FEATURES_OK, the vq size might depend on features.
On Mon, Apr 17, 2023 at 06:43:39AM +0000, Alvaro Karsz wrote:
> > > +static int virtnet_validate_vqs(struct virtnet_info *vi)
> > > +{
> > > + u32 i, min_size = roundup_pow_of_two(MAX_SKB_FRAGS + 2);
> >
> > why power of two?
>
> The ring size is always a power of 2,
Not really, packed rings allow non power of 2.
Linux had a bug that it required power of 2 for packed, but
we are fixing that finally.
> so checking against
> MAX_SKB_FRAGS + 2 or against roundup_pow_of_two will result in the
> same, and I think that printing the warning with the actual min value
> is more helpful. I can check the condition against MAX_SKB_FRAGS + 2,
> and print the rounded value in case of an error.
>
> > > The rest of stuff can probably just be moved to after find_vqs without
> > > much pain.
> > >
> > Actually, I think that with a little bit of pain :)
> > If we use small vrings and a GRO feature bit is set, Linux will need to allocate 64KB of continuous memory for every receive descriptor..
>
> Oh right. Hmm. Well this is same as big packets though, isn't it?
>
Well, when VIRTIO_NET_F_MRG_RXBUF is not negotiated and one of the GRO features is, the receive buffers are page size buffers chained together to form a 64K buffer.
In this case, do all the chained descriptors actually point to a single block of continuous memory, or is it possible for the descriptors to point to pages spread all over?
>
> > Instead of failing probe if GRO/CVQ are set, can we just reset the device if we discover small vrings and start over?
> > Can we remember that this device uses small vrings, and then just avoid negotiating the features that we cannot support?
>
>
> We technically can of course. I am just not sure supporting CVQ with just 1 s/g entry will
> ever be viable.
Even if we won't support 1 s/g entry, do we want to fail probe in such cases?
We could just disable the CVQ feature (with reset, as suggested before).
I'm not saying that we should, just raising the option.
On Sun, Apr 23, 2023 at 12:28:49PM +0000, Alvaro Karsz wrote:
>
> > > > The rest of stuff can probably just be moved to after find_vqs without
> > > > much pain.
> > > >
> > > Actually, I think that with a little bit of pain :)
> > > If we use small vrings and a GRO feature bit is set, Linux will need to allocate 64KB of continuous memory for every receive descriptor..
> >
> > Oh right. Hmm. Well this is same as big packets though, isn't it?
> >
>
> Well, when VIRTIO_NET_F_MRG_RXBUF is not negotiated and one of the GRO features is, the receive buffers are page size buffers chained together to form a 64K buffer.
> In this case, do all the chained descriptors actually point to a single block of continuous memory, or is it possible for the descriptors to point to pages spread all over?
>
> >
> > > Instead of failing probe if GRO/CVQ are set, can we just reset the device if we discover small vrings and start over?
> > > Can we remember that this device uses small vrings, and then just avoid negotiating the features that we cannot support?
> >
> >
> > We technically can of course. I am just not sure supporting CVQ with just 1 s/g entry will
> > ever be viable.
>
> Even if we won't support 1 s/g entry, do we want to fail probe in such cases?
> We could just disable the CVQ feature (with reset, as suggested before).
> I'm not saying that we should, just raising the option.
>
OK I'm convinced, reset and re-negotiate seems cleaner.
--
MST
On Sun, Apr 23, 2023 at 12:28:49PM +0000, Alvaro Karsz wrote:
>
> > > > The rest of stuff can probably just be moved to after find_vqs without
> > > > much pain.
> > > >
> > > Actually, I think that with a little bit of pain :)
> > > If we use small vrings and a GRO feature bit is set, Linux will need to allocate 64KB of continuous memory for every receive descriptor..
> >
> > Oh right. Hmm. Well this is same as big packets though, isn't it?
> >
>
> Well, when VIRTIO_NET_F_MRG_RXBUF is not negotiated and one of the GRO features is, the receive buffers are page size buffers chained together to form a 64K buffer.
> In this case, do all the chained descriptors actually point to a single block of continuous memory, or is it possible for the descriptors to point to pages spread all over?
>
> >
> > > Instead of failing probe if GRO/CVQ are set, can we just reset the device if we discover small vrings and start over?
> > > Can we remember that this device uses small vrings, and then just avoid negotiating the features that we cannot support?
> >
> >
> > We technically can of course. I am just not sure supporting CVQ with just 1 s/g entry will
> > ever be viable.
>
> Even if we won't support 1 s/g entry, do we want to fail probe in such cases?
> We could just disable the CVQ feature (with reset, as suggested before).
> I'm not saying that we should, just raising the option.
>
So, let's add some funky flags in virtio device to block out
features, have core compare these before and after,
detect change, reset and retry?
--
MST
> So, let's add some funky flags in virtio device to block out
> features, have core compare these before and after,
> detect change, reset and retry?
In the virtnet case, we'll decide which features to block based on the ring size.
2 < ring < MAX_FRAGS + 2 -> BLOCK GRO + MRG_RXBUF
ring < 2 -> BLOCK GRO + MRG_RXBUF + CTRL_VQ
So we'll need a new virtio callback instead of flags.
Furthermore, other virtio drivers may decide which features to block based on parameters different than ring size (I don't have a good example at the moment).
So maybe we should leave it to the driver to handle (during probe), and offer a virtio core function to re-negotiate the features?
In the solution I'm working on, I expose a new virtio core function that resets the device and renegotiates the received features.
+ A new virtio_config_ops callback peek_vqs_len to peek at the VQ lengths before calling find_vqs. (The callback must be called after the features negotiation)
So, the flow is something like:
* Super early in virtnet probe, we peek at the VQ lengths and decide if we are
using small vrings, if so, we reset and renegotiate the features.
* We continue normally and create the VQs.
* We check if the created rings are small.
If they are and some blocked features were negotiated anyway (may occur if
the re-negotiation fails, or if the transport has no implementation for
peek_vqs_len), we fail probe.
If the ring is small and the features are ok, we mark the virtnet device as
vring_small and fixup some variables.
peek_vqs_len is needed because we must know the VQ length before calling init_vqs.
During virtnet_find_vqs we check the following:
vi->has_cvq
vi->big_packets
vi->mergeable_rx_bufs
But these will change if the ring is small..
(Of course, another solution will be to re-negotiate features after init_vqs, but this will make a big mess, tons of things to clean and reconfigure)
The 2 < ring < MAX_FRAGS + 2 part is ready, I have tested a few cases and it is working.
I'm considering splitting the effort into 2 series.
A 2 < ring < MAX_FRAGS + 2 series, and a follow up series with the ring < 2 case.
I'm also thinking about sending the first series as an RFC soon, so it will be more broadly tested.
What do you think?
> > So, let's add some funky flags in virtio device to block out
> > features, have core compare these before and after,
> > detect change, reset and retry?
>
> In the virtnet case, we'll decide which features to block based on the ring size.
> 2 < ring < MAX_FRAGS + 2 -> BLOCK GRO + MRG_RXBUF
> ring < 2 -> BLOCK GRO + MRG_RXBUF + CTRL_VQ
>
> So we'll need a new virtio callback instead of flags.
>
> Furthermore, other virtio drivers may decide which features to block based on parameters different than ring size (I don't have a good example at the moment).
> So maybe we should leave it to the driver to handle (during probe), > and offer a virtio core function to re-negotiate the features?
>
> In the solution I'm working on, I expose a new virtio core function that resets the device and renegotiates the received features.
> + A new virtio_config_ops callback peek_vqs_len to peek at the VQ lengths before calling find_vqs. (The callback must be called after the features negotiation)
>
> So, the flow is something like:
>
> * Super early in virtnet probe, we peek at the VQ lengths and decide if we are
> using small vrings, if so, we reset and renegotiate the features.
> * We continue normally and create the VQs.
> * We check if the created rings are small.
> If they are and some blocked features were negotiated anyway (may occur if
> the re-negotiation fails, or if the transport has no implementation for
> peek_vqs_len), we fail probe.
Small fix: if the re-negotiation fails, we fail probe immediately.
The only way to negotiate blocked features with a small vring is if the transport has no implementation for peek_vqs_len.
> If the ring is small and the features are ok, we mark the virtnet device as
> vring_small and fixup some variables.
>
>
> peek_vqs_len is needed because we must know the VQ length before calling init_vqs.
>
> During virtnet_find_vqs we check the following:
> vi->has_cvq
> vi->big_packets
> vi->mergeable_rx_bufs
>
> But these will change if the ring is small..
>
> (Of course, another solution will be to re-negotiate features after init_vqs, but this will make a big mess, tons of things to clean and reconfigure)
>
>
> The 2 < ring < MAX_FRAGS + 2 part is ready, I have tested a few cases and it is working.
>
> I'm considering splitting the effort into 2 series.
> A 2 < ring < MAX_FRAGS + 2 series, and a follow up series with the ring < 2 case.
>
> I'm also thinking about sending the first series as an RFC soon, so it will be more broadly tested.
>
> What do you think?
>
On Tue, Apr 25, 2023 at 09:41:35AM +0000, Alvaro Karsz wrote:
> > So, let's add some funky flags in virtio device to block out
> > features, have core compare these before and after,
> > detect change, reset and retry?
>
> In the virtnet case, we'll decide which features to block based on the ring size.
> 2 < ring < MAX_FRAGS + 2 -> BLOCK GRO + MRG_RXBUF
> ring < 2 -> BLOCK GRO + MRG_RXBUF + CTRL_VQ
why MRG_RXBUF? what does it matter?
> So we'll need a new virtio callback instead of flags.
> Furthermore, other virtio drivers may decide which features to block based on parameters different than ring size (I don't have a good example at the moment).
> So maybe we should leave it to the driver to handle (during probe), and offer a virtio core function to re-negotiate the features?
>
> In the solution I'm working on, I expose a new virtio core function that resets the device and renegotiates the received features.
> + A new virtio_config_ops callback peek_vqs_len to peek at the VQ lengths before calling find_vqs. (The callback must be called after the features negotiation)
>
> So, the flow is something like:
>
> * Super early in virtnet probe, we peek at the VQ lengths and decide if we are
> using small vrings, if so, we reset and renegotiate the features.
Using which APIs? What does peek_vqs_len do and why does it matter that
it is super early?
> * We continue normally and create the VQs.
> * We check if the created rings are small.
> If they are and some blocked features were negotiated anyway (may occur if
> the re-negotiation fails, or if the transport has no implementation for
> peek_vqs_len), we fail probe.
> If the ring is small and the features are ok, we mark the virtnet device as
> vring_small and fixup some variables.
>
>
> peek_vqs_len is needed because we must know the VQ length before calling init_vqs.
>
> During virtnet_find_vqs we check the following:
> vi->has_cvq
> vi->big_packets
> vi->mergeable_rx_bufs
>
> But these will change if the ring is small..
>
> (Of course, another solution will be to re-negotiate features after init_vqs, but this will make a big mess, tons of things to clean and reconfigure)
>
>
> The 2 < ring < MAX_FRAGS + 2 part is ready, I have tested a few cases and it is working.
>
> I'm considering splitting the effort into 2 series.
> A 2 < ring < MAX_FRAGS + 2 series, and a follow up series with the ring < 2 case.
>
> I'm also thinking about sending the first series as an RFC soon, so it will be more broadly tested.
>
> What do you think?
Lots of work spilling over to transports.
And I especially don't like that it slows down boot on good path.
I have the following idea:
- add a blocked features value in virtio_device
- before calling probe, core saves blocked features
- if probe fails, checks blocked features.
if any were added, reset, negotiate all features
except blocked ones and do the validate/probe dance again
This will mean mostly no changes to drivers: just check condition,
block feature and fail probe.
--
MST
On Tue, Apr 25, 2023 at 11:11:54AM +0000, Alvaro Karsz wrote:
> > > So, let's add some funky flags in virtio device to block out
> > > features, have core compare these before and after,
> > > detect change, reset and retry?
> >
> > In the virtnet case, we'll decide which features to block based on the ring size.
> > 2 < ring < MAX_FRAGS + 2 -> BLOCK GRO + MRG_RXBUF
> > ring < 2 -> BLOCK GRO + MRG_RXBUF + CTRL_VQ
> >
> > So we'll need a new virtio callback instead of flags.
> >
> > Furthermore, other virtio drivers may decide which features to block based on parameters different than ring size (I don't have a good example at the moment).
> > So maybe we should leave it to the driver to handle (during probe), > and offer a virtio core function to re-negotiate the features?
> >
> > In the solution I'm working on, I expose a new virtio core function that resets the device and renegotiates the received features.
> > + A new virtio_config_ops callback peek_vqs_len to peek at the VQ lengths before calling find_vqs. (The callback must be called after the features negotiation)
> >
> > So, the flow is something like:
> >
> > * Super early in virtnet probe, we peek at the VQ lengths and decide if we are
> > using small vrings, if so, we reset and renegotiate the features.
> > * We continue normally and create the VQs.
> > * We check if the created rings are small.
> > If they are and some blocked features were negotiated anyway (may occur if
> > the re-negotiation fails, or if the transport has no implementation for
> > peek_vqs_len), we fail probe.
>
> Small fix: if the re-negotiation fails, we fail probe immediately.
> The only way to negotiate blocked features with a small vring is if the transport has no implementation for peek_vqs_len.
with my idea, you can go iteratively: fail one condition, core will
retry with a feature blocked, we can block more, retry again.
up to 64 times :)
> > If the ring is small and the features are ok, we mark the virtnet device as
> > vring_small and fixup some variables.
> >
> >
> > peek_vqs_len is needed because we must know the VQ length before calling init_vqs.
> >
> > During virtnet_find_vqs we check the following:
> > vi->has_cvq
> > vi->big_packets
> > vi->mergeable_rx_bufs
> >
> > But these will change if the ring is small..
> >
> > (Of course, another solution will be to re-negotiate features after init_vqs, but this will make a big mess, tons of things to clean and reconfigure)
> >
> >
> > The 2 < ring < MAX_FRAGS + 2 part is ready, I have tested a few cases and it is working.
> >
> > I'm considering splitting the effort into 2 series.
> > A 2 < ring < MAX_FRAGS + 2 series, and a follow up series with the ring < 2 case.
> >
> > I'm also thinking about sending the first series as an RFC soon, so it will be more broadly tested.
> >
> > What do you think?
> >
> > In the virtnet case, we'll decide which features to block based on the ring size.
> > 2 < ring < MAX_FRAGS + 2 -> BLOCK GRO + MRG_RXBUF
> > ring < 2 -> BLOCK GRO + MRG_RXBUF + CTRL_VQ
>
> why MRG_RXBUF? what does it matter?
>
You're right, it should be blocked only when ring < 2.
Or we should let this pass, and let the device figure out that MRG_RXBUF is meaningless with 1 entry..
> > So we'll need a new virtio callback instead of flags.
> > Furthermore, other virtio drivers may decide which features to block based on parameters different than ring size (I don't have a good example at the moment).
> > So maybe we should leave it to the driver to handle (during probe), and offer a virtio core function to re-negotiate the features?
> >
> > In the solution I'm working on, I expose a new virtio core function that resets the device and renegotiates the received features.
> > + A new virtio_config_ops callback peek_vqs_len to peek at the VQ lengths before calling find_vqs. (The callback must be called after the features negotiation)
> >
> > So, the flow is something like:
> >
> > * Super early in virtnet probe, we peek at the VQ lengths and decide if we are
> > using small vrings, if so, we reset and renegotiate the features.
>
> Using which APIs? What does peek_vqs_len do and why does it matter that
> it is super early?
>
We peek at the lengths using a new virtio_config.h function that calls a transport specific callback.
We renegotiate calling the new, exported virtio core function.
peek_vqs_len fills an array of u16 variables with the max length of every VQ.
The idea here is not to fail probe.
So we start probe, check if the ring is small, renegotiate the features and then continue with the new features.
This needs to be super early because otherwise, some virtio_has_feature calls before re-negotiating may be invalid, meaning a lot of reconfigurations.
> > * We continue normally and create the VQs.
> > * We check if the created rings are small.
> > If they are and some blocked features were negotiated anyway (may occur if
> > the re-negotiation fails, or if the transport has no implementation for
> > peek_vqs_len), we fail probe.
> > If the ring is small and the features are ok, we mark the virtnet device as
> > vring_small and fixup some variables.
> >
> >
> > peek_vqs_len is needed because we must know the VQ length before calling init_vqs.
> >
> > During virtnet_find_vqs we check the following:
> > vi->has_cvq
> > vi->big_packets
> > vi->mergeable_rx_bufs
> >
> > But these will change if the ring is small..
> >
> > (Of course, another solution will be to re-negotiate features after init_vqs, but this will make a big mess, tons of things to clean and reconfigure)
> >
> >
> > The 2 < ring < MAX_FRAGS + 2 part is ready, I have tested a few cases and it is working.
> >
> > I'm considering splitting the effort into 2 series.
> > A 2 < ring < MAX_FRAGS + 2 series, and a follow up series with the ring < 2 case.
> >
> > I'm also thinking about sending the first series as an RFC soon, so it will be more broadly tested.
> >
> > What do you think?
>
> Lots of work spilling over to transports.
>
> And I especially don't like that it slows down boot on good path.
Yes, but I don't think that this is really significant.
It's just a call to the transport to get the length of the VQs.
If ring is not small, we continue as normal.
If ring is small, we renegotiate and continue, without failing probe.
>
> I have the following idea:
> - add a blocked features value in virtio_device
> - before calling probe, core saves blocked features
> - if probe fails, checks blocked features.
> if any were added, reset, negotiate all features
> except blocked ones and do the validate/probe dance again
>
>
> This will mean mostly no changes to drivers: just check condition,
> block feature and fail probe.
>
I like the idea, will try to implement it.
Thanks,
On Tue, Apr 25, 2023 at 01:02:38PM +0000, Alvaro Karsz wrote:
> > > In the virtnet case, we'll decide which features to block based on the ring size.
> > > 2 < ring < MAX_FRAGS + 2 -> BLOCK GRO + MRG_RXBUF
> > > ring < 2 -> BLOCK GRO + MRG_RXBUF + CTRL_VQ
> >
> > why MRG_RXBUF? what does it matter?
> >
>
> You're right, it should be blocked only when ring < 2.
> Or we should let this pass, and let the device figure out that MRG_RXBUF is meaningless with 1 entry..
yep, later I think.
> > > So we'll need a new virtio callback instead of flags.
> > > Furthermore, other virtio drivers may decide which features to block based on parameters different than ring size (I don't have a good example at the moment).
> > > So maybe we should leave it to the driver to handle (during probe), and offer a virtio core function to re-negotiate the features?
> > >
> > > In the solution I'm working on, I expose a new virtio core function that resets the device and renegotiates the received features.
> > > + A new virtio_config_ops callback peek_vqs_len to peek at the VQ lengths before calling find_vqs. (The callback must be called after the features negotiation)
> > >
> > > So, the flow is something like:
> > >
> > > * Super early in virtnet probe, we peek at the VQ lengths and decide if we are
> > > using small vrings, if so, we reset and renegotiate the features.
> >
> > Using which APIs? What does peek_vqs_len do and why does it matter that
> > it is super early?
> >
>
> We peek at the lengths using a new virtio_config.h function that calls a transport specific callback.
> We renegotiate calling the new, exported virtio core function.
>
> peek_vqs_len fills an array of u16 variables with the max length of every VQ.
>
> The idea here is not to fail probe.
> So we start probe, check if the ring is small, renegotiate the features and then continue with the new features.
> This needs to be super early because otherwise, some virtio_has_feature calls before re-negotiating may be invalid, meaning a lot of reconfigurations.
>
> > > * We continue normally and create the VQs.
> > > * We check if the created rings are small.
> > > If they are and some blocked features were negotiated anyway (may occur if
> > > the re-negotiation fails, or if the transport has no implementation for
> > > peek_vqs_len), we fail probe.
> > > If the ring is small and the features are ok, we mark the virtnet device as
> > > vring_small and fixup some variables.
> > >
> > >
> > > peek_vqs_len is needed because we must know the VQ length before calling init_vqs.
> > >
> > > During virtnet_find_vqs we check the following:
> > > vi->has_cvq
> > > vi->big_packets
> > > vi->mergeable_rx_bufs
> > >
> > > But these will change if the ring is small..
> > >
> > > (Of course, another solution will be to re-negotiate features after init_vqs, but this will make a big mess, tons of things to clean and reconfigure)
> > >
> > >
> > > The 2 < ring < MAX_FRAGS + 2 part is ready, I have tested a few cases and it is working.
> > >
> > > I'm considering splitting the effort into 2 series.
> > > A 2 < ring < MAX_FRAGS + 2 series, and a follow up series with the ring < 2 case.
> > >
> > > I'm also thinking about sending the first series as an RFC soon, so it will be more broadly tested.
> > >
> > > What do you think?
> >
> > Lots of work spilling over to transports.
> >
> > And I especially don't like that it slows down boot on good path.
>
> Yes, but I don't think that this is really significant.
> It's just a call to the transport to get the length of the VQs.
With lots of VQs that is lots of exits.
> If ring is not small, we continue as normal.
> If ring is small, we renegotiate and continue, without failing probe.
>
> >
> > I have the following idea:
> > - add a blocked features value in virtio_device
> > - before calling probe, core saves blocked features
> > - if probe fails, checks blocked features.
> > if any were added, reset, negotiate all features
> > except blocked ones and do the validate/probe dance again
> >
> >
> > This will mean mostly no changes to drivers: just check condition,
> > block feature and fail probe.
> >
>
> I like the idea, will try to implement it.
>
> Thanks,