2021-05-26 08:28:00

by Michael S. Tsirkin

Subject: [PATCH v3 4/4] virtio_net: disable cb aggressively

There are currently two cases where we poll the TX vq not in response to a
callback: start_xmit and RX NAPI. We currently do this with callbacks
enabled, which can cause extra interrupts from the card. This used not to be
a big issue as we ran with interrupts disabled, but that is no longer the
case, and in some cases the rate of spurious interrupts is so high that
Linux detects this and actually kills the interrupt.

Fix up by disabling the callbacks before polling the tx vq.

Signed-off-by: Michael S. Tsirkin <[email protected]>
---
drivers/net/virtio_net.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c29f42d1e04f..a83dc038d8af 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
return;

if (__netif_tx_trylock(txq)) {
- free_old_xmit_skbs(sq, true);
+ do {
+ virtqueue_disable_cb(sq->vq);
+ free_old_xmit_skbs(sq, true);
+ } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));

if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
netif_tx_wake_queue(txq);
@@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
bool kick = !netdev_xmit_more();
bool use_napi = sq->napi.weight;
+ unsigned int bytes = skb->len;

/* Free up any pending old buffers before queueing new ones. */
- free_old_xmit_skbs(sq, false);
+ do {
+ if (use_napi)
+ virtqueue_disable_cb(sq->vq);

- if (use_napi && kick)
- virtqueue_enable_cb_delayed(sq->vq);
+ free_old_xmit_skbs(sq, false);
+
+ } while (use_napi && kick &&
+ unlikely(!virtqueue_enable_cb_delayed(sq->vq)));

/* timestamp packet in software */
skb_tx_timestamp(skb);
--
MST


2021-05-26 20:39:14

by Jakub Kicinski

Subject: Re: [PATCH v3 4/4] virtio_net: disable cb aggressively

On Wed, 26 May 2021 04:24:43 -0400 Michael S. Tsirkin wrote:
> @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
> bool kick = !netdev_xmit_more();
> bool use_napi = sq->napi.weight;
> + unsigned int bytes = skb->len;

FWIW GCC says bytes is unused.

2021-05-26 21:25:40

by Willem de Bruijn

Subject: Re: [PATCH v3 4/4] virtio_net: disable cb aggressively

On Wed, May 26, 2021 at 11:15 AM Eric Dumazet <[email protected]> wrote:
>
>
>
> On 5/26/21 10:24 AM, Michael S. Tsirkin wrote:
> > There are currently two cases where we poll TX vq not in response to a
> > callback: start xmit and rx napi. We currently do this with callbacks
> > enabled which can cause extra interrupts from the card. Used not to be
> > a big issue as we run with interrupts disabled but that is no longer the
> > case, and in some cases the rate of spurious interrupts is so high
> > linux detects this and actually kills the interrupt.

Temporarily disabling interrupts during free_old_xmit_skbs in
virtnet_poll_cleantx might reduce the spurious interrupt rate by
avoiding an additional Tx interrupt from being scheduled during
virtnet_poll_cleantx.

It probably does not address all spurious interrupts, as
virtnet_poll_cleantx might also run in between the scheduling of the
Tx interrupt and the call to virtnet_poll_tx, right? The Tx and Rx
interrupts racing.

If I can reproduce the report, I can also test how much this helps in practice.

> > Fix up by disabling the callbacks before polling the tx vq.
>
>
> It is not clear why we want to poll TX completions from ndo_start_xmit() in napi mode ?

Yes, we can simply exclude that. The original napi-tx patch did not
make that change, but not for any strong reason.

2021-05-26 22:26:48

by Eric Dumazet

Subject: Re: [PATCH v3 4/4] virtio_net: disable cb aggressively



On 5/26/21 10:24 AM, Michael S. Tsirkin wrote:
> There are currently two cases where we poll TX vq not in response to a
> callback: start xmit and rx napi. We currently do this with callbacks
> enabled which can cause extra interrupts from the card. Used not to be
> a big issue as we run with interrupts disabled but that is no longer the
> case, and in some cases the rate of spurious interrupts is so high
> linux detects this and actually kills the interrupt.
>
> Fix up by disabling the callbacks before polling the tx vq.


It is not clear why we want to poll TX completions from ndo_start_xmit() in napi mode?

This seems unneeded: it adds cost to the sender thread, and it might
reduce the ability to use a different CPU for TX completions.

Also, this will likely conflict with the BQL model if we want to use BQL at some point.

>

This probably needs a Fixes: tag

> Signed-off-by: Michael S. Tsirkin <[email protected]>
> ---
> drivers/net/virtio_net.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c29f42d1e04f..a83dc038d8af 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> return;
>
> if (__netif_tx_trylock(txq)) {
> - free_old_xmit_skbs(sq, true);
> + do {
> + virtqueue_disable_cb(sq->vq);
> + free_old_xmit_skbs(sq, true);
> + } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>
> if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> netif_tx_wake_queue(txq);
> @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
> bool kick = !netdev_xmit_more();
> bool use_napi = sq->napi.weight;
> + unsigned int bytes = skb->len;
>
> /* Free up any pending old buffers before queueing new ones. */
> - free_old_xmit_skbs(sq, false);
> + do {
> + if (use_napi)
> + virtqueue_disable_cb(sq->vq);
>
> - if (use_napi && kick)
> - virtqueue_enable_cb_delayed(sq->vq);
> + free_old_xmit_skbs(sq, false);
> +
> + } while (use_napi && kick &&
> + unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>
> /* timestamp packet in software */
> skb_tx_timestamp(skb);
>

2021-05-27 04:13:16

by Jason Wang

Subject: Re: [PATCH v3 4/4] virtio_net: disable cb aggressively


On 2021/5/26 4:24 PM, Michael S. Tsirkin wrote:
> There are currently two cases where we poll TX vq not in response to a
> callback: start xmit and rx napi. We currently do this with callbacks
> enabled which can cause extra interrupts from the card. Used not to be
> a big issue as we run with interrupts disabled but that is no longer the
> case, and in some cases the rate of spurious interrupts is so high
> linux detects this and actually kills the interrupt.
>
> Fix up by disabling the callbacks before polling the tx vq.
>
> Signed-off-by: Michael S. Tsirkin <[email protected]>
> ---
> drivers/net/virtio_net.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c29f42d1e04f..a83dc038d8af 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> return;
>
> if (__netif_tx_trylock(txq)) {
> - free_old_xmit_skbs(sq, true);
> + do {
> + virtqueue_disable_cb(sq->vq);
> + free_old_xmit_skbs(sq, true);
> + } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>
> if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> netif_tx_wake_queue(txq);
> @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
> bool kick = !netdev_xmit_more();
> bool use_napi = sq->napi.weight;
> + unsigned int bytes = skb->len;
>
> /* Free up any pending old buffers before queueing new ones. */
> - free_old_xmit_skbs(sq, false);
> + do {
> + if (use_napi)
> + virtqueue_disable_cb(sq->vq);
>
> - if (use_napi && kick)
> - virtqueue_enable_cb_delayed(sq->vq);
> + free_old_xmit_skbs(sq, false);
> +
> + } while (use_napi && kick &&
> + unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>
> /* timestamp packet in software */
> skb_tx_timestamp(skb);


I wonder whether we can simply disable cb during ndo_start_xmit(), or is
there a way to make xmit and napi work in parallel?

Thanks


2023-01-16 13:53:03

by Laurent Vivier

Subject: Re: [PATCH v3 4/4] virtio_net: disable cb aggressively

Hi Michael,

On 5/26/21 10:24, Michael S. Tsirkin wrote:
> There are currently two cases where we poll TX vq not in response to a
> callback: start xmit and rx napi. We currently do this with callbacks
> enabled which can cause extra interrupts from the card. Used not to be
> a big issue as we run with interrupts disabled but that is no longer the
> case, and in some cases the rate of spurious interrupts is so high
> linux detects this and actually kills the interrupt.
>
> Fix up by disabling the callbacks before polling the tx vq.
>
> Signed-off-by: Michael S. Tsirkin <[email protected]>
> ---
> drivers/net/virtio_net.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c29f42d1e04f..a83dc038d8af 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> return;
>
> if (__netif_tx_trylock(txq)) {
> - free_old_xmit_skbs(sq, true);
> + do {
> + virtqueue_disable_cb(sq->vq);
> + free_old_xmit_skbs(sq, true);
> + } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>
> if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> netif_tx_wake_queue(txq);
> @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
> bool kick = !netdev_xmit_more();
> bool use_napi = sq->napi.weight;
> + unsigned int bytes = skb->len;
>
> /* Free up any pending old buffers before queueing new ones. */
> - free_old_xmit_skbs(sq, false);
> + do {
> + if (use_napi)
> + virtqueue_disable_cb(sq->vq);
>
> - if (use_napi && kick)
> - virtqueue_enable_cb_delayed(sq->vq);
> + free_old_xmit_skbs(sq, false);
> +
> + } while (use_napi && kick &&
> + unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>
> /* timestamp packet in software */
> skb_tx_timestamp(skb);

This patch seems to introduce a problem with QEMU connected to passt using the
netdev stream backend.

When I run an iperf3 test I get after 1 or 2 seconds of test:

[ 254.035559] NETDEV WATCHDOG: ens3 (virtio_net): transmit queue 0 timed out
...
[ 254.060962] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1,
name: output.0, 8856000 usecs ago
[ 259.155150] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1,
name: output.0, 13951000 usecs ago

In QEMU, I can see that in virtio_net_tx_bh() the function virtio_net_flush_tx() has
flushed all the queue entries and re-enabled the queue notification with
virtio_queue_set_notification(); it tries to flush the queue again and, as it is empty,
does nothing more and then relies on a kernel notification to re-schedule the bottom
half function. As this notification never comes, the queue is stuck: the kernel adds
entries but QEMU doesn't remove them:

2812 static void virtio_net_tx_bh(void *opaque)
2813 {
...
2833 ret = virtio_net_flush_tx(q);

-> flushes the queue; ret is not an error and is not n->tx_burst (which would re-schedule
the function)

...
2850 virtio_queue_set_notification(q->tx_vq, 1);

-> re-enable the queue notification

2851 ret = virtio_net_flush_tx(q);
2852 if (ret == -EINVAL) {
2853 return;
2854 } else if (ret > 0) {
2855 virtio_queue_set_notification(q->tx_vq, 0);
2856 qemu_bh_schedule(q->tx_bh);
2857 q->tx_waiting = 1;
2858 }

-> ret is 0, so we exit without re-scheduling the bottom half.
...
2859 }

If I revert this patch in the kernel (a7766ef18b33 ("virtio_net: disable cb
aggressively")), it works fine.

How to reproduce it:

I start passt (https://passt.top/passt):

passt -f

and then QEMU

qemu-system-x86_64 ... --netdev
stream,id=netdev0,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket -device
virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0

Host side:

sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728
iperf3 -s

Guest side:

sysctl -w net.core.rmem_max=536870912
sysctl -w net.core.wmem_max=536870912

ip link set dev $DEV mtu 256
iperf3 -c $HOST -t10 -i0 -Z -P 8 -l 1M --pacing-timer 1000000 -w 4M

Any idea what the problem is?

Thanks,
Laurent


2023-01-17 04:25:33

by Jason Wang

Subject: Re: [PATCH v3 4/4] virtio_net: disable cb aggressively

On Mon, Jan 16, 2023 at 9:41 PM Laurent Vivier <[email protected]> wrote:
>
> Hi Michael,
>
> On 5/26/21 10:24, Michael S. Tsirkin wrote:
> > There are currently two cases where we poll TX vq not in response to a
> > callback: start xmit and rx napi. We currently do this with callbacks
> > enabled which can cause extra interrupts from the card. Used not to be
> > a big issue as we run with interrupts disabled but that is no longer the
> > case, and in some cases the rate of spurious interrupts is so high
> > linux detects this and actually kills the interrupt.
> >
> > Fix up by disabling the callbacks before polling the tx vq.
> >
> > Signed-off-by: Michael S. Tsirkin <[email protected]>
> > ---
> > drivers/net/virtio_net.c | 16 ++++++++++++----
> > 1 file changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index c29f42d1e04f..a83dc038d8af 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> > return;
> >
> > if (__netif_tx_trylock(txq)) {
> > - free_old_xmit_skbs(sq, true);
> > + do {
> > + virtqueue_disable_cb(sq->vq);
> > + free_old_xmit_skbs(sq, true);
> > + } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> >
> > if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> > netif_tx_wake_queue(txq);
> > @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
> > bool kick = !netdev_xmit_more();
> > bool use_napi = sq->napi.weight;
> > + unsigned int bytes = skb->len;
> >
> > /* Free up any pending old buffers before queueing new ones. */
> > - free_old_xmit_skbs(sq, false);
> > + do {
> > + if (use_napi)
> > + virtqueue_disable_cb(sq->vq);
> >
> > - if (use_napi && kick)
> > - virtqueue_enable_cb_delayed(sq->vq);
> > + free_old_xmit_skbs(sq, false);
> > +
> > + } while (use_napi && kick &&
> > + unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> >
> > /* timestamp packet in software */
> > skb_tx_timestamp(skb);
>
> This patch seems to introduce a problem with QEMU connected to passt using netdev stream
> backend.
>
> When I run an iperf3 test I get after 1 or 2 seconds of test:
>
> [ 254.035559] NETDEV WATCHDOG: ens3 (virtio_net): transmit queue 0 timed out
> ...
> [ 254.060962] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1,
> name: output.0, 8856000 usecs ago
> [ 259.155150] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1,
> name: output.0, 13951000 usecs ago
>
> In QEMU, I can see in virtio_net_tx_bh() the function virtio_net_flush_tx() has flushed
> all the queue entries and re-enabled the queue notification with
> virtio_queue_set_notification() and tries to flush again the queue and as it is empty it
> does nothing more and then rely on a kernel notification to re-enable the bottom half
> function. As this notification never comes the queue is stuck and kernel add entries but
> QEMU doesn't remove them:
>
> 2812 static void virtio_net_tx_bh(void *opaque)
> 2813 {
> ...
> 2833 ret = virtio_net_flush_tx(q);
>
> -> flush the queue and ret is not an error and not n->tx_burst (that would re-schedule the
> function)
>
> ...
> 2850 virtio_queue_set_notification(q->tx_vq, 1);
>
> -> re-enable the queue notification
>
> 2851 ret = virtio_net_flush_tx(q);
> 2852 if (ret == -EINVAL) {
> 2853 return;
> 2854 } else if (ret > 0) {
> 2855 virtio_queue_set_notification(q->tx_vq, 0);
> 2856 qemu_bh_schedule(q->tx_bh);
> 2857 q->tx_waiting = 1;
> 2858 }
>
> -> ret is 0, exit the function without re-scheduling the function.
> ...
> 2859 }
>
> If I revert this patch in the kernel (a7766ef18b33 ("virtio_net: disable cb
> aggressively")), it works fine.
>
> How to reproduce it:
>
> I start passt (https://passt.top/passt):
>
> passt -f
>
> and then QEMU
>
> qemu-system-x86_64 ... --netdev
> stream,id=netdev0,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket -device
> virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0
>
> Host side:
>
> sysctl -w net.core.rmem_max=134217728
> sysctl -w net.core.wmem_max=134217728
> iperf3 -s
>
> Guest side:
>
> sysctl -w net.core.rmem_max=536870912
> sysctl -w net.core.wmem_max=536870912
>
> ip link set dev $DEV mtu 256
> iperf3 -c $HOST -t10 -i0 -Z -P 8 -l 1M --pacing-timer 1000000 -w 4M
>
> Any idea of what is the problem?

This looks similar to what I spotted and tried to fix in:

[PATCH net V3] virtio-net: correctly enable callback during start_xmit

(I've cced you in this version).

Thanks

>
> Thanks,
> Laurent
>
>