2011-06-01 09:50:11

by Michael S. Tsirkin

Subject: [PATCH RFC 0/3] virtio and vhost-net capacity handling

OK, here's a new attempt to use the new capacity API. I also added more
comments to clarify the logic. Hope this is more readable. Let me know,
please.

This is on top of the patches applied by Rusty.

Note: there are now actually 2 calls to free_old_xmit_skbs on the
data path so instead of passing flag '2' to the
second one I thought we can just make each call free up
at least 1 skb. This will work and even might be a bit faster (less
branches), but in the end I discarded this idea as too fragile (relies
on two calls on data path to function properly).

Warning: untested. Posting now to give people a chance to
comment on the API.

Michael S. Tsirkin (3):
virtio_ring: add capacity check API
virtio_net: fix tx capacity checks using new API
virtio_net: limit xmit polling

drivers/net/virtio_net.c | 65 +++++++++++++++++++++++------------------
drivers/virtio/virtio_ring.c | 8 +++++
include/linux/virtio.h | 5 +++
3 files changed, 49 insertions(+), 29 deletions(-)

--
1.7.5.53.gc233e


2011-06-01 09:50:27

by Michael S. Tsirkin

Subject: [PATCH RFC 1/3] virtio_ring: add capacity check API

Add API to check ring capacity. Because of the option
to use indirect buffers, this returns the worst
case, not the normal case capacity.

Signed-off-by: Michael S. Tsirkin <[email protected]>
---
drivers/virtio/virtio_ring.c | 8 ++++++++
include/linux/virtio.h | 5 +++++
2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 68b9136..23422f1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -344,6 +344,14 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
}
EXPORT_SYMBOL_GPL(virtqueue_get_buf);

+int virtqueue_min_capacity(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ return vq->num_free;
+}
+EXPORT_SYMBOL_GPL(virtqueue_min_capacity);
+
void virtqueue_disable_cb(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 7108857..209220d 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -42,6 +42,9 @@ struct virtqueue {
* vq: the struct virtqueue we're talking about.
* len: the length written into the buffer
* Returns NULL or the "data" token handed to add_buf.
+ * virtqueue_min_capacity: get the current capacity of the queue
+ * vq: the struct virtqueue we're talking about.
+ * Returns the current worst case capacity of the queue.
* virtqueue_disable_cb: disable callbacks
* vq: the struct virtqueue we're talking about.
* Note that this is not necessarily synchronous, hence unreliable and only
@@ -89,6 +92,8 @@ void virtqueue_kick(struct virtqueue *vq);

void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);

+int virtqueue_min_capacity(struct virtqueue *vq);
+
void virtqueue_disable_cb(struct virtqueue *vq);

bool virtqueue_enable_cb(struct virtqueue *vq);
--
1.7.5.53.gc233e

2011-06-01 09:50:35

by Michael S. Tsirkin

Subject: [PATCH RFC 2/3] virtio_net: fix tx capacity checks using new API

In the (rare) case where new descriptors are used
while virtio_net enables vq callback for the TX vq,
virtio_net uses the number of sg entries in the skb it frees to
calculate how many descriptors in the ring have just been made
available. But this value is an overestimate: with indirect buffers
each skb only uses one descriptor entry, meaning we may wake the queue
only to find we still can't transmit anything.

Using the new virtqueue_min_capacity() call, we can determine
the remaining capacity, so we should use that instead.
This estimation is worst-case, which is consistent with
the value returned by virtqueue_add_buf.

Signed-off-by: Michael S. Tsirkin <[email protected]>
---
drivers/net/virtio_net.c | 9 ++++-----
1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f685324..a0ee78d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -509,19 +509,17 @@ again:
return received;
}

-static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
+static void free_old_xmit_skbs(struct virtnet_info *vi)
{
struct sk_buff *skb;
- unsigned int len, tot_sgs = 0;
+ unsigned int len;

while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
pr_debug("Sent skb %p\n", skb);
vi->dev->stats.tx_bytes += skb->len;
vi->dev->stats.tx_packets++;
- tot_sgs += skb_vnet_hdr(skb)->num_sg;
dev_kfree_skb_any(skb);
}
- return tot_sgs;
}

static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
@@ -611,7 +609,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
netif_stop_queue(dev);
if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
/* More just got used, free them then recheck. */
- capacity += free_old_xmit_skbs(vi);
+ free_old_xmit_skbs(vi);
+ capacity = virtqueue_min_capacity(vi->svq);
if (capacity >= 2+MAX_SKB_FRAGS) {
netif_start_queue(dev);
virtqueue_disable_cb(vi->svq);
--
1.7.5.53.gc233e

2011-06-01 09:51:04

by Michael S. Tsirkin

Subject: [PATCH RFC 3/3] virtio_net: limit xmit polling

Current code might introduce a lot of latency variation
if there are many pending bufs at the time we
attempt to transmit a new one. This is bad for
real-time applications and can't be good for TCP either.

Free up just enough to both clean up all buffers
eventually and to be able to xmit the next packet.

Signed-off-by: Michael S. Tsirkin <[email protected]>
---
drivers/net/virtio_net.c | 62 ++++++++++++++++++++++++++--------------------
1 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a0ee78d..6045510 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -509,17 +509,27 @@ again:
return received;
}

-static void free_old_xmit_skbs(struct virtnet_info *vi)
+/* Check capacity and try to free enough pending old buffers to enable queueing
+ * new ones. If min_skbs > 0, try to free at least the specified number of skbs
+ * even if the ring already has sufficient capacity. Return true if we can
+ * guarantee that a following virtqueue_add_buf will succeed. */
+static bool free_old_xmit_skbs(struct virtnet_info *vi, int min_skbs)
{
struct sk_buff *skb;
unsigned int len;
+ bool r;

- while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
+ while ((r = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2) ||
+ min_skbs-- > 0) {
+ skb = virtqueue_get_buf(vi->svq, &len);
+ if (unlikely(!skb))
+ break;
pr_debug("Sent skb %p\n", skb);
vi->dev->stats.tx_bytes += skb->len;
vi->dev->stats.tx_packets++;
dev_kfree_skb_any(skb);
}
+ return r;
}

static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
@@ -572,27 +582,24 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
- int capacity;
+ int ret, n;

- /* Free up any pending old buffers before queueing new ones. */
- free_old_xmit_skbs(vi);
+ /* Free up space in the ring in case this is the first time we get
+ * woken up after ring full condition. Note: this might try to free
+ * more than strictly necessary if the skb has a small
+ * number of fragments, but keep it simple. */
+ free_old_xmit_skbs(vi, 0);

/* Try to transmit */
- capacity = xmit_skb(vi, skb);
-
- /* This can happen with OOM and indirect buffers. */
- if (unlikely(capacity < 0)) {
- if (net_ratelimit()) {
- if (likely(capacity == -ENOMEM)) {
- dev_warn(&dev->dev,
- "TX queue failure: out of memory\n");
- } else {
- dev->stats.tx_fifo_errors++;
- dev_warn(&dev->dev,
- "Unexpected TX queue failure: %d\n",
- capacity);
- }
- }
+ ret = xmit_skb(vi, skb);
+
+ /* Failure to queue is unlikely. It's not a bug though: it might happen
+ * if we get an interrupt while the queue is still mostly full.
+ * We could stop the queue and re-enable callbacks (and possibly return
+ * TX_BUSY), but as this should be rare, we don't bother. */
+ if (unlikely(ret < 0)) {
+ if (net_ratelimit())
+ dev_info(&dev->dev, "TX queue failure: %d\n", ret);
dev->stats.tx_dropped++;
kfree_skb(skb);
return NETDEV_TX_OK;
@@ -603,15 +610,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
skb_orphan(skb);
nf_reset(skb);

- /* Apparently nice girls don't return TX_BUSY; stop the queue
- * before it gets out of hand. Naturally, this wastes entries. */
- if (capacity < 2+MAX_SKB_FRAGS) {
+ /* Apparently nice girls don't return TX_BUSY; check capacity and stop
+ * the queue before it gets out of hand.
+ * Naturally, this wastes entries. */
+ /* We transmit one skb, so try to free at least two pending skbs.
+ * This is so that we don't hog the skb memory unnecessarily. */
+ if (!likely(free_old_xmit_skbs(vi, 2))) {
netif_stop_queue(dev);
if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
- /* More just got used, free them then recheck. */
- free_old_xmit_skbs(vi);
- capacity = virtqueue_min_capacity(vi->svq);
- if (capacity >= 2+MAX_SKB_FRAGS) {
+ /* More just got used, free them and recheck. */
+ if (!likely(free_old_xmit_skbs(vi, 0))) {
netif_start_queue(dev);
virtqueue_disable_cb(vi->svq);
}
--
1.7.5.53.gc233e

2011-06-02 03:55:40

by Rusty Russell

Subject: Re: [PATCH RFC 2/3] virtio_net: fix tx capacity checks using new API

On Wed, 1 Jun 2011 12:49:54 +0300, "Michael S. Tsirkin" <[email protected]> wrote:
> In the (rare) case where new descriptors are used
> while virtio_net enables vq callback for the TX vq,
> virtio_net uses the number of sg entries in the skb it frees to
> calculate how many descriptors in the ring have just been made
> available. But this value is an overestimate: with indirect buffers
> each skb only uses one descriptor entry, meaning we may wake the queue
> only to find we still can't transmit anything.

This is a bit misleading.

The value is an overestimate, but so is the requirement for
2+MAX_SKB_FRAGS, *unless* we suddenly drop into direct mode due to OOM.

Thanks,
Rusty.

2011-06-02 03:55:39

by Rusty Russell

Subject: Re: [PATCH RFC 1/3] virtio_ring: add capacity check API

On Wed, 1 Jun 2011 12:49:46 +0300, "Michael S. Tsirkin" <[email protected]> wrote:
> Add API to check ring capacity. Because of the option
> to use indirect buffers, this returns the worst
> case, not the normal case capacity.

Can we drop the silly "add_buf() returns capacity" hack then?

Thanks,
Rusty.

2011-06-02 03:55:41

by Rusty Russell

Subject: Re: [PATCH RFC 3/3] virtio_net: limit xmit polling

On Wed, 1 Jun 2011 12:50:03 +0300, "Michael S. Tsirkin" <[email protected]> wrote:
> Current code might introduce a lot of latency variation
> if there are many pending bufs at the time we
> attempt to transmit a new one. This is bad for
> real-time applications and can't be good for TCP either.
>
> Free up just enough to both clean up all buffers
> eventually and to be able to xmit the next packet.

OK, I found this quite confusing to read.

> - while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> + while ((r = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2) ||
> + min_skbs-- > 0) {
> + skb = virtqueue_get_buf(vi->svq, &len);
> + if (unlikely(!skb))
> + break;
> pr_debug("Sent skb %p\n", skb);
> vi->dev->stats.tx_bytes += skb->len;
> vi->dev->stats.tx_packets++;
> dev_kfree_skb_any(skb);
> }
> + return r;
> }

Gah... what a horrible loop.

Basically, this patch makes hard-to-read code worse, and we should try
to make it better.

Currently, xmit *can* fail when an xmit interrupt wakes the queue, but
the packet(s) xmitted didn't free up enough space for the new packet.
With indirect buffers this only happens if we hit OOM (and thus go to
direct buffers).

We could solve this by only waking the queue in skb_xmit_done if the
capacity is >= 2 + MAX_SKB_FRAGS. But can we do it without a race?

If not, then I'd really prefer to see this, because I think it's clearer:

// Try to free 2 buffers for every 1 xmit, to stay ahead.
free_old_buffers(2)

if (!add_buf()) {
// Screw latency, free them all.
free_old_buffers(UINT_MAX)
// OK, this can happen if we are using direct buffers,
// and the xmit interrupt woke us but the packets
// xmitted were smaller than this one. Rare though.
if (!add_buf())
Whinge and stop queue, maybe loop.
}

if (capacity < 2 + MAX_SKB_FRAGS) {
// We don't have enough for the next packet? Try
// freeing more.
free_old_buffers(UINT_MAX);
if (capacity < 2 + MAX_SKB_FRAGS) {
Stop queue, maybe loop.
}

The current code makes my head hurt :(

Thoughts?
Rusty.

2011-06-02 13:29:54

by Michael S. Tsirkin

Subject: Re: [PATCH RFC 2/3] virtio_net: fix tx capacity checks using new API

On Thu, Jun 02, 2011 at 11:40:26AM +0930, Rusty Russell wrote:
> On Wed, 1 Jun 2011 12:49:54 +0300, "Michael S. Tsirkin" <[email protected]> wrote:
> > In the (rare) case where new descriptors are used
> > while virtio_net enables vq callback for the TX vq,
> > virtio_net uses the number of sg entries in the skb it frees to
> > calculate how many descriptors in the ring have just been made
> > available. But this value is an overestimate: with indirect buffers
> > each skb only uses one descriptor entry, meaning we may wake the queue
> > only to find we still can't transmit anything.
>
> This is a bit misleading.
>
> The value is an overestimate, but so is the requirement for
> 2+MAX_SKB_FRAGS, *unless* we suddenly drop into direct mode due to OOM.
>
> Thanks,
> Rusty.

I agree, it's unlikely.

s/still can't transmit anything/are still out of space and need to stop
the ring almost at once/

Better?

2011-06-02 13:31:48

by Michael S. Tsirkin

Subject: Re: [PATCH RFC 1/3] virtio_ring: add capacity check API

On Thu, Jun 02, 2011 at 11:41:50AM +0930, Rusty Russell wrote:
> On Wed, 1 Jun 2011 12:49:46 +0300, "Michael S. Tsirkin" <[email protected]> wrote:
> > Add API to check ring capacity. Because of the option
> > to use indirect buffers, this returns the worst
> > case, not the normal case capacity.
>
> Can we drop the silly "add_buf() returns capacity" hack then?
>
> Thanks,
> Rusty.

Sure.

2011-06-02 13:34:55

by Michael S. Tsirkin

Subject: Re: [PATCH RFC 3/3] virtio_net: limit xmit polling

On Thu, Jun 02, 2011 at 01:24:57PM +0930, Rusty Russell wrote:
> On Wed, 1 Jun 2011 12:50:03 +0300, "Michael S. Tsirkin" <[email protected]> wrote:
> > Current code might introduce a lot of latency variation
> > if there are many pending bufs at the time we
> > attempt to transmit a new one. This is bad for
> > real-time applications and can't be good for TCP either.
> >
> > Free up just enough to both clean up all buffers
> > eventually and to be able to xmit the next packet.
>
> OK, I found this quite confusing to read.
>
> > - while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> > + while ((r = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2) ||
> > + min_skbs-- > 0) {
> > + skb = virtqueue_get_buf(vi->svq, &len);
> > + if (unlikely(!skb))
> > + break;
> > pr_debug("Sent skb %p\n", skb);
> > vi->dev->stats.tx_bytes += skb->len;
> > vi->dev->stats.tx_packets++;
> > dev_kfree_skb_any(skb);
> > }
> > + return r;
> > }
>
> Gah... what a horrible loop.
>
> Basically, this patch makes hard-to-read code worse, and we should try
> to make it better.
>
> Currently, xmit *can* fail when an xmit interrupt wakes the queue, but
> the packet(s) xmitted didn't free up enough space for the new packet.
> With indirect buffers this only happens if we hit OOM (and thus go to
> direct buffers).
>
> We could solve this by only waking the queue in skb_xmit_done if the
> capacity is >= 2 + MAX_SKB_FRAGS. But can we do it without a race?

I don't think so.

> If not, then I'd really prefer to see this, because I think it's clearer:
>
> // Try to free 2 buffers for every 1 xmit, to stay ahead.
> free_old_buffers(2)
>
> if (!add_buf()) {
> // Screw latency, free them all.
> free_old_buffers(UINT_MAX)
> // OK, this can happen if we are using direct buffers,
> // and the xmit interrupt woke us but the packets
> // xmitted were smaller than this one. Rare though.
> if (!add_buf())
> Whinge and stop queue, maybe loop.
> }
>
> if (capacity < 2 + MAX_SKB_FRAGS) {
> // We don't have enough for the next packet? Try
> // freeing more.
> free_old_buffers(UINT_MAX);
> if (capacity < 2 + MAX_SKB_FRAGS) {
> Stop queue, maybe loop.
> }
>
> The current code makes my head hurt :(
>
> Thoughts?
> Rusty.

OK, I have something very similar, but I still dislike the screw the
latency part: this path is exactly what the IBM guys seem to hit. So I
created two functions: one tries to free a constant number and another
one up to capacity. I'll post that now.


--
MST

2011-06-02 14:14:49

by Krishna Kumar2

Subject: Re: [PATCH RFC 3/3] virtio_net: limit xmit polling

> OK, I have something very similar, but I still dislike the screw the
> latency part: this path is exactly what the IBM guys seem to hit. So I
> created two functions: one tries to free a constant number and another
> one up to capacity. I'll post that now.

Please review this patch to see if it looks reasonable (inline and
attachment):

1. Picked comments/code from Michael's code and Rusty's review.
2. virtqueue_min_capacity() needs to be called only if it returned
empty the last time it was called.
3. Fix return value bug in free_old_xmit_skbs (hangs guest).
4. Stop queue only if capacity is not enough for next xmit.
5. Fix/clean some likely/unlikely checks (hopefully).
6. I think xmit_skb cannot return an error, since
virtqueue_enable_cb_delayed() can return false only if
3/4 of the space became available, which is what we check.
7. The comments for free_old_xmit_skbs need to be
clearer (not done).

I have done some minimal netperf tests with this.

With this patch, add_buf returning capacity seems to be useful - it
allows using fewer virtio API calls.

(See attached file: patch)

Signed-off-by: Krishna Kumar <[email protected]>
---
drivers/net/virtio_net.c | 105 ++++++++++++++++++++++---------------
1 file changed, 64 insertions(+), 41 deletions(-)

diff -ruNp org/drivers/net/virtio_net.c new/drivers/net/virtio_net.c
--- org/drivers/net/virtio_net.c 2011-06-02 15:49:25.000000000 +0530
+++ new/drivers/net/virtio_net.c 2011-06-02 19:13:02.000000000 +0530
@@ -509,27 +509,43 @@ again:
return received;
}

-/* Check capacity and try to free enough pending old buffers to enable queueing
- * new ones. If min_skbs > 0, try to free at least the specified number of skbs
- * even if the ring already has sufficient capacity. Return true if we can
- * guarantee that a following virtqueue_add_buf will succeed. */
-static bool free_old_xmit_skbs(struct virtnet_info *vi, int min_skbs)
+/* Return true if freed a skb, else false */
+static inline bool free_one_old_xmit_skb(struct virtnet_info *vi)
{
struct sk_buff *skb;
unsigned int len;
- bool r;

- while ((r = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2) ||
- min_skbs-- > 0) {
- skb = virtqueue_get_buf(vi->svq, &len);
- if (unlikely(!skb))
+ skb = virtqueue_get_buf(vi->svq, &len);
+ if (unlikely(!skb))
+ return 0;
+
+ pr_debug("Sent skb %p\n", skb);
+ vi->dev->stats.tx_bytes += skb->len;
+ vi->dev->stats.tx_packets++;
+ dev_kfree_skb_any(skb);
+ return 1;
+}
+
+static bool free_old_xmit_skbs(struct virtnet_info *vi, int to_free)
+{
+ bool empty = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2;
+
+ do {
+ if (!free_one_old_xmit_skb(vi)) {
+ /* No more skbs to free up */
break;
- pr_debug("Sent skb %p\n", skb);
- vi->dev->stats.tx_bytes += skb->len;
- vi->dev->stats.tx_packets++;
- dev_kfree_skb_any(skb);
- }
- return r;
+ }
+
+ if (empty) {
+ /* Check again if there is enough space */
+ empty = virtqueue_min_capacity(vi->svq) <
+ MAX_SKB_FRAGS + 2;
+ } else {
+ --to_free;
+ }
+ } while (to_free > 0);
+
+ return !empty;
}

static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
@@ -582,46 +598,53 @@ static int xmit_skb(struct virtnet_info
static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
- int ret, n;
+ int capacity;

- /* Free up space in the ring in case this is the first time we get
- * woken up after ring full condition. Note: this might try to free
- * more than strictly necessary if the skb has a small
- * number of fragments, but keep it simple. */
- free_old_xmit_skbs(vi, 0);
+ /* Try to free 2 buffers for every 1 xmit, to stay ahead. */
+ free_old_xmit_skbs(vi, 2);

/* Try to transmit */
- ret = xmit_skb(vi, skb);
+ capacity = xmit_skb(vi, skb);

- /* Failure to queue is unlikely. It's not a bug though: it might happen
- * if we get an interrupt while the queue is still mostly full.
- * We could stop the queue and re-enable callbacks (and possibly return
- * TX_BUSY), but as this should be rare, we don't bother. */
- if (unlikely(ret < 0)) {
+ if (unlikely(capacity < 0)) {
+ /*
+ * Failure to queue should be impossible. The only way to
+ * reach here is if we got a cb before 3/4th of space was
+ * available. We could stop the queue and re-enable
+ * callbacks (and possibly return TX_BUSY), but we don't
+ * bother since this is impossible.
+ */
if (net_ratelimit())
- dev_info(&dev->dev, "TX queue failure: %d\n", ret);
+ dev_info(&dev->dev, "TX queue failure: %d\n", capacity);
dev->stats.tx_dropped++;
kfree_skb(skb);
return NETDEV_TX_OK;
}
+
virtqueue_kick(vi->svq);

/* Don't wait up for transmitted skbs to be freed. */
skb_orphan(skb);
nf_reset(skb);

- /* Apparently nice girls don't return TX_BUSY; check capacity and stop
- * the queue before it gets out of hand.
- * Naturally, this wastes entries. */
- /* We transmit one skb, so try to free at least two pending skbs.
- * This is so that we don't hog the skb memory unnecessarily. */
- if (!likely(free_old_xmit_skbs(vi, 2))) {
- netif_stop_queue(dev);
- if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
- /* More just got used, free them and recheck. */
- if (!likely(free_old_xmit_skbs(vi, 0))) {
- netif_start_queue(dev);
- virtqueue_disable_cb(vi->svq);
+ /*
+ * Apparently nice girls don't return TX_BUSY; check capacity and
+ * stop the queue before it gets out of hand. Naturally, this wastes
+ * entries.
+ */
+ if (capacity < 2+MAX_SKB_FRAGS) {
+ /*
+ * We don't have enough space for the next packet. Try
+ * freeing more.
+ */
+ if (likely(!free_old_xmit_skbs(vi, UINT_MAX))) {
+ netif_stop_queue(dev);
+ if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
+ /* More just got used, free them and recheck. */
+ if (likely(free_old_xmit_skbs(vi, UINT_MAX))) {
+ netif_start_queue(dev);
+ virtqueue_disable_cb(vi->svq);
+ }
}
}
}


Attachments:
patch (5.64 kB)

2011-06-02 14:43:48

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH RFC 3/3] virtio_net: limit xmit polling

On Thu, Jun 02, 2011 at 07:47:48PM +0530, Krishna Kumar2 wrote:
> > OK, I have something very similar, but I still dislike the screw the
> > latency part: this path is exactly what the IBM guys seem to hit. So I
> > created two functions: one tries to free a constant number and another
> > one up to capacity. I'll post that now.
>
> Please review this patch to see if it looks reasonable:

Hmm, since you decided to work on top of my patch,
I'd appreciate split-up fixes.

> 1. Picked comments/code from MST's code and Rusty's review.
> 2. virtqueue_min_capacity() needs to be called only if it returned
> empty the last time it was called.
> 3. Fix return value bug in free_old_xmit_skbs (hangs guest).
> 4. Stop queue only if capacity is not enough for next xmit.

That's what we always did ...

> 5. Fix/clean some likely/unlikely checks (hopefully).
>
> I have done some minimal netperf tests with this.
>
> With this patch, add_buf returning capacity seems to be useful - it
> allows less virtio API calls.

Why bother? It's cheap ...

>
> Signed-off-by: Krishna Kumar <[email protected]>
> ---
> drivers/net/virtio_net.c | 105 ++++++++++++++++++++++---------------
> 1 file changed, 64 insertions(+), 41 deletions(-)
>
> diff -ruNp org/drivers/net/virtio_net.c new/drivers/net/virtio_net.c
> --- org/drivers/net/virtio_net.c 2011-06-02 15:49:25.000000000 +0530
> +++ new/drivers/net/virtio_net.c 2011-06-02 19:13:02.000000000 +0530
> @@ -509,27 +509,43 @@ again:
> return received;
> }
>
> -/* Check capacity and try to free enough pending old buffers to enable queueing
> - * new ones. If min_skbs > 0, try to free at least the specified number of skbs
> - * even if the ring already has sufficient capacity. Return true if we can
> - * guarantee that a following virtqueue_add_buf will succeed. */
> -static bool free_old_xmit_skbs(struct virtnet_info *vi, int min_skbs)
> +/* Return true if freed a skb, else false */
> +static inline bool free_one_old_xmit_skb(struct virtnet_info *vi)
> {
> struct sk_buff *skb;
> unsigned int len;
> - bool r;
>
> - while ((r = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2) ||
> - min_skbs-- > 0) {
> - skb = virtqueue_get_buf(vi->svq, &len);
> - if (unlikely(!skb))
> + skb = virtqueue_get_buf(vi->svq, &len);
> + if (unlikely(!skb))
> + return 0;
> +
> + pr_debug("Sent skb %p\n", skb);
> + vi->dev->stats.tx_bytes += skb->len;
> + vi->dev->stats.tx_packets++;
> + dev_kfree_skb_any(skb);
> + return 1;
> +}
> +
> +static bool free_old_xmit_skbs(struct virtnet_info *vi, int to_free)
> +{
> + bool empty = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2;
> +
> + do {
> + if (!free_one_old_xmit_skb(vi)) {
> + /* No more skbs to free up */
> break;
> - pr_debug("Sent skb %p\n", skb);
> - vi->dev->stats.tx_bytes += skb->len;
> - vi->dev->stats.tx_packets++;
> - dev_kfree_skb_any(skb);
> - }
> - return r;
> + }
> +
> + if (empty) {
> + /* Check again if there is enough space */
> + empty = virtqueue_min_capacity(vi->svq) <
> + MAX_SKB_FRAGS + 2;
> + } else {
> + --to_free;
> + }
> + } while (to_free > 0);
> +
> + return !empty;
> }

Why bother doing the capacity check in this function?

> static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
> @@ -582,46 +598,53 @@ static int xmit_skb(struct virtnet_info
> static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> {
> struct virtnet_info *vi = netdev_priv(dev);
> - int ret, n;
> + int capacity;
>
> - /* Free up space in the ring in case this is the first time we get
> - * woken up after ring full condition. Note: this might try to free
> - * more than strictly necessary if the skb has a small
> - * number of fragments, but keep it simple. */
> - free_old_xmit_skbs(vi, 0);
> + /* Try to free 2 buffers for every 1 xmit, to stay ahead. */
> + free_old_xmit_skbs(vi, 2);
>
> /* Try to transmit */
> - ret = xmit_skb(vi, skb);
> + capacity = xmit_skb(vi, skb);
>
> - /* Failure to queue is unlikely. It's not a bug though: it might happen
> - * if we get an interrupt while the queue is still mostly full.
> - * We could stop the queue and re-enable callbacks (and possibly return
> - * TX_BUSY), but as this should be rare, we don't bother. */
> - if (unlikely(ret < 0)) {
> + if (unlikely(capacity < 0)) {
> + /*
> + * Failure to queue should be impossible. The only way to
> + * reach here is if we got a cb before 3/4th of space was
> + * available. We could stop the queue and re-enable
> + * callbacks (and possibly return TX_BUSY), but we don't
> + * bother since this is impossible.

It's far from impossible. The 3/4 thing is only a hint, and old devices
don't support it anyway.

> + */
> if (net_ratelimit())
> - dev_info(&dev->dev, "TX queue failure: %d\n", ret);
> + dev_info(&dev->dev, "TX queue failure: %d\n", capacity);
> dev->stats.tx_dropped++;
> kfree_skb(skb);
> return NETDEV_TX_OK;
> }
> +
> virtqueue_kick(vi->svq);
>
> /* Don't wait up for transmitted skbs to be freed. */
> skb_orphan(skb);
> nf_reset(skb);
>
> - /* Apparently nice girls don't return TX_BUSY; check capacity and stop
> - * the queue before it gets out of hand.
> - * Naturally, this wastes entries. */
> - /* We transmit one skb, so try to free at least two pending skbs.
> - * This is so that we don't hog the skb memory unnecessarily. */
> - if (!likely(free_old_xmit_skbs(vi, 2))) {
> - netif_stop_queue(dev);
> - if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
> - /* More just got used, free them and recheck. */
> - if (!likely(free_old_xmit_skbs(vi, 0))) {
> - netif_start_queue(dev);
> - virtqueue_disable_cb(vi->svq);
> + /*
> + * Apparently nice girls don't return TX_BUSY; check capacity and
> + * stop the queue before it gets out of hand. Naturally, this wastes
> + * entries.
> + */
> + if (capacity < 2+MAX_SKB_FRAGS) {
> + /*
> + * We don't have enough space for the next packet. Try
> + * freeing more.
> + */
> + if (likely(!free_old_xmit_skbs(vi, UINT_MAX))) {
> + netif_stop_queue(dev);
> + if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
> + /* More just got used, free them and recheck. */
> + if (likely(free_old_xmit_skbs(vi, UINT_MAX))) {

Is this where the bug was?

> + netif_start_queue(dev);
> + virtqueue_disable_cb(vi->svq);
> + }
> }
> }
> }

2011-06-02 15:23:56

by Krishna Kumar2

[permalink] [raw]
Subject: Re: [PATCH RFC 3/3] virtio_net: limit xmit polling

"Michael S. Tsirkin" <[email protected]> wrote on 06/02/2011 08:13:46 PM:

> > Please review this patch to see if it looks reasonable:
>
> Hmm, since you decided to work on top of my patch,
> I'd appreciate split-up fixes.

OK (that also explains your next comment).

> > 1. Picked comments/code from MST's code and Rusty's review.
> > 2. virtqueue_min_capacity() needs to be called only if it returned
> > empty the last time it was called.
> > 3. Fix return value bug in free_old_xmit_skbs (hangs guest).
> > 4. Stop queue only if capacity is not enough for next xmit.
>
> That's what we always did ...

I had made the patch against your patch, hence this change (sorry for
the confusion!).

> > 5. Fix/clean some likely/unlikely checks (hopefully).
> >
> > I have done some minimal netperf tests with this.
> >
> > With this patch, add_buf returning capacity seems to be useful - it
> > allows less virtio API calls.
>
> Why bother? It's cheap ...

If add_buf retains its functionality to return the capacity (it
is going to need a change to return 0 otherwise anyway), is it
useful to call another function at each xmit?

> > +static bool free_old_xmit_skbs(struct virtnet_info *vi, int to_free)
> > +{
> > + bool empty = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2;
> > +
> > + do {
> > + if (!free_one_old_xmit_skb(vi)) {
> > + /* No more skbs to free up */
> > break;
> > - pr_debug("Sent skb %p\n", skb);
> > - vi->dev->stats.tx_bytes += skb->len;
> > - vi->dev->stats.tx_packets++;
> > - dev_kfree_skb_any(skb);
> > - }
> > - return r;
> > + }
> > +
> > + if (empty) {
> > + /* Check again if there is enough space */
> > + empty = virtqueue_min_capacity(vi->svq) <
> > + MAX_SKB_FRAGS + 2;
> > + } else {
> > + --to_free;
> > + }
> > + } while (to_free > 0);
> > +
> > + return !empty;
> > }
>
> Why bother doing the capacity check in this function?

To return whether we have enough space for the next xmit. It should call
it only once unless space is running out. Does that sound OK?

> > - if (unlikely(ret < 0)) {
> > + if (unlikely(capacity < 0)) {
> > + /*
> > + * Failure to queue should be impossible. The only way to
> > + * reach here is if we got a cb before 3/4th of space was
> > + * available. We could stop the queue and re-enable
> > + * callbacks (and possibly return TX_BUSY), but we don't
> > + * bother since this is impossible.
>
> It's far from impossible. The 3/4 thing is only a hint, and old devices
> don't support it anyway.

OK, I will re-put back your comment.

> > - if (!likely(free_old_xmit_skbs(vi, 2))) {
> > - netif_stop_queue(dev);
> > - if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
> > - /* More just got used, free them and recheck. */
> > - if (!likely(free_old_xmit_skbs(vi, 0))) {
> > - netif_start_queue(dev);
> > - virtqueue_disable_cb(vi->svq);
> > + /*
> > + * Apparently nice girls don't return TX_BUSY; check capacity and
> > + * stop the queue before it gets out of hand. Naturally, this wastes
> > + * entries.
> > + */
> > + if (capacity < 2+MAX_SKB_FRAGS) {
> > + /*
> > + * We don't have enough space for the next packet. Try
> > + * freeing more.
> > + */
> > + if (likely(!free_old_xmit_skbs(vi, UINT_MAX))) {
> > + netif_stop_queue(dev);
> > + if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
> > + /* More just got used, free them and recheck. */
> > + if (likely(free_old_xmit_skbs(vi, UINT_MAX))) {
>
> Is this where the bug was?

Return value in free_old_xmit() was wrong. I will re-do against the
mainline kernel.

Thanks,

- KK

2011-06-02 15:34:24

by Michael S. Tsirkin

Subject: Re: [PATCH RFC 3/3] virtio_net: limit xmit polling

On Thu, Jun 02, 2011 at 08:56:42PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <[email protected]> wrote on 06/02/2011 08:13:46 PM:
>
> > > Please review this patch to see if it looks reasonable:
> >
> > Hmm, since you decided to work on top of my patch,
> > I'd appreciate split-up fixes.
>
> OK (that also explains your next comment).
>
> > > 1. Picked comments/code from MST's code and Rusty's review.
> > > 2. virtqueue_min_capacity() needs to be called only if it returned
> > > empty the last time it was called.
> > > 3. Fix return value bug in free_old_xmit_skbs (hangs guest).
> > > 4. Stop queue only if capacity is not enough for next xmit.
> >
> > That's what we always did ...
>
> I had made the patch against your patch, hence this change (sorry for
> the confusion!).
>
> > > 5. Fix/clean some likely/unlikely checks (hopefully).
> > >
> > > I have done some minimal netperf tests with this.
> > >
> > > With this patch, add_buf returning capacity seems to be useful - it
> > > allows less virtio API calls.
> >
> > Why bother? It's cheap ...
>
> If add_buf retains its functionality of returning the capacity (it
> will need a change to return 0 otherwise anyway), is it
> useful to call another function on each xmit?
>
> > > +static bool free_old_xmit_skbs(struct virtnet_info *vi, int to_free)
> > > +{
> > > + bool empty = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2;
> > > +
> > > + do {
> > > + if (!free_one_old_xmit_skb(vi)) {
> > > + /* No more skbs to free up */
> > > break;
> > > - pr_debug("Sent skb %p\n", skb);
> > > - vi->dev->stats.tx_bytes += skb->len;
> > > - vi->dev->stats.tx_packets++;
> > > - dev_kfree_skb_any(skb);
> > > - }
> > > - return r;
> > > + }
> > > +
> > > + if (empty) {
> > > + /* Check again if there is enough space */
> > > + empty = virtqueue_min_capacity(vi->svq) <
> > > + MAX_SKB_FRAGS + 2;
> > > + } else {
> > > + --to_free;
> > > + }
> > > + } while (to_free > 0);
> > > +
> > > + return !empty;
> > > }
> >
> > Why bother doing the capacity check in this function?
>
> To return whether we have enough space for the next xmit. The function
> should call virtqueue_min_capacity() only once unless space is running
> out. Does that sound OK?
>
> > > - if (unlikely(ret < 0)) {
> > > + if (unlikely(capacity < 0)) {
> > > + /*
> > > + * Failure to queue should be impossible. The only way to
> > > + * reach here is if we got a cb before 3/4th of space was
> > > + * available. We could stop the queue and re-enable
> > > + * callbacks (and possibly return TX_BUSY), but we don't
> > > + * bother since this is impossible.
> >
> > It's far from impossible. The 3/4 thing is only a hint, and old devices
> > don't support it anyway.
>
> OK, I will re-put back your comment.
>
> > > - if (!likely(free_old_xmit_skbs(vi, 2))) {
> > > - netif_stop_queue(dev);
> > > - if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
> > > - /* More just got used, free them and recheck. */
> > > - if (!likely(free_old_xmit_skbs(vi, 0))) {
> > > - netif_start_queue(dev);
> > > - virtqueue_disable_cb(vi->svq);
> > > + /*
> > > + * Apparently nice girls don't return TX_BUSY; check capacity and
> > > + * stop the queue before it gets out of hand. Naturally, this wastes
> > > + * entries.
> > > + */
> > > + if (capacity < 2+MAX_SKB_FRAGS) {
> > > + /*
> > > + * We don't have enough space for the next packet. Try
> > > + * freeing more.
> > > + */
> > > + if (likely(!free_old_xmit_skbs(vi, UINT_MAX))) {
> > > + netif_stop_queue(dev);
> > > + if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
> > > + /* More just got used, free them and recheck. */
> > > + if (likely(free_old_xmit_skbs(vi, UINT_MAX))) {
> >
> > Is this where the bug was?
>
> Return value in free_old_xmit() was wrong. I will re-do against the
> mainline kernel.
>
> Thanks,
>
> - KK

Just noting that I'm working on that patch as well; it might
be more efficient if we don't both do this in parallel :)

--
MST

2011-06-02 15:45:03

by Michael S. Tsirkin

Subject: Re: [PATCH RFC 3/3] virtio_net: limit xmit polling

On Thu, Jun 02, 2011 at 08:56:42PM +0530, Krishna Kumar2 wrote:
> Return value in free_old_xmit() was wrong.

Could you check my latest RFC pls?

--
MST

2011-06-03 04:06:07

by Krishna Kumar2

Subject: Re: [PATCH RFC 3/3] virtio_net: limit xmit polling

"Michael S. Tsirkin" <[email protected]> wrote on 06/02/2011 09:04:23 PM:

> > > Is this where the bug was?
> >
> > Return value in free_old_xmit() was wrong. I will re-do against the
> > mainline kernel.
> >
> > Thanks,
> >
> > - KK
>
> Just noting that I'm working on that patch as well; it might
> be more efficient if we don't both do this in parallel :)

OK, but my intention was to work on an alternate approach, which
was the reason to base it against your patch.

I will check your latest patch.

thanks,

- KK