2014-07-30 13:26:17

by Zoltan Kiss

Subject: [PATCH] xen-netfront: Fix handling packets on compound pages with skb_segment

There is a long-known problem with the netfront/netback interface: if the guest
tries to send a packet which constitutes more than MAX_SKB_FRAGS + 1 ring slots,
it gets dropped. The reason is that netback maps each of these slots to a frag in
the frags array, which is limited in size. Having so many slots can occur since
compound pages were introduced, as the ring protocol slices them up into
individual (non-compound) page-aligned slots. The theoretical worst case
scenario looks like this (note, skbs are limited to 64 KB here):
- linear buffer: at most PAGE_SIZE - 17 * 2 bytes, overlapping a page boundary,
  using 2 slots
- first 15 frags: 1 + PAGE_SIZE + 1 bytes long, first and last bytes are at the
  end and the beginning of a page, therefore they use 3 * 15 = 45 slots
- last 2 frags: 1 + 1 bytes, overlapping a page boundary, 2 * 2 = 4 slots
Although I don't think this 51-slot skb can really happen, we need a solution
which can deal with every scenario. In real life the limit is exceeded by only
a few slots, but usually this causes the TCP stream to be blocked, as the retry
will most likely have the same buffer layout.
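
The worst-case arithmetic above can be checked with a small user-space sketch.
It is only an illustration, not driver code: it assumes 4 KiB pages, the names
PAGE_BYTES, slots_for() and worst_case_slots() are made up for this example,
and the slot count per buffer uses the same DIV_ROUND_UP(offset + len, page
size) rule that the driver applies to the linear area.

```c
#include <assert.h>

#define PAGE_BYTES 4096u	/* assumption: 4 KiB pages */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Ring slots needed by a buffer of len bytes starting offset bytes into a page. */
static unsigned int slots_for(unsigned int offset, unsigned int len)
{
	return DIV_ROUND_UP(offset + len, PAGE_BYTES);
}

/* Total slots for the worst-case layout listed in the description. */
static unsigned int worst_case_slots(void)
{
	unsigned int slots = 0, bytes = 0, i;

	/* Linear buffer: PAGE_BYTES - 17 * 2 bytes, placed across a page boundary. */
	slots += slots_for(PAGE_BYTES / 2, PAGE_BYTES - 17 * 2);
	bytes += PAGE_BYTES - 17 * 2;

	/* 15 frags of 1 + PAGE_BYTES + 1 bytes, each starting on the last byte of a page. */
	for (i = 0; i < 15; i++) {
		slots += slots_for(PAGE_BYTES - 1, PAGE_BYTES + 2);
		bytes += PAGE_BYTES + 2;
	}

	/* 2 frags of 1 + 1 bytes, each straddling a page boundary. */
	for (i = 0; i < 2; i++) {
		slots += slots_for(PAGE_BYTES - 1, 2);
		bytes += 2;
	}

	/* The layout adds up to exactly the 64 KB skb size limit. */
	assert(bytes == 64u * 1024u);
	return slots;
}
```

With these assumptions the layout fills the 64 KB limit exactly and needs 51
slots, well above MAX_SKB_FRAGS + 1 (18 if MAX_SKB_FRAGS is 17, as the 17 frags
in the list suggest).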
This patch solves this problem by slicing up the skb itself with the help of
skb_segment, and calling xennet_start_xmit again on the resulting packets. It
also works with the theoretical worst case, where there is a 3 level recursion.
The good thing is that skb_segment only copies the header part, the frags will
be just referenced again.
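
As a rough illustration of why the segment size used in the patch
(skb->len / 2 + 1) yields at most two pieces per call, here is a simplified
user-space model. It assumes segmentation just cuts the payload into chunks of
at most gso_size bytes; the real skb_gso_segment() path goes through protocol
handlers and is more involved. The names segment_count() and pieces_for() are
hypothetical.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of chunks when payload bytes are cut into pieces of at most gso_size bytes. */
static unsigned int segment_count(unsigned int payload, unsigned int gso_size)
{
	return DIV_ROUND_UP(payload, gso_size);
}

/*
 * Mirrors the patch's choice gso_size = len / 2 + 1, where len is the
 * whole packet length including hdr_len bytes of headers.
 */
static unsigned int pieces_for(unsigned int len, unsigned int hdr_len)
{
	unsigned int gso_size = len / 2 + 1;

	assert(hdr_len < len);
	/* 2 * gso_size > len >= payload, so the result is never more than 2. */
	return segment_count(len - hdr_len, gso_size);
}
```

Because each call at most halves the packet, an skb that still needs too many
slots after one split is handled by the recursive call on the resulting pieces.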

Signed-off-by: Zoltan Kiss <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Ian Campbell <[email protected]>
Cc: Paul Durrant <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 055222b..0398240 100644
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 055222b..9ce1b62 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -625,12 +625,37 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto drop;
 	}
 
+	/* WARNING: this function should be reentrant up until this point, as in
+	 * the below if branch it could be called recursively
+	 */
 	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
 		xennet_count_skb_frag_slots(skb);
 	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
-		net_alert_ratelimited(
-			"xennet: skb rides the rocket: %d slots\n", slots);
-		goto drop;
+		struct sk_buff *segs, *nskb;
+		unsigned short gso_size_orig = skb_shinfo(skb)->gso_size;
+		unsigned short gso_type_orig = skb_shinfo(skb)->gso_type;
+		netdev_features_t features =
+			netif_skb_features(skb) & ~NETIF_F_GSO_MASK;
+
+		net_dbg_ratelimited(
+			"xennet: skb rides the rocket: %d slots, %u bytes\n",
+			slots, skb->len);
+		/* Segment this into two pieces, most probably it will fit */
+		skb_shinfo(skb)->gso_size = skb->len / 2 + 1;
+		segs = skb_gso_segment(skb, features);
+		if (unlikely(!segs || IS_ERR(segs)))
+			goto drop;
+		do {
+			nskb = segs;
+			segs = nskb->next;
+			nskb->next = NULL;
+			skb_shinfo(nskb)->gso_size = gso_size_orig;
+			skb_shinfo(nskb)->gso_type = gso_type_orig;
+			xennet_start_xmit(nskb, dev);
+		} while (segs);
+
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
 	}
 
 	spin_lock_irqsave(&queue->tx_lock, flags);


2014-07-31 20:25:24

by David Miller

Subject: Re: [PATCH] xen-netfront: Fix handling packets on compound pages with skb_segment

From: Zoltan Kiss <[email protected]>
Date: Wed, 30 Jul 2014 14:25:30 +0100

> There is a long-known problem with the netfront/netback interface: if the guest
> tries to send a packet which constitutes more than MAX_SKB_FRAGS + 1 ring slots,
> it gets dropped. The reason is that netback maps each of these slots to a frag in
> the frags array, which is limited in size. Having so many slots can occur since
> compound pages were introduced, as the ring protocol slices them up into
> individual (non-compound) page-aligned slots. The theoretical worst case
> scenario looks like this (note, skbs are limited to 64 KB here):
> - linear buffer: at most PAGE_SIZE - 17 * 2 bytes, overlapping a page boundary,
>   using 2 slots
> - first 15 frags: 1 + PAGE_SIZE + 1 bytes long, first and last bytes are at the
>   end and the beginning of a page, therefore they use 3 * 15 = 45 slots
> - last 2 frags: 1 + 1 bytes, overlapping a page boundary, 2 * 2 = 4 slots
> Although I don't think this 51-slot skb can really happen, we need a solution
> which can deal with every scenario. In real life the limit is exceeded by only
> a few slots, but usually this causes the TCP stream to be blocked, as the retry
> will most likely have the same buffer layout.
> This patch solves this problem by slicing up the skb itself with the help of
> skb_segment, and calling xennet_start_xmit again on the resulting packets. It
> also works with the theoretical worst case, where there is a 3 level recursion.
> The good thing is that skb_segment only copies the header part, the frags will
> be just referenced again.
>
> Signed-off-by: Zoltan Kiss <[email protected]>

This is a really scary change :-)

I definitely see some potential problems here.

First of all, even in cases where it might "work", such as TCP, you
are modifying the data stream. The sizes are changing, the packet
counts are different, and all of this will have side effects such as
potentially harming TCP performance.

Secondly, for something like UDP you can't just split the packet up
like this, or for any other datagram protocol for that matter.

I know you're in a difficult situation, but I just can't see this
being an acceptable approach to solving the problem right now.

Where does the MAX_SKB_FRAGS + 1 limit really come from, the size of
the TX queue?

If you were to have a 64-slot TX queue, you ought to be able to handle
this theoretical 51 slot SKB.

And I don't think it's so theoretical, a carefully crafted sequence of
sendfile() calls during a TCP_CORK sequence should be able to do it.