2013-10-30 00:50:29

by Zoltan Kiss

Subject: [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy

A long-known problem of the upstream netback implementation is that on the TX
path (from guest to Dom0) it copies the whole packet from guest memory into
Dom0. That has become a bottleneck with 10Gb NICs, and in general it is a huge
performance penalty. The classic kernel version of netback used grant mapping,
and to get notified when the page can be unmapped, it used page destructors.
Unfortunately that destructor is not an upstreamable solution. Ian Campbell's
skb fragment destructor patch series (http://lwn.net/Articles/491522/) tried
to solve this problem, but it turned out to be very invasive in the network
stack's code and therefore hasn't progressed very far.

This patch series uses the SKBTX_DEV_ZEROCOPY flag to tell the stack that it
needs to be notified when the skb is freed. That is how KVM solved the same
problem, and based on my initial tests it can do the same for us. Avoiding the
extra copy boosted TX throughput from 6.8 Gbps to 7.9 Gbps (I used a slower
Interlagos box, both Dom0 and guest on upstream kernels, on the same NUMA
node, running iperf 2.0.5; the remote end was a bare-metal box on the same
10Gb switch).
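
(For reference, a minimal sketch of the ubuf_info / SKBTX_DEV_ZEROCOPY
mechanism the series builds on; this is illustrative only, not code from the
patches, and the example_* names are made up:)

#include <linux/skbuff.h>

/* Invoked by the stack once the last user of the zerocopy fragments is gone;
 * only at that point may the driver tear down whatever backs the pages (for
 * netback, the grant mappings).
 */
static void example_zerocopy_callback(struct ubuf_info *ubuf,
				      bool zerocopy_success)
{
	/* schedule unmapping/release of the pages backing this skb */
}

static void example_mark_zerocopy(struct sk_buff *skb, struct ubuf_info *ubuf)
{
	ubuf->callback = example_zerocopy_callback;
	ubuf->ctx      = NULL;	/* driver-private; netback chains fragments here */
	ubuf->desc     = 0;	/* driver-private cookie, e.g. a pending_idx */

	skb_shinfo(skb)->destructor_arg = ubuf;
	skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
}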
Based on my investigation the packet only gets copied if it is delivered to
the Dom0 stack, which is due to this patch:
https://lkml.org/lkml/2012/7/20/363
That's a bit unfortunate, but as far as I know this use case is not too
important for the vast majority of deployments. There are a couple of things
which need more polishing; see the FIXME comments. I will run some more
extensive tests, but in the meantime I would like to hear comments on what
I've done so far.

I've tried to break it down into smaller patches, with mixed results, so I
welcome suggestions on that part as well:
1/5: Introduce TX grant map definitions
2/5: Change TX path from grant copy to mapping
3/5: Remove old TX grant copy definitons
4/5: Fix indentations
5/5: Change RX path for mapped SKB fragments

Signed-off-by: Zoltan Kiss <[email protected]>


2013-10-30 00:50:43

by Zoltan Kiss

Subject: [PATCH net-next RFC 1/5] xen-netback: Introduce TX grant map definitions

This patch contains the new definitions necessary for grant mapping.

Signed-off-by: Zoltan Kiss <[email protected]>

---
drivers/net/xen-netback/common.h | 22 ++++++
drivers/net/xen-netback/interface.c | 1 +
drivers/net/xen-netback/netback.c | 134 +++++++++++++++++++++++++++++++++++
3 files changed, 157 insertions(+)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 55b8dec..36a3fda 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -79,6 +79,11 @@ struct pending_tx_info {
* if it is head of one or more tx
* reqs
*/
+ /* callback data for released SKBs. The callback is always
+ * xenvif_zerocopy_callback, ctx points to the next fragment, desc
+ * contains the pending_idx
+ */
+ struct ubuf_info callback_struct;
};

#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
@@ -101,6 +106,8 @@ struct xenvif_rx_meta {

#define MAX_PENDING_REQS 256

+#define NETBACK_INVALID_HANDLE -1
+
struct xenvif {
/* Unique identifier for this interface. */
domid_t domid;
@@ -119,13 +126,22 @@ struct xenvif {
pending_ring_idx_t pending_cons;
u16 pending_ring[MAX_PENDING_REQS];
struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
+ grant_handle_t grant_tx_handle[MAX_PENDING_REQS];

/* Coalescing tx requests before copying makes number of grant
* copy ops greater or equal to number of slots required. In
* worst case a tx request consumes 2 gnttab_copy.
*/
struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS];
+ struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
+ struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
+ /* passed to gnttab_[un]map_refs with pages under (un)mapping */
+ struct page *pages_to_gnt[MAX_PENDING_REQS];

+ spinlock_t dealloc_lock;
+ pending_ring_idx_t dealloc_prod;
+ pending_ring_idx_t dealloc_cons;
+ u16 dealloc_ring[MAX_PENDING_REQS];

/* Use kthread for guest RX */
struct task_struct *task;
@@ -226,6 +242,12 @@ void xenvif_rx_action(struct xenvif *vif);

int xenvif_kthread(void *data);

+/* Callback from stack when TX packet can be released */
+void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
+
+/* Unmap a pending page, usually has to be called before xenvif_idx_release */
+void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx);
+
extern bool separate_tx_rx_irq;

#endif /* __XEN_NETBACK__COMMON_H__ */
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index e4aa267..f5c3c57 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -37,6 +37,7 @@

#include <xen/events.h>
#include <asm/xen/hypercall.h>
+#include <xen/balloon.h>

#define XENVIF_QUEUE_LENGTH 32
#define XENVIF_NAPI_WEIGHT 64
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 828fdab..10470dc 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -869,6 +869,20 @@ static struct page *xenvif_alloc_page(struct xenvif *vif,
return page;
}

+static inline void xenvif_tx_create_gop(struct xenvif *vif, u16 pending_idx,
+ struct xen_netif_tx_request *txp,
+ struct gnttab_map_grant_ref *gop)
+{
+ vif->pages_to_gnt[gop-vif->tx_map_ops] = vif->mmap_pages[pending_idx];
+ gnttab_set_map_op(gop, idx_to_kaddr(vif, pending_idx),
+ GNTMAP_host_map | GNTMAP_readonly,
+ txp->gref, vif->domid);
+
+ memcpy(&vif->pending_tx_info[pending_idx].req, txp,
+ sizeof(*txp));
+
+}
+
static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
struct sk_buff *skb,
struct xen_netif_tx_request *txp,
@@ -1652,6 +1666,106 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
return work_done;
}

+void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
+{
+ unsigned long flags;
+ pending_ring_idx_t index;
+ u16 pending_idx = ubuf->desc;
+ struct pending_tx_info *temp =
+ container_of(ubuf, struct pending_tx_info, callback_struct);
+ struct xenvif *vif =
+ container_of(temp - pending_idx, struct xenvif,
+ pending_tx_info[0]);
+
+ spin_lock_irqsave(&vif->dealloc_lock, flags);
+ do {
+ pending_idx = ubuf->desc;
+ ubuf = (struct ubuf_info *) ubuf->ctx;
+ index = pending_index(vif->dealloc_prod);
+ vif->dealloc_ring[index] = pending_idx;
+ /* Sync with xenvif_tx_action_dealloc:
+ * insert idx then incr producer.
+ */
+ smp_wmb();
+ vif->dealloc_prod++;
+ napi_schedule(&vif->napi);
+ } while (ubuf);
+ spin_unlock_irqrestore(&vif->dealloc_lock, flags);
+}
+
+static inline void xenvif_tx_action_dealloc(struct xenvif *vif)
+{
+ struct gnttab_unmap_grant_ref *gop;
+ pending_ring_idx_t dc, dp;
+ u16 pending_idx, pending_idx_release[MAX_PENDING_REQS];
+ unsigned int i = 0;
+
+ dc = vif->dealloc_cons;
+ gop = vif->tx_unmap_ops;
+
+ /* Free up any grants we have finished using */
+ do {
+ dp = vif->dealloc_prod;
+
+ /* Ensure we see all indices enqueued by netif_idx_release(). */
+ smp_rmb();
+
+ while (dc != dp) {
+
+ pending_idx_release[gop-vif->tx_unmap_ops] =
+ pending_idx =
+ vif->dealloc_ring[pending_index(dc++)];
+
+ /* Already unmapped? */
+ if (vif->grant_tx_handle[pending_idx] ==
+ NETBACK_INVALID_HANDLE) {
+ netdev_err(vif->dev,
+ "Trying to unmap invalid handle! "
+ "pending_idx: %x\n", pending_idx);
+ continue;
+ }
+
+ vif->pages_to_gnt[gop-vif->tx_unmap_ops] =
+ vif->mmap_pages[pending_idx];
+ gnttab_set_unmap_op(gop,
+ idx_to_kaddr(vif, pending_idx),
+ GNTMAP_host_map,
+ vif->grant_tx_handle[pending_idx]);
+ vif->grant_tx_handle[pending_idx] =
+ NETBACK_INVALID_HANDLE;
+ ++gop;
+ }
+
+ } while (dp != vif->dealloc_prod);
+
+ vif->dealloc_cons = dc;
+
+ if (gop - vif->tx_unmap_ops > 0) {
+ int ret;
+ ret = gnttab_unmap_refs(vif->tx_unmap_ops,
+ NULL,
+ vif->pages_to_gnt,
+ gop - vif->tx_unmap_ops);
+ if (ret) {
+ netdev_err(vif->dev, "Unmap fail: nr_ops %x ret %d\n",
+ gop - vif->tx_unmap_ops, ret);
+ for (i = 0; i < gop - vif->tx_unmap_ops; ++i) {
+ netdev_err(vif->dev,
+ " host_addr: %llx handle: %x status: %d\n",
+ gop[i].host_addr,
+ gop[i].handle,
+ gop[i].status);
+ }
+ BUG();
+ }
+ }
+
+ for (i = 0; i < gop - vif->tx_unmap_ops; ++i)
+ xenvif_idx_release(vif, pending_idx_release[i],
+ XEN_NETIF_RSP_OKAY);
+}
+
+
/* Called after netfront has transmitted */
int xenvif_tx_action(struct xenvif *vif, int budget)
{
@@ -1718,6 +1832,26 @@ static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
vif->mmap_pages[pending_idx] = NULL;
}

+void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx)
+{
+ int ret;
+ if (vif->grant_tx_handle[pending_idx] == NETBACK_INVALID_HANDLE) {
+ netdev_err(vif->dev,
+ "Trying to unmap invalid handle! pending_idx: %x\n",
+ pending_idx);
+ return;
+ }
+ gnttab_set_unmap_op(&vif->tx_unmap_ops[0],
+ idx_to_kaddr(vif, pending_idx),
+ GNTMAP_host_map,
+ vif->grant_tx_handle[pending_idx]);
+ ret = gnttab_unmap_refs(vif->tx_unmap_ops,
+ NULL,
+ &vif->mmap_pages[pending_idx],
+ 1);
+ BUG_ON(ret);
+ vif->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
+}

static void make_tx_response(struct xenvif *vif,
struct xen_netif_tx_request *txp,

2013-10-30 00:50:52

by Zoltan Kiss

Subject: [PATCH net-next RFC 4/5] xen-netback: Fix indentations

I've intentionally left these indentations this way to improve the readability of the previous patches.

Signed-off-by: Zoltan Kiss <[email protected]>
---
drivers/net/xen-netback/netback.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index c91dd36..204fa46 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -882,8 +882,8 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,

for (shinfo->nr_frags = start; shinfo->nr_frags < nr_slots;
shinfo->nr_frags++, txp++, gop++) {
- index = pending_index(vif->pending_cons++);
- pending_idx = vif->pending_ring[index];
+ index = pending_index(vif->pending_cons++);
+ pending_idx = vif->pending_ring[index];
xenvif_tx_create_gop(vif, pending_idx, txp, gop);
frag_set_pending_idx(&frags[shinfo->nr_frags], pending_idx);
}
@@ -929,7 +929,7 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
tx_info = &vif->pending_tx_info[pending_idx];

/* Check error status: if okay then remember grant handle. */
- newerr = (++gop)->status;
+ newerr = (++gop)->status;

if (likely(!newerr)) {
if (vif->grant_tx_handle[pending_idx] !=
@@ -1717,10 +1717,10 @@ static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
struct pending_tx_info *pending_tx_info;
pending_ring_idx_t index;

- pending_tx_info = &vif->pending_tx_info[pending_idx];
- make_tx_response(vif, &pending_tx_info->req, status);
- index = pending_index(vif->pending_prod++);
- vif->pending_ring[index] = pending_idx;
+ pending_tx_info = &vif->pending_tx_info[pending_idx];
+ make_tx_response(vif, &pending_tx_info->req, status);
+ index = pending_index(vif->pending_prod++);
+ vif->pending_ring[index] = pending_idx;
}

void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx)

2013-10-30 00:50:46

by Zoltan Kiss

Subject: [PATCH net-next RFC 2/5] xen-netback: Change TX path from grant copy to mapping

This patch changes the grant copy on the TX path to grant mapping.

Signed-off-by: Zoltan Kiss <[email protected]>
---
drivers/net/xen-netback/interface.c | 39 +++++-
drivers/net/xen-netback/netback.c | 241 +++++++++++++----------------------
2 files changed, 126 insertions(+), 154 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index f5c3c57..fb16ede 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -336,8 +336,20 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
vif->pending_prod = MAX_PENDING_REQS;
for (i = 0; i < MAX_PENDING_REQS; i++)
vif->pending_ring[i] = i;
- for (i = 0; i < MAX_PENDING_REQS; i++)
- vif->mmap_pages[i] = NULL;
+ err = alloc_xenballooned_pages(MAX_PENDING_REQS,
+ vif->mmap_pages,
+ false);
+ if (err) {
+ netdev_err(dev, "Could not reserve mmap_pages\n");
+ return NULL;
+ }
+ for (i = 0; i < MAX_PENDING_REQS; i++) {
+ vif->pending_tx_info[i].callback_struct = (struct ubuf_info)
+ { .callback = xenvif_zerocopy_callback,
+ .ctx = NULL,
+ .desc = i };
+ vif->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
+ }

/*
* Initialise a dummy MAC address. We choose the numerically
@@ -481,11 +493,34 @@ void xenvif_disconnect(struct xenvif *vif)

void xenvif_free(struct xenvif *vif)
{
+ int i;
+
netif_napi_del(&vif->napi);

unregister_netdev(vif->dev);

free_netdev(vif->dev);

+ /* FIXME: This is a workaround for 2 problems:
+ * - the guest sent a packet just before teardown, and it is still not
+ * delivered
+ * - the stack forgot to notify us that we can give back a slot
+ * For the first problem we shouldn't do this, as the skb might still
+ * access that page. I will come up with a more robust solution later.
+ * The second is definitely a bug, it leaks a slot from the ring
+ * forever.
+ * Classic kernel patchset has delayed copy for that, we might want to
+ * reuse that?
+ */
+ for (i = 0; i < MAX_PENDING_REQS; ++i) {
+ if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
+ netdev_err(vif->dev,
+ "Page still granted! Index: %x\n", i);
+ xenvif_idx_unmap(vif, i);
+ }
+ }
+
+ free_xenballooned_pages(MAX_PENDING_REQS, vif->mmap_pages);
+
module_put(THIS_MODULE);
}
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 10470dc..e544e9f 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -883,10 +883,10 @@ static inline void xenvif_tx_create_gop(struct xenvif *vif, u16 pending_idx,

}

-static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
+static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
struct sk_buff *skb,
struct xen_netif_tx_request *txp,
- struct gnttab_copy *gop)
+ struct gnttab_map_grant_ref *gop)
{
struct skb_shared_info *shinfo = skb_shinfo(skb);
skb_frag_t *frags = shinfo->frags;
@@ -907,83 +907,12 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
/* Skip first skb fragment if it is on same page as header fragment. */
start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);

- /* Coalesce tx requests, at this point the packet passed in
- * should be <= 64K. Any packets larger than 64K have been
- * handled in xenvif_count_requests().
- */
- for (shinfo->nr_frags = slot = start; slot < nr_slots;
- shinfo->nr_frags++) {
- struct pending_tx_info *pending_tx_info =
- vif->pending_tx_info;
-
- page = alloc_page(GFP_ATOMIC|__GFP_COLD);
- if (!page)
- goto err;
-
- dst_offset = 0;
- first = NULL;
- while (dst_offset < PAGE_SIZE && slot < nr_slots) {
- gop->flags = GNTCOPY_source_gref;
-
- gop->source.u.ref = txp->gref;
- gop->source.domid = vif->domid;
- gop->source.offset = txp->offset;
-
- gop->dest.domid = DOMID_SELF;
-
- gop->dest.offset = dst_offset;
- gop->dest.u.gmfn = virt_to_mfn(page_address(page));
-
- if (dst_offset + txp->size > PAGE_SIZE) {
- /* This page can only merge a portion
- * of tx request. Do not increment any
- * pointer / counter here. The txp
- * will be dealt with in future
- * rounds, eventually hitting the
- * `else` branch.
- */
- gop->len = PAGE_SIZE - dst_offset;
- txp->offset += gop->len;
- txp->size -= gop->len;
- dst_offset += gop->len; /* quit loop */
- } else {
- /* This tx request can be merged in the page */
- gop->len = txp->size;
- dst_offset += gop->len;
-
+ for (shinfo->nr_frags = start; shinfo->nr_frags < nr_slots;
+ shinfo->nr_frags++, txp++, gop++) {
index = pending_index(vif->pending_cons++);
-
pending_idx = vif->pending_ring[index];
-
- memcpy(&pending_tx_info[pending_idx].req, txp,
- sizeof(*txp));
-
- /* Poison these fields, corresponding
- * fields for head tx req will be set
- * to correct values after the loop.
- */
- vif->mmap_pages[pending_idx] = (void *)(~0UL);
- pending_tx_info[pending_idx].head =
- INVALID_PENDING_RING_IDX;
-
- if (!first) {
- first = &pending_tx_info[pending_idx];
- start_idx = index;
- head_idx = pending_idx;
- }
-
- txp++;
- slot++;
- }
-
- gop++;
- }
-
- first->req.offset = 0;
- first->req.size = dst_offset;
- first->head = start_idx;
- vif->mmap_pages[head_idx] = page;
- frag_set_pending_idx(&frags[shinfo->nr_frags], head_idx);
+ xenvif_tx_create_gop(vif, pending_idx, txp, gop);
+ frag_set_pending_idx(&frags[shinfo->nr_frags], pending_idx);
}

BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
@@ -1005,9 +934,9 @@ err:

static int xenvif_tx_check_gop(struct xenvif *vif,
struct sk_buff *skb,
- struct gnttab_copy **gopp)
+ struct gnttab_map_grant_ref **gopp)
{
- struct gnttab_copy *gop = *gopp;
+ struct gnttab_map_grant_ref *gop = *gopp;
u16 pending_idx = *((u16 *)skb->data);
struct skb_shared_info *shinfo = skb_shinfo(skb);
struct pending_tx_info *tx_info;
@@ -1019,6 +948,16 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
err = gop->status;
if (unlikely(err))
xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR);
+ else {
+ if (vif->grant_tx_handle[pending_idx] !=
+ NETBACK_INVALID_HANDLE) {
+ netdev_err(vif->dev,
+ "Stale mapped handle! pending_idx %x handle %x\n",
+ pending_idx, vif->grant_tx_handle[pending_idx]);
+ xenvif_fatal_tx_err(vif);
+ }
+ vif->grant_tx_handle[pending_idx] = gop->handle;
+ }

/* Skip first skb fragment if it is on same page as header fragment. */
start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
@@ -1032,18 +971,24 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
head = tx_info->head;

/* Check error status: if okay then remember grant handle. */
- do {
newerr = (++gop)->status;
- if (newerr)
- break;
- peek = vif->pending_ring[pending_index(++head)];
- } while (!pending_tx_is_head(vif, peek));

if (likely(!newerr)) {
+ if (vif->grant_tx_handle[pending_idx] !=
+ NETBACK_INVALID_HANDLE) {
+ netdev_err(vif->dev,
+ "Stale mapped handle! pending_idx %x handle %x\n",
+ pending_idx,
+ vif->grant_tx_handle[pending_idx]);
+ xenvif_fatal_tx_err(vif);
+ }
+ vif->grant_tx_handle[pending_idx] = gop->handle;
/* Had a previous error? Invalidate this fragment. */
- if (unlikely(err))
+ if (unlikely(err)) {
+ xenvif_idx_unmap(vif, pending_idx);
xenvif_idx_release(vif, pending_idx,
XEN_NETIF_RSP_OKAY);
+ }
continue;
}

@@ -1056,9 +1001,11 @@ static int xenvif_tx_check_gop(struct xenvif *vif,

/* First error: invalidate header and preceding fragments. */
pending_idx = *((u16 *)skb->data);
+ xenvif_idx_unmap(vif, pending_idx);
xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY);
for (j = start; j < i; j++) {
pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
+ xenvif_idx_unmap(vif, pending_idx);
xenvif_idx_release(vif, pending_idx,
XEN_NETIF_RSP_OKAY);
}
@@ -1071,7 +1018,8 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
return err;
}

-static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
+static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb,
+ u16 prev_pending_idx)
{
struct skb_shared_info *shinfo = skb_shinfo(skb);
int nr_frags = shinfo->nr_frags;
@@ -1085,6 +1033,15 @@ static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)

pending_idx = frag_get_pending_idx(frag);

+ /* If this is not the first frag, chain it to the previous*/
+ vif->pending_tx_info[pending_idx].callback_struct.ctx = NULL;
+ if (pending_idx != prev_pending_idx) {
+ vif->pending_tx_info[prev_pending_idx].callback_struct.ctx =
+ &(vif->pending_tx_info[pending_idx].callback_struct);
+ prev_pending_idx = pending_idx;
+ }
+
+
txp = &vif->pending_tx_info[pending_idx].req;
page = virt_to_page(idx_to_kaddr(vif, pending_idx));
__skb_fill_page_desc(skb, i, page, txp->offset, txp->size);
@@ -1092,10 +1049,15 @@ static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
skb->data_len += txp->size;
skb->truesize += txp->size;

- /* Take an extra reference to offset xenvif_idx_release */
+ /* Take an extra reference to offset network stack's put_page */
get_page(vif->mmap_pages[pending_idx]);
- xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY);
}
+ /* FIXME: __skb_fill_page_desc set this to true because page->pfmemalloc
+ * overlaps with "index", and "mapping" is not set. I think mapping
+ * should be set. If delivered to local stack, it would drop this
+ * skb in sk_filter unless the socket has the right to use it.
+ */
+ skb->pfmemalloc = false;
}

static int xenvif_get_extras(struct xenvif *vif,
@@ -1426,7 +1388,7 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)

static unsigned xenvif_tx_build_gops(struct xenvif *vif)
{
- struct gnttab_copy *gop = vif->tx_copy_ops, *request_gop;
+ struct gnttab_map_grant_ref *gop = vif->tx_map_ops, *request_gop;
struct sk_buff *skb;
int ret;

@@ -1533,30 +1495,10 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif)
}
}

- /* XXX could copy straight to head */
- page = xenvif_alloc_page(vif, pending_idx);
- if (!page) {
- kfree_skb(skb);
- xenvif_tx_err(vif, &txreq, idx);
- break;
- }
-
- gop->source.u.ref = txreq.gref;
- gop->source.domid = vif->domid;
- gop->source.offset = txreq.offset;
-
- gop->dest.u.gmfn = virt_to_mfn(page_address(page));
- gop->dest.domid = DOMID_SELF;
- gop->dest.offset = txreq.offset;
-
- gop->len = txreq.size;
- gop->flags = GNTCOPY_source_gref;
+ xenvif_tx_create_gop(vif, pending_idx, &txreq, gop);

gop++;

- memcpy(&vif->pending_tx_info[pending_idx].req,
- &txreq, sizeof(txreq));
- vif->pending_tx_info[pending_idx].head = index;
*((u16 *)skb->data) = pending_idx;

__skb_put(skb, data_len);
@@ -1585,17 +1527,17 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif)

vif->tx.req_cons = idx;

- if ((gop-vif->tx_copy_ops) >= ARRAY_SIZE(vif->tx_copy_ops))
+ if ((gop-vif->tx_map_ops) >= ARRAY_SIZE(vif->tx_map_ops))
break;
}

- return gop - vif->tx_copy_ops;
+ return gop - vif->tx_map_ops;
}


static int xenvif_tx_submit(struct xenvif *vif, int budget)
{
- struct gnttab_copy *gop = vif->tx_copy_ops;
+ struct gnttab_map_grant_ref *gop = vif->tx_map_ops;
struct sk_buff *skb;
int work_done = 0;

@@ -1620,14 +1562,25 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
memcpy(skb->data,
(void *)(idx_to_kaddr(vif, pending_idx)|txp->offset),
data_len);
+ vif->pending_tx_info[pending_idx].callback_struct.ctx = NULL;
if (data_len < txp->size) {
/* Append the packet payload as a fragment. */
txp->offset += data_len;
txp->size -= data_len;
- } else {
+ skb_shinfo(skb)->destructor_arg =
+ &vif->pending_tx_info[pending_idx].callback_struct;
+ } else if (!skb_shinfo(skb)->nr_frags) {
/* Schedule a response immediately. */
+ skb_shinfo(skb)->destructor_arg = NULL;
+ xenvif_idx_unmap(vif, pending_idx);
xenvif_idx_release(vif, pending_idx,
XEN_NETIF_RSP_OKAY);
+ } else {
+ /* FIXME: first request fits linear space, I don't know
+ * if any guest would do that, but I think it's possible
+ */
+ skb_shinfo(skb)->destructor_arg =
+ &vif->pending_tx_info[pending_idx].callback_struct;
}

if (txp->flags & XEN_NETTXF_csum_blank)
@@ -1635,13 +1588,19 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
else if (txp->flags & XEN_NETTXF_data_validated)
skb->ip_summed = CHECKSUM_UNNECESSARY;

- xenvif_fill_frags(vif, skb);
+ xenvif_fill_frags(vif, skb, pending_idx);

if (skb_is_nonlinear(skb) && skb_headlen(skb) < PKT_PROT_LEN) {
int target = min_t(int, skb->len, PKT_PROT_LEN);
__pskb_pull_tail(skb, target - skb_headlen(skb));
}

+ /* Set this flag after __pskb_pull_tail, as it can trigger
+ * skb_copy_ubufs, while we are still in control of the skb
+ */
+ if (skb_shinfo(skb)->destructor_arg)
+ skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
+
skb->dev = vif->dev;
skb->protocol = eth_type_trans(skb, skb->dev);
skb_reset_network_header(skb);
@@ -1770,17 +1729,25 @@ static inline void xenvif_tx_action_dealloc(struct xenvif *vif)
int xenvif_tx_action(struct xenvif *vif, int budget)
{
unsigned nr_gops;
- int work_done;
+ int work_done, ret;

if (unlikely(!tx_work_todo(vif)))
return 0;

+ xenvif_tx_action_dealloc(vif);
+
nr_gops = xenvif_tx_build_gops(vif);

if (nr_gops == 0)
return 0;

- gnttab_batch_copy(vif->tx_copy_ops, nr_gops);
+ if (nr_gops) {
+ ret = gnttab_map_refs(vif->tx_map_ops,
+ NULL,
+ vif->pages_to_gnt,
+ nr_gops);
+ BUG_ON(ret);
+ }

work_done = xenvif_tx_submit(vif, nr_gops);

@@ -1791,45 +1758,13 @@ static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
u8 status)
{
struct pending_tx_info *pending_tx_info;
- pending_ring_idx_t head;
+ pending_ring_idx_t index;
u16 peek; /* peek into next tx request */

- BUG_ON(vif->mmap_pages[pending_idx] == (void *)(~0UL));
-
- /* Already complete? */
- if (vif->mmap_pages[pending_idx] == NULL)
- return;
-
- pending_tx_info = &vif->pending_tx_info[pending_idx];
-
- head = pending_tx_info->head;
-
- BUG_ON(!pending_tx_is_head(vif, head));
- BUG_ON(vif->pending_ring[pending_index(head)] != pending_idx);
-
- do {
- pending_ring_idx_t index;
- pending_ring_idx_t idx = pending_index(head);
- u16 info_idx = vif->pending_ring[idx];
-
- pending_tx_info = &vif->pending_tx_info[info_idx];
+ pending_tx_info = &vif->pending_tx_info[pending_idx];
make_tx_response(vif, &pending_tx_info->req, status);
-
- /* Setting any number other than
- * INVALID_PENDING_RING_IDX indicates this slot is
- * starting a new packet / ending a previous packet.
- */
- pending_tx_info->head = 0;
-
index = pending_index(vif->pending_prod++);
- vif->pending_ring[index] = vif->pending_ring[info_idx];
-
- peek = vif->pending_ring[pending_index(++head)];
-
- } while (!pending_tx_is_head(vif, peek));
-
- put_page(vif->mmap_pages[pending_idx]);
- vif->mmap_pages[pending_idx] = NULL;
+ vif->pending_ring[index] = pending_idx;
}

void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx)
@@ -1904,6 +1839,8 @@ static inline int rx_work_todo(struct xenvif *vif)

static inline int tx_work_todo(struct xenvif *vif)
{
+ if (vif->dealloc_cons != vif->dealloc_prod)
+ return 1;

if (likely(RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)) &&
(nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX

2013-10-30 00:50:56

by Zoltan Kiss

Subject: [PATCH net-next RFC 4/5] xen-netback: Change RX path for mapped SKB fragments

The RX path needs to know whether the SKB fragments are stored on pages from
another domain.

Signed-off-by: Zoltan Kiss <[email protected]>
---
drivers/net/xen-netback/netback.c | 46 +++++++++++++++++++++++++++++++++----
1 file changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 204fa46..0ffa412 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -322,7 +322,9 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif,
static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
struct netrx_pending_operations *npo,
struct page *page, unsigned long size,
- unsigned long offset, int *head)
+ unsigned long offset, int *head,
+ struct xenvif *foreign_vif,
+ grant_ref_t foreign_gref)
{
struct gnttab_copy *copy_gop;
struct xenvif_rx_meta *meta;
@@ -364,8 +366,15 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
copy_gop->flags = GNTCOPY_dest_gref;
copy_gop->len = bytes;

- copy_gop->source.domid = DOMID_SELF;
- copy_gop->source.u.gmfn = virt_to_mfn(page_address(page));
+ if (foreign_vif) {
+ copy_gop->source.domid = foreign_vif->domid;
+ copy_gop->source.u.ref = foreign_gref;
+ copy_gop->flags |= GNTCOPY_source_gref;
+ } else {
+ copy_gop->source.domid = DOMID_SELF;
+ copy_gop->source.u.gmfn =
+ virt_to_mfn(page_address(page));
+ }
copy_gop->source.offset = offset;

copy_gop->dest.domid = vif->domid;
@@ -426,6 +435,9 @@ static int xenvif_gop_skb(struct sk_buff *skb,
int old_meta_prod;
int gso_type;
int gso_size;
+ struct ubuf_info *ubuf = skb_shinfo(skb)->destructor_arg;
+ grant_ref_t foreign_grefs[MAX_SKB_FRAGS];
+ struct xenvif *foreign_vif = NULL;

old_meta_prod = npo->meta_prod;

@@ -466,6 +478,26 @@ static int xenvif_gop_skb(struct sk_buff *skb,
npo->copy_off = 0;
npo->copy_gref = req->gref;

+ if ((skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) &&
+ (ubuf->callback == &xenvif_zerocopy_callback)) {
+ u16 pending_idx = ubuf->desc;
+ int i = 0;
+ struct pending_tx_info *temp =
+ container_of(ubuf,
+ struct pending_tx_info,
+ callback_struct);
+ foreign_vif =
+ container_of(temp - pending_idx,
+ struct xenvif,
+ pending_tx_info[0]);
+ do {
+ pending_idx = ubuf->desc;
+ ubuf = (struct ubuf_info *) ubuf->ctx;
+ foreign_grefs[i++] =
+ foreign_vif->pending_tx_info[pending_idx].req.gref;
+ } while (ubuf);
+ }
+
data = skb->data;
while (data < skb_tail_pointer(skb)) {
unsigned int offset = offset_in_page(data);
@@ -475,7 +507,9 @@ static int xenvif_gop_skb(struct sk_buff *skb,
len = skb_tail_pointer(skb) - data;

xenvif_gop_frag_copy(vif, skb, npo,
- virt_to_page(data), len, offset, &head);
+ virt_to_page(data), len, offset, &head,
+ NULL,
+ 0);
data += len;
}

@@ -484,7 +518,9 @@ static int xenvif_gop_skb(struct sk_buff *skb,
skb_frag_page(&skb_shinfo(skb)->frags[i]),
skb_frag_size(&skb_shinfo(skb)->frags[i]),
skb_shinfo(skb)->frags[i].page_offset,
- &head);
+ &head,
+ foreign_vif,
+ foreign_grefs[i]);
}

return npo->meta_prod - old_meta_prod;

2013-10-30 00:50:49

by Zoltan Kiss

Subject: [PATCH net-next RFC 3/5] xen-netback: Remove old TX grant copy definitons

These became obsolete with grant mapping.

Signed-off-by: Zoltan Kiss <[email protected]>
---
drivers/net/xen-netback/common.h | 37 +---------------------------
drivers/net/xen-netback/netback.c | 48 ++-----------------------------------
2 files changed, 3 insertions(+), 82 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 36a3fda..e82c82c 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -46,39 +46,9 @@
#include <xen/xenbus.h>

typedef unsigned int pending_ring_idx_t;
-#define INVALID_PENDING_RING_IDX (~0U)

-/* For the head field in pending_tx_info: it is used to indicate
- * whether this tx info is the head of one or more coalesced requests.
- *
- * When head != INVALID_PENDING_RING_IDX, it means the start of a new
- * tx requests queue and the end of previous queue.
- *
- * An example sequence of head fields (I = INVALID_PENDING_RING_IDX):
- *
- * ...|0 I I I|5 I|9 I I I|...
- * -->|<-INUSE----------------
- *
- * After consuming the first slot(s) we have:
- *
- * ...|V V V V|5 I|9 I I I|...
- * -----FREE->|<-INUSE--------
- *
- * where V stands for "valid pending ring index". Any number other
- * than INVALID_PENDING_RING_IDX is OK. These entries are considered
- * free and can contain any number other than
- * INVALID_PENDING_RING_IDX. In practice we use 0.
- *
- * The in use non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the
- * above example) number is the index into pending_tx_info and
- * mmap_pages arrays.
- */
struct pending_tx_info {
- struct xen_netif_tx_request req; /* coalesced tx request */
- pending_ring_idx_t head; /* head != INVALID_PENDING_RING_IDX
- * if it is head of one or more tx
- * reqs
- */
+ struct xen_netif_tx_request req; /* tx request */
/* callback data for released SKBs. The callback is always
* xenvif_zerocopy_callback, ctx points to the next fragment, desc
* contains the pending_idx
@@ -128,11 +98,6 @@ struct xenvif {
struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
grant_handle_t grant_tx_handle[MAX_PENDING_REQS];

- /* Coalescing tx requests before copying makes number of grant
- * copy ops greater or equal to number of slots required. In
- * worst case a tx request consumes 2 gnttab_copy.
- */
- struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS];
struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
/* passed to gnttab_[un]map_refs with pages under (un)mapping */
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index e544e9f..c91dd36 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -70,16 +70,6 @@ module_param(fatal_skb_slots, uint, 0444);
*/
#define XEN_NETBK_LEGACY_SLOTS_MAX XEN_NETIF_NR_SLOTS_MIN

-/*
- * If head != INVALID_PENDING_RING_IDX, it means this tx request is head of
- * one or more merged tx requests, otherwise it is the continuation of
- * previous tx request.
- */
-static inline int pending_tx_is_head(struct xenvif *vif, RING_IDX idx)
-{
- return vif->pending_tx_info[idx].head != INVALID_PENDING_RING_IDX;
-}
-
static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
u8 status);

@@ -856,19 +846,6 @@ static int xenvif_count_requests(struct xenvif *vif,
return slots;
}

-static struct page *xenvif_alloc_page(struct xenvif *vif,
- u16 pending_idx)
-{
- struct page *page;
-
- page = alloc_page(GFP_ATOMIC|__GFP_COLD);
- if (!page)
- return NULL;
- vif->mmap_pages[pending_idx] = page;
-
- return page;
-}
-
static inline void xenvif_tx_create_gop(struct xenvif *vif, u16 pending_idx,
struct xen_netif_tx_request *txp,
struct gnttab_map_grant_ref *gop)
@@ -891,13 +868,9 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
struct skb_shared_info *shinfo = skb_shinfo(skb);
skb_frag_t *frags = shinfo->frags;
u16 pending_idx = *((u16 *)skb->data);
- u16 head_idx = 0;
- int slot, start;
- struct page *page;
- pending_ring_idx_t index, start_idx = 0;
- uint16_t dst_offset;
+ int start;
+ pending_ring_idx_t index;
unsigned int nr_slots;
- struct pending_tx_info *first = NULL;

/* At this point shinfo->nr_frags is in fact the number of
* slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
@@ -918,18 +891,6 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);

return gop;
-err:
- /* Unwind, freeing all pages and sending error responses. */
- while (shinfo->nr_frags-- > start) {
- xenvif_idx_release(vif,
- frag_get_pending_idx(&frags[shinfo->nr_frags]),
- XEN_NETIF_RSP_ERROR);
- }
- /* The head too, if necessary. */
- if (start)
- xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR);
-
- return NULL;
}

static int xenvif_tx_check_gop(struct xenvif *vif,
@@ -942,7 +903,6 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
struct pending_tx_info *tx_info;
int nr_frags = shinfo->nr_frags;
int i, err, start;
- u16 peek; /* peek into next tx request */

/* Check status of header. */
err = gop->status;
@@ -964,11 +924,9 @@ static int xenvif_tx_check_gop(struct xenvif *vif,

for (i = start; i < nr_frags; i++) {
int j, newerr;
- pending_ring_idx_t head;

pending_idx = frag_get_pending_idx(&shinfo->frags[i]);
tx_info = &vif->pending_tx_info[pending_idx];
- head = tx_info->head;

/* Check error status: if okay then remember grant handle. */
newerr = (++gop)->status;
@@ -1396,7 +1354,6 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif)
< MAX_PENDING_REQS)) {
struct xen_netif_tx_request txreq;
struct xen_netif_tx_request txfrags[XEN_NETBK_LEGACY_SLOTS_MAX];
- struct page *page;
struct xen_netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX-1];
u16 pending_idx;
RING_IDX idx;
@@ -1759,7 +1716,6 @@ static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
{
struct pending_tx_info *pending_tx_info;
pending_ring_idx_t index;
- u16 peek; /* peek into next tx request */

pending_tx_info = &vif->pending_tx_info[pending_idx];
make_tx_response(vif, &pending_tx_info->req, status);

2013-10-30 09:11:43

by Paul Durrant

Subject: RE: [Xen-devel] [PATCH net-next RFC 2/5] xen-netback: Change TX path from grant copy to mapping

> -----Original Message-----
> From: [email protected] [mailto:xen-devel-
> [email protected]] On Behalf Of Zoltan Kiss
> Sent: 30 October 2013 00:50
> To: Ian Campbell; Wei Liu; [email protected];
> [email protected]; [email protected]; Jonathan Davies
> Cc: Zoltan Kiss
> Subject: [Xen-devel] [PATCH net-next RFC 2/5] xen-netback: Change TX path
> from grant copy to mapping
>
> This patch changes the grant copy on the TX patch to grant mapping
>
> Signed-off-by: Zoltan Kiss <[email protected]>
> ---
> drivers/net/xen-netback/interface.c | 39 +++++-
> drivers/net/xen-netback/netback.c | 241 +++++++++++++--------------------
> --
> 2 files changed, 126 insertions(+), 154 deletions(-)
>
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-
> netback/interface.c
> index f5c3c57..fb16ede 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -336,8 +336,20 @@ struct xenvif *xenvif_alloc(struct device *parent,
> domid_t domid,
> vif->pending_prod = MAX_PENDING_REQS;
> for (i = 0; i < MAX_PENDING_REQS; i++)
> vif->pending_ring[i] = i;
> - for (i = 0; i < MAX_PENDING_REQS; i++)
> - vif->mmap_pages[i] = NULL;
> + err = alloc_xenballooned_pages(MAX_PENDING_REQS,
> + vif->mmap_pages,
> + false);

Since this is a per-vif allocation, is this going to scale?

> + if (err) {
> + netdev_err(dev, "Could not reserve mmap_pages\n");
> + return NULL;
> + }
> + for (i = 0; i < MAX_PENDING_REQS; i++) {
> + vif->pending_tx_info[i].callback_struct = (struct ubuf_info)
> + { .callback = xenvif_zerocopy_callback,
> + .ctx = NULL,
> + .desc = i };
> + vif->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
> + }
>
> /*
> * Initialise a dummy MAC address. We choose the numerically
> @@ -481,11 +493,34 @@ void xenvif_disconnect(struct xenvif *vif)
>
> void xenvif_free(struct xenvif *vif)
> {
> + int i;
> +
> netif_napi_del(&vif->napi);
>
> unregister_netdev(vif->dev);
>
> free_netdev(vif->dev);
>
> + /* FIXME: This is a workaround for 2 problems:
> + * - the guest sent a packet just before teardown, and it is still not
> + * delivered
> + * - the stack forgot to notify us that we can give back a slot
> + * For the first problem we shouldn't do this, as the skb might still
> + * access that page. I will come up with a more robust solution later.
> + * The second is definitely a bug, it leaks a slot from the ring
> + * forever.
> + * Classic kernel patchset has delayed copy for that, we might want to
> + * reuse that?

I think you're going to have to. You can't have one guest left at the mercy of another; if the mapping sticks around for too long then you really need the copy-aside.
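
(As an aside, a minimal sketch of what such a copy-aside at teardown could
look like; this is purely illustrative, the xenvif_copy_aside() helper does
not exist in the series, and the skb-side repointing that the classic
delayed-copy code performed is not shown:)

static void xenvif_copy_aside(struct xenvif *vif, u16 pending_idx)
{
	struct page *local = alloc_page(GFP_KERNEL);

	if (!local)
		return;	/* worst case: leave the grant mapped, as today */

	/* Preserve the payload in a Dom0-owned page... */
	memcpy(page_address(local),
	       (void *)idx_to_kaddr(vif, pending_idx),
	       PAGE_SIZE);

	/* ...so the foreign grant can be released and the guest gets its page
	 * back. Any skb still in flight would additionally have to be
	 * re-pointed at 'local', which is what the classic kernel's
	 * delayed-copy machinery took care of.
	 */
	xenvif_idx_unmap(vif, pending_idx);
}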

> + */
> + for (i = 0; i < MAX_PENDING_REQS; ++i) {
> + if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
> + netdev_err(vif->dev,
> + "Page still granted! Index: %x\n", i);
> + xenvif_idx_unmap(vif, i);
> + }
> + }
> +
> + free_xenballooned_pages(MAX_PENDING_REQS, vif-
> >mmap_pages);
> +
> module_put(THIS_MODULE);
> }
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-
> netback/netback.c
> index 10470dc..e544e9f 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -883,10 +883,10 @@ static inline void xenvif_tx_create_gop(struct
> xenvif *vif, u16 pending_idx,
>
> }
>
> -static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
> +static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif
> *vif,
> struct sk_buff *skb,
> struct xen_netif_tx_request *txp,
> - struct gnttab_copy *gop)
> + struct gnttab_map_grant_ref
> *gop)
> {
> struct skb_shared_info *shinfo = skb_shinfo(skb);
> skb_frag_t *frags = shinfo->frags;
> @@ -907,83 +907,12 @@ static struct gnttab_copy
> *xenvif_get_requests(struct xenvif *vif,
> /* Skip first skb fragment if it is on same page as header fragment. */
> start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
>
> - /* Coalesce tx requests, at this point the packet passed in
> - * should be <= 64K. Any packets larger than 64K have been
> - * handled in xenvif_count_requests().
> - */
> - for (shinfo->nr_frags = slot = start; slot < nr_slots;
> - shinfo->nr_frags++) {
> - struct pending_tx_info *pending_tx_info =
> - vif->pending_tx_info;
> -
> - page = alloc_page(GFP_ATOMIC|__GFP_COLD);
> - if (!page)
> - goto err;
> -
> - dst_offset = 0;
> - first = NULL;
> - while (dst_offset < PAGE_SIZE && slot < nr_slots) {
> - gop->flags = GNTCOPY_source_gref;
> -
> - gop->source.u.ref = txp->gref;
> - gop->source.domid = vif->domid;
> - gop->source.offset = txp->offset;
> -
> - gop->dest.domid = DOMID_SELF;
> -
> - gop->dest.offset = dst_offset;
> - gop->dest.u.gmfn =
> virt_to_mfn(page_address(page));
> -
> - if (dst_offset + txp->size > PAGE_SIZE) {
> - /* This page can only merge a portion
> - * of tx request. Do not increment any
> - * pointer / counter here. The txp
> - * will be dealt with in future
> - * rounds, eventually hitting the
> - * `else` branch.
> - */
> - gop->len = PAGE_SIZE - dst_offset;
> - txp->offset += gop->len;
> - txp->size -= gop->len;
> - dst_offset += gop->len; /* quit loop */
> - } else {
> - /* This tx request can be merged in the page
> */
> - gop->len = txp->size;
> - dst_offset += gop->len;
> -
> + for (shinfo->nr_frags = start; shinfo->nr_frags < nr_slots;
> + shinfo->nr_frags++, txp++, gop++) {
> index = pending_index(vif-
> >pending_cons++);
> -
> pending_idx = vif->pending_ring[index];
> -
> -
> memcpy(&pending_tx_info[pending_idx].req, txp,
> - sizeof(*txp));
> -
> - /* Poison these fields, corresponding
> - * fields for head tx req will be set
> - * to correct values after the loop.
> - */
> - vif->mmap_pages[pending_idx] = (void
> *)(~0UL);
> - pending_tx_info[pending_idx].head =
> - INVALID_PENDING_RING_IDX;
> -
> - if (!first) {
> - first =
> &pending_tx_info[pending_idx];
> - start_idx = index;
> - head_idx = pending_idx;
> - }
> -
> - txp++;
> - slot++;
> - }
> -
> - gop++;
> - }
> -
> - first->req.offset = 0;
> - first->req.size = dst_offset;
> - first->head = start_idx;
> - vif->mmap_pages[head_idx] = page;
> - frag_set_pending_idx(&frags[shinfo->nr_frags], head_idx);
> + xenvif_tx_create_gop(vif, pending_idx, txp, gop);
> + frag_set_pending_idx(&frags[shinfo->nr_frags],
> pending_idx);
> }
>
> BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
> @@ -1005,9 +934,9 @@ err:
>
> static int xenvif_tx_check_gop(struct xenvif *vif,
> struct sk_buff *skb,
> - struct gnttab_copy **gopp)
> + struct gnttab_map_grant_ref **gopp)
> {
> - struct gnttab_copy *gop = *gopp;
> + struct gnttab_map_grant_ref *gop = *gopp;
> u16 pending_idx = *((u16 *)skb->data);
> struct skb_shared_info *shinfo = skb_shinfo(skb);
> struct pending_tx_info *tx_info;
> @@ -1019,6 +948,16 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
> err = gop->status;
> if (unlikely(err))
> xenvif_idx_release(vif, pending_idx,
> XEN_NETIF_RSP_ERROR);
> + else {
> + if (vif->grant_tx_handle[pending_idx] !=
> + NETBACK_INVALID_HANDLE) {
> + netdev_err(vif->dev,
> + "Stale mapped handle! pending_idx %x
> handle %x\n",
> + pending_idx, vif-
> >grant_tx_handle[pending_idx]);
> + xenvif_fatal_tx_err(vif);
> + }
> + vif->grant_tx_handle[pending_idx] = gop->handle;
> + }
>
> /* Skip first skb fragment if it is on same page as header fragment. */
> start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
> @@ -1032,18 +971,24 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
> head = tx_info->head;
>
> /* Check error status: if okay then remember grant handle.
> */
> - do {
> newerr = (++gop)->status;
> - if (newerr)
> - break;
> - peek = vif->pending_ring[pending_index(++head)];
> - } while (!pending_tx_is_head(vif, peek));
>
> if (likely(!newerr)) {
> + if (vif->grant_tx_handle[pending_idx] !=
> + NETBACK_INVALID_HANDLE) {
> + netdev_err(vif->dev,
> + "Stale mapped handle! pending_idx
> %x handle %x\n",
> + pending_idx,
> + vif->grant_tx_handle[pending_idx]);
> + xenvif_fatal_tx_err(vif);
> + }
> + vif->grant_tx_handle[pending_idx] = gop->handle;
> /* Had a previous error? Invalidate this fragment. */
> - if (unlikely(err))
> + if (unlikely(err)) {
> + xenvif_idx_unmap(vif, pending_idx);
> xenvif_idx_release(vif, pending_idx,
> XEN_NETIF_RSP_OKAY);
> + }
> continue;
> }
>
> @@ -1056,9 +1001,11 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
>
> /* First error: invalidate header and preceding fragments. */
> pending_idx = *((u16 *)skb->data);
> + xenvif_idx_unmap(vif, pending_idx);
> xenvif_idx_release(vif, pending_idx,
> XEN_NETIF_RSP_OKAY);
> for (j = start; j < i; j++) {
> pending_idx = frag_get_pending_idx(&shinfo-
> >frags[j]);
> + xenvif_idx_unmap(vif, pending_idx);
> xenvif_idx_release(vif, pending_idx,
> XEN_NETIF_RSP_OKAY);
> }
> @@ -1071,7 +1018,8 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
> return err;
> }
>
> -static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
> +static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb,
> + u16 prev_pending_idx)
> {
> struct skb_shared_info *shinfo = skb_shinfo(skb);
> int nr_frags = shinfo->nr_frags;
> @@ -1085,6 +1033,15 @@ static void xenvif_fill_frags(struct xenvif *vif,
> struct sk_buff *skb)
>
> pending_idx = frag_get_pending_idx(frag);
>
> + /* If this is not the first frag, chain it to the previous*/
> + vif->pending_tx_info[pending_idx].callback_struct.ctx =
> NULL;
> + if (pending_idx != prev_pending_idx) {
> + vif-
> >pending_tx_info[prev_pending_idx].callback_struct.ctx =
> + &(vif-
> >pending_tx_info[pending_idx].callback_struct);
> + prev_pending_idx = pending_idx;
> + }
> +
> +
> txp = &vif->pending_tx_info[pending_idx].req;
> page = virt_to_page(idx_to_kaddr(vif, pending_idx));
> __skb_fill_page_desc(skb, i, page, txp->offset, txp->size);
> @@ -1092,10 +1049,15 @@ static void xenvif_fill_frags(struct xenvif *vif,
> struct sk_buff *skb)
> skb->data_len += txp->size;
> skb->truesize += txp->size;
>
> - /* Take an extra reference to offset xenvif_idx_release */
> + /* Take an extra reference to offset network stack's
> put_page */
> get_page(vif->mmap_pages[pending_idx]);
> - xenvif_idx_release(vif, pending_idx,
> XEN_NETIF_RSP_OKAY);
> }
> + /* FIXME: __skb_fill_page_desc set this to true because page-
> >pfmemalloc
> + * overlaps with "index", and "mapping" is not set. I think mapping
> + * should be set. If delivered to local stack, it would drop this
> + * skb in sk_filter unless the socket has the right to use it.
> + */
> + skb->pfmemalloc = false;
> }
>
> static int xenvif_get_extras(struct xenvif *vif,
> @@ -1426,7 +1388,7 @@ static bool tx_credit_exceeded(struct xenvif *vif,
> unsigned size)
>
> static unsigned xenvif_tx_build_gops(struct xenvif *vif)
> {
> - struct gnttab_copy *gop = vif->tx_copy_ops, *request_gop;
> + struct gnttab_map_grant_ref *gop = vif->tx_map_ops,
> *request_gop;
> struct sk_buff *skb;
> int ret;
>
> @@ -1533,30 +1495,10 @@ static unsigned xenvif_tx_build_gops(struct
> xenvif *vif)
> }
> }
>
> - /* XXX could copy straight to head */
> - page = xenvif_alloc_page(vif, pending_idx);
> - if (!page) {
> - kfree_skb(skb);
> - xenvif_tx_err(vif, &txreq, idx);
> - break;
> - }
> -
> - gop->source.u.ref = txreq.gref;
> - gop->source.domid = vif->domid;
> - gop->source.offset = txreq.offset;
> -
> - gop->dest.u.gmfn = virt_to_mfn(page_address(page));
> - gop->dest.domid = DOMID_SELF;
> - gop->dest.offset = txreq.offset;
> -
> - gop->len = txreq.size;
> - gop->flags = GNTCOPY_source_gref;
> + xenvif_tx_create_gop(vif, pending_idx, &txreq, gop);
>
> gop++;
>
> - memcpy(&vif->pending_tx_info[pending_idx].req,
> - &txreq, sizeof(txreq));
> - vif->pending_tx_info[pending_idx].head = index;
> *((u16 *)skb->data) = pending_idx;
>
> __skb_put(skb, data_len);
> @@ -1585,17 +1527,17 @@ static unsigned xenvif_tx_build_gops(struct
> xenvif *vif)
>
> vif->tx.req_cons = idx;
>
> - if ((gop-vif->tx_copy_ops) >= ARRAY_SIZE(vif-
> >tx_copy_ops))
> + if ((gop-vif->tx_map_ops) >= ARRAY_SIZE(vif-
> >tx_map_ops))
> break;
> }
>
> - return gop - vif->tx_copy_ops;
> + return gop - vif->tx_map_ops;
> }
>
>
> static int xenvif_tx_submit(struct xenvif *vif, int budget)
> {
> - struct gnttab_copy *gop = vif->tx_copy_ops;
> + struct gnttab_map_grant_ref *gop = vif->tx_map_ops;
> struct sk_buff *skb;
> int work_done = 0;
>
> @@ -1620,14 +1562,25 @@ static int xenvif_tx_submit(struct xenvif *vif, int
> budget)
> memcpy(skb->data,
> (void *)(idx_to_kaddr(vif, pending_idx)|txp->offset),
> data_len);
> + vif->pending_tx_info[pending_idx].callback_struct.ctx =
> NULL;
> if (data_len < txp->size) {
> /* Append the packet payload as a fragment. */
> txp->offset += data_len;
> txp->size -= data_len;
> - } else {
> + skb_shinfo(skb)->destructor_arg =
> + &vif-
> >pending_tx_info[pending_idx].callback_struct;
> + } else if (!skb_shinfo(skb)->nr_frags) {
> /* Schedule a response immediately. */
> + skb_shinfo(skb)->destructor_arg = NULL;
> + xenvif_idx_unmap(vif, pending_idx);
> xenvif_idx_release(vif, pending_idx,
> XEN_NETIF_RSP_OKAY);
> + } else {
> + /* FIXME: first request fits linear space, I don't know
> + * if any guest would do that, but I think it's possible
> + */

The Windows frontend, because it has to parse the packet headers, will coalesce everything up to the payload into a single frag, and it would be a good idea to copy this directly into the linear area.

> + skb_shinfo(skb)->destructor_arg =
> + &vif-
> >pending_tx_info[pending_idx].callback_struct;
> }
>
> if (txp->flags & XEN_NETTXF_csum_blank)
> @@ -1635,13 +1588,19 @@ static int xenvif_tx_submit(struct xenvif *vif, int
> budget)
> else if (txp->flags & XEN_NETTXF_data_validated)
> skb->ip_summed = CHECKSUM_UNNECESSARY;
>
> - xenvif_fill_frags(vif, skb);
> + xenvif_fill_frags(vif, skb, pending_idx);
>
> if (skb_is_nonlinear(skb) && skb_headlen(skb) <
> PKT_PROT_LEN) {
> int target = min_t(int, skb->len, PKT_PROT_LEN);
> __pskb_pull_tail(skb, target - skb_headlen(skb));
> }
>
> + /* Set this flag after __pskb_pull_tail, as it can trigger
> + * skb_copy_ubufs, while we are still in control of the skb
> + */

You can't be sure that there will be no subsequent pullups. The v6 parsing code I added may need to do that on a (hopefully) rare occasion.

Paul

> + if (skb_shinfo(skb)->destructor_arg)
> + skb_shinfo(skb)->tx_flags |=
> SKBTX_DEV_ZEROCOPY;
> +
> skb->dev = vif->dev;
> skb->protocol = eth_type_trans(skb, skb->dev);
> skb_reset_network_header(skb);
> @@ -1770,17 +1729,25 @@ static inline void xenvif_tx_action_dealloc(struct
> xenvif *vif)
> int xenvif_tx_action(struct xenvif *vif, int budget)
> {
> unsigned nr_gops;
> - int work_done;
> + int work_done, ret;
>
> if (unlikely(!tx_work_todo(vif)))
> return 0;
>
> + xenvif_tx_action_dealloc(vif);
> +
> nr_gops = xenvif_tx_build_gops(vif);
>
> if (nr_gops == 0)
> return 0;
>
> - gnttab_batch_copy(vif->tx_copy_ops, nr_gops);
> + if (nr_gops) {
> + ret = gnttab_map_refs(vif->tx_map_ops,
> + NULL,
> + vif->pages_to_gnt,
> + nr_gops);
> + BUG_ON(ret);
> + }
>
> work_done = xenvif_tx_submit(vif, nr_gops);
>
> @@ -1791,45 +1758,13 @@ static void xenvif_idx_release(struct xenvif *vif,
> u16 pending_idx,
> u8 status)
> {
> struct pending_tx_info *pending_tx_info;
> - pending_ring_idx_t head;
> + pending_ring_idx_t index;
> u16 peek; /* peek into next tx request */
>
> - BUG_ON(vif->mmap_pages[pending_idx] == (void *)(~0UL));
> -
> - /* Already complete? */
> - if (vif->mmap_pages[pending_idx] == NULL)
> - return;
> -
> - pending_tx_info = &vif->pending_tx_info[pending_idx];
> -
> - head = pending_tx_info->head;
> -
> - BUG_ON(!pending_tx_is_head(vif, head));
> - BUG_ON(vif->pending_ring[pending_index(head)] != pending_idx);
> -
> - do {
> - pending_ring_idx_t index;
> - pending_ring_idx_t idx = pending_index(head);
> - u16 info_idx = vif->pending_ring[idx];
> -
> - pending_tx_info = &vif->pending_tx_info[info_idx];
> + pending_tx_info = &vif->pending_tx_info[pending_idx];
> make_tx_response(vif, &pending_tx_info->req, status);
> -
> - /* Setting any number other than
> - * INVALID_PENDING_RING_IDX indicates this slot is
> - * starting a new packet / ending a previous packet.
> - */
> - pending_tx_info->head = 0;
> -
> index = pending_index(vif->pending_prod++);
> - vif->pending_ring[index] = vif->pending_ring[info_idx];
> -
> - peek = vif->pending_ring[pending_index(++head)];
> -
> - } while (!pending_tx_is_head(vif, peek));
> -
> - put_page(vif->mmap_pages[pending_idx]);
> - vif->mmap_pages[pending_idx] = NULL;
> + vif->pending_ring[index] = pending_idx;
> }
>
> void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx)
> @@ -1904,6 +1839,8 @@ static inline int rx_work_todo(struct xenvif *vif)
>
> static inline int tx_work_todo(struct xenvif *vif)
> {
> + if (vif->dealloc_cons != vif->dealloc_prod)
> + return 1;
>
> if (likely(RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)) &&
> (nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX
>
> _______________________________________________
> Xen-devel mailing list
> [email protected]
> http://lists.xen.org/xen-devel

2013-10-30 09:28:12

by Jan Beulich

Subject: Re: [Xen-devel] [PATCH net-next RFC 1/5] xen-netback: Introduce TX grant map definitions

>>> On 30.10.13 at 01:50, Zoltan Kiss <[email protected]> wrote:
> @@ -119,13 +126,22 @@ struct xenvif {
> pending_ring_idx_t pending_cons;
> u16 pending_ring[MAX_PENDING_REQS];
> struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
> + grant_handle_t grant_tx_handle[MAX_PENDING_REQS];
>
> /* Coalescing tx requests before copying makes number of grant
> * copy ops greater or equal to number of slots required. In
> * worst case a tx request consumes 2 gnttab_copy.
> */
> struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS];
> + struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
> + struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
> + /* passed to gnttab_[un]map_refs with pages under (un)mapping */
> + struct page *pages_to_gnt[MAX_PENDING_REQS];

I think you will want to try to limit the structure size here by putting
things that can't be used at the same time into unions: without having
looked at the later patches yet, it seems quite unlikely that map and
unmap can be used simultaneously. And copy and map can't be used at the
same time either, since for each pending request you would use either up
to two copy slots or a single map slot. I didn't look for further
opportunities of sharing space.
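
(One literal reading of that suggestion, purely for illustration; the grouping
below is an assumption, and the posted series keeps these as separate arrays:)

	/* Only one of these batches is in flight at any given time,
	 * so they could share storage:
	 */
	union {
		/* legacy path: up to two copy ops per pending request */
		struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS];
		/* mapping path: map and unmap batches are built and
		 * consumed in separate phases
		 */
		struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
		struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
	};

Whether the map/unmap batches really never overlap (e.g. around the
xenvif_idx_unmap() calls in the error paths) would need checking before
doing this.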

Jan

2013-10-30 09:39:57

by Jan Beulich

Subject: Re: [Xen-devel] [PATCH net-next RFC 3/5] xen-netback: Remove old TX grant copy definitons

>>> On 30.10.13 at 01:50, Zoltan Kiss <[email protected]> wrote:
> These became obsolate with grant mapping.

I didn't look at this in detail, but I'm surprised you can get away
without any copying: For one, the header part needs copying
anyway, so you'd pointlessly map and then copy it if it's small
enough. And second you need to be prepared for the frontend
to hand you more slots than you can fit in MAX_SKB_FRAGS
(namely when MAX_SKB_FRAGS < XEN_NETIF_NR_SLOTS_MIN),
which you can't handle with mapping alone afaict.

Jan

2013-10-30 11:13:16

by Wei Liu

Subject: Re: [PATCH net-next RFC 4/5] xen-netback: Fix indentations

On Wed, Oct 30, 2013 at 12:50:19AM +0000, Zoltan Kiss wrote:
> I've left intentionally these indentations in this way, to improve readability of previous patches.
>

Apparently this doesn't deserve a single patch -- please squash it when
you post non-RFC series.

Wei.

2013-10-30 11:13:22

by Wei Liu

Subject: Re: [PATCH net-next RFC 3/5] xen-netback: Remove old TX grant copy definitons

On Wed, Oct 30, 2013 at 12:50:18AM +0000, Zoltan Kiss wrote:
> These became obsolate with grant mapping.
>

I'm afraid not.

This TX coalescing mechanism is designed to handle the situation where the
guest's MAX_SKB_FRAGS > the host's MAX_SKB_FRAGS.

Going further, I think you might need to add a mechanism for the backend to
tell the frontend how many frags it can handle, so that the frontend can do
the necessary coalescing. Until then you can't safely use this new mapping
scheme.

Wei.

2013-10-30 19:16:27

by Konrad Rzeszutek Wilk

Subject: Re: [Xen-devel] [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy

On Wed, Oct 30, 2013 at 12:50:15AM +0000, Zoltan Kiss wrote:
> [...]
> 1/5: Introduce TX grant map definitions
> 2/5: Change TX path from grant copy to mapping
> 3/5: Remove old TX grant copy definitons
> 4/5: Fix indentations
> 5/5: Change RX path for mapped SKB fragments

Odd. I don't see the #5 patch?
>
> Signed-off-by: Zoltan Kiss <[email protected]>
>
>

2013-10-30 19:17:31

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy

On Wed, Oct 30, 2013 at 03:16:17PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Oct 30, 2013 at 12:50:15AM +0000, Zoltan Kiss wrote:
> > [...]
> > 1/5: Introduce TX grant map definitions
> > 2/5: Change TX path from grant copy to mapping
> > 3/5: Remove old TX grant copy definitons
> > 4/5: Fix indentations
> > 5/5: Change RX path for mapped SKB fragments
>
> Odd. I don't see the #5 patch?

Ah, you have two #4 patches:

[PATCH net-next RFC 4/5] xen-netback: Change RX path for mapped SKB fragments
[PATCH net-next RFC 4/5] xen-netback: Fix indentations

!
> >
> > Signed-off-by: Zoltan Kiss <[email protected]>
> >
> >

2013-10-30 21:10:56

by Zoltan Kiss

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH net-next RFC 2/5] xen-netback: Change TX path from grant copy to mapping

On 30/10/13 09:11, Paul Durrant wrote:
>> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
>> index f5c3c57..fb16ede 100644
>> --- a/drivers/net/xen-netback/interface.c
>> +++ b/drivers/net/xen-netback/interface.c
>> @@ -336,8 +336,20 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
>> vif->pending_prod = MAX_PENDING_REQS;
>> for (i = 0; i < MAX_PENDING_REQS; i++)
>> vif->pending_ring[i] = i;
>> - for (i = 0; i < MAX_PENDING_REQS; i++)
>> - vif->mmap_pages[i] = NULL;
>> + err = alloc_xenballooned_pages(MAX_PENDING_REQS,
>> + vif->mmap_pages,
>> + false);
>
> Since this is a per-vif allocation, is this going to scale?
Good question, I'll look into this.

>> @@ -1620,14 +1562,25 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
>> memcpy(skb->data,
>> (void *)(idx_to_kaddr(vif, pending_idx)|txp->offset),
>> data_len);
>> + vif->pending_tx_info[pending_idx].callback_struct.ctx = NULL;
>> if (data_len < txp->size) {
>> /* Append the packet payload as a fragment. */
>> txp->offset += data_len;
>> txp->size -= data_len;
>> - } else {
>> + skb_shinfo(skb)->destructor_arg =
>> + &vif->pending_tx_info[pending_idx].callback_struct;
>> + } else if (!skb_shinfo(skb)->nr_frags) {
>> /* Schedule a response immediately. */
>> + skb_shinfo(skb)->destructor_arg = NULL;
>> + xenvif_idx_unmap(vif, pending_idx);
>> xenvif_idx_release(vif, pending_idx,
>> XEN_NETIF_RSP_OKAY);
>> + } else {
>> + /* FIXME: first request fits linear space, I don't know
>> + * if any guest would do that, but I think it's possible
>> + */
>
> The Windows frontend, because it has to parse the packet headers, will coalesce everything up to the payload in a single frag and it would be a good idea to copy this directly into the linear area.
I forgot to clarify this comment: the problem I wanted to handle here is
when the first request's size is PKT_PROT_LEN and there are more
fragments. Then skb->len will be PKT_PROT_LEN as well, and the if
statement falls through to the else branch. That might be problematic if
we release the slot of the first request separately from the others. Or
am I overlooking something? Does that matter to netfront anyway?
And this problem, if it is real, applies to the previous, grant copy
method as well.
However, I think it might be better to change the condition to
(data_len <= txp->size) rather than putting an if-else statement into
the else branch.
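Roughly like this (a sketch of the proposed condition change only; the
rest of the branch stays as in the patch, and whether the zero-length
remainder of the first request is handled correctly further down still
needs checking):

	if (data_len <= txp->size) {
		/* Keep the first request's slot together with the frags, so
		 * it is released through the zerocopy callback rather than
		 * separately, even when data_len == txp->size.
		 */
		txp->offset += data_len;
		txp->size -= data_len;
		skb_shinfo(skb)->destructor_arg =
			&vif->pending_tx_info[pending_idx].callback_struct;
	}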

>> @@ -1635,13 +1588,19 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
>> else if (txp->flags & XEN_NETTXF_data_validated)
>> skb->ip_summed = CHECKSUM_UNNECESSARY;
>>
>> - xenvif_fill_frags(vif, skb);
>> + xenvif_fill_frags(vif, skb, pending_idx);
>>
>> if (skb_is_nonlinear(skb) && skb_headlen(skb) < PKT_PROT_LEN) {
>> int target = min_t(int, skb->len, PKT_PROT_LEN);
>> __pskb_pull_tail(skb, target - skb_headlen(skb));
>> }
>>
>> + /* Set this flag after __pskb_pull_tail, as it can trigger
>> + * skb_copy_ubufs, while we are still in control of the skb
>> + */
>
> You can't be sure that there will be no subsequent pullups. The v6 parsing code I added may need to do that on a (hopefully) rare occasion.
The only thing that matters is that it shouldn't happen between this
point and the call to netif_receive_skb. I think I will move the flag
setting right before that call and expand the comment.
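I.e. roughly this (a sketch only, assuming the flag in question is
SKBTX_DEV_ZEROCOPY):

	/* Set the zerocopy flag as late as possible, right before the skb
	 * is handed to the stack, so pull-ups done earlier in the TX path
	 * (e.g. for header parsing) cannot trigger skb_copy_ubufs().
	 */
	skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
	netif_receive_skb(skb);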

Zoli

2013-10-30 21:14:36

by Zoltan Kiss

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH net-next RFC 0/5] xen-netback: TX grant mapping instead of copy

On 30/10/13 19:17, Konrad Rzeszutek Wilk wrote:
> On Wed, Oct 30, 2013 at 03:16:17PM -0400, Konrad Rzeszutek Wilk wrote:
>> Odd. I don't see the #5 patch?
>
> Ah, you have two #4 patches:
>
> [PATCH net-next RFC 4/5] xen-netback: Change RX path for mapped SKB fragments
> [PATCH net-next RFC 4/5] xen-netback: Fix indentations

Yep, sorry, I will fix it up in the next version!

Zoli

2013-10-31 19:22:32

by Zoltan Kiss

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH net-next RFC 1/5] xen-netback: Introduce TX grant map definitions

On 30/10/13 09:28, Jan Beulich wrote:
>>>> On 30.10.13 at 01:50, Zoltan Kiss <[email protected]> wrote:
>> @@ -119,13 +126,22 @@ struct xenvif {
>> pending_ring_idx_t pending_cons;
>> u16 pending_ring[MAX_PENDING_REQS];
>> struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
>> + grant_handle_t grant_tx_handle[MAX_PENDING_REQS];
>>
>> /* Coalescing tx requests before copying makes number of grant
>> * copy ops greater or equal to number of slots required. In
>> * worst case a tx request consumes 2 gnttab_copy.
>> */
>> struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS];
>> + struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
>> + struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
>> + /* passed to gnttab_[un]map_refs with pages under (un)mapping */
>> + struct page *pages_to_gnt[MAX_PENDING_REQS];
>
> I think you will want to try to limit the structure size here by putting
> things into unions that can't be used at the same time: Without
> having looked at the later patches yet, it seems quite unlikely that
> map and unmap can be used simultaneously. And the total of copy
> and map can't also be used at the same time, as for each pending
> request you would use either up to two copy slots or a single map
> slot. I didn't look for further opportunities of sharing space.

Indeed, map and unmap can't be done at the same time, so it's safe to
put them into a union. But I'm afraid grant_tx_handle and pages_to_gnt
can't share space with other members.
tx_copy_ops is a different topic, let's discuss that in its own thread ...

Thanks,

Zoli

2013-10-31 19:34:05

by Zoltan Kiss

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 1/5] xen-netback: Introduce TX grant map definitions

On 30/10/13 00:50, Zoltan Kiss wrote:
> +void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
> +{
> + unsigned long flags;
> + pending_ring_idx_t index;
> + u16 pending_idx = ubuf->desc;
> + struct pending_tx_info *temp =
> + container_of(ubuf, struct pending_tx_info, callback_struct);
> + struct xenvif *vif =
> + container_of(temp - pending_idx, struct xenvif,
> + pending_tx_info[0]);
> +
> + spin_lock_irqsave(&vif->dealloc_lock, flags);
> + do {
> + pending_idx = ubuf->desc;
> + ubuf = (struct ubuf_info *) ubuf->ctx;
> + index = pending_index(vif->dealloc_prod);
> + vif->dealloc_ring[index] = pending_idx;
> + /* Sync with xenvif_tx_action_dealloc:
> + * insert idx then incr producer.
> + */
> + smp_wmb();
> + vif->dealloc_prod++;
> + napi_schedule(&vif->napi);
> + } while (ubuf);
> + spin_unlock_irqrestore(&vif->dealloc_lock, flags);
> +}

Another possible place for improvement is the placement of napi_schedule.
Now it gets called after every fragment, which is probably suboptimal. I
think it's likely that the vif thread can't finish one dealloc faster
than one iteration of this while loop.
Another idea is to place it after the while loop, so it gets called only
once, but then the thread doesn't have a chance to start working on the
deallocs in the meantime.
A compromise might be to do it once in the first iteration of the loop,
and then once more after the loop to make sure the thread knows about
all the deallocs.
Thoughts?
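The compromise variant would look roughly like this (a sketch only,
reusing the callback body quoted above):

	bool first = true;

	spin_lock_irqsave(&vif->dealloc_lock, flags);
	do {
		pending_idx = ubuf->desc;
		ubuf = (struct ubuf_info *) ubuf->ctx;
		index = pending_index(vif->dealloc_prod);
		vif->dealloc_ring[index] = pending_idx;
		/* Sync with xenvif_tx_action_dealloc:
		 * insert idx then incr producer.
		 */
		smp_wmb();
		vif->dealloc_prod++;
		/* Kick the dealloc work as soon as the first slot is queued,
		 * so it can run while the rest of the chain is processed.
		 */
		if (first) {
			napi_schedule(&vif->napi);
			first = false;
		}
	} while (ubuf);
	/* One more kick after the loop, so slots queued after the first
	 * kick are not missed.
	 */
	napi_schedule(&vif->napi);
	spin_unlock_irqrestore(&vif->dealloc_lock, flags);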

Zoli

2013-10-31 19:46:41

by Zoltan Kiss

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH net-next RFC 3/5] xen-netback: Remove old TX grant copy definitons

On 30/10/13 09:39, Jan Beulich wrote:
>>>> On 30.10.13 at 01:50, Zoltan Kiss <[email protected]> wrote:
>> These became obsolate with grant mapping.
>
> I didn't look at this in detail, but I'm surprised you can get away
> without any copying: For one, the header part needs copying
> anyway, so you'd pointlessly map and then copy it if it's small
> enough.
Yep, that's a further plan for optimization. I think I will add that as
a separate patch to the series later. But that doesn't necessarily need
these definitions, so let's see that later.

> And second you need to be prepared for the frontend
> to hand you more slots than you can fit in MAX_SKB_FRAGS
> (namely when MAX_SKB_FRAGS < XEN_NETIF_NR_SLOTS_MIN),
> which you can't handle with mapping alone afaict.
Oh, I was not aware of this problem. Indeed, the trivial solution is
to keep the grant copy methods for such packets, however that sounds
quite nasty.
Another idea of mine is to use skb_shinfo(skb)->frag_list for such
packets, so the stack will see them as fragmented IP packets. That might
be less efficient than coalescing them into one skb during grant copy in
the first place, but it is probably a cleaner solution, and if we don't
care that much about the performance of such guests it might be good
enough.
But I'm not that familiar with the details of IP fragmentation, so it
might be a bad idea. I'm happy to hear comments from people who have
more experience with that.
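A very rough sketch of the frag_list idea (everything below is
illustrative: error handling and the actual mapping of the extra slots
are omitted, and nskb is just a placeholder name):

	/* Hypothetical: map the first MAX_SKB_FRAGS slots into skb as usual,
	 * put the remaining slots into a second skb and chain it via
	 * frag_list, so the stack sees one logically fragmented packet.
	 */
	struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);

	if (nskb) {
		/* ... fill skb_shinfo(nskb)->frags[] from the extra slots ... */
		skb_shinfo(skb)->frag_list = nskb;
		skb->len += nskb->len;
		skb->data_len += nskb->len;
		skb->truesize += nskb->truesize;
	}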

Zoli

2013-10-31 19:48:40

by Zoltan Kiss

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 4/5] xen-netback: Fix indentations

On 30/10/13 11:13, Wei Liu wrote:
> On Wed, Oct 30, 2013 at 12:50:19AM +0000, Zoltan Kiss wrote:
>> I've left intentionally these indentations in this way, to improve readability of previous patches.
>>
>
> This doesn't deserve a separate patch -- please squash it when you
> post the non-RFC series.
>
> Wei.
>
Yep, I just want to keep them separate until the series gets its final
form.

Zoli