2013-09-04 20:47:29

by Zoltan Kiss

Subject: [PATCH] net/core: Order-3 frag allocator causes SWIOTLB bouncing under Xen

THIS PATCH IS NOT INTENDED TO BE UPSTREAMED; IT IS FOR INFORMATIONAL PURPOSES ONLY!

I've noticed a performance regression with upstream kernels when used as Dom0
under Xen. The classic kernel can utilize the whole bandwidth of a 10G NIC
(ca. 9.3 Gbps), but upstream can reach only ca. 7 Gbps. I found that it
happens because SWIOTLB has to do double buffering. The per-task frag
allocator introduced in 5640f7 creates 32 KB frags, which are not contiguous
in mfn space.
This patch provides a workaround by going back to the old way. The following
ideas came up to solve this:

* make sure Dom0 memory is contiguous: it sounds trivial, but doesn't work with
  driver domains, and there are lots of situations where this is not possible.
* use PVH Dom0: then we will have an IOMMU, but only at some point in the future.
* use IOMMU with PV Dom0: this seems likely to arrive sooner.

Signed-off-by: Zoltan Kiss <[email protected]>
---
net/core/sock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 2c097c5..854a0ea 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1812,7 +1812,7 @@ struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size,
 EXPORT_SYMBOL(sock_alloc_send_skb);

 /* On 32bit arches, an skb frag is limited to 2^15 */
-#define SKB_FRAG_PAGE_ORDER get_order(32768)
+#define SKB_FRAG_PAGE_ORDER get_order(4096)

 bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
 {
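
To make the contiguity problem described in this message concrete, here is a minimal userspace sketch in C. It is not kernel code and makes several assumptions: PAGE_SHIFT is taken to be 12, order_for_size() only mimics what the kernel's get_order() computes, and toy_p2m and range_is_machine_contiguous() are hypothetical names with made-up frame numbers standing in for a PV guest's pfn-to-mfn mapping.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12                  /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Smallest order such that (PAGE_SIZE << order) >= size.
 * Userspace stand-in for the kernel's get_order(). */
static unsigned int order_for_size(unsigned long size)
{
    unsigned int order = 0;

    while ((PAGE_SIZE << order) < size)
        order++;
    return order;
}

/* Toy pfn -> mfn table with made-up numbers: guest frames 0..7 are
 * contiguous from the guest's point of view, but not in machine memory. */
static const unsigned long toy_p2m[8] = {
    100, 101, 102, 340, 341, 77, 78, 79
};

/* Hypothetical helper: true if 'npages' guest frames starting at 'pfn'
 * map to consecutive machine frames according to 'p2m'. */
static bool range_is_machine_contiguous(const unsigned long *p2m,
                                        unsigned long pfn,
                                        unsigned long npages)
{
    assert(npages > 0);

    for (unsigned long i = 1; i < npages; i++)
        if (p2m[pfn + i] != p2m[pfn] + i)
            return false;
    return true;
}
```

With these definitions, order_for_size(32768) is 3 (eight pages) and order_for_size(4096) is 0. An eight-page buffer at pfn 0 of toy_p2m fails the contiguity check, which is the situation where a DMA mapping of the whole frag would have to be bounced, while any single page passes trivially.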


2013-09-04 21:00:45

by Eric Dumazet

Subject: Re: [PATCH] net/core: Order-3 frag allocator causes SWIOTLB bouncing under Xen

On Wed, 2013-09-04 at 21:47 +0100, Zoltan Kiss wrote:
> THIS PATCH IS NOT INTENDED TO BE UPSTREAMED; IT IS FOR INFORMATIONAL PURPOSES ONLY!
>
> I've noticed a performance regression with upstream kernels when used as Dom0
> under Xen. The classic kernel can utilize the whole bandwidth of a 10G NIC
> (ca. 9.3 Gbps), but upstream can reach only ca. 7 Gbps. I found that it
> happens because SWIOTLB has to do double buffering. The per-task frag
> allocator introduced in 5640f7 creates 32 KB frags, which are not contiguous
> in mfn space.
> This patch provides a workaround by going back to the old way. The following
> ideas came up to solve this:
>
> * make sure Dom0 memory is contiguous: it sounds trivial, but doesn't work with
>   driver domains, and there are lots of situations where this is not possible.
> * use PVH Dom0: then we will have an IOMMU, but only at some point in the future.
> * use IOMMU with PV Dom0: this seems likely to arrive sooner.
>
> Signed-off-by: Zoltan Kiss <[email protected]>
> ---
> net/core/sock.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 2c097c5..854a0ea 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1812,7 +1812,7 @@ struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size,
> EXPORT_SYMBOL(sock_alloc_send_skb);
>
> /* On 32bit arches, an skb frag is limited to 2^15 */
> -#define SKB_FRAG_PAGE_ORDER get_order(32768)
> +#define SKB_FRAG_PAGE_ORDER get_order(4096)
>

Well, this hack is not new...

We have dev->gso_max_size and dev->gso_max_segs

We also have in net-next sk_pacing_rate and dynamic TSO sizing.

Maybe you could add proper infrastructure to deal with Xen limitations.
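
As a rough illustration of how per-device limits such as gso_max_size and gso_max_segs bound what the stack builds, here is a userspace model in C. It is only a sketch: struct gso_limits and gso_payload_limit() are hypothetical names, header overhead is ignored, and the real kernel logic is more involved than a single min().

```c
#include <assert.h>

/* Hypothetical stand-in for the two per-device limits named above. */
struct gso_limits {
    unsigned int gso_max_size;  /* maximum bytes in one GSO packet */
    unsigned int gso_max_segs;  /* maximum segments in one GSO packet */
};

/* Largest payload one GSO packet may carry for a given MSS when both
 * limits are honoured (simplified: headers are not accounted for). */
static unsigned int gso_payload_limit(const struct gso_limits *lim,
                                      unsigned int mss)
{
    unsigned long long by_segs;

    assert(lim != 0 && mss > 0);

    /* 64-bit product so a large gso_max_segs cannot overflow. */
    by_segs = (unsigned long long)lim->gso_max_segs * mss;
    if (by_segs < lim->gso_max_size)
        return (unsigned int)by_segs;
    return lim->gso_max_size;
}
```

A driver that lowers either limit shrinks the packets it is handed without touching the rest of the system, which is the kind of per-device knob the frag size currently lacks.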


2013-09-04 21:11:46

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH] net/core: Order-3 frag allocator causes SWIOTLB bouncing under Xen

On Wed, Sep 04, 2013 at 02:00:40PM -0700, Eric Dumazet wrote:
> On Wed, 2013-09-04 at 21:47 +0100, Zoltan Kiss wrote:
> > THIS PATCH IS NOT INTENDED TO BE UPSTREAMED; IT IS FOR INFORMATIONAL PURPOSES ONLY!
> >
> > I've noticed a performance regression with upstream kernels when used as Dom0
> > under Xen. The classic kernel can utilize the whole bandwidth of a 10G NIC
> > (ca. 9.3 Gbps), but upstream can reach only ca. 7 Gbps. I found that it
> > happens because SWIOTLB has to do double buffering. The per-task frag
> > allocator introduced in 5640f7 creates 32 KB frags, which are not contiguous
> > in mfn space.
> > This patch provides a workaround by going back to the old way. The following
> > ideas came up to solve this:
> >
> > * make sure Dom0 memory is contiguous: it sounds trivial, but doesn't work with
> >   driver domains, and there are lots of situations where this is not possible.
> > * use PVH Dom0: then we will have an IOMMU, but only at some point in the future.
> > * use IOMMU with PV Dom0: this seems likely to arrive sooner.
> >
> > Signed-off-by: Zoltan Kiss <[email protected]>
> > ---
> > net/core/sock.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 2c097c5..854a0ea 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -1812,7 +1812,7 @@ struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size,
> > EXPORT_SYMBOL(sock_alloc_send_skb);
> >
> > /* On 32bit arches, an skb frag is limited to 2^15 */
> > -#define SKB_FRAG_PAGE_ORDER get_order(32768)
> > +#define SKB_FRAG_PAGE_ORDER get_order(4096)
> >
>
> Well, this hack is not new...
>
> We have dev->gso_max_size and dev->gso_max_segs
>
> We also have in net-next sk_pacing_rate and dynamic TSO sizing.
>
> Maybe you could add proper infrastructure to deal with Xen limitations.

I think Ian posted a sysctl patch for that at some point (more for
debugging than anything else), and it kind of stalled:
http://lists.xen.org/archives/html/xen-devel/2012-10/msg01832.html

Is that what you mean by proper infrastructure?

Oh wait, did you mean via dev and not the whole system-wide sysctl?


2013-09-05 07:39:10

by Ian Campbell

Subject: Re: [Xen-devel] [PATCH] net/core: Order-3 frag allocator causes SWIOTLB bouncing under Xen

On Wed, 2013-09-04 at 17:11 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Sep 04, 2013 at 02:00:40PM -0700, Eric Dumazet wrote:

> > Maybe you could add proper infrastructure to deal with Xen limitations.
>
> I think Ian posted a sysctl patch for that at some point (more for
> debugging than anything else), and it kind of stalled:
> http://lists.xen.org/archives/html/xen-devel/2012-10/msg01832.html

I think I thought you were looking into it from the swiotlb angle?

In any case I don't have time to look into this further.

> Is that what you mean by proper infrastructure?
>
> Oh wait, did you mean via dev and not the whole system-wide sysctl?

The system-wide sysctl was not acceptable AFAIR, which seems reasonable.

Per-dev is hard because it affects the native drivers for each physical
NIC when running under Xen, not the Xen PV NIC, which we fixed by
splitting compound frags into separate slots on the PV ring.

Ian.
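
The idea of splitting a compound frag into separate slots can be sketched in userspace C as follows. This is only an illustration of the technique, not the actual PV NIC driver code: struct slot, split_frag(), and the 4 KiB PAGE_SIZE are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

/* Hypothetical per-slot descriptor: one chunk that never crosses a
 * page boundary inside the compound page. */
struct slot {
    unsigned long page_index;  /* which 4 KiB page of the compound page */
    unsigned long offset;      /* byte offset inside that page */
    unsigned long len;         /* chunk length in bytes */
};

/* Split the byte range [offset, offset + len) of a compound page into
 * chunks that each stay within a single page. Returns the number of
 * slots written, or 0 if len is 0 or max_slots is too small. */
static size_t split_frag(unsigned long offset, unsigned long len,
                         struct slot *slots, size_t max_slots)
{
    size_t n = 0;

    assert(slots != NULL);

    while (len > 0) {
        unsigned long in_page = offset % PAGE_SIZE;
        unsigned long chunk = PAGE_SIZE - in_page;

        if (chunk > len)
            chunk = len;
        if (n == max_slots)
            return 0;

        slots[n].page_index = offset / PAGE_SIZE;
        slots[n].offset = in_page;
        slots[n].len = chunk;
        n++;

        offset += chunk;
        len -= chunk;
    }
    return n;
}
```

Because every resulting chunk lies within one page, each slot can be handled without caring whether neighbouring pages are machine-contiguous. A native driver for a physical NIC has no equivalent step, which is why the same fix does not carry over.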

2013-09-06 13:27:46

by Konrad Rzeszutek Wilk

Subject: Re: [Xen-devel] [PATCH] net/core: Order-3 frag allocator causes SWIOTLB bouncing under Xen

On Thu, Sep 05, 2013 at 08:39:06AM +0100, Ian Campbell wrote:
> On Wed, 2013-09-04 at 17:11 -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Sep 04, 2013 at 02:00:40PM -0700, Eric Dumazet wrote:
>
> > > Maybe you could add proper infrastructure to deal with Xen limitations.
> >
> > I think Ian posted a sysctl patch for that at some point (more for
> > debugging than anything else), and it kind of stalled:
> > http://lists.xen.org/archives/html/xen-devel/2012-10/msg01832.html
>
> I think I thought you were looking into it from the swiotlb angle?

Yes. I didn't find anything immediately obvious, but I can still
reproduce DMA issues with skge when booting bare metal with 'swiotlb=force'.

But only under 32-bit. I think there is some physical address
truncation with the new compound-size skbs. Obviously this needs
further investigation.
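
For readers unfamiliar with this class of bug, the following userspace C sketch shows what physical address truncation means in general. It is not a diagnosis of the skge problem, which is still unexplained; truncate_phys() and range_fits_in_32_bits() are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What happens when a 64-bit physical address is stored in a 32-bit
 * variable: the upper 32 bits are silently dropped. */
static uint32_t truncate_phys(uint64_t phys)
{
    return (uint32_t)phys;
}

/* True if every byte of [phys, phys + len) is addressable with 32 bits.
 * The wrap-around check guards against phys + len overflowing. */
static bool range_fits_in_32_bits(uint64_t phys, uint64_t len)
{
    uint64_t last;

    assert(len > 0);

    last = phys + len - 1;
    return last >= phys && last <= UINT32_MAX;
}
```

A 4 KiB buffer starting at 0xffffc000 still fits below 4 GiB, but a 32 KiB buffer at the same address does not, so a larger compound buffer can expose a truncation that single pages never hit.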

>
> In any case I don't have time to look into this further.
>
> > Is that what you mean by proper infrastructure?
> >
> > Oh wait, did you mean via dev and not the whole system-wide sysctl?
>
> The system-wide sysctl was not acceptable AFAIR, which seems reasonable.

<nods>
>
> Per-dev is hard because it affects the native drivers for each physical
> NIC when running under Xen, not the Xen PV NIC, which we fixed by
> splitting compound frags into separate slots on the PV ring.

Right. Thank you for pointing out that obvious issue.

>
> Ian.
>