2005-09-18 14:32:21

by Dan Aloni

[permalink] [raw]
Subject: workaround large MTU and N-order allocation failures

Hello,

Is there currently a workaround available for handling large MTU
(larger than 1 page, even 2-order) in the Linux network stack?

The problem with large MTU is external memory fragmentation in
the buddy system following high workload, causing alloc_skb() to
fail.

I'm interested in patches for both 2.4 and 2.6 kernels.

Thanks,

--
Dan Aloni
[email protected], [email protected], [email protected]


2005-09-18 23:11:32

by Francois Romieu

[permalink] [raw]
Subject: Re: workaround large MTU and N-order allocation failures

Dan Aloni <[email protected]> :
[...]
> The problem with large MTU is external memory fragmentation in
> the buddy system following high workload, causing alloc_skb() to
> fail.

If the issue hits the Rx path, it is probably the responsibility of
the device driver. Which kind of hardware do you use ?

--
Ueimor

2005-09-19 06:31:17

by Nick Piggin

[permalink] [raw]
Subject: Re: workaround large MTU and N-order allocation failures

On Sun, 2005-09-18 at 17:35 +0300, Dan Aloni wrote:
> Hello,
>
> Is there currently a workaround available for handling large MTU
> (larger than 1 page, even 2-order) in the Linux network stack?
>
> The problem with large MTU is external memory fragmentation in
> the buddy system following high workload, causing alloc_skb() to
> fail.
>
> I'm interested in patches for both 2.4 and 2.6 kernels.
>

Yes there is currently a workaround. That is to keep increasing
/proc/sys/vm/min_free_kbytes until your allocation failures stop.
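For instance (the value here is purely illustrative; the right number depends on your workload):

```shell
# Raise the VM watermarks so the allocator keeps more memory free,
# which leaves more contiguous pages for high-order allocations.
# 16384 (16 MB) is just a starting point; keep raising it until
# the allocation failures stop.
echo 16384 > /proc/sys/vm/min_free_kbytes

# Equivalently via sysctl; add vm.min_free_kbytes=16384 to
# /etc/sysctl.conf to persist it across reboots.
sysctl -w vm.min_free_kbytes=16384
```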

Nick

--
SUSE Labs, Novell Inc.




2005-09-19 07:10:45

by Dan Aloni

[permalink] [raw]
Subject: Re: workaround large MTU and N-order allocation failures

On Mon, Sep 19, 2005 at 01:08:22AM +0200, Francois Romieu wrote:
> Dan Aloni <[email protected]> :
> [...]
> > The problem with large MTU is external memory fragmentation in
> > the buddy system following high workload, causing alloc_skb() to
> > fail.
>
> If the issue hits the Rx path, it is probably the responsibility of
> the device driver. Which kind of hardware do you use ?

We are using a SuperMicro board and the network driver is e1000. The
revision of the chipset is 82546GB-copper (maps to e1000_82546_rev_3).

This particular chipset does not support packet splitting, so we
are looking for a hack on the skb layer.

--
Dan Aloni
[email protected], [email protected], [email protected]

2005-09-19 13:44:05

by Al Boldi

[permalink] [raw]
Subject: Re: workaround large MTU and N-order allocation failures

Nick Piggin wrote:
> On Sun, 2005-09-18 at 17:35 +0300, Dan Aloni wrote:
> > Hello,
> >
> > Is there currently a workaround available for handling large MTU
> > (larger than 1 page, even 2-order) in the Linux network stack?
> >
> > The problem with large MTU is external memory fragmentation in
> > the buddy system following high workload, causing alloc_skb() to
> > fail.
> >
> > I'm interested in patches for both 2.4 and 2.6 kernels.
>
> Yes there is currently a workaround. That is to keep increasing
> /proc/sys/vm/min_free_kbytes until your allocation failures stop.

How do you do it in 2.4?

--
Al

2005-09-19 17:25:33

by Ganesh Venkatesan

[permalink] [raw]
Subject: Re: workaround large MTU and N-order allocation failures

The 82546GB supports receiving an incoming Rx packet into multiple rx
buffers. A driver that enables this feature is currently under test.
What version of the e1000 driver are you using?

ganesh.

On 9/19/05, Dan Aloni <[email protected]> wrote:
> On Mon, Sep 19, 2005 at 01:08:22AM +0200, Francois Romieu wrote:
> > Dan Aloni <[email protected]> :
> > [...]
> > > The problem with large MTU is external memory fragmentation in
> > > the buddy system following high workload, causing alloc_skb() to
> > > fail.
> >
> > If the issue hits the Rx path, it is probably the responsibility of
> > the device driver. Which kind of hardware do you use ?
>
> We are using a SuperMicro board and the network driver is e1000. The
> revision of the chipset is 82546GB-copper (maps to e1000_82546_rev_3).
>
> This particular chipset does not support packet splitting, so we
> are looking for a hack on the skb layer.
>
> --
> Dan Aloni
> [email protected], [email protected], [email protected]
>

2005-09-20 14:21:55

by Dan Aloni

[permalink] [raw]
Subject: Re: workaround large MTU and N-order allocation failures

On Mon, Sep 19, 2005 at 10:25:29AM -0700, Ganesh Venkatesan wrote:
> 82546GB supports an incoming Rx packet to be received in multiple rx
> buffers. A driver that enables this feature is under test currently.
> What version of the e1000 are you using?

We are currently using the latest version of the driver from the 2.6
tree, backported to the 2.4 tree. I wasn't aware that the 82546GB
supports this - I had inferred otherwise from the comments in the
driver's source.

Is the version of the driver you mention available from CVS somewhere?

--
Dan Aloni
[email protected], [email protected], [email protected]

2005-09-20 14:31:12

by Dan Aloni

[permalink] [raw]
Subject: Re: workaround large MTU and N-order allocation failures

On Mon, Sep 19, 2005 at 04:31:02PM +1000, Nick Piggin wrote:
> On Sun, 2005-09-18 at 17:35 +0300, Dan Aloni wrote:
> > Hello,
> >
> > Is there currently a workaround available for handling large MTU
> > (larger than 1 page, even 2-order) in the Linux network stack?
> >
> > The problem with large MTU is external memory fragmentation in
> > the buddy system following high workload, causing alloc_skb() to
> > fail.
> >
> > I'm interested in patches for both 2.4 and 2.6 kernels.
> >
>
> Yes there is currently a workaround. That is to keep increasing
> /proc/sys/vm/min_free_kbytes until your allocation failures stop.

We have developed a much more reliable workaround which works on both
the 2.4 and 2.6 trees.

Our development is called 'Pre-allocated Big Buffers'. Basically, prebb
provides fixed-size pools of fixed-size blocks that are allocated at
boot time using the bootmem allocator (and are thus completely
disconnected from the slab cache). Block sizes need not be page
aligned, and blocks can be allocated in O(1) time from any context.

Each pool has a minimum and maximum object size (allocations should
strive for the maximum, for memory usage efficiency). Currently we use
prebb to avoid fragmentation entirely and to fine-tune memory usage.

(Of course, a few changes inside net/core/skbuff.c were needed so that
skb buffers are allocated from prebb instead of the slab.)
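A userspace sketch of the pool idea, for illustration only: malloc()
stands in for the one-time bootmem reservation, and all names are
made up here, not the actual prebb API.

```c
/* Sketch of a "pre-allocated big buffers" pool: a fixed number of
 * fixed-size blocks carved out of one contiguous region reserved up
 * front, with a free-list stack giving O(1) alloc and free.  Because
 * the region never goes back to the page allocator, it cannot suffer
 * from buddy-system fragmentation. */

#include <stdlib.h>

struct prebb_pool {
	char *region;		/* one contiguous chunk reserved up front */
	size_t block_size;	/* need not be page aligned */
	size_t nr_blocks;
	void **free_list;	/* stack of pointers to free blocks */
	size_t free_top;	/* number of entries on the stack */
};

/* Reserve everything once; no further allocation ever happens. */
struct prebb_pool *prebb_create(size_t block_size, size_t nr_blocks)
{
	struct prebb_pool *pool = malloc(sizeof(*pool));
	size_t i;

	if (!pool)
		return NULL;
	pool->region = malloc(block_size * nr_blocks);
	pool->free_list = malloc(nr_blocks * sizeof(void *));
	if (!pool->region || !pool->free_list) {
		free(pool->region);
		free(pool->free_list);
		free(pool);
		return NULL;
	}
	pool->block_size = block_size;
	pool->nr_blocks = nr_blocks;
	for (i = 0; i < nr_blocks; i++)
		pool->free_list[i] = pool->region + i * block_size;
	pool->free_top = nr_blocks;
	return pool;
}

/* O(1): pop a block off the free list; NULL when the pool is empty. */
void *prebb_alloc(struct prebb_pool *pool)
{
	if (pool->free_top == 0)
		return NULL;
	return pool->free_list[--pool->free_top];
}

/* O(1): push the block back (no double-free checking in this sketch). */
void prebb_free(struct prebb_pool *pool, void *block)
{
	pool->free_list[pool->free_top++] = block;
}
```

The point of the design is that allocation failure becomes a property
of the pool's size alone, decided at boot, rather than of whatever
fragmentation the system has accumulated since.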

--
Dan Aloni
[email protected], [email protected], [email protected]