LinuxLists.cc - kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages

2012-08-29 00:33:26

Subject: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

Dear all,

I hope this is the right place to send this sort of backtrace dump.

I'm getting the following sort of dumps (below) on a 3.2.27 kernel on
an arm/kirkwood (actually DreamPlug) machine that's used as a router.

I imagine it being somehow related to the fact that it operates a
network bridge (I imagine this because I have another identical
machine with exactly the same kernel and a very similar config but not
running a bridge, and the warning never pops up).

Is this worth investigating? (I will, of course, provide the config
file and any other relevant data if the answer is "yes".) Is this
potentially serious? (I'm getting hard lockups on this machine which
I suspect are due to hardware and unrelated to this, but if someone
tells me it could be the cause, I'd be more than happy to believe it.)

[24711.204492] ------------[ cut here ]------------
[24711.209151] WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()
[24711.216667] Modules linked in: 8021q ath9k_htc mac80211 ath9k_common ath9k_hw ath cfg80211 bnep rfcomm sit tunnel4 sch_ingress cls_fw cls_u32 sch_sfq sch_htb pppoe pppox ppp_generic slhc bridge stp llc ip6t_REJECT ip6table_filter ip6table_mangle xt_NOTRACK ip6table_raw ip6_tables nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp ipt_REJECT xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_nat xt_TCPMSS xt_tcpudp xt_mark iptable_mangle ip_tables x_tables nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 orion_wdt ipv6 snd_usb_audio snd_pcm snd_page_alloc snd_hwdep snd_usbmidi_lib snd_seq_midi snd_seq_midi_event snd_rawmidi btmrvl_sdio btmrvl snd_seq snd_timer snd_seq_device snd bluetooth soundcore
[24711.280663] [<c000d728>] (unwind_backtrace+0x0/0xf0) from [<c0022f74>] (warn_slowpath_common+0x50/0x68)
[24711.290124] [<c0022f74>] (warn_slowpath_common+0x50/0x68) from [<c0022fa8>] (warn_slowpath_null+0x1c/0x24)
[24711.299845] [<c0022fa8>] (warn_slowpath_null+0x1c/0x24) from [<c009caec>] (__alloc_pages_nodemask+0x1d4/0x68c)
[24711.309914] [<c009caec>] (__alloc_pages_nodemask+0x1d4/0x68c) from [<c009cfb4>] (__get_free_pages+0x10/0x3c)
[24711.319805] [<c009cfb4>] (__get_free_pages+0x10/0x3c) from [<c00c9fd0>] (kmalloc_order_trace+0x24/0xdc)
[24711.329269] [<c00c9fd0>] (kmalloc_order_trace+0x24/0xdc) from [<c038d638>] (pskb_expand_head+0x68/0x298)
[24711.338901] [<c038d638>] (pskb_expand_head+0x68/0x298) from [<bf0dd3ec>] (ip6_forward+0x4d4/0x7bc [ipv6])
[24711.348638] [<bf0dd3ec>] (ip6_forward+0x4d4/0x7bc [ipv6]) from [<bf0dfebc>] (ipv6_rcv+0x2bc/0x3dc [ipv6])
[24711.358333] [<bf0dfebc>] (ipv6_rcv+0x2bc/0x3dc [ipv6]) from [<c0394870>] (__netif_receive_skb+0x544/0x66c)
[24711.368106] [<c0394870>] (__netif_receive_skb+0x544/0x66c) from [<bf1cd054>] (br_nf_pre_routing_finish_ipv6+0x10c/0x160 [bridge])
[24711.379899] [<bf1cd054>] (br_nf_pre_routing_finish_ipv6+0x10c/0x160 [bridge]) from [<bf1cdae8>] (br_nf_pre_routing+0x59c/0x67c [bridge])
[24711.392271] [<bf1cdae8>] (br_nf_pre_routing+0x59c/0x67c [bridge]) from [<c03bd2a4>] (nf_iterate+0x8c/0xb4)
[24711.401988] [<c03bd2a4>] (nf_iterate+0x8c/0xb4) from [<c03bd328>] (nf_hook_slow+0x5c/0x118)
[24711.410540] [<c03bd328>] (nf_hook_slow+0x5c/0x118) from [<bf1c7fa4>] (br_handle_frame+0x1b8/0x290 [bridge])
[24711.420367] [<bf1c7fa4>] (br_handle_frame+0x1b8/0x290 [bridge]) from [<c03946f8>] (__netif_receive_skb+0x3cc/0x66c)
[24711.430872] [<c03946f8>] (__netif_receive_skb+0x3cc/0x66c) from [<c031e254>] (mv643xx_eth_poll+0x540/0x734)
[24711.440680] [<c031e254>] (mv643xx_eth_poll+0x540/0x734) from [<c0397390>] (net_rx_action+0x118/0x314)
[24711.449970] [<c0397390>] (net_rx_action+0x118/0x314) from [<c0029924>] (__do_softirq+0xac/0x234)
[24711.458817] [<c0029924>] (__do_softirq+0xac/0x234) from [<c0029f00>] (irq_exit+0x94/0x9c)
[24711.467046] [<c0029f00>] (irq_exit+0x94/0x9c) from [<c00094b0>] (handle_IRQ+0x34/0x84)
[24711.475007] [<c00094b0>] (handle_IRQ+0x34/0x84) from [<c04398d4>] (__irq_svc+0x34/0x98)
[24711.483068] [<c04398d4>] (__irq_svc+0x34/0x98) from [<c0011d6c>] (kirkwood_enter_idle+0x4c/0x94)
[24711.491908] [<c0011d6c>] (kirkwood_enter_idle+0x4c/0x94) from [<c0357a00>] (cpuidle_idle_call+0xc8/0x35c)
[24711.501532] [<c0357a00>] (cpuidle_idle_call+0xc8/0x35c) from [<c0009764>] (cpu_idle+0x88/0xdc)
[24711.510201] [<c0009764>] (cpu_idle+0x88/0xdc) from [<c05d8720>] (start_kernel+0x2a0/0x2f0)
[24711.518512] ---[ end trace e1776fbe32468909 ]---

--
David A. Madore
( http://www.madore.org/~david/ )

2012-08-29 19:48:44

by Francois Romieu

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

David Madore <[email protected]> :
[...]
> I imagine it being somehow related to the fact that it operates a
> network bridge (I imagine this because I have another identical
> machine with exactly the same kernel and a very similar config but not
> running a bridge, and the warning never pops up).

Could it not be a genuine allocation failure ?

--
Ueimor

2012-08-31 10:59:40

by David Madore

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Wed, Aug 29, 2012 at 09:32:20PM +0200, Francois Romieu wrote:
> David Madore <[email protected]> :
> [...]
> > I imagine it being somehow related to the fact that it operates a
> > network bridge (I imagine this because I have another identical
> > machine with exactly the same kernel and a very similar config but not
> > running a bridge, and the warning never pops up).
>
> Could it not be a genuine allocation failure ?

I have no idea. How can I tell? In any case, if having 512MB RAM
isn't enough for the kernel in the router of a small home's network,
that's a bug somewhere, isn't it?

Also:

On Wed, Aug 29, 2012 at 02:25:48AM +0200, David Madore wrote:
> Is this worth investigating? (I will, of course, provide the config
> file and any other relevant data if the answer is "yes".) Is this
> potentially serious? (I'm getting hard lockups on this machine which
> I suspect are due to hardware and unrelated to this, but if someone
> tells me it could be the cause, I'd be more than happy to believe it.)

I'm now inclined to believe the hard lockups are indeed related to
this (I can semi-reproducibly make them happen with only network
traffic - actually, with the messages of a compilation taking place on
another machine being routed through this box (over IPv6)).

So how can I help debug this? (One difficulty is that I have only
remote access to this box, and it's not meant for experimenting with.)

--
David A. Madore
( http://www.madore.org/~david/ )

2012-08-31 13:58:00

by David Madore

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Fri, Aug 31, 2012 at 12:59:36PM +0200, David Madore wrote:
> On Wed, Aug 29, 2012 at 09:32:20PM +0200, Francois Romieu wrote:
> > David Madore <[email protected]> :
> > [...]
> > > I imagine it being somehow related to the fact that it operates a
> > > network bridge (I imagine this because I have another identical
> > > machine with exactly the same kernel and a very similar config but not
> > > running a bridge, and the warning never pops up).
> >
> > Could it not be a genuine allocation failure ?
>
> I have no idea. How can I tell? In any case, if having 512MB RAM
> isn't enough for the kernel in the router of a small home's network,
> that's a bug somewhere, isn't it?

PS: I'm also getting the following kind of messages from a wlan
interface that's on the bridge:

[ 268.976317] ieee80211 phy0: failed to reallocate TX buffer
[ 716.880515] ieee80211 phy0: failed to reallocate TX buffer
[ 1160.877677] ieee80211 phy0: failed to reallocate TX buffer

Could they be related?

--
David A. Madore
( http://www.madore.org/~david/ )

2012-08-31 18:29:01

by Francois Romieu

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

David Madore <[email protected]> :
[...]
> [ 268.976317] ieee80211 phy0: failed to reallocate TX buffer
> [ 716.880515] ieee80211 phy0: failed to reallocate TX buffer
> [ 1160.877677] ieee80211 phy0: failed to reallocate TX buffer
>
> Could they be related?

Yes. This is a pskb_expand_head failure from net/mac80211/tx.c.
The router is experiencing GFP_ATOMIC allocation failure

You should monitor /proc/slabinfo content. A pig may appear over time.

--
Ueimor

2012-09-01 02:21:45

by Hugh Dickins

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

[ Cc'ing original mail to netdev as the problem may be recognized there ]

On Wed, 29 Aug 2012, David Madore wrote:
> Dear all,
>
> I hope this is the right place to send this sort of backtrace dump.
>
> I'm getting the following sort of dumps (below) on a 3.2.27 kernel on
> an arm/kirkwood (actually DreamPlug) machine that's used as a router.
>
> I imagine it being somehow related to the fact that it operates a
> network bridge (I imagine this because I have another identical
> machine with exactly the same kernel and a very similar config but not
> running a bridge, and the warning never pops up).
>
> Is this worth investigating? (I will, of course, provide the config
> file and any other relevant data if the answer is "yes".) Is this
> potentially serious? (I'm getting hard lockups on this machine which
> I suspect are due to hardware and unrelated to this, but if someone
> tells me it could be the cause, I'd be more than happy to believe it.)
>
> [24711.204492] ------------[ cut here ]------------
> [24711.209151] WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()
> [24711.216667] Modules linked in: 8021q ath9k_htc mac80211 ath9k_common ath9k_hw ath cfg80211 bnep rfcomm sit tunnel4 sch_ingress cls_fw cls_u32 sch_sfq sch_htb pppoe pppox ppp_generic slhc bridge stp llc ip6t_REJECT ip6table_filter ip6table_mangle xt_NOTRACK ip6table_raw ip6_tables nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ftp nf_conntrack_ftp ipt_REJECT xt_conntrack iptable_filter ipt_MASQUERADE iptable_nat nf_nat xt_TCPMSS xt_tcpudp xt_mark iptable_mangle ip_tables x_tables nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 orion_wdt ipv6 snd_usb_audio snd_pcm snd_page_alloc snd_hwdep snd_usbmidi_lib snd_seq_midi snd_seq_midi_event snd_rawmidi btmrvl_sdio btmrvl snd_seq snd_timer snd_seq_device snd bluetooth soundcore
> [24711.280663] [<c000d728>] (unwind_backtrace+0x0/0xf0) from [<c0022f74>] (warn_slowpath_common+0x50/0x68)
> [24711.290124] [<c0022f74>] (warn_slowpath_common+0x50/0x68) from [<c0022fa8>] (warn_slowpath_null+0x1c/0x24)
> [24711.299845] [<c0022fa8>] (warn_slowpath_null+0x1c/0x24) from [<c009caec>] (__alloc_pages_nodemask+0x1d4/0x68c)
> [24711.309914] [<c009caec>] (__alloc_pages_nodemask+0x1d4/0x68c) from [<c009cfb4>] (__get_free_pages+0x10/0x3c)
> [24711.319805] [<c009cfb4>] (__get_free_pages+0x10/0x3c) from [<c00c9fd0>] (kmalloc_order_trace+0x24/0xdc)
> [24711.329269] [<c00c9fd0>] (kmalloc_order_trace+0x24/0xdc) from [<c038d638>] (pskb_expand_head+0x68/0x298)
> [24711.338901] [<c038d638>] (pskb_expand_head+0x68/0x298) from [<bf0dd3ec>] (ip6_forward+0x4d4/0x7bc [ipv6])
> [24711.348638] [<bf0dd3ec>] (ip6_forward+0x4d4/0x7bc [ipv6]) from [<bf0dfebc>] (ipv6_rcv+0x2bc/0x3dc [ipv6])
> [24711.358333] [<bf0dfebc>] (ipv6_rcv+0x2bc/0x3dc [ipv6]) from [<c0394870>] (__netif_receive_skb+0x544/0x66c)
> [24711.368106] [<c0394870>] (__netif_receive_skb+0x544/0x66c) from [<bf1cd054>] (br_nf_pre_routing_finish_ipv6+0x10c/0x160 [bridge])
> [24711.379899] [<bf1cd054>] (br_nf_pre_routing_finish_ipv6+0x10c/0x160 [bridge]) from [<bf1cdae8>] (br_nf_pre_routing+0x59c/0x67c [bridge])
> [24711.392271] [<bf1cdae8>] (br_nf_pre_routing+0x59c/0x67c [bridge]) from [<c03bd2a4>] (nf_iterate+0x8c/0xb4)
> [24711.401988] [<c03bd2a4>] (nf_iterate+0x8c/0xb4) from [<c03bd328>] (nf_hook_slow+0x5c/0x118)
> [24711.410540] [<c03bd328>] (nf_hook_slow+0x5c/0x118) from [<bf1c7fa4>] (br_handle_frame+0x1b8/0x290 [bridge])
> [24711.420367] [<bf1c7fa4>] (br_handle_frame+0x1b8/0x290 [bridge]) from [<c03946f8>] (__netif_receive_skb+0x3cc/0x66c)
> [24711.430872] [<c03946f8>] (__netif_receive_skb+0x3cc/0x66c) from [<c031e254>] (mv643xx_eth_poll+0x540/0x734)
> [24711.440680] [<c031e254>] (mv643xx_eth_poll+0x540/0x734) from [<c0397390>] (net_rx_action+0x118/0x314)
> [24711.449970] [<c0397390>] (net_rx_action+0x118/0x314) from [<c0029924>] (__do_softirq+0xac/0x234)
> [24711.458817] [<c0029924>] (__do_softirq+0xac/0x234) from [<c0029f00>] (irq_exit+0x94/0x9c)
> [24711.467046] [<c0029f00>] (irq_exit+0x94/0x9c) from [<c00094b0>] (handle_IRQ+0x34/0x84)
> [24711.475007] [<c00094b0>] (handle_IRQ+0x34/0x84) from [<c04398d4>] (__irq_svc+0x34/0x98)
> [24711.483068] [<c04398d4>] (__irq_svc+0x34/0x98) from [<c0011d6c>] (kirkwood_enter_idle+0x4c/0x94)
> [24711.491908] [<c0011d6c>] (kirkwood_enter_idle+0x4c/0x94) from [<c0357a00>] (cpuidle_idle_call+0xc8/0x35c)
> [24711.501532] [<c0357a00>] (cpuidle_idle_call+0xc8/0x35c) from [<c0009764>] (cpu_idle+0x88/0xdc)
> [24711.510201] [<c0009764>] (cpu_idle+0x88/0xdc) from [<c05d8720>] (start_kernel+0x2a0/0x2f0)
> [24711.518512] ---[ end trace e1776fbe32468909 ]---

Francois is right that a GFP_ATOMIC allocation from pskb_expand_head()
is failing, which can easily happen, and cause your "failed to reallocate
TX buffer" errors; but it's well worth looking up what's actually on
lines 2108 and 2109 of mm/page_alloc.c in 3.2.27:

if (order >= MAX_ORDER) {
WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));

That was probably not a sane allocation request, it has gone out of range:
maybe the skb header is even corrupted. If you're lucky, it might be
something that netdev will recognize as already fixed.

Hugh

2012-10-04 16:10:17

by Maxime Bizon

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Fri, 2012-08-31 at 19:21 -0700, Hugh Dickins wrote:

Hi,

> Francois is right that a GFP_ATOMIC allocation from pskb_expand_head()
> is failing, which can easily happen, and cause your "failed to reallocate
> TX buffer" errors; but it's well worth looking up what's actually on
> lines 2108 and 2109 of mm/page_alloc.c in 3.2.27:
>
> if (order >= MAX_ORDER) {
> WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
>
> That was probably not a sane allocation request, it has gone out of range:
> maybe the skb header is even corrupted. If you're lucky, it might be
> something that netdev will recognize as already fixed.

I have the same problem on the exact same hardware and found the cause:

Author: Eric Dumazet <[email protected]>
Date: Tue Apr 10 20:08:39 2012 +0000

net: allow pskb_expand_head() to get maximum tailroom

[ Upstream commit 87151b8689d890dfb495081f7be9b9e257f7a2df ]

It turns out this change has a bad side effect on drivers that uses
skb_recycle(), in that case mv643xx_eth.c

Since skb_recycle() resets skb->data using (skb->head + NET_SKB_PAD), a
recycled skb going multiple times through a path that needs to expand
skb head will get bigger and bigger each time, and you eventually end up
with an allocation failure.

An idea to fix this would be to pass needed skb size to skb_resize() and
set skb->data to MIN(NET_SKB_PAD, (skb->end - skb->head - skb_size) / 2)

skb recycling gives a small speed boost, but does not get a lot of test
coverage since only 3 drivers uses it

--
Maxime

2012-10-04 16:29:38

by Eric Dumazet

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Thu, 2012-10-04 at 18:02 +0200, Maxime Bizon wrote:
> On Fri, 2012-08-31 at 19:21 -0700, Hugh Dickins wrote:
>
> Hi,
>
> > Francois is right that a GFP_ATOMIC allocation from pskb_expand_head()
> > is failing, which can easily happen, and cause your "failed to reallocate
> > TX buffer" errors; but it's well worth looking up what's actually on
> > lines 2108 and 2109 of mm/page_alloc.c in 3.2.27:
> >
> > if (order >= MAX_ORDER) {
> > WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> >
> > That was probably not a sane allocation request, it has gone out of range:
> > maybe the skb header is even corrupted. If you're lucky, it might be
> > something that netdev will recognize as already fixed.
>
> I have the same problem on the exact same hardware and found the cause:
>
> Author: Eric Dumazet <[email protected]>
> Date: Tue Apr 10 20:08:39 2012 +0000
>
> net: allow pskb_expand_head() to get maximum tailroom
>
> [ Upstream commit 87151b8689d890dfb495081f7be9b9e257f7a2df ]
>
>
> It turns out this change has a bad side effect on drivers that uses
> skb_recycle(), in that case mv643xx_eth.c
>
> Since skb_recycle() resets skb->data using (skb->head + NET_SKB_PAD), a
> recycled skb going multiple times through a path that needs to expand
> skb head will get bigger and bigger each time, and you eventually end up
> with an allocation failure.
>
> An idea to fix this would be to pass needed skb size to skb_resize() and
> set skb->data to MIN(NET_SKB_PAD, (skb->end - skb->head - skb_size) / 2)
>
> skb recycling gives a small speed boost, but does not get a lot of test
> coverage since only 3 drivers uses it
>

Thanks Maxime

Sure we can probably fix this issue, but its really not worth the pain.

I would get rid of it, its superseded by build_skb() to get cache hot
skbs anyway, and more over, rx path now uses skb->head allocated from a
page fragment for optimal GRO/TCP coalescing behavior.

skb_recycle() assumes skb allocation is slow, but its not per se.

Cache line misses are expensive, thats the real issue.

2012-10-04 16:50:15

by Eric Dumazet

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Thu, 2012-10-04 at 18:02 +0200, Maxime Bizon wrote:
> O
>
> Since skb_recycle() resets skb->data using (skb->head + NET_SKB_PAD), a
> recycled skb going multiple times through a path that needs to expand
> skb head will get bigger and bigger each time, and you eventually end up
> with an allocation failure.
>

Because there is not enough headroom ?

> An idea to fix this would be to pass needed skb size to skb_resize() and
> set skb->data to MIN(NET_SKB_PAD, (skb->end - skb->head - skb_size) / 2)

I am trying to decode this but I cant ;)

What is skb_resize() ?
and what do you mean setting skb->data to MIN(NET_SKB_PAD, (skb->end -
skb->head - skb_size) / 2)

Care to explain again your idea ?

Thanks !

2012-10-04 17:09:36

by Maxime Bizon

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Thu, 2012-10-04 at 18:50 +0200, Eric Dumazet wrote:

> > Since skb_recycle() resets skb->data using (skb->head + NET_SKB_PAD), a
> > recycled skb going multiple times through a path that needs to expand
> > skb head will get bigger and bigger each time, and you eventually end up
> > with an allocation failure.
> >
>
> Because there is not enough headroom ?

yes, on ipv6 forward path the default NET_SKB_PAD is too small, so each
packet forwarded has its headroom expanded, it is then recycled and gets
its original default headroom back, then it gets forwarded,
expanded, ...

> > An idea to fix this would be to pass needed skb size to skb_resize() and
> > set skb->data to MIN(NET_SKB_PAD, (skb->end - skb->head - skb_size) / 2)
>
> I am trying to decode this but I cant ;)
>
> What is skb_resize() ?
> and what do you mean setting skb->data to MIN(NET_SKB_PAD, (skb->end -
> skb->head - skb_size) / 2)
>
> Care to explain again your idea ?

damn typo, I meant skb_recycle(), and my formula is probably wrong.

skb_size is passed to skb_recycle_check() to ensure the skb is at least
that big.

The idea is it to pass the same value to skb_recycle(), allowing it to
set skb->data somewhat at the middle of skb head space (after honoring
NET_SKB_PAD), that way the recycled skb won't have its head expanded
again if it takes the same path.

--
Maxime

2012-10-04 17:17:36

by Eric Dumazet

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Thu, 2012-10-04 at 19:09 +0200, Maxime Bizon wrote:

> yes, on ipv6 forward path the default NET_SKB_PAD is too small, so each
> packet forwarded has its headroom expanded, it is then recycled and gets
> its original default headroom back, then it gets forwarded,
> expanded, ...

Hmm, this sounds bad (especially without recycle)

Might I assume NET_SKB_PAD is 32 on this arch ?

We might adjust NET_SKB_PAD to avoid all these memory copies...

2012-10-04 17:34:42

by Maxime Bizon

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Thu, 2012-10-04 at 19:17 +0200, Eric Dumazet wrote:

> > yes, on ipv6 forward path the default NET_SKB_PAD is too small, so each
> > packet forwarded has its headroom expanded, it is then recycled and gets
> > its original default headroom back, then it gets forwarded,
> > expanded, ...
>
> Hmm, this sounds bad (especially without recycle)
>
> Might I assume NET_SKB_PAD is 32 on this arch ?

It is, I have a setup with 6to4 tunneling, so needed headroom on tx is
quite big.

I used to be careful about raising this value to avoid drivers using
slab-4096 instead of slab-2048, but since our boards no longer have 16MB
of RAM and with the recent changes in mainline it doesn't seem to be an
issue anymore.

It's not a that big issue in the non recycle case, just lower
performance if the tunable is not set correctly. Though it would be nice
to have a stat/counter so you know when you hit this kind of slow path.

But on the recycle case, skb->head is reallocated to twice the size each
time the packet is recycled and takes the same path again. This stresses
the VM and you eventually get packet loss (and scary printk)

--
Maxime

2012-10-04 21:27:55

by Eric Dumazet

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Thu, 2012-10-04 at 19:34 +0200, Maxime Bizon wrote:
> On Thu, 2012-10-04 at 19:17 +0200, Eric Dumazet wrote:
>
> > > yes, on ipv6 forward path the default NET_SKB_PAD is too small, so each
> > > packet forwarded has its headroom expanded, it is then recycled and gets
> > > its original default headroom back, then it gets forwarded,
> > > expanded, ...
> >
> > Hmm, this sounds bad (especially without recycle)
> >
> > Might I assume NET_SKB_PAD is 32 on this arch ?
>
> It is, I have a setup with 6to4 tunneling, so needed headroom on tx is
> quite big.
>

If we change NET_SKB_PAD minimum to be 64 (as it is on x86), it should
be enough for the added tunnel encapsulation or not ?

> I used to be careful about raising this value to avoid drivers using
> slab-4096 instead of slab-2048, but since our boards no longer have 16MB
> of RAM and with the recent changes in mainline it doesn't seem to be an
> issue anymore.

Yes, granted we can allocate order-3 pages for delivering skb->head
fragments, adding 32 bytes doesnt switch to slab-4096 since we dont use
it anymore.

>
> It's not a that big issue in the non recycle case, just lower
> performance if the tunable is not set correctly. Though it would be nice
> to have a stat/counter so you know when you hit this kind of slow path.
>

Yeah, we already mentioned this idea in the past. We are lazy now we
have good performance tools (perf)

> But on the recycle case, skb->head is reallocated to twice the size each
> time the packet is recycled and takes the same path again. This stresses
> the VM and you eventually get packet loss (and scary printk)
>

OK, so to fix this on stable trees, skb_recycle() should not recycle skb
if skb->head is too big.

By the way, another problem with skb_recycle() is that skb->truesize can
be wrong as well. (One skb might had one frag, with a truesize of
2048/4096 bytes, and this frag was pulled in skb->head, so skb->truesize
is slightly wrong.

So we also must check if skb->truesize is equal to
SKB_TRUESIZE(skb_end_offset(skb)), or reset it in skb_recycle(),
I have no strong opinion.

Something like this (untested) patch :

But I really think we should remove skb_recycle() when net-next is
opened again.

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b33a3a1..13ca215 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2659,7 +2659,10 @@ static inline bool skb_is_recycleable(const struct sk_buff *skb, int skb_size)
skb_size = SKB_DATA_ALIGN(skb_size + NET_SKB_PAD);
if (skb_end_offset(skb) < skb_size)
return false;
-
+ if (skb_end_offset(skb) > 2 * skb_size)
+ return false;
+ if (skb->truesize != SKB_TRUESIZE(skb_end_offset(skb)))
+ return false;
if (skb_shared(skb) || skb_cloned(skb))
return false;

2012-10-05 07:41:17

by Eric Dumazet

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Thu, 2012-10-04 at 18:29 +0200, Eric Dumazet wrote:
> On Thu, 2012-10-04 at 18:02 +0200, Maxime Bizon wrote:
> > On Fri, 2012-08-31 at 19:21 -0700, Hugh Dickins wrote:
> >
> > Hi,
> >
> > > Francois is right that a GFP_ATOMIC allocation from pskb_expand_head()
> > > is failing, which can easily happen, and cause your "failed to reallocate
> > > TX buffer" errors; but it's well worth looking up what's actually on
> > > lines 2108 and 2109 of mm/page_alloc.c in 3.2.27:
> > >
> > > if (order >= MAX_ORDER) {
> > > WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
> > >
> > > That was probably not a sane allocation request, it has gone out of range:
> > > maybe the skb header is even corrupted. If you're lucky, it might be
> > > something that netdev will recognize as already fixed.
> >
> > I have the same problem on the exact same hardware and found the cause:
> >
> > Author: Eric Dumazet <[email protected]>
> > Date: Tue Apr 10 20:08:39 2012 +0000
> >
> > net: allow pskb_expand_head() to get maximum tailroom
> >
> > [ Upstream commit 87151b8689d890dfb495081f7be9b9e257f7a2df ]
> >
> >
> > It turns out this change has a bad side effect on drivers that uses
> > skb_recycle(), in that case mv643xx_eth.c
> >
> > Since skb_recycle() resets skb->data using (skb->head + NET_SKB_PAD), a
> > recycled skb going multiple times through a path that needs to expand
> > skb head will get bigger and bigger each time, and you eventually end up
> > with an allocation failure.
> >
> > An idea to fix this would be to pass needed skb size to skb_resize() and
> > set skb->data to MIN(NET_SKB_PAD, (skb->end - skb->head - skb_size) / 2)
> >
> > skb recycling gives a small speed boost, but does not get a lot of test
> > coverage since only 3 drivers uses it
> >
>
> Thanks Maxime

By the way, the commit you pointed has no effect on the reallocation
performed by pskb_expand_head() :

int size = nhead + skb_end_offset(skb) + ntail;

So pskb_expand_head() always assumed the current head is fully used, and
because we have some kmalloc-power-of-two contraints, each time
pskb_expand_head() is called with a non zero (nhead + ntail) we double
the skb->head ksize.

So why are we using skb_end_offset(skb) here is the question.

I guess it could be (skb_tail_pointer(skb) - skb->head) on some uses.

(ie not counting the unused bytes from tail_pointer to end_pointer)

Its probably long to audit all pskb_expand_head() users (about 77 in
tree), but most of them use (nhead = 0, ntail = 0)

It looks like the following patch should be fine.

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index cdc2859..dd42c6a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1053,11 +1053,22 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
{
int i;
u8 *data;
- int size = nhead + skb_end_offset(skb) + ntail;
+ unsigned int tail_offset = skb_tail_pointer(skb) - skb->head;
+ int size = nhead + ntail;
long off;

BUG_ON(nhead < 0);

+ /* callers using nhead == 0 and ntail == 0 want to get a fresh copy,
+ * so allocate same amount of memory (skb_end_offset)
+ * For others, they want extra headroom or tailroom against the
+ * currently used portion of header (skb->head -> skb_tail_pointer)
+ */
+ if (!size)
+ size = skb_end_offset(skb);
+ else
+ size += tail_offset;
+
if (skb_shared(skb))
BUG();

@@ -1074,7 +1085,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
/* Copy only real data... and, alas, header. This should be
* optimized for the cases when header is void.
*/
- memcpy(data + nhead, skb->head, skb_tail_pointer(skb) - skb->head);
+ memcpy(data + nhead, skb->head, tail_offset);

memcpy((struct skb_shared_info *)(data + size),
skb_shinfo(skb),

2012-10-05 10:49:58

by Maxime Bizon

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Fri, 2012-10-05 at 09:41 +0200, Eric Dumazet wrote:

> By the way, the commit you pointed has no effect on the reallocation
> performed by pskb_expand_head() :

The commit has a side effect, because the problem appeared after it was
merged (and goes away if I revert it)

> int size = nhead + skb_end_offset(skb) + ntail;
>
> So pskb_expand_head() always assumed the current head is fully used, and
> because we have some kmalloc-power-of-two contraints, each time
> pskb_expand_head() is called with a non zero (nhead + ntail) we double
> the skb->head ksize.

That is true, but only after the commit I mentioned.

Before that commit, we indeed reallocate skb->head to twice the size,
but skb->end is *not* positioned at the end of newly allocated data. So
on the next pskb_expand_head(), if head and tail are not big values, the
kmalloc() will be of the same size.

The commit adds this after allocation:

size = SKB_WITH_OVERHEAD(ksize(data))
[...]
skb->end = skb->head + size;

so on the next pskb_expand_head, we are going to allocate twice the size
for sure.

> So why are we using skb_end_offset(skb) here is the question.
>
> I guess it could be (skb_tail_pointer(skb) - skb->head) on some uses.

I think your patch is wrong, ntail is not the new tailroom size, it's
what missing to the current tailroom size, by adding ntail + nhead +
tail_offset we are removing previous tailroom.

We cannot shrink the skb that way here I guess, a caller may check
needed headroom & tailroom, calls with nhead=1/ntail=0 because only
headroom is missing, but after the call tailroom would be less than
before the call.

Why don't we juste reallocate to this size:

MAX(current_alloc_size, nhead + ntail + current_end - current_head)

--
Maxime

2012-10-05 12:22:19

by Eric Dumazet

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Fri, 2012-10-05 at 12:49 +0200, Maxime Bizon wrote:
> On Fri, 2012-10-05 at 09:41 +0200, Eric Dumazet wrote:
>
> > By the way, the commit you pointed has no effect on the reallocation
> > performed by pskb_expand_head() :
>
> The commit has a side effect, because the problem appeared after it was
> merged (and goes away if I revert it)
>
> > int size = nhead + skb_end_offset(skb) + ntail;
> >
> > So pskb_expand_head() always assumed the current head is fully used, and
> > because we have some kmalloc-power-of-two contraints, each time
> > pskb_expand_head() is called with a non zero (nhead + ntail) we double
> > the skb->head ksize.
>
> That is true, but only after the commit I mentioned.
>
> Before that commit, we indeed reallocate skb->head to twice the size,
> but skb->end is *not* positioned at the end of newly allocated data. So
> on the next pskb_expand_head(), if head and tail are not big values, the
> kmalloc() will be of the same size.
>
>
> The commit adds this after allocation:
>
> size = SKB_WITH_OVERHEAD(ksize(data))
> [...]
> skb->end = skb->head + size;
>
> so on the next pskb_expand_head, we are going to allocate twice the size
> for sure.

Yes, but the idea of the patch was to _avoid_ next pskb_expand_head()
calls...

Its defeated because you have a too small NET_SKB_PAD, and skb_recycle()
inability to properly detect ans skb is oversized.

>
> > So why are we using skb_end_offset(skb) here is the question.
> >
> > I guess it could be (skb_tail_pointer(skb) - skb->head) on some uses.
>
> I think your patch is wrong, ntail is not the new tailroom size, it's
> what missing to the current tailroom size, by adding ntail + nhead +
> tail_offset we are removing previous tailroom.
>

> We cannot shrink the skb that way here I guess, a caller may check
> needed headroom & tailroom, calls with nhead=1/ntail=0 because only
> headroom is missing, but after the call tailroom would be less than
> before the call.
>
> Why don't we juste reallocate to this size:
>
> MAX(current_alloc_size, nhead + ntail + current_end - current_head)

Hmm,

this changes nothing assuming current_end == skb_end_offset(skb)
and current_head = skb->head

Not sure what you mean.

I guess the right answer is to change API, because we have no clue if
the callers want some tailroom or not.

If they have 'enough' they currently pass 0, so we dont really now how
many bytes they really need..

In fact very few call sites need to increase the tailroom, so it's
probably very easy to do this change.

New convention would be : pass number of needed bytes after current
tail, not after current end.

2012-10-05 12:37:58

by Eric Dumazet

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Fri, 2012-10-05 at 14:22 +0200, Eric Dumazet wrote:
> On Fri, 2012-10-05 at 12:49 +0200, Maxime Bizon wrote:
> > On Fri, 2012-10-05 at 09:41 +0200, Eric Dumazet wrote:
> >
> > > By the way, the commit you pointed has no effect on the reallocation
> > > performed by pskb_expand_head() :
> >
> > The commit has a side effect, because the problem appeared after it was
> > merged (and goes away if I revert it)
> >
> > > int size = nhead + skb_end_offset(skb) + ntail;
> > >
> > > So pskb_expand_head() always assumed the current head is fully used, and
> > > because we have some kmalloc-power-of-two contraints, each time
> > > pskb_expand_head() is called with a non zero (nhead + ntail) we double
> > > the skb->head ksize.
> >
> > That is true, but only after the commit I mentioned.
> >
> > Before that commit, we indeed reallocate skb->head to twice the size,
> > but skb->end is *not* positioned at the end of newly allocated data. So
> > on the next pskb_expand_head(), if head and tail are not big values, the
> > kmalloc() will be of the same size.
> >
> >
> > The commit adds this after allocation:
> >
> > size = SKB_WITH_OVERHEAD(ksize(data))
> > [...]
> > skb->end = skb->head + size;
> >
> > so on the next pskb_expand_head, we are going to allocate twice the size
> > for sure.
>
> Yes, but the idea of the patch was to _avoid_ next pskb_expand_head()
> calls...
>
> Its defeated because you have a too small NET_SKB_PAD, and skb_recycle()
> inability to properly detect ans skb is oversized.
>
> >
> > > So why are we using skb_end_offset(skb) here is the question.
> > >
> > > I guess it could be (skb_tail_pointer(skb) - skb->head) on some uses.
> >
> > I think your patch is wrong, ntail is not the new tailroom size, it's
> > what missing to the current tailroom size, by adding ntail + nhead +
> > tail_offset we are removing previous tailroom.
> >
>
>
>
> > We cannot shrink the skb that way here I guess, a caller may check
> > needed headroom & tailroom, calls with nhead=1/ntail=0 because only
> > headroom is missing, but after the call tailroom would be less than
> > before the call.
> >
> > Why don't we juste reallocate to this size:
> >
> > MAX(current_alloc_size, nhead + ntail + current_end - current_head)
>
> Hmm,
>
> this changes nothing assuming current_end == skb_end_offset(skb)
> and current_head = skb->head
>
> Not sure what you mean.

Following patch maybe ...

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index cdc2859..f6c1f52 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1053,11 +1053,22 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
{
int i;
u8 *data;
- int size = nhead + skb_end_offset(skb) + ntail;
+ unsigned int tail_offset = skb_tail_pointer(skb) - skb->head;
+ int size = nhead + ntail;
long off;

BUG_ON(nhead < 0);

+ /* callers using nhead == 0 and ntail == 0 wants to get a fresh copy,
+ * so allocate same amount of memory (skb_end_offset)
+ * For others, they want extra head or tail against the currently
+ * used portion of header (skb->head -> skb_tail_pointer).
+ * But we dont shrink the head.
+ */
+ if (size)
+ size += tail_offset;
+ size = max_t(int, size, skb_end_offset(skb));
+
if (skb_shared(skb))
BUG();

@@ -1074,7 +1085,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
/* Copy only real data... and, alas, header. This should be
* optimized for the cases when header is void.
*/
- memcpy(data + nhead, skb->head, skb_tail_pointer(skb) - skb->head);
+ memcpy(data + nhead, skb->head, tail_offset);

memcpy((struct skb_shared_info *)(data + size),
skb_shinfo(skb),

2012-10-05 12:39:57

by Eric Dumazet

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Fri, 2012-10-05 at 14:37 +0200, Eric Dumazet wrote:

> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index cdc2859..f6c1f52 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1053,11 +1053,22 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
> {
> int i;
> u8 *data;
> - int size = nhead + skb_end_offset(skb) + ntail;
> + unsigned int tail_offset = skb_tail_pointer(skb) - skb->head;
> + int size = nhead + ntail;
> long off;
>
> BUG_ON(nhead < 0);
>
> + /* callers using nhead == 0 and ntail == 0 wants to get a fresh copy,
> + * so allocate same amount of memory (skb_end_offset)
> + * For others, they want extra head or tail against the currently
> + * used portion of header (skb->head -> skb_tail_pointer).
> + * But we dont shrink the head.
> + */
> + if (size)
> + size += tail_offset;
> + size = max_t(int, size, skb_end_offset(skb));
> +
>

This can be factorized to :

size = tail_offset + nhead + ntail;
size = max_t(int, size, skb_end_offset(skb));

2012-10-05 12:51:52

by Maxime Bizon

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Fri, 2012-10-05 at 14:22 +0200, Eric Dumazet wrote:

> Yes, but the idea of the patch was to _avoid_ next pskb_expand_head()
> calls...

yes but we cannot be sure of that, the caller may not have a good idea
of the headroom needed for the whole lifetime of the skb

it's better to think we will reduce number of calls, not avoid them

that's why I think doubling the size each time is dangerous, since we
silently request bigger and bigger allocations if an skb takes an
unoptimized path

> Hmm,
>
> this changes nothing assuming current_end == skb_end_offset(skb)
> and current_head = skb->head

My idea was to leave skb->end at its last position even if we grow
skb->head.

Since we have a way to know the current allocation size of skb->head,
further pskb_expand_head() calls to request tailroom would just push
skb->tail & skb->end together if that fits in current ksize().

I've not looked at recent changes in mainline, since you changed how
skb->head is managed, that may be totally impossible.

Your proposed changed API change to expand_head will fill this anyway.

> New convention would be : pass number of needed bytes after current
> tail, not after current end.

Fully agree on this

--
Maxime

2012-10-05 13:02:59

by Eric Dumazet

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Fri, 2012-10-05 at 14:51 +0200, Maxime Bizon wrote:

> > New convention would be : pass number of needed bytes after current
> > tail, not after current end.
>
> Fully agree on this
>

Here is the proposal :

Change all occurrences of :

pskb_expand_head(skb, 0, 0, gfp)

to

pskb_realloc_head(skb, gfp)
[ an inline calling a common function ]

And few other pskb_expand_head() calls a new function

pskb_may_expand_head(skb, nhead, ntail, gfp)
[ an inline calling same common function ]

nhead : number of needed bytes before [data,tail] portion
(might be already < skb_headroom(skb))

ntail : number of needed bytes after [data,tail] portion
(might be already > skb_tailroom(skb))

For example, net/xfrm/xfrm_output.c would be like the following :

(Note how all the various ugly tests will be centralized in the common
function)

And the common function would now know the caller intent.

diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 95a338c..15760df 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -24,18 +24,10 @@ static int xfrm_output2(struct sk_buff *skb);
static int xfrm_skb_check_space(struct sk_buff *skb)
{
struct dst_entry *dst = skb_dst(skb);
- int nhead = dst->header_len + LL_RESERVED_SPACE(dst->dev)
- - skb_headroom(skb);
- int ntail = dst->dev->needed_tailroom - skb_tailroom(skb);
-
- if (nhead <= 0) {
- if (ntail <= 0)
- return 0;
- nhead = 0;
- } else if (ntail < 0)
- ntail = 0;
-
- return pskb_expand_head(skb, nhead, ntail, GFP_ATOMIC);
+ int nhead = dst->header_len + LL_RESERVED_SPACE(dst->dev);
+ int ntail = dst->dev->needed_tailroom;
+
+ return pskb_may_expand_head(skb, nhead, ntail, GFP_ATOMIC);
}

static int xfrm_output_one(struct sk_buff *skb, int err)

2012-10-05 14:51:03

by Maxime Bizon

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Fri, 2012-10-05 at 15:02 +0200, Eric Dumazet wrote:

> On Fri, 2012-10-05 at 14:51 +0200, Maxime Bizon wrote:
>
> > > New convention would be : pass number of needed bytes after current
> > > tail, not after current end.
> >
> > Fully agree on this
> >
>
> Here is the proposal :

Looks fine

What is your plan for the actual pskb_expand_head() code now that you
will have absolute values for headroom & tailroom ?

Because there will still be callers that have no clue of required
tailroom (nor further headroom requirement), like skb_cow() in
vlan_reorder_header().

--
Maxime

2012-10-05 15:04:10

by Eric Dumazet

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Fri, 2012-10-05 at 16:50 +0200, Maxime Bizon wrote:
> On Fri, 2012-10-05 at 15:02 +0200, Eric Dumazet wrote:
>
> > On Fri, 2012-10-05 at 14:51 +0200, Maxime Bizon wrote:
> >
> > > > New convention would be : pass number of needed bytes after current
> > > > tail, not after current end.
> > >
> > > Fully agree on this
> > >
> >
> > Here is the proposal :
>
> Looks fine
>
> What is your plan for the actual pskb_expand_head() code now that you
> will have absolute values for headroom & tailroom ?
>

They stay relative values.

For example, netlink_trim() really wants to shrink the skb head.

> Because there will still be callers that have no clue of required
> tailroom (nor further headroom requirement), like skb_cow() in
> vlan_reorder_header().
>

What I plan is to not shrink size, unless specifically asked.

Its 3.8 material anyway, so a stable fix is needed on skb_recycle() and
NET_SKB_PAD minimal value.

2012-10-05 15:15:54

by Maxime Bizon

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Fri, 2012-10-05 at 17:04 +0200, Eric Dumazet wrote:
>
>
> What I plan is to not shrink size, unless specifically asked.
>
> Its 3.8 material anyway, so a stable fix is needed on skb_recycle()
> and NET_SKB_PAD minimal value.

You think removing skb_recycle() is too big a change for stable ?

Driver change is simple, as recycling is not guaranteed today you have
this:

if (!try_recycle(skb))
skb = alloc_skb()
>
we just remove the try_recycle part, we are not adding any new code
path.

I'm not the right person to give you a correct NET_SKB_PAD value, I have
a lot of out of tree patches in my kernels, some custom drivers as well
that needs quite a lot of headroom, and uncommon network setup.

--
Maxime

2012-10-05 15:37:16

by Eric Dumazet

[permalink] [raw]

Subject: Re: kernel 3.2.27 on arm: WARNING: at mm/page_alloc.c:2109 __alloc_pages_nodemask+0x1d4/0x68c()

On Fri, 2012-10-05 at 17:15 +0200, Maxime Bizon wrote:

> You think removing skb_recycle() is too big a change for stable ?
>
> Driver change is simple, as recycling is not guaranteed today you have
> this:
>
> if (!try_recycle(skb))
> skb = alloc_skb()
> >
> we just remove the try_recycle part, we are not adding any new code
> path.
>
>
> I'm not the right person to give you a correct NET_SKB_PAD value, I have
> a lot of out of tree patches in my kernels, some custom drivers as well
> that needs quite a lot of headroom, and uncommon network setup.
>

OK, I'll send the patch to remove skb_recycle()