2006-09-22 07:27:22

by Holger Kiehl

Subject: 2.6.1[78] page allocation failure. order:3, mode:0x20

Hello,

I am getting "page allocation failure" errors. My hardware is a 4-CPU
Opteron system with one quad-port and one dual-port Intel e1000 card. The
kernel is plain 2.6.18, and for two cards the MTU is set to 9000.

Sep 21 21:03:15 athena kernel: vsftpd: page allocation failure. order:3, mode:0x20
Sep 21 21:03:15 athena kernel:
Sep 21 21:03:15 athena kernel: Call Trace:
Sep 21 21:03:15 athena kernel: <IRQ> [<ffffffff8024e516>] __alloc_pages+0x282/0x29b
Sep 21 21:03:15 athena kernel: [<ffffffff8807aa93>] :ip_tables:ipt_do_table+0x1eb/0x318
Sep 21 21:03:15 athena kernel: [<ffffffff8026614b>] cache_grow+0x134/0x33d
Sep 21 21:03:15 athena kernel: [<ffffffff8026664c>] cache_alloc_refill+0x189/0x1d7
Sep 21 21:03:15 athena kernel: [<ffffffff80266724>] __kmalloc+0x8a/0x94
Sep 21 21:03:15 athena kernel: [<ffffffff803b5438>] __alloc_skb+0x5c/0x123
Sep 21 21:03:15 athena kernel: [<ffffffff803b5f2e>] __netdev_alloc_skb+0x12/0x2d
Sep 21 21:03:15 athena kernel: [<ffffffff8033cb22>] e1000_alloc_rx_buffers+0x6f/0x2f3
Sep 21 21:03:15 athena kernel: [<ffffffff803d1234>] ip_local_deliver+0x173/0x23b
Sep 21 21:03:15 athena kernel: [<ffffffff8033d29a>] e1000_clean_rx_irq+0x4f4/0x514
Sep 21 21:03:15 athena kernel: [<ffffffff8033c08b>] e1000_clean+0x2a8/0x394
Sep 21 21:03:15 athena kernel: [<ffffffff803bad58>] net_rx_action+0x77/0x15c
Sep 21 21:03:15 athena kernel: [<ffffffff8033ca9f>] e1000_intr+0xd5/0xe9
Sep 21 21:03:15 athena kernel: [<ffffffff8022fcb4>] __do_softirq+0x5e/0xd6
Sep 21 21:03:15 athena kernel: [<ffffffff80215fda>] end_level_ioapic_vector+0x9/0x16
Sep 21 21:03:15 athena kernel: [<ffffffff8020a9fc>] call_softirq+0x1c/0x28
Sep 21 21:03:15 athena kernel: [<ffffffff8020c4b9>] do_softirq+0x2c/0x7f
Sep 21 21:03:15 athena kernel: [<ffffffff8020c484>] do_IRQ+0x6a/0x73
Sep 21 21:03:15 athena kernel: [<ffffffff80209d21>] ret_from_intr+0x0/0xa
Sep 21 21:03:15 athena kernel: <EOI> [<ffffffff802d8ae2>] copy_user_generic_c+0x8/0x1a
Sep 21 21:03:15 athena kernel: [<ffffffff803b63f4>] memcpy_toiovec+0x36/0x66
Sep 21 21:03:15 athena kernel: [<ffffffff803b683c>] skb_copy_datagram_iovec+0x4f/0x1e8
Sep 21 21:03:15 athena kernel: [<ffffffff803dca2d>] tcp_recvmsg+0x62b/0xb05
Sep 21 21:03:15 athena kernel: [<ffffffff803b13d9>] sock_common_recvmsg+0x2d/0x42
Sep 21 21:03:15 athena kernel: [<ffffffff803af0a1>] do_sock_read+0x9b/0x9f
Sep 21 21:03:15 athena kernel: [<ffffffff803af7ba>] sock_aio_read+0x4f/0x5e
Sep 21 21:03:15 athena kernel: [<ffffffff8026b2ea>] do_sync_read+0xc7/0x104
Sep 21 21:03:15 athena kernel: [<ffffffff8023eb01>] hrtimer_start+0xbb/0xcd
Sep 21 21:03:15 athena kernel: [<ffffffff8023c720>] autoremove_wake_function+0x0/0x2e
Sep 21 21:03:15 athena kernel: [<ffffffff8040b1e5>] thread_return+0x0/0xd4
Sep 21 21:03:15 athena kernel: [<ffffffff8040abf0>] __sched_text_start+0x150/0x745
Sep 21 21:03:15 athena kernel: [<ffffffff8026baf3>] vfs_read+0xb9/0x104
Sep 21 21:03:15 athena kernel: [<ffffffff8026be77>] sys_read+0x45/0x6e
Sep 21 21:03:15 athena kernel: [<ffffffff80209826>] system_call+0x7e/0x83
Sep 21 21:03:15 athena kernel:
Sep 21 21:03:15 athena kernel: Mem-info:
Sep 21 21:03:15 athena kernel: Node 0 DMA per-cpu:
Sep 21 21:03:15 athena kernel: cpu 0 hot: high 0, batch 1 used:0
Sep 21 21:03:15 athena kernel: cpu 0 cold: high 0, batch 1 used:0
Sep 21 21:03:15 athena kernel: cpu 1 hot: high 0, batch 1 used:0
Sep 21 21:03:15 athena kernel: cpu 1 cold: high 0, batch 1 used:0
Sep 21 21:03:15 athena kernel: cpu 2 hot: high 0, batch 1 used:0
Sep 21 21:03:15 athena kernel: cpu 2 cold: high 0, batch 1 used:0
Sep 21 21:03:15 athena kernel: cpu 3 hot: high 0, batch 1 used:0
Sep 21 21:03:15 athena kernel: cpu 3 cold: high 0, batch 1 used:0
Sep 21 21:03:15 athena kernel: Node 0 DMA32 per-cpu:
Sep 21 21:03:15 athena kernel: cpu 0 hot: high 186, batch 31 used:182
Sep 21 21:03:15 athena kernel: cpu 0 cold: high 62, batch 15 used:59
Sep 21 21:03:15 athena kernel: cpu 1 hot: high 186, batch 31 used:176
Sep 21 21:03:15 athena kernel: cpu 1 cold: high 62, batch 15 used:0
Sep 21 21:03:15 athena kernel: cpu 2 hot: high 186, batch 31 used:178
Sep 21 21:03:15 athena kernel: cpu 2 cold: high 62, batch 15 used:0
Sep 21 21:03:15 athena kernel: cpu 3 hot: high 186, batch 31 used:180
Sep 21 21:03:15 athena kernel: cpu 3 cold: high 62, batch 15 used:0
Sep 21 21:03:15 athena kernel: Node 0 Normal per-cpu: empty
Sep 21 21:03:15 athena kernel: Node 0 HighMem per-cpu: empty
Sep 21 21:03:15 athena kernel: Node 1 DMA per-cpu: empty
Sep 21 21:03:15 athena kernel: Node 1 DMA32 per-cpu: empty
Sep 21 21:03:15 athena kernel: Node 1 Normal per-cpu:
Sep 21 21:03:15 athena kernel: cpu 0 hot: high 186, batch 31 used:163
Sep 21 21:03:15 athena kernel: cpu 0 cold: high 62, batch 15 used:0
Sep 21 21:03:15 athena kernel: cpu 1 hot: high 186, batch 31 used:7
Sep 21 21:03:15 athena kernel: cpu 1 cold: high 62, batch 15 used:48
Sep 21 21:03:15 athena kernel: cpu 2 hot: high 186, batch 31 used:156
Sep 21 21:03:15 athena kernel: cpu 2 cold: high 62, batch 15 used:0
Sep 21 21:03:15 athena kernel: cpu 3 hot: high 186, batch 31 used:170
Sep 21 21:03:15 athena kernel: cpu 3 cold: high 62, batch 15 used:0
Sep 21 21:03:15 athena kernel: Node 1 HighMem per-cpu: empty
Sep 21 21:03:15 athena kernel: Node 2 DMA per-cpu: empty
Sep 21 21:03:15 athena kernel: Node 2 DMA32 per-cpu: empty
Sep 21 21:03:15 athena kernel: Node 2 Normal per-cpu:
Sep 21 21:03:15 athena kernel: cpu 0 hot: high 186, batch 31 used:170
Sep 21 21:03:15 athena kernel: cpu 0 cold: high 62, batch 15 used:0
Sep 21 21:03:15 athena kernel: cpu 1 hot: high 186, batch 31 used:170
Sep 21 21:03:15 athena kernel: cpu 1 cold: high 62, batch 15 used:0
Sep 21 21:03:15 athena kernel: cpu 2 hot: high 186, batch 31 used:18
Sep 21 21:03:15 athena kernel: cpu 2 cold: high 62, batch 15 used:55
Sep 21 21:03:15 athena kernel: cpu 3 hot: high 186, batch 31 used:155
Sep 21 21:03:15 athena kernel: cpu 3 cold: high 62, batch 15 used:0
Sep 21 21:03:15 athena kernel: Node 2 HighMem per-cpu: empty
Sep 21 21:03:15 athena kernel: Node 3 DMA per-cpu: empty
Sep 21 21:03:15 athena kernel: Node 3 DMA32 per-cpu: empty
Sep 21 21:03:15 athena kernel: Node 3 Normal per-cpu:
Sep 21 21:03:15 athena kernel: cpu 0 hot: high 186, batch 31 used:169
Sep 21 21:03:15 athena kernel: cpu 0 cold: high 62, batch 15 used:0
Sep 21 21:03:15 athena kernel: cpu 1 hot: high 186, batch 31 used:172
Sep 21 21:03:15 athena kernel: cpu 1 cold: high 62, batch 15 used:0
Sep 21 21:03:15 athena kernel: cpu 2 hot: high 186, batch 31 used:154
Sep 21 21:03:15 athena kernel: cpu 2 cold: high 62, batch 15 used:0
Sep 21 21:03:15 athena kernel: cpu 3 hot: high 186, batch 31 used:155
Sep 21 21:03:15 athena kernel: cpu 3 cold: high 62, batch 15 used:60
Sep 21 21:03:15 athena kernel: Node 3 HighMem per-cpu: empty
Sep 21 21:03:15 athena kernel: Free pages: 654488kB (0kB HighMem)
Sep 21 21:03:15 athena kernel: Active:1351719 inactive:440448 dirty:105021 writeback:0 unstable:0 free:163622 slab:94084 mapped:11424 pagetables:3165
Sep 21 21:03:15 athena kernel: Node 0 DMA free:1808kB min:16kB low:20kB high:24kB active:0kB inactive:0kB present:12016kB pages_scanned:0 all_unreclaimable? yes
Sep 21 21:03:15 athena kernel: lowmem_reserve[]: 0 2003 2003 2003
Sep 21 21:03:15 athena kernel: Node 0 DMA32 free:95492kB min:2852kB low:3564kB high:4276kB active:1432704kB inactive:435520kB present:2051744kB pages_scanned:0 all_unreclaimable? no
Sep 21 21:03:15 athena kernel: lowmem_reserve[]: 0 0 0 0
Sep 21 21:03:15 athena kernel: Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 21 21:03:15 athena kernel: lowmem_reserve[]: 0 0 0 0
Sep 21 21:03:15 athena kernel: Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 21 21:03:15 athena kernel: lowmem_reserve[]: 0 0 0 0
Sep 21 21:03:15 athena kernel: Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 21 21:03:15 athena kernel: lowmem_reserve[]: 0 0 2020 2020
Sep 21 21:03:15 athena kernel: Node 1 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 21 21:03:15 athena kernel: lowmem_reserve[]: 0 0 2020 2020
Sep 21 21:03:15 athena kernel: Node 1 Normal free:75860kB min:2876kB low:3592kB high:4312kB active:1569988kB inactive:327416kB present:2068480kB pages_scanned:0 all_unreclaimable? no
Sep 21 21:03:15 athena kernel: lowmem_reserve[]: 0 0 0 0
Sep 21 21:03:15 athena kernel: Node 1 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 21 21:03:15 athena kernel: lowmem_reserve[]: 0 0 0 0
Sep 21 21:03:15 athena kernel: Node 2 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 21 21:03:15 athena kernel: lowmem_reserve[]: 0 0 2020 2020
Sep 21 21:03:15 athena kernel: Node 2 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 21 21:03:15 athena kernel: lowmem_reserve[]: 0 0 2020 2020
Sep 21 21:03:15 athena kernel: Node 2 Normal free:172028kB min:2876kB low:3592kB high:4312kB active:1256216kB inactive:524508kB present:2068480kB pages_scanned:13 all_unreclaimable? no
Sep 21 21:03:15 athena kernel: lowmem_reserve[]: 0 0 0 0
Sep 21 21:03:15 athena kernel: Node 2 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 21 21:03:15 athena kernel: lowmem_reserve[]: 0 0 0 0
Sep 21 21:03:15 athena kernel: Node 3 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 21 21:03:15 athena kernel: lowmem_reserve[]: 0 0 2020 2020
Sep 21 21:03:15 athena kernel: Node 3 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 21 21:03:15 athena kernel: lowmem_reserve[]: 0 0 2020 2020
Sep 21 21:03:15 athena kernel: Node 3 Normal free:320576kB min:2876kB low:3592kB high:4312kB active:1147740kB inactive:463360kB present:2068480kB pages_scanned:0 all_unreclaimable? no
Sep 21 21:03:15 athena kernel: lowmem_reserve[]: 0 0 0 0
Sep 21 21:03:15 athena kernel: Node 3 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 21 21:03:15 athena kernel: lowmem_reserve[]: 0 0 0 0
Sep 21 21:03:15 athena kernel: Node 0 DMA: 4*4kB 4*8kB 2*16kB 4*32kB 1*64kB 2*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1808kB
Sep 21 21:03:15 athena kernel: Node 0 DMA32: 19481*4kB 2482*8kB 17*16kB 1*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 98212kB
Sep 21 21:03:15 athena kernel: Node 0 Normal: empty
Sep 21 21:03:15 athena kernel: Node 0 HighMem: empty
Sep 21 21:03:15 athena kernel: Node 1 DMA: empty
Sep 21 21:03:15 athena kernel: Node 1 DMA32: empty
Sep 21 21:03:15 athena kernel: Node 1 Normal: 10672*4kB 4024*8kB 121*16kB 1*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 76976kB
Sep 21 21:03:15 athena kernel: Node 1 HighMem: empty
Sep 21 21:03:15 athena kernel: Node 2 DMA: empty
Sep 21 21:03:15 athena kernel: Node 2 DMA32: empty
Sep 21 21:03:15 athena kernel: Node 2 Normal: 24795*4kB 8355*8kB 381*16kB 1*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 172276kB
Sep 21 21:03:15 athena kernel: Node 2 HighMem: empty
Sep 21 21:03:15 athena kernel: Node 3 DMA: empty
Sep 21 21:03:15 athena kernel: Node 3 DMA32: empty
Sep 21 21:03:15 athena kernel: Node 3 Normal: 47395*4kB 14417*8kB 926*16kB 96*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 322932kB
Sep 21 21:03:15 athena kernel: Node 3 HighMem: empty
Sep 21 21:03:15 athena kernel: Swap cache: add 53, delete 53, find 0/0, race 0+0
Sep 21 21:03:15 athena kernel: Free swap = 33214052kB
Sep 21 21:03:15 athena kernel: Total swap = 33214264kB
Sep 21 21:03:15 athena kernel: Free swap: 33214052kB
Sep 21 21:03:15 athena kernel: 2097152 pages of RAM
Sep 21 21:03:15 athena kernel: 32970 reserved pages
Sep 21 21:03:15 athena kernel: 1640568 pages shared
Sep 21 21:03:15 athena kernel: 0 pages swap cached
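
The per-node buddy lists in the dump above can be cross-checked with a little arithmetic. The sketch below is only an illustration: it hard-codes the "Node 0 DMA32" line from this log, sums it, and counts how many blocks of a given order the free lists could supply if larger blocks were split. The names (`node0_dma32`, `buddy_total_kb`, `buddy_blocks_available`) are invented for this example, and the sketch ignores zone watermarks, which further limit what an atomic allocation may take.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 11          /* orders 0..10: 4kB .. 4096kB, as in the log */
#define PAGE_KB   4           /* 4 kB pages, matching the "*4kB" column */

/* Free block counts per order for "Node 0 DMA32", copied from the log. */
const unsigned long node0_dma32[MAX_ORDER] = {
    19481, 2482, 17, 1, 0, 1, 0, 0, 0, 0, 0
};

/* Total free memory in kB described by one buddy-list line. */
unsigned long buddy_total_kb(const unsigned long *counts, size_t n)
{
    unsigned long total = 0;
    for (size_t order = 0; order < n; order++)
        total += counts[order] * (unsigned long)(PAGE_KB << order);
    return total;
}

/* How many blocks of the requested order the list could supply,
 * counting each larger block as splittable into 2^(k - order) pieces. */
unsigned long buddy_blocks_available(const unsigned long *counts, size_t n,
                                     unsigned int order)
{
    unsigned long blocks = 0;
    assert(order < n);
    for (size_t k = order; k < n; k++)
        blocks += counts[k] << (k - order);
    return blocks;
}
```

For this line the sum is 98212 kB, matching the log, yet only five order-3 (32 kB) blocks could be carved out of it: one 32 kB block plus one 128 kB block split four ways. Nearly all of the free memory sits in 4 kB and 8 kB fragments.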

With 2.6.16.x there were no issues. I am not sure, but this seems to indicate
some error in the e1000 driver, which had a major update between 2.6.16
and 2.6.18. I have also tried 2.6.17; it has the same issue, so I would need
to go back to 2.6.16.

What can I do to remove those errors?

Holger
--


2006-09-22 07:43:06

by Andrew Morton

Subject: Re: 2.6.1[78] page allocation failure. order:3, mode:0x20

On Fri, 22 Sep 2006 07:27:18 +0000 (GMT)
Holger Kiehl <[email protected]> wrote:

> I get some of the "page allocation failure" errors. My hardware is 4 CPU
> Opteron with one quad + one dual intel e1000 cards. Kernel is plain 2.6.18
> and for two cards MTU is set to 9000.
>
> Sep 21 21:03:15 athena kernel: vsftpd: page allocation failure. order:3, mode:0x20
> Sep 21 21:03:15 athena kernel:
> Sep 21 21:03:15 athena kernel: Call Trace:
> Sep 21 21:03:15 athena kernel: <IRQ> [<ffffffff8024e516>] __alloc_pages+0x282/0x29b
> Sep 21 21:03:15 athena kernel: [<ffffffff8807aa93>] :ip_tables:ipt_do_table+0x1eb/0x318
> Sep 21 21:03:15 athena kernel: [<ffffffff8026614b>] cache_grow+0x134/0x33d
> Sep 21 21:03:15 athena kernel: [<ffffffff8026664c>] cache_alloc_refill+0x189/0x1d7
> Sep 21 21:03:15 athena kernel: [<ffffffff80266724>] __kmalloc+0x8a/0x94
> Sep 21 21:03:15 athena kernel: [<ffffffff803b5438>] __alloc_skb+0x5c/0x123
> Sep 21 21:03:15 athena kernel: [<ffffffff803b5f2e>] __netdev_alloc_skb+0x12/0x2d
> Sep 21 21:03:15 athena kernel: [<ffffffff8033cb22>] e1000_alloc_rx_buffers+0x6f/0x2f3
> Sep 21 21:03:15 athena kernel: [<ffffffff803d1234>] ip_local_deliver+0x173/0x23b
> Sep 21 21:03:15 athena kernel: [<ffffffff8033d29a>] e1000_clean_rx_irq+0x4f4/0x514

Is OK, it's just a warning and it is expected - the kernel will recover.

I'm half-inclined to shut the warning up by sticking a __GFP_NOWARN in there.

But on the other hand, that warning is handy sometimes. How come kmalloc
decided to request a 32k hunk of memory when the MTU size is only 9k? Is
the driver doing something dumb?

	else if (max_frame <= E1000_RXBUFFER_8192)
		adapter->rx_buffer_len = E1000_RXBUFFER_8192;
	else if (max_frame <= E1000_RXBUFFER_16384)
		adapter->rx_buffer_len = E1000_RXBUFFER_16384;

It sure is.

This is going to cause a 9000-byte MTU to use a 16384-byte allocation.
e1000_alloc_rx_buffers() adds two bytes to that, so we do kmalloc(16386),
which causes the slab allocator to request 32768 bytes. All for a 9kbyte skb.
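
The arithmetic above can be sketched in a few lines of C. This is an illustration only: it assumes 4 kB pages and power-of-two kmalloc size classes at these sizes, the helper names (`round_up_pow2`, `page_order`) are invented here, and it ignores any extra per-skb overhead that the allocation path adds on top.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u   /* assumption: 4 kB pages */
#define MAX_SKETCH_SIZE ((size_t)1 << 30)   /* keeps the loops bounded */

/* Smallest power of two >= size (models the kmalloc size class). */
size_t round_up_pow2(size_t size)
{
    size_t p = 1;
    assert(size > 0 && size <= MAX_SKETCH_SIZE);
    while (p < size)
        p <<= 1;
    return p;
}

/* Page order needed to back an allocation of the given size. */
unsigned int page_order(size_t size)
{
    unsigned int order = 0;
    size_t span = PAGE_SIZE_BYTES;
    assert(size > 0 && size <= MAX_SKETCH_SIZE);
    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

With these helpers, `round_up_pow2(16384 + 2)` gives 32768 and `page_order(32768)` gives 3, which is the `order:3` in the failure message. A request of exactly 16384 bytes would stay at order 2.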

2006-09-22 12:03:15

by Holger Kiehl

Subject: Re: 2.6.1[78] page allocation failure. order:3, mode:0x20

On Fri, 22 Sep 2006, Andrew Morton wrote:

> On Fri, 22 Sep 2006 07:27:18 +0000 (GMT)
> Holger Kiehl <[email protected]> wrote:
>
>> I get some of the "page allocation failure" errors. My hardware is 4 CPU
>> Opteron with one quad + one dual intel e1000 cards. Kernel is plain 2.6.18
>> and for two cards MTU is set to 9000.
>>
>> Sep 21 21:03:15 athena kernel: vsftpd: page allocation failure. order:3, mode:0x20
>> Sep 21 21:03:15 athena kernel:
>> Sep 21 21:03:15 athena kernel: Call Trace:
>> Sep 21 21:03:15 athena kernel: <IRQ> [<ffffffff8024e516>] __alloc_pages+0x282/0x29b
>> Sep 21 21:03:15 athena kernel: [<ffffffff8807aa93>] :ip_tables:ipt_do_table+0x1eb/0x318
>> Sep 21 21:03:15 athena kernel: [<ffffffff8026614b>] cache_grow+0x134/0x33d
>> Sep 21 21:03:15 athena kernel: [<ffffffff8026664c>] cache_alloc_refill+0x189/0x1d7
>> Sep 21 21:03:15 athena kernel: [<ffffffff80266724>] __kmalloc+0x8a/0x94
>> Sep 21 21:03:15 athena kernel: [<ffffffff803b5438>] __alloc_skb+0x5c/0x123
>> Sep 21 21:03:15 athena kernel: [<ffffffff803b5f2e>] __netdev_alloc_skb+0x12/0x2d
>> Sep 21 21:03:15 athena kernel: [<ffffffff8033cb22>] e1000_alloc_rx_buffers+0x6f/0x2f3
>> Sep 21 21:03:15 athena kernel: [<ffffffff803d1234>] ip_local_deliver+0x173/0x23b
>> Sep 21 21:03:15 athena kernel: [<ffffffff8033d29a>] e1000_clean_rx_irq+0x4f4/0x514
>
> Is OK, it's just a warning and it is expected - the kernel will recover.
>
> I'm half-inclined to shut the warning up by sticking a __GFP_NOWARN in there.
>
> But on the other hand, that warning is handy sometimes. How come kmalloc
> decided to request a 32k hunk of memory when the MTU size is only 9k? Is
> the driver doing something dumb?
>
> else if (max_frame <= E1000_RXBUFFER_8192)
> adapter->rx_buffer_len = E1000_RXBUFFER_8192;
> else if (max_frame <= E1000_RXBUFFER_16384)
> adapter->rx_buffer_len = E1000_RXBUFFER_16384;
>
> It sure is.
>
> This is going to cause an 9000-byte MTU to use a 16384-byte allocation.
> e1000_alloc_rx_buffers() adds two bytes to that, so we do kmalloc(16386),
> which causes the slab allocator to request 32768 bytes. All for a 9kbyte skb.
>
I searched the list, which I should have done before asking (I was not sure
whether this was due to the e1000), and found this discussion from 3rd August:

http://www.ussg.iu.edu/hypermail/linux/kernel/0608.0/0942.html

My reading of it is that people are still trying to find a solution. In the
meantime one should set /proc/sys/vm/min_free_kbytes to 65000 or higher, to
ensure that the driver gets enough unfragmented memory.
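
As a rough sketch of that workaround in C: the helper below reads the current value from a file and raises it only if it is lower than the wanted one. The function name `ensure_min_free_kbytes` is made up for this example, and the path is a parameter so the logic can be tried on an ordinary file first; writing the real /proc file needs root.

```c
#include <assert.h>
#include <stdio.h>

/* Raise the value stored in `path` to at least `wanted_kb`.
 * Returns the value in effect afterwards, or -1 on error. */
long ensure_min_free_kbytes(const char *path, long wanted_kb)
{
    long current = -1;
    FILE *f;

    assert(path != NULL && wanted_kb > 0);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &current) != 1) {
        fclose(f);
        return -1;
    }
    fclose(f);

    if (current >= wanted_kb)
        return current;          /* already high enough, leave it alone */

    f = fopen(path, "w");        /* needs root for the real /proc file */
    if (f == NULL)
        return -1;
    if (fprintf(f, "%ld\n", wanted_kb) < 0) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;
    return wanted_kb;
}
```

On a real system this would be called as `ensure_min_free_kbytes("/proc/sys/vm/min_free_kbytes", 65000)`. A value written this way does not survive a reboot.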

Holger

2006-09-22 12:12:47

by Evgeniy Polyakov

Subject: Re: 2.6.1[78] page allocation failure. order:3, mode:0x20

On Fri, Sep 22, 2006 at 12:03:11PM +0000, Holger Kiehl ([email protected]) wrote:
> >This is going to cause an 9000-byte MTU to use a 16384-byte allocation.
> >e1000_alloc_rx_buffers() adds two bytes to that, so we do kmalloc(16386),
> >which causes the slab allocator to request 32768 bytes. All for a 9kbyte
> >skb.
> >
> I searched the list, which I should have done before asking (I was not sure
> if this was due to the e1000) and found this
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0608.0/0942.html
>
> discusion from 3rd August. As a summary I read that people are trying to
> find
> a solution, in the meantime one should set /proc/sys/vm/min_free_kbytes to
> 65000 or higher, to ensure that the driver gets enough unfragmented memory.

There is no solution yet (although the e1000 memory management problem is one
of the reasons I created the memory tree allocator), only workarounds, one
of which you described above.

e1000 hardware does not support setting an arbitrary maximum transfer size;
it only allows power-of-two sizes (and one of about 1500), so it does require
16k of memory for a 9k frame. The network skb allocation path then adds a
little on top, which turns this into a 32k request because of the
power-of-two rounding.

It was suggested to the Intel folks that they either use fragments in one skb
or wait until network developers invent something new, but there are no
patches from them (hopefully just not yet).

This is not only an e1000 problem: expecting even an 8k-12k allocation to
succeed at any time other than startup is definitely the wrong approach.

> Holger

--
Evgeniy Polyakov

2006-09-22 17:17:58

by Kok, Auke

Subject: Re: 2.6.1[78] page allocation failure. order:3, mode:0x20

Andrew Morton wrote:
> On Fri, 22 Sep 2006 07:27:18 +0000 (GMT)
> Holger Kiehl <[email protected]> wrote:
>
>> I get some of the "page allocation failure" errors. My hardware is 4 CPU
>> Opteron with one quad + one dual intel e1000 cards. Kernel is plain 2.6.18
>> and for two cards MTU is set to 9000.
>>
>> Sep 21 21:03:15 athena kernel: vsftpd: page allocation failure. order:3, mode:0x20
>> Sep 21 21:03:15 athena kernel:
>> Sep 21 21:03:15 athena kernel: Call Trace:
>> Sep 21 21:03:15 athena kernel: <IRQ> [<ffffffff8024e516>] __alloc_pages+0x282/0x29b
>> Sep 21 21:03:15 athena kernel: [<ffffffff8807aa93>] :ip_tables:ipt_do_table+0x1eb/0x318
>> Sep 21 21:03:15 athena kernel: [<ffffffff8026614b>] cache_grow+0x134/0x33d
>> Sep 21 21:03:15 athena kernel: [<ffffffff8026664c>] cache_alloc_refill+0x189/0x1d7
>> Sep 21 21:03:15 athena kernel: [<ffffffff80266724>] __kmalloc+0x8a/0x94
>> Sep 21 21:03:15 athena kernel: [<ffffffff803b5438>] __alloc_skb+0x5c/0x123
>> Sep 21 21:03:15 athena kernel: [<ffffffff803b5f2e>] __netdev_alloc_skb+0x12/0x2d
>> Sep 21 21:03:15 athena kernel: [<ffffffff8033cb22>] e1000_alloc_rx_buffers+0x6f/0x2f3
>> Sep 21 21:03:15 athena kernel: [<ffffffff803d1234>] ip_local_deliver+0x173/0x23b
>> Sep 21 21:03:15 athena kernel: [<ffffffff8033d29a>] e1000_clean_rx_irq+0x4f4/0x514
>
> Is OK, it's just a warning and it is expected - the kernel will recover.
>
> I'm half-inclined to shut the warning up by sticking a __GFP_NOWARN in there.
>
> But on the other hand, that warning is handy sometimes. How come kmalloc
> decided to request a 32k hunk of memory when the MTU size is only 9k? Is
> the driver doing something dumb?
>
> else if (max_frame <= E1000_RXBUFFER_8192)
> adapter->rx_buffer_len = E1000_RXBUFFER_8192;
> else if (max_frame <= E1000_RXBUFFER_16384)
> adapter->rx_buffer_len = E1000_RXBUFFER_16384;
>
> It sure is.
>
> This is going to cause an 9000-byte MTU to use a 16384-byte allocation.
> e1000_alloc_rx_buffers() adds two bytes to that, so we do kmalloc(16386),
> which causes the slab allocator to request 32768 bytes. All for a 9kbyte skb.

I wonder if we can't account for NET_IP_ALIGN when selecting bufsize, to get
rid of at least one order of allocation size before we call netdev_alloc_skb.
This should make 9k frames only kmalloc(16384) and thus stay within the 16k
boundary. I hope.

Completely untested: don't commit :)

Auke

---

e1000: account for NET_IP_ALIGN when calculating bufsiz

Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to
reduce slab allocation by half.

Signed-off-by: Auke Kok <[email protected]>

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index bb0d129..20b1f39 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -1144,7 +1144,7 @@ #endif

 	pci_read_config_word(pdev, PCI_COMMAND, &hw->pci_cmd_word);

-	adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
+	adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE + NET_IP_ALIGN;
 	adapter->rx_ps_bsize0 = E1000_RXBUFFER_128;
 	hw->max_frame_size = netdev->mtu +
 			     ENET_HEADER_SIZE + ETHERNET_FCS_SIZE;
@@ -3234,26 +3234,27 @@ #define MAX_STD_JUMBO_FRAME_SIZE 9234
 	 * larger slab size
 	 * i.e. RXBUFFER_2048 --> size-4096 slab */

-	if (max_frame <= E1000_RXBUFFER_256)
+	if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_256)
 		adapter->rx_buffer_len = E1000_RXBUFFER_256;
-	else if (max_frame <= E1000_RXBUFFER_512)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_512)
 		adapter->rx_buffer_len = E1000_RXBUFFER_512;
-	else if (max_frame <= E1000_RXBUFFER_1024)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_1024)
 		adapter->rx_buffer_len = E1000_RXBUFFER_1024;
-	else if (max_frame <= E1000_RXBUFFER_2048)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_2048)
 		adapter->rx_buffer_len = E1000_RXBUFFER_2048;
-	else if (max_frame <= E1000_RXBUFFER_4096)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_4096)
 		adapter->rx_buffer_len = E1000_RXBUFFER_4096;
-	else if (max_frame <= E1000_RXBUFFER_8192)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_8192)
 		adapter->rx_buffer_len = E1000_RXBUFFER_8192;
-	else if (max_frame <= E1000_RXBUFFER_16384)
+	else
 		adapter->rx_buffer_len = E1000_RXBUFFER_16384;

 	/* adjust allocation if LPE protects us, and we aren't using SBP */
 	if (!adapter->hw.tbi_compatibility_on &&
 	    ((max_frame == MAXIMUM_ETHERNET_FRAME_SIZE) ||
 	     (max_frame == MAXIMUM_ETHERNET_VLAN_SIZE)))
-		adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
+		adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE
+					 + NET_IP_ALIGN;

 	netdev->mtu = new_mtu;

@@ -4076,7 +4076,8 @@ e1000_alloc_rx_buffers(struct e1000_adap
 	struct e1000_buffer *buffer_info;
 	struct sk_buff *skb;
 	unsigned int i;
-	unsigned int bufsz = adapter->rx_buffer_len + NET_IP_ALIGN;
+	/* we have already accounted for NET_IP_ALIGN */
+	unsigned int bufsz = adapter->rx_buffer_len;

 	i = rx_ring->next_to_use;
 	buffer_info = &rx_ring->buffer_info[i];
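
The effect of the patch on the requested size can be modelled outside the kernel. The sketch below is a simplified, hypothetical model, not driver code: the function names are invented, NET_IP_ALIGN is taken as 2 as in this thread, the E1000_RXBUFFER_* constants are assumed to equal the numbers in their names, and the special case for standard-size frames and any further skb overhead are left out.

```c
#include <assert.h>
#include <stddef.h>

#define NET_IP_ALIGN 2   /* value used throughout this thread */

/* Receive buffer sizes the driver chooses from (E1000_RXBUFFER_*). */
static const unsigned int rxbuf_sizes[] = {
    256, 512, 1024, 2048, 4096, 8192, 16384
};
#define NUM_SIZES (sizeof(rxbuf_sizes) / sizeof(rxbuf_sizes[0]))

/* Smallest listed buffer that holds `need` bytes; falls back to the largest. */
static unsigned int pick_rxbuf(unsigned int need)
{
    for (size_t i = 0; i < NUM_SIZES; i++)
        if (need <= rxbuf_sizes[i])
            return rxbuf_sizes[i];
    return rxbuf_sizes[NUM_SIZES - 1];
}

/* Before the patch: pick on max_frame, then add NET_IP_ALIGN at alloc time. */
unsigned int bufsz_before(unsigned int max_frame)
{
    assert(max_frame > 0 && max_frame <= 16384);
    return pick_rxbuf(max_frame) + NET_IP_ALIGN;
}

/* After the patch: NET_IP_ALIGN is folded into the comparison instead. */
unsigned int bufsz_after(unsigned int max_frame)
{
    assert(max_frame > 0 && max_frame <= 16384);
    return pick_rxbuf(max_frame + NET_IP_ALIGN);
}
```

For a 9000-byte MTU (a max_frame of 9018, assuming a 14-byte Ethernet header and a 4-byte FCS), `bufsz_before()` returns 16386 and `bufsz_after()` returns 16384. Frames within two bytes of a boundary, such as 8191, move up a size class instead.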

2006-09-23 04:50:10

by Andrew Morton

Subject: Re: 2.6.1[78] page allocation failure. order:3, mode:0x20

On Fri, 22 Sep 2006 10:10:36 -0700
Auke Kok <[email protected]> wrote:

> I wonder if we can't account for NET_IP_ALIGN when selecting bufsize, to get at
> rid of at least 1 order size before we netdev_alloc_skb. This should make 9k
> frames only kmalloc(16384) and thus stay within the 16k boundary. I hope.
>
> Completely untested: don't commit :)
>

I did - I think we want this patch.

>
> e1000: account for NET_IP_ALIGN when calculating bufsiz
>
> Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to
> reduce slab allocation by half.

Could we please do whatever is needed to get this blessed and merged? This
is such a common problem on such a common driver that I would suggest that
we want this in 2.6.18.x as well. At least, I'd expect distributors to
ship this fix (they're nuts if they don't) and so it makes sense to deliver
it from kernel.org.

2006-09-23 05:25:10

by David Miller

Subject: Re: 2.6.1[78] page allocation failure. order:3, mode:0x20

From: Andrew Morton <[email protected]>
Date: Fri, 22 Sep 2006 21:50:00 -0700

> On Fri, 22 Sep 2006 10:10:36 -0700
> Auke Kok <[email protected]> wrote:
>
> > e1000: account for NET_IP_ALIGN when calculating bufsiz
> >
> > Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to
> > reduce slab allocation by half.
>
> Could we please do whatever is needed to get this blessed and merged? This
> is such a common problem on such a common driver that I would suggest that
> we want this in 2.6.18.x as well. At least, I'd expect distributors to
> ship this fix (they're nuts if they don't) and so it makes sense to deliver
> it from kernel.org.

The NET_IP_ALIGN existed not just for fun :) There are ramifications
for removing it.

2006-09-23 05:33:55

by Andrew Morton

Subject: Re: 2.6.1[78] page allocation failure. order:3, mode:0x20

On Fri, 22 Sep 2006 22:25:07 -0700 (PDT)
David Miller <[email protected]> wrote:

> From: Andrew Morton <[email protected]>
> Date: Fri, 22 Sep 2006 21:50:00 -0700
>
> > On Fri, 22 Sep 2006 10:10:36 -0700
> > Auke Kok <[email protected]> wrote:
> >
> > > e1000: account for NET_IP_ALIGN when calculating bufsiz
> > >
> > > Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to
> > > reduce slab allocation by half.
> >
> > Could we please do whatever is needed to get this blessed and merged? This
> > is such a common problem on such a common driver that I would suggest that
> > we want this in 2.6.18.x as well. At least, I'd expect distributors to
> > ship this fix (they're nuts if they don't) and so it makes sense to deliver
> > it from kernel.org.
>
> The NET_IP_ALIGN existed not just for fun :) There are ramifications
> for removing it.

It's still there, isn't it?

For the 9k MTU case, for example, we end up allocating 16384-byte skbs
instead of 32768-byte ones.


diff -puN drivers/net/e1000/e1000_main.c~e1000-account-for-net_ip_align-when-calculating-bufsiz drivers/net/e1000/e1000_main.c
--- a/drivers/net/e1000/e1000_main.c~e1000-account-for-net_ip_align-when-calculating-bufsiz
+++ a/drivers/net/e1000/e1000_main.c
@@ -1101,7 +1101,7 @@ e1000_sw_init(struct e1000_adapter *adap

 	pci_read_config_word(pdev, PCI_COMMAND, &hw->pci_cmd_word);

-	adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
+	adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE + NET_IP_ALIGN;
 	adapter->rx_ps_bsize0 = E1000_RXBUFFER_128;
 	hw->max_frame_size = netdev->mtu +
 			     ENET_HEADER_SIZE + ETHERNET_FCS_SIZE;
@@ -3163,26 +3163,27 @@ e1000_change_mtu(struct net_device *netd
 	 * larger slab size
 	 * i.e. RXBUFFER_2048 --> size-4096 slab */

-	if (max_frame <= E1000_RXBUFFER_256)
+	if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_256)
 		adapter->rx_buffer_len = E1000_RXBUFFER_256;
-	else if (max_frame <= E1000_RXBUFFER_512)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_512)
 		adapter->rx_buffer_len = E1000_RXBUFFER_512;
-	else if (max_frame <= E1000_RXBUFFER_1024)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_1024)
 		adapter->rx_buffer_len = E1000_RXBUFFER_1024;
-	else if (max_frame <= E1000_RXBUFFER_2048)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_2048)
 		adapter->rx_buffer_len = E1000_RXBUFFER_2048;
-	else if (max_frame <= E1000_RXBUFFER_4096)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_4096)
 		adapter->rx_buffer_len = E1000_RXBUFFER_4096;
-	else if (max_frame <= E1000_RXBUFFER_8192)
+	else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_8192)
 		adapter->rx_buffer_len = E1000_RXBUFFER_8192;
-	else if (max_frame <= E1000_RXBUFFER_16384)
+	else
 		adapter->rx_buffer_len = E1000_RXBUFFER_16384;

 	/* adjust allocation if LPE protects us, and we aren't using SBP */
 	if (!adapter->hw.tbi_compatibility_on &&
 	    ((max_frame == MAXIMUM_ETHERNET_FRAME_SIZE) ||
 	     (max_frame == MAXIMUM_ETHERNET_VLAN_SIZE)))
-		adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
+		adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE +
+					 NET_IP_ALIGN;

 	netdev->mtu = new_mtu;

@@ -4002,7 +4003,8 @@ e1000_alloc_rx_buffers(struct e1000_adap
 	struct e1000_buffer *buffer_info;
 	struct sk_buff *skb;
 	unsigned int i;
-	unsigned int bufsz = adapter->rx_buffer_len + NET_IP_ALIGN;
+	/* we have already accounted for NET_IP_ALIGN */
+	unsigned int bufsz = adapter->rx_buffer_len;

 	i = rx_ring->next_to_use;
 	buffer_info = &rx_ring->buffer_info[i];
_

2006-09-23 18:52:36

by Kok, Auke

Subject: Re: 2.6.1[78] page allocation failure. order:3, mode:0x20

Andrew Morton wrote:
> On Fri, 22 Sep 2006 22:25:07 -0700 (PDT)
> David Miller <[email protected]> wrote:
>
>> From: Andrew Morton <[email protected]>
>> Date: Fri, 22 Sep 2006 21:50:00 -0700
>>
>>> On Fri, 22 Sep 2006 10:10:36 -0700
>>> Auke Kok <[email protected]> wrote:
>>>
>>>> e1000: account for NET_IP_ALIGN when calculating bufsiz
>>>>
>>>> Account for NET_IP_ALIGN when requesting buffer sizes from netdev_alloc_skb to
>>>> reduce slab allocation by half.
>>> Could we please do whatever is needed to get this blessed and merged? This
>>> is such a common problem on such a common driver that I would suggest that
>>> we want this in 2.6.18.x as well. At least, I'd expect distributors to
>>> ship this fix (they're nuts if they don't) and so it makes sense to deliver
>>> it from kernel.org.
>> The NET_IP_ALIGN existed not just for fun :) There are ramifications
>> for removing it.
>
> It's still there, isn't it?
>
> For the 9k MTU case, for example, we end up allocating 16384-byte skbs
> instead of 32768-byte ones.

Yes, the only thing I'm doing is accounting for the 2 bytes one step earlier.
It works fine for the general case and I tested it too, but I am not too sure
about the corner cases, as the hardware has no notion of MTU at all and could
possibly overwrite by two bytes. I think my patch actually gives the hardware
two bytes too much now, so we're on the other (safe) side of that problem, but
I have to verify this first of course.

I'll be wrestling with this on Monday with Jesse and will try to nail it down.

Auke

>
>
> diff -puN drivers/net/e1000/e1000_main.c~e1000-account-for-net_ip_align-when-calculating-bufsiz drivers/net/e1000/e1000_main.c
> --- a/drivers/net/e1000/e1000_main.c~e1000-account-for-net_ip_align-when-calculating-bufsiz
> +++ a/drivers/net/e1000/e1000_main.c
> @@ -1101,7 +1101,7 @@ e1000_sw_init(struct e1000_adapter *adap
>
> pci_read_config_word(pdev, PCI_COMMAND, &hw->pci_cmd_word);
>
> - adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
> + adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE + NET_IP_ALIGN;
> adapter->rx_ps_bsize0 = E1000_RXBUFFER_128;
> hw->max_frame_size = netdev->mtu +
> ENET_HEADER_SIZE + ETHERNET_FCS_SIZE;
> @@ -3163,26 +3163,27 @@ e1000_change_mtu(struct net_device *netd
> * larger slab size
> * i.e. RXBUFFER_2048 --> size-4096 slab */
>
> - if (max_frame <= E1000_RXBUFFER_256)
> + if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_256)
> adapter->rx_buffer_len = E1000_RXBUFFER_256;
> - else if (max_frame <= E1000_RXBUFFER_512)
> + else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_512)
> adapter->rx_buffer_len = E1000_RXBUFFER_512;
> - else if (max_frame <= E1000_RXBUFFER_1024)
> + else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_1024)
> adapter->rx_buffer_len = E1000_RXBUFFER_1024;
> - else if (max_frame <= E1000_RXBUFFER_2048)
> + else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_2048)
> adapter->rx_buffer_len = E1000_RXBUFFER_2048;
> - else if (max_frame <= E1000_RXBUFFER_4096)
> + else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_4096)
> adapter->rx_buffer_len = E1000_RXBUFFER_4096;
> - else if (max_frame <= E1000_RXBUFFER_8192)
> + else if (max_frame + NET_IP_ALIGN <= E1000_RXBUFFER_8192)
> adapter->rx_buffer_len = E1000_RXBUFFER_8192;
> - else if (max_frame <= E1000_RXBUFFER_16384)
> + else
> adapter->rx_buffer_len = E1000_RXBUFFER_16384;
>
> /* adjust allocation if LPE protects us, and we aren't using SBP */
> if (!adapter->hw.tbi_compatibility_on &&
> ((max_frame == MAXIMUM_ETHERNET_FRAME_SIZE) ||
> (max_frame == MAXIMUM_ETHERNET_VLAN_SIZE)))
> - adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
> + adapter->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE +
> + NET_IP_ALIGN;
>
> netdev->mtu = new_mtu;
>
> @@ -4002,7 +4003,8 @@ e1000_alloc_rx_buffers(struct e1000_adap
> struct e1000_buffer *buffer_info;
> struct sk_buff *skb;
> unsigned int i;
> - unsigned int bufsz = adapter->rx_buffer_len + NET_IP_ALIGN;
> + /* we have already accounted for NET_IP_ALIGN */
> + unsigned int bufsz = adapter->rx_buffer_len;
>
> i = rx_ring->next_to_use;
> buffer_info = &rx_ring->buffer_info[i];
> _

2006-09-23 20:03:21

by David Miller

[permalink] [raw]
Subject: Re: 2.6.1[78] page allocation failure. order:3, mode:0x20

From: Auke Kok <[email protected]>
Date: Sat, 23 Sep 2006 11:50:34 -0700

> Andrew Morton wrote:
> > It's still there, isn't it?
> >
> > For the 9k MTU case, for example, we end up allocating 16384 byte skbs
> > instead of 32768 byte ones.
>
> yes, the only thing I'm doing is accounting for the 2 bytes one step earlier.

Ok, I'm fine with this patch unless it causes some regression that hasn't
been discovered yet :-)

2006-09-24 15:27:29

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: 2.6.1[78] page allocation failure. order:3, mode:0x20

On Fri, Sep 22, 2006 at 10:33:48PM -0700, Andrew Morton ([email protected]) wrote:
> > The NET_IP_ALIGN existed not just for fun :) There are ramifications
> > for removing it.
>
> It's still there, isn't it?
>
> For the 9k MTU case, for example, we end up allocating 16384 byte skbs
> instead of 32768 byte ones.

This patch will not help - netdev_alloc_skb() adds an additional
NET_SKB_PAD and then alloc_skb() adds sizeof(struct skb_shared_info).
And even if you account for them in adapter->rx_buffer_len, the chip can still
overwrite that area (in the thread mentioned earlier in this e-mail thread
I posted such a patch and received a dump of the sizes the chip receives -
there were a lot of _different_ ones which were too close to the limit).

--
Evgeniy Polyakov

2006-09-24 21:29:22

by Kok, Auke

[permalink] [raw]
Subject: Re: 2.6.1[78] page allocation failure. order:3, mode:0x20

Evgeniy Polyakov wrote:
> On Fri, Sep 22, 2006 at 10:33:48PM -0700, Andrew Morton ([email protected]) wrote:
>>> The NET_IP_ALIGN existed not just for fun :) There are ramifications
>>> for removing it.
>> It's still there, isn't it?
>>
>> For the 9k MTU case, for example, we end up allocating 16384 byte skbs
>> instead of 32768 byte ones.
>
> This patch will not help - netdev_alloc_skb() adds an additional
> NET_SKB_PAD and then alloc_skb() adds sizeof(struct skb_shared_info).
> And even if you account for them in adapter->rx_buffer_len, the chip can still
> overwrite that area (in the thread mentioned earlier in this e-mail thread
> I posted such a patch and received a dump of the sizes the chip receives -
> there were a lot of _different_ ones which were too close to the limit).

I just did the math on it and it does not compute as I wanted it to: we're
basically flowing into the next larger buffer size 2 MTU bytes earlier, undoing
any benefit completely.

There is not much that can fix this issue, since the hardware will always
receive into power-of-two sized buffers and DMA them back in their entirety, so
we must always claim space for NET_IP_ALIGN and NET_SKB_PAD after the
power-of-two bufsz. For the 9kB MTU case (16kB hw bufsz), we're stuck with 32kB
slab allocations. Bummer.

Andrew, please drop this patch.

Auke

2006-09-25 08:39:46

by Roy de Boer

[permalink] [raw]
Subject: Re: 2.6.1[78] page allocation failure. order:3, mode:0x20

On Fri, 22 Sep 2006 07:27:18 +0000 (GMT) Holger Kiehl
<[email protected]> wrote:
> I get some of the "page allocation failure" errors. My hardware is 4 CPU
> Opteron with one quad + one dual intel e1000 cards. Kernel is plain 2.6.18
> and for two cards MTU is set to 9000.

I'm getting more or less the same error messages (although I'm no
expert) on an AMD Geode NX 1700+ with an Intel e1000 NIC. I'm using a stock
2.6.18 kernel.

I hope this will help diagnose the problem.

Sep 25 07:54:46 gatar kernel: [23623.594000] java: page allocation failure. order:1, mode:0x20
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c0147963>] __alloc_pages+0x213/0x330
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c015da07>] cache_alloc_refill+0x307/0x530
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c015dcad>] __kmalloc+0x7d/0x80
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c04b7523>] __alloc_skb+0x63/0x120
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c04e4618>] tcp_collapse+0x138/0x360
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c04e4945>] tcp_prune_queue+0x105/0x2c0
Sep 25 07:54:46 gatar kernel: [23623.594000] [<e0e29396>] tcp_packet+0x356/0xcc0 [ip_conntrack]
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c04e78a1>] tcp_data_queue+0x5f1/0xc50
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c04e8f3a>] tcp_rcv_established+0x26a/0x940
Sep 25 07:54:46 gatar kernel: [23623.594000] [<e0e11937>] ipt_do_table+0x267/0x300 [ip_tables]
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c04f0bba>] tcp_v4_do_rcv+0xca/0x2d0
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c04f1bbb>] tcp_v4_rcv+0x72b/0x890
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c04d46e0>] ip_local_deliver_finish+0x0/0x170
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c04d46e0>] ip_local_deliver_finish+0x0/0x170
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c04d497f>] ip_local_deliver+0x12f/0x200
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c04d46e0>] ip_local_deliver_finish+0x0/0x170
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c04d4d23>] ip_rcv+0x2d3/0x500
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c04d4420>] ip_rcv_finish+0x0/0x2c0
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c04bfaa9>] netif_receive_skb+0x1f9/0x270
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c04bfb8e>] process_backlog+0x6e/0xf0
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c04bd57c>] net_rx_action+0x6c/0x150
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c0123e43>] __do_softirq+0x43/0x90
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c0123eb6>] do_softirq+0x26/0x30
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c0105776>] do_IRQ+0x36/0x70
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c043739f>] ata_scsi_rw_xlat+0x2ff/0x510
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c01038fe>] common_interrupt+0x1a/0x20
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c0141e7c>] remove_suid+0xc/0x70
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c017b236>] file_update_time+0x46/0xc0
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c0144df1>] __generic_file_aio_write_nolock+0x201/0x470
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c011b796>] __cond_resched+0x16/0x30
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c014532e>] generic_file_aio_write+0x6e/0x110
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c0105776>] do_IRQ+0x36/0x70
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c01d8994>] ext3_file_write+0x44/0xd0
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c0160a48>] do_sync_write+0xd8/0x140
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c0132520>] autoremove_wake_function+0x0/0x60
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c0160fed>] vfs_write+0xdd/0x1c0
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c0161b0b>] sys_write+0x4b/0x80
Sep 25 07:54:46 gatar kernel: [23623.594000] [<c0102f0d>] sysenter_past_esp+0x56/0x79
Sep 25 07:54:46 gatar kernel: [23623.594000] Mem-info:
Sep 25 07:54:46 gatar kernel: [23623.594000] DMA per-cpu:
Sep 25 07:54:46 gatar kernel: [23623.594000] cpu 0 hot: high 0, batch 1 used:0
Sep 25 07:54:46 gatar kernel: [23623.594000] cpu 0 cold: high 0, batch 1 used:0
Sep 25 07:54:46 gatar kernel: [23623.594000] DMA32 per-cpu: empty
Sep 25 07:54:46 gatar kernel: [23623.594000] Normal per-cpu:
Sep 25 07:54:46 gatar kernel: [23623.594000] cpu 0 hot: high 186, batch 31 used:176
Sep 25 07:54:46 gatar kernel: [23623.594000] cpu 0 cold: high 62, batch 15 used:6
Sep 25 07:54:46 gatar kernel: [23623.594000] HighMem per-cpu: empty
Sep 25 07:54:46 gatar kernel: [23623.594000] Free pages: 5612kB (0kB HighMem)
Sep 25 07:54:46 gatar kernel: [23623.594000] Active:92165 inactive:19681 dirty:1771 writeback:1 unstable:0 free:1403 slab:6352 mapped:14889 pagetables:424
Sep 25 07:54:46 gatar kernel: [23623.594000] DMA free:2016kB min:88kB low:108kB high:132kB active:7404kB inactive:288kB present:16384kB pages_scanned:0 all_unreclaimable? no
Sep 25 07:54:46 gatar kernel: [23623.594000] lowmem_reserve[]: 0 0 495 495
Sep 25 07:54:46 gatar kernel: [23623.594000] DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 25 07:54:46 gatar kernel: [23623.594000] lowmem_reserve[]: 0 0 495 495
Sep 25 07:54:46 gatar kernel: [23623.594000] Normal free:3596kB min:2804kB low:3504kB high:4204kB active:361256kB inactive:78436kB present:507840kB pages_scanned:0 all_unreclaimable? no
Sep 25 07:54:46 gatar kernel: [23623.594000] lowmem_reserve[]: 0 0 0 0
Sep 25 07:54:46 gatar kernel: [23623.594000] HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 25 07:54:46 gatar kernel: [23623.594000] lowmem_reserve[]: 0 0 0 0
Sep 25 07:54:46 gatar kernel: [23623.594000] DMA: 10*4kB 1*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2016kB
Sep 25 07:54:46 gatar kernel: [23623.594000] DMA32: empty
Sep 25 07:54:46 gatar kernel: [23623.594000] Normal: 767*4kB 0*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 3596kB
Sep 25 07:54:46 gatar kernel: [23623.594000] HighMem: empty
Sep 25 07:54:46 gatar kernel: [23623.594000] Swap cache: add 378, delete 378, find 16/24, race 0+0
Sep 25 07:54:46 gatar kernel: [23623.594000] Free swap = 656028kB
Sep 25 07:54:46 gatar kernel: [23623.594000] Total swap = 656496kB
Sep 25 07:54:46 gatar kernel: [23623.594000] Free swap: 656028kB
Sep 25 07:54:46 gatar kernel: [23623.594000] 131056 pages of RAM
Sep 25 07:54:46 gatar kernel: [23623.594000] 0 pages of HIGHMEM
Sep 25 07:54:46 gatar kernel: [23623.594000] 2741 reserved pages
Sep 25 07:54:46 gatar kernel: [23623.594000] 79395 pages shared
Sep 25 07:54:46 gatar kernel: [23623.594000] 0 pages swap cached
Sep 25 07:54:46 gatar kernel: [23623.594000] 1771 pages dirty
Sep 25 07:54:46 gatar kernel: [23623.594000] 1 pages writeback
Sep 25 07:54:46 gatar kernel: [23623.594000] 14889 pages mapped
Sep 25 07:54:46 gatar kernel: [23623.594000] 6352 pages slab
Sep 25 07:54:46 gatar kernel: [23623.594000] 424 pages pagetables

Regards,

Roy de Boer