2010-08-22 06:23:00

by Stan Hoeppner

Subject: 2.6.34.1 page allocation failure

I'm not subscribed to lkml so please CC me in replies. First post.

Mobo:    Abit BP6, dual Celeron 366@500, i440BX chipset, 384MB PC100
Disk:    SiI 3512 PCI (sata_sil, libata), 1 x WD5000AAKS 500 GB SATAII
Kernel:  vanilla 2.6.34.1, 32 bit x86, SMP, Celeron pre Coppermine
OS:      Debian 5.0.5 (Stable)
Build:   kernel configured via make menuconfig
         no modules, no initrd
         built via "make KDEB_PKGVERSION="
         installed via dpkg, bootloader is LILO
Role:    headless SOHO server, run level 2, _very_ light load
         Postfix, pdns-recursor, Dovecot, Lighttpd, Roundcube, Samba
         bulk of system memory (>300MB) is consumed by buffers/cache
Issue:   AFAIK, these errors never occurred with any revisions of
         2.6.26, .31, or .32. After installing 2.6.34.1 I've noticed
         the following errors in dmesg. I see 6 of these, including
         two errors each for kswapd0, lighttpd, and smtpd, all not
         tainted. AFAICT everything is still running fine. Are these
         critical errors? If so, how do I fix?

kswapd0: page allocation failure. order:1, mode:0x20
Pid: 139, comm: kswapd0 Not tainted 2.6.34.1 #1
Call Trace:
[<c104b6b3>] ? __alloc_pages_nodemask+0x448/0x48a
[<c1062ffb>] ? cache_alloc_refill+0x22f/0x422
[<c11a9a73>] ? tcp_v4_send_check+0x6e/0xa4
[<c10632c3>] ? kmem_cache_alloc+0x41/0x6a
[<c11773a5>] ? sk_prot_alloc+0x19/0x55
[<c117744b>] ? sk_clone+0x16/0x1cc
[<c119a71d>] ? inet_csk_clone+0xf/0x80
[<c11ac0e3>] ? tcp_create_openreq_child+0x1a/0x3c8
[<c11aaf0a>] ? tcp_v4_syn_recv_sock+0x4b/0x151
[<c11abf9d>] ? tcp_check_req+0x209/0x335
[<c11aa892>] ? tcp_v4_do_rcv+0x8d/0x14d
[<c11aacd5>] ? tcp_v4_rcv+0x383/0x56d
[<c1193ba4>] ? ip_local_deliver+0x76/0xc0
[<c1193b10>] ? ip_rcv+0x3dc/0x3fa
[<c103655e>] ? ktime_get_real+0xf/0x2b
[<c117f8d3>] ? netif_receive_skb+0x219/0x234
[<c115ff46>] ? e100_poll+0x1d0/0x47e
[<c117fa98>] ? net_rx_action+0x58/0xf8
[<c102539c>] ? __do_softirq+0x78/0xe5
[<c102542c>] ? do_softirq+0x23/0x27
[<c1003955>] ? do_IRQ+0x7d/0x8e
[<c1002aa9>] ? common_interrupt+0x29/0x30
[<c1062870>] ? kmem_cache_free+0xbd/0xc5
[<c10fa7d1>] ? __xfs_inode_set_reclaim_tag+0x29/0x2f
[<c1075215>] ? destroy_inode+0x1c/0x2b
[<c10752ce>] ? dispose_list+0xaa/0xd0
[<c107548c>] ? shrink_icache_memory+0x198/0x1c5
[<c104f76b>] ? shrink_slab+0xda/0x12f
[<c104fc28>] ? kswapd+0x468/0x63b
[<c104dca3>] ? isolate_pages_global+0x0/0x1bc
[<c10304d6>] ? autoremove_wake_function+0x0/0x2d
[<c1018faf>] ? complete+0x28/0x36
[<c104f7c0>] ? kswapd+0x0/0x63b
[<c10301cd>] ? kthread+0x61/0x66
[<c103016c>] ? kthread+0x0/0x66
[<c1002ab6>] ? kernel_thread_helper+0x6/0x10
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 180
CPU 1: hi: 186, btch: 31 usd: 29
active_anon:646 inactive_anon:4337 isolated_anon:0
active_file:27189 inactive_file:35957 isolated_file:0
unevictable:0 dirty:56 writeback:0 unstable:0
free:1142 slab_reclaimable:25495 slab_unreclaimable:1020
mapped:3116 shmem:143 pagetables:123 bounce:0
DMA free:1568kB min:100kB low:124kB high:148kB active_anon:0kB
inactive_anon:4kB active_file:5704kB inactive_file:7732kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15868kB
mlocked:0kB dirty:0kB writeback:0kB mapped:28kB shmem:0kB
slab_reclaimable:912kB slab_unreclaimable:52kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
lowmem_reserve[]: 0 365 365
Normal free:3000kB min:2392kB low:2988kB high:3588kB active_anon:2584kB
inactive_anon:17344kB active_file:103052kB inactive_file:136096kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:373888kB
mlocked:0kB dirty:224kB writeback:0kB mapped:12436kB shmem:572kB
slab_reclaimable:101068kB slab_unreclaimable:4028kB kernel_stack:520kB
pagetables:492kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 391*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB
0*2048kB 0*4096kB = 1564kB
Normal: 750*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 3000kB
63342 total pagecache pages
23 pages in swap cache
Swap cache stats: add 159, delete 136, find 401/412
Free swap = 995636kB
Total swap = 995992kB
98303 pages RAM
1638 pages reserved
22416 pages shared
76947 pages non-shared

Thanks.

--
Stan


2010-08-22 06:47:30

by Mikael Abrahamsson

Subject: Re: 2.6.34.1 page allocation failure

On Sun, 22 Aug 2010, Stan Hoeppner wrote:

> I'm not subscribed to lkml so please CC me in replies. First post.

I'm seeing similar problems on older kernels (.24 up to .32).

<http://www.spinics.net/lists/linux-mm/msg07808.html>

I didn't get any response at all, either on linux-mm or lkml... Our
problems seem very similar, but I'm running 64-bit and I have 8 gigs of
RAM.

Personally I can avoid this by tuning down my TCP settings so TCP uses
less memory, but I don't think that workaround is very good; this
shouldn't happen. Sometimes my machine also freezes up (pressing Caps Lock
doesn't work); other times it just logs the error.

--
Mikael Abrahamsson email: [email protected]

2010-08-22 19:51:06

by Pekka Enberg

Subject: Re: 2.6.34.1 page allocation failure

On Sun, Aug 22, 2010 at 9:47 AM, Mikael Abrahamsson <[email protected]> wrote:
> On Sun, 22 Aug 2010, Stan Hoeppner wrote:
>
>> I'm not subscribed to lkml so please CC me in replies. First post.
>
> I'm seeing similar problems on older kernels (.24 up to .32).
>
> <http://www.spinics.net/lists/linux-mm/msg07808.html>
>
> I didn't get any response at all, neither on linux-mm or lkml... Our
> problems seem very similar, but I'm running 64bit and I have 8 gigs of ram.
>
> Personally I can avoid this by tuning down my TCP settings so TCP uses less
> memory, but I don't think that workaround is very good, this shouldn't
> happen. My machine also freezes up (pressing caps lock doesn't work)
> sometimes, sometimes it just logs the error.

In Stan's case, it's an order-1 GFP_ATOMIC allocation but there are
only order-0 pages available. Mel, any recent page allocator fixes in
2.6.35 or 2.6.36-rc1 that Stan/Mikael should test?

by Christoph Lameter

Subject: Re: 2.6.34.1 page allocation failure

On Sun, 22 Aug 2010, Pekka Enberg wrote:

> In Stan's case, it's an order-1 GFP_ATOMIC allocation but there are
> only order-0 pages available. Mel, any recent page allocator fixes in
> 2.6.35 or 2.6.36-rc1 that Stan/Mikael should test?

This is the TCP slab? Best fix would be in the page allocator. However,
in this particular case the slub allocator would be able to fall back to
an order 0 allocation and still satisfy the request.

2010-08-23 09:37:09

by Pekka Enberg

Subject: Re: 2.6.34.1 page allocation failure

On 8/23/10 1:40 AM, Christoph Lameter wrote:
> On Sun, 22 Aug 2010, Pekka Enberg wrote:
>
>> In Stan's case, it's an order-1 GFP_ATOMIC allocation but there are
>> only order-0 pages available. Mel, any recent page allocator fixes in
>> 2.6.35 or 2.6.36-rc1 that Stan/Mikael should test?
> This is the TCP slab? Best fix would be in the page allocator. However,
> in this particular case the slub allocator would be able to fall back to
> an order 0 allocation and still satisfy the request.
>
Looking at the stack trace of the oops, I think Stan has CONFIG_SLAB
which doesn't have order-0 fallback.

2010-08-23 22:35:53

by Stan Hoeppner

Subject: Re: 2.6.34.1 page allocation failure

Pekka Enberg put forth on 8/23/2010 4:37 AM:
> On 8/23/10 1:40 AM, Christoph Lameter wrote:
>> On Sun, 22 Aug 2010, Pekka Enberg wrote:
>>
>>> In Stan's case, it's an order-1 GFP_ATOMIC allocation but there are
>>> only order-0 pages available. Mel, any recent page allocator fixes in
>>> 2.6.35 or 2.6.36-rc1 that Stan/Mikael should test?
>> This is the TCP slab? Best fix would be in the page allocator. However,
>> in this particular case the slub allocator would be able to fall back to
>> an order 0 allocation and still satisfy the request.
>>
> Looking at the stack trace of the oops, I think Stan has CONFIG_SLAB
> which doesn't have order-0 fallback.

That is correct. The menuconfig help screen led me to believe the SLAB
allocator was the "safe" choice:

"CONFIG_SLAB:
The regular slab allocator that is established and known to work well in
all environments"

Should I be using SLUB instead? Any downsides to SLUB on an old and
slow (500 MHz) single core dual CPU box with <512MB RAM?

Also, what is the impact of these oopses? Despite the entries in dmesg,
the system "seems" to be running ok. Or is this simply the calm before
the impending storm?

--
Stan

Subject: Re: 2.6.34.1 page allocation failure

On Mon, 23 Aug 2010, Stan Hoeppner wrote:

> Should I be using SLUB instead? Any downsides to SLUB on an old and
> slow (500 MHz) single core dual CPU box with <512MB RAM?

SLUB has a smaller memory footprint so you may come out ahead for
such a small system in particular.

> Also, what is the impact of these oopses? Despite the entries in dmesg,
> the system "seems" to be running ok. Or is this simply the calm before
> the impending storm?

The system does not guarantee that GFP_ATOMIC allocations succeed, so any
caller must provide logic to fall back if no memory is allocated. The
effect may therefore just be that certain OS operations have to be retried.

2010-08-24 18:03:58

by Pekka Enberg

Subject: Re: 2.6.34.1 page allocation failure

[ I'm CC'ing netdev. ]

On 24.8.2010 1.35, Stan Hoeppner wrote:
> Pekka Enberg put forth on 8/23/2010 4:37 AM:
>> On 8/23/10 1:40 AM, Christoph Lameter wrote:
>>> On Sun, 22 Aug 2010, Pekka Enberg wrote:
>>>
>>>> In Stan's case, it's an order-1 GFP_ATOMIC allocation but there are
>>>> only order-0 pages available. Mel, any recent page allocator fixes in
>>>> 2.6.35 or 2.6.36-rc1 that Stan/Mikael should test?
>>> This is the TCP slab? Best fix would be in the page allocator. However,
>>> in this particular case the slub allocator would be able to fall back to
>>> an order 0 allocation and still satisfy the request.
>> Looking at the stack trace of the oops, I think Stan has CONFIG_SLAB
>> which doesn't have order-0 fallback.
> That is correct. The menuconfig help screen led me to believe the SLAB
> allocator was the "safe" choice:
>
> "CONFIG_SLAB:
> The regular slab allocator that is established and known to work well in
> all environments"
>
> Should I be using SLUB instead? Any downsides to SLUB on an old and
> slow (500 MHz) single core dual CPU box with <512MB RAM?
I don't think SLAB is the problem here, so it shouldn't matter which one
you use. You might not see the problems with SLUB, though, because it
falls back to order-0 allocations.

> Also, what is the impact of these oopses? Despite the entries in dmesg,
> the system "seems" to be running ok. Or is this simply the calm before
> the impending storm?
The page allocation failure in question is this:

kswapd0: page allocation failure. order:1, mode:0x20
Pid: 139, comm: kswapd0 Not tainted 2.6.34.1 #1
Call Trace:
[<c104b6b3>] ? __alloc_pages_nodemask+0x448/0x48a
[<c1062ffb>] ? cache_alloc_refill+0x22f/0x422
[<c11a9a73>] ? tcp_v4_send_check+0x6e/0xa4
[<c10632c3>] ? kmem_cache_alloc+0x41/0x6a
[<c11773a5>] ? sk_prot_alloc+0x19/0x55
[<c117744b>] ? sk_clone+0x16/0x1cc
[<c119a71d>] ? inet_csk_clone+0xf/0x80
[<c11ac0e3>] ? tcp_create_openreq_child+0x1a/0x3c8
[<c11aaf0a>] ? tcp_v4_syn_recv_sock+0x4b/0x151
[<c11abf9d>] ? tcp_check_req+0x209/0x335
[<c11aa892>] ? tcp_v4_do_rcv+0x8d/0x14d
[<c11aacd5>] ? tcp_v4_rcv+0x383/0x56d
[<c1193ba4>] ? ip_local_deliver+0x76/0xc0
[<c1193b10>] ? ip_rcv+0x3dc/0x3fa
[<c103655e>] ? ktime_get_real+0xf/0x2b
[<c117f8d3>] ? netif_receive_skb+0x219/0x234
[<c115ff46>] ? e100_poll+0x1d0/0x47e
[<c117fa98>] ? net_rx_action+0x58/0xf8
[<c102539c>] ? __do_softirq+0x78/0xe5
[<c102542c>] ? do_softirq+0x23/0x27
[<c1003955>] ? do_IRQ+0x7d/0x8e
[<c1002aa9>] ? common_interrupt+0x29/0x30
[<c1062870>] ? kmem_cache_free+0xbd/0xc5
[<c10fa7d1>] ? __xfs_inode_set_reclaim_tag+0x29/0x2f
[<c1075215>] ? destroy_inode+0x1c/0x2b
[<c10752ce>] ? dispose_list+0xaa/0xd0
[<c107548c>] ? shrink_icache_memory+0x198/0x1c5
[<c104f76b>] ? shrink_slab+0xda/0x12f
[<c104fc28>] ? kswapd+0x468/0x63b
[<c104dca3>] ? isolate_pages_global+0x0/0x1bc
[<c10304d6>] ? autoremove_wake_function+0x0/0x2d
[<c1018faf>] ? complete+0x28/0x36
[<c104f7c0>] ? kswapd+0x0/0x63b
[<c10301cd>] ? kthread+0x61/0x66
[<c103016c>] ? kthread+0x0/0x66
[<c1002ab6>] ? kernel_thread_helper+0x6/0x10

It looks to me as if tcp_create_openreq_child() is able to cope with the
situation so the warning could be harmless. If that's the case, we
should probably stick a __GFP_NOWARN there.

Pekka

2010-08-24 19:08:40

by Stan Hoeppner

Subject: Re: 2.6.34.1 page allocation failure

Pekka Enberg put forth on 8/24/2010 1:03 PM:

> It looks to me as if tcp_create_openreq_child() is able to cope with the
> situation so the warning could be harmless. If that's the case, we
> should probably stick a __GFP_NOWARN there.

If it would be helpful, here's a complete copy of dmesg:
http://www.hardwarefreak.com/2.6.34.1-dmesg-oopses.txt

Something I forgot to mention earlier: every now and then I unmount swap
and drop caches to clear things out a bit. I'm not sure whether that's
relevant, but since it has to do with memory allocation I thought I'd
mention it.

--
Stan

2010-08-24 19:21:17

by Mikael Abrahamsson

Subject: Re: 2.6.34.1 page allocation failure

On Tue, 24 Aug 2010, Pekka Enberg wrote:

> It looks to me as if tcp_create_openreq_child() is able to cope with the
> situation so the warning could be harmless. If that's the case, we
> should probably stick a __GFP_NOWARN there.

What about my situation? (a complete dmesg can be had at
<http://swm.pp.se/dmesg.100809-2.txt.gz>)

[87578.494471] swapper: page allocation failure. order:0, mode:0x4020
[87578.494476] Pid: 0, comm: swapper Not tainted 2.6.32-24-generic #39-Ubuntu
[87578.494480] Call Trace:
[87578.494483] <IRQ> [<ffffffff810fad0e>] __alloc_pages_slowpath+0x56e/0x580
[87578.494499] [<ffffffff810fae7e>] __alloc_pages_nodemask+0x15e/0x1a0
[87578.494506] [<ffffffff8112dba7>] alloc_pages_current+0x87/0xd0
[87578.494511] [<ffffffff81133b17>] new_slab+0x2f7/0x310
[87578.494516] [<ffffffff811363c1>] __slab_alloc+0x201/0x2d0
[87578.494522] [<ffffffff81455fe6>] ? __netdev_alloc_skb+0x36/0x60
[87578.494528] [<ffffffff81137408>] __kmalloc_node_track_caller+0xb8/0x180
[87578.494532] [<ffffffff81455fe6>] ? __netdev_alloc_skb+0x36/0x60
[87578.494536] [<ffffffff81455ca0>] __alloc_skb+0x80/0x190
[87578.494540] [<ffffffff81455fe6>] __netdev_alloc_skb+0x36/0x60
[87578.494564] [<ffffffffa008f5c7>] rtl8169_rx_interrupt+0x247/0x5b0 [r8169]
[87578.494572] [<ffffffffa008faad>] rtl8169_poll+0x3d/0x270 [r8169]
[87578.494580] [<ffffffff810397a9>] ? default_spin_lock_flags+0x9/0x10
[87578.494586] [<ffffffff8146029f>] net_rx_action+0x10f/0x250
[87578.494594] [<ffffffffa008d54e>] ? rtl8169_interrupt+0xde/0x1e0 [r8169]
[87578.494600] [<ffffffff8106e467>] __do_softirq+0xb7/0x1e0
[87578.494605] [<ffffffff810c52c0>] ? handle_IRQ_event+0x60/0x170
[87578.494610] [<ffffffff810142ec>] call_softirq+0x1c/0x30
[87578.494614] [<ffffffff81015cb5>] do_softirq+0x65/0xa0
[87578.494618] [<ffffffff8106e305>] irq_exit+0x85/0x90
[87578.494623] [<ffffffff81549515>] do_IRQ+0x75/0xf0
[87578.494627] [<ffffffff81013b13>] ret_from_intr+0x0/0x11
[87578.494629] <EOI> [<ffffffff8130f7cb>] ? acpi_idle_enter_c1+0xa3/0xc1
[87578.494639] [<ffffffff8130f7aa>] ? acpi_idle_enter_c1+0x82/0xc1
[87578.494646] [<ffffffff8143a5a7>] ? cpuidle_idle_call+0xa7/0x140
[87578.494652] [<ffffffff81011e73>] ? cpu_idle+0xb3/0x110
[87578.494657] [<ffffffff8153e27e>] ? start_secondary+0xa8/0xaa
[87578.494660] Mem-Info:
[87578.494662] Node 0 DMA per-cpu:
[87578.494666] CPU 0: hi: 0, btch: 1 usd: 0
[87578.494669] CPU 1: hi: 0, btch: 1 usd: 0
[87578.494672] CPU 2: hi: 0, btch: 1 usd: 0
[87578.494674] CPU 3: hi: 0, btch: 1 usd: 0
[87578.494677] Node 0 DMA32 per-cpu:
[87578.494680] CPU 0: hi: 186, btch: 31 usd: 173
[87578.494683] CPU 1: hi: 186, btch: 31 usd: 87
[87578.494686] CPU 2: hi: 186, btch: 31 usd: 168
[87578.494689] CPU 3: hi: 186, btch: 31 usd: 63
[87578.494691] Node 0 Normal per-cpu:
[87578.494695] CPU 0: hi: 186, btch: 31 usd: 177
[87578.494698] CPU 1: hi: 186, btch: 31 usd: 176
[87578.494700] CPU 2: hi: 186, btch: 31 usd: 82
[87578.494703] CPU 3: hi: 186, btch: 31 usd: 191
[87578.494710] active_anon:22970 inactive_anon:6433 isolated_anon:0
[87578.494711] active_file:916528 inactive_file:914736 isolated_file:0
[87578.494713] unevictable:0 dirty:135959 writeback:24423 unstable:0
[87578.494714] free:9990 slab_reclaimable:59767 slab_unreclaimable:11135
[87578.494716] mapped:119343 shmem:985 pagetables:2113 bounce:0
[87578.494719] Node 0 DMA free:15860kB min:20kB low:24kB high:28kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15272kB
mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? yes
[87578.494733] lowmem_reserve[]: 0 2866 7852 7852
[87578.494738] Node 0 DMA32 free:21420kB min:4136kB low:5168kB high:6204kB
active_anon:4056kB inactive_anon:5856kB active_file:1322360kB
inactive_file:1320432kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:2935456kB mlocked:0kB dirty:190824kB
writeback:31900kB mapped:157676kB shmem:0kB slab_reclaimable:107316kB
slab_unreclaimable:15480kB kernel_stack:56kB pagetables:764kB unstable:0kB
bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[87578.494754] lowmem_reserve[]: 0 0 4986 4986
[87578.494759] Node 0 Normal free:2680kB min:7192kB low:8988kB
high:10788kB active_anon:87824kB inactive_anon:19876kB
active_file:2343752kB inactive_file:2338512kB unevictable:0kB
isolated(anon):0kB isolated(file):0kB present:5105664kB mlocked:0kB
dirty:353012kB writeback:65792kB mapped:319696kB shmem:3940kB
slab_reclaimable:131752kB slab_unreclaimable:29060kB kernel_stack:2160kB
pagetables:7688kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
[87578.494775] lowmem_reserve[]: 0 0 0 0
[87578.494779] Node 0 DMA: 3*4kB 3*8kB 3*16kB 1*32kB 2*64kB 2*128kB
0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15860kB
[87578.494792] Node 0 DMA32: 789*4kB 765*8kB 589*16kB 1*32kB 1*64kB
4*128kB 4*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB = 21356kB
[87578.494805] Node 0 Normal: 374*4kB 4*8kB 20*16kB 1*32kB 0*64kB 0*128kB
1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2648kB
[87578.494818] 1832322 total pagecache pages
[87578.494820] 0 pages in swap cache
[87578.494823] Swap cache stats: add 0, delete 0, find 0/0
[87578.494825] Free swap = 0kB
[87578.494827] Total swap = 0kB
[87578.531041] 2064368 pages RAM
[87578.531044] 66019 pages reserved
[87578.531046] 1501227 pages shared
[87578.531048] 619257 pages non-shared
[87578.531053] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[87578.531057] cache: kmalloc-4096, object size: 4096, buffer size:
4096, default order: 3, min order: 0
[87578.531061] node 0: slabs: 1322, objs: 4129, free: 0

This actually took the machine offline for hours before it came back for
some reason. The second time this happened, it did not come back (I
waited 8 hours).

I also seem to have TCP-related problems:

[87578.531806] [<ffffffff8113651f>] kmem_cache_alloc_node+0x8f/0x160
[87578.531812] [<ffffffff81455c6f>] __alloc_skb+0x4f/0x190
[87578.531820] [<ffffffff814acbe0>] ? tcp_delack_timer+0x0/0x270
[87578.531828] [<ffffffff814ab423>] tcp_send_ack+0x33/0x120
[87578.531834] [<ffffffff814acd22>] tcp_delack_timer+0x142/0x270
[87578.531842] [<ffffffff8105a34d>] ? scheduler_tick+0x18d/0x260
[87578.531849] [<ffffffff8107776b>] run_timer_softirq+0x19b/0x340
[87578.531857] [<ffffffff81094ac0>] ? tick_sched_timer+0x0/0xc0
[87578.531865] [<ffffffff8108f723>] ? ktime_get+0x63/0xe0
[87578.531871] [<ffffffff8106e467>] __do_softirq+0xb7/0x1e0
[87578.531878] [<ffffffff810946aa>] ? tick_program_event+0x2a/0x30
[87578.531885] [<ffffffff810142ec>] call_softirq+0x1c/0x30
[87578.531891] [<ffffffff81015cb5>] do_softirq+0x65/0xa0
[87578.531897] [<ffffffff8106e305>] irq_exit+0x85/0x90
[87578.531904] [<ffffffff81549601>] smp_apic_timer_interrupt+0x71/0x9c
[87578.531910] [<ffffffff81013cb3>] apic_timer_interrupt+0x13/0x20
[87578.531914] <EOI> [<ffffffff8130fbbe>] ? acpi_idle_enter_simple+0x117/0x14b
[87578.531928] [<ffffffff8130fbb7>] ? acpi_idle_enter_simple+0x110/0x14b
[87578.531936] [<ffffffff8143a5a7>] ? cpuidle_idle_call+0xa7/0x140
[87578.531943] [<ffffffff81011e73>] ? cpu_idle+0xb3/0x110
[87578.531950] [<ffffffff8153e27e>] ? start_secondary+0xa8/0xaa


--
Mikael Abrahamsson email: [email protected]

2010-08-29 10:49:21

by Pekka Enberg

Subject: Re: 2.6.34.1 page allocation failure

On 24.8.2010 22.21, Mikael Abrahamsson wrote:
> On Tue, 24 Aug 2010, Pekka Enberg wrote:
>
>> It looks to me as if tcp_create_openreq_child() is able to cope with
>> the situation so the warning could be harmless. If that's the case,
>> we should probably stick a __GFP_NOWARN there.
>
> What about my situation? (a complete dmesg can be had at
> <http://swm.pp.se/dmesg.100809-2.txt.gz>)
This looks like something the kernel can't really recover from.
> [87578.494471] swapper: page allocation failure. order:0, mode:0x4020
> [87578.494476] Pid: 0, comm: swapper Not tainted 2.6.32-24-generic #39-Ubuntu
> [87578.494480] Call Trace:
> [87578.494483] <IRQ> [<ffffffff810fad0e>] __alloc_pages_slowpath+0x56e/0x580
> [87578.494499] [<ffffffff810fae7e>] __alloc_pages_nodemask+0x15e/0x1a0
> [87578.494506] [<ffffffff8112dba7>] alloc_pages_current+0x87/0xd0
> [87578.494511] [<ffffffff81133b17>] new_slab+0x2f7/0x310
> [87578.494516] [<ffffffff811363c1>] __slab_alloc+0x201/0x2d0
> [87578.494522] [<ffffffff81455fe6>] ? __netdev_alloc_skb+0x36/0x60
> [87578.494528] [<ffffffff81137408>] __kmalloc_node_track_caller+0xb8/0x180
> [87578.494532] [<ffffffff81455fe6>] ? __netdev_alloc_skb+0x36/0x60
> [87578.494536] [<ffffffff81455ca0>] __alloc_skb+0x80/0x190
> [87578.494540] [<ffffffff81455fe6>] __netdev_alloc_skb+0x36/0x60
> [87578.494564] [<ffffffffa008f5c7>] rtl8169_rx_interrupt+0x247/0x5b0 [r8169]
> [87578.494572] [<ffffffffa008faad>] rtl8169_poll+0x3d/0x270 [r8169]
> [87578.494580] [<ffffffff810397a9>] ? default_spin_lock_flags+0x9/0x10
> [87578.494586] [<ffffffff8146029f>] net_rx_action+0x10f/0x250
> [87578.494594] [<ffffffffa008d54e>] ? rtl8169_interrupt+0xde/0x1e0 [r8169]
> [87578.494600] [<ffffffff8106e467>] __do_softirq+0xb7/0x1e0
> [87578.494605] [<ffffffff810c52c0>] ? handle_IRQ_event+0x60/0x170
> [87578.494610] [<ffffffff810142ec>] call_softirq+0x1c/0x30
> [87578.494614] [<ffffffff81015cb5>] do_softirq+0x65/0xa0
> [87578.494618] [<ffffffff8106e305>] irq_exit+0x85/0x90
> [87578.494623] [<ffffffff81549515>] do_IRQ+0x75/0xf0
> [87578.494627] [<ffffffff81013b13>] ret_from_intr+0x0/0x11
> [87578.494629] <EOI> [<ffffffff8130f7cb>] ? acpi_idle_enter_c1+0xa3/0xc1
> [87578.494639] [<ffffffff8130f7aa>] ? acpi_idle_enter_c1+0x82/0xc1
> [87578.494646] [<ffffffff8143a5a7>] ? cpuidle_idle_call+0xa7/0x140
> [87578.494652] [<ffffffff81011e73>] ? cpu_idle+0xb3/0x110
> [87578.494657] [<ffffffff8153e27e>] ? start_secondary+0xa8/0xaa
> [87578.494660] Mem-Info:
> [87578.494662] Node 0 DMA per-cpu:
> [87578.494666] CPU 0: hi: 0, btch: 1 usd: 0
> [87578.494669] CPU 1: hi: 0, btch: 1 usd: 0
> [87578.494672] CPU 2: hi: 0, btch: 1 usd: 0
> [87578.494674] CPU 3: hi: 0, btch: 1 usd: 0
> [87578.494677] Node 0 DMA32 per-cpu:
> [87578.494680] CPU 0: hi: 186, btch: 31 usd: 173
> [87578.494683] CPU 1: hi: 186, btch: 31 usd: 87
> [87578.494686] CPU 2: hi: 186, btch: 31 usd: 168
> [87578.494689] CPU 3: hi: 186, btch: 31 usd: 63
> [87578.494691] Node 0 Normal per-cpu:
> [87578.494695] CPU 0: hi: 186, btch: 31 usd: 177
> [87578.494698] CPU 1: hi: 186, btch: 31 usd: 176
> [87578.494700] CPU 2: hi: 186, btch: 31 usd: 82
> [87578.494703] CPU 3: hi: 186, btch: 31 usd: 191
> [87578.494710] active_anon:22970 inactive_anon:6433 isolated_anon:0
> [87578.494711] active_file:916528 inactive_file:914736 isolated_file:0
> [87578.494713] unevictable:0 dirty:135959 writeback:24423 unstable:0
> [87578.494714] free:9990 slab_reclaimable:59767 slab_unreclaimable:11135
> [87578.494716] mapped:119343 shmem:985 pagetables:2113 bounce:0
> [87578.494719] Node 0 DMA free:15860kB min:20kB low:24kB high:28kB
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
> unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15272kB
> mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
> slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
> pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB
> pages_scanned:0 all_unreclaimable? yes
> [87578.494733] lowmem_reserve[]: 0 2866 7852 7852
> [87578.494738] Node 0 DMA32 free:21420kB min:4136kB low:5168kB
> high:6204kB active_anon:4056kB inactive_anon:5856kB
> active_file:1322360kB inactive_file:1320432kB unevictable:0kB
> isolated(anon):0kB isolated(file):0kB present:2935456kB mlocked:0kB
> dirty:190824kB writeback:31900kB mapped:157676kB shmem:0kB
> slab_reclaimable:107316kB slab_unreclaimable:15480kB kernel_stack:56kB
> pagetables:764kB unstable:0kB bounce:0kB writeback_tmp:0kB
> pages_scanned:0 all_unreclaimable? no
> [87578.494754] lowmem_reserve[]: 0 0 4986 4986
> [87578.494759] Node 0 Normal free:2680kB min:7192kB low:8988kB
> high:10788kB active_anon:87824kB inactive_anon:19876kB
> active_file:2343752kB inactive_file:2338512kB unevictable:0kB
> isolated(anon):0kB isolated(file):0kB present:5105664kB mlocked:0kB
> dirty:353012kB writeback:65792kB mapped:319696kB shmem:3940kB
> slab_reclaimable:131752kB slab_unreclaimable:29060kB
> kernel_stack:2160kB pagetables:7688kB unstable:0kB bounce:0kB
> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> [87578.494775] lowmem_reserve[]: 0 0 0 0
> [87578.494779] Node 0 DMA: 3*4kB 3*8kB 3*16kB 1*32kB 2*64kB 2*128kB
> 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15860kB
> [87578.494792] Node 0 DMA32: 789*4kB 765*8kB 589*16kB 1*32kB 1*64kB
> 4*128kB 4*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB = 21356kB
> [87578.494805] Node 0 Normal: 374*4kB 4*8kB 20*16kB 1*32kB 0*64kB
> 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2648kB
You seem to have 4K pages available still. I wonder why the page
allocator isn't giving them to SLUB?
> [87578.494818] 1832322 total pagecache pages
> [87578.494820] 0 pages in swap cache
> [87578.494823] Swap cache stats: add 0, delete 0, find 0/0
> [87578.494825] Free swap = 0kB
> [87578.494827] Total swap = 0kB
> [87578.531041] 2064368 pages RAM
> [87578.531044] 66019 pages reserved
> [87578.531046] 1501227 pages shared
> [87578.531048] 619257 pages non-shared
> [87578.531053] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
> [87578.531057] cache: kmalloc-4096, object size: 4096, buffer size:
> 4096, default order: 3, min order: 0
> [87578.531061] node 0: slabs: 1322, objs: 4129, free: 0
>
> This actually made the machine go offline for hours before it for some
> reason came back. The second time this happened it did not come back
> (waited 8 hours).
Do you see these out-of-memory problems with 2.6.35?
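One way to probe the question about the still-free 4K pages is to redo the kernel's watermark arithmetic by hand. The sketch below is a simplified Python model of the order-0 part of the 2.6.32-era zone watermark check, fed with the numbers from the Mem-Info dump quoted above. It assumes 4 KiB pages, the 2.6.32 values of the gfp bits it decodes, that the slow path retries against the "min" watermark, and that classzone is Normal (the mask has no __GFP_DMA/__GFP_DMA32 bit). The helper names (`decode_gfp`, `effective_min`, `watermark_ok`) are hypothetical and are not kernel functions.

```python
# Simplified model of the 2.6.32-era order-0 zone watermark check.
# ASSUMPTIONS: 4 KiB pages, 2.6.32 gfp bit values, "min" watermark in
# the slow path, classzone = Normal. Helper names are hypothetical.

PAGE_KB = 4

# A few gfp bits, as defined in 2.6.32 <linux/gfp.h> (assumed).
GFP_BITS = {
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x200: "__GFP_NOWARN",
    0x4000: "__GFP_COMP",
}


def decode_gfp(mode):
    """Return the names of the known bits set in a gfp mask."""
    return [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]


def effective_min(min_pages, mode):
    """Lower the min watermark the way an atomic caller gets it lowered."""
    m = min_pages
    if mode & 0x20:          # __GFP_HIGH -> ALLOC_HIGH: min -= min / 2
        m -= m // 2
    if not mode & 0x10:      # no __GFP_WAIT -> ALLOC_HARDER: min -= min / 4
        m -= m // 4
    return m


def watermark_ok(free_pages, min_pages, reserve_pages, mode, order=0):
    """Order-0 test: fail if free <= min + lowmem_reserve[classzone]."""
    free = free_pages - (1 << order) + 1
    return free > effective_min(min_pages, mode) + reserve_pages


# (free kB, min kB, lowmem_reserve[Normal] in pages), copied from the dump.
ZONES = {
    "DMA": (15860, 20, 7852),
    "DMA32": (21420, 4136, 4986),
    "Normal": (2680, 7192, 0),
}

MODE = 0x4020  # "swapper: page allocation failure. order:0, mode:0x4020"

if __name__ == "__main__":
    print("mode 0x%x = %s" % (MODE, " | ".join(decode_gfp(MODE))))
    for name, (free_kb, min_kb, reserve) in ZONES.items():
        free, mn = free_kb // PAGE_KB, min_kb // PAGE_KB
        eff = effective_min(mn, MODE)
        verdict = "ok" if watermark_ok(free, mn, reserve, MODE) else "FAIL"
        print(f"{name:6}: free={free} needs > {eff}+{reserve} -> {verdict}")
```

Under these assumptions every zone fails. Normal has 670 free pages against an atomic-adjusted minimum of 675, while DMA32 (5355 free vs. 388 + 4986) and DMA (3965 free vs. 3 + 7852) are ruled out by the lowmem_reserve they hold back from Normal-zone requests. If the model is accurate, the free 4K pages are being withheld as reserve rather than being unavailable.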

2010-08-29 12:38:24

by Mikael Abrahamsson

Subject: Re: 2.6.34.1 page allocation failure

On Sun, 29 Aug 2010, Pekka Enberg wrote:

> Do you see these out-of-memory problems with 2.6.35?

Haven't tried it.

Has there been substantial work done in this area for 2.6.35, so that if
I reproduce the problem there, someone will look into it in earnest?
Since I'll most likely have to compile a new kernel, are there any debug
options I should enable to provide more information to aid fault finding?

I'll start with the .config file from the Ubuntu 10.04 2.6.32 kernel and
run make oldconfig from there.

--
Mikael Abrahamsson email: [email protected]

2010-08-29 13:17:36

by Pekka Enberg

Subject: Re: 2.6.34.1 page allocation failure

On Sun, 29 Aug 2010, Pekka Enberg wrote:
>> Do you see these out-of-memory problems with 2.6.35?
On 29.8.2010 15.38, Mikael Abrahamsson wrote:
> Haven't tried it.
>
> Has there been substantial work done in this area for 2.6.35, so that
> if I reproduce the problem there, someone will look into it in
> earnest? Since I'll most likely have to compile a new kernel, are
> there any debug options I should enable to provide more information to
> aid fault finding?
There aren't any debug options that need to be enabled. The reason I'm
asking is because we had a bunch of similar issues being reported
earlier that got fixed and it's been calm for a while. That's why it
would be interesting to know if 2.6.35 or 2.6.36-rc2 (if it's not too
unstable to test) fixes things.

Pekka

2010-08-29 15:37:17

by Mikael Abrahamsson

Subject: Re: 2.6.34.1 page allocation failure

On Sun, 29 Aug 2010, Pekka Enberg wrote:

> There aren't any debug options that need to be enabled. The reason I'm
> asking is because we had a bunch of similar issues being reported
> earlier that got fixed and it's been calm for a while. That's why it
> would be interesting to know if 2.6.35 or 2.6.36-rc2 (if it's not too
> unstable to test) fixes things.

OK, I have installed 2.6.35 now (found a backport from Ubuntu 10.10 for
10.04); I just need to reboot at some convenient time.

--
Mikael Abrahamsson email: [email protected]

2010-08-31 20:28:19

by Mikael Abrahamsson

Subject: Re: 2.6.34.1 page allocation failure

On Sun, 29 Aug 2010, Mikael Abrahamsson wrote:

> On Sun, 29 Aug 2010, Pekka Enberg wrote:
>
>> There aren't any debug options that need to be enabled. The reason I'm
>> asking is because we had a bunch of similar issues being reported earlier
>> that got fixed and it's been calm for a while. That's why it would be
>> interesting to know if 2.6.35 or 2.6.36-rc2 (if it's not too unstable to
>> test) fixes things.
>
> OK, I have installed 2.6.35 now (found a backport from Ubuntu 10.10 for
> 10.04); I just need to reboot at some convenient time.

I just rebooted and ran a network+disk load similar to the one that
produced the "swapper: page allocation failure" messages before, and I
couldn't reproduce the problem with 2.6.35:

2.6.35-19-generic #25~lucid1-Ubuntu SMP Wed Aug 25 03:50:05 UTC 2010 x86_64 GNU/Linux

Running "sync" in the middle took more than 5 minutes to complete (2
hung-task messages in dmesg), but at least nothing ran out of memory.

Considering the number of people running 2.6.32 now and in the future,
it still worries me that this problem is present in 2.6.32 (and earlier
kernels as well).

--
Mikael Abrahamsson email: [email protected]