2004-11-02 07:16:45

by Brad Campbell

Subject: 2.6.10-rc1-bk page allocation failure. order:2, mode:0x20

G'day all,

I'm still getting quite a lot of these come up in the logs when the system is under mild load.
I suspect it might have something to do with running an MTU of 9000 on the main ethernet port which
is directly feeding a workstation with an NFS root (and thus gets quite a high load at times)
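
A rough way to check the MTU suspicion is to work out how large a receive buffer a 9000-byte MTU needs and what page order that implies. The sketch below is an illustration only. It assumes 4 KiB pages (i686), power-of-two kmalloc size classes, and a hypothetical 64 bytes of per-buffer overhead (OVERHEAD is made up for the example and is not taken from the sk98lin driver).

```python
PAGE_SIZE = 4096   # assumption: i686 with 4 KiB pages
OVERHEAD = 64      # hypothetical per-buffer overhead, not a driver constant


def next_pow2(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def alloc_order(nbytes, page_size=PAGE_SIZE):
    """Page order needed if the request is rounded up to a power-of-two size class."""
    size_class = next_pow2(nbytes)
    pages = max(1, size_class // page_size)
    return pages.bit_length() - 1  # log2 of a power-of-two page count


for mtu in (1500, 9000):
    print(mtu, alloc_order(mtu + OVERHEAD))
```

Under these assumptions a 1500-byte MTU fits within a single page (order 0), while a 9000-byte MTU rounds up to a 16 KiB buffer, i.e. four physically contiguous pages (order 2), which would match the order:2 in the failure messages.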

It's not so much an issue but it does cause the workstation to stall for up to a second while it
waits for data every time it occurs.

The loaded ethernet port is this one, on a PCI card:

0000:00:0d.0 Ethernet controller: Marvell Technology Group Ltd. Yukon Gigabit Ethernet 10/100/1000Base-T Adapter (rev 12)

This started rearing its ugly head when I moved from 2.6.5 to 2.6.9-preX and persists with BK as of
about 2 days ago.

Regards,
Brad

srv:/home/brad# uname -a
Linux srv 2.6.10-rc1 #2 Sun Oct 31 10:56:30 GST 2004 i686 GNU/Linux

srv:/home/brad# free
             total       used       free     shared    buffers     cached
Mem:        514676     251088     263588          0      13280     156328
-/+ buffers/cache:      81480     433196
Swap:       987988        284     987704

srv:/home/brad# lspci
0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge (rev 80)
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
0000:00:09.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T [Marvell] (rev 12)
0000:00:0b.0 Unknown mass storage controller: Promise Technology, Inc. PDC20318 (SATA150 TX4) (rev 02)
0000:00:0c.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 46)
0000:00:0d.0 Ethernet controller: Marvell Technology Group Ltd. Yukon Gigabit Ethernet 10/100/1000Base-T Adapter (rev 12)
0000:00:0e.0 Unknown mass storage controller: Promise Technology, Inc. PDC20318 (SATA150 TX4) (rev 02)
0000:00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
0000:00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
0000:00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
0000:00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
0000:00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
0000:00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [K8T800 South]
0000:00:13.0 Unknown mass storage controller: Promise Technology, Inc. PDC20318 (SATA150 TX4) (rev 02)
0000:01:00.0 VGA compatible controller: Silicon Integrated Systems [SiS] 315PRO PCI/AGP VGA Display Adapter

srv:/home/brad# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0E:A6:41:45:94
          inet addr:192.168.3.82  Bcast:192.168.3.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:18 Memory:de800000-0

eth1      Link encap:Ethernet  HWaddr 00:04:E2:8E:1E:AD
          inet addr:192.168.2.82  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:5207091 errors:430 dropped:430 overruns:0 frame:0
          TX packets:4805338 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:322428903 (307.4 MiB)  TX bytes:2838601552 (2.6 GiB)
          Interrupt:16 Memory:dc800000-0

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:138 errors:0 dropped:0 overruns:0 frame:0
          TX packets:138 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:9378 (9.1 KiB)  TX bytes:9378 (9.1 KiB)


srv:/home/brad# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda1              19G  9.4G  8.1G  54% /
tmpfs                 252M     0  252M   0% /dev/shm
/dev/hda3             165G   38G  119G  24% /raid0
/dev/md0              2.1T  2.1T   12G 100% /raid
/dev/md2              459G  315G  145G  69% /raid2

swapper: page allocation failure. order:2, mode:0x20
[<c0138a98>] __alloc_pages+0x1c8/0x390
[<c0138c7f>] __get_free_pages+0x1f/0x40
[<c013c18d>] kmem_getpages+0x1d/0xb0
[<c013ce26>] cache_grow+0xb6/0x170
[<c013d03e>] cache_alloc_refill+0x15e/0x220
[<c013d400>] __kmalloc+0x80/0xa0
[<c02e61e7>] alloc_skb+0x47/0xe0
[<e08bc871>] FillRxDescriptor+0x31/0xc0 [sk98lin]
[<e08bc824>] FillRxRing+0x54/0x70 [sk98lin]
[<e08bb89a>] SkGeIsrOnePort+0x17a/0x190 [sk98lin]
[<c0132b24>] handle_IRQ_event+0x34/0x70
[<c0132c42>] __do_IRQ+0xe2/0x160
[<c0106666>] do_IRQ+0x26/0x40
[<c0104aa8>] common_interrupt+0x18/0x20
[<c0101fb0>] default_idle+0x0/0x30
[<c0101fd3>] default_idle+0x23/0x30
[<c010204a>] cpu_idle+0x3a/0x60
[<c047e7c2>] start_kernel+0x142/0x160
[<c047e3b0>] unknown_bootoption+0x0/0x1b0
syslogd: page allocation failure. order:2, mode:0x20
[<c0138a98>] __alloc_pages+0x1c8/0x390
[<c0138c7f>] __get_free_pages+0x1f/0x40
[<c013c18d>] kmem_getpages+0x1d/0xb0
[<c013ce26>] cache_grow+0xb6/0x170
[<c013d03e>] cache_alloc_refill+0x15e/0x220
[<c013d400>] __kmalloc+0x80/0xa0
[<c02e61e7>] alloc_skb+0x47/0xe0
[<e08bc871>] FillRxDescriptor+0x31/0xc0 [sk98lin]
[<e08bc824>] FillRxRing+0x54/0x70 [sk98lin]
[<e08bb89a>] SkGeIsrOnePort+0x17a/0x190 [sk98lin]
[<c0132b24>] handle_IRQ_event+0x34/0x70
[<c0132c42>] __do_IRQ+0xe2/0x160
[<c0106666>] do_IRQ+0x26/0x40
[<c0104aa8>] common_interrupt+0x18/0x20
syslogd: page allocation failure. order:2, mode:0x20
[<c0138a98>] __alloc_pages+0x1c8/0x390
[<c0104e15>] show_trace+0x35/0x90
[<c0138c7f>] __get_free_pages+0x1f/0x40
[<c013c18d>] kmem_getpages+0x1d/0xb0
[<c013ce26>] cache_grow+0xb6/0x170
[<c013d03e>] cache_alloc_refill+0x15e/0x220
[<c0138c7f>] __get_free_pages+0x1f/0x40
[<c013d400>] __kmalloc+0x80/0xa0
[<c02e61e7>] alloc_skb+0x47/0xe0
[<e08bc871>] FillRxDescriptor+0x31/0xc0 [sk98lin]
[<e08bc824>] FillRxRing+0x54/0x70 [sk98lin]
[<e08c0584>] SkDrvEvent+0xa44/0xaa0 [sk98lin]
[<e08cd953>] SkGeSirqIsr+0x1d3/0x890 [sk98lin]
[<e08d06e4>] SkEventDispatcher+0xc4/0x160 [sk98lin]
[<e08bb7cc>] SkGeIsrOnePort+0xac/0x190 [sk98lin]
[<c0132b24>] handle_IRQ_event+0x34/0x70
[<c0132c42>] __do_IRQ+0xe2/0x160
[<c0106666>] do_IRQ+0x26/0x40
[<c0104aa8>] common_interrupt+0x18/0x20
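
For readers trying to interpret the header lines of traces like the ones above, here is a small sketch that tallies failures per process and decodes the mode value. The flag table is an assumption recalled from the 2.6-era include/linux/gfp.h and should be checked against the tree actually in use; GFP_FLAGS, decode_mode and summarize are names made up for this example.

```python
import re
from collections import Counter

# Assumption: bit values as recalled from 2.6-era include/linux/gfp.h.
# Verify against the kernel source you are running before relying on them.
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

HEADER = re.compile(
    r"^(\S+): page allocation failure\. order:(\d+), mode:(0x[0-9a-fA-F]+)"
)


def decode_mode(mode):
    """Names of the known flag bits set in mode, lowest bit first."""
    return [name for bit, name in sorted(GFP_FLAGS.items()) if mode & bit]


def summarize(log_text):
    """Count failure headers, keyed by (process name, order, mode)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = HEADER.match(line.strip())
        if m:
            counts[(m.group(1), int(m.group(2)), int(m.group(3), 16))] += 1
    return counts


sample = """\
swapper: page allocation failure. order:2, mode:0x20
syslogd: page allocation failure. order:2, mode:0x20
syslogd: page allocation failure. order:2, mode:0x20
"""

for (comm, order, mode), n in sorted(summarize(sample).items()):
    pages = 1 << order  # an order-N request asks for 2**N contiguous pages
    print(comm, n, pages, decode_mode(mode))
```

If the assumed flag values hold, mode:0x20 is __GFP_HIGH with __GFP_WAIT clear, meaning the allocator may not sleep or reclaim. That would fit the traces, where alloc_skb is reached from the sk98lin interrupt handler, and the process name in the header is simply whatever happened to be running when the interrupt arrived.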


2004-11-02 08:03:17

by Nick Piggin

Subject: Re: 2.6.10-rc1-bk page allocation failure. order:2, mode:0x20

Brad Campbell wrote:
> G'day all,
>
> I'm still getting quite a lot of these come up in the logs when the
> system is under mild load.
> I suspect it might have something to do with running an MTU of 9000 on
> the main ethernet port which is directly feeding a workstation with an
> NFS root (and thus gets quite a high load at times)
>
> It's not so much an issue but it does cause the workstation to stall for
> up to a second while it waits for data every time it occurs.
>
> The loaded ethernet port is this one, on a PCI card:
>
> 0000:00:0d.0 Ethernet controller: Marvell Technology Group Ltd. Yukon
> Gigabit Ethernet 10/100/1000Base-T Adapter (rev 12)
>
> This started rearing its ugly head when I moved from 2.6.5 to 2.6.9-preX
> and persists with BK as of about 2 days ago.
>

There are patches in the newest -mm kernels that should help the
problem. If you're willing to test them, the feedback would be
welcome.

Thanks
Nick

2004-11-02 08:17:50

by Brad Campbell

Subject: Re: 2.6.10-rc1-bk page allocation failure. order:2, mode:0x20

Nick Piggin wrote:
> Brad Campbell wrote:
>
>> G'day all,
>>
>> I'm still getting quite a lot of these come up in the logs when the
>> system is under mild load.
>> I suspect it might have something to do with running an MTU of 9000 on
>> the main ethernet port which is directly feeding a workstation with an
>> NFS root (and thus gets quite a high load at times)
>>
>> It's not so much an issue but it does cause the workstation to stall
>> for up to a second while it waits for data every time it occurs.
>>
>> The loaded ethernet port is this one, on a PCI card:
>>
>> 0000:00:0d.0 Ethernet controller: Marvell Technology Group Ltd. Yukon
>> Gigabit Ethernet 10/100/1000Base-T Adapter (rev 12)
>>
>> This started rearing its ugly head when I moved from 2.6.5 to
>> 2.6.9-preX and persists with BK as of about 2 days ago.
>>
>
> There are patches in the newest -mm kernels that should help the
> problem. If you're willing to test them, the feedback would be
> welcome.

Always willing to test specific patches. Can I just grab the broken out patches, or pull some
specific csets from a bk tree? I'm not particularly keen on running an -mm kernel on this box if I
can avoid it (It's a server in 24hr use with 2.5TB of data where the backup media is 7,000km away).

Regards,
Brad

2004-11-02 08:46:59

by Brad Campbell

Subject: Re: 2.6.10-rc1-bk page allocation failure. order:2, mode:0x20

Nick Piggin wrote:

>> Always willing to test specific patches. Can I just grab the broken
>> out patches, or pull some specific csets from a bk tree? I'm not
>> particularly keen on running an -mm kernel on this box if I can avoid
>> it (It's a server in 24hr use with 2.5TB of data where the backup
>> media is 7,000km away).
>>
>
> OK fair enough.
>
> Here is a rollup of the 3 patches that are supposed to help with
> the problem. It is diffed against 2.6.10-rc1-bk8, which you probably
> wouldn't want to run either.
>
> Not sure how cleanly it will apply onto 2.6.9... shouldn't be too bad
> I think.
>

I'm actually running a reasonably recent BK pull of 2.6.10-rc1, from a couple of days ago, but I did
some pretty severe testing and evaluation with my raid disks removed and replaced with spares before
I let it loose on the real media. I have applied those patches and I'll beat on it for a few hours
and see how it goes. I have some well-defined cron jobs that make the problem easy to reproduce.

Regards,
Brad