2004-04-24 18:37:22

by Bastian Blank

[permalink] [raw]
Subject: page allocation failures with 2.6.5 on s390

Today I see the following error message in the logs of two machines
with 2.6.5 on s390, both 31 and 64 bit.

| swapper: page allocation failure. order:2, mode:0x20
| 0072d8f0 00028b18 00405490 80028bbc 80028dac 0072d868 00392f00 0039068a
| 00000002 00000001 0039a400 0000001a
| Call Trace:
| [<0000000000049d30>] __alloc_pages+0x318/0x35c
| [<0000000000049dbe>] __get_free_pages+0x4a/0x78
| [<000000000004d838>] cache_grow+0xe0/0x39c
| [<000000000004dcc0>] cache_alloc_refill+0x1cc/0x2d8
| [<000000000004e1f2>] __kmalloc+0xaa/0xb8
| [<000000000023e810>] alloc_skb+0x5c/0xe4
| [<00000000002080ca>] qeth_read_in_buffer+0xd16/0xda0
| [<0000000000212e22>] qeth_qdio_input_handler+0x42a/0x57c
| [<00000000001de17c>] qdio_handler+0xad4/0x10f4
| [<00000000001d46da>] ccw_device_call_handler+0x8a/0x94
| [<00000000001d3922>] ccw_device_irq+0x7a/0xb4
| [<00000000001d42de>] io_subchannel_irq+0x7e/0xc4
| [<00000000001cfcd2>] do_IRQ+0x18a/0x1b0
| [<0000000000020072>] io_return+0x0/0x10

It causes a lot of state-D processes on one machine.

Is that problem known?

Bastian

--
There's another way to survive. Mutual trust -- and help.
-- Kirk, "Day of the Dove", stardate unknown


Attachments:
(No filename) (1.17 kB)
signature.asc (197.00 B)
Digital signature
Download all attachments

2004-04-24 20:22:32

by Andrew Morton

[permalink] [raw]
Subject: Re: page allocation failures with 2.6.5 on s390

Bastian Blank <[email protected]> wrote:
>
> Today I see the following error message in the logs of two machines
> with 2.6.5 on s390, both 31 and 64 bit.
>
> | swapper: page allocation failure. order:2, mode:0x20
> | 0072d8f0 00028b18 00405490 80028bbc 80028dac 0072d868 00392f00 0039068a
> | 00000002 00000001 0039a400 0000001a
> | Call Trace:
> | [<0000000000049d30>] __alloc_pages+0x318/0x35c
> | [<0000000000049dbe>] __get_free_pages+0x4a/0x78
> | [<000000000004d838>] cache_grow+0xe0/0x39c
> | [<000000000004dcc0>] cache_alloc_refill+0x1cc/0x2d8
> | [<000000000004e1f2>] __kmalloc+0xaa/0xb8
> | [<000000000023e810>] alloc_skb+0x5c/0xe4
> | [<00000000002080ca>] qeth_read_in_buffer+0xd16/0xda0
> | [<0000000000212e22>] qeth_qdio_input_handler+0x42a/0x57c
> | [<00000000001de17c>] qdio_handler+0xad4/0x10f4
> | [<00000000001d46da>] ccw_device_call_handler+0x8a/0x94
> | [<00000000001d3922>] ccw_device_irq+0x7a/0xb4
> | [<00000000001d42de>] io_subchannel_irq+0x7e/0xc4
> | [<00000000001cfcd2>] do_IRQ+0x18a/0x1b0
> | [<0000000000020072>] io_return+0x0/0x10

This is pretty much unavoidable - the kernel is looking for four
physically-contiguous free pages at interrupt time, and there aren't any.

You can reduce its frequency by increasing /proc/sys/vm/min_free_kbytes.

The best fix would be to alter the driver to use order-0 pages if that's
possible.

> It causes a lot of state-D processes on one machine.

This should not happen. If it happens again, try to capture an all-task
backtrace into the logs with

echo t > /proc/sysrq-trigger


2004-04-25 08:18:50

by Bastian Blank

[permalink] [raw]
Subject: Re: page allocation failures with 2.6.5 on s390

On Sat, Apr 24, 2004 at 01:21:59PM -0700, Andrew Morton wrote:
> Bastian Blank <[email protected]> wrote:
> > Today I see the following error message in the logs of two machines
> > with 2.6.5 on s390, both 31 and 64 bit.
> This is pretty much unavoidable - the kernel is looking for four
> physically-contiguous free pages at interrupt time, and there aren't any.
> You can reduce its frequency by increasing /proc/sys/vm/min_free_kbytes.
> The best fix would be to alter the driver to use order-0 pages if that's
> possible.

The pages are 4KiB large, so it allocates 16KiB at once. So the problem is rather clear:
| 3: eth0: <BROADCAST,MULTICAST,UP> mtu 8192 qdisc pfifo_fast qlen 100

> > It causes a lot of state-D processes on one machine.
> This should not happen. If it happens again, try to capture an all-task
> backtrace into the logs with
> echo t > /proc/sysrq-trigger

I'm currently not sure if this was the problem causing D process. I
reread the logs and found the following:
| Unable to handle kernel pointer dereference at virtual kernel address 0000000000000000
| Oops: 0004 [#1]
| CPU: 0 Not tainted
| Process pdflush (pid: 33, task: 000000000fa58e90, ksp: 000000000fa53b98)
| Krnl PSW : 0700200180000000 00000000001022b0 (nfs3_request_init+0x34/0x50)
| Krnl GPRS: 0000000000000000 0000000000000001 0000000000000000 0000000000000028
| 0000000000000001 0000000000000000 0000000000001000 000000000672f548
| 0000000000403620 000000000dceb600 0000000000000000 0000000000823020
| 00000000061bfd00 000000000029afb8 00000000001022a0 000000000fa53178
| Krnl Code: ba 54 30 00 a7 44 ff fc e3 20 c0 30 00 24 e3 40 f1 10 00 04
| Call Trace:
| [nfs_create_request+216/312] nfs_create_request+0xd8/0x138
| [nfs_update_request+678/1236] nfs_update_request+0x2a6/0x4d4
| [nfs_writepage_async+44/264] nfs_writepage_async+0x2c/0x108
| [nfs_writepage+466/476] nfs_writepage+0x1d2/0x1dc
| [mpage_writepages+796/1084] mpage_writepages+0x31c/0x43c
| [nfs_writepages+56/196] nfs_writepages+0x38/0xc4
| [do_writepages+46/76] do_writepages+0x2e/0x4c
| [__sync_single_inode+200/744] __sync_single_inode+0xc8/0x2e8
| [__writeback_single_inode+246/260] __writeback_single_inode+0xf6/0x104
| [sync_sb_inodes+472/792] sync_sb_inodes+0x1d8/0x318
| [writeback_inodes+144/248] writeback_inodes+0x90/0xf8
| [wb_kupdate+182/324] wb_kupdate+0xb6/0x144
| [__pdflush+430/700] __pdflush+0x1ae/0x2bc
| [pdflush+46/60] pdflush+0x2e/0x3c
| [kthread+228/236] kthread+0xe4/0xec
| [kernel_thread_starter+20/28] kernel_thread_starter+0x14/0x1c

Bastian

--
Insults are effective only where emotion is present.
-- Spock, "Who Mourns for Adonais?" stardate 3468.1


Attachments:
(No filename) (2.64 kB)
signature.asc (197.00 B)
Digital signature
Download all attachments

2004-04-25 10:50:25

by Arnd Bergmann

[permalink] [raw]
Subject: Re: page allocation failures with 2.6.5 on s390

Bastian Blank <[email protected]> wrote:
> On Sat, Apr 24, 2004 at 01:21:59PM -0700, Andrew Morton wrote:
> > Bastian Blank <[email protected]> wrote:
> > > Today I see the following error message in the logs of two machines
> > > with 2.6.5 on s390, both 31 and 64 bit.
> > This is pretty much unavoidable - the kernel is looking for four
> > physically-contiguous free pages at interrupt time, and there aren't any.
> > You can reduce its frequency by increasing /proc/sys/vm/min_free_kbytes.
> > The best fix would be to alter the driver to use order-0 pages if that's
> > possible.
>
> The pages are 4KiB large, so it allocates 16KiB at once. So the problem is =
> rather clear:
> | 3: eth0: <BROADCAST,MULTICAST,UP> mtu 8192 qdisc pfifo_fast qlen 100

Is this the vanilla kernel? Note that 2.6.6-rc2 contains a different qeth
driver (the same that you get if you apply the 2.6.5 patches from
http://www10.software.ibm.com/developerworks/opensource/linux390), which
might help you here.
Getting the mtu size below 7500 could also give you relief if that causes
order-1 allocations instead of order-2

> I'm currently not sure if this was the problem causing D process. I
> reread the logs and found the following:
> | Unable to handle kernel pointer dereference at virtual kernel address 000=
> 0000000000000

Since this is the pdflush thread that was killed, it's just a matter of time
until everything hangs.

> | Oops: 0004 [#1]
> | CPU: 0 Not tainted
> | Process pdflush (pid: 33, task: 000000000fa58e90, ksp: 000000000fa53b98)
> | Krnl PSW : 0700200180000000 00000000001022b0 (nfs3_request_init+0x34/0x50)
> | Krnl GPRS: 0000000000000000 0000000000000001 0000000000000000 0000000000000028
> | 0000000000000001 0000000000000000 0000000000001000 000000000672f548
> | 0000000000403620 000000000dceb600 0000000000000000 0000000000823020
> | 00000000061bfd00 000000000029afb8 00000000001022a0 000000000fa53178
> | Krnl Code: ba 54 30 00 a7 44 ff fc e3 20 c0 30 00 24 e3 40 f1 10 00 04

ba 54 30 00 means 'cs %r5,%r4,0(%r3)' and accessing %r3 failed (it contains
0x28, as you can see).

In the corresponding code (from fs/nfs/nfs3proc.c), you can see what this
means:
void nfs3_request_init(struct nfs_page *req, struct file *filp)
{
req->wb_cred = get_rpccred(nfs_cred(req->wb_inode, filp));
}
nfs_cread(...) returns NULL, and get_rpccred(...) accesses this at
offset 0x28 with atomic_inc (aka cs).
Someone familiar with nfs needs to look into how this can happen,
but it does not appear to be s390 specific.

> | Call Trace:
> | [nfs_create_request+216/312] nfs_create_request+0xd8/0x138
> | [nfs_update_request+678/1236] nfs_update_request+0x2a6/0x4d4
> | [nfs_writepage_async+44/264] nfs_writepage_async+0x2c/0x108
> | [nfs_writepage+466/476] nfs_writepage+0x1d2/0x1dc
> | [mpage_writepages+796/1084] mpage_writepages+0x31c/0x43c
> | [nfs_writepages+56/196] nfs_writepages+0x38/0xc4
> | [do_writepages+46/76] do_writepages+0x2e/0x4c
> | [__sync_single_inode+200/744] __sync_single_inode+0xc8/0x2e8
> | [__writeback_single_inode+246/260] __writeback_single_inode+0xf6/0x104
> | [sync_sb_inodes+472/792] sync_sb_inodes+0x1d8/0x318
> | [writeback_inodes+144/248] writeback_inodes+0x90/0xf8
> | [wb_kupdate+182/324] wb_kupdate+0xb6/0x144
> | [__pdflush+430/700] __pdflush+0x1ae/0x2bc
> | [pdflush+46/60] pdflush+0x2e/0x3c
> | [kthread+228/236] kthread+0xe4/0xec
> | [kernel_thread_starter+20/28] kernel_thread_starter+0x14/0x1c

Arnd <><


Attachments:
(No filename) (3.42 kB)
(No filename) (189.00 B)
signature
Download all attachments