2004-01-19 11:34:06

by Oliver Kiddle

Subject: page allocation failure

There seems to be a problem with 2.6.1 on my machine. It will be fine
for a matter of a few days and then this error will appear on the
console. The message then appears repeatedly and continuously. The
first I know is that my remote login shell ceases to respond. About the
only thing I can do is switch between virtual consoles (until I hit the
reset button).

/var/log/messages shows:
kernel: cat: page allocation failure. order:0, mode:0x20

Then the same for lots of other processes (pdflush, syslogd, klogd,
kswapd0, nfsd to name a few). I expect that after a point it is unable
to even log stuff so syslog is quiet after a while.
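
A minimal sketch for seeing which processes logged the failure and how often. It assumes the usual syslog layout, where the process name is the word immediately before "page allocation failure"; the function name `failures_by_process` is made up for illustration.

```shell
#!/bin/sh
# Sketch: count "page allocation failure" lines per process name.
# Assumes the process name is the word right before "page allocation",
# as in: "kernel: cat: page allocation failure. order:0, mode:0x20".
# failures_by_process is an illustrative name, not an existing tool.
failures_by_process() {
    awk '/page allocation failure/ {
        for (i = 2; i <= NF; i++)
            if ($i == "page" && $(i + 1) == "allocation") {
                name = $(i - 1)
                sub(/:$/, "", name)      # drop the trailing colon
                count[name]++
            }
    }
    END { for (n in count) print count[n], n }' "$1" |
        sort -k1,1nr -k2                 # most frequent first
}
```

Usage would be something like `failures_by_process /var/log/messages`.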

It has happened three times now and on all occasions, I was untarring a
huge file on an XFS partition. I assume the problem is something to do
with VM. The machine has 1GB of RAM which should be plenty. For the
most part it is just serving NFS and NIS (to no more than about 10
clients).

The hardware is a Dell PowerEdge 600SC. It's a new machine that never
ran 2.4 before. I can supply any other information that might help in
diagnosing the problem. I don't subscribe so please CC me in any reply
(but I'll keep an eye on the archives).

If anyone can suggest any /proc variables I might change to reduce the
risk of it doing this again, I would appreciate it. I tried increasing
/proc/sys/vm/min_free_kbytes after the first time this happened. Not
that I understand what that does: I searched the archives and it was
mentioned in a vaguely relevant looking post.

Cheers

Oliver Kiddle


2004-01-19 14:54:37

by Mike Fedyk

Subject: Re: page allocation failure

On Mon, Jan 19, 2004 at 12:36:02PM +0100, Oliver Kiddle wrote:
> If anyone can suggest any /proc variables I might change to reduce the
> risk of it doing this again, I would appreciate it. I tried increasing
> /proc/sys/vm/min_free_kbytes after the first time this happened. Not
> that I understand what that does: I searched the archives and it was
> mentioned in a vaguely relevant looking post.

Try running "vmstat 1" and output that to a file, and post your /proc/meminfo.

Do you start getting the error before a couple of days have passed, or is it
just that you can't log in after that amount of time?
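
One way to collect that data, as a rough sketch. The function name `capture_vm_state`, the output directory and the sample count are assumptions for illustration; only `vmstat 1` and `/proc/meminfo` come from the request above.

```shell
#!/bin/sh
# Sketch: save a meminfo snapshot and a run of one-second vmstat samples.
# capture_vm_state and its arguments are illustrative, not a standard tool.
#   $1 = output directory, $2 = number of vmstat samples,
#   $3 = meminfo file to copy (defaults to /proc/meminfo)
capture_vm_state() {
    outdir=$1
    count=$2
    meminfo=${3:-/proc/meminfo}
    mkdir -p "$outdir" || return 1
    # Snapshot of the memory counters at the start of the run.
    cat "$meminfo" > "$outdir/meminfo.txt" || return 1
    # One sample per second; skipped if vmstat is not installed.
    if command -v vmstat >/dev/null 2>&1; then
        vmstat 1 "$count" > "$outdir/vmstat.log"
    fi
}
```

For example, `capture_vm_state /var/tmp/vmlog 3600` would record an hour of samples. If the machine hangs, the last few samples may never reach the disk, so writing the log somewhere other than the affected filesystem is probably wise.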

2004-01-19 17:25:00

by Oliver Kiddle

Subject: Re: page allocation failure

Mike Fedyk wrote:
>
> Try running "vmstat 1" and output that to a file, and post your /proc/meminfo.
>
> Do you start getting the error before a couple of days have passed, or is it
> just that you can't log in after that amount of time?

I can't log in immediately following the first occurrence of the error.
I can type in a username at the login prompt but nothing happens after
pressing enter. Two days was just a rough idea of how long the system
could be up before going down. It has gone down twice since I posted
earlier so it wasn't even vaguely an accurate figure. On both
occasions, there has not been a "page allocation failure" error though.

These last two times, I was running xfsdump along with some nfsd activity.
I had the following, possibly unrelated, messages on the console.

st0: Block limits 1 - 16777215 bytes.
spurious 8259A interrupt IRQ7

I've put /proc/meminfo below though that is from the beginning while
everything is still fine. The vmstat output is more interesting and I
have it captured for the period when it went down.

vmstat output starts off like this:
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 2  0      0 947908   5792  37128    0    0    54    49  1072   121  0  2 96  2

The free column then slowly drops.

Shortly before the end is this sequence:

 2  1      0  57036   2412  62044    0    0  2224   512  1950   188  1 70 20  9
 0  0      0  55104   1284  64096    0    0  2204   320  1663   154  0 51 42  7
 2  1      0  53048     44  67168    0    0  3080     0  1939    32  0 59 38  3
 2  0   1388  49748     56  69592    0 1388  2796  1393  1909   161  1 64 15 19
 3  2   1928  45828     60  72376   64 1208  3056  1208  2146   184  3 70  2 25
 1  4   1464  94700     60  22088    0  808  3428   828  1873   213  1 58  0 41
 0  1   1176  93716     60  23060  356  316  1596   429  2079   342  0 56  4 40
 3  3   1176  94116     64  22368  144    0  1124   311  6419  1369  0  6  1 93
 1  2   1176 109176     36   7360    0    0   828   159 29189  7978  0  1  0 99

The sequence above is the first time the swpd column is non-zero. The figures don't
change a vast amount after that and only 25 samples later, the very last
sample I got looked like this:

 0  1   1176 109248     40   7364    0    0     0     0  1009    25  0  1  0 99

I can send you the full output if you want (70kb compressed).
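
For anyone going through a capture like this, a small sketch that locates the first sample where swpd goes non-zero. It assumes the column order shown in the header above (swpd is the third field); the function name `first_swap_sample` is invented for illustration.

```shell
#!/bin/sh
# Sketch: print "<line number>: <sample>" for the first vmstat sample whose
# swpd column (third field) is non-zero. Header lines are skipped because
# their first field is not a number. first_swap_sample is an illustrative name.
first_swap_sample() {
    awk '$1 ~ /^[0-9]+$/ && $3 > 0 { print NR ": " $0; exit }' "$1"
}
```

Running it as `first_swap_sample vmstat.log` gives the position in the log where swapping starts, which makes it easier to line up with timestamps from other logs.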

/proc/meminfo:
MemTotal: 1034796 kB
MemFree: 884620 kB
Buffers: 14768 kB
Cached: 61192 kB
SwapCached: 0 kB
Active: 51972 kB
Inactive: 35992 kB
HighTotal: 131008 kB
HighFree: 57148 kB
LowTotal: 903788 kB
LowFree: 827472 kB
SwapTotal: 996020 kB
SwapFree: 996020 kB
Dirty: 24 kB
Writeback: 0 kB
Mapped: 16772 kB
Slab: 31064 kB
Committed_AS: 24876 kB
PageTables: 536 kB
VmallocTotal: 114680 kB
VmallocUsed: 692 kB
VmallocChunk: 113988 kB

I have /tmp mounted using tmpfs if that is in any way significant.

Thanks

Oliver

2004-01-19 18:18:03

by Mike Fedyk

Subject: Re: page allocation failure

On Mon, Jan 19, 2004 at 06:29:03PM +0100, Oliver Kiddle wrote:
> could be up before going down. It has gone down twice since I posted
> earlier so it wasn't even vaguely an accurate figure. On both
> occasions, there has not been a "page allocation failure" error though.

Ok, turn on the nmi_watchdog, and see if you get any traces...

2004-01-20 03:38:23

by Andrew Morton

Subject: Re: page allocation failure

Oliver Kiddle <[email protected]> wrote:
>
> There seems to be a problem with 2.6.1 on my machine. It will be fine
> for a matter of a few days and then this error will appear on the
> console. The message then appears repeatedly and continuously. The
> first I know is that my remote login shell ceases to respond. About the
> only thing I can do is switch between virtual consoles (until I hit the
> reset button).
>
> /var/log/messages shows:
> kernel: cat: page allocation failure. order:0, mode:0x20
>
> Then the same for lots of other processes (pdflush, syslogd, klogd,
> kswapd0, nfsd to name a few). I expect that after a point it is unable
> to even log stuff so syslog is quiet after a while.
>
> It has happened three times now and on all occasions, I was untarring a
> huge file on an XFS partition. I assume the problem is something to do
> with VM. The machine has 1GB of RAM which should be plenty. For the
> most part it is just serving NFS and NIS (to no more than about 10
> clients).

Does the machine actually recover, or does it grind to a halt and need
resetting?

Is there much network receive happening at the time?

Are you using gig-E with large MTUs?

> If anyone can suggest any /proc variables I might change to reduce the
> risk of it doing this again, I would appreciate it. I tried increasing
> /proc/sys/vm/min_free_kbytes after the first time this happened. Not
> that I understand what that does: I searched the archives and it was
> mentioned in a vaguely relevant looking post.

Yup, min_free_kbytes is the right thing to increase. Try it again, perhaps
increasing it by more - to 10000 or something like that.

min_free_kbytes will increase the amount of memory which the VM keeps in
reserve to satisfy interrupt-time memory allocation attempts - most
especially network receive.
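
A minimal sketch of checking and raising the setting. The value 10000 is the one suggested above; the function name `set_min_free_kbytes` and the optional path argument are assumptions for illustration. Writing the real file needs root.

```shell
#!/bin/sh
# Sketch: report the current reserve and raise it if the file is writable.
# set_min_free_kbytes is an illustrative helper, not an existing command.
#   $1 = new value in kB, $2 = file to write (defaults to the real knob)
set_min_free_kbytes() {
    new=$1
    knob=${2:-/proc/sys/vm/min_free_kbytes}
    echo "current: $(cat "$knob")"
    if [ -w "$knob" ]; then
        echo "$new" > "$knob"
        echo "now:     $(cat "$knob")"
    else
        echo "cannot write $knob (not root?)" >&2
        return 1
    fi
}
```

Usage: `set_min_free_kbytes 10000`. The same change can be made with `sysctl -w vm.min_free_kbytes=10000`. Either way the value is lost at reboot unless it is also set from a boot script or sysctl configuration.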


You probably should apply this patch to tell us where the allocation
failures are coming from. Make sure that CONFIG_KALLSYMS is enabled in
kernel config.


diff -puN mm/page_alloc.c~a mm/page_alloc.c
--- 25/mm/page_alloc.c~a	Mon Jan 19 19:34:09 2004
+++ 25-akpm/mm/page_alloc.c	Mon Jan 19 19:34:21 2004
@@ -674,6 +674,7 @@ nopage:
 		printk("%s: page allocation failure."
 			" order:%d, mode:0x%x\n",
 			p->comm, order, gfp_mask);
+		dump_stack();
 	}
 	return NULL;
 got_pg:

_

2004-01-20 06:02:19

by Nathan Scott

Subject: Re: page allocation failure

On Mon, Jan 19, 2004 at 07:38:37PM -0800, Andrew Morton wrote:
> Oliver Kiddle <[email protected]> wrote:
> >
> > It has happened three times now and on all occasions, I was untarring a
> > huge file on an XFS partition. I assume the problem is something to do
> > with VM. The machine has 1GB of RAM which should be plenty. For the
> ...
> You probably should apply this patch to tell us where the allocation
> failures are coming from. Make sure that CONFIG_KALLSYMS is enabled in
> kernel config.

We do have known issues in XFS on 2.6 with handling certain VM
allocation failures -- maybe hitting that here. Christoph has
been looking at making XFS do a better job there; __GFP_NOFAIL
allocations failing seem to be the worst issue for us - on the
occasions I've hit that, though, it's always immediately fatal.

cheers.

--
Nathan

2004-01-20 17:04:31

by Oliver Kiddle

Subject: Re: page allocation failure

Andrew Morton wrote:
>
> Does the machine actually recover, or does it grind to a halt and need
> resetting?

It needs resetting.

Today, I noticed that I could still ping it, though. I also had the
magic sysrq key stuff in the kernel and did a showTasks, Sync, Unmount
and kIll. That allowed me to briefly log in as root and save the output
of dmesg before an attempt to run vi caused it to die again.

> Is there much network receive happening at the time?

Not a lot, I don't think. Is there anything like vmstat for measuring
network activity?

> Are you using gig-E with large MTUs?

No. 100Mbps full duplex, e1000 driver.

Again this time, I didn't get the "page allocation failure" message so
your patch couldn't print anything. The console was blank apart from
the message about the tape device. As suggested by Mike Fedyk, I had
the nmi_watchdog stuff enabled. Didn't see any output from it though.
Would that have displayed its output to the console?

I've put a few chunks of the saved dmesg output below in case they're
useful. All I have is some of the sysrq showTasks output. xfsdump seems
to be a reliable way to trigger the problem (perhaps once the tape
fills up) and I had run patch immediately before it died. Most other
processes seem to be in schedule_timeout.

I'm away next week and as other people use this machine I'll have to
switch it to 2.4. I'll still have opportunities to reboot to 2.6 and
try to find out what's going on, though.

Oliver

patch D 00000000 0 11620 966 (NOTLB)
d7bb7aa8 00000082 c03676c0 00000000 0000000c 00000050 00000000 c03677d4
c0138240 00001fec 536f5e97 00000bc6 f6d0f2e0 f6d0f4a0 00c029db d7bb7abc
c041dedc d7bb7ae8 c0121b9c d7bb7abc 00c029db 0000007b c040e460 f6c41920
Call Trace:
[<c0138240>] try_to_free_pages+0x9f/0x15f
[<c0121b9c>] schedule_timeout+0x63/0xb7
[<c0121b30>] process_timeout+0x0/0x9
[<c01186f4>] io_schedule_timeout+0x11/0x19
[<c024e343>] blk_congestion_wait+0x7e/0x8d
[<c0118cd6>] autoremove_wake_function+0x0/0x4f
[<c0118cd6>] autoremove_wake_function+0x0/0x4f
[<c0132cd1>] __alloc_pages+0x294/0x319
[<c012f3d5>] find_or_create_page+0xa0/0xaa
[<c0214daf>] _pagebuf_lookup_pages+0x2fa/0x398
[<c0215131>] pagebuf_get+0xba/0x135
[<c0208a70>] xfs_trans_read_buf+0x32f/0x38b
[<c01d8df1>] xfs_da_do_buf+0x6b1/0x9a6
[<c01d9198>] xfs_da_read_buf+0x57/0x5b
[<c01dcf5c>] xfs_dir2_block_lookup_int+0x52/0x192
[<c01dcf5c>] xfs_dir2_block_lookup_int+0x52/0x192
[<c01dce71>] xfs_dir2_block_lookup+0x2f/0xc8
[<c01db3a8>] xfs_dir2_lookup+0xc4/0x13b
[<c01db401>] xfs_dir2_lookup+0x11d/0x13b
[<c0209afe>] xfs_dir_lookup_int+0x4c/0x12b
[<c020f3d6>] xfs_lookup+0x50/0x88
[<c021cae0>] linvfs_lookup+0x67/0x9f
[<c0152ea2>] real_lookup+0xc8/0xea
[<c01530f3>] do_lookup+0x96/0xa1
[<c0153508>] link_path_walk+0x40a/0x7db
[<c0154116>] open_namei+0x83/0x3e1
[<c0146b0f>] filp_open+0x3e/0x64
[<c0146e98>] sys_open+0x5b/0x8b
[<c0108ab7>] syscall_call+0x7/0xb


xfsdump D 9855CCF6 2760 676 673 (NOTLB)
ece09b04 00000082 f7f9e080 9855ccf6 00000baf 00000000 9855ccf6 00000baf
f7f9e080 000012f8 9855d254 00000baf f51106a0 f5110860 f7ffe760 00000000
f7ffe778 ece09b0c c01186db ece09b3c c0131c2b 00000000 00000000 00000000
Call Trace:
[<c01186db>] io_schedule+0xe/0x16
[<c0131c2b>] mempool_alloc+0xfa/0x117
[<c0118cd6>] autoremove_wake_function+0x0/0x4f
[<c021789b>] linvfs_get_block_core+0x87/0x2ab
[<c0118cd6>] autoremove_wake_function+0x0/0x4f
[<c0139296>] __blk_queue_bounce+0x1a1/0x232
[<c013935d>] blk_queue_bounce+0x36/0x4d
[<c024e510>] __make_request+0x4c/0x537
[<c0162ff5>] do_mpage_readpage+0x1a9/0x32a
[<c024eb04>] generic_make_request+0x109/0x18a
[<c022765b>] radix_tree_node_alloc+0x1f/0x5a
[<c02277f4>] radix_tree_insert+0x82/0xb8
[<c024ebc2>] submit_bio+0x3d/0x6b
[<c0162d26>] mpage_bio_submit+0x23/0x32
[<c016324b>] mpage_readpages+0xd5/0x162
[<c0217abf>] linvfs_get_block+0x0/0x43
[<c0134460>] read_pages+0x134/0x13d
[<c0217abf>] linvfs_get_block+0x0/0x43
[<c0132ae3>] __alloc_pages+0xa6/0x319
[<c010adcd>] do_IRQ+0xc4/0xdf
[<c0109424>] common_interrupt+0x18/0x20
[<c013469c>] do_page_cache_readahead+0xbf/0x109
[<c0134852>] page_cache_readahead+0x16c/0x198
[<c012f8d7>] do_generic_mapping_read+0x3c1/0x3d3
[<c012f8e9>] file_read_actor+0x0/0xec
[<c012fb91>] __generic_file_aio_read+0x1bc/0x1ee
[<c012f8e9>] file_read_actor+0x0/0xec
[<c021dcb2>] xfs_read+0x15a/0x26c
[<c0117d92>] wait_for_completion+0x65/0x95
[<c02181af>] linvfs_read_invis+0x90/0xa2
[<c0147549>] do_sync_read+0x8b/0xb7
[<c01217fa>] update_process_times+0x46/0x52
[<c0121674>] update_wall_time+0xd/0x36
[<c0121a6c>] do_timer+0xdf/0xe4
[<c0147625>] vfs_read+0xb0/0x119
[<c01478a0>] sys_read+0x42/0x63
[<c0108ab7>] syscall_call+0x7/0xb

2004-01-20 18:38:26

by Mike Fedyk

Subject: Re: page allocation failure

On Tue, Jan 20, 2004 at 06:08:34PM +0100, Oliver Kiddle wrote:
> the message about the tape device. As suggested by Mike Fedyk, I had
> the nmi_watchdog stuff enabled. Didn't see any output from it though.
> Would that have displayed its output to the console?

It should have. Run cat /proc/interrupts, then run it again a few seconds
later. Does the NMI: number change?
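
A rough sketch of that check. It assumes the NMI row in /proc/interrupts starts with the label "NMI:" followed by one counter per CPU; the function names `nmi_count` and `nmi_is_ticking` and the five-second wait are illustrative.

```shell
#!/bin/sh
# Sketch: read the NMI counter twice and report whether it moved.
# nmi_count and nmi_is_ticking are illustrative names.
nmi_count() {
    # Sum the numeric per-CPU fields on the "NMI:" row; prints 0 if no row.
    awk '$1 == "NMI:" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^[0-9]+$/) s += $i
         }
         END { print s + 0 }' "${1:-/proc/interrupts}"
}

nmi_is_ticking() {
    before=$(nmi_count)
    sleep 5
    after=$(nmi_count)
    if [ "$after" -gt "$before" ]; then
        echo "NMI count went from $before to $after: watchdog is ticking"
    else
        echo "NMI count stayed at $before: watchdog does not appear active"
        return 1
    fi
}
```

Calling `nmi_is_ticking` on the affected machine answers the question in one step.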

>
> patch D 00000000 0 11620 966 (NOTLB)
> d7bb7aa8 00000082 c03676c0 00000000 0000000c 00000050 00000000 c03677d4
> c0138240 00001fec 536f5e97 00000bc6 f6d0f2e0 f6d0f4a0 00c029db d7bb7abc
> c041dedc d7bb7ae8 c0121b9c d7bb7abc 00c029db 0000007b c040e460 f6c41920

There should be some lines above this in your log...

BTW, the NMI watchdog is supposed to oops when your system hangs so we can
see where the hang is coming from. What was running when this oops happened?

> Call Trace:
> [<c0138240>] try_to_free_pages+0x9f/0x15f
> [<c0121b9c>] schedule_timeout+0x63/0xb7
> [<c0121b30>] process_timeout+0x0/0x9
> [<c01186f4>] io_schedule_timeout+0x11/0x19
> [<c024e343>] blk_congestion_wait+0x7e/0x8d
> [<c0118cd6>] autoremove_wake_function+0x0/0x4f
> [<c0118cd6>] autoremove_wake_function+0x0/0x4f
> [<c0132cd1>] __alloc_pages+0x294/0x319
> [<c012f3d5>] find_or_create_page+0xa0/0xaa
> [<c0214daf>] _pagebuf_lookup_pages+0x2fa/0x398
> [<c0215131>] pagebuf_get+0xba/0x135
> [<c0208a70>] xfs_trans_read_buf+0x32f/0x38b
> [<c01d8df1>] xfs_da_do_buf+0x6b1/0x9a6
> [<c01d9198>] xfs_da_read_buf+0x57/0x5b
> [<c01dcf5c>] xfs_dir2_block_lookup_int+0x52/0x192
> [<c01dcf5c>] xfs_dir2_block_lookup_int+0x52/0x192
> [<c01dce71>] xfs_dir2_block_lookup+0x2f/0xc8
> [<c01db3a8>] xfs_dir2_lookup+0xc4/0x13b
> [<c01db401>] xfs_dir2_lookup+0x11d/0x13b
> [<c0209afe>] xfs_dir_lookup_int+0x4c/0x12b
> [<c020f3d6>] xfs_lookup+0x50/0x88
> [<c021cae0>] linvfs_lookup+0x67/0x9f
> [<c0152ea2>] real_lookup+0xc8/0xea
> [<c01530f3>] do_lookup+0x96/0xa1
> [<c0153508>] link_path_walk+0x40a/0x7db
> [<c0154116>] open_namei+0x83/0x3e1
> [<c0146b0f>] filp_open+0x3e/0x64
> [<c0146e98>] sys_open+0x5b/0x8b
> [<c0108ab7>] syscall_call+0x7/0xb
>
>
> xfsdump D 9855CCF6 2760 676 673 (NOTLB)
> ece09b04 00000082 f7f9e080 9855ccf6 00000baf 00000000 9855ccf6 00000baf
> f7f9e080 000012f8 9855d254 00000baf f51106a0 f5110860 f7ffe760 00000000
> f7ffe778 ece09b0c c01186db ece09b3c c0131c2b 00000000 00000000 00000000
> Call Trace:
> [<c01186db>] io_schedule+0xe/0x16
> [<c0131c2b>] mempool_alloc+0xfa/0x117
> [<c0118cd6>] autoremove_wake_function+0x0/0x4f
> [<c021789b>] linvfs_get_block_core+0x87/0x2ab
> [<c0118cd6>] autoremove_wake_function+0x0/0x4f
> [<c0139296>] __blk_queue_bounce+0x1a1/0x232
> [<c013935d>] blk_queue_bounce+0x36/0x4d
> [<c024e510>] __make_request+0x4c/0x537
> [<c0162ff5>] do_mpage_readpage+0x1a9/0x32a
> [<c024eb04>] generic_make_request+0x109/0x18a
> [<c022765b>] radix_tree_node_alloc+0x1f/0x5a
> [<c02277f4>] radix_tree_insert+0x82/0xb8
> [<c024ebc2>] submit_bio+0x3d/0x6b
> [<c0162d26>] mpage_bio_submit+0x23/0x32
> [<c016324b>] mpage_readpages+0xd5/0x162
> [<c0217abf>] linvfs_get_block+0x0/0x43
> [<c0134460>] read_pages+0x134/0x13d
> [<c0217abf>] linvfs_get_block+0x0/0x43
> [<c0132ae3>] __alloc_pages+0xa6/0x319
> [<c010adcd>] do_IRQ+0xc4/0xdf
> [<c0109424>] common_interrupt+0x18/0x20
> [<c013469c>] do_page_cache_readahead+0xbf/0x109
> [<c0134852>] page_cache_readahead+0x16c/0x198
> [<c012f8d7>] do_generic_mapping_read+0x3c1/0x3d3
> [<c012f8e9>] file_read_actor+0x0/0xec
> [<c012fb91>] __generic_file_aio_read+0x1bc/0x1ee
> [<c012f8e9>] file_read_actor+0x0/0xec
> [<c021dcb2>] xfs_read+0x15a/0x26c
> [<c0117d92>] wait_for_completion+0x65/0x95
> [<c02181af>] linvfs_read_invis+0x90/0xa2
> [<c0147549>] do_sync_read+0x8b/0xb7
> [<c01217fa>] update_process_times+0x46/0x52
> [<c0121674>] update_wall_time+0xd/0x36
> [<c0121a6c>] do_timer+0xdf/0xe4
> [<c0147625>] vfs_read+0xb0/0x119
> [<c01478a0>] sys_read+0x42/0x63
> [<c0108ab7>] syscall_call+0x7/0xb
>

2004-01-22 09:30:45

by Oliver Kiddle

Subject: Re: page allocation failure

Mike Fedyk wrote:
> On Tue, Jan 20, 2004 at 06:08:34PM +0100, Oliver Kiddle wrote:
> > the message about the tape device. As suggested by Mike Fedyk, I had
> > the nmi_watchdog stuff enabled. Didn't see any output from it though.
> > Would that have displayed its output to the console?
>
> It should have. Run cat /proc/interrupts, then run it again a few seconds
> later. Does the NMI: number change?

Yes, the number changes. Still haven't seen any output from it though.

> There should be some lines above this in your log...

Only the trace for other processes. Any initial part was lost, probably
because the task list overflowed the dmesg buffer. I didn't see anything
on the console though.

I got a few page allocation errors yesterday. As they now include
dump_stack() output, I have attached them below. This time, the system
kept going for a few minutes after these error messages. Again, when it
locked up, killing all processes with the sysrq key got things temporarily
back. I have the full dmesg output if anyone wants it.

Oliver

st0: Block limits 1 - 16777215 bytes.
xfsdump: page allocation failure. order:9, mode:0xd0
Call Trace:
[<c0132d18>] __alloc_pages+0x2db/0x319
[<c02a5dc9>] enlarge_buffer+0xcf/0x182
[<c02a6cd9>] st_map_user_pages+0x37/0x88
[<c02a2909>] setup_buffering+0xf3/0x127
[<c02a3690>] st_read+0xe0/0x3d1
[<c0147625>] vfs_read+0xb0/0x119
[<c01478a0>] sys_read+0x42/0x63
[<c0108ab7>] syscall_call+0x7/0xb

xfsdump: page allocation failure. order:8, mode:0xd0
Call Trace:
[<c0132d18>] __alloc_pages+0x2db/0x319
[<c02a5dc9>] enlarge_buffer+0xcf/0x182
[<c02a6cd9>] st_map_user_pages+0x37/0x88
[<c02a2909>] setup_buffering+0xf3/0x127
[<c02a3690>] st_read+0xe0/0x3d1
[<c0147625>] vfs_read+0xb0/0x119
[<c01478a0>] sys_read+0x42/0x63
[<c0108ab7>] syscall_call+0x7/0xb

xfsdump: page allocation failure. order:7, mode:0xd0
Call Trace:
[<c0132d18>] __alloc_pages+0x2db/0x319
[<c02a5dc9>] enlarge_buffer+0xcf/0x182
[<c02a6cd9>] st_map_user_pages+0x37/0x88
[<c02a2909>] setup_buffering+0xf3/0x127
[<c02a3690>] st_read+0xe0/0x3d1
[<c0147625>] vfs_read+0xb0/0x119
[<c01478a0>] sys_read+0x42/0x63
[<c0108ab7>] syscall_call+0x7/0xb

st0: Incorrect block size.
xfsdump: page allocation failure. order:9, mode:0xd0
Call Trace:
[<c0132d18>] __alloc_pages+0x2db/0x319
[<c02a5dc9>] enlarge_buffer+0xcf/0x182
[<c02a6cd9>] st_map_user_pages+0x37/0x88
[<c02a2909>] setup_buffering+0xf3/0x127
[<c02a2b86>] st_write+0x20c/0x7e7
[<c0115ecb>] do_page_fault+0x120/0x501
[<c02a297a>] st_write+0x0/0x7e7
[<c01477f5>] vfs_write+0xb0/0x119
[<c0147903>] sys_write+0x42/0x63
[<c0108ab7>] syscall_call+0x7/0xb

xfsdump: page allocation failure. order:8, mode:0xd0
Call Trace:
[<c0132d18>] __alloc_pages+0x2db/0x319
[<c02a5dc9>] enlarge_buffer+0xcf/0x182
[<c02a6cd9>] st_map_user_pages+0x37/0x88
[<c02a2909>] setup_buffering+0xf3/0x127
[<c02a2b86>] st_write+0x20c/0x7e7
[<c0115ecb>] do_page_fault+0x120/0x501
[<c02a297a>] st_write+0x0/0x7e7
[<c01477f5>] vfs_write+0xb0/0x119
[<c0147903>] sys_write+0x42/0x63
[<c0108ab7>] syscall_call+0x7/0xb

xfsdump: page allocation failure. order:7, mode:0xd0
Call Trace:
[<c0132d18>] __alloc_pages+0x2db/0x319
[<c02a5dc9>] enlarge_buffer+0xcf/0x182
[<c02a6cd9>] st_map_user_pages+0x37/0x88
[<c02a2909>] setup_buffering+0xf3/0x127
[<c02a2b86>] st_write+0x20c/0x7e7
[<c0115ecb>] do_page_fault+0x120/0x501
[<c02a297a>] st_write+0x0/0x7e7
[<c01477f5>] vfs_write+0xb0/0x119
[<c0147903>] sys_write+0x42/0x63
[<c0108ab7>] syscall_call+0x7/0xb

2004-01-22 09:58:25

by Andrew Morton

Subject: Re: page allocation failure

Oliver Kiddle <[email protected]> wrote:
>
> st0: Block limits 1 - 16777215 bytes.
> xfsdump: page allocation failure. order:9, mode:0xd0
> Call Trace:
> [<c0132d18>] __alloc_pages+0x2db/0x319
> [<c02a5dc9>] enlarge_buffer+0xcf/0x182
> [<c02a6cd9>] st_map_user_pages+0x37/0x88
> [<c02a2909>] setup_buffering+0xf3/0x127
> [<c02a3690>] st_read+0xe0/0x3d1
> [<c0147625>] vfs_read+0xb0/0x119
> [<c01478a0>] sys_read+0x42/0x63
> [<c0108ab7>] syscall_call+0x7/0xb

This one's actually somewhat OK. The tape driver is simply trying to
allocate a huge buffer and is falling back if it fails.
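
To put numbers on "huge": an order-N request asks for 2^N physically contiguous pages. The sketch below works that out for the orders in the trace, assuming 4 kB pages (the usual size on 32-bit x86; the thread does not state it). The helper name `order_to_kb` is made up for illustration.

```shell
#!/bin/sh
# Sketch: size in kB of an order-N page allocation (2^N contiguous pages).
# order_to_kb is an illustrative helper; the 4 kB default page size is an
# assumption, and can be overridden with the second argument.
order_to_kb() {
    order=$1
    page_kb=${2:-4}
    echo $(( (1 << $order) * $page_kb ))
}

for o in 9 8 7; do
    echo "order:$o = $(order_to_kb "$o") kB"
done
# prints:
# order:9 = 2048 kB
# order:8 = 1024 kB
# order:7 = 512 kB
```

So the driver first asks for 2 MB of contiguous memory, then halves the request each time it fails, which matches the order:9, order:8, order:7 sequence in the log.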

This will shut up the debugging code:

--- 25/drivers/scsi/osst.c~osst-warning-fix	2004-01-22 01:57:35.000000000 -0800
+++ 25-akpm/drivers/scsi/osst.c	2004-01-22 01:57:59.000000000 -0800
@@ -5106,6 +5106,8 @@ static int enlarge_buffer(OSST_buffer *S
 	if (need_dma)
 		priority |= GFP_DMA;
 
+	priority |= __GFP_NOWARN;
+
 	/* Try to allocate the first segment up to OS_DATA_SIZE and the others
 	   big enough to reach the goal (code assumes no segments in place) */
 	for (b_size = OS_DATA_SIZE, order = OSST_FIRST_ORDER; b_size >= PAGE_SIZE; order--, b_size /= 2) {

_