2008-03-20 03:09:04

by Frank van Maarseveen

[permalink] [raw]
Subject: 2.6.24.3 kernel BUG at fs/nfs/pagelist.c:82

FYI,

2.6.24.3 wrote:
> kernel BUG at fs/nfs/pagelist.c:82!

BUG_ON(PagePrivate(page));

> invalid opcode: 0000 [#1] SMP
> Modules linked in: vmnetfilter vmnet(P) vmmon(P) vmthrottle

In addition, there are some NFS patches for handling >16 groups and
selectively disabling attribute caching so its not a clean kernel.

>
> Pid: 4575, comm: tail Tainted: P (2.6.24.3-x177 #1)
> EIP: 0060:[<c01fc084>] EFLAGS: 00010202 CPU: 1
> EIP is at nfs_create_request+0xf4/0x100
> EAX: 80000821 EBX: e31a5300 ECX: 00000000 EDX: c1f0712c
> ESI: c1f0712c EDI: e31a5338 EBP: e56dfd90 ESP: e56dfd74
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process tail (pid: 4575, ti=e56de000 task=d4b65500 task.ti=e56de000)
> Stack: 00000000 f669ad20 cac3c168 e7330cb0 00000000 00000000 cac3c168 e56dfdc8
> c01fded5 00000000 000000a4 039cffff 000000a4 c1f0712c cac3c168 e7330cb0
> e56dfdb4 e56dfdb4 ffffff8c c1f0712c e7330cb0 e56dfdf0 c01fe8ce e56dfddc
> Call Trace:
> [<c010562a>] show_trace_log_lvl+0x1a/0x30
> [<c01056fa>] show_stack_log_lvl+0x9a/0xc0
> [<c01058a8>] show_registers+0xc8/0x1d0
> [<c0105b1c>] die+0x10c/0x230
> [<c0105cd1>] do_trap+0x91/0xd0
> [<c0105f79>] do_invalid_op+0x89/0xa0
> [<c05bba02>] error_code+0x72/0x80
> [<c01fded5>] nfs_readpage_async+0xb5/0x1b0
> [<c01fe8ce>] nfs_readpage+0xae/0x120
> [<c0155268>] do_generic_mapping_read+0xe8/0x440
> [<c01558b0>] generic_file_aio_read+0x160/0x190
> [<c01f4fb7>] nfs_file_read+0x97/0xe0
> [<c01795e7>] do_sync_read+0xc7/0x120
> [<c01796c4>] vfs_read+0x84/0x130
> [<c01799fd>] sys_read+0x3d/0x70
> [<c0104292>] syscall_call+0x7/0xb
> =======================
> Code: 02 75 0a e8 4f dc 3b 00 e9 4a ff ff ff 83 c4 10 b8 00 fe ff ff 5b 5e 5f 5d c3 8b 56 0c e9 7a ff ff ff 0f 0b eb fe 90 0f 0b eb fe <0f> 0b eb fe 90 8d b4 26 00 00 00 00 55 89 e5 53 83 ec 04 89 c3
> EIP: [<c01fc084>] nfs_create_request+0xf4/0x100 SS:ESP 0068:e56dfd74
> ---[ end trace 0ef921372ea6410b ]---

The machine is a quad Xeon with 4GB ram with CONFIG_HIGHMEM64G=y

--
Frank


2008-03-20 12:47:22

by Trond Myklebust

[permalink] [raw]
Subject: Re: 2.6.24.3 kernel BUG at fs/nfs/pagelist.c:82


On Wed, 2008-03-19 at 10:49 +0100, Frank van Maarseveen wrote:
> FYI,
>
> 2.6.24.3 wrote:
> > kernel BUG at fs/nfs/pagelist.c:82!
>
> BUG_ON(PagePrivate(page));
>
> > invalid opcode: 0000 [#1] SMP
> > Modules linked in: vmnetfilter vmnet(P) vmmon(P) vmthrottle
>
> In addition, there are some NFS patches for handling >16 groups and
> selectively disabling attribute caching so its not a clean kernel.
>
> >
> > Pid: 4575, comm: tail Tainted: P (2.6.24.3-x177 #1)
> > EIP: 0060:[<c01fc084>] EFLAGS: 00010202 CPU: 1
> > EIP is at nfs_create_request+0xf4/0x100
> > EAX: 80000821 EBX: e31a5300 ECX: 00000000 EDX: c1f0712c
> > ESI: c1f0712c EDI: e31a5338 EBP: e56dfd90 ESP: e56dfd74
> > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > Process tail (pid: 4575, ti=e56de000 task=d4b65500 task.ti=e56de000)
> > Stack: 00000000 f669ad20 cac3c168 e7330cb0 00000000 00000000 cac3c168 e56dfdc8
> > c01fded5 00000000 000000a4 039cffff 000000a4 c1f0712c cac3c168 e7330cb0
> > e56dfdb4 e56dfdb4 ffffff8c c1f0712c e7330cb0 e56dfdf0 c01fe8ce e56dfddc
> > Call Trace:
> > [<c010562a>] show_trace_log_lvl+0x1a/0x30
> > [<c01056fa>] show_stack_log_lvl+0x9a/0xc0
> > [<c01058a8>] show_registers+0xc8/0x1d0
> > [<c0105b1c>] die+0x10c/0x230
> > [<c0105cd1>] do_trap+0x91/0xd0
> > [<c0105f79>] do_invalid_op+0x89/0xa0
> > [<c05bba02>] error_code+0x72/0x80
> > [<c01fded5>] nfs_readpage_async+0xb5/0x1b0
> > [<c01fe8ce>] nfs_readpage+0xae/0x120
> > [<c0155268>] do_generic_mapping_read+0xe8/0x440
> > [<c01558b0>] generic_file_aio_read+0x160/0x190
> > [<c01f4fb7>] nfs_file_read+0x97/0xe0
> > [<c01795e7>] do_sync_read+0xc7/0x120
> > [<c01796c4>] vfs_read+0x84/0x130
> > [<c01799fd>] sys_read+0x3d/0x70
> > [<c0104292>] syscall_call+0x7/0xb
> > =======================
> > Code: 02 75 0a e8 4f dc 3b 00 e9 4a ff ff ff 83 c4 10 b8 00 fe ff ff 5b 5e 5f 5d c3 8b 56 0c e9 7a ff ff ff 0f 0b eb fe 90 0f 0b eb fe <0f> 0b eb fe 90 8d b4 26 00 00 00 00 55 89 e5 53 83 ec 04 89 c3
> > EIP: [<c01fc084>] nfs_create_request+0xf4/0x100 SS:ESP 0068:e56dfd74
> > ---[ end trace 0ef921372ea6410b ]---
>
> The machine is a quad Xeon with 4GB ram with CONFIG_HIGHMEM64G=y

Would that be on a file that was open for read and write, or is it
possible that some other process was writing to the same file? If so,
then it might be a bug in nfs_wb_page(). Otherwise, then it has to be
either a VM bug, or a memory corruption issue.

Cheers
Trond


2008-03-20 12:57:19

by Frank van Maarseveen

[permalink] [raw]
Subject: Re: 2.6.24.3 kernel BUG at fs/nfs/pagelist.c:82

On Thu, Mar 20, 2008 at 08:47:13AM -0400, Trond Myklebust wrote:
>
> On Wed, 2008-03-19 at 10:49 +0100, Frank van Maarseveen wrote:
> > FYI,
> >
> > 2.6.24.3 wrote:
> > > kernel BUG at fs/nfs/pagelist.c:82!
> >
> > BUG_ON(PagePrivate(page));
> >
> > > invalid opcode: 0000 [#1] SMP
> > > Modules linked in: vmnetfilter vmnet(P) vmmon(P) vmthrottle
> >
> > In addition, there are some NFS patches for handling >16 groups and
> > selectively disabling attribute caching so its not a clean kernel.
> >
> > >
> > > Pid: 4575, comm: tail Tainted: P (2.6.24.3-x177 #1)
^^^^
> > > EIP: 0060:[<c01fc084>] EFLAGS: 00010202 CPU: 1
> > > EIP is at nfs_create_request+0xf4/0x100
> > > EAX: 80000821 EBX: e31a5300 ECX: 00000000 EDX: c1f0712c
> > > ESI: c1f0712c EDI: e31a5338 EBP: e56dfd90 ESP: e56dfd74
> > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > > Process tail (pid: 4575, ti=e56de000 task=d4b65500 task.ti=e56de000)
> > > Stack: 00000000 f669ad20 cac3c168 e7330cb0 00000000 00000000 cac3c168 e56dfdc8
> > > c01fded5 00000000 000000a4 039cffff 000000a4 c1f0712c cac3c168 e7330cb0
> > > e56dfdb4 e56dfdb4 ffffff8c c1f0712c e7330cb0 e56dfdf0 c01fe8ce e56dfddc
> > > Call Trace:
> > > [<c010562a>] show_trace_log_lvl+0x1a/0x30
> > > [<c01056fa>] show_stack_log_lvl+0x9a/0xc0
> > > [<c01058a8>] show_registers+0xc8/0x1d0
> > > [<c0105b1c>] die+0x10c/0x230
> > > [<c0105cd1>] do_trap+0x91/0xd0
> > > [<c0105f79>] do_invalid_op+0x89/0xa0
> > > [<c05bba02>] error_code+0x72/0x80
> > > [<c01fded5>] nfs_readpage_async+0xb5/0x1b0
> > > [<c01fe8ce>] nfs_readpage+0xae/0x120
> > > [<c0155268>] do_generic_mapping_read+0xe8/0x440
> > > [<c01558b0>] generic_file_aio_read+0x160/0x190
> > > [<c01f4fb7>] nfs_file_read+0x97/0xe0
> > > [<c01795e7>] do_sync_read+0xc7/0x120
> > > [<c01796c4>] vfs_read+0x84/0x130
> > > [<c01799fd>] sys_read+0x3d/0x70
> > > [<c0104292>] syscall_call+0x7/0xb
> > > =======================
> > > Code: 02 75 0a e8 4f dc 3b 00 e9 4a ff ff ff 83 c4 10 b8 00 fe ff ff 5b 5e 5f 5d c3 8b 56 0c e9 7a ff ff ff 0f 0b eb fe 90 0f 0b eb fe <0f> 0b eb fe 90 8d b4 26 00 00 00 00 55 89 e5 53 83 ec 04 89 c3
> > > EIP: [<c01fc084>] nfs_create_request+0xf4/0x100 SS:ESP 0068:e56dfd74
> > > ---[ end trace 0ef921372ea6410b ]---
> >
> > The machine is a quad Xeon with 4GB ram with CONFIG_HIGHMEM64G=y
>
> Would that be on a file that was open for read and write, or is it
> possible that some other process was writing to the same file? If so,
> then it might be a bug in nfs_wb_page().

Yes, I'm quite sure it was a "tail -f" on a logfile which gets
continuously appended to by another process.. So, one process reads it
while another one writes to it through different descriptors/struct file.

--
Frank

2008-04-12 10:10:19

by Frank van Maarseveen

[permalink] [raw]
Subject: Re: 2.6.24.3 kernel BUG at fs/nfs/pagelist.c:82


It happened again. tail -f died with the usual stack trace:
> [<c010562a>] show_trace_log_lvl+0x1a/0x30
> [<c01056fa>] show_stack_log_lvl+0x9a/0xc0
> [<c01058a8>] show_registers+0xc8/0x1d0
> [<c0105b1c>] die+0x10c/0x230
> [<c0105cd1>] do_trap+0x91/0xd0
> [<c0105f79>] do_invalid_op+0x89/0xa0
> [<c05bba62>] error_code+0x72/0x80
> [<c01fdf55>] nfs_readpage_async+0xb5/0x1b0
> [<c01fe94e>] nfs_readpage+0xae/0x120
> [<c01552b8>] do_generic_mapping_read+0xe8/0x440
> [<c0155900>] generic_file_aio_read+0x160/0x190
> [<c01f5037>] nfs_file_read+0x97/0xe0
> [<c0179647>] do_sync_read+0xc7/0x120
> [<c0179724>] vfs_read+0x84/0x130
> [<c0179a5d>] sys_read+0x3d/0x70
> [<c0104292>] syscall_call+0x7/0xb

And this is the trace of the writer, now stuck in state D (using echo
t >/proc/sysrq-trigger):
> [<c05b9d4e>] io_schedule+0x1e/0x30
> [<c0154465>] sync_page+0x35/0x60
> [<c05b9fc9>] __wait_on_bit_lock+0x49/0x70
> [<c0154cd5>] __lock_page+0x85/0xa0
> [<c0154e62>] find_lock_page+0x62/0xa0
> [<c0156a5c>] __grab_cache_page+0x1c/0xb0
> [<c01f5258>] nfs_write_begin+0x18/0x60
> [<c0156ebe>] generic_perform_write+0x9e/0x180
> [<c0157007>] generic_file_buffered_write+0x67/0x110
> [<c01572f7>] __generic_file_aio_write_nolock+0x247/0x560
> [<c015774c>] generic_file_aio_write+0x5c/0xd0
> [<c01f55d7>] nfs_file_write+0xa7/0x150
> [<c0179897>] do_sync_write+0xc7/0x120
> [<c0179977>] vfs_write+0x87/0x130
> [<c0179acd>] sys_write+0x3d/0x70
> [<c0104292>] syscall_call+0x7/0xb

The file is not mmap()'ed, at least not by reader or writer in userland.

--
Frank

2008-04-10 11:54:35

by Frank van Maarseveen

[permalink] [raw]
Subject: Re: 2.6.24.3 kernel BUG at fs/nfs/pagelist.c:82

FYI,

On Thu, Mar 20, 2008 at 01:57:16PM +0100, Frank van Maarseveen wrote:
> On Thu, Mar 20, 2008 at 08:47:13AM -0400, Trond Myklebust wrote:
> >
> > On Wed, 2008-03-19 at 10:49 +0100, Frank van Maarseveen wrote:
> > > FYI,
> > >
> > > 2.6.24.3 wrote:
> > > > kernel BUG at fs/nfs/pagelist.c:82!
> > >
> > > BUG_ON(PagePrivate(page));
> > >
> > > > invalid opcode: 0000 [#1] SMP
> > > > Modules linked in: vmnetfilter vmnet(P) vmmon(P) vmthrottle
> > >
> > > In addition, there are some NFS patches for handling >16 groups and
> > > selectively disabling attribute caching so its not a clean kernel.
> > >
> > > >
> > > > Pid: 4575, comm: tail Tainted: P (2.6.24.3-x177 #1)
> ^^^^
> > > > EIP: 0060:[<c01fc084>] EFLAGS: 00010202 CPU: 1
> > > > EIP is at nfs_create_request+0xf4/0x100
> > > > EAX: 80000821 EBX: e31a5300 ECX: 00000000 EDX: c1f0712c
> > > > ESI: c1f0712c EDI: e31a5338 EBP: e56dfd90 ESP: e56dfd74
> > > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > > > Process tail (pid: 4575, ti=e56de000 task=d4b65500 task.ti=e56de000)
> > > > Stack: 00000000 f669ad20 cac3c168 e7330cb0 00000000 00000000 cac3c168 e56dfdc8
> > > > c01fded5 00000000 000000a4 039cffff 000000a4 c1f0712c cac3c168 e7330cb0
> > > > e56dfdb4 e56dfdb4 ffffff8c c1f0712c e7330cb0 e56dfdf0 c01fe8ce e56dfddc
> > > > Call Trace:
> > > > [<c010562a>] show_trace_log_lvl+0x1a/0x30
> > > > [<c01056fa>] show_stack_log_lvl+0x9a/0xc0
> > > > [<c01058a8>] show_registers+0xc8/0x1d0
> > > > [<c0105b1c>] die+0x10c/0x230
> > > > [<c0105cd1>] do_trap+0x91/0xd0
> > > > [<c0105f79>] do_invalid_op+0x89/0xa0
> > > > [<c05bba02>] error_code+0x72/0x80
> > > > [<c01fded5>] nfs_readpage_async+0xb5/0x1b0
> > > > [<c01fe8ce>] nfs_readpage+0xae/0x120
> > > > [<c0155268>] do_generic_mapping_read+0xe8/0x440
> > > > [<c01558b0>] generic_file_aio_read+0x160/0x190
> > > > [<c01f4fb7>] nfs_file_read+0x97/0xe0
> > > > [<c01795e7>] do_sync_read+0xc7/0x120
> > > > [<c01796c4>] vfs_read+0x84/0x130
> > > > [<c01799fd>] sys_read+0x3d/0x70
> > > > [<c0104292>] syscall_call+0x7/0xb
> > > > =======================
> > > > Code: 02 75 0a e8 4f dc 3b 00 e9 4a ff ff ff 83 c4 10 b8 00 fe ff ff 5b 5e 5f 5d c3 8b 56 0c e9 7a ff ff ff 0f 0b eb fe 90 0f 0b eb fe <0f> 0b eb fe 90 8d b4 26 00 00 00 00 55 89 e5 53 83 ec 04 89 c3
> > > > EIP: [<c01fc084>] nfs_create_request+0xf4/0x100 SS:ESP 0068:e56dfd74
> > > > ---[ end trace 0ef921372ea6410b ]---
> > >
> > > The machine is a quad Xeon with 4GB ram with CONFIG_HIGHMEM64G=y
> >
> > Would that be on a file that was open for read and write, or is it
> > possible that some other process was writing to the same file? If so,
> > then it might be a bug in nfs_wb_page().
>
> Yes, I'm quite sure it was a "tail -f" on a logfile which gets
> continuously appended to by another process.. So, one process reads it
> while another one writes to it through different descriptors/struct file.

The problem occurred again on a different box under exactly the same
userland conditions yielding exactly the same stack trace. Kernels are
identical but no vmware modules this time.

--
Frank

2008-04-11 20:07:25

by Myklebust, Trond

[permalink] [raw]
Subject: Re: 2.6.24.3 kernel BUG at fs/nfs/pagelist.c:82


On Thu, 2008-04-10 at 13:54 +0200, Frank van Maarseveen wrote:
> FYI,
>
> On Thu, Mar 20, 2008 at 01:57:16PM +0100, Frank van Maarseveen wrote:
> > On Thu, Mar 20, 2008 at 08:47:13AM -0400, Trond Myklebust wrote:
> > >
> > > On Wed, 2008-03-19 at 10:49 +0100, Frank van Maarseveen wrote:
> > > > FYI,
> > > >
> > > > 2.6.24.3 wrote:
> > > > > kernel BUG at fs/nfs/pagelist.c:82!
> > > >
> > > > BUG_ON(PagePrivate(page));
> > > >
> > > > > invalid opcode: 0000 [#1] SMP
> > > > > Modules linked in: vmnetfilter vmnet(P) vmmon(P) vmthrottle
> > > >
> > > > In addition, there are some NFS patches for handling >16 groups and
> > > > selectively disabling attribute caching so its not a clean kernel.
> > > >
> > > > >
> > > > > Pid: 4575, comm: tail Tainted: P (2.6.24.3-x177 #1)
> > ^^^^
> > > > > EIP: 0060:[<c01fc084>] EFLAGS: 00010202 CPU: 1
> > > > > EIP is at nfs_create_request+0xf4/0x100
> > > > > EAX: 80000821 EBX: e31a5300 ECX: 00000000 EDX: c1f0712c
> > > > > ESI: c1f0712c EDI: e31a5338 EBP: e56dfd90 ESP: e56dfd74
> > > > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > > > > Process tail (pid: 4575, ti=e56de000 task=d4b65500 task.ti=e56de000)
> > > > > Stack: 00000000 f669ad20 cac3c168 e7330cb0 00000000 00000000 cac3c168 e56dfdc8
> > > > > c01fded5 00000000 000000a4 039cffff 000000a4 c1f0712c cac3c168 e7330cb0
> > > > > e56dfdb4 e56dfdb4 ffffff8c c1f0712c e7330cb0 e56dfdf0 c01fe8ce e56dfddc
> > > > > Call Trace:
> > > > > [<c010562a>] show_trace_log_lvl+0x1a/0x30
> > > > > [<c01056fa>] show_stack_log_lvl+0x9a/0xc0
> > > > > [<c01058a8>] show_registers+0xc8/0x1d0
> > > > > [<c0105b1c>] die+0x10c/0x230
> > > > > [<c0105cd1>] do_trap+0x91/0xd0
> > > > > [<c0105f79>] do_invalid_op+0x89/0xa0
> > > > > [<c05bba02>] error_code+0x72/0x80
> > > > > [<c01fded5>] nfs_readpage_async+0xb5/0x1b0
> > > > > [<c01fe8ce>] nfs_readpage+0xae/0x120
> > > > > [<c0155268>] do_generic_mapping_read+0xe8/0x440
> > > > > [<c01558b0>] generic_file_aio_read+0x160/0x190
> > > > > [<c01f4fb7>] nfs_file_read+0x97/0xe0
> > > > > [<c01795e7>] do_sync_read+0xc7/0x120
> > > > > [<c01796c4>] vfs_read+0x84/0x130
> > > > > [<c01799fd>] sys_read+0x3d/0x70
> > > > > [<c0104292>] syscall_call+0x7/0xb
> > > > > =======================
> > > > > Code: 02 75 0a e8 4f dc 3b 00 e9 4a ff ff ff 83 c4 10 b8 00 fe ff ff 5b 5e 5f 5d c3 8b 56 0c e9 7a ff ff ff 0f 0b eb fe 90 0f 0b eb fe <0f> 0b eb fe 90 8d b4 26 00 00 00 00 55 89 e5 53 83 ec 04 89 c3
> > > > > EIP: [<c01fc084>] nfs_create_request+0xf4/0x100 SS:ESP 0068:e56dfd74
> > > > > ---[ end trace 0ef921372ea6410b ]---
> > > >
> > > > The machine is a quad Xeon with 4GB ram with CONFIG_HIGHMEM64G=y
> > >
> > > Would that be on a file that was open for read and write, or is it
> > > possible that some other process was writing to the same file? If so,
> > > then it might be a bug in nfs_wb_page().
> >
> > Yes, I'm quite sure it was a "tail -f" on a logfile which gets
> > continuously appended to by another process.. So, one process reads it
> > while another one writes to it through different descriptors/struct file.
>
> The problem occurred again on a different box under exactly the same
> userland conditions yielding exactly the same stack trace. Kernels are
> identical but no vmware modules this time.

Just a quick question: how does your > 16 groups patch behave when it is
denied a write with an EACCES error? I've got a feeling that this may be
due to the page getting redirtied and the RPC call retried. If so, then
the following patch may help.

Cheers
Trond
--------------------------------------------------------------------------
From: Trond Myklebust <[email protected]>
Date: Fri, 11 Apr 2008 16:03:54 -0400
Subject: NFS: Fix nfs_wb_page() to always exit with an error or a clean page

It is possible for nfs_wb_page() to sometimes exit with 0 return value, yet
the page is left in a dirty state.
For instance in the case where the server rebooted, and the COMMIT request
failed, then all the previously "clean" pages which were cached by the
server, but were not guaranteed to have been writted out to disk,
have to be redirtied and resent to the server.
The fix is to have nfs_wb_page_priority() check that the page is clean
before it exits...

This fixes a condition that triggers the BUG_ON(PagePrivate(page)) in
nfs_create_request() when we're in the nfs_readpage() path.

Also eliminate a redundant BUG_ON(!PageLocked(page)) while we're at it. It
turns out that clear_page_dirty_for_io() has the exact same test.

Signed-off-by: Trond Myklebust <[email protected]>
---

fs/nfs/write.c | 24 +++++++++++++-----------
1 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ce40cad..3a2e821 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1493,18 +1493,20 @@ static int nfs_wb_page_priority(struct inode *inode, struct page *page,
};
int ret;

- BUG_ON(!PageLocked(page));
- if (clear_page_dirty_for_io(page)) {
- ret = nfs_writepage_locked(page, &wbc);
+ do {
+ if (clear_page_dirty_for_io(page)) {
+ ret = nfs_writepage_locked(page, &wbc);
+ if (ret < 0)
+ goto out_error;
+ } else if (!PagePrivate(page))
+ break;
+ ret = nfs_sync_mapping_wait(page->mapping, &wbc, how);
if (ret < 0)
- goto out;
- }
- if (!PagePrivate(page))
- return 0;
- ret = nfs_sync_mapping_wait(page->mapping, &wbc, how);
- if (ret >= 0)
- return 0;
-out:
+ goto out_error;
+ /* Is the test for PageDirty() really needed here? */
+ } while (PagePrivate(page) || PageDirty(page));
+ return 0;
+out_error:
__mark_inode_dirty(inode, I_DIRTY_PAGES);
return ret;
}

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2008-04-11 20:11:01

by Peter Staubach

[permalink] [raw]
Subject: Re: 2.6.24.3 kernel BUG at fs/nfs/pagelist.c:82

Trond Myklebust wrote:
> On Thu, 2008-04-10 at 13:54 +0200, Frank van Maarseveen wrote:
>
>> FYI,
>>
>> On Thu, Mar 20, 2008 at 01:57:16PM +0100, Frank van Maarseveen wrote:
>>
>>> On Thu, Mar 20, 2008 at 08:47:13AM -0400, Trond Myklebust wrote:
>>>
>>>> On Wed, 2008-03-19 at 10:49 +0100, Frank van Maarseveen wrote:
>>>>
>>>>> FYI,
>>>>>
>>>>> 2.6.24.3 wrote:
>>>>>
>>>>>> kernel BUG at fs/nfs/pagelist.c:82!
>>>>>>
>>>>> BUG_ON(PagePrivate(page));
>>>>>
>>>>>
>>>>>> invalid opcode: 0000 [#1] SMP
>>>>>> Modules linked in: vmnetfilter vmnet(P) vmmon(P) vmthrottle
>>>>>>
>>>>> In addition, there are some NFS patches for handling >16 groups and
>>>>> selectively disabling attribute caching so its not a clean kernel.
>>>>>
>>>>>
>>>>>> Pid: 4575, comm: tail Tainted: P (2.6.24.3-x177 #1)
>>>>>>
>>> ^^^^
>>>
>>>>>> EIP: 0060:[<c01fc084>] EFLAGS: 00010202 CPU: 1
>>>>>> EIP is at nfs_create_request+0xf4/0x100
>>>>>> EAX: 80000821 EBX: e31a5300 ECX: 00000000 EDX: c1f0712c
>>>>>> ESI: c1f0712c EDI: e31a5338 EBP: e56dfd90 ESP: e56dfd74
>>>>>> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
>>>>>> Process tail (pid: 4575, ti=e56de000 task=d4b65500 task.ti=e56de000)
>>>>>> Stack: 00000000 f669ad20 cac3c168 e7330cb0 00000000 00000000 cac3c168 e56dfdc8
>>>>>> c01fded5 00000000 000000a4 039cffff 000000a4 c1f0712c cac3c168 e7330cb0
>>>>>> e56dfdb4 e56dfdb4 ffffff8c c1f0712c e7330cb0 e56dfdf0 c01fe8ce e56dfddc
>>>>>> Call Trace:
>>>>>> [<c010562a>] show_trace_log_lvl+0x1a/0x30
>>>>>> [<c01056fa>] show_stack_log_lvl+0x9a/0xc0
>>>>>> [<c01058a8>] show_registers+0xc8/0x1d0
>>>>>> [<c0105b1c>] die+0x10c/0x230
>>>>>> [<c0105cd1>] do_trap+0x91/0xd0
>>>>>> [<c0105f79>] do_invalid_op+0x89/0xa0
>>>>>> [<c05bba02>] error_code+0x72/0x80
>>>>>> [<c01fded5>] nfs_readpage_async+0xb5/0x1b0
>>>>>> [<c01fe8ce>] nfs_readpage+0xae/0x120
>>>>>> [<c0155268>] do_generic_mapping_read+0xe8/0x440
>>>>>> [<c01558b0>] generic_file_aio_read+0x160/0x190
>>>>>> [<c01f4fb7>] nfs_file_read+0x97/0xe0
>>>>>> [<c01795e7>] do_sync_read+0xc7/0x120
>>>>>> [<c01796c4>] vfs_read+0x84/0x130
>>>>>> [<c01799fd>] sys_read+0x3d/0x70
>>>>>> [<c0104292>] syscall_call+0x7/0xb
>>>>>> =======================
>>>>>> Code: 02 75 0a e8 4f dc 3b 00 e9 4a ff ff ff 83 c4 10 b8 00 fe ff ff 5b 5e 5f 5d c3 8b 56 0c e9 7a ff ff ff 0f 0b eb fe 90 0f 0b eb fe <0f> 0b eb fe 90 8d b4 26 00 00 00 00 55 89 e5 53 83 ec 04 89 c3
>>>>>> EIP: [<c01fc084>] nfs_create_request+0xf4/0x100 SS:ESP 0068:e56dfd74
>>>>>> ---[ end trace 0ef921372ea6410b ]---
>>>>>>
>>>>> The machine is a quad Xeon with 4GB ram with CONFIG_HIGHMEM64G=y
>>>>>
>>>> Would that be on a file that was open for read and write, or is it
>>>> possible that some other process was writing to the same file? If so,
>>>> then it might be a bug in nfs_wb_page().
>>>>
>>> Yes, I'm quite sure it was a "tail -f" on a logfile which gets
>>> continuously appended to by another process.. So, one process reads it
>>> while another one writes to it through different descriptors/struct file.
>>>
>> The problem occurred again on a different box under exactly the same
>> userland conditions yielding exactly the same stack trace. Kernels are
>> identical but no vmware modules this time.
>>
>
> Just a quick question: how does your > 16 groups patch behave when it is
> denied a write with an EACCES error? I've got a feeling that this may be
> due to the page getting redirtied and the RPC call retried. If so, then
> the following patch may help.
>
> Cheers
> Trond
> --------------------------------------------------------------------------
> From: Trond Myklebust <[email protected]>
> Date: Fri, 11 Apr 2008 16:03:54 -0400
> Subject: NFS: Fix nfs_wb_page() to always exit with an error or a clean page
>
> It is possible for nfs_wb_page() to sometimes exit with 0 return value, yet
> the page is left in a dirty state.
> For instance in the case where the server rebooted, and the COMMIT request
> failed, then all the previously "clean" pages which were cached by the
> server, but were not guaranteed to have been writted out to disk,
> have to be redirtied and resent to the server.
> The fix is to have nfs_wb_page_priority() check that the page is clean
> before it exits...
>
>

Is this a problem if the page is mmap'd into a process address
space which is constantly touching it?

Thanx...

ps

> This fixes a condition that triggers the BUG_ON(PagePrivate(page)) in
> nfs_create_request() when we're in the nfs_readpage() path.
>
> Also eliminate a redundant BUG_ON(!PageLocked(page)) while we're at it. It
> turns out that clear_page_dirty_for_io() has the exact same test.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
>
> fs/nfs/write.c | 24 +++++++++++++-----------
> 1 files changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index ce40cad..3a2e821 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1493,18 +1493,20 @@ static int nfs_wb_page_priority(struct inode *inode, struct page *page,
> };
> int ret;
>
> - BUG_ON(!PageLocked(page));
> - if (clear_page_dirty_for_io(page)) {
> - ret = nfs_writepage_locked(page, &wbc);
> + do {
> + if (clear_page_dirty_for_io(page)) {
> + ret = nfs_writepage_locked(page, &wbc);
> + if (ret < 0)
> + goto out_error;
> + } else if (!PagePrivate(page))
> + break;
> + ret = nfs_sync_mapping_wait(page->mapping, &wbc, how);
> if (ret < 0)
> - goto out;
> - }
> - if (!PagePrivate(page))
> - return 0;
> - ret = nfs_sync_mapping_wait(page->mapping, &wbc, how);
> - if (ret >= 0)
> - return 0;
> -out:
> + goto out_error;
> + /* Is the test for PageDirty() really needed here? */
> + } while (PagePrivate(page) || PageDirty(page));
> + return 0;
> +out_error:
> __mark_inode_dirty(inode, I_DIRTY_PAGES);
> return ret;
> }
>
>


2008-04-11 20:44:42

by Myklebust, Trond

[permalink] [raw]
Subject: Re: 2.6.24.3 kernel BUG at fs/nfs/pagelist.c:82


On Fri, 2008-04-11 at 16:10 -0400, Peter Staubach wrote:
> Trond Myklebust wrote:
> > On Thu, 2008-04-10 at 13:54 +0200, Frank van Maarseveen wrote:
> >
> >> FYI,
> >>
> >> On Thu, Mar 20, 2008 at 01:57:16PM +0100, Frank van Maarseveen wrote:
> >>
> >>> On Thu, Mar 20, 2008 at 08:47:13AM -0400, Trond Myklebust wrote:
> >>>
> >>>> On Wed, 2008-03-19 at 10:49 +0100, Frank van Maarseveen wrote:
> >>>>
> >>>>> FYI,
> >>>>>
> >>>>> 2.6.24.3 wrote:
> >>>>>
> >>>>>> kernel BUG at fs/nfs/pagelist.c:82!
> >>>>>>
> >>>>> BUG_ON(PagePrivate(page));
> >>>>>
> >>>>>
> >>>>>> invalid opcode: 0000 [#1] SMP
> >>>>>> Modules linked in: vmnetfilter vmnet(P) vmmon(P) vmthrottle
> >>>>>>
> >>>>> In addition, there are some NFS patches for handling >16 groups and
> >>>>> selectively disabling attribute caching so its not a clean kernel.
> >>>>>
> >>>>>
> >>>>>> Pid: 4575, comm: tail Tainted: P (2.6.24.3-x177 #1)
> >>>>>>
> >>> ^^^^
> >>>
> >>>>>> EIP: 0060:[<c01fc084>] EFLAGS: 00010202 CPU: 1
> >>>>>> EIP is at nfs_create_request+0xf4/0x100
> >>>>>> EAX: 80000821 EBX: e31a5300 ECX: 00000000 EDX: c1f0712c
> >>>>>> ESI: c1f0712c EDI: e31a5338 EBP: e56dfd90 ESP: e56dfd74
> >>>>>> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> >>>>>> Process tail (pid: 4575, ti=e56de000 task=d4b65500 task.ti=e56de000)
> >>>>>> Stack: 00000000 f669ad20 cac3c168 e7330cb0 00000000 00000000 cac3c168 e56dfdc8
> >>>>>> c01fded5 00000000 000000a4 039cffff 000000a4 c1f0712c cac3c168 e7330cb0
> >>>>>> e56dfdb4 e56dfdb4 ffffff8c c1f0712c e7330cb0 e56dfdf0 c01fe8ce e56dfddc
> >>>>>> Call Trace:
> >>>>>> [<c010562a>] show_trace_log_lvl+0x1a/0x30
> >>>>>> [<c01056fa>] show_stack_log_lvl+0x9a/0xc0
> >>>>>> [<c01058a8>] show_registers+0xc8/0x1d0
> >>>>>> [<c0105b1c>] die+0x10c/0x230
> >>>>>> [<c0105cd1>] do_trap+0x91/0xd0
> >>>>>> [<c0105f79>] do_invalid_op+0x89/0xa0
> >>>>>> [<c05bba02>] error_code+0x72/0x80
> >>>>>> [<c01fded5>] nfs_readpage_async+0xb5/0x1b0
> >>>>>> [<c01fe8ce>] nfs_readpage+0xae/0x120
> >>>>>> [<c0155268>] do_generic_mapping_read+0xe8/0x440
> >>>>>> [<c01558b0>] generic_file_aio_read+0x160/0x190
> >>>>>> [<c01f4fb7>] nfs_file_read+0x97/0xe0
> >>>>>> [<c01795e7>] do_sync_read+0xc7/0x120
> >>>>>> [<c01796c4>] vfs_read+0x84/0x130
> >>>>>> [<c01799fd>] sys_read+0x3d/0x70
> >>>>>> [<c0104292>] syscall_call+0x7/0xb
> >>>>>> =======================
> >>>>>> Code: 02 75 0a e8 4f dc 3b 00 e9 4a ff ff ff 83 c4 10 b8 00 fe ff ff 5b 5e 5f 5d c3 8b 56 0c e9 7a ff ff ff 0f 0b eb fe 90 0f 0b eb fe <0f> 0b eb fe 90 8d b4 26 00 00 00 00 55 89 e5 53 83 ec 04 89 c3
> >>>>>> EIP: [<c01fc084>] nfs_create_request+0xf4/0x100 SS:ESP 0068:e56dfd74
> >>>>>> ---[ end trace 0ef921372ea6410b ]---
> >>>>>>
> >>>>> The machine is a quad Xeon with 4GB ram with CONFIG_HIGHMEM64G=y
> >>>>>
> >>>> Would that be on a file that was open for read and write, or is it
> >>>> possible that some other process was writing to the same file? If so,
> >>>> then it might be a bug in nfs_wb_page().
> >>>>
> >>> Yes, I'm quite sure it was a "tail -f" on a logfile which gets
> >>> continuously appended to by another process.. So, one process reads it
> >>> while another one writes to it through different descriptors/struct file.
> >>>
> >> The problem occurred again on a different box under exactly the same
> >> userland conditions yielding exactly the same stack trace. Kernels are
> >> identical but no vmware modules this time.
> >>
> >
> > Just a quick question: how does your > 16 groups patch behave when it is
> > denied a write with an EACCES error? I've got a feeling that this may be
> > due to the page getting redirtied and the RPC call retried. If so, then
> > the following patch may help.
> >
> > Cheers
> > Trond
> > --------------------------------------------------------------------------
> > From: Trond Myklebust <[email protected]>
> > Date: Fri, 11 Apr 2008 16:03:54 -0400
> > Subject: NFS: Fix nfs_wb_page() to always exit with an error or a clean page
> >
> > It is possible for nfs_wb_page() to sometimes exit with 0 return value, yet
> > the page is left in a dirty state.
> > For instance in the case where the server rebooted, and the COMMIT request
> > failed, then all the previously "clean" pages which were cached by the
> > server, but were not guaranteed to have been writted out to disk,
> > have to be redirtied and resent to the server.
> > The fix is to have nfs_wb_page_priority() check that the page is clean
> > before it exits...
> >
> >
>
> Is this a problem if the page is mmap'd into a process address
> space which is constantly touching it?

I initially thought that we only use nfs_wb_page() in the readpage()
case, but looking again, it seems that we also use it in cases where tha
page may indeed be mapped. OK, then that's a good reason to eliminate
the PageDirty() test at the end of that loop.

Note that since the callers of nfs_wb_page() are guaranteed to hold the
page lock, then there is no way for someone else to add a new nfs_page
structure and set PagePrivate.

Thanks,
Trond

--------------------------------------------
From: Trond Myklebust <[email protected]>
Date: Fri, 11 Apr 2008 16:03:54 -0400
Subject: NFS: Fix nfs_wb_page() to always exit with an error or a clean page

It is possible for nfs_wb_page() to sometimes exit with 0 return value, yet
the page is left in a dirty state.
For instance in the case where the server rebooted, and the COMMIT request
failed, then all the previously "clean" pages which were cached by the
server, but were not guaranteed to have been writted out to disk,
have to be redirtied and resent to the server.
The fix is to have nfs_wb_page_priority() check that the page is clean
before it exits...

This fixes a condition that triggers the BUG_ON(PagePrivate(page)) in
nfs_create_request() when we're in the nfs_readpage() path.

Also eliminate a redundant BUG_ON(!PageLocked(page)) while we're at it. It
turns out that clear_page_dirty_for_io() has the exact same test.

Signed-off-by: Trond Myklebust <[email protected]>
---

fs/nfs/write.c | 23 ++++++++++++-----------
1 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ce40cad..997b42a 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1493,18 +1493,19 @@ static int nfs_wb_page_priority(struct inode *inode, struct page *page,
};
int ret;

- BUG_ON(!PageLocked(page));
- if (clear_page_dirty_for_io(page)) {
- ret = nfs_writepage_locked(page, &wbc);
+ do {
+ if (clear_page_dirty_for_io(page)) {
+ ret = nfs_writepage_locked(page, &wbc);
+ if (ret < 0)
+ goto out_error;
+ } else if (!PagePrivate(page))
+ break;
+ ret = nfs_sync_mapping_wait(page->mapping, &wbc, how);
if (ret < 0)
- goto out;
- }
- if (!PagePrivate(page))
- return 0;
- ret = nfs_sync_mapping_wait(page->mapping, &wbc, how);
- if (ret >= 0)
- return 0;
-out:
+ goto out_error;
+ } while (PagePrivate(page));
+ return 0;
+out_error:
__mark_inode_dirty(inode, I_DIRTY_PAGES);
return ret;
}

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2008-04-12 09:42:08

by Frank van Maarseveen

[permalink] [raw]
Subject: Re: 2.6.24.3 kernel BUG at fs/nfs/pagelist.c:82

On Fri, Apr 11, 2008 at 04:07:16PM -0400, Trond Myklebust wrote:
>
> On Thu, 2008-04-10 at 13:54 +0200, Frank van Maarseveen wrote:
> > FYI,
> >
> > On Thu, Mar 20, 2008 at 01:57:16PM +0100, Frank van Maarseveen wrote:
> > > On Thu, Mar 20, 2008 at 08:47:13AM -0400, Trond Myklebust wrote:
> > > >
> > > > On Wed, 2008-03-19 at 10:49 +0100, Frank van Maarseveen wrote:
> > > > > FYI,
> > > > >
> > > > > 2.6.24.3 wrote:
> > > > > > kernel BUG at fs/nfs/pagelist.c:82!
> > > > >
> > > > > BUG_ON(PagePrivate(page));
[...]
> > > > > The machine is a quad Xeon with 4GB ram with CONFIG_HIGHMEM64G=y
> > > >
> > > > Would that be on a file that was open for read and write, or is it
> > > > possible that some other process was writing to the same file? If so,
> > > > then it might be a bug in nfs_wb_page().
> > >
> > > Yes, I'm quite sure it was a "tail -f" on a logfile which gets
> > > continuously appended to by another process.. So, one process reads it
> > > while another one writes to it through different descriptors/struct file.
> >
> > The problem occurred again on a different box under exactly the same
> > userland conditions yielding exactly the same stack trace. Kernels are
> > identical but no vmware modules this time.
>
> Just a quick question: how does your > 16 groups patch behave when it is
> denied a write with an EACCES error? I've got a feeling that this may be
> due to the page getting redirtied and the RPC call retried. If so, then
> the following patch may help.

The >16 groups patch doesn't do anything special with file I/O
(credentials are determined at open time) and is not retrying
anywhere upon error.

It's just one process which writes a big logfile on NFS (also involving
small writes) and a tail -f trying to catch up. The machine is heavily
loaded at that time, probably both CPU and networking I/O (non-NFS).

--
Frank