2001-11-30 11:53:57

by Steffen Persvold

Subject: 2.4.9 kernel crash

Hi all,

This just happened to my RedHat 7.2 box running the 2.4.9-13 update kernel from RedHat. The box is
running as an NFS server, exporting two ext3 volumes (one 36GB and one 73GB):

VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
Unable to handle kernel NULL pointer dereference at virtual address 0000000b
printing eip:
c01537aa
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c01537aa>] Not tainted
EFLAGS: 00010282
eax: fffffffb ebx: c0a69580 ecx: c0a69590 edx: c0a69590
esi: cf12bce0 edi: fffffffb ebp: fffffb4c esp: cff6bf60
ds: 0018 es: 0018 ss: 0018
Process kswapd (pid: 5, stackpage=cff6b000)
Stack: c02dbdf8 ffffff55 cbf027c8 c181a8a0 cf12bcf8 cf12bce0 c0a69580 c0150eb6
c0a69580 00000419 00000000 c1022dc4 c1022dc4 c1022d9c 00000000 c01357a5
00000001 00000ed2 000000c0 000000c0 0008e000 c01512a1 00000000 c0135bab
Call Trace: [<c0150eb6>] prune_dcache [kernel] 0xf6
[<c01357a5>] page_launder [kernel] 0x8f5
[<c01512a1>] shrink_dcache_memory [kernel] 0x21
[<c0135bab>] do_try_to_free_pages [kernel] 0x1b
[<c0135c35>] kswapd [kernel] 0x55
[<c0105000>] stext [kernel] 0x0
[<c0105866>] kernel_thread [kernel] 0x26
[<c0135be0>] kswapd [kernel] 0x0


Code: 8b 47 10 85 c0 74 04 53 ff d0 58 68 40 be 2d c0 8d 43 24 50
<1>Unable to handle kernel NULL pointer dereference at virtual address 0000000b
printing eip:
c01537aa
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<c01537aa>] Not tainted
EFLAGS: 00010282
eax: fffffffb ebx: cc33a580 ecx: cc33a590 edx: cc33a590
esi: cf543aa0 edi: fffffffb ebp: 00000000 esp: cc62dd34
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 979, stackpage=cc62d000)
Stack: c65bb8a0 c02da480 000008b4 c02da460 cf543ab8 cf543aa0 cc33a580 c0150eb6
cc33a580 0000045a 00000000 c1025008 c1025008 c1024fe0 00000000 c01357a5
00000001 00001648 000000d2 00000000 000000d2 c01512a1 00000000 c0135bab
Call Trace: [<c0150eb6>] prune_dcache [kernel] 0xf6
[<c01357a5>] page_launder [kernel] 0x8f5
[<c01512a1>] shrink_dcache_memory [kernel] 0x21
[<c0135bab>] do_try_to_free_pages [kernel] 0x1b
[<c0135d28>] try_to_free_pages [kernel] 0x28
[<c0136a11>] _wrapped_alloc_pages [kernel] 0x1c1
[<c0136acf>] __alloc_pages [kernel] 0xf
[<c012df71>] generic_file_readahead [kernel] 0x201
[<c012e29d>] do_generic_file_read [kernel] 0x26d
[<c012e774>] generic_file_read [kernel] 0x64
[<c012e610>] file_read_actor [kernel] 0x0
[<d08f20a4>] __insmod_nfsd_S.text_L52160 [nfsd] 0x4044
[<d0856920>] __insmod_ext3_S.data_L672 [ext3] 0xc0
[<d08f720b>] __insmod_nfsd_S.text_L52160 [nfsd] 0x91ab
[<d08ff080>] __insmod_nfsd_S.data_L2208 [nfsd] 0x660
[<d08ee5b1>] __insmod_nfsd_S.text_L52160 [nfsd] 0x551
[<d08ff080>] __insmod_nfsd_S.data_L2208 [nfsd] 0x660
[<d08d2d9a>] svc_process_Rsmp_64b56219 [sunrpc] 0x34a
[<d08fea38>] __insmod_nfsd_S.data_L2208 [nfsd] 0x18
[<d08fea58>] __insmod_nfsd_S.data_L2208 [nfsd] 0x38
[<d08ee39b>] __insmod_nfsd_S.text_L52160 [nfsd] 0x33b
[<c0105866>] kernel_thread [kernel] 0x26
[<d08ee190>] __insmod_nfsd_S.text_L52160 [nfsd] 0x130


Code: 8b 47 10 85 c0 74 04 53 ff d0 58 68 40 be 2d c0 8d 43 24 50
<1>Unable to handle kernel NULL pointer dereference at virtual address 0000000b
printing eip:
c01537aa
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c01537aa>] Not tainted
EFLAGS: 00010282
eax: fffffffb ebx: c7245900 ecx: c7245910 edx: c7245910
esi: c56779e0 edi: fffffffb ebp: 00000000 esp: cc6ebd34
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 980, stackpage=cc6eb000)
Stack: c0933d20 c0933d20 00000003 c0141ea8 c56779f8 c56779e0 c7245900 c0150eb6
c7245900 00000000 c1015278 00000000 c12df20c c1015278 00000000 c01357a5
00000000 000020f2 000000d2 00000000 000000d2 c01512a1 00000000 c0135bab
Call Trace: [<c0141ea8>] try_to_free_buffers [kernel] 0xf8
[<c0150eb6>] prune_dcache [kernel] 0xf6
[<c01357a5>] page_launder [kernel] 0x8f5
[<c01512a1>] shrink_dcache_memory [kernel] 0x21
[<c0135bab>] do_try_to_free_pages [kernel] 0x1b
[<c0135d28>] try_to_free_pages [kernel] 0x28
[<c0136a11>] _wrapped_alloc_pages [kernel] 0x1c1
[<c0136acf>] __alloc_pages [kernel] 0xf
[<c012df71>] generic_file_readahead [kernel] 0x201
[<c012e408>] do_generic_file_read [kernel] 0x3d8
[<c012e774>] generic_file_read [kernel] 0x64
[<c012e610>] file_read_actor [kernel] 0x0
[<d08f20a4>] __insmod_nfsd_S.text_L52160 [nfsd] 0x4044
[<d0856920>] __insmod_ext3_S.data_L672 [ext3] 0xc0
[<d08f720b>] __insmod_nfsd_S.text_L52160 [nfsd] 0x91ab
[<d08ff080>] __insmod_nfsd_S.data_L2208 [nfsd] 0x660
[<d08ee5b1>] __insmod_nfsd_S.text_L52160 [nfsd] 0x551
[<d08ff080>] __insmod_nfsd_S.data_L2208 [nfsd] 0x660
[<d08d2d9a>] svc_process_Rsmp_64b56219 [sunrpc] 0x34a
[<d08fea38>] __insmod_nfsd_S.data_L2208 [nfsd] 0x18
[<d08fea58>] __insmod_nfsd_S.data_L2208 [nfsd] 0x38
[<d08ee39b>] __insmod_nfsd_S.text_L52160 [nfsd] 0x33b
[<c0105866>] kernel_thread [kernel] 0x26
[<d08ee190>] __insmod_nfsd_S.text_L52160 [nfsd] 0x130


Code: 8b 47 10 85 c0 74 04 53 ff d0 58 68 40 be 2d c0 8d 43 24 50
<7>eth0: 0 multicast blocks dropped.


The machine did not crash completely and I was still able to access it remotely (through ssh).
When I looked at top, kswapd was 'defunct' (zombie). Is this something that is fixed in newer
'vanilla' kernels (e.g. 2.4.16)?

Regards,
--
Steffen Persvold | Scalable Linux Systems | Try out the world's best
mailto:[email protected] | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.12.2 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >300MBytes/s and <4uS latency


2001-11-30 20:16:42

by Andrew Morton

Subject: Re: 2.4.9 kernel crash

Steffen Persvold wrote:
>
> Hi all,
>
> This just happened to my RedHat 7.2 box running the 2.4.9-13 update kernel from RedHat. The box is
> running as a NFS server, exporting two ext3 volumes (one 36GB and one 73GB) :
>
> VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...

There was a bug in ext3 which was fixed around about the 2.4.9
timeframe. I don't know if the fix is present in that
particular Red Hat kernel. It was fixed in ext3 0.9.8. The
ext3 version number is displayed when you mount a filesystem.

The 0.9.8 changelog says:

- Fix an NFS oops when doing a local delete on an active, nfs-exported
file.

I never observed this bug - I think the fix came from Ted Ts'o. I
do not know whether the bug manifested itself as "busy inodes
after unmount". Perhaps Ted or Stephen can comment?

2001-12-04 00:21:11

by Stephen Walton

Subject: Re: [NFS] Re: 2.4.9 kernel crash

[Sorry for the long list of CC's but I wasn't sure which to delete.]

> There was a bug in ext3 which was fixed around about the 2.4.9
> timeframe. I don't know if the fix is present in that
> particular Red Hat kernel. It was fixed in ext3 0.9.8.

According to /usr/include/linux/ext3_fs.h, the redhat 2.4.9-13 kernel is
running ext3 0.9.11. I've had no trouble with my NFS-exported ext3 disks.

--
Stephen Walton, Professor of Physics and Astronomy,
California State University, Northridge
[email protected]

2001-12-04 00:35:52

by Stephen C. Tweedie

Subject: Re: [Ext2-devel] Re: [NFS] Re: 2.4.9 kernel crash

Hi,

On Mon, Dec 03, 2001 at 11:10:31AM -0800, Stephen Walton wrote:
> [Sorry for the long list of CC's but I wasn't sure which to delete.]
>
> > There was a bug in ext3 which was fixed around about the 2.4.9
> > timeframe. I don't know if the fix is present in that
> > particular Red Hat kernel. It was fixed in ext3 0.9.8.
>
> According to /usr/include/linux/ext3_fs.h, the redhat 2.4.9-13 kernel is
> running ext3 0.9.11. I've had no trouble with my NFS-exported ext3 disks.

It's 0.9.11 with a couple of critical back-ported fixes, and it
definitely includes the NFS fix.

Cheers,
Stephen

2001-12-04 00:35:53

by Stephen C. Tweedie

Subject: Re: [Ext2-devel] Re: 2.4.9 kernel crash

Hi,

On Fri, Nov 30, 2001 at 12:16:05PM -0800, Andrew Morton wrote:

> There was a bug in ext3 which was fixed around about the 2.4.9
> timeframe. I don't know if the fix is present in that
> particular Red Hat kernel. It was fixed in ext3 0.9.8. The
> ext3 version number is displayed when you mount a filesystem.
>
> The 0.9.8 changelog says:
>
> - Fix an NFS oops when doing a local delete on an active, nfs-exported
> file.
>
> I never observed this bug - I think the fix came from Ted Ts'o. I
> do not know whether the bug manifested itself as "busy inodes
> after unmount". Perhaps Ted or Stephen can comment?

The NFS bug could present as "bit already cleared", but I don't think
I ever saw it leave busy inodes behind.

Cheers,
Stephen

2001-12-06 17:32:52

by Steffen Persvold

Subject: Re: [Ext2-devel] Re: 2.4.9 kernel crash

"Stephen C. Tweedie" wrote:
>
> Hi,
>
> On Fri, Nov 30, 2001 at 12:16:05PM -0800, Andrew Morton wrote:
>
> > There was a bug in ext3 which was fixed around about the 2.4.9
> > timeframe. I don't know if the fix is present in that
> > particular Red Hat kernel. It was fixed in ext3 0.9.8. The
> > ext3 version number is displayed when you mount a filesystem.
> >
> > The 0.9.8 changelog says:
> >
> > - Fix an NFS oops when doing a local delete on an active, nfs-exported
> > file.
> >
> > I never observed this bug - I think the fix came from Ted Ts'o. I
> > do not know whether the bug manifested itself as "busy inodes
> > after unmount". Perhaps Ted or Stephen can comment?
>
> The NFS bug could present as "bit already cleared", but I don't think
> I ever saw it leave busy inodes behind.
>

So what could this be then?

(original message in case someone missed it):
Hi all,

This just happened to my RedHat 7.2 box running the 2.4.9-13 update kernel from RedHat. The box is
running as an NFS server, exporting two ext3 volumes (one 36GB and one 73GB):

VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
Unable to handle kernel NULL pointer dereference at virtual address 0000000b
printing eip:
c01537aa
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c01537aa>] Not tainted
EFLAGS: 00010282
eax: fffffffb ebx: c0a69580 ecx: c0a69590 edx: c0a69590
esi: cf12bce0 edi: fffffffb ebp: fffffb4c esp: cff6bf60
ds: 0018 es: 0018 ss: 0018
Process kswapd (pid: 5, stackpage=cff6b000)
Stack: c02dbdf8 ffffff55 cbf027c8 c181a8a0 cf12bcf8 cf12bce0 c0a69580 c0150eb6
c0a69580 00000419 00000000 c1022dc4 c1022dc4 c1022d9c 00000000 c01357a5
00000001 00000ed2 000000c0 000000c0 0008e000 c01512a1 00000000 c0135bab
Call Trace: [<c0150eb6>] prune_dcache [kernel] 0xf6
[<c01357a5>] page_launder [kernel] 0x8f5
[<c01512a1>] shrink_dcache_memory [kernel] 0x21
[<c0135bab>] do_try_to_free_pages [kernel] 0x1b
[<c0135c35>] kswapd [kernel] 0x55
[<c0105000>] stext [kernel] 0x0
[<c0105866>] kernel_thread [kernel] 0x26
[<c0135be0>] kswapd [kernel] 0x0


Code: 8b 47 10 85 c0 74 04 53 ff d0 58 68 40 be 2d c0 8d 43 24 50
<1>Unable to handle kernel NULL pointer dereference at virtual address 0000000b
printing eip:
c01537aa
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<c01537aa>] Not tainted
EFLAGS: 00010282
eax: fffffffb ebx: cc33a580 ecx: cc33a590 edx: cc33a590
esi: cf543aa0 edi: fffffffb ebp: 00000000 esp: cc62dd34
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 979, stackpage=cc62d000)
Stack: c65bb8a0 c02da480 000008b4 c02da460 cf543ab8 cf543aa0 cc33a580 c0150eb6
cc33a580 0000045a 00000000 c1025008 c1025008 c1024fe0 00000000 c01357a5
00000001 00001648 000000d2 00000000 000000d2 c01512a1 00000000 c0135bab
Call Trace: [<c0150eb6>] prune_dcache [kernel] 0xf6
[<c01357a5>] page_launder [kernel] 0x8f5
[<c01512a1>] shrink_dcache_memory [kernel] 0x21
[<c0135bab>] do_try_to_free_pages [kernel] 0x1b
[<c0135d28>] try_to_free_pages [kernel] 0x28
[<c0136a11>] _wrapped_alloc_pages [kernel] 0x1c1
[<c0136acf>] __alloc_pages [kernel] 0xf
[<c012df71>] generic_file_readahead [kernel] 0x201
[<c012e29d>] do_generic_file_read [kernel] 0x26d
[<c012e774>] generic_file_read [kernel] 0x64
[<c012e610>] file_read_actor [kernel] 0x0
[<d08f20a4>] __insmod_nfsd_S.text_L52160 [nfsd] 0x4044
[<d0856920>] __insmod_ext3_S.data_L672 [ext3] 0xc0
[<d08f720b>] __insmod_nfsd_S.text_L52160 [nfsd] 0x91ab
[<d08ff080>] __insmod_nfsd_S.data_L2208 [nfsd] 0x660
[<d08ee5b1>] __insmod_nfsd_S.text_L52160 [nfsd] 0x551
[<d08ff080>] __insmod_nfsd_S.data_L2208 [nfsd] 0x660
[<d08d2d9a>] svc_process_Rsmp_64b56219 [sunrpc] 0x34a
[<d08fea38>] __insmod_nfsd_S.data_L2208 [nfsd] 0x18
[<d08fea58>] __insmod_nfsd_S.data_L2208 [nfsd] 0x38
[<d08ee39b>] __insmod_nfsd_S.text_L52160 [nfsd] 0x33b
[<c0105866>] kernel_thread [kernel] 0x26
[<d08ee190>] __insmod_nfsd_S.text_L52160 [nfsd] 0x130


Code: 8b 47 10 85 c0 74 04 53 ff d0 58 68 40 be 2d c0 8d 43 24 50
<1>Unable to handle kernel NULL pointer dereference at virtual address 0000000b
printing eip:
c01537aa
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c01537aa>] Not tainted
EFLAGS: 00010282
eax: fffffffb ebx: c7245900 ecx: c7245910 edx: c7245910
esi: c56779e0 edi: fffffffb ebp: 00000000 esp: cc6ebd34
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 980, stackpage=cc6eb000)
Stack: c0933d20 c0933d20 00000003 c0141ea8 c56779f8 c56779e0 c7245900 c0150eb6
c7245900 00000000 c1015278 00000000 c12df20c c1015278 00000000 c01357a5
00000000 000020f2 000000d2 00000000 000000d2 c01512a1 00000000 c0135bab
Call Trace: [<c0141ea8>] try_to_free_buffers [kernel] 0xf8
[<c0150eb6>] prune_dcache [kernel] 0xf6
[<c01357a5>] page_launder [kernel] 0x8f5
[<c01512a1>] shrink_dcache_memory [kernel] 0x21
[<c0135bab>] do_try_to_free_pages [kernel] 0x1b
[<c0135d28>] try_to_free_pages [kernel] 0x28
[<c0136a11>] _wrapped_alloc_pages [kernel] 0x1c1
[<c0136acf>] __alloc_pages [kernel] 0xf
[<c012df71>] generic_file_readahead [kernel] 0x201
[<c012e408>] do_generic_file_read [kernel] 0x3d8
[<c012e774>] generic_file_read [kernel] 0x64
[<c012e610>] file_read_actor [kernel] 0x0
[<d08f20a4>] __insmod_nfsd_S.text_L52160 [nfsd] 0x4044
[<d0856920>] __insmod_ext3_S.data_L672 [ext3] 0xc0
[<d08f720b>] __insmod_nfsd_S.text_L52160 [nfsd] 0x91ab
[<d08ff080>] __insmod_nfsd_S.data_L2208 [nfsd] 0x660
[<d08ee5b1>] __insmod_nfsd_S.text_L52160 [nfsd] 0x551
[<d08ff080>] __insmod_nfsd_S.data_L2208 [nfsd] 0x660
[<d08d2d9a>] svc_process_Rsmp_64b56219 [sunrpc] 0x34a
[<d08fea38>] __insmod_nfsd_S.data_L2208 [nfsd] 0x18
[<d08fea58>] __insmod_nfsd_S.data_L2208 [nfsd] 0x38
[<d08ee39b>] __insmod_nfsd_S.text_L52160 [nfsd] 0x33b
[<c0105866>] kernel_thread [kernel] 0x26
[<d08ee190>] __insmod_nfsd_S.text_L52160 [nfsd] 0x130


Code: 8b 47 10 85 c0 74 04 53 ff d0 58 68 40 be 2d c0 8d 43 24 50
<7>eth0: 0 multicast blocks dropped.


The machine did not crash completely and I was still able to access it remotely (through ssh).
When I looked at top, kswapd was 'defunct' (zombie). Is this something that is fixed in newer
'vanilla' kernels (e.g. 2.4.16)?



--
Steffen Persvold | Scalable Linux Systems | Try out the world's best
mailto:[email protected] | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.12.2 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >300MBytes/s and <4uS latency

2001-12-06 18:51:12

by Stephen C. Tweedie

Subject: Re: [Ext2-devel] Re: 2.4.9 kernel crash

Hi,

On Thu, Dec 06, 2001 at 07:26:18PM +0100, Steffen Persvold wrote:

> So what could this be then ?

> VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...

This one is a little unfamiliar, but the oops:

> Call Trace: [<c0150eb6>] prune_dcache [kernel] 0xf6
> [<c01357a5>] page_launder [kernel] 0x8f5
> [<c01512a1>] shrink_dcache_memory [kernel] 0x21
> [<c0135bab>] do_try_to_free_pages [kernel] 0x1b
> [<c0135c35>] kswapd [kernel] 0x55
> [<c0105000>] stext [kernel] 0x0
> [<c0105866>] kernel_thread [kernel] 0x26
> [<c0135be0>] kswapd [kernel] 0x0

has been reported before, even on much more recent kernels, and even
without ext3 loaded. So basically I've no idea what's behind it.

Cheers,
Stephen

2001-12-06 20:56:27

by Ragnar Kjørstad

Subject: Re: [NFS] Re: [Ext2-devel] Re: 2.4.9 kernel crash

Hi Stephen.

On Thu, Dec 06, 2001 at 06:50:27PM +0000, Stephen C. Tweedie wrote:
> > So what could this be then ?
>
> > VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
>
> has been reported before, even on much more recent kernels, and even
> without ext3 loaded. So basically I've no idea what's behind it.

To quote my own post to xfs_devel and linux-kernel on July 20th, subject
"Busy inodes after umount":

I've now been able to reproduce:

* make a filesystem
* mount it
* export it (nfs)
* mount on remote machine
* lock file (fcntl)
* unexport
* unmount

Then you get the VFS message about self-destruct. Tested with both ext2
and xfs.

A reply from Neil Brown:
Yep. It is not filesystem specific.
nfsd does not flush locks when a filesystem is un-exported, only when
a client is removed, and that actually never happens.
In fs/nfsd/lockd.c there is a comment:

/*
* When removing an NFS client entry, notify lockd that it is gone.
* FIXME: We should do the same when unexporting an NFS volume.
*/

That FIXME needs to be fixed. I need to read through some more code
before I am sure how to do it, but it shouldn't be too hard.



I hope that can be of some help?



--
Ragnar Kjørstad
Big Storage