2004-01-02 09:50:41

by Claas Langbehn

Subject: XFS forced shutdown with kernel 2.6.0

Hello!


Last night one of my machines running xfs shut down my /homes partition.

That machine was running Azureus (a bittorrent client) with probably
high memory usage.

But even if one program's memory usage approaches 100%, that should
not force the filesystem to shut down. Instead it should crash
the application.

I could also think of bad memory, but we did test the SDRAM modules
only a week ago, and they passed memtest86.

After rebooting, everything was working fine again.

So, is this a bug in XFS?


regards,
Claas



kernel: Filesystem "hdb3": XFS internal error xfs_btree_check_lblock at line 222 of file fs/xfs/xfs_btree.c. Caller 0xc01d770c
kernel: Call Trace:
kernel: [<c01da8be>] xfs_btree_check_lblock+0x5e/0x190
kernel: [<c01d770c>] xfs_bmbt_lookup+0x18c/0x550
kernel: [<c01d770c>] xfs_bmbt_lookup+0x18c/0x550
kernel: [<c01cabe9>] xfs_bmap_add_extent_delay_real+0x11b9/0x16a0
kernel: [<c01cdc64>] xfs_bmap_alloc+0xb34/0x1870
kernel: [<d0a9998d>] __nvsym00795+0x31/0x50 [nvidia]
kernel: [<c01d957f>] xfs_bmbt_get_state+0x2f/0x40
kernel: [<c01c989f>] xfs_bmap_add_extent+0x38f/0x520
kernel: [<c01d2283>] xfs_bmapi+0x783/0x1600
kernel: [<c01d957f>] xfs_bmbt_get_state+0x2f/0x40
kernel: [<c01d033c>] xfs_bmap_do_search_extents+0xbc/0x3f0
kernel: [<c02013dd>] xfs_log_reserve+0xbd/0xd0
kernel: [<c02269f8>] xfs_iomap_write_allocate+0x2a8/0x4d0
kernel: [<c01d1df3>] xfs_bmapi+0x2f3/0x1600
kernel: [<c0225cec>] xfs_iomap+0x40c/0x550
kernel: [<c0220ae2>] map_blocks+0x72/0x130
kernel: [<c0221ba7>] page_state_convert+0x4c7/0x620
kernel: [<c02223bc>] linvfs_writepage+0x5c/0x110
kernel: [<c0174953>] mpage_writepages+0x203/0x2f0
kernel: [<c0222360>] linvfs_writepage+0x0/0x110
kernel: [<c013d846>] do_writepages+0x36/0x40
kernel: [<c0137e7e>] __filemap_fdatawrite+0xbe/0xd0
kernel: [<c0137ea7>] filemap_fdatawrite+0x17/0x20
kernel: [<c01736f0>] generic_osync_inode+0x120/0x130
kernel: [<c013a27f>] generic_file_aio_write_nolock+0x5af/0xb90
kernel: [<c011cb10>] default_wake_function+0x0/0x20
kernel: [<c01fdde0>] xfs_ichgtime+0x120/0x122
kernel: [<c0211729>] xfs_trans_unlocked_item+0x39/0x60
kernel: [<c0228e1b>] xfs_write+0x29b/0x860
kernel: [<d0a99a8d>] __nvsym00727+0x31/0x38 [nvidia]
kernel: [<c0109d16>] apic_timer_interrupt+0x1a/0x20
kernel: [<c02228c1>] linvfs_write+0xb1/0x120
kernel: [<c01536bb>] do_sync_write+0x8b/0xc0
kernel: [<c0147ae9>] find_extend_vma+0x29/0x90
kernel: [<c011cb10>] default_wake_function+0x0/0x20
kernel: [<c0127fc6>] update_process_times+0x46/0x60
kernel: [<c0127e2b>] update_wall_time+0xb/0x40
kernel: [<c012829f>] do_timer+0xdf/0xf0
kernel: [<c0153630>] do_sync_write+0x0/0xc0
kernel: [<c01537a8>] vfs_write+0xb8/0x130
kernel: [<c01538d2>] sys_write+0x42/0x70
kernel: [<c0109387>] syscall_call+0x7/0xb
kernel:
kernel: xfs_force_shutdown(hdb3,0x8) called from line 1070 of file fs/xfs/xfs_trans.c. Return address = 0xc022aa0c
kernel: Filesystem "hdb3": Corruption of in-memory data detected. Shutting down filesystem: hdb3
kernel: Please umount the filesystem, and rectify the problem(s)

- about 30mins later: -

kernel: Out of Memory: Killed process 2825 (java).
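
[Editorial sketch, not part of the original message.] To see at a glance which frames in a trace like the one above are attributed to a loadable module (the bracketed tag, e.g. [nvidia]), a small parser can help. The names FRAME_RE, parse_frames, module_frames and SAMPLE are hypothetical, and the regular expression only assumes the frame layout visible in this log. Traces from this kernel era may include stale stack entries, so a module appearing here is a hint, not proof of involvement.

```python
import re

# Matches frames such as: [<d0a9998d>] __nvsym00795+0x31/0x50 [nvidia]
# The trailing "[module]" tag is optional; built-in symbols have none.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[^\s+]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_frames(log_text):
    """Return (address, symbol, module_or_None) for every frame line found."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (match.group("addr"), match.group("symbol"), match.group("module"))
            )
    return frames


def module_frames(log_text):
    """Return only the frames that carry a module tag."""
    return [frame for frame in parse_frames(log_text) if frame[2] is not None]


# Three lines copied from the trace above, used as sample input.
SAMPLE = """\
kernel: [<c01cdc64>] xfs_bmap_alloc+0xb34/0x1870
kernel: [<d0a9998d>] __nvsym00795+0x31/0x50 [nvidia]
kernel: [<c01d957f>] xfs_bmbt_get_state+0x2f/0x40
"""

if __name__ == "__main__":
    for addr, symbol, module in module_frames(SAMPLE):
        print(f"{module}: {symbol} at 0x{addr}")
```

Feeding the full log to module_frames() in place of SAMPLE would list every module-tagged frame in the trace.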



2004-01-02 18:29:24

by Christoph Hellwig

Subject: Re: XFS forced shutdown with kernel 2.6.0

On Fri, Jan 02, 2004 at 10:50:51AM +0100, Claas Langbehn wrote:
> Hello!
>
>
> Last night one of my machines running xfs shut down my /homes partition.
>
> That machine was running Azureus (a bittorrent client) with probably
> high memory usage.
>
> But even if one program's memory usage approaches 100%, that should
> not force the filesystem to shut down. Instead it should crash
> the application.
>
> I could also think of bad memory, but we did test the SDRAM modules
> only a week ago, and they passed memtest86.
>
> After rebooting, everything was working fine again.
>
> So, is this a bug in XFS?

I've seen the same bug a few times lately, but only if I had previous
memory corruption due to code I was hacking on. Can you reproduce it
without the nvidia module loaded, as that is a likely source of such
corruption?

2004-01-02 20:35:59

by Christoph Hellwig

Subject: Re: XFS forced shutdown with kernel 2.6.0

On Fri, Jan 02, 2004 at 03:27:40PM -0500, [email protected] wrote:
> On Fri, 02 Jan 2004 18:29:21 GMT, Christoph Hellwig said:
>
> > I've seen the same bug a few times lately, but only if I had previous
> > memory corruption due to code I was hacking on. Can you reproduce it
> > without the nvidia module loaded, as that is a likely source of such
> > corruption?
>
> While you're at it, see what *else* you can turn off - RAID, devfs, NFS, etc.
>
> It's equally likely that you're tripping over some other kernel module's
> use-after-free or chase-the-wrong-pointer bug. I've seen a lot more bugfixes
> for *those* on this list than cases where "I turned off nvidia and it started
> working".

The difference is that I can look at those, while I can't look at nvidia's
driver. Pretty simple.

2004-01-02 20:30:43

by Valdis Klētnieks

Subject: Re: XFS forced shutdown with kernel 2.6.0

On Fri, 02 Jan 2004 18:29:21 GMT, Christoph Hellwig said:

> I've seen the same bug a few times lately, but only if I had previous
> memory corruption due to code I was hacking on. Can you reproduce it
> without the nvidia module loaded, as that is a likely source of such
> corruption?

While you're at it, see what *else* you can turn off - RAID, devfs, NFS, etc.

It's equally likely that you're tripping over some other kernel module's
use-after-free or chase-the-wrong-pointer bug. I've seen a lot more bugfixes
for *those* on this list than cases where "I turned off nvidia and it started
working".

From the 2.6.1-rc1 release notes:

Jeff Garzik:
o [libata] fix use-after-free
Stephen Hemminger:
o [ROSE]: Fix use after free in socket destruction

Andrew Morton, on broken ieee1394:

> aargh, sorry. You need to revert
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.1-rc1/2.6.1-rc1-mm1/broken-out/sysfs-add-vc-class.patch
>
> This is the totally weird tty oops which Greg and I have been staring
> at bemusedly for a few days.

That's *this week* or so. Yes, I understand the political and/or realistic
reasons for refusing to look at tainted kernels, but let's face it guys, *our*
code is to blame more often than NVidia's. When was the last time there was a
*verified* report of "I turned the NVidia graphics module off and things
worked" that wasn't directly related to a graphics issue?
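
[Editorial sketch, not part of the original message.] For readers unfamiliar with the "tainted" term used above: the kernel exposes its taint state as an integer in /proc/sys/kernel/tainted, and bit 0 is set once a proprietary module has been loaded. The helper names below (has_proprietary_taint, read_taint) are hypothetical, and only bit 0 is decoded because it is the only flag this thread is concerned with.

```python
# Bit 0 of the taint value: a proprietary (non-GPL) module has been loaded.
TAINT_PROPRIETARY_MODULE = 1 << 0


def has_proprietary_taint(taint_value):
    """Return True if the proprietary-module bit is set in taint_value."""
    return bool(taint_value & TAINT_PROPRIETARY_MODULE)


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint value as an integer (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    try:
        value = read_taint()
    except OSError:
        print("taint value not available on this system")
    else:
        print("tainted by proprietary module:", has_proprietary_taint(value))
```

A value of 0 means no taint flag is set. Unloading a module does not clear the flag, so confirming a clean run requires a boot in which the module was never loaded.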



2004-03-29 16:18:27

by Christoph Hellwig

Subject: Re: XFS forced shutdown with kernel 2.6.0

Could you please stop the bitching? If you want your collection of
assorted buggy free and non-free kernel configs supported, buy a support
contract from someone who's interested in providing such support for you.

*plonk*