2008-02-12 01:02:24

by Linda Walsh

Subject: xfs [_fsr] probs in 2.6.24.0


I'm getting similar errors on an x86-32 & x86-64 kernel. The x86-64 system
(2nd log below w/date+times) was unusable this morning: one or more of the
xfs file systems had "gone off line" due to some unknown error (upon reboot,
no errors were indicated; all partitions on the same physical disk).

I keep ending up with random failures on two systems. The 32-bit sys more
often than not just "locks-up" -- no messages, no keyboard response...etc.
Nothing to do except reset -- and of course, nothing in the logs....

I'm turning on all the stats and diagnostics that don't seem to have a noted
performance penalty (or not much of one) to see if that helps. Perhaps one
issue is the "bug" (looks like multiple instances of the same bug) in xfs.

The 32-bit machine does lots of disk activity in early morning hours...and
coincidentally, it seemed to be crashing (not consistently) in the morning.
With 2.6.24, the 32-bit machine "seems" a bit more stable -- up for over 3
days now (average before was <48 hours). But the 64-bit machine went
bonkers (not sure of exact time) -- It *had* been stable (non crashing,
anyway) before 2.6.24.

I vaguely remember there being a similar xfs lock bug a few versions back
and was told just not to worry about it and to turn off the lock checking to
"avoid the problem"...( :^| ). So I did, but I'm still trying to track down
the randomness. So I turned back on what checks I could and ... ding -- still
a problem in the xfs area -- and "coincidentally" (this has happened on more
than one occasion), an xfs partition developed a run-time error and shut
itself down. This also happened on the 32-bit machine with the SATA disk (I
thought it might be SATA specific), so I removed its controller and disk and
threw in a same-size PATA. No more file-system errors on the 32-bit machine.
But this is the first time I've had a filesystem 'forced offline' on the
64-bit one (though I just switched it to 2.6.24 recently, for obvious
reasons).

Sadly -- it also seemed to be the case that the 32-bit machine, when it had
the SATA controller+disk installed, was more likely to crash if xfs_fsr was
running at the same time remote backups were being written to the SATA
drive. I couldn't repeat it reliably, though, so I'm not sure what's going
on there.

Both machines are SMP, so that's a potential factor, along with the fact
that both machines also use SCSI (the 64-bit one in SAS form). Neither is
running a graphical console; both are accessed remotely.

Lemme know if I can provide any info in addressing these problems. I don't
know if they are related to the other system crashes, but usage patterns
indicate it could be disk-activity related. So eliminating known bugs &
problems in that area might help (?)...

Suspect log messages follow (the 1st, w/o date/time, is from dmesg on the
32-bit machine; the latter is one on the 64-bit machine that went unnoticed
until I started looking through the logs for clues on last night's failures).


The first set of errors that I have some diagnostics for I could
only find in dmesg:
(ish-32bit machine)
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.24-ish #1
-------------------------------------------------------
xfs_fsr/2119 is trying to acquire lock:
(&mm->mmap_sem){----}, at: [<b01a3432>] dio_get_page+0x62/0x160

but task is already holding lock:
(&(&ip->i_iolock)->mr_lock){----}, at: [<b02651db>] xfs_ilock+0x5b/0xb0

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&(&ip->i_iolock)->mr_lock){----}:
[<b014c026>] __lock_acquire+0xce6/0x1150
[<b0265217>] xfs_ilock+0x97/0xb0
[<b014ae3f>] trace_hardirqs_on+0xaf/0x160
[<b014c4fa>] lock_acquire+0x6a/0x80
[<b0265217>] xfs_ilock+0x97/0xb0
[<b0140d63>] down_write_nested+0x43/0x80
[<b0265217>] xfs_ilock+0x97/0xb0
[<b0265217>] xfs_ilock+0x97/0xb0
[<b0286c61>] xfs_free_eofblocks+0x231/0x2e0
[<b028b137>] xfs_release+0x1a7/0x230
[<b029154b>] xfs_file_release+0xb/0x10
[<b017dc4c>] __fput+0xac/0x180
[<b016dc25>] remove_vma+0x35/0x50
[<b016e790>] do_munmap+0x190/0x200
[<b016e82f>] sys_munmap+0x2f/0x50
[<b0104396>] sysenter_past_esp+0x5f/0xa5
[<ffffffff>] 0xffffffff

-> #0 (&mm->mmap_sem){----}:
[<b0148df1>] print_circular_bug_entry+0x41/0x50
[<b014be19>] __lock_acquire+0xad9/0x1150
[<b014aca3>] mark_held_locks+0x43/0x80
[<b014c4fa>] lock_acquire+0x6a/0x80
[<b01a3432>] dio_get_page+0x62/0x160
[<b04675a8>] down_read+0x38/0x80
[<b01a3432>] dio_get_page+0x62/0x160
[<b01a3432>] dio_get_page+0x62/0x160
[<b02bd4c2>] __spin_lock_init+0x32/0x60
[<b01a4053>] __blockdev_direct_IO+0x483/0xc70
[<b017945b>] kmem_cache_alloc+0x9b/0xc0
[<b04684d4>] _spin_unlock+0x14/0x20
[<b014863d>] lockdep_init_map+0x3d/0x4c0
[<b028cd70>] xfs_alloc_ioend+0x120/0x170
[<b028cf1f>] xfs_vm_direct_IO+0x15f/0x170
[<b028d410>] xfs_get_blocks_direct+0x0/0x30
[<b028c880>] xfs_end_io_direct+0x0/0x90
[<b015db82>] generic_file_direct_IO+0xd2/0x160
[<b015dc6a>] generic_file_direct_write+0x5a/0x170
[<b0140db4>] up_write+0x14/0x30
[<b0295418>] xfs_write+0x3e8/0x920
[<b015c810>] file_read_actor+0x0/0x120
[<b02916ce>] xfs_file_aio_write+0x5e/0x70
[<b017c807>] do_sync_write+0xc7/0x110
[<b013ce80>] autoremove_wake_function+0x0/0x40
[<b01484a7>] lock_release_holdtime+0x47/0x70
[<b017d240>] vfs_read+0xd0/0x160
[<b017c740>] do_sync_write+0x0/0x110
[<b017d0b0>] vfs_write+0xa0/0x160
[<b017d7d1>] sys_write+0x41/0x70
[<b0104396>] sysenter_past_esp+0x5f/0xa5
[<ffffffff>] 0xffffffff

other info that might help us debug this:

1 lock held by xfs_fsr/2119:
#0: (&(&ip->i_iolock)->mr_lock){----}, at: [<b02651db>]
xfs_ilock+0x5b/0xb0

stack backtrace:
Pid: 2119, comm: xfs_fsr Not tainted 2.6.24-ish #1
[<b01499da>] print_circular_bug_tail+0x7a/0x90
[<b014be19>] __lock_acquire+0xad9/0x1150
[<b014aca3>] mark_held_locks+0x43/0x80
[<b014c4fa>] lock_acquire+0x6a/0x80
[<b01a3432>] dio_get_page+0x62/0x160
[<b04675a8>] down_read+0x38/0x80
[<b01a3432>] dio_get_page+0x62/0x160
[<b01a3432>] dio_get_page+0x62/0x160
[<b02bd4c2>] __spin_lock_init+0x32/0x60
[<b01a4053>] __blockdev_direct_IO+0x483/0xc70
[<b017945b>] kmem_cache_alloc+0x9b/0xc0
[<b04684d4>] _spin_unlock+0x14/0x20
[<b014863d>] lockdep_init_map+0x3d/0x4c0
[<b028cd70>] xfs_alloc_ioend+0x120/0x170
[<b028cf1f>] xfs_vm_direct_IO+0x15f/0x170
[<b028d410>] xfs_get_blocks_direct+0x0/0x30
[<b028c880>] xfs_end_io_direct+0x0/0x90
[<b015db82>] generic_file_direct_IO+0xd2/0x160
[<b015dc6a>] generic_file_direct_write+0x5a/0x170
[<b0140db4>] up_write+0x14/0x30
[<b0295418>] xfs_write+0x3e8/0x920
[<b015c810>] file_read_actor+0x0/0x120
[<b02916ce>] xfs_file_aio_write+0x5e/0x70
[<b017c807>] do_sync_write+0xc7/0x110
[<b013ce80>] autoremove_wake_function+0x0/0x40
[<b01484a7>] lock_release_holdtime+0x47/0x70
[<b017d240>] vfs_read+0xd0/0x160
[<b017c740>] do_sync_write+0x0/0x110
[<b017d0b0>] vfs_write+0xa0/0x160
[<b017d7d1>] sys_write+0x41/0x70
[<b0104396>] sysenter_past_esp+0x5f/0xa5
=======================

--------------
The other system shows a similar message in its logs:
Feb 7 02:01:50 kern:
=======================================================
Feb 7 02:01:50 kern: [ INFO: possible circular locking dependency
detected ]
Feb 7 02:01:50 kern: 2.6.24-asa64def #3
Feb 7 02:01:50 kern:
-------------------------------------------------------
Feb 7 02:01:50 kern: xfs_fsr/6313 is trying to acquire lock:
Feb 7 02:01:50 kern: (&(&ip->i_lock)->mr_lock/2){----}, at:
[<ffffffff803c1b22>] xfs_ilock+0x82/0xc0
Feb 7 02:01:50 kern:
Feb 7 02:01:50 kern: but task is already holding lock:
Feb 7 02:01:50 kern: (&(&ip->i_iolock)->mr_lock/3){--..}, at:
[<ffffffff803c1b45>] xfs_ilock+0xa5/0xc0
Feb 7 02:01:50 kern:
Feb 7 02:01:50 kern: which lock already depends on the new lock.
Feb 7 02:01:50 kern:
Feb 7 02:01:50 kern:
Feb 7 02:01:50 kern: the existing dependency chain (in reverse order) is:
Feb 7 02:01:50 kern:
Feb 7 02:01:50 kern: -> #1 (&(&ip->i_iolock)->mr_lock/3){--..}:
Feb 7 02:01:50 kern: [<ffffffff80261d39>]
__lock_acquire+0xc39/0x1090
Feb 7 02:01:50 kern: [<ffffffff802621f1>] lock_acquire+0x61/0x80
Feb 7 02:01:50 kern: [<ffffffff803c1b45>] xfs_ilock+0xa5/0xc0
Feb 7 02:01:50 kern: [<ffffffff802572ba>]
down_write_nested+0x3a/0x80
Feb 7 02:01:50 kern: [<ffffffff803c1b45>] xfs_ilock+0xa5/0xc0
Feb 7 02:01:50 kern: [<ffffffff803e0a33>]
xfs_lock_inodes+0x143/0x190
Feb 7 02:01:50 kern: [<ffffffff803cbc81>]
xfs_swap_extents+0xa1/0x630
Feb 7 02:01:50 kern: [<ffffffff8025df35>]
lock_release_holdtime+0x45/0x70
Feb 7 02:01:50 kern: [<ffffffff803cbbb1>] xfs_swapext+0x131/0x160
Feb 7 02:01:50 kern: [<ffffffff803edace>] xfs_ioctl+0x5ce/0x740
Feb 7 02:01:50 kern: [<ffffffff8025df35>]
lock_release_holdtime+0x45/0x70
Feb 7 02:01:50 kern: [<ffffffff80626eba>]
__mutex_unlock_slowpath+0xca/0x1a0
Feb 7 02:01:50 kern: [<ffffffff803eb9b6>]
xfs_file_ioctl_invis+0x36/0x80
Feb 7 02:01:50 kern: [<ffffffff802a9dcf>] do_ioctl+0x2f/0xa0
Feb 7 02:01:50 kern: [<ffffffff802aa070>] vfs_ioctl+0x230/0x2d0
Feb 7 02:01:50 kern: [<ffffffff80260bb1>]
trace_hardirqs_on+0xc1/0x160
Feb 7 02:01:50 kern: [<ffffffff802aa159>] sys_ioctl+0x49/0x80
Feb 7 02:01:51 kern: [<ffffffff8020c3ce>] system_call+0x7e/0x83
Feb 7 02:01:51 kern: [<ffffffffffffffff>] 0xffffffffffffffff
Feb 7 02:01:51 kern:
Feb 7 02:01:51 kern: -> #0 (&(&ip->i_lock)->mr_lock/2){----}:
Feb 7 02:01:51 kern: [<ffffffff80260018>]
print_circular_bug_header+0xe8/0xf0
Feb 7 02:01:51 kern: [<ffffffff80261ba8>]
__lock_acquire+0xaa8/0x1090
Feb 7 02:01:51 kern: [<ffffffff802621f1>] lock_acquire+0x61/0x80
Feb 7 02:01:51 kern: [<ffffffff803c1b22>] xfs_ilock+0x82/0xc0
Feb 7 02:01:51 kern: [<ffffffff80628597>] _spin_unlock+0x17/0x20
Feb 7 02:01:51 kern: [<ffffffff802572ba>]
down_write_nested+0x3a/0x80
Feb 7 02:01:51 kern: [<ffffffff803c1b22>] xfs_ilock+0x82/0xc0
Feb 7 02:01:51 kern: [<ffffffff803e0a33>]
xfs_lock_inodes+0x143/0x190
Feb 7 02:01:51 kern: [<ffffffff803cbefe>]
xfs_swap_extents+0x31e/0x630
Feb 7 02:01:51 kern: [<ffffffff803cbbb1>] xfs_swapext+0x131/0x160
Feb 7 02:01:51 kern: [<ffffffff803edace>] xfs_ioctl+0x5ce/0x740
Feb 7 02:01:51 kern: [<ffffffff8025df35>]
lock_release_holdtime+0x45/0x70
Feb 7 02:01:51 kern: [<ffffffff80626eba>]
__mutex_unlock_slowpath+0xca/0x1a0
Feb 7 02:01:51 kern: [<ffffffff803eb9b6>]
xfs_file_ioctl_invis+0x36/0x80
Feb 7 02:01:51 kern: [<ffffffff802a9dcf>] do_ioctl+0x2f/0xa0
Feb 7 02:01:51 kern: [<ffffffff802aa070>] vfs_ioctl+0x230/0x2d0
Feb 7 02:01:51 kern: [<ffffffff80260bb1>]
trace_hardirqs_on+0xc1/0x160
Feb 7 02:01:51 kern: [<ffffffff802aa159>] sys_ioctl+0x49/0x80
Feb 7 02:01:51 kern: [<ffffffff8020c3ce>] system_call+0x7e/0x83
Feb 7 02:01:51 kern: [<ffffffffffffffff>] 0xffffffffffffffff
Feb 7 02:01:51 kern:
Feb 7 02:01:51 kern: other info that might help us debug this:
Feb 7 02:01:51 kern:
Feb 7 02:01:51 kern: 2 locks held by xfs_fsr/6313:
Feb 7 02:01:51 kern: #0: (&(&ip->i_iolock)->mr_lock/2){--..}, at:
[<ffffffff803c1b45>] xfs_ilock+0xa5/0xc0
Feb 7 02:01:51 kern: #1: (&(&ip->i_iolock)->mr_lock/3){--..}, at:
[<ffffffff803c1b45>] xfs_ilock+0xa5/0xc0
Feb 7 02:01:51 kern:
Feb 7 02:01:51 kern: stack backtrace:
Feb 7 02:01:51 kern: Pid: 6313, comm: xfs_fsr Not tainted
2.6.24-asa64def #3
Feb 7 02:01:51 kern:
Feb 7 02:01:51 kern: Call Trace:
Feb 7 02:01:51 kern: [<ffffffff8025f413>]
print_circular_bug_tail+0x83/0x90
Feb 7 02:01:51 kern: [<ffffffff80260018>]
print_circular_bug_header+0xe8/0xf0
Feb 7 02:01:51 kern: [<ffffffff80261ba8>] __lock_acquire+0xaa8/0x1090
Feb 7 02:01:51 kern: [<ffffffff802621f1>] lock_acquire+0x61/0x80
Feb 7 02:01:51 kern: [<ffffffff803c1b22>] xfs_ilock+0x82/0xc0
Feb 7 02:01:51 kern: [<ffffffff80628597>] _spin_unlock+0x17/0x20
Feb 7 02:01:51 kern: [<ffffffff802572ba>] down_write_nested+0x3a/0x80
Feb 7 02:01:51 kern: [<ffffffff803c1b22>] xfs_ilock+0x82/0xc0
Feb 7 02:01:51 kern: [<ffffffff803e0a33>] xfs_lock_inodes+0x143/0x190
Feb 7 02:01:51 kern: [<ffffffff803cbefe>] xfs_swap_extents+0x31e/0x630
Feb 7 02:01:51 kern: [<ffffffff803cbbb1>] xfs_swapext+0x131/0x160
Feb 7 02:01:51 kern: [<ffffffff803edace>] xfs_ioctl+0x5ce/0x740
Feb 7 02:01:51 kern: [<ffffffff8025df35>] lock_release_holdtime+0x45/0x70
Feb 7 02:01:51 kern: [<ffffffff80626eba>]
__mutex_unlock_slowpath+0xca/0x1a0
[<ffffffff803eb9b6>] xfs_file_ioctl_invis+0x36/0x80
Feb 7 02:01:51 kern: [<ffffffff802a9dcf>] do_ioctl+0x2f/0xa0
Feb 7 02:01:51 kern: [<ffffffff802aa070>] vfs_ioctl+0x230/0x2d0
Feb 7 02:01:51 kern: [<ffffffff80260bb1>] trace_hardirqs_on+0xc1/0x160
Feb 7 02:01:51 kern: [<ffffffff802aa159>] sys_ioctl+0x49/0x80
Feb 7 02:01:51 kern: [<ffffffff8020c3ce>] system_call+0x7e/0x83



2008-02-12 08:58:25

by David Chinner

Subject: Re: xfs [_fsr] probs in 2.6.24.0

On Mon, Feb 11, 2008 at 05:02:05PM -0800, Linda Walsh wrote:
>
> I'm getting similar errors on an x86-32 & x86-64 kernel. The x86-64 system
> (2nd log below w/date+times) was unusable this morning: one or more of the
> xfs file systems had "gone off line" due to some unknown error (upon reboot,
> no errors were indicated; all partitions on the same physical disk).
>
> I keep ending up with random failures on two systems. The 32-bit sys more
> often than not just "locks-up" -- no messages, no keyboard response...etc.
> Nothing to do except reset -- and of course, nothing in the logs....

Filesystem bugs rarely hang systems hard like that - more likely is
a hardware or driver problem. And neither of the lockdep reports
below are likely to be responsible for a system wide, no-response
hang.

[cut a bunch of speculation and stuff about hardware problems
causing XFS problems]

> Lemme know if I can provide any info in addressing these problems. I don't
> know if they are related to the other system crashes, but usage patterns
> indicate it could be disk-activity related. So eliminating known bugs &
> problems in that area might help (?)...

If your hardware or drivers are unstable, then XFS cannot be
expected to reliably work. Given that xfs_fsr apparently triggers
the hangs, I'd suggest putting lots of I/O load on your disk subsystem
by copying files around with direct I/O (just like xfs_fsr does) to
try to reproduce the problem.

Perhaps by running xfs_fsr manually you could reproduce the
problem while you are sitting in front of the machine...
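
For instance, a bare-bones O_DIRECT copier along these lines would generate
the same kind of load. This is only a sketch under assumptions -- the buffer
size and alignment are arbitrary, the file names are placeholders, and a file
whose size isn't a multiple of the block size would need its tail copied
without O_DIRECT. Running a few of these in parallel on multi-gigabyte files
on the suspect filesystem should approximate what xfs_fsr does:

#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define ALIGN 4096                      /* buffer alignment for O_DIRECT */
#define BUFSZ (4 * 1024 * 1024)         /* 4MB per read/write */

int main(int argc, char **argv)
{
        void *buf;
        ssize_t n;
        int in, out;

        if (argc != 3) {
                fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
                return 1;
        }
        if (posix_memalign(&buf, ALIGN, BUFSZ)) {
                perror("posix_memalign");
                return 1;
        }
        in = open(argv[1], O_RDONLY | O_DIRECT);
        out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
        if (in < 0 || out < 0) {
                perror("open");
                return 1;
        }
        while ((n = read(in, buf, BUFSZ)) > 0) {
                if (write(out, buf, n) != n) {
                        perror("write");        /* unaligned tail, ENOSPC, ... */
                        return 1;
                }
        }
        if (n < 0)
                perror("read");
        close(out);
        close(in);
        free(buf);
        return 0;
}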

Looking at the lockdep reports:

> The first set of errors that I have some diagnostics for I could
> only find in dmesg:
> (ish-32bit machine)
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.24-ish #1
> -------------------------------------------------------
> xfs_fsr/2119 is trying to acquire lock:
> (&mm->mmap_sem){----}, at: [<b01a3432>] dio_get_page+0x62/0x160
>
> but task is already holding lock:
> (&(&ip->i_iolock)->mr_lock){----}, at: [<b02651db>] xfs_ilock+0x5b/0xb0

dio_get_page() takes the mmap_sem of the process's
vma that has the pages we do I/O into. That's not new.
We're holding the xfs inode iolock at this point to protect
against truncate and simultaneous buffered I/O races and
this is also unchanged. i.e. this is normal.

> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:

munmap() dropping the last reference to its vm_file and
calling ->release() which causes a truncate of speculatively
allocated space to take place. IOWs, ->release() is called
with the mmap_sem held. Hmmm....

Looking at it in terms of i_mutex, other filesystems hold
i_mutex over dio_get_page() (all those that use DIO_LOCKING)
so the question is whether we are allowed to take the i_mutex
in ->release. I note that both reiserfs and hfsplus take
i_mutex in ->release as well as use DIO_LOCKING, so this
problem is not isolated to XFS.

However, it would appear that mmap_sem -> i_mutex is illegal
according to the comment at the head of mm/filemap.c. While we are
not using i_mutex in this case, the inversion would seem to be
equivalent in nature.

There's not going to be a quick fix for this.

And the other one:

> Feb 7 02:01:50 kern:
> -------------------------------------------------------
> Feb 7 02:01:50 kern: xfs_fsr/6313 is trying to acquire lock:
> Feb 7 02:01:50 kern: (&(&ip->i_lock)->mr_lock/2){----}, at:
> [<ffffffff803c1b22>] xfs_ilock+0x82/0xc0
> Feb 7 02:01:50 kern:
> Feb 7 02:01:50 kern: but task is already holding lock:
> Feb 7 02:01:50 kern: (&(&ip->i_iolock)->mr_lock/3){--..}, at:
> [<ffffffff803c1b45>] xfs_ilock+0xa5/0xc0
> Feb 7 02:01:50 kern:
> Feb 7 02:01:50 kern: which lock already depends on the new lock.

Looks like yet another false positive. Basically we do this
in xfs_swap_extents:

inode A: i_iolock class 2
inode A: i_ilock class 2
inode B: i_iolock class 3
inode B: i_ilock class 3
.....
inode A: unlock ilock
inode B: unlock ilock
.....
>>>>> inode A: ilock class 2
inode B: ilock class 3

And lockdep appears to be complaining about the relocking of inode A
as class 2 because we've got a class 3 iolock still held, hence
violating the order it saw initially. There's no possible deadlock
here so we'll just have to add more hacks to the annotation code to make
lockdep happy.
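
Written out with plain rwsems -- illustration only, this is not the XFS
source, the rwsems are assumed already initialised, and the subclass numbers
just mirror the /2 and /3 annotations in the report -- the sequence lockdep
is objecting to looks roughly like:

#include <linux/rwsem.h>

struct demo_inode {
        struct rw_semaphore iolock;
        struct rw_semaphore ilock;
};

/* inode A is annotated as lockdep subclass 2, inode B as subclass 3 */
static void swap_extents_lock_order(struct demo_inode *a, struct demo_inode *b)
{
        down_write_nested(&a->iolock, 2);
        down_write_nested(&a->ilock, 2);
        down_write_nested(&b->iolock, 3);
        down_write_nested(&b->ilock, 3);

        /* drop just the ilocks around the transaction reservation */
        up_write(&a->ilock);
        up_write(&b->ilock);

        /*
         * Retaking A's subclass-2 ilock here, while B's subclass-3 iolock
         * is still held, is what lockdep flags: it recorded ilock/2 being
         * taken before iolock/3 above and now sees the reverse, even
         * though iolock-before-ilock on any one inode is never violated.
         */
        down_write_nested(&a->ilock, 2);
        down_write_nested(&b->ilock, 3);

        up_write(&b->ilock);
        up_write(&b->iolock);
        up_write(&a->ilock);
        up_write(&a->iolock);
}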

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2008-02-12 21:02:28

by Linda Walsh

Subject: Re: xfs [_fsr] probs in 2.6.24.0



David Chinner wrote:
> Filesystem bugs rarely hang systems hard like that - more likely is
> a hardware or driver problem. And neither of the lockdep reports
> below are likely to be responsible for a system wide, no-response
> hang.
---
"Ish", the 32-bitter, has been the only hard-hanger. Since
upgrading to 2.6.24, it's crashed once, inexplicably, but has since
stayed up longer than it has since I started with the whole SATA
fiasco (which I intend to inflict upon myself again, as soon as I
get back to a "stable" config -- masochistic nature I suppose).

> If your hardware or drivers are unstable, then XFS cannot be
> expected to reliably work. Given that xfs_fsr apparently triggers
> the hangs, I'd suggest putting lots of I/O load on your disk subsystem
> by copying files around with direct I/O (just like xfs_fsr does) to
> try to reproduce the problem.
---
The hardware drivers in ish are the older PATA drivers --
nothing new...'cept I did add the tickless option for the system clock.
I've only been running XFS on this system (mostly the same hardware,
disks upgraded) for about 6-7 years.

> Perhaps by running xfs_fsr manually you could reproduce the
> problem while you are sitting in front of the machine...
----
Um...yeah, AND with multiple "cp"s of multi-gig files
going on at the same time, both locally, by a sister machine via NFS,
and by a 3rd machine tapping (not banging) away via CIFS.
These were on top of normal server duties. Whenever I stress
it on *purpose* and watch it, it works fine. GRRRRRRR....I HATE
THAT!!!


>> xfs_fsr/2119 is trying to acquire lock:
>> (&mm->mmap_sem){----}, at: [<b01a3432>] dio_get_page+0x62/0x160
>>
>> but task is already holding lock:
>> (&(&ip->i_iolock)->mr_lock){----}, at: [<b02651db>] xfs_ilock+0x5b/0xb0
>
> dio_get_page() takes the mmap_sem of the process's
> vma that has the pages we do I/O into. That's not new.
> We're holding the xfs inode iolock at this point to protect
> against truncate and simultaneous buffered I/O races and
> this is also unchanged. i.e. this is normal.
---
Uh huh...please note I'm not trying to point fingers at
xfs_fsr, but the locking diagnostics associated with xfs_fsr have
been the only "hint" of anything "irregular" -- at least, that is,
since I removed the SATA controller+disk on 'ish32'.

The file system(s) going "offline" due
to xfs-detected filesystem errors has only happened *once* on
asa, the 64-bit machine. It's a fairly new machine w/o added
hardware -- but this only happened in 2.6.24.0 when I added the
tickless clock option, which sure seems like a remote possibility for
causing an xfs error, but could be. A 3rd linux system, hardware-poor
"ast-32", was up over 20 days on 2.6.23.14 (w/tickless) before I
took it down for a 2.6.24.2 kernel install (its single 20G disk is so
old that it doesn't support barriers).


>
>> which lock already depends on the new lock.
>> the existing dependency chain (in reverse order) is:
>
> munmap() dropping the last reference to its vm_file and
> calling ->release() which causes a truncate of speculatively
> allocated space to take place. IOWs, ->release() is called
> with the mmap_sem held. Hmmm....
>
> Looking at it in terms of i_mutex, other filesystems hold
> i_mutex over dio_get_page() (all those that use DIO_LOCKING)
> so the question is whether we are allowed to take the i_mutex
> in ->release. I note that both reiserfs and hfsplus take
> i_mutex in ->release as well as use DIO_LOCKING, so this
> problem is not isolated to XFS.
>
> However, it would appear that mmap_sem -> i_mutex is illegal
> according to the comment at the head of mm/filemap.c. While we are
> not using i_mutex in this case, the inversion would seem to be
> equivalent in nature.
>
> There's not going to be a quick fix for this.
----
What could be the consequences of this locking anomaly?
I.e., for example, in NFS, I have enabled "allow direct I/O on NFS
files". The times when the system has been unstable would be around
the time when the local machine might be running xfs_fsr while
a remote system is using NFS to write its backups. The exact
timing of things depends on the dump-level and internet-'book-keeping'
work done on the local system which adds an element of uncertainty
as to whether or not xfs_fsr might be running at the same time
NFS might be doing direct I/O.

It's also possible for a local backup to be writing to a backup disk
at the same time xfs_fsr is running, since they trigger off of
different cron entries (xfs_fsr off of cron.daily, which runs
"whenever", while the backups run at mostly fixed times).
The local backup uses xfsdump (which might use some direct I/O to read?),
but the writes go through compression and are likely using buffered
I/O.



>
> And the other one:
>
>> Feb 7 02:01:50 kern:
>> -------------------------------------------------------
>> Feb 7 02:01:50 kern: xfs_fsr/6313 is trying to acquire lock:
>> Feb 7 02:01:50 kern: (&(&ip->i_lock)->mr_lock/2){----}, at:
>> [<ffffffff803c1b22>] xfs_ilock+0x82/0xc0
>> Feb 7 02:01:50 kern:
>> Feb 7 02:01:50 kern: but task is already holding lock:
>> Feb 7 02:01:50 kern: (&(&ip->i_iolock)->mr_lock/3){--..}, at:
>> [<ffffffff803c1b45>] xfs_ilock+0xa5/0xc0
>> Feb 7 02:01:50 kern:
>> Feb 7 02:01:50 kern: which lock already depends on the new lock.
>
> Looks like yet another false positive. Basically we do this
> in xfs_swap_extents:
>
> inode A: i_iolock class 2
> inode A: i_ilock class 2
> inode B: i_iolock class 3
> inode B: i_ilock class 3
> .....
> inode A: unlock ilock
> inode B: unlock ilock
> .....
>>>>>> inode A: ilock class 2
> inode B: ilock class 3
>
> And lockdep appears to be complaining about the relocking of inode A
> as class 2 because we've got a class 3 iolock still held, hence
> violating the order it saw initially. There's no possible deadlock
> here so we'll just have to add more hacks to the annotation code to make
> lockdep happy.
----
Is there a reason to unlock and relock the same inode while
the level 3 lock is held -- i.e. does 'unlocking ilock' allow some
increased 'throughput' for some other potential process to access
the same inode? I'd expect not, if the 'iolock' is held, but just
a question. I certainly don't understand the exact effects of
the various locks in question, but it seems that the 2nd two groups
where the inodes are unlocked and relocked are superfluous if
an iolock for those inodes remains held. But again, I don't really
know what the locks are doing, so don't know.

Sorry for all the bother. Just trying to figure out why a
system that was rock-solid (2-3 month uptimes, easily, with only planned
downs) went all flaky on me when I tried to add SATA and upgraded the
kernel to include the latest SATA code & drivers. Unfortunately, part of
that was adding udev in place of a static /dev, so that's another
unknown that I know is flaky at times (I had a SATA sdb disk go offline
with a supposed HW-reset error, then had it come back online as
"sdc"!). That's certainly a bit weird from my perspective, but hey,
some might consider it a feature, so who am I to argue....:-)


2008-02-12 21:28:36

by Eric Sandeen

Subject: Re: xfs [_fsr] probs in 2.6.24.0

Linda Walsh wrote:
>
> David Chinner wrote:
>> Filesystem bugs rarely hang systems hard like that - more likely is
>> a hardware or driver problem. And neither of the lockdep reports
>> below are likely to be responsible for a system wide, no-response
>> hang.
> ---
> "Ish", the 32-bitter, has been the only hard-hanger.

4k stacks?

-Eric

2008-02-12 21:44:50

by Linda Walsh

Subject: Re: xfs [_fsr] probs in 2.6.24.0



Eric Sandeen wrote:
> Linda Walsh wrote:
>> David Chinner wrote:
>>> Filesystem bugs rarely hang systems hard like that - more likely is
>>> a hardware or driver problem. And neither of the lockdep reports
>>> below are likely to be responsible for a system wide, no-response
>>> hang.
>> ---
>> "Ish", the 32-bitter, has been the only hard-hanger.
>
> 4k stacks?
----
But but but...I've been using 4k stacks almost from the day they were
introduced, and these are more recent probs. Has stack usage increased for
some reason? :-( I do have the option to detect stack-overflow turned on as
well -- guess it doesn't work so well?

If they are that problematic, maybe selecting xfs as a config
option should force 8k stacks (an ugly solution, but it might eliminate
some lost hair (from pulling it out) for end users)....?

Guess I should go back to 8k's for now...seems odd that it'd
pop up now, but maybe it's the xtra NFS loading? Sigh.

2008-02-12 21:56:55

by Eric Sandeen

Subject: Re: xfs [_fsr] probs in 2.6.24.0

Linda Walsh wrote:
>
> Eric Sandeen wrote:
>> Linda Walsh wrote:
>>> David Chinner wrote:
>>>> Filesystem bugs rarely hang systems hard like that - more likely is
>>>> a hardware or driver problem. And neither of the lockdep reports
>>>> below are likely to be responsible for a system wide, no-response
>>>> hang.
>>> ---
>>> "Ish", the 32-bitter, has been the only hard-hanger.
>> 4k stacks?
> ----
> But but but...almost from the day they were introduced. And
> these are more recent probs. Has stack usage increased for some reason,
> :-(. I do have the option to detect stack-overflow turned on as well
> -- guess it doesn't work so well?

Resource requirements grow over time, film at 11? :)

The checker is a random thing; it checks only on interrupts, so it won't
always hit. You could try CONFIG_DEBUG_STACK_USAGE too -- each thread
prints its max stack used when it exits -- to see if you're getting close on
normal usage.
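
For reference, the relevant .config knobs on i386 (under "Kernel hacking")
would be something like:

# CONFIG_4KSTACKS is not set
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_DEBUG_STACKOVERFLOW=y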

Or just use 8k.

-Eric

2008-02-12 21:59:29

by David Chinner

Subject: Re: xfs [_fsr] probs in 2.6.24.0

On Tue, Feb 12, 2008 at 01:02:05PM -0800, Linda Walsh wrote:
> David Chinner wrote:
> >Perhaps by running xfs_fsr manually you could reproduce the
> >problem while you are sitting in front of the machine...
> ----
> Um...yeah, AND with multiple "cp"s of multi-gig files
> going on at the same time, both locally, by a sister machine via NFS,
> and by a 3rd machine tapping (not banging) away via CIFS.
> These were on top of normal server duties. Whenever I stress
> it on *purpose* and watch it, it works fine. GRRRRRRR....I HATE
> THAT!!!

I feel your pain.

> The file system(s) going "offline" due
> to xfs-detected filesystem errors has only happened *once* on
> asa, the 64-bit machine. It's a fairly new machine w/o added
> hardware -- but this only happened in 2.6.24.0 when I added the
> tickless clock option, which sure seems like a remote possibility for
> causing an xfs error, but could be.

Well, tickless is new and shiny and I doubt anyone has done
much testing with XFS on tickless kernels. Still, if that's a new
config option you set, change it back to what you had for .23 on
that hardware and try again.

> >Looking at it in terms of i_mutex, other filesystems hold
> >i_mutex over dio_get_page() (all those that use DIO_LOCKING)
> >so the question is whether we are allowed to take the i_mutex
> >in ->release. I note that both reiserfs and hfsplus take
> >i_mutex in ->release as well as use DIO_LOCKING, so this
> >problem is not isolated to XFS.
> >
> >However, it would appear that mmap_sem -> i_mutex is illegal
> >according to the comment at the head of mm/filemap.c. While we are
> >not using i_mutex in this case, the inversion would seem to be
> >equivalent in nature.
> >
> >There's not going to be a quick fix for this.
> ----
> What could be the consequences of this locking anomaly?

If you have a multithreaded application that mixes mmap and
direct I/O, and you have a simultaneous munmap() call and
read() to the same file, you might be able to deadlock access
to that file. However, you'd have to be certifiably insane
to write an application that did this (mix mmap and direct I/O
to the same file at the same time), so I think exposure is
pretty limited.

> I.e., for example, in NFS, I have enabled "allow direct I/O on NFS
> files".

That's client side direct I/O, which is not what the server
does. Client side direct I/O results in synchronous buffered
I/O on the server, which will thrash your disks pretty hard.
The config option help does warn you about this. ;)

> >And the other one:
> >
> >>Feb 7 02:01:50 kern:
> >>-------------------------------------------------------
> >>Feb 7 02:01:50 kern: xfs_fsr/6313 is trying to acquire lock:
> >>Feb 7 02:01:50 kern: (&(&ip->i_lock)->mr_lock/2){----}, at:
> >>[<ffffffff803c1b22>] xfs_ilock+0x82/0xc0
> >>Feb 7 02:01:50 kern:
> >>Feb 7 02:01:50 kern: but task is already holding lock:
> >>Feb 7 02:01:50 kern: (&(&ip->i_iolock)->mr_lock/3){--..}, at:
> >>[<ffffffff803c1b45>] xfs_ilock+0xa5/0xc0
> >>Feb 7 02:01:50 kern:
> >>Feb 7 02:01:50 kern: which lock already depends on the new lock.
> >
> >Looks like yet another false positive. Basically we do this
> >in xfs_swap_extents:
> >
> > inode A: i_iolock class 2
> > inode A: i_ilock class 2
> > inode B: i_iolock class 3
> > inode B: i_ilock class 3
> > .....
> > inode A: unlock ilock
> > inode B: unlock ilock
> > .....
> >>>>>> inode A: ilock class 2
> > inode B: ilock class 3
> >
> >And lockdep appears to be complaining about the relocking of inode A
> >as class 2 because we've got a class 3 iolock still held, hence
> >violating the order it saw initially. There's no possible deadlock
> >here so we'll just have to add more hacks to the annotation code to make
> >lockdep happy.
> ----
> Is there a reason to unlock and relock the same inode while
> the level 3 lock is held -- i.e. does 'unlocking ilock' allow some
> increased 'throughput' for some other potential process to access
> the same inode?

It prevents a single thread deadlock when doing transaction reservation.
i.e. the process of setting up a transaction can require the ilock
to be taken, and hence we have to drop it before and pick it back up
after the transaction reservation.

We hold on to the iolock to prevent the inode from having new I/O
started while we do the transaction reservation, so it's in the
same state after the reservation as it was before....

We have to hold both locks to guarantee exclusive access to the
inode, so once we have the reservation we need to pick the ilocks
back up. The way we do it here does not violate lock ordering at all
(iolock before ilock on a single inode, and ascending inode number
order for multiple inodes), but lockdep is not smart enough to know
that. Hence we need more complex annotations to shut it up.
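
In (very) simplified form the pattern is the following -- a sketch only, not
the actual xfs_swap_extents() code; ip_a/ip_b are just placeholders for the
two inodes, and the lockdep subclass annotations and error handling are left
out:

        /* take both locks on both inodes, in ascending inode order */
        xfs_ilock(ip_a, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL);
        xfs_ilock(ip_b, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL);

        /*
         * Drop only the ilocks: the transaction reservation may need to
         * take an ilock itself, and the iolocks we keep held stop any new
         * I/O from starting on either inode in the meantime.
         */
        xfs_iunlock(ip_a, XFS_ILOCK_EXCL);
        xfs_iunlock(ip_b, XFS_ILOCK_EXCL);

        /* ... transaction allocation + reservation happens here ... */

        /* retake the ilocks in the same order: A before B, and always
         * iolock before ilock on any one inode */
        xfs_ilock(ip_a, XFS_ILOCK_EXCL);
        xfs_ilock(ip_b, XFS_ILOCK_EXCL);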

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group