Hello,
as the subject says it's some time[0] I'm experiencing deadlocks[1] (I'm
only tracking -mm, and sporadically using the stable series). I have a
couple of use cases that seem to reliably trigger the deadlock, namely
using Eclipse and Firefox.
[0]: I'd say at least since 2.6.17-rc-something
[1]: no sysrq, no logs, no nothing, netconsole doesn't log anything
more, just screwed FS which grub can't read (Inconsistent
filesystem - press any key to continue)
I decided to try out the LOCKDEP stuff to see if I can provide more
useful info, and here we go, as soon as my /home (jfs) partition is
written (at login):
[ 72.420000]
[ 72.420000] =============================================
[ 72.420000] [ INFO: possible recursive locking detected ]
[ 72.420000] 2.6.18-rc5-mm1-2-lockdep #9
[ 72.420000] ---------------------------------------------
[ 72.420000] zsh/3844 is trying to acquire lock:
[ 72.420000] (&jfs_ip->commit_mutex){--..}, at: [<c0320948>] mutex_lock+0x8/0x10
[ 72.420000]
[ 72.420000] but task is already holding lock:
[ 72.420000] (&jfs_ip->commit_mutex){--..}, at: [<c0320948>] mutex_lock+0x8/0x10
[ 72.420000]
[ 72.420000] other info that might help us debug this:
[ 72.420000] 2 locks held by zsh/3844:
[ 72.420000] #0: (&inode->i_mutex){--..}, at: [<c0320948>] mutex_lock+0x8/0x10
[ 72.420000] #1: (&jfs_ip->commit_mutex){--..}, at: [<c0320948>] mutex_lock+0x8/0x10
[ 72.420000]
[ 72.420000] stack backtrace:
[ 72.420000] [<c0103b8f>] dump_trace+0x1ef/0x230
[ 72.420000] [<c0103bf6>] show_trace_log_lvl+0x26/0x40
[ 72.420000] [<c01043cb>] show_trace+0x1b/0x20
[ 72.420000] [<c01044b4>] dump_stack+0x24/0x30
[ 72.420000] [<c0136050>] __lock_acquire+0x8e0/0xd80
[ 72.420000] [<c0136869>] lock_acquire+0x69/0x90
[ 72.420000] [<c0320705>] __mutex_lock_slowpath+0x75/0x2b0
[ 72.420000] [<c0320948>] mutex_lock+0x8/0x10
[ 72.420000] [<d137950b>] jfs_create+0xbb/0x420 [jfs]
[ 72.420000] [<c016cb7b>] vfs_create+0xcb/0x120
[ 72.420000] [<c0170158>] open_namei+0x618/0x6f0
[ 72.420000] [<c0162368>] do_filp_open+0x38/0x60
[ 72.420000] [<c01623db>] do_sys_open+0x4b/0xf0
[ 72.420000] [<c01624d7>] sys_open+0x27/0x30
[ 72.420000] [<c01032af>] syscall_call+0x7/0xb
[ 72.420000] [<b7ec8b8d>] 0xb7ec8b8d
[ 72.420000] =======================
I suspect JFS is guilty, anyway my HD has these partitions:
/dev/hda1 on / type reiserfs (rw)
/dev/hda3 on /usr type reiserfs (rw)
/dev/hda5 on /home type jfs (rw)
bootlog: http://oioio.altervista.org/linux/dmesg-2.6.18-rc5-mm1-lockdep
config: http://oioio.altervista.org/linux/config-2.6.18-rc5-mm1-lockdep
--
mattia
:wq!
I meant to reply to this earlier. I've had a lot of distractions.
On Tue, 2006-09-05 at 22:33 +0200, Mattia Dongili wrote:
> Hello,
>
> as the subject says it's some time[0] I'm experiencing deadlocks[1] (I'm
> only tracking -mm, and sporadically using the stable series). I have a
> couple of use cases that seem to reliably trigger the deadlock, namely
> using Eclipse and Firefox.
>
> [0]: I'd say at least since 2.6.17-rc-something
> [1]: no sysrq, no logs, no nothing, netconsole doesn't log anything
> more, just screwed FS which grub can't read (Inconsistent
> filesystem - press any key to continue)
>
> I decided to try out the LOCKDEP stuff to see if I can provide more
> useful info, and here we go, as soon as my /home (jfs) partition is
> written (at login):
> [ 72.420000]
> [ 72.420000] =============================================
> [ 72.420000] [ INFO: possible recursive locking detected ]
> [ 72.420000] 2.6.18-rc5-mm1-2-lockdep #9
> [ 72.420000] ---------------------------------------------
> [ 72.420000] zsh/3844 is trying to acquire lock:
> [ 72.420000] (&jfs_ip->commit_mutex){--..}, at: [<c0320948>] mutex_lock+0x8/0x10
> [ 72.420000]
> [ 72.420000] but task is already holding lock:
> [ 72.420000] (&jfs_ip->commit_mutex){--..}, at: [<c0320948>] mutex_lock+0x8/0x10
> [ 72.420000]
> [ 72.420000] other info that might help us debug this:
> [ 72.420000] 2 locks held by zsh/3844:
> [ 72.420000] #0: (&inode->i_mutex){--..}, at: [<c0320948>] mutex_lock+0x8/0x10
> [ 72.420000] #1: (&jfs_ip->commit_mutex){--..}, at: [<c0320948>] mutex_lock+0x8/0x10
> [ 72.420000]
> [ 72.420000] stack backtrace:
> [ 72.420000] [<c0103b8f>] dump_trace+0x1ef/0x230
> [ 72.420000] [<c0103bf6>] show_trace_log_lvl+0x26/0x40
> [ 72.420000] [<c01043cb>] show_trace+0x1b/0x20
> [ 72.420000] [<c01044b4>] dump_stack+0x24/0x30
> [ 72.420000] [<c0136050>] __lock_acquire+0x8e0/0xd80
> [ 72.420000] [<c0136869>] lock_acquire+0x69/0x90
> [ 72.420000] [<c0320705>] __mutex_lock_slowpath+0x75/0x2b0
> [ 72.420000] [<c0320948>] mutex_lock+0x8/0x10
> [ 72.420000] [<d137950b>] jfs_create+0xbb/0x420 [jfs]
> [ 72.420000] [<c016cb7b>] vfs_create+0xcb/0x120
> [ 72.420000] [<c0170158>] open_namei+0x618/0x6f0
> [ 72.420000] [<c0162368>] do_filp_open+0x38/0x60
> [ 72.420000] [<c01623db>] do_sys_open+0x4b/0xf0
> [ 72.420000] [<c01624d7>] sys_open+0x27/0x30
> [ 72.420000] [<c01032af>] syscall_call+0x7/0xb
> [ 72.420000] [<b7ec8b8d>] 0xb7ec8b8d
> [ 72.420000] =======================
>
> I suspect JFS is guilty, anyway my HD has these partitions:
I haven't got around to instrumenting jfs properly with
mutex_lock_nested(), so I know jfs doesn't run clean with lockdep
enabled. What that means is that these warnings don't necessarily point
to a real problem, and on the other hand, lockdep hasn't been run
correctly against jfs to prove that the mutex usage is safe.
That said, I'm not aware of any known problems in jfs resulting in a
deadlock. Unfortunately, without being able to use sysrq, I don't have
any real good ideas for you off the top of my head to further track down
the problem.
I'm pretty busy this week, but I'll try to get the lockdep stuff right
in jfs as soon as possible. Who knows? Maybe it will find a real
locking problem.
>
> /dev/hda1 on / type reiserfs (rw)
> /dev/hda3 on /usr type reiserfs (rw)
> /dev/hda5 on /home type jfs (rw)
>
> bootlog: http://oioio.altervista.org/linux/dmesg-2.6.18-rc5-mm1-lockdep
> config: http://oioio.altervista.org/linux/config-2.6.18-rc5-mm1-lockdep
>
--
David Kleikamp
IBM Linux Technology Center
On Wed, Sep 06, 2006 at 05:00:28PM -0500, Dave Kleikamp wrote:
[...]
> > I suspect JFS is guilty, anyway my HD has these partitions:
>
> I haven't got around to instrumenting jfs properly with
> mutex_lock_nested(), so I know jfs doesn't run clean with lockdep
> enabled. What that means is that these warnings don't necessarily point
> to a real problem, and on the other hand, lockdep hasn't been run
> correctly against jfs to prove that the mutex usage is safe.
>
> That said, I'm not aware of any known problems in jfs resulting in a
> deadlock. Unfortunately, without being able to use sysrq, I don't have
> any real good ideas for you off the top of my head to further track down
> the problem.
yup... don't know if it could put some light but simply doing
$ mv ~/.firefox ~/.firefox.old
$ cp ~/.firefox.old ~/.firefox
in order to allocate different inodes (right?) I can run Firefox
with my favourite skin (the one that was causing the hang before).
> I'm pretty busy this week, but I'll try to get the lockdep stuff right
> in jfs as soon as possible. Who knows? Maybe it will find a real
> locking problem.
I'll try to keep the filesystem as is to be able to test any fix/test
you'll propose (Eclipse still hangs the computer). Fortunately I have
one more spare partition where I can move /home.
Thanks
--
mattia
:wq!
On Thu, 2006-09-07 at 17:39 +0200, Mattia Dongili wrote:
> On Wed, Sep 06, 2006 at 05:00:28PM -0500, Dave Kleikamp wrote:
> [...]
> > > I suspect JFS is guilty, anyway my HD has these partitions:
> >
> > I haven't got around to instrumenting jfs properly with
> > mutex_lock_nested(), so I know jfs doesn't run clean with lockdep
> > enabled. What that means is that these warnings don't necessarily point
> > to a real problem, and on the other hand, lockdep hasn't been run
> > correctly against jfs to prove that the mutex usage is safe.
> >
> > That said, I'm not aware of any known problems in jfs resulting in a
> > deadlock. Unfortunately, without being able to use sysrq, I don't have
> > any real good ideas for you off the top of my head to further track down
> > the problem.
>
> yup... don't know if it could put some light but simply doing
>
> $ mv ~/.firefox ~/.firefox.old
> $ cp ~/.firefox.old ~/.firefox
>
> in order to allocate different inodes (right?) I can run Firefox
> with my favourite skin (the one that was causing the hang before).
>
> > I'm pretty busy this week, but I'll try to get the lockdep stuff right
> > in jfs as soon as possible. Who knows? Maybe it will find a real
> > locking problem.
>
> I'll try to keep the filesystem as is to be able to test any fix/test
> you'll propose (Eclipse still hangs the computer). Fortunately I have
> one more spare partition where I can move /home.
As long as you're going to try a different /home partition, why don't
you format it as something other than jfs. This way if you see the
problem again, you can rule out jfs as the culprit. Also, if jfs is the
problem, you can avoid it.
--
David Kleikamp
IBM Linux Technology Center
On Thu, Sep 07, 2006 at 10:46:36AM -0500, Dave Kleikamp wrote:
> On Thu, 2006-09-07 at 17:39 +0200, Mattia Dongili wrote:
[...]
> > one more spare partition where I can move /home.
>
> As long as you're going to try a different /home partition, why don't
> you format it as something other than jfs. This way if you see the
yeah, of course ;)
Thanks
--
mattia
:wq!
On Wed, Sep 06, 2006 at 05:00:28PM -0500, Dave Kleikamp wrote:
> I meant to reply to this earlier. I've had a lot of distractions.
>
> On Tue, 2006-09-05 at 22:33 +0200, Mattia Dongili wrote:
> > Hello,
> >
> > as the subject says it's some time[0] I'm experiencing deadlocks[1] (I'm
> > only tracking -mm, and sporadically using the stable series). I have a
> > couple of use cases that seem to reliably trigger the deadlock, namely
> > using Eclipse and Firefox.
[...]
> > /dev/hda1 on / type reiserfs (rw)
> > /dev/hda3 on /usr type reiserfs (rw)
> > /dev/hda5 on /home type jfs (rw)
> >
> > bootlog: http://oioio.altervista.org/linux/dmesg-2.6.18-rc5-mm1-lockdep
> > config: http://oioio.altervista.org/linux/config-2.6.18-rc5-mm1-lockdep
Dave,
I have to apologize. Reiser3 seem to be the one deadlocking here
actually. Changing /home to reiser4 still deadlocks.
Now, reiserfs-developers:
would you want me to keep the filesystem around to try to test patches
or potential fixes or can I wipe it out?
The good thing is that the deadlock is 100% repeatable, the bad thing is
that this laptop has a broken cdrom and I have to take the drive out and
fsck it via usb1.1 each time. :)
Thanks
--
mattia
:wq!
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Mattia Dongili wrote:
> On Wed, Sep 06, 2006 at 05:00:28PM -0500, Dave Kleikamp wrote:
>> I meant to reply to this earlier. I've had a lot of distractions.
>>
>> On Tue, 2006-09-05 at 22:33 +0200, Mattia Dongili wrote:
>>> Hello,
>>>
>>> as the subject says it's some time[0] I'm experiencing deadlocks[1] (I'm
>>> only tracking -mm, and sporadically using the stable series). I have a
>>> couple of use cases that seem to reliably trigger the deadlock, namely
>>> using Eclipse and Firefox.
> [...]
>>> /dev/hda1 on / type reiserfs (rw)
>>> /dev/hda3 on /usr type reiserfs (rw)
>>> /dev/hda5 on /home type jfs (rw)
>>>
>>> bootlog: http://oioio.altervista.org/linux/dmesg-2.6.18-rc5-mm1-lockdep
>>> config: http://oioio.altervista.org/linux/config-2.6.18-rc5-mm1-lockdep
>
> Dave,
>
> I have to apologize. Reiser3 seem to be the one deadlocking here
> actually. Changing /home to reiser4 still deadlocks.
>
> Now, reiserfs-developers:
> would you want me to keep the filesystem around to try to test patches
> or potential fixes or can I wipe it out?
> The good thing is that the deadlock is 100% repeatable, the bad thing is
> that this laptop has a broken cdrom and I have to take the drive out and
> fsck it via usb1.1 each time. :)
>
> Thanks
How is it that you arrived on reiser3 and reiser4 deadlocking here?
- -Jeff
- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org
iD8DBQFFCVxYLPWxlyuTD7IRArZ1AJ9pJPRw0ERLLS36xhn6dyHiXZJtVgCeN+Xe
OVBhxH0xk8UL/YaUKlHJuEE=
=9R9c
-----END PGP SIGNATURE-----
n Thu, Sep 14, 2006 at 09:42:48AM -0400, Jeff Mahoney wrote:
> Mattia Dongili wrote:
> > On Wed, Sep 06, 2006 at 05:00:28PM -0500, Dave Kleikamp wrote:
> >> I meant to reply to this earlier. I've had a lot of distractions.
> >>
> >> On Tue, 2006-09-05 at 22:33 +0200, Mattia Dongili wrote:
> >>> Hello,
> >>>
> >>> as the subject says it's some time[0] I'm experiencing deadlocks[1] (I'm
> >>> only tracking -mm, and sporadically using the stable series). I have a
> >>> couple of use cases that seem to reliably trigger the deadlock, namely
> >>> using Eclipse and Firefox.
> > [...]
> >>> /dev/hda1 on / type reiserfs (rw)
> >>> /dev/hda3 on /usr type reiserfs (rw)
> >>> /dev/hda5 on /home type jfs (rw)
> >>>
> >>> bootlog: http://oioio.altervista.org/linux/dmesg-2.6.18-rc5-mm1-lockdep
> >>> config: http://oioio.altervista.org/linux/config-2.6.18-rc5-mm1-lockdep
> >
> > Dave,
> >
> > I have to apologize. Reiser3 seem to be the one deadlocking here
> > actually. Changing /home to reiser4 still deadlocks.
> >
> > Now, reiserfs-developers:
> > would you want me to keep the filesystem around to try to test patches
> > or potential fixes or can I wipe it out?
> > The good thing is that the deadlock is 100% repeatable, the bad thing is
> > that this laptop has a broken cdrom and I have to take the drive out and
> > fsck it via usb1.1 each time. :)
> >
> > Thanks
>
>
> How is it that you arrived on reiser3 and reiser4 deadlocking here?
oh, no that's not what I said.
I had:
/dev/hda1 on / type reiserfs (rw)
/dev/hda3 on /usr type reiserfs (rw)
/dev/hda5 on /home type jfs (rw)
the deadlock was there. I then changed to
/dev/hda1 on / type reiserfs (rw)
/dev/hda3 on /usr type reiserfs (rw)
/dev/hda5 on /home type reiser4 (rw)
still deadlocking, so it wasn't jfs but reiserfs.
Anyway I wiped out the reiserfs partitions, it was starting to become
really annoying sorry.
--
mattia
:wq!