LinuxLists.cc - Re: [PATCH 1/2] kill-the-bkl/reiserfs: acquire the inode mutex safely

2009-06-05 18:28:43

Subject: Re: [PATCH 1/2] kill-the-bkl/reiserfs: acquire the inode mutex safely

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Trenton D. Adams wrote:
> On Sat, May 16, 2009 at 12:02 PM, Frederic Weisbecker
> <[email protected]> wrote:
>> While searching a pathname, an inode mutex can be acquired
>> in do_lookup() which calls reiserfs_lookup() which in turn
>> acquires the write lock.
>>
>> On the other side reiserfs_fill_super() can acquire the write_lock
>> and then call reiserfs_lookup_privroot() which can acquire an
>> inode mutex (the root of the mount point).
>>
>> So we theoretically risk an AB - BA lock inversion that could lead
>> to a deadlock.
>>
>> As for other lock dependencies found since the bkl to mutex
>> conversion, the fix is to use reiserfs_mutex_lock_safe() which
>> drops the lock dependency to the write lock.
>>
>
> I'm curious, did this get applied, and is it related to the following?
> I was having these in 2.6.30-rc3. I am now on 2.6.30-rc7 as of
> today. I haven't seen them today. But then again, I have only seen this
> happen one time.
>
> May 27 01:56:12 tdamac INFO: task pdflush:15370 blocked for more than
> 120 seconds.
> May 27 01:56:12 tdamac "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> May 27 01:56:12 tdamac pdflush D ffff8800518a0000 0 15370 2
> May 27 01:56:12 tdamac ffff880025023b50 0000000000000046
> 0000000025023a90 000000000000d7a0
> May 27 01:56:12 tdamac 0000000000004000 0000000000011440
> 000000000000ca78 ffff880045e71568
> May 27 01:56:12 tdamac ffff880045e7156c ffff8800518a0000
> ffff880067f54230 ffff8800518a0380
> May 27 01:56:12 tdamac Call Trace:
> May 27 01:56:12 tdamac [<ffffffff80687d1b>] ? __mutex_lock_slowpath+0xe2/0x124
> May 27 01:56:12 tdamac [<ffffffff80687d13>] __mutex_lock_slowpath+0xda/0x124
> May 27 01:56:12 tdamac [<ffffffff8068809e>] mutex_lock+0x1e/0x36
> May 27 01:56:12 tdamac [<ffffffff803087ae>] flush_commit_list+0x150/0x689
> May 27 01:56:12 tdamac [<ffffffff8022f8e5>] ? __wake_up+0x43/0x50
> May 27 01:56:12 tdamac [<ffffffff8030ad8a>] do_journal_end+0xb4a/0xd6c
> May 27 01:56:12 tdamac [<ffffffff8023053d>] ? dequeue_entity+0x1b/0x1df
> May 27 01:56:12 tdamac [<ffffffff8030b020>] journal_end_sync+0x74/0x7d
> May 27 01:56:12 tdamac [<ffffffff802fd2fd>] reiserfs_sync_fs+0x41/0x67
> May 27 01:56:12 tdamac [<ffffffff80688091>] ? mutex_lock+0x11/0x36
> May 27 01:56:12 tdamac [<ffffffff802fd331>] reiserfs_write_super+0xe/0x10
> May 27 01:56:12 tdamac [<ffffffff802a532a>] sync_supers+0x61/0xa6
> May 27 01:56:12 tdamac [<ffffffff8027e140>] wb_kupdate+0x32/0x128
> May 27 01:56:12 tdamac [<ffffffff8027ee7c>] pdflush+0x140/0x21f
> May 27 01:56:12 tdamac [<ffffffff8027e10e>] ? wb_kupdate+0x0/0x128
> May 27 01:56:12 tdamac [<ffffffff8027ed3c>] ? pdflush+0x0/0x21f
> May 27 01:56:12 tdamac [<ffffffff8024fb26>] kthread+0x56/0x83
> May 27 01:56:12 tdamac [<ffffffff8020beba>] child_rip+0xa/0x20
> May 27 01:56:12 tdamac [<ffffffff8024fad0>] ? kthread+0x0/0x83
> May 27 01:56:12 tdamac [<ffffffff8020beb0>] ? child_rip+0x0/0x20

Can you capture a sysrq+t when this happens? The lock is properly
released, but I have a hunch that another thread is doing ordered
writeback that's taking a while. That happens under the j_commit_mutex.

- -Jeff

- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iEYEARECAAYFAkopY0MACgkQLPWxlyuTD7IM0gCdGepeXFcB68gcCaXCb3Z/KTg9
F5MAn3rOomgVzmXfI4DKtIHqKxwLNDj0
=qzqo
-----END PGP SIGNATURE-----
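
The AB-BA inversion in the quoted patch description can be sketched in userspace. This is a minimal analogy only, not the reiserfs code: pthread mutexes stand in for the reiserfs write lock and the inode mutex, and every name below (write_lock, inode_mutex, mutex_lock_safe, lookup_path, fill_super_path, run_both_paths) is hypothetical. It assumes the "safe" helper works by releasing the write lock before taking the mutex and retaking it afterwards, which is how the description's "drops the lock dependency to the write lock" reads.

```c
#include <assert.h>
#include <pthread.h>

#define ITERATIONS 10000

/* Stand-ins (assumptions): "A" is the write lock, "B" is an inode mutex. */
static pthread_mutex_t write_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inode_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Caller must hold write_lock. Release it, take m, then retake
 * write_lock, so m is never acquired while write_lock is held.
 */
static void mutex_lock_safe(pthread_mutex_t *m)
{
	pthread_mutex_unlock(&write_lock);
	pthread_mutex_lock(m);
	pthread_mutex_lock(&write_lock);
}

/* Lookup-like path: inode mutex first, then the write lock (B then A). */
static void *lookup_path(void *arg)
{
	int *counter = arg;

	for (int i = 0; i < ITERATIONS; i++) {
		pthread_mutex_lock(&inode_mutex);
		pthread_mutex_lock(&write_lock);
		(*counter)++;	/* both locks held here */
		pthread_mutex_unlock(&write_lock);
		pthread_mutex_unlock(&inode_mutex);
	}
	return NULL;
}

/*
 * Mount-like path: already holds the write lock and then needs the
 * inode mutex. A plain pthread_mutex_lock(&inode_mutex) here would be
 * A then B and could deadlock against lookup_path().
 */
static void *fill_super_path(void *arg)
{
	int *counter = arg;

	for (int i = 0; i < ITERATIONS; i++) {
		pthread_mutex_lock(&write_lock);
		mutex_lock_safe(&inode_mutex);
		(*counter)++;	/* both locks held here */
		pthread_mutex_unlock(&inode_mutex);
		pthread_mutex_unlock(&write_lock);
	}
	return NULL;
}

/* Run both paths concurrently; returns the final counter, or -1 on error. */
static int run_both_paths(void)
{
	pthread_t t1, t2;
	int counter = 0;

	if (pthread_create(&t1, NULL, lookup_path, &counter) != 0)
		return -1;
	if (pthread_create(&t2, NULL, fill_super_path, &counter) != 0) {
		pthread_join(t1, NULL);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	assert(counter == 2 * ITERATIONS);
	return counter;
}
```

With the helper, both paths end up taking the locks in the order B then A, so neither can hold one lock while waiting for the other in the opposite order. The cost is that state protected by the write lock may change during the window in which it is dropped, so the caller has to tolerate that.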

2009-06-05 19:07:05

by Trenton D. Adams

Subject: Re: [PATCH 1/2] kill-the-bkl/reiserfs: acquire the inode mutex safely

On Fri, Jun 5, 2009 at 12:26 PM, Jeff Mahoney<[email protected]> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Trenton D. Adams wrote:
>> On Sat, May 16, 2009 at 12:02 PM, Frederic Weisbecker
>> <[email protected]> wrote:
>>> While searching a pathname, an inode mutex can be acquired
>>> in do_lookup() which calls reiserfs_lookup() which in turn
>>> acquires the write lock.
>>>
>>> On the other side reiserfs_fill_super() can acquire the write_lock
>>> and then call reiserfs_lookup_privroot() which can acquire an
>>> inode mutex (the root of the mount point).
>>>
>>> So we theoretically risk an AB - BA lock inversion that could lead
>>> to a deadlock.
>>>
>>> As for other lock dependencies found since the bkl to mutex
>>> conversion, the fix is to use reiserfs_mutex_lock_safe() which
>>> drops the lock dependency to the write lock.
>>>
>>
>> I'm curious, did this get applied, and is it related to the following?
>> I was having these in 2.6.30-rc3. I am now on 2.6.30-rc7 as of
>> today. I haven't seen them today. But then again, I have only seen this
>> happen one time.
>>
>> May 27 01:56:12 tdamac INFO: task pdflush:15370 blocked for more than
>> 120 seconds.
>> May 27 01:56:12 tdamac "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> May 27 01:56:12 tdamac pdflush D ffff8800518a0000 0 15370 2
>> May 27 01:56:12 tdamac ffff880025023b50 0000000000000046
>> 0000000025023a90 000000000000d7a0
>> May 27 01:56:12 tdamac 0000000000004000 0000000000011440
>> 000000000000ca78 ffff880045e71568
>> May 27 01:56:12 tdamac ffff880045e7156c ffff8800518a0000
>> ffff880067f54230 ffff8800518a0380
>> May 27 01:56:12 tdamac Call Trace:
>> May 27 01:56:12 tdamac [<ffffffff80687d1b>] ? __mutex_lock_slowpath+0xe2/0x124
>> May 27 01:56:12 tdamac [<ffffffff80687d13>] __mutex_lock_slowpath+0xda/0x124
>> May 27 01:56:12 tdamac [<ffffffff8068809e>] mutex_lock+0x1e/0x36
>> May 27 01:56:12 tdamac [<ffffffff803087ae>] flush_commit_list+0x150/0x689
>> May 27 01:56:12 tdamac [<ffffffff8022f8e5>] ? __wake_up+0x43/0x50
>> May 27 01:56:12 tdamac [<ffffffff8030ad8a>] do_journal_end+0xb4a/0xd6c
>> May 27 01:56:12 tdamac [<ffffffff8023053d>] ? dequeue_entity+0x1b/0x1df
>> May 27 01:56:12 tdamac [<ffffffff8030b020>] journal_end_sync+0x74/0x7d
>> May 27 01:56:12 tdamac [<ffffffff802fd2fd>] reiserfs_sync_fs+0x41/0x67
>> May 27 01:56:12 tdamac [<ffffffff80688091>] ? mutex_lock+0x11/0x36
>> May 27 01:56:12 tdamac [<ffffffff802fd331>] reiserfs_write_super+0xe/0x10
>> May 27 01:56:12 tdamac [<ffffffff802a532a>] sync_supers+0x61/0xa6
>> May 27 01:56:12 tdamac [<ffffffff8027e140>] wb_kupdate+0x32/0x128
>> May 27 01:56:12 tdamac [<ffffffff8027ee7c>] pdflush+0x140/0x21f
>> May 27 01:56:12 tdamac [<ffffffff8027e10e>] ? wb_kupdate+0x0/0x128
>> May 27 01:56:12 tdamac [<ffffffff8027ed3c>] ? pdflush+0x0/0x21f
>> May 27 01:56:12 tdamac [<ffffffff8024fb26>] kthread+0x56/0x83
>> May 27 01:56:12 tdamac [<ffffffff8020beba>] child_rip+0xa/0x20
>> May 27 01:56:12 tdamac [<ffffffff8024fad0>] ? kthread+0x0/0x83
>> May 27 01:56:12 tdamac [<ffffffff8020beb0>] ? child_rip+0x0/0x20
>
> Can you capture a sysrq+t when this happens? The lock is properly
> released, but I have a hunch that another thread is doing ordered
> writeback that's taking a while. That happens under the j_commit_mutex.

FYI: I never did anything specific that I knew of, so I didn't
actually notice a delay. I was rsyncing to a USB key at the time.
And seeing it took over an hour, I walked away, so I wouldn't have
noticed it. But, I could fiddle around a little to see if I could get
some sort of delay going on. Any ideas on what I should try? Then I
can do the sysrq+t for you if I can reproduce.

2009-06-05 19:32:34

by Jeff Mahoney

Subject: Re: [PATCH 1/2] kill-the-bkl/reiserfs: acquire the inode mutex safely

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Trenton D. Adams wrote:
> FYI: I never did anything specific that I knew of, so I didn't
> actually notice a delay. I was rsyncing to a USB key at the time.
> And seeing it took over an hour, I walked away, so I wouldn't have
> noticed it. But, I could fiddle around a little to see if I could get
> some sort of delay going on. Any ideas on what I should try? Then I
> can do the sysrq+t for you if I can reproduce.

Well if the rsync triggered it, that's a good start.

Try applying the following patch as well. It will cause the hung task detector
to do a sysrq+t automatically so it's not as much of a guessing game. You'll need to boot with
hung_task_show_state=1.

- -Jeff

- ---
 kernel/hung_task.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

- --- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -56,6 +56,14 @@ static int __init hung_task_panic_setup(
 }
 __setup("hung_task_panic=", hung_task_panic_setup);

+unsigned int __read_mostly sysctl_hung_task_show_state;
+static int __init hung_task_show_state_setup(char *str)
+{
+	sysctl_hung_task_show_state = simple_strtoul(str, NULL, 0);
+	return 1;
+}
+__setup("hung_task_show_state=", hung_task_show_state_setup);
+
 static int
 hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
 {
@@ -102,6 +110,9 @@ static void check_hung_task(struct task_

 	touch_nmi_watchdog();

+	if (sysctl_hung_task_show_state)
+		show_state();
+
 	if (sysctl_hung_task_panic)
 		panic("hung_task: blocked tasks");
 }

- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iEYEARECAAYFAkopcjwACgkQLPWxlyuTD7IEdgCfVVzIL/DA0stfnYEW6aixFwxM
qIEAnjJjyn6HQAbVIicRYzvNcGvPwbiq
=z8Pn
-----END PGP SIGNATURE-----
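
For anyone who wants the same task dump without patching the kernel: on a running system, root can request what sysrq+t prints by writing the character `t` to /proc/sysrq-trigger, and the dump goes to the kernel log. A minimal sketch of that, where the helper names sysrq_write and dump_all_tasks are made up for illustration and the path is taken as a parameter so the helper can be tried against an ordinary file first:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write a single sysrq command character to the file at path.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int sysrq_write(const char *path, char cmd)
{
	int fd, rc;
	ssize_t n;

	assert(path != NULL);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	n = write(fd, &cmd, 1);
	rc = close(fd);

	return (n == 1 && rc == 0) ? 0 : -1;
}

/* Ask the kernel to dump the state of all tasks (needs root). */
static int dump_all_tasks(void)
{
	return sysrq_write("/proc/sysrq-trigger", 't');
}
```

Unlike the patch above, this has to be run by hand (or from a script) while the task is still blocked, so it only helps if someone is watching when the hung task message appears.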

2009-06-05 19:57:39

by Trenton D. Adams

Subject: Re: [PATCH 1/2] kill-the-bkl/reiserfs: acquire the inode mutex safely

I'll see if I can try that this weekend.

Thanks.

On Fri, Jun 5, 2009 at 1:30 PM, Jeff Mahoney<[email protected]> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Trenton D. Adams wrote:
>> FYI: I never did anything specific that I knew of, so I didn't
>> actually notice a delay. I was rsyncing to a USB key at the time.
>> And seeing it took over an hour, I walked away, so I wouldn't have
>> noticed it. But, I could fiddle around a little to see if I could get
>> some sort of delay going on. Any ideas on what I should try? Then I
>> can do the sysrq+t for you if I can reproduce.
>
> Well if the rsync triggered it, that's a good start.
>
> Try applying the following patch as well. It will cause the hung task detector
> to do a sysrq+t automatically so it's not as much of a guessing game. You'll need to boot with
> hung_task_show_state=1.
>
> - -Jeff
>
> - ---
>  kernel/hung_task.c |   11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> - --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -56,6 +56,14 @@ static int __init hung_task_panic_setup(
>  }
>  __setup("hung_task_panic=", hung_task_panic_setup);
>
> +unsigned int __read_mostly sysctl_hung_task_show_state;
> +static int __init hung_task_show_state_setup(char *str)
> +{
> +       sysctl_hung_task_show_state = simple_strtoul(str, NULL, 0);
> +       return 1;
> +}
> +__setup("hung_task_show_state=", hung_task_show_state_setup);
> +
>  static int
>  hung_task_panic(struct notifier_block *this, unsigned long event, void *ptr)
>  {
> @@ -102,6 +110,9 @@ static void check_hung_task(struct task_
>
>         touch_nmi_watchdog();
>
> +       if (sysctl_hung_task_show_state)
> +               show_state();
> +
>         if (sysctl_hung_task_panic)
>                 panic("hung_task: blocked tasks");
>  }
>
> - --
> Jeff Mahoney
> SUSE Labs
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.9 (GNU/Linux)
> Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org
>
> iEYEARECAAYFAkopcjwACgkQLPWxlyuTD7IEdgCfVVzIL/DA0stfnYEW6aixFwxM
> qIEAnjJjyn6HQAbVIicRYzvNcGvPwbiq
> =z8Pn
> -----END PGP SIGNATURE-----
>

2009-06-11 00:42:48

by Trenton D. Adams

Subject: Re: [PATCH 1/2] kill-the-bkl/reiserfs: acquire the inode mutex safely

I have not been seeing this with 2.6.30-rc7. Perhaps it was resolved
in that version. I'll let you know if it happens again.