2009-03-10 12:17:51

by Alexander Beregalov

[permalink] [raw]
Subject: next-20090310: ext4 hangs

Hi

It is similar to http://lkml.org/lkml/2009/2/6/304,
http://bugzilla.kernel.org/show_bug.cgi?id=12579


I run dbench on ext4 on loop on ext3 on sparc.
The kernel is 2.6.29-rc7-next-20090310
e2fsprogs -1.41.4

ioctl32(mkfs.ext4:1668): Unknown cmd fd(3) cmd(40041272){t:12;sz:4}
arg(ffb2ce60) on /testfs
ioctl32(mkfs.ext4:1668): Unknown cmd fd(3) cmd(20001260){t:12;sz:0}
arg(ffb2ce6c) on /testfs
ioctl32(mkfs.ext4:1668): Unknown cmd fd(3) cmd(401c0204){t:02;sz:28}
arg(ffb2ce44) on /testfs
ioctl32(mkfs.ext4:1668): Unknown cmd fd(3) cmd(20001268){t:12;sz:0}
arg(ffb2d564) on /testfs
ioctl32(mkfs.ext4:1668): Unknown cmd fd(3) cmd(2000125e){t:12;sz:0}
arg(ffb2ce6c) on /testfs
EXT4-fs: barriers enabled
EXT4 FS on loop0, internal journal on loop0:8
EXT4-fs: delayed allocation enabled
EXT4-fs: file extents enabled
EXT4-fs: mballoc enabled
EXT4-fs: mounted filesystem loop0 with ordered data mode
kjournald2 starting: pid 1672, dev loop0:8, commit interval 5 seconds
JBD: barrier-based sync failed on loop0:8 - disabling barriers
INFO: task kjournald2:1672 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald2 D 0000000000557a0c 0 1672 2
Call Trace:
[0000000000768bcc] schedule+0x18/0x40
[0000000000557a0c] jbd2_journal_commit_transaction+0x208/0x1718
[000000000055d758] kjournald2+0x14c/0x318
[0000000000465d34] kthread+0x48/0x7c
[000000000042b364] kernel_thread+0x3c/0x54
[0000000000465c88] kthreadd+0xc8/0x12c
INFO: lockdep is turned off.
INFO: task dbench:1674 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
dbench D 000000000055716c 0 1674 1673
Call Trace:
[0000000000768bcc] schedule+0x18/0x40
[000000000055716c] start_this_handle+0x374/0x508
[0000000000557500] jbd2_journal_start+0xbc/0x11c
[000000000053a438] ext4_journal_start_sb+0x5c/0x84
[000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
[00000000004d1a14] __mark_inode_dirty+0x2c/0x188
[00000000004c7714] touch_atime+0x160/0x178
[00000000004c2a3c] vfs_readdir+0x88/0xb0
[00000000004eab90] compat_sys_getdents+0x3c/0x8c
[0000000000406154] linux_sparc_syscall32+0x34/0x40


2009-03-10 12:46:58

by Theodore Ts'o

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

On Tue, Mar 10, 2009 at 03:17:51PM +0300, Alexander Beregalov wrote:
> Hi
>
> It is similar to http://lkml.org/lkml/2009/2/6/304,
> http://bugzilla.kernel.org/show_bug.cgi?id=12579
>
> I run dbench on ext4 on loop on ext3 on sparc.
> The kernel is 2.6.29-rc7-next-20090310
> e2fsprogs -1.41.4

Thanks for reporting this; does it show up on stock 2.6.29-rc7? And
any chance you can enable LOCKDEP? Finally, how many CPU's do you
have on your machine?

Thanks again,

- Ted

2009-03-10 12:54:30

by Alexander Beregalov

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

2009/3/10 Theodore Tso <[email protected]>:
> On Tue, Mar 10, 2009 at 03:17:51PM +0300, Alexander Beregalov wrote:
>> Hi
>>
>> It is similar to http://lkml.org/lkml/2009/2/6/304,
>> http://bugzilla.kernel.org/show_bug.cgi?id=12579
>>
>> I run dbench on ext4 on loop on ext3 on sparc.
>> The kernel is 2.6.29-rc7-next-20090310
>> e2fsprogs -1.41.4
>
> Thanks for reporting this; does it show up on stock 2.6.29-rc7?  And
> any chance you can enable LOCKDEP?  Finally, how many CPU's do you
> have on your machine?

I will try it with current -git.
LOCkDEP is enabled already.
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_LOCKDEP=y

It is UP system.

2009-03-10 14:19:00

by Alexander Beregalov

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

2009/3/10 Theodore Tso <[email protected]>:
> On Tue, Mar 10, 2009 at 03:17:51PM +0300, Alexander Beregalov wrote:
>> Hi
>>
>> It is similar to http://lkml.org/lkml/2009/2/6/304,
>> http://bugzilla.kernel.org/show_bug.cgi?id=12579
>>
>> I run dbench on ext4 on loop on ext3 on sparc.
>> The kernel is 2.6.29-rc7-next-20090310
>> e2fsprogs -1.41.4
>
> Thanks for reporting this; does it show up on stock 2.6.29-rc7?
No, I can not reproduce it.
It is a slow system, I would not like to bisect it, only if it is the
last possibility.

2009-03-10 15:47:50

by Theodore Ts'o

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

On Tue, Mar 10, 2009 at 05:18:55PM +0300, Alexander Beregalov wrote:
> > Thanks for reporting this; does it show up on stock 2.6.29-rc7?
> No, I can not reproduce it.
> It is a slow system, I would not like to bisect it, only if it is the
> last possibility.

Just to be clear; you weren't able to reproduce it on stock 2.6.29-rc7
---- does it reproduce easily on linux-next?

Next question --- does the system hang completely, or just the dbench
process (and probably any process that tries touching the filesystem
that's hung up)? Do you have a serial console, or someone of
recording all of the dumps coming from sysrq-t? The two stack traces
you gave weren't the ones causing the problem, but rather the ones
waiting for the journal lock.

I don't think is related to BZ #12579, although some of the symptoms
look superficially the same.

- Ted

P.S. What are your dbench parameters, and how big is your filesystem,
so I can try to reproduce it on my side? And how long do you
typically need to run before this triggers? Thanks!

2009-03-11 16:07:30

by Alexander Beregalov

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

2009/3/10 Theodore Tso <[email protected]>:
> On Tue, Mar 10, 2009 at 05:18:55PM +0300, Alexander Beregalov wrote:
>> > Thanks for reporting this; does it show up on stock 2.6.29-rc7?
>> No, I can not reproduce it.
>> It is a slow system, I would not like to bisect it, only if it is the
>> last possibility.
>
> Just to be clear; you weren't able to reproduce it on stock 2.6.29-rc7
> ---- does it reproduce easily on linux-next?
Right.
But now I am not sure 2.6.29-rc7 is not affected, I will try to
reproduce it again.

>
> Next question --- does the system hang completely, or just the dbench
> process (and probably any process that tries touching the filesystem
System is responsible.
> that's hung up)?  Do you have a serial console, or someone of
> recording all of the dumps coming from sysrq-t?  The two stack traces
I can read dmesg.
> you gave weren't the ones causing the problem, but rather the ones
> waiting for the journal lock.

SysRq : Show State

<..>

kjournald2 D 0000000000557a0c 0 1547 2
Call Trace:
[0000000000768bcc] schedule+0x18/0x40
[0000000000557a0c] jbd2_journal_commit_transaction+0x208/0x1718
[000000000055d758] kjournald2+0x14c/0x318
[0000000000465d34] kthread+0x48/0x7c
[000000000042b364] kernel_thread+0x3c/0x54
[0000000000465c88] kthreadd+0xc8/0x12c
pdflush D 000000000055716c 0 2021 2
Call Trace:
[0000000000768bcc] schedule+0x18/0x40
[000000000055716c] start_this_handle+0x374/0x508
[0000000000557500] jbd2_journal_start+0xbc/0x11c
[000000000053a438] ext4_journal_start_sb+0x5c/0x84
[000000000052dd84] ext4_da_writepages+0x1a8/0x404
[0000000000490ec0] do_writepages+0x40/0x78
[00000000004d0ff8] __writeback_single_inode+0x198/0x330
[00000000004d15d8] generic_sync_sb_inodes+0x238/0x3c4
[00000000004d1984] writeback_inodes+0xb0/0x114
[00000000004915c0] background_writeout+0xc8/0x114
[0000000000492004] pdflush+0x128/0x1e0
[0000000000465d34] kthread+0x48/0x7c
[000000000042b364] kernel_thread+0x3c/0x54
[0000000000465c88] kthreadd+0xc8/0x12c
dbench D 000000000055716c 0 2024 1
Call Trace:
[0000000000768bcc] schedule+0x18/0x40
[000000000055716c] start_this_handle+0x374/0x508
[0000000000557500] jbd2_journal_start+0xbc/0x11c
[000000000053a438] ext4_journal_start_sb+0x5c/0x84
[000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
[00000000004d1a14] __mark_inode_dirty+0x2c/0x188
[00000000004c7714] touch_atime+0x160/0x178
[00000000004c2a3c] vfs_readdir+0x88/0xb0
[00000000004eab90] compat_sys_getdents+0x3c/0x8c
[0000000000406154] linux_sparc_syscall32+0x34/0x40
dbench D 000000000055716c 0 2025 1
Call Trace:
[0000000000768bcc] schedule+0x18/0x40
[000000000055716c] start_this_handle+0x374/0x508
[0000000000557500] jbd2_journal_start+0xbc/0x11c
[000000000053a438] ext4_journal_start_sb+0x5c/0x84
[000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
[00000000004d1a14] __mark_inode_dirty+0x2c/0x188
[00000000004c7714] touch_atime+0x160/0x178
[00000000004c2a3c] vfs_readdir+0x88/0xb0
[00000000004eab90] compat_sys_getdents+0x3c/0x8c
[0000000000406154] linux_sparc_syscall32+0x34/0x40
dbench D 000000000055716c 0 2026 1
Call Trace:
[0000000000768bcc] schedule+0x18/0x40
[000000000055716c] start_this_handle+0x374/0x508
[0000000000557500] jbd2_journal_start+0xbc/0x11c
[000000000053a438] ext4_journal_start_sb+0x5c/0x84
[000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
[00000000004d1a14] __mark_inode_dirty+0x2c/0x188
[00000000004c7714] touch_atime+0x160/0x178
[000000000048b56c] generic_file_aio_read+0x578/0x620
[00000000004b5254] do_sync_read+0x90/0xe0
[00000000004b5c40] vfs_read+0x7c/0x11c
[00000000004b5d30] SyS_pread64+0x50/0x7c
[000000000043f778] sys32_pread64+0x20/0x34
[0000000000406154] linux_sparc_syscall32+0x34/0x40
dbench D 000000000055716c 0 2027 1
Call Trace:
[0000000000768bcc] schedule+0x18/0x40
[000000000055716c] start_this_handle+0x374/0x508
[0000000000557500] jbd2_journal_start+0xbc/0x11c
[000000000053a438] ext4_journal_start_sb+0x5c/0x84
[000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
[00000000004d1a14] __mark_inode_dirty+0x2c/0x188
[00000000004c757c] file_update_time+0xe0/0x118
[000000000048aef4] __generic_file_aio_write_nolock+0x280/0x380
[000000000048b930] generic_file_aio_write+0x58/0xc8
[0000000000526274] ext4_file_write+0xa8/0x15c
[00000000004b5174] do_sync_write+0x90/0xe0
[00000000004b5a40] vfs_write+0x7c/0x11c
[00000000004b5b30] SyS_pwrite64+0x50/0x7c
[000000000043f744] sys32_pwrite64+0x20/0x34
[0000000000406154] linux_sparc_syscall32+0x34/0x40
dbench D 000000000055716c 0 2028 1
Call Trace:
[0000000000768bcc] schedule+0x18/0x40
[000000000055716c] start_this_handle+0x374/0x508
[0000000000557500] jbd2_journal_start+0xbc/0x11c
[000000000053a438] ext4_journal_start_sb+0x5c/0x84
[000000000052b958] ext4_da_write_begin+0x98/0x1a0
[000000000048a8dc] generic_file_buffered_write+0xe4/0x2b8
[000000000048afd0] __generic_file_aio_write_nolock+0x35c/0x380
[000000000048b930] generic_file_aio_write+0x58/0xc8
[0000000000526274] ext4_file_write+0xa8/0x15c
[00000000004b5174] do_sync_write+0x90/0xe0
[00000000004b5a40] vfs_write+0x7c/0x11c
[00000000004b5b30] SyS_pwrite64+0x50/0x7c
[000000000043f744] sys32_pwrite64+0x20/0x34
[0000000000406154] linux_sparc_syscall32+0x34/0x40
dbench D 000000000055c010 0 2029 1
Call Trace:
[0000000000768bcc] schedule+0x18/0x40
[000000000055c010] jbd2_journal_release_jbd_inode+0x94/0xd8
[0000000000535bb4] ext4_clear_inode+0x2c/0x3c
[00000000004c7b1c] clear_inode+0xac/0x124
[0000000000527e48] ext4_free_inode+0x7c/0x3d4
[000000000052f884] ext4_delete_inode+0x218/0x29c
[00000000004c86b4] generic_delete_inode+0x8c/0x124
[00000000004c876c] generic_drop_inode+0x20/0x19c
[00000000004c77a4] iput+0x78/0x88
[00000000004c0200] do_unlinkat+0xf8/0x154
[00000000004c0270] SyS_unlink+0x14/0x24
[0000000000406154] linux_sparc_syscall32+0x34/0x40
dbench D 000000000055cf84 0 2030 1
Call Trace:
[0000000000768bcc] schedule+0x18/0x40
[000000000055cf84] jbd2_log_wait_commit+0x144/0x1a8
[0000000000555cdc] jbd2_journal_stop+0x32c/0x37c
[0000000000557594] jbd2_journal_force_commit+0x34/0x48
[0000000000536d14] ext4_force_commit+0x30/0x50
[00000000005296b8] ext4_write_inode+0x80/0xa0
[00000000004d1044] __writeback_single_inode+0x1e4/0x330
[00000000004d11b0] sync_inode+0x20/0x40
[00000000005263ec] ext4_sync_file+0xc4/0x118
[00000000004d46f0] vfs_fsync+0x6c/0xa0
[00000000004d474c] do_fsync+0x28/0x44
[00000000004d47a4] SyS_fsync+0x14/0x28
[0000000000406154] linux_sparc_syscall32+0x34/0x40
dbench D 000000000055716c 0 2031 1
Call Trace:
[0000000000768bcc] schedule+0x18/0x40
[000000000055716c] start_this_handle+0x374/0x508
[0000000000557500] jbd2_journal_start+0xbc/0x11c
[000000000053a438] ext4_journal_start_sb+0x5c/0x84
[000000000052b958] ext4_da_write_begin+0x98/0x1a0
[000000000048a8dc] generic_file_buffered_write+0xe4/0x2b8
[000000000048afd0] __generic_file_aio_write_nolock+0x35c/0x380
[000000000048b930] generic_file_aio_write+0x58/0xc8
[0000000000526274] ext4_file_write+0xa8/0x15c
[00000000004b5174] do_sync_write+0x90/0xe0
[00000000004b5a40] vfs_write+0x7c/0x11c
[00000000004b5b30] SyS_pwrite64+0x50/0x7c
[000000000043f744] sys32_pwrite64+0x20/0x34
[0000000000406154] linux_sparc_syscall32+0x34/0x40
dbench D 000000000055716c 0 2032 1
Call Trace:
[0000000000768bcc] schedule+0x18/0x40
[000000000055716c] start_this_handle+0x374/0x508
[0000000000557500] jbd2_journal_start+0xbc/0x11c
[000000000053a438] ext4_journal_start_sb+0x5c/0x84
[000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
[00000000004d1a14] __mark_inode_dirty+0x2c/0x188
[00000000004c7714] touch_atime+0x160/0x178
[00000000004c2a3c] vfs_readdir+0x88/0xb0
[00000000004eab90] compat_sys_getdents+0x3c/0x8c
[0000000000406154] linux_sparc_syscall32+0x34/0x40
dbench D 000000000055716c 0 2033 1
Call Trace:
[0000000000768bcc] schedule+0x18/0x40
[000000000055716c] start_this_handle+0x374/0x508
[0000000000557500] jbd2_journal_start+0xbc/0x11c
[000000000053a438] ext4_journal_start_sb+0x5c/0x84
[000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
[00000000004d1a14] __mark_inode_dirty+0x2c/0x188
[00000000004c7714] touch_atime+0x160/0x178
[000000000048b56c] generic_file_aio_read+0x578/0x620
[00000000004b5254] do_sync_read+0x90/0xe0
[00000000004b5c40] vfs_read+0x7c/0x11c
[00000000004b5d30] SyS_pread64+0x50/0x7c
[000000000043f778] sys32_pread64+0x20/0x34
[0000000000406154] linux_sparc_syscall32+0x34/0x40


>
> I don't think is related to BZ #12579, although some of the symptoms
> look superficially the same.
>
>                                                        - Ted
>
> P.S.  What are your dbench parameters, and how big is your filesystem,
> so I can try to reproduce it on my side?  And how long do you
> typically need to run before this triggers?  Thanks!

It is 2G filesystem, I run
`dbench -c /usr/share/dbench/client.txt 10` or 5 clients.

Yesterday it happened when I run it for the first time, but today I
run it 10 times before I got the deadlock.

So, I think I need to try it on 2.6.29-rc7 again.

2009-03-25 15:11:22

by Jan Kara

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

> 2009/3/10 Theodore Tso <[email protected]>:
> > On Tue, Mar 10, 2009 at 05:18:55PM +0300, Alexander Beregalov wrote:
> >> > Thanks for reporting this; does it show up on stock 2.6.29-rc7?
> >> No, I can not reproduce it.
> >> It is a slow system, I would not like to bisect it, only if it is the
> >> last possibility.
> >
> > Just to be clear; you weren't able to reproduce it on stock 2.6.29-rc7
> > ---- does it reproduce easily on linux-next?
> Right.
> But now I am not sure 2.6.29-rc7 is not affected, I will try to
> reproduce it again.
>
> >
> > Next question --- does the system hang completely, or just the dbench
> > process (and probably any process that tries touching the filesystem
> System is responsible.
> > that's hung up)? ?Do you have a serial console, or someone of
> > recording all of the dumps coming from sysrq-t? ?The two stack traces
> I can read dmesg.
> > you gave weren't the ones causing the problem, but rather the ones
> > waiting for the journal lock.
>
> SysRq : Show State
>
> <..>
>
> kjournald2 D 0000000000557a0c 0 1547 2
> Call Trace:
> [0000000000768bcc] schedule+0x18/0x40
> [0000000000557a0c] jbd2_journal_commit_transaction+0x208/0x1718
> [000000000055d758] kjournald2+0x14c/0x318
> [0000000000465d34] kthread+0x48/0x7c
> [000000000042b364] kernel_thread+0x3c/0x54
> [0000000000465c88] kthreadd+0xc8/0x12c
> pdflush D 000000000055716c 0 2021 2
> Call Trace:
> [0000000000768bcc] schedule+0x18/0x40
> [000000000055716c] start_this_handle+0x374/0x508
> [0000000000557500] jbd2_journal_start+0xbc/0x11c
> [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> [000000000052dd84] ext4_da_writepages+0x1a8/0x404
> [0000000000490ec0] do_writepages+0x40/0x78
> [00000000004d0ff8] __writeback_single_inode+0x198/0x330
> [00000000004d15d8] generic_sync_sb_inodes+0x238/0x3c4
> [00000000004d1984] writeback_inodes+0xb0/0x114
> [00000000004915c0] background_writeout+0xc8/0x114
> [0000000000492004] pdflush+0x128/0x1e0
> [0000000000465d34] kthread+0x48/0x7c
> [000000000042b364] kernel_thread+0x3c/0x54
> [0000000000465c88] kthreadd+0xc8/0x12c
> dbench D 000000000055716c 0 2024 1
> Call Trace:
> [0000000000768bcc] schedule+0x18/0x40
> [000000000055716c] start_this_handle+0x374/0x508
> [0000000000557500] jbd2_journal_start+0xbc/0x11c
> [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> [000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
> [00000000004d1a14] __mark_inode_dirty+0x2c/0x188
> [00000000004c7714] touch_atime+0x160/0x178
> [00000000004c2a3c] vfs_readdir+0x88/0xb0
> [00000000004eab90] compat_sys_getdents+0x3c/0x8c
> [0000000000406154] linux_sparc_syscall32+0x34/0x40
> dbench D 000000000055716c 0 2025 1
> Call Trace:
> [0000000000768bcc] schedule+0x18/0x40
> [000000000055716c] start_this_handle+0x374/0x508
> [0000000000557500] jbd2_journal_start+0xbc/0x11c
> [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> [000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
> [00000000004d1a14] __mark_inode_dirty+0x2c/0x188
> [00000000004c7714] touch_atime+0x160/0x178
> [00000000004c2a3c] vfs_readdir+0x88/0xb0
> [00000000004eab90] compat_sys_getdents+0x3c/0x8c
> [0000000000406154] linux_sparc_syscall32+0x34/0x40
> dbench D 000000000055716c 0 2026 1
> Call Trace:
> [0000000000768bcc] schedule+0x18/0x40
> [000000000055716c] start_this_handle+0x374/0x508
> [0000000000557500] jbd2_journal_start+0xbc/0x11c
> [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> [000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
> [00000000004d1a14] __mark_inode_dirty+0x2c/0x188
> [00000000004c7714] touch_atime+0x160/0x178
> [000000000048b56c] generic_file_aio_read+0x578/0x620
> [00000000004b5254] do_sync_read+0x90/0xe0
> [00000000004b5c40] vfs_read+0x7c/0x11c
> [00000000004b5d30] SyS_pread64+0x50/0x7c
> [000000000043f778] sys32_pread64+0x20/0x34
> [0000000000406154] linux_sparc_syscall32+0x34/0x40
> dbench D 000000000055716c 0 2027 1
> Call Trace:
> [0000000000768bcc] schedule+0x18/0x40
> [000000000055716c] start_this_handle+0x374/0x508
> [0000000000557500] jbd2_journal_start+0xbc/0x11c
> [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> [000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
> [00000000004d1a14] __mark_inode_dirty+0x2c/0x188
> [00000000004c757c] file_update_time+0xe0/0x118
> [000000000048aef4] __generic_file_aio_write_nolock+0x280/0x380
> [000000000048b930] generic_file_aio_write+0x58/0xc8
> [0000000000526274] ext4_file_write+0xa8/0x15c
> [00000000004b5174] do_sync_write+0x90/0xe0
> [00000000004b5a40] vfs_write+0x7c/0x11c
> [00000000004b5b30] SyS_pwrite64+0x50/0x7c
> [000000000043f744] sys32_pwrite64+0x20/0x34
> [0000000000406154] linux_sparc_syscall32+0x34/0x40
> dbench D 000000000055716c 0 2028 1
> Call Trace:
> [0000000000768bcc] schedule+0x18/0x40
> [000000000055716c] start_this_handle+0x374/0x508
> [0000000000557500] jbd2_journal_start+0xbc/0x11c
> [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> [000000000052b958] ext4_da_write_begin+0x98/0x1a0
> [000000000048a8dc] generic_file_buffered_write+0xe4/0x2b8
> [000000000048afd0] __generic_file_aio_write_nolock+0x35c/0x380
> [000000000048b930] generic_file_aio_write+0x58/0xc8
> [0000000000526274] ext4_file_write+0xa8/0x15c
> [00000000004b5174] do_sync_write+0x90/0xe0
> [00000000004b5a40] vfs_write+0x7c/0x11c
> [00000000004b5b30] SyS_pwrite64+0x50/0x7c
> [000000000043f744] sys32_pwrite64+0x20/0x34
> [0000000000406154] linux_sparc_syscall32+0x34/0x40
> dbench D 000000000055c010 0 2029 1
> Call Trace:
> [0000000000768bcc] schedule+0x18/0x40
> [000000000055c010] jbd2_journal_release_jbd_inode+0x94/0xd8
> [0000000000535bb4] ext4_clear_inode+0x2c/0x3c
> [00000000004c7b1c] clear_inode+0xac/0x124
> [0000000000527e48] ext4_free_inode+0x7c/0x3d4
> [000000000052f884] ext4_delete_inode+0x218/0x29c
> [00000000004c86b4] generic_delete_inode+0x8c/0x124
> [00000000004c876c] generic_drop_inode+0x20/0x19c
> [00000000004c77a4] iput+0x78/0x88
> [00000000004c0200] do_unlinkat+0xf8/0x154
> [00000000004c0270] SyS_unlink+0x14/0x24
> [0000000000406154] linux_sparc_syscall32+0x34/0x40
> dbench D 000000000055cf84 0 2030 1
> Call Trace:
> [0000000000768bcc] schedule+0x18/0x40
> [000000000055cf84] jbd2_log_wait_commit+0x144/0x1a8
> [0000000000555cdc] jbd2_journal_stop+0x32c/0x37c
> [0000000000557594] jbd2_journal_force_commit+0x34/0x48
> [0000000000536d14] ext4_force_commit+0x30/0x50
> [00000000005296b8] ext4_write_inode+0x80/0xa0
> [00000000004d1044] __writeback_single_inode+0x1e4/0x330
> [00000000004d11b0] sync_inode+0x20/0x40
> [00000000005263ec] ext4_sync_file+0xc4/0x118
> [00000000004d46f0] vfs_fsync+0x6c/0xa0
> [00000000004d474c] do_fsync+0x28/0x44
> [00000000004d47a4] SyS_fsync+0x14/0x28
> [0000000000406154] linux_sparc_syscall32+0x34/0x40
> dbench D 000000000055716c 0 2031 1
> Call Trace:
> [0000000000768bcc] schedule+0x18/0x40
> [000000000055716c] start_this_handle+0x374/0x508
> [0000000000557500] jbd2_journal_start+0xbc/0x11c
> [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> [000000000052b958] ext4_da_write_begin+0x98/0x1a0
> [000000000048a8dc] generic_file_buffered_write+0xe4/0x2b8
> [000000000048afd0] __generic_file_aio_write_nolock+0x35c/0x380
> [000000000048b930] generic_file_aio_write+0x58/0xc8
> [0000000000526274] ext4_file_write+0xa8/0x15c
> [00000000004b5174] do_sync_write+0x90/0xe0
> [00000000004b5a40] vfs_write+0x7c/0x11c
> [00000000004b5b30] SyS_pwrite64+0x50/0x7c
> [000000000043f744] sys32_pwrite64+0x20/0x34
> [0000000000406154] linux_sparc_syscall32+0x34/0x40
> dbench D 000000000055716c 0 2032 1
> Call Trace:
> [0000000000768bcc] schedule+0x18/0x40
> [000000000055716c] start_this_handle+0x374/0x508
> [0000000000557500] jbd2_journal_start+0xbc/0x11c
> [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> [000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
> [00000000004d1a14] __mark_inode_dirty+0x2c/0x188
> [00000000004c7714] touch_atime+0x160/0x178
> [00000000004c2a3c] vfs_readdir+0x88/0xb0
> [00000000004eab90] compat_sys_getdents+0x3c/0x8c
> [0000000000406154] linux_sparc_syscall32+0x34/0x40
> dbench D 000000000055716c 0 2033 1
> Call Trace:
> [0000000000768bcc] schedule+0x18/0x40
> [000000000055716c] start_this_handle+0x374/0x508
> [0000000000557500] jbd2_journal_start+0xbc/0x11c
> [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> [000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
> [00000000004d1a14] __mark_inode_dirty+0x2c/0x188
> [00000000004c7714] touch_atime+0x160/0x178
> [000000000048b56c] generic_file_aio_read+0x578/0x620
> [00000000004b5254] do_sync_read+0x90/0xe0
> [00000000004b5c40] vfs_read+0x7c/0x11c
> [00000000004b5d30] SyS_pread64+0x50/0x7c
> [000000000043f778] sys32_pread64+0x20/0x34
> [0000000000406154] linux_sparc_syscall32+0x34/0x40
>
>
> >
> > I don't think is related to BZ #12579, although some of the symptoms
> > look superficially the same.
> >
> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?- Ted
> >
> > P.S. ?What are your dbench parameters, and how big is your filesystem,
> > so I can try to reproduce it on my side? ?And how long do you
> > typically need to run before this triggers? ?Thanks!
>
> It is 2G filesystem, I run
> `dbench -c /usr/share/dbench/client.txt 10` or 5 clients.
>
> Yesterday it happened when I run it for the first time, but today I
> run it 10 times before I got the deadlock.
>
> So, I think I need to try it on 2.6.29-rc7 again.
I've looked into this. Obviously, what's happenning is that we delete
an inode and jbd2_journal_release_jbd_inode() finds inode is just under
writeout in transaction commit and thus it waits. But it gets never woken
up and because it has a handle from the transaction, every one eventually
blocks on waiting for a transaction to finish.
But I don't really see how that can happen. The code is really
straightforward and everything happens under j_list_lock... Strange.

Honza
--
Jan Kara <[email protected]>
SuSE CR Labs

2009-03-25 15:15:18

by Jan Kara

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

> > 2009/3/10 Theodore Tso <[email protected]>:
> > > On Tue, Mar 10, 2009 at 05:18:55PM +0300, Alexander Beregalov wrote:
> > >> > Thanks for reporting this; does it show up on stock 2.6.29-rc7?
> > >> No, I can not reproduce it.
> > >> It is a slow system, I would not like to bisect it, only if it is the
> > >> last possibility.
> > >
> > > Just to be clear; you weren't able to reproduce it on stock 2.6.29-rc7
> > > ---- does it reproduce easily on linux-next?
> > Right.
> > But now I am not sure 2.6.29-rc7 is not affected, I will try to
> > reproduce it again.
> >
> > >
> > > Next question --- does the system hang completely, or just the dbench
> > > process (and probably any process that tries touching the filesystem
> > System is responsible.
> > > that's hung up)? ?Do you have a serial console, or someone of
> > > recording all of the dumps coming from sysrq-t? ?The two stack traces
> > I can read dmesg.
> > > you gave weren't the ones causing the problem, but rather the ones
> > > waiting for the journal lock.
> >
> > SysRq : Show State
> >
> > <..>
> >
> > kjournald2 D 0000000000557a0c 0 1547 2
> > Call Trace:
> > [0000000000768bcc] schedule+0x18/0x40
> > [0000000000557a0c] jbd2_journal_commit_transaction+0x208/0x1718
> > [000000000055d758] kjournald2+0x14c/0x318
> > [0000000000465d34] kthread+0x48/0x7c
> > [000000000042b364] kernel_thread+0x3c/0x54
> > [0000000000465c88] kthreadd+0xc8/0x12c
> > pdflush D 000000000055716c 0 2021 2
> > Call Trace:
> > [0000000000768bcc] schedule+0x18/0x40
> > [000000000055716c] start_this_handle+0x374/0x508
> > [0000000000557500] jbd2_journal_start+0xbc/0x11c
> > [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> > [000000000052dd84] ext4_da_writepages+0x1a8/0x404
> > [0000000000490ec0] do_writepages+0x40/0x78
> > [00000000004d0ff8] __writeback_single_inode+0x198/0x330
> > [00000000004d15d8] generic_sync_sb_inodes+0x238/0x3c4
> > [00000000004d1984] writeback_inodes+0xb0/0x114
> > [00000000004915c0] background_writeout+0xc8/0x114
> > [0000000000492004] pdflush+0x128/0x1e0
> > [0000000000465d34] kthread+0x48/0x7c
> > [000000000042b364] kernel_thread+0x3c/0x54
> > [0000000000465c88] kthreadd+0xc8/0x12c
> > dbench D 000000000055716c 0 2024 1
> > Call Trace:
> > [0000000000768bcc] schedule+0x18/0x40
> > [000000000055716c] start_this_handle+0x374/0x508
> > [0000000000557500] jbd2_journal_start+0xbc/0x11c
> > [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> > [000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
> > [00000000004d1a14] __mark_inode_dirty+0x2c/0x188
> > [00000000004c7714] touch_atime+0x160/0x178
> > [00000000004c2a3c] vfs_readdir+0x88/0xb0
> > [00000000004eab90] compat_sys_getdents+0x3c/0x8c
> > [0000000000406154] linux_sparc_syscall32+0x34/0x40
> > dbench D 000000000055716c 0 2025 1
> > Call Trace:
> > [0000000000768bcc] schedule+0x18/0x40
> > [000000000055716c] start_this_handle+0x374/0x508
> > [0000000000557500] jbd2_journal_start+0xbc/0x11c
> > [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> > [000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
> > [00000000004d1a14] __mark_inode_dirty+0x2c/0x188
> > [00000000004c7714] touch_atime+0x160/0x178
> > [00000000004c2a3c] vfs_readdir+0x88/0xb0
> > [00000000004eab90] compat_sys_getdents+0x3c/0x8c
> > [0000000000406154] linux_sparc_syscall32+0x34/0x40
> > dbench D 000000000055716c 0 2026 1
> > Call Trace:
> > [0000000000768bcc] schedule+0x18/0x40
> > [000000000055716c] start_this_handle+0x374/0x508
> > [0000000000557500] jbd2_journal_start+0xbc/0x11c
> > [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> > [000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
> > [00000000004d1a14] __mark_inode_dirty+0x2c/0x188
> > [00000000004c7714] touch_atime+0x160/0x178
> > [000000000048b56c] generic_file_aio_read+0x578/0x620
> > [00000000004b5254] do_sync_read+0x90/0xe0
> > [00000000004b5c40] vfs_read+0x7c/0x11c
> > [00000000004b5d30] SyS_pread64+0x50/0x7c
> > [000000000043f778] sys32_pread64+0x20/0x34
> > [0000000000406154] linux_sparc_syscall32+0x34/0x40
> > dbench D 000000000055716c 0 2027 1
> > Call Trace:
> > [0000000000768bcc] schedule+0x18/0x40
> > [000000000055716c] start_this_handle+0x374/0x508
> > [0000000000557500] jbd2_journal_start+0xbc/0x11c
> > [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> > [000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
> > [00000000004d1a14] __mark_inode_dirty+0x2c/0x188
> > [00000000004c757c] file_update_time+0xe0/0x118
> > [000000000048aef4] __generic_file_aio_write_nolock+0x280/0x380
> > [000000000048b930] generic_file_aio_write+0x58/0xc8
> > [0000000000526274] ext4_file_write+0xa8/0x15c
> > [00000000004b5174] do_sync_write+0x90/0xe0
> > [00000000004b5a40] vfs_write+0x7c/0x11c
> > [00000000004b5b30] SyS_pwrite64+0x50/0x7c
> > [000000000043f744] sys32_pwrite64+0x20/0x34
> > [0000000000406154] linux_sparc_syscall32+0x34/0x40
> > dbench D 000000000055716c 0 2028 1
> > Call Trace:
> > [0000000000768bcc] schedule+0x18/0x40
> > [000000000055716c] start_this_handle+0x374/0x508
> > [0000000000557500] jbd2_journal_start+0xbc/0x11c
> > [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> > [000000000052b958] ext4_da_write_begin+0x98/0x1a0
> > [000000000048a8dc] generic_file_buffered_write+0xe4/0x2b8
> > [000000000048afd0] __generic_file_aio_write_nolock+0x35c/0x380
> > [000000000048b930] generic_file_aio_write+0x58/0xc8
> > [0000000000526274] ext4_file_write+0xa8/0x15c
> > [00000000004b5174] do_sync_write+0x90/0xe0
> > [00000000004b5a40] vfs_write+0x7c/0x11c
> > [00000000004b5b30] SyS_pwrite64+0x50/0x7c
> > [000000000043f744] sys32_pwrite64+0x20/0x34
> > [0000000000406154] linux_sparc_syscall32+0x34/0x40
> > dbench D 000000000055c010 0 2029 1
> > Call Trace:
> > [0000000000768bcc] schedule+0x18/0x40
> > [000000000055c010] jbd2_journal_release_jbd_inode+0x94/0xd8
> > [0000000000535bb4] ext4_clear_inode+0x2c/0x3c
> > [00000000004c7b1c] clear_inode+0xac/0x124
> > [0000000000527e48] ext4_free_inode+0x7c/0x3d4
> > [000000000052f884] ext4_delete_inode+0x218/0x29c
> > [00000000004c86b4] generic_delete_inode+0x8c/0x124
> > [00000000004c876c] generic_drop_inode+0x20/0x19c
> > [00000000004c77a4] iput+0x78/0x88
> > [00000000004c0200] do_unlinkat+0xf8/0x154
> > [00000000004c0270] SyS_unlink+0x14/0x24
> > [0000000000406154] linux_sparc_syscall32+0x34/0x40
> > dbench D 000000000055cf84 0 2030 1
> > Call Trace:
> > [0000000000768bcc] schedule+0x18/0x40
> > [000000000055cf84] jbd2_log_wait_commit+0x144/0x1a8
> > [0000000000555cdc] jbd2_journal_stop+0x32c/0x37c
> > [0000000000557594] jbd2_journal_force_commit+0x34/0x48
> > [0000000000536d14] ext4_force_commit+0x30/0x50
> > [00000000005296b8] ext4_write_inode+0x80/0xa0
> > [00000000004d1044] __writeback_single_inode+0x1e4/0x330
> > [00000000004d11b0] sync_inode+0x20/0x40
> > [00000000005263ec] ext4_sync_file+0xc4/0x118
> > [00000000004d46f0] vfs_fsync+0x6c/0xa0
> > [00000000004d474c] do_fsync+0x28/0x44
> > [00000000004d47a4] SyS_fsync+0x14/0x28
> > [0000000000406154] linux_sparc_syscall32+0x34/0x40
> > dbench D 000000000055716c 0 2031 1
> > Call Trace:
> > [0000000000768bcc] schedule+0x18/0x40
> > [000000000055716c] start_this_handle+0x374/0x508
> > [0000000000557500] jbd2_journal_start+0xbc/0x11c
> > [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> > [000000000052b958] ext4_da_write_begin+0x98/0x1a0
> > [000000000048a8dc] generic_file_buffered_write+0xe4/0x2b8
> > [000000000048afd0] __generic_file_aio_write_nolock+0x35c/0x380
> > [000000000048b930] generic_file_aio_write+0x58/0xc8
> > [0000000000526274] ext4_file_write+0xa8/0x15c
> > [00000000004b5174] do_sync_write+0x90/0xe0
> > [00000000004b5a40] vfs_write+0x7c/0x11c
> > [00000000004b5b30] SyS_pwrite64+0x50/0x7c
> > [000000000043f744] sys32_pwrite64+0x20/0x34
> > [0000000000406154] linux_sparc_syscall32+0x34/0x40
> > dbench D 000000000055716c 0 2032 1
> > Call Trace:
> > [0000000000768bcc] schedule+0x18/0x40
> > [000000000055716c] start_this_handle+0x374/0x508
> > [0000000000557500] jbd2_journal_start+0xbc/0x11c
> > [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> > [000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
> > [00000000004d1a14] __mark_inode_dirty+0x2c/0x188
> > [00000000004c7714] touch_atime+0x160/0x178
> > [00000000004c2a3c] vfs_readdir+0x88/0xb0
> > [00000000004eab90] compat_sys_getdents+0x3c/0x8c
> > [0000000000406154] linux_sparc_syscall32+0x34/0x40
> > dbench D 000000000055716c 0 2033 1
> > Call Trace:
> > [0000000000768bcc] schedule+0x18/0x40
> > [000000000055716c] start_this_handle+0x374/0x508
> > [0000000000557500] jbd2_journal_start+0xbc/0x11c
> > [000000000053a438] ext4_journal_start_sb+0x5c/0x84
> > [000000000052a9b8] ext4_dirty_inode+0xd4/0xf0
> > [00000000004d1a14] __mark_inode_dirty+0x2c/0x188
> > [00000000004c7714] touch_atime+0x160/0x178
> > [000000000048b56c] generic_file_aio_read+0x578/0x620
> > [00000000004b5254] do_sync_read+0x90/0xe0
> > [00000000004b5c40] vfs_read+0x7c/0x11c
> > [00000000004b5d30] SyS_pread64+0x50/0x7c
> > [000000000043f778] sys32_pread64+0x20/0x34
> > [0000000000406154] linux_sparc_syscall32+0x34/0x40
> >
> >
> > >
> > > I don't think is related to BZ #12579, although some of the symptoms
> > > look superficially the same.
> > >
> > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?- Ted
> > >
> > > P.S. ?What are your dbench parameters, and how big is your filesystem,
> > > so I can try to reproduce it on my side? ?And how long do you
> > > typically need to run before this triggers? ?Thanks!
> >
> > It is 2G filesystem, I run
> > `dbench -c /usr/share/dbench/client.txt 10` or 5 clients.
> >
> > Yesterday it happened when I run it for the first time, but today I
> > run it 10 times before I got the deadlock.
> >
> > So, I think I need to try it on 2.6.29-rc7 again.
> I've looked into this. Obviously, what's happenning is that we delete
> an inode and jbd2_journal_release_jbd_inode() finds inode is just under
> writeout in transaction commit and thus it waits. But it gets never woken
> up and because it has a handle from the transaction, every one eventually
> blocks on waiting for a transaction to finish.
> But I don't really see how that can happen. The code is really
> straightforward and everything happens under j_list_lock... Strange.
BTW: Is the system SMP?

Honza
--
Jan Kara <[email protected]>
SuSE CR Labs

2009-03-25 15:18:46

by Alexander Beregalov

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

2009/3/25 Jan Kara <[email protected]>:
>> > So, I think I need to try it on 2.6.29-rc7 again.
>>   I've looked into this. Obviously, what's happenning is that we delete
>> an inode and jbd2_journal_release_jbd_inode() finds inode is just under
>> writeout in transaction commit and thus it waits. But it gets never woken
>> up and because it has a handle from the transaction, every one eventually
>> blocks on waiting for a transaction to finish.
>>   But I don't really see how that can happen. The code is really
>> straightforward and everything happens under j_list_lock... Strange.
>  BTW: Is the system SMP?
No, it is UP system.

The bug exists even in 2.6.29, I posted it with a new topic.

2009-03-25 15:22:35

by Jan Kara

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

On Wed 25-03-09 18:18:43, Alexander Beregalov wrote:
> 2009/3/25 Jan Kara <[email protected]>:
> >> > So, I think I need to try it on 2.6.29-rc7 again.
> >> ? I've looked into this. Obviously, what's happenning is that we delete
> >> an inode and jbd2_journal_release_jbd_inode() finds inode is just under
> >> writeout in transaction commit and thus it waits. But it gets never woken
> >> up and because it has a handle from the transaction, every one eventually
> >> blocks on waiting for a transaction to finish.
> >> ? But I don't really see how that can happen. The code is really
> >> straightforward and everything happens under j_list_lock... Strange.
> > ?BTW: Is the system SMP?
> No, it is UP system.
Even stranger. And do you have CONFIG_PREEMPT set?

> The bug exists even in 2.6.29, I posted it with a new topic.
OK, I've sort-of expected this.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2009-03-25 15:29:10

by Alexander Beregalov

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

2009/3/25 Jan Kara <[email protected]>:
> On Wed 25-03-09 18:18:43, Alexander Beregalov wrote:
>> 2009/3/25 Jan Kara <[email protected]>:
>> >> > So, I think I need to try it on 2.6.29-rc7 again.
>> >>   I've looked into this. Obviously, what's happenning is that we delete
>> >> an inode and jbd2_journal_release_jbd_inode() finds inode is just under
>> >> writeout in transaction commit and thus it waits. But it gets never woken
>> >> up and because it has a handle from the transaction, every one eventually
>> >> blocks on waiting for a transaction to finish.
>> >>   But I don't really see how that can happen. The code is really
>> >> straightforward and everything happens under j_list_lock... Strange.
>> >  BTW: Is the system SMP?
>> No, it is UP system.
>  Even stranger. And do you have CONFIG_PREEMPT set?
>
>> The bug exists even in 2.6.29, I posted it with a new topic.
>  OK, I've sort-of expected this.

CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU_TRACE=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_DEBUG_PREEMPT=y
# CONFIG_PREEMPT_TRACER is not set

config is attached.


Attachments:
sparc-config (25.16 kB)

2009-03-25 16:15:59

by Jan Kara

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

On Wed 25-03-09 18:29:10, Alexander Beregalov wrote:
> 2009/3/25 Jan Kara <[email protected]>:
> > On Wed 25-03-09 18:18:43, Alexander Beregalov wrote:
> >> 2009/3/25 Jan Kara <[email protected]>:
> >> >> > So, I think I need to try it on 2.6.29-rc7 again.
> >> >> ? I've looked into this. Obviously, what's happenning is that we delete
> >> >> an inode and jbd2_journal_release_jbd_inode() finds inode is just under
> >> >> writeout in transaction commit and thus it waits. But it gets never woken
> >> >> up and because it has a handle from the transaction, every one eventually
> >> >> blocks on waiting for a transaction to finish.
> >> >> ? But I don't really see how that can happen. The code is really
> >> >> straightforward and everything happens under j_list_lock... Strange.
> >> > ?BTW: Is the system SMP?
> >> No, it is UP system.
> > ?Even stranger. And do you have CONFIG_PREEMPT set?
> >
> >> The bug exists even in 2.6.29, I posted it with a new topic.
> > ?OK, I've sort-of expected this.
>
> CONFIG_PREEMPT_RCU=y
> CONFIG_PREEMPT_RCU_TRACE=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> CONFIG_DEBUG_PREEMPT=y
> # CONFIG_PREEMPT_TRACER is not set
>
> config is attached.
Thanks for the data. I still don't see how the wakeup can get lost. The
process even cannot be preempted when we are in the section protected by
j_list_lock... Can you send me a disassembly of functions
jbd2_journal_release_jbd_inode() and journal_submit_data_buffers() so that
I can see whether the compiler has not reordered something unexpectedly?
Thanks.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2009-03-25 17:07:46

by Alexander Beregalov

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

2009/3/25 Jan Kara <[email protected]>:
> On Wed 25-03-09 18:29:10, Alexander Beregalov wrote:
>> 2009/3/25 Jan Kara <[email protected]>:
>> > On Wed 25-03-09 18:18:43, Alexander Beregalov wrote:
>> >> 2009/3/25 Jan Kara <[email protected]>:
>> >> >> > So, I think I need to try it on 2.6.29-rc7 again.
>> >> >>   I've looked into this. Obviously, what's happenning is that we delete
>> >> >> an inode and jbd2_journal_release_jbd_inode() finds inode is just under
>> >> >> writeout in transaction commit and thus it waits. But it gets never woken
>> >> >> up and because it has a handle from the transaction, every one eventually
>> >> >> blocks on waiting for a transaction to finish.
>> >> >>   But I don't really see how that can happen. The code is really
>> >> >> straightforward and everything happens under j_list_lock... Strange.
>> >> >  BTW: Is the system SMP?
>> >> No, it is UP system.
>> >  Even stranger. And do you have CONFIG_PREEMPT set?
>> >
>> >> The bug exists even in 2.6.29, I posted it with a new topic.
>> >  OK, I've sort-of expected this.
>>
>> CONFIG_PREEMPT_RCU=y
>> CONFIG_PREEMPT_RCU_TRACE=y
>> # CONFIG_PREEMPT_NONE is not set
>> # CONFIG_PREEMPT_VOLUNTARY is not set
>> CONFIG_PREEMPT=y
>> CONFIG_DEBUG_PREEMPT=y
>> # CONFIG_PREEMPT_TRACER is not set
>>
>> config is attached.
>  Thanks for the data. I still don't see how the wakeup can get lost. The
> process even cannot be preempted when we are in the section protected by
> j_list_lock... Can you send me a disassembly of functions
> jbd2_journal_release_jbd_inode() and journal_submit_data_buffers() so that
> I can see whether the compiler has not reordered something unexpectedly?

void jbd2_journal_release_jbd_inode(journal_t *journal,
struct jbd2_inode *jinode)
{
6d8: 9d e3 bf 00 save %sp, -256, %sp
6dc: 11 00 00 00 sethi %hi(0), %o0
6e0: 40 00 00 00 call 6e0 <jbd2_journal_release_jbd_inode+0x8>
6e4: 90 12 20 00 mov %o0, %o0 ! 0 <jbd2_history_skip_empty>
int writeout = 0;

if (!journal)
6e8: 02 c6 00 30 brz,pn %i0, 7a8
<jbd2_journal_release_jbd_inode+0xd0>
6ec: 03 00 00 00 sethi %hi(0), %g1
return;
restart:
spin_lock(&journal->j_list_lock);
6f0: b0 06 25 70 add %i0, 0x570, %i0
/* Is commit writing out inode - we have to wait */
if (jinode->i_flags & JI_COMMIT_RUNNING) {
wait_queue_head_t *wq;
DEFINE_WAIT_BIT(wait, &jinode->i_flags, __JI_COMMIT_RUNNING);
6f4: aa 10 60 00 mov %g1, %l5
6f8: a2 06 60 28 add %i1, 0x28, %l1
6fc: a8 07 a7 b7 add %fp, 0x7b7, %l4
700: a6 07 a7 df add %fp, 0x7df, %l3
wq = bit_waitqueue(&jinode->i_flags, __JI_COMMIT_RUNNING);
prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
704: a4 07 a7 c7 add %fp, 0x7c7, %l2
int writeout = 0;

if (!journal)
return;
restart:
spin_lock(&journal->j_list_lock);
708: 40 00 00 00 call 708 <jbd2_journal_release_jbd_inode+0x30>
70c: 90 10 00 18 mov %i0, %o0
/* Is commit writing out inode - we have to wait */
if (jinode->i_flags & JI_COMMIT_RUNNING) {
710: c2 06 60 28 ld [ %i1 + 0x28 ], %g1
wait_queue_head_t *wq;
DEFINE_WAIT_BIT(wait, &jinode->i_flags, __JI_COMMIT_RUNNING);
714: 94 10 20 38 mov 0x38, %o2
718: 90 10 00 14 mov %l4, %o0
71c: 92 10 20 00 clr %o1
if (!journal)
return;
restart:
spin_lock(&journal->j_list_lock);
/* Is commit writing out inode - we have to wait */
if (jinode->i_flags & JI_COMMIT_RUNNING) {
720: 80 88 60 01 btst 1, %g1
724: 02 60 00 19 be,pn %xcc, 788
<jbd2_journal_release_jbd_inode+0xb0>
728: a0 10 00 04 mov %g4, %l0
wait_queue_head_t *wq;
DEFINE_WAIT_BIT(wait, &jinode->i_flags, __JI_COMMIT_RUNNING);
72c: 40 00 00 00 call 72c <jbd2_journal_release_jbd_inode+0x54>
730: 01 00 00 00 nop
wq = bit_waitqueue(&jinode->i_flags, __JI_COMMIT_RUNNING);
734: 90 10 00 11 mov %l1, %o0
738: 92 10 20 00 clr %o1
restart:
spin_lock(&journal->j_list_lock);
/* Is commit writing out inode - we have to wait */
if (jinode->i_flags & JI_COMMIT_RUNNING) {
wait_queue_head_t *wq;
DEFINE_WAIT_BIT(wait, &jinode->i_flags, __JI_COMMIT_RUNNING);
73c: e0 77 a7 cf stx %l0, [ %fp + 0x7cf ]
740: e2 77 a7 b7 stx %l1, [ %fp + 0x7b7 ]
744: ea 77 a7 d7 stx %l5, [ %fp + 0x7d7 ]
748: e6 77 a7 df stx %l3, [ %fp + 0x7df ]
wq = bit_waitqueue(&jinode->i_flags, __JI_COMMIT_RUNNING);
74c: 40 00 00 00 call 74c <jbd2_journal_release_jbd_inode+0x74>
750: e6 77 a7 e7 stx %l3, [ %fp + 0x7e7 ]
prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
754: 92 10 00 12 mov %l2, %o1
758: 94 10 20 02 mov 2, %o2
75c: 40 00 00 00 call 75c <jbd2_journal_release_jbd_inode+0x84>
760: a0 10 00 08 mov %o0, %l0
spin_unlock(&journal->j_list_lock);
764: 40 00 00 00 call 764 <jbd2_journal_release_jbd_inode+0x8c>
768: 90 10 00 18 mov %i0, %o0
schedule();
76c: 40 00 00 00 call 76c <jbd2_journal_release_jbd_inode+0x94>
770: 01 00 00 00 nop
finish_wait(wq, &wait.wait);
774: 90 10 00 10 mov %l0, %o0
778: 40 00 00 00 call 778 <jbd2_journal_release_jbd_inode+0xa0>
77c: 92 10 00 12 mov %l2, %o1
780: 10 6f ff e2 b %xcc, 708
<jbd2_journal_release_jbd_inode+0x30>
784: 01 00 00 00 nop
}

/* Do we need to wait for data writeback? */
if (journal->j_committing_transaction == jinode->i_transaction)
writeout = 1;
if (jinode->i_transaction) {
788: c2 5e 40 00 ldx [ %i1 ], %g1
78c: 02 c0 40 05 brz,pn %g1, 7a0
<jbd2_journal_release_jbd_inode+0xc8>
790: 01 00 00 00 nop
list_del(&jinode->i_list);
794: 40 00 00 00 call 794 <jbd2_journal_release_jbd_inode+0xbc>
798: 90 06 60 10 add %i1, 0x10, %o0
jinode->i_transaction = NULL;
79c: c0 76 40 00 clrx [ %i1 ]
}
spin_unlock(&journal->j_list_lock);
7a0: 40 00 00 00 call 7a0 <jbd2_journal_release_jbd_inode+0xc8>
7a4: 90 10 00 18 mov %i0, %o0
7a8: 81 cf e0 08 rett %i7 + 8
7ac: 01 00 00 00 nop

====
By default gcc inlines journal_submit_data_buffers()
Here is -fno-inline version. Default version is in attach.
====

static int journal_submit_data_buffers(journal_t *journal,
transaction_t *commit_transaction)
{
9c: 9d e3 bf 40 save %sp, -192, %sp
a0: 11 00 00 00 sethi %hi(0), %o0
struct jbd2_inode *jinode;
int err, ret = 0;
struct address_space *mapping;

spin_lock(&journal->j_list_lock);
a4: a4 06 25 70 add %i0, 0x570, %l2
* our inode list. We use JI_COMMIT_RUNNING flag to protect inode we currently
* operate on from being released while we write out pages.
*/
static int journal_submit_data_buffers(journal_t *journal,
transaction_t *commit_transaction)
{
a8: 90 12 20 00 mov %o0, %o0
ac: 40 00 00 00 call ac <journal_submit_data_buffers+0x10>
b0: b0 10 20 00 clr %i0
struct jbd2_inode *jinode;
int err, ret = 0;
struct address_space *mapping;

spin_lock(&journal->j_list_lock);
list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
b4: a6 06 60 60 add %i1, 0x60, %l3
{
struct jbd2_inode *jinode;
int err, ret = 0;
struct address_space *mapping;

spin_lock(&journal->j_list_lock);
b8: 40 00 00 00 call b8 <journal_submit_data_buffers+0x1c>
bc: 90 10 00 12 mov %l2, %o0
list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
c0: 10 68 00 1d b %xcc, 134 <journal_submit_data_buffers+0x98>
c4: c2 5e 60 60 ldx [ %i1 + 0x60 ], %g1
mapping = jinode->i_vfs_inode->i_mapping;
jinode->i_flags |= JI_COMMIT_RUNNING;
spin_unlock(&journal->j_list_lock);
c8: 90 10 00 12 mov %l2, %o0
struct address_space *mapping;

spin_lock(&journal->j_list_lock);
list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
mapping = jinode->i_vfs_inode->i_mapping;
jinode->i_flags |= JI_COMMIT_RUNNING;
cc: c2 04 60 28 ld [ %l1 + 0x28 ], %g1
int err, ret = 0;
struct address_space *mapping;

spin_lock(&journal->j_list_lock);
list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
mapping = jinode->i_vfs_inode->i_mapping;
d0: e0 58 a1 e0 ldx [ %g2 + 0x1e0 ], %l0
jinode->i_flags |= JI_COMMIT_RUNNING;
d4: 82 10 60 01 or %g1, 1, %g1
spin_unlock(&journal->j_list_lock);
d8: 40 00 00 00 call d8 <journal_submit_data_buffers+0x3c>
dc: c2 24 60 28 st %g1, [ %l1 + 0x28 ]
* submit the inode data buffers. We use writepage
* instead of writepages. Because writepages can do
* block allocation with delalloc. We need to write
* only allocated blocks here.
*/
err = journal_submit_inode_data_buffers(mapping);
e0: 7f ff ff d3 call 2c <journal_submit_inode_data_buffers>
e4: 90 10 00 10 mov %l0, %o0
if (!ret)
e8: 80 a6 20 00 cmp %i0, 0
ec: b1 64 40 08 move %icc, %o0, %i0
ret = err;
spin_lock(&journal->j_list_lock);
f0: 40 00 00 00 call f0 <journal_submit_data_buffers+0x54>
f4: 90 10 00 12 mov %l2, %o0
J_ASSERT(jinode->i_transaction == commit_transaction);
f8: c2 5c 40 00 ldx [ %l1 ], %g1
fc: 80 a0 40 19 cmp %g1, %i1
100: 22 68 00 07 be,a %xcc, 11c
<journal_submit_data_buffers+0x80>
104: c2 04 60 28 ld [ %l1 + 0x28 ], %g1
108: 11 00 00 00 sethi %hi(0), %o0
10c: 92 10 21 04 mov 0x104, %o1
110: 40 00 00 00 call 110 <journal_submit_data_buffers+0x74>
114: 90 12 20 00 mov %o0, %o0
118: 91 d0 20 05 ta 5
jinode->i_flags &= ~JI_COMMIT_RUNNING;
wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
11c: 90 04 60 28 add %l1, 0x28, %o0
120: 92 10 20 00 clr %o1
err = journal_submit_inode_data_buffers(mapping);
if (!ret)
ret = err;
spin_lock(&journal->j_list_lock);
J_ASSERT(jinode->i_transaction == commit_transaction);
jinode->i_flags &= ~JI_COMMIT_RUNNING;
124: 82 08 7f fe and %g1, -2, %g1
wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
128: 40 00 00 00 call 128 <journal_submit_data_buffers+0x8c>
12c: c2 24 60 28 st %g1, [ %l1 + 0x28 ]
struct jbd2_inode *jinode;
int err, ret = 0;
struct address_space *mapping;

spin_lock(&journal->j_list_lock);
list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
130: c2 5c 60 10 ldx [ %l1 + 0x10 ], %g1
134: a2 00 7f f0 add %g1, -16, %l1
* prefetches into the prefetch-cache which only is accessible
* by floating point operations in UltraSPARC-III and later.
* By contrast, "#one_write" prefetches into the L2 cache
* in shared state.
*/
__asm__ __volatile__("prefetch [%0], #one_write"
138: c2 5c 60 10 ldx [ %l1 + 0x10 ], %g1
13c: c7 68 40 00 prefetch [ %g1 ], #one_write
140: 82 04 60 10 add %l1, 0x10, %g1
144: 80 a4 c0 01 cmp %l3, %g1
148: 32 6f ff e0 bne,a %xcc, c8
<journal_submit_data_buffers+0x2c>
14c: c4 5c 60 20 ldx [ %l1 + 0x20 ], %g2
spin_lock(&journal->j_list_lock);
J_ASSERT(jinode->i_transaction == commit_transaction);
wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
}
spin_unlock(&journal->j_list_lock);
150: 90 10 00 12 mov %l2, %o0
154: 40 00 00 00 call 154 <journal_submit_data_buffers+0xb8>
158: b1 3e 20 00 sra %i0, 0, %i0
return ret;
}
15c: 81 cf e0 08 rett %i7 + 8
160: 01 00 00 00 nop


Attachments:
jbd2-commit-noinline.out (148.35 kB)

2009-03-25 19:43:17

by Jan Kara

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

On Wed 25-03-09 20:07:46, Alexander Beregalov wrote:
> 2009/3/25 Jan Kara <[email protected]>:
> > On Wed 25-03-09 18:29:10, Alexander Beregalov wrote:
> >> 2009/3/25 Jan Kara <[email protected]>:
> >> > On Wed 25-03-09 18:18:43, Alexander Beregalov wrote:
> >> >> 2009/3/25 Jan Kara <[email protected]>:
> >> >> >> > So, I think I need to try it on 2.6.29-rc7 again.
> >> >> >> ? I've looked into this. Obviously, what's happenning is that we delete
> >> >> >> an inode and jbd2_journal_release_jbd_inode() finds inode is just under
> >> >> >> writeout in transaction commit and thus it waits. But it gets never woken
> >> >> >> up and because it has a handle from the transaction, every one eventually
> >> >> >> blocks on waiting for a transaction to finish.
> >> >> >> ? But I don't really see how that can happen. The code is really
> >> >> >> straightforward and everything happens under j_list_lock... Strange.
> >> >> > ?BTW: Is the system SMP?
> >> >> No, it is UP system.
> >> > ?Even stranger. And do you have CONFIG_PREEMPT set?
> >> >
> >> >> The bug exists even in 2.6.29, I posted it with a new topic.
> >> > ?OK, I've sort-of expected this.
> >>
> >> CONFIG_PREEMPT_RCU=y
> >> CONFIG_PREEMPT_RCU_TRACE=y
> >> # CONFIG_PREEMPT_NONE is not set
> >> # CONFIG_PREEMPT_VOLUNTARY is not set
> >> CONFIG_PREEMPT=y
> >> CONFIG_DEBUG_PREEMPT=y
> >> # CONFIG_PREEMPT_TRACER is not set
> >>
> >> config is attached.
> > ?Thanks for the data. I still don't see how the wakeup can get lost. The
> > process even cannot be preempted when we are in the section protected by
> > j_list_lock... Can you send me a disassembly of functions
> > jbd2_journal_release_jbd_inode() and journal_submit_data_buffers() so that
> > I can see whether the compiler has not reordered something unexpectedly?
Thanks for the disassembly...

> By default gcc inlines journal_submit_data_buffers()
> Here is -fno-inline version. Default version is in attach.
> ====
>
> static int journal_submit_data_buffers(journal_t *journal,
> transaction_t *commit_transaction)
> {
> 9c: 9d e3 bf 40 save %sp, -192, %sp
> a0: 11 00 00 00 sethi %hi(0), %o0
> struct jbd2_inode *jinode;
> int err, ret = 0;
> struct address_space *mapping;
>
> spin_lock(&journal->j_list_lock);
> a4: a4 06 25 70 add %i0, 0x570, %l2
> * our inode list. We use JI_COMMIT_RUNNING flag to protect inode we currently
> * operate on from being released while we write out pages.
> */
> static int journal_submit_data_buffers(journal_t *journal,
> transaction_t *commit_transaction)
> {
> a8: 90 12 20 00 mov %o0, %o0
> ac: 40 00 00 00 call ac <journal_submit_data_buffers+0x10>
> b0: b0 10 20 00 clr %i0
> struct jbd2_inode *jinode;
> int err, ret = 0;
> struct address_space *mapping;
>
> spin_lock(&journal->j_list_lock);
> list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
> b4: a6 06 60 60 add %i1, 0x60, %l3
> {
> struct jbd2_inode *jinode;
> int err, ret = 0;
> struct address_space *mapping;
>
> spin_lock(&journal->j_list_lock);
> b8: 40 00 00 00 call b8 <journal_submit_data_buffers+0x1c>
> bc: 90 10 00 12 mov %l2, %o0
> list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
> c0: 10 68 00 1d b %xcc, 134 <journal_submit_data_buffers+0x98>
> c4: c2 5e 60 60 ldx [ %i1 + 0x60 ], %g1
> mapping = jinode->i_vfs_inode->i_mapping;
> jinode->i_flags |= JI_COMMIT_RUNNING;
> spin_unlock(&journal->j_list_lock);
> c8: 90 10 00 12 mov %l2, %o0
> struct address_space *mapping;
>
> spin_lock(&journal->j_list_lock);
> list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
> mapping = jinode->i_vfs_inode->i_mapping;
> jinode->i_flags |= JI_COMMIT_RUNNING;
> cc: c2 04 60 28 ld [ %l1 + 0x28 ], %g1
Here we load jbd2_inode->i_flags into %g1.

> int err, ret = 0;
> struct address_space *mapping;
>
> spin_lock(&journal->j_list_lock);
> list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
> mapping = jinode->i_vfs_inode->i_mapping;
> d0: e0 58 a1 e0 ldx [ %g2 + 0x1e0 ], %l0
> jinode->i_flags |= JI_COMMIT_RUNNING;
> d4: 82 10 60 01 or %g1, 1, %g1
Here we set JI_COMMIT_RUNNING.

> spin_unlock(&journal->j_list_lock);
> d8: 40 00 00 00 call d8 <journal_submit_data_buffers+0x3c>
Here we seem to call preempt_disable() (it would be useful if we could
confirm that - easiest option I know is compiling JBD2 into a kernel but
some object file trickery should be able to find it out as well...)

> dc: c2 24 60 28 st %g1, [ %l1 + 0x28 ]
And here we store the register back to memory - but we could be already
preempted here which could cause bugs...

> * submit the inode data buffers. We use writepage
> * instead of writepages. Because writepages can do
> * block allocation with delalloc. We need to write
> * only allocated blocks here.
> */
> err = journal_submit_inode_data_buffers(mapping);
> e0: 7f ff ff d3 call 2c <journal_submit_inode_data_buffers>
> e4: 90 10 00 10 mov %l0, %o0
> if (!ret)
> e8: 80 a6 20 00 cmp %i0, 0
> ec: b1 64 40 08 move %icc, %o0, %i0
> ret = err;
> spin_lock(&journal->j_list_lock);
> f0: 40 00 00 00 call f0 <journal_submit_data_buffers+0x54>
> f4: 90 10 00 12 mov %l2, %o0
> J_ASSERT(jinode->i_transaction == commit_transaction);
> f8: c2 5c 40 00 ldx [ %l1 ], %g1
> fc: 80 a0 40 19 cmp %g1, %i1
> 100: 22 68 00 07 be,a %xcc, 11c
> <journal_submit_data_buffers+0x80>
> 104: c2 04 60 28 ld [ %l1 + 0x28 ], %g1
Again, here we load jinode->i_flags.

> 108: 11 00 00 00 sethi %hi(0), %o0
> 10c: 92 10 21 04 mov 0x104, %o1
> 110: 40 00 00 00 call 110 <journal_submit_data_buffers+0x74>
> 114: 90 12 20 00 mov %o0, %o0
> 118: 91 d0 20 05 ta 5
> jinode->i_flags &= ~JI_COMMIT_RUNNING;
> wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
> 11c: 90 04 60 28 add %l1, 0x28, %o0
> 120: 92 10 20 00 clr %o1
> err = journal_submit_inode_data_buffers(mapping);
> if (!ret)
> ret = err;
> spin_lock(&journal->j_list_lock);
> J_ASSERT(jinode->i_transaction == commit_transaction);
> jinode->i_flags &= ~JI_COMMIT_RUNNING;
> 124: 82 08 7f fe and %g1, -2, %g1
Here we go &= ~JI_COMMIT_RUNNING

> wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
> 128: 40 00 00 00 call 128 <journal_submit_data_buffers+0x8c>
> 12c: c2 24 60 28 st %g1, [ %l1 + 0x28 ]
And only here we store it back to memory...

> struct jbd2_inode *jinode;
> int err, ret = 0;
> struct address_space *mapping;
>
> spin_lock(&journal->j_list_lock);
> list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
> 130: c2 5c 60 10 ldx [ %l1 + 0x10 ], %g1
> 134: a2 00 7f f0 add %g1, -16, %l1
> * prefetches into the prefetch-cache which only is accessible
> * by floating point operations in UltraSPARC-III and later.
> * By contrast, "#one_write" prefetches into the L2 cache
> * in shared state.
> */
> __asm__ __volatile__("prefetch [%0], #one_write"
> 138: c2 5c 60 10 ldx [ %l1 + 0x10 ], %g1
> 13c: c7 68 40 00 prefetch [ %g1 ], #one_write
> 140: 82 04 60 10 add %l1, 0x10, %g1
> 144: 80 a4 c0 01 cmp %l3, %g1
> 148: 32 6f ff e0 bne,a %xcc, c8
> <journal_submit_data_buffers+0x2c>
> 14c: c4 5c 60 20 ldx [ %l1 + 0x20 ], %g2
> spin_lock(&journal->j_list_lock);
> J_ASSERT(jinode->i_transaction == commit_transaction);
> wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
> }
> spin_unlock(&journal->j_list_lock);
> 150: 90 10 00 12 mov %l2, %o0
> 154: 40 00 00 00 call 154 <journal_submit_data_buffers+0xb8>
> 158: b1 3e 20 00 sra %i0, 0, %i0
> return ret;
> }
> 15c: 81 cf e0 08 rett %i7 + 8
> 160: 01 00 00 00 nop
So the compiled code looks a bit suspitious to me. Having the disassembly
with symbols properly resolved would help confirm it. I'm adding sparc list
to CC just in case someone sees the problem...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2009-03-25 22:38:32

by Alexander Beregalov

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

2009/3/25 Jan Kara <[email protected]>:
> On Wed 25-03-09 20:07:46, Alexander Beregalov wrote:
>> 2009/3/25 Jan Kara <[email protected]>:
>> > On Wed 25-03-09 18:29:10, Alexander Beregalov wrote:
>> >> 2009/3/25 Jan Kara <[email protected]>:
>> >> > On Wed 25-03-09 18:18:43, Alexander Beregalov wrote:
>> >> >> 2009/3/25 Jan Kara <[email protected]>:
>> >> >> >> > So, I think I need to try it on 2.6.29-rc7 again.
>> >> >> >>   I've looked into this. Obviously, what's happenning is that we delete
>> >> >> >> an inode and jbd2_journal_release_jbd_inode() finds inode is just under
>> >> >> >> writeout in transaction commit and thus it waits. But it gets never woken
>> >> >> >> up and because it has a handle from the transaction, every one eventually
>> >> >> >> blocks on waiting for a transaction to finish.
>> >> >> >>   But I don't really see how that can happen. The code is really
>> >> >> >> straightforward and everything happens under j_list_lock... Strange.
>> >> >> >  BTW: Is the system SMP?
>> >> >> No, it is UP system.
>> >> >  Even stranger. And do you have CONFIG_PREEMPT set?
>> >> >
>> >> >> The bug exists even in 2.6.29, I posted it with a new topic.
>> >> >  OK, I've sort-of expected this.
>> >>
>> >> CONFIG_PREEMPT_RCU=y
>> >> CONFIG_PREEMPT_RCU_TRACE=y
>> >> # CONFIG_PREEMPT_NONE is not set
>> >> # CONFIG_PREEMPT_VOLUNTARY is not set
>> >> CONFIG_PREEMPT=y
>> >> CONFIG_DEBUG_PREEMPT=y
>> >> # CONFIG_PREEMPT_TRACER is not set
>> >>
>> >> config is attached.
>> >  Thanks for the data. I still don't see how the wakeup can get lost. The
>> > process even cannot be preempted when we are in the section protected by
>> > j_list_lock... Can you send me a disassembly of functions
>> > jbd2_journal_release_jbd_inode() and journal_submit_data_buffers() so that
>> > I can see whether the compiler has not reordered something unexpectedly?
>  Thanks for the disassembly...
>
>> By default gcc inlines journal_submit_data_buffers()
>> Here is -fno-inline version. Default version is in attach.
>> ====
>>
>> static int journal_submit_data_buffers(journal_t *journal,
>>                 transaction_t *commit_transaction)
>> {
>>       9c:       9d e3 bf 40     save  %sp, -192, %sp
>>       a0:       11 00 00 00     sethi  %hi(0), %o0
>>         struct jbd2_inode *jinode;
>>         int err, ret = 0;
>>         struct address_space *mapping;
>>
>>         spin_lock(&journal->j_list_lock);
>>       a4:       a4 06 25 70     add  %i0, 0x570, %l2
>>  * our inode list. We use JI_COMMIT_RUNNING flag to protect inode we currently
>>  * operate on from being released while we write out pages.
>>  */
>> static int journal_submit_data_buffers(journal_t *journal,
>>                 transaction_t *commit_transaction)
>> {
>>       a8:       90 12 20 00     mov  %o0, %o0
>>       ac:       40 00 00 00     call  ac <journal_submit_data_buffers+0x10>
>>       b0:       b0 10 20 00     clr  %i0
>>         struct jbd2_inode *jinode;
>>         int err, ret = 0;
>>         struct address_space *mapping;
>>
>>         spin_lock(&journal->j_list_lock);
>>         list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
>>       b4:       a6 06 60 60     add  %i1, 0x60, %l3
>> {
>>         struct jbd2_inode *jinode;
>>         int err, ret = 0;
>>         struct address_space *mapping;
>>
>>         spin_lock(&journal->j_list_lock);
>>       b8:       40 00 00 00     call  b8 <journal_submit_data_buffers+0x1c>
>>       bc:       90 10 00 12     mov  %l2, %o0
>>         list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
>>       c0:       10 68 00 1d     b  %xcc, 134 <journal_submit_data_buffers+0x98>
>>       c4:       c2 5e 60 60     ldx  [ %i1 + 0x60 ], %g1
>>                 mapping = jinode->i_vfs_inode->i_mapping;
>>                 jinode->i_flags |= JI_COMMIT_RUNNING;
>>                 spin_unlock(&journal->j_list_lock);
>>       c8:       90 10 00 12     mov  %l2, %o0
>>         struct address_space *mapping;
>>
>>         spin_lock(&journal->j_list_lock);
>>         list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
>>                 mapping = jinode->i_vfs_inode->i_mapping;
>>                 jinode->i_flags |= JI_COMMIT_RUNNING;
>>       cc:       c2 04 60 28     ld  [ %l1 + 0x28 ], %g1
>  Here we load jbd2_inode->i_flags into %g1.
>
>>         int err, ret = 0;
>>         struct address_space *mapping;
>>
>>         spin_lock(&journal->j_list_lock);
>>         list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
>>                 mapping = jinode->i_vfs_inode->i_mapping;
>>       d0:       e0 58 a1 e0     ldx  [ %g2 + 0x1e0 ], %l0
>>                 jinode->i_flags |= JI_COMMIT_RUNNING;
>>       d4:       82 10 60 01     or  %g1, 1, %g1
>  Here we set JI_COMMIT_RUNNING.
>
>>                 spin_unlock(&journal->j_list_lock);
>>       d8:       40 00 00 00     call  d8 <journal_submit_data_buffers+0x3c>
>  Here we seem to call preempt_disable() (it would be useful if we could
> confirm that - easiest option I know is compiling JBD2 into a kernel but
> some object file trickery should be able to find it out as well...)
55bab0: 82 10 60 01 or %g1, 1, %g1
spin_unlock(&journal->j_list_lock);
55bab4: 40 06 4b 20 call 6ee734 <_spin_unlock>
55bab8: c2 24 e0 28 st %g1, [ %l3 + 0x28 ]
<..>
void __lockfunc _spin_unlock(spinlock_t *lock)
{
6ee734: 9d e3 bf 40 save %sp, -192, %sp
6ee738: 11 00 1f f9 sethi %hi(0x7fe400), %o0
6ee73c: 7f fb 36 59 call 5bc0a0 <_mcount>
6ee740: 90 12 22 40 or %o0, 0x240, %o0 ! 7fe640
<rt_trace_on+0xc8>
spin_release(&lock->dep_map, 1, _RET_IP_);
6ee744: 94 10 00 1f mov %i7, %o2
6ee748: 92 10 20 01 mov 1, %o1
6ee74c: 7f f6 21 18 call 476bac <lock_release>
6ee750: 90 06 20 18 add %i0, 0x18, %o0
_raw_spin_unlock(lock);
6ee754: 7f fb 7f 91 call 5ce598 <_raw_spin_unlock>
6ee758: 90 10 00 18 mov %i0, %o0
preempt_enable();
6ee75c: 40 00 05 fd call 6eff50 <sub_preempt_count>
6ee760: 90 10 20 01 mov 1, %o0
6ee764: c2 59 a0 08 ldx [ %g6 + 8 ], %g1
6ee768: 82 08 60 08 and %g1, 8, %g1
6ee76c: 02 c8 40 04 brz %g1, 6ee77c <_spin_unlock+0x48>
6ee770: 01 00 00 00 nop
6ee774: 7f ff f4 f1 call 6ebb38 <preempt_schedule>
6ee778: 01 00 00 00 nop
6ee77c: 81 cf e0 08 rett %i7 + 8
6ee780: 01 00 00 00 nop
}


>
>>       dc:       c2 24 60 28     st  %g1, [ %l1 + 0x28 ]
>  And here we store the register back to memory - but we could be already
> preempted here which could cause bugs...
>
>>                  * submit the inode data buffers. We use writepage
>>                  * instead of writepages. Because writepages can do
>>                  * block allocation  with delalloc. We need to write
>>                  * only allocated blocks here.
>>                  */
>>                 err = journal_submit_inode_data_buffers(mapping);
>>       e0:       7f ff ff d3     call  2c <journal_submit_inode_data_buffers>
>>       e4:       90 10 00 10     mov  %l0, %o0
>>                 if (!ret)
>>       e8:       80 a6 20 00     cmp  %i0, 0
>>       ec:       b1 64 40 08     move  %icc, %o0, %i0
>>                         ret = err;
>>                 spin_lock(&journal->j_list_lock);
>>       f0:       40 00 00 00     call  f0 <journal_submit_data_buffers+0x54>
>>       f4:       90 10 00 12     mov  %l2, %o0
>>                 J_ASSERT(jinode->i_transaction == commit_transaction);
>>       f8:       c2 5c 40 00     ldx  [ %l1 ], %g1
>>       fc:       80 a0 40 19     cmp  %g1, %i1
>>      100:       22 68 00 07     be,a   %xcc, 11c
>> <journal_submit_data_buffers+0x80>
>>      104:       c2 04 60 28     ld  [ %l1 + 0x28 ], %g1
>  Again, here we load jinode->i_flags.
>
>>      108:       11 00 00 00     sethi  %hi(0), %o0
>>      10c:       92 10 21 04     mov  0x104, %o1
>>      110:       40 00 00 00     call  110 <journal_submit_data_buffers+0x74>
>>      114:       90 12 20 00     mov  %o0, %o0
>>      118:       91 d0 20 05     ta  5
>>                 jinode->i_flags &= ~JI_COMMIT_RUNNING;
>>                 wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
>>      11c:       90 04 60 28     add  %l1, 0x28, %o0
>>      120:       92 10 20 00     clr  %o1
>>                 err = journal_submit_inode_data_buffers(mapping);
>>                 if (!ret)
>>                         ret = err;
>>                 spin_lock(&journal->j_list_lock);
>>                 J_ASSERT(jinode->i_transaction == commit_transaction);
>>                 jinode->i_flags &= ~JI_COMMIT_RUNNING;
>>      124:       82 08 7f fe     and  %g1, -2, %g1
>  Here we go &= ~JI_COMMIT_RUNNING
>
>>                 wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
>>      128:       40 00 00 00     call  128 <journal_submit_data_buffers+0x8c>
>>      12c:       c2 24 60 28     st  %g1, [ %l1 + 0x28 ]
>  And only here we store it back to memory...

spin_lock(&journal->j_list_lock);
55c104: 40 06 48 3c call 6ee1f4 <_spin_lock>
55c108: 90 10 00 12 mov %l2, %o0
jinode->i_flags &= ~JI_COMMIT_RUNNING;
55c10c: c2 04 20 28 ld [ %l0 + 0x28 ], %g1
wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
55c110: 90 04 20 28 add %l0, 0x28, %o0
55c114: 92 10 20 00 clr %o1
if (!ret)
ret = err;
}
spin_lock(&journal->j_list_lock);
jinode->i_flags &= ~JI_COMMIT_RUNNING;
55c118: 82 08 7f fe and %g1, -2, %g1
wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
55c11c: 7f fc 27 6e call 465ed4 <wake_up_bit>
55c120: c2 24 20 28 st %g1, [ %l0 + 0x28 ]
<..>
void __lockfunc _spin_lock(spinlock_t *lock)
{
6ee1f4: 9d e3 bf 30 save %sp, -208, %sp
6ee1f8: 11 00 1f f9 sethi %hi(0x7fe400), %o0
6ee1fc: 7f fb 37 a9 call 5bc0a0 <_mcount>
6ee200: 90 12 21 c8 or %o0, 0x1c8, %o0 ! 7fe5c8
<rt_trace_on+0x50>
preempt_disable();
6ee204: 40 00 07 83 call 6f0010 <add_preempt_count>
6ee208: 90 10 20 01 mov 1, %o0
spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
6ee20c: 92 10 20 00 clr %o1
6ee210: 90 06 20 18 add %i0, 0x18, %o0
6ee214: fe 73 a8 af stx %i7, [ %sp + 0x8af ]
6ee218: 94 10 20 00 clr %o2
6ee21c: 96 10 20 00 clr %o3
6ee220: 98 10 20 02 mov 2, %o4
6ee224: 7f f6 21 a5 call 4768b8 <lock_acquire>
6ee228: 9a 10 20 00 clr %o5
LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
6ee22c: 7f fb 81 33 call 5ce6f8 <_raw_spin_lock>
6ee230: 90 10 00 18 mov %i0, %o0
}
6ee234: 81 cf e0 08 rett %i7 + 8
6ee238: 01 00 00 00 nop
<..>
void wake_up_bit(void *word, int bit)
{
465ed4: 9d e3 bf 40 save %sp, -192, %sp
465ed8: 11 00 1f f2 sethi %hi(0x7fc800), %o0
465edc: 40 05 58 71 call 5bc0a0 <_mcount>
465ee0: 90 12 23 c8 or %o0, 0x3c8, %o0 ! 7fcbc8
<kthread_stop_lock+0x88>
__wake_up_bit(bit_waitqueue(word, bit), word, bit);
465ee4: 92 10 00 19 mov %i1, %o1
465ee8: 7f ff ff c5 call 465dfc <bit_waitqueue>
465eec: 90 10 00 18 mov %i0, %o0
465ef0: 92 10 00 18 mov %i0, %o1
465ef4: 7f ff ff e7 call 465e90 <__wake_up_bit>
465ef8: 94 10 00 19 mov %i1, %o2
}
465efc: 81 cf e0 08 rett %i7 + 8
465f00: 01 00 00 00 nop


>
>>         struct jbd2_inode *jinode;
>>         int err, ret = 0;
>>         struct address_space *mapping;
>>
>>         spin_lock(&journal->j_list_lock);
>>         list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
>>      130:       c2 5c 60 10     ldx  [ %l1 + 0x10 ], %g1
>>      134:       a2 00 7f f0     add  %g1, -16, %l1
>>          * prefetches into the prefetch-cache which only is accessible
>>          * by floating point operations in UltraSPARC-III and later.
>>          * By contrast, "#one_write" prefetches into the L2 cache
>>          * in shared state.
>>          */
>>         __asm__ __volatile__("prefetch [%0], #one_write"
>>      138:       c2 5c 60 10     ldx  [ %l1 + 0x10 ], %g1
>>      13c:       c7 68 40 00     prefetch  [ %g1 ], #one_write
>>      140:       82 04 60 10     add  %l1, 0x10, %g1
>>      144:       80 a4 c0 01     cmp  %l3, %g1
>>      148:       32 6f ff e0     bne,a   %xcc, c8
>> <journal_submit_data_buffers+0x2c>
>>      14c:       c4 5c 60 20     ldx  [ %l1 + 0x20 ], %g2
>>                 spin_lock(&journal->j_list_lock);
>>                 J_ASSERT(jinode->i_transaction == commit_transaction);
>>                 wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
>>         }
>>         spin_unlock(&journal->j_list_lock);
>>      150:       90 10 00 12     mov  %l2, %o0
>>      154:       40 00 00 00     call  154 <journal_submit_data_buffers+0xb8>
>>      158:       b1 3e 20 00     sra  %i0, 0, %i0
>>         return ret;
>> }
>>      15c:       81 cf e0 08     rett  %i7 + 8
>>      160:       01 00 00 00     nop
>  So the compiled code looks a bit suspitious to me. Having the disassembly
> with symbols properly resolved would help confirm it. I'm adding sparc list
> to CC just in case someone sees the problem...
>
>                                                                        Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
>

2009-03-26 00:00:39

by Jan Kara

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

Hi,

thanks for the updated disassembly.

> 55bab0: 82 10 60 01 or %g1, 1, %g1
> spin_unlock(&journal->j_list_lock);
> 55bab4: 40 06 4b 20 call 6ee734 <_spin_unlock>
> 55bab8: c2 24 e0 28 st %g1, [ %l3 + 0x28 ]
OK, so it really seems that:
jinode->i_flags |= JI_COMMIT_RUNNING;
spin_unlock(&journal->j_list_lock);

has been compiled to
ld [ %l3 + 0x28 ], %g1
or %g1, 1, %g1
call 6ee734 <_spin_unlock>
st %g1, [ %l3 + 0x28 ]

Which seems like a bug in the compiler or in the way implement compiler
barriers in spin_unlock() on UP sparc. Or is there some sparc magic by which
this is correct code? Any clever sparc guy?

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2009-03-26 00:25:55

by Jan Kara

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

On Thu 26-03-09 01:17:06, Jiri Gaisler wrote:
> Jan Kara wrote:
> > Hi,
> >
> > thanks for the updated disassembly.
> >
> >> 55bab0: 82 10 60 01 or %g1, 1, %g1
> >> spin_unlock(&journal->j_list_lock);
> >> 55bab4: 40 06 4b 20 call 6ee734 <_spin_unlock>
> >> 55bab8: c2 24 e0 28 st %g1, [ %l3 + 0x28 ]
> > OK, so it really seems that:
> > jinode->i_flags |= JI_COMMIT_RUNNING;
> > spin_unlock(&journal->j_list_lock);
> >
> > has been compiled to
> > ld [ %l3 + 0x28 ], %g1
> > or %g1, 1, %g1
> > call 6ee734 <_spin_unlock>
> > st %g1, [ %l3 + 0x28 ]
> >
> > Which seems like a bug in the compiler or in the way implement compiler
> > barriers in spin_unlock() on UP sparc. Or is there some sparc magic by which
> > this is correct code? Any clever sparc guy?
> >
> Looks fine to me - the store is made in the branch delay slot
> of the call, so it will occur before the first instruction at
> the call target address executes.
Thanks for explanation. I knew sparc had some surprising features like
this ;).
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2009-03-26 00:17:06

by Jiri Gaisler

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs



Jan Kara wrote:
> Hi,
>
> thanks for the updated disassembly.
>
>> 55bab0: 82 10 60 01 or %g1, 1, %g1
>> spin_unlock(&journal->j_list_lock);
>> 55bab4: 40 06 4b 20 call 6ee734 <_spin_unlock>
>> 55bab8: c2 24 e0 28 st %g1, [ %l3 + 0x28 ]
> OK, so it really seems that:
> jinode->i_flags |= JI_COMMIT_RUNNING;
> spin_unlock(&journal->j_list_lock);
>
> has been compiled to
> ld [ %l3 + 0x28 ], %g1
> or %g1, 1, %g1
> call 6ee734 <_spin_unlock>
> st %g1, [ %l3 + 0x28 ]
>
> Which seems like a bug in the compiler or in the way implement compiler
> barriers in spin_unlock() on UP sparc. Or is there some sparc magic by which
> this is correct code? Any clever sparc guy?
>
> Honza


Looks fine to me - the store is made in the branch delay slot
of the call, so it will occur before the first instruction at
the call target address executes.
Jiri.

2009-03-31 10:01:54

by Jan Kara

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

On Thu 26-03-09 01:38:32, Alexander Beregalov wrote:
> 2009/3/25 Jan Kara <[email protected]>:
> > On Wed 25-03-09 20:07:46, Alexander Beregalov wrote:
> >> 2009/3/25 Jan Kara <[email protected]>:
> >> > On Wed 25-03-09 18:29:10, Alexander Beregalov wrote:
> >> >> 2009/3/25 Jan Kara <[email protected]>:
> >> >> > On Wed 25-03-09 18:18:43, Alexander Beregalov wrote:
> >> >> >> 2009/3/25 Jan Kara <[email protected]>:
> >> >> >> >> > So, I think I need to try it on 2.6.29-rc7 again.
> >> >> >> >> ? I've looked into this. Obviously, what's happenning is that we delete
> >> >> >> >> an inode and jbd2_journal_release_jbd_inode() finds inode is just under
> >> >> >> >> writeout in transaction commit and thus it waits. But it gets never woken
> >> >> >> >> up and because it has a handle from the transaction, every one eventually
> >> >> >> >> blocks on waiting for a transaction to finish.
> >> >> >> >> ? But I don't really see how that can happen. The code is really
> >> >> >> >> straightforward and everything happens under j_list_lock... Strange.
> >> >> >> > ?BTW: Is the system SMP?
> >> >> >> No, it is UP system.
> >> >> > ?Even stranger. And do you have CONFIG_PREEMPT set?
> >> >> >
> >> >> >> The bug exists even in 2.6.29, I posted it with a new topic.
> >> >> > ?OK, I've sort-of expected this.
> >> >>
> >> >> CONFIG_PREEMPT_RCU=y
> >> >> CONFIG_PREEMPT_RCU_TRACE=y
> >> >> # CONFIG_PREEMPT_NONE is not set
> >> >> # CONFIG_PREEMPT_VOLUNTARY is not set
> >> >> CONFIG_PREEMPT=y
> >> >> CONFIG_DEBUG_PREEMPT=y
> >> >> # CONFIG_PREEMPT_TRACER is not set
> >> >>
> >> >> config is attached.
> >> > ?Thanks for the data. I still don't see how the wakeup can get lost. The
> >> > process even cannot be preempted when we are in the section protected by
> >> > j_list_lock... Can you send me a disassembly of functions
> >> > jbd2_journal_release_jbd_inode() and journal_submit_data_buffers() so that
> >> > I can see whether the compiler has not reordered something unexpectedly?
> > ?Thanks for the disassembly...
> >
> >> By default gcc inlines journal_submit_data_buffers()
> >> Here is -fno-inline version. Default version is in attach.
<snip>

I'm helpless here. I don't see how we can miss a wakeup (plus you seem to
be the only one reporting the bug). Could you please compile and test the kernel
with the attached patch? It will print to kernel log when we go to sleep
waiting for inode commit and when we send wakeups etc. When you hit the
deadlock, please send me your kernel log. It should help with debugging why do
we miss the wakeup. Thanks.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2009-03-31 10:07:30

by Alexander Beregalov

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

2009/3/31 Jan Kara <[email protected]>:
> On Thu 26-03-09 01:38:32, Alexander Beregalov wrote:
>> 2009/3/25 Jan Kara <[email protected]>:
>> > On Wed 25-03-09 20:07:46, Alexander Beregalov wrote:
>> >> 2009/3/25 Jan Kara <[email protected]>:
>> >> > On Wed 25-03-09 18:29:10, Alexander Beregalov wrote:
>> >> >> 2009/3/25 Jan Kara <[email protected]>:
>> >> >> > On Wed 25-03-09 18:18:43, Alexander Beregalov wrote:
>> >> >> >> 2009/3/25 Jan Kara <[email protected]>:
>> >> >> >> >> > So, I think I need to try it on 2.6.29-rc7 again.
>> >> >> >> >>   I've looked into this. Obviously, what's happenning is that we delete
>> >> >> >> >> an inode and jbd2_journal_release_jbd_inode() finds inode is just under
>> >> >> >> >> writeout in transaction commit and thus it waits. But it gets never woken
>> >> >> >> >> up and because it has a handle from the transaction, every one eventually
>> >> >> >> >> blocks on waiting for a transaction to finish.
>> >> >> >> >>   But I don't really see how that can happen. The code is really
>> >> >> >> >> straightforward and everything happens under j_list_lock... Strange.
>> >> >> >> >  BTW: Is the system SMP?
>> >> >> >> No, it is UP system.
>> >> >> >  Even stranger. And do you have CONFIG_PREEMPT set?
>> >> >> >
>> >> >> >> The bug exists even in 2.6.29, I posted it with a new topic.
>> >> >> >  OK, I've sort-of expected this.
>> >> >>
>> >> >> CONFIG_PREEMPT_RCU=y
>> >> >> CONFIG_PREEMPT_RCU_TRACE=y
>> >> >> # CONFIG_PREEMPT_NONE is not set
>> >> >> # CONFIG_PREEMPT_VOLUNTARY is not set
>> >> >> CONFIG_PREEMPT=y
>> >> >> CONFIG_DEBUG_PREEMPT=y
>> >> >> # CONFIG_PREEMPT_TRACER is not set
>> >> >>
>> >> >> config is attached.
>> >> >  Thanks for the data. I still don't see how the wakeup can get lost. The
>> >> > process even cannot be preempted when we are in the section protected by
>> >> > j_list_lock... Can you send me a disassembly of functions
>> >> > jbd2_journal_release_jbd_inode() and journal_submit_data_buffers() so that
>> >> > I can see whether the compiler has not reordered something unexpectedly?
>> >  Thanks for the disassembly...
>> >
>> >> By default gcc inlines journal_submit_data_buffers()
>> >> Here is -fno-inline version. Default version is in attach.
>  <snip>
>
>  I'm helpless here. I don't see how we can miss a wakeup (plus you seem to
> be the only one reporting the bug). Could you please compile and test the kernel
> with the attached patch? It will print to kernel log when we go to sleep
> waiting for inode commit and when we send wakeups etc. When you hit the
> deadlock, please send me your kernel log. It should help with debugging why do
> we miss the wakeup. Thanks.

Which patch?

2009-03-31 12:33:11

by Jan Kara

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

On Tue 31-03-09 14:07:30, Alexander Beregalov wrote:
> 2009/3/31 Jan Kara <[email protected]>:
> > On Thu 26-03-09 01:38:32, Alexander Beregalov wrote:
> >> 2009/3/25 Jan Kara <[email protected]>:
> >> > On Wed 25-03-09 20:07:46, Alexander Beregalov wrote:
> >> >> 2009/3/25 Jan Kara <[email protected]>:
> >> >> > On Wed 25-03-09 18:29:10, Alexander Beregalov wrote:
> >> >> >> 2009/3/25 Jan Kara <[email protected]>:
> >> >> >> > On Wed 25-03-09 18:18:43, Alexander Beregalov wrote:
> >> >> >> >> 2009/3/25 Jan Kara <[email protected]>:
> >> >> >> >> >> > So, I think I need to try it on 2.6.29-rc7 again.
> >> >> >> >> >> ? I've looked into this. Obviously, what's happenning is that we delete
> >> >> >> >> >> an inode and jbd2_journal_release_jbd_inode() finds inode is just under
> >> >> >> >> >> writeout in transaction commit and thus it waits. But it gets never woken
> >> >> >> >> >> up and because it has a handle from the transaction, every one eventually
> >> >> >> >> >> blocks on waiting for a transaction to finish.
> >> >> >> >> >> ? But I don't really see how that can happen. The code is really
> >> >> >> >> >> straightforward and everything happens under j_list_lock... Strange.
> >> >> >> >> > ?BTW: Is the system SMP?
> >> >> >> >> No, it is UP system.
> >> >> >> > ?Even stranger. And do you have CONFIG_PREEMPT set?
> >> >> >> >
> >> >> >> >> The bug exists even in 2.6.29, I posted it with a new topic.
> >> >> >> > ?OK, I've sort-of expected this.
> >> >> >>
> >> >> >> CONFIG_PREEMPT_RCU=y
> >> >> >> CONFIG_PREEMPT_RCU_TRACE=y
> >> >> >> # CONFIG_PREEMPT_NONE is not set
> >> >> >> # CONFIG_PREEMPT_VOLUNTARY is not set
> >> >> >> CONFIG_PREEMPT=y
> >> >> >> CONFIG_DEBUG_PREEMPT=y
> >> >> >> # CONFIG_PREEMPT_TRACER is not set
> >> >> >>
> >> >> >> config is attached.
> >> >> > ?Thanks for the data. I still don't see how the wakeup can get lost. The
> >> >> > process even cannot be preempted when we are in the section protected by
> >> >> > j_list_lock... Can you send me a disassembly of functions
> >> >> > jbd2_journal_release_jbd_inode() and journal_submit_data_buffers() so that
> >> >> > I can see whether the compiler has not reordered something unexpectedly?
> >> > ?Thanks for the disassembly...
> >> >
> >> >> By default gcc inlines journal_submit_data_buffers()
> >> >> Here is -fno-inline version. Default version is in attach.
> > ?<snip>
> >
> > ?I'm helpless here. I don't see how we can miss a wakeup (plus you seem to
> > be the only one reporting the bug). Could you please compile and test the kernel
> > with the attached patch? It will print to kernel log when we go to sleep
> > waiting for inode commit and when we send wakeups etc. When you hit the
> > deadlock, please send me your kernel log. It should help with debugging why do
> > we miss the wakeup. Thanks.
>
> Which patch?
Ups. Forgot to attach ;).

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR


Attachments:
(No filename) (2.83 kB)
0001-ext4-Debug-sleepers-in-iput.patch (1.94 kB)
Download all attachments

2009-04-02 18:51:03

by Alexander Beregalov

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

>> >  I'm helpless here. I don't see how we can miss a wakeup (plus you seem to
>> > be the only one reporting the bug). Could you please compile and test the kernel
>> > with the attached patch? It will print to kernel log when we go to sleep
>> > waiting for inode commit and when we send wakeups etc. When you hit the
>> > deadlock, please send me your kernel log. It should help with debugging why do
>> > we miss the wakeup. Thanks.
>>
>> Which patch?
>  Ups. Forgot to attach ;).

Cannot reproduce it on current 2.6.29-git. Strange.
It should already have all ext4/jbd2 patches from next-20090310,
but anyway it happened with 2.6.29-rc8 also.
I ran dbench in cycle on two indentical hosts for more than 24 hours
with no hang tasks.

I will try 2.6.29.

2009-04-04 21:09:31

by Alexander Beregalov

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

2009/4/2 Alexander Beregalov <[email protected]>:
>>> >  I'm helpless here. I don't see how we can miss a wakeup (plus you seem to
>>> > be the only one reporting the bug). Could you please compile and test the kernel
>>> > with the attached patch? It will print to kernel log when we go to sleep
>>> > waiting for inode commit and when we send wakeups etc. When you hit the
>>> > deadlock, please send me your kernel log. It should help with debugging why do
>>> > we miss the wakeup. Thanks.
>>>
>>> Which patch?
>>  Ups. Forgot to attach ;).
>
> Cannot reproduce it on current 2.6.29-git. Strange.
> It should already have all ext4/jbd2 patches from next-20090310,
> but anyway it happened with 2.6.29-rc8 also.
> I ran dbench in cycle on two indentical hosts for more than 24 hours
> with no hang tasks.
>
> I will try 2.6.29.

I cannot reproduce it with vanilla v2.6.29.
It seems the problem has gone.
Thanks Jan.

The patch output:
[133886.375874] JBD2: Waiting for ino 1062
[133886.376372] JBD2: Waking up sleeper on ino 1062
[133886.376824] JBD2: Woken on ino 1062
[134611.108451] JBD2: Waiting for ino 1102
[134611.108903] JBD2: Waking up sleeper on ino 1102
[134611.109787] JBD2: Woken on ino 1102
[134611.132912] JBD2: Waiting for ino 1074
[134611.133311] JBD2: Waking up sleeper on ino 1074
[134611.133707] JBD2: Woken on ino 1074

2009-04-06 09:20:42

by Jan Kara

[permalink] [raw]
Subject: Re: next-20090310: ext4 hangs

On Sun 05-04-09 01:09:31, Alexander Beregalov wrote:
> 2009/4/2 Alexander Beregalov <[email protected]>:
> >>> > ?I'm helpless here. I don't see how we can miss a wakeup (plus you seem to
> >>> > be the only one reporting the bug). Could you please compile and test the kernel
> >>> > with the attached patch? It will print to kernel log when we go to sleep
> >>> > waiting for inode commit and when we send wakeups etc. When you hit the
> >>> > deadlock, please send me your kernel log. It should help with debugging why do
> >>> > we miss the wakeup. Thanks.
> >>>
> >>> Which patch?
> >> ?Ups. Forgot to attach ;).
> >
> > Cannot reproduce it on current 2.6.29-git. Strange.
> > It should already have all ext4/jbd2 patches from next-20090310,
> > but anyway it happened with 2.6.29-rc8 also.
> > I ran dbench in cycle on two indentical hosts for more than 24 hours
> > with no hang tasks.
> >
> > I will try 2.6.29.
>
> I cannot reproduce it with vanilla v2.6.29.
> It seems the problem has gone.
> Thanks Jan.
Thanks for testing. I'm glad we have one mystery less ;).

> The patch output:
> [133886.375874] JBD2: Waiting for ino 1062
> [133886.376372] JBD2: Waking up sleeper on ino 1062
> [133886.376824] JBD2: Woken on ino 1062
> [134611.108451] JBD2: Waiting for ino 1102
> [134611.108903] JBD2: Waking up sleeper on ino 1102
> [134611.109787] JBD2: Woken on ino 1102
> [134611.132912] JBD2: Waiting for ino 1074
> [134611.133311] JBD2: Waking up sleeper on ino 1074
> [134611.133707] JBD2: Woken on ino 1074
Yes, this is how it should always look...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR