2022-11-09 10:41:40

by zhanchengbin

Subject: [bug report] e2fsck: The process is deadlocked

Hi Tytso,
e2fsck deadlocks when an I/O error occurs while the journal is being
replayed. The write error handler issues I/O again, and that I/O tries
to take a mutex (CACHE_MTX) that the failed write still holds.
Stack:
(gdb) bt
#0 0x0000ffffa740bc34 in ?? () from /usr/lib64/libc.so.6
#1 0x0000ffffa7412024 in pthread_mutex_lock () from /usr/lib64/libc.so.6
#2 0x0000ffffa7654e54 in mutex_lock (kind=CACHE_MTX,
data=0xaaaaf5c98f30) at unix_io.c:151
#3 unix_write_blk64 (channel=0xaaaaf5c98e60, block=2, count=4,
buf=0xaaaaf5c9d170) at unix_io.c:1092
#4 0x0000ffffa762e610 in ext2fs_flush2 (flags=0, fs=0xaaaaf5c98cc0) at
closefs.c:401
#5 ext2fs_flush2 (fs=0xaaaaf5c98cc0, flags=0) at closefs.c:279
#6 0x0000ffffa762eb14 in ext2fs_close2 (fs=fs@entry=0xaaaaf5c98cc0,
flags=flags@entry=0) at closefs.c:510
#7 0x0000ffffa762eba4 in ext2fs_close_free
(fs_ptr=fs_ptr@entry=0xffffc8cbab30) at closefs.c:472
#8 0x0000aaaadcc39bd8 in preenhalt (ctx=ctx@entry=0xaaaaf5c98460) at
util.c:365
#9 0x0000aaaadcc3bc5c in e2fsck_handle_write_error (channel=<optimized
out>, block=262152, count=<optimized out>, data=<optimized out>,
size=<optimized out>, actual=<optimized out>, error=5)
at ehandler.c:114
#10 0x0000ffffa7655044 in reuse_cache (block=262206,
cache=0xaaaaf5c98f80, data=0xaaaaf5c98f30, channel=0xaaaaf5c98e60) at
unix_io.c:583
#11 unix_write_blk64 (channel=0xaaaaf5c98e60, block=262206,
count=<optimized out>, buf=<optimized out>) at unix_io.c:1097
#12 0x0000aaaadcc3702c in ll_rw_block (rw=rw@entry=1,
op_flags=op_flags@entry=0, nr=<optimized out>, nr@entry=1,
bhp=0xffffc8cbac60, bhp@entry=0xffffc8cbac58) at journal.c:184
#13 0x0000aaaadcc375e8 in brelse (bh=<optimized out>,
bh@entry=0xaaaaf5cac4a0) at journal.c:217
#14 0x0000aaaadcc3ebe0 in do_one_pass
(journal=journal@entry=0xaaaaf5c9f590, info=info@entry=0xffffc8cbad60,
pass=pass@entry=PASS_REPLAY) at recovery.c:693
#15 0x0000aaaadcc3ee74 in jbd2_journal_recover (journal=0xaaaaf5c9f590)
at recovery.c:310
#16 0x0000aaaadcc386a8 in recover_ext3_journal (ctx=0xaaaaf5c98460) at
journal.c:1653
#17 e2fsck_run_ext3_journal (ctx=0xaaaaf5c98460) at journal.c:1706
#18 0x0000aaaadcc207e0 in main (argc=<optimized out>, argv=<optimized
out>) at unix.c:1791
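
The call chain in the trace can be reduced to a toy model. This is only
a sketch, not e2fsprogs code: toy_write() stands in for
unix_write_blk64() and toy_write_error_handler() for the path through
e2fsck_handle_write_error() and ext2fs_close_free(); both names and
demo_nested_write() are made up for illustration. It uses an
error-checking mutex (an assumption made so the demo returns instead of
hanging), which makes the nested lock attempt report EDEADLK.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the frames in the backtrace. */
static pthread_mutex_t cache_mtx;
static int nested_result;

static int toy_write(int fail);

/* Plays the role of the write error handler: closing the
 * filesystem flushes it, which means writing again. */
static void toy_write_error_handler(void)
{
    nested_result = toy_write(0);
}

/* Plays the role of the write path: takes the cache mutex and,
 * on failure, calls the handler with the mutex still held. */
static int toy_write(int fail)
{
    int rc = pthread_mutex_lock(&cache_mtx);
    if (rc != 0)
        return rc;  /* EDEADLK here; a normal mutex would block instead */
    if (fail)
        toy_write_error_handler();
    pthread_mutex_unlock(&cache_mtx);
    return 0;
}

/* Returns what the nested write saw when it tried to lock. */
int demo_nested_write(void)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&cache_mtx, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    nested_result = -1;
    toy_write(1);   /* simulate a failed write */
    pthread_mutex_destroy(&cache_mtx);
    return nested_result;
}
```

Calling demo_nested_write() returns EDEADLK. With a normal
(non-error-checking) mutex the nested pthread_mutex_lock() would
typically never return, which matches frames #0 to #2 above.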


2022-11-09 16:02:08

by Theodore Ts'o

Subject: Re: [bug report] e2fsck: The process is deadlocked

On Wed, Nov 09, 2022 at 06:40:31PM +0800, zhanchengbin wrote:
> Hi Tytso,
> The process is deadlocked, and an I/O error occurs when logs
> are replayed. Because in the I/O error handling function, I/O
> is sent again and catch the mutexlock.

What version of e2fsprogs are you using, and do you have a reliable
reproducer?

Thanks,

- Ted

2022-11-10 03:55:03

by zhanchengbin

Subject: Re: [bug report] e2fsck: The process is deadlocked

The version is 1.46.4. I wonder whether ext2fs_close_free should try to
release the mutexes that may still be held, such as CACHE_MTX,
BOUNCE_MTX and STATS_MTX. It would first have to determine whether the
filesystem being closed is the device under check: I have looked at
every place where ext2fs_close_free is called, and besides the calls at
program exit and in error paths, it is also called when the journal
device is closed.

A reliable reproducer is attached.

-zhanchengbin.

On 2022/11/9 23:43, Theodore Ts'o wrote:
> On Wed, Nov 09, 2022 at 06:40:31PM +0800, zhanchengbin wrote:
>> Hi Tytso,
>> The process is deadlocked, and an I/O error occurs when logs
>> are replayed. Because in the I/O error handling function, I/O
>> is sent again and catch the mutexlock.
>
> What version of e2fsprogs are you using, and do you have a reliable
> reproducer?
>
> Thanks,
>
> - Ted


Attachments:
test.sh (2.02 kB)