2011-12-16 11:22:01

by Toshiyuki Okajima

[permalink] [raw]
Subject: [PATCH][RFC] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_updates operations

Hi.

I found a bug by executing the following reproducer (needs tuning).
(The reproducer never ends.)

I wrote it for a confirmation of
"[PATCH 1/5] ext4: allocate delalloc blocks before changing journal mode".
Therefore, without
"[PATCH 1/5] ext4: allocate delalloc blocks before changing journal mode"
patch, the other problem (panic) which the patch describes can happen more
frequently:
-------------------------------------------------------------------------------
#!/bin/sh

date
LOOP=100000
dd if=/dev/zero of=/tmp/ext4.img bs=1k count=1000k
/sbin/mkfs.ext4 -Fq /tmp/ext4.img
mount -o loop /tmp/ext4.img /mnt
rm -f /mnt/file
echo "0" > /mnt/file
(while [ 1 ]; do dd if=/dev/zero of=/mnt/file2 bs=4k > /dev/null 2>&1; done) &
PID=$!
for ((i=0; i < LOOP; i++));
do
DATE=$(date)
echo -n "[LOOP $i] $DATE"
if ((i%2 == 0));
then
chattr +j /mnt/file
else
chattr -j /mnt/file
fi
echo "0" >> /mnt/file
done
kill -9 $PID
rm -f /mnt/file*
umount /mnt
exit 0
-------------------------------------------------------------------------------
Though I ran it, it never ended forever.
(At one of my tries to reproduce the hang, the reproducer stopped in 6
hours.)

Because jbd2_journal_lock_updates() never returns from schedule()
after prepare_to_wait().

The detailed explanation is as follows:
===============================================================================
Current implementation of jbd2_journal_stop() has a bug not to synchronize
t_updates with other operators, jbd2_journal_lock_updates(), start_this_handle()
and jbd2_journal_commit_transaction().

This bug was derived from commit: c35a56a090eacefca07afeb994029b57d8dd8025 and
commit: 8dd420466c7bfc459fa04680bd5690bfc41a4553.

---------------------------
jbd2_journal_lock_updates()
---------------------------
509 write_lock(&journal->j_state_lock);
510 ++journal->j_barrier_count;
...
513 while (1) {
...
519 spin_lock(&transaction->t_handle_lock);
520 if (!atomic_read(&transaction->t_updates)) {
521 spin_unlock(&transaction->t_handle_lock);
522 break;
523 }
524 prepare_to_wait(&journal->j_wait_updates, &wait,
525 TASK_UNINTERRUPTIBLE);
526 spin_unlock(&transaction->t_handle_lock);
527 write_unlock(&journal->j_state_lock);
528 schedule();
529 finish_wait(&journal->j_wait_updates, &wait);
530 write_lock(&journal->j_state_lock);
531 }
532 write_unlock(&journal->j_state_lock);

* The operation of t_updates is protected by write_lock(&journal->j_state_lock)
and spin_lock(&transaction->t_handle_lock).

-------------------
start_this_handle()
-------------------
160 read_lock(&journal->j_state_lock);
...
278 atomic_inc(&transaction->t_updates);
...
284 read_unlock(&journal->j_state_lock);

* The operation of t_updates is protected by read_lock(&journal->j_state_lock).

---------------------------------
jbd2_journal_commit_transaction()
---------------------------------
358 write_lock(&journal->j_state_lock);
...
367 spin_lock(&commit_transaction->t_handle_lock);
368 while (atomic_read(&commit_transaction->t_updates)) {
369 DEFINE_WAIT(wait);
370
371 prepare_to_wait(&journal->j_wait_updates, &wait,
372 TASK_UNINTERRUPTIBLE);
373 if (atomic_read(&commit_transaction->t_updates)) {
374 spin_unlock(&commit_transaction->t_handle_lock);
375 write_unlock(&journal->j_state_lock);
376 schedule();
377 write_lock(&journal->j_state_lock);
378 spin_lock(&commit_transaction->t_handle_lock);
379 }
380 finish_wait(&journal->j_wait_updates, &wait);
381 }
382 spin_unlock(&commit_transaction->t_handle_lock);
...
447 wake_up(&journal->j_wait_transaction_locked);
448 write_unlock(&journal->j_state_lock);

* The operation of t_updates is protected by write_lock(&journal->j_state_lock)
and spin_lock(&transaction->t_handle_lock).

-------------------
jbd2_journal_stop()
-------------------
...
1452 if (atomic_dec_and_test(&transaction->t_updates)) {
1453 wake_up(&journal->j_wait_updates);
1454 if (journal->j_barrier_count)
1455 wake_up(&journal->j_wait_transaction_locked);
1456 }
1457
1458 if (wait_for_commit)
1459 err = jbd2_log_wait_commit(journal, tid);
1460
1461 lock_map_release(&handle->h_lockdep_map);
1462
1463 jbd2_free_handle(handle);
1464 return err;
1465 }

* The operation of t_updates is not protected.

Therefore,
- jbd2_journal_lock_updates() sleeps if transaction->t_update is 0.
- jbd2_journal_stop() tries to wake someone up
(wake_up(&journal->j_wait_updates)) after transaction->t_updates becomes 0.
=> But jbd2_journal_lock_updates() isn't woken up and then sleeps forever,
if it has not yet slept completely.

After we run the reproducer, we can see the processes which become hang-up.

-------------------------------------------------------------------------------
crash> ps | grep UN
16645 23737 1 ffff88007a592b20 UN 0.0 105136 508 dd
16684 23727 1 ffff88007a2a9480 UN 0.0 107152 512 chattr
23722 2 1 ffff88007b496080 UN 0.0 0 0 [flush-7:0]

[1]
crash> bt 16645
PID: 16645 TASK: ffff88007a592b20 CPU: 1 COMMAND: "dd"
#0 [ffff88007b575878] __schedule at ffffffff814b9b77
#1 [ffff88007b575920] schedule at ffffffff814ba3cf
#2 [ffff88007b575930] start_this_handle at ffffffffa01acef5 [jbd2]
#3 [ffff88007b5759f0] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
#4 [ffff88007b575a30] jbd2_journal_start at ffffffffa01ad523 [jbd2]
#5 [ffff88007b575a40] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
#6 [ffff88007b575ab0] ext4_da_write_begin at ffffffffa038ddce [ext4]
#7 [ffff88007b575b50] generic_file_buffered_write at ffffffff811005ce
#8 [ffff88007b575c10] __generic_file_aio_write at ffffffff81103418
#9 [ffff88007b575cd0] generic_file_aio_write at ffffffff811036b6
#10 [ffff88007b575d50] ext4_file_write at ffffffffa0386ee9 [ext4]
#11 [ffff88007b575de0] do_sync_write at ffffffff8115de52
#12 [ffff88007b575f00] vfs_write at ffffffff8115e3e8
#13 [ffff88007b575f30] sys_write at ffffffff8115e5b1
#14 [ffff88007b575f80] system_call_fastpath at ffffffff814c3f42
RIP: 00000034930d4230 RSP: 00007fffd8176980 RFLAGS: 00000206
RAX: 0000000000000001 RBX: ffffffff814c3f42 RCX: 0000000000000010
RDX: 0000000000001000 RSI: 0000000001ac6000 RDI: 0000000000000001
RBP: 0000000001ac6000 R8: 0000000000000003 R9: 0000000000040000
R10: 0000000000003003 R11: 0000000000000246 R12: 0000000001ac5fff
R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000246
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b

[2]
crash> bt 16684
PID: 16684 TASK: ffff88007a2a9480 CPU: 1 COMMAND: "chattr"
#0 [ffff88007af87ca8] __schedule at ffffffff814b9b77
#1 [ffff88007af87d50] schedule at ffffffff814ba3cf
#2 [ffff88007af87d60] jbd2_journal_lock_updates at ffffffffa01acca6 [jbd2]
#3 [ffff88007af87de0] ext4_change_inode_journal_flag at ffffffffa038faf1 [ext4]
#4 [ffff88007af87e10] ext4_ioctl at ffffffffa0392f48 [ext4]
#5 [ffff88007af87ea0] do_vfs_ioctl at ffffffff8116fb8a
#6 [ffff88007af87f30] sys_ioctl at ffffffff811700d1
#7 [ffff88007af87f80] system_call_fastpath at ffffffff814c3f42
RIP: 00000034930d95f7 RSP: 00007fff8830e0d8 RFLAGS: 00010213
RAX: 0000000000000010 RBX: ffffffff814c3f42 RCX: 00000034930d95f7
RDX: 00007fff8830e17c RSI: 0000000040086602 RDI: 0000000000000003
RBP: 0000000000084000 R8: 0000000000000000 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff8830f8db
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000003
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b

[3]
crash> bt 23722
PID: 23722 TASK: ffff88007b496080 CPU: 1 COMMAND: "flush-7:0"
#0 [ffff88007b4f7840] __schedule at ffffffff814b9b77
#1 [ffff88007b4f78e8] schedule at ffffffff814ba3cf
#2 [ffff88007b4f78f8] start_this_handle at ffffffffa01acef5 [jbd2]
#3 [ffff88007b4f79b8] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
#4 [ffff88007b4f79f8] jbd2_journal_start at ffffffffa01ad523 [jbd2]
#5 [ffff88007b4f7a08] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
#6 [ffff88007b4f7a78] ext4_da_writepages at ffffffffa0390b6d [ext4]
#7 [ffff88007b4f7bc8] do_writepages at ffffffff8110c261
#8 [ffff88007b4f7bd8] writeback_single_inode at ffffffff81185830
#9 [ffff88007b4f7c38] writeback_sb_inodes at ffffffff81185e21
#10 [ffff88007b4f7ce8] __writeback_inodes_wb at ffffffff81185f7e
#11 [ffff88007b4f7d38] wb_writeback at ffffffff81186223
#12 [ffff88007b4f7db8] wb_do_writeback at ffffffff811864f9
#13 [ffff88007b4f7e58] bdi_writeback_thread at ffffffff811865fa
#14 [ffff88007b4f7ee8] kthread at ffffffff810828c6
#15 [ffff88007b4f7f48] kernel_thread_helper at ffffffff814c60b4

Step to reproduce a hang-up(figure):
=======================================+=======================================
( "chattr -j" process PID: 16684 ) | ( a certain process )
t ======================================+=======================================
i |
m +---------------------------------------
e |jbd2_journal_start()
| +---------------------------------------
| | . . .
| | t_updates++ // t_updates = 1
| | . . .
| +---------------------------------------
| ...
|---------------------------------------+---------------------------------------
|jbd2_journal_lock_updates() |jbd2_journal_stop()
|---------------------------------------+---------------------------------------
| write_lock(&journal->j_state_lock) | .
| ++journal->j_barrier_count | .
| spin_lock(&tran->t_handle_lock) | .
| atomic_read(&tran->t_updates) //not 0 |
| | atomic_dec_and_test(&tran->t_updates)
| | // t_updates = 0
| | wake_up(&journal->j_wait_updates)
| prepare_to_wait() | // no process is woken up.
| spin_unlock(&tran->t_handle_lock) + wake_up(&journal->j_wait_tran_locked)
| write_unlock(&journal->j_state_lock) | // no process is woken up.
| schedule() // never return +---------------------------------------
| | ( "dd" PID: 16645 or "flush-7:0" )
| +=======================================
| |start_this_handle()
| +---------------------------------------
| | . . .
| | read_lock(&journal->j_state_lock)
| | if (journal->j_barrier_count)
| | read_unlock(&journal->j_state_lock)
| | wait_event(journal->j_wait_tran_locked)
v | // never return

Therefore, jbd2_journal_stop() needs spin_lock(&tran->t_handle_lock) or
read_lock(&journal->j_state_lock) to synchronize trans->t_updates with other
operators.
===============================================================================

Signed-off-by: Toshiyuki Okajima <[email protected]>
---
fs/jbd2/transaction.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index a0e41a4..d48f395 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1448,12 +1448,14 @@ int jbd2_journal_stop(handle_t *handle)
* once we do this, we must not dereference transaction
* pointer again.
*/
+ read_lock(&journal->j_state_lock);
tid = transaction->t_tid;
if (atomic_dec_and_test(&transaction->t_updates)) {
wake_up(&journal->j_wait_updates);
if (journal->j_barrier_count)
wake_up(&journal->j_wait_transaction_locked);
}
+ read_unlock(&journal->j_state_lock);

if (wait_for_commit)
err = jbd2_log_wait_commit(journal, tid);
--
1.5.5.6


2011-12-16 12:22:16

by Yongqiang Yang

[permalink] [raw]
Subject: Re: [PATCH][RFC] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_updates operations

On Fri, Dec 16, 2011 at 7:19 PM, Toshiyuki Okajima
<[email protected]> wrote:
> Hi.
>
> I found a bug by executing the following reproducer (needs tuning).
> (The reproducer never ends.)
>
> I wrote it for a confirmation of
> ?"[PATCH 1/5] ext4: allocate delalloc blocks before changing journal mode".
> Therefore, without
> ?"[PATCH 1/5] ext4: allocate delalloc blocks before changing journal mode"
> patch, the other problem (panic) which the patch describes can happen more
> frequently:
> -------------------------------------------------------------------------------
> #!/bin/sh
>
> date
> LOOP=100000
> dd if=/dev/zero of=/tmp/ext4.img bs=1k count=1000k
> /sbin/mkfs.ext4 -Fq /tmp/ext4.img
> mount -o loop /tmp/ext4.img /mnt
> rm -f /mnt/file
> echo "0" > /mnt/file
> (while [ 1 ]; do dd if=/dev/zero of=/mnt/file2 bs=4k > /dev/null 2>&1; done) &
> PID=$!
> for ((i=0; i < LOOP; i++));
> do
> ? ? ? ?DATE=$(date)
> ? ? ? ?echo -n "[LOOP $i] $DATE"
> ? ? ? ?if ((i%2 == 0));
> ? ? ? ?then
> ? ? ? ? ? ? ? ?chattr +j /mnt/file
> ? ? ? ?else
> ? ? ? ? ? ? ? ?chattr -j /mnt/file
> ? ? ? ?fi
> ? ? ? ?echo "0" >> /mnt/file
> done
> kill -9 $PID
> rm -f /mnt/file*
> umount /mnt
> exit 0
> -------------------------------------------------------------------------------
> Though I ran it, it never ended forever.
> (At one of my tries to reproduce the hang, the reproducer stopped in 6
> hours.)
>
> Because jbd2_journal_lock_updates() never returns from schedule()
> after prepare_to_wait().
>
> The detailed explanation is as follows:
> ===============================================================================
> Current implementation of jbd2_journal_stop() has a bug not to synchronize
> t_updates with other operators, jbd2_journal_lock_updates(), start_this_handle()
IMHO, the description is not very exactly, the problem is not due to
desync of t_updates, t_updates is a var of atomic_t. The reason is
that we have a critical code section here:[operation on t_updates,
wait or wakeup], this section should be synchronized.

> and jbd2_journal_commit_transaction().
>
> This bug was derived from commit: c35a56a090eacefca07afeb994029b57d8dd8025 and
> ?commit: 8dd420466c7bfc459fa04680bd5690bfc41a4553.
>
> ---------------------------
> jbd2_journal_lock_updates()
> ---------------------------
> ?509 ? ? ? ? write_lock(&journal->j_state_lock);
> ?510 ? ? ? ? ++journal->j_barrier_count;
> ...
> ?513 ? ? ? ? while (1) {
> ...
> ?519 ? ? ? ? ? ? ? ? spin_lock(&transaction->t_handle_lock);
> ?520 ? ? ? ? ? ? ? ? if (!atomic_read(&transaction->t_updates)) {
> ?521 ? ? ? ? ? ? ? ? ? ? ? ? spin_unlock(&transaction->t_handle_lock);
> ?522 ? ? ? ? ? ? ? ? ? ? ? ? break;
> ?523 ? ? ? ? ? ? ? ? }
> ?524 ? ? ? ? ? ? ? ? prepare_to_wait(&journal->j_wait_updates, &wait,
> ?525 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? TASK_UNINTERRUPTIBLE);
> ?526 ? ? ? ? ? ? ? ? spin_unlock(&transaction->t_handle_lock);
It seems that t_handle_lock is used to protect the critical section
here. I think t_handle is not needed here at all. t_handle_lock
should be used to protect transaction and j_state_lock should be used
to protect journal. prepare_to_wait operates on journal, so we should
remove t_handle_lock here.

> ?527 ? ? ? ? ? ? ? ? write_unlock(&journal->j_state_lock);
> ?528 ? ? ? ? ? ? ? ? schedule();
> ?529 ? ? ? ? ? ? ? ? finish_wait(&journal->j_wait_updates, &wait);
> ?530 ? ? ? ? ? ? ? ? write_lock(&journal->j_state_lock);
> ?531 ? ? ? ? }
> ?532 ? ? ? ? write_unlock(&journal->j_state_lock);
>
> * The operation of t_updates is protected by write_lock(&journal->j_state_lock)
> ?and spin_lock(&transaction->t_handle_lock).
>
> -------------------
> start_this_handle()
> -------------------
> ?160 ? ? ? ? read_lock(&journal->j_state_lock);
> ...
> ?278 ? ? ? ? atomic_inc(&transaction->t_updates);
> ...
> ?284 ? ? ? ? read_unlock(&journal->j_state_lock);
>
> * The operation of t_updates is protected by read_lock(&journal->j_state_lock).
>
> ---------------------------------
> jbd2_journal_commit_transaction()
> ---------------------------------
> ?358 ? ? ? ? write_lock(&journal->j_state_lock);
> ...
> ?367 ? ? ? ? spin_lock(&commit_transaction->t_handle_lock);
> ?368 ? ? ? ? while (atomic_read(&commit_transaction->t_updates)) {
> ?369 ? ? ? ? ? ? ? ? DEFINE_WAIT(wait);
> ?370
> ?371 ? ? ? ? ? ? ? ? prepare_to_wait(&journal->j_wait_updates, &wait,
> ?372 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? TASK_UNINTERRUPTIBLE);
> ?373 ? ? ? ? ? ? ? ? if (atomic_read(&commit_transaction->t_updates)) {
> ?374 ? ? ? ? ? ? ? ? ? ? ? ? spin_unlock(&commit_transaction->t_handle_lock);
Also, t_handle_lock is used to protect the critical section. I think
t_handle_lock is not needed here at all.

> ?375 ? ? ? ? ? ? ? ? ? ? ? ? write_unlock(&journal->j_state_lock);
> ?376 ? ? ? ? ? ? ? ? ? ? ? ? schedule();
> ?377 ? ? ? ? ? ? ? ? ? ? ? ? write_lock(&journal->j_state_lock);
> ?378 ? ? ? ? ? ? ? ? ? ? ? ? spin_lock(&commit_transaction->t_handle_lock);
> ?379 ? ? ? ? ? ? ? ? }
> ?380 ? ? ? ? ? ? ? ? finish_wait(&journal->j_wait_updates, &wait);
> ?381 ? ? ? ? }
> ?382 ? ? ? ? spin_unlock(&commit_transaction->t_handle_lock);
> ...
> ?447 ? ? ? ? wake_up(&journal->j_wait_transaction_locked);
> ?448 ? ? ? ? write_unlock(&journal->j_state_lock);
>
> * The operation of t_updates is protected by write_lock(&journal->j_state_lock)
> ?and spin_lock(&transaction->t_handle_lock).
>
> -------------------
> jbd2_journal_stop()
> -------------------
> ...
> 1452 ? ? ? ? if (atomic_dec_and_test(&transaction->t_updates)) {
> 1453 ? ? ? ? ? ? ? ? wake_up(&journal->j_wait_updates);
> 1454 ? ? ? ? ? ? ? ? if (journal->j_barrier_count)
> 1455 ? ? ? ? ? ? ? ? ? ? ? ? wake_up(&journal->j_wait_transaction_locked);
> 1456 ? ? ? ? }
> 1457
> 1458 ? ? ? ? if (wait_for_commit)
> 1459 ? ? ? ? ? ? ? ? err = jbd2_log_wait_commit(journal, tid);
> 1460
> 1461 ? ? ? ? lock_map_release(&handle->h_lockdep_map);
> 1462
> 1463 ? ? ? ? jbd2_free_handle(handle);
> 1464 ? ? ? ? return err;
> 1465 }
>
> * The operation of t_updates is not protected.
>
> Therefore,
> - jbd2_journal_lock_updates() sleeps if transaction->t_update is 0.
> - jbd2_journal_stop() tries to wake someone up
> (wake_up(&journal->j_wait_updates)) after transaction->t_updates becomes 0.
> => But jbd2_journal_lock_updates() isn't woken up and then sleeps forever,
> ? if it has not yet slept completely.
>
> After we run the reproducer, we can see the processes which become hang-up.
>
> -------------------------------------------------------------------------------
> crash> ps | grep UN
> ?16645 ?23737 ? 1 ?ffff88007a592b20 ?UN ? 0.0 ?105136 ? ?508 ?dd
> ?16684 ?23727 ? 1 ?ffff88007a2a9480 ?UN ? 0.0 ?107152 ? ?512 ?chattr
> ?23722 ? ? ?2 ? 1 ?ffff88007b496080 ?UN ? 0.0 ? ? ? 0 ? ? ?0 ?[flush-7:0]
>
> [1]
> crash> bt 16645
> PID: 16645 ?TASK: ffff88007a592b20 ?CPU: 1 ? COMMAND: "dd"
> ?#0 [ffff88007b575878] __schedule at ffffffff814b9b77
> ?#1 [ffff88007b575920] schedule at ffffffff814ba3cf
> ?#2 [ffff88007b575930] start_this_handle at ffffffffa01acef5 [jbd2]
> ?#3 [ffff88007b5759f0] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
> ?#4 [ffff88007b575a30] jbd2_journal_start at ffffffffa01ad523 [jbd2]
> ?#5 [ffff88007b575a40] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
> ?#6 [ffff88007b575ab0] ext4_da_write_begin at ffffffffa038ddce [ext4]
> ?#7 [ffff88007b575b50] generic_file_buffered_write at ffffffff811005ce
> ?#8 [ffff88007b575c10] __generic_file_aio_write at ffffffff81103418
> ?#9 [ffff88007b575cd0] generic_file_aio_write at ffffffff811036b6
> #10 [ffff88007b575d50] ext4_file_write at ffffffffa0386ee9 [ext4]
> #11 [ffff88007b575de0] do_sync_write at ffffffff8115de52
> #12 [ffff88007b575f00] vfs_write at ffffffff8115e3e8
> #13 [ffff88007b575f30] sys_write at ffffffff8115e5b1
> #14 [ffff88007b575f80] system_call_fastpath at ffffffff814c3f42
> ? ?RIP: 00000034930d4230 ?RSP: 00007fffd8176980 ?RFLAGS: 00000206
> ? ?RAX: 0000000000000001 ?RBX: ffffffff814c3f42 ?RCX: 0000000000000010
> ? ?RDX: 0000000000001000 ?RSI: 0000000001ac6000 ?RDI: 0000000000000001
> ? ?RBP: 0000000001ac6000 ? R8: 0000000000000003 ? R9: 0000000000040000
> ? ?R10: 0000000000003003 ?R11: 0000000000000246 ?R12: 0000000001ac5fff
> ? ?R13: 0000000000000000 ?R14: 0000000000001000 ?R15: 0000000000000246
> ? ?ORIG_RAX: 0000000000000001 ?CS: 0033 ?SS: 002b
>
> [2]
> crash> bt 16684
> PID: 16684 ?TASK: ffff88007a2a9480 ?CPU: 1 ? COMMAND: "chattr"
> ?#0 [ffff88007af87ca8] __schedule at ffffffff814b9b77
> ?#1 [ffff88007af87d50] schedule at ffffffff814ba3cf
> ?#2 [ffff88007af87d60] jbd2_journal_lock_updates at ffffffffa01acca6 [jbd2]
> ?#3 [ffff88007af87de0] ext4_change_inode_journal_flag at ffffffffa038faf1 [ext4]
> ?#4 [ffff88007af87e10] ext4_ioctl at ffffffffa0392f48 [ext4]
> ?#5 [ffff88007af87ea0] do_vfs_ioctl at ffffffff8116fb8a
> ?#6 [ffff88007af87f30] sys_ioctl at ffffffff811700d1
> ?#7 [ffff88007af87f80] system_call_fastpath at ffffffff814c3f42
> ? ?RIP: 00000034930d95f7 ?RSP: 00007fff8830e0d8 ?RFLAGS: 00010213
> ? ?RAX: 0000000000000010 ?RBX: ffffffff814c3f42 ?RCX: 00000034930d95f7
> ? ?RDX: 00007fff8830e17c ?RSI: 0000000040086602 ?RDI: 0000000000000003
> ? ?RBP: 0000000000084000 ? R8: 0000000000000000 ? R9: 0000000000000001
> ? ?R10: 0000000000000000 ?R11: 0000000000000246 ?R12: 00007fff8830f8db
> ? ?R13: 0000000000000001 ?R14: 0000000000000000 ?R15: 0000000000000003
> ? ?ORIG_RAX: 0000000000000010 ?CS: 0033 ?SS: 002b
>
> [3]
> crash> bt 23722
> PID: 23722 ?TASK: ffff88007b496080 ?CPU: 1 ? COMMAND: "flush-7:0"
> ?#0 [ffff88007b4f7840] __schedule at ffffffff814b9b77
> ?#1 [ffff88007b4f78e8] schedule at ffffffff814ba3cf
> ?#2 [ffff88007b4f78f8] start_this_handle at ffffffffa01acef5 [jbd2]
> ?#3 [ffff88007b4f79b8] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
> ?#4 [ffff88007b4f79f8] jbd2_journal_start at ffffffffa01ad523 [jbd2]
> ?#5 [ffff88007b4f7a08] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
> ?#6 [ffff88007b4f7a78] ext4_da_writepages at ffffffffa0390b6d [ext4]
> ?#7 [ffff88007b4f7bc8] do_writepages at ffffffff8110c261
> ?#8 [ffff88007b4f7bd8] writeback_single_inode at ffffffff81185830
> ?#9 [ffff88007b4f7c38] writeback_sb_inodes at ffffffff81185e21
> #10 [ffff88007b4f7ce8] __writeback_inodes_wb at ffffffff81185f7e
> #11 [ffff88007b4f7d38] wb_writeback at ffffffff81186223
> #12 [ffff88007b4f7db8] wb_do_writeback at ffffffff811864f9
> #13 [ffff88007b4f7e58] bdi_writeback_thread at ffffffff811865fa
> #14 [ffff88007b4f7ee8] kthread at ffffffff810828c6
> #15 [ffff88007b4f7f48] kernel_thread_helper at ffffffff814c60b4
>
> Step to reproduce a hang-up(figure):
> ?=======================================+=======================================
> ?( "chattr -j" process PID: 16684 ) ? ? | ?( a certain process )
> t ======================================+=======================================
> i ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? |
> m ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? +---------------------------------------
> e ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? |jbd2_journal_start()
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? +---------------------------------------
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? | . . .
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? | t_updates++ // t_updates = 1
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? | . . .
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? +---------------------------------------
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?...
> |---------------------------------------+---------------------------------------
> |jbd2_journal_lock_updates() ? ? ? ? ? ?|jbd2_journal_stop()
> |---------------------------------------+---------------------------------------
> | write_lock(&journal->j_state_lock) ? ?| ? ?.
> | ++journal->j_barrier_count ? ? ? ? ? ?| ? ?.
> | spin_lock(&tran->t_handle_lock) ? ? ? | ? ?.
> | atomic_read(&tran->t_updates) //not 0 |
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? | atomic_dec_and_test(&tran->t_updates)
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? | ? ?// t_updates = 0
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? | wake_up(&journal->j_wait_updates)
> | prepare_to_wait() ? ? ? ? ? ? ? ? ? ? | ? ?// no process is woken up.
> | spin_unlock(&tran->t_handle_lock) ? ? + wake_up(&journal->j_wait_tran_locked)
> | write_unlock(&journal->j_state_lock) ?| ? ?// no process is woken up.
> | schedule() // never return ? ? ? ? ? ?+---------------------------------------
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? | ?( "dd" PID: 16645 or "flush-7:0" )
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? +=======================================
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? |start_this_handle()
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? +---------------------------------------
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? | . . .
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? | read_lock(&journal->j_state_lock)
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? | if (journal->j_barrier_count)
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? | ?read_unlock(&journal->j_state_lock)
> | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? | ?wait_event(journal->j_wait_tran_locked)
> v ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? | ?// never return
>
> Therefore, jbd2_journal_stop() needs spin_lock(&tran->t_handle_lock) or
> read_lock(&journal->j_state_lock) to synchronize trans->t_updates with other
> operators.
> ===============================================================================
>
> Signed-off-by: Toshiyuki Okajima <[email protected]>
> ---
> ?fs/jbd2/transaction.c | ? ?2 ++
> ?1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index a0e41a4..d48f395 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -1448,12 +1448,14 @@ int jbd2_journal_stop(handle_t *handle)
> ? ? ? ? * once we do this, we must not dereference transaction
> ? ? ? ? * pointer again.
> ? ? ? ? */
> + ? ? ? read_lock(&journal->j_state_lock);
> ? ? ? ?tid = transaction->t_tid;
> ? ? ? ?if (atomic_dec_and_test(&transaction->t_updates)) {
> ? ? ? ? ? ? ? ?wake_up(&journal->j_wait_updates);
> ? ? ? ? ? ? ? ?if (journal->j_barrier_count)
oops, j_barrier_count is not protected too in original code. The
problem does not exists in patched code any more:-)


Yongqiang.
> ? ? ? ? ? ? ? ? ? ? ? ?wake_up(&journal->j_wait_transaction_locked);
> ? ? ? ?}
> + ? ? ? read_unlock(&journal->j_state_lock);
>
> ? ? ? ?if (wait_for_commit)
> ? ? ? ? ? ? ? ?err = jbd2_log_wait_commit(journal, tid);
> --
> 1.5.5.6
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html



--
Best Wishes
Yongqiang Yang

2011-12-20 10:40:40

by Toshiyuki Okajima

[permalink] [raw]
Subject: Re: [PATCH][RFC] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_updates operations

Hi, Yongqiang.

Thanks for reviewing.

(2011/12/16 21:22), Yongqiang Yang wrote:
> On Fri, Dec 16, 2011 at 7:19 PM, Toshiyuki Okajima
> <[email protected]> wrote:
>> Hi.
>>
>> I found a bug by executing the following reproducer (needs tuning).
>> (The reproducer never ends.)
>>
>> I wrote it for a confirmation of
>> "[PATCH 1/5] ext4: allocate delalloc blocks before changing journal mode".
>> Therefore, without
>> "[PATCH 1/5] ext4: allocate delalloc blocks before changing journal mode"
>> patch, the other problem (panic) which the patch describes can happen more
>> frequently:
>> -------------------------------------------------------------------------------
>> #!/bin/sh
>>
>> date
>> LOOP=100000
>> dd if=/dev/zero of=/tmp/ext4.img bs=1k count=1000k
>> /sbin/mkfs.ext4 -Fq /tmp/ext4.img
>> mount -o loop /tmp/ext4.img /mnt
>> rm -f /mnt/file
>> echo "0" > /mnt/file
>> (while [ 1 ]; do dd if=/dev/zero of=/mnt/file2 bs=4k > /dev/null 2>&1; done) &
>> PID=$!
>> for ((i=0; i < LOOP; i++));
>> do
>> DATE=$(date)
>> echo -n "[LOOP $i] $DATE"
>> if ((i%2 == 0));
>> then
>> chattr +j /mnt/file
>> else
>> chattr -j /mnt/file
>> fi
>> echo "0" >> /mnt/file
>> done
>> kill -9 $PID
>> rm -f /mnt/file*
>> umount /mnt
>> exit 0
>> -------------------------------------------------------------------------------
>> Though I ran it, it never ended forever.
>> (At one of my tries to reproduce the hang, the reproducer stopped in 6
>> hours.)
>>
>> Because jbd2_journal_lock_updates() never returns from schedule()
>> after prepare_to_wait().
>>
>> The detailed explanation is as follows:
>> ===============================================================================
>> Current implementation of jbd2_journal_stop() has a bug not to synchronize
>> t_updates with other operators, jbd2_journal_lock_updates(), start_this_handle()

> IMHO, the description is not very exactly, the problem is not due to
> desync of t_updates, t_updates is a var of atomic_t. The reason is
> that we have a critical code section here:[operation on t_updates,
> wait or wakeup], this section should be synchronized.
I understood. I will rewrite the description much better than this by
applying your comment.

>
>> and jbd2_journal_commit_transaction().
>>
>> This bug was derived from commit: c35a56a090eacefca07afeb994029b57d8dd8025 and
>> commit: 8dd420466c7bfc459fa04680bd5690bfc41a4553.
>>
>> ---------------------------
>> jbd2_journal_lock_updates()
>> ---------------------------
>> 509 write_lock(&journal->j_state_lock);
>> 510 ++journal->j_barrier_count;
>> ...
>> 513 while (1) {
>> ...
>> 519 spin_lock(&transaction->t_handle_lock);
>> 520 if (!atomic_read(&transaction->t_updates)) {
>> 521 spin_unlock(&transaction->t_handle_lock);
>> 522 break;
>> 523 }
>> 524 prepare_to_wait(&journal->j_wait_updates, &wait,
>> 525 TASK_UNINTERRUPTIBLE);
>> 526 spin_unlock(&transaction->t_handle_lock);

> It seems that t_handle_lock is used to protect the critical section
> here. I think t_handle is not needed here at all. t_handle_lock
> should be used to protect transaction and j_state_lock should be used
> to protect journal. prepare_to_wait operates on journal, so we should
> remove t_handle_lock here.
I think so, too. Then, I will write another patch which deletes the statements
of t_handle_lock.

>
>> 527 write_unlock(&journal->j_state_lock);
>> 528 schedule();
>> 529 finish_wait(&journal->j_wait_updates, &wait);
>> 530 write_lock(&journal->j_state_lock);
>> 531 }
>> 532 write_unlock(&journal->j_state_lock);
>>
>> * The operation of t_updates is protected by write_lock(&journal->j_state_lock)
>> and spin_lock(&transaction->t_handle_lock).
>>
>> -------------------
>> start_this_handle()
>> -------------------
>> 160 read_lock(&journal->j_state_lock);
>> ...
>> 278 atomic_inc(&transaction->t_updates);
>> ...
>> 284 read_unlock(&journal->j_state_lock);
>>
>> * The operation of t_updates is protected by read_lock(&journal->j_state_lock).
>>
>> ---------------------------------
>> jbd2_journal_commit_transaction()
>> ---------------------------------
>> 358 write_lock(&journal->j_state_lock);
>> ...
>> 367 spin_lock(&commit_transaction->t_handle_lock);
>> 368 while (atomic_read(&commit_transaction->t_updates)) {
>> 369 DEFINE_WAIT(wait);
>> 370
>> 371 prepare_to_wait(&journal->j_wait_updates, &wait,
>> 372 TASK_UNINTERRUPTIBLE);
>> 373 if (atomic_read(&commit_transaction->t_updates)) {
>> 374 spin_unlock(&commit_transaction->t_handle_lock);

> Also, t_handle_lock is used to protect the critical section. I think
> t_handle_lock is not needed here at all.
Yes.

>
>> 375 write_unlock(&journal->j_state_lock);
>> 376 schedule();
>> 377 write_lock(&journal->j_state_lock);
>> 378 spin_lock(&commit_transaction->t_handle_lock);
>> 379 }
>> 380 finish_wait(&journal->j_wait_updates, &wait);
>> 381 }
>> 382 spin_unlock(&commit_transaction->t_handle_lock);
>> ...
>> 447 wake_up(&journal->j_wait_transaction_locked);
>> 448 write_unlock(&journal->j_state_lock);
>>
>> * The operation of t_updates is protected by write_lock(&journal->j_state_lock)
>> and spin_lock(&transaction->t_handle_lock).
>>
>> -------------------
>> jbd2_journal_stop()
>> -------------------
>> ...
>> 1452 if (atomic_dec_and_test(&transaction->t_updates)) {
>> 1453 wake_up(&journal->j_wait_updates);
>> 1454 if (journal->j_barrier_count)
>> 1455 wake_up(&journal->j_wait_transaction_locked);
>> 1456 }
>> 1457
>> 1458 if (wait_for_commit)
>> 1459 err = jbd2_log_wait_commit(journal, tid);
>> 1460
>> 1461 lock_map_release(&handle->h_lockdep_map);
>> 1462
>> 1463 jbd2_free_handle(handle);
>> 1464 return err;
>> 1465 }
>>
>> * The operation of t_updates is not protected.
>>
>> Therefore,
>> - jbd2_journal_lock_updates() sleeps if transaction->t_update is 0.
>> - jbd2_journal_stop() tries to wake someone up
>> (wake_up(&journal->j_wait_updates)) after transaction->t_updates becomes 0.
>> => But jbd2_journal_lock_updates() isn't woken up and then sleeps forever,
>> if it has not yet slept completely.
>>
>> After we run the reproducer, we can see the processes which become hang-up.
>>
>> -------------------------------------------------------------------------------
>> crash> ps | grep UN
>> 16645 23737 1 ffff88007a592b20 UN 0.0 105136 508 dd
>> 16684 23727 1 ffff88007a2a9480 UN 0.0 107152 512 chattr
>> 23722 2 1 ffff88007b496080 UN 0.0 0 0 [flush-7:0]
>>
>> [1]
>> crash> bt 16645
>> PID: 16645 TASK: ffff88007a592b20 CPU: 1 COMMAND: "dd"
>> #0 [ffff88007b575878] __schedule at ffffffff814b9b77
>> #1 [ffff88007b575920] schedule at ffffffff814ba3cf
>> #2 [ffff88007b575930] start_this_handle at ffffffffa01acef5 [jbd2]
>> #3 [ffff88007b5759f0] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
>> #4 [ffff88007b575a30] jbd2_journal_start at ffffffffa01ad523 [jbd2]
>> #5 [ffff88007b575a40] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
>> #6 [ffff88007b575ab0] ext4_da_write_begin at ffffffffa038ddce [ext4]
>> #7 [ffff88007b575b50] generic_file_buffered_write at ffffffff811005ce
>> #8 [ffff88007b575c10] __generic_file_aio_write at ffffffff81103418
>> #9 [ffff88007b575cd0] generic_file_aio_write at ffffffff811036b6
>> #10 [ffff88007b575d50] ext4_file_write at ffffffffa0386ee9 [ext4]
>> #11 [ffff88007b575de0] do_sync_write at ffffffff8115de52
>> #12 [ffff88007b575f00] vfs_write at ffffffff8115e3e8
>> #13 [ffff88007b575f30] sys_write at ffffffff8115e5b1
>> #14 [ffff88007b575f80] system_call_fastpath at ffffffff814c3f42
>> RIP: 00000034930d4230 RSP: 00007fffd8176980 RFLAGS: 00000206
>> RAX: 0000000000000001 RBX: ffffffff814c3f42 RCX: 0000000000000010
>> RDX: 0000000000001000 RSI: 0000000001ac6000 RDI: 0000000000000001
>> RBP: 0000000001ac6000 R8: 0000000000000003 R9: 0000000000040000
>> R10: 0000000000003003 R11: 0000000000000246 R12: 0000000001ac5fff
>> R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000246
>> ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
>>
>> [2]
>> crash> bt 16684
>> PID: 16684 TASK: ffff88007a2a9480 CPU: 1 COMMAND: "chattr"
>> #0 [ffff88007af87ca8] __schedule at ffffffff814b9b77
>> #1 [ffff88007af87d50] schedule at ffffffff814ba3cf
>> #2 [ffff88007af87d60] jbd2_journal_lock_updates at ffffffffa01acca6 [jbd2]
>> #3 [ffff88007af87de0] ext4_change_inode_journal_flag at ffffffffa038faf1 [ext4]
>> #4 [ffff88007af87e10] ext4_ioctl at ffffffffa0392f48 [ext4]
>> #5 [ffff88007af87ea0] do_vfs_ioctl at ffffffff8116fb8a
>> #6 [ffff88007af87f30] sys_ioctl at ffffffff811700d1
>> #7 [ffff88007af87f80] system_call_fastpath at ffffffff814c3f42
>> RIP: 00000034930d95f7 RSP: 00007fff8830e0d8 RFLAGS: 00010213
>> RAX: 0000000000000010 RBX: ffffffff814c3f42 RCX: 00000034930d95f7
>> RDX: 00007fff8830e17c RSI: 0000000040086602 RDI: 0000000000000003
>> RBP: 0000000000084000 R8: 0000000000000000 R9: 0000000000000001
>> R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff8830f8db
>> R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000003
>> ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
>>
>> [3]
>> crash> bt 23722
>> PID: 23722 TASK: ffff88007b496080 CPU: 1 COMMAND: "flush-7:0"
>> #0 [ffff88007b4f7840] __schedule at ffffffff814b9b77
>> #1 [ffff88007b4f78e8] schedule at ffffffff814ba3cf
>> #2 [ffff88007b4f78f8] start_this_handle at ffffffffa01acef5 [jbd2]
>> #3 [ffff88007b4f79b8] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
>> #4 [ffff88007b4f79f8] jbd2_journal_start at ffffffffa01ad523 [jbd2]
>> #5 [ffff88007b4f7a08] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
>> #6 [ffff88007b4f7a78] ext4_da_writepages at ffffffffa0390b6d [ext4]
>> #7 [ffff88007b4f7bc8] do_writepages at ffffffff8110c261
>> #8 [ffff88007b4f7bd8] writeback_single_inode at ffffffff81185830
>> #9 [ffff88007b4f7c38] writeback_sb_inodes at ffffffff81185e21
>> #10 [ffff88007b4f7ce8] __writeback_inodes_wb at ffffffff81185f7e
>> #11 [ffff88007b4f7d38] wb_writeback at ffffffff81186223
>> #12 [ffff88007b4f7db8] wb_do_writeback at ffffffff811864f9
>> #13 [ffff88007b4f7e58] bdi_writeback_thread at ffffffff811865fa
>> #14 [ffff88007b4f7ee8] kthread at ffffffff810828c6
>> #15 [ffff88007b4f7f48] kernel_thread_helper at ffffffff814c60b4
>>
>> Step to reproduce a hang-up(figure):
>> =======================================+=======================================
>> ( "chattr -j" process PID: 16684 ) | ( a certain process )
>> t ======================================+=======================================
>> i |
>> m +---------------------------------------
>> e |jbd2_journal_start()
>> | +---------------------------------------
>> | | . . .
>> | | t_updates++ // t_updates = 1
>> | | . . .
>> | +---------------------------------------
>> | ...
>> |---------------------------------------+---------------------------------------
>> |jbd2_journal_lock_updates() |jbd2_journal_stop()
>> |---------------------------------------+---------------------------------------
>> | write_lock(&journal->j_state_lock) | .
>> | ++journal->j_barrier_count | .
>> | spin_lock(&tran->t_handle_lock) | .
>> | atomic_read(&tran->t_updates) //not 0 |
>> | | atomic_dec_and_test(&tran->t_updates)
>> | | // t_updates = 0
>> | | wake_up(&journal->j_wait_updates)
>> | prepare_to_wait() | // no process is woken up.
>> | spin_unlock(&tran->t_handle_lock) + wake_up(&journal->j_wait_tran_locked)
>> | write_unlock(&journal->j_state_lock) | // no process is woken up.
>> | schedule() // never return +---------------------------------------
>> | | ( "dd" PID: 16645 or "flush-7:0" )
>> | +=======================================
>> | |start_this_handle()
>> | +---------------------------------------
>> | | . . .
>> | | read_lock(&journal->j_state_lock)
>> | | if (journal->j_barrier_count)
>> | | read_unlock(&journal->j_state_lock)
>> | | wait_event(journal->j_wait_tran_locked)
>> v | // never return
>>
>> Therefore, jbd2_journal_stop() needs spin_lock(&tran->t_handle_lock) or
>> read_lock(&journal->j_state_lock) to synchronize trans->t_updates with other
>> operators.
>> ===============================================================================
>>
>> Signed-off-by: Toshiyuki Okajima <[email protected]>
>> ---
>> fs/jbd2/transaction.c | 2 ++
>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
>> index a0e41a4..d48f395 100644
>> --- a/fs/jbd2/transaction.c
>> +++ b/fs/jbd2/transaction.c
>> @@ -1448,12 +1448,14 @@ int jbd2_journal_stop(handle_t *handle)
>> * once we do this, we must not dereference transaction
>> * pointer again.
>> */
>> + read_lock(&journal->j_state_lock);
>> tid = transaction->t_tid;
>> if (atomic_dec_and_test(&transaction->t_updates)) {
>> wake_up(&journal->j_wait_updates);
>> if (journal->j_barrier_count)

> oops, j_barrier_count is not protected too in original code. The
> problem does not exists in patched code any more:-)
OK.
I will add the notice into the description which I will rewrite.

Best Regards,
Toshiyuki Okajima


>
>
> Yongqiang.
>> wake_up(&journal->j_wait_transaction_locked);
>> }
>> + read_unlock(&journal->j_state_lock);
>>
>> if (wait_for_commit)
>> err = jbd2_log_wait_commit(journal, tid);
>> --
>> 1.5.5.6
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>


2011-12-22 11:42:32

by Toshiyuki Okajima

[permalink] [raw]
Subject: [PATCH 0/2 take2][RFC] hangup in jbd2_journal_lock_updates

Hi.

I found a bug by executing the following reproducer. The reproducer is for a
confirmation of
"[PATCH 1/5] ext4: allocate delalloc blocks before changing journal mode".
Therefore, without its patch, the other problem (panic) which the patch
describes can happen more frequently:
-------------------------------------------------------------------------------
#!/bin/sh

date
LOOP=100000
dd if=/dev/zero of=/tmp/ext4.img bs=1k count=1000k
/sbin/mkfs.ext4 -Fq /tmp/ext4.img
mount -o loop /tmp/ext4.img /mnt
rm -f /mnt/file
echo "0" > /mnt/file
(while [ 1 ]; do dd if=/dev/zero of=/mnt/file2 bs=4k > /dev/null 2>&1; done) &
PID=$!
for ((i=0; i < LOOP; i++));
do
DATE=$(date)
echo -n "[LOOP $i] $DATE"
if ((i%2 == 0));
then
chattr +j /mnt/file
else
chattr -j /mnt/file
fi
echo "0" >> /mnt/file
done
kill -9 $PID
rm -f /mnt/file*
umount /mnt
exit 0
-------------------------------------------------------------------------------
Though I ran it, it never ended forever.
(At one of my tries to reproduce the hangup, the reproducer stopped in 6
hours.)

Because jbd2_journal_lock_updates() never returns from schedule() after
prepare_to_wait().
(Its filesystem becomes a hangup.)

The detailed explanation is as follows:
===============================================================================
Current implementation of jbd2_journal_stop() has a bug not to synchronize
a critical code section of it with others,
- jbd2_journal_lock_updates()
- start_this_handle()
- jbd2_journal_commit_transaction().

This bug was derived from commit: c35a56a090eacefca07afeb994029b57d8dd8025 and
commit: 8dd420466c7bfc459fa04680bd5690bfc41a4553.

The source codes and critical code sections of them are as follows:
---------------------------
jbd2_journal_lock_updates()
---------------------------
509 write_lock(&journal->j_state_lock);
--- critical code section ------------------------------------------------------
510 ++journal->j_barrier_count;
...
513 while (1) {
...
519 spin_lock(&transaction->t_handle_lock);
520 if (!atomic_read(&transaction->t_updates)) {
521 spin_unlock(&transaction->t_handle_lock);
522 break;
523 }
524 prepare_to_wait(&journal->j_wait_updates, &wait,
525 TASK_UNINTERRUPTIBLE);
526 spin_unlock(&transaction->t_handle_lock);
527 write_unlock(&journal->j_state_lock);
528 schedule();
529 finish_wait(&journal->j_wait_updates, &wait);
530 write_lock(&journal->j_state_lock);
531 }
-------------------------------------------------------------------------------
532 write_unlock(&journal->j_state_lock);

* The critical code section is from line 510 to line 531
referencer) protected by write_lock(j_state_lock) and
spin_lock(t_handle_lock)

-------------------
start_this_handle()
-------------------
160 read_lock(&journal->j_state_lock);
--- critical code section ------------------------------------------------------
...
278 atomic_inc(&transaction->t_updates);
...
-------------------------------------------------------------------------------
284 read_unlock(&journal->j_state_lock);

* The critical code sections is in line 278 protected by read_lock(j_state_lock)

---------------------------------
jbd2_journal_commit_transaction()
---------------------------------
358 write_lock(&journal->j_state_lock);
--- critical code section ------------------------------------------------------
...
367 spin_lock(&commit_transaction->t_handle_lock);
368 while (atomic_read(&commit_transaction->t_updates)) {
369 DEFINE_WAIT(wait);
370
371 prepare_to_wait(&journal->j_wait_updates, &wait,
372 TASK_UNINTERRUPTIBLE);
373 if (atomic_read(&commit_transaction->t_updates)) {
374 spin_unlock(&commit_transaction->t_handle_lock);
375 write_unlock(&journal->j_state_lock);
376 schedule();
377 write_lock(&journal->j_state_lock);
378 spin_lock(&commit_transaction->t_handle_lock);
379 }
380 finish_wait(&journal->j_wait_updates, &wait);
381 }
382 spin_unlock(&commit_transaction->t_handle_lock);
...
447 wake_up(&journal->j_wait_transaction_locked);
-------------------------------------------------------------------------------
448 write_unlock(&journal->j_state_lock);

* The critical code sections is from line 367 to line 447 protected by
write_lock(&journal->j_state_lock) and spin_lock(&transaction->t_handle_lock).

-------------------
jbd2_journal_stop()
-------------------
...
--- critical code section ------------------------------------------------------
1452 if (atomic_dec_and_test(&transaction->t_updates)) {
1453 wake_up(&journal->j_wait_updates);
1454 if (journal->j_barrier_count)
1455 wake_up(&journal->j_wait_transaction_locked);
1456 }
-------------------------------------------------------------------------------
1457
1458 if (wait_for_commit)
1459 err = jbd2_log_wait_commit(journal, tid);
1460
1461 lock_map_release(&handle->h_lockdep_map);
1462
1463 jbd2_free_handle(handle);
1464 return err;
1465 }

* The critical code sections is from line 1452 to line 1456. However,
it isn't protected by no exclusive control.

Therefore,
- jbd2_journal_lock_updates() sleeps if transaction->t_update is 0.
- jbd2_journal_stop() tries to wake someone up
(wake_up(&journal->j_wait_updates)) after transaction->t_updates becomes 0.
=> But jbd2_journal_lock_updates() isn't woken up and then sleeps forever,
if it has not yet slept completely (before executing schedule()).

After we run the reproducer, we can see the processes which become hangup:
-------------------------------------------------------------------------------
crash> ps | grep UN
16645 23737 1 ffff88007a592b20 UN 0.0 105136 508 dd
16684 23727 1 ffff88007a2a9480 UN 0.0 107152 512 chattr
23722 2 1 ffff88007b496080 UN 0.0 0 0 [flush-7:0]

[1]
crash> bt 16645
PID: 16645 TASK: ffff88007a592b20 CPU: 1 COMMAND: "dd"
#0 [ffff88007b575878] __schedule at ffffffff814b9b77
#1 [ffff88007b575920] schedule at ffffffff814ba3cf
#2 [ffff88007b575930] start_this_handle at ffffffffa01acef5 [jbd2]
#3 [ffff88007b5759f0] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
#4 [ffff88007b575a30] jbd2_journal_start at ffffffffa01ad523 [jbd2]
#5 [ffff88007b575a40] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
#6 [ffff88007b575ab0] ext4_da_write_begin at ffffffffa038ddce [ext4]
#7 [ffff88007b575b50] generic_file_buffered_write at ffffffff811005ce
#8 [ffff88007b575c10] __generic_file_aio_write at ffffffff81103418
#9 [ffff88007b575cd0] generic_file_aio_write at ffffffff811036b6
#10 [ffff88007b575d50] ext4_file_write at ffffffffa0386ee9 [ext4]
#11 [ffff88007b575de0] do_sync_write at ffffffff8115de52
#12 [ffff88007b575f00] vfs_write at ffffffff8115e3e8
#13 [ffff88007b575f30] sys_write at ffffffff8115e5b1
#14 [ffff88007b575f80] system_call_fastpath at ffffffff814c3f42
RIP: 00000034930d4230 RSP: 00007fffd8176980 RFLAGS: 00000206
RAX: 0000000000000001 RBX: ffffffff814c3f42 RCX: 0000000000000010
RDX: 0000000000001000 RSI: 0000000001ac6000 RDI: 0000000000000001
RBP: 0000000001ac6000 R8: 0000000000000003 R9: 0000000000040000
R10: 0000000000003003 R11: 0000000000000246 R12: 0000000001ac5fff
R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000246
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b

[2]
crash> bt 16684
PID: 16684 TASK: ffff88007a2a9480 CPU: 1 COMMAND: "chattr"
#0 [ffff88007af87ca8] __schedule at ffffffff814b9b77
#1 [ffff88007af87d50] schedule at ffffffff814ba3cf
#2 [ffff88007af87d60] jbd2_journal_lock_updates at ffffffffa01acca6 [jbd2]
#3 [ffff88007af87de0] ext4_change_inode_journal_flag at ffffffffa038faf1 [ext4]
#4 [ffff88007af87e10] ext4_ioctl at ffffffffa0392f48 [ext4]
#5 [ffff88007af87ea0] do_vfs_ioctl at ffffffff8116fb8a
#6 [ffff88007af87f30] sys_ioctl at ffffffff811700d1
#7 [ffff88007af87f80] system_call_fastpath at ffffffff814c3f42
RIP: 00000034930d95f7 RSP: 00007fff8830e0d8 RFLAGS: 00010213
RAX: 0000000000000010 RBX: ffffffff814c3f42 RCX: 00000034930d95f7
RDX: 00007fff8830e17c RSI: 0000000040086602 RDI: 0000000000000003
RBP: 0000000000084000 R8: 0000000000000000 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff8830f8db
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000003
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b

[3]
crash> bt 23722
PID: 23722 TASK: ffff88007b496080 CPU: 1 COMMAND: "flush-7:0"
#0 [ffff88007b4f7840] __schedule at ffffffff814b9b77
#1 [ffff88007b4f78e8] schedule at ffffffff814ba3cf
#2 [ffff88007b4f78f8] start_this_handle at ffffffffa01acef5 [jbd2]
#3 [ffff88007b4f79b8] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
#4 [ffff88007b4f79f8] jbd2_journal_start at ffffffffa01ad523 [jbd2]
#5 [ffff88007b4f7a08] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
#6 [ffff88007b4f7a78] ext4_da_writepages at ffffffffa0390b6d [ext4]
#7 [ffff88007b4f7bc8] do_writepages at ffffffff8110c261
#8 [ffff88007b4f7bd8] writeback_single_inode at ffffffff81185830
#9 [ffff88007b4f7c38] writeback_sb_inodes at ffffffff81185e21
#10 [ffff88007b4f7ce8] __writeback_inodes_wb at ffffffff81185f7e
#11 [ffff88007b4f7d38] wb_writeback at ffffffff81186223
#12 [ffff88007b4f7db8] wb_do_writeback at ffffffff811864f9
#13 [ffff88007b4f7e58] bdi_writeback_thread at ffffffff811865fa
#14 [ffff88007b4f7ee8] kthread at ffffffff810828c6
#15 [ffff88007b4f7f48] kernel_thread_helper at ffffffff814c60b4

Step to reproduce a hangup(figure):
=======================================+=======================================
( "chattr -j" process PID: 16684 ) | ( a certain process )
t ======================================+=======================================
i |
m +---------------------------------------
e |jbd2_journal_start()
| +---------------------------------------
| | . . .
| | t_updates++ // t_updates = 1
| + . . .
| +---------------------------------------
| ...
|---------------------------------------+---------------------------------------
|jbd2_journal_lock_updates() |jbd2_journal_stop()
|---------------------------------------+---------------------------------------
| write_lock(&journal->j_state_lock) | .
| ++journal->j_barrier_count | .
| spin_lock(&tran->t_handle_lock) | .
| atomic_read(&tran->t_updates) //not 0 |
| | atomic_dec_and_test(&tran->t_updates)
| | // t_updates = 0
| | wake_up(&journal->j_wait_updates)
| prepare_to_wait() | // no process is woken up.
| spin_unlock(&tran->t_handle_lock) + wake_up(&journal->j_wait_tran_locked)
| write_unlock(&journal->j_state_lock) | // no process is woken up.
| schedule() // never return +---------------------------------------
| | ( "dd" PID: 16645 or "flush-7:0" )
| +=======================================
| |start_this_handle()
| +---------------------------------------
| | . . .
| | read_lock(&journal->j_state_lock)
| | if (journal->j_barrier_count)
| | read_unlock(&journal->j_state_lock)
| | wait_event(jrnl->j_wait_tran_locked)
v | // never return

Therefore, jbd2_journal_stop() needs spin_lock(&tran->t_handle_lock) or
read_lock(&journal->j_state_lock) to synchronize its critical code section with
others. Besides, I noticed that t_handle_lock exclusive control had already
been unnecessary when I examined all the functions which used t_handle_lock.
So, read_lock(&journal->j_state_lock) exclusive control is good for this fix.

Regards,
Toshiyuki Okajima

2011-12-22 12:02:43

by Toshiyuki Okajima

[permalink] [raw]
Subject: [PATCH 1/2] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_update operations

The following statements need an exclusive control for the critical code section
around t_update operations:
[jbd2_journal_stop()]
1445 /*
1446 * Once we drop t_updates, if it goes to zero the transaction
1447 * could start committing on us and eventually disappear. So
1448 * once we do this, we must not dereference transaction
1449 * pointer again.
1450 */
1451 tid = transaction->t_tid;
+ read_lock(&journal->j_state_lock);
----- critical code section ------------------------------------------------
1452 if (atomic_dec_and_test(&transaction->t_updates)) {
1453 wake_up(&journal->j_wait_updates);
1454 if (journal->j_barrier_count)
1455 wake_up(&journal->j_wait_transaction_locked);
1456 }
-----------------------------------------------------------------------------
+ read_unlock(&journal->j_state_lock);
1457

Because the functions which have the other critical code sections around t_update
operations,
- jbd2_journal_commit_transaction
- start_this_handle
- jbd2_journal_lock_updates
can not synchronize with jbd2_journal_stop.

ex) jbd2_journal_lock_updates
505 void jbd2_journal_lock_updates(journal_t *journal)
506 {
507 DEFINE_WAIT(wait);
508
509 write_lock(&journal->j_state_lock);
510 ++journal->j_barrier_count;
511
512 /* Wait until there are no running updates */
513 while (1) {
514 transaction_t *transaction = journal->j_running_transaction;
515
516 if (!transaction)
517 break;
518
519 spin_lock(&transaction->t_handle_lock);
----- critical code section ------------------------------------------------
520 if (!atomic_read(&transaction->t_updates)) {
521 spin_unlock(&transaction->t_handle_lock);
522 break;
523 }
524 prepare_to_wait(&journal->j_wait_updates, &wait,
525 TASK_UNINTERRUPTIBLE);
-----------------------------------------------------------------------------
526 spin_unlock(&transaction->t_handle_lock);
527 write_unlock(&journal->j_state_lock);
528 schedule();
529 finish_wait(&journal->j_wait_updates, &wait);
530 write_lock(&journal->j_state_lock);
531 }
532 write_unlock(&journal->j_state_lock);

Thefore, the following steps causes a hang-up of process1:
1) (process1) line 520 in jbd2_journal_lock_updates
transaction->t_updates is equal to 1, and then goto 4).
2) (process2) line 1452 in jbd2_journal_stop
transaction->t_updates becomes to 0, and then goto 3).
3) (process2) line 1453 in jbd2_journal_stop
wake_up(&journal->j_wait_updates) tries to wake someone up.
4) (process1) line 524 in jbd2_journal_lock_updates
prepare to sleep itself, and then goto 5).
5) (process1) line 528 in jbd2_journal_lock_updates
sleep forever because process2 doesn't wake it up anymore.

Similar problem also exists for j_barrier_count operations but it can be
fixed, too:
[jbd2_journal_lock_updates]
505 void jbd2_journal_lock_updates(journal_t *journal)
506 {
507 DEFINE_WAIT(wait);
508
509 write_lock(&journal->j_state_lock);
----------------------------------------------------------------------------
510 ++journal->j_barrier_count;
----------------------------------------------------------------------------
...
532 write_unlock(&journal->j_state_lock);

[jbd2_journal_stop]
1445 /*
1446 * Once we drop t_updates, if it goes to zero the transaction
1447 * could start committing on us and eventually disappear. So
1448 * once we do this, we must not dereference transaction
1449 * pointer again.
1450 */
1451 tid = transaction->t_tid;
+ read_lock(&journal->j_state_lock);
1452 if (atomic_dec_and_test(&transaction->t_updates)) {
1453 wake_up(&journal->j_wait_updates);
----------------------------------------------------------------------------
1454 if (journal->j_barrier_count)
1455 wake_up(&journal->j_wait_transaction_locked);
----------------------------------------------------------------------------
1456 }
+ read_unlock(&journal->j_state_lock);
1457

Signed-off-by: Toshiyuki Okajima <[email protected]>
---
fs/jbd2/transaction.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index a0e41a4..76f2eca 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1449,11 +1449,13 @@ int jbd2_journal_stop(handle_t *handle)
* pointer again.
*/
tid = transaction->t_tid;
+ read_lock(&journal->j_state_lock);
if (atomic_dec_and_test(&transaction->t_updates)) {
wake_up(&journal->j_wait_updates);
if (journal->j_barrier_count)
wake_up(&journal->j_wait_transaction_locked);
}
+ read_unlock(&journal->j_state_lock);

if (wait_for_commit)
err = jbd2_log_wait_commit(journal, tid);
--
1.5.5.6

2011-12-22 12:06:00

by Toshiyuki Okajima

[permalink] [raw]
Subject: [PATCH 2/2] jbd2: remove all t_handle_lock statements

Remove all t_handle_lock statements because they are not necessary anymore.

Because there is read_lock(&journal->j_state_lock) or
write_lock(&journal->j_state_lock) on all the forward codes beyond the place
which needs a spin_lock(&transaction->t_handle_lock).

Signed-off-by: Toshiyuki Okajima <[email protected]>
Cc:
---
fs/jbd2/commit.c | 4 ----
fs/jbd2/transaction.c | 18 +++---------------
include/linux/jbd2.h | 8 --------
3 files changed, 3 insertions(+), 27 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 68d704d..1030d47 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -364,22 +364,18 @@ void jbd2_journal_commit_transaction(journal_t *journal)
stats.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
stats.run.rs_locked);

- spin_lock(&commit_transaction->t_handle_lock);
while (atomic_read(&commit_transaction->t_updates)) {
DEFINE_WAIT(wait);

prepare_to_wait(&journal->j_wait_updates, &wait,
TASK_UNINTERRUPTIBLE);
if (atomic_read(&commit_transaction->t_updates)) {
- spin_unlock(&commit_transaction->t_handle_lock);
write_unlock(&journal->j_state_lock);
schedule();
write_lock(&journal->j_state_lock);
- spin_lock(&commit_transaction->t_handle_lock);
}
finish_wait(&journal->j_wait_updates, &wait);
}
- spin_unlock(&commit_transaction->t_handle_lock);

J_ASSERT (atomic_read(&commit_transaction->t_outstanding_credits) <=
journal->j_max_transaction_buffers);
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 76f2eca..7f84e3f 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -56,7 +56,6 @@ jbd2_get_transaction(journal_t *journal, transaction_t *transaction)
transaction->t_start_time = ktime_get();
transaction->t_tid = journal->j_transaction_sequence++;
transaction->t_expires = jiffies + journal->j_commit_interval;
- spin_lock_init(&transaction->t_handle_lock);
atomic_set(&transaction->t_updates, 0);
atomic_set(&transaction->t_outstanding_credits, 0);
atomic_set(&transaction->t_handle_count, 0);
@@ -100,10 +99,8 @@ static inline void update_t_max_wait(transaction_t *transaction,
if (jbd2_journal_enable_debug &&
time_after(transaction->t_start, ts)) {
ts = jbd2_time_diff(ts, transaction->t_start);
- spin_lock(&transaction->t_handle_lock);
if (ts > transaction->t_max_wait)
transaction->t_max_wait = ts;
- spin_unlock(&transaction->t_handle_lock);
}
#endif
}
@@ -401,19 +398,18 @@ int jbd2_journal_extend(handle_t *handle, int nblocks)
goto error_out;
}

- spin_lock(&transaction->t_handle_lock);
wanted = atomic_read(&transaction->t_outstanding_credits) + nblocks;

if (wanted > journal->j_max_transaction_buffers) {
jbd_debug(3, "denied handle %p %d blocks: "
"transaction too large\n", handle, nblocks);
- goto unlock;
+ goto error_out;
}

if (wanted > __jbd2_log_space_left(journal)) {
jbd_debug(3, "denied handle %p %d blocks: "
"insufficient log space\n", handle, nblocks);
- goto unlock;
+ goto error_out;
}

handle->h_buffer_credits += nblocks;
@@ -421,8 +417,6 @@ int jbd2_journal_extend(handle_t *handle, int nblocks)
result = 0;

jbd_debug(3, "extended handle %p by %d\n", handle, nblocks);
-unlock:
- spin_unlock(&transaction->t_handle_lock);
error_out:
read_unlock(&journal->j_state_lock);
out:
@@ -464,12 +458,10 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask)
J_ASSERT(journal_current_handle() == handle);

read_lock(&journal->j_state_lock);
- spin_lock(&transaction->t_handle_lock);
atomic_sub(handle->h_buffer_credits,
&transaction->t_outstanding_credits);
if (atomic_dec_and_test(&transaction->t_updates))
wake_up(&journal->j_wait_updates);
- spin_unlock(&transaction->t_handle_lock);

jbd_debug(2, "restarting handle %p\n", handle);
tid = transaction->t_tid;
@@ -516,14 +508,10 @@ void jbd2_journal_lock_updates(journal_t *journal)
if (!transaction)
break;

- spin_lock(&transaction->t_handle_lock);
- if (!atomic_read(&transaction->t_updates)) {
- spin_unlock(&transaction->t_handle_lock);
+ if (!atomic_read(&transaction->t_updates))
break;
- }
prepare_to_wait(&journal->j_wait_updates, &wait,
TASK_UNINTERRUPTIBLE);
- spin_unlock(&transaction->t_handle_lock);
write_unlock(&journal->j_state_lock);
schedule();
finish_wait(&journal->j_wait_updates, &wait);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 2092ea2..55f7a8c 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -440,9 +440,6 @@ struct transaction_chp_stats_s {
* ->j_list_lock
*
* j_state_lock
- * ->t_handle_lock
- *
- * j_state_lock
* ->j_list_lock (journal_unmap_buffer)
*
*/
@@ -538,11 +535,6 @@ struct transaction_s
struct list_head t_inode_list;

/*
- * Protects info related to handles
- */
- spinlock_t t_handle_lock;

2011-12-22 14:07:33

by Yongqiang Yang

[permalink] [raw]
Subject: Re: [PATCH 2/2] jbd2: remove all t_handle_lock statements

On Thu, Dec 22, 2011 at 8:00 PM, Toshiyuki Okajima
<[email protected]> wrote:
> Remove all t_handle_lock statements because they are not necessary anymore.
>
> Because there is read_lock(&journal->j_state_lock) or
> write_lock(&journal->j_state_lock) on all the forward codes beyond the place
> which needs a spin_lock(&transaction->t_handle_lock).
>
> Signed-off-by: Toshiyuki Okajima <[email protected]>
> Cc:
> ---
> ?fs/jbd2/commit.c ? ? ?| ? ?4 ----
> ?fs/jbd2/transaction.c | ? 18 +++---------------
> ?include/linux/jbd2.h ?| ? ?8 --------
> ?3 files changed, 3 insertions(+), 27 deletions(-)
>
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 68d704d..1030d47 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -364,22 +364,18 @@ void jbd2_journal_commit_transaction(journal_t *journal)
> ? ? ? ?stats.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?stats.run.rs_locked);
>
> - ? ? ? spin_lock(&commit_transaction->t_handle_lock);
> ? ? ? ?while (atomic_read(&commit_transaction->t_updates)) {
> ? ? ? ? ? ? ? ?DEFINE_WAIT(wait);
>
> ? ? ? ? ? ? ? ?prepare_to_wait(&journal->j_wait_updates, &wait,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?TASK_UNINTERRUPTIBLE);
> ? ? ? ? ? ? ? ?if (atomic_read(&commit_transaction->t_updates)) {
> - ? ? ? ? ? ? ? ? ? ? ? spin_unlock(&commit_transaction->t_handle_lock);
> ? ? ? ? ? ? ? ? ? ? ? ?write_unlock(&journal->j_state_lock);
> ? ? ? ? ? ? ? ? ? ? ? ?schedule();
> ? ? ? ? ? ? ? ? ? ? ? ?write_lock(&journal->j_state_lock);
> - ? ? ? ? ? ? ? ? ? ? ? spin_lock(&commit_transaction->t_handle_lock);
> ? ? ? ? ? ? ? ?}
> ? ? ? ? ? ? ? ?finish_wait(&journal->j_wait_updates, &wait);
> ? ? ? ?}
> - ? ? ? spin_unlock(&commit_transaction->t_handle_lock);
>
> ? ? ? ?J_ASSERT (atomic_read(&commit_transaction->t_outstanding_credits) <=
> ? ? ? ? ? ? ? ? ? ? ? ?journal->j_max_transaction_buffers);
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 76f2eca..7f84e3f 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -56,7 +56,6 @@ jbd2_get_transaction(journal_t *journal, transaction_t *transaction)
> ? ? ? ?transaction->t_start_time = ktime_get();
> ? ? ? ?transaction->t_tid = journal->j_transaction_sequence++;
> ? ? ? ?transaction->t_expires = jiffies + journal->j_commit_interval;
> - ? ? ? spin_lock_init(&transaction->t_handle_lock);
> ? ? ? ?atomic_set(&transaction->t_updates, 0);
> ? ? ? ?atomic_set(&transaction->t_outstanding_credits, 0);
> ? ? ? ?atomic_set(&transaction->t_handle_count, 0);
> @@ -100,10 +99,8 @@ static inline void update_t_max_wait(transaction_t *transaction,
> ? ? ? ?if (jbd2_journal_enable_debug &&
> ? ? ? ? ? ?time_after(transaction->t_start, ts)) {
> ? ? ? ? ? ? ? ?ts = jbd2_time_diff(ts, transaction->t_start);
> - ? ? ? ? ? ? ? spin_lock(&transaction->t_handle_lock);
> ? ? ? ? ? ? ? ?if (ts > transaction->t_max_wait)
> ? ? ? ? ? ? ? ? ? ? ? ?transaction->t_max_wait = ts;
> - ? ? ? ? ? ? ? spin_unlock(&transaction->t_handle_lock);
Hi Toshiyuki,

I think you removed too much. t_handle_lock is needed here to
protect t_max_wait. I meant just removing t_handle_lock from where I
commented in your patch post out last time.

Yongqiang.
> ? ? ? ?}
> ?#endif
> ?}
> @@ -401,19 +398,18 @@ int jbd2_journal_extend(handle_t *handle, int nblocks)
> ? ? ? ? ? ? ? ?goto error_out;
> ? ? ? ?}
>
> - ? ? ? spin_lock(&transaction->t_handle_lock);
> ? ? ? ?wanted = atomic_read(&transaction->t_outstanding_credits) + nblocks;
>
> ? ? ? ?if (wanted > journal->j_max_transaction_buffers) {
> ? ? ? ? ? ? ? ?jbd_debug(3, "denied handle %p %d blocks: "
> ? ? ? ? ? ? ? ? ? ? ? ? ?"transaction too large\n", handle, nblocks);
> - ? ? ? ? ? ? ? goto unlock;
> + ? ? ? ? ? ? ? goto error_out;
> ? ? ? ?}
>
> ? ? ? ?if (wanted > __jbd2_log_space_left(journal)) {
> ? ? ? ? ? ? ? ?jbd_debug(3, "denied handle %p %d blocks: "
> ? ? ? ? ? ? ? ? ? ? ? ? ?"insufficient log space\n", handle, nblocks);
> - ? ? ? ? ? ? ? goto unlock;
> + ? ? ? ? ? ? ? goto error_out;
> ? ? ? ?}
>
> ? ? ? ?handle->h_buffer_credits += nblocks;
> @@ -421,8 +417,6 @@ int jbd2_journal_extend(handle_t *handle, int nblocks)
> ? ? ? ?result = 0;
>
> ? ? ? ?jbd_debug(3, "extended handle %p by %d\n", handle, nblocks);
> -unlock:
> - ? ? ? spin_unlock(&transaction->t_handle_lock);
> ?error_out:
> ? ? ? ?read_unlock(&journal->j_state_lock);
> ?out:
> @@ -464,12 +458,10 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask)
> ? ? ? ?J_ASSERT(journal_current_handle() == handle);
>
> ? ? ? ?read_lock(&journal->j_state_lock);
> - ? ? ? spin_lock(&transaction->t_handle_lock);
> ? ? ? ?atomic_sub(handle->h_buffer_credits,
> ? ? ? ? ? ? ? ? ? &transaction->t_outstanding_credits);
> ? ? ? ?if (atomic_dec_and_test(&transaction->t_updates))
> ? ? ? ? ? ? ? ?wake_up(&journal->j_wait_updates);
> - ? ? ? spin_unlock(&transaction->t_handle_lock);
>
> ? ? ? ?jbd_debug(2, "restarting handle %p\n", handle);
> ? ? ? ?tid = transaction->t_tid;
> @@ -516,14 +508,10 @@ void jbd2_journal_lock_updates(journal_t *journal)
> ? ? ? ? ? ? ? ?if (!transaction)
> ? ? ? ? ? ? ? ? ? ? ? ?break;
>
> - ? ? ? ? ? ? ? spin_lock(&transaction->t_handle_lock);
> - ? ? ? ? ? ? ? if (!atomic_read(&transaction->t_updates)) {
> - ? ? ? ? ? ? ? ? ? ? ? spin_unlock(&transaction->t_handle_lock);
> + ? ? ? ? ? ? ? if (!atomic_read(&transaction->t_updates))
> ? ? ? ? ? ? ? ? ? ? ? ?break;
> - ? ? ? ? ? ? ? }
> ? ? ? ? ? ? ? ?prepare_to_wait(&journal->j_wait_updates, &wait,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?TASK_UNINTERRUPTIBLE);
> - ? ? ? ? ? ? ? spin_unlock(&transaction->t_handle_lock);
> ? ? ? ? ? ? ? ?write_unlock(&journal->j_state_lock);
> ? ? ? ? ? ? ? ?schedule();
> ? ? ? ? ? ? ? ?finish_wait(&journal->j_wait_updates, &wait);
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 2092ea2..55f7a8c 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -440,9 +440,6 @@ struct transaction_chp_stats_s {
> ?* ? ?->j_list_lock
> ?*
> ?* ? ?j_state_lock
> - * ? ?->t_handle_lock
> - *
> - * ? ?j_state_lock
> ?* ? ?->j_list_lock ? ? ? ? ? ? ? ? ? ?(journal_unmap_buffer)
> ?*
> ?*/
> @@ -538,11 +535,6 @@ struct transaction_s
> ? ? ? ?struct list_head ? ? ? ?t_inode_list;
>
> ? ? ? ?/*
> - ? ? ? ?* Protects info related to handles
> - ? ? ? ?*/
> - ? ? ? spinlock_t ? ? ? ? ? ? ?t_handle_lock;
> -
> - ? ? ? /*
> ? ? ? ? * Longest time some handle had to wait for running transaction
> ? ? ? ? */
> ? ? ? ?unsigned long ? ? ? ? ? t_max_wait;
> --
> 1.5.5.6



--
Best Wishes
Yongqiang Yang

2011-12-26 00:17:21

by Toshiyuki Okajima

[permalink] [raw]
Subject: Re: [PATCH 2/2] jbd2: remove all t_handle_lock statements

Hi Yongqiang,

(2011/12/22 23:07), Yongqiang Yang wrote:
> On Thu, Dec 22, 2011 at 8:00 PM, Toshiyuki Okajima
> <[email protected]> wrote:
>> Remove all t_handle_lock statements because they are not necessary anymore.
>>
>> Because there is read_lock(&journal->j_state_lock) or
>> write_lock(&journal->j_state_lock) on all the forward codes beyond the place
>> which needs a spin_lock(&transaction->t_handle_lock).
>>
>> Signed-off-by: Toshiyuki Okajima <[email protected]>
>> Cc:
>> ---
>> fs/jbd2/commit.c | 4 ----
>> fs/jbd2/transaction.c | 18 +++---------------
>> include/linux/jbd2.h | 8 --------
>> 3 files changed, 3 insertions(+), 27 deletions(-)
>>
>> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
>> index 68d704d..1030d47 100644
>> --- a/fs/jbd2/commit.c
>> +++ b/fs/jbd2/commit.c
>> @@ -364,22 +364,18 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>> stats.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
>> stats.run.rs_locked);
>>
>> - spin_lock(&commit_transaction->t_handle_lock);
>> while (atomic_read(&commit_transaction->t_updates)) {
>> DEFINE_WAIT(wait);
>>
>> prepare_to_wait(&journal->j_wait_updates, &wait,
>> TASK_UNINTERRUPTIBLE);
>> if (atomic_read(&commit_transaction->t_updates)) {
>> - spin_unlock(&commit_transaction->t_handle_lock);
>> write_unlock(&journal->j_state_lock);
>> schedule();
>> write_lock(&journal->j_state_lock);
>> - spin_lock(&commit_transaction->t_handle_lock);
>> }
>> finish_wait(&journal->j_wait_updates, &wait);
>> }
>> - spin_unlock(&commit_transaction->t_handle_lock);
>>
>> J_ASSERT (atomic_read(&commit_transaction->t_outstanding_credits) <=
>> journal->j_max_transaction_buffers);
>> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
>> index 76f2eca..7f84e3f 100644
>> --- a/fs/jbd2/transaction.c
>> +++ b/fs/jbd2/transaction.c
>> @@ -56,7 +56,6 @@ jbd2_get_transaction(journal_t *journal, transaction_t *transaction)
>> transaction->t_start_time = ktime_get();
>> transaction->t_tid = journal->j_transaction_sequence++;
>> transaction->t_expires = jiffies + journal->j_commit_interval;
>> - spin_lock_init(&transaction->t_handle_lock);
>> atomic_set(&transaction->t_updates, 0);
>> atomic_set(&transaction->t_outstanding_credits, 0);
>> atomic_set(&transaction->t_handle_count, 0);

>> @@ -100,10 +99,8 @@ static inline void update_t_max_wait(transaction_t *transaction,
>> if (jbd2_journal_enable_debug &&
>> time_after(transaction->t_start, ts)) {
>> ts = jbd2_time_diff(ts, transaction->t_start);
>> - spin_lock(&transaction->t_handle_lock);
>> if (ts > transaction->t_max_wait)
>> transaction->t_max_wait = ts;
>> - spin_unlock(&transaction->t_handle_lock);
> Hi Toshiyuki,
>
> I think you removed too much. t_handle_lock is needed here to
> protect t_max_wait. I meant just removing t_handle_lock from where I
> commented in your patch post out last time.
Thanks for your comment. OK, I understand.

This function tries to protect itself by
read_lock(&journal->j_state_lock), but by only
read_lock(&journal->j_state_lock) cannot protect this critical code
section.

96 static inline void update_t_max_wait(transaction_t *transaction,
97 unsigned long ts)
98 {
99 #ifdef CONFIG_JBD2_DEBUG
100 if (jbd2_journal_enable_debug &&
101 time_after(transaction->t_start, ts)) {
102 ts = jbd2_time_diff(ts, transaction->t_start);
103 spin_lock(&transaction->t_handle_lock);
104 if (ts > transaction->t_max_wait)
105 transaction->t_max_wait = ts;
106 spin_unlock(&transaction->t_handle_lock);
107 }
108 #endif
109 }

I will delete the part of my patch for update_t_max_wait().

Due to the same reason,
- jbd2_journal_extend
- jbd2_journal_restart
cannot be protected by my patch. So, I will delete these parts, too.

Best Regards,
Toshiyuki Okajima

>
> Yongqiang.
>> }
>> #endif
>> }
>> @@ -401,19 +398,18 @@ int jbd2_journal_extend(handle_t *handle, int nblocks)
>> goto error_out;
>> }
>>
>> - spin_lock(&transaction->t_handle_lock);
>> wanted = atomic_read(&transaction->t_outstanding_credits) + nblocks;
>>
>> if (wanted > journal->j_max_transaction_buffers) {
>> jbd_debug(3, "denied handle %p %d blocks: "
>> "transaction too large\n", handle, nblocks);
>> - goto unlock;
>> + goto error_out;
>> }
>>
>> if (wanted > __jbd2_log_space_left(journal)) {
>> jbd_debug(3, "denied handle %p %d blocks: "
>> "insufficient log space\n", handle, nblocks);
>> - goto unlock;
>> + goto error_out;
>> }
>>
>> handle->h_buffer_credits += nblocks;
>> @@ -421,8 +417,6 @@ int jbd2_journal_extend(handle_t *handle, int nblocks)
>> result = 0;
>>
>> jbd_debug(3, "extended handle %p by %d\n", handle, nblocks);
>> -unlock:
>> - spin_unlock(&transaction->t_handle_lock);
>> error_out:
>> read_unlock(&journal->j_state_lock);
>> out:
>> @@ -464,12 +458,10 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask)
>> J_ASSERT(journal_current_handle() == handle);
>>
>> read_lock(&journal->j_state_lock);
>> - spin_lock(&transaction->t_handle_lock);
>> atomic_sub(handle->h_buffer_credits,
>> &transaction->t_outstanding_credits);
>> if (atomic_dec_and_test(&transaction->t_updates))
>> wake_up(&journal->j_wait_updates);
>> - spin_unlock(&transaction->t_handle_lock);
>>
>> jbd_debug(2, "restarting handle %p\n", handle);
>> tid = transaction->t_tid;
>> @@ -516,14 +508,10 @@ void jbd2_journal_lock_updates(journal_t *journal)
>> if (!transaction)
>> break;
>>
>> - spin_lock(&transaction->t_handle_lock);
>> - if (!atomic_read(&transaction->t_updates)) {
>> - spin_unlock(&transaction->t_handle_lock);
>> + if (!atomic_read(&transaction->t_updates))
>> break;
>> - }
>> prepare_to_wait(&journal->j_wait_updates, &wait,
>> TASK_UNINTERRUPTIBLE);
>> - spin_unlock(&transaction->t_handle_lock);
>> write_unlock(&journal->j_state_lock);
>> schedule();
>> finish_wait(&journal->j_wait_updates, &wait);
>> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
>> index 2092ea2..55f7a8c 100644
>> --- a/include/linux/jbd2.h
>> +++ b/include/linux/jbd2.h
>> @@ -440,9 +440,6 @@ struct transaction_chp_stats_s {
>> * ->j_list_lock
>> *
>> * j_state_lock
>> - * ->t_handle_lock
>> - *
>> - * j_state_lock
>> * ->j_list_lock (journal_unmap_buffer)
>> *
>> */
>> @@ -538,11 +535,6 @@ struct transaction_s
>> struct list_head t_inode_list;
>>
>> /*
>> - * Protects info related to handles
>> - */
>> - spinlock_t t_handle_lock;
>> -
>> - /*
>> * Longest time some handle had to wait for running transaction
>> */
>> unsigned long t_max_wait;
>> --
>> 1.5.5.6
>
>
>


2011-12-26 01:25:13

by Toshiyuki Okajima

[permalink] [raw]
Subject: [PATCH 2/2 take2] jbd2: delete spin_lock(t_handle_lock) inside wirte_lock(j_state_lock)

Delete all spin_lock(t_handle_lock) statements inside write_lock(j_state_lock)
because the critical code sections can be protected by write_lock(j_state_lock)
only.

Signed-off-by: Toshiyuki Okajima <[email protected]>
Reported-by: Yongqiang Yang <[email protected]>
---
fs/jbd2/commit.c | 4 ----
fs/jbd2/transaction.c | 6 +-----
2 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 68d704d..1030d47 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -364,22 +364,18 @@ void jbd2_journal_commit_transaction(journal_t *journal)
stats.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
stats.run.rs_locked);

- spin_lock(&commit_transaction->t_handle_lock);
while (atomic_read(&commit_transaction->t_updates)) {
DEFINE_WAIT(wait);

prepare_to_wait(&journal->j_wait_updates, &wait,
TASK_UNINTERRUPTIBLE);
if (atomic_read(&commit_transaction->t_updates)) {
- spin_unlock(&commit_transaction->t_handle_lock);
write_unlock(&journal->j_state_lock);
schedule();
write_lock(&journal->j_state_lock);
- spin_lock(&commit_transaction->t_handle_lock);
}
finish_wait(&journal->j_wait_updates, &wait);
}
- spin_unlock(&commit_transaction->t_handle_lock);

J_ASSERT (atomic_read(&commit_transaction->t_outstanding_credits) <=
journal->j_max_transaction_buffers);
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 76f2eca..c418ed9 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -516,14 +516,10 @@ void jbd2_journal_lock_updates(journal_t *journal)
if (!transaction)
break;

- spin_lock(&transaction->t_handle_lock);
- if (!atomic_read(&transaction->t_updates)) {
- spin_unlock(&transaction->t_handle_lock);
+ if (!atomic_read(&transaction->t_updates))
break;
- }
prepare_to_wait(&journal->j_wait_updates, &wait,
TASK_UNINTERRUPTIBLE);
- spin_unlock(&transaction->t_handle_lock);
write_unlock(&journal->j_state_lock);
schedule();
finish_wait(&journal->j_wait_updates, &wait);
--
1.5.5.6

2011-12-31 06:35:14

by Yongqiang Yang

[permalink] [raw]
Subject: Re: [PATCH 2/2 take2] jbd2: delete spin_lock(t_handle_lock) inside wirte_lock(j_state_lock)

Looks good to me.

Thanks,
Yongqiang.
On Mon, Dec 26, 2011 at 9:17 AM, Toshiyuki Okajima
<[email protected]> wrote:
> Delete all spin_lock(t_handle_lock) statements inside write_lock(j_state_lock)
> because the critical code sections can be protected by write_lock(j_state_lock)
> only.
>
> Signed-off-by: Toshiyuki Okajima <[email protected]>
> Reported-by: Yongqiang Yang <[email protected]>
> ---
> ?fs/jbd2/commit.c ? ? ?| ? ?4 ----
> ?fs/jbd2/transaction.c | ? ?6 +-----
> ?2 files changed, 1 insertions(+), 9 deletions(-)
>
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 68d704d..1030d47 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -364,22 +364,18 @@ void jbd2_journal_commit_transaction(journal_t *journal)
> ? ? ? ?stats.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?stats.run.rs_locked);
>
> - ? ? ? spin_lock(&commit_transaction->t_handle_lock);
> ? ? ? ?while (atomic_read(&commit_transaction->t_updates)) {
> ? ? ? ? ? ? ? ?DEFINE_WAIT(wait);
>
> ? ? ? ? ? ? ? ?prepare_to_wait(&journal->j_wait_updates, &wait,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?TASK_UNINTERRUPTIBLE);
> ? ? ? ? ? ? ? ?if (atomic_read(&commit_transaction->t_updates)) {
> - ? ? ? ? ? ? ? ? ? ? ? spin_unlock(&commit_transaction->t_handle_lock);
> ? ? ? ? ? ? ? ? ? ? ? ?write_unlock(&journal->j_state_lock);
> ? ? ? ? ? ? ? ? ? ? ? ?schedule();
> ? ? ? ? ? ? ? ? ? ? ? ?write_lock(&journal->j_state_lock);
> - ? ? ? ? ? ? ? ? ? ? ? spin_lock(&commit_transaction->t_handle_lock);
> ? ? ? ? ? ? ? ?}
> ? ? ? ? ? ? ? ?finish_wait(&journal->j_wait_updates, &wait);
> ? ? ? ?}
> - ? ? ? spin_unlock(&commit_transaction->t_handle_lock);
>
> ? ? ? ?J_ASSERT (atomic_read(&commit_transaction->t_outstanding_credits) <=
> ? ? ? ? ? ? ? ? ? ? ? ?journal->j_max_transaction_buffers);
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 76f2eca..c418ed9 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -516,14 +516,10 @@ void jbd2_journal_lock_updates(journal_t *journal)
> ? ? ? ? ? ? ? ?if (!transaction)
> ? ? ? ? ? ? ? ? ? ? ? ?break;
>
> - ? ? ? ? ? ? ? spin_lock(&transaction->t_handle_lock);
> - ? ? ? ? ? ? ? if (!atomic_read(&transaction->t_updates)) {
> - ? ? ? ? ? ? ? ? ? ? ? spin_unlock(&transaction->t_handle_lock);
> + ? ? ? ? ? ? ? if (!atomic_read(&transaction->t_updates))
> ? ? ? ? ? ? ? ? ? ? ? ?break;
> - ? ? ? ? ? ? ? }
> ? ? ? ? ? ? ? ?prepare_to_wait(&journal->j_wait_updates, &wait,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?TASK_UNINTERRUPTIBLE);
> - ? ? ? ? ? ? ? spin_unlock(&transaction->t_handle_lock);
> ? ? ? ? ? ? ? ?write_unlock(&journal->j_state_lock);
> ? ? ? ? ? ? ? ?schedule();
> ? ? ? ? ? ? ? ?finish_wait(&journal->j_wait_updates, &wait);
> --
> 1.5.5.6



--
Best Wishes
Yongqiang Yang

2012-01-04 17:00:28

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 2/2 take2] jbd2: delete spin_lock(t_handle_lock) inside wirte_lock(j_state_lock)

On Mon 26-12-11 10:17:27, Toshiyuki Okajima wrote:
> Delete all spin_lock(t_handle_lock) statements inside write_lock(j_state_lock)
> because the critical code sections can be protected by write_lock(j_state_lock)
> only.
>
> Signed-off-by: Toshiyuki Okajima <[email protected]>
> Reported-by: Yongqiang Yang <[email protected]>
The patch looks good. You can add:
Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/jbd2/commit.c | 4 ----
> fs/jbd2/transaction.c | 6 +-----
> 2 files changed, 1 insertions(+), 9 deletions(-)
>
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 68d704d..1030d47 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -364,22 +364,18 @@ void jbd2_journal_commit_transaction(journal_t *journal)
> stats.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
> stats.run.rs_locked);
>
> - spin_lock(&commit_transaction->t_handle_lock);
> while (atomic_read(&commit_transaction->t_updates)) {
> DEFINE_WAIT(wait);
>
> prepare_to_wait(&journal->j_wait_updates, &wait,
> TASK_UNINTERRUPTIBLE);
> if (atomic_read(&commit_transaction->t_updates)) {
> - spin_unlock(&commit_transaction->t_handle_lock);
> write_unlock(&journal->j_state_lock);
> schedule();
> write_lock(&journal->j_state_lock);
> - spin_lock(&commit_transaction->t_handle_lock);
> }
> finish_wait(&journal->j_wait_updates, &wait);
> }
> - spin_unlock(&commit_transaction->t_handle_lock);
>
> J_ASSERT (atomic_read(&commit_transaction->t_outstanding_credits) <=
> journal->j_max_transaction_buffers);
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 76f2eca..c418ed9 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -516,14 +516,10 @@ void jbd2_journal_lock_updates(journal_t *journal)
> if (!transaction)
> break;
>
> - spin_lock(&transaction->t_handle_lock);
> - if (!atomic_read(&transaction->t_updates)) {
> - spin_unlock(&transaction->t_handle_lock);
> + if (!atomic_read(&transaction->t_updates))
> break;
> - }
> prepare_to_wait(&journal->j_wait_updates, &wait,
> TASK_UNINTERRUPTIBLE);
> - spin_unlock(&transaction->t_handle_lock);
> write_unlock(&journal->j_state_lock);
> schedule();
> finish_wait(&journal->j_wait_updates, &wait);
> --
> 1.5.5.6
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2012-01-04 17:00:56

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 1/2] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_update operations

Hello,

On Thu 22-12-11 20:56:50, Toshiyuki Okajima wrote:
> The following statements need an exclusive control for the critical code section
> around t_update operations:
> [jbd2_journal_stop()]
> 1445 /*
> 1446 * Once we drop t_updates, if it goes to zero the transaction
> 1447 * could start committing on us and eventually disappear. So
> 1448 * once we do this, we must not dereference transaction
> 1449 * pointer again.
> 1450 */
> 1451 tid = transaction->t_tid;
> + read_lock(&journal->j_state_lock);
> ----- critical code section ------------------------------------------------
> 1452 if (atomic_dec_and_test(&transaction->t_updates)) {
> 1453 wake_up(&journal->j_wait_updates);
> 1454 if (journal->j_barrier_count)
> 1455 wake_up(&journal->j_wait_transaction_locked);
> 1456 }
> -----------------------------------------------------------------------------
> + read_unlock(&journal->j_state_lock);
> 1457
>
> Because the functions which have the other critical code sections around t_update
> operations,
> - jbd2_journal_commit_transaction
> - start_this_handle
> - jbd2_journal_lock_updates
> can not synchronize with jbd2_journal_stop.
>
> ex) jbd2_journal_lock_updates
> 505 void jbd2_journal_lock_updates(journal_t *journal)
> 506 {
> 507 DEFINE_WAIT(wait);
> 508
> 509 write_lock(&journal->j_state_lock);
> 510 ++journal->j_barrier_count;
> 511
> 512 /* Wait until there are no running updates */
> 513 while (1) {
> 514 transaction_t *transaction = journal->j_running_transaction;
> 515
> 516 if (!transaction)
> 517 break;
> 518
> 519 spin_lock(&transaction->t_handle_lock);
> ----- critical code section ------------------------------------------------
> 520 if (!atomic_read(&transaction->t_updates)) {
> 521 spin_unlock(&transaction->t_handle_lock);
> 522 break;
> 523 }
> 524 prepare_to_wait(&journal->j_wait_updates, &wait,
> 525 TASK_UNINTERRUPTIBLE);
> -----------------------------------------------------------------------------
> 526 spin_unlock(&transaction->t_handle_lock);
> 527 write_unlock(&journal->j_state_lock);
> 528 schedule();
> 529 finish_wait(&journal->j_wait_updates, &wait);
> 530 write_lock(&journal->j_state_lock);
> 531 }
> 532 write_unlock(&journal->j_state_lock);
>
> Thefore, the following steps causes a hang-up of process1:
> 1) (process1) line 520 in jbd2_journal_lock_updates
> transaction->t_updates is equal to 1, and then goto 4).
> 2) (process2) line 1452 in jbd2_journal_stop
> transaction->t_updates becomes to 0, and then goto 3).
> 3) (process2) line 1453 in jbd2_journal_stop
> wake_up(&journal->j_wait_updates) tries to wake someone up.
> 4) (process1) line 524 in jbd2_journal_lock_updates
> prepare to sleep itself, and then goto 5).
> 5) (process1) line 528 in jbd2_journal_lock_updates
> sleep forever because process2 doesn't wake it up anymore.
Thanks for the analysis. Actually, you fix adds unnecessary overhead.
The problem really is the wrong ordering of prepare_to_wait() and t_updates
check. So attached patch should fix the issue as well without introducing
the overhead.

> Similar problem also exists for j_barrier_count operations but it can be
> fixed, too:
> [jbd2_journal_lock_updates]
> 505 void jbd2_journal_lock_updates(journal_t *journal)
> 506 {
> 507 DEFINE_WAIT(wait);
> 508
> 509 write_lock(&journal->j_state_lock);
> ----------------------------------------------------------------------------
> 510 ++journal->j_barrier_count;
> ----------------------------------------------------------------------------
> ...
> 532 write_unlock(&journal->j_state_lock);
>
> [jbd2_journal_stop]
> 1445 /*
> 1446 * Once we drop t_updates, if it goes to zero the transaction
> 1447 * could start committing on us and eventually disappear. So
> 1448 * once we do this, we must not dereference transaction
> 1449 * pointer again.
> 1450 */
> 1451 tid = transaction->t_tid;
> + read_lock(&journal->j_state_lock);
> 1452 if (atomic_dec_and_test(&transaction->t_updates)) {
> 1453 wake_up(&journal->j_wait_updates);
> ----------------------------------------------------------------------------
> 1454 if (journal->j_barrier_count)
> 1455 wake_up(&journal->j_wait_transaction_locked);
> ----------------------------------------------------------------------------
> 1456 }
> + read_unlock(&journal->j_state_lock);
> 1457
Here I don't agree. We use wait_event() to wait for j_barrier_count to
drop to zero and wait_event() has proper ordering of prepare_to_wait() and
test.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR


Attachments:
(No filename) (5.11 kB)
0001-jbd2-Fix-hung-processes-in-jbd2_journal_lock_updates.patch (2.29 kB)
Download all attachments

2012-01-05 03:12:48

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 1/2] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_update operations

On Tue, Jan 03, 2012 at 04:32:45PM +0100, Jan Kara wrote:
> Thanks for the analysis. Actually, you fix adds unnecessary overhead.
> The problem really is the wrong ordering of prepare_to_wait() and t_updates
> check. So attached patch should fix the issue as well without introducing
> the overhead.

Thanks, applied.

- Ted


> From 1cd5b8178893f3f186ce93eb1f47664a1a3e81fc Mon Sep 17 00:00:00 2001
> From: Jan Kara <[email protected]>
> Date: Tue, 3 Jan 2012 16:13:29 +0100
> Subject: [PATCH] jbd2: Fix hung processes in jbd2_journal_lock_updates()
>
> Toshiyuki Okajima found out that when running
>
> for ((i=0; i < 100000; i++)); do
> if ((i%2 == 0)); then
> chattr +j /mnt/file
> else
> chattr -j /mnt/file
> fi
> echo "0" >> /mnt/file
> done
>
> process sometimes hangs indefinitely in jbd2_journal_lock_updates().
>
> Toshiyuki identified that the following race happens:
>
> jbd2_journal_lock_updates() |jbd2_journal_stop()
> ---------------------------------------+---------------------------------------
> write_lock(&journal->j_state_lock) | .
> ++journal->j_barrier_count | .
> spin_lock(&tran->t_handle_lock) | .
> atomic_read(&tran->t_updates) //not 0 |
> | atomic_dec_and_test(&tran->t_updates)
> | // t_updates = 0
> | wake_up(&journal->j_wait_updates)
> prepare_to_wait() | // no process is woken up.
> spin_unlock(&tran->t_handle_lock) |
> write_unlock(&journal->j_state_lock) |
> schedule() // never return |
>
> We fix the problem by first calling prepare_to_wait() and only after that
> checking t_updates in jbd2_journal_lock_updates().
>
> Reported-and-analyzed-by: Toshiyuki Okajima <[email protected]>
> Signed-off-by: Jan Kara <[email protected]>

2012-01-05 03:27:57

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 2/2 take2] jbd2: delete spin_lock(t_handle_lock) inside wirte_lock(j_state_lock)

On Mon, Dec 26, 2011 at 10:17:27AM +0900, Toshiyuki Okajima wrote:
> Delete all spin_lock(t_handle_lock) statements inside write_lock(j_state_lock)
> because the critical code sections can be protected by write_lock(j_state_lock)
> only.
>
> Signed-off-by: Toshiyuki Okajima <[email protected]>
> Reported-by: Yongqiang Yang <[email protected]>

This may be true today, but I'm worried this is going to make the code
more fragile. The header file include/linux/jbd2.h defines which
fields are protected by which lock. This patch is going to make those
comments partially obsolete. We need to make sure the locking
protocol is clearly defined before we go around removing locks.

This is something I would like to do, but we need to make sure the
code remains maintainable in the long run.

- Ted