2013-01-19 23:44:31

by Sedat Dilek

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

Hi,

I and some others hit a similiar problem in Linux-Next
(next-20130118), please see [1] and [2].

[3] has a interim analyze of my problems.

After suspecting the problem was caused by TTY-NEXT, it turned out to
be a JBD2 problem finally.
The freezer/pm_test was helpful to hit the issue (Thanks Rafael for the hint!).

So, the issue has two faces: TTY and JBD2.
[4] gives a list and URLs of the patches I had to apply to have a
clean system again.

After applying the two TTY patches (without Eric's JBD2-fix!) the
call-trace after freezer/pm_test looked like this;

[ 433.527986] PM: Syncing filesystems ... done.
[ 433.843761] PM: Preparing system for mem sleep
[ 436.306002] Freezing user space processes ...
[ 456.304956] Freezing of tasks failed after 20.01 seconds (1 tasks
refusing to freeze, wq_busy=0):
[ 456.305060] Cache I/O D ffffffff8180d780 0 2132 1 0x00000004
[ 456.305065] ffff88007b9dfe18 0000000000000046 ffff88007b9dfdc8
0000000300000001
[ 456.305069] ffff880097e21720 ffff88007b9dffd8 ffff88007b9dffd8
ffff88007b9dffd8
[ 456.305072] ffff880119b32e40 ffff880097e21720 ffff88007b9dfe28
ffff880118077800
[ 456.305076] Call Trace:
[ 456.305085] [<ffffffff816b7469>] schedule+0x29/0x70
[ 456.305089] [<ffffffff8129067d>] jbd2_log_wait_commit+0xcd/0x1a0
[ 456.305094] [<ffffffff8107eac0>] ? add_wait_queue+0x60/0x60
[ 456.305098] [<ffffffff812307a5>] ext4_sync_file+0x205/0x380
[ 456.305103] [<ffffffff811c43ad>] do_fsync+0x5d/0x90
[ 456.305108] [<ffffffff8119442b>] ? sys_write+0x6b/0xa0
[ 456.305111] [<ffffffff811c4790>] sys_fsync+0x10/0x20
[ 456.305114] [<ffffffff816c0edd>] system_call_fastpath+0x1a/0x1f
[ 456.305122]
[ 456.305124] Restarting tasks ... done.
[ 456.315056] video LNXVIDEO:00: Restoring backlight state

After applying Eric's patch [5], I could not hit the call-trace again.
NOTE: The patch is from December 2012 and is not shipped in latest Linux-Next.

The attached testcase script was helpful to force the call-trace.
I have run 50 loops of it w/o an issue!

Feel free to add a Reported-by/Tested-by.
( The issue kept me busy for the last days. )

Regards,
- Sedat -

[1] http://marc.info/?t=135283664600002&r=1&w=2
[2] http://marc.info/?t=135862023700004&r=1&w=2
[3] http://marc.info/?l=linux-kernel&m=135862010419101&w=2
[4] http://marc.info/?l=linux-next&m=135863550923093&w=2
[5] http://patchwork.ozlabs.org/patch/207237/


Attachments:
run_pm-test_v2.sh (875.00 B)

2013-01-20 00:06:41

by Sedat Dilek

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Sun, Jan 20, 2013 at 12:44 AM, Sedat Dilek <[email protected]> wrote:
> Hi,
>
> I and some others hit a similiar problem in Linux-Next
> (next-20130118), please see [1] and [2].
>
> [3] has a interim analyze of my problems.
>
> After suspecting the problem was caused by TTY-NEXT, it turned out to
> be a JBD2 problem finally.
> The freezer/pm_test was helpful to hit the issue (Thanks Rafael for the hint!).
>
> So, the issue has two faces: TTY and JBD2.
> [4] gives a list and URLs of the patches I had to apply to have a
> clean system again.
>
> After applying the two TTY patches (without Eric's JBD2-fix!) the
> call-trace after freezer/pm_test looked like this;
>
> [ 433.527986] PM: Syncing filesystems ... done.
> [ 433.843761] PM: Preparing system for mem sleep
> [ 436.306002] Freezing user space processes ...
> [ 456.304956] Freezing of tasks failed after 20.01 seconds (1 tasks
> refusing to freeze, wq_busy=0):
> [ 456.305060] Cache I/O D ffffffff8180d780 0 2132 1 0x00000004
> [ 456.305065] ffff88007b9dfe18 0000000000000046 ffff88007b9dfdc8
> 0000000300000001
> [ 456.305069] ffff880097e21720 ffff88007b9dffd8 ffff88007b9dffd8
> ffff88007b9dffd8
> [ 456.305072] ffff880119b32e40 ffff880097e21720 ffff88007b9dfe28
> ffff880118077800
> [ 456.305076] Call Trace:
> [ 456.305085] [<ffffffff816b7469>] schedule+0x29/0x70
> [ 456.305089] [<ffffffff8129067d>] jbd2_log_wait_commit+0xcd/0x1a0
> [ 456.305094] [<ffffffff8107eac0>] ? add_wait_queue+0x60/0x60
> [ 456.305098] [<ffffffff812307a5>] ext4_sync_file+0x205/0x380
> [ 456.305103] [<ffffffff811c43ad>] do_fsync+0x5d/0x90
> [ 456.305108] [<ffffffff8119442b>] ? sys_write+0x6b/0xa0
> [ 456.305111] [<ffffffff811c4790>] sys_fsync+0x10/0x20
> [ 456.305114] [<ffffffff816c0edd>] system_call_fastpath+0x1a/0x1f
> [ 456.305122]
> [ 456.305124] Restarting tasks ... done.
> [ 456.315056] video LNXVIDEO:00: Restoring backlight state
>
> After applying Eric's patch [5], I could not hit the call-trace again.
> NOTE: The patch is from December 2012 and is not shipped in latest Linux-Next.
>
> The attached testcase script was helpful to force the call-trace.
> I have run 50 loops of it w/o an issue!
>
> Feel free to add a Reported-by/Tested-by.
> ( The issue kept me busy for the last days. )
>
> Regards,
> - Sedat -
>
> [1] http://marc.info/?t=135283664600002&r=1&w=2
> [2] http://marc.info/?t=135862023700004&r=1&w=2
> [3] http://marc.info/?l=linux-kernel&m=135862010419101&w=2
> [4] http://marc.info/?l=linux-next&m=135863550923093&w=2
> [5] http://patchwork.ozlabs.org/patch/207237/

Hi all,

this is really ugly... I could cry!

GRRR, I have hit the original issue again while running my testcase script!

[ 363.831226] PM: Syncing filesystems ... done.
[ 363.988855] PM: Preparing system for mem sleep
[ 369.032965] Freezing user space processes ... (elapsed 0.11 seconds) done.
[ 369.145792] Freezing remaining freezable tasks ...
[ 389.137643] Freezing of tasks failed after 20.01 seconds (1 tasks
refusing to freeze, wq_busy=0):
[ 389.137760] jbd2/loop0-8 D ffffffff8180d780 0 295 2 0x00000000
[ 389.137769] ffff8801181dfb68 0000000000000046 ffff880117de5c80
0000000000000001
[ 389.137778] ffff880117de5c80 ffff8801181dffd8 ffff8801181dffd8
ffff8801181dffd8
[ 389.137784] ffffffff81c15440 ffff880117de5c80 ffff8801181dfb68
ffff88011fa14738
[ 389.137791] Call Trace:
[ 389.137810] [<ffffffff811c6830>] ? __wait_on_buffer+0x30/0x30
[ 389.137819] [<ffffffff816b7479>] schedule+0x29/0x70
[ 389.137825] [<ffffffff816b754f>] io_schedule+0x8f/0xd0
[ 389.137832] [<ffffffff811c683e>] sleep_on_buffer+0xe/0x20
[ 389.137842] [<ffffffff816b5d4f>] __wait_on_bit+0x5f/0x90
[ 389.137849] [<ffffffff811c5ef1>] ? submit_bh+0x121/0x1e0
[ 389.137856] [<ffffffff811c6830>] ? __wait_on_buffer+0x30/0x30
[ 389.137864] [<ffffffff816b5dfc>] out_of_line_wait_on_bit+0x7c/0x90
[ 389.137873] [<ffffffff8107eb00>] ? autoremove_wake_function+0x40/0x40
[ 389.137879] [<ffffffff811c682e>] __wait_on_buffer+0x2e/0x30
[ 389.137891] [<ffffffff8128c02c>]
jbd2_journal_commit_transaction+0x18cc/0x1d60
[ 389.137899] [<ffffffff816b860e>] ? _raw_spin_lock_irqsave+0x2e/0x40
[ 389.137908] [<ffffffff81069fbf>] ? try_to_del_timer_sync+0x4f/0x70
[ 389.137915] [<ffffffff81290836>] kjournald2+0xd6/0x3e0
[ 389.137921] [<ffffffff8107eac0>] ? add_wait_queue+0x60/0x60
[ 389.137926] [<ffffffff81290760>] ? commit_timeout+0x10/0x10
[ 389.137932] [<ffffffff8107ded0>] kthread+0xc0/0xd0
[ 389.137939] [<ffffffff8107de10>] ? flush_kthread_worker+0xb0/0xb0
[ 389.137946] [<ffffffff816c0e2c>] ret_from_fork+0x7c/0xb0
[ 389.137951] [<ffffffff8107de10>] ? flush_kthread_worker+0xb0/0xb0
[ 389.138017]
[ 389.138021] Restarting kernel threads ... done.
[ 389.138173] Restarting tasks ... done.
[ 389.147980] video LNXVIDEO:00: Restoring backlight state

I suspect there are still issues in JBD2 (BTW I run here EXT4FS).

[ 389.137643] Freezing of tasks failed after 20.01 seconds (1 tasks
refusing to freeze, wq_busy=0):
[ 389.137760] jbd2/loop0-8 D ffffffff8180d780 0 295 2 0x00000000

Have a good night, /me -> watching TV!

- Sedat -

2013-01-20 00:35:09

by Sedat Dilek

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Sun, Jan 20, 2013 at 1:06 AM, Sedat Dilek <[email protected]> wrote:
> On Sun, Jan 20, 2013 at 12:44 AM, Sedat Dilek <[email protected]> wrote:
>> Hi,
>>
>> I and some others hit a similiar problem in Linux-Next
>> (next-20130118), please see [1] and [2].
>>
>> [3] has a interim analyze of my problems.
>>
>> After suspecting the problem was caused by TTY-NEXT, it turned out to
>> be a JBD2 problem finally.
>> The freezer/pm_test was helpful to hit the issue (Thanks Rafael for the hint!).
>>
>> So, the issue has two faces: TTY and JBD2.
>> [4] gives a list and URLs of the patches I had to apply to have a
>> clean system again.
>>
>> After applying the two TTY patches (without Eric's JBD2-fix!) the
>> call-trace after freezer/pm_test looked like this;
>>
>> [ 433.527986] PM: Syncing filesystems ... done.
>> [ 433.843761] PM: Preparing system for mem sleep
>> [ 436.306002] Freezing user space processes ...
>> [ 456.304956] Freezing of tasks failed after 20.01 seconds (1 tasks
>> refusing to freeze, wq_busy=0):
>> [ 456.305060] Cache I/O D ffffffff8180d780 0 2132 1 0x00000004
>> [ 456.305065] ffff88007b9dfe18 0000000000000046 ffff88007b9dfdc8
>> 0000000300000001
>> [ 456.305069] ffff880097e21720 ffff88007b9dffd8 ffff88007b9dffd8
>> ffff88007b9dffd8
>> [ 456.305072] ffff880119b32e40 ffff880097e21720 ffff88007b9dfe28
>> ffff880118077800
>> [ 456.305076] Call Trace:
>> [ 456.305085] [<ffffffff816b7469>] schedule+0x29/0x70
>> [ 456.305089] [<ffffffff8129067d>] jbd2_log_wait_commit+0xcd/0x1a0
>> [ 456.305094] [<ffffffff8107eac0>] ? add_wait_queue+0x60/0x60
>> [ 456.305098] [<ffffffff812307a5>] ext4_sync_file+0x205/0x380
>> [ 456.305103] [<ffffffff811c43ad>] do_fsync+0x5d/0x90
>> [ 456.305108] [<ffffffff8119442b>] ? sys_write+0x6b/0xa0
>> [ 456.305111] [<ffffffff811c4790>] sys_fsync+0x10/0x20
>> [ 456.305114] [<ffffffff816c0edd>] system_call_fastpath+0x1a/0x1f
>> [ 456.305122]
>> [ 456.305124] Restarting tasks ... done.
>> [ 456.315056] video LNXVIDEO:00: Restoring backlight state
>>
>> After applying Eric's patch [5], I could not hit the call-trace again.
>> NOTE: The patch is from December 2012 and is not shipped in latest Linux-Next.
>>
>> The attached testcase script was helpful to force the call-trace.
>> I have run 50 loops of it w/o an issue!
>>
>> Feel free to add a Reported-by/Tested-by.
>> ( The issue kept me busy for the last days. )
>>
>> Regards,
>> - Sedat -
>>
>> [1] http://marc.info/?t=135283664600002&r=1&w=2
>> [2] http://marc.info/?t=135862023700004&r=1&w=2
>> [3] http://marc.info/?l=linux-kernel&m=135862010419101&w=2
>> [4] http://marc.info/?l=linux-next&m=135863550923093&w=2
>> [5] http://patchwork.ozlabs.org/patch/207237/
>
> Hi all,
>
> this is really ugly... I could cry!
>
> GRRR, I have hit the original issue again while running my testcase script!
>
> [ 363.831226] PM: Syncing filesystems ... done.
> [ 363.988855] PM: Preparing system for mem sleep
> [ 369.032965] Freezing user space processes ... (elapsed 0.11 seconds) done.
> [ 369.145792] Freezing remaining freezable tasks ...
> [ 389.137643] Freezing of tasks failed after 20.01 seconds (1 tasks
> refusing to freeze, wq_busy=0):
> [ 389.137760] jbd2/loop0-8 D ffffffff8180d780 0 295 2 0x00000000
> [ 389.137769] ffff8801181dfb68 0000000000000046 ffff880117de5c80
> 0000000000000001
> [ 389.137778] ffff880117de5c80 ffff8801181dffd8 ffff8801181dffd8
> ffff8801181dffd8
> [ 389.137784] ffffffff81c15440 ffff880117de5c80 ffff8801181dfb68
> ffff88011fa14738
> [ 389.137791] Call Trace:
> [ 389.137810] [<ffffffff811c6830>] ? __wait_on_buffer+0x30/0x30
> [ 389.137819] [<ffffffff816b7479>] schedule+0x29/0x70
> [ 389.137825] [<ffffffff816b754f>] io_schedule+0x8f/0xd0
> [ 389.137832] [<ffffffff811c683e>] sleep_on_buffer+0xe/0x20
> [ 389.137842] [<ffffffff816b5d4f>] __wait_on_bit+0x5f/0x90
> [ 389.137849] [<ffffffff811c5ef1>] ? submit_bh+0x121/0x1e0
> [ 389.137856] [<ffffffff811c6830>] ? __wait_on_buffer+0x30/0x30
> [ 389.137864] [<ffffffff816b5dfc>] out_of_line_wait_on_bit+0x7c/0x90
> [ 389.137873] [<ffffffff8107eb00>] ? autoremove_wake_function+0x40/0x40
> [ 389.137879] [<ffffffff811c682e>] __wait_on_buffer+0x2e/0x30
> [ 389.137891] [<ffffffff8128c02c>]
> jbd2_journal_commit_transaction+0x18cc/0x1d60
> [ 389.137899] [<ffffffff816b860e>] ? _raw_spin_lock_irqsave+0x2e/0x40
> [ 389.137908] [<ffffffff81069fbf>] ? try_to_del_timer_sync+0x4f/0x70
> [ 389.137915] [<ffffffff81290836>] kjournald2+0xd6/0x3e0
> [ 389.137921] [<ffffffff8107eac0>] ? add_wait_queue+0x60/0x60
> [ 389.137926] [<ffffffff81290760>] ? commit_timeout+0x10/0x10
> [ 389.137932] [<ffffffff8107ded0>] kthread+0xc0/0xd0
> [ 389.137939] [<ffffffff8107de10>] ? flush_kthread_worker+0xb0/0xb0
> [ 389.137946] [<ffffffff816c0e2c>] ret_from_fork+0x7c/0xb0
> [ 389.137951] [<ffffffff8107de10>] ? flush_kthread_worker+0xb0/0xb0
> [ 389.138017]
> [ 389.138021] Restarting kernel threads ... done.
> [ 389.138173] Restarting tasks ... done.
> [ 389.147980] video LNXVIDEO:00: Restoring backlight state
>
> I suspect there are still issues in JBD2 (BTW I run here EXT4FS).
>
> [ 389.137643] Freezing of tasks failed after 20.01 seconds (1 tasks
> refusing to freeze, wq_busy=0):
> [ 389.137760] jbd2/loop0-8 D ffffffff8180d780 0 295 2 0x00000000
>
> Have a good night, /me -> watching TV!
>
> - Sedat -

Man, what a NONSENSE I am telling...
The mentioned so-called fix is already in next-20130118, shame on me!

commit 7e2fb2d7e6a3094473f101ae33dd6431ae6d2ed1
"jbd: don't wake kjournald unnecessarily"

- Sedat -

[1] http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=7e2fb2d7e6a3094473f101ae33dd6431ae6d2ed1

2013-01-20 00:55:05

by Sedat Dilek

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Sun, Jan 20, 2013 at 1:35 AM, Sedat Dilek <[email protected]> wrote:
> On Sun, Jan 20, 2013 at 1:06 AM, Sedat Dilek <[email protected]> wrote:
>> On Sun, Jan 20, 2013 at 12:44 AM, Sedat Dilek <[email protected]> wrote:
>>> Hi,
>>>
>>> I and some others hit a similiar problem in Linux-Next
>>> (next-20130118), please see [1] and [2].
>>>
>>> [3] has a interim analyze of my problems.
>>>
>>> After suspecting the problem was caused by TTY-NEXT, it turned out to
>>> be a JBD2 problem finally.
>>> The freezer/pm_test was helpful to hit the issue (Thanks Rafael for the hint!).
>>>
>>> So, the issue has two faces: TTY and JBD2.
>>> [4] gives a list and URLs of the patches I had to apply to have a
>>> clean system again.
>>>
>>> After applying the two TTY patches (without Eric's JBD2-fix!) the
>>> call-trace after freezer/pm_test looked like this;
>>>
>>> [ 433.527986] PM: Syncing filesystems ... done.
>>> [ 433.843761] PM: Preparing system for mem sleep
>>> [ 436.306002] Freezing user space processes ...
>>> [ 456.304956] Freezing of tasks failed after 20.01 seconds (1 tasks
>>> refusing to freeze, wq_busy=0):
>>> [ 456.305060] Cache I/O D ffffffff8180d780 0 2132 1 0x00000004
>>> [ 456.305065] ffff88007b9dfe18 0000000000000046 ffff88007b9dfdc8
>>> 0000000300000001
>>> [ 456.305069] ffff880097e21720 ffff88007b9dffd8 ffff88007b9dffd8
>>> ffff88007b9dffd8
>>> [ 456.305072] ffff880119b32e40 ffff880097e21720 ffff88007b9dfe28
>>> ffff880118077800
>>> [ 456.305076] Call Trace:
>>> [ 456.305085] [<ffffffff816b7469>] schedule+0x29/0x70
>>> [ 456.305089] [<ffffffff8129067d>] jbd2_log_wait_commit+0xcd/0x1a0
>>> [ 456.305094] [<ffffffff8107eac0>] ? add_wait_queue+0x60/0x60
>>> [ 456.305098] [<ffffffff812307a5>] ext4_sync_file+0x205/0x380
>>> [ 456.305103] [<ffffffff811c43ad>] do_fsync+0x5d/0x90
>>> [ 456.305108] [<ffffffff8119442b>] ? sys_write+0x6b/0xa0
>>> [ 456.305111] [<ffffffff811c4790>] sys_fsync+0x10/0x20
>>> [ 456.305114] [<ffffffff816c0edd>] system_call_fastpath+0x1a/0x1f
>>> [ 456.305122]
>>> [ 456.305124] Restarting tasks ... done.
>>> [ 456.315056] video LNXVIDEO:00: Restoring backlight state
>>>
>>> After applying Eric's patch [5], I could not hit the call-trace again.
>>> NOTE: The patch is from December 2012 and is not shipped in latest Linux-Next.
>>>
>>> The attached testcase script was helpful to force the call-trace.
>>> I have run 50 loops of it w/o an issue!
>>>
>>> Feel free to add a Reported-by/Tested-by.
>>> ( The issue kept me busy for the last days. )
>>>
>>> Regards,
>>> - Sedat -
>>>
>>> [1] http://marc.info/?t=135283664600002&r=1&w=2
>>> [2] http://marc.info/?t=135862023700004&r=1&w=2
>>> [3] http://marc.info/?l=linux-kernel&m=135862010419101&w=2
>>> [4] http://marc.info/?l=linux-next&m=135863550923093&w=2
>>> [5] http://patchwork.ozlabs.org/patch/207237/
>>
>> Hi all,
>>
>> this is really ugly... I could cry!
>>
>> GRRR, I have hit the original issue again while running my testcase script!
>>
>> [ 363.831226] PM: Syncing filesystems ... done.
>> [ 363.988855] PM: Preparing system for mem sleep
>> [ 369.032965] Freezing user space processes ... (elapsed 0.11 seconds) done.
>> [ 369.145792] Freezing remaining freezable tasks ...
>> [ 389.137643] Freezing of tasks failed after 20.01 seconds (1 tasks
>> refusing to freeze, wq_busy=0):
>> [ 389.137760] jbd2/loop0-8 D ffffffff8180d780 0 295 2 0x00000000
>> [ 389.137769] ffff8801181dfb68 0000000000000046 ffff880117de5c80
>> 0000000000000001
>> [ 389.137778] ffff880117de5c80 ffff8801181dffd8 ffff8801181dffd8
>> ffff8801181dffd8
>> [ 389.137784] ffffffff81c15440 ffff880117de5c80 ffff8801181dfb68
>> ffff88011fa14738
>> [ 389.137791] Call Trace:
>> [ 389.137810] [<ffffffff811c6830>] ? __wait_on_buffer+0x30/0x30
>> [ 389.137819] [<ffffffff816b7479>] schedule+0x29/0x70
>> [ 389.137825] [<ffffffff816b754f>] io_schedule+0x8f/0xd0
>> [ 389.137832] [<ffffffff811c683e>] sleep_on_buffer+0xe/0x20
>> [ 389.137842] [<ffffffff816b5d4f>] __wait_on_bit+0x5f/0x90
>> [ 389.137849] [<ffffffff811c5ef1>] ? submit_bh+0x121/0x1e0
>> [ 389.137856] [<ffffffff811c6830>] ? __wait_on_buffer+0x30/0x30
>> [ 389.137864] [<ffffffff816b5dfc>] out_of_line_wait_on_bit+0x7c/0x90
>> [ 389.137873] [<ffffffff8107eb00>] ? autoremove_wake_function+0x40/0x40
>> [ 389.137879] [<ffffffff811c682e>] __wait_on_buffer+0x2e/0x30
>> [ 389.137891] [<ffffffff8128c02c>]
>> jbd2_journal_commit_transaction+0x18cc/0x1d60
>> [ 389.137899] [<ffffffff816b860e>] ? _raw_spin_lock_irqsave+0x2e/0x40
>> [ 389.137908] [<ffffffff81069fbf>] ? try_to_del_timer_sync+0x4f/0x70
>> [ 389.137915] [<ffffffff81290836>] kjournald2+0xd6/0x3e0
>> [ 389.137921] [<ffffffff8107eac0>] ? add_wait_queue+0x60/0x60
>> [ 389.137926] [<ffffffff81290760>] ? commit_timeout+0x10/0x10
>> [ 389.137932] [<ffffffff8107ded0>] kthread+0xc0/0xd0
>> [ 389.137939] [<ffffffff8107de10>] ? flush_kthread_worker+0xb0/0xb0
>> [ 389.137946] [<ffffffff816c0e2c>] ret_from_fork+0x7c/0xb0
>> [ 389.137951] [<ffffffff8107de10>] ? flush_kthread_worker+0xb0/0xb0
>> [ 389.138017]
>> [ 389.138021] Restarting kernel threads ... done.
>> [ 389.138173] Restarting tasks ... done.
>> [ 389.147980] video LNXVIDEO:00: Restoring backlight state
>>
>> I suspect there are still issues in JBD2 (BTW I run here EXT4FS).
>>
>> [ 389.137643] Freezing of tasks failed after 20.01 seconds (1 tasks
>> refusing to freeze, wq_busy=0):
>> [ 389.137760] jbd2/loop0-8 D ffffffff8180d780 0 295 2 0x00000000
>>
>> Have a good night, /me -> watching TV!
>>
>> - Sedat -
>
> Man, what a NONSENSE I am telling...
> The mentioned so-called fix is already in next-20130118, shame on me!
>
> commit 7e2fb2d7e6a3094473f101ae33dd6431ae6d2ed1
> "jbd: don't wake kjournald unnecessarily"
>

GNARF, SMASH... that's for JBD.
So Eric's JBD2-fix is really NOT in Linux-Next.
More sleep!

- Sedat -

> - Sedat -
>
> [1] http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=7e2fb2d7e6a3094473f101ae33dd6431ae6d2ed1

2013-01-21 10:47:38

by Jan Kara

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Sun 20-01-13 01:06:40, Sedat Dilek wrote:
> On Sun, Jan 20, 2013 at 12:44 AM, Sedat Dilek <[email protected]> wrote:
> > Hi,
> >
> > I and some others hit a similiar problem in Linux-Next
> > (next-20130118), please see [1] and [2].
> >
> > [3] has a interim analyze of my problems.
> >
> > After suspecting the problem was caused by TTY-NEXT, it turned out to
> > be a JBD2 problem finally.
> > The freezer/pm_test was helpful to hit the issue (Thanks Rafael for the hint!).
> >
> > So, the issue has two faces: TTY and JBD2.
> > [4] gives a list and URLs of the patches I had to apply to have a
> > clean system again.
> >
> > After applying the two TTY patches (without Eric's JBD2-fix!) the
> > call-trace after freezer/pm_test looked like this;
> >
> > [ 433.527986] PM: Syncing filesystems ... done.
> > [ 433.843761] PM: Preparing system for mem sleep
> > [ 436.306002] Freezing user space processes ...
> > [ 456.304956] Freezing of tasks failed after 20.01 seconds (1 tasks
> > refusing to freeze, wq_busy=0):
> > [ 456.305060] Cache I/O D ffffffff8180d780 0 2132 1 0x00000004
> > [ 456.305065] ffff88007b9dfe18 0000000000000046 ffff88007b9dfdc8
> > 0000000300000001
> > [ 456.305069] ffff880097e21720 ffff88007b9dffd8 ffff88007b9dffd8
> > ffff88007b9dffd8
> > [ 456.305072] ffff880119b32e40 ffff880097e21720 ffff88007b9dfe28
> > ffff880118077800
> > [ 456.305076] Call Trace:
> > [ 456.305085] [<ffffffff816b7469>] schedule+0x29/0x70
> > [ 456.305089] [<ffffffff8129067d>] jbd2_log_wait_commit+0xcd/0x1a0
> > [ 456.305094] [<ffffffff8107eac0>] ? add_wait_queue+0x60/0x60
> > [ 456.305098] [<ffffffff812307a5>] ext4_sync_file+0x205/0x380
> > [ 456.305103] [<ffffffff811c43ad>] do_fsync+0x5d/0x90
> > [ 456.305108] [<ffffffff8119442b>] ? sys_write+0x6b/0xa0
> > [ 456.305111] [<ffffffff811c4790>] sys_fsync+0x10/0x20
> > [ 456.305114] [<ffffffff816c0edd>] system_call_fastpath+0x1a/0x1f
> > [ 456.305122]
> > [ 456.305124] Restarting tasks ... done.
> > [ 456.315056] video LNXVIDEO:00: Restoring backlight state
> >
> > After applying Eric's patch [5], I could not hit the call-trace again.
> > NOTE: The patch is from December 2012 and is not shipped in latest Linux-Next.
> >
> > The attached testcase script was helpful to force the call-trace.
> > I have run 50 loops of it w/o an issue!
> >
> > Feel free to add a Reported-by/Tested-by.
> > ( The issue kept me busy for the last days. )
> >
> > Regards,
> > - Sedat -
> >
> > [1] http://marc.info/?t=135283664600002&r=1&w=2
> > [2] http://marc.info/?t=135862023700004&r=1&w=2
> > [3] http://marc.info/?l=linux-kernel&m=135862010419101&w=2
> > [4] http://marc.info/?l=linux-next&m=135863550923093&w=2
> > [5] http://patchwork.ozlabs.org/patch/207237/
>
> Hi all,
>
> this is really ugly... I could cry!
>
> GRRR, I have hit the original issue again while running my testcase script!
>
> [ 363.831226] PM: Syncing filesystems ... done.
> [ 363.988855] PM: Preparing system for mem sleep
> [ 369.032965] Freezing user space processes ... (elapsed 0.11 seconds) done.
> [ 369.145792] Freezing remaining freezable tasks ...
> [ 389.137643] Freezing of tasks failed after 20.01 seconds (1 tasks
> refusing to freeze, wq_busy=0):
> [ 389.137760] jbd2/loop0-8 D ffffffff8180d780 0 295 2 0x00000000
> [ 389.137769] ffff8801181dfb68 0000000000000046 ffff880117de5c80
> 0000000000000001
> [ 389.137778] ffff880117de5c80 ffff8801181dffd8 ffff8801181dffd8
> ffff8801181dffd8
> [ 389.137784] ffffffff81c15440 ffff880117de5c80 ffff8801181dfb68
> ffff88011fa14738
> [ 389.137791] Call Trace:
> [ 389.137810] [<ffffffff811c6830>] ? __wait_on_buffer+0x30/0x30
> [ 389.137819] [<ffffffff816b7479>] schedule+0x29/0x70
> [ 389.137825] [<ffffffff816b754f>] io_schedule+0x8f/0xd0
> [ 389.137832] [<ffffffff811c683e>] sleep_on_buffer+0xe/0x20
> [ 389.137842] [<ffffffff816b5d4f>] __wait_on_bit+0x5f/0x90
> [ 389.137849] [<ffffffff811c5ef1>] ? submit_bh+0x121/0x1e0
> [ 389.137856] [<ffffffff811c6830>] ? __wait_on_buffer+0x30/0x30
> [ 389.137864] [<ffffffff816b5dfc>] out_of_line_wait_on_bit+0x7c/0x90
> [ 389.137873] [<ffffffff8107eb00>] ? autoremove_wake_function+0x40/0x40
> [ 389.137879] [<ffffffff811c682e>] __wait_on_buffer+0x2e/0x30
> [ 389.137891] [<ffffffff8128c02c>]
> jbd2_journal_commit_transaction+0x18cc/0x1d60
> [ 389.137899] [<ffffffff816b860e>] ? _raw_spin_lock_irqsave+0x2e/0x40
> [ 389.137908] [<ffffffff81069fbf>] ? try_to_del_timer_sync+0x4f/0x70
> [ 389.137915] [<ffffffff81290836>] kjournald2+0xd6/0x3e0
> [ 389.137921] [<ffffffff8107eac0>] ? add_wait_queue+0x60/0x60
> [ 389.137926] [<ffffffff81290760>] ? commit_timeout+0x10/0x10
> [ 389.137932] [<ffffffff8107ded0>] kthread+0xc0/0xd0
> [ 389.137939] [<ffffffff8107de10>] ? flush_kthread_worker+0xb0/0xb0
> [ 389.137946] [<ffffffff816c0e2c>] ret_from_fork+0x7c/0xb0
> [ 389.137951] [<ffffffff8107de10>] ? flush_kthread_worker+0xb0/0xb0
> [ 389.138017]
> [ 389.138021] Restarting kernel threads ... done.
> [ 389.138173] Restarting tasks ... done.
> [ 389.147980] video LNXVIDEO:00: Restoring backlight state
>
> I suspect there are still issues in JBD2 (BTW I run here EXT4FS).
>
> [ 389.137643] Freezing of tasks failed after 20.01 seconds (1 tasks
> refusing to freeze, wq_busy=0):
> [ 389.137760] jbd2/loop0-8 D ffffffff8180d780 0 295 2 0x00000000
>
> Have a good night, /me -> watching TV!
The traces don't suggest an ext4/jbd2 problem. What is happening is that
jbd2 is waiting for IO to finish and that never happens. Seeing that you
loop device I'd think it's some interaction of the loop device and
freezing. Can you reproduce the issue without the loop device (i.e. with
the filesystem directly on e.g. scsi disk)? I suspect the reason is
something like that the backing filesystem is already frozen so filesystem
on top of it cannot write all the data and hangs waiting for IO -> suspend
doesn't happen. Contents of /proc/mounts and losetup -l would help us
understand what's going on.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-01-21 11:40:14

by Sedat Dilek

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Mon, Jan 21, 2013 at 11:47 AM, Jan Kara <[email protected]> wrote:
> On Sun 20-01-13 01:06:40, Sedat Dilek wrote:
>> On Sun, Jan 20, 2013 at 12:44 AM, Sedat Dilek <[email protected]> wrote:
>> > Hi,
>> >
>> > I and some others hit a similiar problem in Linux-Next
>> > (next-20130118), please see [1] and [2].
>> >
>> > [3] has a interim analyze of my problems.
>> >
>> > After suspecting the problem was caused by TTY-NEXT, it turned out to
>> > be a JBD2 problem finally.
>> > The freezer/pm_test was helpful to hit the issue (Thanks Rafael for the hint!).
>> >
>> > So, the issue has two faces: TTY and JBD2.
>> > [4] gives a list and URLs of the patches I had to apply to have a
>> > clean system again.
>> >
>> > After applying the two TTY patches (without Eric's JBD2-fix!) the
>> > call-trace after freezer/pm_test looked like this;
>> >
>> > [ 433.527986] PM: Syncing filesystems ... done.
>> > [ 433.843761] PM: Preparing system for mem sleep
>> > [ 436.306002] Freezing user space processes ...
>> > [ 456.304956] Freezing of tasks failed after 20.01 seconds (1 tasks
>> > refusing to freeze, wq_busy=0):
>> > [ 456.305060] Cache I/O D ffffffff8180d780 0 2132 1 0x00000004
>> > [ 456.305065] ffff88007b9dfe18 0000000000000046 ffff88007b9dfdc8
>> > 0000000300000001
>> > [ 456.305069] ffff880097e21720 ffff88007b9dffd8 ffff88007b9dffd8
>> > ffff88007b9dffd8
>> > [ 456.305072] ffff880119b32e40 ffff880097e21720 ffff88007b9dfe28
>> > ffff880118077800
>> > [ 456.305076] Call Trace:
>> > [ 456.305085] [<ffffffff816b7469>] schedule+0x29/0x70
>> > [ 456.305089] [<ffffffff8129067d>] jbd2_log_wait_commit+0xcd/0x1a0
>> > [ 456.305094] [<ffffffff8107eac0>] ? add_wait_queue+0x60/0x60
>> > [ 456.305098] [<ffffffff812307a5>] ext4_sync_file+0x205/0x380
>> > [ 456.305103] [<ffffffff811c43ad>] do_fsync+0x5d/0x90
>> > [ 456.305108] [<ffffffff8119442b>] ? sys_write+0x6b/0xa0
>> > [ 456.305111] [<ffffffff811c4790>] sys_fsync+0x10/0x20
>> > [ 456.305114] [<ffffffff816c0edd>] system_call_fastpath+0x1a/0x1f
>> > [ 456.305122]
>> > [ 456.305124] Restarting tasks ... done.
>> > [ 456.315056] video LNXVIDEO:00: Restoring backlight state
>> >
>> > After applying Eric's patch [5], I could not hit the call-trace again.
>> > NOTE: The patch is from December 2012 and is not shipped in latest Linux-Next.
>> >
>> > The attached testcase script was helpful to force the call-trace.
>> > I have run 50 loops of it w/o an issue!
>> >
>> > Feel free to add a Reported-by/Tested-by.
>> > ( The issue kept me busy for the last days. )
>> >
>> > Regards,
>> > - Sedat -
>> >
>> > [1] http://marc.info/?t=135283664600002&r=1&w=2
>> > [2] http://marc.info/?t=135862023700004&r=1&w=2
>> > [3] http://marc.info/?l=linux-kernel&m=135862010419101&w=2
>> > [4] http://marc.info/?l=linux-next&m=135863550923093&w=2
>> > [5] http://patchwork.ozlabs.org/patch/207237/
>>
>> Hi all,
>>
>> this is really ugly... I could cry!
>>
>> GRRR, I have hit the original issue again while running my testcase script!
>>
>> [ 363.831226] PM: Syncing filesystems ... done.
>> [ 363.988855] PM: Preparing system for mem sleep
>> [ 369.032965] Freezing user space processes ... (elapsed 0.11 seconds) done.
>> [ 369.145792] Freezing remaining freezable tasks ...
>> [ 389.137643] Freezing of tasks failed after 20.01 seconds (1 tasks
>> refusing to freeze, wq_busy=0):
>> [ 389.137760] jbd2/loop0-8 D ffffffff8180d780 0 295 2 0x00000000
>> [ 389.137769] ffff8801181dfb68 0000000000000046 ffff880117de5c80
>> 0000000000000001
>> [ 389.137778] ffff880117de5c80 ffff8801181dffd8 ffff8801181dffd8
>> ffff8801181dffd8
>> [ 389.137784] ffffffff81c15440 ffff880117de5c80 ffff8801181dfb68
>> ffff88011fa14738
>> [ 389.137791] Call Trace:
>> [ 389.137810] [<ffffffff811c6830>] ? __wait_on_buffer+0x30/0x30
>> [ 389.137819] [<ffffffff816b7479>] schedule+0x29/0x70
>> [ 389.137825] [<ffffffff816b754f>] io_schedule+0x8f/0xd0
>> [ 389.137832] [<ffffffff811c683e>] sleep_on_buffer+0xe/0x20
>> [ 389.137842] [<ffffffff816b5d4f>] __wait_on_bit+0x5f/0x90
>> [ 389.137849] [<ffffffff811c5ef1>] ? submit_bh+0x121/0x1e0
>> [ 389.137856] [<ffffffff811c6830>] ? __wait_on_buffer+0x30/0x30
>> [ 389.137864] [<ffffffff816b5dfc>] out_of_line_wait_on_bit+0x7c/0x90
>> [ 389.137873] [<ffffffff8107eb00>] ? autoremove_wake_function+0x40/0x40
>> [ 389.137879] [<ffffffff811c682e>] __wait_on_buffer+0x2e/0x30
>> [ 389.137891] [<ffffffff8128c02c>]
>> jbd2_journal_commit_transaction+0x18cc/0x1d60
>> [ 389.137899] [<ffffffff816b860e>] ? _raw_spin_lock_irqsave+0x2e/0x40
>> [ 389.137908] [<ffffffff81069fbf>] ? try_to_del_timer_sync+0x4f/0x70
>> [ 389.137915] [<ffffffff81290836>] kjournald2+0xd6/0x3e0
>> [ 389.137921] [<ffffffff8107eac0>] ? add_wait_queue+0x60/0x60
>> [ 389.137926] [<ffffffff81290760>] ? commit_timeout+0x10/0x10
>> [ 389.137932] [<ffffffff8107ded0>] kthread+0xc0/0xd0
>> [ 389.137939] [<ffffffff8107de10>] ? flush_kthread_worker+0xb0/0xb0
>> [ 389.137946] [<ffffffff816c0e2c>] ret_from_fork+0x7c/0xb0
>> [ 389.137951] [<ffffffff8107de10>] ? flush_kthread_worker+0xb0/0xb0
>> [ 389.138017]
>> [ 389.138021] Restarting kernel threads ... done.
>> [ 389.138173] Restarting tasks ... done.
>> [ 389.147980] video LNXVIDEO:00: Restoring backlight state
>>
>> I suspect there are still issues in JBD2 (BTW I run here EXT4FS).
>>
>> [ 389.137643] Freezing of tasks failed after 20.01 seconds (1 tasks
>> refusing to freeze, wq_busy=0):
>> [ 389.137760] jbd2/loop0-8 D ffffffff8180d780 0 295 2 0x00000000
>>
>> Have a good night, /me -> watching TV!
> The traces don't suggest an ext4/jbd2 problem. What is happening is that
> jbd2 is waiting for IO to finish and that never happens. Seeing that you
> loop device I'd think it's some interaction of the loop device and
> freezing. Can you reproduce the issue without the loop device (i.e. with
> the filesystem directly on e.g. scsi disk)? I suspect the reason is
> something like that the backing filesystem is already frozen so filesystem
> on top of it cannot write all the data and hangs waiting for IO -> suspend
> doesn't happen. Contents of /proc/mounts and losetup -l would help us
> understand what's going on.
>

As said I am here in a very uncommon WUBI environment means my
Ubuntu/precise rootfs-image lays on the Win7-partition (NTFS).
Your explanation sounds reasonable to me as this line from my attached
testcase causes the troubles.

echo mem > /sys/power/state && sleep 1

So, /sys/ is not writable immediately after freezer ends

I checked again and again my logs and have seen "starving" lines
reported by rtkit-daemon, but did not really get wiser what they want
to tell me. Stopping rtkit-daemon or resetting all or all-known
threads before running my pm_test/freezer did not help, too.

/usr/sbin/rtkitctl --help
rtkitctl [options]

-h, --help Show this help
--version Show version

--reset-known Reset real-time status of known threads
--reset-all Reset real-time status of all threads
--start Start RealtimeKit if it is not running already
-k, --exit Terminate running RealtimeKit daemon

Here are the outputs you wanted with some more (fstab, grub-config) etc.
I have here no -l option for losetup command.


- Sedat -

P.S.: Outputs for Honza...

$ sudo cat /proc/mounts
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=1966948k,nr_inodes=491737,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,relatime,size=788076k,mode=755 0 0
/dev/sda2 /host fuseblk
rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other,blksize=4096
0 0
/dev/loop0 / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
none /sys/fs/fuse/connections fusectl rw,relatime 0 0
none /sys/kernel/debug debugfs rw,relatime 0 0
none /sys/kernel/security securityfs rw,relatime 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
gvfs-fuse-daemon /home/wearefam/.gvfs fuse.gvfs-fuse-daemon
rw,nosuid,nodev,relatime,user_id=1000,group_id=1000 0 0

$ sudo losetup --all --verbose
/dev/loop0: [0802]:17982 (/host/ubuntu/disks/root.disk)

$ sudo losetup --find --verbose
Loop device is /dev/loop1
/dev/loop1

[ /etc/fstab ]
# /etc/fstab: static file system information.

# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).

# <file system> <mount point> <type> <options>
<dump> <pass>
proc /proc proc
nodev,noexec,nosuid 0 0
/host/ubuntu/disks/root.disk / ext4
loop,errors=remount-ro 0 1
/host/ubuntu/disks/swap.disk none swap loop,sw
0 0
- EOF -

[ /boot/grub/grub.cfg ]
...
menuentry 'Ubuntu, mit Linux 3.8.0-rc4-next20130121-1-iniza-generic'
--class ubuntu --class gnu-linux --class gnu --class os {
set gfxpayload=$linux_gfx_mode
insmod part_msdos
insmod ntfs
set root='(hd0,msdos2)'
search --no-floppy --fs-uuid --set=root 001AADA61AAD9964
loopback loop0 /ubuntu/disks/root.disk
set root=(loop0)
linux /boot/vmlinuz-3.8.0-rc4-next20130121-1-iniza-generic
root=UUID=001AADA61AAD9964 loop=/ubuntu/disks/root.disk ro
initrd /boot/initrd.img-3.8.0-rc4-next20130121-1-iniza-generic
}
...

Hope this helps you!

Regards,
- Sedat -

> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR


Attachments:
syslog.txt (57.08 kB)
dmesg_freezer-pm-test-2.diff (9.30 kB)
run_pm-test.sh (1.05 kB)
kern-log.txt (28.58 kB)
Download all attachments

2013-01-21 12:02:21

by Carlos Maiolino

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Mon, Jan 21, 2013 at 11:47:33AM +0100, Jan Kara wrote:
> On Sun 20-01-13 01:06:40, Sedat Dilek wrote:
> > On Sun, Jan 20, 2013 at 12:44 AM, Sedat Dilek <[email protected]> wrote:
> > > Hi,
> > >
> > > I and some others hit a similiar problem in Linux-Next
> > > (next-20130118), please see [1] and [2].
> > >
> > > [3] has a interim analyze of my problems.
> > >
> > > After suspecting the problem was caused by TTY-NEXT, it turned out to
> > > be a JBD2 problem finally.
> > > The freezer/pm_test was helpful to hit the issue (Thanks Rafael for the hint!).
> > >
> > > So, the issue has two faces: TTY and JBD2.
> > > [4] gives a list and URLs of the patches I had to apply to have a
> > > clean system again.
> > >
> > > After applying the two TTY patches (without Eric's JBD2-fix!) the
> > > call-trace after freezer/pm_test looked like this;
> > >
> > > [ 433.527986] PM: Syncing filesystems ... done.
> > > [ 433.843761] PM: Preparing system for mem sleep
> > > [ 436.306002] Freezing user space processes ...
> > > [ 456.304956] Freezing of tasks failed after 20.01 seconds (1 tasks
> > > refusing to freeze, wq_busy=0):
> > > [ 456.305060] Cache I/O D ffffffff8180d780 0 2132 1 0x00000004
> > > [ 456.305065] ffff88007b9dfe18 0000000000000046 ffff88007b9dfdc8
> > > 0000000300000001
> > > [ 456.305069] ffff880097e21720 ffff88007b9dffd8 ffff88007b9dffd8
> > > ffff88007b9dffd8
> > > [ 456.305072] ffff880119b32e40 ffff880097e21720 ffff88007b9dfe28
> > > ffff880118077800
> > > [ 456.305076] Call Trace:
> > > [ 456.305085] [<ffffffff816b7469>] schedule+0x29/0x70
> > > [ 456.305089] [<ffffffff8129067d>] jbd2_log_wait_commit+0xcd/0x1a0
> > > [ 456.305094] [<ffffffff8107eac0>] ? add_wait_queue+0x60/0x60
> > > [ 456.305098] [<ffffffff812307a5>] ext4_sync_file+0x205/0x380
> > > [ 456.305103] [<ffffffff811c43ad>] do_fsync+0x5d/0x90
> > > [ 456.305108] [<ffffffff8119442b>] ? sys_write+0x6b/0xa0
> > > [ 456.305111] [<ffffffff811c4790>] sys_fsync+0x10/0x20
> > > [ 456.305114] [<ffffffff816c0edd>] system_call_fastpath+0x1a/0x1f
> > > [ 456.305122]
> > > [ 456.305124] Restarting tasks ... done.
> > > [ 456.315056] video LNXVIDEO:00: Restoring backlight state
> > >
> > > After applying Eric's patch [5], I could not hit the call-trace again.
> > > NOTE: The patch is from December 2012 and is not shipped in latest Linux-Next.
> > >
> > > The attached testcase script was helpful to force the call-trace.
> > > I have run 50 loops of it w/o an issue!
> > >
> > > Feel free to add a Reported-by/Tested-by.
> > > ( The issue kept me busy for the last days. )
> > >
> > > Regards,
> > > - Sedat -
> > >
> > > [1] http://marc.info/?t=135283664600002&r=1&w=2
> > > [2] http://marc.info/?t=135862023700004&r=1&w=2
> > > [3] http://marc.info/?l=linux-kernel&m=135862010419101&w=2
> > > [4] http://marc.info/?l=linux-next&m=135863550923093&w=2
> > > [5] http://patchwork.ozlabs.org/patch/207237/
> >
> > Hi all,
> >
> > this is really ugly... I could cry!
> >
> > GRRR, I have hit the original issue again while running my testcase script!
> >
> > [ 363.831226] PM: Syncing filesystems ... done.
> > [ 363.988855] PM: Preparing system for mem sleep
> > [ 369.032965] Freezing user space processes ... (elapsed 0.11 seconds) done.
> > [ 369.145792] Freezing remaining freezable tasks ...
> > [ 389.137643] Freezing of tasks failed after 20.01 seconds (1 tasks
> > refusing to freeze, wq_busy=0):
> > [ 389.137760] jbd2/loop0-8 D ffffffff8180d780 0 295 2 0x00000000
> > [ 389.137769] ffff8801181dfb68 0000000000000046 ffff880117de5c80
> > 0000000000000001
> > [ 389.137778] ffff880117de5c80 ffff8801181dffd8 ffff8801181dffd8
> > ffff8801181dffd8
> > [ 389.137784] ffffffff81c15440 ffff880117de5c80 ffff8801181dfb68
> > ffff88011fa14738
> > [ 389.137791] Call Trace:
> > [ 389.137810] [<ffffffff811c6830>] ? __wait_on_buffer+0x30/0x30
> > [ 389.137819] [<ffffffff816b7479>] schedule+0x29/0x70
> > [ 389.137825] [<ffffffff816b754f>] io_schedule+0x8f/0xd0
> > [ 389.137832] [<ffffffff811c683e>] sleep_on_buffer+0xe/0x20
> > [ 389.137842] [<ffffffff816b5d4f>] __wait_on_bit+0x5f/0x90
> > [ 389.137849] [<ffffffff811c5ef1>] ? submit_bh+0x121/0x1e0
> > [ 389.137856] [<ffffffff811c6830>] ? __wait_on_buffer+0x30/0x30
> > [ 389.137864] [<ffffffff816b5dfc>] out_of_line_wait_on_bit+0x7c/0x90
> > [ 389.137873] [<ffffffff8107eb00>] ? autoremove_wake_function+0x40/0x40
> > [ 389.137879] [<ffffffff811c682e>] __wait_on_buffer+0x2e/0x30
> > [ 389.137891] [<ffffffff8128c02c>]
> > jbd2_journal_commit_transaction+0x18cc/0x1d60
> > [ 389.137899] [<ffffffff816b860e>] ? _raw_spin_lock_irqsave+0x2e/0x40
> > [ 389.137908] [<ffffffff81069fbf>] ? try_to_del_timer_sync+0x4f/0x70
> > [ 389.137915] [<ffffffff81290836>] kjournald2+0xd6/0x3e0
> > [ 389.137921] [<ffffffff8107eac0>] ? add_wait_queue+0x60/0x60
> > [ 389.137926] [<ffffffff81290760>] ? commit_timeout+0x10/0x10
> > [ 389.137932] [<ffffffff8107ded0>] kthread+0xc0/0xd0
> > [ 389.137939] [<ffffffff8107de10>] ? flush_kthread_worker+0xb0/0xb0
> > [ 389.137946] [<ffffffff816c0e2c>] ret_from_fork+0x7c/0xb0
> > [ 389.137951] [<ffffffff8107de10>] ? flush_kthread_worker+0xb0/0xb0
> > [ 389.138017]
> > [ 389.138021] Restarting kernel threads ... done.
> > [ 389.138173] Restarting tasks ... done.
> > [ 389.147980] video LNXVIDEO:00: Restoring backlight state
> >
> > I suspect there are still issues in JBD2 (BTW I run here EXT4FS).
> >
> > [ 389.137643] Freezing of tasks failed after 20.01 seconds (1 tasks
> > refusing to freeze, wq_busy=0):
> > [ 389.137760] jbd2/loop0-8 D ffffffff8180d780 0 295 2 0x00000000
> >
> > Have a good night, /me -> watching TV!
> The traces don't suggest an ext4/jbd2 problem. What is happening is that
> jbd2 is waiting for IO to finish and that never happens. Seeing that you
> loop device I'd think it's some interaction of the loop device and
> freezing. Can you reproduce the issue without the loop device (i.e. with
> the filesystem directly on e.g. scsi disk)? I suspect the reason is
> something like that the backing filesystem is already frozen so filesystem
> on top of it cannot write all the data and hangs waiting for IO -> suspend
> doesn't happen. Contents of /proc/mounts and losetup -l would help us
> understand what's going on.
^^^ Shouldn't be `losetup -a`?
>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Carlos

2013-01-21 12:30:26

by Sedat Dilek

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Mon, Jan 21, 2013 at 12:40 PM, Sedat Dilek <[email protected]> wrote:
> On Mon, Jan 21, 2013 at 11:47 AM, Jan Kara <[email protected]> wrote:
>> On Sun 20-01-13 01:06:40, Sedat Dilek wrote:
>>> On Sun, Jan 20, 2013 at 12:44 AM, Sedat Dilek <[email protected]> wrote:
>>> > Hi,
>>> >
>>> > I and some others hit a similiar problem in Linux-Next
>>> > (next-20130118), please see [1] and [2].
>>> >
>>> > [3] has a interim analyze of my problems.
>>> >
>>> > After suspecting the problem was caused by TTY-NEXT, it turned out to
>>> > be a JBD2 problem finally.
>>> > The freezer/pm_test was helpful to hit the issue (Thanks Rafael for the hint!).
>>> >
>>> > So, the issue has two faces: TTY and JBD2.
>>> > [4] gives a list and URLs of the patches I had to apply to have a
>>> > clean system again.
>>> >
>>> > After applying the two TTY patches (without Eric's JBD2-fix!) the
>>> > call-trace after freezer/pm_test looked like this;
>>> >
>>> > [ 433.527986] PM: Syncing filesystems ... done.
>>> > [ 433.843761] PM: Preparing system for mem sleep
>>> > [ 436.306002] Freezing user space processes ...
>>> > [ 456.304956] Freezing of tasks failed after 20.01 seconds (1 tasks
>>> > refusing to freeze, wq_busy=0):
>>> > [ 456.305060] Cache I/O D ffffffff8180d780 0 2132 1 0x00000004
>>> > [ 456.305065] ffff88007b9dfe18 0000000000000046 ffff88007b9dfdc8
>>> > 0000000300000001
>>> > [ 456.305069] ffff880097e21720 ffff88007b9dffd8 ffff88007b9dffd8
>>> > ffff88007b9dffd8
>>> > [ 456.305072] ffff880119b32e40 ffff880097e21720 ffff88007b9dfe28
>>> > ffff880118077800
>>> > [ 456.305076] Call Trace:
>>> > [ 456.305085] [<ffffffff816b7469>] schedule+0x29/0x70
>>> > [ 456.305089] [<ffffffff8129067d>] jbd2_log_wait_commit+0xcd/0x1a0
>>> > [ 456.305094] [<ffffffff8107eac0>] ? add_wait_queue+0x60/0x60
>>> > [ 456.305098] [<ffffffff812307a5>] ext4_sync_file+0x205/0x380
>>> > [ 456.305103] [<ffffffff811c43ad>] do_fsync+0x5d/0x90
>>> > [ 456.305108] [<ffffffff8119442b>] ? sys_write+0x6b/0xa0
>>> > [ 456.305111] [<ffffffff811c4790>] sys_fsync+0x10/0x20
>>> > [ 456.305114] [<ffffffff816c0edd>] system_call_fastpath+0x1a/0x1f
>>> > [ 456.305122]
>>> > [ 456.305124] Restarting tasks ... done.
>>> > [ 456.315056] video LNXVIDEO:00: Restoring backlight state
>>> >
>>> > After applying Eric's patch [5], I could not hit the call-trace again.
>>> > NOTE: The patch is from December 2012 and is not shipped in latest Linux-Next.
>>> >
>>> > The attached testcase script was helpful to force the call-trace.
>>> > I have run 50 loops of it w/o an issue!
>>> >
>>> > Feel free to add a Reported-by/Tested-by.
>>> > ( The issue kept me busy for the last days. )
>>> >
>>> > Regards,
>>> > - Sedat -
>>> >
>>> > [1] http://marc.info/?t=135283664600002&r=1&w=2
>>> > [2] http://marc.info/?t=135862023700004&r=1&w=2
>>> > [3] http://marc.info/?l=linux-kernel&m=135862010419101&w=2
>>> > [4] http://marc.info/?l=linux-next&m=135863550923093&w=2
>>> > [5] http://patchwork.ozlabs.org/patch/207237/
>>>
>>> Hi all,
>>>
>>> this is really ugly... I could cry!
>>>
>>> GRRR, I have hit the original issue again while running my testcase script!
>>>
>>> [ 363.831226] PM: Syncing filesystems ... done.
>>> [ 363.988855] PM: Preparing system for mem sleep
>>> [ 369.032965] Freezing user space processes ... (elapsed 0.11 seconds) done.
>>> [ 369.145792] Freezing remaining freezable tasks ...
>>> [ 389.137643] Freezing of tasks failed after 20.01 seconds (1 tasks
>>> refusing to freeze, wq_busy=0):
>>> [ 389.137760] jbd2/loop0-8 D ffffffff8180d780 0 295 2 0x00000000
>>> [ 389.137769] ffff8801181dfb68 0000000000000046 ffff880117de5c80
>>> 0000000000000001
>>> [ 389.137778] ffff880117de5c80 ffff8801181dffd8 ffff8801181dffd8
>>> ffff8801181dffd8
>>> [ 389.137784] ffffffff81c15440 ffff880117de5c80 ffff8801181dfb68
>>> ffff88011fa14738
>>> [ 389.137791] Call Trace:
>>> [ 389.137810] [<ffffffff811c6830>] ? __wait_on_buffer+0x30/0x30
>>> [ 389.137819] [<ffffffff816b7479>] schedule+0x29/0x70
>>> [ 389.137825] [<ffffffff816b754f>] io_schedule+0x8f/0xd0
>>> [ 389.137832] [<ffffffff811c683e>] sleep_on_buffer+0xe/0x20
>>> [ 389.137842] [<ffffffff816b5d4f>] __wait_on_bit+0x5f/0x90
>>> [ 389.137849] [<ffffffff811c5ef1>] ? submit_bh+0x121/0x1e0
>>> [ 389.137856] [<ffffffff811c6830>] ? __wait_on_buffer+0x30/0x30
>>> [ 389.137864] [<ffffffff816b5dfc>] out_of_line_wait_on_bit+0x7c/0x90
>>> [ 389.137873] [<ffffffff8107eb00>] ? autoremove_wake_function+0x40/0x40
>>> [ 389.137879] [<ffffffff811c682e>] __wait_on_buffer+0x2e/0x30
>>> [ 389.137891] [<ffffffff8128c02c>]
>>> jbd2_journal_commit_transaction+0x18cc/0x1d60
>>> [ 389.137899] [<ffffffff816b860e>] ? _raw_spin_lock_irqsave+0x2e/0x40
>>> [ 389.137908] [<ffffffff81069fbf>] ? try_to_del_timer_sync+0x4f/0x70
>>> [ 389.137915] [<ffffffff81290836>] kjournald2+0xd6/0x3e0
>>> [ 389.137921] [<ffffffff8107eac0>] ? add_wait_queue+0x60/0x60
>>> [ 389.137926] [<ffffffff81290760>] ? commit_timeout+0x10/0x10
>>> [ 389.137932] [<ffffffff8107ded0>] kthread+0xc0/0xd0
>>> [ 389.137939] [<ffffffff8107de10>] ? flush_kthread_worker+0xb0/0xb0
>>> [ 389.137946] [<ffffffff816c0e2c>] ret_from_fork+0x7c/0xb0
>>> [ 389.137951] [<ffffffff8107de10>] ? flush_kthread_worker+0xb0/0xb0
>>> [ 389.138017]
>>> [ 389.138021] Restarting kernel threads ... done.
>>> [ 389.138173] Restarting tasks ... done.
>>> [ 389.147980] video LNXVIDEO:00: Restoring backlight state
>>>
>>> I suspect there are still issues in JBD2 (BTW I run here EXT4FS).
>>>
>>> [ 389.137643] Freezing of tasks failed after 20.01 seconds (1 tasks
>>> refusing to freeze, wq_busy=0):
>>> [ 389.137760] jbd2/loop0-8 D ffffffff8180d780 0 295 2 0x00000000
>>>
>>> Have a good night, /me -> watching TV!
>> The traces don't suggest an ext4/jbd2 problem. What is happening is that
>> jbd2 is waiting for IO to finish and that never happens. Seeing that you
>> loop device I'd think it's some interaction of the loop device and
>> freezing. Can you reproduce the issue without the loop device (i.e. with
>> the filesystem directly on e.g. scsi disk)? I suspect the reason is
>> something like that the backing filesystem is already frozen so filesystem
>> on top of it cannot write all the data and hangs waiting for IO -> suspend
>> doesn't happen. Contents of /proc/mounts and losetup -l would help us
>> understand what's going on.
>>
>
> As said I am here in a very uncommon WUBI environment means my
> Ubuntu/precise rootfs-image lays on the Win7-partition (NTFS).
> Your explanation sounds reasonable to me as this line from my attached
> testcase causes the troubles.
>
> echo mem > /sys/power/state && sleep 1
>
> So, /sys/ is not writable immediately after freezer ends
>
> I checked again and again my logs and have seen "starving" lines
> reported by rtkit-daemon, but did not really get wiser what they want
> to tell me. Stopping rtkit-daemon or resetting all or all-known
> threads before running my pm_test/freezer did not help, too.
>
> /usr/sbin/rtkitctl --help
> rtkitctl [options]
>
> -h, --help Show this help
> --version Show version
>
> --reset-known Reset real-time status of known threads
> --reset-all Reset real-time status of all threads
> --start Start RealtimeKit if it is not running already
> -k, --exit Terminate running RealtimeKit daemon
>
> Here are the outputs you wanted with some more (fstab, grub-config) etc.
> I have here no -l option for losetup command.
>
>
> - Sedat -
>
> P.S.: Outputs for Honza...
>
> $ sudo cat /proc/mounts
> rootfs / rootfs rw 0 0
> sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
> proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
> udev /dev devtmpfs rw,relatime,size=1966948k,nr_inodes=491737,mode=755 0 0
> devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
> tmpfs /run tmpfs rw,nosuid,relatime,size=788076k,mode=755 0 0
> /dev/sda2 /host fuseblk
> rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other,blksize=4096
> 0 0
> /dev/loop0 / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
> none /sys/fs/fuse/connections fusectl rw,relatime 0 0
> none /sys/kernel/debug debugfs rw,relatime 0 0
> none /sys/kernel/security securityfs rw,relatime 0 0
> none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
> none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
> gvfs-fuse-daemon /home/wearefam/.gvfs fuse.gvfs-fuse-daemon
> rw,nosuid,nodev,relatime,user_id=1000,group_id=1000 0 0
>
> $ sudo losetup --all --verbose
> /dev/loop0: [0802]:17982 (/host/ubuntu/disks/root.disk)
>
> $ sudo losetup --find --verbose
> Loop device is /dev/loop1
> /dev/loop1
>
> [ /etc/fstab ]
> # /etc/fstab: static file system information.
>
> # Use 'blkid' to print the universally unique identifier for a
> # device; this may be used with UUID= as a more robust way to name devices
> # that works even if disks are added and removed. See fstab(5).
>
> # <file system> <mount point> <type> <options>
> <dump> <pass>
> proc /proc proc
> nodev,noexec,nosuid 0 0
> /host/ubuntu/disks/root.disk / ext4
> loop,errors=remount-ro 0 1
> /host/ubuntu/disks/swap.disk none swap loop,sw
> 0 0
> - EOF -
>
> [ /boot/grub/grub.cfg ]
> ...
> menuentry 'Ubuntu, mit Linux 3.8.0-rc4-next20130121-1-iniza-generic'
> --class ubuntu --class gnu-linux --class gnu --class os {
> set gfxpayload=$linux_gfx_mode
> insmod part_msdos
> insmod ntfs
> set root='(hd0,msdos2)'
> search --no-floppy --fs-uuid --set=root 001AADA61AAD9964
> loopback loop0 /ubuntu/disks/root.disk
> set root=(loop0)
> linux /boot/vmlinuz-3.8.0-rc4-next20130121-1-iniza-generic
> root=UUID=001AADA61AAD9964 loop=/ubuntu/disks/root.disk ro
> initrd /boot/initrd.img-3.8.0-rc4-next20130121-1-iniza-generic
> }
> ...
>

Here some more useful outputs:

$ LC_ALL=C df -h -T
Filesystem Type Size Used Avail Use% Mounted on
rootfs rootfs 17G 15G 1.5G 92% /
udev devtmpfs 1.9G 12K 1.9G 1% /dev
tmpfs tmpfs 770M 892K 769M 1% /run
/dev/sda2 fuseblk 444G 81G 364G 19% /host
/dev/loop0 ext4 17G 15G 1.5G 92% /
none tmpfs 5.0M 0 5.0M 0% /run/lock
none tmpfs 1.9G 260K 1.9G 1% /run/shm

$ sudo LC_ALL=C fdisk -l /dev/sda

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0xcb9885ab

Device Boot Start End Blocks Id System
/dev/sda1 * 2048 206847 102400 7 HPFS/NTFS/exFAT
/dev/sda2 206848 931299327 465546240 7 HPFS/NTFS/exFAT
/dev/sda3 931299328 976773119 22736896 27 Hidden NTFS WinRE

I am still reflecting on any shitty userspace app is causing all the
trouble, but I have zero clue how to dig that...

- Sedat -

> Hope this helps you!
>
> Regards,
> - Sedat -
>
>> Honza
>> --
>> Jan Kara <[email protected]>
>> SUSE Labs, CR

2013-01-21 14:07:40

by Jan Kara

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Mon 21-01-13 13:30:26, Sedat Dilek wrote:
> On Mon, Jan 21, 2013 at 12:40 PM, Sedat Dilek <[email protected]> wrote:
> > On Mon, Jan 21, 2013 at 11:47 AM, Jan Kara <[email protected]> wrote:
> >> The traces don't suggest an ext4/jbd2 problem. What is happening is that
> >> jbd2 is waiting for IO to finish and that never happens. Seeing that you
> >> loop device I'd think it's some interaction of the loop device and
> >> freezing. Can you reproduce the issue without the loop device (i.e. with
> >> the filesystem directly on e.g. scsi disk)? I suspect the reason is
> >> something like that the backing filesystem is already frozen so filesystem
> >> on top of it cannot write all the data and hangs waiting for IO -> suspend
> >> doesn't happen. Contents of /proc/mounts and losetup -l would help us
> >> understand what's going on.
> >>
> >
> > As said I am here in a very uncommon WUBI environment means my
> > Ubuntu/precise rootfs-image lays on the Win7-partition (NTFS).
> > Your explanation sounds reasonable to me as this line from my attached
> > testcase causes the troubles.
> >
> > echo mem > /sys/power/state && sleep 1
> >
> > So, /sys/ is not writable immediately after freezer ends
> >
> > I checked again and again my logs and have seen "starving" lines
> > reported by rtkit-daemon, but did not really get wiser what they want
> > to tell me. Stopping rtkit-daemon or resetting all or all-known
> > threads before running my pm_test/freezer did not help, too.
> >
> > /usr/sbin/rtkitctl --help
> > rtkitctl [options]
> >
> > -h, --help Show this help
> > --version Show version
> >
> > --reset-known Reset real-time status of known threads
> > --reset-all Reset real-time status of all threads
> > --start Start RealtimeKit if it is not running already
> > -k, --exit Terminate running RealtimeKit daemon
> >
> > Here are the outputs you wanted with some more (fstab, grub-config) etc.
> > I have here no -l option for losetup command.
> >
> >
> > - Sedat -
> >
> > P.S.: Outputs for Honza...
> >
> > $ sudo cat /proc/mounts
> > rootfs / rootfs rw 0 0
> > sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
> > proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
> > udev /dev devtmpfs rw,relatime,size=1966948k,nr_inodes=491737,mode=755 0 0
> > devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
> > tmpfs /run tmpfs rw,nosuid,relatime,size=788076k,mode=755 0 0
> > /dev/sda2 /host fuseblk
> > rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other,blksize=4096
> > 0 0
> > /dev/loop0 / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
> > none /sys/fs/fuse/connections fusectl rw,relatime 0 0
> > none /sys/kernel/debug debugfs rw,relatime 0 0
> > none /sys/kernel/security securityfs rw,relatime 0 0
> > none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
> > none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
> > gvfs-fuse-daemon /home/wearefam/.gvfs fuse.gvfs-fuse-daemon
> > rw,nosuid,nodev,relatime,user_id=1000,group_id=1000 0 0
> >
> > $ sudo losetup --all --verbose
> > /dev/loop0: [0802]:17982 (/host/ubuntu/disks/root.disk)
> >
> > $ sudo losetup --find --verbose
> > Loop device is /dev/loop1
> > /dev/loop1
> >
> > [ /etc/fstab ]
> > # /etc/fstab: static file system information.
> >
> > # Use 'blkid' to print the universally unique identifier for a
> > # device; this may be used with UUID= as a more robust way to name devices
> > # that works even if disks are added and removed. See fstab(5).
> >
> > # <file system> <mount point> <type> <options>
> > <dump> <pass>
> > proc /proc proc
> > nodev,noexec,nosuid 0 0
> > /host/ubuntu/disks/root.disk / ext4
> > loop,errors=remount-ro 0 1
> > /host/ubuntu/disks/swap.disk none swap loop,sw
> > 0 0
> > - EOF -
> >
> > [ /boot/grub/grub.cfg ]
> > ...
> > menuentry 'Ubuntu, mit Linux 3.8.0-rc4-next20130121-1-iniza-generic'
> > --class ubuntu --class gnu-linux --class gnu --class os {
> > set gfxpayload=$linux_gfx_mode
> > insmod part_msdos
> > insmod ntfs
> > set root='(hd0,msdos2)'
> > search --no-floppy --fs-uuid --set=root 001AADA61AAD9964
> > loopback loop0 /ubuntu/disks/root.disk
> > set root=(loop0)
> > linux /boot/vmlinuz-3.8.0-rc4-next20130121-1-iniza-generic
> > root=UUID=001AADA61AAD9964 loop=/ubuntu/disks/root.disk ro
> > initrd /boot/initrd.img-3.8.0-rc4-next20130121-1-iniza-generic
> > }
> > ...
> >
>
> Here some more useful outputs:
>
> $ LC_ALL=C df -h -T
> Filesystem Type Size Used Avail Use% Mounted on
> rootfs rootfs 17G 15G 1.5G 92% /
> udev devtmpfs 1.9G 12K 1.9G 1% /dev
> tmpfs tmpfs 770M 892K 769M 1% /run
> /dev/sda2 fuseblk 444G 81G 364G 19% /host
> /dev/loop0 ext4 17G 15G 1.5G 92% /
> none tmpfs 5.0M 0 5.0M 0% /run/lock
> none tmpfs 1.9G 260K 1.9G 1% /run/shm
>
> $ sudo LC_ALL=C fdisk -l /dev/sda
>
> Disk /dev/sda: 500.1 GB, 500107862016 bytes
> 255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disk identifier: 0xcb9885ab
>
> Device Boot Start End Blocks Id System
> /dev/sda1 * 2048 206847 102400 7 HPFS/NTFS/exFAT
> /dev/sda2 206848 931299327 465546240 7 HPFS/NTFS/exFAT
> /dev/sda3 931299328 976773119 22736896 27 Hidden NTFS WinRE
>
> I am still reflecting on any shitty userspace app is causing all the
> trouble, but I have zero clue how to dig that...
Well, I think the outputs make it pretty clear. /dev/loop0 is a mounted
image from fuse filesystem. Fuse daemon making filesystem accessible gets
frozen before /dev/loop0 gets fully written out and so jbd2 journal thread
hangs. Maybe Miklos (added to CC) could fill in some details / ideas but I
think the setup like you have never really worked...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-01-21 14:28:44

by Sedat Dilek

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Mon, Jan 21, 2013 at 3:07 PM, Jan Kara <[email protected]> wrote:
> On Mon 21-01-13 13:30:26, Sedat Dilek wrote:
>> On Mon, Jan 21, 2013 at 12:40 PM, Sedat Dilek <[email protected]> wrote:
>> > On Mon, Jan 21, 2013 at 11:47 AM, Jan Kara <[email protected]> wrote:
>> >> The traces don't suggest an ext4/jbd2 problem. What is happening is that
>> >> jbd2 is waiting for IO to finish and that never happens. Seeing that you
>> >> loop device I'd think it's some interaction of the loop device and
>> >> freezing. Can you reproduce the issue without the loop device (i.e. with
>> >> the filesystem directly on e.g. scsi disk)? I suspect the reason is
>> >> something like that the backing filesystem is already frozen so filesystem
>> >> on top of it cannot write all the data and hangs waiting for IO -> suspend
>> >> doesn't happen. Contents of /proc/mounts and losetup -l would help us
>> >> understand what's going on.
>> >>
>> >
>> > As said I am here in a very uncommon WUBI environment means my
>> > Ubuntu/precise rootfs-image lays on the Win7-partition (NTFS).
>> > Your explanation sounds reasonable to me as this line from my attached
>> > testcase causes the troubles.
>> >
>> > echo mem > /sys/power/state && sleep 1
>> >
>> > So, /sys/ is not writable immediately after freezer ends
>> >
>> > I checked again and again my logs and have seen "starving" lines
>> > reported by rtkit-daemon, but did not really get wiser what they want
>> > to tell me. Stopping rtkit-daemon or resetting all or all-known
>> > threads before running my pm_test/freezer did not help, too.
>> >
>> > /usr/sbin/rtkitctl --help
>> > rtkitctl [options]
>> >
>> > -h, --help Show this help
>> > --version Show version
>> >
>> > --reset-known Reset real-time status of known threads
>> > --reset-all Reset real-time status of all threads
>> > --start Start RealtimeKit if it is not running already
>> > -k, --exit Terminate running RealtimeKit daemon
>> >
>> > Here are the outputs you wanted with some more (fstab, grub-config) etc.
>> > I have here no -l option for losetup command.
>> >
>> >
>> > - Sedat -
>> >
>> > P.S.: Outputs for Honza...
>> >
>> > $ sudo cat /proc/mounts
>> > rootfs / rootfs rw 0 0
>> > sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
>> > proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
>> > udev /dev devtmpfs rw,relatime,size=1966948k,nr_inodes=491737,mode=755 0 0
>> > devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
>> > tmpfs /run tmpfs rw,nosuid,relatime,size=788076k,mode=755 0 0
>> > /dev/sda2 /host fuseblk
>> > rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other,blksize=4096
>> > 0 0
>> > /dev/loop0 / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
>> > none /sys/fs/fuse/connections fusectl rw,relatime 0 0
>> > none /sys/kernel/debug debugfs rw,relatime 0 0
>> > none /sys/kernel/security securityfs rw,relatime 0 0
>> > none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
>> > none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
>> > gvfs-fuse-daemon /home/wearefam/.gvfs fuse.gvfs-fuse-daemon
>> > rw,nosuid,nodev,relatime,user_id=1000,group_id=1000 0 0
>> >
>> > $ sudo losetup --all --verbose
>> > /dev/loop0: [0802]:17982 (/host/ubuntu/disks/root.disk)
>> >
>> > $ sudo losetup --find --verbose
>> > Loop device is /dev/loop1
>> > /dev/loop1
>> >
>> > [ /etc/fstab ]
>> > # /etc/fstab: static file system information.
>> >
>> > # Use 'blkid' to print the universally unique identifier for a
>> > # device; this may be used with UUID= as a more robust way to name devices
>> > # that works even if disks are added and removed. See fstab(5).
>> >
>> > # <file system> <mount point> <type> <options>
>> > <dump> <pass>
>> > proc /proc proc
>> > nodev,noexec,nosuid 0 0
>> > /host/ubuntu/disks/root.disk / ext4
>> > loop,errors=remount-ro 0 1
>> > /host/ubuntu/disks/swap.disk none swap loop,sw
>> > 0 0
>> > - EOF -
>> >
>> > [ /boot/grub/grub.cfg ]
>> > ...
>> > menuentry 'Ubuntu, mit Linux 3.8.0-rc4-next20130121-1-iniza-generic'
>> > --class ubuntu --class gnu-linux --class gnu --class os {
>> > set gfxpayload=$linux_gfx_mode
>> > insmod part_msdos
>> > insmod ntfs
>> > set root='(hd0,msdos2)'
>> > search --no-floppy --fs-uuid --set=root 001AADA61AAD9964
>> > loopback loop0 /ubuntu/disks/root.disk
>> > set root=(loop0)
>> > linux /boot/vmlinuz-3.8.0-rc4-next20130121-1-iniza-generic
>> > root=UUID=001AADA61AAD9964 loop=/ubuntu/disks/root.disk ro
>> > initrd /boot/initrd.img-3.8.0-rc4-next20130121-1-iniza-generic
>> > }
>> > ...
>> >
>>
>> Here some more useful outputs:
>>
>> $ LC_ALL=C df -h -T
>> Filesystem Type Size Used Avail Use% Mounted on
>> rootfs rootfs 17G 15G 1.5G 92% /
>> udev devtmpfs 1.9G 12K 1.9G 1% /dev
>> tmpfs tmpfs 770M 892K 769M 1% /run
>> /dev/sda2 fuseblk 444G 81G 364G 19% /host
>> /dev/loop0 ext4 17G 15G 1.5G 92% /
>> none tmpfs 5.0M 0 5.0M 0% /run/lock
>> none tmpfs 1.9G 260K 1.9G 1% /run/shm
>>
>> $ sudo LC_ALL=C fdisk -l /dev/sda
>>
>> Disk /dev/sda: 500.1 GB, 500107862016 bytes
>> 255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>> Disk identifier: 0xcb9885ab
>>
>> Device Boot Start End Blocks Id System
>> /dev/sda1 * 2048 206847 102400 7 HPFS/NTFS/exFAT
>> /dev/sda2 206848 931299327 465546240 7 HPFS/NTFS/exFAT
>> /dev/sda3 931299328 976773119 22736896 27 Hidden NTFS WinRE
>>
>> I am still reflecting on any shitty userspace app is causing all the
>> trouble, but I have zero clue how to dig that...
> Well, I think the outputs make it pretty clear. /dev/loop0 is a mounted
> image from fuse filesystem. Fuse daemon making filesystem accessible gets
> frozen before /dev/loop0 gets fully written out and so jbd2 journal thread
> hangs. Maybe Miklos (added to CC) could fill in some details / ideas but I
> think the setup like you have never really worked...
>

Thank you for the explanations.
Before doing some patches for Linux Test Project and replying to
LLVMLinux I was reflecting about FUSE when saw the 'df -h -T' line.
You might be right, that pm_test/freezer could never be OK here.
I wantend to test against Linux v3.8-rc4 to be sure.
Anyway, thanks for sharing your thoughts with me!

Let's see what Miklos says...

Regards,
- Sedat -

> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2013-01-21 17:39:39

by Miklos Szeredi

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Mon, 2013-01-21 at 15:28 +0100, Sedat Dilek wrote:
> On Mon, Jan 21, 2013 at 3:07 PM, Jan Kara <[email protected]> wrote:

> > Well, I think the outputs make it pretty clear. /dev/loop0 is a mounted
> > image from fuse filesystem. Fuse daemon making filesystem accessible gets
> > frozen before /dev/loop0 gets fully written out and so jbd2 journal thread
> > hangs. Maybe Miklos (added to CC) could fill in some details / ideas but I
> > think the setup like you have never really worked...
> >
>
> Thank you for the explanations.
> Before doing some patches for Linux Test Project and replying to
> LLVMLinux I was reflecting about FUSE when saw the 'df -h -T' line.
> You might be right, that pm_test/freezer could never be OK here.
> I wantend to test against Linux v3.8-rc4 to be sure.
> Anyway, thanks for sharing your thoughts with me!

There are several problems with this setup. Loop over fuse has always
been problematic because of interaction with the memory management.
E.g. the kernel is trying to allocate memory and is writing out dirty
pages to the loop device then that write ends up in the userspace
filesystem process which may want to allocate additional memory to
complete the write. This may end up deadlocking the writeout, which is
not good. So loop over fuse (for write) is not a good idea in any case.

One way to get around this limitation is to use mountlo, which does a
loop mount in userspace using UML and fuse. That will probably be
better.

Unfortunately fuse has a bad interaction with the freezer as well. The
reason for that is the freezer wants all syscalls to finish but with a
frozen userspace a fuse daemon will not be able to complete the work
necessary to finish those syscalls. We thought about this long and hard
and there are no easy solutions. We can allow the freezer to proceed
for cases when the syscall is directly waiting for a fuse filesystem
daemon reply. But the syscall may be waiting for the filesystem daemon
indirectly, e.g. sleeping on i_mutex, and that's rather more difficult
to fix. It's not impossible but it's a big project.

To conclude: writable loop over fuse is broken and was always broken,
it should not be used. Other solutions should work better but fuse is
currently not

Thanks,
Miklos

2013-01-21 23:04:33

by Sedat Dilek

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Mon, Jan 21, 2013 at 3:07 PM, Jan Kara <[email protected]> wrote:
> On Mon 21-01-13 13:30:26, Sedat Dilek wrote:
>> On Mon, Jan 21, 2013 at 12:40 PM, Sedat Dilek <[email protected]> wrote:
>> > On Mon, Jan 21, 2013 at 11:47 AM, Jan Kara <[email protected]> wrote:
>> >> The traces don't suggest an ext4/jbd2 problem. What is happening is that
>> >> jbd2 is waiting for IO to finish and that never happens. Seeing that you
>> >> loop device I'd think it's some interaction of the loop device and
>> >> freezing. Can you reproduce the issue without the loop device (i.e. with
>> >> the filesystem directly on e.g. scsi disk)? I suspect the reason is
>> >> something like that the backing filesystem is already frozen so filesystem
>> >> on top of it cannot write all the data and hangs waiting for IO -> suspend
>> >> doesn't happen. Contents of /proc/mounts and losetup -l would help us
>> >> understand what's going on.
>> >>
>> >
>> > As said I am here in a very uncommon WUBI environment means my
>> > Ubuntu/precise rootfs-image lays on the Win7-partition (NTFS).
>> > Your explanation sounds reasonable to me as this line from my attached
>> > testcase causes the troubles.
>> >
>> > echo mem > /sys/power/state && sleep 1
>> >
>> > So, /sys/ is not writable immediately after freezer ends
>> >
>> > I checked again and again my logs and have seen "starving" lines
>> > reported by rtkit-daemon, but did not really get wiser what they want
>> > to tell me. Stopping rtkit-daemon or resetting all or all-known
>> > threads before running my pm_test/freezer did not help, too.
>> >
>> > /usr/sbin/rtkitctl --help
>> > rtkitctl [options]
>> >
>> > -h, --help Show this help
>> > --version Show version
>> >
>> > --reset-known Reset real-time status of known threads
>> > --reset-all Reset real-time status of all threads
>> > --start Start RealtimeKit if it is not running already
>> > -k, --exit Terminate running RealtimeKit daemon
>> >
>> > Here are the outputs you wanted with some more (fstab, grub-config) etc.
>> > I have here no -l option for losetup command.
>> >
>> >
>> > - Sedat -
>> >
>> > P.S.: Outputs for Honza...
>> >
>> > $ sudo cat /proc/mounts
>> > rootfs / rootfs rw 0 0
>> > sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
>> > proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
>> > udev /dev devtmpfs rw,relatime,size=1966948k,nr_inodes=491737,mode=755 0 0
>> > devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
>> > tmpfs /run tmpfs rw,nosuid,relatime,size=788076k,mode=755 0 0
>> > /dev/sda2 /host fuseblk
>> > rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other,blksize=4096
>> > 0 0
>> > /dev/loop0 / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
>> > none /sys/fs/fuse/connections fusectl rw,relatime 0 0
>> > none /sys/kernel/debug debugfs rw,relatime 0 0
>> > none /sys/kernel/security securityfs rw,relatime 0 0
>> > none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
>> > none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
>> > gvfs-fuse-daemon /home/wearefam/.gvfs fuse.gvfs-fuse-daemon
>> > rw,nosuid,nodev,relatime,user_id=1000,group_id=1000 0 0
>> >
>> > $ sudo losetup --all --verbose
>> > /dev/loop0: [0802]:17982 (/host/ubuntu/disks/root.disk)
>> >
>> > $ sudo losetup --find --verbose
>> > Loop device is /dev/loop1
>> > /dev/loop1
>> >
>> > [ /etc/fstab ]
>> > # /etc/fstab: static file system information.
>> >
>> > # Use 'blkid' to print the universally unique identifier for a
>> > # device; this may be used with UUID= as a more robust way to name devices
>> > # that works even if disks are added and removed. See fstab(5).
>> >
>> > # <file system> <mount point> <type> <options>
>> > <dump> <pass>
>> > proc /proc proc
>> > nodev,noexec,nosuid 0 0
>> > /host/ubuntu/disks/root.disk / ext4
>> > loop,errors=remount-ro 0 1
>> > /host/ubuntu/disks/swap.disk none swap loop,sw
>> > 0 0
>> > - EOF -
>> >
>> > [ /boot/grub/grub.cfg ]
>> > ...
>> > menuentry 'Ubuntu, mit Linux 3.8.0-rc4-next20130121-1-iniza-generic'
>> > --class ubuntu --class gnu-linux --class gnu --class os {
>> > set gfxpayload=$linux_gfx_mode
>> > insmod part_msdos
>> > insmod ntfs
>> > set root='(hd0,msdos2)'
>> > search --no-floppy --fs-uuid --set=root 001AADA61AAD9964
>> > loopback loop0 /ubuntu/disks/root.disk
>> > set root=(loop0)
>> > linux /boot/vmlinuz-3.8.0-rc4-next20130121-1-iniza-generic
>> > root=UUID=001AADA61AAD9964 loop=/ubuntu/disks/root.disk ro
>> > initrd /boot/initrd.img-3.8.0-rc4-next20130121-1-iniza-generic
>> > }
>> > ...
>> >
>>
>> Here some more useful outputs:
>>
>> $ LC_ALL=C df -h -T
>> Filesystem Type Size Used Avail Use% Mounted on
>> rootfs rootfs 17G 15G 1.5G 92% /
>> udev devtmpfs 1.9G 12K 1.9G 1% /dev
>> tmpfs tmpfs 770M 892K 769M 1% /run
>> /dev/sda2 fuseblk 444G 81G 364G 19% /host
>> /dev/loop0 ext4 17G 15G 1.5G 92% /
>> none tmpfs 5.0M 0 5.0M 0% /run/lock
>> none tmpfs 1.9G 260K 1.9G 1% /run/shm
>>
>> $ sudo LC_ALL=C fdisk -l /dev/sda
>>
>> Disk /dev/sda: 500.1 GB, 500107862016 bytes
>> 255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
>> Units = sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>> Disk identifier: 0xcb9885ab
>>
>> Device Boot Start End Blocks Id System
>> /dev/sda1 * 2048 206847 102400 7 HPFS/NTFS/exFAT
>> /dev/sda2 206848 931299327 465546240 7 HPFS/NTFS/exFAT
>> /dev/sda3 931299328 976773119 22736896 27 Hidden NTFS WinRE
>>
>> I am still reflecting on any shitty userspace app is causing all the
>> trouble, but I have zero clue how to dig that...
> Well, I think the outputs make it pretty clear. /dev/loop0 is a mounted
> image from fuse filesystem. Fuse daemon making filesystem accessible gets
> frozen before /dev/loop0 gets fully written out and so jbd2 journal thread
> hangs. Maybe Miklos (added to CC) could fill in some details / ideas but I
> think the setup like you have never really worked...
>

Beyond the FUSE/LOOP fun, will you apply this patch to your linux-next GIT tree?

Feel free to add...

Tested-by: Sedat Dilek <[email protected]>

A similiar patch for JBD went through your tree into mainline (see [1] and [2]).

Thanks!

- Sedat -


[1] http://git.kernel.org/?p=linux/kernel/git/jack/linux-fs.git;a=shortlog;h=refs/heads/for_next
[2] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=31db720643073571f15eede808486371556f6380
[3] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=7e2fb2d7e6a3094473f101ae33dd6431ae6d2ed1


> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2013-01-21 23:11:30

by Theodore Ts'o

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Tue, Jan 22, 2013 at 12:04:32AM +0100, Sedat Dilek wrote:
>
> Beyond the FUSE/LOOP fun, will you apply this patch to your linux-next GIT tree?
>
> Feel free to add...
>
> Tested-by: Sedat Dilek <[email protected]>
>
> A similiar patch for JBD went through your tree into mainline (see [1] and [2]).

I'm not at all convinced that this patch has anything to do with your
problem. I don't see how it could affect things, and I believe you
mentioned that you saw the problem even with this patch applied? (I'm
not sure; some of your messages which you sent were hard to
understand, and you mentioned something about trying to send messages
when low on sleep :-).

In any case, the reason why I haven't pulled this patch into the ext4
tree is because I was waiting for Eric and some of the performance
team folks at Red Hat to supply some additional information about why
this commit was making a difference in performance for a particular
proprietary, closed source benchmark.

I'm very suspicious about applying patches under the "cargo cult"
school of programming. ("We don't understand why it makes a
difference, but it seems to be good, so bombs away!" :-)

Regards,

- Ted

2013-01-21 23:29:46

by Sedat Dilek

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Tue, Jan 22, 2013 at 12:11 AM, Theodore Ts'o <[email protected]> wrote:
> On Tue, Jan 22, 2013 at 12:04:32AM +0100, Sedat Dilek wrote:
>>
>> Beyond the FUSE/LOOP fun, will you apply this patch to your linux-next GIT tree?
>>
>> Feel free to add...
>>
>> Tested-by: Sedat Dilek <[email protected]>
>>
>> A similiar patch for JBD went through your tree into mainline (see [1] and [2]).
>
> I'm not at all convinced that this patch has anything to do with your
> problem. I don't see how it could affect things, and I believe you
> mentioned that you saw the problem even with this patch applied? (I'm
> not sure; some of your messages which you sent were hard to
> understand, and you mentioned something about trying to send messages
> when low on sleep :-).
>

Even with little sleep I know what I have written.
YES, it didn't fix my problem but it also did not hurt my system.
I just run Linux Test Project w/o errors which also means nothing.
( Hmmm, I checked-out xfstests... did I compile it :-)? )

> In any case, the reason why I haven't pulled this patch into the ext4
> tree is because I was waiting for Eric and some of the performance
> team folks at Red Hat to supply some additional information about why
> this commit was making a difference in performance for a particular
> proprietary, closed source benchmark.
>

I am not following the linux-ext4/linux-fsdevel MLs so much, but IIRC
this patch is from December 2012.
An identical patch for JBD (approx. from same date) went upstream,
that's why I asked.
But now I know you care of JBD2 stuff :-).
Of course, I leave it to the experts to decide.
Come on Eric, open the pandora box and bomb Ted with logs and traces :-).

> I'm very suspicious about applying patches under the "cargo cult"
> school of programming. ("We don't understand why it makes a
> difference, but it seems to be good, so bombs away!" :-)
>

A good attitude - a man we can trust!

I would liked to see a more positive development on my issue.
But I still see no real reason to throw this WUBI installation away
hitting a "corner" case.
It was done as a temporary installation and made me happy in my daily
working life.
If I will get unhappy I boot into Win7, oops.

- Sedat -

> Regards,
>
> - Ted

2013-01-22 23:50:44

by Jan Kara

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Mon 21-01-13 18:11:30, Ted Tso wrote:
> On Tue, Jan 22, 2013 at 12:04:32AM +0100, Sedat Dilek wrote:
> >
> > Beyond the FUSE/LOOP fun, will you apply this patch to your linux-next GIT tree?
> >
> > Feel free to add...
> >
> > Tested-by: Sedat Dilek <[email protected]>
> >
> > A similiar patch for JBD went through your tree into mainline (see [1] and [2]).
>
> I'm not at all convinced that this patch has anything to do with your
> problem. I don't see how it could affect things, and I believe you
> mentioned that you saw the problem even with this patch applied? (I'm
> not sure; some of your messages which you sent were hard to
> understand, and you mentioned something about trying to send messages
> when low on sleep :-).
>
> In any case, the reason why I haven't pulled this patch into the ext4
> tree is because I was waiting for Eric and some of the performance
> team folks at Red Hat to supply some additional information about why
> this commit was making a difference in performance for a particular
> proprietary, closed source benchmark.
Just a small correction - it was aim7 AFAIK which isn't closed source
(anymore). You can download it from SourceForge
(http://sourceforge.net/projects/aimbench/files/aim-suite7/Initial%20release/).
Now I have some reservations about what the benchmark does but historically
it has found quite a few issues for us as well.

> I'm very suspicious about applying patches under the "cargo cult"
> school of programming. ("We don't understand why it makes a
> difference, but it seems to be good, so bombs away!" :-)
Well, neither am I ;) But it is obvious the patch speeds up
log_start_commit() by 'a bit' (taking spinlock, disabling irqs, ...). And
apparently 'a bit' is noticeable for particular workload on a particular
machine - commit statistics Eric provided showed that clearly. I'd still be
happier if Eric also told us how much log_start_commit() calls there were
so that one could verify that 'a bit' could indeed multiply to a measurable
difference. But given how simple the patch is, I gave away after a while
and just merged it...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-01-23 01:38:33

by Eric Sandeen

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On 1/22/13 5:50 PM, Jan Kara wrote:
> On Mon 21-01-13 18:11:30, Ted Tso wrote:
>> On Tue, Jan 22, 2013 at 12:04:32AM +0100, Sedat Dilek wrote:
>>>
>>> Beyond the FUSE/LOOP fun, will you apply this patch to your linux-next GIT tree?
>>>
>>> Feel free to add...
>>>
>>> Tested-by: Sedat Dilek <[email protected]>
>>>
>>> A similiar patch for JBD went through your tree into mainline (see [1] and [2]).
>>
>> I'm not at all convinced that this patch has anything to do with your
>> problem. I don't see how it could affect things, and I believe you
>> mentioned that you saw the problem even with this patch applied? (I'm
>> not sure; some of your messages which you sent were hard to
>> understand, and you mentioned something about trying to send messages
>> when low on sleep :-).
>>
>> In any case, the reason why I haven't pulled this patch into the ext4
>> tree is because I was waiting for Eric and some of the performance
>> team folks at Red Hat to supply some additional information about why
>> this commit was making a difference in performance for a particular
>> proprietary, closed source benchmark.
> Just a small correction - it was aim7 AFAIK which isn't closed source
> (anymore). You can download it from SourceForge
> (http://sourceforge.net/projects/aimbench/files/aim-suite7/Initial%20release/).
> Now I have some reservations about what the benchmark does but historically
> it has found quite a few issues for us as well.
>
>> I'm very suspicious about applying patches under the "cargo cult"
>> school of programming. ("We don't understand why it makes a
>> difference, but it seems to be good, so bombs away!" :-)
> Well, neither am I ;) But it is obvious the patch speeds up
> log_start_commit() by 'a bit' (taking spinlock, disabling irqs, ...). And
> apparently 'a bit' is noticeable for particular workload on a particular
> machine - commit statistics Eric provided showed that clearly. I'd still be
> happier if Eric also told us how much log_start_commit() calls there were
> so that one could verify that 'a bit' could indeed multiply to a measurable
> difference. But given how simple the patch is, I gave away after a while
> and just merged it...

I am still trying to get our perf guys to collect that data, FWIW...
I will send it when I get it. I bugged them again today. :)

(Just to be sure: I was going to measure the wakeups the old way, and the
avoided wakeups with the new change; sound ok?)

-Eric

> Honza
>


2013-01-23 09:44:21

by Jan Kara

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Tue 22-01-13 19:37:46, Eric Sandeen wrote:
> On 1/22/13 5:50 PM, Jan Kara wrote:
> > On Mon 21-01-13 18:11:30, Ted Tso wrote:
> >> On Tue, Jan 22, 2013 at 12:04:32AM +0100, Sedat Dilek wrote:
> >>>
> >>> Beyond the FUSE/LOOP fun, will you apply this patch to your linux-next GIT tree?
> >>>
> >>> Feel free to add...
> >>>
> >>> Tested-by: Sedat Dilek <[email protected]>
> >>>
> >>> A similiar patch for JBD went through your tree into mainline (see [1] and [2]).
> >>
> >> I'm not at all convinced that this patch has anything to do with your
> >> problem. I don't see how it could affect things, and I believe you
> >> mentioned that you saw the problem even with this patch applied? (I'm
> >> not sure; some of your messages which you sent were hard to
> >> understand, and you mentioned something about trying to send messages
> >> when low on sleep :-).
> >>
> >> In any case, the reason why I haven't pulled this patch into the ext4
> >> tree is because I was waiting for Eric and some of the performance
> >> team folks at Red Hat to supply some additional information about why
> >> this commit was making a difference in performance for a particular
> >> proprietary, closed source benchmark.
> > Just a small correction - it was aim7 AFAIK which isn't closed source
> > (anymore). You can download it from SourceForge
> > (http://sourceforge.net/projects/aimbench/files/aim-suite7/Initial%20release/).
> > Now I have some reservations about what the benchmark does but historically
> > it has found quite a few issues for us as well.
> >
> >> I'm very suspicious about applying patches under the "cargo cult"
> >> school of programming. ("We don't understand why it makes a
> >> difference, but it seems to be good, so bombs away!" :-)
> > Well, neither am I ;) But it is obvious the patch speeds up
> > log_start_commit() by 'a bit' (taking spinlock, disabling irqs, ...). And
> > apparently 'a bit' is noticeable for particular workload on a particular
> > machine - commit statistics Eric provided showed that clearly. I'd still be
> > happier if Eric also told us how much log_start_commit() calls there were
> > so that one could verify that 'a bit' could indeed multiply to a measurable
> > difference. But given how simple the patch is, I gave away after a while
> > and just merged it...
>
> I am still trying to get our perf guys to collect that data, FWIW...
> I will send it when I get it. I bugged them again today. :)
>
> (Just to be sure: I was going to measure the wakeups the old way, and the
> avoided wakeups with the new change; sound ok?)
Yes, that would be what I'm interested in.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-01-23 15:53:24

by Eric Sandeen

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On 1/23/13 3:44 AM, Jan Kara wrote:
> On Tue 22-01-13 19:37:46, Eric Sandeen wrote:
>> On 1/22/13 5:50 PM, Jan Kara wrote:
>>> On Mon 21-01-13 18:11:30, Ted Tso wrote:
>>>> On Tue, Jan 22, 2013 at 12:04:32AM +0100, Sedat Dilek wrote:
>>>>>
>>>>> Beyond the FUSE/LOOP fun, will you apply this patch to your linux-next GIT tree?
>>>>>
>>>>> Feel free to add...
>>>>>
>>>>> Tested-by: Sedat Dilek <[email protected]>
>>>>>
>>>>> A similiar patch for JBD went through your tree into mainline (see [1] and [2]).
>>>>
>>>> I'm not at all convinced that this patch has anything to do with your
>>>> problem. I don't see how it could affect things, and I believe you
>>>> mentioned that you saw the problem even with this patch applied? (I'm
>>>> not sure; some of your messages which you sent were hard to
>>>> understand, and you mentioned something about trying to send messages
>>>> when low on sleep :-).
>>>>
>>>> In any case, the reason why I haven't pulled this patch into the ext4
>>>> tree is because I was waiting for Eric and some of the performance
>>>> team folks at Red Hat to supply some additional information about why
>>>> this commit was making a difference in performance for a particular
>>>> proprietary, closed source benchmark.
>>> Just a small correction - it was aim7 AFAIK which isn't closed source
>>> (anymore). You can download it from SourceForge
>>> (http://sourceforge.net/projects/aimbench/files/aim-suite7/Initial%20release/).
>>> Now I have some reservations about what the benchmark does but historically
>>> it has found quite a few issues for us as well.
>>>
>>>> I'm very suspicious about applying patches under the "cargo cult"
>>>> school of programming. ("We don't understand why it makes a
>>>> difference, but it seems to be good, so bombs away!" :-)
>>> Well, neither am I ;) But it is obvious the patch speeds up
>>> log_start_commit() by 'a bit' (taking spinlock, disabling irqs, ...). And
>>> apparently 'a bit' is noticeable for particular workload on a particular
>>> machine - commit statistics Eric provided showed that clearly. I'd still be
>>> happier if Eric also told us how much log_start_commit() calls there were
>>> so that one could verify that 'a bit' could indeed multiply to a measurable
>>> difference. But given how simple the patch is, I gave away after a while
>>> and just merged it...
>>
>> I am still trying to get our perf guys to collect that data, FWIW...
>> I will send it when I get it. I bugged them again today. :)
>>
>> (Just to be sure: I was going to measure the wakeups the old way, and the
>> avoided wakeups with the new change; sound ok?)
> Yes, that would be what I'm interested in.

Holy cow, this is much more than I expected, but here's what they report:

old JBD: AIM7 jobs/min 97624.39; got 78193 jbd wakeups
new JBD: AIM7 jobs/min 85929.43; got 6306999 jbd wakeups, 6264684 extra wakeups

The "extra wakeups" were hacked in like:

diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index d492d57..3e0c4eb 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -433,15 +433,25 @@ int __log_space_left(journal_t *journal)
return left;
}

+unsigned long jbd_wakeups;
+unsigned long jbd_extra_wakeups;
+
/*
* Called under j_state_lock. Returns true if a transaction commit was started.
*/
int __log_start_commit(journal_t *journal, tid_t target)
{
/*
- * Are we already doing a recent enough commit?
+ * The only transaction we can possibly wait upon is the
+ * currently running transaction (if it exists). Otherwise,
+ * the target tid must be an old one.
*/
- if (!tid_geq(journal->j_commit_request, target)) {
+ if (/* journal->j_commit_request != target && <--- ERS: Undo "fix" */
+ journal->j_running_transaction &&
+ journal->j_running_transaction->t_tid == target) {
+ /* if we already have the right target, this is extra */
+ if (journal->j_commit_request == target)
+ jbd_extra_wakeups++;
/*
* We want a new commit: OK, mark the request and wakup the
* commit thread. We do _not_ do the commit ourselves.
@@ -451,9 +461,17 @@ int __log_start_commit(journal_t *journal, tid_t target)
jbd_debug(1, "JBD: requesting commit %d/%d\n",
journal->j_commit_request,
journal->j_commit_sequence);
+ jbd_wakeups++;
wake_up(&journal->j_wait_commit);
return 1;
- }
+ } else if (!tid_geq(journal->j_commit_request, target))
+ /* This should never happen, but if it does, preserve
+ the evidence before kjournald goes into a loop and
+ increments j_commit_sequence beyond all recognition. */
+ WARN_ONCE(1, "jbd: bad log_start_commit: %u %u %u %u\n",
+ journal->j_commit_request, journal->j_commit_sequence,
+ target, journal->j_running_transaction ?
+ journal->j_running_transaction->t_tid : 0);
return 0;
}

@@ -2039,6 +2057,7 @@ static void __exit journal_exit(void)
if (n)
printk(KERN_EMERG "JBD: leaked %d journal_heads!\n", n);
#endif
+ printk("got %lu jbd wakeups, %lu extra wakeups\n", jbd_wakeups, jbd_extra_wakeups);
jbd_remove_debugfs_entry();
journal_destroy_caches();
}

-Eric

> Honza
>


2013-01-23 19:29:13

by Jan Kara

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Wed 23-01-13 09:20:38, Eric Sandeen wrote:
> On 1/23/13 3:44 AM, Jan Kara wrote:
> > On Tue 22-01-13 19:37:46, Eric Sandeen wrote:
> >> On 1/22/13 5:50 PM, Jan Kara wrote:
> >>> On Mon 21-01-13 18:11:30, Ted Tso wrote:
> >>>> On Tue, Jan 22, 2013 at 12:04:32AM +0100, Sedat Dilek wrote:
> >>>>>
> >>>>> Beyond the FUSE/LOOP fun, will you apply this patch to your linux-next GIT tree?
> >>>>>
> >>>>> Feel free to add...
> >>>>>
> >>>>> Tested-by: Sedat Dilek <[email protected]>
> >>>>>
> >>>>> A similiar patch for JBD went through your tree into mainline (see [1] and [2]).
> >>>>
> >>>> I'm not at all convinced that this patch has anything to do with your
> >>>> problem. I don't see how it could affect things, and I believe you
> >>>> mentioned that you saw the problem even with this patch applied? (I'm
> >>>> not sure; some of your messages which you sent were hard to
> >>>> understand, and you mentioned something about trying to send messages
> >>>> when low on sleep :-).
> >>>>
> >>>> In any case, the reason why I haven't pulled this patch into the ext4
> >>>> tree is because I was waiting for Eric and some of the performance
> >>>> team folks at Red Hat to supply some additional information about why
> >>>> this commit was making a difference in performance for a particular
> >>>> proprietary, closed source benchmark.
> >>> Just a small correction - it was aim7 AFAIK which isn't closed source
> >>> (anymore). You can download it from SourceForge
> >>> (http://sourceforge.net/projects/aimbench/files/aim-suite7/Initial%20release/).
> >>> Now I have some reservations about what the benchmark does but historically
> >>> it has found quite a few issues for us as well.
> >>>
> >>>> I'm very suspicious about applying patches under the "cargo cult"
> >>>> school of programming. ("We don't understand why it makes a
> >>>> difference, but it seems to be good, so bombs away!" :-)
> >>> Well, neither am I ;) But it is obvious the patch speeds up
> >>> log_start_commit() by 'a bit' (taking spinlock, disabling irqs, ...). And
> >>> apparently 'a bit' is noticeable for particular workload on a particular
> >>> machine - commit statistics Eric provided showed that clearly. I'd still be
> >>> happier if Eric also told us how much log_start_commit() calls there were
> >>> so that one could verify that 'a bit' could indeed multiply to a measurable
> >>> difference. But given how simple the patch is, I gave away after a while
> >>> and just merged it...
> >>
> >> I am still trying to get our perf guys to collect that data, FWIW...
> >> I will send it when I get it. I bugged them again today. :)
> >>
> >> (Just to be sure: I was going to measure the wakeups the old way, and the
> >> avoided wakeups with the new change; sound ok?)
> > Yes, that would be what I'm interested in.
>
> Holy cow, this is much more than I expected, but here's what they report:
>
> old JBD: AIM7 jobs/min 97624.39; got 78193 jbd wakeups
> new JBD: AIM7 jobs/min 85929.43; got 6306999 jbd wakeups, 6264684 extra wakeups
Yeah, that's a lot. My guess would be *a lot* of processes are hanging in
start_this_handle() and waiting for transaction commit. Each of them calls
__log_start_commit() and things add up. Thanks for getting these numbers.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-01-30 05:27:05

by Theodore Ts'o

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Wed, Jan 23, 2013 at 08:29:11PM +0100, Jan Kara wrote:
> > old JBD: AIM7 jobs/min 97624.39; got 78193 jbd wakeups
> > new JBD: AIM7 jobs/min 85929.43; got 6306999 jbd wakeups, 6264684 extra wakeups
> Yeah, that's a lot. My guess would be *a lot* of processes are hanging in
> start_this_handle() and waiting for transaction commit. Each of them calls
> __log_start_commit() and things add up. Thanks for getting these numbers.

Yeah, wow. That would imply that there are a huge number of processes
that get hung up in start_this_handle(), and they are waking up the
journal before the kjournald has a chance to set T_LOCKED (since then
they would get blocked earlier in start_this_handle).

Given that, I wonder if the following change would actually help or
hurt things. Eric, would you be willing to ask your perf team to try
testing out these patches?

- Ted

2013-01-30 05:39:38

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 1/2] jbd2: don't wake kjournald unnecessarily

Don't send an extra wakeup to kjournald in the case where we
already have the proper target in j_commit_request, i.e. that
transaction has already been requested for commit.

commit deeeaf13 "jbd2: fix fsync() tid wraparound bug" changed
the logic leading to a wakeup, but it caused some extra wakeups
which were found to lead to a measurable performance regression.

Reported-by: Eric Sandeen <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Cc: Eric Sandeen <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: [email protected]
---

Eric,

I think this might be clearer than the patch which you proposed; what do
you think?

fs/jbd2/journal.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index dbf41f9..1a80e31 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -513,6 +513,10 @@ int __jbd2_log_space_left(journal_t *journal)
*/
int __jbd2_log_start_commit(journal_t *journal, tid_t target)
{
+ /* Return if the txn has already requested to be committed */
+ if (journal->j_commit_request == target)
+ return 0;
+
/*
* The only transaction we can possibly wait upon is the
* currently running transaction (if it exists). Otherwise,
--
1.7.12.rc0.22.gcdd159b


2013-01-30 05:39:36

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH 2/2] jbd2: commit as soon as possible after log_start_commit

Once a transaction has been requested to be committed, don't let any
other handles start under that transaction, and don't allow any
pending transactions to be extended (i.e., in the case of
unlink/ftruncate).

The idea is once the transaction has had log_start_commit() called on
it, at least one thread is blocked waiting for that transaction to
commit, and over time, more and more threads will end up getting
blocked. In order to avoid high variability in file system operations
getting blocked behind the a blocked start_this_handle(), we should
try to get the commit started as soon as possible.

Signed-off-by: "Theodore Ts'o" <[email protected]>
Cc: Eric Sandeen <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: [email protected]
---
fs/jbd2/commit.c | 3 ++-
fs/jbd2/journal.c | 1 +
fs/jbd2/transaction.c | 6 ++++--
include/linux/jbd2.h | 1 +
4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 3091d42..fd2ac94 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -424,7 +424,8 @@ void jbd2_journal_commit_transaction(journal_t *journal)
J_ASSERT(journal->j_committing_transaction == NULL);

commit_transaction = journal->j_running_transaction;
- J_ASSERT(commit_transaction->t_state == T_RUNNING);
+ J_ASSERT(commit_transaction->t_state == T_REQUESTED ||
+ commit_transaction->t_state == T_RUNNING);

trace_jbd2_start_commit(journal, commit_transaction);
jbd_debug(1, "JBD2: starting commit of transaction %d\n",
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 1a80e31..c22773b 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -533,6 +533,7 @@ int __jbd2_log_start_commit(journal_t *journal, tid_t target)
jbd_debug(1, "JBD2: requesting commit %d/%d\n",
journal->j_commit_request,
journal->j_commit_sequence);
+ journal->j_running_transaction->t_state = T_REQUESTED;
wake_up(&journal->j_wait_commit);
return 1;
} else if (!tid_geq(journal->j_commit_request, target))
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index df9f297..6daff29 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -224,7 +224,8 @@ repeat:
* If the current transaction is locked down for commit, wait for the
* lock to be released.
*/
- if (transaction->t_state == T_LOCKED) {
+ if ((transaction->t_state == T_LOCKED) ||
+ (transaction->t_state == T_REQUESTED)) {
DEFINE_WAIT(wait);

prepare_to_wait(&journal->j_wait_transaction_locked,
@@ -2179,7 +2180,8 @@ void __jbd2_journal_refile_buffer(struct journal_head *jh)
else
jlist = BJ_Reserved;
__jbd2_journal_file_buffer(jh, jh->b_transaction, jlist);
- J_ASSERT_JH(jh, jh->b_transaction->t_state == T_RUNNING);
+ J_ASSERT_JH(jh, (jh->b_transaction->t_state == T_RUNNING ||
+ jh->b_transaction->t_state == T_REQUESTED));

if (was_dirty)
set_buffer_jbddirty(bh);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index e30b663..920a8d0 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -493,6 +493,7 @@ struct transaction_s
*/
enum {
T_RUNNING,
+ T_REQUESTED,
T_LOCKED,
T_FLUSH,
T_COMMIT,
--
1.7.12.rc0.22.gcdd159b


2013-01-30 10:07:53

by Jan Kara

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Wed 30-01-13 00:26:58, Ted Tso wrote:
> On Wed, Jan 23, 2013 at 08:29:11PM +0100, Jan Kara wrote:
> > > old JBD: AIM7 jobs/min 97624.39; got 78193 jbd wakeups
> > > new JBD: AIM7 jobs/min 85929.43; got 6306999 jbd wakeups, 6264684 extra wakeups
> > Yeah, that's a lot. My guess would be *a lot* of processes are hanging in
> > start_this_handle() and waiting for transaction commit. Each of them calls
> > __log_start_commit() and things add up. Thanks for getting these numbers.
>
> Yeah, wow. That would imply that there are a huge number of processes
> that get hung up in start_this_handle(), and they are waking up the
> journal before the kjournald has a chance to set T_LOCKED (since then
> they would get blocked earlier in start_this_handle).
>
> Given that, I wonder if the following change would actually help or
> hurt things. Eric, would you be willing to ask your perf team to try
> testing out these patches?
Umm, I don't see anything. Forgot to attach them?

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-01-30 10:29:05

by Sedat Dilek

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Wed, Jan 30, 2013 at 11:07 AM, Jan Kara <[email protected]> wrote:
> On Wed 30-01-13 00:26:58, Ted Tso wrote:
>> On Wed, Jan 23, 2013 at 08:29:11PM +0100, Jan Kara wrote:
>> > > old JBD: AIM7 jobs/min 97624.39; got 78193 jbd wakeups
>> > > new JBD: AIM7 jobs/min 85929.43; got 6306999 jbd wakeups, 6264684 extra wakeups
>> > Yeah, that's a lot. My guess would be *a lot* of processes are hanging in
>> > start_this_handle() and waiting for transaction commit. Each of them calls
>> > __log_start_commit() and things add up. Thanks for getting these numbers.
>>
>> Yeah, wow. That would imply that there are a huge number of processes
>> that get hung up in start_this_handle(), and they are waking up the
>> journal before the kjournald has a chance to set T_LOCKED (since then
>> they would get blocked earlier in start_this_handle).
>>
>> Given that, I wonder if the following change would actually help or
>> hurt things. Eric, would you be willing to ask your perf team to try
>> testing out these patches?
> Umm, I don't see anything. Forgot to attach them?
>

Here I catched the two patches:

http://patchwork.ozlabs.org/patch/216768/
http://patchwork.ozlabs.org/patch/216767/

- Sedat -
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2013-01-30 10:41:24

by Jan Kara

[permalink] [raw]
Subject: Re: jbd2: don't wake kjournald unnecessarily

On Wed 30-01-13 11:29:02, Sedat Dilek wrote:
> On Wed, Jan 30, 2013 at 11:07 AM, Jan Kara <[email protected]> wrote:
> > On Wed 30-01-13 00:26:58, Ted Tso wrote:
> >> On Wed, Jan 23, 2013 at 08:29:11PM +0100, Jan Kara wrote:
> >> > > old JBD: AIM7 jobs/min 97624.39; got 78193 jbd wakeups
> >> > > new JBD: AIM7 jobs/min 85929.43; got 6306999 jbd wakeups, 6264684 extra wakeups
> >> > Yeah, that's a lot. My guess would be *a lot* of processes are hanging in
> >> > start_this_handle() and waiting for transaction commit. Each of them calls
> >> > __log_start_commit() and things add up. Thanks for getting these numbers.
> >>
> >> Yeah, wow. That would imply that there are a huge number of processes
> >> that get hung up in start_this_handle(), and they are waking up the
> >> journal before the kjournald has a chance to set T_LOCKED (since then
> >> they would get blocked earlier in start_this_handle).
> >>
> >> Given that, I wonder if the following change would actually help or
> >> hurt things. Eric, would you be willing to ask your perf team to try
> >> testing out these patches?
> > Umm, I don't see anything. Forgot to attach them?
> >
>
> Here I catched the two patches:
>
> http://patchwork.ozlabs.org/patch/216768/
> http://patchwork.ozlabs.org/patch/216767/
Ah, OK. Thanks for the pointer.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2013-02-02 19:33:45

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH 2/2] jbd2: commit as soon as possible after log_start_commit

On 2013-01-29, at 10:39 PM, Theodore Ts'o wrote:
> Once a transaction has been requested to be committed, don't let any
> other handles start under that transaction, and don't allow any
> pending transactions to be extended (i.e., in the case of
> unlink/ftruncate).
>
> The idea is once the transaction has had log_start_commit() called on
> it, at least one thread is blocked waiting for that transaction to
> commit, and over time, more and more threads will end up getting
> blocked. In order to avoid high variability in file system operations
> getting blocked behind the a blocked start_this_handle(), we should
> try to get the commit started as soon as possible.

I was thinking this could break transaction batching from multiple
threads, which would hurt performance significantly in a multi-
threaded sync-heavy workload.

However, it seems that jbd2_journal_stop() blocks the h_sync thread
before it calls jbd2_log_start_commit(), so it looks like it should
be OK.

You can add my:

Acked-by: Andreas Dilger <[email protected]>

>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> Cc: Eric Sandeen <[email protected]>
> Cc: Sedat Dilek <[email protected]>
> Cc: [email protected]
> ---
> fs/jbd2/commit.c | 3 ++-
> fs/jbd2/journal.c | 1 +
> fs/jbd2/transaction.c | 6 ++++--
> include/linux/jbd2.h | 1 +
> 4 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 3091d42..fd2ac94 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -424,7 +424,8 @@ void jbd2_journal_commit_transaction(journal_t *journal)
> J_ASSERT(journal->j_committing_transaction == NULL);
>
> commit_transaction = journal->j_running_transaction;
> - J_ASSERT(commit_transaction->t_state == T_RUNNING);
> + J_ASSERT(commit_transaction->t_state == T_REQUESTED ||
> + commit_transaction->t_state == T_RUNNING);
>
> trace_jbd2_start_commit(journal, commit_transaction);
> jbd_debug(1, "JBD2: starting commit of transaction %d\n",
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 1a80e31..c22773b 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -533,6 +533,7 @@ int __jbd2_log_start_commit(journal_t *journal, tid_t target)
> jbd_debug(1, "JBD2: requesting commit %d/%d\n",
> journal->j_commit_request,
> journal->j_commit_sequence);
> + journal->j_running_transaction->t_state = T_REQUESTED;
> wake_up(&journal->j_wait_commit);
> return 1;
> } else if (!tid_geq(journal->j_commit_request, target))
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index df9f297..6daff29 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -224,7 +224,8 @@ repeat:
> * If the current transaction is locked down for commit, wait for the
> * lock to be released.
> */
> - if (transaction->t_state == T_LOCKED) {
> + if ((transaction->t_state == T_LOCKED) ||
> + (transaction->t_state == T_REQUESTED)) {
> DEFINE_WAIT(wait);
>
> prepare_to_wait(&journal->j_wait_transaction_locked,
> @@ -2179,7 +2180,8 @@ void __jbd2_journal_refile_buffer(struct journal_head *jh)
> else
> jlist = BJ_Reserved;
> __jbd2_journal_file_buffer(jh, jh->b_transaction, jlist);
> - J_ASSERT_JH(jh, jh->b_transaction->t_state == T_RUNNING);
> + J_ASSERT_JH(jh, (jh->b_transaction->t_state == T_RUNNING ||
> + jh->b_transaction->t_state == T_REQUESTED));
>
> if (was_dirty)
> set_buffer_jbddirty(bh);
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index e30b663..920a8d0 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -493,6 +493,7 @@ struct transaction_s
> */
> enum {
> T_RUNNING,
> + T_REQUESTED,
> T_LOCKED,
> T_FLUSH,
> T_COMMIT,
> --
> 1.7.12.rc0.22.gcdd159b
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html