From: Carlos Maiolino Subject: Re: jbd2: don't wake kjournald unnecessarily Date: Mon, 21 Jan 2013 07:02:21 -0500 Message-ID: <20130121120221.GA5789@localhost.localdomain> References: <20130121104733.GE5588@quack.suse.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Sedat Dilek , Eric Sandeen , linux-fsdevel , Ext4 Developers List , "Theodore Ts'o" , LKML , linux-next To: Jan Kara Return-path: Content-Disposition: inline In-Reply-To: <20130121104733.GE5588@quack.suse.cz> Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-ext4.vger.kernel.org On Mon, Jan 21, 2013 at 11:47:33AM +0100, Jan Kara wrote: > On Sun 20-01-13 01:06:40, Sedat Dilek wrote: > > On Sun, Jan 20, 2013 at 12:44 AM, Sedat Dilek wrote: > > > Hi, > > > > > > I and some others hit a similiar problem in Linux-Next > > > (next-20130118), please see [1] and [2]. > > > > > > [3] has a interim analyze of my problems. > > > > > > After suspecting the problem was caused by TTY-NEXT, it turned out to > > > be a JBD2 problem finally. > > > The freezer/pm_test was helpful to hit the issue (Thanks Rafael for the hint!). > > > > > > So, the issue has two faces: TTY and JBD2. > > > [4] gives a list and URLs of the patches I had to apply to have a > > > clean system again. > > > > > > After applying the two TTY patches (without Eric's JBD2-fix!) the > > > call-trace after freezer/pm_test looked like this; > > > > > > [ 433.527986] PM: Syncing filesystems ... done. > > > [ 433.843761] PM: Preparing system for mem sleep > > > [ 436.306002] Freezing user space processes ... > > > [ 456.304956] Freezing of tasks failed after 20.01 seconds (1 tasks > > > refusing to freeze, wq_busy=0): > > > [ 456.305060] Cache I/O D ffffffff8180d780 0 2132 1 0x00000004 > > > [ 456.305065] ffff88007b9dfe18 0000000000000046 ffff88007b9dfdc8 > > > 0000000300000001 > > > [ 456.305069] ffff880097e21720 ffff88007b9dffd8 ffff88007b9dffd8 > > > ffff88007b9dffd8 > > > [ 456.305072] ffff880119b32e40 ffff880097e21720 ffff88007b9dfe28 > > > ffff880118077800 > > > [ 456.305076] Call Trace: > > > [ 456.305085] [] schedule+0x29/0x70 > > > [ 456.305089] [] jbd2_log_wait_commit+0xcd/0x1a0 > > > [ 456.305094] [] ? add_wait_queue+0x60/0x60 > > > [ 456.305098] [] ext4_sync_file+0x205/0x380 > > > [ 456.305103] [] do_fsync+0x5d/0x90 > > > [ 456.305108] [] ? sys_write+0x6b/0xa0 > > > [ 456.305111] [] sys_fsync+0x10/0x20 > > > [ 456.305114] [] system_call_fastpath+0x1a/0x1f > > > [ 456.305122] > > > [ 456.305124] Restarting tasks ... done. > > > [ 456.315056] video LNXVIDEO:00: Restoring backlight state > > > > > > After applying Eric's patch [5], I could not hit the call-trace again. > > > NOTE: The patch is from December 2012 and is not shipped in latest Linux-Next. > > > > > > The attached testcase script was helpful to force the call-trace. > > > I have run 50 loops of it w/o an issue! > > > > > > Feel free to add a Reported-by/Tested-by. > > > ( The issue kept me busy for the last days. ) > > > > > > Regards, > > > - Sedat - > > > > > > [1] http://marc.info/?t=135283664600002&r=1&w=2 > > > [2] http://marc.info/?t=135862023700004&r=1&w=2 > > > [3] http://marc.info/?l=linux-kernel&m=135862010419101&w=2 > > > [4] http://marc.info/?l=linux-next&m=135863550923093&w=2 > > > [5] http://patchwork.ozlabs.org/patch/207237/ > > > > Hi all, > > > > this is really ugly... I could cry! > > > > GRRR, I have hit the original issue again while running my testcase script! > > > > [ 363.831226] PM: Syncing filesystems ... done. > > [ 363.988855] PM: Preparing system for mem sleep > > [ 369.032965] Freezing user space processes ... (elapsed 0.11 seconds) done. > > [ 369.145792] Freezing remaining freezable tasks ... > > [ 389.137643] Freezing of tasks failed after 20.01 seconds (1 tasks > > refusing to freeze, wq_busy=0): > > [ 389.137760] jbd2/loop0-8 D ffffffff8180d780 0 295 2 0x00000000 > > [ 389.137769] ffff8801181dfb68 0000000000000046 ffff880117de5c80 > > 0000000000000001 > > [ 389.137778] ffff880117de5c80 ffff8801181dffd8 ffff8801181dffd8 > > ffff8801181dffd8 > > [ 389.137784] ffffffff81c15440 ffff880117de5c80 ffff8801181dfb68 > > ffff88011fa14738 > > [ 389.137791] Call Trace: > > [ 389.137810] [] ? __wait_on_buffer+0x30/0x30 > > [ 389.137819] [] schedule+0x29/0x70 > > [ 389.137825] [] io_schedule+0x8f/0xd0 > > [ 389.137832] [] sleep_on_buffer+0xe/0x20 > > [ 389.137842] [] __wait_on_bit+0x5f/0x90 > > [ 389.137849] [] ? submit_bh+0x121/0x1e0 > > [ 389.137856] [] ? __wait_on_buffer+0x30/0x30 > > [ 389.137864] [] out_of_line_wait_on_bit+0x7c/0x90 > > [ 389.137873] [] ? autoremove_wake_function+0x40/0x40 > > [ 389.137879] [] __wait_on_buffer+0x2e/0x30 > > [ 389.137891] [] > > jbd2_journal_commit_transaction+0x18cc/0x1d60 > > [ 389.137899] [] ? _raw_spin_lock_irqsave+0x2e/0x40 > > [ 389.137908] [] ? try_to_del_timer_sync+0x4f/0x70 > > [ 389.137915] [] kjournald2+0xd6/0x3e0 > > [ 389.137921] [] ? add_wait_queue+0x60/0x60 > > [ 389.137926] [] ? commit_timeout+0x10/0x10 > > [ 389.137932] [] kthread+0xc0/0xd0 > > [ 389.137939] [] ? flush_kthread_worker+0xb0/0xb0 > > [ 389.137946] [] ret_from_fork+0x7c/0xb0 > > [ 389.137951] [] ? flush_kthread_worker+0xb0/0xb0 > > [ 389.138017] > > [ 389.138021] Restarting kernel threads ... done. > > [ 389.138173] Restarting tasks ... done. > > [ 389.147980] video LNXVIDEO:00: Restoring backlight state > > > > I suspect there are still issues in JBD2 (BTW I run here EXT4FS). > > > > [ 389.137643] Freezing of tasks failed after 20.01 seconds (1 tasks > > refusing to freeze, wq_busy=0): > > [ 389.137760] jbd2/loop0-8 D ffffffff8180d780 0 295 2 0x00000000 > > > > Have a good night, /me -> watching TV! > The traces don't suggest an ext4/jbd2 problem. What is happening is that > jbd2 is waiting for IO to finish and that never happens. Seeing that you > loop device I'd think it's some interaction of the loop device and > freezing. Can you reproduce the issue without the loop device (i.e. with > the filesystem directly on e.g. scsi disk)? I suspect the reason is > something like that the backing filesystem is already frozen so filesystem > on top of it cannot write all the data and hangs waiting for IO -> suspend > doesn't happen. Contents of /proc/mounts and losetup -l would help us > understand what's going on. ^^^ Shouldn't be `losetup -a`? > > Honza > -- > Jan Kara > SUSE Labs, CR > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Carlos