From: Sandeep Joshi
To: Jan Kara
Cc: linux-ext4@vger.kernel.org
Subject: Re: process hangs in ext4_sync_file
Date: Tue, 29 Oct 2013 20:30:19 +0530
References: <20131023102042.GE1275@quack.suse.cz> <20131029144649.GB1890@quack.suse.cz>
In-Reply-To: <20131029144649.GB1890@quack.suse.cz>

On Tue, Oct 29, 2013 at 8:16 PM, Jan Kara wrote:
> On Tue 29-10-13 11:00:25, Sandeep Joshi wrote:
>> On Wed, Oct 23, 2013 at 8:28 PM, Sandeep Joshi wrote:
>> > On Wed, Oct 23, 2013 at 3:50 PM, Jan Kara wrote:
>> > > On Mon 21-10-13 18:09:02, Sandeep Joshi wrote:
>> > >> I am seeing a problem reported 4 years earlier
>> > >> https://lkml.org/lkml/2009/3/12/226
>> > >> (same stack as seen by Alexander)
>> > >>
>> > >> The problem is reproducible. Let me know if you need any info in
>> > >> addition to that seen below.
>> > >>
>> > >> I have multiple threads in a process doing heavy IO on an ext4
>> > >> filesystem mounted with (discard, noatime) on an SSD or HDD.
>> > >>
>> > >> This is on Linux 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14
>> > >> 16:19:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>> > >>
>> > >> For up to minutes at a time, one of the threads seems to hang in
>> > >> sync to disk.
>> > >>
>> > >> When I check the thread stack in /proc, I find that the stack is one
>> > >> of the following two
>> > >>
>> > >> [] sleep_on_page+0xe/0x20
>> > >> [] wait_on_page_bit+0x78/0x80
>> > >> [] filemap_fdatawait_range+0x10c/0x1a0
>> > >> [] filemap_write_and_wait_range+0x68/0x80
>> > >> [] ext4_sync_file+0x6f/0x2b0
>> > >> [] vfs_fsync+0x2b/0x40
>> > >> [] sys_msync+0x143/0x1d0
>> > >> [] system_call_fastpath+0x1a/0x1f
>> > >> [] 0xffffffffffffffff
>> > >>
>> > >>
>> > >> OR
>> > >>
>> > >>
>> > >> [] jbd2_log_wait_commit+0xb5/0x130
>> > >> [] jbd2_complete_transaction+0x53/0x90
>> > >> [] ext4_sync_file+0x1ed/0x2b0
>> > >> [] vfs_fsync+0x2b/0x40
>> > >> [] sys_msync+0x143/0x1d0
>> > >> [] system_call_fastpath+0x1a/0x1f
>> > >> [] 0xffffffffffffffff
>> > >>
>> > >> Any clues?
>> > > We are waiting for IO to complete. As the first thing, try to remount
>> > > your filesystem without 'discard' mount option. That is often causing
>> > > problems.
>> > >
>> > > Honza
>> >
>>
>>
>> Update: I removed the "discard" option as suggested and I don't see
>> processes hanging in ext4_sync_file anymore. I also tried ext2 and no
>> problems there either.
>>
>> So isn't the "discard" option recommended for SSDs? Is this a known
>> problem with ext4?
> No, it isn't really recommended for ordinary SSDs. If you have one of
> those fancy PCIe attached SSDs, 'discard' option might be useful for you
> but for usual SATA attached ones it's usually a disaster. There you might
> be better off running 'fstrim' command once a week or something like that.
>
> Honza

Could you briefly point out what problematic code paths come into play
when the "discard" option is enabled? I want to read the code to
understand the problem better.

--Sandeep

> --
> Jan Kara
> SUSE Labs, CR
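
For anyone who wants to check whether they are affected by the same setup, here is a rough sketch (not from the thread itself) that lists filesystems mounted with the 'discard' option. It assumes the usual /proc/mounts layout of "device mountpoint fstype options dump pass"; the function name discard_mounts is made up for illustration.

```python
def discard_mounts(mounts_text, fstype="ext4"):
    """Return mount points of `fstype` filesystems mounted with 'discard'.

    `mounts_text` is expected to look like the contents of /proc/mounts:
    one mount per line, whitespace-separated fields, options comma-separated.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs, options = fields[:4]
        # Compare whole option names so that e.g. 'nodiscard' does not match.
        if fs == fstype and "discard" in options.split(","):
            found.append(mount_point)
    return found
```

On a Linux box this could be called as discard_mounts(open("/proc/mounts").read()). For any mount point it reports, the suggestion in the thread is to remount without 'discard' and run 'fstrim' periodically instead.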