From: Jan Kara
Subject: Re: process hangs in ext4_sync_file
Date: Tue, 29 Oct 2013 15:46:49 +0100
Message-ID: <20131029144649.GB1890@quack.suse.cz>
References: <20131023102042.GE1275@quack.suse.cz>
To: Sandeep Joshi
Cc: Jan Kara, linux-ext4@vger.kernel.org

On Tue 29-10-13 11:00:25, Sandeep Joshi wrote:
> On Wed, Oct 23, 2013 at 8:28 PM, Sandeep Joshi wrote:
> > On Wed, Oct 23, 2013 at 3:50 PM, Jan Kara wrote:
> > > On Mon 21-10-13 18:09:02, Sandeep Joshi wrote:
> > >> I am seeing a problem reported 4 years earlier
> > >> https://lkml.org/lkml/2009/3/12/226
> > >> (same stack as seen by Alexander)
> > >>
> > >> The problem is reproducible. Let me know if you need any info in
> > >> addition to that seen below.
> > >>
> > >> I have multiple threads in a process doing heavy IO on an ext4
> > >> filesystem mounted with (discard, noatime) on an SSD or HDD.
> > >>
> > >> This is on Linux 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14
> > >> 16:19:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
> > >>
> > >> For up to minutes at a time, one of the threads seems to hang in sync
> > >> to disk.
> > >>
> > >> When I check the thread stack in /proc, I find that the stack is one
> > >> of the following two:
> > >>
> > >> [] sleep_on_page+0xe/0x20
> > >> [] wait_on_page_bit+0x78/0x80
> > >> [] filemap_fdatawait_range+0x10c/0x1a0
> > >> [] filemap_write_and_wait_range+0x68/0x80
> > >> [] ext4_sync_file+0x6f/0x2b0
> > >> [] vfs_fsync+0x2b/0x40
> > >> [] sys_msync+0x143/0x1d0
> > >> [] system_call_fastpath+0x1a/0x1f
> > >> [] 0xffffffffffffffff
> > >>
> > >>
> > >> OR
> > >>
> > >>
> > >> [] jbd2_log_wait_commit+0xb5/0x130
> > >> [] jbd2_complete_transaction+0x53/0x90
> > >> [] ext4_sync_file+0x1ed/0x2b0
> > >> [] vfs_fsync+0x2b/0x40
> > >> [] sys_msync+0x143/0x1d0
> > >> [] system_call_fastpath+0x1a/0x1f
> > >> [] 0xffffffffffffffff
> > >>
> > >> Any clues?
> > >   We are waiting for IO to complete. As a first step, try to remount
> > > your filesystem without the 'discard' mount option. That option often
> > > causes problems.
> > >
> > > Honza
> >
> Update: I removed the "discard" option as suggested and I don't see
> processes hanging in ext4_sync_file anymore. I also tried ext2 and saw
> no problems there either.
>
> So isn't the "discard" option recommended for SSDs? Is this a known
> problem with ext4?
  No, it isn't really recommended for ordinary SSDs. If you have one of
those fancy PCIe-attached SSDs, the 'discard' option might be useful for
you, but for the usual SATA-attached ones it's usually a disaster. There
you are probably better off running the 'fstrim' command once a week or
so.

								Honza
-- 
Jan Kara
SUSE Labs, CR