Date: Mon, 25 Feb 2019 10:40:07 +0530
From: Sahitya Tummala
To: Jan Kara
Cc: tytso@mit.edu, adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org,
    linux-kernel@vger.kernel.org, stummala@codeaurora.org
Subject: Re: huge fsync latencies for a small file on ext4
Message-ID: <20190225051007.GA32651@codeaurora.org>
References: <20190219135302.GB27420@quack2.suse.cz>
In-Reply-To: <20190219135302.GB27420@quack2.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 19, 2019 at 02:53:02PM +0100, Jan Kara wrote:
> Hi!
>
> On Tue 19-02-19 15:50:23, stummala@codeaurora.org wrote:
> > I am observing huge fsync latencies for a small file under the below test
> > scenario -
> >
> > process A -
> > Issue async write of 4GB using dd command (say large_file) on /data mounted
> > with ext4:
> > dd if=/dev/zero of=/data/testfile bs=1M count=4096
> >
> > process B -
> > In parallel another process wrote a small 4KB data to another file
> > (say, small_file) and has issued fsync on this file.
> >
> > Problem -
> > The fsync() on the 4KB file is taking up to ~30sec (worst case latency).
> > This is tested on an eMMC based device.
> >
> > Observations -
> > This happens when the small_file and large_file both are part of the same
> > committing transaction or when the small_file is part of the running
> > transaction while large_file is part of the committing transaction.
> >
> > During the commit of a transaction which includes large_file, the jbd2
> > thread does journal_finish_inode_data_buffers() by calling
> > filemap_fdatawait_keep_errors() on the file's inode address space. While
> > this is happening, if the writeback thread is running in parallel for the
> > large_file, then filemap_fdatawait_keep_errors() could potentially run in
> > a loop over all the pages (up to 4GB of data) and also wait for all the
> > file's data to be written to the disk in the current transaction context
> > itself. At the time of calling journal_finish_inode_data_buffers(), the
> > file size is only 150MB, and by the time filemap_fdatawait_keep_errors()
> > returns, the file size is 4GB and the page index also points to the 4GB
> > file offset in __filemap_fdatawait_range(), indicating that it has
> > scanned and waited for writeback of all the pages up to 4GB and not just
> > 150MB.
>
> Thanks for the detailed analysis! I'm somewhat surprised that the flusher
> is able to submit new batch of pages for writeback faster than
> __filemap_fdatawait_range() is scanning the radix tree but it is certainly
> a possibility.
>
> > Ideally, I think the jbd2 thread should have waited for only the amount of
> > data it has submitted as part of the current transaction and not to wait
> > for the on-going pages that are getting tagged for writeback in parallel
> > in another context.
> > So along these lines, I have tried to use the inode's size at the time of
> > calling journal_finish_inode_data_buffers() as below -
>
> One has to be really careful when using i_size like this. By the time the
> transaction is committing, i_size could have been reduced from the value at
> the time page writeback was issued. And that change will be journalled only
> in the following transaction. So if the system crashes in the wrong moment,
> user could see uninitialized blocks between new_size and old_size after
> journal replay. So I don't think your patch is really correct.
>

Thanks, Jan, for the clarification on the patch. I agree with your comments.

From the discussion at [1], I think the problem being discussed there is the
journal thread waiting for updates to the on-going active transaction to
finish, which causes commit latencies. And I think the proposal is to not
hold any handle while extents are being mapped in ext4_map_blocks(), but to
defer that until the IO is completely done. With the new proposal, since the
inode will be added to transaction->t_inode_list only after the IO is
completed, there will no longer be a need to do
journal_finish_inode_data_buffers() in the journal context, and thus this
problem will also not be observed? Please clarify whether my understanding
is correct.

> Ted has outlined a plan how to get rid of data=ordered limitations [1] and
> thus also this problem. It is quite some work but you're certainly welcome
> to help out :)
>
> 								Honza
>
> [1] https://www.spinics.net/lists/linux-ext4/msg64175.html
>
> >
> > diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> > index 2eb55c3..e86cf67 100644
> > --- a/fs/jbd2/commit.c
> > +++ b/fs/jbd2/commit.c
> > @@ -261,8 +261,8 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
> >  			continue;
> >  		jinode->i_flags |= JI_COMMIT_RUNNING;
> >  		spin_unlock(&journal->j_list_lock);
> > -		err = filemap_fdatawait_keep_errors(
> > -				jinode->i_vfs_inode->i_mapping);
> > +		err = filemap_fdatawait_range(jinode->i_vfs_inode->i_mapping,
> > +				0, i_size_read(jinode->i_vfs_inode->i_mapping->host));
> >  		if (!ret)
> >  			ret = err;
> >  		spin_lock(&journal->j_list_lock);
> >
> > With this, the fsync latencies for small_file have reduced significantly.
> > It took up to ~5sec at most (worst case latency).
> >
> > Although this is seen in a test code, this could potentially impact the
> > phone's performance if any application or main UI thread in Android issues
> > fsync() in foreground while a large data transfer is going on in another
> > context.
> >
> > Request you to share your thoughts and comments on this issue
> > and the fix suggested above.
> >
> > Thanks,
> > Sahitya.
> >
> > --
> > Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
> > Inc.
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
> > Foundation Collaborative Project.
> --
> Jan Kara
> SUSE Labs, CR

--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
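
For anyone who wants to reproduce the "process B" measurement from the
original report, a minimal user-space sketch could look like the one below.
It is only an illustration under stated assumptions, not the test code used
in this thread: the helper name time_small_fsync() and its parameters are
hypothetical. It does a buffered write of a small amount of data and then
times only the fsync() call using CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): write `len` bytes to `path`,
 * fsync() the file, and report how long the fsync() call alone took.
 * Returns 0 on success, -1 on any I/O error.
 */
int time_small_fsync(const char *path, size_t len, double *elapsed_sec)
{
	char buf[4096];
	struct timespec start, end;
	size_t done = 0;
	int fd;

	assert(path != NULL && elapsed_sec != NULL);
	memset(buf, 0xab, sizeof(buf));

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	/* Buffered write: at this point the data only sits in the page cache. */
	while (done < len) {
		size_t chunk = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t n = write(fd, buf, chunk);

		if (n <= 0) {
			close(fd);
			return -1;
		}
		done += (size_t)n;
	}

	/* Time only the fsync(), which is where the stall was reported. */
	clock_gettime(CLOCK_MONOTONIC, &start);
	if (fsync(fd) != 0) {
		close(fd);
		return -1;
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	if (close(fd) != 0)
		return -1;

	*elapsed_sec = (double)(end.tv_sec - start.tv_sec) +
		       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
	return 0;
}
```

Calling time_small_fsync("/data/small_file", 4096, &secs) repeatedly while
the dd from process A is running should show the worst-case latency; the
path here is an assumed placeholder for a file on the same ext4 mount.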