Date: Tue, 20 Sep 2022 22:07:37 -0400
From: "Theodore Ts'o"
To: "Mohamed Abuelfotoh, Hazem"
Cc: "linux-ext4@vger.kernel.org", "adilger.kernel@dilger.ca", "regressions@lists.linux.dev", "stable@vger.kernel.org"
Subject: Re: Ext4: Buffered random writes performance regression with dioread_nolock enabled
In-Reply-To: <28460B7B-F66E-4BDC-9F6E-B7E77A3FEE83@amazon.com>
X-Mailing-List: linux-ext4@vger.kernel.org

On Mon, Sep 19, 2022 at 03:06:46PM +0000, Mohamed Abuelfotoh, Hazem wrote:
> Hey Team,
>
> * I am sending this e-mail to report a performance regression that’s caused by commit 244adf6426 (ext4: make dioread_nolock the default), I am listing the performance regression symptoms below & our analysis for the reported regression.

Performance regressions are always tricky; dioread_nolock improves performance on some workloads and can cause regressions for others. In this particular case, the choice to make it the default was to also fix a direct I/O vs. writeback race which can result in stale data being revealed (which is a security issue).

That being said...

1) as you've noted, this commit has been around since 5.6.

2) as you noted, increasing the ext4 journal size from 128 MiB to 1 GiB will also fix the problem.

Since 2016, commit bbd2f78cf63a ("libext2fs: allow the default journal size to go as large as a gigabyte") has been in e2fsprogs v1.43.2 and newer (the current version of e2fsprogs is v1.46.5; v1.43.2 was released in September 2016, six years ago). Quoting the commit description:

    Recent research has shown that for a metadata-heavy workload, a 128 MB journal can be a bottleneck on HDDs, and that the optimal journal size is proportional to the number of unique metadata blocks that can be modified (and written into the journal) in a 30 second window. One gigabyte should be sufficient for most workloads, which will be used for file systems larger than 128 gigabytes.

So this should not be a problem in practice, and if there are users who are using antediluvian versions of e2fsprogs, or who have old file systems which were created many years ago, it's quite easy to adjust the journal size. For example, to adjust the journal to be 2 GiB (2048 MiB), just run the commands:

	tune2fs -O ^has_journal /dev/sdXX
	tune2fs -O has_journal -J size=2048 /dev/sdXX

Hence, I disagree that we should revert commit 244adf6426. It may be that for your workload and your file system configuration, using the mount option nodioread_nolock (or dioread_lock) may make sense. But there were also workloads for which using dioread_nolock improved benchmark numbers, so the question of which is the better default is not at all obvious.

That being said, I do have plans for a new writeback scheme which will replace dioread_nolock *and* dioread_lock, and which will hopefully be faster than either approach.

						- Ted

P.S. I'm puzzled by your comment, "we have to note that this should be only beneficial with extent-based files" --- while this is true, why does this matter? Unless you're dealing with an ancient file system that was originally created as ext2 or ext3 and then converted to ext4, *all* ext4 files should be extent-based...
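
The sizing rule quoted from the commit description above (journal size proportional to the number of unique metadata blocks modified in a 30 second window) can be turned into rough arithmetic. The sketch below is an illustration only: the helper name estimate_journal_mib is hypothetical, the proportionality constant of 1 is an assumption that gives a lower bound rather than a recommendation, and the block count would have to be measured for the workload in question.

```shell
#!/bin/sh
# Hypothetical helper, not part of e2fsprogs.
# Usage: estimate_journal_mib BLOCKS_PER_WINDOW [BLOCK_SIZE_BYTES]
# Prints the number of MiB needed to hold BLOCKS_PER_WINDOW metadata
# blocks, rounded up. Assumes a proportionality constant of 1.
estimate_journal_mib() {
    blocks_per_window=$1
    block_size=${2:-4096}    # assumed default: 4 KiB file system blocks
    echo $(( (blocks_per_window * block_size + 1048575) / 1048576 ))
}

# 262144 unique 4 KiB metadata blocks per 30 second window is 1 GiB.
estimate_journal_mib 262144    # prints 1024

# The result could then be passed to the second command from the mail
# (as root, on an unmounted file system; /dev/sdXX is a placeholder):
#   tune2fs -O has_journal -J size="$(estimate_journal_mib 262144)" /dev/sdXX
```

Before resizing anything, the current journal parameters can be inspected with something like `dumpe2fs -h /dev/sdXX | grep -i journal`; the exact wording of the size line differs between e2fsprogs versions.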