Message-ID: <03d52010-06a8-9bff-0565-b698b48850a9@amazon.com>
Date: Wed, 11 Jan 2023 18:06:39 -0800
From: "Bhatnagar, Rishabh"
Subject: EXT4 IOPS degradation in 5.10 compared to 5.4
X-Mailing-List: linux-ext4@vger.kernel.org

Hi Theodore/Jan,

Please ignore the previous mail, which wasn't formatted well.

We have been seeing a consistent ~3% degradation in IOPS score between the 5.4 and 5.10 stable kernels while running fio tests. I'm running the test case on m6g.8xlarge (arm64) AWS instances. The test involves:

1. Creating a 100GB io1 volume with 500 provisioned IOPS and attaching it to the instance.

2. Setting up and mounting the filesystem:

   mke2fs -m 1 -t ext4 -b 4096 -L /mnt /dev/nvme1n1
   mount -t ext4 -o noatime,nodiratime,data=ordered /dev/nvme1n1 /mnt

3. Installing the fio package and running the following test (16 threads doing random buffered 16KB writes to a single file; ioengine=psync, runtime=60s):

   jobs=16
   blocksize="16k"
   filesize=1000000
   if [[ -n $1 ]]; then jobs=$1; fi
   if [[ -n $2 ]]; then blocksize=$2; fi
   /usr/bin/fio --name=fio-test --directory=/mnt --rw=randwrite --ioengine=psync --buffered=1 --bs=${blocksize} \
           --max-jobs=${jobs} --numjobs=${jobs} --runtime=30 --thread \
           --filename=file0 --filesize=${filesize} \
           --fsync=1 --group_reporting --create_only=1 > /dev/null
   sudo echo 1 > /proc/sys/vm/drop_caches
   set -x
   echo "Running with jobs=${jobs} filesize=${filesize} blocksize=${blocksize}"
   /usr/bin/fio --name=fio-test --directory=/mnt --rw=randwrite --ioengine=psync --buffered=1 --bs=${blocksize} \
           --max-jobs=${jobs} --numjobs=${jobs} --runtime=60 --thread \
           --filename=file0 --filesize=${filesize} \
           --fsync=1 --group_reporting --time_based

After doing some kernel bisecting between 5.4 and 5.10 we were able to pinpoint the commit that drops the IOPS score by 10~15 points (~3%):

ext4: avoid unnecessary transaction starts during writeback
(6b8ed62008a49751fc71fefd2a4f89202a7c2d4d)
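For reference, the bisect/revert flow was roughly the following (the tree path and tag names are illustrative, not the exact commands I ran):

   # isolate the offending commit over the range where the regression appeared
   cd linux-stable
   git bisect start
   git bisect bad v5.10        # regression reproduces here
   git bisect good v5.4        # known-good baseline
   # build/boot each bisect point, run the fio test above, and mark it
   # good or bad until git converges on a single commit

   # re-test 5.10.162 with that commit reverted
   git checkout v5.10.162
   git revert 6b8ed62008a49751fc71fefd2a4f89202a7c2d4d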
We see higher IOPS/bandwidth/total IO after reverting the commit compared to the base 5.10 kernel. Although the average clat is higher after reverting the commit, the higher bandwidth drives the IOPS score up, as seen in the fio output below.

Fio output (5.10.162):

  write: io=431280KB, bw=7186.3KB/s, iops=449, runt= 60015msec
    clat (usec): min=6, max=25942, avg=267.76, stdev=1604.25
     lat (usec): min=6, max=25943, avg=267.93, stdev=1604.25
    clat percentiles (usec):
     |  1.00th=[    9],  5.00th=[   10], 10.00th=[   16], 20.00th=[   24],
     | 30.00th=[   34], 40.00th=[   45], 50.00th=[   58], 60.00th=[   70],
     | 70.00th=[   81], 80.00th=[   94], 90.00th=[  107], 95.00th=[  114],
     | 99.00th=[10048], 99.50th=[14016], 99.90th=[20096], 99.95th=[21888],
     | 99.99th=[24448]
    lat (usec) : 10=3.46%, 20=12.54%, 50=26.66%, 100=41.16%, 250=13.64%
    lat (usec) : 500=0.02%, 750=0.03%, 1000=0.01%
    lat (msec) : 2=0.23%, 4=0.50%, 10=0.73%, 20=0.91%, 50=0.12%
  cpu          : usr=0.02%, sys=0.42%, ctx=299540, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=26955/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=431280KB, aggrb=7186KB/s, minb=7186KB/s, maxb=7186KB/s, mint=60015msec, maxt=60015msec

Disk stats (read/write):
  nvme1n1: ios=0/30627, merge=0/2125, ticks=0/410990, in_queue=410990, util=99.94%

Fio output (5.10.162 with revert):

  write: io=441920KB, bw=7363.7KB/s, iops=460, runt= 60014msec
    clat (usec): min=6, max=35768, avg=289.09, stdev=1736.62
     lat (usec): min=6, max=35768, avg=289.28, stdev=1736.62
    clat percentiles (usec):
     |  1.00th=[    8],  5.00th=[   10], 10.00th=[   16], 20.00th=[   24],
     | 30.00th=[   36], 40.00th=[   46], 50.00th=[   59], 60.00th=[   71],
     | 70.00th=[   83], 80.00th=[   97], 90.00th=[  110], 95.00th=[  117],
     | 99.00th=[10048], 99.50th=[14144], 99.90th=[21632], 99.95th=[25984],
     | 99.99th=[28288]
    lat (usec) : 10=4.13%, 20=11.67%, 50=26.59%, 100=39.57%, 250=15.28%
    lat (usec) : 500=0.03%, 750=0.03%, 1000=0.03%
    lat (msec) : 2=0.20%, 4=0.64%, 10=0.80%, 20=0.86%, 50=0.18%
  cpu          : usr=0.01%, sys=0.43%, ctx=313909, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=27620/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=441920KB, aggrb=7363KB/s, minb=7363KB/s, maxb=7363KB/s, mint=60014msec, maxt=60014msec

Disk stats (read/write):
  nvme1n1: ios=0/31549, merge=0/2348, ticks=0/409221, in_queue=409221, util=99.88%

I also looked at ext4_writepages latency, which increases when the commit is reverted. (This makes sense, since the commit avoids unnecessary transaction starts.)

./funclatency ext4_writepages  --> (5.10.162)
avg = 7734912 nsecs, total: 134131121171 nsecs, count: 17341

./funclatency ext4_writepages  --> (5.10.162 with revert)
avg = 9036068 nsecs, total: 168956404886 nsecs, count: 18698
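(funclatency above is the bcc tool. If it helps to reproduce the measurement without bcc, a roughly equivalent bpftrace one-liner is sketched below; the only assumption is the kprobe/kretprobe attachment to ext4_writepages.)

   # per-call ext4_writepages latency histogram, in microseconds
   bpftrace -e 'kprobe:ext4_writepages { @start[tid] = nsecs; }
                kretprobe:ext4_writepages /@start[tid]/ {
                    @usecs = hist((nsecs - @start[tid]) / 1000);
                    delete(@start[tid]);
                }'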
Looking at the journal transaction data, I can see that the average transaction commit time decreases after reverting the commit. This probably helps the IOPS score.

5.10.162:

cat /proc/fs/jbd2/nvme1n1-8/info   --> after 1st test iteration
2143 transactions (2143 requested), each up to 8192 blocks
average:
  0ms waiting for transaction
  0ms request delay
  0ms running transaction
  0ms transaction was being locked
  0ms flushing data (in ordered mode)
  20ms logging transaction
  20731us average transaction commit time
  51 handles per transaction
  1 blocks per transaction
  3 logged blocks per transaction

cat /proc/fs/jbd2/nvme1n1-8/info   --> after 2nd test iteration
4292 transactions (4292 requested), each up to 8192 blocks
average:
  0ms waiting for transaction
  0ms request delay
  0ms running transaction
  0ms transaction was being locked
  0ms flushing data (in ordered mode)
  20ms logging transaction
  26470us average transaction commit time
  51 handles per transaction
  1 blocks per transaction
  3 logged blocks per transaction

5.10.162 with revert:

cat /proc/fs/jbd2/nvme1n1-8/info   --> after 1st test iteration
2092 transactions (2091 requested), each up to 8192 blocks
average:
  0ms waiting for transaction
  0ms request delay
  20ms running transaction
  0ms transaction was being locked
  0ms flushing data (in ordered mode)
  20ms logging transaction
  15981us average transaction commit time
  67 handles per transaction
  1 blocks per transaction
  2 logged blocks per transaction

cat /proc/fs/jbd2/nvme1n1-8/info   --> after 2nd test iteration
4263 transactions (4262 requested), each up to 8192 blocks
average:
  0ms waiting for transaction
  0ms request delay
  10ms running transaction
  0ms transaction was being locked
  0ms flushing data (in ordered mode)
  20ms logging transaction
  19795us average transaction commit time
  65 handles per transaction
  1 blocks per transaction
  2 logged blocks per transaction

Looking at the commit, it does avoid unnecessary journal transaction starts, and that is reflected in the lower ext4_writepages latency. But the average transaction commit time seems to increase with the commit, leading to reduced IOPS (at least that's my theory).

Can you help look into why this commit introduces this IOPS regression? Any suggestions on further tests to isolate the issue are also welcome.

Thanks,
Rishabh