Received: by 2002:a89:48b:0:b0:1f5:f2ab:c469 with SMTP id a11csp1113070lqd; Thu, 25 Apr 2024 06:30:53 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCV2etUbSWgz5zBtMiQ4ykv+qndAAmxUZZJc/GFWwZiiNBmujbZKeQtGavNLPQIfN07qcDKSzA3Uth71EbGpL98K4PKG6IOeJbRSDT50UQ== X-Google-Smtp-Source: AGHT+IFMvSfuxUvZjDS7znqwg2K5PtwtwhIGewcJCcFAVlGlcONWlobLn+tq/853/wobv3Kqnt8L X-Received: by 2002:a05:6870:e409:b0:239:4af6:fa0f with SMTP id n9-20020a056870e40900b002394af6fa0fmr6004192oag.24.1714051852775; Thu, 25 Apr 2024 06:30:52 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1714051852; cv=pass; d=google.com; s=arc-20160816; b=NvbigFELym46aL4OYLuj1+XKusT+5yFCoYAyiRH1Rsi7JeoUncrKbYPNz1PfkGzbXf taptUq8i6jJsuDQ3DiCo79KiG9l2TatnX0bVSwrGnN1H+kpAflYhsmiRZOHSOdHw4erq S1/FD7LbSpysNTAP6khnvbGyabxvmvN2821kI7xcm+RdHWav3MBVaLBzOSyZnaqADS/k V0Zb86LA5cUu7q5dqkoLxLyVzgg4O/C5+pKIFXKJGBOZ2DN2cchXUGbEGZW+X/CjCK5j WWY/iXV8B0jQoAVS/T6E/kC7M1vmZvo8GCGDEc4xIDZ8cGim7D5d0NEVKQY8ZtZdudyz aJwA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=QRED1FXDeH08jKI4Y85H1+m9zHcF9yp/Q4DUuv7pi/8=; fh=qZvDjT6QMED4PXLKjHOVR3KspqsE8j2lmpftj/fyzl4=; b=RxWLhoskBlvlq45yK5simE/U2mO7THWm2CRBOYXf7BM2k7F5nmE7M7Oe3tv3POhUBU TreDySYgW4HbK02jSA7cUGY0sQ2VU9RgnB3thrpTm3P4B+h2v/ZylJdvmFeaMgfI5NjS Uf7a28e92CKbcgsb/gCW7X6vMsmc17lVszVJNAYnraxZmnOtlZAvxPCB4uWnLS23SPCi vBGRjvOi2kTs37is0SUL3aMmBSBjNEzqy4tRmAt6T4LTEXtKXOPENdcW9Kpkrr9JGCSa bIvC/XKYBNbN0JCVqf0eY6tGiXJnig3qU5C06boaN3MkbTKZgt2gI2HoNzxd8rCPuZih l5xQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=Sbv6z1kR; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-ext4+bounces-2187-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-ext4+bounces-2187-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id c16-20020ac87d90000000b0043a1dcadf7dsi3775425qtd.768.2024.04.25.06.30.52 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 25 Apr 2024 06:30:52 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-ext4+bounces-2187-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=Sbv6z1kR; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-ext4+bounces-2187-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-ext4+bounces-2187-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 6DF4B1C21BF1 for ; Thu, 25 Apr 2024 13:30:52 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 4BB4614A619; Thu, 25 Apr 2024 13:29:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Sbv6z1kR" X-Original-To: linux-ext4@vger.kernel.org Received: from mail-pf1-f169.google.com (mail-pf1-f169.google.com [209.85.210.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 16FE6149C5E; Thu, 25 Apr 2024 13:29:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.169 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714051771; cv=none; b=IgzavEUF3eTFnrpUX5JLBRsrMl1MuvMhXTf0a6DGPOjmQJMnKQHlv/LIfoebnVWj46LfVAH84HZRL75J4+ruFEbAZuKMiDQSOTw4o3wA4bq14LF9XPCFD4okikhyqRzFCyYbi95oLKliV6My9Ey8jVpI3lNcQojA0ZR9Ho3XWDU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714051771; c=relaxed/simple; bh=mXwhomWyRGbw6y94ChOUxJcx7nd/wj3RuraEiiLl0WQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kEjGKrBotMB6xpGgAQpNx0ICmQ65vkF/1JNI8PYscSALSzqsFoffXP51fNA+QOhSbW1BVvKFqD/E54AD76ipYdtdUhJYGAY5bflLLWtzVH7UGH0QQOxGJ4YZVuNmIQ92/9piV4YL0h9NUD/IMxaaWEms1i0XcjGXIA+PRdr5ZQE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Sbv6z1kR; arc=none smtp.client-ip=209.85.210.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-pf1-f169.google.com with SMTP id d2e1a72fcca58-6f043f9e6d7so1044409b3a.3; Thu, 25 Apr 2024 06:29:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1714051768; x=1714656568; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=QRED1FXDeH08jKI4Y85H1+m9zHcF9yp/Q4DUuv7pi/8=; b=Sbv6z1kR5nZpnFP5sCwuN36E6Ct1UmgXrQNAo5HokBABhq8IfDToZUIZmL9gbw12Vy 9MbLKNAOUydfFj3iVMQ5oa894fJ4adL4lDTa+bkcKU/Jq7ww9u1m+ZBfa0LUF5qmcwjj nGMtv/+D0DIi6KGUJPcCXEClSqWxTUgGDbHAGIQG1bQ3FAcdyVeR2Nh5BOIAHwLSgrNi HJMZ9n79TFCHOc9NkkcwxtRG7Oeg7wqOmgf0ah3edMDiGQ/9R2+Mc/aoNKgOvvS/45yj Zu79iYnm0qK9OAPYJhMgAsLae6DPPD9yTCfGy6GS7QqmGfSJmWqwx6g36cpOvEnLvP3P frfg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714051768; x=1714656568; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=QRED1FXDeH08jKI4Y85H1+m9zHcF9yp/Q4DUuv7pi/8=; b=ES6uAIy0g6gNXERMGG0uUNZm2wD+/2jhVa8reT82A021GZu7+q4E96MDRecLpjTGfc B8zEZQtStL9zQMHlkR7MAoRn2WUVyOGZI/B6MQSQiGpiJsUOg8M4QnPAzmaR5kZiZOjV vKLIbidnzidgXlm8Ih1AQMErmINP+wjVTI9CwsbnUj+4fwJY666PoPHkOI3Bx5+RG8TL ZdWDWvZ9PqVHOiRbsAwcnBs4QRb9ZpCL97cTifvQlr6kuMNBGFSGmSbnmKrhLrWkwpff Cdt8vceBlfBdaOt8r25mLorMYth/i8Zq5Fc8WjHYxoOQQNuJu1+W0Ya/QEWoXdxFrmSs HYtw== X-Forwarded-Encrypted: i=1; AJvYcCX4VI7uG3EqO3BCxWHjgwbv5qzcdDHH2UGGnrsBUwTlwrPFbDWvSZKUyTYVVQGRKfSlEDrQkwBg9vmjDw4IFR+YSvLJtU6VgUMO X-Gm-Message-State: AOJu0YyjPUXIYAAW1T4kA8Hu8UVc0NPted2azA9VLFoRD3yInEcxB9zb ETUlTJ0A9en1E0nHFqMWGj+Zh08aLlqk+jfgits+LmtXKuVNCP1llQ02q1+Z X-Received: by 2002:a05:6a00:a93:b0:6e6:843f:1d05 with SMTP id b19-20020a056a000a9300b006e6843f1d05mr7305784pfl.25.1714051768374; Thu, 25 Apr 2024 06:29:28 -0700 (PDT) Received: from dw-tp.in.ibm.com ([129.41.58.7]) by smtp.gmail.com with ESMTPSA id s15-20020a62e70f000000b006f260fb17e5sm9764518pfh.141.2024.04.25.06.29.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 25 Apr 2024 06:29:27 -0700 (PDT) From: "Ritesh Harjani (IBM)" To: linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, Matthew Wilcox , "Darrick J . Wong" , Ojaswin Mujoo , Ritesh Harjani , Jan Kara Subject: [RFCv3 7/7] iomap: Optimize data access patterns for filesystems with indirect mappings Date: Thu, 25 Apr 2024 18:58:51 +0530 Message-ID: <4e2752e99f55469c4eb5f2fe83e816d529110192.1714046808.git.ritesh.list@gmail.com> X-Mailer: git-send-email 2.44.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: linux-ext4@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit This patch optimizes the data access patterns for filesystems with indirect block mapping by implementing BH_Boundary handling within iomap. Currently the bios for reads within iomap are only submitted at 2 places - 1. If we cannot merge the new req. with previous bio, only then we submit the previous bio. 2. Submit the bio at the end of the entire read processing. This means for filesystems with indirect block mapping, we call into ->iomap_begin() again w/o submitting the previous bios. That causes unoptimized data access patterns for blocks which are of BH_Boundary type. For e.g. consider the file mapping logical block(4k) physical block(4k) 0-11 1000-1011 12-15 1013-1016 In above physical block 1012 is an indirect metadata block which has the mapping information for next set of indirect blocks (1013-1016). With iomap buffered reads for reading 1st 16 logical blocks of a file (0-15), we get below I/O pattern - submit a bio for 1012 - complete the bio for 1012 - submit a bio for 1000-1011 - submit a bio for 1013-1016 - complete the bios for 1000-1011 - complete the bios for 1013-1016 So as we can see, above is an non-optimal I/O access pattern and also we get 3 bio completions instead of 2. This patch changes this behavior by doing submit_bio() if there are any bios already processed, before calling ->iomap_begin() again. That means if there are any blocks which are already processed, gets submitted for I/O earlier and then within ->iomap_begin(), if we get a request for reading an indirect metadata block, then block layer can merge those bios with the already submitted read request to reduce the no. of bio completions. Now, for bs < ps or for large folios, this patch requires proper handling of "ifs->read_bytes_pending". In that we first set ifs->read_bytes_pending to folio_size. Then handle all the cases where we need to subtract ifs->read_bytes_pending either during the submission side (if we don't need to submit any I/O - for e.g. for uptodate sub blocks), or during an I/O error, or at the completion of an I/O. Here is the ftrace output of iomap and block layer with ext2 iomap conversion patches - root# filefrag -b512 -v /mnt1/test/f1 Filesystem type is: ef53 Filesystem cylinder groups approximately 32 File size of /mnt1/test/f1 is 65536 (128 blocks of 512 bytes) ext: logical_offset: physical_offset: length: expected: flags: 0: 0.. 95: 98304.. 98399: 96: merged 1: 96.. 127: 98408.. 98439: 32: 98400: last,merged,eof /mnt1/test/f1: 2 extents found root# #This reads 4 blocks starting from lblk 10, 11, 12, 13 root# xfs_io -c "pread -b$((4*4096)) $((10*4096)) $((4*4096))" /mnt1/test/f1 w/o this patch - (indirect block is submitted before and does not get merged, resulting in 3 bios completion) xfs_io-907 [002] ..... 185.608791: iomap_readahead: dev 8:16 ino 0xc nr_pages 4 xfs_io-907 [002] ..... 185.608819: iomap_iter: dev 8:16 ino 0xc pos 0xa000 length 0x4000 processed 0 flags (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x9d/0x2c0 xfs_io-907 [002] ..... 185.608823: iomap_iter_dstmap: dev 8:16 ino 0xc bdev 8:16 addr 0x300a000 offset 0xa000 length 0x2000 type MAPPED flags MERGED xfs_io-907 [002] ..... 185.608831: iomap_iter: dev 8:16 ino 0xc pos 0xa000 length 0x2000 processed 8192 flags (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x1e1/0x2c0 xfs_io-907 [002] ..... 185.608859: block_bio_queue: 8,16 R 98400 + 8 [xfs_io] xfs_io-907 [002] ..... 185.608865: block_getrq: 8,16 R 98400 + 8 [xfs_io] xfs_io-907 [002] ..... 185.608867: block_io_start: 8,16 R 4096 () 98400 + 8 [xfs_io] xfs_io-907 [002] ..... 185.608869: block_plug: [xfs_io] xfs_io-907 [002] ..... 185.608872: block_unplug: [xfs_io] 1 xfs_io-907 [002] ..... 185.608874: block_rq_insert: 8,16 R 4096 () 98400 + 8 [xfs_io] kworker/2:1H-198 [002] ..... 185.608908: block_rq_issue: 8,16 R 4096 () 98400 + 8 [kworker/2:1H] -0 [002] d.h2. 185.609579: block_rq_complete: 8,16 R () 98400 + 8 [0] -0 [002] dNh2. 185.609631: block_io_done: 8,16 R 0 () 98400 + 0 [swapper/2] xfs_io-907 [002] ..... 185.609694: iomap_iter_dstmap: dev 8:16 ino 0xc bdev 8:16 addr 0x300d000 offset 0xc000 length 0x2000 type MAPPED flags MERGED xfs_io-907 [002] ..... 185.609704: block_bio_queue: 8,16 RA 98384 + 16 [xfs_io] xfs_io-907 [002] ..... 185.609718: block_getrq: 8,16 RA 98384 + 16 [xfs_io] xfs_io-907 [002] ..... 185.609721: block_io_start: 8,16 RA 8192 () 98384 + 16 [xfs_io] xfs_io-907 [002] ..... 185.609726: block_plug: [xfs_io] xfs_io-907 [002] ..... 185.609735: iomap_iter: dev 8:16 ino 0xc pos 0xc000 length 0x2000 processed 8192 flags (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x1e1/0x2c0 xfs_io-907 [002] ..... 185.609736: block_bio_queue: 8,16 RA 98408 + 16 [xfs_io] xfs_io-907 [002] ..... 185.609740: block_getrq: 8,16 RA 98408 + 16 [xfs_io] xfs_io-907 [002] ..... 185.609741: block_io_start: 8,16 RA 8192 () 98408 + 16 [xfs_io] xfs_io-907 [002] ..... 185.609756: block_rq_issue: 8,16 RA 8192 () 98408 + 16 [xfs_io] xfs_io-907 [002] ..... 185.609769: block_rq_issue: 8,16 RA 8192 () 98384 + 16 [xfs_io] -0 [002] d.H2. 185.610280: block_rq_complete: 8,16 RA () 98408 + 16 [0] -0 [002] d.H2. 185.610289: block_io_done: 8,16 RA 0 () 98408 + 0 [swapper/2] -0 [002] d.H2. 185.610292: block_rq_complete: 8,16 RA () 98384 + 16 [0] -0 [002] dNH2. 185.610301: block_io_done: 8,16 RA 0 () 98384 + 0 [swapper/2] v/s with the patch - (optimzed I/O access pattern and bio gets merged resulting in only 2 bios completion) xfs_io-944 [005] ..... 99.926187: iomap_readahead: dev 8:16 ino 0xc nr_pages 4 xfs_io-944 [005] ..... 99.926208: iomap_iter: dev 8:16 ino 0xc pos 0xa000 length 0x4000 processed 0 flags (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x9d/0x2c0 xfs_io-944 [005] ..... 99.926211: iomap_iter_dstmap: dev 8:16 ino 0xc bdev 8:16 addr 0x300a000 offset 0xa000 length 0x2000 type MAPPED flags MERGED xfs_io-944 [005] ..... 99.926222: block_bio_queue: 8,16 RA 98384 + 16 [xfs_io] xfs_io-944 [005] ..... 99.926232: block_getrq: 8,16 RA 98384 + 16 [xfs_io] xfs_io-944 [005] ..... 99.926233: block_io_start: 8,16 RA 8192 () 98384 + 16 [xfs_io] xfs_io-944 [005] ..... 99.926234: block_plug: [xfs_io] xfs_io-944 [005] ..... 99.926235: iomap_iter: dev 8:16 ino 0xc pos 0xa000 length 0x2000 processed 8192 flags (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x1f9/0x2c0 xfs_io-944 [005] ..... 99.926261: block_bio_queue: 8,16 R 98400 + 8 [xfs_io] xfs_io-944 [005] ..... 99.926266: block_bio_backmerge: 8,16 R 98400 + 8 [xfs_io] xfs_io-944 [005] ..... 99.926271: block_unplug: [xfs_io] 1 xfs_io-944 [005] ..... 99.926272: block_rq_insert: 8,16 RA 12288 () 98384 + 24 [xfs_io] kworker/5:1H-234 [005] ..... 99.926314: block_rq_issue: 8,16 RA 12288 () 98384 + 24 [kworker/5:1H] -0 [005] d.h2. 99.926905: block_rq_complete: 8,16 RA () 98384 + 24 [0] -0 [005] dNh2. 99.926931: block_io_done: 8,16 RA 0 () 98384 + 0 [swapper/5] xfs_io-944 [005] ..... 99.926971: iomap_iter_dstmap: dev 8:16 ino 0xc bdev 8:16 addr 0x300d000 offset 0xc000 length 0x2000 type MAPPED flags MERGED xfs_io-944 [005] ..... 99.926981: block_bio_queue: 8,16 RA 98408 + 16 [xfs_io] xfs_io-944 [005] ..... 99.926989: block_getrq: 8,16 RA 98408 + 16 [xfs_io] xfs_io-944 [005] ..... 99.926989: block_io_start: 8,16 RA 8192 () 98408 + 16 [xfs_io] xfs_io-944 [005] ..... 99.926991: block_plug: [xfs_io] xfs_io-944 [005] ..... 99.926993: iomap_iter: dev 8:16 ino 0xc pos 0xc000 length 0x2000 processed 8192 flags (0x0) ops 0xffffffff82242160 caller iomap_readahead+0x1f9/0x2c0 xfs_io-944 [005] ..... 99.927001: block_rq_issue: 8,16 RA 8192 () 98408 + 16 [xfs_io] -0 [005] d.h2. 99.927397: block_rq_complete: 8,16 RA () 98408 + 16 [0] -0 [005] dNh2. 99.927414: block_io_done: 8,16 RA 0 () 98408 + 0 [swapper/5] Suggested-by: Matthew Wilcox Signed-off-by: Ritesh Harjani (IBM) cc: Ojaswin Mujoo --- fs/iomap/buffered-io.c | 112 +++++++++++++++++++++++++++++++---------- 1 file changed, 85 insertions(+), 27 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 0a4269095ae2..a1d50086a3f5 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -30,7 +30,7 @@ typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length); */ struct iomap_folio_state { spinlock_t state_lock; - unsigned int read_bytes_pending; + size_t read_bytes_pending; atomic_t write_bytes_pending; /* @@ -380,6 +380,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter, loff_t orig_pos = pos; size_t poff, plen; sector_t sector; + bool rbp_finished = false; if (iomap->type == IOMAP_INLINE) return iomap_read_inline_data(iter, folio); @@ -387,21 +388,39 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter, /* zero post-eof blocks as the page may be mapped */ ifs = ifs_alloc(iter->inode, folio, iter->flags); iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen); + + if (ifs) { + loff_t to_read = min_t(loff_t, iter->len - offset, + folio_size(folio) - offset_in_folio(folio, orig_pos)); + size_t padjust; + + spin_lock_irq(&ifs->state_lock); + if (!ifs->read_bytes_pending) + ifs->read_bytes_pending = to_read; + padjust = pos - orig_pos; + ifs->read_bytes_pending -= padjust; + if (!ifs->read_bytes_pending) + rbp_finished = true; + spin_unlock_irq(&ifs->state_lock); + } + if (plen == 0) goto done; if (iomap_block_needs_zeroing(iter, pos)) { + if (ifs) { + spin_lock_irq(&ifs->state_lock); + ifs->read_bytes_pending -= plen; + if (!ifs->read_bytes_pending) + rbp_finished = true; + spin_unlock_irq(&ifs->state_lock); + } folio_zero_range(folio, poff, plen); iomap_set_range_uptodate(folio, poff, plen); goto done; } ctx->cur_folio_in_bio = true; - if (ifs) { - spin_lock_irq(&ifs->state_lock); - ifs->read_bytes_pending += plen; - spin_unlock_irq(&ifs->state_lock); - } sector = iomap_sector(iomap, pos); if (!ctx->bio || @@ -435,6 +454,14 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter, } done: + /* + * If there is no bio prepared and if rbp is finished and + * this was the last offset within this folio then mark + * cur_folio_in_bio to false. + */ + if (!ctx->bio && rbp_finished && + offset_in_folio(folio, pos + plen) == 0) + ctx->cur_folio_in_bio = false; /* * Move the caller beyond our range so that it keeps making progress. * For that, we have to include any leading non-uptodate ranges, but @@ -459,9 +486,43 @@ static loff_t iomap_read_folio_iter(const struct iomap_iter *iter, return ret; } + if (ctx->bio) { + submit_bio(ctx->bio); + WARN_ON_ONCE(!ctx->cur_folio_in_bio); + ctx->bio = NULL; + } + if (offset_in_folio(folio, iter->pos + done) == 0 && + !ctx->cur_folio_in_bio) { + folio_unlock(ctx->cur_folio); + } + return done; } +static void iomap_handle_read_error(struct iomap_readpage_ctx *ctx, + struct iomap_iter *iter) +{ + struct folio *folio = ctx->cur_folio; + struct iomap_folio_state *ifs; + unsigned long flags; + bool rbp_finished = false; + size_t rbp_adjust = folio_size(folio) - offset_in_folio(folio, + iter->pos); + ifs = folio->private; + if (!ifs || !ifs->read_bytes_pending) + goto unlock; + + spin_lock_irqsave(&ifs->state_lock, flags); + ifs->read_bytes_pending -= rbp_adjust; + if (!ifs->read_bytes_pending) + rbp_finished = true; + spin_unlock_irqrestore(&ifs->state_lock, flags); + +unlock: + if (rbp_finished || !ctx->cur_folio_in_bio) + folio_unlock(folio); +} + int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops) { struct iomap_iter iter = { @@ -479,15 +540,9 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops) while ((ret = iomap_iter(&iter, ops)) > 0) iter.processed = iomap_read_folio_iter(&iter, &ctx); - if (ret < 0) + if (ret < 0) { folio_set_error(folio); - - if (ctx.bio) { - submit_bio(ctx.bio); - WARN_ON_ONCE(!ctx.cur_folio_in_bio); - } else { - WARN_ON_ONCE(ctx.cur_folio_in_bio); - folio_unlock(folio); + iomap_handle_read_error(&ctx, &iter); } /* @@ -506,12 +561,6 @@ static loff_t iomap_readahead_iter(const struct iomap_iter *iter, loff_t done, ret; for (done = 0; done < length; done += ret) { - if (ctx->cur_folio && - offset_in_folio(ctx->cur_folio, iter->pos + done) == 0) { - if (!ctx->cur_folio_in_bio) - folio_unlock(ctx->cur_folio); - ctx->cur_folio = NULL; - } if (!ctx->cur_folio) { ctx->cur_folio = readahead_folio(ctx->rac); ctx->cur_folio_in_bio = false; @@ -519,6 +568,17 @@ static loff_t iomap_readahead_iter(const struct iomap_iter *iter, ret = iomap_readpage_iter(iter, ctx, done); if (ret <= 0) return ret; + if (ctx->cur_folio && offset_in_folio(ctx->cur_folio, + iter->pos + done + ret) == 0) { + if (!ctx->cur_folio_in_bio) + folio_unlock(ctx->cur_folio); + ctx->cur_folio = NULL; + } + } + + if (ctx->bio) { + submit_bio(ctx->bio); + ctx->bio = NULL; } return done; @@ -549,18 +609,16 @@ void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops) struct iomap_readpage_ctx ctx = { .rac = rac, }; + int ret = 0; trace_iomap_readahead(rac->mapping->host, readahead_count(rac)); - while (iomap_iter(&iter, ops) > 0) + while ((ret = iomap_iter(&iter, ops)) > 0) iter.processed = iomap_readahead_iter(&iter, &ctx); - if (ctx.bio) - submit_bio(ctx.bio); - if (ctx.cur_folio) { - if (!ctx.cur_folio_in_bio) - folio_unlock(ctx.cur_folio); - } + if (ret < 0 && ctx.cur_folio) + iomap_handle_read_error(&ctx, &iter); + } EXPORT_SYMBOL_GPL(iomap_readahead); -- 2.44.0