Hello,
Please find two patches that were identified while working on the ext2
buffered-io conversion to iomap. One of them is a bug fix and the other
optimizes the read_folio path. More details can be found in the
individual commit messages.
[v1]: https://lore.kernel.org/all/[email protected]/
Ritesh Harjani (IBM) (2):
iomap: Fix iomap_adjust_read_range for plen calculation
iomap: Optimize iomap_read_folio
fs/iomap/buffered-io.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
--
2.44.0
If the extent spans the block that contains i_size, we need to handle
both halves separately so that we properly zero data in the page cache
for blocks that are entirely outside of i_size. But this is needed only
when i_size lies within the folio currently being processed.
"orig_pos + length > isize" can be true for every folio if the mapped
extent length is greater than the folio size. That causes plen to be
truncated for every folio instead of only the last one.
So use orig_plen for the check instead: "orig_pos + orig_plen > isize".
Signed-off-by: Ritesh Harjani (IBM) <[email protected]>
cc: Ojaswin Mujoo <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
---
fs/iomap/buffered-io.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4e8e41c8b3c0..9f79c82d1f73 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -241,6 +241,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
unsigned block_size = (1 << block_bits);
size_t poff = offset_in_folio(folio, *pos);
size_t plen = min_t(loff_t, folio_size(folio) - poff, length);
+ size_t orig_plen = plen;
unsigned first = poff >> block_bits;
unsigned last = (poff + plen - 1) >> block_bits;
@@ -277,7 +278,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
* handle both halves separately so that we properly zero data in the
* page cache for blocks that are entirely outside of i_size.
*/
- if (orig_pos <= isize && orig_pos + length > isize) {
+ if (orig_pos <= isize && orig_pos + orig_plen > isize) {
unsigned end = offset_in_folio(folio, isize - 1) >> block_bits;
if (first <= end && last > end)
--
2.44.0
iomap_readpage_iter() handles "uptodate blocks" and "not uptodate blocks"
within a folio separately. This makes iomap_read_folio() call into
->iomap_begin() to request an extent mapping even though it might already
have an extent that is not yet fully processed.
This happens either with a large folio or with blocksize < pagesize. In
these cases we can have sub-blocks that are already uptodate (e.g. due to
previous writes). With iomap_read_folio_iter(), this is handled more
efficiently by not calling ->iomap_begin() until all the sub-blocks
within the current folio are processed.
iomap_read_folio_iter() handles multiple sub-blocks within a given
folio, but its implementation logic is similar to how
iomap_readahead_iter() handles multiple folios within a single mapped
extent. Both iterate over a given range of a folio/mapped extent and
call iomap_readpage_iter() for the reads.
Signed-off-by: Ritesh Harjani (IBM) <[email protected]>
cc: Ojaswin Mujoo <[email protected]>
---
fs/iomap/buffered-io.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 9f79c82d1f73..a9bd74ee7870 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -444,6 +444,24 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
return pos - orig_pos + plen;
}
+static loff_t iomap_read_folio_iter(const struct iomap_iter *iter,
+ struct iomap_readpage_ctx *ctx)
+{
+ struct folio *folio = ctx->cur_folio;
+ size_t offset = offset_in_folio(folio, iter->pos);
+ loff_t length = min_t(loff_t, folio_size(folio) - offset,
+ iomap_length(iter));
+ loff_t done, ret;
+
+ for (done = 0; done < length; done += ret) {
+ ret = iomap_readpage_iter(iter, ctx, done);
+ if (ret <= 0)
+ return ret;
+ }
+
+ return done;
+}
+
int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
{
struct iomap_iter iter = {
@@ -459,7 +477,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
trace_iomap_readpage(iter.inode, 1);
while ((ret = iomap_iter(&iter, ops)) > 0)
- iter.processed = iomap_readpage_iter(&iter, &ctx, 0);
+ iter.processed = iomap_read_folio_iter(&iter, &ctx);
if (ret < 0)
folio_set_error(folio);
--
2.44.0
On Tue, May 07, 2024 at 02:25:43PM +0530, Ritesh Harjani (IBM) wrote:
> iomap_readpage_iter() handles "uptodate blocks" and "not uptodate blocks"
> within a folio separately. This makes iomap_read_folio() call into
> ->iomap_begin() to request an extent mapping even though it might already
> have an extent that is not yet fully processed.
> This happens either with a large folio or with blocksize < pagesize. In
> these cases we can have sub-blocks that are already uptodate (e.g. due to
> previous writes). With iomap_read_folio_iter(), this is handled more
> efficiently by not calling ->iomap_begin() until all the sub-blocks
> within the current folio are processed.
>
> iomap_read_folio_iter() handles multiple sub-blocks within a given
> folio, but its implementation logic is similar to how
> iomap_readahead_iter() handles multiple folios within a single mapped
> extent. Both iterate over a given range of a folio/mapped extent and
> call iomap_readpage_iter() for the reads.
>
> Signed-off-by: Ritesh Harjani (IBM) <[email protected]>
> cc: Ojaswin Mujoo <[email protected]>
I like this improved changelog, it's easier to understand why
_read_folio_iter needs to exist.
Reviewed-by: Darrick J. Wong <[email protected]>
--D
> ---
> fs/iomap/buffered-io.c | 20 +++++++++++++++++++-
> 1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 9f79c82d1f73..a9bd74ee7870 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -444,6 +444,24 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
> return pos - orig_pos + plen;
> }
>
> +static loff_t iomap_read_folio_iter(const struct iomap_iter *iter,
> + struct iomap_readpage_ctx *ctx)
> +{
> + struct folio *folio = ctx->cur_folio;
> + size_t offset = offset_in_folio(folio, iter->pos);
> + loff_t length = min_t(loff_t, folio_size(folio) - offset,
> + iomap_length(iter));
> + loff_t done, ret;
> +
> + for (done = 0; done < length; done += ret) {
> + ret = iomap_readpage_iter(iter, ctx, done);
> + if (ret <= 0)
> + return ret;
> + }
> +
> + return done;
> +}
> +
> int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> {
> struct iomap_iter iter = {
> @@ -459,7 +477,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> trace_iomap_readpage(iter.inode, 1);
>
> while ((ret = iomap_iter(&iter, ops)) > 0)
> - iter.processed = iomap_readpage_iter(&iter, &ctx, 0);
> + iter.processed = iomap_read_folio_iter(&iter, &ctx);
>
> if (ret < 0)
> folio_set_error(folio);
> --
> 2.44.0
>
>
Looks good:
Reviewed-by: Christoph Hellwig <[email protected]>
On Tue 07-05-24 14:25:42, Ritesh Harjani (IBM) wrote:
> If the extent spans the block that contains i_size, we need to handle
> both halves separately so that we properly zero data in the page cache
> for blocks that are entirely outside of i_size. But this is needed only
> when i_size lies within the folio currently being processed.
> "orig_pos + length > isize" can be true for every folio if the mapped
> extent length is greater than the folio size. That causes plen to be
> truncated for every folio instead of only the last one.
>
> So use orig_plen for the check instead: "orig_pos + orig_plen > isize".
>
> Signed-off-by: Ritesh Harjani (IBM) <[email protected]>
> cc: Ojaswin Mujoo <[email protected]>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Reviewed-by: Darrick J. Wong <[email protected]>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> fs/iomap/buffered-io.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 4e8e41c8b3c0..9f79c82d1f73 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -241,6 +241,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
> unsigned block_size = (1 << block_bits);
> size_t poff = offset_in_folio(folio, *pos);
> size_t plen = min_t(loff_t, folio_size(folio) - poff, length);
> + size_t orig_plen = plen;
> unsigned first = poff >> block_bits;
> unsigned last = (poff + plen - 1) >> block_bits;
>
> @@ -277,7 +278,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
> * handle both halves separately so that we properly zero data in the
> * page cache for blocks that are entirely outside of i_size.
> */
> - if (orig_pos <= isize && orig_pos + length > isize) {
> + if (orig_pos <= isize && orig_pos + orig_plen > isize) {
> unsigned end = offset_in_folio(folio, isize - 1) >> block_bits;
>
> if (first <= end && last > end)
> --
> 2.44.0
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Tue 07-05-24 14:25:43, Ritesh Harjani (IBM) wrote:
> iomap_readpage_iter() handles "uptodate blocks" and "not uptodate blocks"
> within a folio separately. This makes iomap_read_folio() call into
> ->iomap_begin() to request an extent mapping even though it might already
> have an extent that is not yet fully processed.
> This happens either with a large folio or with blocksize < pagesize. In
> these cases we can have sub-blocks that are already uptodate (e.g. due to
> previous writes). With iomap_read_folio_iter(), this is handled more
> efficiently by not calling ->iomap_begin() until all the sub-blocks
> within the current folio are processed.
>
> iomap_read_folio_iter() handles multiple sub-blocks within a given
> folio, but its implementation logic is similar to how
> iomap_readahead_iter() handles multiple folios within a single mapped
> extent. Both iterate over a given range of a folio/mapped extent and
> call iomap_readpage_iter() for the reads.
>
> Signed-off-by: Ritesh Harjani (IBM) <[email protected]>
> cc: Ojaswin Mujoo <[email protected]>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> fs/iomap/buffered-io.c | 20 +++++++++++++++++++-
> 1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 9f79c82d1f73..a9bd74ee7870 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -444,6 +444,24 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
> return pos - orig_pos + plen;
> }
>
> +static loff_t iomap_read_folio_iter(const struct iomap_iter *iter,
> + struct iomap_readpage_ctx *ctx)
> +{
> + struct folio *folio = ctx->cur_folio;
> + size_t offset = offset_in_folio(folio, iter->pos);
> + loff_t length = min_t(loff_t, folio_size(folio) - offset,
> + iomap_length(iter));
> + loff_t done, ret;
> +
> + for (done = 0; done < length; done += ret) {
> + ret = iomap_readpage_iter(iter, ctx, done);
> + if (ret <= 0)
> + return ret;
> + }
> +
> + return done;
> +}
> +
> int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> {
> struct iomap_iter iter = {
> @@ -459,7 +477,7 @@ int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops)
> trace_iomap_readpage(iter.inode, 1);
>
> while ((ret = iomap_iter(&iter, ops)) > 0)
> - iter.processed = iomap_readpage_iter(&iter, &ctx, 0);
> + iter.processed = iomap_read_folio_iter(&iter, &ctx);
>
> if (ret < 0)
> folio_set_error(folio);
> --
> 2.44.0
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
+Christian
"Ritesh Harjani (IBM)" <[email protected]> writes:
> Hello,
>
> Please find two patches that were identified while working on the ext2
> buffered-io conversion to iomap. One of them is a bug fix and the other
> optimizes the read_folio path. More details can be found in the
> individual commit messages.
>
> [v1]: https://lore.kernel.org/all/[email protected]/
>
> Ritesh Harjani (IBM) (2):
> iomap: Fix iomap_adjust_read_range for plen calculation
> iomap: Optimize iomap_read_folio
Hi Christian,
It looks like these two patches have not been picked up into the tree
yet. In case they were missed, could you please pick them up for
inclusion?
-ritesh
>
> fs/iomap/buffered-io.c | 23 +++++++++++++++++++++--
> 1 file changed, 21 insertions(+), 2 deletions(-)
>
> --
> 2.44.0
On Tue, Jun 04, 2024 at 06:02:41AM +0530, Ritesh Harjani wrote:
> > Ritesh Harjani (IBM) (2):
> > iomap: Fix iomap_adjust_read_range for plen calculation
> > iomap: Optimize iomap_read_folio
>
> Hi Christian,
>
> It looks like these two patches have not been picked up into the tree
> yet. In case they were missed, could you please pick them up for
> inclusion?
The first one sounds like a 6.10 candidate; the other one is probably
6.11 material at this point.
On Tue, 07 May 2024 14:25:42 +0530, Ritesh Harjani (IBM) wrote:
> If the extent spans the block that contains i_size, we need to handle
> both halves separately so that we properly zero data in the page cache
> for blocks that are entirely outside of i_size. But this is needed only
> when i_size lies within the folio currently being processed.
> "orig_pos + length > isize" can be true for every folio if the mapped
> extent length is greater than the folio size. That causes plen to be
> truncated for every folio instead of only the last one.
>
> [...]
Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes
[1/2] iomap: Fix iomap_adjust_read_range for plen calculation
https://git.kernel.org/vfs/vfs/c/0fbe97059215
On Tue, 07 May 2024 14:25:43 +0530, Ritesh Harjani (IBM) wrote:
> iomap_readpage_iter() handles "uptodate blocks" and "not uptodate blocks"
> within a folio separately. This makes iomap_read_folio() call into
> ->iomap_begin() to request an extent mapping even though it might already
> have an extent that is not yet fully processed.
> This happens either with a large folio or with blocksize < pagesize. In
> these cases we can have sub-blocks that are already uptodate (e.g. due to
> previous writes). With iomap_read_folio_iter(), this is handled more
> efficiently by not calling ->iomap_begin() until all the sub-blocks
> within the current folio are processed.
>
> [...]
Applied to the vfs.iomap branch of the vfs/vfs.git tree.
Patches in the vfs.iomap branch should appear in linux-next soon.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.iomap
[2/2] iomap: Optimize iomap_read_folio
https://git.kernel.org/vfs/vfs/c/20b686c56bd0