2024-04-10 03:51:00

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v2 0/9] ext4: support adding multi-delalloc blocks

From: Zhang Yi <[email protected]>

Hello!

This patch series is the part 2 prepartory changes of the buffered IO
iomap conversion, I picked them out from my buffered IO iomap conversion
RFC series v3[1], add a fix for an issue found in current ext4 code,and
also add bigalloc feature support. Please look the following patches for
details.

The first 2 patches fix an incorrect delalloc reserved blocks count
issue, the second 6 patches make ext4_insert_delayed_block() call path
support inserting multi-delalloc blocks once a time, and the last patch
makes ext4_da_map_blocks() buffer_head unaware, prepared for iomap.

This patch set has been passed 'kvm-xfstests -g auto' tests, I hope it
could be reviewed and merged first.

[1] https://lore.kernel.org/linux-ext4/[email protected]/

Thanks,
Yi.

Zhang Yi (9):
ext4: factor out a common helper to query extent map
ext4: check the extent status again before inserting delalloc block
ext4: trim delalloc extent
ext4: drop iblock parameter
ext4: make ext4_es_insert_delayed_block() insert multi-blocks
ext4: make ext4_da_reserve_space() reserve multi-clusters
ext4: factor out check for whether a cluster is allocated
ext4: make ext4_insert_delayed_block() insert multi-blocks
ext4: make ext4_da_map_blocks() buffer_head unaware

fs/ext4/extents_status.c | 63 ++++++----
fs/ext4/extents_status.h | 5 +-
fs/ext4/inode.c | 240 +++++++++++++++++++++++-------------
include/trace/events/ext4.h | 26 ++--
4 files changed, 213 insertions(+), 121 deletions(-)

--
2.39.2



2024-04-10 03:51:04

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v2 3/9] ext4: trim delalloc extent

From: Zhang Yi <[email protected]>

The cached delalloc or hole extent should be trimed to the map->map_len
if we map delalloc blocks in ext4_da_map_blocks(). But it doesn't
trigger any issue now because the map->m_len is always set to one and we
always insert one delayed block once a time. Fix this by trim the extent
once we get one from the cached extent tree, prearing for mapping a
extent with multiple delalloc blocks.

Signed-off-by: Zhang Yi <[email protected]>
---
fs/ext4/inode.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 118b0497a954..e4043ddb07a5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1734,6 +1734,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,

/* Lookup extent status tree firstly */
if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
+ retval = es.es_len - (iblock - es.es_lblk);
+ if (retval > map->m_len)
+ retval = map->m_len;
+ map->m_len = retval;
+
if (ext4_es_is_hole(&es))
goto add_delayed;

@@ -1750,10 +1755,6 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
}

map->m_pblk = ext4_es_pblock(&es) + iblock - es.es_lblk;
- retval = es.es_len - (iblock - es.es_lblk);
- if (retval > map->m_len)
- retval = map->m_len;
- map->m_len = retval;
if (ext4_es_is_written(&es))
map->m_flags |= EXT4_MAP_MAPPED;
else if (ext4_es_is_unwritten(&es))
@@ -1788,6 +1789,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
* whitout holding i_rwsem and folio lock.
*/
if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
+ retval = es.es_len - (iblock - es.es_lblk);
+ if (retval > map->m_len)
+ retval = map->m_len;
+ map->m_len = retval;
+
if (!ext4_es_is_hole(&es)) {
up_write(&EXT4_I(inode)->i_data_sem);
goto found;
--
2.39.2


2024-04-10 03:51:09

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v2 2/9] ext4: check the extent status again before inserting delalloc block

From: Zhang Yi <[email protected]>

Now we lookup extent status entry without holding the i_data_sem before
inserting delalloc block, it works fine in buffered write path and
because it holds i_rwsem and folio lock, and the mmap path holds folio
lock, so the found extent locklessly couldn't be modified concurrently.
But it could be raced by fallocate since it allocate block whitout
holding i_rwsem and folio lock.

ext4_page_mkwrite() ext4_fallocate()
block_page_mkwrite()
ext4_da_map_blocks()
//find hole in extent status tree
ext4_alloc_file_blocks()
ext4_map_blocks()
//allocate block and unwritten extent
ext4_insert_delayed_block()
ext4_da_reserve_space()
//reserve one more block
ext4_es_insert_delayed_block()
//drop unwritten extent and add delayed extent by mistake

Then, the delalloc extent is wrong until writeback, the one more
reserved block can't be release any more and trigger below warning:

EXT4-fs (pmem2): Inode 13 (00000000bbbd4d23): i_reserved_data_blocks(1) not cleared!

Hold i_data_sem in write mode directly can fix the problem, but it's
expansive, we should keep the lockless check and check the extent again
once we need to add an new delalloc block.

Cc: [email protected]
Signed-off-by: Zhang Yi <[email protected]>
---
fs/ext4/inode.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6a41172c06e1..118b0497a954 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1737,6 +1737,7 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
if (ext4_es_is_hole(&es))
goto add_delayed;

+found:
/*
* Delayed extent could be allocated by fallocate.
* So we need to check it.
@@ -1781,6 +1782,24 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,

add_delayed:
down_write(&EXT4_I(inode)->i_data_sem);
+ /*
+ * Lookup extents tree again under i_data_sem, make sure this
+ * inserting delalloc range haven't been delayed or allocated
+ * whitout holding i_rwsem and folio lock.
+ */
+ if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
+ if (!ext4_es_is_hole(&es)) {
+ up_write(&EXT4_I(inode)->i_data_sem);
+ goto found;
+ }
+ } else if (!ext4_has_inline_data(inode)) {
+ retval = ext4_map_query_blocks(NULL, inode, map);
+ if (retval) {
+ up_write(&EXT4_I(inode)->i_data_sem);
+ return retval;
+ }
+ }
+
retval = ext4_insert_delayed_block(inode, map->m_lblk);
up_write(&EXT4_I(inode)->i_data_sem);
if (retval)
--
2.39.2


2024-04-10 03:51:17

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v2 1/9] ext4: factor out a common helper to query extent map

From: Zhang Yi <[email protected]>

Factor out a new common helper ext4_map_query_blocks() from the
ext4_da_map_blocks(), it query and return the extent map status on the
inode's extent path, no logic changes.

Signed-off-by: Zhang Yi <[email protected]>
---
fs/ext4/inode.c | 57 +++++++++++++++++++++++++++----------------------
1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 537803250ca9..6a41172c06e1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -453,6 +453,35 @@ static void ext4_map_blocks_es_recheck(handle_t *handle,
}
#endif /* ES_AGGRESSIVE_TEST */

+static int ext4_map_query_blocks(handle_t *handle, struct inode *inode,
+ struct ext4_map_blocks *map)
+{
+ unsigned int status;
+ int retval;
+
+ if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
+ retval = ext4_ext_map_blocks(handle, inode, map, 0);
+ else
+ retval = ext4_ind_map_blocks(handle, inode, map, 0);
+
+ if (retval <= 0)
+ return retval;
+
+ if (unlikely(retval != map->m_len)) {
+ ext4_warning(inode->i_sb,
+ "ES len assertion failed for inode "
+ "%lu: retval %d != map->m_len %d",
+ inode->i_ino, retval, map->m_len);
+ WARN_ON(1);
+ }
+
+ status = map->m_flags & EXT4_MAP_UNWRITTEN ?
+ EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
+ ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
+ map->m_pblk, status);
+ return retval;
+}
+
/*
* The ext4_map_blocks() function tries to look up the requested blocks,
* and returns if the blocks are already mapped.
@@ -1744,33 +1773,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
down_read(&EXT4_I(inode)->i_data_sem);
if (ext4_has_inline_data(inode))
retval = 0;
- else if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
- retval = ext4_ext_map_blocks(NULL, inode, map, 0);
else
- retval = ext4_ind_map_blocks(NULL, inode, map, 0);
- if (retval < 0) {
- up_read(&EXT4_I(inode)->i_data_sem);
- return retval;
- }
- if (retval > 0) {
- unsigned int status;
-
- if (unlikely(retval != map->m_len)) {
- ext4_warning(inode->i_sb,
- "ES len assertion failed for inode "
- "%lu: retval %d != map->m_len %d",
- inode->i_ino, retval, map->m_len);
- WARN_ON(1);
- }
-
- status = map->m_flags & EXT4_MAP_UNWRITTEN ?
- EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
- ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
- map->m_pblk, status);
- up_read(&EXT4_I(inode)->i_data_sem);
- return retval;
- }
+ retval = ext4_map_query_blocks(NULL, inode, map);
up_read(&EXT4_I(inode)->i_data_sem);
+ if (retval)
+ return retval;

add_delayed:
down_write(&EXT4_I(inode)->i_data_sem);
--
2.39.2


2024-04-10 03:51:20

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v2 4/9] ext4: drop iblock parameter

From: Zhang Yi <[email protected]>

The start block of the delalloc extent to be inserted is equal to
map->m_lblk, just drop the duplicate iblock input parameter.

Signed-off-by: Zhang Yi <[email protected]>
---
fs/ext4/inode.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e4043ddb07a5..cccc16506f5f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1712,8 +1712,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
* time. This function looks up the requested blocks and sets the
* buffer delay bit under the protection of i_data_sem.
*/
-static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
- struct ext4_map_blocks *map,
+static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
struct buffer_head *bh)
{
struct extent_status es;
@@ -1733,8 +1732,8 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
(unsigned long) map->m_lblk);

/* Lookup extent status tree firstly */
- if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
- retval = es.es_len - (iblock - es.es_lblk);
+ if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
+ retval = es.es_len - (map->m_lblk - es.es_lblk);
if (retval > map->m_len)
retval = map->m_len;
map->m_len = retval;
@@ -1754,7 +1753,7 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
return 0;
}

- map->m_pblk = ext4_es_pblock(&es) + iblock - es.es_lblk;
+ map->m_pblk = ext4_es_pblock(&es) + map->m_lblk - es.es_lblk;
if (ext4_es_is_written(&es))
map->m_flags |= EXT4_MAP_MAPPED;
else if (ext4_es_is_unwritten(&es))
@@ -1788,8 +1787,8 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
* inserting delalloc range haven't been delayed or allocated
* whitout holding i_rwsem and folio lock.
*/
- if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
- retval = es.es_len - (iblock - es.es_lblk);
+ if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
+ retval = es.es_len - (map->m_lblk - es.es_lblk);
if (retval > map->m_len)
retval = map->m_len;
map->m_len = retval;
@@ -1846,7 +1845,7 @@ int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
* preallocated blocks are unmapped but should treated
* the same as allocated blocks.
*/
- ret = ext4_da_map_blocks(inode, iblock, &map, bh);
+ ret = ext4_da_map_blocks(inode, &map, bh);
if (ret <= 0)
return ret;

--
2.39.2


2024-04-10 03:51:27

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v2 5/9] ext4: make ext4_es_insert_delayed_block() insert multi-blocks

From: Zhang Yi <[email protected]>

Rename ext4_es_insert_delayed_block() to ext4_es_insert_delayed_extent()
and pass length parameter to make it insert multi delalloc blocks once a
time. For the case of bigalloc, expand the allocated parameter to
lclu_allocated and end_allocated. lclu_allocated indicates the allocate
state of the cluster which containing the lblk, end_allocated represents
the end, and the middle clusters must be unallocated.

Signed-off-by: Zhang Yi <[email protected]>
---
fs/ext4/extents_status.c | 63 ++++++++++++++++++++++++-------------
fs/ext4/extents_status.h | 5 +--
fs/ext4/inode.c | 2 +-
include/trace/events/ext4.h | 16 +++++-----
4 files changed, 55 insertions(+), 31 deletions(-)

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 4a00e2f019d9..2320b0d71001 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -2052,34 +2052,42 @@ bool ext4_is_pending(struct inode *inode, ext4_lblk_t lblk)
}

/*
- * ext4_es_insert_delayed_block - adds a delayed block to the extents status
- * tree, adding a pending reservation where
- * needed
+ * ext4_es_insert_delayed_extent - adds some delayed blocks to the extents
+ * status tree, adding a pending reservation
+ * where needed
*
* @inode - file containing the newly added block
- * @lblk - logical block to be added
- * @allocated - indicates whether a physical cluster has been allocated for
- * the logical cluster that contains the block
+ * @lblk - start logical block to be added
+ * @len - length of blocks to be added
+ * @lclu_allocated/end_allocated - indicates whether a physical cluster has
+ * been allocated for the logical cluster
+ * that contains the block
*/
-void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
- bool allocated)
+void ext4_es_insert_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
+ ext4_lblk_t len, bool lclu_allocated,
+ bool end_allocated)
{
struct extent_status newes;
+ ext4_lblk_t end = lblk + len - 1;
int err1 = 0, err2 = 0, err3 = 0;
struct extent_status *es1 = NULL;
struct extent_status *es2 = NULL;
- struct pending_reservation *pr = NULL;
+ struct pending_reservation *pr1 = NULL;
+ struct pending_reservation *pr2 = NULL;

if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
return;

- es_debug("add [%u/1) delayed to extent status tree of inode %lu\n",
- lblk, inode->i_ino);
+ es_debug("add [%u/%u) delayed to extent status tree of inode %lu\n",
+ lblk, len, inode->i_ino);
+ if (!len)
+ return;

newes.es_lblk = lblk;
- newes.es_len = 1;
+ newes.es_len = len;
ext4_es_store_pblock_status(&newes, ~0, EXTENT_STATUS_DELAYED);
- trace_ext4_es_insert_delayed_block(inode, &newes, allocated);
+ trace_ext4_es_insert_delayed_extent(inode, &newes, lclu_allocated,
+ end_allocated);

ext4_es_insert_extent_check(inode, &newes);

@@ -2088,11 +2096,15 @@ void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
es1 = __es_alloc_extent(true);
if ((err1 || err2) && !es2)
es2 = __es_alloc_extent(true);
- if ((err1 || err2 || err3) && allocated && !pr)
- pr = __alloc_pending(true);
+ if (err1 || err2 || err3) {
+ if (lclu_allocated && !pr1)
+ pr1 = __alloc_pending(true);
+ if (end_allocated && !pr2)
+ pr2 = __alloc_pending(true);
+ }
write_lock(&EXT4_I(inode)->i_es_lock);

- err1 = __es_remove_extent(inode, lblk, lblk, NULL, es1);
+ err1 = __es_remove_extent(inode, lblk, end, NULL, es1);
if (err1 != 0)
goto error;
/* Free preallocated extent if it didn't get used. */
@@ -2112,13 +2124,22 @@ void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
es2 = NULL;
}

- if (allocated) {
- err3 = __insert_pending(inode, lblk, &pr);
+ if (lclu_allocated) {
+ err3 = __insert_pending(inode, lblk, &pr1);
if (err3 != 0)
goto error;
- if (pr) {
- __free_pending(pr);
- pr = NULL;
+ if (pr1) {
+ __free_pending(pr1);
+ pr1 = NULL;
+ }
+ }
+ if (end_allocated) {
+ err3 = __insert_pending(inode, end, &pr2);
+ if (err3 != 0)
+ goto error;
+ if (pr2) {
+ __free_pending(pr2);
+ pr2 = NULL;
}
}
error:
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index d9847a4a25db..3c8e2edee5d5 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -249,8 +249,9 @@ extern void ext4_exit_pending(void);
extern void ext4_init_pending_tree(struct ext4_pending_tree *tree);
extern void ext4_remove_pending(struct inode *inode, ext4_lblk_t lblk);
extern bool ext4_is_pending(struct inode *inode, ext4_lblk_t lblk);
-extern void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
- bool allocated);
+extern void ext4_es_insert_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
+ ext4_lblk_t len, bool lclu_allocated,
+ bool end_allocated);
extern unsigned int ext4_es_delayed_clu(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t len);
extern void ext4_clear_inode_es(struct inode *inode);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index cccc16506f5f..d37233e2ed0b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1702,7 +1702,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
}
}

- ext4_es_insert_delayed_block(inode, lblk, allocated);
+ ext4_es_insert_delayed_extent(inode, lblk, 1, allocated, false);
return 0;
}

diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index a697f4b77162..6b41ac61310f 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2478,11 +2478,11 @@ TRACE_EVENT(ext4_es_shrink,
__entry->scan_time, __entry->nr_skipped, __entry->retried)
);

-TRACE_EVENT(ext4_es_insert_delayed_block,
+TRACE_EVENT(ext4_es_insert_delayed_extent,
TP_PROTO(struct inode *inode, struct extent_status *es,
- bool allocated),
+ bool lclu_allocated, bool end_allocated),

- TP_ARGS(inode, es, allocated),
+ TP_ARGS(inode, es, lclu_allocated, end_allocated),

TP_STRUCT__entry(
__field( dev_t, dev )
@@ -2491,7 +2491,8 @@ TRACE_EVENT(ext4_es_insert_delayed_block,
__field( ext4_lblk_t, len )
__field( ext4_fsblk_t, pblk )
__field( char, status )
- __field( bool, allocated )
+ __field( bool, lclu_allocated )
+ __field( bool, end_allocated )
),

TP_fast_assign(
@@ -2501,16 +2502,17 @@ TRACE_EVENT(ext4_es_insert_delayed_block,
__entry->len = es->es_len;
__entry->pblk = ext4_es_show_pblock(es);
__entry->status = ext4_es_status(es);
- __entry->allocated = allocated;
+ __entry->lclu_allocated = lclu_allocated;
+ __entry->end_allocated = end_allocated;
),

TP_printk("dev %d,%d ino %lu es [%u/%u) mapped %llu status %s "
- "allocated %d",
+ "allocated %d %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
__entry->lblk, __entry->len,
__entry->pblk, show_extent_status(__entry->status),
- __entry->allocated)
+ __entry->lclu_allocated, __entry->end_allocated)
);

/* fsmap traces */
--
2.39.2


2024-04-10 03:51:35

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v2 7/9] ext4: factor out check for whether a cluster is allocated

From: Zhang Yi <[email protected]>

Factor out a common helper ext4_da_check_clu_allocated(), check whether
the cluster containing a delalloc block to be added has been delayed or
allocated, no logic changes.

Signed-off-by: Zhang Yi <[email protected]>
---
fs/ext4/inode.c | 52 +++++++++++++++++++++++++++++++++----------------
1 file changed, 35 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1180a9eb4362..46c34baa848a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1649,6 +1649,34 @@ static void ext4_print_free_blocks(struct inode *inode)
return;
}

+/*
+ * Check whether the cluster containing lblk has been delayed or allocated,
+ * if not, it means we should reserve a cluster when add delalloc, return 1,
+ * otherwise return 0 or error code.
+ */
+static int ext4_da_check_clu_allocated(struct inode *inode, ext4_lblk_t lblk,
+ bool *allocated)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+ int ret;
+
+ *allocated = false;
+ if (ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk))
+ return 0;
+
+ if (ext4_es_scan_clu(inode, &ext4_es_is_mapped, lblk))
+ goto allocated;
+
+ ret = ext4_clu_mapped(inode, EXT4_B2C(sbi, lblk));
+ if (ret < 0)
+ return ret;
+ if (ret == 0)
+ return 1;
+allocated:
+ *allocated = true;
+ return 0;
+}
+
/*
* ext4_insert_delayed_block - adds a delayed block to the extents status
* tree, incrementing the reserved cluster/block
@@ -1682,23 +1710,13 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
if (ret != 0) /* ENOSPC */
return ret;
} else { /* bigalloc */
- if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk)) {
- if (!ext4_es_scan_clu(inode,
- &ext4_es_is_mapped, lblk)) {
- ret = ext4_clu_mapped(inode,
- EXT4_B2C(sbi, lblk));
- if (ret < 0)
- return ret;
- if (ret == 0) {
- ret = ext4_da_reserve_space(inode, 1);
- if (ret != 0) /* ENOSPC */
- return ret;
- } else {
- allocated = true;
- }
- } else {
- allocated = true;
- }
+ ret = ext4_da_check_clu_allocated(inode, lblk, &allocated);
+ if (ret < 0)
+ return ret;
+ if (ret > 0) {
+ ret = ext4_da_reserve_space(inode, 1);
+ if (ret != 0) /* ENOSPC */
+ return ret;
}
}

--
2.39.2


2024-04-10 03:51:37

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v2 6/9] ext4: make ext4_da_reserve_space() reserve multi-clusters

From: Zhang Yi <[email protected]>

Add 'nr_resv' parameter to ext4_da_reserve_space(), which indicates the
number of clusters wants to reserve, make it reserve multi-clusters once
a time.

Signed-off-by: Zhang Yi <[email protected]>
---
fs/ext4/inode.c | 18 +++++++++---------
include/trace/events/ext4.h | 10 ++++++----
2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d37233e2ed0b..1180a9eb4362 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1479,9 +1479,9 @@ static int ext4_journalled_write_end(struct file *file,
}

/*
- * Reserve space for a single cluster
+ * Reserve space for 'nr_resv' clusters
*/
-static int ext4_da_reserve_space(struct inode *inode)
+static int ext4_da_reserve_space(struct inode *inode, int nr_resv)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
struct ext4_inode_info *ei = EXT4_I(inode);
@@ -1492,18 +1492,18 @@ static int ext4_da_reserve_space(struct inode *inode)
* us from metadata over-estimation, though we may go over by
* a small amount in the end. Here we just reserve for data.
*/
- ret = dquot_reserve_block(inode, EXT4_C2B(sbi, 1));
+ ret = dquot_reserve_block(inode, EXT4_C2B(sbi, nr_resv));
if (ret)
return ret;

spin_lock(&ei->i_block_reservation_lock);
- if (ext4_claim_free_clusters(sbi, 1, 0)) {
+ if (ext4_claim_free_clusters(sbi, nr_resv, 0)) {
spin_unlock(&ei->i_block_reservation_lock);
- dquot_release_reservation_block(inode, EXT4_C2B(sbi, 1));
+ dquot_release_reservation_block(inode, EXT4_C2B(sbi, nr_resv));
return -ENOSPC;
}
- ei->i_reserved_data_blocks++;
- trace_ext4_da_reserve_space(inode);
+ ei->i_reserved_data_blocks += nr_resv;
+ trace_ext4_da_reserve_space(inode, nr_resv);
spin_unlock(&ei->i_block_reservation_lock);

return 0; /* success */
@@ -1678,7 +1678,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
* extents status tree doesn't get a match.
*/
if (sbi->s_cluster_ratio == 1) {
- ret = ext4_da_reserve_space(inode);
+ ret = ext4_da_reserve_space(inode, 1);
if (ret != 0) /* ENOSPC */
return ret;
} else { /* bigalloc */
@@ -1690,7 +1690,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
if (ret < 0)
return ret;
if (ret == 0) {
- ret = ext4_da_reserve_space(inode);
+ ret = ext4_da_reserve_space(inode, 1);
if (ret != 0) /* ENOSPC */
return ret;
} else {
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 6b41ac61310f..cc5e9b7b2b44 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -1246,14 +1246,15 @@ TRACE_EVENT(ext4_da_update_reserve_space,
);

TRACE_EVENT(ext4_da_reserve_space,
- TP_PROTO(struct inode *inode),
+ TP_PROTO(struct inode *inode, int nr_resv),

- TP_ARGS(inode),
+ TP_ARGS(inode, nr_resv),

TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
__field( __u64, i_blocks )
+ __field( int, reserve_blocks )
__field( int, reserved_data_blocks )
__field( __u16, mode )
),
@@ -1262,16 +1263,17 @@ TRACE_EVENT(ext4_da_reserve_space,
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
__entry->i_blocks = inode->i_blocks;
+ __entry->reserve_blocks = nr_resv;
__entry->reserved_data_blocks = EXT4_I(inode)->i_reserved_data_blocks;
__entry->mode = inode->i_mode;
),

- TP_printk("dev %d,%d ino %lu mode 0%o i_blocks %llu "
+ TP_printk("dev %d,%d ino %lu mode 0%o i_blocks %llu reserve_blocks %d"
"reserved_data_blocks %d",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
__entry->mode, __entry->i_blocks,
- __entry->reserved_data_blocks)
+ __entry->reserve_blocks, __entry->reserved_data_blocks)
);

TRACE_EVENT(ext4_da_release_space,
--
2.39.2


2024-04-10 03:51:45

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v2 9/9] ext4: make ext4_da_map_blocks() buffer_head unaware

From: Zhang Yi <[email protected]>

After calling ext4_da_map_blocks(), a delalloc extent state could be
distinguished through EXT4_MAP_DELAYED flag in map. So factor out
buffer_head related handles in ext4_da_map_blocks(), make it
buffer_head unaware, make it become a common helper, it could be used
for iomap in the future.

Signed-off-by: Zhang Yi <[email protected]>
---
fs/ext4/inode.c | 54 +++++++++++++++++++++++--------------------------
1 file changed, 25 insertions(+), 29 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 08e2692b7286..1731c1d24362 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1749,33 +1749,26 @@ static int ext4_insert_delayed_blocks(struct inode *inode, ext4_lblk_t lblk,
* This function is grabs code from the very beginning of
* ext4_map_blocks, but assumes that the caller is from delayed write
* time. This function looks up the requested blocks and sets the
- * buffer delay bit under the protection of i_data_sem.
+ * delalloc extent map under the protection of i_data_sem.
*/
-static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
- struct buffer_head *bh)
+static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map)
{
struct extent_status es;
int retval;
- sector_t invalid_block = ~((sector_t) 0xffff);
#ifdef ES_AGGRESSIVE_TEST
struct ext4_map_blocks orig_map;

memcpy(&orig_map, map, sizeof(*map));
#endif

- if (invalid_block < ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es))
- invalid_block = ~0;
-
map->m_flags = 0;
ext_debug(inode, "max_blocks %u, logical block %lu\n", map->m_len,
(unsigned long) map->m_lblk);

/* Lookup extent status tree firstly */
if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
- retval = es.es_len - (map->m_lblk - es.es_lblk);
- if (retval > map->m_len)
- retval = map->m_len;
- map->m_len = retval;
+ map->m_len = min_t(unsigned int, map->m_len,
+ es.es_len - (map->m_lblk - es.es_lblk));

if (ext4_es_is_hole(&es))
goto add_delayed;
@@ -1785,10 +1778,8 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
* Delayed extent could be allocated by fallocate.
* So we need to check it.
*/
- if (ext4_es_is_delayed(&es) && !ext4_es_is_unwritten(&es)) {
- map_bh(bh, inode->i_sb, invalid_block);
- set_buffer_new(bh);
- set_buffer_delay(bh);
+ if (ext4_es_is_delonly(&es)) {
+ map->m_flags |= EXT4_MAP_DELAYED;
return 0;
}

@@ -1803,7 +1794,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
#ifdef ES_AGGRESSIVE_TEST
ext4_map_blocks_es_recheck(NULL, inode, map, &orig_map, 0);
#endif
- return retval;
+ return 0;
}

/*
@@ -1817,7 +1808,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
retval = ext4_map_query_blocks(NULL, inode, map);
up_read(&EXT4_I(inode)->i_data_sem);
if (retval)
- return retval;
+ return retval < 0 ? retval : 0;

add_delayed:
down_write(&EXT4_I(inode)->i_data_sem);
@@ -1827,10 +1818,8 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
* whitout holding i_rwsem and folio lock.
*/
if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
- retval = es.es_len - (map->m_lblk - es.es_lblk);
- if (retval > map->m_len)
- retval = map->m_len;
- map->m_len = retval;
+ map->m_len = min_t(unsigned int, map->m_len,
+ es.es_len - (map->m_lblk - es.es_lblk));

if (!ext4_es_is_hole(&es)) {
up_write(&EXT4_I(inode)->i_data_sem);
@@ -1840,18 +1829,14 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
retval = ext4_map_query_blocks(NULL, inode, map);
if (retval) {
up_write(&EXT4_I(inode)->i_data_sem);
- return retval;
+ return retval < 0 ? retval : 0;
}
}

+ map->m_flags |= EXT4_MAP_DELAYED;
retval = ext4_insert_delayed_blocks(inode, map->m_lblk, map->m_len);
up_write(&EXT4_I(inode)->i_data_sem);
- if (retval)
- return retval;

- map_bh(bh, inode->i_sb, invalid_block);
- set_buffer_new(bh);
- set_buffer_delay(bh);
return retval;
}

@@ -1871,11 +1856,15 @@ int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
struct buffer_head *bh, int create)
{
struct ext4_map_blocks map;
+ sector_t invalid_block = ~((sector_t) 0xffff);
int ret = 0;

BUG_ON(create == 0);
BUG_ON(bh->b_size != inode->i_sb->s_blocksize);

+ if (invalid_block < ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es))
+ invalid_block = ~0;
+
map.m_lblk = iblock;
map.m_len = 1;

@@ -1884,10 +1873,17 @@ int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
* preallocated blocks are unmapped but should treated
* the same as allocated blocks.
*/
- ret = ext4_da_map_blocks(inode, &map, bh);
- if (ret <= 0)
+ ret = ext4_da_map_blocks(inode, &map);
+ if (ret < 0)
return ret;

+ if (map.m_flags & EXT4_MAP_DELAYED) {
+ map_bh(bh, inode->i_sb, invalid_block);
+ set_buffer_new(bh);
+ set_buffer_delay(bh);
+ return 0;
+ }
+
map_bh(bh, inode->i_sb, map.m_pblk);
ext4_update_bh_state(bh, map.m_flags);

--
2.39.2


2024-04-10 03:51:51

by Zhang Yi

[permalink] [raw]
Subject: [PATCH v2 8/9] ext4: make ext4_insert_delayed_block() insert multi-blocks

From: Zhang Yi <[email protected]>

Rename ext4_insert_delayed_block() to ext4_insert_delayed_blocks(),
pass length parameter to make it insert multi delalloc blocks once a
time. For non-bigalloc case, just reserve len blocks and insert delalloc
extent. For bigalloc case, we can ensure the middle clusters are not
allocated, but need to check whether the start and end clusters are
delayed/allocated, if not, we should reserve more space for the start
and/or end block(s).

Signed-off-by: Zhang Yi <[email protected]>
---
fs/ext4/inode.c | 51 ++++++++++++++++++++++++++++++++++---------------
1 file changed, 36 insertions(+), 15 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 46c34baa848a..08e2692b7286 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1678,24 +1678,28 @@ static int ext4_da_check_clu_allocated(struct inode *inode, ext4_lblk_t lblk,
}

/*
- * ext4_insert_delayed_block - adds a delayed block to the extents status
- * tree, incrementing the reserved cluster/block
- * count or making a pending reservation
- * where needed
+ * ext4_insert_delayed_blocks - adds a multiple delayed blocks to the extents
+ * status tree, incrementing the reserved
+ * cluster/block count or making pending
+ * reservations where needed
*
* @inode - file containing the newly added block
- * @lblk - logical block to be added
+ * @lblk - start logical block to be added
+ * @len - length of blocks to be added
*
* Returns 0 on success, negative error code on failure.
*/
-static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
+static int ext4_insert_delayed_blocks(struct inode *inode, ext4_lblk_t lblk,
+ ext4_lblk_t len)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
- int ret;
- bool allocated = false;
+ int resv_clu, ret;
+ bool lclu_allocated = false;
+ bool end_allocated = false;
+ ext4_lblk_t end = lblk + len - 1;

/*
- * If the cluster containing lblk is shared with a delayed,
+ * If the cluster containing lblk or end is shared with a delayed,
* written, or unwritten extent in a bigalloc file system, it's
* already been accounted for and does not need to be reserved.
* A pending reservation must be made for the cluster if it's
@@ -1706,21 +1710,38 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
* extents status tree doesn't get a match.
*/
if (sbi->s_cluster_ratio == 1) {
- ret = ext4_da_reserve_space(inode, 1);
+ ret = ext4_da_reserve_space(inode, len);
if (ret != 0) /* ENOSPC */
return ret;
} else { /* bigalloc */
- ret = ext4_da_check_clu_allocated(inode, lblk, &allocated);
+ resv_clu = EXT4_B2C(sbi, end) - EXT4_B2C(sbi, lblk) - 1;
+ if (resv_clu < 0)
+ resv_clu = 0;
+
+ ret = ext4_da_check_clu_allocated(inode, lblk, &lclu_allocated);
if (ret < 0)
return ret;
- if (ret > 0) {
- ret = ext4_da_reserve_space(inode, 1);
+ if (ret > 0)
+ resv_clu++;
+
+ if (EXT4_B2C(sbi, lblk) != EXT4_B2C(sbi, end)) {
+ ret = ext4_da_check_clu_allocated(inode, end,
+ &end_allocated);
+ if (ret < 0)
+ return ret;
+ if (ret > 0)
+ resv_clu++;
+ }
+
+ if (resv_clu) {
+ ret = ext4_da_reserve_space(inode, resv_clu);
if (ret != 0) /* ENOSPC */
return ret;
}
}

- ext4_es_insert_delayed_extent(inode, lblk, 1, allocated, false);
+ ext4_es_insert_delayed_extent(inode, lblk, len, lclu_allocated,
+ end_allocated);
return 0;
}

@@ -1823,7 +1844,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
}
}

- retval = ext4_insert_delayed_block(inode, map->m_lblk);
+ retval = ext4_insert_delayed_blocks(inode, map->m_lblk, map->m_len);
up_write(&EXT4_I(inode)->i_data_sem);
if (retval)
return retval;
--
2.39.2


2024-04-24 20:07:26

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v2 2/9] ext4: check the extent status again before inserting delalloc block

On Wed 10-04-24 11:41:56, Zhang Yi wrote:
> From: Zhang Yi <[email protected]>
>
> Now we lookup extent status entry without holding the i_data_sem before
> inserting delalloc block, it works fine in buffered write path and
> because it holds i_rwsem and folio lock, and the mmap path holds folio
> lock, so the found extent locklessly couldn't be modified concurrently.
> But it could be raced by fallocate since it allocate block whitout
> holding i_rwsem and folio lock.
>
> ext4_page_mkwrite() ext4_fallocate()
> block_page_mkwrite()
> ext4_da_map_blocks()
> //find hole in extent status tree
> ext4_alloc_file_blocks()
> ext4_map_blocks()
> //allocate block and unwritten extent
> ext4_insert_delayed_block()
> ext4_da_reserve_space()
> //reserve one more block
> ext4_es_insert_delayed_block()
> //drop unwritten extent and add delayed extent by mistake
>
> Then, the delalloc extent is wrong until writeback, the one more
> reserved block can't be release any more and trigger below warning:
>
> EXT4-fs (pmem2): Inode 13 (00000000bbbd4d23): i_reserved_data_blocks(1) not cleared!
>
> Hold i_data_sem in write mode directly can fix the problem, but it's
> expansive, we should keep the lockless check and check the extent again
> once we need to add an new delalloc block.
>
> Cc: [email protected]
> Signed-off-by: Zhang Yi <[email protected]>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/inode.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 6a41172c06e1..118b0497a954 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1737,6 +1737,7 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
> if (ext4_es_is_hole(&es))
> goto add_delayed;
>
> +found:
> /*
> * Delayed extent could be allocated by fallocate.
> * So we need to check it.
> @@ -1781,6 +1782,24 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
>
> add_delayed:
> down_write(&EXT4_I(inode)->i_data_sem);
> + /*
> + * Lookup extents tree again under i_data_sem, make sure this
> + * inserting delalloc range haven't been delayed or allocated
> + * whitout holding i_rwsem and folio lock.
> + */
> + if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
> + if (!ext4_es_is_hole(&es)) {
> + up_write(&EXT4_I(inode)->i_data_sem);
> + goto found;
> + }
> + } else if (!ext4_has_inline_data(inode)) {
> + retval = ext4_map_query_blocks(NULL, inode, map);
> + if (retval) {
> + up_write(&EXT4_I(inode)->i_data_sem);
> + return retval;
> + }
> + }
> +
> retval = ext4_insert_delayed_block(inode, map->m_lblk);
> up_write(&EXT4_I(inode)->i_data_sem);
> if (retval)
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-04-24 20:18:34

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v2 1/9] ext4: factor out a common helper to query extent map

On Wed 10-04-24 11:41:55, Zhang Yi wrote:
> From: Zhang Yi <[email protected]>
>
> Factor out a new common helper ext4_map_query_blocks() from the
> ext4_da_map_blocks(), it query and return the extent map status on the
> inode's extent path, no logic changes.
>
> Signed-off-by: Zhang Yi <[email protected]>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/inode.c | 57 +++++++++++++++++++++++++++----------------------
> 1 file changed, 32 insertions(+), 25 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 537803250ca9..6a41172c06e1 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -453,6 +453,35 @@ static void ext4_map_blocks_es_recheck(handle_t *handle,
> }
> #endif /* ES_AGGRESSIVE_TEST */
>
> +static int ext4_map_query_blocks(handle_t *handle, struct inode *inode,
> + struct ext4_map_blocks *map)
> +{
> + unsigned int status;
> + int retval;
> +
> + if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
> + retval = ext4_ext_map_blocks(handle, inode, map, 0);
> + else
> + retval = ext4_ind_map_blocks(handle, inode, map, 0);
> +
> + if (retval <= 0)
> + return retval;
> +
> + if (unlikely(retval != map->m_len)) {
> + ext4_warning(inode->i_sb,
> + "ES len assertion failed for inode "
> + "%lu: retval %d != map->m_len %d",
> + inode->i_ino, retval, map->m_len);
> + WARN_ON(1);
> + }
> +
> + status = map->m_flags & EXT4_MAP_UNWRITTEN ?
> + EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
> + ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
> + map->m_pblk, status);
> + return retval;
> +}
> +
> /*
> * The ext4_map_blocks() function tries to look up the requested blocks,
> * and returns if the blocks are already mapped.
> @@ -1744,33 +1773,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
> down_read(&EXT4_I(inode)->i_data_sem);
> if (ext4_has_inline_data(inode))
> retval = 0;
> - else if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
> - retval = ext4_ext_map_blocks(NULL, inode, map, 0);
> else
> - retval = ext4_ind_map_blocks(NULL, inode, map, 0);
> - if (retval < 0) {
> - up_read(&EXT4_I(inode)->i_data_sem);
> - return retval;
> - }
> - if (retval > 0) {
> - unsigned int status;
> -
> - if (unlikely(retval != map->m_len)) {
> - ext4_warning(inode->i_sb,
> - "ES len assertion failed for inode "
> - "%lu: retval %d != map->m_len %d",
> - inode->i_ino, retval, map->m_len);
> - WARN_ON(1);
> - }
> -
> - status = map->m_flags & EXT4_MAP_UNWRITTEN ?
> - EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
> - ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
> - map->m_pblk, status);
> - up_read(&EXT4_I(inode)->i_data_sem);
> - return retval;
> - }
> + retval = ext4_map_query_blocks(NULL, inode, map);
> up_read(&EXT4_I(inode)->i_data_sem);
> + if (retval)
> + return retval;
>
> add_delayed:
> down_write(&EXT4_I(inode)->i_data_sem);
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-04-25 15:56:53

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v2 3/9] ext4: trim delalloc extent

On Wed 10-04-24 11:41:57, Zhang Yi wrote:
> From: Zhang Yi <[email protected]>
>
> The cached delalloc or hole extent should be trimed to the map->map_len
> if we map delalloc blocks in ext4_da_map_blocks(). But it doesn't
> trigger any issue now because the map->m_len is always set to one and we
> always insert one delayed block once a time. Fix this by trim the extent
> once we get one from the cached extent tree, prearing for mapping a
> extent with multiple delalloc blocks.
>
> Signed-off-by: Zhang Yi <[email protected]>

Well, but we already do the trimming in ext4_da_map_blocks(), don't we? You
just move it to a different place... Or do you mean that we actually didn't
set 'map' at all in some cases and now we do? In either case the 'map'
handling looks a bit sloppy in ext4_da_map_blocks() as e.g. the
'add_delayed' case doesn't seem to bother with properly setting 'map' based
on what it does. So maybe we should clean that up to always set 'map' just
before returning at the same place where we update the 'bh'? And maybe bh
update could be updated in some common helper because it's content is
determined by the 'map' content?

Honza

> ---
> fs/ext4/inode.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 118b0497a954..e4043ddb07a5 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1734,6 +1734,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
>
> /* Lookup extent status tree firstly */
> if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
> + retval = es.es_len - (iblock - es.es_lblk);
> + if (retval > map->m_len)
> + retval = map->m_len;
> + map->m_len = retval;
> +
> if (ext4_es_is_hole(&es))
> goto add_delayed;
>
> @@ -1750,10 +1755,6 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
> }
>
> map->m_pblk = ext4_es_pblock(&es) + iblock - es.es_lblk;
> - retval = es.es_len - (iblock - es.es_lblk);
> - if (retval > map->m_len)
> - retval = map->m_len;
> - map->m_len = retval;
> if (ext4_es_is_written(&es))
> map->m_flags |= EXT4_MAP_MAPPED;
> else if (ext4_es_is_unwritten(&es))
> @@ -1788,6 +1789,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
> * whitout holding i_rwsem and folio lock.
> */
> if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
> + retval = es.es_len - (iblock - es.es_lblk);
> + if (retval > map->m_len)
> + retval = map->m_len;
> + map->m_len = retval;
> +
> if (!ext4_es_is_hole(&es)) {
> up_write(&EXT4_I(inode)->i_data_sem);
> goto found;
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-04-26 09:38:37

by Zhang Yi

[permalink] [raw]
Subject: Re: [PATCH v2 3/9] ext4: trim delalloc extent

On 2024/4/25 23:56, Jan Kara wrote:
> On Wed 10-04-24 11:41:57, Zhang Yi wrote:
>> From: Zhang Yi <[email protected]>
>>
>> The cached delalloc or hole extent should be trimed to the map->map_len
>> if we map delalloc blocks in ext4_da_map_blocks(). But it doesn't
>> trigger any issue now because the map->m_len is always set to one and we
>> always insert one delayed block once a time. Fix this by trim the extent
>> once we get one from the cached extent tree, prearing for mapping a
>> extent with multiple delalloc blocks.
>>
>> Signed-off-by: Zhang Yi <[email protected]>
>
> Well, but we already do the trimming in ext4_da_map_blocks(), don't we? You
> just move it to a different place... Or do you mean that we actually didn't
> set 'map' at all in some cases and now we do?

Yeah, now we only trim map len if we found an unwritten extent or written
extent in the cache, this isn't okay if we found a hole and
ext4_insert_delayed_block() and ext4_da_map_blocks() support inserting
map->len blocks. If we found a hole which es->es_len is shorter than the
length we want to write, we could delay more blocks than we expected.

Please assume we write data [A, C) to a file that contains a hole extent
[A, B) and a written extent [B, D) in cache.

A B C D
before da write: ...hhhhhh|wwwwww....

Then we will get extent [A, B), we should trim map->m_len to B-A before
inserting new delalloc blocks, if not, the range [B, C) is duplicated.

> In either case the 'map'
> handling looks a bit sloppy in ext4_da_map_blocks() as e.g. the
> 'add_delayed' case doesn't seem to bother with properly setting 'map' based
> on what it does. So maybe we should clean that up to always set 'map' just
> before returning at the same place where we update the 'bh'? And maybe bh
> update could be updated in some common helper because it's content is
> determined by the 'map' content?
>

I agree with you, it looks that we should always revise the map->m_len
once we found an extent from the cache, and then do corresponding handling
according to the extent type. so it's hard to put it to a common place.
But we can merge the handling of written and unwritten extent, I've moved
the bh updating into ext4_da_get_block_prep() and do some cleanup in
patch 9, please look at that patch, does it looks fine to you?

Thanks,
Yi.

> Honza
>
>> ---
>> fs/ext4/inode.c | 14 ++++++++++----
>> 1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 118b0497a954..e4043ddb07a5 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -1734,6 +1734,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
>>
>> /* Lookup extent status tree firstly */
>> if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
>> + retval = es.es_len - (iblock - es.es_lblk);
>> + if (retval > map->m_len)
>> + retval = map->m_len;
>> + map->m_len = retval;
>> +
>> if (ext4_es_is_hole(&es))
>> goto add_delayed;
>>
>> @@ -1750,10 +1755,6 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
>> }
>>
>> map->m_pblk = ext4_es_pblock(&es) + iblock - es.es_lblk;
>> - retval = es.es_len - (iblock - es.es_lblk);
>> - if (retval > map->m_len)
>> - retval = map->m_len;
>> - map->m_len = retval;
>> if (ext4_es_is_written(&es))
>> map->m_flags |= EXT4_MAP_MAPPED;
>> else if (ext4_es_is_unwritten(&es))
>> @@ -1788,6 +1789,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
>> * whitout holding i_rwsem and folio lock.
>> */
>> if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
>> + retval = es.es_len - (iblock - es.es_lblk);
>> + if (retval > map->m_len)
>> + retval = map->m_len;
>> + map->m_len = retval;
>> +
>> if (!ext4_es_is_hole(&es)) {
>> up_write(&EXT4_I(inode)->i_data_sem);
>> goto found;
>> --
>> 2.39.2
>>


2024-04-29 07:34:48

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v2 4/9] ext4: drop iblock parameter

On Wed 10-04-24 11:41:58, Zhang Yi wrote:
> From: Zhang Yi <[email protected]>
>
> The start block of the delalloc extent to be inserted is equal to
> map->m_lblk, just drop the duplicate iblock input parameter.
>
> Signed-off-by: Zhang Yi <[email protected]>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/inode.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e4043ddb07a5..cccc16506f5f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1712,8 +1712,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
> * time. This function looks up the requested blocks and sets the
> * buffer delay bit under the protection of i_data_sem.
> */
> -static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
> - struct ext4_map_blocks *map,
> +static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
> struct buffer_head *bh)
> {
> struct extent_status es;
> @@ -1733,8 +1732,8 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
> (unsigned long) map->m_lblk);
>
> /* Lookup extent status tree firstly */
> - if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
> - retval = es.es_len - (iblock - es.es_lblk);
> + if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
> + retval = es.es_len - (map->m_lblk - es.es_lblk);
> if (retval > map->m_len)
> retval = map->m_len;
> map->m_len = retval;
> @@ -1754,7 +1753,7 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
> return 0;
> }
>
> - map->m_pblk = ext4_es_pblock(&es) + iblock - es.es_lblk;
> + map->m_pblk = ext4_es_pblock(&es) + map->m_lblk - es.es_lblk;
> if (ext4_es_is_written(&es))
> map->m_flags |= EXT4_MAP_MAPPED;
> else if (ext4_es_is_unwritten(&es))
> @@ -1788,8 +1787,8 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
> * inserting delalloc range haven't been delayed or allocated
> * whitout holding i_rwsem and folio lock.
> */
> - if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
> - retval = es.es_len - (iblock - es.es_lblk);
> + if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
> + retval = es.es_len - (map->m_lblk - es.es_lblk);
> if (retval > map->m_len)
> retval = map->m_len;
> map->m_len = retval;
> @@ -1846,7 +1845,7 @@ int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
> * preallocated blocks are unmapped but should treated
> * the same as allocated blocks.
> */
> - ret = ext4_da_map_blocks(inode, iblock, &map, bh);
> + ret = ext4_da_map_blocks(inode, &map, bh);
> if (ret <= 0)
> return ret;
>
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-04-29 09:16:59

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v2 5/9] ext4: make ext4_es_insert_delayed_block() insert multi-blocks

On Wed 10-04-24 11:41:59, Zhang Yi wrote:
> From: Zhang Yi <[email protected]>
>
> Rename ext4_es_insert_delayed_block() to ext4_es_insert_delayed_extent()
> and pass length parameter to make it insert multi delalloc blocks once a
> time. For the case of bigalloc, expand the allocated parameter to
> lclu_allocated and end_allocated. lclu_allocated indicates the allocate
> state of the cluster which containing the lblk, end_allocated represents
> the end, and the middle clusters must be unallocated.
>
> Signed-off-by: Zhang Yi <[email protected]>

Looks mostly good, just one comment below:

> diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
> index 4a00e2f019d9..2320b0d71001 100644
> --- a/fs/ext4/extents_status.c
> +++ b/fs/ext4/extents_status.c
> @@ -2052,34 +2052,42 @@ bool ext4_is_pending(struct inode *inode, ext4_lblk_t lblk)
> }
>
> /*
> - * ext4_es_insert_delayed_block - adds a delayed block to the extents status
> - * tree, adding a pending reservation where
> - * needed
> + * ext4_es_insert_delayed_extent - adds some delayed blocks to the extents
> + * status tree, adding a pending reservation
> + * where needed
> *
> * @inode - file containing the newly added block
> - * @lblk - logical block to be added
> - * @allocated - indicates whether a physical cluster has been allocated for
> - * the logical cluster that contains the block
> + * @lblk - start logical block to be added
> + * @len - length of blocks to be added
> + * @lclu_allocated/end_allocated - indicates whether a physical cluster has
> + * been allocated for the logical cluster
> + * that contains the block
^^^^ "start / end
block" to make it clearer...

> -void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
> - bool allocated)
> +void ext4_es_insert_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
> + ext4_lblk_t len, bool lclu_allocated,
> + bool end_allocated)
> {
> struct extent_status newes;
> + ext4_lblk_t end = lblk + len - 1;
> int err1 = 0, err2 = 0, err3 = 0;
> struct extent_status *es1 = NULL;
> struct extent_status *es2 = NULL;
> - struct pending_reservation *pr = NULL;
> + struct pending_reservation *pr1 = NULL;
> + struct pending_reservation *pr2 = NULL;
>
> if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
> return;
>
> - es_debug("add [%u/1) delayed to extent status tree of inode %lu\n",
> - lblk, inode->i_ino);
> + es_debug("add [%u/%u) delayed to extent status tree of inode %lu\n",
> + lblk, len, inode->i_ino);
> + if (!len)
> + return;
>
> newes.es_lblk = lblk;
> - newes.es_len = 1;
> + newes.es_len = len;
> ext4_es_store_pblock_status(&newes, ~0, EXTENT_STATUS_DELAYED);
> - trace_ext4_es_insert_delayed_block(inode, &newes, allocated);
> + trace_ext4_es_insert_delayed_extent(inode, &newes, lclu_allocated,
> + end_allocated);
>
> ext4_es_insert_extent_check(inode, &newes);
>
> @@ -2088,11 +2096,15 @@ void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
> es1 = __es_alloc_extent(true);
> if ((err1 || err2) && !es2)
> es2 = __es_alloc_extent(true);
> - if ((err1 || err2 || err3) && allocated && !pr)
> - pr = __alloc_pending(true);
> + if (err1 || err2 || err3) {
> + if (lclu_allocated && !pr1)
> + pr1 = __alloc_pending(true);
> + if (end_allocated && !pr2)
> + pr2 = __alloc_pending(true);
> + }
> write_lock(&EXT4_I(inode)->i_es_lock);
>
> - err1 = __es_remove_extent(inode, lblk, lblk, NULL, es1);
> + err1 = __es_remove_extent(inode, lblk, end, NULL, es1);
> if (err1 != 0)
> goto error;
> /* Free preallocated extent if it didn't get used. */
> @@ -2112,13 +2124,22 @@ void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
> es2 = NULL;
> }
>
> - if (allocated) {
> - err3 = __insert_pending(inode, lblk, &pr);
> + if (lclu_allocated) {
> + err3 = __insert_pending(inode, lblk, &pr1);
> if (err3 != 0)
> goto error;
> - if (pr) {
> - __free_pending(pr);
> - pr = NULL;
> + if (pr1) {
> + __free_pending(pr1);
> + pr1 = NULL;
> + }
> + }
> + if (end_allocated) {

So there's one unclear thing here: What if 'lblk' and 'end' are in the same
cluster? We don't want two pending reservation structures for the cluster.
__insert_pending() already handles this gracefully but perhaps we don't
need to allocate 'pr2' in that case and call __insert_pending() at all? I
think it could be easily handled by something like:

if (EXT4_B2C(lblk) == EXT4_B2C(end))
end_allocated = false;

at appropriate place in ext4_es_insert_delayed_extent().

Otherwise the patch looks good.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-04-29 09:25:11

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v2 5/9] ext4: make ext4_es_insert_delayed_block() insert multi-blocks

On Wed 10-04-24 11:41:59, Zhang Yi wrote:
> From: Zhang Yi <[email protected]>

I've realized I wanted to suggest some language changes for the changelog:

> Rename ext4_es_insert_delayed_block() to ext4_es_insert_delayed_extent()
> and pass length parameter to make it insert multi delalloc blocks once a
^^^ multiple delalloc blocks
at a time.

> time. For the case of bigalloc, expand the allocated parameter to
^^^^ split

> lclu_allocated and end_allocated. lclu_allocated indicates the allocate
^^^
allocation

> state of the cluster which containing the lblk, end_allocated represents
^^^^ which is containing ^^^^
indicates the allocation state of the extent end

> the end, and the middle clusters must be unallocated.
^^^. Clusters in the middle of delay allocated extent must be
unallocated.

Honza

>
> Signed-off-by: Zhang Yi <[email protected]>
> ---
> fs/ext4/extents_status.c | 63 ++++++++++++++++++++++++-------------
> fs/ext4/extents_status.h | 5 +--
> fs/ext4/inode.c | 2 +-
> include/trace/events/ext4.h | 16 +++++-----
> 4 files changed, 55 insertions(+), 31 deletions(-)
>
> diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
> index 4a00e2f019d9..2320b0d71001 100644
> --- a/fs/ext4/extents_status.c
> +++ b/fs/ext4/extents_status.c
> @@ -2052,34 +2052,42 @@ bool ext4_is_pending(struct inode *inode, ext4_lblk_t lblk)
> }
>
> /*
> - * ext4_es_insert_delayed_block - adds a delayed block to the extents status
> - * tree, adding a pending reservation where
> - * needed
> + * ext4_es_insert_delayed_extent - adds some delayed blocks to the extents
> + * status tree, adding a pending reservation
> + * where needed
> *
> * @inode - file containing the newly added block
> - * @lblk - logical block to be added
> - * @allocated - indicates whether a physical cluster has been allocated for
> - * the logical cluster that contains the block
> + * @lblk - start logical block to be added
> + * @len - length of blocks to be added
> + * @lclu_allocated/end_allocated - indicates whether a physical cluster has
> + * been allocated for the logical cluster
> + * that contains the block
> */
> -void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
> - bool allocated)
> +void ext4_es_insert_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
> + ext4_lblk_t len, bool lclu_allocated,
> + bool end_allocated)
> {
> struct extent_status newes;
> + ext4_lblk_t end = lblk + len - 1;
> int err1 = 0, err2 = 0, err3 = 0;
> struct extent_status *es1 = NULL;
> struct extent_status *es2 = NULL;
> - struct pending_reservation *pr = NULL;
> + struct pending_reservation *pr1 = NULL;
> + struct pending_reservation *pr2 = NULL;
>
> if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
> return;
>
> - es_debug("add [%u/1) delayed to extent status tree of inode %lu\n",
> - lblk, inode->i_ino);
> + es_debug("add [%u/%u) delayed to extent status tree of inode %lu\n",
> + lblk, len, inode->i_ino);
> + if (!len)
> + return;
>
> newes.es_lblk = lblk;
> - newes.es_len = 1;
> + newes.es_len = len;
> ext4_es_store_pblock_status(&newes, ~0, EXTENT_STATUS_DELAYED);
> - trace_ext4_es_insert_delayed_block(inode, &newes, allocated);
> + trace_ext4_es_insert_delayed_extent(inode, &newes, lclu_allocated,
> + end_allocated);
>
> ext4_es_insert_extent_check(inode, &newes);
>
> @@ -2088,11 +2096,15 @@ void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
> es1 = __es_alloc_extent(true);
> if ((err1 || err2) && !es2)
> es2 = __es_alloc_extent(true);
> - if ((err1 || err2 || err3) && allocated && !pr)
> - pr = __alloc_pending(true);
> + if (err1 || err2 || err3) {
> + if (lclu_allocated && !pr1)
> + pr1 = __alloc_pending(true);
> + if (end_allocated && !pr2)
> + pr2 = __alloc_pending(true);
> + }
> write_lock(&EXT4_I(inode)->i_es_lock);
>
> - err1 = __es_remove_extent(inode, lblk, lblk, NULL, es1);
> + err1 = __es_remove_extent(inode, lblk, end, NULL, es1);
> if (err1 != 0)
> goto error;
> /* Free preallocated extent if it didn't get used. */
> @@ -2112,13 +2124,22 @@ void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
> es2 = NULL;
> }
>
> - if (allocated) {
> - err3 = __insert_pending(inode, lblk, &pr);
> + if (lclu_allocated) {
> + err3 = __insert_pending(inode, lblk, &pr1);
> if (err3 != 0)
> goto error;
> - if (pr) {
> - __free_pending(pr);
> - pr = NULL;
> + if (pr1) {
> + __free_pending(pr1);
> + pr1 = NULL;
> + }
> + }
> + if (end_allocated) {
> + err3 = __insert_pending(inode, end, &pr2);
> + if (err3 != 0)
> + goto error;
> + if (pr2) {
> + __free_pending(pr2);
> + pr2 = NULL;
> }
> }
> error:
> diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
> index d9847a4a25db..3c8e2edee5d5 100644
> --- a/fs/ext4/extents_status.h
> +++ b/fs/ext4/extents_status.h
> @@ -249,8 +249,9 @@ extern void ext4_exit_pending(void);
> extern void ext4_init_pending_tree(struct ext4_pending_tree *tree);
> extern void ext4_remove_pending(struct inode *inode, ext4_lblk_t lblk);
> extern bool ext4_is_pending(struct inode *inode, ext4_lblk_t lblk);
> -extern void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
> - bool allocated);
> +extern void ext4_es_insert_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
> + ext4_lblk_t len, bool lclu_allocated,
> + bool end_allocated);
> extern unsigned int ext4_es_delayed_clu(struct inode *inode, ext4_lblk_t lblk,
> ext4_lblk_t len);
> extern void ext4_clear_inode_es(struct inode *inode);
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index cccc16506f5f..d37233e2ed0b 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1702,7 +1702,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
> }
> }
>
> - ext4_es_insert_delayed_block(inode, lblk, allocated);
> + ext4_es_insert_delayed_extent(inode, lblk, 1, allocated, false);
> return 0;
> }
>
> diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
> index a697f4b77162..6b41ac61310f 100644
> --- a/include/trace/events/ext4.h
> +++ b/include/trace/events/ext4.h
> @@ -2478,11 +2478,11 @@ TRACE_EVENT(ext4_es_shrink,
> __entry->scan_time, __entry->nr_skipped, __entry->retried)
> );
>
> -TRACE_EVENT(ext4_es_insert_delayed_block,
> +TRACE_EVENT(ext4_es_insert_delayed_extent,
> TP_PROTO(struct inode *inode, struct extent_status *es,
> - bool allocated),
> + bool lclu_allocated, bool end_allocated),
>
> - TP_ARGS(inode, es, allocated),
> + TP_ARGS(inode, es, lclu_allocated, end_allocated),
>
> TP_STRUCT__entry(
> __field( dev_t, dev )
> @@ -2491,7 +2491,8 @@ TRACE_EVENT(ext4_es_insert_delayed_block,
> __field( ext4_lblk_t, len )
> __field( ext4_fsblk_t, pblk )
> __field( char, status )
> - __field( bool, allocated )
> + __field( bool, lclu_allocated )
> + __field( bool, end_allocated )
> ),
>
> TP_fast_assign(
> @@ -2501,16 +2502,17 @@ TRACE_EVENT(ext4_es_insert_delayed_block,
> __entry->len = es->es_len;
> __entry->pblk = ext4_es_show_pblock(es);
> __entry->status = ext4_es_status(es);
> - __entry->allocated = allocated;
> + __entry->lclu_allocated = lclu_allocated;
> + __entry->end_allocated = end_allocated;
> ),
>
> TP_printk("dev %d,%d ino %lu es [%u/%u) mapped %llu status %s "
> - "allocated %d",
> + "allocated %d %d",
> MAJOR(__entry->dev), MINOR(__entry->dev),
> (unsigned long) __entry->ino,
> __entry->lblk, __entry->len,
> __entry->pblk, show_extent_status(__entry->status),
> - __entry->allocated)
> + __entry->lclu_allocated, __entry->end_allocated)
> );
>
> /* fsmap traces */
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-04-29 09:25:50

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v2 6/9] ext4: make ext4_da_reserve_space() reserve multi-clusters

On Wed 10-04-24 11:42:00, Zhang Yi wrote:
> From: Zhang Yi <[email protected]>
>
> Add 'nr_resv' parameter to ext4_da_reserve_space(), which indicates the
> number of clusters wants to reserve, make it reserve multi-clusters once
> a time.
>
> Signed-off-by: Zhang Yi <[email protected]>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/inode.c | 18 +++++++++---------
> include/trace/events/ext4.h | 10 ++++++----
> 2 files changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index d37233e2ed0b..1180a9eb4362 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1479,9 +1479,9 @@ static int ext4_journalled_write_end(struct file *file,
> }
>
> /*
> - * Reserve space for a single cluster
> + * Reserve space for 'nr_resv' clusters
> */
> -static int ext4_da_reserve_space(struct inode *inode)
> +static int ext4_da_reserve_space(struct inode *inode, int nr_resv)
> {
> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> struct ext4_inode_info *ei = EXT4_I(inode);
> @@ -1492,18 +1492,18 @@ static int ext4_da_reserve_space(struct inode *inode)
> * us from metadata over-estimation, though we may go over by
> * a small amount in the end. Here we just reserve for data.
> */
> - ret = dquot_reserve_block(inode, EXT4_C2B(sbi, 1));
> + ret = dquot_reserve_block(inode, EXT4_C2B(sbi, nr_resv));
> if (ret)
> return ret;
>
> spin_lock(&ei->i_block_reservation_lock);
> - if (ext4_claim_free_clusters(sbi, 1, 0)) {
> + if (ext4_claim_free_clusters(sbi, nr_resv, 0)) {
> spin_unlock(&ei->i_block_reservation_lock);
> - dquot_release_reservation_block(inode, EXT4_C2B(sbi, 1));
> + dquot_release_reservation_block(inode, EXT4_C2B(sbi, nr_resv));
> return -ENOSPC;
> }
> - ei->i_reserved_data_blocks++;
> - trace_ext4_da_reserve_space(inode);
> + ei->i_reserved_data_blocks += nr_resv;
> + trace_ext4_da_reserve_space(inode, nr_resv);
> spin_unlock(&ei->i_block_reservation_lock);
>
> return 0; /* success */
> @@ -1678,7 +1678,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
> * extents status tree doesn't get a match.
> */
> if (sbi->s_cluster_ratio == 1) {
> - ret = ext4_da_reserve_space(inode);
> + ret = ext4_da_reserve_space(inode, 1);
> if (ret != 0) /* ENOSPC */
> return ret;
> } else { /* bigalloc */
> @@ -1690,7 +1690,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
> if (ret < 0)
> return ret;
> if (ret == 0) {
> - ret = ext4_da_reserve_space(inode);
> + ret = ext4_da_reserve_space(inode, 1);
> if (ret != 0) /* ENOSPC */
> return ret;
> } else {
> diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
> index 6b41ac61310f..cc5e9b7b2b44 100644
> --- a/include/trace/events/ext4.h
> +++ b/include/trace/events/ext4.h
> @@ -1246,14 +1246,15 @@ TRACE_EVENT(ext4_da_update_reserve_space,
> );
>
> TRACE_EVENT(ext4_da_reserve_space,
> - TP_PROTO(struct inode *inode),
> + TP_PROTO(struct inode *inode, int nr_resv),
>
> - TP_ARGS(inode),
> + TP_ARGS(inode, nr_resv),
>
> TP_STRUCT__entry(
> __field( dev_t, dev )
> __field( ino_t, ino )
> __field( __u64, i_blocks )
> + __field( int, reserve_blocks )
> __field( int, reserved_data_blocks )
> __field( __u16, mode )
> ),
> @@ -1262,16 +1263,17 @@ TRACE_EVENT(ext4_da_reserve_space,
> __entry->dev = inode->i_sb->s_dev;
> __entry->ino = inode->i_ino;
> __entry->i_blocks = inode->i_blocks;
> + __entry->reserve_blocks = nr_resv;
> __entry->reserved_data_blocks = EXT4_I(inode)->i_reserved_data_blocks;
> __entry->mode = inode->i_mode;
> ),
>
> - TP_printk("dev %d,%d ino %lu mode 0%o i_blocks %llu "
> + TP_printk("dev %d,%d ino %lu mode 0%o i_blocks %llu reserve_blocks %d"
> "reserved_data_blocks %d",
> MAJOR(__entry->dev), MINOR(__entry->dev),
> (unsigned long) __entry->ino,
> __entry->mode, __entry->i_blocks,
> - __entry->reserved_data_blocks)
> + __entry->reserve_blocks, __entry->reserved_data_blocks)
> );
>
> TRACE_EVENT(ext4_da_release_space,
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-04-29 10:07:20

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v2 8/9] ext4: make ext4_insert_delayed_block() insert multi-blocks

On Wed 10-04-24 11:42:02, Zhang Yi wrote:
> From: Zhang Yi <[email protected]>
>
> Rename ext4_insert_delayed_block() to ext4_insert_delayed_blocks(),
> pass length parameter to make it insert multi delalloc blocks once a
> time. For non-bigalloc case, just reserve len blocks and insert delalloc
> extent. For bigalloc case, we can ensure the middle clusters are not
> allocated, but need to check whether the start and end clusters are
> delayed/allocated, if not, we should reserve more space for the start
> and/or end block(s).
>
> Signed-off-by: Zhang Yi <[email protected]>

Thanks for the patch. Some comments below.

> ---
> fs/ext4/inode.c | 51 ++++++++++++++++++++++++++++++++++---------------
> 1 file changed, 36 insertions(+), 15 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 46c34baa848a..08e2692b7286 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1678,24 +1678,28 @@ static int ext4_da_check_clu_allocated(struct inode *inode, ext4_lblk_t lblk,
> }
>
> /*
> - * ext4_insert_delayed_block - adds a delayed block to the extents status
> - * tree, incrementing the reserved cluster/block
> - * count or making a pending reservation
> - * where needed
> + * ext4_insert_delayed_blocks - adds a multiple delayed blocks to the extents
> + * status tree, incrementing the reserved
> + * cluster/block count or making pending
> + * reservations where needed
> *
> * @inode - file containing the newly added block
> - * @lblk - logical block to be added
> + * @lblk - start logical block to be added
> + * @len - length of blocks to be added
> *
> * Returns 0 on success, negative error code on failure.
> */
> -static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
> +static int ext4_insert_delayed_blocks(struct inode *inode, ext4_lblk_t lblk,
> + ext4_lblk_t len)
> {
> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> - int ret;
> - bool allocated = false;
> + int resv_clu, ret;
^^^ this variable is in prinple the length of the extent. Thus
it should be ext4_lblk_t type.

> + bool lclu_allocated = false;
> + bool end_allocated = false;
> + ext4_lblk_t end = lblk + len - 1;
>
> /*
> - * If the cluster containing lblk is shared with a delayed,
> + * If the cluster containing lblk or end is shared with a delayed,
> * written, or unwritten extent in a bigalloc file system, it's
> * already been accounted for and does not need to be reserved.
> * A pending reservation must be made for the cluster if it's
> @@ -1706,21 +1710,38 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
> * extents status tree doesn't get a match.
> */
> if (sbi->s_cluster_ratio == 1) {
> - ret = ext4_da_reserve_space(inode, 1);
> + ret = ext4_da_reserve_space(inode, len);
> if (ret != 0) /* ENOSPC */
> return ret;
> } else { /* bigalloc */
> - ret = ext4_da_check_clu_allocated(inode, lblk, &allocated);
> + resv_clu = EXT4_B2C(sbi, end) - EXT4_B2C(sbi, lblk) - 1;
> + if (resv_clu < 0)
> + resv_clu = 0;

Here resv_clu going negative is strange I'm not sure the math is 100%
correct in all the cases. I think it would be more logical as:

resv_clu = EXT4_B2C(sbi, end) - EXT4_B2C(sbi, lblk) + 1;

and then update resv_clu below as:

> +
> + ret = ext4_da_check_clu_allocated(inode, lblk, &lclu_allocated);
> if (ret < 0)
> return ret;
> - if (ret > 0) {
> - ret = ext4_da_reserve_space(inode, 1);
> + if (ret > 0)
> + resv_clu++;

Here we would do:
if (ret == 0)
resv_clu--;

> +
> + if (EXT4_B2C(sbi, lblk) != EXT4_B2C(sbi, end)) {
> + ret = ext4_da_check_clu_allocated(inode, end,
> + &end_allocated);
> + if (ret < 0)
> + return ret;
> + if (ret > 0)
> + resv_clu++;

And similarly here:
if (ret == 0)
resv_clu--;

Honza

> + }
> +
> + if (resv_clu) {
> + ret = ext4_da_reserve_space(inode, resv_clu);
> if (ret != 0) /* ENOSPC */
> return ret;
> }
> }
>
> - ext4_es_insert_delayed_extent(inode, lblk, 1, allocated, false);
> + ext4_es_insert_delayed_extent(inode, lblk, len, lclu_allocated,
> + end_allocated);
> return 0;
> }
>
> @@ -1823,7 +1844,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
> }
> }
>
> - retval = ext4_insert_delayed_block(inode, map->m_lblk);
> + retval = ext4_insert_delayed_blocks(inode, map->m_lblk, map->m_len);
> up_write(&EXT4_I(inode)->i_data_sem);
> if (retval)
> return retval;
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-04-29 10:28:34

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v2 3/9] ext4: trim delalloc extent

On Fri 26-04-24 17:38:23, Zhang Yi wrote:
> On 2024/4/25 23:56, Jan Kara wrote:
> > On Wed 10-04-24 11:41:57, Zhang Yi wrote:
> >> From: Zhang Yi <[email protected]>
> >>
> >> The cached delalloc or hole extent should be trimed to the map->map_len
> >> if we map delalloc blocks in ext4_da_map_blocks(). But it doesn't
> >> trigger any issue now because the map->m_len is always set to one and we
> >> always insert one delayed block once a time. Fix this by trim the extent
> >> once we get one from the cached extent tree, prearing for mapping a
> >> extent with multiple delalloc blocks.
> >>
> >> Signed-off-by: Zhang Yi <[email protected]>
> >
> > Well, but we already do the trimming in ext4_da_map_blocks(), don't we? You
> > just move it to a different place... Or do you mean that we actually didn't
> > set 'map' at all in some cases and now we do?
>
> Yeah, now we only trim map len if we found an unwritten extent or written
> extent in the cache, this isn't okay if we found a hole and
> ext4_insert_delayed_block() and ext4_da_map_blocks() support inserting
> map->len blocks. If we found a hole which es->es_len is shorter than the
> length we want to write, we could delay more blocks than we expected.
>
> Please assume we write data [A, C) to a file that contains a hole extent
> [A, B) and a written extent [B, D) in cache.
>
> A B C D
> before da write: ...hhhhhh|wwwwww....
>
> Then we will get extent [A, B), we should trim map->m_len to B-A before
> inserting new delalloc blocks, if not, the range [B, C) is duplicated.

Thanks for explanation!

> > In either case the 'map'
> > handling looks a bit sloppy in ext4_da_map_blocks() as e.g. the
> > 'add_delayed' case doesn't seem to bother with properly setting 'map' based
> > on what it does. So maybe we should clean that up to always set 'map' just
> > before returning at the same place where we update the 'bh'? And maybe bh
> > update could be updated in some common helper because it's content is
> > determined by the 'map' content?
> >
>
> I agree with you, it looks that we should always revise the map->m_len
> once we found an extent from the cache, and then do corresponding handling
> according to the extent type. so it's hard to put it to a common place.
> But we can merge the handling of written and unwritten extent, I've moved
> the bh updating into ext4_da_get_block_prep() and do some cleanup in
> patch 9, please look at that patch, does it looks fine to you?

Oh, yes, what patch 9 does improve things significantly and it addresses my
concern. So feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Maybe in the changelog you can just mention that the remaining cases not
setting map->m_len will be handled in patch 9.

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-04-29 10:29:06

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v2 9/9] ext4: make ext4_da_map_blocks() buffer_head unaware

On Wed 10-04-24 11:42:03, Zhang Yi wrote:
> From: Zhang Yi <[email protected]>
>
> After calling ext4_da_map_blocks(), a delalloc extent state could be
> distinguished through EXT4_MAP_DELAYED flag in map. So factor out
> buffer_head related handles in ext4_da_map_blocks(), make it
> buffer_head unaware, make it become a common helper, it could be used
> for iomap in the future.
>
> Signed-off-by: Zhang Yi <[email protected]>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza


> - retval = es.es_len - (map->m_lblk - es.es_lblk);
> - if (retval > map->m_len)
> - retval = map->m_len;
> - map->m_len = retval;
> + map->m_len = min_t(unsigned int, map->m_len,
> + es.es_len - (map->m_lblk - es.es_lblk));
>
> if (ext4_es_is_hole(&es))
> goto add_delayed;
> @@ -1785,10 +1778,8 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
> * Delayed extent could be allocated by fallocate.
> * So we need to check it.
> */
> - if (ext4_es_is_delayed(&es) && !ext4_es_is_unwritten(&es)) {
> - map_bh(bh, inode->i_sb, invalid_block);
> - set_buffer_new(bh);
> - set_buffer_delay(bh);
> + if (ext4_es_is_delonly(&es)) {
> + map->m_flags |= EXT4_MAP_DELAYED;
> return 0;
> }
>
> @@ -1803,7 +1794,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
> #ifdef ES_AGGRESSIVE_TEST
> ext4_map_blocks_es_recheck(NULL, inode, map, &orig_map, 0);
> #endif
> - return retval;
> + return 0;
> }
>
> /*
> @@ -1817,7 +1808,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
> retval = ext4_map_query_blocks(NULL, inode, map);
> up_read(&EXT4_I(inode)->i_data_sem);
> if (retval)
> - return retval;
> + return retval < 0 ? retval : 0;
>
> add_delayed:
> down_write(&EXT4_I(inode)->i_data_sem);
> @@ -1827,10 +1818,8 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
> * whitout holding i_rwsem and folio lock.
> */
> if (ext4_es_lookup_extent(inode, map->m_lblk, NULL, &es)) {
> - retval = es.es_len - (map->m_lblk - es.es_lblk);
> - if (retval > map->m_len)
> - retval = map->m_len;
> - map->m_len = retval;
> + map->m_len = min_t(unsigned int, map->m_len,
> + es.es_len - (map->m_lblk - es.es_lblk));
>
> if (!ext4_es_is_hole(&es)) {
> up_write(&EXT4_I(inode)->i_data_sem);
> @@ -1840,18 +1829,14 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
> retval = ext4_map_query_blocks(NULL, inode, map);
> if (retval) {
> up_write(&EXT4_I(inode)->i_data_sem);
> - return retval;
> + return retval < 0 ? retval : 0;
> }
> }
>
> + map->m_flags |= EXT4_MAP_DELAYED;
> retval = ext4_insert_delayed_blocks(inode, map->m_lblk, map->m_len);
> up_write(&EXT4_I(inode)->i_data_sem);
> - if (retval)
> - return retval;
>
> - map_bh(bh, inode->i_sb, invalid_block);
> - set_buffer_new(bh);
> - set_buffer_delay(bh);
> return retval;
> }
>
> @@ -1871,11 +1856,15 @@ int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
> struct buffer_head *bh, int create)
> {
> struct ext4_map_blocks map;
> + sector_t invalid_block = ~((sector_t) 0xffff);
> int ret = 0;
>
> BUG_ON(create == 0);
> BUG_ON(bh->b_size != inode->i_sb->s_blocksize);
>
> + if (invalid_block < ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es))
> + invalid_block = ~0;
> +
> map.m_lblk = iblock;
> map.m_len = 1;
>
> @@ -1884,10 +1873,17 @@ int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
> * preallocated blocks are unmapped but should treated
> * the same as allocated blocks.
> */
> - ret = ext4_da_map_blocks(inode, &map, bh);
> - if (ret <= 0)
> + ret = ext4_da_map_blocks(inode, &map);
> + if (ret < 0)
> return ret;
>
> + if (map.m_flags & EXT4_MAP_DELAYED) {
> + map_bh(bh, inode->i_sb, invalid_block);
> + set_buffer_new(bh);
> + set_buffer_delay(bh);
> + return 0;
> + }
> +
> map_bh(bh, inode->i_sb, map.m_pblk);
> ext4_update_bh_state(bh, map.m_flags);
>
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-04-29 11:42:47

by Zhang Yi

[permalink] [raw]
Subject: Re: [PATCH v2 3/9] ext4: trim delalloc extent

On 2024/4/29 18:25, Jan Kara wrote:
> On Fri 26-04-24 17:38:23, Zhang Yi wrote:
>> On 2024/4/25 23:56, Jan Kara wrote:
>>> On Wed 10-04-24 11:41:57, Zhang Yi wrote:
>>>> From: Zhang Yi <[email protected]>
>>>>
>>>> The cached delalloc or hole extent should be trimed to the map->map_len
>>>> if we map delalloc blocks in ext4_da_map_blocks(). But it doesn't
>>>> trigger any issue now because the map->m_len is always set to one and we
>>>> always insert one delayed block once a time. Fix this by trim the extent
>>>> once we get one from the cached extent tree, prearing for mapping a
>>>> extent with multiple delalloc blocks.
>>>>
>>>> Signed-off-by: Zhang Yi <[email protected]>
>>>
>>> Well, but we already do the trimming in ext4_da_map_blocks(), don't we? You
>>> just move it to a different place... Or do you mean that we actually didn't
>>> set 'map' at all in some cases and now we do?
>>
>> Yeah, now we only trim map len if we found an unwritten extent or written
>> extent in the cache, this isn't okay if we found a hole and
>> ext4_insert_delayed_block() and ext4_da_map_blocks() support inserting
>> map->len blocks. If we found a hole which es->es_len is shorter than the
>> length we want to write, we could delay more blocks than we expected.
>>
>> Please assume we write data [A, C) to a file that contains a hole extent
>> [A, B) and a written extent [B, D) in cache.
>>
>> A B C D
>> before da write: ...hhhhhh|wwwwww....
>>
>> Then we will get extent [A, B), we should trim map->m_len to B-A before
>> inserting new delalloc blocks, if not, the range [B, C) is duplicated.
>
> Thanks for explanation!
>

Current change log is not clear enough now, I will put this explanation
into the changelog in my nextn iteratio, make it easier to understand.

>>> In either case the 'map'
>>> handling looks a bit sloppy in ext4_da_map_blocks() as e.g. the
>>> 'add_delayed' case doesn't seem to bother with properly setting 'map' based
>>> on what it does. So maybe we should clean that up to always set 'map' just
>>> before returning at the same place where we update the 'bh'? And maybe bh
>>> update could be updated in some common helper because it's content is
>>> determined by the 'map' content?
>>>
>>
>> I agree with you, it looks that we should always revise the map->m_len
>> once we found an extent from the cache, and then do corresponding handling
>> according to the extent type. so it's hard to put it to a common place.
>> But we can merge the handling of written and unwritten extent, I've moved
>> the bh updating into ext4_da_get_block_prep() and do some cleanup in
>> patch 9, please look at that patch, does it looks fine to you?
>
> Oh, yes, what patch 9 does improve things significantly and it addresses my
> concern. So feel free to add:
>
> Reviewed-by: Jan Kara <[email protected]>
>
> Maybe in the changelog you can just mention that the remaining cases not
> setting map->m_len will be handled in patch 9.
>

Sure.

Thanks,
Yi.


2024-04-29 12:37:24

by Zhang Yi

[permalink] [raw]
Subject: Re: [PATCH v2 5/9] ext4: make ext4_es_insert_delayed_block() insert multi-blocks

On 2024/4/29 17:16, Jan Kara wrote:
> On Wed 10-04-24 11:41:59, Zhang Yi wrote:
>> From: Zhang Yi <[email protected]>
>>
>> Rename ext4_es_insert_delayed_block() to ext4_es_insert_delayed_extent()
>> and pass length parameter to make it insert multi delalloc blocks once a
>> time. For the case of bigalloc, expand the allocated parameter to
>> lclu_allocated and end_allocated. lclu_allocated indicates the allocate
>> state of the cluster which containing the lblk, end_allocated represents
>> the end, and the middle clusters must be unallocated.
>>
>> Signed-off-by: Zhang Yi <[email protected]>
>
> Looks mostly good, just one comment below:
>
>> diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
>> index 4a00e2f019d9..2320b0d71001 100644
>> --- a/fs/ext4/extents_status.c
>> +++ b/fs/ext4/extents_status.c
>> @@ -2052,34 +2052,42 @@ bool ext4_is_pending(struct inode *inode, ext4_lblk_t lblk)
>> }
>>
>> /*
>> - * ext4_es_insert_delayed_block - adds a delayed block to the extents status
>> - * tree, adding a pending reservation where
>> - * needed
>> + * ext4_es_insert_delayed_extent - adds some delayed blocks to the extents
>> + * status tree, adding a pending reservation
>> + * where needed
>> *
>> * @inode - file containing the newly added block
>> - * @lblk - logical block to be added
>> - * @allocated - indicates whether a physical cluster has been allocated for
>> - * the logical cluster that contains the block
>> + * @lblk - start logical block to be added
>> + * @len - length of blocks to be added
>> + * @lclu_allocated/end_allocated - indicates whether a physical cluster has
>> + * been allocated for the logical cluster
>> + * that contains the block
> ^^^^ "start / end
> block" to make it clearer...
>
>> -void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
>> - bool allocated)
>> +void ext4_es_insert_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
>> + ext4_lblk_t len, bool lclu_allocated,
>> + bool end_allocated)
>> {
>> struct extent_status newes;
>> + ext4_lblk_t end = lblk + len - 1;
>> int err1 = 0, err2 = 0, err3 = 0;
>> struct extent_status *es1 = NULL;
>> struct extent_status *es2 = NULL;
>> - struct pending_reservation *pr = NULL;
>> + struct pending_reservation *pr1 = NULL;
>> + struct pending_reservation *pr2 = NULL;
>>
>> if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
>> return;
>>
>> - es_debug("add [%u/1) delayed to extent status tree of inode %lu\n",
>> - lblk, inode->i_ino);
>> + es_debug("add [%u/%u) delayed to extent status tree of inode %lu\n",
>> + lblk, len, inode->i_ino);
>> + if (!len)
>> + return;
>>
>> newes.es_lblk = lblk;
>> - newes.es_len = 1;
>> + newes.es_len = len;
>> ext4_es_store_pblock_status(&newes, ~0, EXTENT_STATUS_DELAYED);
>> - trace_ext4_es_insert_delayed_block(inode, &newes, allocated);
>> + trace_ext4_es_insert_delayed_extent(inode, &newes, lclu_allocated,
>> + end_allocated);
>>
>> ext4_es_insert_extent_check(inode, &newes);
>>
>> @@ -2088,11 +2096,15 @@ void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
>> es1 = __es_alloc_extent(true);
>> if ((err1 || err2) && !es2)
>> es2 = __es_alloc_extent(true);
>> - if ((err1 || err2 || err3) && allocated && !pr)
>> - pr = __alloc_pending(true);
>> + if (err1 || err2 || err3) {
>> + if (lclu_allocated && !pr1)
>> + pr1 = __alloc_pending(true);
>> + if (end_allocated && !pr2)
>> + pr2 = __alloc_pending(true);
>> + }
>> write_lock(&EXT4_I(inode)->i_es_lock);
>>
>> - err1 = __es_remove_extent(inode, lblk, lblk, NULL, es1);
>> + err1 = __es_remove_extent(inode, lblk, end, NULL, es1);
>> if (err1 != 0)
>> goto error;
>> /* Free preallocated extent if it didn't get used. */
>> @@ -2112,13 +2124,22 @@ void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
>> es2 = NULL;
>> }
>>
>> - if (allocated) {
>> - err3 = __insert_pending(inode, lblk, &pr);
>> + if (lclu_allocated) {
>> + err3 = __insert_pending(inode, lblk, &pr1);
>> if (err3 != 0)
>> goto error;
>> - if (pr) {
>> - __free_pending(pr);
>> - pr = NULL;
>> + if (pr1) {
>> + __free_pending(pr1);
>> + pr1 = NULL;
>> + }
>> + }
>> + if (end_allocated) {
>
> So there's one unclear thing here: What if 'lblk' and 'end' are in the same
> cluster? We don't want two pending reservation structures for the cluster.
> __insert_pending() already handles this gracefully but perhaps we don't
> need to allocate 'pr2' in that case and call __insert_pending() at all? I
> think it could be easily handled by something like:
>
> if (EXT4_B2C(lblk) == EXT4_B2C(end))
> end_allocated = false;
>
> at appropriate place in ext4_es_insert_delayed_extent().
>

I've done the check "EXT4_B2C(lblk) == EXT4_B2C(end)" in the caller
ext4_insert_delayed_blocks() in patch 8, becasue there is no need to check
the allocation state if they are in the same cluster, so it could make sure
that end_allocated is always false when 'lblk' and 'end' are in the same
cluster. So I suppose check and set it here again maybe redundant, how about
add a wanging here in ext4_es_insert_delayed_extent(), like:

WARN_ON_ONCE((EXT4_B2C(sbi, lblk) == EXT4_B2C(sbi, end)) &&
end_allocated);

and modify the 'lclu_allocated/end_allocated' parameter comments, note that
end_allocated should always be set to false if the extent is in one cluster.
Is it fine?

Thanks,
Yi.


2024-04-29 12:56:25

by Zhang Yi

[permalink] [raw]
Subject: Re: [PATCH v2 8/9] ext4: make ext4_insert_delayed_block() insert multi-blocks

On 2024/4/29 18:06, Jan Kara wrote:
> On Wed 10-04-24 11:42:02, Zhang Yi wrote:
>> From: Zhang Yi <[email protected]>
>>
>> Rename ext4_insert_delayed_block() to ext4_insert_delayed_blocks(),
>> pass length parameter to make it insert multi delalloc blocks once a
>> time. For non-bigalloc case, just reserve len blocks and insert delalloc
>> extent. For bigalloc case, we can ensure the middle clusters are not
>> allocated, but need to check whether the start and end clusters are
>> delayed/allocated, if not, we should reserve more space for the start
>> and/or end block(s).
>>
>> Signed-off-by: Zhang Yi <[email protected]>
>
> Thanks for the patch. Some comments below.
>
>> ---
>> fs/ext4/inode.c | 51 ++++++++++++++++++++++++++++++++++---------------
>> 1 file changed, 36 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 46c34baa848a..08e2692b7286 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -1678,24 +1678,28 @@ static int ext4_da_check_clu_allocated(struct inode *inode, ext4_lblk_t lblk,
>> }
>>
>> /*
>> - * ext4_insert_delayed_block - adds a delayed block to the extents status
>> - * tree, incrementing the reserved cluster/block
>> - * count or making a pending reservation
>> - * where needed
>> + * ext4_insert_delayed_blocks - adds a multiple delayed blocks to the extents
>> + * status tree, incrementing the reserved
>> + * cluster/block count or making pending
>> + * reservations where needed
>> *
>> * @inode - file containing the newly added block
>> - * @lblk - logical block to be added
>> + * @lblk - start logical block to be added
>> + * @len - length of blocks to be added
>> *
>> * Returns 0 on success, negative error code on failure.
>> */
>> -static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
>> +static int ext4_insert_delayed_blocks(struct inode *inode, ext4_lblk_t lblk,
>> + ext4_lblk_t len)
>> {
>> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>> - int ret;
>> - bool allocated = false;
>> + int resv_clu, ret;
> ^^^ this variable is in prinple the length of the extent. Thus
> it should be ext4_lblk_t type.
>
>> + bool lclu_allocated = false;
>> + bool end_allocated = false;
>> + ext4_lblk_t end = lblk + len - 1;
>>
>> /*
>> - * If the cluster containing lblk is shared with a delayed,
>> + * If the cluster containing lblk or end is shared with a delayed,
>> * written, or unwritten extent in a bigalloc file system, it's
>> * already been accounted for and does not need to be reserved.
>> * A pending reservation must be made for the cluster if it's
>> @@ -1706,21 +1710,38 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
>> * extents status tree doesn't get a match.
>> */
>> if (sbi->s_cluster_ratio == 1) {
>> - ret = ext4_da_reserve_space(inode, 1);
>> + ret = ext4_da_reserve_space(inode, len);
>> if (ret != 0) /* ENOSPC */
>> return ret;
>> } else { /* bigalloc */
>> - ret = ext4_da_check_clu_allocated(inode, lblk, &allocated);
>> + resv_clu = EXT4_B2C(sbi, end) - EXT4_B2C(sbi, lblk) - 1;
>> + if (resv_clu < 0)
>> + resv_clu = 0;
>
> Here resv_clu going negative is strange I'm not sure the math is 100%
> correct in all the cases. I think it would be more logical as:
>
> resv_clu = EXT4_B2C(sbi, end) - EXT4_B2C(sbi, lblk) + 1;
>> and then update resv_clu below as:
>
>> +
>> + ret = ext4_da_check_clu_allocated(inode, lblk, &lclu_allocated);
>> if (ret < 0)
>> return ret;
>> - if (ret > 0) {
>> - ret = ext4_da_reserve_space(inode, 1);
>> + if (ret > 0)
>> + resv_clu++;
>
> Here we would do:
> if (ret == 0)
> resv_clu--;
>
>> +
>> + if (EXT4_B2C(sbi, lblk) != EXT4_B2C(sbi, end)) {
>> + ret = ext4_da_check_clu_allocated(inode, end,
>> + &end_allocated);
>> + if (ret < 0)
>> + return ret;
>> + if (ret > 0)
>> + resv_clu++;
>
> And similarly here:
> if (ret == 0)
> resv_clu--;

Oh, yes, it is definitely more logical than mine. Thanks for taking time
to review this series, other changelog and comments suggestions in this
series are looks fine to me, I will use them. Besides, Ritesh improved
my changelog in patch 2, I will keep your reviewed tag if you don't have
different opinions.

Thanks,
Yi.

>
>> + }
>> +
>> + if (resv_clu) {
>> + ret = ext4_da_reserve_space(inode, resv_clu);
>> if (ret != 0) /* ENOSPC */
>> return ret;
>> }
>> }
>>
>> - ext4_es_insert_delayed_extent(inode, lblk, 1, allocated, false);
>> + ext4_es_insert_delayed_extent(inode, lblk, len, lclu_allocated,
>> + end_allocated);
>> return 0;
>> }
>>
>> @@ -1823,7 +1844,7 @@ static int ext4_da_map_blocks(struct inode *inode, struct ext4_map_blocks *map,
>> }
>> }
>>
>> - retval = ext4_insert_delayed_block(inode, map->m_lblk);
>> + retval = ext4_insert_delayed_blocks(inode, map->m_lblk, map->m_len);
>> up_write(&EXT4_I(inode)->i_data_sem);
>> if (retval)
>> return retval;
>> --
>> 2.39.2
>>


2024-04-29 16:27:40

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v2 5/9] ext4: make ext4_es_insert_delayed_block() insert multi-blocks

On Mon 29-04-24 20:09:46, Zhang Yi wrote:
> On 2024/4/29 17:16, Jan Kara wrote:
> > On Wed 10-04-24 11:41:59, Zhang Yi wrote:
> >> From: Zhang Yi <[email protected]>
> >>
> >> Rename ext4_es_insert_delayed_block() to ext4_es_insert_delayed_extent()
> >> and pass length parameter to make it insert multi delalloc blocks once a
> >> time. For the case of bigalloc, expand the allocated parameter to
> >> lclu_allocated and end_allocated. lclu_allocated indicates the allocate
> >> state of the cluster which containing the lblk, end_allocated represents
> >> the end, and the middle clusters must be unallocated.
> >>
> >> Signed-off-by: Zhang Yi <[email protected]>
...
> >> @@ -2112,13 +2124,22 @@ void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
> >> es2 = NULL;
> >> }
> >>
> >> - if (allocated) {
> >> - err3 = __insert_pending(inode, lblk, &pr);
> >> + if (lclu_allocated) {
> >> + err3 = __insert_pending(inode, lblk, &pr1);
> >> if (err3 != 0)
> >> goto error;
> >> - if (pr) {
> >> - __free_pending(pr);
> >> - pr = NULL;
> >> + if (pr1) {
> >> + __free_pending(pr1);
> >> + pr1 = NULL;
> >> + }
> >> + }
> >> + if (end_allocated) {
> >
> > So there's one unclear thing here: What if 'lblk' and 'end' are in the same
> > cluster? We don't want two pending reservation structures for the cluster.
> > __insert_pending() already handles this gracefully but perhaps we don't
> > need to allocate 'pr2' in that case and call __insert_pending() at all? I
> > think it could be easily handled by something like:
> >
> > if (EXT4_B2C(lblk) == EXT4_B2C(end))
> > end_allocated = false;
> >
> > at appropriate place in ext4_es_insert_delayed_extent().
> >
>
> I've done the check "EXT4_B2C(lblk) == EXT4_B2C(end)" in the caller
> ext4_insert_delayed_blocks() in patch 8, becasue there is no need to check
> the allocation state if they are in the same cluster, so it could make sure
> that end_allocated is always false when 'lblk' and 'end' are in the same
> cluster. So I suppose check and set it here again maybe redundant, how about
> add a wanging here in ext4_es_insert_delayed_extent(), like:
>
> WARN_ON_ONCE((EXT4_B2C(sbi, lblk) == EXT4_B2C(sbi, end)) &&
> end_allocated);
>
> and modify the 'lclu_allocated/end_allocated' parameter comments, note that
> end_allocated should always be set to false if the extent is in one cluster.
> Is it fine?

Yes, that is a good solution as well!

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR