Hi, this series contain some random cleanup patches and some bugfix
patches to make EXT4_MB_HINT_GOAL_ONLY work properly, protect pa->pa_free
from race and so on. More details can be found in git log.
Thanks!
---
V2:
-Add signed-off from Ritesh and Ojaswin to patch 3/20 "ext4: get correct
ext4_group_info in ext4_mb_prefetch_fini" as this is a duplicate of
a patch under reviewing.
-Split out original patch "ext4: avoid to use preallocated blocks if
EXT4_MB_HINT_GOAL_ONLY is set" which will be resend after improved.
-Improve log information of patch 20.
-Collect Reviewed-by from Ojaswin and Ritesh. Now only patch 3, 12 and
20 need futher review.
---
Kemeng Shi (20):
ext4: set goal start correctly in ext4_mb_normalize_request
ext4: allow to find by goal if EXT4_MB_HINT_GOAL_ONLY is set
ext4: get correct ext4_group_info in ext4_mb_prefetch_fini
ext4: correct calculation of s_mb_preallocated
ext4: correct start of used group pa for debug in ext4_mb_use_group_pa
ext4: protect pa->pa_free in ext4_discard_allocated_blocks
ext4: add missed brelse in ext4_free_blocks_simple
ext4: remove unused return value of ext4_mb_try_best_found and
ext4_mb_free_metadata
ext4: Remove unnecessary release when memory allocation failed in
ext4_mb_init_cache
ext4: remove unnecessary e4b->bd_buddy_page check in
ext4_mb_load_buddy_gfp
ext4: remove unnecessary check in ext4_mb_new_blocks
ext4: remove dead check in mb_buddy_mark_free
ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in
ext4_mb_check_limits
ext4: use best found when complex scan of group finishs
ext4: remove unnecessary exit_meta_group_info tag
ext4: remove unnecessary count2 in ext4_free_data_in_buddy
ext4: remove unnecessary goto in ext4_mb_mark_diskspace_used
ext4: remove repeat assignment to ac_f_ex
ext4: remove comment code ext4_discard_preallocations
ext4: simplify calculation of blkoff in ext4_mb_new_blocks_simple
fs/ext4/mballoc.c | 104 +++++++++++++++++-----------------------------
1 file changed, 38 insertions(+), 66 deletions(-)
--
2.30.0
We need to set ac_g_ex to notify the goal start used in
ext4_mb_find_by_goal. Set ac_g_ex instead of ac_f_ex in
ext4_mb_normalize_request.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5b2ae37a8b80..0650a1dc870e 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4191,15 +4191,15 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
if (ar->pright && (ar->lright == (start + size))) {
/* merge to the right */
ext4_get_group_no_and_offset(ac->ac_sb, ar->pright - size,
- &ac->ac_f_ex.fe_group,
- &ac->ac_f_ex.fe_start);
+ &ac->ac_g_ex.fe_group,
+ &ac->ac_g_ex.fe_start);
ac->ac_flags |= EXT4_MB_HINT_TRY_GOAL;
}
if (ar->pleft && (ar->lleft + 1 == start)) {
/* merge to the left */
ext4_get_group_no_and_offset(ac->ac_sb, ar->pleft + 1,
- &ac->ac_f_ex.fe_group,
- &ac->ac_f_ex.fe_start);
+ &ac->ac_g_ex.fe_group,
+ &ac->ac_g_ex.fe_start);
ac->ac_flags |= EXT4_MB_HINT_TRY_GOAL;
}
--
2.30.0
If EXT4_MB_HINT_GOAL_ONLY is set, ext4_mb_regular_allocator will only
allocate blocks from ext4_mb_find_by_goal. Allow to find by goal in
ext4_mb_find_by_goal if EXT4_MB_HINT_GOAL_ONLY is set or allocation
with EXT4_MB_HINT_GOAL_ONLY set will always fail.
EXT4_MB_HINT_GOAL_ONLY is not used at all, so the problem is not
found for now.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 0650a1dc870e..375d9655b525 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2162,7 +2162,7 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
struct ext4_group_info *grp = ext4_get_group_info(ac->ac_sb, group);
struct ext4_free_extent ex;
- if (!(ac->ac_flags & EXT4_MB_HINT_TRY_GOAL))
+ if (!(ac->ac_flags & (EXT4_MB_HINT_TRY_GOAL | EXT4_MB_HINT_GOAL_ONLY)))
return 0;
if (grp->bb_free == 0)
return 0;
--
2.30.0
We will add pa_free to s_mb_preallocated when new ext4_prealloc_space is
created. In ext4_mb_new_inode_pa, we will call ext4_mb_use_inode_pa
before adding pa_free to s_mb_preallocated. However, ext4_mb_use_inode_pa
will consume pa_free for block allocation which triggerred the creation
of ext4_prealloc_space. Add pa_free to s_mb_preallocated before
ext4_mb_use_inode_pa to correct calculation of s_mb_preallocated.
There is no such problem in ext4_mb_new_group_pa as pa_free of group pa
is consumed in ext4_mb_release_context instead of ext4_mb_use_group_pa.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index b94c9f331a3c..90f765a350ea 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4667,8 +4667,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
pa->pa_len, pa->pa_lstart);
trace_ext4_mb_new_inode_pa(ac, pa);
- ext4_mb_use_inode_pa(ac, pa);
atomic_add(pa->pa_free, &sbi->s_mb_preallocated);
+ ext4_mb_use_inode_pa(ac, pa);
ei = EXT4_I(ac->ac_inode);
grp = ext4_get_group_info(sb, ac->ac_b_ex.fe_group);
--
2.30.0
As we don't correct pa_lstart here, so there is no need to subtract
pa_lstart with consumed len.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 90f765a350ea..31cb6ac1bc47 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4319,7 +4319,7 @@ static void ext4_mb_use_group_pa(struct ext4_allocation_context *ac,
* Other CPUs are prevented from allocating from this pa by lg_mutex
*/
mb_debug(ac->ac_sb, "use %u/%u from group pa %p\n",
- pa->pa_lstart-len, len, pa);
+ pa->pa_lstart, len, pa);
}
/*
--
2.30.0
If ext4_mb_mark_diskspace_used fails in ext4_mb_new_blocks, we may
discard pa already in list. Protect pa with pa_lock to avoid race.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 31cb6ac1bc47..aec4a7b7af20 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4263,8 +4263,11 @@ static void ext4_discard_allocated_blocks(struct ext4_allocation_context *ac)
ext4_mb_unload_buddy(&e4b);
return;
}
- if (pa->pa_type == MB_INODE_PA)
+ if (pa->pa_type == MB_INODE_PA) {
+ spin_lock(&pa->pa_lock);
pa->pa_free += ac->ac_b_ex.fe_len;
+ spin_unlock(&pa->pa_lock);
+ }
}
/*
--
2.30.0
We always get ext4_group_desc with group + 1 and ext4_group_info with
group to check if we need do initialize ext4_group_info for the group.
Just get ext4_group_desc with group for ext4_group_info initialization
check.
Signed-off-by: Ritesh Harjani (IBM) <[email protected]>
Signed-off-by: Ojaswin Mujoo <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
---
fs/ext4/mballoc.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 375d9655b525..b94c9f331a3c 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2569,14 +2569,14 @@ ext4_group_t ext4_mb_prefetch(struct super_block *sb, ext4_group_t group,
void ext4_mb_prefetch_fini(struct super_block *sb, ext4_group_t group,
unsigned int nr)
{
- while (nr-- > 0) {
- struct ext4_group_desc *gdp = ext4_get_group_desc(sb, group,
- NULL);
- struct ext4_group_info *grp = ext4_get_group_info(sb, group);
+ struct ext4_group_desc *gdp;
+ struct ext4_group_info *grp;
+ while (nr-- > 0) {
if (!group)
group = ext4_get_groups_count(sb);
group--;
+ gdp = ext4_get_group_desc(sb, group, NULL);
grp = ext4_get_group_info(sb, group);
if (EXT4_MB_GRP_NEED_INIT(grp) &&
--
2.30.0
Return value static function ext4_mb_try_best_found and
ext4_mb_free_metadata is not used. Just remove unused return value.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c26421170406..0a34211bb507 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2124,7 +2124,7 @@ static void ext4_mb_measure_extent(struct ext4_allocation_context *ac,
}
static noinline_for_stack
-int ext4_mb_try_best_found(struct ext4_allocation_context *ac,
+void ext4_mb_try_best_found(struct ext4_allocation_context *ac,
struct ext4_buddy *e4b)
{
struct ext4_free_extent ex = ac->ac_b_ex;
@@ -2135,7 +2135,7 @@ int ext4_mb_try_best_found(struct ext4_allocation_context *ac,
BUG_ON(ex.fe_len <= 0);
err = ext4_mb_load_buddy(ac->ac_sb, group, e4b);
if (err)
- return err;
+ return;
ext4_lock_group(ac->ac_sb, group);
max = mb_find_extent(e4b, ex.fe_start, ex.fe_len, &ex);
@@ -2147,8 +2147,6 @@ int ext4_mb_try_best_found(struct ext4_allocation_context *ac,
ext4_unlock_group(ac->ac_sb, group);
ext4_mb_unload_buddy(e4b);
-
- return 0;
}
static noinline_for_stack
@@ -5696,7 +5694,7 @@ static void ext4_try_merge_freed_extent(struct ext4_sb_info *sbi,
kmem_cache_free(ext4_free_data_cachep, entry);
}
-static noinline_for_stack int
+static noinline_for_stack void
ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
struct ext4_free_data *new_entry)
{
@@ -5739,7 +5737,7 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
EXT4_C2B(sbi, cluster),
"Block already on to-be-freed list");
kmem_cache_free(ext4_free_data_cachep, new_entry);
- return 0;
+ return;
}
}
@@ -5765,7 +5763,6 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
list_add_tail(&new_entry->efd_list, &sbi->s_freed_data_list);
sbi->s_mb_free_pending += clusters;
spin_unlock(&sbi->s_md_lock);
- return 0;
}
/*
--
2.30.0
Release bitmap buffer_head we got if error occurs.
Besides, this patch remove unused assignment to err.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index aec4a7b7af20..c26421170406 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5845,13 +5845,12 @@ static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
bitmap_bh = ext4_read_block_bitmap(sb, group);
if (IS_ERR(bitmap_bh)) {
- err = PTR_ERR(bitmap_bh);
pr_warn("Failed to read block bitmap\n");
return;
}
gdp = ext4_get_group_desc(sb, group, &gdp_bh);
if (!gdp)
- return;
+ goto err_out;
for (i = 0; i < count; i++) {
if (!mb_test_bit(blkoff + i, bitmap_bh->b_data))
@@ -5860,7 +5859,7 @@ static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
mb_clear_bits(bitmap_bh->b_data, blkoff, count);
err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
if (err)
- return;
+ goto err_out;
ext4_free_group_clusters_set(
sb, gdp, ext4_free_group_clusters(sb, gdp) +
count - already_freed);
@@ -5869,6 +5868,8 @@ static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
sync_dirty_buffer(bitmap_bh);
sync_dirty_buffer(gdp_bh);
+
+err_out:
brelse(bitmap_bh);
}
--
2.30.0
If we alloc array of buffer_head failed, there is no resource need to be
freed and we can simpily return error.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 0a34211bb507..64a889b357d2 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1168,10 +1168,8 @@ static int ext4_mb_init_cache(struct page *page, char *incore, gfp_t gfp)
if (groups_per_page > 1) {
i = sizeof(struct buffer_head *) * groups_per_page;
bh = kzalloc(i, gfp);
- if (bh == NULL) {
- err = -ENOMEM;
- goto out;
- }
+ if (bh == NULL)
+ return -ENOMEM;
} else
bh = &bhs;
--
2.30.0
e4b->bd_buddy_page is only set if we initialize ext4_buddy successfully. So
e4b->bd_buddy_page is always NULL in error handle branch. Just remove the
dead check.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 64a889b357d2..30d5a0882ebc 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1555,8 +1555,7 @@ ext4_mb_load_buddy_gfp(struct super_block *sb, ext4_group_t group,
put_page(page);
if (e4b->bd_bitmap_page)
put_page(e4b->bd_bitmap_page);
- if (e4b->bd_buddy_page)
- put_page(e4b->bd_buddy_page);
+
e4b->bd_buddy = NULL;
e4b->bd_bitmap = NULL;
return ret;
--
2.30.0
1. remove unnecessary ac check:
We always go to out tag before ac is successfully allocated, then we can
move out tag after free of ac and remove NULL check of ac.
2. remove unnecessary *errp check:
We always go to errout tag if *errp is non-zero, then we can move errout
tag into error handle if *errp is non-zero.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 30d5a0882ebc..fea835d8d241 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5638,16 +5638,15 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
*errp = -ENOSPC;
}
-errout:
if (*errp) {
+errout:
ac->ac_b_ex.fe_len = 0;
ar->len = 0;
ext4_mb_show_ac(ac);
}
ext4_mb_release_context(ac);
+ kmem_cache_free(ext4_ac_cachep, ac);
out:
- if (ac)
- kmem_cache_free(ext4_ac_cachep, ac);
if (inquota && ar->len < inquota)
dquot_free_block(ar->inode, EXT4_C2B(sbi, inquota - ar->len));
if (!ar->len) {
--
2.30.0
We always adjust first to even number and adjust last to odd number, so
first == last will never happen. Remove this dead check.
Signed-off-by: Kemeng Shi <[email protected]>
---
fs/ext4/mballoc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index fea835d8d241..f40af7430fca 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1718,7 +1718,8 @@ static void mb_buddy_mark_free(struct ext4_buddy *e4b, int first, int last)
break;
order++;
- if (first == last || !(buddy2 = mb_find_buddy(e4b, order, &max))) {
+ buddy2 = mb_find_buddy(e4b, order, &max);
+ if (!buddy2) {
mb_clear_bits(buddy, first, last - first + 1);
e4b->bd_info->bb_counters[order - 1] += last - first + 1;
break;
--
2.30.0
Only call trace of ext4_mb_check_limits is as following:
ext4_mb_complex_scan_group
ext4_mb_measure_extent
ext4_mb_check_limits(ac, e4b, 0);
ext4_mb_check_limits(ac, e4b, 1);
If the first ac->ac_found > sbi->s_mb_max_to_scan check in
ext4_mb_check_limits is met, we will set ac_status to
AC_STATUS_BREAK and call ext4_mb_try_best_found to try to use
ac->ac_b_ex.
If ext4_mb_try_best_found successes, then block allocation finishs,
the removed ac->ac_found > sbi->s_mb_min_to_scan check is not reachable.
If ext4_mb_try_best_found fails, then we set EXT4_MB_HINT_FIRST and
reset ac->ac_b_ex to retry block allocation. We will use any found
free extent in ext4_mb_measure_extent before reach the removed
ac->ac_found > sbi->s_mb_min_to_scan check.
In summary, the removed ac->ac_found > sbi->s_mb_min_to_scan check is
not reachable and we can remove that dead check.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f40af7430fca..07a840e8543c 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2039,8 +2039,7 @@ static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
if (bex->fe_len < gex->fe_len)
return;
- if ((finish_group || ac->ac_found > sbi->s_mb_min_to_scan)
- && bex->fe_group == e4b->bd_group) {
+ if (finish_group && bex->fe_group == e4b->bd_group) {
/* recheck chunk's availability - we don't know
* when it was found (within this lock-unlock
* period or not) */
--
2.30.0
We goto exit_meta_group_info only to return -ENOMEM. Return -ENOMEM
directly instead of goto to remove this unnecessary tag.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c637d0a89e2e..1175c81f8e2d 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3069,7 +3069,7 @@ int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
if (meta_group_info == NULL) {
ext4_msg(sb, KERN_ERR, "can't allocate mem "
"for a buddy group");
- goto exit_meta_group_info;
+ return -ENOMEM;
}
rcu_read_lock();
rcu_dereference(sbi->s_group_info)[idx] = meta_group_info;
@@ -3123,7 +3123,6 @@ int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
group_info[idx] = NULL;
rcu_read_unlock();
}
-exit_meta_group_info:
return -ENOMEM;
} /* ext4_mb_add_groupinfo */
--
2.30.0
If any bex which meets bex->fe_len >= gex->fe_len is found, then it will
always be used when complex scan of group that bex belongs to finishs.
So there will not be any lock-unlock period.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 07a840e8543c..c637d0a89e2e 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2019,8 +2019,6 @@ static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
struct ext4_free_extent *bex = &ac->ac_b_ex;
struct ext4_free_extent *gex = &ac->ac_g_ex;
- struct ext4_free_extent ex;
- int max;
if (ac->ac_status == AC_STATUS_FOUND)
return;
@@ -2039,16 +2037,8 @@ static void ext4_mb_check_limits(struct ext4_allocation_context *ac,
if (bex->fe_len < gex->fe_len)
return;
- if (finish_group && bex->fe_group == e4b->bd_group) {
- /* recheck chunk's availability - we don't know
- * when it was found (within this lock-unlock
- * period or not) */
- max = mb_find_extent(e4b, bex->fe_start, gex->fe_len, &ex);
- if (max >= gex->fe_len) {
- ext4_mb_use_best_found(ac, e4b);
- return;
- }
- }
+ if (finish_group)
+ ext4_mb_use_best_found(ac, e4b);
}
/*
--
2.30.0
When ext4_read_block_bitmap fails, we can return PTR_ERR(bitmap_bh) to
remove unnecessary NULL check of bitmap_bh.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
---
fs/ext4/mballoc.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 977f89856aed..f7fa081bece9 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3739,9 +3739,7 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
bitmap_bh = ext4_read_block_bitmap(sb, ac->ac_b_ex.fe_group);
if (IS_ERR(bitmap_bh)) {
- err = PTR_ERR(bitmap_bh);
- bitmap_bh = NULL;
- goto out_err;
+ return PTR_ERR(bitmap_bh);
}
BUFFER_TRACE(bitmap_bh, "getting write access");
--
2.30.0
We calculate blkoff for groups as following:
ext4_get_group_no_and_offset(sb,
max(ext4_group_first_block_no(sb, group), goal),
NULL, &blkoff);
For first group, the blkoff is already calculated before loop, so this
is redundant.
For groups after first group which contains goal, the result of
max(...) above is always ext4_group_first_block_no(sb, group). The
blkoff of first block in group is always 0, so blkoff of groups after
the first group which contains goal is always 0.
So we can clean blkoff calculation as following:
1. Remove blkoff calculation above to remove repeat calculation of
first group.
2. Set blkoff to 0 to set blkoff for groups after first group.
Signed-off-by: Kemeng Shi <[email protected]>
---
fs/ext4/mballoc.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f65894bfaff2..2cc655de1853 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5769,9 +5769,6 @@ static ext4_fsblk_t ext4_mb_new_blocks_simple(handle_t *handle,
return 0;
}
- ext4_get_group_no_and_offset(sb,
- max(ext4_group_first_block_no(sb, group), goal),
- NULL, &blkoff);
while (1) {
i = mb_find_next_zero_bit(bitmap_bh->b_data, max,
blkoff);
@@ -5786,6 +5783,8 @@ static ext4_fsblk_t ext4_mb_new_blocks_simple(handle_t *handle,
brelse(bitmap_bh);
if (i < max)
break;
+
+ blkoff = 0;
}
if (group >= ext4_get_groups_count(sb) || i >= max) {
--
2.30.0
Call trace to assign ac_f_ex:
ext4_mb_use_best_found
ac->ac_f_ex = ac->ac_b_ex;
ext4_mb_new_preallocation
ext4_mb_new_group_pa
ac->ac_f_ex = ac->ac_b_ex;
ext4_mb_new_inode_pa
ac->ac_f_ex = ac->ac_b_ex;
Actually allocated blocks is already stored in ac_f_ex in
ext4_mb_use_best_found, so there is no need to assign ac_f_ex
in ext4_mb_new_group_pa and ext4_mb_new_inode_pa.
Just remove repeat assignment to ac_f_ex in ext4_mb_new_group_pa
and ext4_mb_new_inode_pa.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f7fa081bece9..8c5ad44f71f4 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4632,10 +4632,6 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
}
- /* preallocation can change ac_b_ex, thus we store actually
- * allocated blocks for history */
- ac->ac_f_ex = ac->ac_b_ex;
-
pa->pa_lstart = ac->ac_b_ex.fe_logical;
pa->pa_pstart = ext4_grp_offs_to_block(sb, &ac->ac_b_ex);
pa->pa_len = ac->ac_b_ex.fe_len;
@@ -4686,10 +4682,6 @@ ext4_mb_new_group_pa(struct ext4_allocation_context *ac)
pa = ac->ac_pa;
- /* preallocation can change ac_b_ex, thus we store actually
- * allocated blocks for history */
- ac->ac_f_ex = ac->ac_b_ex;
-
pa->pa_pstart = ext4_grp_offs_to_block(sb, &ac->ac_b_ex);
pa->pa_lstart = pa->pa_pstart;
pa->pa_len = ac->ac_b_ex.fe_len;
--
2.30.0
Just remove comment code in ext4_discard_preallocations.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 8c5ad44f71f4..f65894bfaff2 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4922,7 +4922,6 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed)
int err;
if (!S_ISREG(inode->i_mode)) {
- /*BUG_ON(!list_empty(&ei->i_prealloc_list));*/
return;
}
--
2.30.0
count2 is always 1 in mb_debug, just remove unnecessary count2.
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 1175c81f8e2d..977f89856aed 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3590,7 +3590,7 @@ static void ext4_free_data_in_buddy(struct super_block *sb,
{
struct ext4_buddy e4b;
struct ext4_group_info *db;
- int err, count = 0, count2 = 0;
+ int err, count = 0;
mb_debug(sb, "gonna free %u blocks in group %u (0x%p):",
entry->efd_count, entry->efd_group, entry);
@@ -3606,7 +3606,6 @@ static void ext4_free_data_in_buddy(struct super_block *sb,
db = e4b.bd_info;
/* there are blocks to put in buddy to make them really free */
count += entry->efd_count;
- count2++;
ext4_lock_group(sb, entry->efd_group);
/* Take it out of per group rb tree */
rb_erase(&entry->efd_node, &(db->bb_free_root));
@@ -3631,8 +3630,7 @@ static void ext4_free_data_in_buddy(struct super_block *sb,
ext4_unlock_group(sb, entry->efd_group);
ext4_mb_unload_buddy(&e4b);
- mb_debug(sb, "freed %d blocks in %d structures\n", count,
- count2);
+ mb_debug(sb, "freed %d blocks in 1 structures\n", count);
}
/*
--
2.30.0
Kemeng Shi <[email protected]> writes:
> Hi, this series contain some random cleanup patches and some bugfix
> patches to make EXT4_MB_HINT_GOAL_ONLY work properly, protect pa->pa_free
> from race and so on. More details can be found in git log.
> Thanks!
Hi Kemeng,
Did you run any testing on these patches? Because I was very easily able
to hit ext/009 causing kernel BUG_ON with default mkfs/mount options.
It's always a good and recommended practice to ensure some level of
testing on any of the patches we submit to community for review
and call out in the cover letter on what has been tested and what is not.
<Call stack>
[ 208.545365] run fstests ext4/009 at 2023-02-28 21:44:06
[ 216.581660] EXT4-fs (loop7): mounted filesystem 33805b33-04c1-48c3-8de3-9c78f99a7598 with ordered data mode..
[ 216.709050] EXT4-fs (loop7): unmounting filesystem 33805b33-04c1-48c3-8de3-9c78f99a7598.
[ 218.878042] EXT4-fs (loop7): mounted filesystem 8a919af6-f8f4-4ef4-949b-673ccd9ae8c7 with ordered data mode..
[ 255.517357] ------------[ cut here ]------------
[ 255.520274] kernel BUG at fs/ext4/ext4.h:3331!
[ 255.522233] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
[ 255.524740] CPU: 7 PID: 2567 Comm: xfs_io Not tainted 6.2.0-rc8-xfstests-00041-gb1b4634ed055 #21
[ 255.527807] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.4
[ 255.531645] RIP: 0010:ext4_get_group_info+0x272/0x2f0
[ 255.533682] Code: 0f 85 a9 fe ff ff 48 c7 c2 c0 9b 7d 84 be fd 02 00 00 48 c7 c7 c0 95 7d 84 c6 05 9e b4 3b 8
[ 255.539798] RSP: 0018:ffff8881fcd6f6b0 EFLAGS: 00010246
[ 255.541721] RAX: 0000000000000000 RBX: ffff8881bfc54000 RCX: ffffffff81ec3d1a
[ 255.544181] RDX: 1ffff11040b8a208 RSI: 0000000000000050 RDI: ffff888205c51040
[ 255.546695] RBP: ffff888205c51000 R08: 0000000000000000 R09: ffff8881bfc54000
[ 255.549151] R10: ffffed102af9756b R11: ffff8881fcd6f5b4 R12: ffff8881fcd6f8a8
[ 255.551588] R13: ffff8881bfc546e8 R14: ffff888157c189b8 R15: ffff888157c189e0
[ 255.554039] FS: 00007ffff7e58840(0000) GS:ffff8883ecc00000(0000) knlGS:0000000000000000
[ 255.556788] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 255.558806] CR2: 00007ffff7a7bd58 CR3: 0000000181db4005 CR4: 0000000000170ee0
[ 255.561259] Call Trace:
[ 255.562248] <TASK>
[ 255.563164] ? kasan_set_track+0x25/0x30
[ 255.564710] ext4_mb_find_by_goal+0xf1/0xda0
[ 255.566317] ? ext4_alloc_file_blocks.isra.0+0x2a7/0x9a0
[ 255.568230] ? ext4_fallocate+0x28b/0x7d0
[ 255.569727] ? vfs_fallocate+0x2b0/0xb90
[ 255.571238] ? __x64_sys_fallocate+0xb9/0x110
[ 255.572852] ? do_syscall_64+0x3f/0x90
[ 255.574372] ? __pfx_ext4_mb_find_by_goal+0x10/0x10
[ 255.576131] ? set_track_prepare+0x40/0x70
[ 255.577677] ? kmem_cache_alloc+0x388/0x440
[ 255.579207] ext4_mb_regular_allocator+0x1f7/0x1970
[ 255.580981] ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 255.582874] ? ___slab_alloc+0xc93/0xd40
[ 255.584329] ? ext4_mb_new_blocks+0xc8f/0x15c0
[ 255.585976] ? __pfx___lock_release+0x10/0x10
[ 255.587652] ? __pfx_ext4_mb_regular_allocator+0x10/0x10
[ 255.589554] ? rcu_read_lock_sched_held+0x47/0x80
[ 255.591293] ? trace_kmem_cache_alloc+0x2d/0xe0
[ 255.592997] ? kmem_cache_alloc+0x25a/0x440
[ 255.594552] ? ext4_mb_new_blocks+0xc8f/0x15c0
[ 255.596176] ext4_mb_new_blocks+0xd3b/0x15c0
[ 255.597767] ? ext4_find_extent+0x742/0xbf0
[ 255.599301] ? __pfx_ext4_mb_new_blocks+0x10/0x10
[ 255.601021] ? lock_is_held_type+0xda/0x130
[ 255.602601] ext4_ext_map_blocks+0x151a/0x2490
[ 255.604237] ? __pfx_ext4_ext_map_blocks+0x10/0x10
[ 255.605980] ? __pfx___lock_acquired+0x10/0x10
[ 255.607648] ? lock_is_held_type+0xda/0x130
[ 255.609199] ? ext4_map_blocks+0x6c9/0x1670
[ 255.610757] ? ext4_map_blocks+0x6c9/0x1670
[ 255.612289] ? lock_acquired+0x10d/0x2b0
[ 255.613759] ? rcu_read_lock_sched_held+0x47/0x80
[ 255.615467] ? ext4_es_lookup_extent+0x43e/0xa20
[ 255.617152] ext4_map_blocks+0x724/0x1670
[ 255.618655] ? lock_is_held_type+0xda/0x130
[ 255.620179] ? __pfx_ext4_map_blocks+0x10/0x10
[ 255.621813] ? rcu_read_lock_sched_held+0x47/0x80
[ 255.623512] ? jbd2__journal_start+0x4ef/0x780
[ 255.625191] ext4_alloc_file_blocks.isra.0+0x2a7/0x9a0
[ 255.627019] ? __pfx_ext4_alloc_file_blocks.isra.0+0x10/0x10
[ 255.629034] ? __pfx_file_modified_flags+0x10/0x10
[ 255.630797] ? lock_is_held_type+0xda/0x130
[ 255.632320] ext4_fallocate+0x28b/0x7d0
[ 255.633855] vfs_fallocate+0x2b0/0xb90
[ 255.635252] __x64_sys_fallocate+0xb9/0x110
[ 255.636789] do_syscall_64+0x3f/0x90
[ 255.638135] entry_SYSCALL_64_after_hwframe+0x72/0xdc
-ritesh
On Tue, Feb 28, 2023 at 10:04:35PM +0530, Ritesh Harjani wrote:
> Kemeng Shi <[email protected]> writes:
>
> > Hi, this series contain some random cleanup patches and some bugfix
> > patches to make EXT4_MB_HINT_GOAL_ONLY work properly, protect pa->pa_free
> > from race and so on. More details can be found in git log.
> > Thanks!
>
> Hi Kemeng,
>
> Did you run any testing on these patches? Because I was very easily able
> to hit ext/009 causing kernel BUG_ON with default mkfs/mount options.
> It's always a good and recommended practice to ensure some level of
> testing on any of the patches we submit to community for review
> and call out in the cover letter on what has been tested and what is not.
Hi Kemeng,
If you need help running xfstests, I maintain a test appliance which
makes it very easy to run xfstests on development kernels. Please see:
https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-xfstests.md
https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md
This test appliance can also be run on Android devices and using
Google Compute Engine; for more information see:
https://thunk.org/android-xfstests
https://thunk.org/gce-xfstests
If you're just getting started, I recommend that you start with
kvm-xfstests, and this is the simplest way to get started.
Cheers,
- Ted
on 3/1/2023 6:30 AM, Theodore Ts'o wrote:
> On Tue, Feb 28, 2023 at 10:04:35PM +0530, Ritesh Harjani wrote:
>> Kemeng Shi <[email protected]> writes:
>>
>>> Hi, this series contain some random cleanup patches and some bugfix
>>> patches to make EXT4_MB_HINT_GOAL_ONLY work properly, protect pa->pa_free
>>> from race and so on. More details can be found in git log.
>>> Thanks!
>>
>> Hi Kemeng,
>>
>> Did you run any testing on these patches? Because I was very easily able
>> to hit ext/009 causing kernel BUG_ON with default mkfs/mount options.
>> It's always a good and recommended practice to ensure some level of
>> testing on any of the patches we submit to community for review
>> and call out in the cover letter on what has been tested and what is not.
>
> Hi Kemeng,
>
> If you need help running xfstests, I maintain a test appliance which
> makes it very easy to run xfstests on development kernels. Please see:
>
> https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-xfstests.md
> https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md
>
> This test appliance can also be run on Android devices and using
> Google Compute Engine; for more information see:
>
> https://thunk.org/android-xfstests
> https://thunk.org/gce-xfstests
>
> If you're just getting started, I recommend that you start with
> kvm-xfstests, and this is the simplest way to get started.
Hi, Theodore and Ritesh. Thanks for feedback. I will run xfstests before submitting
next version. Thanks!
--
Best wishes
Kemeng Shi