2021-07-24 07:43:30

by Wang Jianchao

[permalink] [raw]
Subject: [PATCH V3 0/5] ext4: get discard out of jbd2 commit context

Hi all

This is the version 3 patch set that attempts to get discard out of the jbd2
commit kthread. When the user delete a lot data and cause discard flooding,
the jbd2 commit kthread can be blocked for very long time and then all of
the metadata operations are blocked due to no journal space.

The xfstest with following parameters,
MODULAR=0
TEST_DIR=/mnt/test
TEST_DEV=/dev/nbd37p1
SCRATCH_MNT=/mnt/scratch
SCRATCH_DEV=/dev/nbd37p2
MOUNT_OPTIONS="-o discard"
has passed. The result is consistent w/ or w/o this patch set.

There are 5 patches,

Patch 1 ~ 3, there are no functional changes in them, but just some preparation
for following patches

Patch 4 introduces a async kworker to do discard in fstrim fation which implements
the core idea of this patch set.

Patch 5 let the fallocate retry when err is ENOSPC. This fix the generic/371

Any comments are welcome ;)

V2 -> V3
- Get rid of the per block group rb tree which carries freed entry. It is not neccesary
because we have done aggregation when wait for journal commit. Just use a list
to carry the free entries.

V1 -> V2
- free the blocks back to mb buddy after commit and then do ftrim fashion discard

fs/ext4/ext4.h | 2 +
fs/ext4/extents.c | 6 ++-
fs/ext4/mballoc.c | 223 ++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------
3 files changed, 151 insertions(+), 80 deletions(-)


2021-07-24 07:43:30

by Wang Jianchao

[permalink] [raw]
Subject: [PATCH V3 1/5] ext4: remove the 'group' parameter of ext4_trim_extent

From: Wang Jianchao <[email protected]>

Get rid of the 'group' parameter of ext4_trim_extent as we can get
it from the 'e4b'.

Reviewed-by: Andreas Dilger <[email protected]>
Signed-off-by: Wang Jianchao <[email protected]>
---
fs/ext4/mballoc.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 089c958aa2c3..018d5d3c6eeb 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6183,19 +6183,19 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
* @sb: super block for the file system
* @start: starting block of the free extent in the alloc. group
* @count: number of blocks to TRIM
- * @group: alloc. group we are working with
* @e4b: ext4 buddy for the group
*
* Trim "count" blocks starting at "start" in the "group". To assure that no
* one will allocate those blocks, mark it as used in buddy bitmap. This must
* be called with under the group lock.
*/
-static int ext4_trim_extent(struct super_block *sb, int start, int count,
- ext4_group_t group, struct ext4_buddy *e4b)
+static int ext4_trim_extent(struct super_block *sb,
+ int start, int count, struct ext4_buddy *e4b)
__releases(bitlock)
__acquires(bitlock)
{
struct ext4_free_extent ex;
+ ext4_group_t group = e4b->bd_group;
int ret = 0;

trace_ext4_trim_extent(sb, group, start, count);
@@ -6271,8 +6271,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
next = mb_find_next_bit(bitmap, max + 1, start);

if ((next - start) >= minblocks) {
- ret = ext4_trim_extent(sb, start,
- next - start, group, &e4b);
+ ret = ext4_trim_extent(sb, start, next - start, &e4b);
if (ret && ret != -EOPNOTSUPP)
break;
ret = 0;
--
2.17.1

2021-07-24 07:43:40

by Wang Jianchao

[permalink] [raw]
Subject: [PATCH V3 2/5] ext4: add new helper interface ext4_try_to_trim_range()

From: Wang Jianchao <[email protected]>

There is no functional change in this patch but just split the
codes, which serachs free block and does trim, into a new function
ext4_try_to_trim_range. This is preparing for the following async
backgroup discard.

Reviewed-by: Andreas Dilger <[email protected]>
Signed-off-by: Wang Jianchao <[email protected]>
---
fs/ext4/mballoc.c | 102 ++++++++++++++++++++++++++--------------------
1 file changed, 57 insertions(+), 45 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 018d5d3c6eeb..e3844152a643 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6218,6 +6218,54 @@ __acquires(bitlock)
return ret;
}

+static int ext4_try_to_trim_range(struct super_block *sb,
+ struct ext4_buddy *e4b, ext4_grpblk_t start,
+ ext4_grpblk_t max, ext4_grpblk_t minblocks)
+{
+ ext4_grpblk_t next, count, free_count;
+ void *bitmap;
+ int ret = 0;
+
+ bitmap = e4b->bd_bitmap;
+ start = (e4b->bd_info->bb_first_free > start) ?
+ e4b->bd_info->bb_first_free : start;
+ count = 0;
+ free_count = 0;
+
+ while (start <= max) {
+ start = mb_find_next_zero_bit(bitmap, max + 1, start);
+ if (start > max)
+ break;
+ next = mb_find_next_bit(bitmap, max + 1, start);
+
+ if ((next - start) >= minblocks) {
+ ret = ext4_trim_extent(sb, start, next - start, e4b);
+ if (ret && ret != -EOPNOTSUPP)
+ break;
+ ret = 0;
+ count += next - start;
+ }
+ free_count += next - start;
+ start = next + 1;
+
+ if (fatal_signal_pending(current)) {
+ count = -ERESTARTSYS;
+ break;
+ }
+
+ if (need_resched()) {
+ ext4_unlock_group(sb, e4b->bd_group);
+ cond_resched();
+ ext4_lock_group(sb, e4b->bd_group);
+ }
+
+ if ((e4b->bd_info->bb_free - free_count) < minblocks)
+ break;
+ }
+
+ return count;
+}
+
/**
* ext4_trim_all_free -- function to trim all free space in alloc. group
* @sb: super block for file system
@@ -6241,10 +6289,8 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_grpblk_t start, ext4_grpblk_t max,
ext4_grpblk_t minblocks)
{
- void *bitmap;
- ext4_grpblk_t next, count = 0, free_count = 0;
struct ext4_buddy e4b;
- int ret = 0;
+ int ret;

trace_ext4_trim_all_free(sb, group, start, max);

@@ -6254,57 +6300,23 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ret, group);
return ret;
}
- bitmap = e4b.bd_bitmap;

ext4_lock_group(sb, group);
- if (EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) &&
- minblocks >= atomic_read(&EXT4_SB(sb)->s_last_trim_minblks))
- goto out;
-
- start = (e4b.bd_info->bb_first_free > start) ?
- e4b.bd_info->bb_first_free : start;

- while (start <= max) {
- start = mb_find_next_zero_bit(bitmap, max + 1, start);
- if (start > max)
- break;
- next = mb_find_next_bit(bitmap, max + 1, start);
-
- if ((next - start) >= minblocks) {
- ret = ext4_trim_extent(sb, start, next - start, &e4b);
- if (ret && ret != -EOPNOTSUPP)
- break;
- ret = 0;
- count += next - start;
- }
- free_count += next - start;
- start = next + 1;
-
- if (fatal_signal_pending(current)) {
- count = -ERESTARTSYS;
- break;
- }
-
- if (need_resched()) {
- ext4_unlock_group(sb, group);
- cond_resched();
- ext4_lock_group(sb, group);
- }
-
- if ((e4b.bd_info->bb_free - free_count) < minblocks)
- break;
+ if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
+ minblocks < atomic_read(&EXT4_SB(sb)->s_last_trim_minblks)) {
+ ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
+ if (ret >= 0)
+ EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
+ } else {
+ ret = 0;
}

- if (!ret) {
- ret = count;
- EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
- }
-out:
ext4_unlock_group(sb, group);
ext4_mb_unload_buddy(&e4b);

ext4_debug("trimmed %d blocks in the group %d\n",
- count, group);
+ ret, group);

return ret;
}
--
2.17.1

2021-07-24 07:44:16

by Wang Jianchao

[permalink] [raw]
Subject: [PATCH V3 3/5] ext4: remove the repeated comment of ext4_trim_all_free

From: Wang Jianchao <[email protected]>

Reviewed-by: Andreas Dilger <[email protected]>
Signed-off-by: Wang Jianchao <[email protected]>
---
fs/ext4/mballoc.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index e3844152a643..34be2f07449d 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6274,15 +6274,10 @@ static int ext4_try_to_trim_range(struct super_block *sb,
* @max: last group block to examine
* @minblocks: minimum extent block count
*
- * ext4_trim_all_free walks through group's buddy bitmap searching for free
- * extents. When the free block is found, ext4_trim_extent is called to TRIM
- * the extent.
- *
- *
* ext4_trim_all_free walks through group's block bitmap searching for free
* extents. When the free extent is found, mark it as used in group buddy
* bitmap. Then issue a TRIM command on this extent and free the extent in
- * the group buddy bitmap. This is done until whole group is scanned.
+ * the group buddy bitmap.
*/
static ext4_grpblk_t
ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
--
2.17.1

2021-07-24 07:44:32

by Wang Jianchao

[permalink] [raw]
Subject: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex

From: Wang Jianchao <[email protected]>

Right now, discard is issued and waited to be completed in jbd2
commit kthread context after the logs are committed. When large
amount of files are deleted and discard is flooding, jbd2 commit
kthread can be blocked for long time. Then all of the metadata
operations can be blocked to wait the log space.

One case is the page fault path with read mm->mmap_sem held, which
wants to update the file time but has to wait for the log space.
When other threads in the task wants to do mmap, then write mmap_sem
is blocked. Finally all of the following read mmap_sem requirements
are blocked, even the ps command which need to read the /proc/pid/
-cmdline. Our monitor service which needs to read /proc/pid/cmdline
used to be blocked for 5 mins.

This patch frees the blocks back to buddy after commit and then do
discard in a async kworker context in fstrim fashion, namely,
- mark blocks to be discarded as used if they have not been allocated
- do discard
- mark them free
After this, jbd2 commit kthread won't be blocked any more by discard
and we won't get NOSPC even if the discard is slow or throttled.

Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
Suggested-by: Theodore Ts'o <[email protected]>
Signed-off-by: Wang Jianchao <[email protected]>
---
fs/ext4/ext4.h | 2 +
fs/ext4/mballoc.c | 109 +++++++++++++++++++++++++++++++++++-----------
2 files changed, 86 insertions(+), 25 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3c51e243450d..6b678b968d84 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1536,6 +1536,8 @@ struct ext4_sb_info {
unsigned int s_mb_free_pending;
struct list_head s_freed_data_list; /* List of blocks to be freed
after commit completed */
+ struct list_head s_discard_list;
+ struct work_struct s_discard_work;
struct rb_root s_mb_avg_fragment_size_root;
rwlock_t s_mb_rb_lock;
struct list_head *s_mb_largest_free_orders;
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 34be2f07449d..a496509e61b7 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -386,6 +386,7 @@
static struct kmem_cache *ext4_pspace_cachep;
static struct kmem_cache *ext4_ac_cachep;
static struct kmem_cache *ext4_free_data_cachep;
+static struct workqueue_struct *ext4_discard_wq;

/* We create slab caches for groupinfo data structures based on the
* superblock block size. There will be one per mounted filesystem for
@@ -408,6 +409,10 @@ static void ext4_mb_new_preallocation(struct ext4_allocation_context *ac);
static bool ext4_mb_good_group(struct ext4_allocation_context *ac,
ext4_group_t group, int cr);

+static int ext4_try_to_trim_range(struct super_block *sb,
+ struct ext4_buddy *e4b, ext4_grpblk_t start,
+ ext4_grpblk_t max, ext4_grpblk_t minblocks);
+
/*
* The algorithm using this percpu seq counter goes below:
* 1. We sample the percpu discard_pa_seq counter before trying for block
@@ -3308,6 +3313,55 @@ static int ext4_groupinfo_create_slab(size_t size)
return 0;
}

+static void ext4_discard_work(struct work_struct *work)
+{
+ struct ext4_sb_info *sbi = container_of(work,
+ struct ext4_sb_info, s_discard_work);
+ struct super_block *sb = sbi->s_sb;
+ struct ext4_free_data *fd, *nfd;
+ struct ext4_buddy e4b;
+ struct list_head discard_list;
+ ext4_group_t grp, load_grp;
+ int err = 0;
+
+ INIT_LIST_HEAD(&discard_list);
+ spin_lock(&sbi->s_md_lock);
+ list_splice_init(&sbi->s_discard_list, &discard_list);
+ spin_unlock(&sbi->s_md_lock);
+
+ load_grp = UINT_MAX;
+ list_for_each_entry_safe(fd, nfd, &discard_list, efd_list) {
+ /*
+ * If filesystem is umounting or no memory, give up the discard
+ */
+ if ((sb->s_flags & SB_ACTIVE) && !err) {
+ grp = fd->efd_group;
+ if (grp != load_grp) {
+ if (load_grp != UINT_MAX)
+ ext4_mb_unload_buddy(&e4b);
+
+ err = ext4_mb_load_buddy(sb, grp, &e4b);
+ if (err) {
+ kmem_cache_free(ext4_free_data_cachep, fd);
+ load_grp = UINT_MAX;
+ continue;
+ } else {
+ load_grp = grp;
+ }
+ }
+
+ ext4_lock_group(sb, grp);
+ ext4_try_to_trim_range(sb, &e4b, fd->efd_start_cluster,
+ fd->efd_start_cluster + fd->efd_count - 1, 1);
+ ext4_unlock_group(sb, grp);
+ }
+ kmem_cache_free(ext4_free_data_cachep, fd);
+ }
+
+ if (load_grp != UINT_MAX)
+ ext4_mb_unload_buddy(&e4b);
+}
+
int ext4_mb_init(struct super_block *sb)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
@@ -3376,6 +3430,8 @@ int ext4_mb_init(struct super_block *sb)
spin_lock_init(&sbi->s_md_lock);
sbi->s_mb_free_pending = 0;
INIT_LIST_HEAD(&sbi->s_freed_data_list);
+ INIT_LIST_HEAD(&sbi->s_discard_list);
+ INIT_WORK(&sbi->s_discard_work, ext4_discard_work);

sbi->s_mb_max_to_scan = MB_DEFAULT_MAX_TO_SCAN;
sbi->s_mb_min_to_scan = MB_DEFAULT_MIN_TO_SCAN;
@@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
int count;

+ if (test_opt(sb, DISCARD)) {
+ /*
+ * wait the discard work to drain all of ext4_free_data
+ */
+ queue_work(ext4_discard_wq, &sbi->s_discard_work);
+ flush_work(&sbi->s_discard_work);
+ }
+
if (sbi->s_group_info) {
for (i = 0; i < ngroups; i++) {
cond_resched();
@@ -3596,7 +3660,6 @@ static void ext4_free_data_in_buddy(struct super_block *sb,
put_page(e4b.bd_bitmap_page);
}
ext4_unlock_group(sb, entry->efd_group);
- kmem_cache_free(ext4_free_data_cachep, entry);
ext4_mb_unload_buddy(&e4b);

mb_debug(sb, "freed %d blocks in %d structures\n", count,
@@ -3611,10 +3674,9 @@ void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
struct ext4_free_data *entry, *tmp;
- struct bio *discard_bio = NULL;
struct list_head freed_data_list;
struct list_head *cut_pos = NULL;
- int err;
+ bool wake;

INIT_LIST_HEAD(&freed_data_list);

@@ -3629,30 +3691,20 @@ void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid)
cut_pos);
spin_unlock(&sbi->s_md_lock);

- if (test_opt(sb, DISCARD)) {
- list_for_each_entry(entry, &freed_data_list, efd_list) {
- err = ext4_issue_discard(sb, entry->efd_group,
- entry->efd_start_cluster,
- entry->efd_count,
- &discard_bio);
- if (err && err != -EOPNOTSUPP) {
- ext4_msg(sb, KERN_WARNING, "discard request in"
- " group:%d block:%d count:%d failed"
- " with %d", entry->efd_group,
- entry->efd_start_cluster,
- entry->efd_count, err);
- } else if (err == -EOPNOTSUPP)
- break;
- }
+ list_for_each_entry(entry, &freed_data_list, efd_list)
+ ext4_free_data_in_buddy(sb, entry);

- if (discard_bio) {
- submit_bio_wait(discard_bio);
- bio_put(discard_bio);
- }
+ if (test_opt(sb, DISCARD)) {
+ spin_lock(&sbi->s_md_lock);
+ wake = list_empty(&sbi->s_discard_list);
+ list_splice_tail(&freed_data_list, &sbi->s_discard_list);
+ spin_unlock(&sbi->s_md_lock);
+ if (wake)
+ queue_work(ext4_discard_wq, &sbi->s_discard_work);
+ } else {
+ list_for_each_entry_safe(entry, tmp, &freed_data_list, efd_list)
+ kmem_cache_free(ext4_free_data_cachep, entry);
}
-
- list_for_each_entry_safe(entry, tmp, &freed_data_list, efd_list)
- ext4_free_data_in_buddy(sb, entry);
}

int __init ext4_init_mballoc(void)
@@ -3672,8 +3724,14 @@ int __init ext4_init_mballoc(void)
if (ext4_free_data_cachep == NULL)
goto out_ac_free;

+ ext4_discard_wq = alloc_workqueue("ext4discard", WQ_UNBOUND, 0);
+ if (!ext4_discard_wq)
+ goto out_free_data;
+
return 0;

+out_free_data:
+ kmem_cache_destroy(ext4_free_data_cachep);
out_ac_free:
kmem_cache_destroy(ext4_ac_cachep);
out_pa_free:
@@ -3693,6 +3751,7 @@ void ext4_exit_mballoc(void)
kmem_cache_destroy(ext4_ac_cachep);
kmem_cache_destroy(ext4_free_data_cachep);
ext4_groupinfo_destroy_slabs();
+ destroy_workqueue(ext4_discard_wq);
}


--
2.17.1

2021-07-24 07:44:49

by Wang Jianchao

[permalink] [raw]
Subject: [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC

From: Wang Jianchao <[email protected]>

The blocks may be waiting for journal commit to be freed back to
mb buddy. Let fallocate wait and retry in that case.

Signed-off-by: Wang Jianchao <[email protected]>
---
fs/ext4/extents.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 92ad64b89d9b..ad0b874d3448 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4635,7 +4635,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
struct inode *inode = file_inode(file);
loff_t new_size = 0;
unsigned int max_blocks;
- int ret = 0;
+ int ret = 0, retries = 0;
int flags;
ext4_lblk_t lblk;
unsigned int blkbits = inode->i_blkbits;
@@ -4656,6 +4656,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
FALLOC_FL_INSERT_RANGE))
return -EOPNOTSUPP;

+retry:
ext4_fc_start_update(inode);

if (mode & FALLOC_FL_PUNCH_HOLE) {
@@ -4722,6 +4723,9 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
exit:
ext4_fc_stop_update(inode);
+ if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
+ goto retry;
+
return ret;
}

--
2.17.1

2021-07-26 03:41:24

by Guoqing Jiang

[permalink] [raw]
Subject: Re: [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC

Hi,

On 7/24/21 3:41 PM, Wang Jianchao wrote:
> From: Wang Jianchao <[email protected]>
>
> The blocks may be waiting for journal commit to be freed back to
> mb buddy. Let fallocate wait and retry in that case.
>
> Signed-off-by: Wang Jianchao <[email protected]>
> ---
> fs/ext4/extents.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 92ad64b89d9b..ad0b874d3448 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4635,7 +4635,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> struct inode *inode = file_inode(file);
> loff_t new_size = 0;
> unsigned int max_blocks;
> - int ret = 0;
> + int ret = 0, retries = 0;
> int flags;
> ext4_lblk_t lblk;
> unsigned int blkbits = inode->i_blkbits;
> @@ -4656,6 +4656,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> FALLOC_FL_INSERT_RANGE))
> return -EOPNOTSUPP;
>
> +retry:
> ext4_fc_start_update(inode);
>
> if (mode & FALLOC_FL_PUNCH_HOLE) {
> @@ -4722,6 +4723,9 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
> exit:
> ext4_fc_stop_update(inode);
> + if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
> + goto retry;
> +

Not sure if it is necessary since ext4_alloc_file_blocks already retries
allocate.

Thanks,
Guoqing

2021-07-26 03:42:47

by Chen, Rong A

[permalink] [raw]
Subject: Re: [PATCH V3 2/5] ext4: add new helper interface ext4_try_to_trim_range()

Hi Wang,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on ext4/dev]
[also build test WARNING on linux/master linus/master v5.14-rc2
next-20210723]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Wang-Jianchao/ext4-get-discard-out-of-jbd2-commit-context/20210724-154426
base: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
:::::: branch date: 25 hours ago
:::::: commit date: 25 hours ago
config: x86_64-randconfig-c001-20210725 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project
c63dbd850182797bc4b76124d08e1c320ab2365d)
reproduce (this is a W=1 build):
wget
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross
-O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
#
https://github.com/0day-ci/linux/commit/55a3430685e83709742c1fe2e0d4b347781bcc80
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review
Wang-Jianchao/ext4-get-discard-out-of-jbd2-commit-context/20210724-154426
git checkout 55a3430685e83709742c1fe2e0d4b347781bcc80
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross
clang-analyzer ARCH=x86_64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>


clang-analyzer warnings: (new ones prefixed by >>)
fs/ext4/mballoc.c:1909:2: note: Taking true branch
if (ex->fe_start + ex->fe_len >
EXT4_CLUSTERS_PER_GROUP(e4b->bd_sb)) {
^
fs/ext4/mballoc.c:1911:3: note: Taking true branch
WARN_ON(1);
^
include/asm-generic/bug.h:120:2: note: expanded from macro 'WARN_ON'
if (unlikely(__ret_warn_on))
\
^
fs/ext4/mballoc.c:1911:3: note: Loop condition is false. Exiting loop
WARN_ON(1);
^
include/asm-generic/bug.h:121:3: note: expanded from macro 'WARN_ON'
__WARN();
\
^
include/asm-generic/bug.h:94:19: note: expanded from macro '__WARN'
#define __WARN() __WARN_FLAGS(BUGFLAG_TAINT(TAINT_WARN))
^
arch/x86/include/asm/bug.h:78:2: note: expanded from macro
'__WARN_FLAGS'
instrumentation_begin(); \
^
include/linux/instrumentation.h:53:34: note: expanded from macro
'instrumentation_begin'
# define instrumentation_begin() do { } while(0)
^
fs/ext4/mballoc.c:1911:3: note: Loop condition is false. Exiting loop
WARN_ON(1);
^
include/asm-generic/bug.h:121:3: note: expanded from macro 'WARN_ON'
__WARN();
\
^
include/asm-generic/bug.h:94:19: note: expanded from macro '__WARN'
#define __WARN() __WARN_FLAGS(BUGFLAG_TAINT(TAINT_WARN))
^
arch/x86/include/asm/bug.h:79:2: note: expanded from macro
'__WARN_FLAGS'
_BUG_FLAGS(ASM_UD2, BUGFLAG_WARNING|(flags)); \
^
arch/x86/include/asm/bug.h:25:37: note: expanded from macro '_BUG_FLAGS'
#define _BUG_FLAGS(ins, flags)
\

^
fs/ext4/mballoc.c:1911:3: note: Loop condition is false. Exiting loop
WARN_ON(1);
^
include/asm-generic/bug.h:121:3: note: expanded from macro 'WARN_ON'
__WARN();
\
^
include/asm-generic/bug.h:94:19: note: expanded from macro '__WARN'
#define __WARN() __WARN_FLAGS(BUGFLAG_TAINT(TAINT_WARN))
^
arch/x86/include/asm/bug.h:81:2: note: expanded from macro
'__WARN_FLAGS'
instrumentation_end(); \
^
include/linux/instrumentation.h:54:33: note: expanded from macro
'instrumentation_end'
# define instrumentation_end() do { } while(0)
^
fs/ext4/mballoc.c:1911:3: note: Loop condition is false. Exiting loop
WARN_ON(1);
^
include/asm-generic/bug.h:121:3: note: expanded from macro 'WARN_ON'
__WARN();
\
^
include/asm-generic/bug.h:94:19: note: expanded from macro '__WARN'
#define __WARN() __WARN_FLAGS(BUGFLAG_TAINT(TAINT_WARN))
^
arch/x86/include/asm/bug.h:76:33: note: expanded from macro
'__WARN_FLAGS'
#define __WARN_FLAGS(flags) \
^
fs/ext4/mballoc.c:1912:3: note: 14th function call argument is an
uninitialized value
ext4_grp_locked_error(e4b->bd_sb, e4b->bd_group, 0, 0,
^
fs/ext4/ext4.h:3126:2: note: expanded from macro 'ext4_grp_locked_error'
__ext4_grp_locked_error(__func__, __LINE__, sb, grp, ino,
block, \
^
fs/ext4/mballoc.c:2597:27: warning: Value stored to 'grp' during its
initialization is never read [clang-analyzer-deadcode.DeadStores]
struct ext4_group_info *grp =
ext4_get_group_info(sb, group);
^~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/ext4/mballoc.c:2597:27: note: Value stored to 'grp' during its
initialization is never read
struct ext4_group_info *grp =
ext4_get_group_info(sb, group);
^~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/ext4/mballoc.c:3839:3: warning: Value stored to 'err' is never
read [clang-analyzer-deadcode.DeadStores]
err = PTR_ERR(bitmap_bh);
^ ~~~~~~~~~~~~~~~~~~
fs/ext4/mballoc.c:3839:3: note: Value stored to 'err' is never read
err = PTR_ERR(bitmap_bh);
^ ~~~~~~~~~~~~~~~~~~
fs/ext4/mballoc.c:3844:2: warning: Value stored to 'err' is never
read [clang-analyzer-deadcode.DeadStores]
err = -EIO;
^ ~~~~
fs/ext4/mballoc.c:3844:2: note: Value stored to 'err' is never read
err = -EIO;
^ ~~~~
fs/ext4/mballoc.c:3889:2: warning: Value stored to 'err' is never
read [clang-analyzer-deadcode.DeadStores]
err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
^
fs/ext4/mballoc.c:3889:2: note: Value stored to 'err' is never read
fs/ext4/mballoc.c:5746:3: warning: Value stored to 'err' is never
read [clang-analyzer-deadcode.DeadStores]
err = PTR_ERR(bitmap_bh);
^ ~~~~~~~~~~~~~~~~~~
fs/ext4/mballoc.c:5746:3: note: Value stored to 'err' is never read
err = PTR_ERR(bitmap_bh);
^ ~~~~~~~~~~~~~~~~~~
>> fs/ext4/mballoc.c:6245:4: warning: Value stored to 'ret' is never read [clang-analyzer-deadcode.DeadStores]
ret = 0;
^ ~
fs/ext4/mballoc.c:6245:4: note: Value stored to 'ret' is never read
ret = 0;
^ ~
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
4 warnings generated.
Suppressed 4 warnings (4 in non-user code).
Use -header-filter=.* to display errors from all non-system headers.
Use -system-headers to display errors from system headers as well.
12 warnings generated.
arch/x86/include/asm/paravirt.h:585:2: warning: Assigned value is
garbage or undefined [clang-analyzer-core.uninitialized.Assign]
PVOP_VCALL2(lock.queued_spin_lock_slowpath, lock, val);
^
arch/x86/include/asm/paravirt_types.h:547:2: note: expanded from
macro 'PVOP_VCALL2'
__PVOP_VCALL(op, PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2))
^
arch/x86/include/asm/paravirt_types.h:491:8: note: expanded from
macro '__PVOP_VCALL'
(void)____PVOP_CALL(, op, CLBR_ANY, PVOP_VCALL_CLOBBERS,
\
^
arch/x86/include/asm/paravirt_types.h:446:3: note: expanded from
macro '____PVOP_CALL'
PVOP_CALL_ARGS;
\
^
arch/x86/include/asm/paravirt_types.h:404:16: note: expanded from
macro 'PVOP_CALL_ARGS'
unsigned long __edi = __edi, __esi = __esi, \
^
kernel/trace/trace.c:5779:2: note: Calling 'queued_spin_lock'
arch_spin_lock(&trace_cmdline_lock);
^
include/asm-generic/qspinlock.h:117:28: note: expanded from macro
'arch_spin_lock'
#define arch_spin_lock(l) queued_spin_lock(l)
^~~~~~~~~~~~~~~~~~~
include/asm-generic/qspinlock.h:82:6: note: Assuming the condition
is false
if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val,
_Q_LOCKED_VAL)))
^
include/linux/compiler.h:77:20: note: expanded from macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/asm-generic/qspinlock.h:82:2: note: Taking false branch
if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val,
_Q_LOCKED_VAL)))
^
include/asm-generic/qspinlock.h:85:2: note: Calling
'queued_spin_lock_slowpath'
queued_spin_lock_slowpath(lock, val);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/include/asm/qspinlock.h:51:2: note: Calling
'pv_queued_spin_lock_slowpath'
pv_queued_spin_lock_slowpath(lock, val);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/include/asm/paravirt.h:585:2: note: Assigned value is
garbage or undefined
PVOP_VCALL2(lock.queued_spin_lock_slowpath, lock, val);
^
arch/x86/include/asm/paravirt_types.h:547:2: note: expanded from
macro 'PVOP_VCALL2'
__PVOP_VCALL(op, PVOP_CALL_ARG1(arg1), PVOP_CALL_ARG2(arg2))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/include/asm/paravirt_types.h:491:8: note: expanded from
macro '__PVOP_VCALL'
(void)____PVOP_CALL(, op, CLBR_ANY, PVOP_VCALL_CLOBBERS,
\

^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/include/asm/paravirt_types.h:446:3: note: expanded from
macro '____PVOP_CALL'
PVOP_CALL_ARGS;
\
^~~~~~~~~~~~~~
arch/x86/include/asm/paravirt_types.h:404:16: note: expanded from
macro 'PVOP_CALL_ARGS'
unsigned long __edi = __edi, __esi = __esi, \
^ ~~~~~
arch/x86/include/asm/paravirt.h:590:2: warning: Assigned value is
garbage or undefined [clang-analyzer-core.uninitialized.Assign]
PVOP_ALT_VCALLEE1(lock.queued_spin_unlock, lock,
^
arch/x86/include/asm/paravirt_types.h:541:2: note: expanded from
macro 'PVOP_ALT_VCALLEE1'
__PVOP_ALT_VCALLEESAVE(op, alt, cond, PVOP_CALL_ARG1(arg1))
^
arch/x86/include/asm/paravirt_types.h:504:8: note: expanded from
macro '__PVOP_ALT_VCALLEESAVE'
(void)____PVOP_ALT_CALL(, op.func, alt, cond, CLBR_RET_REG,
\
^
arch/x86/include/asm/paravirt_types.h:460:3: note: expanded from
macro '____PVOP_ALT_CALL'
PVOP_CALL_ARGS;
\
^
arch/x86/include/asm/paravirt_types.h:404:16: note: expanded from
macro 'PVOP_CALL_ARGS'
unsigned long __edi = __edi, __esi = __esi, \
^
kernel/trace/trace.c:9948:6: note: Assuming 'tracepoint_printk' is 0
if (tracepoint_printk) {
^~~~~~~~~~~~~~~~~
kernel/trace/trace.c:9948:2: note: Taking false branch
if (tracepoint_printk) {
^
kernel/trace/trace.c:9957:2: note: Calling 'tracer_alloc_buffers'
tracer_alloc_buffers();
^~~~~~~~~~~~~~~~~~~~~~
kernel/trace/trace.c:9821:6: note: Assuming the condition is false
if (security_locked_down(LOCKDOWN_TRACEFS)) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/trace/trace.c:9821:2: note: Taking false branch
if (security_locked_down(LOCKDOWN_TRACEFS)) {
^
kernel/trace/trace.c:9830:15: note: TRACE_ITER_LAST_BIT is <= 32
BUILD_BUG_ON(TRACE_ITER_LAST_BIT > TRACE_FLAGS_MAX_SIZE);
^
include/linux/build_bug.h:50:19: note: expanded from macro
'BUILD_BUG_ON'
BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
^~~~~~~~~
include/linux/build_bug.h:39:58: note: expanded from macro
'BUILD_BUG_ON_MSG'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)

vim +/ret +6245 fs/ext4/mballoc.c

7360d1731e5dc7 Lukas Czerner 2010-10-27 6220 55a3430685e837 Wang
Jianchao 2021-07-24 6221 static int ext4_try_to_trim_range(struct
super_block *sb,
55a3430685e837 Wang Jianchao 2021-07-24 6222 struct ext4_buddy *e4b,
ext4_grpblk_t start,
55a3430685e837 Wang Jianchao 2021-07-24 6223 ext4_grpblk_t max,
ext4_grpblk_t minblocks)
55a3430685e837 Wang Jianchao 2021-07-24 6224 {
55a3430685e837 Wang Jianchao 2021-07-24 6225 ext4_grpblk_t next,
count, free_count;
55a3430685e837 Wang Jianchao 2021-07-24 6226 void *bitmap;
55a3430685e837 Wang Jianchao 2021-07-24 6227 int ret = 0;
55a3430685e837 Wang Jianchao 2021-07-24 6228 55a3430685e837 Wang
Jianchao 2021-07-24 6229 bitmap = e4b->bd_bitmap;
55a3430685e837 Wang Jianchao 2021-07-24 6230 start =
(e4b->bd_info->bb_first_free > start) ?
55a3430685e837 Wang Jianchao 2021-07-24 6231
e4b->bd_info->bb_first_free : start;
55a3430685e837 Wang Jianchao 2021-07-24 6232 count = 0;
55a3430685e837 Wang Jianchao 2021-07-24 6233 free_count = 0;
55a3430685e837 Wang Jianchao 2021-07-24 6234 55a3430685e837 Wang
Jianchao 2021-07-24 6235 while (start <= max) {
55a3430685e837 Wang Jianchao 2021-07-24 6236 start =
mb_find_next_zero_bit(bitmap, max + 1, start);
55a3430685e837 Wang Jianchao 2021-07-24 6237 if (start > max)
55a3430685e837 Wang Jianchao 2021-07-24 6238 break;
55a3430685e837 Wang Jianchao 2021-07-24 6239 next =
mb_find_next_bit(bitmap, max + 1, start);
55a3430685e837 Wang Jianchao 2021-07-24 6240 55a3430685e837 Wang
Jianchao 2021-07-24 6241 if ((next - start) >= minblocks) {
55a3430685e837 Wang Jianchao 2021-07-24 6242 ret =
ext4_trim_extent(sb, start, next - start, e4b);
55a3430685e837 Wang Jianchao 2021-07-24 6243 if (ret && ret !=
-EOPNOTSUPP)
55a3430685e837 Wang Jianchao 2021-07-24 6244 break;
55a3430685e837 Wang Jianchao 2021-07-24 @6245 ret = 0;
55a3430685e837 Wang Jianchao 2021-07-24 6246 count += next - start;
55a3430685e837 Wang Jianchao 2021-07-24 6247 }
55a3430685e837 Wang Jianchao 2021-07-24 6248 free_count += next - start;
55a3430685e837 Wang Jianchao 2021-07-24 6249 start = next + 1;
55a3430685e837 Wang Jianchao 2021-07-24 6250 55a3430685e837 Wang
Jianchao 2021-07-24 6251 if (fatal_signal_pending(current)) {
55a3430685e837 Wang Jianchao 2021-07-24 6252 count = -ERESTARTSYS;
55a3430685e837 Wang Jianchao 2021-07-24 6253 break;
55a3430685e837 Wang Jianchao 2021-07-24 6254 }
55a3430685e837 Wang Jianchao 2021-07-24 6255 55a3430685e837 Wang
Jianchao 2021-07-24 6256 if (need_resched()) {
55a3430685e837 Wang Jianchao 2021-07-24 6257 ext4_unlock_group(sb,
e4b->bd_group);
55a3430685e837 Wang Jianchao 2021-07-24 6258 cond_resched();
55a3430685e837 Wang Jianchao 2021-07-24 6259 ext4_lock_group(sb,
e4b->bd_group);
55a3430685e837 Wang Jianchao 2021-07-24 6260 }
55a3430685e837 Wang Jianchao 2021-07-24 6261 55a3430685e837 Wang
Jianchao 2021-07-24 6262 if ((e4b->bd_info->bb_free - free_count) <
minblocks)
55a3430685e837 Wang Jianchao 2021-07-24 6263 break;
55a3430685e837 Wang Jianchao 2021-07-24 6264 }
55a3430685e837 Wang Jianchao 2021-07-24 6265 55a3430685e837 Wang
Jianchao 2021-07-24 6266 return count;
55a3430685e837 Wang Jianchao 2021-07-24 6267 }
55a3430685e837 Wang Jianchao 2021-07-24 6268
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
.config.gz (28.72 kB)
Attached Message Part (154.00 B)
Download all attachments

2021-07-26 03:44:21

by Guoqing Jiang

[permalink] [raw]
Subject: Re: [PATCH V3 1/5] ext4: remove the 'group' parameter of ext4_trim_extent


On 7/24/21 3:41 PM, Wang Jianchao wrote:
> -static int ext4_trim_extent(struct super_block *sb, int start, int count,
> - ext4_group_t group, struct ext4_buddy *e4b)
> +static int ext4_trim_extent(struct super_block *sb,
> + int start, int count, struct ext4_buddy *e4b)

Nit, seems only need to change the second line.

Thanks,
Guoqing

2021-07-26 07:06:58

by Wang Jianchao

[permalink] [raw]
Subject: Re: [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC



On 2021/7/26 11:40 AM, Guoqing Jiang wrote:
> Hi,
>
> On 7/24/21 3:41 PM, Wang Jianchao wrote:
>> From: Wang Jianchao <[email protected]>
>>
>> The blocks may be waiting for journal commit to be freed back to
>> mb buddy. Let fallocate wait and retry in that case.
>>
>> Signed-off-by: Wang Jianchao <[email protected]>
>> ---
>>   fs/ext4/extents.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index 92ad64b89d9b..ad0b874d3448 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -4635,7 +4635,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>>       struct inode *inode = file_inode(file);
>>       loff_t new_size = 0;
>>       unsigned int max_blocks;
>> -    int ret = 0;
>> +    int ret = 0, retries = 0;
>>       int flags;
>>       ext4_lblk_t lblk;
>>       unsigned int blkbits = inode->i_blkbits;
>> @@ -4656,6 +4656,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>>                FALLOC_FL_INSERT_RANGE))
>>           return -EOPNOTSUPP;
>>   +retry:
>>       ext4_fc_start_update(inode);
>>         if (mode & FALLOC_FL_PUNCH_HOLE) {
>> @@ -4722,6 +4723,9 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>>       trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
>>   exit:
>>       ext4_fc_stop_update(inode);
>> +    if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
>> +        goto retry;
>> +
>
> Not sure if it is necessary since ext4_alloc_file_blocks already retries allocate.

Yes, this patch should be get rid of.
But it is indeed helpful to fix the xfstest generic/371 which does concurrently write/rm
and fallocate/rm. I'll figure out some other way to improve that

Thanks
Jianchao

>
> Thanks,
> Guoqing

2021-08-04 15:27:42

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH V3 1/5] ext4: remove the 'group' parameter of ext4_trim_extent

On Sat 24-07-21 15:41:20, Wang Jianchao wrote:
> From: Wang Jianchao <[email protected]>
>
> Get rid of the 'group' parameter of ext4_trim_extent as we can get
> it from the 'e4b'.
>
> Reviewed-by: Andreas Dilger <[email protected]>
> Signed-off-by: Wang Jianchao <[email protected]>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/mballoc.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 089c958aa2c3..018d5d3c6eeb 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6183,19 +6183,19 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
> * @sb: super block for the file system
> * @start: starting block of the free extent in the alloc. group
> * @count: number of blocks to TRIM
> - * @group: alloc. group we are working with
> * @e4b: ext4 buddy for the group
> *
> * Trim "count" blocks starting at "start" in the "group". To assure that no
> * one will allocate those blocks, mark it as used in buddy bitmap. This must
> * be called with under the group lock.
> */
> -static int ext4_trim_extent(struct super_block *sb, int start, int count,
> - ext4_group_t group, struct ext4_buddy *e4b)
> +static int ext4_trim_extent(struct super_block *sb,
> + int start, int count, struct ext4_buddy *e4b)
> __releases(bitlock)
> __acquires(bitlock)
> {
> struct ext4_free_extent ex;
> + ext4_group_t group = e4b->bd_group;
> int ret = 0;
>
> trace_ext4_trim_extent(sb, group, start, count);
> @@ -6271,8 +6271,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> next = mb_find_next_bit(bitmap, max + 1, start);
>
> if ((next - start) >= minblocks) {
> - ret = ext4_trim_extent(sb, start,
> - next - start, group, &e4b);
> + ret = ext4_trim_extent(sb, start, next - start, &e4b);
> if (ret && ret != -EOPNOTSUPP)
> break;
> ret = 0;
> --
> 2.17.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-04 15:28:56

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH V3 3/5] ext4: remove the repeated comment of ext4_trim_all_free

On Sat 24-07-21 15:41:22, Wang Jianchao wrote:
> From: Wang Jianchao <[email protected]>
>
> Reviewed-by: Andreas Dilger <[email protected]>
> Signed-off-by: Wang Jianchao <[email protected]>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/mballoc.c | 7 +------
> 1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index e3844152a643..34be2f07449d 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6274,15 +6274,10 @@ static int ext4_try_to_trim_range(struct super_block *sb,
> * @max: last group block to examine
> * @minblocks: minimum extent block count
> *
> - * ext4_trim_all_free walks through group's buddy bitmap searching for free
> - * extents. When the free block is found, ext4_trim_extent is called to TRIM
> - * the extent.
> - *
> - *
> * ext4_trim_all_free walks through group's block bitmap searching for free
> * extents. When the free extent is found, mark it as used in group buddy
> * bitmap. Then issue a TRIM command on this extent and free the extent in
> - * the group buddy bitmap. This is done until whole group is scanned.
> + * the group buddy bitmap.
> */
> static ext4_grpblk_t
> ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> --
> 2.17.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-04 15:31:14

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH V3 2/5] ext4: add new helper interface ext4_try_to_trim_range()

On Sat 24-07-21 15:41:21, Wang Jianchao wrote:
> From: Wang Jianchao <[email protected]>
>
> There is no functional change in this patch but just split the
> codes, which serachs free block and does trim, into a new function
> ext4_try_to_trim_range. This is preparing for the following async
> backgroup discard.
>
> Reviewed-by: Andreas Dilger <[email protected]>
> Signed-off-by: Wang Jianchao <[email protected]>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/mballoc.c | 102 ++++++++++++++++++++++++++--------------------
> 1 file changed, 57 insertions(+), 45 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 018d5d3c6eeb..e3844152a643 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6218,6 +6218,54 @@ __acquires(bitlock)
> return ret;
> }
>
> +static int ext4_try_to_trim_range(struct super_block *sb,
> + struct ext4_buddy *e4b, ext4_grpblk_t start,
> + ext4_grpblk_t max, ext4_grpblk_t minblocks)
> +{
> + ext4_grpblk_t next, count, free_count;
> + void *bitmap;
> + int ret = 0;
> +
> + bitmap = e4b->bd_bitmap;
> + start = (e4b->bd_info->bb_first_free > start) ?
> + e4b->bd_info->bb_first_free : start;
> + count = 0;
> + free_count = 0;
> +
> + while (start <= max) {
> + start = mb_find_next_zero_bit(bitmap, max + 1, start);
> + if (start > max)
> + break;
> + next = mb_find_next_bit(bitmap, max + 1, start);
> +
> + if ((next - start) >= minblocks) {
> + ret = ext4_trim_extent(sb, start, next - start, e4b);
> + if (ret && ret != -EOPNOTSUPP)
> + break;
> + ret = 0;
> + count += next - start;
> + }
> + free_count += next - start;
> + start = next + 1;
> +
> + if (fatal_signal_pending(current)) {
> + count = -ERESTARTSYS;
> + break;
> + }
> +
> + if (need_resched()) {
> + ext4_unlock_group(sb, e4b->bd_group);
> + cond_resched();
> + ext4_lock_group(sb, e4b->bd_group);
> + }
> +
> + if ((e4b->bd_info->bb_free - free_count) < minblocks)
> + break;
> + }
> +
> + return count;
> +}
> +
> /**
> * ext4_trim_all_free -- function to trim all free space in alloc. group
> * @sb: super block for file system
> @@ -6241,10 +6289,8 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> ext4_grpblk_t start, ext4_grpblk_t max,
> ext4_grpblk_t minblocks)
> {
> - void *bitmap;
> - ext4_grpblk_t next, count = 0, free_count = 0;
> struct ext4_buddy e4b;
> - int ret = 0;
> + int ret;
>
> trace_ext4_trim_all_free(sb, group, start, max);
>
> @@ -6254,57 +6300,23 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
> ret, group);
> return ret;
> }
> - bitmap = e4b.bd_bitmap;
>
> ext4_lock_group(sb, group);
> - if (EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) &&
> - minblocks >= atomic_read(&EXT4_SB(sb)->s_last_trim_minblks))
> - goto out;
> -
> - start = (e4b.bd_info->bb_first_free > start) ?
> - e4b.bd_info->bb_first_free : start;
>
> - while (start <= max) {
> - start = mb_find_next_zero_bit(bitmap, max + 1, start);
> - if (start > max)
> - break;
> - next = mb_find_next_bit(bitmap, max + 1, start);
> -
> - if ((next - start) >= minblocks) {
> - ret = ext4_trim_extent(sb, start, next - start, &e4b);
> - if (ret && ret != -EOPNOTSUPP)
> - break;
> - ret = 0;
> - count += next - start;
> - }
> - free_count += next - start;
> - start = next + 1;
> -
> - if (fatal_signal_pending(current)) {
> - count = -ERESTARTSYS;
> - break;
> - }
> -
> - if (need_resched()) {
> - ext4_unlock_group(sb, group);
> - cond_resched();
> - ext4_lock_group(sb, group);
> - }
> -
> - if ((e4b.bd_info->bb_free - free_count) < minblocks)
> - break;
> + if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
> + minblocks < atomic_read(&EXT4_SB(sb)->s_last_trim_minblks)) {
> + ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
> + if (ret >= 0)
> + EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
> + } else {
> + ret = 0;
> }
>
> - if (!ret) {
> - ret = count;
> - EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
> - }
> -out:
> ext4_unlock_group(sb, group);
> ext4_mb_unload_buddy(&e4b);
>
> ext4_debug("trimmed %d blocks in the group %d\n",
> - count, group);
> + ret, group);
>
> return ret;
> }
> --
> 2.17.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-04 15:33:24

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC

On Sat 24-07-21 15:41:24, Wang Jianchao wrote:
> From: Wang Jianchao <[email protected]>
>
> The blocks may be waiting for journal commit to be freed back to
> mb buddy. Let fallocate wait and retry in that case.
>
> Signed-off-by: Wang Jianchao <[email protected]>

Did you really observe this? Because the retry is already handled in
ext4_alloc_file_blocks() that's used by ext4_fallocate(). So no retry
should be needed there.

Honza

> ---
> fs/ext4/extents.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 92ad64b89d9b..ad0b874d3448 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4635,7 +4635,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> struct inode *inode = file_inode(file);
> loff_t new_size = 0;
> unsigned int max_blocks;
> - int ret = 0;
> + int ret = 0, retries = 0;
> int flags;
> ext4_lblk_t lblk;
> unsigned int blkbits = inode->i_blkbits;
> @@ -4656,6 +4656,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> FALLOC_FL_INSERT_RANGE))
> return -EOPNOTSUPP;
>
> +retry:
> ext4_fc_start_update(inode);
>
> if (mode & FALLOC_FL_PUNCH_HOLE) {
> @@ -4722,6 +4723,9 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
> exit:
> ext4_fc_stop_update(inode);
> + if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
> + goto retry;
> +
> return ret;
> }
>
> --
> 2.17.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-04 15:47:01

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex

On Sat 24-07-21 15:41:23, Wang Jianchao wrote:
> From: Wang Jianchao <[email protected]>
>
> Right now, discard is issued and waited to be completed in jbd2
> commit kthread context after the logs are committed. When large
> amount of files are deleted and discard is flooding, jbd2 commit
> kthread can be blocked for long time. Then all of the metadata
> operations can be blocked to wait the log space.
>
> One case is the page fault path with read mm->mmap_sem held, which
> wants to update the file time but has to wait for the log space.
> When other threads in the task wants to do mmap, then write mmap_sem
> is blocked. Finally all of the following read mmap_sem requirements
> are blocked, even the ps command which need to read the /proc/pid/
> -cmdline. Our monitor service which needs to read /proc/pid/cmdline
> used to be blocked for 5 mins.
>
> This patch frees the blocks back to buddy after commit and then do
> discard in a async kworker context in fstrim fashion, namely,
> - mark blocks to be discarded as used if they have not been allocated
> - do discard
> - mark them free
> After this, jbd2 commit kthread won't be blocked any more by discard
> and we won't get NOSPC even if the discard is slow or throttled.
>
> Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
> Suggested-by: Theodore Ts'o <[email protected]>
> Signed-off-by: Wang Jianchao <[email protected]>

Looks good to me. Just one small comment below. With that addressed feel
free to add:

Reviewed-by: Jan Kara <[email protected]>


> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
> struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
> int count;
>
> + if (test_opt(sb, DISCARD)) {
> + /*
> + * wait the discard work to drain all of ext4_free_data
> + */
> + queue_work(ext4_discard_wq, &sbi->s_discard_work);

Do we really need to queue the work here? The filesystem should be
quiescent by now, we take care to queue the work whenever we add item to
empty list. So it should be enough to have flush_work() here and then
possibly

WARN_ON_ONCE(!list_empty(&sbi->s_discard_list))

Or am I missing something?

Honza

> + flush_work(&sbi->s_discard_work);
> + }
> +
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-04 15:49:34

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC

On Wed 04-08-21 17:32:21, Jan Kara wrote:
> On Sat 24-07-21 15:41:24, Wang Jianchao wrote:
> > From: Wang Jianchao <[email protected]>
> >
> > The blocks may be waiting for journal commit to be freed back to
> > mb buddy. Let fallocate wait and retry in that case.
> >
> > Signed-off-by: Wang Jianchao <[email protected]>
>
> Did you really observe this? Because the retry is already handled in
> ext4_alloc_file_blocks() that's used by ext4_fallocate(). So no retry
> should be needed there.

Oh, I can see you've addressed these already in another reply. I'll comment
there.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-04 15:53:39

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC

On Mon 26-07-21 15:05:41, Wang Jianchao wrote:
>
>
> On 2021/7/26 11:40 AM, Guoqing Jiang wrote:
> > Hi,
> >
> > On 7/24/21 3:41 PM, Wang Jianchao wrote:
> >> From: Wang Jianchao <[email protected]>
> >>
> >> The blocks may be waiting for journal commit to be freed back to
> >> mb buddy. Let fallocate wait and retry in that case.
> >>
> >> Signed-off-by: Wang Jianchao <[email protected]>
> >> ---
> >> ? fs/ext4/extents.c | 6 +++++-
> >> ? 1 file changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> >> index 92ad64b89d9b..ad0b874d3448 100644
> >> --- a/fs/ext4/extents.c
> >> +++ b/fs/ext4/extents.c
> >> @@ -4635,7 +4635,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> >> ????? struct inode *inode = file_inode(file);
> >> ????? loff_t new_size = 0;
> >> ????? unsigned int max_blocks;
> >> -??? int ret = 0;
> >> +??? int ret = 0, retries = 0;
> >> ????? int flags;
> >> ????? ext4_lblk_t lblk;
> >> ????? unsigned int blkbits = inode->i_blkbits;
> >> @@ -4656,6 +4656,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> >> ?????????????? FALLOC_FL_INSERT_RANGE))
> >> ????????? return -EOPNOTSUPP;
> >> ? +retry:
> >> ????? ext4_fc_start_update(inode);
> >> ? ????? if (mode & FALLOC_FL_PUNCH_HOLE) {
> >> @@ -4722,6 +4723,9 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> >> ????? trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
> >> ? exit:
> >> ????? ext4_fc_stop_update(inode);
> >> +??? if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
> >> +??????? goto retry;
> >> +
> >
> > Not sure if it is necessary since ext4_alloc_file_blocks already retries allocate.
>
> Yes, this patch should be get rid of. But it is indeed helpful to fix
> the xfstest generic/371 which does concurrently write/rm and
> fallocate/rm. I'll figure out some other way to improve that

Note that the retry logic is only a heuristic. It is not guaranteed any
number of retries is enough, we just do three to not give up too easily...
Your patch effectively raised number of retries to 9 so that may have
masked the issue. But I don't think so high number of retries is a sensible
choice because that way it may take too long to return ENOSPC.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-08-12 18:47:11

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH V3 2/5] ext4: add new helper interface ext4_try_to_trim_range()

On Sat, Jul 24, 2021 at 03:41:21PM +0800, Wang Jianchao wrote:
> From: Wang Jianchao <[email protected]>
>
> There is no functional change in this patch but just split the
> codes, which serachs free block and does trim, into a new function
> ext4_try_to_trim_range. This is preparing for the following async
> backgroup discard.
>
> Reviewed-by: Andreas Dilger <[email protected]>
> Signed-off-by: Wang Jianchao <[email protected]>
> ---
> fs/ext4/mballoc.c | 102 ++++++++++++++++++++++++++--------------------
> 1 file changed, 57 insertions(+), 45 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 018d5d3c6eeb..e3844152a643 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6218,6 +6218,54 @@ __acquires(bitlock)
> return ret;
> }
>
> +static int ext4_try_to_trim_range(struct super_block *sb,
> + struct ext4_buddy *e4b, ext4_grpblk_t start,
> + ext4_grpblk_t max, ext4_grpblk_t minblocks)
> +{
> + ext4_grpblk_t next, count, free_count;
> + void *bitmap;
> + int ret = 0;
> +
> + bitmap = e4b->bd_bitmap;
> + start = (e4b->bd_info->bb_first_free > start) ?
> + e4b->bd_info->bb_first_free : start;
> + count = 0;
> + free_count = 0;
> +
> + while (start <= max) {
> + start = mb_find_next_zero_bit(bitmap, max + 1, start);
> + if (start > max)
> + break;
> + next = mb_find_next_bit(bitmap, max + 1, start);
> +
> + if ((next - start) >= minblocks) {
> + ret = ext4_trim_extent(sb, start, next - start, e4b);
> + if (ret && ret != -EOPNOTSUPP)
> + break;
> + ret = 0;
> + count += next - start;
> + }

"ret" is only used inside the if statement, so this might be better as:

> + if ((next - start) >= minblocks) {
> + int ret = ext4_trim_extent(sb, start, next - start, e4b);
> +
> + if (ret && ret != -EOPNOTSUPP)
> + break;
> + count += next - start;
> + }

... and then drop the "int ret = 0" above.

Otherwise, looks good.

- Ted

2021-08-12 20:41:56

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex

On Sat, Jul 24, 2021 at 03:41:23PM +0800, Wang Jianchao wrote:
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 34be2f07449d..a496509e61b7 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
> struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
> int count;
>
> + if (test_opt(sb, DISCARD)) {
> + /*
> + * wait the discard work to drain all of ext4_free_data
> + */
> + queue_work(ext4_discard_wq, &sbi->s_discard_work);
> + flush_work(&sbi->s_discard_work);

I agree with Jan --- it's not clear to me why the call to queue_work()
is needed. After the flush_work() call returns, if s_discard_work is
still non-empty, there must be something terribly wrong --- are we
missing something?

> @@ -3672,8 +3724,14 @@ int __init ext4_init_mballoc(void)
> if (ext4_free_data_cachep == NULL)
> goto out_ac_free;
>
> + ext4_discard_wq = alloc_workqueue("ext4discard", WQ_UNBOUND, 0);
> + if (!ext4_discard_wq)
> + goto out_free_data;
> +


Perhaps we should only allocate the workqueue when it's needed ---
e.g., when a file system is mounted or remounted with "-o discard"?

Then in ext4_exit_malloc(), we only free it if ext4_discard_wq is
non-NULL.

This would save a bit of memory on systems that wouldn't need the ext4
discard work queue.

- Ted

2021-08-26 07:16:44

by Wang Jianchao

[permalink] [raw]
Subject: Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex



On 2021/8/4 11:45 PM, Jan Kara wrote:
> On Sat 24-07-21 15:41:23, Wang Jianchao wrote:
>> From: Wang Jianchao <[email protected]>
>>
>> Right now, discard is issued and waited to be completed in jbd2
>> commit kthread context after the logs are committed. When large
>> amount of files are deleted and discard is flooding, jbd2 commit
>> kthread can be blocked for long time. Then all of the metadata
>> operations can be blocked to wait the log space.
>>
>> One case is the page fault path with read mm->mmap_sem held, which
>> wants to update the file time but has to wait for the log space.
>> When other threads in the task wants to do mmap, then write mmap_sem
>> is blocked. Finally all of the following read mmap_sem requirements
>> are blocked, even the ps command which need to read the /proc/pid/
>> -cmdline. Our monitor service which needs to read /proc/pid/cmdline
>> used to be blocked for 5 mins.
>>
>> This patch frees the blocks back to buddy after commit and then do
>> discard in a async kworker context in fstrim fashion, namely,
>> - mark blocks to be discarded as used if they have not been allocated
>> - do discard
>> - mark them free
>> After this, jbd2 commit kthread won't be blocked any more by discard
>> and we won't get NOSPC even if the discard is slow or throttled.
>>
>> Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
>> Suggested-by: Theodore Ts'o <[email protected]>
>> Signed-off-by: Wang Jianchao <[email protected]>
>
> Looks good to me. Just one small comment below. With that addressed feel
> free to add:
>
> Reviewed-by: Jan Kara <[email protected]>
>
>
>> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
>> struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
>> int count;
>>
>> + if (test_opt(sb, DISCARD)) {
>> + /*
>> + * wait the discard work to drain all of ext4_free_data
>> + */
>> + queue_work(ext4_discard_wq, &sbi->s_discard_work);
>
> Do we really need to queue the work here? The filesystem should be
> quiescent by now, we take care to queue the work whenever we add item to
> empty list. So it should be enough to have flush_work() here and then
> possibly
>
> WARN_ON_ONCE(!list_empty(&sbi->s_discard_list))
>
> Or am I missing something?

queue_work here is indeed redundant.

Thanks so much for you point out this.
Jianchao

>
> Honza
>
>> + flush_work(&sbi->s_discard_work);
>> + }
>> +

2021-08-26 07:19:45

by Wang Jianchao

[permalink] [raw]
Subject: Re: [PATCH V3 2/5] ext4: add new helper interface ext4_try_to_trim_range()



On 2021/8/13 1:44 AM, Theodore Ts'o wrote:
> On Sat, Jul 24, 2021 at 03:41:21PM +0800, Wang Jianchao wrote:
>> From: Wang Jianchao <[email protected]>
>>
>> There is no functional change in this patch but just split the
>> codes, which serachs free block and does trim, into a new function
>> ext4_try_to_trim_range. This is preparing for the following async
>> backgroup discard.
>>
>> Reviewed-by: Andreas Dilger <[email protected]>
>> Signed-off-by: Wang Jianchao <[email protected]>
>> ---
>> fs/ext4/mballoc.c | 102 ++++++++++++++++++++++++++--------------------
>> 1 file changed, 57 insertions(+), 45 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 018d5d3c6eeb..e3844152a643 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -6218,6 +6218,54 @@ __acquires(bitlock)
>> return ret;
>> }
>>
>> +static int ext4_try_to_trim_range(struct super_block *sb,
>> + struct ext4_buddy *e4b, ext4_grpblk_t start,
>> + ext4_grpblk_t max, ext4_grpblk_t minblocks)
>> +{
>> + ext4_grpblk_t next, count, free_count;
>> + void *bitmap;
>> + int ret = 0;
>> +
>> + bitmap = e4b->bd_bitmap;
>> + start = (e4b->bd_info->bb_first_free > start) ?
>> + e4b->bd_info->bb_first_free : start;
>> + count = 0;
>> + free_count = 0;
>> +
>> + while (start <= max) {
>> + start = mb_find_next_zero_bit(bitmap, max + 1, start);
>> + if (start > max)
>> + break;
>> + next = mb_find_next_bit(bitmap, max + 1, start);
>> +
>> + if ((next - start) >= minblocks) {
>> + ret = ext4_trim_extent(sb, start, next - start, e4b);
>> + if (ret && ret != -EOPNOTSUPP)
>> + break;
>> + ret = 0;
>> + count += next - start;
>> + }
>
> "ret" is only used inside the if statement, so this might be better as:
>
>> + if ((next - start) >= minblocks) {
>> + int ret = ext4_trim_extent(sb, start, next - start, e4b);
>> +
>> + if (ret && ret != -EOPNOTSUPP)
>> + break;
>> + count += next - start;
>> + }
>
> ... and then drop the "int ret = 0" above.
>
> Otherwise, looks good.
>

OK, I'll do it in next version

Thanks so much
Jianchao
> - Ted
>

2021-08-26 07:52:04

by Wang Jianchao

[permalink] [raw]
Subject: Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex



On 2021/8/13 3:46 AM, Theodore Ts'o wrote:
> On Sat, Jul 24, 2021 at 03:41:23PM +0800, Wang Jianchao wrote:
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 34be2f07449d..a496509e61b7 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
>> struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
>> int count;
>>
>> + if (test_opt(sb, DISCARD)) {
>> + /*
>> + * wait the discard work to drain all of ext4_free_data
>> + */
>> + queue_work(ext4_discard_wq, &sbi->s_discard_work);
>> + flush_work(&sbi->s_discard_work);
>
> I agree with Jan --- it's not clear to me why the call to queue_work()
> is needed. After the flush_work() call returns, if s_discard_work is
> still non-empty, there must be something terribly wrong --- are we
> missing something?

Yes,the queue_work() is redundant.
I will get rid of it in next version.

>
>> @@ -3672,8 +3724,14 @@ int __init ext4_init_mballoc(void)
>> if (ext4_free_data_cachep == NULL)
>> goto out_ac_free;
>>
>> + ext4_discard_wq = alloc_workqueue("ext4discard", WQ_UNBOUND, 0);
>> + if (!ext4_discard_wq)
>> + goto out_free_data;
>> +
>
>
> Perhaps we should only allocate the workqueue when it's needed ---
> e.g., when a file system is mounted or remounted with "-o discard"?
>
> Then in ext4_exit_malloc(), we only free it if ext4_discard_wq is
> non-NULL.
>
> This would save a bit of memory on systems that wouldn't need the ext4
> discard work queue.

Yes, it make sense to the system with pool memory

Thanks so much
Jianchao

>
> - Ted
>

2021-08-26 11:43:08

by Wang Jianchao

[permalink] [raw]
Subject: Re: [PATCH V3 5/5] ext4: make fallocate retry when err is ENOSPC



On 2021/8/4 11:52 PM, Jan Kara wrote:
> On Mon 26-07-21 15:05:41, Wang Jianchao wrote:
>>
>>
>> On 2021/7/26 11:40 AM, Guoqing Jiang wrote:
>>> Hi,
>>>
>>> On 7/24/21 3:41 PM, Wang Jianchao wrote:
>>>> From: Wang Jianchao <[email protected]>
>>>>
>>>> The blocks may be waiting for journal commit to be freed back to
>>>> mb buddy. Let fallocate wait and retry in that case.
>>>>
>>>> Signed-off-by: Wang Jianchao <[email protected]>
>>>> ---
>>>>   fs/ext4/extents.c | 6 +++++-
>>>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>>>> index 92ad64b89d9b..ad0b874d3448 100644
>>>> --- a/fs/ext4/extents.c
>>>> +++ b/fs/ext4/extents.c
>>>> @@ -4635,7 +4635,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>>>>       struct inode *inode = file_inode(file);
>>>>       loff_t new_size = 0;
>>>>       unsigned int max_blocks;
>>>> -    int ret = 0;
>>>> +    int ret = 0, retries = 0;
>>>>       int flags;
>>>>       ext4_lblk_t lblk;
>>>>       unsigned int blkbits = inode->i_blkbits;
>>>> @@ -4656,6 +4656,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>>>>                FALLOC_FL_INSERT_RANGE))
>>>>           return -EOPNOTSUPP;
>>>>   +retry:
>>>>       ext4_fc_start_update(inode);
>>>>         if (mode & FALLOC_FL_PUNCH_HOLE) {
>>>> @@ -4722,6 +4723,9 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>>>>       trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
>>>>   exit:
>>>>       ext4_fc_stop_update(inode);
>>>> +    if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
>>>> +        goto retry;
>>>> +
>>>
>>> Not sure if it is necessary since ext4_alloc_file_blocks already retries allocate.
>>
>> Yes, this patch should be get rid of. But it is indeed helpful to fix
>> the xfstest generic/371 which does concurrently write/rm and
>> fallocate/rm. I'll figure out some other way to improve that
>
> Note that the retry logic is only a heuristic. It is not guaranteed any
> number of retries is enough, we just do three to not give up too easily...
> Your patch effectively raised number of retries to 9 so that may have
> masked the issue. But I don't think so high number of retries is a sensible
> choice because that way it may take too long to return ENOSPC.

The failure seems due to the background discard which marks the blocks used
before issue discard.

The test make a 256M filesystem which has 59316 4K blocks.
There are two thread running concurrently,
- write, rm 80M file
- fallocate, rm 80M file

When the fallocate failed, I can observe there was a 80M on-going background trim

We seems to need to add a flush_work(sbi->s_discard_work) into ext4_should_retry_alloc()

Thanks so much
Jianchao
>
> Honza
>