2023-02-26 16:03:36

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v1 0/6] Introducing `wq_cpu_set` mount option for btrfs

Hi,

This is an RFC patchset that introduces the `wq_cpu_set` mount option.
This option lets the user specify a CPU set that the Btrfs workqueues
will use.

Btrfs workqueues can slow sensitive user tasks down because they can use
any online CPU to perform heavy workloads on an SMP system. Add a mount
option to isolate the Btrfs workqueues to a set of CPUs. It is helpful
to avoid sensitive user tasks being preempted by Btrfs heavy workqueues.

This option is similar to the taskset bitmask except that the comma
separator is replaced with a dot. The reason for this is that the mount
option parser uses commas to separate mount options.

Figure (the CPU usage when `wq_cpu_set` is used VS when it is not):
https://gist.githubusercontent.com/ammarfaizi2/a10f8073e58d1712c1ed49af83ae4ad1/raw/a4f7cbc4eb163db792a669d570ff542495e8c704/wq_cpu_set.png

A simple stress testing:

1. Open htop.
2. Open a new terminal.
3. Mount and perform a heavy workload on the mounted Btrfs filesystem.

## Test without wq_cpu_set
sudo mount -t btrfs -o rw,compress-force=zstd:15,commit=1500 /dev/sda2 hdd/a;
cp -rf /path/folder_with_many_large_files/ hdd/a/test;
sync; # See the CPU usage in htop.
sudo umount hdd/a;

## Test wq_cpu_set
sudo mount -t btrfs -o rw,compress-force=zstd:15,commit=1500,wq_cpu_set=0.4.1.5 /dev/sda2 hdd/a;
cp -rf /path/folder_with_many_large_files/ hdd/a/test;
sync; # See the CPU usage in htop.
sudo umount hdd/a;

Signed-off-by: Ammar Faizi <[email protected]>
---

Ammar Faizi (6):
workqueue: Add set_workqueue_cpumask() helper function
btrfs: Change `mount_opt` type in `struct btrfs_fs_info` to `u64`
btrfs: Create btrfs CPU set struct and helpers
btrfs: Add wq_cpu_set=%s mount option
btrfs: Adjust the default thread pool size when `wq_cpu_set` option is used
btrfs: Add `BTRFS_DEFAULT_MAX_THREAD_POOL_SIZE` macro

fs/btrfs/async-thread.c | 51 ++++++++++++++++++++
fs/btrfs/async-thread.h | 3 ++
fs/btrfs/disk-io.c | 6 ++-
fs/btrfs/fs.c | 97 +++++++++++++++++++++++++++++++++++++++
fs/btrfs/fs.h | 12 ++++-
fs/btrfs/super.c | 83 +++++++++++++++++++++++++++++++++
include/linux/workqueue.h | 3 ++
kernel/workqueue.c | 19 ++++++++
8 files changed, 271 insertions(+), 3 deletions(-)


base-commit: 2fcd07b7ccd5fd10b2120d298363e4e6c53ccf9c
--
Ammar Faizi



2023-02-26 16:03:44

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v1 1/6] workqueue: Add set_workqueue_cpumask() helper function

Allow users to specify a CPU set for the workqueue. The first use case
of this helper function is to set the CPU affinity of Btrfs workqueues.

Signed-off-by: Ammar Faizi <[email protected]>
---
include/linux/workqueue.h | 3 +++
kernel/workqueue.c | 19 +++++++++++++++++++
2 files changed, 22 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index ac551b8ee7d9f2f4..e3bd6f47e74ecd66 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -710,6 +710,9 @@ int workqueue_online_cpu(unsigned int cpu);
int workqueue_offline_cpu(unsigned int cpu);
#endif

+int set_workqueue_cpumask(struct workqueue_struct *wq,
+ const cpumask_var_t mask);
+
void __init workqueue_init_early(void);
void __init workqueue_init(void);

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b8b541caed4854a4..adc1478fafb1811c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4398,6 +4398,25 @@ static int init_rescuer(struct workqueue_struct *wq)
return 0;
}

+int set_workqueue_cpumask(struct workqueue_struct *wq, const cpumask_var_t mask)
+{
+ struct workqueue_attrs *tmp_attrs;
+ int ret;
+
+ tmp_attrs = alloc_workqueue_attrs();
+ if (!tmp_attrs)
+ return -ENOMEM;
+
+ apply_wqattrs_lock();
+ copy_workqueue_attrs(tmp_attrs, wq->unbound_attrs);
+ cpumask_copy(tmp_attrs->cpumask, mask);
+ ret = apply_workqueue_attrs_locked(wq, tmp_attrs);
+ apply_wqattrs_unlock();
+ free_workqueue_attrs(tmp_attrs);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(set_workqueue_cpumask);
+
__printf(1, 4)
struct workqueue_struct *alloc_workqueue(const char *fmt,
unsigned int flags,
--
Ammar Faizi


2023-02-26 16:03:49

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v1 2/6] btrfs: Change `mount_opt` type in `struct btrfs_fs_info` to `u64`

On a 32-bit Linux system, `unsigned long` is 32-bit. In preparation to
add more mount options, change the type to `u64` because the enum for
this option has reached the max 32-bit capacity.

It does not make any difference on a system where `unsigned long` is
64-bit, only needed for the 32-bit system.

Signed-off-by: Ammar Faizi <[email protected]>
---
fs/btrfs/fs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 4c477eae689148dd..6de61367b6686197 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -422,7 +422,7 @@ struct btrfs_fs_info {
* required instead of the faster short fsync log commits
*/
u64 last_trans_log_full_commit;
- unsigned long mount_opt;
+ u64 mount_opt;

unsigned long compress_type:4;
unsigned int compress_level;
--
Ammar Faizi


2023-02-26 16:03:52

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v1 3/6] btrfs: Create btrfs CPU set struct and helpers

It's a preparation to add Btrfs CPU affinity support. First, create a
new struct named `btrfs_cpu_set` to contain CPU affinity information.
Then, create helpers to allocate, parse, and free them.

Signed-off-by: Ammar Faizi <[email protected]>
---
fs/btrfs/fs.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/fs.h | 7 ++++
2 files changed, 104 insertions(+)

diff --git a/fs/btrfs/fs.c b/fs/btrfs/fs.c
index 31c1648bc0b46922..283d153d4491289c 100644
--- a/fs/btrfs/fs.c
+++ b/fs/btrfs/fs.c
@@ -96,3 +96,100 @@ void __btrfs_clear_fs_compat_ro(struct btrfs_fs_info *fs_info, u64 flag,
set_bit(BTRFS_FS_FEATURE_CHANGED, &fs_info->flags);
}
}
+
+/*
+ * The user can't use the taskset pattern because ',' is used as
+ * the mount option delimiter. They can use the same taskset pattern,
+ * but replace the ',' with '.' and we will replace it back to
+ * ',', so the cpulist_parse() can recognize it.
+ *
+ * For example, in taskset cmd, they do:
+ * taskset -c 1,4,7 /bin/ls
+ *
+ * The equivalent CPU mask for the btrfs mount option will be:
+ * wq_cpu_set=1.4.7
+ *
+ * Mark these as __cold to avoid the code bloat from overoptimizing
+ * the loop.
+ */
+__cold static void cpulist_dot_to_comma(char *set)
+{
+ while (*set) {
+ if (*set == '.')
+ *set = ',';
+ set++;
+ }
+}
+
+__cold static void cpulist_comma_to_dot(char *set)
+{
+ while (*set) {
+ if (*set == ',')
+ *set = '.';
+ set++;
+ }
+}
+
+void btrfs_destroy_cpu_set(struct btrfs_cpu_set *cpu_set)
+{
+ if (!cpu_set)
+ return;
+
+ free_cpumask_var(cpu_set->mask);
+ kfree(cpu_set->mask_str);
+ kfree(cpu_set);
+}
+
+/*
+ * Only called from btrfs_parse_cpu_set().
+ */
+static struct btrfs_cpu_set *btrfs_alloc_cpu_set(void)
+{
+ struct btrfs_cpu_set *cpu_set;
+
+ cpu_set = kmalloc(sizeof(*cpu_set), GFP_KERNEL);
+ if (!cpu_set)
+ return NULL;
+
+ if (!alloc_cpumask_var(&cpu_set->mask, GFP_KERNEL)) {
+ kfree(cpu_set);
+ return NULL;
+ }
+
+ cpu_set->mask_str = NULL;
+ return cpu_set;
+}
+
+int btrfs_parse_cpu_set(struct btrfs_cpu_set **cpu_set_p, const char *mask_str)
+{
+ struct btrfs_cpu_set *cpu_set;
+ int ret;
+
+ cpu_set = btrfs_alloc_cpu_set();
+ if (!cpu_set)
+ return -ENOMEM;
+
+ cpu_set->mask_str = kstrdup(mask_str, GFP_KERNEL);
+ if (!cpu_set->mask_str) {
+ ret = -ENOMEM;
+ goto out_fail;
+ }
+
+ cpulist_dot_to_comma(cpu_set->mask_str);
+ ret = cpulist_parse(cpu_set->mask_str, cpu_set->mask);
+ if (ret)
+ goto out_fail;
+
+ if (cpumask_empty(cpu_set->mask)) {
+ ret = -EINVAL;
+ goto out_fail;
+ }
+
+ cpulist_comma_to_dot(cpu_set->mask_str);
+ *cpu_set_p = cpu_set;
+ return 0;
+
+out_fail:
+ btrfs_destroy_cpu_set(cpu_set);
+ return ret;
+}
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 6de61367b6686197..cbad856df197ccfd 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -356,6 +356,11 @@ struct btrfs_commit_stats {
u64 total_commit_dur;
};

+struct btrfs_cpu_set {
+ cpumask_var_t mask;
+ char *mask_str;
+};
+
struct btrfs_fs_info {
u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
unsigned long flags;
@@ -876,6 +881,8 @@ void btrfs_exclop_start_unlock(struct btrfs_fs_info *fs_info);
void btrfs_exclop_finish(struct btrfs_fs_info *fs_info);
void btrfs_exclop_balance(struct btrfs_fs_info *fs_info,
enum btrfs_exclusive_operation op);
+int btrfs_parse_cpu_set(struct btrfs_cpu_set **cpu_set_p, const char *mask_str);
+void btrfs_destroy_cpu_set(struct btrfs_cpu_set *cpu_set);

/* Compatibility and incompatibility defines */
void __btrfs_set_fs_incompat(struct btrfs_fs_info *fs_info, u64 flag,
--
Ammar Faizi


2023-02-26 16:04:01

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v1 4/6] btrfs: Add wq_cpu_set=%s mount option

Btrfs workqueues can slow sensitive user tasks down because they can use
any online CPU to perform heavy workloads on an SMP system. Add a mount
option to isolate the Btrfs workqueues to a set of CPUs. It is helpful
to avoid sensitive user tasks being preempted by Btrfs heavy workqueues.

This option is similar to the taskset bitmask except that the comma
separator is replaced with a dot. The reason for this is that the mount
option parser uses commas to separate mount options.

Signed-off-by: Ammar Faizi <[email protected]>
---
fs/btrfs/async-thread.c | 51 +++++++++++++++++++++++++++++++++++++++++
fs/btrfs/async-thread.h | 1 +
fs/btrfs/disk-io.c | 3 ++-
fs/btrfs/fs.h | 3 +++
fs/btrfs/super.c | 44 +++++++++++++++++++++++++++++++++++
5 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index aac240430efe1316..445c055304574653 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -11,6 +11,7 @@
#include <linux/freezer.h>
#include "async-thread.h"
#include "ctree.h"
+#include "messages.h"

enum {
WORK_DONE_BIT,
@@ -339,3 +340,53 @@ void btrfs_flush_workqueue(struct btrfs_workqueue *wq)
{
flush_workqueue(wq->normal_wq);
}
+
+static int apply_wq_cpu_set_notice(struct btrfs_fs_info *info,
+ struct workqueue_struct *wq,
+ const char *wq_name)
+{
+ const char *mask_str = info->wq_cpu_set->mask_str;
+ int ret;
+
+ ret = set_workqueue_cpumask(wq, info->wq_cpu_set->mask);
+ if (ret) {
+ btrfs_err(info, "failed to set cpu mask for %s wq: %d", wq_name,
+ ret);
+ return ret;
+ }
+
+ btrfs_info(info, "set cpu mask for %s wq to %s", wq_name, mask_str);
+ return 0;
+}
+
+#define apply_wq_cpu_set(INFO, WQ) \
+ apply_wq_cpu_set_notice(INFO, (INFO)->WQ, # WQ)
+
+#define btrfs_apply_wq_cpu_set(INFO, WQ) \
+ apply_wq_cpu_set_notice(INFO, (INFO)->WQ->normal_wq, # WQ)
+
+
+void btrfs_apply_workqueue_cpu_set(struct btrfs_fs_info *fs_info)
+{
+ if (!btrfs_test_opt(fs_info, WQ_CPU_SET))
+ return;
+
+ btrfs_apply_wq_cpu_set(fs_info, workers);
+ btrfs_apply_wq_cpu_set(fs_info, hipri_workers);
+ btrfs_apply_wq_cpu_set(fs_info, delalloc_workers);
+ btrfs_apply_wq_cpu_set(fs_info, flush_workers);
+ btrfs_apply_wq_cpu_set(fs_info, caching_workers);
+ btrfs_apply_wq_cpu_set(fs_info, fixup_workers);
+ apply_wq_cpu_set(fs_info, endio_workers);
+ apply_wq_cpu_set(fs_info, endio_meta_workers);
+ apply_wq_cpu_set(fs_info, rmw_workers);
+ btrfs_apply_wq_cpu_set(fs_info, endio_write_workers);
+ apply_wq_cpu_set(fs_info, compressed_write_workers);
+ btrfs_apply_wq_cpu_set(fs_info, endio_freespace_worker);
+ btrfs_apply_wq_cpu_set(fs_info, delayed_workers);
+ btrfs_apply_wq_cpu_set(fs_info, qgroup_rescan_workers);
+ apply_wq_cpu_set(fs_info, discard_ctl.discard_workers);
+}
+
+#undef apply_wq_cpu_set
+#undef btrfs_apply_wq_cpu_set
diff --git a/fs/btrfs/async-thread.h b/fs/btrfs/async-thread.h
index 6e2596ddae1002ab..2b8a76fa75ef9e69 100644
--- a/fs/btrfs/async-thread.h
+++ b/fs/btrfs/async-thread.h
@@ -41,5 +41,6 @@ struct btrfs_fs_info * __pure btrfs_work_owner(const struct btrfs_work *work);
struct btrfs_fs_info * __pure btrfs_workqueue_owner(const struct btrfs_workqueue *wq);
bool btrfs_workqueue_normal_congested(const struct btrfs_workqueue *wq);
void btrfs_flush_workqueue(struct btrfs_workqueue *wq);
+void btrfs_apply_workqueue_cpu_set(struct btrfs_fs_info *fs_info);

#endif
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b53f0e30ce2b3bbb..1bb1db461a30fa71 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1498,6 +1498,7 @@ static void free_global_roots(struct btrfs_fs_info *fs_info)

void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
{
+ btrfs_destroy_cpu_set(fs_info->wq_cpu_set);
percpu_counter_destroy(&fs_info->dirty_metadata_bytes);
percpu_counter_destroy(&fs_info->delalloc_bytes);
percpu_counter_destroy(&fs_info->ordered_bytes);
@@ -2231,7 +2232,7 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
fs_info->discard_ctl.discard_workers)) {
return -ENOMEM;
}
-
+ btrfs_apply_workqueue_cpu_set(fs_info);
return 0;
}

diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index cbad856df197ccfd..a8bd1414b2520ea4 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -177,6 +177,7 @@ enum {
BTRFS_MOUNT_IGNOREBADROOTS = (1UL << 29),
BTRFS_MOUNT_IGNOREDATACSUMS = (1UL << 30),
BTRFS_MOUNT_NODISCARD = (1UL << 31),
+ BTRFS_MOUNT_WQ_CPU_SET = (1ULL << 32),
};

/*
@@ -807,6 +808,8 @@ struct btrfs_fs_info {
spinlock_t eb_leak_lock;
struct list_head allocated_ebs;
#endif
+
+ struct btrfs_cpu_set *wq_cpu_set;
};

static inline void btrfs_set_last_root_drop_gen(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 581845bc206ad28b..3e061ec977b014d1 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -139,6 +139,7 @@ enum {
#ifdef CONFIG_BTRFS_FS_REF_VERIFY
Opt_ref_verify,
#endif
+ Opt_wq_cpu_set,
Opt_err,
};

@@ -213,6 +214,7 @@ static const match_table_t tokens = {
#ifdef CONFIG_BTRFS_FS_REF_VERIFY
{Opt_ref_verify, "ref_verify"},
#endif
+ {Opt_wq_cpu_set, "wq_cpu_set=%s"},
{Opt_err, NULL},
};

@@ -298,6 +300,23 @@ static int parse_rescue_options(struct btrfs_fs_info *info, const char *options)
return ret;
}

+static int parse_wq_cpu_set(struct btrfs_fs_info *info, const char *mask_str)
+{
+ struct btrfs_cpu_set *cpu_set;
+ int ret;
+
+ ret = btrfs_parse_cpu_set(&cpu_set, mask_str);
+ if (ret) {
+ btrfs_err(info, "failed to parse wq_cpu_set: %d", ret);
+ return ret;
+ }
+
+ info->wq_cpu_set = cpu_set;
+ btrfs_info(info, "using wq_cpu_set=%s", mask_str);
+ btrfs_set_opt(info->mount_opt, WQ_CPU_SET);
+ return 0;
+}
+
/*
* Regular mount options parser. Everything that is needed only when
* reading in a new superblock is parsed here.
@@ -803,6 +822,11 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
btrfs_set_opt(info->mount_opt, REF_VERIFY);
break;
#endif
+ case Opt_wq_cpu_set:
+ ret = parse_wq_cpu_set(info, args[0].from);
+ if (ret < 0)
+ goto out;
+ break;
case Opt_err:
btrfs_err(info, "unrecognized mount option '%s'", p);
ret = -EINVAL;
@@ -1319,6 +1343,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
#endif
if (btrfs_test_opt(info, REF_VERIFY))
seq_puts(seq, ",ref_verify");
+ if (btrfs_test_opt(info, WQ_CPU_SET))
+ seq_printf(seq, ",wq_cpu_set=%s", info->wq_cpu_set->mask_str);
seq_printf(seq, ",subvolid=%llu",
BTRFS_I(d_inode(dentry))->root->root_key.objectid);
subvol_name = btrfs_get_subvol_name_from_objectid(info,
@@ -1686,6 +1712,7 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
u64 old_max_inline = fs_info->max_inline;
u32 old_thread_pool_size = fs_info->thread_pool_size;
u32 old_metadata_ratio = fs_info->metadata_ratio;
+ struct btrfs_cpu_set *old_wq_cpu_set = fs_info->wq_cpu_set;
int ret;

sync_filesystem(sb);
@@ -1838,6 +1865,14 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
set_bit(BTRFS_FS_OPEN, &fs_info->flags);
}
out:
+ /*
+ * The remount operation changes the wq_cpu_set.
+ */
+ if (fs_info->wq_cpu_set != old_wq_cpu_set) {
+ btrfs_destroy_cpu_set(old_wq_cpu_set);
+ btrfs_apply_workqueue_cpu_set(fs_info);
+ }
+
/*
* We need to set SB_I_VERSION here otherwise it'll get cleared by VFS,
* since the absence of the flag means it can be toggled off by remount.
@@ -1852,6 +1887,15 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
return 0;

restore:
+ /*
+ * The remount operation changes the wq_cpu_set, but we hit an error,
+ * destroy the new value and roll it back to the previous value.
+ */
+ if (fs_info->wq_cpu_set != old_wq_cpu_set) {
+ btrfs_destroy_cpu_set(fs_info->wq_cpu_set);
+ fs_info->wq_cpu_set = old_wq_cpu_set;
+ }
+
/* We've hit an error - don't reset SB_RDONLY */
if (sb_rdonly(sb))
old_flags |= SB_RDONLY;
--
Ammar Faizi


2023-02-26 16:04:12

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v1 5/6] btrfs: Adjust the default thread pool size when `wq_cpu_set` option is used

If wq_cpu_set is specified, adjust thread_pool_size to the number of
CPUs in the set plus 2 to avoid the case where the number of CPUs in the
set is less than the default thread_pool_size + 2, which might cause the
thread pool to be starved. The thread_pool=%u mount option overrides
this adjusting behavior.

Signed-off-by: Ammar Faizi <[email protected]>
---
fs/btrfs/super.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 3e061ec977b014d1..34b7c5810d34d624 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -317,6 +317,32 @@ static int parse_wq_cpu_set(struct btrfs_fs_info *info, const char *mask_str)
return 0;
}

+static void adjust_default_thread_pool_size(struct btrfs_fs_info *info)
+{
+ unsigned long old_thread_pool_size;
+ unsigned long new_thread_pool_size;
+ unsigned long total_usable_cpu = 0;
+ unsigned long cpu;
+
+ if (!btrfs_test_opt(info, WQ_CPU_SET))
+ return;
+
+ for_each_online_cpu(cpu) {
+ if (cpumask_test_cpu(cpu, info->wq_cpu_set->mask))
+ total_usable_cpu++;
+ }
+
+ old_thread_pool_size = info->thread_pool_size;
+ new_thread_pool_size = min_t(unsigned long, total_usable_cpu + 2, 8);
+
+ if (old_thread_pool_size == new_thread_pool_size)
+ return;
+
+ info->thread_pool_size = new_thread_pool_size;
+ btrfs_info(info, "adjusting thread_pool_size to %lu due to wq_cpu_set (you can override this with thread_pool=%%u option)",
+ new_thread_pool_size);
+}
+
/*
* Regular mount options parser. Everything that is needed only when
* reading in a new superblock is parsed here.
@@ -336,6 +362,7 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
bool saved_compress_force;
int no_compress = 0;
const bool remounting = test_bit(BTRFS_FS_STATE_REMOUNTING, &info->fs_state);
+ bool has_thread_pool_opt = false;

if (btrfs_fs_compat_ro(info, FREE_SPACE_TREE))
btrfs_set_opt(info->mount_opt, FREE_SPACE_TREE);
@@ -543,6 +570,7 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
goto out;
}
info->thread_pool_size = intarg;
+ has_thread_pool_opt = true;
break;
case Opt_max_inline:
num = match_strdup(&args[0]);
@@ -854,6 +882,16 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
}
if (!ret)
ret = btrfs_check_mountopts_zoned(info);
+
+ /*
+ * If wq_cpu_set is specified, adjust thread_pool_size to the number of
+ * CPUs in the set plus 2 to avoid the case where the number of CPUs in
+ * the set is less than the default thread_pool_size + 2, which might
+ * cause the thread pool to be starved. The thread_pool=%u mount option
+ * overrides this adjusting behavior.
+ */
+ if (!ret && !has_thread_pool_opt)
+ adjust_default_thread_pool_size(info);
if (!ret && !remounting) {
if (btrfs_test_opt(info, SPACE_CACHE))
btrfs_info(info, "disk space caching is enabled");
--
Ammar Faizi


2023-02-26 16:04:16

by Ammar Faizi

[permalink] [raw]
Subject: [RFC PATCH v1 6/6] btrfs: Add `BTRFS_DEFAULT_MAX_THREAD_POOL_SIZE` macro

Currently, the default max thread pool size is hardcoded as 8. This
number is not only used in one place. Keep the default max thread pool
size in sync by introducing a new macro.

Signed-off-by: Ammar Faizi <[email protected]>
---
fs/btrfs/async-thread.h | 2 ++
fs/btrfs/disk-io.c | 3 ++-
fs/btrfs/super.c | 3 ++-
3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/async-thread.h b/fs/btrfs/async-thread.h
index 2b8a76fa75ef9e69..f11c3b36568053be 100644
--- a/fs/btrfs/async-thread.h
+++ b/fs/btrfs/async-thread.h
@@ -9,6 +9,8 @@

#include <linux/workqueue.h>

+#define BTRFS_DEFAULT_MAX_THREAD_POOL_SIZE 8
+
struct btrfs_fs_info;
struct btrfs_workqueue;
struct btrfs_work;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1bb1db461a30fa71..4f4ddc8e088b08ec 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2957,7 +2957,8 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
btrfs_init_ref_verify(fs_info);

fs_info->thread_pool_size = min_t(unsigned long,
- num_online_cpus() + 2, 8);
+ num_online_cpus() + 2,
+ BTRFS_DEFAULT_MAX_THREAD_POOL_SIZE);

INIT_LIST_HEAD(&fs_info->ordered_roots);
spin_lock_init(&fs_info->ordered_root_lock);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 34b7c5810d34d624..bf4be383e289ef6c 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -333,7 +333,8 @@ static void adjust_default_thread_pool_size(struct btrfs_fs_info *info)
}

old_thread_pool_size = info->thread_pool_size;
- new_thread_pool_size = min_t(unsigned long, total_usable_cpu + 2, 8);
+ new_thread_pool_size = min_t(unsigned long, total_usable_cpu + 2,
+ BTRFS_DEFAULT_MAX_THREAD_POOL_SIZE);

if (old_thread_pool_size == new_thread_pool_size)
return;
--
Ammar Faizi


2023-02-26 17:01:50

by Tejun Heo

[permalink] [raw]
Subject: Re: [RFC PATCH v1 0/6] Introducing `wq_cpu_set` mount option for btrfs

On Sun, Feb 26, 2023 at 11:02:53PM +0700, Ammar Faizi wrote:
> Hi,
>
> This is an RFC patchset that introduces the `wq_cpu_set` mount option.
> This option lets the user specify a CPU set that the Btrfs workqueues
> will use.
>
> Btrfs workqueues can slow sensitive user tasks down because they can use
> any online CPU to perform heavy workloads on an SMP system. Add a mount
> option to isolate the Btrfs workqueues to a set of CPUs. It is helpful
> to avoid sensitive user tasks being preempted by Btrfs heavy workqueues.
>
> This option is similar to the taskset bitmask except that the comma
> separator is replaced with a dot. The reason for this is that the mount
> option parser uses commas to separate mount options.

Hmm... the allowed cpumasks for unbounded workqueues can already be set
through /sys/devices/virtual/workqueue/cpumask and also each individual
workqueue can be exposed in the matching subdirectory by setting WQ_SYSFS.
Wouldn't the proposed btrfs option be a bit reduandant?

Thanks.

--
tejun

2023-02-26 18:26:28

by Ammar Faizi

[permalink] [raw]
Subject: Re: [RFC PATCH v1 0/6] Introducing `wq_cpu_set` mount option for btrfs

Hello,

On Sun, 26 Feb 2023 07:01:41 -1000, Tejun Heo wrote:
> Hmm... the allowed cpumasks for unbounded workqueues can already be set
> through /sys/devices/virtual/workqueue/cpumask and also each individual
> workqueue can be exposed in the matching subdirectory by setting WQ_SYSFS.
> Wouldn't the proposed btrfs option be a bit reduandant?

Thank you for the comment. I just realized the sysfs facility for this.
So I will take a look into it deeper. For now, I have several reasons to
use the `wq_cpu_set` option:

1. Currently, there are 15 btrfs workqueues. It would not be convenient
to let the user manage each of them via the sysfs. Using `wq_cpu_set`
option at mounting time allows the user to set all of them in one
shot.

(for btrfs maintainers):
I am also not sure if the number of btrfs workqueues is stable so
that the user can rely on the WQ_SYSFS facility.

2. I looked at /sys/devices/virtual/workqueue/ and saw its
subdirectories. The directory name is taken from the wq->name. But
how to distinguish multiple workqueues with the same name?

Each btrfs mount will at least do this:

alloc_workqueue("btrfs-compressed-write", flags, max_active);

When we do:

mount -t -o rw btrfs /dev/sda1 a;
mount -t -o rw btrfs /dev/sda2 b;
mount -t -o rw btrfs /dev/sda3 c;
mount -t -o rw btrfs /dev/sda4 d;

Is there a way to identify which sysfs devices correspond to a specific
mounted btrfs fs workqueues? Let's say I want each mount to have a
different CPU mask.

--
Ammar Faizi


2023-02-26 18:29:22

by Ammar Faizi

[permalink] [raw]
Subject: Re: [RFC PATCH v1 0/6] Introducing `wq_cpu_set` mount option for btrfs

On Mon, Feb 27, 2023 at 01:26:24AM +0700, Ammar Faizi wrote:
> mount -t -o rw btrfs /dev/sda1 a;
> mount -t -o rw btrfs /dev/sda2 b;
> mount -t -o rw btrfs /dev/sda3 c;
> mount -t -o rw btrfs /dev/sda4 d;

Excuse the wrong mount commands, should be something like this:

mount -t btrfs -o rw /dev/sda1 a;
mount -t btrfs -o rw /dev/sda2 b;
mount -t btrfs -o rw /dev/sda3 c;
mount -t btrfs -o rw /dev/sda4 d;

--
Ammar Faizi


2023-02-27 09:48:16

by Ammar Faizi

[permalink] [raw]
Subject: Re: [RFC PATCH v1 0/6] Introducing `wq_cpu_set` mount option for btrfs

On Mon, Feb 27, 2023 at 11:04:38AM +0800, Hillf Danton wrote:
> Are the sensitive user tasks likely also preempted by the kblockd_workqueue[1]
> for instance?
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/blk-mq.c#n2258

I didn't see any kblockd_workqueue influence.

From my observation, the issue is btrfs-specific, especially for write
operations with high-level compression. It does not happen with ext4.

(Well, it doesn't compress the data with ext4, so no doubt it's lighter).

Seeing from htop, several kthreads consume CPU time:

- kworker btrfs-dealloc

- kworker btrfs-endio-write

- kworker btrfs-worker-high

- kworker btrfs-compressed-write

- ... there are more, but all of them are btrfs workqueues.

They perform heavy work when there is an intensive writing operation.
However, for the read operation, there is no visible issue.

Increasing vm.dirty_ratio and vm.background_dirty_ratio, plus using

-o commit=N (mount option)

where N is a large number helps a lot. But it slows my entire system
down when it starts syncing the dirty write to the disk. It freezes the
UI for 2 to 3 seconds and causes audio lag.

Isolating btrfs workqueues and UI tasks in a different set of CPUs, as
proposed in this series, solves the issue.

--
Ammar Faizi


2023-02-27 10:19:17

by Qu Wenruo

[permalink] [raw]
Subject: Re: [RFC PATCH v1 0/6] Introducing `wq_cpu_set` mount option for btrfs



On 2023/2/27 00:02, Ammar Faizi wrote:
> Hi,
>
> This is an RFC patchset that introduces the `wq_cpu_set` mount option.
> This option lets the user specify a CPU set that the Btrfs workqueues
> will use.
>
> Btrfs workqueues can slow sensitive user tasks down because they can use
> any online CPU to perform heavy workloads on an SMP system. Add a mount
> option to isolate the Btrfs workqueues to a set of CPUs. It is helpful
> to avoid sensitive user tasks being preempted by Btrfs heavy workqueues.

I'm not sure if pinning the wq is really the best way to your problem.

Yes, I understand you want to limit the CPU usage of btrfs workqueues,
but have you tried "thread_pool=" mount option?

That mount option should limit the max amount of in-flight work items,
thus at least limit the CPU usage.

For the wq CPU pinning part, I'm not sure if it's really needed,
although it's known CPU pinning can affect some performance characteristics.

Thanks,
Qu

>
> This option is similar to the taskset bitmask except that the comma
> separator is replaced with a dot. The reason for this is that the mount
> option parser uses commas to separate mount options.
>
> Figure (the CPU usage when `wq_cpu_set` is used VS when it is not):
> https://gist.githubusercontent.com/ammarfaizi2/a10f8073e58d1712c1ed49af83ae4ad1/raw/a4f7cbc4eb163db792a669d570ff542495e8c704/wq_cpu_set.png
>
> A simple stress testing:
>
> 1. Open htop.
> 2. Open a new terminal.
> 3. Mount and perform a heavy workload on the mounted Btrfs filesystem.
>
> ## Test without wq_cpu_set
> sudo mount -t btrfs -o rw,compress-force=zstd:15,commit=1500 /dev/sda2 hdd/a;
> cp -rf /path/folder_with_many_large_files/ hdd/a/test;
> sync; # See the CPU usage in htop.
> sudo umount hdd/a;
>
> ## Test wq_cpu_set
> sudo mount -t btrfs -o rw,compress-force=zstd:15,commit=1500,wq_cpu_set=0.4.1.5 /dev/sda2 hdd/a;
> cp -rf /path/folder_with_many_large_files/ hdd/a/test;
> sync; # See the CPU usage in htop.
> sudo umount hdd/a;
>
> Signed-off-by: Ammar Faizi <[email protected]>
> ---
>
> Ammar Faizi (6):
> workqueue: Add set_workqueue_cpumask() helper function
> btrfs: Change `mount_opt` type in `struct btrfs_fs_info` to `u64`
> btrfs: Create btrfs CPU set struct and helpers
> btrfs: Add wq_cpu_set=%s mount option
> btrfs: Adjust the default thread pool size when `wq_cpu_set` option is used
> btrfs: Add `BTRFS_DEFAULT_MAX_THREAD_POOL_SIZE` macro
>
> fs/btrfs/async-thread.c | 51 ++++++++++++++++++++
> fs/btrfs/async-thread.h | 3 ++
> fs/btrfs/disk-io.c | 6 ++-
> fs/btrfs/fs.c | 97 +++++++++++++++++++++++++++++++++++++++
> fs/btrfs/fs.h | 12 ++++-
> fs/btrfs/super.c | 83 +++++++++++++++++++++++++++++++++
> include/linux/workqueue.h | 3 ++
> kernel/workqueue.c | 19 ++++++++
> 8 files changed, 271 insertions(+), 3 deletions(-)
>
>
> base-commit: 2fcd07b7ccd5fd10b2120d298363e4e6c53ccf9c

2023-02-27 11:04:11

by Filipe Manana

[permalink] [raw]
Subject: Re: [RFC PATCH v1 0/6] Introducing `wq_cpu_set` mount option for btrfs

On Sun, Feb 26, 2023 at 4:31 PM Ammar Faizi <[email protected]> wrote:
>
> Hi,
>
> This is an RFC patchset that introduces the `wq_cpu_set` mount option.
> This option lets the user specify a CPU set that the Btrfs workqueues
> will use.
>
> Btrfs workqueues can slow sensitive user tasks down because they can use
> any online CPU to perform heavy workloads on an SMP system. Add a mount
> option to isolate the Btrfs workqueues to a set of CPUs. It is helpful
> to avoid sensitive user tasks being preempted by Btrfs heavy workqueues.
>
> This option is similar to the taskset bitmask except that the comma
> separator is replaced with a dot. The reason for this is that the mount
> option parser uses commas to separate mount options.
>
> Figure (the CPU usage when `wq_cpu_set` is used VS when it is not):
> https://gist.githubusercontent.com/ammarfaizi2/a10f8073e58d1712c1ed49af83ae4ad1/raw/a4f7cbc4eb163db792a669d570ff542495e8c704/wq_cpu_set.png

I haven't read the patchset.

It's great that it reduces CPU usage.
But does it also provide other performance benefits, like lower
latency or higher throughput for some workloads? Or using less CPU
also affects negatively in those other aspects?

Thanks.

>
> A simple stress testing:
>
> 1. Open htop.
> 2. Open a new terminal.
> 3. Mount and perform a heavy workload on the mounted Btrfs filesystem.
>
> ## Test without wq_cpu_set
> sudo mount -t btrfs -o rw,compress-force=zstd:15,commit=1500 /dev/sda2 hdd/a;
> cp -rf /path/folder_with_many_large_files/ hdd/a/test;
> sync; # See the CPU usage in htop.
> sudo umount hdd/a;
>
> ## Test wq_cpu_set
> sudo mount -t btrfs -o rw,compress-force=zstd:15,commit=1500,wq_cpu_set=0.4.1.5 /dev/sda2 hdd/a;
> cp -rf /path/folder_with_many_large_files/ hdd/a/test;
> sync; # See the CPU usage in htop.
> sudo umount hdd/a;
>
> Signed-off-by: Ammar Faizi <[email protected]>
> ---
>
> Ammar Faizi (6):
> workqueue: Add set_workqueue_cpumask() helper function
> btrfs: Change `mount_opt` type in `struct btrfs_fs_info` to `u64`
> btrfs: Create btrfs CPU set struct and helpers
> btrfs: Add wq_cpu_set=%s mount option
> btrfs: Adjust the default thread pool size when `wq_cpu_set` option is used
> btrfs: Add `BTRFS_DEFAULT_MAX_THREAD_POOL_SIZE` macro
>
> fs/btrfs/async-thread.c | 51 ++++++++++++++++++++
> fs/btrfs/async-thread.h | 3 ++
> fs/btrfs/disk-io.c | 6 ++-
> fs/btrfs/fs.c | 97 +++++++++++++++++++++++++++++++++++++++
> fs/btrfs/fs.h | 12 ++++-
> fs/btrfs/super.c | 83 +++++++++++++++++++++++++++++++++
> include/linux/workqueue.h | 3 ++
> kernel/workqueue.c | 19 ++++++++
> 8 files changed, 271 insertions(+), 3 deletions(-)
>
>
> base-commit: 2fcd07b7ccd5fd10b2120d298363e4e6c53ccf9c
> --
> Ammar Faizi
>

2023-02-27 11:46:24

by Qu Wenruo

[permalink] [raw]
Subject: Re: [RFC PATCH v1 0/6] Introducing `wq_cpu_set` mount option for btrfs



On 2023/2/27 19:02, Filipe Manana wrote:
> On Sun, Feb 26, 2023 at 4:31 PM Ammar Faizi <[email protected]> wrote:
>>
>> Hi,
>>
>> This is an RFC patchset that introduces the `wq_cpu_set` mount option.
>> This option lets the user specify a CPU set that the Btrfs workqueues
>> will use.
>>
>> Btrfs workqueues can slow sensitive user tasks down because they can use
>> any online CPU to perform heavy workloads on an SMP system. Add a mount
>> option to isolate the Btrfs workqueues to a set of CPUs. It is helpful
>> to avoid sensitive user tasks being preempted by Btrfs heavy workqueues.
>>
>> This option is similar to the taskset bitmask except that the comma
>> separator is replaced with a dot. The reason for this is that the mount
>> option parser uses commas to separate mount options.
>>
>> Figure (the CPU usage when `wq_cpu_set` is used VS when it is not):
>> https://gist.githubusercontent.com/ammarfaizi2/a10f8073e58d1712c1ed49af83ae4ad1/raw/a4f7cbc4eb163db792a669d570ff542495e8c704/wq_cpu_set.png
>
> I haven't read the patchset.
>
> It's great that it reduces CPU usage.
> But does it also provide other performance benefits, like lower
> latency or higher throughput for some workloads? Or using less CPU
> also affects negatively in those other aspects?

So far it looks like to just set CPU masks for each workqueue.

Thus if it's reducing CPU usage, it also takes longer time to finish the
workload (compression,csum calculation etc).

Thanks,
Qu
>
> Thanks.
>
>>
>> A simple stress testing:
>>
>> 1. Open htop.
>> 2. Open a new terminal.
>> 3. Mount and perform a heavy workload on the mounted Btrfs filesystem.
>>
>> ## Test without wq_cpu_set
>> sudo mount -t btrfs -o rw,compress-force=zstd:15,commit=1500 /dev/sda2 hdd/a;
>> cp -rf /path/folder_with_many_large_files/ hdd/a/test;
>> sync; # See the CPU usage in htop.
>> sudo umount hdd/a;
>>
>> ## Test wq_cpu_set
>> sudo mount -t btrfs -o rw,compress-force=zstd:15,commit=1500,wq_cpu_set=0.4.1.5 /dev/sda2 hdd/a;
>> cp -rf /path/folder_with_many_large_files/ hdd/a/test;
>> sync; # See the CPU usage in htop.
>> sudo umount hdd/a;
>>
>> Signed-off-by: Ammar Faizi <[email protected]>
>> ---
>>
>> Ammar Faizi (6):
>> workqueue: Add set_workqueue_cpumask() helper function
>> btrfs: Change `mount_opt` type in `struct btrfs_fs_info` to `u64`
>> btrfs: Create btrfs CPU set struct and helpers
>> btrfs: Add wq_cpu_set=%s mount option
>> btrfs: Adjust the default thread pool size when `wq_cpu_set` option is used
>> btrfs: Add `BTRFS_DEFAULT_MAX_THREAD_POOL_SIZE` macro
>>
>> fs/btrfs/async-thread.c | 51 ++++++++++++++++++++
>> fs/btrfs/async-thread.h | 3 ++
>> fs/btrfs/disk-io.c | 6 ++-
>> fs/btrfs/fs.c | 97 +++++++++++++++++++++++++++++++++++++++
>> fs/btrfs/fs.h | 12 ++++-
>> fs/btrfs/super.c | 83 +++++++++++++++++++++++++++++++++
>> include/linux/workqueue.h | 3 ++
>> kernel/workqueue.c | 19 ++++++++
>> 8 files changed, 271 insertions(+), 3 deletions(-)
>>
>>
>> base-commit: 2fcd07b7ccd5fd10b2120d298363e4e6c53ccf9c
>> --
>> Ammar Faizi
>>

2023-02-27 13:42:52

by Ammar Faizi

[permalink] [raw]
Subject: Re: [RFC PATCH v1 0/6] Introducing `wq_cpu_set` mount option for btrfs

On Mon, Feb 27, 2023 at 06:18:43PM +0800, Qu Wenruo wrote:
> I'm not sure if pinning the wq is really the best way to your problem.
>
> Yes, I understand you want to limit the CPU usage of btrfs workqueues, but
> have you tried "thread_pool=" mount option?
>
> That mount option should limit the max amount of in-flight work items, thus
> at least limit the CPU usage.

I have tried to use the thread_poll=%u mount option previously. But I
didn't observe the effect intensively. I'll try to play with this option
more and see if it can yield the desired behavior.

> For the wq CPU pinning part, I'm not sure if it's really needed, although
> it's known CPU pinning can affect some performance characteristics.

What I like about CPU pinning is that we can dedicate CPUs for specific
workloads so it won't cause scheduling noise to the app we've dedicated
other CPUs for.

--
Ammar Faizi


2023-02-27 13:46:02

by Ammar Faizi

[permalink] [raw]
Subject: Re: [RFC PATCH v1 0/6] Introducing `wq_cpu_set` mount option for btrfs

On 2/27/23 6:46 PM, Qu Wenruo wrote:
> On 2023/2/27 19:02, Filipe Manana wrote:
>> On Sun, Feb 26, 2023 at 4:31 PM Ammar Faizi <[email protected]> wrote:
>>> Figure (the CPU usage when `wq_cpu_set` is used VS when it is not):
>>> https://gist.githubusercontent.com/ammarfaizi2/a10f8073e58d1712c1ed49af83ae4ad1/raw/a4f7cbc4eb163db792a669d570ff542495e8c704/wq_cpu_set.png
>>
>> I haven't read the patchset.
>>
>> It's great that it reduces CPU usage. But does it also provide
>> other performance benefits, like lower latency or higher throughput
>> for some workloads? Or using less CPU also affects negatively in
>> those other aspects?

Based on my testing, it gives lower latency for a browser app playing
a YouTube video.

Without this proposed option, high-level compression on a btrfs
storage is a real noise to user space apps. It periodically freezes
the UI for 2 to 3 seconds and causes audio lag; it mostly happens when
it starts writing the dirty write to the disk.

It's reasonably easy to reproduce by making a large dirty write and
invoking a "sync" command.

Side note: Pin user apps to CPUs a,b,c,d and btrfs workquques to CPUs
w,x,y,z.

> So far it looks like to just set CPU masks for each workqueue.
>
> Thus if it's reducing CPU usage, it also takes longer time to finish
> the workload (compression,csum calculation etc).

Yes, that's correct.

I see this as a good mount option for btrfs because the btrfs-workload
in question is CPU bound, specifically for the writing operation.
While it may degrade the btrfs workload because we limit the number of
usable CPUs, there is a condition where users don't prioritize writing
to disk.

Let's say:
I want to run a smooth app with video. I also want to have high-level
compression for my btrfs storage. But I don't want the compression and
checksum work to bother my video; here, I give you CPU x,y,z for the
btrfs work. And here I give you CPU a,b,c,d,e,f for the video work.

I have a similar case on a torrent seeder server where high-level
compression is expected. And I believe there are more cases where this
option is advantageous.

Thank you all for the comments,

--
Ammar Faizi

2023-02-27 16:32:07

by Roman Mamedov

[permalink] [raw]
Subject: Re: [RFC PATCH v1 0/6] Introducing `wq_cpu_set` mount option for btrfs

On Mon, 27 Feb 2023 20:45:26 +0700
Ammar Faizi <[email protected]> wrote:

> Based on my testing, it gives lower latency for a browser app playing
> a YouTube video.
>
> Without this proposed option, high-level compression on a btrfs
> storage is a real noise to user space apps. It periodically freezes
> the UI for 2 to 3 seconds and causes audio lag; it mostly happens when
> it starts writing the dirty write to the disk.
>
> It's reasonably easy to reproduce by making a large dirty write and
> invoking a "sync" command.
>
> Side note: Pin user apps to CPUs a,b,c,d and btrfs workquques to CPUs
> w,x,y,z.

The end user should not be expected to do that.
(At least that is my opinion as an end user :)

The worst part, in times when Btrfs has no current work to do, it means the
user apps are hard-capped to 4 cores for no good reason, and the other 4 are
idling. Even with a split like 6/2, this is still looks like giving up on the
achievements of multi-tasking operating systems and falling back to some
antique state with fixed separation.

> I want to run a smooth app with video. I also want to have high-level
> compression for my btrfs storage. But I don't want the compression and
> checksum work to bother my video; here, I give you CPU x,y,z for the
> btrfs work. And here I give you CPU a,b,c,d,e,f for the video work.
>
> I have a similar case on a torrent seeder server where high-level
> compression is expected. And I believe there are more cases where this
> option is advantageous.

I really hope there can be some other approach. Such as some adjustment
to kworker processing so that it's not as aggressive in starving userspace.
There are no possibility to run kernel task at a lower priority, right?

But if no other way, maybe an option like that is good to have for the
moment.

--
With respect,
Roman

2023-02-27 22:17:56

by Dave Chinner

[permalink] [raw]
Subject: Re: [RFC PATCH v1 0/6] Introducing `wq_cpu_set` mount option for btrfs

On Sun, Feb 26, 2023 at 11:02:53PM +0700, Ammar Faizi wrote:
> ## Test wq_cpu_set
> sudo mount -t btrfs -o rw,compress-force=zstd:15,commit=1500,wq_cpu_set=0.4.1.5 /dev/sda2 hdd/a;
> cp -rf /path/folder_with_many_large_files/ hdd/a/test;
> sync; # See the CPU usage in htop.
> sudo umount hdd/a;

This seems like the wrong model for setting cpu locality for
internal filesystem threads.

Users are used to controlling cpu sets and other locality behaviour
of a task with wrapper tools like numactl. Wrap th emount command
with a numactl command to limit the CPU set, then have the btrfs
fill_super() callback set the cpu mask for the work queues it
creates based on the cpu mask that has been set for the mount task.

That is, I think the model should be "inherit cpu mask from parent
task" rather than adding mount options. This model allows anything
that numactl can control (e.g. memory locality) to also influence
the filesystem default behaviour without having to add yet more
mount options in the future....

-Dave.
--
Dave Chinner
[email protected]

2023-02-27 23:50:22

by Qu Wenruo

[permalink] [raw]
Subject: Re: [RFC PATCH v1 0/6] Introducing `wq_cpu_set` mount option for btrfs



On 2023/2/27 21:42, Ammar Faizi wrote:
> On Mon, Feb 27, 2023 at 06:18:43PM +0800, Qu Wenruo wrote:
>> I'm not sure if pinning the wq is really the best way to your problem.
>>
>> Yes, I understand you want to limit the CPU usage of btrfs workqueues, but
>> have you tried "thread_pool=" mount option?
>>
>> That mount option should limit the max amount of in-flight work items, thus
>> at least limit the CPU usage.
>
> I have tried to use the thread_poll=%u mount option previously. But I
> didn't observe the effect intensively. I'll try to play with this option
> more and see if it can yield the desired behavior.

The thread_pool mount option is much harder to observe the behavior change.

As wq pinned to one or two CPUs is easy to observe using htop, while the
unbounded wq, even with thread_pool, is much harder to observe.

Thus it needs more systematic testing to find the difference.

>
>> For the wq CPU pinning part, I'm not sure if it's really needed, although
>> it's known CPU pinning can affect some performance characteristics.
>
> What I like about CPU pinning is that we can dedicate CPUs for specific
> workloads so it won't cause scheduling noise to the app we've dedicated
> other CPUs for.
>

I'm not 100% sure if we're really any better than the scheduler
developers, as there are a lot of more things to consider.

E.g. for recent Intel CPUs, they have BIG and little cores, and BIG
cores even have SMT supports.
For current kernels, scheduler would avoid putting workloads into the
threads sharing the same physical cores.

Thus it can result seemingly weird priority like BIG core thread1 >
little core > BIG core thread2.
But that results overall better performance.

So overall, unless necessary I'd avoid manual CPU pinning.

Thanks,
Qu

2023-02-28 08:02:19

by Ammar Faizi

[permalink] [raw]
Subject: Re: [RFC PATCH v1 0/6] Introducing `wq_cpu_set` mount option for btrfs

On Tue, Feb 28, 2023 at 09:17:45AM +1100, Dave Chinner wrote:
> This seems like the wrong model for setting cpu locality for
> internal filesystem threads.
>
> Users are used to controlling cpu sets and other locality behaviour
> of a task with wrapper tools like numactl. Wrap th emount command
> with a numactl command to limit the CPU set, then have the btrfs
> fill_super() callback set the cpu mask for the work queues it
> creates based on the cpu mask that has been set for the mount task.
>
> That is, I think the model should be "inherit cpu mask from parent
> task" rather than adding mount options. This model allows anything
> that numactl can control (e.g. memory locality) to also influence
> the filesystem default behaviour without having to add yet more
> mount options in the future....

Good idea on the tooling part.

I like the idea of using numactl to determine a proper CPU set. But
users may also use /etc/fstab to mount their btrfs storage. In that
case, using mount option is still handy.

Also, if we always inherit CPU mask from the parent task who calls the
mount, it will be breaking the CPU affinity for old users who
inadvertently call their mount cmd with random CPU mask.

We should keep the old behavior and allow user to opt in if they want
to. Maybe something like:

numactl -N 0 mount -t btrfs -o rw,wq_cpu_set=inherit /dev/bla bla

--
Ammar Faizi