2024-04-22 07:11:32

by Kemeng Shi

[permalink] [raw]
Subject: [ATCH v3 0/4] Improve visibility of writeback

v2->v3:
-Drop patches to protect non-exist race and to define GDTC_INIT_NO_WB to
null.
-Add wb_tryget to wb from which we collect stats to bdi stats.
-Create wb_stats when CONFIG_CGROUP_WRITEBACK is no enabled.
-Add a blank line between two wb stats in wb_stats.

v1->v2:
-Send cleanup to wq_monitor.py separately.
-Add patch to avoid use after free of bdi.
-Rename wb_calc_cg_thresh to cgwb_calc_thresh as Tejun suggested.
-Use rcu walk to avoid use after free.
-Add debug output to each related patches.

This series tries to improve visilibity of writeback. Patch 1 make
/sys/kernel/debug/bdi/xxx/stats show writeback info of whole bdi
instead of only writeback info in root cgroup. Patch 2 add a new
debug file /sys/kernel/debug/bdi/xxx/wb_stats to show per wb writeback
info. Patch 3 add wb_monitor.py to monitor basic writeback info
of running system, more info could be added on demand. Patch 4
is a random cleanup. More details can be found in respective
patches. Thanks!

Following domain hierarchy is tested:
global domain (320G)
/ \
cgroup domain1(10G) cgroup domain2(10G)
| |
bdi wb1 wb2

/* all writeback info of bdi is successfully collected */
cat stats
BdiWriteback: 4704 kB
BdiReclaimable: 1294496 kB
BdiDirtyThresh: 204208088 kB
DirtyThresh: 195259944 kB
BackgroundThresh: 32503588 kB
BdiDirtied: 48519296 kB
BdiWritten: 47225696 kB
BdiWriteBandwidth: 1173892 kBps
b_dirty: 1
b_io: 0
b_more_io: 1
b_dirty_time: 0
bdi_list: 1
state: 1

/* per wb writeback info of bdi is collected */
cat /sys/kernel/debug/bdi/252:16/wb_stats
WbCgIno: 1
WbWriteback: 0 kB
WbReclaimable: 0 kB
WbDirtyThresh: 0 kB
WbDirtied: 0 kB
WbWritten: 0 kB
WbWriteBandwidth: 102400 kBps
b_dirty: 0
b_io: 0
b_more_io: 0
b_dirty_time: 0
state: 1

WbCgIno: 4208
WbWriteback: 59808 kB
WbReclaimable: 676480 kB
WbDirtyThresh: 6004624 kB
WbDirtied: 23348192 kB
WbWritten: 22614592 kB
WbWriteBandwidth: 593204 kBps
b_dirty: 1
b_io: 1
b_more_io: 0
b_dirty_time: 0
state: 7

WbCgIno: 4249
WbWriteback: 144256 kB
WbReclaimable: 432096 kB
WbDirtyThresh: 6004344 kB
WbDirtied: 25727744 kB
WbWritten: 25154752 kB
WbWriteBandwidth: 577904 kBps
b_dirty: 0
b_io: 1
b_more_io: 0
b_dirty_time: 0
state: 7

The wb_monitor.py script output is as following:
/wb_monitor.py 252:16 -c
writeback reclaimable dirtied written avg_bw
252:16_1 0 0 0 0 102400
252:16_4284 672 820064 9230368 8410304 685612
252:16_4325 896 819840 10491264 9671648 652348
252:16 1568 1639904 19721632 18081952 1440360

writeback reclaimable dirtied written avg_bw
252:16_1 0 0 0 0 102400
252:16_4284 672 820064 9230368 8410304 685612
252:16_4325 896 819840 10491264 9671648 652348
252:16 1568 1639904 19721632 18081952 1440360
..

Kemeng Shi (4):
writeback: collect stats of all wb of bdi in bdi_debug_stats_show
writeback: support retrieving per group debug writeback stats of bdi
writeback: add wb_monitor.py script to monitor writeback info on bdi
writeback: rename nr_reclaimable to nr_dirty in balance_dirty_pages

include/linux/writeback.h | 1 +
mm/backing-dev.c | 174 +++++++++++++++++++++++++++++-----
mm/page-writeback.c | 27 +++++-
tools/writeback/wb_monitor.py | 172 +++++++++++++++++++++++++++++++++
4 files changed, 345 insertions(+), 29 deletions(-)
create mode 100644 tools/writeback/wb_monitor.py

--
2.30.0



2024-04-22 07:11:38

by Kemeng Shi

[permalink] [raw]
Subject: [ATCH v3 2/4] writeback: support retrieving per group debug writeback stats of bdi

Add /sys/kernel/debug/bdi/xxx/wb_stats to show per group writeback stats
of bdi.

Following domain hierarchy is tested:
global domain (320G)
/ \
cgroup domain1(10G) cgroup domain2(10G)
| |
bdi wb1 wb2

/* per wb writeback info of bdi is collected */
cat /sys/kernel/debug/bdi/252:16/wb_stats
WbCgIno: 1
WbWriteback: 0 kB
WbReclaimable: 0 kB
WbDirtyThresh: 0 kB
WbDirtied: 0 kB
WbWritten: 0 kB
WbWriteBandwidth: 102400 kBps
b_dirty: 0
b_io: 0
b_more_io: 0
b_dirty_time: 0
state: 1

WbCgIno: 4208
WbWriteback: 59808 kB
WbReclaimable: 676480 kB
WbDirtyThresh: 6004624 kB
WbDirtied: 23348192 kB
WbWritten: 22614592 kB
WbWriteBandwidth: 593204 kBps
b_dirty: 1
b_io: 1
b_more_io: 0
b_dirty_time: 0
state: 7

WbCgIno: 4249
WbWriteback: 144256 kB
WbReclaimable: 432096 kB
WbDirtyThresh: 6004344 kB
WbDirtied: 25727744 kB
WbWritten: 25154752 kB
WbWriteBandwidth: 577904 kBps
b_dirty: 0
b_io: 1
b_more_io: 0
b_dirty_time: 0
state: 7

Signed-off-by: Kemeng Shi <[email protected]>
---
include/linux/writeback.h | 1 +
mm/backing-dev.c | 78 ++++++++++++++++++++++++++++++++++++++-
mm/page-writeback.c | 19 ++++++++++
3 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 9845cb62e40b..112d806ddbe4 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -355,6 +355,7 @@ int dirtytime_interval_handler(struct ctl_table *table, int write,

void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty);
unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh);
+unsigned long cgwb_calc_thresh(struct bdi_writeback *wb);

void wb_update_bandwidth(struct bdi_writeback *wb);

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 089146feb830..6ecd11bdce6e 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -155,19 +155,93 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
}
DEFINE_SHOW_ATTRIBUTE(bdi_debug_stats);

+static void wb_stats_show(struct seq_file *m, struct bdi_writeback *wb,
+ struct wb_stats *stats)
+{
+
+ seq_printf(m,
+ "WbCgIno: %10lu\n"
+ "WbWriteback: %10lu kB\n"
+ "WbReclaimable: %10lu kB\n"
+ "WbDirtyThresh: %10lu kB\n"
+ "WbDirtied: %10lu kB\n"
+ "WbWritten: %10lu kB\n"
+ "WbWriteBandwidth: %10lu kBps\n"
+ "b_dirty: %10lu\n"
+ "b_io: %10lu\n"
+ "b_more_io: %10lu\n"
+ "b_dirty_time: %10lu\n"
+ "state: %10lx\n\n",
+ cgroup_ino(wb->memcg_css->cgroup),
+ K(stats->nr_writeback),
+ K(stats->nr_reclaimable),
+ K(stats->wb_thresh),
+ K(stats->nr_dirtied),
+ K(stats->nr_written),
+ K(wb->avg_write_bandwidth),
+ stats->nr_dirty,
+ stats->nr_io,
+ stats->nr_more_io,
+ stats->nr_dirty_time,
+ wb->state);
+}
+
+static int cgwb_debug_stats_show(struct seq_file *m, void *v)
+{
+ struct backing_dev_info *bdi = m->private;
+ unsigned long background_thresh;
+ unsigned long dirty_thresh;
+ struct bdi_writeback *wb;
+ struct wb_stats stats;
+
+ global_dirty_limits(&background_thresh, &dirty_thresh);
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) {
+ struct wb_stats stats = { .dirty_thresh = dirty_thresh };
+
+ if (!wb_tryget(wb))
+ continue;
+
+ collect_wb_stats(&stats, wb);
+
+ /*
+ * Calculate thresh of wb in writeback cgroup which is min of
+ * thresh in global domain and thresh in cgroup domain. Drop
+ * rcu lock because cgwb_calc_thresh may sleep in
+ * cgroup_rstat_flush. We can do so here because we have a ref.
+ */
+ if (mem_cgroup_wb_domain(wb)) {
+ rcu_read_unlock();
+ stats.wb_thresh = min(stats.wb_thresh, cgwb_calc_thresh(wb));
+ rcu_read_lock();
+ }
+
+ wb_stats_show(m, wb, &stats);
+
+ wb_put(wb);
+ }
+ rcu_read_unlock();
+
+ return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(cgwb_debug_stats);
+
static void bdi_debug_register(struct backing_dev_info *bdi, const char *name)
{
bdi->debug_dir = debugfs_create_dir(name, bdi_debug_root);

debugfs_create_file("stats", 0444, bdi->debug_dir, bdi,
&bdi_debug_stats_fops);
+ debugfs_create_file("wb_stats", 0444, bdi->debug_dir, bdi,
+ &cgwb_debug_stats_fops);
}

static void bdi_debug_unregister(struct backing_dev_info *bdi)
{
debugfs_remove_recursive(bdi->debug_dir);
}
-#else
+#else /* CONFIG_DEBUG_FS */
static inline void bdi_debug_init(void)
{
}
@@ -178,7 +252,7 @@ static inline void bdi_debug_register(struct backing_dev_info *bdi,
static inline void bdi_debug_unregister(struct backing_dev_info *bdi)
{
}
-#endif
+#endif /* CONFIG_DEBUG_FS */

static ssize_t read_ahead_kb_store(struct device *dev,
struct device_attribute *attr,
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 3e19b87049db..38f143195bcb 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -892,6 +892,25 @@ unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh)
return __wb_calc_thresh(&gdtc);
}

+unsigned long cgwb_calc_thresh(struct bdi_writeback *wb)
+{
+ struct dirty_throttle_control gdtc = { GDTC_INIT_NO_WB };
+ struct dirty_throttle_control mdtc = { MDTC_INIT(wb, &gdtc) };
+ unsigned long filepages, headroom, writeback;
+
+ gdtc.avail = global_dirtyable_memory();
+ gdtc.dirty = global_node_page_state(NR_FILE_DIRTY) +
+ global_node_page_state(NR_WRITEBACK);
+
+ mem_cgroup_wb_stats(wb, &filepages, &headroom,
+ &mdtc.dirty, &writeback);
+ mdtc.dirty += writeback;
+ mdtc_calc_avail(&mdtc, filepages, headroom);
+ domain_dirty_limits(&mdtc);
+
+ return __wb_calc_thresh(&mdtc);
+}
+
/*
* setpoint - dirty 3
* f(dirty) := 1.0 + (----------------)
--
2.30.0


2024-04-22 07:55:46

by Kemeng Shi

[permalink] [raw]
Subject: Re: [ATCH v3 0/4] Improve visibility of writeback



on 4/23/2024 12:05 AM, Kemeng Shi wrote:
Forget to fix a build warning, please ignore this series. Sorry for the
noise.
> v2->v3:
> -Drop patches to protect non-exist race and to define GDTC_INIT_NO_WB to
> null.
> -Add wb_tryget to wb from which we collect stats to bdi stats.
> -Create wb_stats when CONFIG_CGROUP_WRITEBACK is no enabled.
> -Add a blank line between two wb stats in wb_stats.
>
> v1->v2:
> -Send cleanup to wq_monitor.py separately.
> -Add patch to avoid use after free of bdi.
> -Rename wb_calc_cg_thresh to cgwb_calc_thresh as Tejun suggested.
> -Use rcu walk to avoid use after free.
> -Add debug output to each related patches.
>
> This series tries to improve visilibity of writeback. Patch 1 make
> /sys/kernel/debug/bdi/xxx/stats show writeback info of whole bdi
> instead of only writeback info in root cgroup. Patch 2 add a new
> debug file /sys/kernel/debug/bdi/xxx/wb_stats to show per wb writeback
> info. Patch 3 add wb_monitor.py to monitor basic writeback info
> of running system, more info could be added on demand. Patch 4
> is a random cleanup. More details can be found in respective
> patches. Thanks!
>
> Following domain hierarchy is tested:
> global domain (320G)
> / \
> cgroup domain1(10G) cgroup domain2(10G)
> | |
> bdi wb1 wb2
>
> /* all writeback info of bdi is successfully collected */
> cat stats
> BdiWriteback: 4704 kB
> BdiReclaimable: 1294496 kB
> BdiDirtyThresh: 204208088 kB
> DirtyThresh: 195259944 kB
> BackgroundThresh: 32503588 kB
> BdiDirtied: 48519296 kB
> BdiWritten: 47225696 kB
> BdiWriteBandwidth: 1173892 kBps
> b_dirty: 1
> b_io: 0
> b_more_io: 1
> b_dirty_time: 0
> bdi_list: 1
> state: 1
>
> /* per wb writeback info of bdi is collected */
> cat /sys/kernel/debug/bdi/252:16/wb_stats
> WbCgIno: 1
> WbWriteback: 0 kB
> WbReclaimable: 0 kB
> WbDirtyThresh: 0 kB
> WbDirtied: 0 kB
> WbWritten: 0 kB
> WbWriteBandwidth: 102400 kBps
> b_dirty: 0
> b_io: 0
> b_more_io: 0
> b_dirty_time: 0
> state: 1
>
> WbCgIno: 4208
> WbWriteback: 59808 kB
> WbReclaimable: 676480 kB
> WbDirtyThresh: 6004624 kB
> WbDirtied: 23348192 kB
> WbWritten: 22614592 kB
> WbWriteBandwidth: 593204 kBps
> b_dirty: 1
> b_io: 1
> b_more_io: 0
> b_dirty_time: 0
> state: 7
>
> WbCgIno: 4249
> WbWriteback: 144256 kB
> WbReclaimable: 432096 kB
> WbDirtyThresh: 6004344 kB
> WbDirtied: 25727744 kB
> WbWritten: 25154752 kB
> WbWriteBandwidth: 577904 kBps
> b_dirty: 0
> b_io: 1
> b_more_io: 0
> b_dirty_time: 0
> state: 7
>
> The wb_monitor.py script output is as following:
> ./wb_monitor.py 252:16 -c
> writeback reclaimable dirtied written avg_bw
> 252:16_1 0 0 0 0 102400
> 252:16_4284 672 820064 9230368 8410304 685612
> 252:16_4325 896 819840 10491264 9671648 652348
> 252:16 1568 1639904 19721632 18081952 1440360
>
> writeback reclaimable dirtied written avg_bw
> 252:16_1 0 0 0 0 102400
> 252:16_4284 672 820064 9230368 8410304 685612
> 252:16_4325 896 819840 10491264 9671648 652348
> 252:16 1568 1639904 19721632 18081952 1440360
> ...
>
> Kemeng Shi (4):
> writeback: collect stats of all wb of bdi in bdi_debug_stats_show
> writeback: support retrieving per group debug writeback stats of bdi
> writeback: add wb_monitor.py script to monitor writeback info on bdi
> writeback: rename nr_reclaimable to nr_dirty in balance_dirty_pages
>
> include/linux/writeback.h | 1 +
> mm/backing-dev.c | 174 +++++++++++++++++++++++++++++-----
> mm/page-writeback.c | 27 +++++-
> tools/writeback/wb_monitor.py | 172 +++++++++++++++++++++++++++++++++
> 4 files changed, 345 insertions(+), 29 deletions(-)
> create mode 100644 tools/writeback/wb_monitor.py
>