2024-04-29 03:49:21

by Kemeng Shi

[permalink] [raw]
Subject: [PATCH 00/10] Add helper functions to remove repeated code and

This series add a lot of helpers to remove repeated code between domain
and wb; dirty limit and dirty background; global domain and wb domain.
The helpers also improve readability. More details can be found on
respective patches.
Thanks.

A simple domain hierarchy is tested:
global domain (> 20G)
|
cgroup domain1(10G)
|
wb1
|
fio

Test steps:
/* make it easy to observe */
echo 300000 > /proc/sys/vm/dirty_expire_centisecs
echo 3000 > /proc/sys/vm/dirty_writeback_centisecs

/* create cgroup domain */
cd /sys/fs/cgroup
echo "+memory +io" > cgroup.subtree_control
mkdir group1
cd group1
echo 10G > memory.high
echo 10G > memory.max
echo $$ > cgroup.procs
mkfs.ext4 -F /dev/vdb
mount /dev/vdb /bdi1/

/* run fio to generate dirty pages */
fio -name test -filename=/bdi1/file -size=xxx -ioengine=libaio -bs=4K \
-iodepth=1 -rw=write -direct=0 --time_based -runtime=600 -invalidate=0

When fio size is 1.5G, the wb is in freerun state and dirty pages is
written back when dirty inode is expired after 30 seconds.
When fio size is 2G, the dirty pages keep being written back.
When fio size is 3G, the dirty pages keep being written back and
bandwidth of fio is reduce to 0 occasionally. It's more easy to observe
the dirty limitation by set /proc/sys/vm/dirty_ratio to a higher
value, and set /proc/sys/vm/dirty_ratio back to 20 when 3G pages are
dirtied.

Kemeng Shi (10):
writeback: factor out wb_bg_dirty_limits to remove repeated code
writeback: add general function domain_dirty_avail to calculate dirty
and avail of domain
writeback: factor out domain_over_bg_thresh to remove repeated code
writeback use [global/wb]_domain_dirty_avail helper in
cgwb_calc_thresh
writeback: call domain_dirty_avail in balance_dirty_pages
writeback: Factor out code of freerun to remove repeated code
writeback: factor out wb_dirty_freerun to remove more repeated freerun
code
writeback: factor out balance_domain_limits to remove repeated code
writeback: factor out wb_dirty_exceeded to remove repeated code
writeback: factor out balance_wb_limits to remove repeated code

mm/page-writeback.c | 324 ++++++++++++++++++++++++--------------------
1 file changed, 176 insertions(+), 148 deletions(-)

--
2.30.0



2024-04-29 03:49:33

by Kemeng Shi

[permalink] [raw]
Subject: [PATCH 08/10] writeback: factor out balance_domain_limits to remove repeated code

Factor out balance_domain_limits to remove repeated code.

Signed-off-by: Kemeng Shi <[email protected]>
---
mm/page-writeback.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index e38beb680742..68ae4c90ce8b 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1753,6 +1753,14 @@ static void domain_dirty_freerun(struct dirty_throttle_control *dtc,
dtc->freerun = dirty <= dirty_freerun_ceiling(thresh, bg_thresh);
}

+static void balance_domain_limits(struct dirty_throttle_control *dtc,
+ bool strictlimit)
+{
+ domain_dirty_avail(dtc, false);
+ domain_dirty_limits(dtc);
+ domain_dirty_freerun(dtc, strictlimit);
+}
+
static void wb_dirty_freerun(struct dirty_throttle_control *dtc,
bool strictlimit)
{
@@ -1809,19 +1817,13 @@ static int balance_dirty_pages(struct bdi_writeback *wb,

nr_dirty = global_node_page_state(NR_FILE_DIRTY);

- domain_dirty_avail(gdtc, false);
- domain_dirty_limits(gdtc);
- domain_dirty_freerun(gdtc, strictlimit);
-
- if (mdtc) {
+ balance_domain_limits(gdtc, strictlimit);
+ if (mdtc)
/*
* If @wb belongs to !root memcg, repeat the same
* basic calculations for the memcg domain.
*/
- domain_dirty_avail(mdtc, false);
- domain_dirty_limits(mdtc);
- domain_dirty_freerun(mdtc, strictlimit);
- }
+ balance_domain_limits(mdtc, strictlimit);

/*
* In laptop mode, we wait until hitting the higher threshold
--
2.30.0


2024-04-29 03:49:42

by Kemeng Shi

[permalink] [raw]
Subject: [PATCH 06/10] writeback: Factor out code of freerun to remove repeated code

Factor out code of freerun into new helper functions domain_poll_intv and
domain_dirty_freerun to remove repeated code.

Signed-off-by: Kemeng Shi <[email protected]>
---
mm/page-writeback.c | 89 +++++++++++++++++++++++++--------------------
1 file changed, 49 insertions(+), 40 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index c41db87f4e98..b49ca82380a5 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -139,6 +139,7 @@ struct dirty_throttle_control {
unsigned long wb_bg_thresh;

unsigned long pos_ratio;
+ bool freerun;
};

/*
@@ -1709,6 +1710,49 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
}
}

+static unsigned long domain_poll_intv(struct dirty_throttle_control *dtc,
+ bool strictlimit)
+{
+ unsigned long dirty, thresh;
+
+ if (strictlimit) {
+ dirty = dtc->wb_dirty;
+ thresh = dtc->wb_thresh;
+ } else {
+ dirty = dtc->dirty;
+ thresh = dtc->thresh;
+ }
+
+ return dirty_poll_interval(dirty, thresh);
+}
+
+/*
+ * Throttle it only when the background writeback cannot
+ * catch-up. This avoids (excessively) small writeouts
+ * when the wb limits are ramping up in case of !strictlimit.
+ *
+ * In strictlimit case make decision based on the wb counters
+ * and limits. Small writeouts when the wb limits are ramping
+ * up are the price we consciously pay for strictlimit-ing.
+ */
+static void domain_dirty_freerun(struct dirty_throttle_control *dtc,
+ bool strictlimit)
+{
+ unsigned long dirty, thresh, bg_thresh;
+
+ if (unlikely(strictlimit)) {
+ wb_dirty_limits(dtc);
+ dirty = dtc->wb_dirty;
+ thresh = dtc->wb_thresh;
+ bg_thresh = dtc->wb_bg_thresh;
+ } else {
+ dirty = dtc->dirty;
+ thresh = dtc->thresh;
+ bg_thresh = dtc->bg_thresh;
+ }
+ dtc->freerun = dirty <= dirty_freerun_ceiling(thresh, bg_thresh);
+}
+
/*
* balance_dirty_pages() must be called by processes which are generating dirty
* data. It looks at the number of dirty pages in the machine and will force
@@ -1741,27 +1785,12 @@ static int balance_dirty_pages(struct bdi_writeback *wb,

for (;;) {
unsigned long now = jiffies;
- unsigned long dirty, thresh, bg_thresh;
- unsigned long m_dirty = 0; /* stop bogus uninit warnings */
- unsigned long m_thresh = 0;
- unsigned long m_bg_thresh = 0;

nr_dirty = global_node_page_state(NR_FILE_DIRTY);

domain_dirty_avail(gdtc, false);
domain_dirty_limits(gdtc);
-
- if (unlikely(strictlimit)) {
- wb_dirty_limits(gdtc);
-
- dirty = gdtc->wb_dirty;
- thresh = gdtc->wb_thresh;
- bg_thresh = gdtc->wb_bg_thresh;
- } else {
- dirty = gdtc->dirty;
- thresh = gdtc->thresh;
- bg_thresh = gdtc->bg_thresh;
- }
+ domain_dirty_freerun(gdtc, strictlimit);

if (mdtc) {
/*
@@ -1770,17 +1799,7 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
*/
domain_dirty_avail(mdtc, false);
domain_dirty_limits(mdtc);
-
- if (unlikely(strictlimit)) {
- wb_dirty_limits(mdtc);
- m_dirty = mdtc->wb_dirty;
- m_thresh = mdtc->wb_thresh;
- m_bg_thresh = mdtc->wb_bg_thresh;
- } else {
- m_dirty = mdtc->dirty;
- m_thresh = mdtc->thresh;
- m_bg_thresh = mdtc->bg_thresh;
- }
+ domain_dirty_freerun(mdtc, strictlimit);
}

/*
@@ -1797,31 +1816,21 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
wb_start_background_writeback(wb);

/*
- * Throttle it only when the background writeback cannot
- * catch-up. This avoids (excessively) small writeouts
- * when the wb limits are ramping up in case of !strictlimit.
- *
- * In strictlimit case make decision based on the wb counters
- * and limits. Small writeouts when the wb limits are ramping
- * up are the price we consciously pay for strictlimit-ing.
- *
* If memcg domain is in effect, @dirty should be under
* both global and memcg freerun ceilings.
*/
- if (dirty <= dirty_freerun_ceiling(thresh, bg_thresh) &&
- (!mdtc ||
- m_dirty <= dirty_freerun_ceiling(m_thresh, m_bg_thresh))) {
+ if (gdtc->freerun && (!mdtc || mdtc->freerun)) {
unsigned long intv;
unsigned long m_intv;

free_running:
- intv = dirty_poll_interval(dirty, thresh);
+ intv = domain_poll_intv(gdtc, strictlimit);
m_intv = ULONG_MAX;

current->dirty_paused_when = now;
current->nr_dirtied = 0;
if (mdtc)
- m_intv = dirty_poll_interval(m_dirty, m_thresh);
+ m_intv = domain_poll_intv(mdtc, strictlimit);
current->nr_dirtied_pause = min(intv, m_intv);
break;
}
--
2.30.0


2024-04-29 03:50:05

by Kemeng Shi

[permalink] [raw]
Subject: [PATCH 07/10] writeback: factor out wb_dirty_freerun to remove more repeated freerun code

Factor out wb_dirty_freerun to remove more repeated freerun code.

Signed-off-by: Kemeng Shi <[email protected]>
---
mm/page-writeback.c | 55 +++++++++++++++++++++++----------------------
1 file changed, 28 insertions(+), 27 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index b49ca82380a5..e38beb680742 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1753,6 +1753,27 @@ static void domain_dirty_freerun(struct dirty_throttle_control *dtc,
dtc->freerun = dirty <= dirty_freerun_ceiling(thresh, bg_thresh);
}

+static void wb_dirty_freerun(struct dirty_throttle_control *dtc,
+ bool strictlimit)
+{
+ dtc->freerun = false;
+
+ /* was already handled in domain_dirty_freerun */
+ if (strictlimit)
+ return;
+
+ wb_dirty_limits(dtc);
+ /*
+ * LOCAL_THROTTLE tasks must not be throttled when below the per-wb
+ * freerun ceiling.
+ */
+ if (!(current->flags & PF_LOCAL_THROTTLE))
+ return;
+
+ dtc->freerun = dtc->wb_dirty <
+ dirty_freerun_ceiling(dtc->wb_thresh, dtc->wb_bg_thresh);
+}
+
/*
* balance_dirty_pages() must be called by processes which are generating dirty
* data. It looks at the number of dirty pages in the machine and will force
@@ -1845,19 +1866,9 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
* Calculate global domain's pos_ratio and select the
* global dtc by default.
*/
- if (!strictlimit) {
- wb_dirty_limits(gdtc);
-
- if ((current->flags & PF_LOCAL_THROTTLE) &&
- gdtc->wb_dirty <
- dirty_freerun_ceiling(gdtc->wb_thresh,
- gdtc->wb_bg_thresh))
- /*
- * LOCAL_THROTTLE tasks must not be throttled
- * when below the per-wb freerun ceiling.
- */
- goto free_running;
- }
+ wb_dirty_freerun(gdtc, strictlimit);
+ if (gdtc->freerun)
+ goto free_running;

dirty_exceeded = (gdtc->wb_dirty > gdtc->wb_thresh) &&
((gdtc->dirty > gdtc->thresh) || strictlimit);
@@ -1872,20 +1883,10 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
* both global and memcg domains. Choose the one
* w/ lower pos_ratio.
*/
- if (!strictlimit) {
- wb_dirty_limits(mdtc);
-
- if ((current->flags & PF_LOCAL_THROTTLE) &&
- mdtc->wb_dirty <
- dirty_freerun_ceiling(mdtc->wb_thresh,
- mdtc->wb_bg_thresh))
- /*
- * LOCAL_THROTTLE tasks must not be
- * throttled when below the per-wb
- * freerun ceiling.
- */
- goto free_running;
- }
+ wb_dirty_freerun(mdtc, strictlimit);
+ if (mdtc->freerun)
+ goto free_running;
+
dirty_exceeded |= (mdtc->wb_dirty > mdtc->wb_thresh) &&
((mdtc->dirty > mdtc->thresh) || strictlimit);

--
2.30.0


2024-04-29 03:50:36

by Kemeng Shi

[permalink] [raw]
Subject: [PATCH 09/10] writeback: factor out wb_dirty_exceeded to remove repeated code

Factor out wb_dirty_exceeded to remove repeated code

Signed-off-by: Kemeng Shi <[email protected]>
---
mm/page-writeback.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 68ae4c90ce8b..26b638cc58c5 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -140,6 +140,7 @@ struct dirty_throttle_control {

unsigned long pos_ratio;
bool freerun;
+ bool dirty_exceeded;
};

/*
@@ -1782,6 +1783,13 @@ static void wb_dirty_freerun(struct dirty_throttle_control *dtc,
dirty_freerun_ceiling(dtc->wb_thresh, dtc->wb_bg_thresh);
}

+static inline void wb_dirty_exceeded(struct dirty_throttle_control *dtc,
+ bool strictlimit)
+{
+ dtc->dirty_exceeded = (dtc->wb_dirty > dtc->wb_thresh) &&
+ ((dtc->dirty > dtc->thresh) || strictlimit);
+}
+
/*
* balance_dirty_pages() must be called by processes which are generating dirty
* data. It looks at the number of dirty pages in the machine and will force
@@ -1804,7 +1812,6 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
long max_pause;
long min_pause;
int nr_dirtied_pause;
- bool dirty_exceeded = false;
unsigned long task_ratelimit;
unsigned long dirty_ratelimit;
struct backing_dev_info *bdi = wb->bdi;
@@ -1872,9 +1879,7 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
if (gdtc->freerun)
goto free_running;

- dirty_exceeded = (gdtc->wb_dirty > gdtc->wb_thresh) &&
- ((gdtc->dirty > gdtc->thresh) || strictlimit);
-
+ wb_dirty_exceeded(gdtc, strictlimit);
wb_position_ratio(gdtc);
sdtc = gdtc;

@@ -1889,17 +1894,13 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
if (mdtc->freerun)
goto free_running;

- dirty_exceeded |= (mdtc->wb_dirty > mdtc->wb_thresh) &&
- ((mdtc->dirty > mdtc->thresh) || strictlimit);
-
+ wb_dirty_exceeded(mdtc, strictlimit);
wb_position_ratio(mdtc);
if (mdtc->pos_ratio < gdtc->pos_ratio)
sdtc = mdtc;
}

- if (dirty_exceeded != wb->dirty_exceeded)
- wb->dirty_exceeded = dirty_exceeded;
-
+ wb->dirty_exceeded = gdtc->dirty_exceeded || mdtc->dirty_exceeded;
if (time_is_before_jiffies(READ_ONCE(wb->bw_time_stamp) +
BANDWIDTH_INTERVAL))
__wb_update_bandwidth(gdtc, mdtc, true);
--
2.30.0


2024-04-29 03:50:42

by Kemeng Shi

[permalink] [raw]
Subject: [PATCH 10/10] writeback: factor out balance_wb_limits to remove repeated code

Factor out balance_wb_limits to remove repeated code

Signed-off-by: Kemeng Shi <[email protected]>
---
mm/page-writeback.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 26b638cc58c5..4949036e6286 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1790,6 +1790,17 @@ static inline void wb_dirty_exceeded(struct dirty_throttle_control *dtc,
((dtc->dirty > dtc->thresh) || strictlimit);
}

+static void balance_wb_limits(struct dirty_throttle_control *dtc,
+ bool strictlimit)
+{
+ wb_dirty_freerun(dtc, strictlimit);
+ if (dtc->freerun)
+ return;
+
+ wb_dirty_exceeded(dtc, strictlimit);
+ wb_position_ratio(dtc);
+}
+
/*
* balance_dirty_pages() must be called by processes which are generating dirty
* data. It looks at the number of dirty pages in the machine and will force
@@ -1875,12 +1886,9 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
* Calculate global domain's pos_ratio and select the
* global dtc by default.
*/
- wb_dirty_freerun(gdtc, strictlimit);
+ balance_wb_limits(gdtc, strictlimit);
if (gdtc->freerun)
goto free_running;
-
- wb_dirty_exceeded(gdtc, strictlimit);
- wb_position_ratio(gdtc);
sdtc = gdtc;

if (mdtc) {
@@ -1890,12 +1898,9 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
* both global and memcg domains. Choose the one
* w/ lower pos_ratio.
*/
- wb_dirty_freerun(mdtc, strictlimit);
+ balance_wb_limits(mdtc, strictlimit);
if (mdtc->freerun)
goto free_running;
-
- wb_dirty_exceeded(mdtc, strictlimit);
- wb_position_ratio(mdtc);
if (mdtc->pos_ratio < gdtc->pos_ratio)
sdtc = mdtc;
}
--
2.30.0


2024-04-30 07:24:45

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH 09/10] writeback: factor out wb_dirty_exceeded to remove repeated code

Hi Kemeng,

kernel test robot noticed the following build warnings:

https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Kemeng-Shi/writeback-factor-out-wb_bg_dirty_limits-to-remove-repeated-code/20240429-114903
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20240429034738.138609-10-shikemeng%40huaweicloud.com
patch subject: [PATCH 09/10] writeback: factor out wb_dirty_exceeded to remove repeated code
config: i386-randconfig-141-20240429 (https://download.01.org/0day-ci/archive/20240430/[email protected]/config)
compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Reported-by: Dan Carpenter <[email protected]>
| Closes: https://lore.kernel.org/r/[email protected]/

New smatch warnings:
mm/page-writeback.c:1903 balance_dirty_pages() error: we previously assumed 'mdtc' could be null (see line 1886)

vim +/mdtc +1903 mm/page-writeback.c

fe6c9c6e3e3e33 Jan Kara 2022-06-23 1800 static int balance_dirty_pages(struct bdi_writeback *wb,
fe6c9c6e3e3e33 Jan Kara 2022-06-23 1801 unsigned long pages_dirtied, unsigned int flags)
^1da177e4c3f41 Linus Torvalds 2005-04-16 1802 {
2bc00aef030f4f Tejun Heo 2015-05-22 1803 struct dirty_throttle_control gdtc_stor = { GDTC_INIT(wb) };
c2aa723a609363 Tejun Heo 2015-05-22 1804 struct dirty_throttle_control mdtc_stor = { MDTC_INIT(wb, &gdtc_stor) };
2bc00aef030f4f Tejun Heo 2015-05-22 1805 struct dirty_throttle_control * const gdtc = &gdtc_stor;
c2aa723a609363 Tejun Heo 2015-05-22 1806 struct dirty_throttle_control * const mdtc = mdtc_valid(&mdtc_stor) ?
c2aa723a609363 Tejun Heo 2015-05-22 1807 &mdtc_stor : NULL;
c2aa723a609363 Tejun Heo 2015-05-22 1808 struct dirty_throttle_control *sdtc;
c8a7ee1b73042a Kemeng Shi 2024-04-23 1809 unsigned long nr_dirty;
83712358ba0a14 Wu Fengguang 2011-06-11 1810 long period;
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1811 long pause;
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1812 long max_pause;
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1813 long min_pause;
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1814 int nr_dirtied_pause;
143dfe8611a630 Wu Fengguang 2010-08-27 1815 unsigned long task_ratelimit;
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1816 unsigned long dirty_ratelimit;
dfb8ae56783542 Tejun Heo 2015-05-22 1817 struct backing_dev_info *bdi = wb->bdi;
5a53748568f796 Maxim Patlasov 2013-09-11 1818 bool strictlimit = bdi->capabilities & BDI_CAP_STRICTLIMIT;
e98be2d599207c Wu Fengguang 2010-08-29 1819 unsigned long start_time = jiffies;
fe6c9c6e3e3e33 Jan Kara 2022-06-23 1820 int ret = 0;
^1da177e4c3f41 Linus Torvalds 2005-04-16 1821
^1da177e4c3f41 Linus Torvalds 2005-04-16 1822 for (;;) {
83712358ba0a14 Wu Fengguang 2011-06-11 1823 unsigned long now = jiffies;
83712358ba0a14 Wu Fengguang 2011-06-11 1824
c8a7ee1b73042a Kemeng Shi 2024-04-23 1825 nr_dirty = global_node_page_state(NR_FILE_DIRTY);
5fce25a9df4865 Peter Zijlstra 2007-11-14 1826
204c26b42a22b8 Kemeng Shi 2024-04-29 1827 balance_domain_limits(gdtc, strictlimit);
204c26b42a22b8 Kemeng Shi 2024-04-29 1828 if (mdtc)
c2aa723a609363 Tejun Heo 2015-05-22 1829 /*
c2aa723a609363 Tejun Heo 2015-05-22 1830 * If @wb belongs to !root memcg, repeat the same
c2aa723a609363 Tejun Heo 2015-05-22 1831 * basic calculations for the memcg domain.
c2aa723a609363 Tejun Heo 2015-05-22 1832 */
204c26b42a22b8 Kemeng Shi 2024-04-29 1833 balance_domain_limits(mdtc, strictlimit);
5a53748568f796 Maxim Patlasov 2013-09-11 1834
ea6813be07dcdc Jan Kara 2022-06-23 1835 /*
ea6813be07dcdc Jan Kara 2022-06-23 1836 * In laptop mode, we wait until hitting the higher threshold
ea6813be07dcdc Jan Kara 2022-06-23 1837 * before starting background writeout, and then write out all
ea6813be07dcdc Jan Kara 2022-06-23 1838 * the way down to the lower threshold. So slow writers cause
ea6813be07dcdc Jan Kara 2022-06-23 1839 * minimal disk activity.
ea6813be07dcdc Jan Kara 2022-06-23 1840 *
ea6813be07dcdc Jan Kara 2022-06-23 1841 * In normal mode, we start background writeout at the lower
ea6813be07dcdc Jan Kara 2022-06-23 1842 * background_thresh, to keep the amount of dirty memory low.
ea6813be07dcdc Jan Kara 2022-06-23 1843 */
c8a7ee1b73042a Kemeng Shi 2024-04-23 1844 if (!laptop_mode && nr_dirty > gdtc->bg_thresh &&
ea6813be07dcdc Jan Kara 2022-06-23 1845 !writeback_in_progress(wb))
ea6813be07dcdc Jan Kara 2022-06-23 1846 wb_start_background_writeback(wb);
ea6813be07dcdc Jan Kara 2022-06-23 1847
16c4042f08919f Wu Fengguang 2010-08-11 1848 /*
c2aa723a609363 Tejun Heo 2015-05-22 1849 * If memcg domain is in effect, @dirty should be under
c2aa723a609363 Tejun Heo 2015-05-22 1850 * both global and memcg freerun ceilings.
16c4042f08919f Wu Fengguang 2010-08-11 1851 */
c474a90dc076a7 Kemeng Shi 2024-04-29 1852 if (gdtc->freerun && (!mdtc || mdtc->freerun)) {
a37b0715ddf300 NeilBrown 2020-06-01 1853 unsigned long intv;
a37b0715ddf300 NeilBrown 2020-06-01 1854 unsigned long m_intv;
a37b0715ddf300 NeilBrown 2020-06-01 1855
a37b0715ddf300 NeilBrown 2020-06-01 1856 free_running:
c474a90dc076a7 Kemeng Shi 2024-04-29 1857 intv = domain_poll_intv(gdtc, strictlimit);
a37b0715ddf300 NeilBrown 2020-06-01 1858 m_intv = ULONG_MAX;
c2aa723a609363 Tejun Heo 2015-05-22 1859
83712358ba0a14 Wu Fengguang 2011-06-11 1860 current->dirty_paused_when = now;
83712358ba0a14 Wu Fengguang 2011-06-11 1861 current->nr_dirtied = 0;
c2aa723a609363 Tejun Heo 2015-05-22 1862 if (mdtc)
c474a90dc076a7 Kemeng Shi 2024-04-29 1863 m_intv = domain_poll_intv(mdtc, strictlimit);
c2aa723a609363 Tejun Heo 2015-05-22 1864 current->nr_dirtied_pause = min(intv, m_intv);
16c4042f08919f Wu Fengguang 2010-08-11 1865 break;
83712358ba0a14 Wu Fengguang 2011-06-11 1866 }
16c4042f08919f Wu Fengguang 2010-08-11 1867
ea6813be07dcdc Jan Kara 2022-06-23 1868 /* Start writeback even when in laptop mode */
bc05873dccd27d Tejun Heo 2015-05-22 1869 if (unlikely(!writeback_in_progress(wb)))
9ecf4866c018ae Tejun Heo 2015-05-22 1870 wb_start_background_writeback(wb);
143dfe8611a630 Wu Fengguang 2010-08-27 1871
97b27821b4854c Tejun Heo 2019-08-26 1872 mem_cgroup_flush_foreign(wb);
97b27821b4854c Tejun Heo 2019-08-26 1873
c2aa723a609363 Tejun Heo 2015-05-22 1874 /*
c2aa723a609363 Tejun Heo 2015-05-22 1875 * Calculate global domain's pos_ratio and select the
c2aa723a609363 Tejun Heo 2015-05-22 1876 * global dtc by default.
c2aa723a609363 Tejun Heo 2015-05-22 1877 */
aab09fbaa2dd34 Kemeng Shi 2024-04-29 1878 wb_dirty_freerun(gdtc, strictlimit);
aab09fbaa2dd34 Kemeng Shi 2024-04-29 1879 if (gdtc->freerun)
a37b0715ddf300 NeilBrown 2020-06-01 1880 goto free_running;
a37b0715ddf300 NeilBrown 2020-06-01 1881
8b8bf84233eccf Kemeng Shi 2024-04-29 1882 wb_dirty_exceeded(gdtc, strictlimit);
daddfa3cb30ebf Tejun Heo 2015-05-22 1883 wb_position_ratio(gdtc);
c2aa723a609363 Tejun Heo 2015-05-22 1884 sdtc = gdtc;
e98be2d599207c Wu Fengguang 2010-08-29 1885
c2aa723a609363 Tejun Heo 2015-05-22 @1886 if (mdtc) {
^^^^^^^^^^^
This code assumes mdtc can be NULL

c2aa723a609363 Tejun Heo 2015-05-22 1887 /*
c2aa723a609363 Tejun Heo 2015-05-22 1888 * If memcg domain is in effect, calculate its
c2aa723a609363 Tejun Heo 2015-05-22 1889 * pos_ratio. @wb should satisfy constraints from
c2aa723a609363 Tejun Heo 2015-05-22 1890 * both global and memcg domains. Choose the one
c2aa723a609363 Tejun Heo 2015-05-22 1891 * w/ lower pos_ratio.
c2aa723a609363 Tejun Heo 2015-05-22 1892 */
aab09fbaa2dd34 Kemeng Shi 2024-04-29 1893 wb_dirty_freerun(mdtc, strictlimit);
aab09fbaa2dd34 Kemeng Shi 2024-04-29 1894 if (mdtc->freerun)
a37b0715ddf300 NeilBrown 2020-06-01 1895 goto free_running;
aab09fbaa2dd34 Kemeng Shi 2024-04-29 1896
8b8bf84233eccf Kemeng Shi 2024-04-29 1897 wb_dirty_exceeded(mdtc, strictlimit);
c2aa723a609363 Tejun Heo 2015-05-22 1898 wb_position_ratio(mdtc);
c2aa723a609363 Tejun Heo 2015-05-22 1899 if (mdtc->pos_ratio < gdtc->pos_ratio)
c2aa723a609363 Tejun Heo 2015-05-22 1900 sdtc = mdtc;
c2aa723a609363 Tejun Heo 2015-05-22 1901 }
daddfa3cb30ebf Tejun Heo 2015-05-22 1902
8b8bf84233eccf Kemeng Shi 2024-04-29 @1903 wb->dirty_exceeded = gdtc->dirty_exceeded || mdtc->dirty_exceeded;
^^^^^^
Unchecked dereference

20792ebf3eeb82 Jan Kara 2021-09-02 1904 if (time_is_before_jiffies(READ_ONCE(wb->bw_time_stamp) +
45a2966fd64147 Jan Kara 2021-09-02 1905 BANDWIDTH_INTERVAL))
fee468fdf41cdf Jan Kara 2021-09-02 1906 __wb_update_bandwidth(gdtc, mdtc, true);
e98be2d599207c Wu Fengguang 2010-08-29 1907
c2aa723a609363 Tejun Heo 2015-05-22 1908 /* throttle according to the chosen dtc */
20792ebf3eeb82 Jan Kara 2021-09-02 1909 dirty_ratelimit = READ_ONCE(wb->dirty_ratelimit);
c2aa723a609363 Tejun Heo 2015-05-22 1910 task_ratelimit = ((u64)dirty_ratelimit * sdtc->pos_ratio) >>
3a73dbbc9bb3fc Wu Fengguang 2011-11-07 1911 RATELIMIT_CALC_SHIFT;
c2aa723a609363 Tejun Heo 2015-05-22 1912 max_pause = wb_max_pause(wb, sdtc->wb_dirty);
a88a341a73be4e Tejun Heo 2015-05-22 1913 min_pause = wb_min_pause(wb, max_pause,
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1914 task_ratelimit, dirty_ratelimit,
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1915 &nr_dirtied_pause);
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1916
3a73dbbc9bb3fc Wu Fengguang 2011-11-07 1917 if (unlikely(task_ratelimit == 0)) {
83712358ba0a14 Wu Fengguang 2011-06-11 1918 period = max_pause;
c8462cc9de9e92 Wu Fengguang 2011-06-11 1919 pause = max_pause;
143dfe8611a630 Wu Fengguang 2010-08-27 1920 goto pause;
e50e37201ae2e7 Wu Fengguang 2010-08-11 1921 }
83712358ba0a14 Wu Fengguang 2011-06-11 1922 period = HZ * pages_dirtied / task_ratelimit;
83712358ba0a14 Wu Fengguang 2011-06-11 1923 pause = period;
83712358ba0a14 Wu Fengguang 2011-06-11 1924 if (current->dirty_paused_when)
83712358ba0a14 Wu Fengguang 2011-06-11 1925 pause -= now - current->dirty_paused_when;
83712358ba0a14 Wu Fengguang 2011-06-11 1926 /*
83712358ba0a14 Wu Fengguang 2011-06-11 1927 * For less than 1s think time (ext3/4 may block the dirtier
83712358ba0a14 Wu Fengguang 2011-06-11 1928 * for up to 800ms from time to time on 1-HDD; so does xfs,
83712358ba0a14 Wu Fengguang 2011-06-11 1929 * however at much less frequency), try to compensate it in
83712358ba0a14 Wu Fengguang 2011-06-11 1930 * future periods by updating the virtual time; otherwise just
83712358ba0a14 Wu Fengguang 2011-06-11 1931 * do a reset, as it may be a light dirtier.
83712358ba0a14 Wu Fengguang 2011-06-11 1932 */
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1933 if (pause < min_pause) {
5634cc2aa9aebc Tejun Heo 2015-08-18 1934 trace_balance_dirty_pages(wb,
c2aa723a609363 Tejun Heo 2015-05-22 1935 sdtc->thresh,
c2aa723a609363 Tejun Heo 2015-05-22 1936 sdtc->bg_thresh,
c2aa723a609363 Tejun Heo 2015-05-22 1937 sdtc->dirty,
c2aa723a609363 Tejun Heo 2015-05-22 1938 sdtc->wb_thresh,
c2aa723a609363 Tejun Heo 2015-05-22 1939 sdtc->wb_dirty,
ece13ac31bbe49 Wu Fengguang 2010-08-29 1940 dirty_ratelimit,
ece13ac31bbe49 Wu Fengguang 2010-08-29 1941 task_ratelimit,
ece13ac31bbe49 Wu Fengguang 2010-08-29 1942 pages_dirtied,
83712358ba0a14 Wu Fengguang 2011-06-11 1943 period,
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1944 min(pause, 0L),
ece13ac31bbe49 Wu Fengguang 2010-08-29 1945 start_time);
83712358ba0a14 Wu Fengguang 2011-06-11 1946 if (pause < -HZ) {
83712358ba0a14 Wu Fengguang 2011-06-11 1947 current->dirty_paused_when = now;
83712358ba0a14 Wu Fengguang 2011-06-11 1948 current->nr_dirtied = 0;
83712358ba0a14 Wu Fengguang 2011-06-11 1949 } else if (period) {
83712358ba0a14 Wu Fengguang 2011-06-11 1950 current->dirty_paused_when += period;
83712358ba0a14 Wu Fengguang 2011-06-11 1951 current->nr_dirtied = 0;
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1952 } else if (current->nr_dirtied_pause <= pages_dirtied)
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1953 current->nr_dirtied_pause += pages_dirtied;
57fc978cfb61ed Wu Fengguang 2011-06-11 1954 break;
e50e37201ae2e7 Wu Fengguang 2010-08-11 1955 }
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1956 if (unlikely(pause > max_pause)) {
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1957 /* for occasional dropped task_ratelimit */
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1958 now += min(pause - max_pause, max_pause);
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1959 pause = max_pause;
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1960 }
143dfe8611a630 Wu Fengguang 2010-08-27 1961
143dfe8611a630 Wu Fengguang 2010-08-27 1962 pause:
5634cc2aa9aebc Tejun Heo 2015-08-18 1963 trace_balance_dirty_pages(wb,
c2aa723a609363 Tejun Heo 2015-05-22 1964 sdtc->thresh,
c2aa723a609363 Tejun Heo 2015-05-22 1965 sdtc->bg_thresh,
c2aa723a609363 Tejun Heo 2015-05-22 1966 sdtc->dirty,
c2aa723a609363 Tejun Heo 2015-05-22 1967 sdtc->wb_thresh,
c2aa723a609363 Tejun Heo 2015-05-22 1968 sdtc->wb_dirty,
ece13ac31bbe49 Wu Fengguang 2010-08-29 1969 dirty_ratelimit,
ece13ac31bbe49 Wu Fengguang 2010-08-29 1970 task_ratelimit,
ece13ac31bbe49 Wu Fengguang 2010-08-29 1971 pages_dirtied,
83712358ba0a14 Wu Fengguang 2011-06-11 1972 period,
ece13ac31bbe49 Wu Fengguang 2010-08-29 1973 pause,
ece13ac31bbe49 Wu Fengguang 2010-08-29 1974 start_time);
fe6c9c6e3e3e33 Jan Kara 2022-06-23 1975 if (flags & BDP_ASYNC) {
fe6c9c6e3e3e33 Jan Kara 2022-06-23 1976 ret = -EAGAIN;
fe6c9c6e3e3e33 Jan Kara 2022-06-23 1977 break;
fe6c9c6e3e3e33 Jan Kara 2022-06-23 1978 }
499d05ecf990a7 Jan Kara 2011-11-16 1979 __set_current_state(TASK_KILLABLE);
f814bdda774c18 Jan Kara 2024-01-23 1980 bdi->last_bdp_sleep = jiffies;
d25105e8911bff Wu Fengguang 2009-10-09 1981 io_schedule_timeout(pause);
87c6a9b253520b Jens Axboe 2009-09-17 1982
83712358ba0a14 Wu Fengguang 2011-06-11 1983 current->dirty_paused_when = now + pause;
83712358ba0a14 Wu Fengguang 2011-06-11 1984 current->nr_dirtied = 0;
7ccb9ad5364d6a Wu Fengguang 2011-11-30 1985 current->nr_dirtied_pause = nr_dirtied_pause;
83712358ba0a14 Wu Fengguang 2011-06-11 1986
ffd1f609ab1053 Wu Fengguang 2011-06-19 1987 /*
2bc00aef030f4f Tejun Heo 2015-05-22 1988 * This is typically equal to (dirty < thresh) and can also
2bc00aef030f4f Tejun Heo 2015-05-22 1989 * keep "1000+ dd on a slow USB stick" under control.
ffd1f609ab1053 Wu Fengguang 2011-06-19 1990 */
1df647197c5b8a Wu Fengguang 2011-11-13 1991 if (task_ratelimit)
ffd1f609ab1053 Wu Fengguang 2011-06-19 1992 break;
499d05ecf990a7 Jan Kara 2011-11-16 1993
c5c6343c4d75f9 Wu Fengguang 2011-12-02 1994 /*
f0953a1bbaca71 Ingo Molnar 2021-05-06 1995 * In the case of an unresponsive NFS server and the NFS dirty
de1fff37b2781f Tejun Heo 2015-05-22 1996 * pages exceeds dirty_thresh, give the other good wb's a pipe
c5c6343c4d75f9 Wu Fengguang 2011-12-02 1997 * to go through, so that tasks on them still remain responsive.
c5c6343c4d75f9 Wu Fengguang 2011-12-02 1998 *
3f8b6fb7f279c7 Masahiro Yamada 2017-02-27 1999 * In theory 1 page is enough to keep the consumer-producer
c5c6343c4d75f9 Wu Fengguang 2011-12-02 2000 * pipe going: the flusher cleans 1 page => the task dirties 1
de1fff37b2781f Tejun Heo 2015-05-22 2001 * more page. However wb_dirty has accounting errors. So use
93f78d882865cb Tejun Heo 2015-05-22 2002 * the larger and more IO friendly wb_stat_error.
c5c6343c4d75f9 Wu Fengguang 2011-12-02 2003 */
2bce774e8245e9 Wang Long 2017-11-15 2004 if (sdtc->wb_dirty <= wb_stat_error())
c5c6343c4d75f9 Wu Fengguang 2011-12-02 2005 break;
c5c6343c4d75f9 Wu Fengguang 2011-12-02 2006
499d05ecf990a7 Jan Kara 2011-11-16 2007 if (fatal_signal_pending(current))
499d05ecf990a7 Jan Kara 2011-11-16 2008 break;
^1da177e4c3f41 Linus Torvalds 2005-04-16 2009 }
fe6c9c6e3e3e33 Jan Kara 2022-06-23 2010 return ret;
^1da177e4c3f41 Linus Torvalds 2005-04-16 2011 }

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


2024-05-01 17:15:31

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 06/10] writeback: Factor out code of freerun to remove repeated code

On Mon, Apr 29, 2024 at 11:47:34AM +0800, Kemeng Shi wrote:
> Factor out code of freerun into new helper functions domain_poll_intv and
> domain_dirty_freerun to remove repeated code.
>
> Signed-off-by: Kemeng Shi <[email protected]>

Acked-by: Tejun Heo <[email protected]>

with one nit:

> +/*
> + * Throttle it only when the background writeback cannot
> + * catch-up. This avoids (excessively) small writeouts
> + * when the wb limits are ramping up in case of !strictlimit.
> + *
> + * In strictlimit case make decision based on the wb counters
> + * and limits. Small writeouts when the wb limits are ramping
> + * up are the price we consciously pay for strictlimit-ing.
> + */

Can you please reflow the above to 80col or at least seventy something?

Thanks.

--
tejun

2024-05-01 17:19:11

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 07/10] writeback: factor out wb_dirty_freerun to remove more repeated freerun code

On Mon, Apr 29, 2024 at 11:47:35AM +0800, Kemeng Shi wrote:
..
> +static void wb_dirty_freerun(struct dirty_throttle_control *dtc,
> + bool strictlimit)
> +{
..
> + /*
> + * LOCAL_THROTTLE tasks must not be throttled when below the per-wb
> + * freerun ceiling.
> + */
> + if (!(current->flags & PF_LOCAL_THROTTLE))
> + return;

Shouldn't this set free_run to true?

Also, wouldn't it be better if these functions return bool instead of
recording the result in dtc->freerun?

Thanks.

--
tejun

2024-05-01 17:22:00

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 08/10] writeback: factor out balance_domain_limits to remove repeated code

> @@ -1809,19 +1817,13 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
>
> nr_dirty = global_node_page_state(NR_FILE_DIRTY);
>
> - domain_dirty_avail(gdtc, false);
> - domain_dirty_limits(gdtc);
> - domain_dirty_freerun(gdtc, strictlimit);
> -
> - if (mdtc) {
> + balance_domain_limits(gdtc, strictlimit);
> + if (mdtc)
> /*
> * If @wb belongs to !root memcg, repeat the same
> * basic calculations for the memcg domain.
> */
> - domain_dirty_avail(mdtc, false);
> - domain_dirty_limits(mdtc);
> - domain_dirty_freerun(mdtc, strictlimit);
> - }
> + balance_domain_limits(mdtc, strictlimit);

Please keep the braces with the intervening comment. Other than that,

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2024-05-01 17:29:02

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 09/10] writeback: factor out wb_dirty_exceeded to remove repeated code

On Mon, Apr 29, 2024 at 11:47:37AM +0800, Kemeng Shi wrote:
> Factor out wb_dirty_exceeded to remove repeated code
>
> Signed-off-by: Kemeng Shi <[email protected]>
> ---
> mm/page-writeback.c | 21 +++++++++++----------
> 1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 68ae4c90ce8b..26b638cc58c5 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -140,6 +140,7 @@ struct dirty_throttle_control {
>
> unsigned long pos_ratio;
> bool freerun;
> + bool dirty_exceeded;

Can you try making the function return bool? That or collect dtc setup into
a single function which takes flags to initialize different parts? It can
become pretty error-prone to keep partially storing results in the struct.

Thanks.

--
tejun

2024-05-06 01:38:45

by Kemeng Shi

[permalink] [raw]
Subject: Re: [PATCH 06/10] writeback: Factor out code of freerun to remove repeated code



on 5/2/2024 1:15 AM, Tejun Heo wrote:
>> +/*
>> + * Throttle it only when the background writeback cannot
>> + * catch-up. This avoids (excessively) small writeouts
>> + * when the wb limits are ramping up in case of !strictlimit.
>> + *
>> + * In strictlimit case make decision based on the wb counters
>> + * and limits. Small writeouts when the wb limits are ramping
>> + * up are the price we consciously pay for strictlimit-ing.
>> + */
> Can you please reflow the above to 80col or at least seventy something?
Sure, will do it in next version. Thanks.


2024-05-06 01:54:06

by Kemeng Shi

[permalink] [raw]
Subject: Re: [PATCH 07/10] writeback: factor out wb_dirty_freerun to remove more repeated freerun code


Hello,
on 5/2/2024 1:18 AM, Tejun Heo wrote:
> On Mon, Apr 29, 2024 at 11:47:35AM +0800, Kemeng Shi wrote:
> ...
>> +static void wb_dirty_freerun(struct dirty_throttle_control *dtc,
>> + bool strictlimit)
>> +{
> ...
>> + /*
>> + * LOCAL_THROTTLE tasks must not be throttled when below the per-wb
>> + * freerun ceiling.
>> + */
>> + if (!(current->flags & PF_LOCAL_THROTTLE))
>> + return;
>
> Shouldn't this set free_run to true?
Originally, we will go freerun if PF_LOCAL_THROTTLE is set and number of dirty
pages is under freerun ceil. So if PF_LOCAL_THROTTLE is *not* set, freerun
should be false. Maybe I miss something and please correct me, Thanks.
>
> Also, wouldn't it be better if these functions return bool instead of
> recording the result in dtc->freerun?
As I try to factor out balance_wb_limits to calculate freerun, dirty_exceeded
and position ratio of wb, so wb_dirty_freerun and wb_dirty_exceeded will be
called indirectly and balance_dirty_pages has to retrieve freerun and
dirty_exceeded from somewhere like dtc where position ratio is retrieved.
Would like to know any better idea.
Thanks.
>
> Thanks.
>


2024-05-06 02:15:31

by Kemeng Shi

[permalink] [raw]
Subject: Re: [PATCH 09/10] writeback: factor out wb_dirty_exceeded to remove repeated code


on 4/30/2024 3:24 PM, Dan Carpenter wrote:
> Hi Kemeng,
>
> kernel test robot noticed the following build warnings:
>
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Kemeng-Shi/writeback-factor-out-wb_bg_dirty_limits-to-remove-repeated-code/20240429-114903
> base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/r/20240429034738.138609-10-shikemeng%40huaweicloud.com
> patch subject: [PATCH 09/10] writeback: factor out wb_dirty_exceeded to remove repeated code
> config: i386-randconfig-141-20240429 (https://download.01.org/0day-ci/archive/20240430/[email protected]/config)
> compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <[email protected]>
> | Reported-by: Dan Carpenter <[email protected]>
> | Closes: https://lore.kernel.org/r/[email protected]/
>
> New smatch warnings:
> mm/page-writeback.c:1903 balance_dirty_pages() error: we previously assumed 'mdtc' could be null (see line 1886)
Will fix this in next version. Thanks a lot for the report.

Kemeng