2023-02-28 08:50:22

by Yosry Ahmed

[permalink] [raw]
Subject: [PATCH v1 0/2] Ignore non-LRU-based reclaim in memcg reclaim

Reclaimed pages through other means than LRU-based reclaim are tracked
through reclaim_state in struct scan_control, which is stashed in
current task_struct. These pages are added to the number of reclaimed
pages through LRUs. For memcg reclaim, these pages generally cannot be
linked to the memcg under reclaim and can cause an overestimated count
of reclaimed pages. This short series tries to address that.

Patch 1 is just refactoring updating reclaim_state into a helper
function, and renames reclaimed_slab to just reclaimed, with a comment
describing its true purpose.

Patch 2 ignores pages reclaimed outside of LRU reclaim in memcg reclaim.
The pages are uncharged anyway, so even if we end up under-reporting
reclaimed pages we will still succeed in making progress during
charging.

Do not let the diff stat trick you, patch 2 is a one-line change. All
the rest is moving a couple of functions around and a huge comment :)

RFC -> v1:
- Exported report_freed_pages in case XFS is built as a module (Matthew
Wilcox).
- Renamed reclaimed_slab to reclaim in previously missed MGLRU code.
- Refactored using reclaim_state to update sc->nr_reclaimed into a
helper and added an XL comment explaining why we ignore
reclaim_state->reclaimed in memcg reclaim (Johannes Weiner).

Yosry Ahmed (2):
mm: vmscan: refactor updating reclaimed pages in reclaim_state
mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim

fs/inode.c | 3 +-
fs/xfs/xfs_buf.c | 3 +-
include/linux/swap.h | 5 ++-
mm/slab.c | 3 +-
mm/slob.c | 6 ++--
mm/slub.c | 5 ++-
mm/vmscan.c | 79 +++++++++++++++++++++++++++++++++++---------
7 files changed, 74 insertions(+), 30 deletions(-)

--
2.39.2.722.g9855ee24e9-goog



2023-02-28 08:50:25

by Yosry Ahmed

[permalink] [raw]
Subject: [PATCH v1 1/2] mm: vmscan: refactor updating reclaimed pages in reclaim_state

During reclaim, we keep track of pages reclaimed from other means than
LRU-based reclaim through scan_control->reclaim_state->reclaimed_slab,
which we stash a pointer to in current task_struct.

However, we keep track of more than just reclaimed slab pages through
this. We also use it for clean file pages dropped through pruned inodes,
and xfs buffer pages freed. Rename reclaimed_slab to reclaimed, and add
a helper function that wraps updating it through current.

Signed-off-by: Yosry Ahmed <[email protected]>
---
fs/inode.c | 3 +--
fs/xfs/xfs_buf.c | 3 +--
include/linux/swap.h | 5 ++++-
mm/slab.c | 3 +--
mm/slob.c | 6 ++----
mm/slub.c | 5 ++---
mm/vmscan.c | 31 +++++++++++++++++++++++++------
7 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 4558dc2f1355..1022d8ac7205 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -864,8 +864,7 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
__count_vm_events(KSWAPD_INODESTEAL, reap);
else
__count_vm_events(PGINODESTEAL, reap);
- if (current->reclaim_state)
- current->reclaim_state->reclaimed_slab += reap;
+ report_freed_pages(reap);
}
iput(inode);
spin_lock(lru_lock);
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 54c774af6e1c..060079f1e966 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -286,8 +286,7 @@ xfs_buf_free_pages(
if (bp->b_pages[i])
__free_page(bp->b_pages[i]);
}
- if (current->reclaim_state)
- current->reclaim_state->reclaimed_slab += bp->b_page_count;
+ report_freed_pages(bp->b_page_count);

if (bp->b_pages != bp->b_page_array)
kmem_free(bp->b_pages);
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 209a425739a9..525f0ae442f9 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -153,13 +153,16 @@ union swap_header {
* memory reclaim
*/
struct reclaim_state {
- unsigned long reclaimed_slab;
+ /* pages reclaimed outside of LRU-based reclaim */
+ unsigned long reclaimed;
#ifdef CONFIG_LRU_GEN
/* per-thread mm walk data */
struct lru_gen_mm_walk *mm_walk;
#endif
};

+void report_freed_pages(unsigned long pages);
+
#ifdef __KERNEL__

struct address_space;
diff --git a/mm/slab.c b/mm/slab.c
index dabc2a671fc6..325634416aab 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1392,8 +1392,7 @@ static void kmem_freepages(struct kmem_cache *cachep, struct slab *slab)
smp_wmb();
__folio_clear_slab(folio);

- if (current->reclaim_state)
- current->reclaim_state->reclaimed_slab += 1 << order;
+ report_freed_pages(1 << order);
unaccount_slab(slab, order, cachep);
__free_pages(&folio->page, order);
}
diff --git a/mm/slob.c b/mm/slob.c
index fe567fcfa3a3..71ee00e9dd46 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -61,7 +61,7 @@
#include <linux/slab.h>

#include <linux/mm.h>
-#include <linux/swap.h> /* struct reclaim_state */
+#include <linux/swap.h> /* report_freed_pages() */
#include <linux/cache.h>
#include <linux/init.h>
#include <linux/export.h>
@@ -211,9 +211,7 @@ static void slob_free_pages(void *b, int order)
{
struct page *sp = virt_to_page(b);

- if (current->reclaim_state)
- current->reclaim_state->reclaimed_slab += 1 << order;
-
+ report_freed_pages(1 << order);
mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
-(PAGE_SIZE << order));
__free_pages(sp, order);
diff --git a/mm/slub.c b/mm/slub.c
index 39327e98fce3..165319bf11f1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -11,7 +11,7 @@
*/

#include <linux/mm.h>
-#include <linux/swap.h> /* struct reclaim_state */
+#include <linux/swap.h> /* report_freed_pages() */
#include <linux/module.h>
#include <linux/bit_spinlock.h>
#include <linux/interrupt.h>
@@ -2063,8 +2063,7 @@ static void __free_slab(struct kmem_cache *s, struct slab *slab)
/* Make the mapping reset visible before clearing the flag */
smp_wmb();
__folio_clear_slab(folio);
- if (current->reclaim_state)
- current->reclaim_state->reclaimed_slab += pages;
+ report_freed_pages(pages);
unaccount_slab(slab, order, s);
__free_pages(&folio->page, order);
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9c1c5e8b24b8..8846531e85a4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -200,6 +200,29 @@ static void set_task_reclaim_state(struct task_struct *task,
task->reclaim_state = rs;
}

+/*
+ * reclaim_report_freed_pages: report pages freed outside of LRU-based reclaim
+ * @pages: number of pages freed
+ *
+ * If the current process is undergoing a reclaim operation,
+ * increment the number of reclaimed pages by @pages.
+ */
+void report_freed_pages(unsigned long pages)
+{
+ if (current->reclaim_state)
+ current->reclaim_state->reclaimed += pages;
+}
+EXPORT_SYMBOL(report_freed_pages);
+
+static void add_non_vmscan_reclaimed(struct scan_control *sc,
+ struct reclaim_state *rs)
+{
+ if (rs) {
+ sc->nr_reclaimed += rs->reclaimed;
+ rs->reclaimed = 0;
+ }
+}
+
LIST_HEAD(shrinker_list);
DECLARE_RWSEM(shrinker_rwsem);

@@ -5346,8 +5369,7 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
sc->nr_reclaimed - reclaimed);

- sc->nr_reclaimed += current->reclaim_state->reclaimed_slab;
- current->reclaim_state->reclaimed_slab = 0;
+ add_non_vmscan_reclaimed(sc, current->reclaim_state);

return success ? MEMCG_LRU_YOUNG : 0;
}
@@ -6472,10 +6494,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)

shrink_node_memcgs(pgdat, sc);

- if (reclaim_state) {
- sc->nr_reclaimed += reclaim_state->reclaimed_slab;
- reclaim_state->reclaimed_slab = 0;
- }
+ add_non_vmscan_reclaimed(sc, reclaim_state);

/* Record the subtree's reclaim efficiency */
if (!sc->proactive)
--
2.39.2.722.g9855ee24e9-goog


2023-02-28 08:50:30

by Yosry Ahmed

[permalink] [raw]
Subject: [PATCH v1 2/2] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim

We keep track of different types of reclaimed pages through
reclaim_state->reclaimed, and we add them to the reported number of
reclaimed pages. For non-memcg reclaim, this makes sense. For memcg
reclaim, we have no clue if those pages are charged to the memcg under
reclaim.

Slab pages are shared by different memcgs, so a freed slab page may have
only been partially charged to the memcg under reclaim. The same goes
for clean file pages from pruned inodes (on highmem systems) or xfs
buffer pages, there is no way to link them to the memcg under reclaim.

Stop reporting those freed pages as reclaimed pages during memcg
reclaim. This should make the return value of writing to memory.reclaim,
and may help reduce unnecessary reclaim retries during memcg charging.

Generally, this should make the return value of
try_to_free_mem_cgroup_pages() more accurate. In some limited cases (e.g.
freed a slab page that was mostly charged to the memcg under reclaim),
the return value of try_to_free_mem_cgroup_pages() can be
underestimated, but this should be fine. The freed pages will be
uncharged anyway, and we can charge the memcg the next time around as we
usually do memcg reclaim in a retry loop.

Signed-off-by: Yosry Ahmed <[email protected]>
---
mm/vmscan.c | 50 +++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 39 insertions(+), 11 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8846531e85a4..c53659221965 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -188,6 +188,16 @@ struct scan_control {
*/
int vm_swappiness = 60;

+static bool cgroup_reclaim(struct scan_control *sc)
+{
+ return sc->target_mem_cgroup;
+}
+
+static bool global_reclaim(struct scan_control *sc)
+{
+ return !sc->target_mem_cgroup || mem_cgroup_is_root(sc->target_mem_cgroup);
+}
+
static void set_task_reclaim_state(struct task_struct *task,
struct reclaim_state *rs)
{
@@ -217,7 +227,35 @@ EXPORT_SYMBOL(report_freed_pages);
static void add_non_vmscan_reclaimed(struct scan_control *sc,
struct reclaim_state *rs)
{
- if (rs) {
+ /*
+ * Currently, reclaim_state->reclaimed includes three types of pages
+ * freed outside of vmscan:
+ * (1) Slab pages.
+ * (2) Clean file pages from pruned inodes.
+ * (3) XFS freed buffer pages.
+ *
+ * For all of these cases, we have no way of finding out whether these
+ * pages were related to the memcg under reclaim. For example, a freed
+ * slab page could have had only a single object charged to the memcg
+ * under reclaim. Also, populated inodes are not on shrinker LRUs
+ * anymore except on highmem systems.
+ *
+ * Instead of over-reporting the reclaimed pages in a memcg reclaim,
+ * only count such pages in system-wide reclaim. This prevents
+ * unnecessary retries during memcg charging and false positive from
+ * proactive reclaim (memory.reclaim).
+ *
+ * For uncommon cases were the freed pages were actually significantly
+ * charged to the memcg under reclaim, and we end up under-reporting, it
+ * should be fine. The freed pages will be uncharged anyway, even if
+ * they are not reported properly, and we will be able to make forward
+ * progress in charging (which is usually in a retry loop).
+ *
+ * We can go one step further, and report the uncharged objcg pages in
+ * memcg reclaim, to make reporting more accurate and reduce
+ * under-reporting, but it's probably not worth the complexity for now.
+ */
+ if (rs && !cgroup_reclaim(sc)) {
sc->nr_reclaimed += rs->reclaimed;
rs->reclaimed = 0;
}
@@ -463,16 +501,6 @@ void reparent_shrinker_deferred(struct mem_cgroup *memcg)
up_read(&shrinker_rwsem);
}

-static bool cgroup_reclaim(struct scan_control *sc)
-{
- return sc->target_mem_cgroup;
-}
-
-static bool global_reclaim(struct scan_control *sc)
-{
- return !sc->target_mem_cgroup || mem_cgroup_is_root(sc->target_mem_cgroup);
-}
-
/**
* writeback_throttling_sane - is the usual dirty throttling mechanism available?
* @sc: scan_control in question
--
2.39.2.722.g9855ee24e9-goog


2023-02-28 08:52:51

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] mm: vmscan: refactor updating reclaimed pages in reclaim_state

+Yu Zhao

On Tue, Feb 28, 2023 at 12:50 AM Yosry Ahmed <[email protected]> wrote:
>
> During reclaim, we keep track of pages reclaimed from other means than
> LRU-based reclaim through scan_control->reclaim_state->reclaimed_slab,
> which we stash a pointer to in current task_struct.
>
> However, we keep track of more than just reclaimed slab pages through
> this. We also use it for clean file pages dropped through pruned inodes,
> and xfs buffer pages freed. Rename reclaimed_slab to reclaimed, and add
> a helper function that wraps updating it through current.
>
> Signed-off-by: Yosry Ahmed <[email protected]>
> ---
> fs/inode.c | 3 +--
> fs/xfs/xfs_buf.c | 3 +--
> include/linux/swap.h | 5 ++++-
> mm/slab.c | 3 +--
> mm/slob.c | 6 ++----
> mm/slub.c | 5 ++---
> mm/vmscan.c | 31 +++++++++++++++++++++++++------
> 7 files changed, 36 insertions(+), 20 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 4558dc2f1355..1022d8ac7205 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -864,8 +864,7 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
> __count_vm_events(KSWAPD_INODESTEAL, reap);
> else
> __count_vm_events(PGINODESTEAL, reap);
> - if (current->reclaim_state)
> - current->reclaim_state->reclaimed_slab += reap;
> + report_freed_pages(reap);
> }
> iput(inode);
> spin_lock(lru_lock);
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 54c774af6e1c..060079f1e966 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -286,8 +286,7 @@ xfs_buf_free_pages(
> if (bp->b_pages[i])
> __free_page(bp->b_pages[i]);
> }
> - if (current->reclaim_state)
> - current->reclaim_state->reclaimed_slab += bp->b_page_count;
> + report_freed_pages(bp->b_page_count);
>
> if (bp->b_pages != bp->b_page_array)
> kmem_free(bp->b_pages);
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 209a425739a9..525f0ae442f9 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -153,13 +153,16 @@ union swap_header {
> * memory reclaim
> */
> struct reclaim_state {
> - unsigned long reclaimed_slab;
> + /* pages reclaimed outside of LRU-based reclaim */
> + unsigned long reclaimed;
> #ifdef CONFIG_LRU_GEN
> /* per-thread mm walk data */
> struct lru_gen_mm_walk *mm_walk;
> #endif
> };
>
> +void report_freed_pages(unsigned long pages);
> +
> #ifdef __KERNEL__
>
> struct address_space;
> diff --git a/mm/slab.c b/mm/slab.c
> index dabc2a671fc6..325634416aab 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1392,8 +1392,7 @@ static void kmem_freepages(struct kmem_cache *cachep, struct slab *slab)
> smp_wmb();
> __folio_clear_slab(folio);
>
> - if (current->reclaim_state)
> - current->reclaim_state->reclaimed_slab += 1 << order;
> + report_freed_pages(1 << order);
> unaccount_slab(slab, order, cachep);
> __free_pages(&folio->page, order);
> }
> diff --git a/mm/slob.c b/mm/slob.c
> index fe567fcfa3a3..71ee00e9dd46 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -61,7 +61,7 @@
> #include <linux/slab.h>
>
> #include <linux/mm.h>
> -#include <linux/swap.h> /* struct reclaim_state */
> +#include <linux/swap.h> /* report_freed_pages() */
> #include <linux/cache.h>
> #include <linux/init.h>
> #include <linux/export.h>
> @@ -211,9 +211,7 @@ static void slob_free_pages(void *b, int order)
> {
> struct page *sp = virt_to_page(b);
>
> - if (current->reclaim_state)
> - current->reclaim_state->reclaimed_slab += 1 << order;
> -
> + report_freed_pages(1 << order);
> mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE_B,
> -(PAGE_SIZE << order));
> __free_pages(sp, order);
> diff --git a/mm/slub.c b/mm/slub.c
> index 39327e98fce3..165319bf11f1 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -11,7 +11,7 @@
> */
>
> #include <linux/mm.h>
> -#include <linux/swap.h> /* struct reclaim_state */
> +#include <linux/swap.h> /* report_freed_pages() */
> #include <linux/module.h>
> #include <linux/bit_spinlock.h>
> #include <linux/interrupt.h>
> @@ -2063,8 +2063,7 @@ static void __free_slab(struct kmem_cache *s, struct slab *slab)
> /* Make the mapping reset visible before clearing the flag */
> smp_wmb();
> __folio_clear_slab(folio);
> - if (current->reclaim_state)
> - current->reclaim_state->reclaimed_slab += pages;
> + report_freed_pages(pages);
> unaccount_slab(slab, order, s);
> __free_pages(&folio->page, order);
> }
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 9c1c5e8b24b8..8846531e85a4 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -200,6 +200,29 @@ static void set_task_reclaim_state(struct task_struct *task,
> task->reclaim_state = rs;
> }
>
> +/*
> + * reclaim_report_freed_pages: report pages freed outside of LRU-based reclaim
> + * @pages: number of pages freed
> + *
> + * If the current process is undergoing a reclaim operation,
> + * increment the number of reclaimed pages by @pages.
> + */
> +void report_freed_pages(unsigned long pages)
> +{
> + if (current->reclaim_state)
> + current->reclaim_state->reclaimed += pages;
> +}
> +EXPORT_SYMBOL(report_freed_pages);
> +
> +static void add_non_vmscan_reclaimed(struct scan_control *sc,
> + struct reclaim_state *rs)
> +{
> + if (rs) {
> + sc->nr_reclaimed += rs->reclaimed;
> + rs->reclaimed = 0;
> + }
> +}
> +
> LIST_HEAD(shrinker_list);
> DECLARE_RWSEM(shrinker_rwsem);
>
> @@ -5346,8 +5369,7 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
> vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
> sc->nr_reclaimed - reclaimed);
>
> - sc->nr_reclaimed += current->reclaim_state->reclaimed_slab;
> - current->reclaim_state->reclaimed_slab = 0;
> + add_non_vmscan_reclaimed(sc, current->reclaim_state);
>
> return success ? MEMCG_LRU_YOUNG : 0;
> }
> @@ -6472,10 +6494,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
>
> shrink_node_memcgs(pgdat, sc);
>
> - if (reclaim_state) {
> - sc->nr_reclaimed += reclaim_state->reclaimed_slab;
> - reclaim_state->reclaimed_slab = 0;
> - }
> + add_non_vmscan_reclaimed(sc, reclaim_state);
>
> /* Record the subtree's reclaim efficiency */
> if (!sc->proactive)
> --
> 2.39.2.722.g9855ee24e9-goog
>

2023-02-28 11:45:41

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v1 2/2] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim

Hi Yosry,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on akpm-mm/mm-everything]
[cannot apply to vbabka-slab/for-next xfs-linux/for-next v6.2]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Yosry-Ahmed/mm-vmscan-refactor-updating-reclaimed-pages-in-reclaim_state/20230228-165214
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230228085002.2592473-3-yosryahmed%40google.com
patch subject: [PATCH v1 2/2] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim
config: x86_64-randconfig-a014-20230227 (https://download.01.org/0day-ci/archive/20230228/[email protected]/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/f6d2b849f186a927925a29e289d60895048550f5
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Yosry-Ahmed/mm-vmscan-refactor-updating-reclaimed-pages-in-reclaim_state/20230228-165214
git checkout f6d2b849f186a927925a29e289d60895048550f5
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 olddefconfig
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

>> mm/vmscan.c:549:13: error: redefinition of 'cgroup_reclaim'
549 | static bool cgroup_reclaim(struct scan_control *sc)
| ^~~~~~~~~~~~~~
mm/vmscan.c:191:13: note: previous definition of 'cgroup_reclaim' with type 'bool(struct scan_control *)' {aka '_Bool(struct scan_control *)'}
191 | static bool cgroup_reclaim(struct scan_control *sc)
| ^~~~~~~~~~~~~~
>> mm/vmscan.c:554:13: error: redefinition of 'global_reclaim'
554 | static bool global_reclaim(struct scan_control *sc)
| ^~~~~~~~~~~~~~
mm/vmscan.c:196:13: note: previous definition of 'global_reclaim' with type 'bool(struct scan_control *)' {aka '_Bool(struct scan_control *)'}
196 | static bool global_reclaim(struct scan_control *sc)
| ^~~~~~~~~~~~~~
mm/vmscan.c:196:13: warning: 'global_reclaim' defined but not used [-Wunused-function]


vim +/cgroup_reclaim +549 mm/vmscan.c

86750830468506 Yang Shi 2021-05-04 548
b5ead35e7e1d34 Johannes Weiner 2019-11-30 @549 static bool cgroup_reclaim(struct scan_control *sc)
89b5fae5368f6a Johannes Weiner 2012-01-12 550 {
b5ead35e7e1d34 Johannes Weiner 2019-11-30 551 return false;
89b5fae5368f6a Johannes Weiner 2012-01-12 552 }
97c9341f727105 Tejun Heo 2015-05-22 553
a579086c99ed70 Yu Zhao 2022-12-21 @554 static bool global_reclaim(struct scan_control *sc)
a579086c99ed70 Yu Zhao 2022-12-21 555 {
a579086c99ed70 Yu Zhao 2022-12-21 556 return true;
a579086c99ed70 Yu Zhao 2022-12-21 557 }
a579086c99ed70 Yu Zhao 2022-12-21 558

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

2023-02-28 11:57:14

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v1 2/2] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim

Hi Yosry,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master next-20230228]
[cannot apply to vbabka-slab/for-next xfs-linux/for-next v6.2]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Yosry-Ahmed/mm-vmscan-refactor-updating-reclaimed-pages-in-reclaim_state/20230228-165214
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230228085002.2592473-3-yosryahmed%40google.com
patch subject: [PATCH v1 2/2] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim
config: i386-randconfig-a002-20230227 (https://download.01.org/0day-ci/archive/20230228/[email protected]/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/f6d2b849f186a927925a29e289d60895048550f5
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Yosry-Ahmed/mm-vmscan-refactor-updating-reclaimed-pages-in-reclaim_state/20230228-165214
git checkout f6d2b849f186a927925a29e289d60895048550f5
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

>> mm/vmscan.c:549:13: error: redefinition of 'cgroup_reclaim'
static bool cgroup_reclaim(struct scan_control *sc)
^
mm/vmscan.c:191:13: note: previous definition is here
static bool cgroup_reclaim(struct scan_control *sc)
^
>> mm/vmscan.c:554:13: error: redefinition of 'global_reclaim'
static bool global_reclaim(struct scan_control *sc)
^
mm/vmscan.c:196:13: note: previous definition is here
static bool global_reclaim(struct scan_control *sc)
^
2 errors generated.


vim +/cgroup_reclaim +549 mm/vmscan.c

86750830468506 Yang Shi 2021-05-04 548
b5ead35e7e1d34 Johannes Weiner 2019-11-30 @549 static bool cgroup_reclaim(struct scan_control *sc)
89b5fae5368f6a Johannes Weiner 2012-01-12 550 {
b5ead35e7e1d34 Johannes Weiner 2019-11-30 551 return false;
89b5fae5368f6a Johannes Weiner 2012-01-12 552 }
97c9341f727105 Tejun Heo 2015-05-22 553
a579086c99ed70 Yu Zhao 2022-12-21 @554 static bool global_reclaim(struct scan_control *sc)
a579086c99ed70 Yu Zhao 2022-12-21 555 {
a579086c99ed70 Yu Zhao 2022-12-21 556 return true;
a579086c99ed70 Yu Zhao 2022-12-21 557 }
a579086c99ed70 Yu Zhao 2022-12-21 558

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

2023-02-28 17:19:06

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH v1 2/2] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim

On Tue, Feb 28, 2023 at 3:56 AM kernel test robot <[email protected]> wrote:
>
> Hi Yosry,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on akpm-mm/mm-everything]
> [also build test ERROR on linus/master next-20230228]
> [cannot apply to vbabka-slab/for-next xfs-linux/for-next v6.2]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Yosry-Ahmed/mm-vmscan-refactor-updating-reclaimed-pages-in-reclaim_state/20230228-165214
> base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/r/20230228085002.2592473-3-yosryahmed%40google.com
> patch subject: [PATCH v1 2/2] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim
> config: i386-randconfig-a002-20230227 (https://download.01.org/0day-ci/archive/20230228/[email protected]/config)
> compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # https://github.com/intel-lab-lkp/linux/commit/f6d2b849f186a927925a29e289d60895048550f5
> git remote add linux-review https://github.com/intel-lab-lkp/linux
> git fetch --no-tags linux-review Yosry-Ahmed/mm-vmscan-refactor-updating-reclaimed-pages-in-reclaim_state/20230228-165214
> git checkout f6d2b849f186a927925a29e289d60895048550f5
> # save the config file
> mkdir build_dir && cp config build_dir/.config
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
>
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot <[email protected]>
> | Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
>
> All errors (new ones prefixed by >>):
>
> >> mm/vmscan.c:549:13: error: redefinition of 'cgroup_reclaim'
> static bool cgroup_reclaim(struct scan_control *sc)
> ^
> mm/vmscan.c:191:13: note: previous definition is here
> static bool cgroup_reclaim(struct scan_control *sc)
> ^
> >> mm/vmscan.c:554:13: error: redefinition of 'global_reclaim'
> static bool global_reclaim(struct scan_control *sc)
> ^
> mm/vmscan.c:196:13: note: previous definition is here
> static bool global_reclaim(struct scan_control *sc)
> ^
> 2 errors generated.

Ugh yeah I didn't realize I am moving the definitions from within an
#ifdef CONFIG_MEMCG. I will just leave the definitions as-is and add a
forward declaration before the definition of
add_non_vmscan_reclaimed(), should also reduce the churn in the diff.
Will wait for a bit before re-spinning to gather some feedback on the
current version first.

>
>
> vim +/cgroup_reclaim +549 mm/vmscan.c
>
> 86750830468506 Yang Shi 2021-05-04 548
> b5ead35e7e1d34 Johannes Weiner 2019-11-30 @549 static bool cgroup_reclaim(struct scan_control *sc)
> 89b5fae5368f6a Johannes Weiner 2012-01-12 550 {
> b5ead35e7e1d34 Johannes Weiner 2019-11-30 551 return false;
> 89b5fae5368f6a Johannes Weiner 2012-01-12 552 }
> 97c9341f727105 Tejun Heo 2015-05-22 553
> a579086c99ed70 Yu Zhao 2022-12-21 @554 static bool global_reclaim(struct scan_control *sc)
> a579086c99ed70 Yu Zhao 2022-12-21 555 {
> a579086c99ed70 Yu Zhao 2022-12-21 556 return true;
> a579086c99ed70 Yu Zhao 2022-12-21 557 }
> a579086c99ed70 Yu Zhao 2022-12-21 558
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests

2023-02-28 17:25:31

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH v1 2/2] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim

On Tue, Feb 28, 2023 at 9:18 AM Yosry Ahmed <[email protected]> wrote:
>
> On Tue, Feb 28, 2023 at 3:56 AM kernel test robot <[email protected]> wrote:
> >
> > Hi Yosry,
> >
> > Thank you for the patch! Yet something to improve:
> >
> > [auto build test ERROR on akpm-mm/mm-everything]
> > [also build test ERROR on linus/master next-20230228]
> > [cannot apply to vbabka-slab/for-next xfs-linux/for-next v6.2]
> > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > And when submitting patch, we suggest to use '--base' as documented in
> > https://git-scm.com/docs/git-format-patch#_base_tree_information]
> >
> > url: https://github.com/intel-lab-lkp/linux/commits/Yosry-Ahmed/mm-vmscan-refactor-updating-reclaimed-pages-in-reclaim_state/20230228-165214
> > base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> > patch link: https://lore.kernel.org/r/20230228085002.2592473-3-yosryahmed%40google.com
> > patch subject: [PATCH v1 2/2] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim
> > config: i386-randconfig-a002-20230227 (https://download.01.org/0day-ci/archive/20230228/[email protected]/config)
> > compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
> > reproduce (this is a W=1 build):
> > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> > chmod +x ~/bin/make.cross
> > # https://github.com/intel-lab-lkp/linux/commit/f6d2b849f186a927925a29e289d60895048550f5
> > git remote add linux-review https://github.com/intel-lab-lkp/linux
> > git fetch --no-tags linux-review Yosry-Ahmed/mm-vmscan-refactor-updating-reclaimed-pages-in-reclaim_state/20230228-165214
> > git checkout f6d2b849f186a927925a29e289d60895048550f5
> > # save the config file
> > mkdir build_dir && cp config build_dir/.config
> > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig
> > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
> >
> > If you fix the issue, kindly add following tag where applicable
> > | Reported-by: kernel test robot <[email protected]>
> > | Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
> >
> > All errors (new ones prefixed by >>):
> >
> > >> mm/vmscan.c:549:13: error: redefinition of 'cgroup_reclaim'
> > static bool cgroup_reclaim(struct scan_control *sc)
> > ^
> > mm/vmscan.c:191:13: note: previous definition is here
> > static bool cgroup_reclaim(struct scan_control *sc)
> > ^
> > >> mm/vmscan.c:554:13: error: redefinition of 'global_reclaim'
> > static bool global_reclaim(struct scan_control *sc)
> > ^
> > mm/vmscan.c:196:13: note: previous definition is here
> > static bool global_reclaim(struct scan_control *sc)
> > ^
> > 2 errors generated.
>
> Ugh yeah I didn't realize I am moving the definitions from within an
> #ifdef CONFIG_MEMCG. I will just leave the definitions as-is and add a
> forward declaration before the definition of
> add_non_vmscan_reclaimed(), should also reduce the churn in the diff.
> Will wait for a bit before re-spinning to gather some feedback on the
> current version first.

I can also just move all the reclaim state functions
(set_task_reclaim_state(), report_freed_pages(),
add_non_vmscan_reclaimed()) below that #ifdef CONFIG_MEMCG.
Might also name them more consistently.


>
> >
> >
> > vim +/cgroup_reclaim +549 mm/vmscan.c
> >
> > 86750830468506 Yang Shi 2021-05-04 548
> > b5ead35e7e1d34 Johannes Weiner 2019-11-30 @549 static bool cgroup_reclaim(struct scan_control *sc)
> > 89b5fae5368f6a Johannes Weiner 2012-01-12 550 {
> > b5ead35e7e1d34 Johannes Weiner 2019-11-30 551 return false;
> > 89b5fae5368f6a Johannes Weiner 2012-01-12 552 }
> > 97c9341f727105 Tejun Heo 2015-05-22 553
> > a579086c99ed70 Yu Zhao 2022-12-21 @554 static bool global_reclaim(struct scan_control *sc)
> > a579086c99ed70 Yu Zhao 2022-12-21 555 {
> > a579086c99ed70 Yu Zhao 2022-12-21 556 return true;
> > a579086c99ed70 Yu Zhao 2022-12-21 557 }
> > a579086c99ed70 Yu Zhao 2022-12-21 558
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests

2023-03-08 06:55:32

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Ignore non-LRU-based reclaim in memcg reclaim

On Tue, Feb 28, 2023 at 12:50 AM Yosry Ahmed <[email protected]> wrote:
>
> Reclaimed pages through other means than LRU-based reclaim are tracked
> through reclaim_state in struct scan_control, which is stashed in
> current task_struct. These pages are added to the number of reclaimed
> pages through LRUs. For memcg reclaim, these pages generally cannot be
> linked to the memcg under reclaim and can cause an overestimated count
> of reclaimed pages. This short series tries to address that.
>
> Patch 1 is just refactoring updating reclaim_state into a helper
> function, and renames reclaimed_slab to just reclaimed, with a comment
> describing its true purpose.
>
> Patch 2 ignores pages reclaimed outside of LRU reclaim in memcg reclaim.
> The pages are uncharged anyway, so even if we end up under-reporting
> reclaimed pages we will still succeed in making progress during
> charging.
>
> Do not let the diff stat trick you, patch 2 is a one-line change. All
> the rest is moving a couple of functions around and a huge comment :)
>
> RFC -> v1:
> - Exported report_freed_pages in case XFS is built as a module (Matthew
> Wilcox).
> - Renamed reclaimed_slab to reclaim in previously missed MGLRU code.
> - Refactored using reclaim_state to update sc->nr_reclaimed into a
> helper and added an XL comment explaining why we ignore
> reclaim_state->reclaimed in memcg reclaim (Johannes Weiner).
>
> Yosry Ahmed (2):
> mm: vmscan: refactor updating reclaimed pages in reclaim_state
> mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim
>
> fs/inode.c | 3 +-
> fs/xfs/xfs_buf.c | 3 +-
> include/linux/swap.h | 5 ++-
> mm/slab.c | 3 +-
> mm/slob.c | 6 ++--
> mm/slub.c | 5 ++-
> mm/vmscan.c | 79 +++++++++++++++++++++++++++++++++++---------
> 7 files changed, 74 insertions(+), 30 deletions(-)
>
> --
> 2.39.2.722.g9855ee24e9-goog
>

Friendly ping on this series, any comments or thoughts -- especially
on the memcg side?

2023-03-08 16:03:17

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Ignore non-LRU-based reclaim in memcg reclaim

Hello Yosry,

On Tue, Feb 28, 2023 at 08:50:00AM +0000, Yosry Ahmed wrote:
> Reclaimed pages through other means than LRU-based reclaim are tracked
> through reclaim_state in struct scan_control, which is stashed in
> current task_struct. These pages are added to the number of reclaimed
> pages through LRUs. For memcg reclaim, these pages generally cannot be
> linked to the memcg under reclaim and can cause an overestimated count
> of reclaimed pages. This short series tries to address that.

Could you please add more details on how this manifests as a problem
with real workloads?

> Patch 1 is just refactoring updating reclaim_state into a helper
> function, and renames reclaimed_slab to just reclaimed, with a comment
> describing its true purpose.

Looking through the code again, I don't think these helpers add value.

report_freed_pages() is fairly vague. Report to who? It abstracts only
two lines of code, and those two lines are more descriptive of what's
happening than the helper is. Just leave them open-coded.

add_non_vmanscan_reclaimed() may or may not add anything. But let's
take a step back. It only has two callsites because lrugen duplicates
the entire reclaim implementation, including the call to shrink_slab()
and the transfer of reclaim_state to sc->nr_reclaimed.

IMO the resulting code would overall be simpler, less duplicative and
easier to follow if you added a common shrink_slab_reclaim() that
takes sc, handles the transfer, and documents the memcg exception.

2023-03-08 18:02:44

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Ignore non-LRU-based reclaim in memcg reclaim

On Wed, Mar 8, 2023 at 8:00 AM Johannes Weiner <[email protected]> wrote:
>
> Hello Yosry,
>
> On Tue, Feb 28, 2023 at 08:50:00AM +0000, Yosry Ahmed wrote:
> > Reclaimed pages through other means than LRU-based reclaim are tracked
> > through reclaim_state in struct scan_control, which is stashed in
> > current task_struct. These pages are added to the number of reclaimed
> > pages through LRUs. For memcg reclaim, these pages generally cannot be
> > linked to the memcg under reclaim and can cause an overestimated count
> > of reclaimed pages. This short series tries to address that.
>
> Could you please add more details on how this manifests as a problem
> with real workloads?

We haven't observed problems in production workloads, but we have
observed problems in testing using memory.reclaim when sometimes a
write to memory.reclaim would succeed when we didn't fully reclaim the
requested amount. This leads to tests flaking sometimes, and we have
to look into the failures to find out if there is a real problem or
not.

>
> > Patch 1 is just refactoring updating reclaim_state into a helper
> > function, and renames reclaimed_slab to just reclaimed, with a comment
> > describing its true purpose.
>
> Looking through the code again, I don't think these helpers add value.
>
> report_freed_pages() is fairly vague. Report to who? It abstracts only
> two lines of code, and those two lines are more descriptive of what's
> happening than the helper is. Just leave them open-coded.

I agree the name is not great, I am usually bad at naming things and
hope people would point that out (like you're doing now). The reason I
added it is to contain the logic within mm/vmscan.c such that future
changes do not have to add noisy diffs to a lot of unrelated files. If
you have a better name that makes more sense to you please let me
know, otherwise I'm fine dropping the helper as well, no strong
opinions here.

>
> add_non_vmanscan_reclaimed() may or may not add anything. But let's
> take a step back. It only has two callsites because lrugen duplicates
> the entire reclaim implementation, including the call to shrink_slab()
> and the transfer of reclaim_state to sc->nr_reclaimed.
>
> IMO the resulting code would overall be simpler, less duplicative and
> easier to follow if you added a common shrink_slab_reclaim() that
> takes sc, handles the transfer, and documents the memcg exception.

IIUC you mean something like:

void shrink_slab_reclaim(struct scan_control *sc, pg_data_t *pgdat,
struct mem_cgroup *memcg)
{
shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);

/* very long comment */
if (current->reclaim_state && !cgroup_reclaim(sc)) {
sc->nr_reclaimed += current->reclaim_state->reclaimed;
current->reclaim_state->reclaimed = 0;
}
}

The difference would be that today we handle the transfer once after
we scan all memcgs in classic lruvec, while we do the transfer once
per-memcg in lrugen. With this change, we would be doing the transfer
once per-memcg for both, but I guess that's not a big deal.

What I don't like about this is that it doubles down on associating
the counter in reclaim_state with slab, which is the opposite of what
patch 1 does (renaming reclaimed_slab to just reclaimed). If we do
this, maybe it's better from a consistency perspective to leave it as
reclaimed_slab, with a comment announcing that this is mainly used for
slab, but others are piggybacking on it. It seems like whatever we do
is not going to be ideal in this case, but we should at least be
consistent. Either add shrink_slab_reclaim(), leave it as
reclaimed_slab and call out other users as piggybackers -- or make it
generic and separate from slab completely, with a separate helper to
do the transfer.

I do not have a strong opinion here, so let me know what you prefer.

2023-03-08 20:17:12

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Ignore non-LRU-based reclaim in memcg reclaim

On Wed, Mar 08, 2023 at 10:01:24AM -0800, Yosry Ahmed wrote:
> On Wed, Mar 8, 2023 at 8:00 AM Johannes Weiner <[email protected]> wrote:
> >
> > Hello Yosry,
> >
> > On Tue, Feb 28, 2023 at 08:50:00AM +0000, Yosry Ahmed wrote:
> > > Reclaimed pages through other means than LRU-based reclaim are tracked
> > > through reclaim_state in struct scan_control, which is stashed in
> > > current task_struct. These pages are added to the number of reclaimed
> > > pages through LRUs. For memcg reclaim, these pages generally cannot be
> > > linked to the memcg under reclaim and can cause an overestimated count
> > > of reclaimed pages. This short series tries to address that.
> >
> > Could you please add more details on how this manifests as a problem
> > with real workloads?
>
> We haven't observed problems in production workloads, but we have
> observed problems in testing using memory.reclaim when sometimes a
> write to memory.reclaim would succeed when we didn't fully reclaim the
> requested amount. This leads to tests flaking sometimes, and we have
> to look into the failures to find out if there is a real problem or
> not.

Ah, that would be great to have in the cover letter. Thanks!

Have you also tested this patch against prod without memory.reclaim?
Just to make sure there are no problems with cgroup OOMs or
similar. There shouldn't be, but, you know...

> > > Patch 1 is just refactoring updating reclaim_state into a helper
> > > function, and renames reclaimed_slab to just reclaimed, with a comment
> > > describing its true purpose.
> >
> > Looking through the code again, I don't think these helpers add value.
> >
> > report_freed_pages() is fairly vague. Report to who? It abstracts only
> > two lines of code, and those two lines are more descriptive of what's
> > happening than the helper is. Just leave them open-coded.
>
> I agree the name is not great, I am usually bad at naming things and
> hope people would point that out (like you're doing now). The reason I
> added it is to contain the logic within mm/vmscan.c such that future
> changes do not have to add noisy diffs to a lot of unrelated files. If
> you have a better name that makes more sense to you please let me
> know, otherwise I'm fine dropping the helper as well, no strong
> opinions here.

I tried to come up with something better, but wasn't happy with any of
the options, either. So I defaulted to just leaving it alone :-)

It's part of the shrinker API and the name hasn't changed since the
initial git import of the kernel tree. It should be fine, churn-wise.

> > add_non_vmanscan_reclaimed() may or may not add anything. But let's
> > take a step back. It only has two callsites because lrugen duplicates
> > the entire reclaim implementation, including the call to shrink_slab()
> > and the transfer of reclaim_state to sc->nr_reclaimed.
> >
> > IMO the resulting code would overall be simpler, less duplicative and
> > easier to follow if you added a common shrink_slab_reclaim() that
> > takes sc, handles the transfer, and documents the memcg exception.
>
> IIUC you mean something like:
>
> void shrink_slab_reclaim(struct scan_control *sc, pg_data_t *pgdat,
> struct mem_cgroup *memcg)
> {
> shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
>
> /* very long comment */
> if (current->reclaim_state && !cgroup_reclaim(sc)) {
> sc->nr_reclaimed += current->reclaim_state->reclaimed;
> current->reclaim_state->reclaimed = 0;
> }
> }

Sorry, I screwed up, that doesn't actually work.

reclaim_state is used by buffer heads freed in shrink_folio_list() ->
filemap_release_folio(). So flushing the count cannot be shrink_slab()
specific. Bummer. Your patch had it right by making a helper for just
flushing the reclaim state. But add_non_vmscan_reclaimed() is then
also not a great name because these frees are directly from vmscan.

Maybe simply flush_reclaim_state()?

As far as the name reclaimed_slab, I agree it's not optimal, although
90% accurate ;-) I wouldn't mind a rename to just 'reclaimed'.

2023-03-08 20:25:01

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Ignore non-LRU-based reclaim in memcg reclaim

On Wed, Mar 8, 2023 at 12:16 PM Johannes Weiner <[email protected]> wrote:
>
> On Wed, Mar 08, 2023 at 10:01:24AM -0800, Yosry Ahmed wrote:
> > On Wed, Mar 8, 2023 at 8:00 AM Johannes Weiner <[email protected]> wrote:
> > >
> > > Hello Yosry,
> > >
> > > On Tue, Feb 28, 2023 at 08:50:00AM +0000, Yosry Ahmed wrote:
> > > > Reclaimed pages through other means than LRU-based reclaim are tracked
> > > > through reclaim_state in struct scan_control, which is stashed in
> > > > current task_struct. These pages are added to the number of reclaimed
> > > > pages through LRUs. For memcg reclaim, these pages generally cannot be
> > > > linked to the memcg under reclaim and can cause an overestimated count
> > > > of reclaimed pages. This short series tries to address that.
> > >
> > > Could you please add more details on how this manifests as a problem
> > > with real workloads?
> >
> > We haven't observed problems in production workloads, but we have
> > observed problems in testing using memory.reclaim when sometimes a
> > write to memory.reclaim would succeed when we didn't fully reclaim the
> > requested amount. This leads to tests flaking sometimes, and we have
> > to look into the failures to find out if there is a real problem or
> > not.
>
> Ah, that would be great to have in the cover letter. Thanks!

Will do in the next version.

>
> Have you also tested this patch against prod without memory.reclaim?
> Just to make sure there are no problems with cgroup OOMs or
> similar. There shouldn't be, but, you know...

Honestly, no. I was debugging a test flake and I spotted that this was
the cause, came up with patches to address it, and sent it to the
mailing list for feedback. We did not want to merge it internally if
it's not going to land upstream -- with the rationale that making the
test more tolerant might be better than maintaining the patch
internally, although that is not ideal of course (as it can hide
actual failures from different sources).

>
> > > > Patch 1 is just refactoring updating reclaim_state into a helper
> > > > function, and renames reclaimed_slab to just reclaimed, with a comment
> > > > describing its true purpose.
> > >
> > > Looking through the code again, I don't think these helpers add value.
> > >
> > > report_freed_pages() is fairly vague. Report to who? It abstracts only
> > > two lines of code, and those two lines are more descriptive of what's
> > > happening than the helper is. Just leave them open-coded.
> >
> > I agree the name is not great, I am usually bad at naming things and
> > hope people would point that out (like you're doing now). The reason I
> > added it is to contain the logic within mm/vmscan.c such that future
> > changes do not have to add noisy diffs to a lot of unrelated files. If
> > you have a better name that makes more sense to you please let me
> > know, otherwise I'm fine dropping the helper as well, no strong
> > opinions here.
>
> I tried to come up with something better, but wasn't happy with any of
> the options, either. So I defaulted to just leaving it alone :-)
>
> It's part of the shrinker API and the name hasn't changed since the
> initial git import of the kernel tree. It should be fine, churn-wise.

Last attempt, just update_reclaim_state() (corresponding to
flush_reclaim_state() below). It doesn't tell a story, but neither
does incrementing a counter in current->reclaim_state. If that doesn't
make you happy I'll give up now and leave it as-is :)

>
> > > add_non_vmanscan_reclaimed() may or may not add anything. But let's
> > > take a step back. It only has two callsites because lrugen duplicates
> > > the entire reclaim implementation, including the call to shrink_slab()
> > > and the transfer of reclaim_state to sc->nr_reclaimed.
> > >
> > > IMO the resulting code would overall be simpler, less duplicative and
> > > easier to follow if you added a common shrink_slab_reclaim() that
> > > takes sc, handles the transfer, and documents the memcg exception.
> >
> > IIUC you mean something like:
> >
> > void shrink_slab_reclaim(struct scan_control *sc, pg_data_t *pgdat,
> > struct mem_cgroup *memcg)
> > {
> > shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
> >
> > /* very long comment */
> > if (current->reclaim_state && !cgroup_reclaim(sc)) {
> > sc->nr_reclaimed += current->reclaim_state->reclaimed;
> > current->reclaim_state->reclaimed = 0;
> > }
> > }
>
> Sorry, I screwed up, that doesn't actually work.
>
> reclaim_state is used by buffer heads freed in shrink_folio_list() ->
> filemap_release_folio(). So flushing the count cannot be shrink_slab()
> specific. Bummer. Your patch had it right by making a helper for just
> flushing the reclaim state. But add_non_vmscan_reclaimed() is then
> also not a great name because these frees are directly from vmscan.
>
> Maybe simply flush_reclaim_state()?

Sounds good and simple enough, I will use that for the next version.

>
> As far as the name reclaimed_slab, I agree it's not optimal, although
> 90% accurate ;-) I wouldn't mind a rename to just 'reclaimed'.

Got it.

2023-03-08 21:25:45

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Ignore non-LRU-based reclaim in memcg reclaim

On Wed, Mar 08, 2023 at 12:24:08PM -0800, Yosry Ahmed wrote:
> > I tried to come up with something better, but wasn't happy with any of
> > the options, either. So I defaulted to just leaving it alone :-)
> >
> > It's part of the shrinker API and the name hasn't changed since the
> > initial git import of the kernel tree. It should be fine, churn-wise.
>
> Last attempt, just update_reclaim_state() (corresponding to
> flush_reclaim_state() below). It doesn't tell a story, but neither
> does incrementing a counter in current->reclaim_state. If that doesn't
> make you happy I'll give up now and leave it as-is :)

This is used in different subsystem shrinkers outside mm/, so the
name needs to be correctly namespaced. Please prefix it with the
subsystem the function belongs to, at minimum.

mm_account_reclaimed_pages() is what is actually being done here.
It is self describing and leaves behind no ambiguity as to what is
being accounted and why, nor which subsystem the accounting belongs
to.

It doesn't matter what the internal mm/vmscan structures are called,
all we care about is telling the mm infrastructure how many extra
pages were freed by the shrinker....

-Dave.
--
Dave Chinner
[email protected]

2023-03-08 21:32:20

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Ignore non-LRU-based reclaim in memcg reclaim

On Wed, Mar 8, 2023 at 1:25 PM Dave Chinner <[email protected]> wrote:
>
> On Wed, Mar 08, 2023 at 12:24:08PM -0800, Yosry Ahmed wrote:
> > > I tried to come up with something better, but wasn't happy with any of
> > > the options, either. So I defaulted to just leaving it alone :-)
> > >
> > > It's part of the shrinker API and the name hasn't changed since the
> > > initial git import of the kernel tree. It should be fine, churn-wise.
> >
> > Last attempt, just update_reclaim_state() (corresponding to
> > flush_reclaim_state() below). It doesn't tell a story, but neither
> > does incrementing a counter in current->reclaim_state. If that doesn't
> > make you happy I'll give up now and leave it as-is :)
>
> This is used in different subsystem shrinkers outside mm/, so the
> name needs to be correctly namespaced. Please prefix it with the
> subsystem the function belongs to, at minimum.
>
> mm_account_reclaimed_pages() is what is actually being done here.
> It is self describing and leaves behind no ambiguity as to what is
> being accounted and why, nor which subsystem the accounting belongs
> to.
>
> It doesn't matter what the internal mm/vmscan structures are called,
> all we care about is telling the mm infrastructure how many extra
> pages were freed by the shrinker....

mm_account_reclaimed_pages() sounds good to me. We can also do
something more specific if Johannes has any ideas. I do not have a
strong opinion here at all, I just prefer having a helper to leaving
it open-coded.

Thanks!

>
> -Dave.
> --
> Dave Chinner
> [email protected]

2023-03-09 04:08:31

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH v1 0/2] Ignore non-LRU-based reclaim in memcg reclaim

On Thu, Mar 09, 2023 at 08:25:29AM +1100, Dave Chinner wrote:
> On Wed, Mar 08, 2023 at 12:24:08PM -0800, Yosry Ahmed wrote:
> > > I tried to come up with something better, but wasn't happy with any of
> > > the options, either. So I defaulted to just leaving it alone :-)
> > >
> > > It's part of the shrinker API and the name hasn't changed since the
> > > initial git import of the kernel tree. It should be fine, churn-wise.
> >
> > Last attempt, just update_reclaim_state() (corresponding to
> > flush_reclaim_state() below). It doesn't tell a story, but neither
> > does incrementing a counter in current->reclaim_state. If that doesn't
> > make you happy I'll give up now and leave it as-is :)
>
> This is used in different subsystem shrinkers outside mm/, so the
> name needs to be correctly namespaced. Please prefix it with the
> subsystem the function belongs to, at minimum.
>
> mm_account_reclaimed_pages() is what is actually being done here.
> It is self describing and leaves behind no ambiguity as to what is
> being accounted and why, nor which subsystem the accounting belongs
> to.
>
> It doesn't matter what the internal mm/vmscan structures are called,
> all we care about is telling the mm infrastructure how many extra
> pages were freed by the shrinker....

My first preference would still be to just leave it. IMO that one line
saved in a small handful of places isn't worth the indirection,
obscuring the `current' deref etc.

But mm_account_reclaimed_pages() works for me if you really want to
enapsulate it.