2024-01-11 12:25:27

by zhiguojiang

[permalink] [raw]
Subject: [PATCH v1 1/2] mm:vmscan: fix workingset eviction memcg issue

The parameter of target_memcg is NULL in workingset_eviction(), and
the lruvec obtained by mem_cgroup_lruvec(target_memcg, pgdat) is always
root_mem_cgroup->lruvec, rather than the lruvec of mem_cgroup where
folio is actually located.

Fix target_memcg to the memcg obtained by folio_memcg(folio).

Signed-off-by: Zhiguo Jiang <[email protected]>
---
include/linux/swap.h | 2 +-
mm/vmscan.c | 11 +++++------
mm/workingset.c | 6 +++---
3 files changed, 9 insertions(+), 10 deletions(-)
mode change 100644 => 100755 include/linux/swap.h
mode change 100644 => 100755 mm/vmscan.c
mode change 100644 => 100755 mm/workingset.c

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 41e4b484bc34..4de61f158903
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -346,7 +346,7 @@ static inline swp_entry_t page_swap_entry(struct page *page)
/* linux/mm/workingset.c */
bool workingset_test_recent(void *shadow, bool file, bool *workingset);
void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages);
-void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg);
+void *workingset_eviction(struct folio *folio);
void workingset_refault(struct folio *folio, void *shadow);
void workingset_activation(struct folio *folio);

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 91e7d334a7ca..8a1fbdaca042
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -693,7 +693,7 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
* gets returned with a refcount of 0.
*/
static int __remove_mapping(struct address_space *mapping, struct folio *folio,
- bool reclaimed, struct mem_cgroup *target_memcg)
+ bool reclaimed)
{
int refcount;
void *shadow = NULL;
@@ -742,7 +742,7 @@ static int __remove_mapping(struct address_space *mapping, struct folio *folio,
swp_entry_t swap = folio->swap;

if (reclaimed && !mapping_exiting(mapping))
- shadow = workingset_eviction(folio, target_memcg);
+ shadow = workingset_eviction(folio);
__delete_from_swap_cache(folio, swap, shadow);
mem_cgroup_swapout(folio, swap);
xa_unlock_irq(&mapping->i_pages);
@@ -769,7 +769,7 @@ static int __remove_mapping(struct address_space *mapping, struct folio *folio,
*/
if (reclaimed && folio_is_file_lru(folio) &&
!mapping_exiting(mapping) && !dax_mapping(mapping))
- shadow = workingset_eviction(folio, target_memcg);
+ shadow = workingset_eviction(folio);
__filemap_remove_folio(folio, shadow);
xa_unlock_irq(&mapping->i_pages);
if (mapping_shrinkable(mapping))
@@ -803,7 +803,7 @@ static int __remove_mapping(struct address_space *mapping, struct folio *folio,
*/
long remove_mapping(struct address_space *mapping, struct folio *folio)
{
- if (__remove_mapping(mapping, folio, false, NULL)) {
+ if (__remove_mapping(mapping, folio, false)) {
/*
* Unfreezing the refcount with 1 effectively
* drops the pagecache ref for us without requiring another
@@ -1417,8 +1417,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
*/
count_vm_events(PGLAZYFREED, nr_pages);
count_memcg_folio_events(folio, PGLAZYFREED, nr_pages);
- } else if (!mapping || !__remove_mapping(mapping, folio, true,
- sc->target_mem_cgroup))
+ } else if (!mapping || !__remove_mapping(mapping, folio, true))
goto keep_locked;

folio_unlock(folio);
diff --git a/mm/workingset.c b/mm/workingset.c
index 226012974328..f29396d4bf75
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -372,15 +372,15 @@ void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages)

/**
* workingset_eviction - note the eviction of a folio from memory
- * @target_memcg: the cgroup that is causing the reclaim
* @folio: the folio being evicted
*
* Return: a shadow entry to be stored in @folio->mapping->i_pages in place
* of the evicted @folio so that a later refault can be detected.
*/
-void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg)
+void *workingset_eviction(struct folio *folio)
{
struct pglist_data *pgdat = folio_pgdat(folio);
+ struct mem_cgroup *memcg = folio_memcg(folio);
unsigned long eviction;
struct lruvec *lruvec;
int memcgid;
@@ -393,7 +393,7 @@ void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg)
if (lru_gen_enabled())
return lru_gen_eviction(folio);

- lruvec = mem_cgroup_lruvec(target_memcg, pgdat);
+ lruvec = mem_cgroup_lruvec(memcg, pgdat);
/* XXX: target_memcg can be NULL, go through lruvec */
memcgid = mem_cgroup_id(lruvec_memcg(lruvec));
eviction = atomic_long_read(&lruvec->nonresident_age);
--
2.39.0



2024-01-11 13:41:37

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] mm:vmscan: fix workingset eviction memcg issue

On Thu, Jan 11, 2024 at 08:24:50PM +0800, Zhiguo Jiang wrote:
> The parameter of target_memcg is NULL in workingset_eviction(), and
> the lruvec obtained by mem_cgroup_lruvec(target_memcg, pgdat) is always
> root_mem_cgroup->lruvec, rather than the lruvec of mem_cgroup where
> folio is actually located.

WTF? No!

/*
* The memory cgroup that hit its limit and as a result is the
* primary target of this reclaim invocation.
*/
struct mem_cgroup *target_mem_cgroup;

The cgroup that is stored in the eviction cookie is the one whose
limit triggered the reclaim cycle. This is often several levels above
the cgroups that own the pages. Subsequent refaults need to be
evaluated at the eviction level:

/*
* The activation decision for this folio is made at the level
* where the eviction occurred, as that is where the LRU order
* during folio reclaim is being determined.
*
* However, the cgroup that will own the folio is the one that
* is actually experiencing the refault event.
*/

> Fix target_memcg to the memcg obtained by folio_memcg(folio).
>
> Signed-off-by: Zhiguo Jiang <[email protected]>

Nacked-by: Johannes Weiner <[email protected]>

Please take more time to read into the code you're proposing to
change. You made it sound like a trivial simplification, but this
totally screws up aging and pressure detection in containers.

2024-01-11 14:16:11

by zhiguojiang

[permalink] [raw]
Subject: Re: [PATCH v1 1/2] mm:vmscan: fix workingset eviction memcg issue



在 2024/1/11 21:41, Johannes Weiner 写道:
> On Thu, Jan 11, 2024 at 08:24:50PM +0800, Zhiguo Jiang wrote:
>> The parameter of target_memcg is NULL in workingset_eviction(), and
>> the lruvec obtained by mem_cgroup_lruvec(target_memcg, pgdat) is always
>> root_mem_cgroup->lruvec, rather than the lruvec of mem_cgroup where
>> folio is actually located.
> WTF? No!
>
> /*
> * The memory cgroup that hit its limit and as a result is the
> * primary target of this reclaim invocation.
> */
> struct mem_cgroup *target_mem_cgroup;
>
> The cgroup that is stored in the eviction cookie is the one whose
> limit triggered the reclaim cycle. This is often several levels above
> the cgroups that own the pages. Subsequent refaults need to be
> evaluated at the eviction level:
>
> /*
> * The activation decision for this folio is made at the level
> * where the eviction occurred, as that is where the LRU order
> * during folio reclaim is being determined.
> *
> * However, the cgroup that will own the folio is the one that
> * is actually experiencing the refault event.
> */
May I ask three questions: 1.I don't understand the meaning of this
paragraph. Can you explain it in detail ? 2.What are the characteristics
of folios managed by the root_mem_cgroup differring from other memcgs?
3.If shrink flow uses target_lruvec->anon_cost/file_cost, what is the
purposr of calculating the actual memcg's lruvec->anon_cost/file_cost in
lru_note_cost(), and the actual memcg's lruvec->anon_cost/file_cost
values are unused in shrink flow ? Thanks
>> Fix target_memcg to the memcg obtained by folio_memcg(folio).
>>
>> Signed-off-by: Zhiguo Jiang <[email protected]>
> Nacked-by: Johannes Weiner <[email protected]>
>
> Please take more time to read into the code you're proposing to
> change. You made it sound like a trivial simplification, but this
> totally screws up aging and pressure detection in containers.