2020-08-11 11:13:09

by Alex Shi

Subject: [Resend PATCH 1/6] mm/memcg: warning on !memcg after readahead page charged

Since readahead pages are charged to memcg too, in theory we no longer have to
check for this exception. Before safely removing all those checks, add a
warning for the unexpected !memcg case.
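
As a rough user-space sketch (not kernel code) of the warn-once behaviour the
new macro follows; dump_page_info() here is a made-up stand-in for dump_page():

#include <stdbool.h>
#include <stdio.h>

/* Made-up stand-in for the kernel's dump_page(). */
static void dump_page_info(const void *page, const char *reason)
{
        printf("page %p dumped because: %s\n", page, reason);
}

/*
 * Warn-once pattern: the condition is evaluated on every call, but the
 * page dump is emitted only the first time it evaluates true.
 */
#define WARN_ON_ONCE_PAGE(cond, page) ({                                \
        static bool __warned;                                           \
        bool __ret = !!(cond);                                          \
                                                                        \
        if (__ret && !__warned) {                                       \
                __warned = true;                                        \
                dump_page_info(page, "WARN_ON_ONCE_PAGE(" #cond ")");   \
        }                                                               \
        __ret;                                                          \
})

int main(void)
{
        void *memcg = NULL;
        int page;

        for (int i = 0; i < 3; i++)
                if (WARN_ON_ONCE_PAGE(!memcg, &page))   /* dumps only once */
                        fprintf(stderr, "call %d: memcg is NULL\n", i);

        return 0;
}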

Signed-off-by: Alex Shi <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
include/linux/mmdebug.h | 13 +++++++++++++
mm/memcontrol.c | 15 ++++++++-------
2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 2ad72d2c8cc5..4ed52879ce55 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -37,6 +37,18 @@
BUG(); \
} \
} while (0)
+#define VM_WARN_ON_ONCE_PAGE(cond, page) ({ \
+ static bool __section(.data.once) __warned; \
+ int __ret_warn_once = !!(cond); \
+ \
+ if (unlikely(__ret_warn_once && !__warned)) { \
+ dump_page(page, "VM_WARN_ON_ONCE_PAGE(" __stringify(cond)")");\
+ __warned = true; \
+ WARN_ON(1); \
+ } \
+ unlikely(__ret_warn_once); \
+})
+
#define VM_WARN_ON(cond) (void)WARN_ON(cond)
#define VM_WARN_ON_ONCE(cond) (void)WARN_ON_ONCE(cond)
#define VM_WARN_ONCE(cond, format...) (void)WARN_ONCE(cond, format)
@@ -48,6 +60,7 @@
#define VM_BUG_ON_MM(cond, mm) VM_BUG_ON(cond)
#define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)
#define VM_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond)
+#define VM_WARN_ON_ONCE_PAGE(cond, page) BUILD_BUG_ON_INVALID(cond)
#define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond)
#define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
#endif
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 130093bdf74b..299382fc55a9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1322,10 +1322,8 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
}

memcg = page->mem_cgroup;
- /*
- * Swapcache readahead pages are added to the LRU - and
- * possibly migrated - before they are charged.
- */
+ /* Readahead page is charged too, to see if other page uncharged */
+ VM_WARN_ON_ONCE_PAGE(!memcg, page);
if (!memcg)
memcg = root_mem_cgroup;

@@ -6906,8 +6904,9 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
if (newpage->mem_cgroup)
return;

- /* Swapcache readahead pages can get replaced before being charged */
memcg = oldpage->mem_cgroup;
+ /* Readahead page is charged too, to see if other page uncharged */
+ VM_WARN_ON_ONCE_PAGE(!memcg, oldpage);
if (!memcg)
return;

@@ -7104,7 +7103,8 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)

memcg = page->mem_cgroup;

- /* Readahead page, never charged */
+ /* Readahead page is charged too, to see if other page uncharged */
+ VM_WARN_ON_ONCE_PAGE(!memcg, page);
if (!memcg)
return;

@@ -7168,7 +7168,8 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)

memcg = page->mem_cgroup;

- /* Readahead page, never charged */
+ /* Readahead page is charged too, to see if other page uncharged */
+ VM_WARN_ON_ONCE_PAGE(!memcg, page);
if (!memcg)
return 0;

--
1.8.3.1


2020-08-11 11:14:34

by Alex Shi

Subject: [Resend PATCH 3/6] mm/thp: move lru_add_page_tail func to huge_memory.c

The function is only used in huge_memory.c; defining it in another file under a
CONFIG_TRANSPARENT_HUGEPAGE guard just looks weird.

Let's move it into huge_memory.c and make it static, as Hugh Dickins suggested.

Signed-off-by: Alex Shi <[email protected]>
Reviewed-by: Kirill A. Shutemov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
include/linux/swap.h | 2 --
mm/huge_memory.c | 30 ++++++++++++++++++++++++++++++
mm/swap.c | 33 ---------------------------------
3 files changed, 30 insertions(+), 35 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 661046994db4..43e6b3458f58 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -338,8 +338,6 @@ extern void lru_note_cost(struct lruvec *lruvec, bool file,
unsigned int nr_pages);
extern void lru_note_cost_page(struct page *);
extern void lru_cache_add(struct page *);
-extern void lru_add_page_tail(struct page *page, struct page *page_tail,
- struct lruvec *lruvec, struct list_head *head);
extern void activate_page(struct page *);
extern void mark_page_accessed(struct page *);
extern void lru_add_drain(void);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 90733cefa528..bc905e7079bf 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2315,6 +2315,36 @@ static void remap_page(struct page *page)
}
}

+static void lru_add_page_tail(struct page *page, struct page *page_tail,
+ struct lruvec *lruvec, struct list_head *list)
+{
+ VM_BUG_ON_PAGE(!PageHead(page), page);
+ VM_BUG_ON_PAGE(PageCompound(page_tail), page);
+ VM_BUG_ON_PAGE(PageLRU(page_tail), page);
+ lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock);
+
+ if (!list)
+ SetPageLRU(page_tail);
+
+ if (likely(PageLRU(page)))
+ list_add_tail(&page_tail->lru, &page->lru);
+ else if (list) {
+ /* page reclaim is reclaiming a huge page */
+ get_page(page_tail);
+ list_add_tail(&page_tail->lru, list);
+ } else {
+ /*
+ * Head page has not yet been counted, as an hpage,
+ * so we must account for each subpage individually.
+ *
+ * Put page_tail on the list at the correct position
+ * so they all end up in order.
+ */
+ add_page_to_lru_list_tail(page_tail, lruvec,
+ page_lru(page_tail));
+ }
+}
+
static void __split_huge_page_tail(struct page *head, int tail,
struct lruvec *lruvec, struct list_head *list)
{
diff --git a/mm/swap.c b/mm/swap.c
index d16d65d9b4e0..c674fb441fe9 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -935,39 +935,6 @@ void __pagevec_release(struct pagevec *pvec)
}
EXPORT_SYMBOL(__pagevec_release);

-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-/* used by __split_huge_page_refcount() */
-void lru_add_page_tail(struct page *page, struct page *page_tail,
- struct lruvec *lruvec, struct list_head *list)
-{
- VM_BUG_ON_PAGE(!PageHead(page), page);
- VM_BUG_ON_PAGE(PageCompound(page_tail), page);
- VM_BUG_ON_PAGE(PageLRU(page_tail), page);
- lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock);
-
- if (!list)
- SetPageLRU(page_tail);
-
- if (likely(PageLRU(page)))
- list_add_tail(&page_tail->lru, &page->lru);
- else if (list) {
- /* page reclaim is reclaiming a huge page */
- get_page(page_tail);
- list_add_tail(&page_tail->lru, list);
- } else {
- /*
- * Head page has not yet been counted, as an hpage,
- * so we must account for each subpage individually.
- *
- * Put page_tail on the list at the correct position
- * so they all end up in order.
- */
- add_page_to_lru_list_tail(page_tail, lruvec,
- page_lru(page_tail));
- }
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-
static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
void *arg)
{
--
1.8.3.1

2020-08-11 11:15:28

by Alex Shi

Subject: [Resend PATCH 2/6] mm/memcg: remove useless check on page->mem_cgroup

If we disable memcg via cgroup_disable=memory, the swap charges are still
called. Let's return from the functions earlier and keep the WARN_ON
monitoring.

Signed-off-by: Alex Shi <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
mm/memcontrol.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 299382fc55a9..419cf565f40b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7098,6 +7098,9 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
VM_BUG_ON_PAGE(PageLRU(page), page);
VM_BUG_ON_PAGE(page_count(page), page);

+ if (mem_cgroup_disabled())
+ return;
+
if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
return;

@@ -7163,6 +7166,9 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
struct mem_cgroup *memcg;
unsigned short oldid;

+ if (mem_cgroup_disabled())
+ return 0;
+
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
return 0;

--
1.8.3.1

2020-08-11 11:31:17

by Michal Hocko

Subject: Re: [Resend PATCH 2/6] mm/memcg: remove useless check on page->mem_cgroup

The subject line looks like a leftover; it doesn't match the patch. Did you
mean
memcg: bail out early from swap accounting when memcg is disabled?

Btw. if this patch was first in the series then you wouldn't need to
mention the warnings that would trigger based on your previous patch.
I am fine with both ways, but mentioning the warning is useful.

On Tue 11-08-20 19:10:28, Alex Shi wrote:
> If we disabled memcg by cgroup_disable=memory, the swap charges are
> still called. Let's return from the funcs earlier and keep WARN_ON
> monitor.
>
> Signed-off-by: Alex Shi <[email protected]>
> Reviewed-by: Roman Gushchin <[email protected]>
> Acked-by: Michal Hocko <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Vladimir Davydov <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> mm/memcontrol.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 299382fc55a9..419cf565f40b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -7098,6 +7098,9 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
> VM_BUG_ON_PAGE(PageLRU(page), page);
> VM_BUG_ON_PAGE(page_count(page), page);
>
> + if (mem_cgroup_disabled())
> + return;
> +
> if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
> return;
>
> @@ -7163,6 +7166,9 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
> struct mem_cgroup *memcg;
> unsigned short oldid;
>
> + if (mem_cgroup_disabled())
> + return 0;
> +
> if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> return 0;
>
> --
> 1.8.3.1

--
Michal Hocko
SUSE Labs

2020-08-11 12:44:13

by Alex Shi

Subject: Re: [Resend PATCH 2/6] mm/memcg: remove useless check on page->mem_cgroup



On 2020/8/11 7:30 PM, Michal Hocko wrote:
> subject line looks like a left over. It doesn't match the path. Did you
> mean
> memcg: bail out early from swap accounting when memcg is disabled?

That's much better, thanks for the correction!

>
> Btw. if this patch was first in the series then you wouldn't need to
> mention the warnings that would trigger based on your previous patch.
> I am fine with both ways but mentioning the warning is usefule.

Right, but the patch is very simple, and without the warning message it doesn't
cause trouble. So removing the 'and keep WARN_ON monitor' wording makes sense too.

Do I need a resend for the commit log change?

Thanks a lot!
Alex

>
> On Tue 11-08-20 19:10:28, Alex Shi wrote:
>> If we disabled memcg by cgroup_disable=memory, the swap charges are
>> still called. Let's return from the funcs earlier and keep WARN_ON
>> monitor.
>>
>> Signed-off-by: Alex Shi <[email protected]>
>> Reviewed-by: Roman Gushchin <[email protected]>
>> Acked-by: Michal Hocko <[email protected]>
>> Cc: Johannes Weiner <[email protected]>
>> Cc: Michal Hocko <[email protected]>
>> Cc: Vladimir Davydov <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> ---
>> mm/memcontrol.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 299382fc55a9..419cf565f40b 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -7098,6 +7098,9 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
>> VM_BUG_ON_PAGE(PageLRU(page), page);
>> VM_BUG_ON_PAGE(page_count(page), page);
>>
>> + if (mem_cgroup_disabled())
>> + return;
>> +
>> if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
>> return;
>>
>> @@ -7163,6 +7166,9 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
>> struct mem_cgroup *memcg;
>> unsigned short oldid;
>>
>> + if (mem_cgroup_disabled())
>> + return 0;
>> +
>> if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
>> return 0;
>>
>> --
>> 1.8.3.1
>

2020-08-11 12:56:54

by Alex Shi

Subject: Re: [Resend PATCH 2/6] mm/memcg: remove useless check on page->mem_cgroup

From beeac61119ab39b1869c520c0f272fb8bab93765 Mon Sep 17 00:00:00 2001
From: Alex Shi <[email protected]>
Date: Wed, 5 Aug 2020 21:02:30 +0800
Subject: [PATCH 2/6] memcg: bail out early from swap accounting when memcg is
disabled

If we disable memcg via cgroup_disable=memory, the swap charges are still
called. Let's return from the functions earlier.

Signed-off-by: Alex Shi <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
mm/memcontrol.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 299382fc55a9..419cf565f40b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7098,6 +7098,9 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
VM_BUG_ON_PAGE(PageLRU(page), page);
VM_BUG_ON_PAGE(page_count(page), page);

+ if (mem_cgroup_disabled())
+ return;
+
if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
return;

@@ -7163,6 +7166,9 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
struct mem_cgroup *memcg;
unsigned short oldid;

+ if (mem_cgroup_disabled())
+ return 0;
+
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
return 0;

--
1.8.3.1

2020-08-11 13:57:27

by Michal Hocko

Subject: Re: [Resend PATCH 2/6] mm/memcg: remove useless check on page->mem_cgroup

On Tue 11-08-20 20:54:18, Alex Shi wrote:
> >From beeac61119ab39b1869c520c0f272fb8bab93765 Mon Sep 17 00:00:00 2001
> From: Alex Shi <[email protected]>
> Date: Wed, 5 Aug 2020 21:02:30 +0800
> Subject: [PATCH 2/6] memcg: bail out early from swap accounting when memcg is
> disabled
>
> If we disabled memcg by cgroup_disable=memory, the swap charges are
> still called. Let's return from the funcs earlier.

They are not, are they? page->memcg will be NULL and so the charge is
skipped and that will trigger a warning with your current ordering.

Let me repeat: either you put it first in the series and argue that we can
bail out early, or you keep the ordering and then this makes sure the warning
doesn't trigger.
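
To make the ordering concrete, here is a rough self-contained sketch with
made-up user-space stand-ins (these are not the real kernel functions):

#include <stdbool.h>
#include <stdio.h>

/* Made-up stand-ins for the kernel pieces being discussed. */
struct mem_cgroup { int id; };
struct page { struct mem_cgroup *mem_cgroup; };

static bool memcg_disabled = true;              /* models cgroup_disable=memory */

static bool mem_cgroup_disabled_stub(void)
{
        return memcg_disabled;
}

static void mem_cgroup_swapout_sketch(struct page *page)
{
        struct mem_cgroup *memcg;

        if (mem_cgroup_disabled_stub())         /* this patch: bail out early */
                return;

        memcg = page->mem_cgroup;               /* NULL when never charged */
        if (!memcg) {
                /*
                 * Without the early bail above, the warning added by the
                 * previous patch would fire here on every
                 * cgroup_disable=memory boot.
                 */
                fprintf(stderr, "warning: !memcg on swapout path\n");
                return;
        }

        /* ... the real function transfers the charge to swap here ... */
}

int main(void)
{
        struct page page = { .mem_cgroup = NULL };

        mem_cgroup_swapout_sketch(&page);       /* silent: early bail wins */

        memcg_disabled = false;                 /* memcg on, page uncharged */
        mem_cgroup_swapout_sketch(&page);       /* the !memcg path is reached */
        return 0;
}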

> Signed-off-by: Alex Shi <[email protected]>
> Reviewed-by: Roman Gushchin <[email protected]>
> Acked-by: Michal Hocko <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Vladimir Davydov <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> mm/memcontrol.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 299382fc55a9..419cf565f40b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -7098,6 +7098,9 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
> VM_BUG_ON_PAGE(PageLRU(page), page);
> VM_BUG_ON_PAGE(page_count(page), page);
>
> + if (mem_cgroup_disabled())
> + return;
> +
> if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
> return;
>
> @@ -7163,6 +7166,9 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
> struct mem_cgroup *memcg;
> unsigned short oldid;
>
> + if (mem_cgroup_disabled())
> + return 0;
> +
> if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> return 0;
>
> --
> 1.8.3.1

--
Michal Hocko
SUSE Labs

2020-08-12 03:28:14

by Alex Shi

Subject: Re: [Resend PATCH 2/6] mm/memcg: remove useless check on page->mem_cgroup



On 2020/8/11 9:56 PM, Michal Hocko wrote:
> On Tue 11-08-20 20:54:18, Alex Shi wrote:
>> >From beeac61119ab39b1869c520c0f272fb8bab93765 Mon Sep 17 00:00:00 2001
>> From: Alex Shi <[email protected]>
>> Date: Wed, 5 Aug 2020 21:02:30 +0800
>> Subject: [PATCH 2/6] memcg: bail out early from swap accounting when memcg is
> >> disabled
> >>
>> If we disabled memcg by cgroup_disable=memory, the swap charges are
>> still called. Let's return from the funcs earlier.
> They are not, are they? page->memcg will be NULL and so the charge is
> skipped and that will trigger a warning with your current ordering.

Hi Michal,

Thanks for the comment! Looks like we both agree the memcg wasn't charged, but
the functions are still called. :)

>
> Let me repeat again. Either you put it first in the series and argue
> that we can bail out early or keep the ordering then this makes sure the
> warning doesn't trigger.
>

Is the following commit log fine?

Thanks!
Alex

From 999b0fe5fc65865c3b59ff28500d45572a4a9570 Mon Sep 17 00:00:00 2001
From: Alex Shi <[email protected]>
Date: Wed, 5 Aug 2020 21:02:30 +0800
Subject: [PATCH 2/6] mm/memcg: bail out early from swap accounting when memcg
is disabled

If memcg is disabled by cgroup_disable=memory, page->memcg will be NULL,
so the charge is skipped and that will trigger a warning like the one below.
Let's return from the functions earlier.

---[ end trace f1f34bfc3b32ed2f ]---
anon flags:0x5005b48008000d(locked|uptodate|dirty|swapbacked)
raw: 005005b48008000d dead000000000100 dead000000000122 ffff8897c7c76ad1
raw: 0000000000000022 0000000000000000 0000000200000000 0000000000000000
page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg)

Signed-off-by: Alex Shi <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
mm/memcontrol.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 299382fc55a9..419cf565f40b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7098,6 +7098,9 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
VM_BUG_ON_PAGE(PageLRU(page), page);
VM_BUG_ON_PAGE(page_count(page), page);

+ if (mem_cgroup_disabled())
+ return;
+
if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
return;

@@ -7163,6 +7166,9 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
struct mem_cgroup *memcg;
unsigned short oldid;

+ if (mem_cgroup_disabled())
+ return 0;
+
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
return 0;

--
1.8.3.1

2020-08-13 06:22:08

by Michal Hocko

Subject: Re: [Resend PATCH 2/6] mm/memcg: remove useless check on page->mem_cgroup

On Wed 12-08-20 11:25:53, Alex Shi wrote:
> >From 999b0fe5fc65865c3b59ff28500d45572a4a9570 Mon Sep 17 00:00:00 2001
> From: Alex Shi <[email protected]>
> Date: Wed, 5 Aug 2020 21:02:30 +0800
> Subject: [PATCH 2/6] mm/memcg: bail out early from swap accounting when memcg
> is disabled
>
> If we disabled memcg by cgroup_disable=memory, page->memcg will be NULL
> and so the charge is skipped and that will trigger a warning like below.
> Let's return from the funcs earlier.
>
> ---[ end trace f1f34bfc3b32ed2f ]---
> anon flags:0x5005b48008000d(locked|uptodate|dirty|swapbacked)
> raw: 005005b48008000d dead000000000100 dead000000000122 ffff8897c7c76ad1
> raw: 0000000000000022 0000000000000000 0000000200000000 0000000000000000
> page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg)

Yes this is better. It would be even more informative if you added the
backtrace.

> Signed-off-by: Alex Shi <[email protected]>
> Reviewed-by: Roman Gushchin <[email protected]>
> Acked-by: Michal Hocko <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Vladimir Davydov <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> mm/memcontrol.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 299382fc55a9..419cf565f40b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -7098,6 +7098,9 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
> VM_BUG_ON_PAGE(PageLRU(page), page);
> VM_BUG_ON_PAGE(page_count(page), page);
>
> + if (mem_cgroup_disabled())
> + return;
> +
> if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
> return;
>
> @@ -7163,6 +7166,9 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
> struct mem_cgroup *memcg;
> unsigned short oldid;
>
> + if (mem_cgroup_disabled())
> + return 0;
> +
> if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> return 0;
>
> --
> 1.8.3.1

--
Michal Hocko
SUSE Labs

2020-08-13 09:48:58

by Alex Shi

Subject: Re: [Resend PATCH 2/6] mm/memcg: remove useless check on page->mem_cgroup



On 2020/8/13 2:20 PM, Michal Hocko wrote:
> On Wed 12-08-20 11:25:53, Alex Shi wrote:
>> >From 999b0fe5fc65865c3b59ff28500d45572a4a9570 Mon Sep 17 00:00:00 2001
>> From: Alex Shi <[email protected]>
>> Date: Wed, 5 Aug 2020 21:02:30 +0800
>> Subject: [PATCH 2/6] mm/memcg: bail out early from swap accounting when memcg
>> is disabled
>>
>> If we disabled memcg by cgroup_disable=memory, page->memcg will be NULL
>> and so the charge is skipped and that will trigger a warning like below.
>> Let's return from the funcs earlier.
>>
>> ---[ end trace f1f34bfc3b32ed2f ]---
>> anon flags:0x5005b48008000d(locked|uptodate|dirty|swapbacked)
>> raw: 005005b48008000d dead000000000100 dead000000000122 ffff8897c7c76ad1
>> raw: 0000000000000022 0000000000000000 0000000200000000 0000000000000000
>> page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg)
>
> Yes this is better. It would be even more informative if you added the
> backtrace.

The stack is a bit long.
I still don't know what caused the asm_exc_page_fault, and it seems the vma is
from the kernel, not from userspace.

From 999b0fe5fc65865c3b59ff28500d45572a4a9570 Mon Sep 17 00:00:00 2001
From: Alex Shi <[email protected]>
Date: Wed, 5 Aug 2020 21:02:30 +0800
Subject: [PATCH 2/6] mm/memcg: bail out early from swap accounting when memcg
is disabled

If memcg is disabled by cgroup_disable=memory, page->memcg will be NULL,
so the charge is skipped and that will trigger a warning like the one below.
Let's return from the functions earlier.

anon flags:0x5005b48008000d(locked|uptodate|dirty|swapbacked)
raw: 005005b48008000d dead000000000100 dead000000000122 ffff8897c7c76ad1
raw: 0000000000000022 0000000000000000 0000000200000000 0000000000000000
page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg)
...
RIP: 0010:vprintk_emit+0x1f7/0x260
Code: 00 84 d2 74 72 0f b6 15 27 58 64 01 48 c7 c0 00 d4 72 82 84 d2 74 09 f3 90 0f b6 10 84 d2 75 f7 e8 de 0d 00 00 4c 89 e7 57 9d <0f> 1f 44 00 00 e9 62 ff ff ff 80 3d 88 c9 3a 01 00 0f 85 54 fe ff
RSP: 0018:ffffc9000faab358 EFLAGS: 00000202
RAX: ffffffff8272d400 RBX: 000000000000005e RCX: ffff88afd80d0040
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000202
RBP: ffffc9000faab3a8 R08: ffffffff8272d440 R09: 0000000000022480
R10: 00120c77be68bfac R11: 0000000000cd7568 R12: 0000000000000202
R13: 0057ffffc0080005 R14: ffffffff820a0130 R15: ffffc9000faab3e8
? vprintk_emit+0x140/0x260
vprintk_default+0x1a/0x20
vprintk_func+0x4f/0xc4
? vprintk_func+0x4f/0xc4
printk+0x53/0x6a
? xas_load+0xc/0x80
__dump_page.cold.6+0xff/0x4ee
? xas_init_marks+0x23/0x50
? xas_store+0x30/0x40
? free_swap_slot+0x43/0xd0
? put_swap_page+0x119/0x320
? update_load_avg+0x82/0x580
dump_page+0x9/0xb
mem_cgroup_try_charge_swap+0x16e/0x1d0
get_swap_page+0x130/0x210
add_to_swap+0x41/0xc0
shrink_page_list+0x99e/0xdf0
shrink_inactive_list+0x199/0x360
shrink_lruvec+0x40d/0x650
? _cond_resched+0x14/0x30
? _cond_resched+0x14/0x30
shrink_node+0x226/0x6e0
do_try_to_free_pages+0xd0/0x400
try_to_free_pages+0xef/0x130
__alloc_pages_slowpath.constprop.127+0x38d/0xbd0
? ___slab_alloc+0x31d/0x6f0
__alloc_pages_nodemask+0x27f/0x2c0
alloc_pages_vma+0x75/0x220
shmem_alloc_page+0x46/0x90
? release_pages+0x1ae/0x410
shmem_alloc_and_acct_page+0x77/0x1c0
shmem_getpage_gfp+0x162/0x910
shmem_fault+0x74/0x210
? filemap_map_pages+0x29c/0x410
__do_fault+0x37/0x190
handle_mm_fault+0x120a/0x1770
exc_page_fault+0x251/0x450
? asm_exc_page_fault+0x8/0x30
asm_exc_page_fault+0x1e/0x30

Signed-off-by: Alex Shi <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
mm/memcontrol.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 299382fc55a9..419cf565f40b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7098,6 +7098,9 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
VM_BUG_ON_PAGE(PageLRU(page), page);
VM_BUG_ON_PAGE(page_count(page), page);

+ if (mem_cgroup_disabled())
+ return;
+
if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
return;

@@ -7163,6 +7166,9 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
struct mem_cgroup *memcg;
unsigned short oldid;

+ if (mem_cgroup_disabled())
+ return 0;
+
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
return 0;

--
1.8.3.1

2020-08-13 11:02:42

by Michal Hocko

Subject: Re: [Resend PATCH 2/6] mm/memcg: remove useless check on page->mem_cgroup

On Thu 13-08-20 17:45:19, Alex Shi wrote:
>
>
> On 2020/8/13 2:20 PM, Michal Hocko wrote:
> > On Wed 12-08-20 11:25:53, Alex Shi wrote:
> >> >From 999b0fe5fc65865c3b59ff28500d45572a4a9570 Mon Sep 17 00:00:00 2001
> >> From: Alex Shi <[email protected]>
> >> Date: Wed, 5 Aug 2020 21:02:30 +0800
> >> Subject: [PATCH 2/6] mm/memcg: bail out early from swap accounting when memcg
> >> is disabled
> >>
> >> If we disabled memcg by cgroup_disable=memory, page->memcg will be NULL
> >> and so the charge is skipped and that will trigger a warning like below.
> >> Let's return from the funcs earlier.
> >>
> >> ---[ end trace f1f34bfc3b32ed2f ]---
> >> anon flags:0x5005b48008000d(locked|uptodate|dirty|swapbacked)
> >> raw: 005005b48008000d dead000000000100 dead000000000122 ffff8897c7c76ad1
> >> raw: 0000000000000022 0000000000000000 0000000200000000 0000000000000000
> >> page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg)
> >
> > Yes this is better. It would be even more informative if you added the
> > backtrace.
>
> The stack is a bit long.

This doesn't matter. It is informative and potentially useful for future
reference.

Thanks!
--
Michal Hocko
SUSE Labs

2020-08-20 15:03:41

by Qian Cai

Subject: Re: [Resend PATCH 1/6] mm/memcg: warning on !memcg after readahead page charged

On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:
> Since readahead page is charged on memcg too, in theory we don't have to
> check this exception now. Before safely remove them all, add a warning
> for the unexpected !memcg.
>
> Signed-off-by: Alex Shi <[email protected]>
> Acked-by: Michal Hocko <[email protected]>

This will trigger,

[ 1863.916499] LTP: starting move_pages12
[ 1863.946520] page:000000008ccc1062 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1fd3c0
[ 1863.946553] head:000000008ccc1062 order:5 compound_mapcount:0 compound_pincount:0
[ 1863.946568] anon flags: 0x7fff800001000d(locked|uptodate|dirty|head)
[ 1863.946584] raw: 007fff800001000d c000000016ebfcd8 c000000016ebfcd8 c000001feaf46d59
[ 1863.946609] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 1863.946632] page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg)
[ 1863.946669] ------------[ cut here ]------------
[ 1863.946694] WARNING: CPU: 16 PID: 35307 at mm/memcontrol.c:6908 mem_cgroup_migrate+0x5f8/0x610
[ 1863.946708] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_spapr_tce vfio vfio_spapr_eeh loop kvm_hv kvm ip_tables x_tables sd_mod bnx2x tg3 ahci libahci libphy mdio firmware_class libata dm_mirror dm_region_hash dm_log dm_mod
[ 1863.946801] CPU: 16 PID: 35307 Comm: move_pages12 Not tainted 5.9.0-rc1-next-20200820 #4
[ 1863.946834] NIP: c0000000003fcb48 LR: c0000000003fcb38 CTR: 0000000000000000
[ 1863.946856] REGS: c000000016ebf6f0 TRAP: 0700 Not tainted (5.9.0-rc1-next-20200820)
[ 1863.946879] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28242882 XER: 00000000
[ 1863.946915] CFAR: c00000000032c644 IRQMASK: 0
GPR00: c0000000003fcb38 c000000016ebf980 c000000005923200 0000000000000031
GPR04: 0000000000000000 0000000000000000 0000000000000027 c000001ffd727190
GPR08: 0000000000000023 0000000000000001 c0000000058f3200 0000000000000001
GPR12: 0000000000002000 c000001ffffe3800 c000000000b26a68 0000000000000000
GPR16: c000000016ebfc20 c000000016ebfcd8 0000000000000020 0000000000000001
GPR20: c00c00080724f000 c0000000003c8770 0000000000000000 c000000016ebfcd0
GPR24: 0000000000000000 fffffffffffffff5 0000000000000002 0000000000000000
GPR28: 0000000000000000 0000000000000001 0000000000000000 c00c000007f4f000
[ 1863.947142] NIP [c0000000003fcb48] mem_cgroup_migrate+0x5f8/0x610
[ 1863.947164] LR [c0000000003fcb38] mem_cgroup_migrate+0x5e8/0x610
[ 1863.947185] Call Trace:
[ 1863.947203] [c000000016ebf980] [c0000000003fcb38] mem_cgroup_migrate+0x5e8/0x610 (unreliable)
[ 1863.947241] [c000000016ebf9c0] [c0000000003c9080] migrate_page_states+0x4e0/0xce0
[ 1863.947274] [c000000016ebf9f0] [c0000000003cbbec] migrate_page+0x8c/0x120
[ 1863.947307] [c000000016ebfa30] [c0000000003ccf10] move_to_new_page+0x190/0x670
[ 1863.947341] [c000000016ebfaf0] [c0000000003ced08] migrate_pages+0xfb8/0x1880
[ 1863.947365] [c000000016ebfc00] [c0000000003cf670] move_pages_and_store_status.isra.45+0xa0/0x160
[ 1863.947399] [c000000016ebfc80] [c0000000003cfef4] sys_move_pages+0x7c4/0xed0
[ 1863.947434] [c000000016ebfdc0] [c00000000002c678] system_call_exception+0xf8/0x1d0
[ 1863.947459] [c000000016ebfe20] [c00000000000d0a8] system_call_common+0xe8/0x218
[ 1863.947481] Instruction dump:
[ 1863.947502] 7fc3f378 4bfee82d 7c0802a6 3c82fb20 7fe3fb78 38844fc8 f8010050 4bf2fad5
[ 1863.947527] 60000000 39200001 3d42fffd 992a82fb <0fe00000> e8010050 eb810020 7c0803a6
[ 1863.947563] CPU: 16 PID: 35307 Comm: move_pages12 Not tainted 5.9.0-rc1-next-20200820 #4
[ 1863.947594] Call Trace:
[ 1863.947615] [c000000016ebf4d0] [c0000000006f6008] dump_stack+0xfc/0x174 (unreliable)
[ 1863.947642] [c000000016ebf520] [c0000000000c9004] __warn+0xc4/0x14c
[ 1863.947665] [c000000016ebf5b0] [c0000000006f4b68] report_bug+0x108/0x1f0
[ 1863.947689] [c000000016ebf650] [c0000000000234f4] program_check_exception+0x104/0x2e0
[ 1863.947724] [c000000016ebf680] [c000000000009664] program_check_common_virt+0x2c4/0x310
[ 1863.947751] --- interrupt: 700 at mem_cgroup_migrate+0x5f8/0x610
LR = mem_cgroup_migrate+0x5e8/0x610
[ 1863.947786] [c000000016ebf9c0] [c0000000003c9080] migrate_page_states+0x4e0/0xce0
[ 1863.947810] [c000000016ebf9f0] [c0000000003cbbec] migrate_page+0x8c/0x120
[ 1863.947843] [c000000016ebfa30] [c0000000003ccf10] move_to_new_page+0x190/0x670
[ 1863.947867] [c000000016ebfaf0] [c0000000003ced08] migrate_pages+0xfb8/0x1880
[ 1863.947901] [c000000016ebfc00] [c0000000003cf670] move_pages_and_store_status.isra.45+0xa0/0x160
[ 1863.947936] [c000000016ebfc80] [c0000000003cfef4] sys_move_pages+0x7c4/0xed0
[ 1863.947969] [c000000016ebfdc0] [c00000000002c678] system_call_exception+0xf8/0x1d0
[ 1863.948002] [c000000016ebfe20] [c00000000000d0a8] system_call_common+0xe8/0x218
[ 1863.948034] irq event stamp: 410
[ 1863.948054] hardirqs last enabled at (409): [<c000000000184564>] console_unlock+0x6b4/0x990
[ 1863.948092] hardirqs last disabled at (410): [<c00000000000965c>] program_check_common_virt+0x2bc/0x310
[ 1863.948126] softirqs last enabled at (0): [<c0000000000c59a8>] copy_process+0x788/0x1950
[ 1863.948229] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 1863.948316] ---[ end trace 74f8f4df751b0259 ]---

> Cc: Johannes Weiner <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Vladimir Davydov <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> include/linux/mmdebug.h | 13 +++++++++++++
> mm/memcontrol.c | 15 ++++++++-------
> 2 files changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
> index 2ad72d2c8cc5..4ed52879ce55 100644
> --- a/include/linux/mmdebug.h
> +++ b/include/linux/mmdebug.h
> @@ -37,6 +37,18 @@
> BUG(); \
> } \
> } while (0)
> +#define VM_WARN_ON_ONCE_PAGE(cond, page) ({ \
> + static bool __section(.data.once) __warned; \
> + int __ret_warn_once = !!(cond); \
> + \
> + if (unlikely(__ret_warn_once && !__warned)) { \
> + dump_page(page, "VM_WARN_ON_ONCE_PAGE(" __stringify(cond)")");\
> + __warned = true; \
> + WARN_ON(1); \
> + } \
> + unlikely(__ret_warn_once); \
> +})
> +
> #define VM_WARN_ON(cond) (void)WARN_ON(cond)
> #define VM_WARN_ON_ONCE(cond) (void)WARN_ON_ONCE(cond)
> #define VM_WARN_ONCE(cond, format...) (void)WARN_ONCE(cond, format)
> @@ -48,6 +60,7 @@
> #define VM_BUG_ON_MM(cond, mm) VM_BUG_ON(cond)
> #define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)
> #define VM_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond)
> +#define VM_WARN_ON_ONCE_PAGE(cond, page) BUILD_BUG_ON_INVALID(cond)
> #define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond)
> #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
> #endif
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 130093bdf74b..299382fc55a9 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1322,10 +1322,8 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
> }
>
> memcg = page->mem_cgroup;
> - /*
> - * Swapcache readahead pages are added to the LRU - and
> - * possibly migrated - before they are charged.
> - */
> + /* Readahead page is charged too, to see if other page uncharged */
> + VM_WARN_ON_ONCE_PAGE(!memcg, page);
> if (!memcg)
> memcg = root_mem_cgroup;
>
> @@ -6906,8 +6904,9 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
> if (newpage->mem_cgroup)
> return;
>
> - /* Swapcache readahead pages can get replaced before being charged */
> memcg = oldpage->mem_cgroup;
> + /* Readahead page is charged too, to see if other page uncharged */
> + VM_WARN_ON_ONCE_PAGE(!memcg, oldpage);
> if (!memcg)
> return;
>
> @@ -7104,7 +7103,8 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
>
> memcg = page->mem_cgroup;
>
> - /* Readahead page, never charged */
> + /* Readahead page is charged too, to see if other page uncharged */
> + VM_WARN_ON_ONCE_PAGE(!memcg, page);
> if (!memcg)
> return;
>
> @@ -7168,7 +7168,8 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
>
> memcg = page->mem_cgroup;
>
> - /* Readahead page, never charged */
> + /* Readahead page is charged too, to see if other page uncharged */
> + VM_WARN_ON_ONCE_PAGE(!memcg, page);
> if (!memcg)
> return 0;
>
> --
> 1.8.3.1
>
>

2020-08-21 08:04:11

by Michal Hocko

Subject: Re: [Resend PATCH 1/6] mm/memcg: warning on !memcg after readahead page charged

On Thu 20-08-20 10:58:51, Qian Cai wrote:
> On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:
> > Since readahead page is charged on memcg too, in theory we don't have to
> > check this exception now. Before safely remove them all, add a warning
> > for the unexpected !memcg.
> >
> > Signed-off-by: Alex Shi <[email protected]>
> > Acked-by: Michal Hocko <[email protected]>
>
> This will trigger,

Thanks for the report!

> [ 1863.916499] LTP: starting move_pages12
> [ 1863.946520] page:000000008ccc1062 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1fd3c0
> [ 1863.946553] head:000000008ccc1062 order:5 compound_mapcount:0 compound_pincount:0

Hmm, this is really unexpected. How did we get an order-5 page here? Is
this some special mapping that sys_move_pages should just ignore?

> [ 1863.946568] anon flags: 0x7fff800001000d(locked|uptodate|dirty|head)
> [ 1863.946584] raw: 007fff800001000d c000000016ebfcd8 c000000016ebfcd8 c000001feaf46d59
> [ 1863.946609] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> [ 1863.946632] page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg)
> [ 1863.946669] ------------[ cut here ]------------
> [ 1863.946694] WARNING: CPU: 16 PID: 35307 at mm/memcontrol.c:6908 mem_cgroup_migrate+0x5f8/0x610
> [ 1863.946708] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_spapr_tce vfio vfio_spapr_eeh loop kvm_hv kvm ip_tables x_tables sd_mod bnx2x tg3 ahci libahci libphy mdio firmware_class libata dm_mirror dm_region_hash dm_log dm_mod
> [ 1863.946801] CPU: 16 PID: 35307 Comm: move_pages12 Not tainted 5.9.0-rc1-next-20200820 #4
> [ 1863.946834] NIP: c0000000003fcb48 LR: c0000000003fcb38 CTR: 0000000000000000
> [ 1863.946856] REGS: c000000016ebf6f0 TRAP: 0700 Not tainted (5.9.0-rc1-next-20200820)
> [ 1863.946879] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28242882 XER: 00000000
> [ 1863.946915] CFAR: c00000000032c644 IRQMASK: 0
> GPR00: c0000000003fcb38 c000000016ebf980 c000000005923200 0000000000000031
> GPR04: 0000000000000000 0000000000000000 0000000000000027 c000001ffd727190
> GPR08: 0000000000000023 0000000000000001 c0000000058f3200 0000000000000001
> GPR12: 0000000000002000 c000001ffffe3800 c000000000b26a68 0000000000000000
> GPR16: c000000016ebfc20 c000000016ebfcd8 0000000000000020 0000000000000001
> GPR20: c00c00080724f000 c0000000003c8770 0000000000000000 c000000016ebfcd0
> GPR24: 0000000000000000 fffffffffffffff5 0000000000000002 0000000000000000
> GPR28: 0000000000000000 0000000000000001 0000000000000000 c00c000007f4f000
> [ 1863.947142] NIP [c0000000003fcb48] mem_cgroup_migrate+0x5f8/0x610
> [ 1863.947164] LR [c0000000003fcb38] mem_cgroup_migrate+0x5e8/0x610
> [ 1863.947185] Call Trace:
> [ 1863.947203] [c000000016ebf980] [c0000000003fcb38] mem_cgroup_migrate+0x5e8/0x610 (unreliable)
> [ 1863.947241] [c000000016ebf9c0] [c0000000003c9080] migrate_page_states+0x4e0/0xce0
> [ 1863.947274] [c000000016ebf9f0] [c0000000003cbbec] migrate_page+0x8c/0x120
> [ 1863.947307] [c000000016ebfa30] [c0000000003ccf10] move_to_new_page+0x190/0x670
> [ 1863.947341] [c000000016ebfaf0] [c0000000003ced08] migrate_pages+0xfb8/0x1880
> [ 1863.947365] [c000000016ebfc00] [c0000000003cf670] move_pages_and_store_status.isra.45+0xa0/0x160
> [ 1863.947399] [c000000016ebfc80] [c0000000003cfef4] sys_move_pages+0x7c4/0xed0
> [ 1863.947434] [c000000016ebfdc0] [c00000000002c678] system_call_exception+0xf8/0x1d0
> [ 1863.947459] [c000000016ebfe20] [c00000000000d0a8] system_call_common+0xe8/0x218
> [ 1863.947481] Instruction dump:
> [ 1863.947502] 7fc3f378 4bfee82d 7c0802a6 3c82fb20 7fe3fb78 38844fc8 f8010050 4bf2fad5
> [ 1863.947527] 60000000 39200001 3d42fffd 992a82fb <0fe00000> e8010050 eb810020 7c0803a6
> [ 1863.947563] CPU: 16 PID: 35307 Comm: move_pages12 Not tainted 5.9.0-rc1-next-20200820 #4
> [ 1863.947594] Call Trace:
> [ 1863.947615] [c000000016ebf4d0] [c0000000006f6008] dump_stack+0xfc/0x174 (unreliable)
> [ 1863.947642] [c000000016ebf520] [c0000000000c9004] __warn+0xc4/0x14c
> [ 1863.947665] [c000000016ebf5b0] [c0000000006f4b68] report_bug+0x108/0x1f0
> [ 1863.947689] [c000000016ebf650] [c0000000000234f4] program_check_exception+0x104/0x2e0
> [ 1863.947724] [c000000016ebf680] [c000000000009664] program_check_common_virt+0x2c4/0x310
> [ 1863.947751] --- interrupt: 700 at mem_cgroup_migrate+0x5f8/0x610
> LR = mem_cgroup_migrate+0x5e8/0x610
> [ 1863.947786] [c000000016ebf9c0] [c0000000003c9080] migrate_page_states+0x4e0/0xce0
> [ 1863.947810] [c000000016ebf9f0] [c0000000003cbbec] migrate_page+0x8c/0x120
> [ 1863.947843] [c000000016ebfa30] [c0000000003ccf10] move_to_new_page+0x190/0x670
> [ 1863.947867] [c000000016ebfaf0] [c0000000003ced08] migrate_pages+0xfb8/0x1880
> [ 1863.947901] [c000000016ebfc00] [c0000000003cf670] move_pages_and_store_status.isra.45+0xa0/0x160
> [ 1863.947936] [c000000016ebfc80] [c0000000003cfef4] sys_move_pages+0x7c4/0xed0
> [ 1863.947969] [c000000016ebfdc0] [c00000000002c678] system_call_exception+0xf8/0x1d0
> [ 1863.948002] [c000000016ebfe20] [c00000000000d0a8] system_call_common+0xe8/0x218
> [ 1863.948034] irq event stamp: 410
> [ 1863.948054] hardirqs last enabled at (409): [<c000000000184564>] console_unlock+0x6b4/0x990
> [ 1863.948092] hardirqs last disabled at (410): [<c00000000000965c>] program_check_common_virt+0x2bc/0x310
> [ 1863.948126] softirqs last enabled at (0): [<c0000000000c59a8>] copy_process+0x788/0x1950
> [ 1863.948229] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [ 1863.948316] ---[ end trace 74f8f4df751b0259 ]---
--
Michal Hocko
SUSE Labs

2020-08-21 12:43:07

by Qian Cai

Subject: Re: [Resend PATCH 1/6] mm/memcg: warning on !memcg after readahead page charged

On Fri, Aug 21, 2020 at 10:01:27AM +0200, Michal Hocko wrote:
> On Thu 20-08-20 10:58:51, Qian Cai wrote:
> > On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:
> > > Since readahead page is charged on memcg too, in theory we don't have to
> > > check this exception now. Before safely remove them all, add a warning
> > > for the unexpected !memcg.
> > >
> > > Signed-off-by: Alex Shi <[email protected]>
> > > Acked-by: Michal Hocko <[email protected]>
> >
> > This will trigger,
>
> Thanks for the report!
>
> > [ 1863.916499] LTP: starting move_pages12
> > [ 1863.946520] page:000000008ccc1062 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1fd3c0
> > [ 1863.946553] head:000000008ccc1062 order:5 compound_mapcount:0 compound_pincount:0
>
> Hmm, this is really unexpected. How did we get order-5 page here? Is
> this some special mappaing that sys_move_pages should just ignore?

Well, I thought everybody should be able to figure out where to find the LTP
test source code at this stage to see what it does. Anyway, the test simply
migrates hugepages while soft offlining, so order 5 is expected since that is a
2M hugepage on powerpc (also reproduced on x86 below). It might be easier to
reproduce using our linux-mm random bug collection on NUMA systems.

# git clone https://gitlab.com/cailca/linux-mm
# cd linux-mm; make
# ./random 1

The main code is here:

https://gitlab.com/cailca/linux-mm/-/blob/master/random.c#L786
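
For the page orders in the two reports, a quick stand-alone calculation
(assuming the 64K base page size implied by the 2M figure on powerpc, and 4K
pages on x86):

#include <stdio.h>

/* A compound page of a given order spans (base page size << order) bytes. */
static unsigned long compound_bytes(unsigned long base_bytes, unsigned int order)
{
        return base_bytes << order;
}

int main(void)
{
        /* powerpc report: order-5 head page, 64K base pages */
        printf("ppc64 : 64K << 5 = %lu KiB\n", compound_bytes(64UL * 1024, 5) / 1024);
        /* x86 report: order-9 head page, 4K base pages */
        printf("x86_64:  4K << 9 = %lu KiB\n", compound_bytes(4UL * 1024, 9) / 1024);
        return 0;
}

Both come out to 2048 KiB, i.e. the same 2M hugepage size.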

Reproduced on x86 as well:

[ 314.171411][ T1762] Offlined Pages 524288
[ 315.265413][ T1762] Soft offlining pfn 0x1e86e00 at process virtual address 0x7f221b800000
[ 315.307179][ T1762] soft offline: 0x1e86e00: hugepage isolation failed: 0, page count 2, type 3bfffc00001000f (locked|referenced|uptodate|dirty|head)
[ 315.372397][ T1762] Soft offlining pfn 0x1880000 at process virtual address 0x7f221ba00000
[ 315.372788][ T1939] page:000000004b7fe362 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1e86e00
[ 315.461283][ T1939] head:000000004b7fe362 order:9 compound_mapcount:0 compound_pincount:0
[ 315.501050][ T1939] anon flags: 0x3bfffc00001000f(locked|referenced|uptodate|dirty|head)
[ 315.539977][ T1939] raw: 03bfffc00001000f ffffc9000aaefe30 ffffc9000aaefe30 ffff888fefec5a49
[ 315.580841][ T1939] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 315.621327][ T1939] page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg)
[ 315.677964][ T1939] WARNING: CPU: 30 PID: 1939 at mm/memcontrol.c:6908 mem_cgroup_migrate+0x50e/0x850
[ 315.722090][ T1939] Modules linked in: nls_ascii nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables sd_mod bnx2x hpsa mdio scsi_transport_sas firmware_class dm_mirror dm_region_hash dm_log dm_mod efivarfs
[ 315.819298][ T1939] CPU: 30 PID: 1939 Comm: random Not tainted 5.9.0-rc1-next-20200821 #2
[ 315.858186][ T1939] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018
[ 315.894272][ T1939] RIP: 0010:mem_cgroup_migrate+0x50e/0x850
[ 315.922436][ T1939] Code: 2d c0 5c 5d 06 40 80 fd 01 0f 87 3f 2d 00 00 83 e5 01 75 18 48 c7 c6 80 14 4e ad 48 89 df e8 99 4f eb ff c6 05 9b 5c 5d 06 01 <0f> 0b 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c6 60 04
[ 316.018174][ T1939] RSP: 0018:ffffc9000aaefa98 EFLAGS: 00010296
[ 316.047132][ T1939] RAX: 0000000000000000 RBX: ffffea007a1b8000 RCX: ffffffffac899a12
[ 316.084567][ T1939] RDX: 1ffffd400f437007 RSI: 0000000000000000 RDI: ffffea007a1b8038
[ 316.122817][ T1939] RBP: 0000000000000000 R08: ffffed120bf75e7a R09: ffffed120bf75e7a
[ 316.160571][ T1939] R10: ffff88905fbaf3cf R11: ffffed120bf75e79 R12: 0000000000000000
[ 316.198278][ T1939] R13: ffffea000e108038 R14: ffffea000e108008 R15: 0000000000000001
[ 316.235465][ T1939] FS: 00007f221c461740(0000) GS:ffff88905fb80000(0000) knlGS:0000000000000000
[ 316.277658][ T1939] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 316.309711][ T1939] CR2: 000055acde5279d8 CR3: 0000000fec490003 CR4: 00000000001706e0
[ 316.349521][ T1939] Call Trace:
[ 316.364537][ T1939] ? migrate_page_states+0xb4c/0x1970
[ 316.389115][ T1939] migrate_page+0xea/0x190
[ 316.409463][ T1939] move_to_new_page+0x338/0xca0
[ 316.432024][ T1939] ? remove_migration_ptes+0xd0/0xd0
[ 316.456464][ T1939] ? __page_mapcount+0x19a/0x250
[ 316.479465][ T1939] ? try_to_unmap+0x1bf/0x2d0
[ 316.501801][ T1939] ? rmap_walk_locked+0x140/0x140
[ 316.524702][ T1939] ? PageHuge+0xf/0xd0
[ 316.543478][ T1939] ? page_mapped+0x155/0x2e0
[ 316.564341][ T1939] ? hugetlb_page_mapping_lock_write+0x97/0x180
[ 316.593160][ T1939] migrate_pages+0x1496/0x2290
[ 316.615520][ T1939] ? remove_migration_pte+0xac0/0xac0
[ 316.640773][ T1939] move_pages_and_store_status.isra.47+0xd7/0x1a0
[ 316.670470][ T1939] ? migrate_pages+0x2290/0x2290
[ 316.693733][ T1939] __x64_sys_move_pages+0x8b7/0x1180
[ 316.717974][ T1939] ? move_pages_and_store_status.isra.47+0x1a0/0x1a0
[ 316.749559][ T1939] ? syscall_enter_from_user_mode+0x1b/0x210
[ 316.776749][ T1939] ? lockdep_hardirqs_on_prepare+0x33e/0x4e0
[ 316.804892][ T1939] ? syscall_enter_from_user_mode+0x20/0x210
[ 316.833657][ T1939] ? trace_hardirqs_on+0x20/0x1b5
[ 316.858450][ T1939] do_syscall_64+0x33/0x40
[ 316.879205][ T1939] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 316.905709][ T1939] RIP: 0033:0x7f221bd5d6ed
[ 316.925937][ T1939] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 57 2c 00 f7 d8 64 89 01 48
[ 317.017819][ T1939] RSP: 002b:00007ffff6c4d448 EFLAGS: 00000212 ORIG_RAX: 0000000000000117
[ 317.057248][ T1939] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f221bd5d6ed
[ 317.094612][ T1939] RDX: 000000000151bc40 RSI: 0000000000000400 RDI: 00000000000006e2
[ 317.131770][ T1939] RBP: 00007ffff6c4d4b0 R08: 000000000151ac30 R09: 0000000000000004
[ 317.169009][ T1939] R10: 0000000001519c20 R11: 0000000000000212 R12: 0000000000401cb0
[ 317.206463][ T1939] R13: 00007ffff6c58970 R14: 0000000000000000 R15: 0000000000000000
[ 317.243922][ T1939] CPU: 30 PID: 1939 Comm: random Not tainted 5.9.0-rc1-next-20200821 #2
[ 317.282806][ T1939] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018
[ 317.318693][ T1939] Call Trace:
[ 317.334040][ T1939] dump_stack+0x9d/0xe0
[ 317.354529][ T1939] __warn.cold.13+0xe/0x57
[ 317.376117][ T1939] ? mem_cgroup_migrate+0x50e/0x850
[ 317.400959][ T1939] report_bug+0x1af/0x260
[ 317.420960][ T1939] handle_bug+0x44/0x80
[ 317.439737][ T1939] exc_invalid_op+0x13/0x40
[ 317.460081][ T1939] asm_exc_invalid_op+0x12/0x20
[ 317.483336][ T1939] RIP: 0010:mem_cgroup_migrate+0x50e/0x850
[ 317.510309][ T1939] Code: 2d c0 5c 5d 06 40 80 fd 01 0f 87 3f 2d 00 00 83 e5 01 75 18 48 c7 c6 80 14 4e ad 48 89 df e8 99 4f eb ff c6 05 9b 5c 5d 06 01 <0f> 0b 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c6 60 04
[ 317.603479][ T1939] RSP: 0018:ffffc9000aaefa98 EFLAGS: 00010296
[ 317.630605][ T1939] RAX: 0000000000000000 RBX: ffffea007a1b8000 RCX: ffffffffac899a12
[ 317.666432][ T1939] RDX: 1ffffd400f437007 RSI: 0000000000000000 RDI: ffffea007a1b8038
[ 317.703630][ T1939] RBP: 0000000000000000 R08: ffffed120bf75e7a R09: ffffed120bf75e7a
[ 317.739491][ T1939] R10: ffff88905fbaf3cf R11: ffffed120bf75e79 R12: 0000000000000000
[ 317.777498][ T1939] R13: ffffea000e108038 R14: ffffea000e108008 R15: 0000000000000001
[ 317.815155][ T1939] ? llist_add_batch+0x52/0x90
[ 317.837438][ T1939] ? mem_cgroup_migrate+0x507/0x850
[ 317.862628][ T1939] ? migrate_page_states+0xb4c/0x1970
[ 317.888821][ T1939] migrate_page+0xea/0x190
[ 317.910213][ T1939] move_to_new_page+0x338/0xca0
[ 317.932614][ T1939] ? remove_migration_ptes+0xd0/0xd0
[ 317.956549][ T1939] ? __page_mapcount+0x19a/0x250
[ 317.979397][ T1939] ? try_to_unmap+0x1bf/0x2d0
[ 318.000960][ T1939] ? rmap_walk_locked+0x140/0x140
[ 318.023999][ T1939] ? PageHuge+0xf/0xd0
[ 318.042729][ T1939] ? page_mapped+0x155/0x2e0
[ 318.063725][ T1939] ? hugetlb_page_mapping_lock_write+0x97/0x180
[ 318.092655][ T1939] migrate_pages+0x1496/0x2290
[ 318.114177][ T1939] ? remove_migration_pte+0xac0/0xac0
[ 318.139797][ T1939] move_pages_and_store_status.isra.47+0xd7/0x1a0
[ 318.170609][ T1939] ? migrate_pages+0x2290/0x2290
[ 318.193481][ T1939] __x64_sys_move_pages+0x8b7/0x1180
[ 318.217652][ T1939] ? move_pages_and_store_status.isra.47+0x1a0/0x1a0
[ 318.248748][ T1939] ? syscall_enter_from_user_mode+0x1b/0x210
[ 318.276536][ T1939] ? lockdep_hardirqs_on_prepare+0x33e/0x4e0
[ 318.304077][ T1939] ? syscall_enter_from_user_mode+0x20/0x210
[ 318.331847][ T1939] ? trace_hardirqs_on+0x20/0x1b5
[ 318.355312][ T1939] do_syscall_64+0x33/0x40
[ 318.376825][ T1939] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 318.405938][ T1939] RIP: 0033:0x7f221bd5d6ed
[ 318.426072][ T1939] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 57 2c 00 f7 d8 64 89 01 48
[ 318.519665][ T1939] RSP: 002b:00007ffff6c4d448 EFLAGS: 00000212 ORIG_RAX: 0000000000000117
[ 318.560049][ T1939] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f221bd5d6ed
[ 318.597747][ T1939] RDX: 000000000151bc40 RSI: 0000000000000400 RDI: 00000000000006e2
[ 318.635143][ T1939] RBP: 00007ffff6c4d4b0 R08: 000000000151ac30 R09: 0000000000000004
[ 318.672919][ T1939] R10: 0000000001519c20 R11: 0000000000000212 R12: 0000000000401cb0
[ 318.710619][ T1939] R13: 00007ffff6c58970 R14: 0000000000000000 R15: 0000000000000000
[ 318.747920][ T1939] irq event stamp: 1465
[ 318.766713][ T1939] hardirqs last enabled at (1475): [<ffffffffabe671bf>] console_unlock+0x75f/0xaf0
[ 318.810473][ T1939] hardirqs last disabled at (1484): [<ffffffffabe66cad>] console_unlock+0x24d/0xaf0
[ 318.854253][ T1939] softirqs last enabled at (1464): [<ffffffffad20070f>] __do_softirq+0x70f/0xa9f
[ 318.900757][ T1939] softirqs last disabled at (1455): [<ffffffffad000ec2>] asm_call_on_stack+0x12/0x20
[ 318.945713][ T1939] ---[ end trace 5a58095b9439b080 ]---

2020-08-21 13:49:59

by Michal Hocko

Subject: Re: [Resend PATCH 1/6] mm/memcg: warning on !memcg after readahead page charged

On Fri 21-08-20 08:39:37, Qian Cai wrote:
> On Fri, Aug 21, 2020 at 10:01:27AM +0200, Michal Hocko wrote:
> > On Thu 20-08-20 10:58:51, Qian Cai wrote:
> > > On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:
> > > > Since readahead page is charged on memcg too, in theory we don't have to
> > > > check this exception now. Before safely remove them all, add a warning
> > > > for the unexpected !memcg.
> > > >
> > > > Signed-off-by: Alex Shi <[email protected]>
> > > > Acked-by: Michal Hocko <[email protected]>
> > >
> > > This will trigger,
> >
> > Thanks for the report!
> >
> > > [ 1863.916499] LTP: starting move_pages12
> > > [ 1863.946520] page:000000008ccc1062 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1fd3c0
> > > [ 1863.946553] head:000000008ccc1062 order:5 compound_mapcount:0 compound_pincount:0
> >
> > Hmm, this is really unexpected. How did we get order-5 page here? Is
> > this some special mappaing that sys_move_pages should just ignore?
>
> Well, I thought everybody should be able to figure out where to find the LTP
> tests source code at this stage to see what it does. Anyway, the test simply
> migrate hugepages while soft offlining, so order 5 is expected as that is 2M
> hugepage on powerpc (also reproduced on x86 below). It might be easier to
> reproduce using our linux-mm random bug collection on NUMA systems.

OK, I must have missed that this was on ppc. The order makes more sense
now. I will have a look at this next week.

Thanks!
--
Michal Hocko
SUSE Labs

2020-08-21 14:04:06

by Qian Cai

Subject: Re: [Resend PATCH 1/6] mm/memcg: warning on !memcg after readahead page charged

On Fri, Aug 21, 2020 at 03:48:42PM +0200, Michal Hocko wrote:
> On Fri 21-08-20 08:39:37, Qian Cai wrote:
> > On Fri, Aug 21, 2020 at 10:01:27AM +0200, Michal Hocko wrote:
> > > On Thu 20-08-20 10:58:51, Qian Cai wrote:
> > > > On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:
> > > > > Since readahead page is charged on memcg too, in theory we don't have to
> > > > > check this exception now. Before safely remove them all, add a warning
> > > > > for the unexpected !memcg.
> > > > >
> > > > > Signed-off-by: Alex Shi <[email protected]>
> > > > > Acked-by: Michal Hocko <[email protected]>
> > > >
> > > > This will trigger,
> > >
> > > Thanks for the report!
> > >
> > > > [ 1863.916499] LTP: starting move_pages12
> > > > [ 1863.946520] page:000000008ccc1062 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1fd3c0
> > > > [ 1863.946553] head:000000008ccc1062 order:5 compound_mapcount:0 compound_pincount:0
> > >
> > > Hmm, this is really unexpected. How did we get order-5 page here? Is
> > > this some special mappaing that sys_move_pages should just ignore?
> >
> > Well, I thought everybody should be able to figure out where to find the LTP
> > tests source code at this stage to see what it does. Anyway, the test simply
> > migrate hugepages while soft offlining, so order 5 is expected as that is 2M
> > hugepage on powerpc (also reproduced on x86 below). It might be easier to
> > reproduce using our linux-mm random bug collection on NUMA systems.
>
> OK, I must have missed that this was on ppc. The order makes more sense
> now. I will have a look at this next week.

Sorry about not mentioning powerpc in the first place. I don't know since when
powerpc stopped printing out hardware information like x86 does in those
warning reports. I'll dig into it.

2020-08-24 14:53:41

by Qian Cai

Subject: Re: [Resend PATCH 1/6] mm/memcg: warning on !memcg after readahead page charged

On Thu, Aug 20, 2020 at 10:58:50AM -0400, Qian Cai wrote:
> On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:
> > Since readahead page is charged on memcg too, in theory we don't have to
> > check this exception now. Before safely remove them all, add a warning
> > for the unexpected !memcg.
> >
> > Signed-off-by: Alex Shi <[email protected]>
> > Acked-by: Michal Hocko <[email protected]>
>
> This will trigger,

Andrew, Stephen, can you drop this series for now? I did manage to trigger this
warning on all arches: powerpc, x86 and arm64 (below).

[ 7380.751929][T73938] WARNING: CPU: 160 PID: 73938 at mm/memcontrol.c:6908 mem_cgroup_migrate+0x5a4/0x6d8
[ 7380.761317][T73938] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio loop processor efivarfs ip_tables x_tables sd_mod ahci libahci mlx5_core firmware_class libata dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_del_mod]
[ 7380.783242][T73938] CPU: 160 PID: 73938 Comm: move_pages12 Tainted: G O L 5.9.0-rc2-next-20200824 #1
[ 7380.793499][T73938] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.15 05/08/2020
[ 7380.803932][T73938] pstate: 00400009 (nzcv daif +PAN -UAO BTYPE=--)
[ 7380.810196][T73938] pc : mem_cgroup_migrate+0x5a4/0x6d8
[ 7380.815419][T73938] lr : mem_cgroup_migrate+0x59c/0x6d8
[ 7380.820641][T73938] sp : ffff008de9d0f6a0
[ 7380.824647][T73938] x29: ffff008de9d0f6a0 x28: 0000000000000002
[ 7380.830661][T73938] x27: ffffffe022880018 x26: 1ffffffc04510003
[ 7380.836674][T73938] x25: 0000000000000001 x24: 0000000000000001
[ 7380.842687][T73938] x23: ffffffe003280038 x22: 1ffffffc00650007
[ 7380.848701][T73938] x21: 0000000000000000 x20: ffffa0001703692c
[ 7380.854714][T73938] x19: ffffffe022880000 x18: 0000000000000000
[ 7380.860726][T73938] x17: 0000000000000000 x16: 0000000000000000
[ 7380.866738][T73938] x15: 0000000000000000 x14: 0000000000000001
[ 7380.872751][T73938] x13: ffff8011cf16f0ff x12: 1fffe011cf16f0fe
[ 7380.878764][T73938] x11: 1fffe011cf16f0fe x10: ffff8011cf16f0fe
[ 7380.884777][T73938] x9 : dfffa00000000000 x8 : ffff008e78b787f7
[ 7380.890789][T73938] x7 : 0000000000000001 x6 : ffff8011cf16f0ff
[ 7380.896802][T73938] x5 : ffff8011cf16f0ff x4 : ffff8011cf16f0ff
[ 7380.902815][T73938] x3 : 1fffe011c1a4ae72 x2 : 1ffffffc04510007
[ 7380.908828][T73938] x1 : 53d80e6b46c19e00 x0 : 0000000000000001
[ 7380.914842][T73938] Call trace:
[ 7380.917982][T73938] mem_cgroup_migrate+0x5a4/0x6d8
[ 7380.922862][T73938] migrate_page_states+0x938/0x17c0
[ 7380.927911][T73938] migrate_page_copy+0x6c0/0x1018
[ 7380.932787][T73938] migrate_page+0xe0/0x1a0
[ 7380.937055][T73938] move_to_new_page+0x2b0/0x9e8
[ 7380.941757][T73938] migrate_pages+0x1650/0x23a0
[ 7380.946373][T73938] move_pages_and_store_status.isra.40+0xe4/0x170
[ 7380.952638][T73938] do_pages_move+0x484/0xb88
[ 7380.957079][T73938] __arm64_sys_move_pages+0x3a8/0x7d0
[ 7380.962314][T73938] do_el0_svc+0x124/0x228
[ 7380.966502][T73938] el0_sync_handler+0x260/0x410
[ 7380.971204][T73938] el0_sync+0x140/0x180
[ 7380.975213][T73938] CPU: 160 PID: 73938 Comm: move_pages12 Tainted: G O L 5.9.0-rc2-next-20200824 #1
[ 7380.985469][T73938] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.15 05/08/2020
[ 7380.995898][T73938] Call trace:
[ 7380.999041][T73938] dump_backtrace+0x0/0x398
[ 7381.003396][T73938] show_stack+0x14/0x20
[ 7381.007412][T73938] dump_stack+0x140/0x1c8
[ 7381.011604][T73938] __warn+0x23c/0x2c8
[ 7381.015439][T73938] report_bug+0x18c/0x2a8
[ 7381.019621][T73938] bug_handler+0x34/0x88
[ 7381.023715][T73938] brk_handler+0x138/0x240
[ 7381.027987][T73938] do_debug_exception+0x138/0x544
[ 7381.032862][T73938] el1_sync_handler+0x13c/0x1b8
[ 7381.037564][T73938] el1_sync+0x7c/0x100
[ 7381.041484][T73938] mem_cgroup_migrate+0x5a4/0x6d8
[ 7381.046359][T73938] migrate_page_states+0x938/0x17c0
[ 7381.051408][T73938] migrate_page_copy+0x6c0/0x1018
[ 7381.056283][T73938] migrate_page+0xe0/0x1a0
[ 7381.060551][T73938] move_to_new_page+0x2b0/0x9e8
[ 7381.065252][T73938] migrate_pages+0x1650/0x23a0
[ 7381.069867][T73938] move_pages_and_store_status.isra.40+0xe4/0x170
[ 7381.076131][T73938] do_pages_move+0x484/0xb88
[ 7381.080573][T73938] __arm64_sys_move_pages+0x3a8/0x7d0
[ 7381.085796][T73938] do_el0_svc+0x124/0x228
[ 7381.089977][T73938] el0_sync_handler+0x260/0x410
[ 7381.094678][T73938] el0_sync+0x140/0x180
[ 7381.098684][T73938] irq event stamp: 470
[ 7381.102614][T73938] hardirqs last enabled at (469): [<ffffa000103b5b5c>] console_unlock+0x7f4/0xf10
[ 7381.111745][T73938] hardirqs last disabled at (470): [<ffffa000101cd934>] do_debug_exception+0x304/0x544
[ 7381.121222][T73938] softirqs last enabled at (408): [<ffffa000101a1b50>] efi_header_end+0xb50/0x14d4
[ 7381.130448][T73938] softirqs last disabled at (403): [<ffffa0001028df98>] irq_exit+0x440/0x510

== powerpc ==

>
> [ 1863.916499] LTP: starting move_pages12
> [ 1863.946520] page:000000008ccc1062 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1fd3c0
> [ 1863.946553] head:000000008ccc1062 order:5 compound_mapcount:0 compound_pincount:0
> [ 1863.946568] anon flags: 0x7fff800001000d(locked|uptodate|dirty|head)
> [ 1863.946584] raw: 007fff800001000d c000000016ebfcd8 c000000016ebfcd8 c000001feaf46d59
> [ 1863.946609] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> [ 1863.946632] page dumped because: VM_WARN_ON_ONCE_PAGE(!memcg)
> [ 1863.946669] ------------[ cut here ]------------
> [ 1863.946694] WARNING: CPU: 16 PID: 35307 at mm/memcontrol.c:6908 mem_cgroup_migrate+0x5f8/0x610
> [ 1863.946708] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_spapr_tce vfio vfio_spapr_eeh loop kvm_hv kvm ip_tables x_tables sd_mod bnx2x tg3 ahci libahci libphy mdio firmware_class libata dm_mirror dm_region_hash dm_log dm_mod
> [ 1863.946801] CPU: 16 PID: 35307 Comm: move_pages12 Not tainted 5.9.0-rc1-next-20200820 #4
> [ 1863.946834] NIP: c0000000003fcb48 LR: c0000000003fcb38 CTR: 0000000000000000
> [ 1863.946856] REGS: c000000016ebf6f0 TRAP: 0700 Not tainted (5.9.0-rc1-next-20200820)
> [ 1863.946879] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28242882 XER: 00000000
> [ 1863.946915] CFAR: c00000000032c644 IRQMASK: 0
> GPR00: c0000000003fcb38 c000000016ebf980 c000000005923200 0000000000000031
> GPR04: 0000000000000000 0000000000000000 0000000000000027 c000001ffd727190
> GPR08: 0000000000000023 0000000000000001 c0000000058f3200 0000000000000001
> GPR12: 0000000000002000 c000001ffffe3800 c000000000b26a68 0000000000000000
> GPR16: c000000016ebfc20 c000000016ebfcd8 0000000000000020 0000000000000001
> GPR20: c00c00080724f000 c0000000003c8770 0000000000000000 c000000016ebfcd0
> GPR24: 0000000000000000 fffffffffffffff5 0000000000000002 0000000000000000
> GPR28: 0000000000000000 0000000000000001 0000000000000000 c00c000007f4f000
> [ 1863.947142] NIP [c0000000003fcb48] mem_cgroup_migrate+0x5f8/0x610
> [ 1863.947164] LR [c0000000003fcb38] mem_cgroup_migrate+0x5e8/0x610
> [ 1863.947185] Call Trace:
> [ 1863.947203] [c000000016ebf980] [c0000000003fcb38] mem_cgroup_migrate+0x5e8/0x610 (unreliable)
> [ 1863.947241] [c000000016ebf9c0] [c0000000003c9080] migrate_page_states+0x4e0/0xce0
> [ 1863.947274] [c000000016ebf9f0] [c0000000003cbbec] migrate_page+0x8c/0x120
> [ 1863.947307] [c000000016ebfa30] [c0000000003ccf10] move_to_new_page+0x190/0x670
> [ 1863.947341] [c000000016ebfaf0] [c0000000003ced08] migrate_pages+0xfb8/0x1880
> [ 1863.947365] [c000000016ebfc00] [c0000000003cf670] move_pages_and_store_status.isra.45+0xa0/0x160
> [ 1863.947399] [c000000016ebfc80] [c0000000003cfef4] sys_move_pages+0x7c4/0xed0
> [ 1863.947434] [c000000016ebfdc0] [c00000000002c678] system_call_exception+0xf8/0x1d0
> [ 1863.947459] [c000000016ebfe20] [c00000000000d0a8] system_call_common+0xe8/0x218
> [ 1863.947481] Instruction dump:
> [ 1863.947502] 7fc3f378 4bfee82d 7c0802a6 3c82fb20 7fe3fb78 38844fc8 f8010050 4bf2fad5
> [ 1863.947527] 60000000 39200001 3d42fffd 992a82fb <0fe00000> e8010050 eb810020 7c0803a6
> [ 1863.947563] CPU: 16 PID: 35307 Comm: move_pages12 Not tainted 5.9.0-rc1-next-20200820 #4
> [ 1863.947594] Call Trace:
> [ 1863.947615] [c000000016ebf4d0] [c0000000006f6008] dump_stack+0xfc/0x174 (unreliable)
> [ 1863.947642] [c000000016ebf520] [c0000000000c9004] __warn+0xc4/0x14c
> [ 1863.947665] [c000000016ebf5b0] [c0000000006f4b68] report_bug+0x108/0x1f0
> [ 1863.947689] [c000000016ebf650] [c0000000000234f4] program_check_exception+0x104/0x2e0
> [ 1863.947724] [c000000016ebf680] [c000000000009664] program_check_common_virt+0x2c4/0x310
> [ 1863.947751] --- interrupt: 700 at mem_cgroup_migrate+0x5f8/0x610
> LR = mem_cgroup_migrate+0x5e8/0x610
> [ 1863.947786] [c000000016ebf9c0] [c0000000003c9080] migrate_page_states+0x4e0/0xce0
> [ 1863.947810] [c000000016ebf9f0] [c0000000003cbbec] migrate_page+0x8c/0x120
> [ 1863.947843] [c000000016ebfa30] [c0000000003ccf10] move_to_new_page+0x190/0x670
> [ 1863.947867] [c000000016ebfaf0] [c0000000003ced08] migrate_pages+0xfb8/0x1880
> [ 1863.947901] [c000000016ebfc00] [c0000000003cf670] move_pages_and_store_status.isra.45+0xa0/0x160
> [ 1863.947936] [c000000016ebfc80] [c0000000003cfef4] sys_move_pages+0x7c4/0xed0
> [ 1863.947969] [c000000016ebfdc0] [c00000000002c678] system_call_exception+0xf8/0x1d0
> [ 1863.948002] [c000000016ebfe20] [c00000000000d0a8] system_call_common+0xe8/0x218
> [ 1863.948034] irq event stamp: 410
> [ 1863.948054] hardirqs last enabled at (409): [<c000000000184564>] console_unlock+0x6b4/0x990
> [ 1863.948092] hardirqs last disabled at (410): [<c00000000000965c>] program_check_common_virt+0x2bc/0x310
> [ 1863.948126] softirqs last enabled at (0): [<c0000000000c59a8>] copy_process+0x788/0x1950
> [ 1863.948229] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [ 1863.948316] ---[ end trace 74f8f4df751b0259 ]---
>
> > Cc: Johannes Weiner <[email protected]>
> > Cc: Michal Hocko <[email protected]>
> > Cc: Vladimir Davydov <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > ---
> > include/linux/mmdebug.h | 13 +++++++++++++
> > mm/memcontrol.c | 15 ++++++++-------
> > 2 files changed, 21 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
> > index 2ad72d2c8cc5..4ed52879ce55 100644
> > --- a/include/linux/mmdebug.h
> > +++ b/include/linux/mmdebug.h
> > @@ -37,6 +37,18 @@
> > BUG(); \
> > } \
> > } while (0)
> > +#define VM_WARN_ON_ONCE_PAGE(cond, page) ({ \
> > + static bool __section(.data.once) __warned; \
> > + int __ret_warn_once = !!(cond); \
> > + \
> > + if (unlikely(__ret_warn_once && !__warned)) { \
> > + dump_page(page, "VM_WARN_ON_ONCE_PAGE(" __stringify(cond)")");\
> > + __warned = true; \
> > + WARN_ON(1); \
> > + } \
> > + unlikely(__ret_warn_once); \
> > +})
> > +
> > #define VM_WARN_ON(cond) (void)WARN_ON(cond)
> > #define VM_WARN_ON_ONCE(cond) (void)WARN_ON_ONCE(cond)
> > #define VM_WARN_ONCE(cond, format...) (void)WARN_ONCE(cond, format)
> > @@ -48,6 +60,7 @@
> > #define VM_BUG_ON_MM(cond, mm) VM_BUG_ON(cond)
> > #define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)
> > #define VM_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond)
> > +#define VM_WARN_ON_ONCE_PAGE(cond, page) BUILD_BUG_ON_INVALID(cond)
> > #define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond)
> > #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond)
> > #endif
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 130093bdf74b..299382fc55a9 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1322,10 +1322,8 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgd
> > }
> >
> > memcg = page->mem_cgroup;
> > - /*
> > - * Swapcache readahead pages are added to the LRU - and
> > - * possibly migrated - before they are charged.
> > - */
> > + /* Readahead page is charged too, to see if other page uncharged */
> > + VM_WARN_ON_ONCE_PAGE(!memcg, page);
> > if (!memcg)
> > memcg = root_mem_cgroup;
> >
> > @@ -6906,8 +6904,9 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
> > if (newpage->mem_cgroup)
> > return;
> >
> > - /* Swapcache readahead pages can get replaced before being charged */
> > memcg = oldpage->mem_cgroup;
> > + /* Readahead page is charged too, to see if other page uncharged */
> > + VM_WARN_ON_ONCE_PAGE(!memcg, oldpage);
> > if (!memcg)
> > return;
> >
> > @@ -7104,7 +7103,8 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
> >
> > memcg = page->mem_cgroup;
> >
> > - /* Readahead page, never charged */
> > + /* Readahead page is charged too, to see if other page uncharged */
> > + VM_WARN_ON_ONCE_PAGE(!memcg, page);
> > if (!memcg)
> > return;
> >
> > @@ -7168,7 +7168,8 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
> >
> > memcg = page->mem_cgroup;
> >
> > - /* Readahead page, never charged */
> > + /* Readahead page is charged too, to see if other page uncharged */
> > + VM_WARN_ON_ONCE_PAGE(!memcg, page);
> > if (!memcg)
> > return 0;
> >
> > --
> > 1.8.3.1
> >
> >

2020-08-24 15:12:26

by Michal Hocko

[permalink] [raw]
Subject: Re: [Resend PATCH 1/6] mm/memcg: warning on !memcg after readahead page charged

On Fri 21-08-20 15:48:44, Michal Hocko wrote:
> On Fri 21-08-20 08:39:37, Qian Cai wrote:
> > On Fri, Aug 21, 2020 at 10:01:27AM +0200, Michal Hocko wrote:
> > > On Thu 20-08-20 10:58:51, Qian Cai wrote:
> > > > On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:
> > > > > Since readahead page is charged on memcg too, in theory we don't have to
> > > > > check this exception now. Before safely remove them all, add a warning
> > > > > for the unexpected !memcg.
> > > > >
> > > > > Signed-off-by: Alex Shi <[email protected]>
> > > > > Acked-by: Michal Hocko <[email protected]>
> > > >
> > > > This will trigger,
> > >
> > > Thanks for the report!
> > >
> > > > [ 1863.916499] LTP: starting move_pages12
> > > > [ 1863.946520] page:000000008ccc1062 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1fd3c0
> > > > [ 1863.946553] head:000000008ccc1062 order:5 compound_mapcount:0 compound_pincount:0
> > >
> > > Hmm, this is really unexpected. How did we get an order-5 page here? Is
> > > this some special mapping that sys_move_pages should just ignore?
> >
> > Well, I thought everybody should be able to figure out where to find the LTP
> > tests source code at this stage to see what it does. Anyway, the test simply
> > migrate hugepages while soft offlining, so order 5 is expected as that is 2M
> > hugepage on powerpc (also reproduced on x86 below). It might be easier to
> > reproduce using our linux-mm random bug collection on NUMA systems.
>
> OK, I must have missed that this was on ppc. The order makes more sense
> now. I will have a look at this next week.

OK, so I've had a look and I know what's going on there. The
move_pages12 is migrating hugetlb pages. Those are not charged to any
memcg. We have completely missed this case. There are two ways going
around that. Drop the warning and update the comment so that we do not
forget about that or special case hugetlb pages.

I think the first option is better.
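
Just for illustration, an untested sketch of what the second option might
look like on top of this patch (not a posted change; it assumes PageHuge()
is usable from mm/memcontrol.c and the exact placement is a guess):

--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ ... @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
 	memcg = oldpage->mem_cgroup;
-	/* Readahead page is charged too, to see if other page uncharged */
-	VM_WARN_ON_ONCE_PAGE(!memcg, oldpage);
+	/* hugetlb pages are deliberately never charged to any memcg */
+	VM_WARN_ON_ONCE_PAGE(!memcg && !PageHuge(oldpage), oldpage);
 	if (!memcg)
 		return;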
--
Michal Hocko
SUSE Labs

2020-08-24 15:15:09

by Michal Hocko

[permalink] [raw]
Subject: Re: [Resend PATCH 1/6] mm/memcg: warning on !memcg after readahead page charged

On Mon 24-08-20 10:52:02, Qian Cai wrote:
> On Thu, Aug 20, 2020 at 10:58:50AM -0400, Qian Cai wrote:
> > On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:
> > > Since readahead page is charged on memcg too, in theory we don't have to
> > > check this exception now. Before safely remove them all, add a warning
> > > for the unexpected !memcg.
> > >
> > > Signed-off-by: Alex Shi <[email protected]>
> > > Acked-by: Michal Hocko <[email protected]>
> >
> > This will trigger,
>
> Andrew, Stephen, can you drop this series for now? I did manage to trigger this
> warning on all arches, powerpc, x86 and arm64 (below).

Yes, I do agree. See http://lkml.kernel.org/r/[email protected]
--
Michal Hocko
SUSE Labs

2020-08-24 23:02:21

by Stephen Rothwell

[permalink] [raw]
Subject: Re: [Resend PATCH 1/6] mm/memcg: warning on !memcg after readahead page charged

Hi Michal,

On Mon, 24 Aug 2020 17:10:45 +0200 Michal Hocko <[email protected]> wrote:
>
> On Mon 24-08-20 10:52:02, Qian Cai wrote:
> > On Thu, Aug 20, 2020 at 10:58:50AM -0400, Qian Cai wrote:
> > > On Tue, Aug 11, 2020 at 07:10:27PM +0800, Alex Shi wrote:
> > > > Since readahead page is charged on memcg too, in theory we don't have to
> > > > check this exception now. Before safely remove them all, add a warning
> > > > for the unexpected !memcg.
> > > >
> > > > Signed-off-by: Alex Shi <[email protected]>
> > > > Acked-by: Michal Hocko <[email protected]>
> > >
> > > This will trigger,
> >
> > Andrew, Stephen, can you drop this series for now? I did manage to trigger this
> > warning on all arches, powerpc, x86 and arm64 (below).
>
> Yes, I do agree. See http://lkml.kernel.org/r/[email protected]

OK, I have removed the following from linux-next for today:

c443db77c9f3 ("mm/thp: narrow lru locking")
18bafefba73d ("mm/thp: remove code path which never got into")
5fb6c0683017 ("mm/thp: clean up lru_add_page_tail")
9d1d568727a8 ("mm/thp: move lru_add_page_tail func to huge_memory.c")
47eb331560ff ("mm/memcg: bail out early from swap accounting when memcg is disabled")
4b0d99a64d78 ("mm/memcg: warning on !memcg after readahead page charged")

--
Cheers,
Stephen Rothwell



2020-08-24 23:17:34

by Alex Shi

[permalink] [raw]
Subject: Re: [Resend PATCH 1/6] mm/memcg: warning on !memcg after readahead page charged



On 2020/8/25 7:00 AM, Stephen Rothwell wrote:
>>>> This will trigger,
>>> Andrew, Stephen, can you drop this series for now? I did manage to trigger this
>>> warning on all arches, powerpc, x86 and arm64 (below).
>> Yes, I do agree. See http://lkml.kernel.org/r/[email protected]
> OK, I have removed the following from linux-next for today:
>
> c443db77c9f3 ("mm/thp: narrow lru locking")
> 18bafefba73d ("mm/thp: remove code path which never got into")
> 5fb6c0683017 ("mm/thp: clean up lru_add_page_tail")
> 9d1d568727a8 ("mm/thp: move lru_add_page_tail func to huge_memory.c")
> 47eb331560ff ("mm/memcg: bail out early from swap accounting when memcg is disabled")
> 4b0d99a64d78 ("mm/memcg: warning on !memcg after readahead page charged")

The first patch, 4b0d99a64d78 ("mm/memcg: warning on !memcg after readahead page charged"),
reveals that hugetlb pages, which are outside the LRU, show up on some unexpected paths.
At least the comments are helpful.

All the others are good and functional.

Thanks
Alex

2020-08-25 01:28:38

by Alex Shi

[permalink] [raw]
Subject: Re: [Resend PATCH 1/6] mm/memcg: warning on !memcg after readahead page charged

reproduce using our linux-mm random bug collection on NUMA systems.
>>
>> OK, I must have missed that this was on ppc. The order makes more sense
>> now. I will have a look at this next week.
>
> OK, so I've had a look and I know what's going on there. The
> move_pages12 is migrating hugetlb pages. Those are not charged to any
> memcg. We have completely missed this case. There are two ways going
> around that. Drop the warning and update the comment so that we do not
> forget about that or special case hugetlb pages.
>
> I think the first option is better.
>


Hi Michal,

Compared to ignoring the warning, which is designed to fire here, addressing
the fact that hugetlb pages are not charged seems like a better solution;
otherwise memcg memory usage on hugetlb is out of control. Is that right?

Thanks
Alex

2020-08-25 02:09:10

by Hugh Dickins

[permalink] [raw]
Subject: Re: [Resend PATCH 1/6] mm/memcg: warning on !memcg after readahead page charged

On Tue, 25 Aug 2020, Alex Shi wrote:
> reproduce using our linux-mm random bug collection on NUMA systems.
> >>
> >> OK, I must have missed that this was on ppc. The order makes more sense
> >> now. I will have a look at this next week.
> >
> > OK, so I've had a look and I know what's going on there. The
> > move_pages12 is migrating hugetlb pages. Those are not charged to any
> > memcg. We have completely missed this case. There are two ways going
> > around that. Drop the warning and update the comment so that we do not
> > forget about that or special case hugetlb pages.
> >
> > I think the first option is better.
> >
>
>
> Hi Michal,
>
> Compared to ignoring the warning, which is designed to fire here, addressing
> the fact that hugetlb pages are not charged seems like a better solution;
> otherwise memcg memory usage on hugetlb is out of control. Is that right?

Please don't suppose that this is peculiar to hugetlb: I'm not
testing hugetlb at all (sorry), but I see the VM_WARN_ON_ONCE from
mem_cgroup_page_lruvec(), and from mem_cgroup_migrate(), and from
mem_cgroup_swapout().

In all cases seen on a PageAnon page (well, in one case PageKsm).
And not related to THP either: seen also on a machine incapable of THP.

Maybe there's an independent change in 5.9-rc that's defeating
expectations here, or maybe they were never valid. Worth
investigating, even though the patch is currently removed,
to find out why expectations were wrong.

You'll ask me for more info, stacktraces etc, and I'll say sorry,
no time today. Please try the swapping tests I sent before.

And may I say, the comment
/* Readahead page is charged too, to see if other page uncharged */
is nonsensical to me, and much better deleted: maybe it would make
some sense if the reader could see the comment it replaces - as
they can in the patch - but not in the resulting source file.

Hugh

2020-08-25 07:28:11

by Michal Hocko

[permalink] [raw]
Subject: Re: [Resend PATCH 1/6] mm/memcg: warning on !memcg after readahead page charged

On Tue 25-08-20 09:25:01, Alex Shi wrote:
> reproduce using our linux-mm random bug collection on NUMA systems.
> >>
> >> OK, I must have missed that this was on ppc. The order makes more sense
> >> now. I will have a look at this next week.
> >
> > OK, so I've had a look and I know what's going on there. The
> > move_pages12 is migrating hugetlb pages. Those are not charged to any
> > memcg. We have completely missed this case. There are two ways going
> > around that. Drop the warning and update the comment so that we do not
> > forget about that or special case hugetlb pages.
> >
> > I think the first option is better.
> >
>
>
> Hi Michal,
>
> Compared to ignoring the warning, which is designed to fire here, addressing
> the fact that hugetlb pages are not charged seems like a better solution;
> otherwise memcg memory usage on hugetlb is out of control. Is that right?

Hugetlb memory is deliberately out of memcg scope. It is not reclaimable
memory, and it is something that could easily get out of control. The
memory is preallocated and overcommit is strictly controlled as well. We
have a dedicated hugetlb cgroup controller to offer better control of
the preallocated pool's distribution.

Anyway this just shows that there are more subtle cases where a page
with no memcg can hit some common paths so the patch is clearly not
ready.

I should have realized that when giving my ack, but like you I got
misled by the existing comment.
--
Michal Hocko
SUSE Labs

2020-08-27 06:02:29

by Hugh Dickins

[permalink] [raw]
Subject: Re: [Resend PATCH 1/6] mm/memcg: warning on !memcg after readahead page charged

On Mon, 24 Aug 2020, Hugh Dickins wrote:
> On Tue, 25 Aug 2020, Alex Shi wrote:
> > reproduce using our linux-mm random bug collection on NUMA systems.
> > >>
> > >> OK, I must have missed that this was on ppc. The order makes more sense
> > >> now. I will have a look at this next week.
> > >
> > > OK, so I've had a look and I know what's going on there. The
> > > move_pages12 is migrating hugetlb pages. Those are not charged to any
> > > memcg. We have completely missed this case. There are two ways going
> > > around that. Drop the warning and update the comment so that we do not
> > > forget about that or special case hugetlb pages.
> > >
> > > I think the first option is better.
> > >
> >
> >
> > Hi Michal,
> >
> > Compared to ignoring the warning, which is designed to fire here, addressing
> > the fact that hugetlb pages are not charged seems like a better solution;
> > otherwise memcg memory usage on hugetlb is out of control. Is that right?

I agree: it seems that hugetlb is not participating in memcg and lrus,
so it should not even be calling mem_cgroup_migrate(). That happens
because hugetlb finds the rest of migrate_page_states() useful,
but maybe there just needs to be an "if (!PageHuge(page))" or
"if (!PageHuge(newpage))" before its call to mem_cgroup_migrate() -
but I have not yet checked whether either of those actually works.
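
Concretely, that call-site check might look roughly like the following in
migrate_page_states() (untested, and the surrounding context in mm/migrate.c
is only approximate):

	/*
	 * hugetlb pages are never charged to a memcg, so there is
	 * nothing for mem_cgroup_migrate() to transfer or warn about.
	 */
	if (!PageHuge(page))
		mem_cgroup_migrate(page, newpage);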

The same could be done inside mem_cgroup_migrate() instead,
but it just seems wrong for hugetlb to be getting that far,
if it has no other reason to enter mm/memcontrol.c.

>
> Please don't suppose that this is peculiar to hugetlb: I'm not
> testing hugetlb at all (sorry), but I see the VM_WARN_ON_ONCE from
> mem_cgroup_page_lruvec(), and from mem_cgroup_migrate(), and from
> mem_cgroup_swapout().
>
> In all cases seen on a PageAnon page (well, in one case PageKsm).
> And not related to THP either: seen also on a machine incapable of THP.
>
> Maybe there's an independent change in 5.9-rc that's defeating
> expectations here, or maybe they were never valid. Worth
> investigating, even though the patch is currently removed,
> to find out why expectations were wrong.

It was very well worth investigating. And at the time of writing
the above, I thought it was coming up very quickly on all machines,
but in fact it only came up quickly on the one exercising KSM;
on the other machines it took about an hour to appear, so no
wonder that you and others had not already seen it.

While I'd prefer to spring the answer on you all in the patch that
fixes it, there's something more there that I don't fully understand
yet, and want to sort out before posting; so I'd better not keep you
in suspense... we broke the memcg charging of ksm_might_need_to_copy()
pages a couple of releases ago, and not noticed until your warning.

What's surprising is that the same bug can affect PageAnon pages too,
even when there's been no KSM involved whatsoever. I put in the KSM
fix, set all the machines running, expecting to get more info on the
PageAnon instances, but all of them turned out to be fixed.

>
> You'll ask me for more info, stacktraces etc, and I'll say sorry,
> no time today. Please try the swapping tests I sent before.
>
> And may I say, the comment
> /* Readahead page is charged too, to see if other page uncharged */
> is nonsensical to me, and much better deleted: maybe it would make
> some sense if the reader could see the comment it replaces - as
> they can in the patch - but not in the resulting source file.

I stand by that remark; but otherwise, I think this was a helpful
commit that helped to identify a bug, just as it was intended to do.
(I say "helped to" because its warnings alerted, but did not point
to the culprit: I had to add another in lru_cache_add() to find it.)
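
Purely as an illustration of the kind of debugging check meant here, and not
necessarily the exact one used: something along these lines near the top of
lru_cache_add() in mm/swap.c (with CONFIG_MEMCG enabled and memcg not disabled
at boot) would flag any page reaching the LRU while still uncharged:

	/* debug only: every page added to the LRU should already be charged */
	VM_WARN_ON_ONCE_PAGE(!page->mem_cgroup, page);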

Hugh