2018-02-06 06:56:36

by Huang, Ying

Subject: [PATCH -mm] mm, swap, frontswap: Fix THP swap if frontswap enabled

From: Huang Ying <[email protected]>

Sergey Senozhatsky reported that if THP (Transparent Huge Page) and
frontswap (via zswap) are both enabled, then when memory runs low and
swapping is triggered, segfaults and memory corruption occur in random
user space applications, as follows:

kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
#0 0x00007fc08889ae0d _int_malloc (libc.so.6)
#1 0x00007fc08889c2f3 malloc (libc.so.6)
#2 0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
#3 0x0000560e6005e75c n/a (urxvt)
#4 0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
#5 0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
#6 0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
#7 0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
#8 0x0000560e6005cb55 ev_run (urxvt)
#9 0x0000560e6003b9b9 main (urxvt)
#10 0x00007fc08883af4a __libc_start_main (libc.so.6)
#11 0x0000560e6003f9da _start (urxvt)

After bisection, it was found the first bad commit is
bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped
out").

The root cause is as follows.

When pages are written to the storage device during swap-out in
swap_writepage(), zswap (frontswap) is tried first, to compress the
pages instead and improve performance. But zswap (frontswap) treats a
THP as a normal page, so only the head page is saved. After swap-in,
the tail pages are not restored to their original contents, which
causes the memory corruption in the applications.

This is fixed by splitting the THP at the beginning of swap-out if
frontswap is enabled. To cover the case where frontswap is enabled at
runtime, swap_writepage() also checks whether the page is a THP before
using frontswap.

Reported-and-tested-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: "Huang, Ying" <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: [email protected] # 4.14
Fixes: bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped out")
---
mm/page_io.c | 2 +-
mm/vmscan.c | 16 +++++++++++++---
2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index b41cf9644585..6dca817ae7a0 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -250,7 +250,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
unlock_page(page);
goto out;
}
- if (frontswap_store(page) == 0) {
+ if (!PageTransHuge(page) && frontswap_store(page) == 0) {
set_page_writeback(page);
unlock_page(page);
end_page_writeback(page);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index bee53495a829..d1c1e00b08bb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -55,6 +55,7 @@

#include <linux/swapops.h>
#include <linux/balloon_compaction.h>
+#include <linux/frontswap.h>

#include "internal.h"

@@ -1063,14 +1064,23 @@ static unsigned long shrink_page_list(struct list_head *page_list,
/* cannot split THP, skip it */
if (!can_split_huge_page(page, NULL))
goto activate_locked;
+ /*
+ * Split THP if frontswap enabled,
+ * because it cannot process THP
+ */
+ if (frontswap_enabled()) {
+ if (split_huge_page_to_list(
+ page, page_list))
+ goto activate_locked;
+ }
/*
* Split pages without a PMD map right
* away. Chances are some or all of the
* tail pages can be freed without IO.
*/
- if (!compound_mapcount(page) &&
- split_huge_page_to_list(page,
- page_list))
+ else if (!compound_mapcount(page) &&
+ split_huge_page_to_list(page,
+ page_list))
goto activate_locked;
}
if (!add_to_swap(page)) {
--
2.15.1
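
The root cause described in the commit message can be illustrated with a
toy user-space model. This is only a sketch, not kernel code: the names
toy_thp, toy_store, store_head_only() and load_all(), as well as the tiny
SUBPAGES and PAGE_BYTES values, are assumptions made for illustration. It
shows that when a store saves only the head subpage, every tail subpage
comes back with the wrong contents after a round trip.

```c
#include <assert.h>
#include <string.h>

#define SUBPAGES   8    /* toy value; a real x86-64 THP has many more subpages */
#define PAGE_BYTES 16   /* toy page size, for illustration only */

/* A toy "huge page": SUBPAGES contiguous small pages. */
struct toy_thp {
	unsigned char data[SUBPAGES][PAGE_BYTES];
};

/* A toy backing store with room for one full huge page. */
struct toy_store {
	unsigned char slot[SUBPAGES][PAGE_BYTES];
};

/* Models a THP-unaware store: only the head (first) subpage is saved. */
static void store_head_only(struct toy_store *s, const struct toy_thp *p)
{
	assert(s && p);
	memset(s, 0, sizeof(*s));
	memcpy(s->slot[0], p->data[0], PAGE_BYTES);
}

/* Models swap-in: the caller expects every subpage to come back. */
static void load_all(const struct toy_store *s, struct toy_thp *p)
{
	memcpy(p->data, s->slot, sizeof(p->data));
}

/* Returns how many subpages differ after a store/load round trip. */
static int corrupted_subpages_after_round_trip(void)
{
	struct toy_thp before, after;
	struct toy_store store;
	int i, bad = 0;

	/* Give every subpage distinct, non-zero contents. */
	for (i = 0; i < SUBPAGES; i++)
		memset(before.data[i], 'A' + i, PAGE_BYTES);

	store_head_only(&store, &before);
	load_all(&store, &after);

	for (i = 0; i < SUBPAGES; i++)
		if (memcmp(before.data[i], after.data[i], PAGE_BYTES) != 0)
			bad++;
	return bad;
}
```

In this model the head subpage survives and all SUBPAGES - 1 tail subpages
are corrupted, which matches the symptom reported above.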



2018-02-06 08:31:58

by Minchan Kim

Subject: Re: [PATCH -mm] mm, swap, frontswap: Fix THP swap if frontswap enabled

Hi Huang,

On Tue, Feb 06, 2018 at 02:54:04PM +0800, Huang, Ying wrote:
> [..]
>
> This is fixed by splitting the THP at the beginning of swap-out if
> frontswap is enabled. To cover the case where frontswap is enabled at
> runtime, swap_writepage() also checks whether the page is a THP before
> using frontswap.

Nice catch, Huang. However, before adding a new dependency between
frontswap and vmscan, which I want to avoid if possible, let's think
about whether frontswap can support THP pages or not.
Can't we handle it with a loop over all of the subpages of the THP?
It might not be hard?
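
A loop over subpages, as suggested here, would need to treat the huge page
as all-or-nothing: if any subpage fails to store, the ones already stored
have to be dropped again. The following is a rough user-space sketch under
that assumption. It does not use the real frontswap API; toy_backend,
toy_store_one(), toy_invalidate_one() and toy_store_huge() are hypothetical
names, and NR_SUBPAGES is a toy value.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SUBPAGES 8   /* toy value standing in for the subpage count of a THP */

/* Hypothetical backend: one slot per subpage offset, with limited capacity. */
struct toy_backend {
	bool used[NR_SUBPAGES];
	int capacity;   /* how many more pages the backend will accept */
};

/* Store one subpage; returns 0 on success, -1 if the backend is full. */
static int toy_store_one(struct toy_backend *b, int offset)
{
	assert(offset >= 0 && offset < NR_SUBPAGES);
	if (b->capacity <= 0)
		return -1;
	b->used[offset] = true;
	b->capacity--;
	return 0;
}

/* Drop a previously stored subpage and give its capacity back. */
static void toy_invalidate_one(struct toy_backend *b, int offset)
{
	if (b->used[offset]) {
		b->used[offset] = false;
		b->capacity++;
	}
}

/* Store every subpage of the huge page, or none of them. */
static int toy_store_huge(struct toy_backend *b)
{
	int i;

	for (i = 0; i < NR_SUBPAGES; i++) {
		if (toy_store_one(b, i) != 0) {
			/* Roll back the subpages stored so far. */
			while (--i >= 0)
				toy_invalidate_one(b, i);
			return -1;
		}
	}
	return 0;
}

/* Helper for inspection: how many subpages are currently stored. */
static int toy_count_stored(const struct toy_backend *b)
{
	int i, n = 0;

	for (i = 0; i < NR_SUBPAGES; i++)
		if (b->used[i])
			n++;
	return n;
}
```

The rollback path is what makes this more involved than a plain loop; the
load and invalidate paths would need matching changes, which this sketch
does not attempt.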


2018-02-06 08:40:13

by Huang, Ying

Subject: Re: [PATCH -mm] mm, swap, frontswap: Fix THP swap if frontswap enabled

Hi, Minchan,

Minchan Kim <[email protected]> writes:

> [..]
>
> Nice catch, Huang. However, before adding a new dependency between
> frontswap and vmscan, which I want to avoid if possible, let's think
> about whether frontswap can support THP pages or not.
> Can't we handle it with a loop over all of the subpages of the THP?
> It might not be hard?

Yes. That could be an optimization on top of this patch. This patch is just
a simple fix to make things work, and to be suitable for the stable tree.

I think handling THP in zswap may be too complex for the stable tree.

Best Regards,
Huang, Ying


2018-02-06 09:04:10

by Minchan Kim

Subject: Re: [PATCH -mm] mm, swap, frontswap: Fix THP swap if frontswap enabled

On Tue, Feb 06, 2018 at 04:39:18PM +0800, Huang, Ying wrote:
> [..]
>
> Yes. That could be an optimization over this patch. This patch is just
> a simple fix to make things work and be suitable for stable tree.

Yup, it would be more complex than this patch. However, this patch introduces
a new dependency in vmscan.c. IOW, until now vmscan.c has worked fine without
knowing about frontswap, but from now on it has to be aware of it, which is
unfortunate.

Can't we simply do something like this, if you want to keep it simple and rely
on someone making frontswap THP-aware later?

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 42fe5653814a..4bf1725407aa 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -934,7 +934,11 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])

/* Only single cluster request supported */
WARN_ON_ONCE(n_goal > 1 && cluster);
+#ifdef CONFIG_FRONTSWAP
+ /* Now, frontswap doesn't support THP page */
+ if (frontswap_enabled() && cluster)
+ return 0;
+#endif
avail_pgs = atomic_long_read(&nr_swap_pages) / nr_pages;
if (avail_pgs <= 0)
goto noswap;



2018-02-06 09:49:28

by Sergey Senozhatsky

Subject: Re: [PATCH -mm] mm, swap, frontswap: Fix THP swap if frontswap enabled

Hello,

On (02/06/18 01:02), Minchan Kim wrote:
[..]
> Can't we simply do something like this, if you want to keep it simple and rely
> on someone making frontswap THP-aware later?
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 42fe5653814a..4bf1725407aa 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -934,7 +934,11 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
>
> /* Only single cluster request supported */
> WARN_ON_ONCE(n_goal > 1 && cluster);
> +#ifdef CONFIG_FRONTSWAP

Wouldn't #ifdef CONFIG_THP_SWAP be better? frontswap_enabled() is 'false'
on !CONFIG_FRONTSWAP configs, so it should be compiled out anyway.

> + /* Now, frontswap doesn't support THP page */
> + if (frontswap_enabled() && cluster)
> + return 0;
> +#endif
> avail_pgs = atomic_long_read(&nr_swap_pages) / nr_pages;
> if (avail_pgs <= 0)
> goto noswap;

Looks interesting. Technically, can be done earlier - in get_swap_page(),
can't it? get_swap_page() has the PageTransHuge(page) && CONFIG_THP_SWAP
condition checks. Can add frontswap dependency there. Something like

if (PageTransHuge(page)) {
if (IS_ENABLED(CONFIG_THP_SWAP))
+ if (!frontswap_enabled())
get_swap_pages(1, true, &entry);
return entry;
}

-ss
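
The allocation decision being proposed for get_swap_page() can be written
out as a small truth-table model. This is a user-space sketch only: the
enum swap_alloc, struct swap_cfg and choose_alloc() are hypothetical names,
with the two booleans standing in for IS_ENABLED(CONFIG_THP_SWAP) and
frontswap_enabled().

```c
#include <assert.h>
#include <stdbool.h>

/* What kind of swap allocation the caller ends up attempting. */
enum swap_alloc {
	ALLOC_NONE,     /* no entry returned; the caller is expected to split and retry */
	ALLOC_CLUSTER,  /* a whole cluster for an unsplit THP */
	ALLOC_SINGLE    /* an ordinary single-slot allocation */
};

struct swap_cfg {
	bool thp_swap;           /* models IS_ENABLED(CONFIG_THP_SWAP) */
	bool frontswap_enabled;  /* models frontswap_enabled() */
};

/* Decide how to allocate swap space for a page under the proposed check. */
static enum swap_alloc choose_alloc(bool page_is_thp, const struct swap_cfg *cfg)
{
	assert(cfg);
	if (page_is_thp) {
		/* Only hand out a cluster when frontswap cannot see the THP. */
		if (cfg->thp_swap && !cfg->frontswap_enabled)
			return ALLOC_CLUSTER;
		return ALLOC_NONE;
	}
	return ALLOC_SINGLE;
}
```

With frontswap enabled, a THP never gets a cluster in this model, so it
would reach the swap-out path only after being split into normal pages.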

2018-02-06 13:36:49

by huang ying

Subject: Re: [PATCH -mm] mm, swap, frontswap: Fix THP swap if frontswap enabled

On Tue, Feb 6, 2018 at 5:02 PM, Minchan Kim <[email protected]> wrote:
> [..]
>
> Yup, it would be more complex than this patch. However, this patch introduces
> a new dependency in vmscan.c. IOW, until now vmscan.c has worked fine without
> knowing about frontswap, but from now on it has to be aware of it, which is
> unfortunate.
>
> Can't we simply do something like this, if you want to keep it simple and rely
> on someone making frontswap THP-aware later?
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 42fe5653814a..4bf1725407aa 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -934,7 +934,11 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
>
> /* Only single cluster request supported */
> WARN_ON_ONCE(n_goal > 1 && cluster);
> +#ifdef CONFIG_FRONTSWAP
> + /* Now, frontswap doesn't support THP page */
> + if (frontswap_enabled() && cluster)
> + return 0;
> +#endif
> avail_pgs = atomic_long_read(&nr_swap_pages) / nr_pages;
> if (avail_pgs <= 0)
> goto noswap;
>

This avoids introducing a dependency on frontswap in vmscan.c. But
IMHO it doesn't look like the right place for the logic.
vmscan.c is where we put the policy that determines whether to split a THP.

[snip]

2018-02-06 14:15:33

by Minchan Kim

Subject: Re: [PATCH -mm] mm, swap, frontswap: Fix THP swap if frontswap enabled

Hi Sergey,

On Tue, Feb 06, 2018 at 06:48:22PM +0900, Sergey Senozhatsky wrote:
> Hello,
>
> On (02/06/18 01:02), Minchan Kim wrote:
> [..]
> > Can't we simply do something like this, if you want to keep it simple and rely
> > on someone making frontswap THP-aware later?
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 42fe5653814a..4bf1725407aa 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -934,7 +934,11 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
> >
> > /* Only single cluster request supported */
> > WARN_ON_ONCE(n_goal > 1 && cluster);
> > +#ifdef CONFIG_FRONTSWAP
>
> Wouldn't #ifdef CONFIG_THP_SWAP be better? frontswap_enabled() is 'false'
> on !CONFIG_FRONTSWAP configs, so it should be compiled out anyway.

Agree.

>
> > + /* Now, frontswap doesn't support THP page */
> > + if (frontswap_enabled() && cluster)
> > + return 0;
> > +#endif
> > avail_pgs = atomic_long_read(&nr_swap_pages) / nr_pages;
> > if (avail_pgs <= 0)
> > goto noswap;
>
> Looks interesting. Technically, can be done earlier - in get_swap_page(),
> can't it? get_swap_page() has the PageTransHuge(page) && CONFIG_THP_SWAP
> condition checks. Can add frontswap dependency there. Something like
>
> if (PageTransHuge(page)) {
> if (IS_ENABLED(CONFIG_THP_SWAP))
> + if (!frontswap_enabled())
> get_swap_pages(1, true, &entry);
> return entry;
> }

Looks better, but it introduces frontswap into swap_slots.c, while
all of the frontswap work is in swapfile.c.


2018-02-06 14:36:14

by Minchan Kim

Subject: Re: [PATCH -mm] mm, swap, frontswap: Fix THP swap if frontswap enabled

On Tue, Feb 06, 2018 at 09:34:44PM +0800, huang ying wrote:
> On Tue, Feb 6, 2018 at 5:02 PM, Minchan Kim <[email protected]> wrote:
> > [..]
> >
> > Yup, it would be more complex than this patch. However, this patch introduces
> > a new dependency in vmscan.c. IOW, until now vmscan.c has worked fine without
> > knowing about frontswap, but from now on it has to be aware of it, which is
> > unfortunate.
> >
> > Can't we simply do something like this, if you want to keep it simple and rely
> > on someone making frontswap THP-aware later?
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 42fe5653814a..4bf1725407aa 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -934,7 +934,11 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
> >
> > /* Only single cluster request supported */
> > WARN_ON_ONCE(n_goal > 1 && cluster);
> > +#ifdef CONFIG_FRONTSWAP
> > + /* Now, frontswap doesn't support THP page */
> > + if (frontswap_enabled() && cluster)
> > + return 0;
> > +#endif
> > avail_pgs = atomic_long_read(&nr_swap_pages) / nr_pages;
> > if (avail_pgs <= 0)
> > goto noswap;
> >
>
> This avoids introducing a dependency on frontswap in vmscan.c. But
> IMHO it doesn't look like the right place for the logic.
> vmscan.c is where we put the policy that determines whether to split a THP.

It adds split policy in vmscan.c like you said.

shrink_page_list() already relies on swapfile.c to decide whether to split a THP.
IOW, if THP swap is not available, it splits the THP.
It's exactly the same logic. I don't see any difference at all.

shrink_page_list:

        if (!add_to_swap(page)) {
                if (!PageTransHuge(page))
                        goto activate_locked;
                /* Fall back to swapping normal pages */
                if (split_huge_page_to_list(page, page_list))
                        goto activate_locked;
                count_vm_event(THP_SWPOUT_FALLBACK);
                if (!add_to_swap(page))
                        goto activate_locked;
        }
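
The fallback flow quoted above can be modelled in user space to make the
argument concrete. This is a sketch under stated assumptions, not the
kernel implementation: toy_page, toy_swap, toy_add_to_swap() and
toy_reclaim() are hypothetical names, and the booleans simply stand in for
whether a whole-THP or single-slot swap allocation would succeed.

```c
#include <assert.h>
#include <stdbool.h>

enum reclaim_result {
	SWAPPED_WHOLE,        /* swap space allocated without splitting */
	SWAPPED_AFTER_SPLIT,  /* THP was split first, then swap space was allocated */
	ACTIVATED             /* models "goto activate_locked" */
};

struct toy_page {
	bool is_thp;
	bool can_split;   /* models the split succeeding */
};

struct toy_swap {
	bool thp_slots_available;     /* a whole-THP allocation can succeed */
	bool single_slots_available;  /* a normal-page allocation can succeed */
};

/* Models add_to_swap(): success depends on the kind of page being added. */
static bool toy_add_to_swap(const struct toy_page *page, const struct toy_swap *swap)
{
	return page->is_thp ? swap->thp_slots_available
			    : swap->single_slots_available;
}

/* Mirrors the fallback: try as-is, and split a THP only if that fails. */
static enum reclaim_result toy_reclaim(struct toy_page *page,
				       const struct toy_swap *swap,
				       int *fallbacks)
{
	assert(page && swap && fallbacks);
	if (!toy_add_to_swap(page, swap)) {
		if (!page->is_thp)
			return ACTIVATED;
		if (!page->can_split)
			return ACTIVATED;
		page->is_thp = false;   /* the THP is now normal pages */
		(*fallbacks)++;         /* models the THP_SWPOUT_FALLBACK event */
		if (!toy_add_to_swap(page, swap))
			return ACTIVATED;
		return SWAPPED_AFTER_SPLIT;
	}
	return SWAPPED_WHOLE;
}
```

If the swap layer refuses whole-THP allocations while frontswap is enabled,
this existing path already ends with the page split and swapped as normal
pages, which is the point being made in the message above.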

2018-02-07 02:24:13

by Huang, Ying

Subject: Re: [PATCH -mm] mm, swap, frontswap: Fix THP swap if frontswap enabled

Minchan Kim <[email protected]> writes:

> On Tue, Feb 06, 2018 at 09:34:44PM +0800, huang ying wrote:
>> On Tue, Feb 6, 2018 at 5:02 PM, Minchan Kim <[email protected]> wrote:
>> > On Tue, Feb 06, 2018 at 04:39:18PM +0800, Huang, Ying wrote:
>> >> Hi, Minchan,
>> >>
>> >> Minchan Kim <[email protected]> writes:
>> >>
>> >> > Hi Huang,
>> >> >
>> >> > On Tue, Feb 06, 2018 at 02:54:04PM +0800, Huang, Ying wrote:
>> >> >> From: Huang Ying <[email protected]>
>> >> >>
>> >> >> It was reported by Sergey Senozhatsky that if THP (Transparent Huge
>> >> >> Page) and frontswap (via zswap) are both enabled, when memory goes low
>> >> >> so that swap is triggered, segfault and memory corruption will occur
>> >> >> in random user space applications as follows,
>> >> >>
>> >> >> kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
>> >> >> #0 0x00007fc08889ae0d _int_malloc (libc.so.6)
>> >> >> #1 0x00007fc08889c2f3 malloc (libc.so.6)
>> >> >> #2 0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
>> >> >> #3 0x0000560e6005e75c n/a (urxvt)
>> >> >> #4 0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
>> >> >> #5 0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
>> >> >> #6 0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
>> >> >> #7 0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
>> >> >> #8 0x0000560e6005cb55 ev_run (urxvt)
>> >> >> #9 0x0000560e6003b9b9 main (urxvt)
>> >> >> #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
>> >> >> #11 0x0000560e6003f9da _start (urxvt)
>> >> >>
>> >> >> After bisection, it was found the first bad commit is
>> >> >> bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped
>> >> >> out").
>> >> >>
>> >> >> The root cause is as follows.
>> >> >>
>> >> >> When pages are written to the storage device during swap-out in
>> >> >> swap_writepage(), zswap (frontswap) is first tried to compress the pages
>> >> >> instead, to improve performance. But zswap (frontswap) treats a THP
>> >> >> as a normal page, so only the head page is saved. After swap-in,
>> >> >> the tail pages are not restored to their original contents, which causes
>> >> >> memory corruption in the applications.
>> >> >>
>> >> >> This is fixed by splitting the THP at the beginning of swap-out if
>> >> >> frontswap is enabled. To handle frontswap being enabled at runtime,
>> >> >> whether the page is a THP is also checked before using frontswap during
>> >> >> swap-out.
>> >> >
>> >> > Nice catch, Huang. However, before adding a new dependency between
>> >> > frontswap and vmscan, which I want to avoid if possible, let's think about
>> >> > whether frontswap can support THP or not.
>> >> > Can't we handle it with a loop over all the subpages of the THP?
>> >> > It might not be hard?
>> >>
>> >> Yes. That could be an optimization over this patch. This patch is just
>> >> a simple fix to make things work and be suitable for stable tree.
>> >
>> > Yup, it would be more complex than this patch. However, this patch introduces
>> > a new dependency in vmscan.c. IOW, we have been fine without knowing about frontswap
>> > in vmscan.c, but from now on we would have to be aware of it, which is unfortunate.
>> >
>> > Can't we simply do something like this if you want to keep it simple and rely on someone
>> > to make frontswap THP-aware later?
>> >
>> > diff --git a/mm/swapfile.c b/mm/swapfile.c
>> > index 42fe5653814a..4bf1725407aa 100644
>> > --- a/mm/swapfile.c
>> > +++ b/mm/swapfile.c
>> > @@ -934,7 +934,11 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
>> >
>> >  	/* Only single cluster request supported */
>> >  	WARN_ON_ONCE(n_goal > 1 && cluster);
>> > +#ifdef CONFIG_FRONTSWAP
>> > +	/* Now, frontswap doesn't support THP page */
>> > +	if (frontswap_enabled() && cluster)
>> > +		return 0;
>> > +#endif
>> >  	avail_pgs = atomic_long_read(&nr_swap_pages) / nr_pages;
>> >  	if (avail_pgs <= 0)
>> >  		goto noswap;
>> >
>>
>> This can avoid introducing a dependency on frontswap in vmscan.c. But
>> IMHO it doesn't look like the right place to put the logic.
>> vmscan.c is where we put the policy that determines whether to split a THP.
>
> It adds split policy in vmscan.c like you said.
>
> shrink_page_list already relies on swapfile.c to decide whether to split a THP.
> IOW, if THP swap is not available, it splits the THP.
> It's exactly the same logic. I don't see any difference at all.
>
> shrink_page_list:
>
> 	if (!add_to_swap(page)) {
> 		if (!PageTransHuge(page))
> 			goto activate_locked;
> 		if (split_huge_page_to_list(page, page_list))
> 			goto activate_locked;
> 		count_vm_event(THP_SWPOUT_FALLBACK);
> 		if (!add_to_swap(page))
> 			goto activate_locked;
> 	}

OK. I will change the code as you suggested.

Best Regards,
Huang, Ying
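
For readers following the thread, the root cause quoted above (only the head page of a THP being saved) and the "loop over subpages" idea can be modeled in user space roughly as below. This is only a sketch under simplifying assumptions: a toy 16-byte "page", 4 subpages per huge page, and a flat array as the backing store. None of the toy_* names exist in the kernel.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE	16	/* tiny "page" to keep the model small */
#define TOY_NR_SUB	4	/* subpages per toy huge page */

/* Toy huge page: TOY_NR_SUB contiguous subpages (hypothetical type). */
struct toy_thp {
	unsigned char sub[TOY_NR_SUB][TOY_PAGE_SIZE];
};

/* Toy backing store: one slot per swap offset. */
unsigned char toy_store[TOY_NR_SUB][TOY_PAGE_SIZE];

/* Models a store operation that only understands one normal page. */
void toy_store_page(int offset, const unsigned char *page)
{
	assert(offset >= 0 && offset < TOY_NR_SUB);
	memcpy(toy_store[offset], page, TOY_PAGE_SIZE);
}

/* The failure mode: the huge page is treated as one normal page. */
void toy_store_thp_head_only(const struct toy_thp *thp)
{
	toy_store_page(0, thp->sub[0]);
}

/* The suggested alternative: store every subpage in a loop. */
void toy_store_thp_all(const struct toy_thp *thp)
{
	for (int i = 0; i < TOY_NR_SUB; i++)
		toy_store_page(i, thp->sub[i]);
}

/* Count subpages whose stored copy matches the original contents. */
int toy_intact_subpages(const struct toy_thp *thp)
{
	int n = 0;

	for (int i = 0; i < TOY_NR_SUB; i++)
		if (memcmp(toy_store[i], thp->sub[i], TOY_PAGE_SIZE) == 0)
			n++;
	return n;
}
```

With distinct contents in each subpage, toy_store_thp_head_only() leaves only the head recoverable, while toy_store_thp_all() preserves every subpage. A real THP-aware frontswap would also need matching load and invalidate paths, which this sketch does not model.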