Hi all,
This series contains cleanups that remove an unneeded return value, a
misleading comment, and so on. It also puts redirtied MADV_FREE pages back
on the anonymous LRU list. More details can be found in the respective changelogs.
Thanks!
Miaohe Lin (5):
mm/vmscan: put the redirtied MADV_FREE pages back to anonymous LRU
list
mm/vmscan: remove misleading setting to sc->priority
mm/vmscan: remove unneeded return value of kswapd_run()
mm/vmscan: add 'else' to remove check_pending label
mm/vmscan: fix misleading comment in isolate_lru_pages()
include/linux/swap.h | 2 +-
mm/vmscan.c | 26 +++++++++-----------------
2 files changed, 10 insertions(+), 18 deletions(-)
--
2.23.0
If MADV_FREE pages are redirtied before they can be reclaimed, put the
pages back on the anonymous LRU list by setting the SwapBacked flag, and
the pages will be reclaimed via the normal swapout path. Otherwise
MADV_FREE pages won't be reclaimed as expected.
Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
Signed-off-by: Miaohe Lin <[email protected]>
---
mm/vmscan.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a7602f71ec04..6483fe0e2065 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1628,6 +1628,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
if (!page_ref_freeze(page, 1))
goto keep_locked;
if (PageDirty(page)) {
+ SetPageSwapBacked(page);
page_ref_unfreeze(page, 1);
goto keep_locked;
}
--
2.23.0
The priority field of sc is used to control how many pages we should scan
at once, but these functions always traverse the whole list to shrink the
pages. So these settings are unneeded and misleading.
Signed-off-by: Miaohe Lin <[email protected]>
---
mm/vmscan.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6483fe0e2065..fbe53e60b248 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1702,7 +1702,6 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
{
struct scan_control sc = {
.gfp_mask = GFP_KERNEL,
- .priority = DEF_PRIORITY,
.may_unmap = 1,
};
struct reclaim_stat stat;
@@ -2327,7 +2326,6 @@ unsigned long reclaim_pages(struct list_head *page_list)
unsigned int noreclaim_flag;
struct scan_control sc = {
.gfp_mask = GFP_KERNEL,
- .priority = DEF_PRIORITY,
.may_writepage = 1,
.may_unmap = 1,
.may_swap = 1,
--
2.23.0
The return value of kswapd_run() is unused now. Clean it up.
Signed-off-by: Miaohe Lin <[email protected]>
---
include/linux/swap.h | 2 +-
mm/vmscan.c | 7 ++-----
2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 6f5a43251593..717e6e500929 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -408,7 +408,7 @@ static inline bool node_reclaim_enabled(void)
extern void check_move_unevictable_pages(struct pagevec *pvec);
-extern int kswapd_run(int nid);
+extern void kswapd_run(int nid);
extern void kswapd_stop(int nid);
#ifdef CONFIG_SWAP
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fbe53e60b248..c580bef6b885 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4284,23 +4284,20 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
* This kswapd start function will be called by init and node-hot-add.
* On node-hot-add, kswapd will moved to proper cpus if cpus are hot-added.
*/
-int kswapd_run(int nid)
+void kswapd_run(int nid)
{
pg_data_t *pgdat = NODE_DATA(nid);
- int ret = 0;
if (pgdat->kswapd)
- return 0;
+ return;
pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
if (IS_ERR(pgdat->kswapd)) {
/* failure at boot is fatal */
BUG_ON(system_state < SYSTEM_RUNNING);
pr_err("Failed to start kswapd on node %d\n", nid);
- ret = PTR_ERR(pgdat->kswapd);
pgdat->kswapd = NULL;
}
- return ret;
}
/*
--
2.23.0
We could add 'else' to remove the somewhat odd check_pending label and
make the code more succinct.
Signed-off-by: Miaohe Lin <[email protected]>
---
mm/vmscan.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c580bef6b885..a74760c48bd8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3428,18 +3428,14 @@ static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
* blocked waiting on the same lock. Instead, throttle for up to a
* second before continuing.
*/
- if (!(gfp_mask & __GFP_FS)) {
+ if (!(gfp_mask & __GFP_FS))
wait_event_interruptible_timeout(pgdat->pfmemalloc_wait,
allow_direct_reclaim(pgdat), HZ);
+ else
+ /* Throttle until kswapd wakes the process */
+ wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
+ allow_direct_reclaim(pgdat));
- goto check_pending;
- }
-
- /* Throttle until kswapd wakes the process */
- wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
- allow_direct_reclaim(pgdat));
-
-check_pending:
if (fatal_signal_pending(current))
return true;
--
2.23.0
We cannot know whether the page is being freed elsewhere until we fail
to increase the page count.
Signed-off-by: Miaohe Lin <[email protected]>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a74760c48bd8..6e26b3c93242 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1891,7 +1891,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
*/
scan += nr_pages;
if (!__isolate_lru_page_prepare(page, mode)) {
- /* It is being freed elsewhere */
list_move(&page->lru, src);
continue;
}
@@ -1901,6 +1900,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
* page release code relies on it.
*/
if (unlikely(!get_page_unless_zero(page))) {
+ /* It is being freed elsewhere. */
list_move(&page->lru, src);
continue;
}
--
2.23.0
On Sat, Jul 10, 2021 at 4:03 AM Miaohe Lin <[email protected]> wrote:
>
> If the MADV_FREE pages are redirtied before they could be reclaimed, put
> the pages back to anonymous LRU list by setting SwapBacked flag and the
> pages will be reclaimed in normal swapout way. Otherwise MADV_FREE pages
> won't be reclaimed as expected.
>
> Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
This is not a bug -- the dirty check isn't needed but it was copied
from __remove_mapping().
The page has only one reference left, which is from the isolation.
After the caller puts the page back on the lru and drops the reference,
the page will be freed anyway. It doesn't matter which lru it goes on.
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> mm/vmscan.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a7602f71ec04..6483fe0e2065 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1628,6 +1628,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
> if (!page_ref_freeze(page, 1))
> goto keep_locked;
> if (PageDirty(page)) {
> + SetPageSwapBacked(page);
> page_ref_unfreeze(page, 1);
> goto keep_locked;
> }
> --
> 2.23.0
>
>
On Sat 10-07-21 18:03:25, Miaohe Lin wrote:
> If the MADV_FREE pages are redirtied before they could be reclaimed, put
> the pages back to anonymous LRU list by setting SwapBacked flag and the
> pages will be reclaimed in normal swapout way. Otherwise MADV_FREE pages
> won't be reclaimed as expected.
Could you describe the problem you are trying to address? What does it
mean that pages won't be reclaimed as expected?
Also why is SetPageSwapBacked in shrink_page_list insufficient?
> Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> mm/vmscan.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a7602f71ec04..6483fe0e2065 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1628,6 +1628,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
> if (!page_ref_freeze(page, 1))
> goto keep_locked;
> if (PageDirty(page)) {
> + SetPageSwapBacked(page);
> page_ref_unfreeze(page, 1);
> goto keep_locked;
> }
> --
> 2.23.0
--
Michal Hocko
SUSE Labs
On Sat 10-07-21 18:03:26, Miaohe Lin wrote:
> The priority field of sc is used to control how many pages we should scan
> at once while we always traverse the list to shrink the pages in these
> functions. So these settings are unneeded and misleading.
I dunno. I agree that priority is not really used as these operate on
page lists but I am not sure this is worth touching.
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> mm/vmscan.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 6483fe0e2065..fbe53e60b248 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1702,7 +1702,6 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
> {
> struct scan_control sc = {
> .gfp_mask = GFP_KERNEL,
> - .priority = DEF_PRIORITY,
> .may_unmap = 1,
> };
> struct reclaim_stat stat;
> @@ -2327,7 +2326,6 @@ unsigned long reclaim_pages(struct list_head *page_list)
> unsigned int noreclaim_flag;
> struct scan_control sc = {
> .gfp_mask = GFP_KERNEL,
> - .priority = DEF_PRIORITY,
> .may_writepage = 1,
> .may_unmap = 1,
> .may_swap = 1,
> --
> 2.23.0
--
Michal Hocko
SUSE Labs
On Sat 10-07-21 18:03:29, Miaohe Lin wrote:
> We couldn't know whether the page is being freed elsewhere until we failed
> to increase the page count.
This is moving a hard-to-understand comment from one place to another.
If anything, this would benefit from saying what that "elsewhere" might
typically be, or the comment could simply be dropped altogether.
>
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> mm/vmscan.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a74760c48bd8..6e26b3c93242 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1891,7 +1891,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> */
> scan += nr_pages;
> if (!__isolate_lru_page_prepare(page, mode)) {
> - /* It is being freed elsewhere */
> list_move(&page->lru, src);
> continue;
> }
> @@ -1901,6 +1900,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> * page release code relies on it.
> */
> if (unlikely(!get_page_unless_zero(page))) {
> + /* It is being freed elsewhere. */
> list_move(&page->lru, src);
> continue;
> }
> --
> 2.23.0
--
Michal Hocko
SUSE Labs
On Sat 10-07-21 18:03:27, Miaohe Lin wrote:
> The return value of kswapd_run() is unused now. Clean it up.
>
> Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
> ---
> include/linux/swap.h | 2 +-
> mm/vmscan.c | 7 ++-----
> 2 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 6f5a43251593..717e6e500929 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -408,7 +408,7 @@ static inline bool node_reclaim_enabled(void)
>
> extern void check_move_unevictable_pages(struct pagevec *pvec);
>
> -extern int kswapd_run(int nid);
> +extern void kswapd_run(int nid);
> extern void kswapd_stop(int nid);
>
> #ifdef CONFIG_SWAP
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index fbe53e60b248..c580bef6b885 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4284,23 +4284,20 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
> * This kswapd start function will be called by init and node-hot-add.
> * On node-hot-add, kswapd will moved to proper cpus if cpus are hot-added.
> */
> -int kswapd_run(int nid)
> +void kswapd_run(int nid)
> {
> pg_data_t *pgdat = NODE_DATA(nid);
> - int ret = 0;
>
> if (pgdat->kswapd)
> - return 0;
> + return;
>
> pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
> if (IS_ERR(pgdat->kswapd)) {
> /* failure at boot is fatal */
> BUG_ON(system_state < SYSTEM_RUNNING);
> pr_err("Failed to start kswapd on node %d\n", nid);
> - ret = PTR_ERR(pgdat->kswapd);
> pgdat->kswapd = NULL;
> }
> - return ret;
> }
>
> /*
> --
> 2.23.0
--
Michal Hocko
SUSE Labs
On 2021/7/11 7:22, Yu Zhao wrote:
> On Sat, Jul 10, 2021 at 4:03 AM Miaohe Lin <[email protected]> wrote:
>>
>> If the MADV_FREE pages are redirtied before they could be reclaimed, put
>> the pages back to anonymous LRU list by setting SwapBacked flag and the
>> pages will be reclaimed in normal swapout way. Otherwise MADV_FREE pages
>> won't be reclaimed as expected.
>>
>> Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
>
> This is not a bug -- the dirty check isn't needed but it was copied
> from __remove_mapping().
Yes, this is not a bug and is harmless. When we reach here, the page should
not be dirty because PageDirty is handled above, and there is no way to
redirty it again as the page table references are all gone and it is not in
the swap cache.
>
> The page has only one reference left, which is from the isolation.
> After the caller puts the page back on lru and drops the reference,
> the page will be freed anyway. It doesn't matter which lru it goes.
But it looks buggy, as it doesn't perform the expected ops from a code
point of view. Should I drop the Fixes tag and send a v2 version?
Many thanks for the reply!
>
>> Signed-off-by: Miaohe Lin <[email protected]>
>> ---
>> mm/vmscan.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index a7602f71ec04..6483fe0e2065 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1628,6 +1628,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>> if (!page_ref_freeze(page, 1))
>> goto keep_locked;
>> if (PageDirty(page)) {
>> + SetPageSwapBacked(page);
>> page_ref_unfreeze(page, 1);
>> goto keep_locked;
>> }
>> --
>> 2.23.0
>>
>>
> .
>
On Sat 10-07-21 18:03:28, Miaohe Lin wrote:
> We could add 'else' to remove the somewhat odd check_pending label to
> make code core succinct.
Yes, this makes the code easier to follow. The two modes of throttling,
depending on the fs reclaim mode, are more obvious now.
> Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Thanks!
> ---
> mm/vmscan.c | 14 +++++---------
> 1 file changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c580bef6b885..a74760c48bd8 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3428,18 +3428,14 @@ static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
> * blocked waiting on the same lock. Instead, throttle for up to a
> * second before continuing.
> */
> - if (!(gfp_mask & __GFP_FS)) {
> + if (!(gfp_mask & __GFP_FS))
> wait_event_interruptible_timeout(pgdat->pfmemalloc_wait,
> allow_direct_reclaim(pgdat), HZ);
> + else
> + /* Throttle until kswapd wakes the process */
> + wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
> + allow_direct_reclaim(pgdat));
>
> - goto check_pending;
> - }
> -
> - /* Throttle until kswapd wakes the process */
> - wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
> - allow_direct_reclaim(pgdat));
> -
> -check_pending:
> if (fatal_signal_pending(current))
> return true;
>
> --
> 2.23.0
--
Michal Hocko
SUSE Labs
On 2021/7/12 15:22, Michal Hocko wrote:
> On Sat 10-07-21 18:03:25, Miaohe Lin wrote:
>> If the MADV_FREE pages are redirtied before they could be reclaimed, put
>> the pages back to anonymous LRU list by setting SwapBacked flag and the
>> pages will be reclaimed in normal swapout way. Otherwise MADV_FREE pages
>> won't be reclaimed as expected.
>
> Could you describe problem which you are trying to address? What does it
> mean that pages won't be reclaimed as expected?
>
In fact, this is not a bug and is harmless. But it looks buggy, as it doesn't
perform the expected ops from a code point of view. Lazyfree (MADV_FREE) pages
are clean anonymous pages. They have the SwapBacked flag cleared to distinguish
them from normal anonymous pages. When MADV_FREE pages are redirtied before
they can be reclaimed, the pages should be put back on the anonymous LRU list
by setting the SwapBacked flag, so that they will be reclaimed via the normal
swapout path.
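For illustration, here is a minimal userspace sketch of those semantics
(illustrative only, not from the patch; error handling trimmed):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 16 * 4096;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;

	memset(buf, 0xab, len);		/* dirty anonymous pages */

	/*
	 * Lazyfree the range: the kernel clears the SwapBacked flag on
	 * these pages and may discard them under memory pressure
	 * instead of swapping them out.
	 */
	if (madvise(buf, len, MADV_FREE)) {
		perror("madvise");
		return 1;
	}

	/*
	 * Redirty one page: it must be treated as a normal anonymous
	 * page again (swapped out, not dropped) -- the case this
	 * patch is about.
	 */
	buf[0] = 1;
	return 0;
}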
Many thanks for review and reply.
> Also why is SetPageSwapBacked in shrink_page_list insufficient?
>
>> Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
>> Signed-off-by: Miaohe Lin <[email protected]>
>> ---
>> mm/vmscan.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index a7602f71ec04..6483fe0e2065 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1628,6 +1628,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>> if (!page_ref_freeze(page, 1))
>> goto keep_locked;
>> if (PageDirty(page)) {
>> + SetPageSwapBacked(page);
>> page_ref_unfreeze(page, 1);
>> goto keep_locked;
>> }
>> --
>> 2.23.0
>
On 2021/7/12 15:24, Michal Hocko wrote:
> On Sat 10-07-21 18:03:26, Miaohe Lin wrote:
>> The priority field of sc is used to control how many pages we should scan
>> at once while we always traverse the list to shrink the pages in these
>> functions. So these settings are unneeded and misleading.
>
> I dunno. I agree that priority is not really used as these operate on
> page lists but I am not sure this is worth touching.
When I investigated the vmscan code, I thought the priority here would
control the proportion of pages in the list to shrink. So I prefer to
remove these settings.
Thanks a lot for review and reply!
>
>> Signed-off-by: Miaohe Lin <[email protected]>
>> ---
>> mm/vmscan.c | 2 --
>> 1 file changed, 2 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 6483fe0e2065..fbe53e60b248 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1702,7 +1702,6 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>> {
>> struct scan_control sc = {
>> .gfp_mask = GFP_KERNEL,
>> - .priority = DEF_PRIORITY,
>> .may_unmap = 1,
>> };
>> struct reclaim_stat stat;
>> @@ -2327,7 +2326,6 @@ unsigned long reclaim_pages(struct list_head *page_list)
>> unsigned int noreclaim_flag;
>> struct scan_control sc = {
>> .gfp_mask = GFP_KERNEL,
>> - .priority = DEF_PRIORITY,
>> .may_writepage = 1,
>> .may_unmap = 1,
>> .may_swap = 1,
>> --
>> 2.23.0
>
On 2021/7/12 15:28, Michal Hocko wrote:
> On Sat 10-07-21 18:03:29, Miaohe Lin wrote:
>> We couldn't know whether the page is being freed elsewhere until we failed
>> to increase the page count.
>
> This is moving a hard to understand comment from one place to another.
If get_page_unless_zero() fails, the page could have been freed elsewhere.
I think this is straightforward but doesn't help a lot. Would you prefer to
just remove this comment?
Thank you.
> If anything this would benefit from what that elsewhere might be
> typically or simply drop the comment altogether.
>
>>
>> Signed-off-by: Miaohe Lin <[email protected]>
>> ---
>> mm/vmscan.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index a74760c48bd8..6e26b3c93242 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1891,7 +1891,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>> */
>> scan += nr_pages;
>> if (!__isolate_lru_page_prepare(page, mode)) {
>> - /* It is being freed elsewhere */
>> list_move(&page->lru, src);
>> continue;
>> }
>> @@ -1901,6 +1900,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>> * page release code relies on it.
>> */
>> if (unlikely(!get_page_unless_zero(page))) {
>> + /* It is being freed elsewhere. */
>> list_move(&page->lru, src);
>> continue;
>> }
>> --
>> 2.23.0
>
On Mon, Jul 12, 2021 at 1:12 AM Miaohe Lin <[email protected]> wrote:
>
> On 2021/7/11 7:22, Yu Zhao wrote:
> > On Sat, Jul 10, 2021 at 4:03 AM Miaohe Lin <[email protected]> wrote:
> >>
> >> If the MADV_FREE pages are redirtied before they could be reclaimed, put
> >> the pages back to anonymous LRU list by setting SwapBacked flag and the
> >> pages will be reclaimed in normal swapout way. Otherwise MADV_FREE pages
> >> won't be reclaimed as expected.
> >>
> >> Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
> >
> > This is not a bug -- the dirty check isn't needed but it was copied
> > from __remove_mapping().
>
> Yes, this is not a bug and harmless. When we reach here, page should not be
> dirtied because PageDirty is handled above and there is no way to redirty it
> again as pagetable references are all gone and it's not in the swap cache.
>
> >
> > The page has only one reference left, which is from the isolation.
> > After the caller puts the page back on lru and drops the reference,
> > the page will be freed anyway. It doesn't matter which lru it goes.
>
> But it looks buggy as it didn't perform the expected ops from code view.
> Should I drop the Fixes tag and send a v2 version?
I don't understand the logic here -- it looks pretty obvious to me
that, if we want to change anything, we should delete the dirty check,
not add another line that would enforce the belief that the dirty
check is needed.
>
> Many thanks for reply!
>
> >
> >> Signed-off-by: Miaohe Lin <[email protected]>
> >> ---
> >> mm/vmscan.c | 1 +
> >> 1 file changed, 1 insertion(+)
> >>
> >> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> index a7602f71ec04..6483fe0e2065 100644
> >> --- a/mm/vmscan.c
> >> +++ b/mm/vmscan.c
> >> @@ -1628,6 +1628,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
> >> if (!page_ref_freeze(page, 1))
> >> goto keep_locked;
> >> if (PageDirty(page)) {
> >> + SetPageSwapBacked(page);
> >> page_ref_unfreeze(page, 1);
> >> goto keep_locked;
> >> }
> >> --
> >> 2.23.0
> >>
> >>
> > .
> >
>
On Mon 12-07-21 19:03:39, Miaohe Lin wrote:
> On 2021/7/12 15:22, Michal Hocko wrote:
> > On Sat 10-07-21 18:03:25, Miaohe Lin wrote:
> >> If the MADV_FREE pages are redirtied before they could be reclaimed, put
> >> the pages back to anonymous LRU list by setting SwapBacked flag and the
> >> pages will be reclaimed in normal swapout way. Otherwise MADV_FREE pages
> >> won't be reclaimed as expected.
> >
> > Could you describe problem which you are trying to address? What does it
> > mean that pages won't be reclaimed as expected?
> >
>
> In fact, this is not a bug and harmless.
The Fixes tag is then misleading, and the changelog should be clearer
about this as well.
> But it looks buggy as it didn't perform
> the expected ops from code view. Lazyfree (MADV_FREE) pages are clean anonymous
> pages. They have SwapBacked flag cleared to distinguish normal anonymous pages.
yes.
> When the MADV_FREE pages are redirtied before they could be reclaimed, the pages
> should be put back to anonymous LRU list by setting SwapBacked flag, thus the
> pages will be reclaimed in normal swapout way.
Agreed. But the question is why this needs explicit handling here when we
already handle this case when trying to unmap the page. Please make sure to
document the behavior you are observing and why it is not desirable.
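For reference, the existing handling lives in the try_to_unmap() path;
roughly (paraphrased and trimmed from try_to_unmap_one() in mm/rmap.c of
this era, so take it as a sketch rather than the exact code):

if (PageAnon(page) && !PageSwapBacked(page)) {
	if (!PageDirty(page)) {
		/* Clean lazyfree page: just zap the mapping. */
		dec_mm_counter(mm, MM_ANONPAGES);
		goto discard;
	}
	/*
	 * The page was redirtied, so it cannot be discarded.
	 * Remap it and move it back to the anonymous LRU by
	 * setting SwapBacked.
	 */
	set_pte_at(mm, address, pvmw.pte, pteval);
	SetPageSwapBacked(page);
	ret = false;
}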
> Many thanks for review and reply.
>
> > Also why is SetPageSwapBacked in shrink_page_list insufficient?
Sorry, I meant to say the try_to_unmap path here.
> >> Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
> >> Signed-off-by: Miaohe Lin <[email protected]>
> >> ---
> >> mm/vmscan.c | 1 +
> >> 1 file changed, 1 insertion(+)
> >>
> >> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> index a7602f71ec04..6483fe0e2065 100644
> >> --- a/mm/vmscan.c
> >> +++ b/mm/vmscan.c
> >> @@ -1628,6 +1628,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
> >> if (!page_ref_freeze(page, 1))
> >> goto keep_locked;
> >> if (PageDirty(page)) {
> >> + SetPageSwapBacked(page);
> >> page_ref_unfreeze(page, 1);
> >> goto keep_locked;
> >> }
> >> --
> >> 2.23.0
> >
--
Michal Hocko
SUSE Labs
On Mon 12-07-21 19:16:47, Miaohe Lin wrote:
> On 2021/7/12 15:28, Michal Hocko wrote:
> > On Sat 10-07-21 18:03:29, Miaohe Lin wrote:
> >> We couldn't know whether the page is being freed elsewhere until we failed
> >> to increase the page count.
> >
> > This is moving a hard to understand comment from one place to another.
>
> If get_page_unless_zero failed, the page could have been freed elsewhere. I think
> this looks straightforward but doesn't help a lot. Are you preferring to just
> remove this comment ?
Yes, the comment in its current form is not really much help. Does it
deserve a single-liner to drop it? Likely not on its own, without more
changes in that area.
--
Michal Hocko
SUSE Labs
On 2021/7/13 17:32, Michal Hocko wrote:
> On Mon 12-07-21 19:16:47, Miaohe Lin wrote:
>> On 2021/7/12 15:28, Michal Hocko wrote:
>>> On Sat 10-07-21 18:03:29, Miaohe Lin wrote:
>>>> We couldn't know whether the page is being freed elsewhere until we failed
>>>> to increase the page count.
>>>
>>> This is moving a hard to understand comment from one place to another.
>>
>> If get_page_unless_zero failed, the page could have been freed elsewhere. I think
>> this looks straightforward but doesn't help a lot. Are you preferring to just
>> remove this comment ?
>
> Yes the comment in its current form is not really helpful much. Does it
> deserve a single liner to drop it? Likely not on its own without more
> changes in that area.
Sure, I will drop this single patch and send a new one when I have collected
enough misleading/obsolete comments to fix.
Thanks.
>
On 2021/7/13 17:30, Michal Hocko wrote:
> On Mon 12-07-21 19:03:39, Miaohe Lin wrote:
>> On 2021/7/12 15:22, Michal Hocko wrote:
>>> On Sat 10-07-21 18:03:25, Miaohe Lin wrote:
>>>> If the MADV_FREE pages are redirtied before they could be reclaimed, put
>>>> the pages back to anonymous LRU list by setting SwapBacked flag and the
>>>> pages will be reclaimed in normal swapout way. Otherwise MADV_FREE pages
>>>> won't be reclaimed as expected.
>>>
>>> Could you describe problem which you are trying to address? What does it
>>> mean that pages won't be reclaimed as expected?
>>>
>>
>> In fact, this is not a bug and harmless.
>
> Fixes tag is then misleading and the changelog should be more clear
> about this as well.
Sure.
>
>> But it looks buggy as it didn't perform
>> the expected ops from code view. Lazyfree (MADV_FREE) pages are clean anonymous
>> pages. They have SwapBacked flag cleared to distinguish normal anonymous pages.
>
> yes.
>
>> When the MADV_FREE pages are redirtied before they could be reclaimed, the pages
>> should be put back to anonymous LRU list by setting SwapBacked flag, thus the
>> pages will be reclaimed in normal swapout way.
>
> Agreed. But the question is why this needs an explicit handling here
> when we already do handle this case when trying to unmap the page.
This makes me think more. It seems even the page_ref_freeze call is
guaranteed to succeed, as no one can grab the page refcount after the page
is successfully unmapped.
Does the change below make sense to you?
Many thanks.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6e26b3c93242..c31925320b33 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1624,15 +1624,11 @@ static unsigned int shrink_page_list(struct list_head *page_list,
}
if (PageAnon(page) && !PageSwapBacked(page)) {
- /* follow __remove_mapping for reference */
- if (!page_ref_freeze(page, 1))
- goto keep_locked;
- if (PageDirty(page)) {
- SetPageSwapBacked(page);
- page_ref_unfreeze(page, 1);
- goto keep_locked;
- }
-
+ /*
+ * No one can grab the page refcnt or redirty the page
+ * after the page is successfully unmapped.
+ */
+ WARN_ON_ONCE(!page_ref_freeze(page, 1));
count_vm_event(PGLAZYFREED);
count_memcg_page_event(page, PGLAZYFREED);
} else if (!mapping || !__remove_mapping(mapping, page, true,
> Please make sure to document the behavior you are observing, why it is
> not desirable.
>
>> Many thanks for review and reply.
>>
>>> Also why is SetPageSwapBacked in shrink_page_list insufficient?
>
> Sorry I meant to say try_to_unmap path here
>
>>>> Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
>>>> Signed-off-by: Miaohe Lin <[email protected]>
>>>> ---
>>>> mm/vmscan.c | 1 +
>>>> 1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index a7602f71ec04..6483fe0e2065 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -1628,6 +1628,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>>>> if (!page_ref_freeze(page, 1))
>>>> goto keep_locked;
>>>> if (PageDirty(page)) {
>>>> + SetPageSwapBacked(page);
>>>> page_ref_unfreeze(page, 1);
>>>> goto keep_locked;
>>>> }
>>>> --
>>>> 2.23.0
>>>
>
On 2021/7/13 15:25, Yu Zhao wrote:
> On Mon, Jul 12, 2021 at 1:12 AM Miaohe Lin <[email protected]> wrote:
>>
>> On 2021/7/11 7:22, Yu Zhao wrote:
>>> On Sat, Jul 10, 2021 at 4:03 AM Miaohe Lin <[email protected]> wrote:
>>>>
>>>> If the MADV_FREE pages are redirtied before they could be reclaimed, put
>>>> the pages back to anonymous LRU list by setting SwapBacked flag and the
>>>> pages will be reclaimed in normal swapout way. Otherwise MADV_FREE pages
>>>> won't be reclaimed as expected.
>>>>
>>>> Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
>>>
>>> This is not a bug -- the dirty check isn't needed but it was copied
>>> from __remove_mapping().
>>
>> Yes, this is not a bug and harmless. When we reach here, page should not be
>> dirtied because PageDirty is handled above and there is no way to redirty it
>> again as pagetable references are all gone and it's not in the swap cache.
>>
>>>
>>> The page has only one reference left, which is from the isolation.
>>> After the caller puts the page back on lru and drops the reference,
>>> the page will be freed anyway. It doesn't matter which lru it goes.
>>
>> But it looks buggy as it didn't perform the expected ops from code view.
>> Should I drop the Fixes tag and send a v2 version?
>
> I don't understand the logic here -- it looks pretty obvious to me
> that, if we want to change anything, we should delete the dirty check,
> not add another line that would enforce the belief that the dirty
> check is needed.
>
The dirty check could be removed, and even the page_ref_freeze failure path,
because no one can grab the page refcount after the page is successfully
unmapped.
Does the change below make sense to you?
Many thanks.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6e26b3c93242..c31925320b33 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1624,15 +1624,11 @@ static unsigned int shrink_page_list(struct list_head *page_list,
}
if (PageAnon(page) && !PageSwapBacked(page)) {
- /* follow __remove_mapping for reference */
- if (!page_ref_freeze(page, 1))
- goto keep_locked;
- if (PageDirty(page)) {
- SetPageSwapBacked(page);
- page_ref_unfreeze(page, 1);
- goto keep_locked;
- }
-
+ /*
+ * No one can grab the page refcnt or redirty the page
+ * after the page is successfully unmapped.
+ */
+ WARN_ON_ONCE(!page_ref_freeze(page, 1));
count_vm_event(PGLAZYFREED);
count_memcg_page_event(page, PGLAZYFREED);
} else if (!mapping || !__remove_mapping(mapping, page, true,
>>
>> Many thanks for reply!
>>
>>>
>>>> Signed-off-by: Miaohe Lin <[email protected]>
>>>> ---
>>>> mm/vmscan.c | 1 +
>>>> 1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index a7602f71ec04..6483fe0e2065 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -1628,6 +1628,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>>>> if (!page_ref_freeze(page, 1))
>>>> goto keep_locked;
>>>> if (PageDirty(page)) {
>>>> + SetPageSwapBacked(page);
>>>> page_ref_unfreeze(page, 1);
>>>> goto keep_locked;
>>>> }
>>>> --
>>>> 2.23.0
>>>>
>>>>
>>> .
>>>
>>
> .
>
On Tue, Jul 13, 2021 at 09:13:51PM +0800, Miaohe Lin wrote:
> >> When the MADV_FREE pages are redirtied before they could be reclaimed, the pages
> >> should be put back to anonymous LRU list by setting SwapBacked flag, thus the
> >> pages will be reclaimed in normal swapout way.
> >
> > Agreed. But the question is why this needs an explicit handling here
> > when we already do handle this case when trying to unmap the page.
>
> This makes me think more. It seems even the page_ref_freeze call is guaranteed to
> success as no one can grab the page refcnt after the page is successfully unmapped.
NO! This is wrong. Every page can have its refcount speculatively raised
(and then lowered). The two prime candidates for this are lockless GUP
and page cache lookups, but there can be others too.
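A rough userspace model of why such a speculative reference defeats a bare
page_ref_freeze() (simplified helpers, not the exact kernel code):

#include <stdatomic.h>
#include <stdbool.h>

/* Model of get_page_unless_zero(): bump the count unless it is zero. */
static bool get_page_unless_zero_model(atomic_int *refcount)
{
	int old = atomic_load(refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(refcount, &old, old + 1))
			return true;	/* we now hold a speculative reference */
	}
	return false;			/* page was already being freed */
}

/* Model of page_ref_freeze(): succeed only if exactly 'expected' refs exist. */
static bool page_ref_freeze_model(atomic_int *refcount, int expected)
{
	int e = expected;

	return atomic_compare_exchange_strong(refcount, &e, 0);
}

/*
 * Reclaim holds the single isolation reference (count == 1) and tries to
 * freeze it to 0.  A concurrent lockless lookup may already have taken a
 * speculative reference (count == 2), so the freeze fails -- which is why
 * the !page_ref_freeze() bail-out cannot become a WARN_ON_ONCE().
 */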
On 2021/7/13 21:34, Matthew Wilcox wrote:
> On Tue, Jul 13, 2021 at 09:13:51PM +0800, Miaohe Lin wrote:
>>>> When the MADV_FREE pages are redirtied before they could be reclaimed, the pages
>>>> should be put back to anonymous LRU list by setting SwapBacked flag, thus the
>>>> pages will be reclaimed in normal swapout way.
>>>
>>> Agreed. But the question is why this needs an explicit handling here
>>> when we already do handle this case when trying to unmap the page.
>>
>> This makes me think more. It seems even the page_ref_freeze call is guaranteed to
>> success as no one can grab the page refcnt after the page is successfully unmapped.
>
> NO! This is wrong. Every page can have its refcount speculatively raised
> (and then lowered). The two prime candidates for this are lockless GUP
> and page cache lookups, but there can be others too.
>
Many thanks for pointing this out. My oversight! Sorry!
So it seems lockless GUP can redirty a MADV_FREE page. But is it ok to just
release a redirtied MADV_FREE page? Because we hold the last reference here
and the page will be freed anyway...
> .
>
On Wed, Jul 14, 2021 at 07:36:57PM +0800, Miaohe Lin wrote:
> On 2021/7/13 21:34, Matthew Wilcox wrote:
> > On Tue, Jul 13, 2021 at 09:13:51PM +0800, Miaohe Lin wrote:
> >>>> When the MADV_FREE pages are redirtied before they could be reclaimed, the pages
> >>>> should be put back to anonymous LRU list by setting SwapBacked flag, thus the
> >>>> pages will be reclaimed in normal swapout way.
> >>>
> >>> Agreed. But the question is why this needs an explicit handling here
> >>> when we already do handle this case when trying to unmap the page.
> >>
> >> This makes me think more. It seems even the page_ref_freeze call is guaranteed to
> >> success as no one can grab the page refcnt after the page is successfully unmapped.
> >
> > NO! This is wrong. Every page can have its refcount speculatively raised
> > (and then lowered). The two prime candidates for this are lockless GUP
> > and page cache lookups, but there can be others too.
> >
>
> Many thanks for pointing this out. My overlook! Sorry!
> So, it seems lockless GUP can redirty the MADV_FREE page. But is it ok to just release
> a redirtied MADV_FREE pages? Because we hold the last reference here and the page will
> be freed anyway...
I don't see how lockless GUP can redirty the page. It can grab the
refcount, thus making the refcount here two. Then the call to freeze
here fails and the page stays on the list. But the lockless GUP checks
the page is still in the page table (and discovers it isn't, so releases
the reference count). Am I missing a path that lets lockless GUP dirty
the page?
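The sequence described above, sketched in simplified form (the real fast
path lives in mm/gup.c and differs in detail):

static int gup_fast_one_sketch(pte_t *ptep, struct page **pagep)
{
	pte_t pte = READ_ONCE(*ptep);
	struct page *page = pte_page(pte);

	/* Speculative reference: reclaim now sees refcount == 2. */
	if (!get_page_unless_zero(page))
		return 0;

	/* Recheck the PTE: did reclaim unmap the page meanwhile? */
	if (pte_val(READ_ONCE(*ptep)) != pte_val(pte)) {
		put_page(page);		/* lost the race, back off */
		return 0;
	}

	*pagep = page;			/* still mapped, safe to use */
	return 1;
}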
On 7/14/21 4:48 AM, Matthew Wilcox wrote:
> On Wed, Jul 14, 2021 at 07:36:57PM +0800, Miaohe Lin wrote:
>> On 2021/7/13 21:34, Matthew Wilcox wrote:
>>> On Tue, Jul 13, 2021 at 09:13:51PM +0800, Miaohe Lin wrote:
>>>>>> When the MADV_FREE pages are redirtied before they could be reclaimed, the pages
>>>>>> should be put back to anonymous LRU list by setting SwapBacked flag, thus the
>>>>>> pages will be reclaimed in normal swapout way.
>>>>>
>>>>> Agreed. But the question is why this needs an explicit handling here
>>>>> when we already do handle this case when trying to unmap the page.
>>>>
>>>> This makes me think more. It seems even the page_ref_freeze call is guaranteed to
>>>> success as no one can grab the page refcnt after the page is successfully unmapped.
>>>
>>> NO! This is wrong. Every page can have its refcount speculatively raised
>>> (and then lowered). The two prime candidates for this are lockless GUP
>>> and page cache lookups, but there can be others too.
>>>
>>
>> Many thanks for pointing this out. My overlook! Sorry!
>> So, it seems lockless GUP can redirty the MADV_FREE page. But is it ok to just release
>> a redirtied MADV_FREE pages? Because we hold the last reference here and the page will
>> be freed anyway...
>
> I don't see how lockless GUP can redirty the page. It can grab the
> refcount, thus making the refcount here two. Then the call to freeze
> here fails and the page stays on the list. But the lockless GUP checks
> the page is still in the page table (and discovers it isn't, so releases
> the reference count). Am I missing a path that lets lockless GUP dirty
> the page?
>
If a device driver pins some pages using gup, and the device then uses DMA
to write to those pages, then you could get there. That story is part of the
reasoning that led to creating pin_user_pages(), which btw does not yet
fully solve that case.
Basically, though, unless a non-CPU device has access to the page, it's
hard to see how gup itself can lead to a page getting dirtied.
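Sketched, that driver pattern looks roughly like this (user_addr and
NR_PAGES are placeholders; signatures as of this era, error handling
trimmed):

static int dma_into_user_buffer_sketch(unsigned long user_addr)
{
	struct page *pages[NR_PAGES];	/* NR_PAGES: placeholder constant */
	int npinned;

	npinned = pin_user_pages_fast(user_addr, NR_PAGES,
				      FOLL_WRITE | FOLL_LONGTERM, pages);
	if (npinned <= 0)
		return npinned ? npinned : -EFAULT;

	/*
	 * Program the device to DMA into these pages.  The device can
	 * keep writing long after the CPU page tables stop pointing at
	 * them -- which is how a page gets dirtied "behind the
	 * kernel's back".
	 */

	unpin_user_pages_dirty_lock(pages, npinned, true);
	return 0;
}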
thanks,
--
John Hubbard
NVIDIA
On 2021/7/15 3:43, John Hubbard wrote:
> On 7/14/21 4:48 AM, Matthew Wilcox wrote:
>> On Wed, Jul 14, 2021 at 07:36:57PM +0800, Miaohe Lin wrote:
>>> On 2021/7/13 21:34, Matthew Wilcox wrote:
>>>> On Tue, Jul 13, 2021 at 09:13:51PM +0800, Miaohe Lin wrote:
>>>>>>> When the MADV_FREE pages are redirtied before they could be reclaimed, the pages
>>>>>>> should be put back to anonymous LRU list by setting SwapBacked flag, thus the
>>>>>>> pages will be reclaimed in normal swapout way.
>>>>>>
>>>>>> Agreed. But the question is why this needs an explicit handling here
>>>>>> when we already do handle this case when trying to unmap the page.
>>>>>
>>>>> This makes me think more. It seems even the page_ref_freeze call is guaranteed to
>>>>> success as no one can grab the page refcnt after the page is successfully unmapped.
>>>>
>>>> NO! This is wrong. Every page can have its refcount speculatively raised
>>>> (and then lowered). The two prime candidates for this are lockless GUP
>>>> and page cache lookups, but there can be others too.
>>>>
>>>
>>> Many thanks for pointing this out. My overlook! Sorry!
>>> So, it seems lockless GUP can redirty the MADV_FREE page. But is it ok to just release
>>> a redirtied MADV_FREE pages? Because we hold the last reference here and the page will
>>> be freed anyway...
>>
>> I don't see how lockless GUP can redirty the page. It can grab the
>> refcount, thus making the refcount here two. Then the call to freeze
>> here fails and the page stays on the list. But the lockless GUP checks
>> the page is still in the page table (and discovers it isn't, so releases
>> the reference count). Am I missing a path that lets lockless GUP dirty
>> the page?
>>
>
> If a device driver pins some pages using gup, and the device then uses dma
> to write to those pages, then you could get there. That story is part of the
> reasoning that led to creating pin_user_pages(), which btw does not yet
> fully solve that case.
Many thanks for your explanation.
So a scenario similar to the one described in __remove_mapping() is possible
(left column: the gup user; right column: reclaim):
	get_user_pages(&page);
	[user mapping goes away]
	write_to(page);
				!PageDirty(page)    [good]
	SetPageDirty(page);
	put_page(page);
				!page_count(page)   [good, discard it]

	[oops, our write_to data is lost]
The page can be redirtied after it is unmapped, and there is no way to restore
the page table, as a clean MADV_FREE page is simply cleared from the page table
via the try_to_unmap path. Is it ok to just release the redirtied MADV_FREE
page here, as we hold the last reference and the page will be freed anyway?
>
> Basically, though, unless a non-CPU device has access to the page, it's
> hard to see how gup itself can lead to a page getting dirtied.
>
> thanks,
On 7/15/21 4:30 AM, Miaohe Lin wrote:
...
>>>> So, it seems lockless GUP can redirty the MADV_FREE page. But is it ok to just release
>>>> a redirtied MADV_FREE pages? Because we hold the last reference here and the page will
>>>> be freed anyway...
>>>
>>> I don't see how lockless GUP can redirty the page. It can grab the
>>> refcount, thus making the refcount here two. Then the call to freeze
>>> here fails and the page stays on the list. But the lockless GUP checks
>>> the page is still in the page table (and discovers it isn't, so releases
>>> the reference count). Am I missing a path that lets lockless GUP dirty
>>> the page?
>>>
>>
>> If a device driver pins some pages using gup, and the device then uses dma
>> to write to those pages, then you could get there. That story is part of the
>> reasoning that led to creating pin_user_pages(), which btw does not yet
>> fully solve that case.
>
> Many thanks for your explanation.
> So the similar scenario that is clarified in the __remove_mapping() is possible:
I probably should have added that the scenario I was describing is broken even
before any patches that you might apply here. I was just trying to ensure that
the complete list of scenarios was known.
thanks,
--
John Hubbard
NVIDIA
On 2021/7/16 8:01, John Hubbard wrote:
> On 7/15/21 4:30 AM, Miaohe Lin wrote:
> ...
>>>>> So, it seems lockless GUP can redirty the MADV_FREE page. But is it ok to just release
>>>>> a redirtied MADV_FREE pages? Because we hold the last reference here and the page will
>>>>> be freed anyway...
>>>>
>>>> I don't see how lockless GUP can redirty the page. It can grab the
>>>> refcount, thus making the refcount here two. Then the call to freeze
>>>> here fails and the page stays on the list. But the lockless GUP checks
>>>> the page is still in the page table (and discovers it isn't, so releases
>>>> the reference count). Am I missing a path that lets lockless GUP dirty
>>>> the page?
>>>>
>>>
>>> If a device driver pins some pages using gup, and the device then uses dma
>>> to write to those pages, then you could get there. That story is part of the
>>> reasoning that led to creating pin_user_pages(), which btw does not yet
>>> fully solve that case.
>>
>> Many thanks for your explanation.
>> So the similar scenario that is clarified in the __remove_mapping() is possible:
>
> I probably should have added that the scenario I was describing is broken even
> before any patches that you might apply here. I was just trying to ensure that
> the complete list of scenarios was known.
>
Many thanks for doing this! :)
>
>
> thanks,