Hi everyone,
This series contains a few patches to remove an obsolete comment, introduce
a helper to remove duplicated code, and so on. We also take all base pages
of a THP into account in a rare race condition. More details can be found
in the respective changelogs. Thanks!
---
v2:
patch 1/9: drop code change and add a comment about MADV_FREE
patch 2/9: simplify the code further and change to goto keep_locked
patch 3/9: use folio, remove unneeded inline and break up overly long lines
patch 5/9: activate swap-backed executable folios after first usage too
patch 9/9: new cleanup patch split out from 5/9
Many thanks to Huang Ying, Matthew Wilcox, Christoph Hellwig and Muchun Song
for the review!
---
Miaohe Lin (9):
mm/vmscan: add a comment about MADV_FREE pages check in
folio_check_dirty_writeback
mm/vmscan: remove unneeded can_split_huge_page check
mm/vmscan: introduce helper function reclaim_page_list()
mm/vmscan: save a bit of stack space in shrink_lruvec
mm/vmscan: activate swap-backed executable folios after first usage
mm/vmscan: take all base pages of THP into account when race with
speculative reference
mm/vmscan: take min_slab_pages into account when try to call
shrink_node
mm/vmscan: remove obsolete comment in kswapd_run
mm/vmscan: use helper folio_is_file_lru()
mm/vmscan.c | 92 ++++++++++++++++++++++++++---------------------------
1 file changed, 45 insertions(+), 47 deletions(-)
--
2.23.0
If the page has buffers, shrink_page_list will try to free the buffer
mappings associated with the page and try to free the page as well.
In the rare race with a speculative reference, the page will be freed
shortly by that speculative reference, but nr_reclaimed is not incremented
correctly when we come across a THP. We need to account for all of its
base pages in this case.
Signed-off-by: Miaohe Lin <[email protected]>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index cc1193e320c2..53f1d0755b34 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1878,7 +1878,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
* increment nr_reclaimed here (and
* leave it off the LRU).
*/
- nr_reclaimed++;
+ nr_reclaimed += nr_pages;
continue;
}
}
--
2.23.0
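For illustration, the accounting fix above can be modeled with a small
userspace sketch. The struct and function names here are simplified
stand-ins, not the kernel API; the real code uses compound-page/folio
helpers to find the number of base pages.

```c
#include <assert.h>

/* Simplified model of a page: a THP head page covers many base pages. */
struct page {
	unsigned int nr_base_pages;
};

/*
 * Before the fix, a page freed by the racing speculative reference bumped
 * the reclaim counter by 1; after the fix, all base pages are counted.
 */
static unsigned int account_reclaimed(const struct page *page,
				      unsigned int nr_reclaimed)
{
	return nr_reclaimed + page->nr_base_pages;	/* was: + 1 */
}
```

With a 2MB THP of 512 base pages, the old code under-counted by 511 pages
per race, which skewed reclaim progress accounting.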
Since commit 6b700b5b3c59 ("mm/vmscan.c: remove cpu online notification
for now"), the cpu online notification has been removed, so kswapd won't
be moved to the proper cpus when cpus are hot-added. Remove this obsolete
comment.
Signed-off-by: Miaohe Lin <[email protected]>
---
mm/vmscan.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ba83d8f3e53e..79310b3cbbe3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4567,7 +4567,6 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
/*
* This kswapd start function will be called by init and node-hot-add.
- * On node-hot-add, kswapd will moved to proper cpus if cpus are hot-added.
*/
void kswapd_run(int nid)
{
--
2.23.0
Since commit 6b4f7799c6a5 ("mm: vmscan: invoke slab shrinkers from
shrink_zone()"), slab reclaim and lru page reclaim are done together
in shrink_node(). So we should also take min_slab_pages into account
when deciding whether to call shrink_node().
Signed-off-by: Miaohe Lin <[email protected]>
---
mm/vmscan.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 53f1d0755b34..ba83d8f3e53e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4714,7 +4714,8 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
noreclaim_flag = memalloc_noreclaim_save();
set_task_reclaim_state(p, &sc.reclaim_state);
- if (node_pagecache_reclaimable(pgdat) > pgdat->min_unmapped_pages) {
+ if (node_pagecache_reclaimable(pgdat) > pgdat->min_unmapped_pages ||
+ node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B) > pgdat->min_slab_pages) {
/*
* Free memory by calling shrink node with increasing
* priorities until we have enough memory freed.
--
2.23.0
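The reworked gate in __node_reclaim() can be sketched as a plain predicate
(parameter names are illustrative stand-ins for the pgdat fields): node
reclaim now proceeds if either the reclaimable pagecache or the reclaimable
slab exceeds its threshold, instead of pagecache alone.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the new condition: reclaim runs when either the pagecache
 * side or the slab side has something worth reclaiming.
 */
static bool should_run_shrink_node(unsigned long reclaimable_pagecache,
				   unsigned long min_unmapped_pages,
				   unsigned long reclaimable_slab,
				   unsigned long min_slab_pages)
{
	return reclaimable_pagecache > min_unmapped_pages ||
	       reclaimable_slab > min_slab_pages;
}
```

Previously the second clause was missing, so a node with lots of
reclaimable slab but little pagecache never entered shrink_node() here.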
Use the helper folio_is_file_lru() to check whether a folio is on the file
lru. Minor readability improvement.
Signed-off-by: Miaohe Lin <[email protected]>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 79310b3cbbe3..208ce417f1a4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1421,7 +1421,7 @@ static enum page_references folio_check_references(struct folio *folio,
}
/* Reclaim if clean, defer dirty folios to writeback */
- if (referenced_folio && !folio_test_swapbacked(folio))
+ if (referenced_folio && folio_is_file_lru(folio))
return PAGEREF_RECLAIM_CLEAN;
return PAGEREF_RECLAIM;
--
2.23.0
The MADV_FREE pages check in folio_check_dirty_writeback is a bit hard to
follow. Add a comment to make the code clear.
Suggested-by: Huang, Ying <[email protected]>
Signed-off-by: Miaohe Lin <[email protected]>
---
mm/vmscan.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c77d5052f230..4a76be47bed1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1436,6 +1436,9 @@ static void folio_check_dirty_writeback(struct folio *folio,
/*
* Anonymous pages are not handled by flushers and must be written
* from reclaim context. Do not stall reclaim based on them
+ * MADV_FREE anonymous pages are put into inactive file list too.
+ * They could be mistakenly treated as file lru. So further anon
+ * test is needed.
*/
if (!folio_is_file_lru(folio) ||
(folio_test_anon(folio) && !folio_test_swapbacked(folio))) {
--
2.23.0
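The predicate the new comment documents can be sketched in a minimal
userspace model (flags reduced to booleans; not the kernel API): a folio
sitting on the file LRU may still be an anonymous MADV_FREE folio, which
is exactly the anon && !swapbacked case, so the file-LRU test alone is not
enough.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when the folio must be treated as anonymous for
 * writeback purposes (no flusher threads will write it), mirroring
 * the check in folio_check_dirty_writeback().
 */
static bool treat_as_anon_for_writeback(bool is_file_lru, bool is_anon,
					bool is_swapbacked)
{
	/* anon && !swapbacked catches MADV_FREE folios on the file LRU */
	return !is_file_lru || (is_anon && !is_swapbacked);
}
```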
We don't need to check can_split_folio() because folio_maybe_dma_pinned()
is checked before it. That avoids swapping out long-term pinned pages, and
we can live with short-term pinned pages. Without the can_split_folio()
check we can simplify the code. Also activate_locked can be changed to
keep_locked as it's just short-term pinning.
Suggested-by: Huang, Ying <[email protected]>
Signed-off-by: Miaohe Lin <[email protected]>
---
mm/vmscan.c | 22 ++++++++--------------
1 file changed, 8 insertions(+), 14 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4a76be47bed1..01f5db75a507 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1711,20 +1711,14 @@ static unsigned int shrink_page_list(struct list_head *page_list,
goto keep_locked;
if (folio_maybe_dma_pinned(folio))
goto keep_locked;
- if (PageTransHuge(page)) {
- /* cannot split THP, skip it */
- if (!can_split_folio(folio, NULL))
- goto activate_locked;
- /*
- * Split pages without a PMD map right
- * away. Chances are some or all of the
- * tail pages can be freed without IO.
- */
- if (!folio_entire_mapcount(folio) &&
- split_folio_to_list(folio,
- page_list))
- goto activate_locked;
- }
+ /*
+ * Split pages without a PMD map right
+ * away. Chances are some or all of the
+ * tail pages can be freed without IO.
+ */
+ if (PageTransHuge(page) && !folio_entire_mapcount(folio) &&
+ split_folio_to_list(folio, page_list))
+ goto keep_locked;
if (!add_to_swap(page)) {
if (!PageTransHuge(page))
goto activate_locked_split;
--
2.23.0
Introduce the helper function reclaim_page_list() to eliminate the
duplicated code doing shrink_page_list() and putback_lru_page(). We can
also separate node reclaim from node page list operations this way. No
functional change intended.
Signed-off-by: Miaohe Lin <[email protected]>
---
mm/vmscan.c | 50 +++++++++++++++++++++++++-------------------------
1 file changed, 25 insertions(+), 25 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 01f5db75a507..59b96320f481 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2531,14 +2531,12 @@ static void shrink_active_list(unsigned long nr_to_scan,
nr_deactivate, nr_rotated, sc->priority, file);
}
-unsigned long reclaim_pages(struct list_head *page_list)
+static unsigned int reclaim_page_list(struct list_head *page_list,
+ struct pglist_data *pgdat)
{
- int nid = NUMA_NO_NODE;
- unsigned int nr_reclaimed = 0;
- LIST_HEAD(node_page_list);
struct reclaim_stat dummy_stat;
- struct page *page;
- unsigned int noreclaim_flag;
+ unsigned int nr_reclaimed;
+ struct folio *folio;
struct scan_control sc = {
.gfp_mask = GFP_KERNEL,
.may_writepage = 1,
@@ -2547,6 +2545,24 @@ unsigned long reclaim_pages(struct list_head *page_list)
.no_demotion = 1,
};
+ nr_reclaimed = shrink_page_list(page_list, pgdat, &sc, &dummy_stat, false);
+ while (!list_empty(page_list)) {
+ folio = lru_to_folio(page_list);
+ list_del(&folio->lru);
+ putback_lru_page(&folio->page);
+ }
+
+ return nr_reclaimed;
+}
+
+unsigned long reclaim_pages(struct list_head *page_list)
+{
+ int nid = NUMA_NO_NODE;
+ unsigned int nr_reclaimed = 0;
+ LIST_HEAD(node_page_list);
+ struct page *page;
+ unsigned int noreclaim_flag;
+
noreclaim_flag = memalloc_noreclaim_save();
while (!list_empty(page_list)) {
@@ -2562,28 +2578,12 @@ unsigned long reclaim_pages(struct list_head *page_list)
continue;
}
- nr_reclaimed += shrink_page_list(&node_page_list,
- NODE_DATA(nid),
- &sc, &dummy_stat, false);
- while (!list_empty(&node_page_list)) {
- page = lru_to_page(&node_page_list);
- list_del(&page->lru);
- putback_lru_page(page);
- }
-
+ nr_reclaimed += reclaim_page_list(&node_page_list, NODE_DATA(nid));
nid = NUMA_NO_NODE;
}
- if (!list_empty(&node_page_list)) {
- nr_reclaimed += shrink_page_list(&node_page_list,
- NODE_DATA(nid),
- &sc, &dummy_stat, false);
- while (!list_empty(&node_page_list)) {
- page = lru_to_page(&node_page_list);
- list_del(&page->lru);
- putback_lru_page(page);
- }
- }
+ if (!list_empty(&node_page_list))
+ nr_reclaimed += reclaim_page_list(&node_page_list, NODE_DATA(nid));
memalloc_noreclaim_restore(noreclaim_flag);
--
2.23.0
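The shape of this refactor can be shown with a tiny userspace model (the
list and the shrink stub are illustrative; the real code operates on LRU
page lists and calls shrink_page_list()): the shrink-then-putback sequence
that appeared twice in reclaim_pages() now lives in one helper.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal singly linked page list standing in for the LRU list. */
struct page {
	struct page *next;
};

struct page_list {
	struct page *head;
	unsigned int len;
};

/* Stand-in for shrink_page_list(): pretend every page gets reclaimed. */
static unsigned int shrink_list(struct page_list *list)
{
	return list->len;
}

/*
 * The extracted helper: shrink the list, then put back every page still
 * on it, mirroring reclaim_page_list() in the patch. Both call sites in
 * reclaim_pages() now share this one loop.
 */
static unsigned int reclaim_list(struct page_list *list)
{
	unsigned int nr = shrink_list(list);

	while (list->head) {		/* putback loop, in one place */
		list->head = list->head->next;
		list->len--;
	}
	return nr;
}
```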
We should activate swap-backed executable folios (e.g. tmpfs) after first
usage so that executable code gets an even better chance to stay in memory.
Suggested-by: Huang, Ying <[email protected]>
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Joonsoo Kim <[email protected]>
---
mm/vmscan.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0e5818970998..cc1193e320c2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1412,9 +1412,9 @@ static enum page_references folio_check_references(struct folio *folio,
return PAGEREF_ACTIVATE;
/*
- * Activate file-backed executable folios after first usage.
+ * Activate executable folios after first usage.
*/
- if ((vm_flags & VM_EXEC) && !folio_test_swapbacked(folio))
+ if (vm_flags & VM_EXEC)
return PAGEREF_ACTIVATE;
return PAGEREF_KEEP;
--
2.23.0
On Mon, Apr 11, 2022 at 09:53:15AM +0800, Miaohe Lin wrote:
> On 2022/4/9 21:44, Matthew Wilcox wrote:
> > On Sat, Apr 09, 2022 at 05:34:54PM +0800, Miaohe Lin wrote:
> >> + nr_reclaimed = shrink_page_list(page_list, pgdat, &sc, &dummy_stat, false);
> >> + while (!list_empty(page_list)) {
> >> + folio = lru_to_folio(page_list);
> >> + list_del(&folio->lru);
> >> + putback_lru_page(&folio->page);
> >
> > folio_putback_lru()
>
> I thought folio_putback_lru was deliberately left unused because there is no caller of it right now.
> But it seems I was wrong. Will do it in the next version.
Looks like all of the uses of it that I mooted during the last merge
window ended up going away.
https://lore.kernel.org/all/[email protected]/
was obsoleted by commit b109b87050df
https://lore.kernel.org/all/[email protected]/
and
https://lore.kernel.org/all/[email protected]/
were also obsoleted by Hugh's mlock changes
I also sent
https://lore.kernel.org/all/[email protected]/
but never quite got it up to submittable quality.
On 2022/4/11 11:17, Matthew Wilcox wrote:
> On Mon, Apr 11, 2022 at 09:53:15AM +0800, Miaohe Lin wrote:
>> On 2022/4/9 21:44, Matthew Wilcox wrote:
>>> On Sat, Apr 09, 2022 at 05:34:54PM +0800, Miaohe Lin wrote:
>>>> + nr_reclaimed = shrink_page_list(page_list, pgdat, &sc, &dummy_stat, false);
>>>> + while (!list_empty(page_list)) {
>>>> + folio = lru_to_folio(page_list);
>>>> + list_del(&folio->lru);
>>>> + putback_lru_page(&folio->page);
>>>
>>> folio_putback_lru()
>>
>> I thought folio_putback_lru was deliberately left unused because there is no caller of it right now.
>> But it seems I was wrong. Will do it in the next version.
>
> Looks like all of the uses of it that I mooted during the last merge
> window ended up going away.
>
> https://lore.kernel.org/all/[email protected]/
> was obsoleted by commit b109b87050df
>
> https://lore.kernel.org/all/[email protected]/
> and
> https://lore.kernel.org/all/[email protected]/
> were also obsoleted by Hugh's mlock changes
>
> I also sent
> https://lore.kernel.org/all/[email protected]/
>
> but never quite got it up to submittable quality.
I see. This is really a pity. Many thanks for clarifying. :)
On Sat, Apr 09, 2022 at 05:34:52PM +0800, Miaohe Lin wrote:
> The MADV_FREE pages check in folio_check_dirty_writeback is a bit hard to
> follow. Add a comment to make the code clear.
>
> Suggested-by: Huang, Ying <[email protected]>
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> mm/vmscan.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c77d5052f230..4a76be47bed1 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1436,6 +1436,9 @@ static void folio_check_dirty_writeback(struct folio *folio,
> /*
> * Anonymous pages are not handled by flushers and must be written
> * from reclaim context. Do not stall reclaim based on them
While you are touching this, please add a period at the end of the above
sentence.
Otherwise looks good:
Reviewed-by: Christoph Hellwig <[email protected]>
On Sat, Apr 09, 2022 at 05:34:53PM +0800, Miaohe Lin wrote:
> We don't need to check can_split_folio() because folio_maybe_dma_pinned()
> is checked before. It will avoid the long term pinned pages to be swapped
> out. And we can live with short term pinned pages. Without can_split_folio
> checking we can simplify the code. Also activate_locked can be changed to
> keep_locked as it's just short term pinning.
>
> Suggested-by: Huang, Ying <[email protected]>
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> mm/vmscan.c | 22 ++++++++--------------
> 1 file changed, 8 insertions(+), 14 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4a76be47bed1..01f5db75a507 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1711,20 +1711,14 @@ static unsigned int shrink_page_list(struct list_head *page_list,
> goto keep_locked;
> if (folio_maybe_dma_pinned(folio))
> goto keep_locked;
> - if (PageTransHuge(page)) {
> - /* cannot split THP, skip it */
> - if (!can_split_folio(folio, NULL))
> - goto activate_locked;
> - /*
> - * Split pages without a PMD map right
> - * away. Chances are some or all of the
> - * tail pages can be freed without IO.
> - */
> - if (!folio_entire_mapcount(folio) &&
> - split_folio_to_list(folio,
> - page_list))
> - goto activate_locked;
> - }
> + /*
> + * Split pages without a PMD map right
> + * away. Chances are some or all of the
> + * tail pages can be freed without IO.
> + */
This could use more of the line length and be more readable:
/*
* Split pages without a PMD map right away.
* Chances are some or all of the tail pages
* can be freed without IO.
*/
> + if (PageTransHuge(page) && !folio_entire_mapcount(folio) &&
Please put the folio_entire_mapcount on a separate line to make this a
bit more readable.
On Sat, Apr 09, 2022 at 05:34:52PM +0800, Miaohe Lin wrote:
> The MADV_FREE pages check in folio_check_dirty_writeback is a bit hard to
> follow. Add a comment to make the code clear.
>
> Suggested-by: Huang, Ying <[email protected]>
> Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
--
Oscar Salvador
SUSE Labs
On 12.04.22 15:42, Miaohe Lin wrote:
> On 2022/4/12 16:59, Oscar Salvador wrote:
>> On Sat, Apr 09, 2022 at 05:34:53PM +0800, Miaohe Lin wrote:
>>> We don't need to check can_split_folio() because folio_maybe_dma_pinned()
>>> is checked before. It will avoid the long term pinned pages to be swapped
>>> out. And we can live with short term pinned pages. Without can_split_folio
>>> checking we can simplify the code. Also activate_locked can be changed to
>>> keep_locked as it's just short term pinning.
>>
>> What do you mean by "we can live with short term pinned pages"?
>> Does it mean that it was not pinned when we check
>> folio_maybe_dma_pinned() but now it is?
>>
>> To me it looks like the pinning is fluctuating and we rely on
>> split_folio_to_list() to see whether we succeed or not, and if not
>> we give it another spin in the next round?
>
> Yes. Short-term pinned pages are relative to long-term pinned pages, and these pages won't be
> pinned for a noticeable time. So the folio is expected to be split successfully in the next
> round as the pinning is really fluctuating. Or am I missing something?
>
Just so we're on the same page. folio_maybe_dma_pinned() only capture
FOLL_PIN, but not FOLL_GET. You can have long-term FOLL_GET right now
via vmsplice().
can_split_folio() is more precise than folio_maybe_dma_pinned(), but
both are racy as long as the page is still mapped.
--
Thanks,
David / dhildenb
On Sat, 2022-04-09 at 17:34 +0800, Miaohe Lin wrote:
> We should activate swap-backed executable folios (e.g. tmpfs) after first
> usage so that executable code gets yet better chance to stay in memory.
>
> Suggested-by: Huang, Ying <[email protected]>
> Signed-off-by: Miaohe Lin <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
Reviewed-by: Huang, Ying <[email protected]>
Best Regards,
Huang, Ying
> ---
> mm/vmscan.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 0e5818970998..cc1193e320c2 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1412,9 +1412,9 @@ static enum page_references folio_check_references(struct folio *folio,
> return PAGEREF_ACTIVATE;
>
>
>
>
> /*
> - * Activate file-backed executable folios after first usage.
> + * Activate executable folios after first usage.
> */
> - if ((vm_flags & VM_EXEC) && !folio_test_swapbacked(folio))
> + if (vm_flags & VM_EXEC)
> return PAGEREF_ACTIVATE;
>
>
>
>
> return PAGEREF_KEEP;
On Sat, Apr 09, 2022 at 05:34:53PM +0800, Miaohe Lin wrote:
> We don't need to check can_split_folio() because folio_maybe_dma_pinned()
> is checked before. It will avoid the long term pinned pages to be swapped
> out. And we can live with short term pinned pages. Without can_split_folio
> checking we can simplify the code. Also activate_locked can be changed to
> keep_locked as it's just short term pinning.
What do you mean by "we can live with short term pinned pages"?
Does it mean that it was not pinned when we check
folio_maybe_dma_pinned() but now it is?
To me it looks like the pinning is fluctuating and we rely on
split_folio_to_list() to see whether we succeed or not, and if not
we give it another spin in the next round?
> Suggested-by: Huang, Ying <[email protected]>
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> mm/vmscan.c | 22 ++++++++--------------
> 1 file changed, 8 insertions(+), 14 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4a76be47bed1..01f5db75a507 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1711,20 +1711,14 @@ static unsigned int shrink_page_list(struct list_head *page_list,
> goto keep_locked;
> if (folio_maybe_dma_pinned(folio))
> goto keep_locked;
> - if (PageTransHuge(page)) {
> - /* cannot split THP, skip it */
> - if (!can_split_folio(folio, NULL))
> - goto activate_locked;
> - /*
> - * Split pages without a PMD map right
> - * away. Chances are some or all of the
> - * tail pages can be freed without IO.
> - */
> - if (!folio_entire_mapcount(folio) &&
> - split_folio_to_list(folio,
> - page_list))
> - goto activate_locked;
> - }
> + /*
> + * Split pages without a PMD map right
> + * away. Chances are some or all of the
> + * tail pages can be freed without IO.
> + */
> + if (PageTransHuge(page) && !folio_entire_mapcount(folio) &&
> + split_folio_to_list(folio, page_list))
> + goto keep_locked;
> if (!add_to_swap(page)) {
> if (!PageTransHuge(page))
> goto activate_locked_split;
> --
> 2.23.0
>
>
--
Oscar Salvador
SUSE Labs
On Sat, 2022-04-09 at 17:34 +0800, Miaohe Lin wrote:
> Since commit 6b4f7799c6a5 ("mm: vmscan: invoke slab shrinkers from
> shrink_zone()"), slab reclaim and lru page reclaim are done together
> in the shrink_node. So we should take min_slab_pages into account
> when try to call shrink_node.
>
Looks reasonable to me, copying Johannes.
Hi, Christoph,
Should we check min_unmapped_pages and min_slab_pages in shrink_node()
to avoid reclaiming LRU pages when it isn't necessary, and vice versa?
This may be done via another 2 scan_control flags.
Best Regards,
Huang, Ying
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> mm/vmscan.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 53f1d0755b34..ba83d8f3e53e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4714,7 +4714,8 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> noreclaim_flag = memalloc_noreclaim_save();
> set_task_reclaim_state(p, &sc.reclaim_state);
>
>
>
>
> - if (node_pagecache_reclaimable(pgdat) > pgdat->min_unmapped_pages) {
> + if (node_pagecache_reclaimable(pgdat) > pgdat->min_unmapped_pages ||
> + node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B) > pgdat->min_slab_pages) {
> /*
> * Free memory by calling shrink node with increasing
> * priorities until we have enough memory freed.
LRU_UNEVICTABLE is not taken into account when shrinking a lruvec, so we
can save a bit of stack space by shrinking the size of the nr and targets
arrays to NR_LRU_LISTS - 1. No functional change intended.
Signed-off-by: Miaohe Lin <[email protected]>
---
mm/vmscan.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 59b96320f481..0e5818970998 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2881,8 +2881,9 @@ static bool can_age_anon_pages(struct pglist_data *pgdat,
static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
{
- unsigned long nr[NR_LRU_LISTS];
- unsigned long targets[NR_LRU_LISTS];
+ /* LRU_UNEVICTABLE is not taken into account. */
+ unsigned long nr[NR_LRU_LISTS - 1];
+ unsigned long targets[NR_LRU_LISTS - 1];
unsigned long nr_to_scan;
enum lru_list lru;
unsigned long nr_reclaimed = 0;
--
2.23.0
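The trick relies on LRU_UNEVICTABLE being the last enumerator before
NR_LRU_LISTS, so arrays that never index the unevictable list can drop one
element. A minimal sketch (the enum mirrors the kernel's lru_list layout;
the array name is illustrative):

```c
#include <assert.h>

/* Same ordering as the kernel's enum lru_list: unevictable comes last. */
enum lru_list {
	LRU_INACTIVE_ANON,
	LRU_ACTIVE_ANON,
	LRU_INACTIVE_FILE,
	LRU_ACTIVE_FILE,
	LRU_UNEVICTABLE,
	NR_LRU_LISTS,
};

/*
 * Scan targets only for evictable lists: one slot per evictable LRU,
 * i.e. NR_LRU_LISTS - 1 entries, as in the patched shrink_lruvec().
 */
static unsigned long nr_to_scan[NR_LRU_LISTS - 1];
```

Indexing with any lru value below LRU_UNEVICTABLE stays in bounds; the
shrink_lruvec() loops already iterate only over the evictable lists.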
On 2022/4/12 16:59, Oscar Salvador wrote:
> On Sat, Apr 09, 2022 at 05:34:53PM +0800, Miaohe Lin wrote:
>> We don't need to check can_split_folio() because folio_maybe_dma_pinned()
>> is checked before. It will avoid the long term pinned pages to be swapped
>> out. And we can live with short term pinned pages. Without can_split_folio
>> checking we can simplify the code. Also activate_locked can be changed to
>> keep_locked as it's just short term pinning.
>
> What do you mean by "we can live with short term pinned pages"?
> Does it mean that it was not pinned when we check
> folio_maybe_dma_pinned() but now it is?
>
> To me it looks like the pinning is fluctuating and we rely on
> split_folio_to_list() to see whether we succeed or not, and if not
> we give it another spin in the next round?
Yes. Short-term pinned pages are relative to long-term pinned pages, and these pages won't be
pinned for a noticeable time. So the folio is expected to be split successfully in the next
round as the pinning is really fluctuating. Or am I missing something?
Many thanks for your comment and reply!
>
>> Suggested-by: Huang, Ying <[email protected]>
>> Signed-off-by: Miaohe Lin <[email protected]>
>> ---
>> mm/vmscan.c | 22 ++++++++--------------
>> 1 file changed, 8 insertions(+), 14 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 4a76be47bed1..01f5db75a507 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1711,20 +1711,14 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>> goto keep_locked;
>> if (folio_maybe_dma_pinned(folio))
>> goto keep_locked;
>> - if (PageTransHuge(page)) {
>> - /* cannot split THP, skip it */
>> - if (!can_split_folio(folio, NULL))
>> - goto activate_locked;
>> - /*
>> - * Split pages without a PMD map right
>> - * away. Chances are some or all of the
>> - * tail pages can be freed without IO.
>> - */
>> - if (!folio_entire_mapcount(folio) &&
>> - split_folio_to_list(folio,
>> - page_list))
>> - goto activate_locked;
>> - }
>> + /*
>> + * Split pages without a PMD map right
>> + * away. Chances are some or all of the
>> + * tail pages can be freed without IO.
>> + */
>> + if (PageTransHuge(page) && !folio_entire_mapcount(folio) &&
>> + split_folio_to_list(folio, page_list))
>> + goto keep_locked;
>> if (!add_to_swap(page)) {
>> if (!PageTransHuge(page))
>> goto activate_locked_split;
>> --
>> 2.23.0
>>
>>
>
On Sat, 2022-04-09 at 17:34 +0800, Miaohe Lin wrote:
> The MADV_FREE pages check in folio_check_dirty_writeback is a bit hard to
> follow. Add a comment to make the code clear.
>
> Suggested-by: Huang, Ying <[email protected]>
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> mm/vmscan.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c77d5052f230..4a76be47bed1 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1436,6 +1436,9 @@ static void folio_check_dirty_writeback(struct folio
> *folio,
> /*
> * Anonymous pages are not handled by flushers and must be written
> * from reclaim context. Do not stall reclaim based on them
> + * MADV_FREE anonymous pages are put into inactive file list too.
> + * They could be mistakenly treated as file lru. So further anon
> + * test is needed.
> */
How about something as follows,
	/*
	 * Anonymous pages (including MADV_FREE ones) are not handled by
	 * flushers and must be written from reclaim context. Do not stall
	 * reclaim based on them.
	 */
Best Regards,
Huang, Ying
> if (!folio_is_file_lru(folio) ||
> (folio_test_anon(folio) && !folio_test_swapbacked(folio))) {
On Sat, 2022-04-09 at 17:34 +0800, Miaohe Lin wrote:
> We don't need to check can_split_folio() because folio_maybe_dma_pinned()
> is checked before. It will avoid the long term pinned pages to be swapped
> out. And we can live with short term pinned pages. Without can_split_folio
> checking we can simplify the code. Also activate_locked can be changed to
> keep_locked as it's just short term pinning.
>
> Suggested-by: Huang, Ying <[email protected]>
> Signed-off-by: Miaohe Lin <[email protected]>
Look good to me. Thanks!
Reviewed-by: Huang, Ying <[email protected]>
Best Regards,
Huang, Ying
> ---
> mm/vmscan.c | 22 ++++++++--------------
> 1 file changed, 8 insertions(+), 14 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4a76be47bed1..01f5db75a507 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1711,20 +1711,14 @@ static unsigned int shrink_page_list(struct list_head *page_list,
> goto keep_locked;
> if (folio_maybe_dma_pinned(folio))
> goto keep_locked;
> - if (PageTransHuge(page)) {
> - /* cannot split THP, skip it */
> - if (!can_split_folio(folio, NULL))
> - goto activate_locked;
> - /*
> - * Split pages without a PMD map right
> - * away. Chances are some or all of the
> - * tail pages can be freed without IO.
> - */
> - if (!folio_entire_mapcount(folio) &&
> - split_folio_to_list(folio,
> - page_list))
> - goto activate_locked;
> - }
> + /*
> + * Split pages without a PMD map right
> + * away. Chances are some or all of the
> + * tail pages can be freed without IO.
> + */
> + if (PageTransHuge(page) && !folio_entire_mapcount(folio) &&
> + split_folio_to_list(folio, page_list))
> + goto keep_locked;
> if (!add_to_swap(page)) {
> if (!PageTransHuge(page))
> goto activate_locked_split;
On 2022/4/11 19:50, [email protected] wrote:
> On Sat, 2022-04-09 at 17:34 +0800, Miaohe Lin wrote:
>> The MADV_FREE pages check in folio_check_dirty_writeback is a bit hard to
>> follow. Add a comment to make the code clear.
>>
>> Suggested-by: Huang, Ying <[email protected]>
>> Signed-off-by: Miaohe Lin <[email protected]>
>> ---
>> mm/vmscan.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index c77d5052f230..4a76be47bed1 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1436,6 +1436,9 @@ static void folio_check_dirty_writeback(struct folio
>> *folio,
>> /*
>> * Anonymous pages are not handled by flushers and must be written
>> * from reclaim context. Do not stall reclaim based on them
>> + * MADV_FREE anonymous pages are put into inactive file list too.
>> + * They could be mistakenly treated as file lru. So further anon
>> + * test is needed.
>> */
>
> How about something as follows,
>
> 	/*
> 	 * Anonymous pages (including MADV_FREE ones) are not handled by
> 	 * flushers and must be written from reclaim context. Do not stall
> 	 * reclaim based on them.
> 	 */
>
This comment looks good, but it seems it doesn't explain the MADV_FREE
check logic. Is this already enough? If so, I will change to use this.
Thanks!
> Best Regards,
> Huang, Ying
>
>> if (!folio_is_file_lru(folio) ||
>> (folio_test_anon(folio) && !folio_test_swapbacked(folio))) {
On 2022/4/13 9:26, [email protected] wrote:
> On Tue, 2022-04-12 at 16:59 +0200, David Hildenbrand wrote:
>> On 12.04.22 15:42, Miaohe Lin wrote:
>>> On 2022/4/12 16:59, Oscar Salvador wrote:
>>>> On Sat, Apr 09, 2022 at 05:34:53PM +0800, Miaohe Lin wrote:
>>>>> We don't need to check can_split_folio() because folio_maybe_dma_pinned()
>>>>> is checked before. It will avoid the long term pinned pages to be swapped
>>>>> out. And we can live with short term pinned pages. Without can_split_folio
>>>>> checking we can simplify the code. Also activate_locked can be changed to
>>>>> keep_locked as it's just short term pinning.
>>>>
>>>> What do you mean by "we can live with short term pinned pages"?
>>>> Does it mean that it was not pinned when we check
>>>> folio_maybe_dma_pinned() but now it is?
>>>>
>>>> To me it looks like the pinning is fluctuating and we rely on
>>>> split_folio_to_list() to see whether we succeed or not, and if not
>>>> we give it another spin in the next round?
>>>
>>> Yes. Short-term pinned pages are relative to long-term pinned pages, and these pages won't be
>>> pinned for a noticeable time. So the folio is expected to be split successfully in the next
>>> round as the pinning is really fluctuating. Or am I missing something?
>>>
>>
>> Just so we're on the same page. folio_maybe_dma_pinned() only capture
>> FOLL_PIN, but not FOLL_GET. You can have long-term FOLL_GET right now
>> via vmsplice().
>
> Per my original understanding, folio_maybe_dma_pinned() can be used to
> detect long-term pinned pages. And it seems reasonable to skip the
> long-term pinned pages and try short-term pinned pages during page
> reclaiming. But as you pointed out, vmsplice() doesn't use FOLL_PIN.
> So if vmsplice() is expected to pin pages for a long time, and we have no
> way to detect it, then we should keep can_split_folio() in the original
> code.
IIUC, even if we have no way to detect it, can_split_folio should still be
removed, because:
"""
can_split_huge_page is introduced via commit b8f593cd0896 ("mm, THP, swap:
check whether THP can be split firstly") to avoid deleting the THP from
the swap cache and freeing the swap cluster when the THP cannot be split.
But since commit bd4c82c22c36 ("mm, THP, swap: delay splitting THP after
swapped out"), splitting THP is delayed until THP is swapped out. There's
no need to delete the THP from the swap cache and free the swap cluster
anymore. Thus we can remove this unneeded can_split_huge_page check now to
simplify the code logic.
"""
A THP might not need to be split and could be freed directly after swapout,
so the can_split_huge_page check here is unneeded. Or am I missing something?
Thanks!
>
> Copying more people who have worked on long-term pinning for comments.
>
> Best Regards,
> Huang, Ying
>
>> can_split_folio() is more precise than folio_maybe_dma_pinned(), but
>> both are racy as long as the page is still mapped.
On Tue, 2022-04-12 at 16:59 +0200, David Hildenbrand wrote:
> On 12.04.22 15:42, Miaohe Lin wrote:
> > On 2022/4/12 16:59, Oscar Salvador wrote:
> > > On Sat, Apr 09, 2022 at 05:34:53PM +0800, Miaohe Lin wrote:
> > > > We don't need to check can_split_folio() because folio_maybe_dma_pinned()
> > > > is checked before. It will avoid the long term pinned pages to be swapped
> > > > out. And we can live with short term pinned pages. Without can_split_folio
> > > > checking we can simplify the code. Also activate_locked can be changed to
> > > > keep_locked as it's just short term pinning.
> > >
> > > What do you mean by "we can live with short term pinned pages"?
> > > Does it mean that it was not pinned when we check
> > > folio_maybe_dma_pinned() but now it is?
> > >
> > > To me it looks like the pinning is fluctuating and we rely on
> > > split_folio_to_list() to see whether we succeed or not, and if not
> > > we give it another spin in the next round?
> >
> > Yes. Short term pinned pages are relative to long term pinned pages and these pages won't be
> > pinned for a noticeable time. So it's expected to split the folio successfully in the next
> > round as the pinning is really fluctuating. Or am I missing something?
> >
>
> Just so we're on the same page. folio_maybe_dma_pinned() only captures
> FOLL_PIN, but not FOLL_GET. You can have long-term FOLL_GET right now
> via vmsplice().
Per my original understanding, folio_maybe_dma_pinned() can be used to
detect long-term pinned pages. And it seems reasonable to skip the
long-term pinned pages and try short-term pinned pages during page
reclaiming. But as you pointed out, vmsplice() doesn't use FOLL_PIN.
So if vmsplice() is expected to pin pages for a long time, and we have no
way to detect it, then we should keep can_split_folio() in the original
code.
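The reason folio_maybe_dma_pinned() cannot see a vmsplice()-style reference can be sketched in userspace as follows (a toy model, not kernel code): FOLL_PIN bumps the refcount by GUP_PIN_COUNTING_BIAS per pin, while FOLL_GET adds only 1, which is indistinguishable from any other transient reference:

```c
/* Userspace model of the order-0 page_maybe_dma_pinned() heuristic:
 * a FOLL_GET reference (as taken by vmsplice()) is invisible to it,
 * a FOLL_PIN reference (pin_user_pages()) is not. */
#include <assert.h>
#include <stdbool.h>

#define GUP_PIN_COUNTING_BIAS 1024	/* as in include/linux/mm.h */

static int refcount;

static void get_page_foll_get(void) { refcount += 1; }
static void pin_page_foll_pin(void) { refcount += GUP_PIN_COUNTING_BIAS; }

/* Simplified heuristic: "probably pinned" once the refcount carries
 * at least one bias-sized increment. */
static bool maybe_dma_pinned(void)
{
	return refcount >= GUP_PIN_COUNTING_BIAS;
}
```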
Copying more people who have worked on long-term pinning for comments.
Best Regards,
Huang, Ying
> can_split_folio() is more precise than folio_maybe_dma_pinned(), but
> both are racy as long as the page is still mapped.
On Wed, 2022-04-13 at 09:26 +0800, [email protected] wrote:
> On Tue, 2022-04-12 at 16:59 +0200, David Hildenbrand wrote:
> > On 12.04.22 15:42, Miaohe Lin wrote:
> > > On 2022/4/12 16:59, Oscar Salvador wrote:
> > > > On Sat, Apr 09, 2022 at 05:34:53PM +0800, Miaohe Lin wrote:
> > > > > We don't need to check can_split_folio() because folio_maybe_dma_pinned()
> > > > > is checked before. It will avoid the long term pinned pages to be swapped
> > > > > out. And we can live with short term pinned pages. Without can_split_folio
> > > > > checking we can simplify the code. Also activate_locked can be changed to
> > > > > keep_locked as it's just short term pinning.
> > > >
> > > > What do you mean by "we can live with short term pinned pages"?
> > > > Does it mean that it was not pinned when we check
> > > > folio_maybe_dma_pinned() but now it is?
> > > >
> > > > To me it looks like the pinning is fluctuating and we rely on
> > > > split_folio_to_list() to see whether we succeed or not, and if not
> > > > we give it another spin in the next round?
> > >
> > > Yes. Short term pinned pages are relative to long term pinned pages and these pages won't be
> > > pinned for a noticeable time. So it's expected to split the folio successfully in the next
> > > round as the pinning is really fluctuating. Or am I missing something?
> > >
> >
> > Just so we're on the same page. folio_maybe_dma_pinned() only captures
> > FOLL_PIN, but not FOLL_GET. You can have long-term FOLL_GET right now
> > via vmsplice().
>
> Per my original understanding, folio_maybe_dma_pinned() can be used to
> detect long-term pinned pages. And it seems reasonable to skip the
> long-term pinned pages and try short-term pinned pages during page
> reclaiming. But as you pointed out, vmsplice() doesn't use FOLL_PIN.
> So if vmsplice() is expected to pin pages for a long time, and we have no
> way to detect it, then we should keep can_split_folio() in the original
> code.
>
> Copying more people who have worked on long-term pinning for comments.
I checked the discussion in the following thread:
https://lore.kernel.org/lkml/CA+CK2bBffHBxjmb9jmSKacm0fJMinyt3Nhk8Nx6iudcQSj80_w@mail.gmail.com/
It seems that, from a practical point of view, folio_maybe_dma_pinned()
can identify most long-term pinned pages that may block memory hot-remove
or CMA allocation. Although, as David pointed out, some pages may still
be GUPed for a long time (e.g. via vmsplice()) even if
!folio_maybe_dma_pinned().
But from another point of view, can_split_huge_page() is cheap while THP
swapout is expensive (swap space, disk IO, and hard to recover from), so
it may be better to keep can_split_huge_page() in shrink_page_list().
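The "cheap" part can be illustrated with a userspace sketch (a toy model, not kernel code; `thp_model` and `can_split` are illustrative names): the check is only an arithmetic comparison of the refcount against the mapcount plus the references splitting can account for, so any extra speculative/GUP reference makes it fail:

```c
/* Rough userspace model of the can_split_huge_page()-style check,
 * simplified from mm/huge_memory.c: splitting is allowed only when
 * every reference is either a mapping, an accounted extra pin (e.g.
 * swap cache entries), or the caller's own reference. */
#include <assert.h>
#include <stdbool.h>

struct thp_model {
	int ref_count;	/* folio_ref_count() */
	int map_count;	/* total mappings of all subpages */
	int extra_pins;	/* e.g. swap cache refs: nr_pages if in swapcache */
};

static bool can_split(const struct thp_model *t)
{
	/* one reference belongs to the caller holding the folio locked */
	return t->map_count == t->ref_count - t->extra_pins - 1;
}
```

A single comparison like this costs nothing next to allocating swap space and issuing the IO for a 512-page THP, which is the cost asymmetry the paragraph above is weighing.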
Best Regards,
Huang, Ying
>
> > can_split_folio() is more precise than folio_maybe_dma_pinned(), but
> > both are racy as long as the page is still mapped.