2023-12-13 07:28:59

by Jianfeng Wang

Subject: [PATCH] mm: remove redundant lru_add_drain() prior to unmapping pages

When unmapping VMA pages, pages will be gathered in batch and released by
tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
which calls lru_add_drain() to drain cached pages in folio_batch before
releasing gathered pages. Thus, it is redundant to call lru_add_drain()
before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.

Remove lru_add_drain() prior to gathering and unmapping pages in
exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.

Note that the page unmapping process in oom_killer (e.g., in
__oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
redundant lru_add_drain(). So, this commit makes the code more consistent.

Signed-off-by: Jianfeng Wang <[email protected]>
---
mm/mmap.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 1971bfffcc03..0451285dee4f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2330,7 +2330,9 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
struct mmu_gather tlb;
unsigned long mt_start = mas->index;

+#ifdef CONFIG_MMU_GATHER_NO_GATHER
lru_add_drain();
+#endif
tlb_gather_mmu(&tlb, mm);
update_hiwater_rss(mm);
unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
@@ -3300,7 +3302,9 @@ void exit_mmap(struct mm_struct *mm)
return;
}

+#ifdef CONFIG_MMU_GATHER_NO_GATHER
lru_add_drain();
+#endif
flush_cache_mm(mm);
tlb_gather_mmu_fullmm(&tlb, mm);
/* update_hiwater_rss(mm) here? but nobody should be looking */
--
2.42.1


2023-12-13 22:58:06

by Tim Chen

Subject: Re: [PATCH] mm: remove redundant lru_add_drain() prior to unmapping pages

On Tue, 2023-12-12 at 23:28 -0800, Jianfeng Wang wrote:
> When unmapping VMA pages, pages will be gathered in batch and released by
> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
> which calls lru_add_drain() to drain cached pages in folio_batch before
> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>
> Remove lru_add_drain() prior to gathering and unmapping pages in
> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>
> Note that the page unmapping process in oom_killer (e.g., in
> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
> redundant lru_add_drain(). So, this commit makes the code more consistent.
>
> Signed-off-by: Jianfeng Wang <[email protected]>
> ---
> mm/mmap.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 1971bfffcc03..0451285dee4f 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2330,7 +2330,9 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
> struct mmu_gather tlb;
> unsigned long mt_start = mas->index;
>
> +#ifdef CONFIG_MMU_GATHER_NO_GATHER

In your comment you say skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
is *not* set. So shouldn't this be

#ifndef CONFIG_MMU_GATHER_NO_GATHER ?

> lru_add_drain();
> +#endif
> tlb_gather_mmu(&tlb, mm);
> update_hiwater_rss(mm);
> unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
> @@ -3300,7 +3302,9 @@ void exit_mmap(struct mm_struct *mm)
> return;
> }
>
> +#ifdef CONFIG_MMU_GATHER_NO_GATHER

same question as above.

> lru_add_drain();
> +#endif
> flush_cache_mm(mm);
> tlb_gather_mmu_fullmm(&tlb, mm);
> /* update_hiwater_rss(mm) here? but nobody should be looking */

2023-12-14 01:03:46

by Jianfeng Wang

Subject: Re: [PATCH] mm: remove redundant lru_add_drain() prior to unmapping pages

On 12/13/23 2:57 PM, Tim Chen wrote:
> On Tue, 2023-12-12 at 23:28 -0800, Jianfeng Wang wrote:
>> When unmapping VMA pages, pages will be gathered in batch and released by
>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>> which calls lru_add_drain() to drain cached pages in folio_batch before
>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Remove lru_add_drain() prior to gathering and unmapping pages in
>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Note that the page unmapping process in oom_killer (e.g., in
>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>>
>> Signed-off-by: Jianfeng Wang <[email protected]>
>> ---
>> mm/mmap.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 1971bfffcc03..0451285dee4f 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -2330,7 +2330,9 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>> struct mmu_gather tlb;
>> unsigned long mt_start = mas->index;
>>
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>
> In your comment you say skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
> is *not* set. So shouldn't this be
>
> #ifndef CONFIG_MMU_GATHER_NO_GATHER ?
>
Hi Tim,

The mmu_gather feature is used to gather pages produced by unmap_vmas() and
release them in batch in tlb_finish_mmu(). The feature is *on* if
CONFIG_MMU_GATHER_NO_GATHER is *not* set. Note that: tlb_finish_mmu() will call
free_pages_and_swap_cache()/lru_add_drain() only when the feature is on.

Yes, this commit aims to skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
is *not* set (i.e. when the mmu_gather feature is on) because it is redundant.

If CONFIG_MMU_GATHER_NO_GATHER is set, pages will be released in unmap_vmas()
and tlb_finish_mmu() will not call lru_add_drain(). So, it is still necessary
to keep the lru_add_drain() call to clear cached pages before unmap_vmas(), as
folio_batches hold a reference on the pages in them.

The same applies to the other case.

Thanks,
- Jianfeng

>> lru_add_drain();
>> +#endif
>> tlb_gather_mmu(&tlb, mm);
>> update_hiwater_rss(mm);
>> unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
>> @@ -3300,7 +3302,9 @@ void exit_mmap(struct mm_struct *mm)
>> return;
>> }
>>
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>
> same question as above.
>
>> lru_add_drain();
>> +#endif
>> flush_cache_mm(mm);
>> tlb_gather_mmu_fullmm(&tlb, mm);
>> /* update_hiwater_rss(mm) here? but nobody should be looking */
>

2023-12-14 17:58:42

by Tim Chen

Subject: Re: [PATCH] mm: remove redundant lru_add_drain() prior to unmapping pages

On Wed, 2023-12-13 at 17:03 -0800, Jianfeng Wang wrote:
> On 12/13/23 2:57 PM, Tim Chen wrote:
> > On Tue, 2023-12-12 at 23:28 -0800, Jianfeng Wang wrote:
> > > When unmapping VMA pages, pages will be gathered in batch and released by
> > > tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
> > > tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
> > > which calls lru_add_drain() to drain cached pages in folio_batch before
> > > releasing gathered pages. Thus, it is redundant to call lru_add_drain()
> > > before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
> > >
> > > Remove lru_add_drain() prior to gathering and unmapping pages in
> > > exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
> > >
> > > Note that the page unmapping process in oom_killer (e.g., in
> > > __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
> > > redundant lru_add_drain(). So, this commit makes the code more consistent.
> > >
> > > Signed-off-by: Jianfeng Wang <[email protected]>
> > > ---
> > > mm/mmap.c | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 1971bfffcc03..0451285dee4f 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -2330,7 +2330,9 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
> > > struct mmu_gather tlb;
> > > unsigned long mt_start = mas->index;
> > >
> > > +#ifdef CONFIG_MMU_GATHER_NO_GATHER
> >
> > In your comment you say skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
> > is *not* set. So shouldn't this be
> >
> > #ifndef CONFIG_MMU_GATHER_NO_GATHER ?
> >
> Hi Tim,
>
> The mmu_gather feature is used to gather pages produced by unmap_vmas() and
> release them in batch in tlb_finish_mmu(). The feature is *on* if
> CONFIG_MMU_GATHER_NO_GATHER is *not* set. Note that: tlb_finish_mmu() will call
> free_pages_and_swap_cache()/lru_add_drain() only when the feature is on.

Thanks for the explanation.

Looking at the code, lru_add_drain() is executed under #ifndef CONFIG_MMU_GATHER_NO_GATHER
in tlb_finish_mmu(), so the logic of your patch is fine.

The #ifndef CONFIG_MMU_GATHER_NO_GATHER means the mmu_gather feature is on. The
double negative threw me off on my first read of your commit log.

I suggest that you add a comment in the code to make it easier for
future code maintenance:

/* defer lru_add_drain() to tlb_finish_mmu() for ifndef CONFIG_MMU_GATHER_NO_GATHER */

Is your change to skip the extra lru_add_drain() motivated by a performance issue
in some workload? I wonder whether it is worth adding an extra ifdef in the code.

Tim

>
> Yes, this commit aims to skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
> is *not* set (i.e. when the mmu_gather feature is on) because it is redundant.
>
> If CONFIG_MMU_GATHER_NO_GATHER is set, pages will be released in unmap_vmas().
> tlb_finish_mmu() will not call lru_add_drain(). So, it is still necessary to
> keep the lru_add_drain() call to clear cached pages before unmap_vmas(), as
> folio_batches hold a reference count for pages in them.
>
> The same applies to the other case.
>
> Thanks,
> - Jianfeng
>
> > > lru_add_drain();
> > > +#endif
> > > tlb_gather_mmu(&tlb, mm);
> > > update_hiwater_rss(mm);
> > > unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
> > > @@ -3300,7 +3302,9 @@ void exit_mmap(struct mm_struct *mm)
> > > return;
> > > }
> > >
> > > +#ifdef CONFIG_MMU_GATHER_NO_GATHER
> >
> > same question as above.
> >
> > > lru_add_drain();
> > > +#endif
> > > flush_cache_mm(mm);
> > > tlb_gather_mmu_fullmm(&tlb, mm);
> > > /* update_hiwater_rss(mm) here? but nobody should be looking */
> >

2023-12-14 20:53:54

by Jianfeng Wang

Subject: Re: [PATCH] mm: remove redundant lru_add_drain() prior to unmapping pages

On 12/14/23 9:57 AM, Tim Chen wrote:
> On Wed, 2023-12-13 at 17:03 -0800, Jianfeng Wang wrote:
>> On 12/13/23 2:57 PM, Tim Chen wrote:
>>> On Tue, 2023-12-12 at 23:28 -0800, Jianfeng Wang wrote:
>>>> When unmapping VMA pages, pages will be gathered in batch and released by
>>>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>>>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>>>> which calls lru_add_drain() to drain cached pages in folio_batch before
>>>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>>>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>>>
>>>> Remove lru_add_drain() prior to gathering and unmapping pages in
>>>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>>>
>>>> Note that the page unmapping process in oom_killer (e.g., in
>>>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>>>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>>>>
>>>> Signed-off-by: Jianfeng Wang <[email protected]>
>>>> ---
>>>> mm/mmap.c | 4 ++++
>>>> 1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>> index 1971bfffcc03..0451285dee4f 100644
>>>> --- a/mm/mmap.c
>>>> +++ b/mm/mmap.c
>>>> @@ -2330,7 +2330,9 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>>>> struct mmu_gather tlb;
>>>> unsigned long mt_start = mas->index;
>>>>
>>>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>>
>>> In your comment you say skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
>>> is *not* set. So shouldn't this be
>>>
>>> #ifndef CONFIG_MMU_GATHER_NO_GATHER ?
>>>
>> Hi Tim,
>>
>> The mmu_gather feature is used to gather pages produced by unmap_vmas() and
>> release them in batch in tlb_finish_mmu(). The feature is *on* if
>> CONFIG_MMU_GATHER_NO_GATHER is *not* set. Note that: tlb_finish_mmu() will call
>> free_pages_and_swap_cache()/lru_add_drain() only when the feature is on.
>
> Thanks for the explanation.
>
> Looking at the code, lru_add_drain() is executed under #ifndef CONFIG_MMU_GATHER_NO_GATHER
> in tlb_finish_mmu(), so the logic of your patch is fine.
>
> The #ifndef CONFIG_MMU_GATHER_NO_GATHER means the mmu_gather feature is on. The
> double negative threw me off on my first read of your commit log.
>
> I suggest that you add a comment in the code to make it easier for
> future code maintenance:
>
> /* defer lru_add_drain() to tlb_finish_mmu() for ifndef CONFIG_MMU_GATHER_NO_GATHER */
>
> Is your change to skip the extra lru_add_drain() motivated by a performance issue
> in some workload? I wonder whether it is worth adding an extra ifdef in the code.
>
> Tim
>

Okay, great suggestion.

We observed heavy contention on the LRU lock, introduced by lru_add_drain() and
release_pages(), in a production workload, and we are trying to reduce that
contention.

lru_add_drain() is a complex function that first takes a local CPU lock and
then iterates through *all* folio_batches to see whether there are pages to be
moved onto or between LRU lists. Any page found in these folio_batches triggers
acquiring the per-LRU spinlock, increasing lock contention. Applying this
change avoids calling lru_add_drain() unnecessarily, removing one source of
that contention. Together with the comment line you suggested, I believe it
also improves readability by clarifying the mmu_gather feature.

- Jianfeng

>>
>> Yes, this commit aims to skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
>> is *not* set (i.e. when the mmu_gather feature is on) because it is redundant.
>>
>> If CONFIG_MMU_GATHER_NO_GATHER is set, pages will be released in unmap_vmas().
>> tlb_finish_mmu() will not call lru_add_drain(). So, it is still necessary to
>> keep the lru_add_drain() call to clear cached pages before unmap_vmas(), as
>> folio_batchs hold a reference count for pages in them.
>>
>> The same applies to the other case.
>>
>> Thanks,
>> - Jianfeng
>>
>>>> lru_add_drain();
>>>> +#endif
>>>> tlb_gather_mmu(&tlb, mm);
>>>> update_hiwater_rss(mm);
>>>> unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
>>>> @@ -3300,7 +3302,9 @@ void exit_mmap(struct mm_struct *mm)
>>>> return;
>>>> }
>>>>
>>>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>>
>>> same question as above.
>>>
>>>> lru_add_drain();
>>>> +#endif
>>>> flush_cache_mm(mm);
>>>> tlb_gather_mmu_fullmm(&tlb, mm);
>>>> /* update_hiwater_rss(mm) here? but nobody should be looking */
>>>
>