2022-06-08 15:09:58

by Miaohe Lin

Subject: [PATCH v2 0/3] A few cleanup and fixup patches for swap

Hi everyone,
This series contains a cleanup patch to remove unneeded swap_cache_info
statistics, and two bugfix patches to avoid possible data races of
inuse_pages and so on. More details can be found in the respective
changelogs. Thanks!

---
v2:
collect Reviewed-by tag per David
drop patch "mm/swapfile: avoid confusing swap cache statistics"
add a new patch to remove swap_cache_info statistics per David
Many thanks to David for the review and comments.
---
Miaohe Lin (3):
mm/swapfile: make security_vm_enough_memory_mm() work as expected
mm/swapfile: fix possible data races of inuse_pages
mm/swap: remove swap_cache_info statistics

mm/swap_state.c | 17 -----------------
mm/swapfile.c | 14 +++++++++-----
2 files changed, 9 insertions(+), 22 deletions(-)

--
2.23.0


2022-06-08 15:10:43

by Miaohe Lin

Subject: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected

security_vm_enough_memory_mm() checks whether a process has enough memory
to allocate a new virtual mapping. And total_swap_pages is considered as
available memory while swapoff tries to make sure there's enough memory
that can hold the swapped out memory. But total_swap_pages contains the
swap space that is being swapoff. So security_vm_enough_memory_mm() will
success even if there's no memory to hold the swapped out memory because
total_swap_pages always greater than or equal to p->pages.

In order to fix it, p->pages should be retracted from total_swap_pages
first and then check whether there's enough memory for inuse swap pages.

Signed-off-by: Miaohe Lin <[email protected]>
---
mm/swapfile.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index ec4c1b276691..d2bead7b8b70 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2398,6 +2398,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	struct filename *pathname;
 	int err, found = 0;
 	unsigned int old_block_size;
+	unsigned int inuse_pages;

 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -2428,9 +2429,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 		spin_unlock(&swap_lock);
 		goto out_dput;
 	}
-	if (!security_vm_enough_memory_mm(current->mm, p->pages))
-		vm_unacct_memory(p->pages);
+
+	total_swap_pages -= p->pages;
+	inuse_pages = READ_ONCE(p->inuse_pages);
+	if (!security_vm_enough_memory_mm(current->mm, inuse_pages))
+		vm_unacct_memory(inuse_pages);
 	else {
+		total_swap_pages += p->pages;
 		err = -ENOMEM;
 		spin_unlock(&swap_lock);
 		goto out_dput;
@@ -2453,7 +2458,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	}
 	plist_del(&p->list, &swap_active_head);
 	atomic_long_sub(p->pages, &nr_swap_pages);
-	total_swap_pages -= p->pages;
 	p->flags &= ~SWP_WRITEOK;
 	spin_unlock(&p->lock);
 	spin_unlock(&swap_lock);
--
2.23.0

2022-06-08 15:10:53

by Miaohe Lin

Subject: [PATCH v2 3/3] mm/swap: remove swap_cache_info statistics

swap_cache_info are not statistics that could be easily used to tune system
performance because they are not easily accessible. Also they can't provide
really useful info when OOM occurs. Removing these statistics also helps
mitigate unneeded global swap_cache_info cacheline contention.

Suggested-by: David Hildenbrand <[email protected]>
Signed-off-by: Miaohe Lin <[email protected]>
---
mm/swap_state.c | 17 -----------------
1 file changed, 17 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 0a2021fc55ad..41c6a6053d5c 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -59,24 +59,11 @@ static bool enable_vma_readahead __read_mostly = true;
 #define GET_SWAP_RA_VAL(vma) \
 	(atomic_long_read(&(vma)->swap_readahead_info) ? : 4)

-#define INC_CACHE_INFO(x) data_race(swap_cache_info.x++)
-#define ADD_CACHE_INFO(x, nr) data_race(swap_cache_info.x += (nr))
-
-static struct {
-	unsigned long add_total;
-	unsigned long del_total;
-	unsigned long find_success;
-	unsigned long find_total;
-} swap_cache_info;
-
 static atomic_t swapin_readahead_hits = ATOMIC_INIT(4);

 void show_swap_cache_info(void)
 {
 	printk("%lu pages in swap cache\n", total_swapcache_pages());
-	printk("Swap cache stats: add %lu, delete %lu, find %lu/%lu\n",
-		swap_cache_info.add_total, swap_cache_info.del_total,
-		swap_cache_info.find_success, swap_cache_info.find_total);
 	printk("Free swap = %ldkB\n",
 		get_nr_swap_pages() << (PAGE_SHIFT - 10));
 	printk("Total swap = %lukB\n", total_swap_pages << (PAGE_SHIFT - 10));
@@ -133,7 +120,6 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry,
 		address_space->nrpages += nr;
 		__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
 		__mod_lruvec_page_state(page, NR_SWAPCACHE, nr);
-		ADD_CACHE_INFO(add_total, nr);
 unlock:
 		xas_unlock_irq(&xas);
 	} while (xas_nomem(&xas, gfp));
@@ -172,7 +158,6 @@ void __delete_from_swap_cache(struct page *page,
 	address_space->nrpages -= nr;
 	__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
 	__mod_lruvec_page_state(page, NR_SWAPCACHE, -nr);
-	ADD_CACHE_INFO(del_total, nr);
 }

 /**
@@ -348,12 +333,10 @@ struct page *lookup_swap_cache(swp_entry_t entry, struct vm_area_struct *vma,
 	page = find_get_page(swap_address_space(entry), swp_offset(entry));
 	put_swap_device(si);

-	INC_CACHE_INFO(find_total);
 	if (page) {
 		bool vma_ra = swap_use_vma_readahead();
 		bool readahead;

-		INC_CACHE_INFO(find_success);
 		/*
 		 * At the moment, we don't support PG_readahead for anon THP
 		 * so let's bail out rather than confusing the readahead stat.
--
2.23.0

2022-06-08 15:50:17

by David Hildenbrand

Subject: Re: [PATCH v2 3/3] mm/swap: remove swap_cache_info statistics

On 08.06.22 16:40, Miaohe Lin wrote:
> swap_cache_info are not statistics that could be easily used to tune system
> performance because they are not easily accessible. Also they can't provide
> really useful info when OOM occurs. Removing these statistics also helps
> mitigate unneeded global swap_cache_info cacheline contention.
>
> Suggested-by: David Hildenbrand <[email protected]>
> Signed-off-by: Miaohe Lin <[email protected]>
> ---

Reviewed-by: David Hildenbrand <[email protected]>


--
Thanks,

David / dhildenb

2022-06-17 02:42:05

by Andrew Morton

Subject: Re: [PATCH v2 0/3] A few cleanup and fixup patches for swap

On Wed, 8 Jun 2022 22:40:28 +0800 Miaohe Lin <[email protected]> wrote:

> This series contains a cleanup patch to remove unneeded swap_cache_info
> statistics, and two bugfix patches to avoid possible data races of
> inuse_pages and so on. More details can be found in the respective
> changelogs.

It would be nice to get [1/3] reviewed please.


2022-06-17 03:19:16

by Miaohe Lin

Subject: Re: [PATCH v2 0/3] A few cleanup and fixup patches for swap

On 2022/6/17 10:37, Andrew Morton wrote:
> On Wed, 8 Jun 2022 22:40:28 +0800 Miaohe Lin <[email protected]> wrote:
>
>> This series contains a cleanup patch to remove unneeded swap_cache_info
>> statistics, and two bugfix patches to avoid possible data races of
>> inuse_pages and so on. More details can be found in the respective
>> changelogs.
>
> It would be nice to get [1/3] reviewed please.

I'd like that too.

Hi David & Hugh & Huang, Ying,
It would be very kind of you if you could help review this patch!

Thanks!

BTW: It would be convenient if there were mm/swap reviewers. ;)


2022-06-17 07:46:26

by David Hildenbrand

Subject: Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected

On 08.06.22 16:40, Miaohe Lin wrote:
> security_vm_enough_memory_mm() checks whether a process has enough memory
> to allocate a new virtual mapping. And total_swap_pages is considered as
> available memory while swapoff tries to make sure there's enough memory
> that can hold the swapped out memory. But total_swap_pages contains the
> swap space that is being swapoff. So security_vm_enough_memory_mm() will
> success even if there's no memory to hold the swapped out memory because

s/success/succeed/

> total_swap_pages always greater than or equal to p->pages.
>
> In order to fix it, p->pages should be retracted from total_swap_pages

s/retracted/subtracted/

> first and then check whether there's enough memory for inuse swap pages.
>
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> mm/swapfile.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index ec4c1b276691..d2bead7b8b70 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2398,6 +2398,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> struct filename *pathname;
> int err, found = 0;
> unsigned int old_block_size;
> + unsigned int inuse_pages;
>
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
> @@ -2428,9 +2429,13 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> spin_unlock(&swap_lock);
> goto out_dput;
> }
> - if (!security_vm_enough_memory_mm(current->mm, p->pages))
> - vm_unacct_memory(p->pages);
> +
> + total_swap_pages -= p->pages;
> + inuse_pages = READ_ONCE(p->inuse_pages);
> + if (!security_vm_enough_memory_mm(current->mm, inuse_pages))
> + vm_unacct_memory(inuse_pages);
> else {
> + total_swap_pages += p->pages;

That implies that whenever we fail in security_vm_enough_memory_mm(),
other concurrent users might see a wrong total_swap_pages.

Assume 4 GiB memory and 8 GiB swap. Let's assume 10 GiB are in use.

Temporarily, we'd have

CommitLimit 4 GiB
Committed_AS 10 GiB

Not sure if relevant, but I wonder if it could be avoided somehow?


Apart from that, LGTM.

--
Thanks,

David / dhildenb

2022-06-18 03:22:40

by Miaohe Lin

Subject: Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected

On 2022/6/17 15:33, David Hildenbrand wrote:
> On 08.06.22 16:40, Miaohe Lin wrote:
>> security_vm_enough_memory_mm() checks whether a process has enough memory
>> to allocate a new virtual mapping. And total_swap_pages is considered as
>> available memory while swapoff tries to make sure there's enough memory
>> that can hold the swapped out memory. But total_swap_pages contains the
>> swap space that is being swapoff. So security_vm_enough_memory_mm() will
>> success even if there's no memory to hold the swapped out memory because
>
> s/success/succeed/

OK. Thanks.

>
>> total_swap_pages always greater than or equal to p->pages.
>>
>> In order to fix it, p->pages should be retracted from total_swap_pages
>
> s/retracted/subtracted/

OK. Thanks.

>
>> first and then check whether there's enough memory for inuse swap pages.
>>
>> [snip]
>> - if (!security_vm_enough_memory_mm(current->mm, p->pages))
>> - vm_unacct_memory(p->pages);
>> +
>> + total_swap_pages -= p->pages;
>> + inuse_pages = READ_ONCE(p->inuse_pages);
>> + if (!security_vm_enough_memory_mm(current->mm, inuse_pages))
>> + vm_unacct_memory(inuse_pages);
>> else {
>> + total_swap_pages += p->pages;
>
> That implies that whenever we fail in security_vm_enough_memory_mm(),
> other concurrent users might see a wrong total_swap_pages.
>
> Assume 4 GiB memory and 8 GiB swap. Let's assume 10 GiB are in use.
>
> Temporarily, we'd have
>
> CommitLimit 4 GiB
> Committed_AS 10 GiB

IIUC, even without this change, other concurrent users that come after vm_acct_memory()
is done in __vm_enough_memory() might see

CommitLimit 12 GiB (4 GiB memory + 8 GiB total swap)
Committed_AS 18 GiB (10 GiB in use + 8 GiB swap space to swapoff)

Or am I missing something?

>
> Not sure if relevant, but I wonder if it could be avoided somehow?

It seems this race already exists and is benign. The worst case is that concurrent users
might fail to allocate memory. But that window should be really small, and swapoff is a
rare operation. Or should I try to fix this race?
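
A minimal userspace sketch of the window discussed here, assuming CommitLimit = RAM + total swap (vm.overcommit_ratio = 100, no hugetlb pages, no admin/user reserves) as the figures in this thread do. Sizes are in GiB and every name (commit_state, never_enough_memory(), window_old(), window_new()) is hypothetical; it models what a concurrent OVERCOMMIT_NEVER request observes, not the kernel code itself:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the overcommit counters, sizes in GiB.
 * Assumes CommitLimit = RAM + counted swap (overcommit_ratio = 100,
 * no hugetlb pages, no admin/user reserves).
 */
struct commit_state {
	long commit_limit;	/* CommitLimit seen by a concurrent task */
	long committed_as;	/* Committed_AS seen by a concurrent task */
};

/*
 * OVERCOMMIT_NEVER: __vm_enough_memory() first adds the request to
 * Committed_AS, then succeeds only if the sum stays below the limit.
 */
static bool never_enough_memory(struct commit_state s, long pages)
{
	return s.committed_as + pages < s.commit_limit;
}

/*
 * Original swapoff, inside the check: vm_acct_memory(p->pages) has
 * already charged the whole device, total swap is unchanged.
 */
static struct commit_state window_old(long ram, long swap, long dev_pages,
				      long committed)
{
	assert(dev_pages >= 0 && dev_pages <= swap);
	struct commit_state s = { ram + swap, committed + dev_pages };
	return s;
}

/*
 * Patched swapoff after a failed check: the check has undone its own
 * accounting, but total_swap_pages has not been restored yet.
 */
static struct commit_state window_new(long ram, long swap, long dev_pages,
				      long committed)
{
	assert(dev_pages >= 0 && dev_pages <= swap);
	struct commit_state s = { ram + swap - dev_pages, committed };
	return s;
}
```

For example, a 1 GiB request that normally passes (10 + 1 < 12) is refused in both windows: 18 + 1 is not below 12 with the original code, and 10 + 1 is not below 4 with the patch applied. The patch changes which counter is off during the window, not whether such a spurious failure can happen.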

>
>
> Apart from that, LGTM.

Many thanks for comment! :)

>

2022-06-18 07:36:58

by David Hildenbrand

Subject: Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected

On 18.06.22 04:43, Miaohe Lin wrote:
> On 2022/6/17 15:33, David Hildenbrand wrote:
>> On 08.06.22 16:40, Miaohe Lin wrote:
>>> [snip]
>>
>> That implies that whenever we fail in security_vm_enough_memory_mm(),
>> other concurrent users might see a wrong total_swap_pages.
>>
>> Assume 4 GiB memory and 8 GiB swap. Let's assume 10 GiB are in use.
>>
>> Temporarily, we'd have
>>
>> CommitLimit 4 GiB
>> Committed_AS 10 GiB
>
> IIUC, even without this change, other concurrent users that come after vm_acct_memory()
> is done in __vm_enough_memory() might see
>
> CommitLimit 12 GiB (4 GiB memory + 8 GiB total swap)
> Committed_AS 18 GiB (10 GiB in use + 8 GiB swap space to swapoff)
>
> Or am I missing something?
>

I think you are right!

Reviewed-by: David Hildenbrand <[email protected]>


--
Thanks,

David / dhildenb

2022-06-18 07:40:15

by Miaohe Lin

Subject: Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected

On 2022/6/18 15:10, David Hildenbrand wrote:
> On 18.06.22 04:43, Miaohe Lin wrote:
>> [snip]
>
> I think you are right!
>
> Reviewed-by: David Hildenbrand <[email protected]>

Thanks a lot!


2022-06-20 08:02:07

by Huang, Ying

Subject: Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected

Miaohe Lin <[email protected]> writes:

> security_vm_enough_memory_mm() checks whether a process has enough memory
> to allocate a new virtual mapping. And total_swap_pages is considered as
> available memory while swapoff tries to make sure there's enough memory
> that can hold the swapped out memory. But total_swap_pages contains the
> swap space that is being swapoff. So security_vm_enough_memory_mm() will
> success even if there's no memory to hold the swapped out memory because
> total_swap_pages always greater than or equal to p->pages.

Per my understanding, swapoff will not allocate a virtual mapping by
itself. But after swapoff, the overcommit limit could be exceeded.
security_vm_enough_memory_mm() is used to check that. For example, in a
system with 4GB memory and 8GB swap, and 10GB is in use,

CommitLimit: 4+8 = 12GB
Committed_AS: 10GB

security_vm_enough_memory_mm() in swapoff() will fail because
10+8 = 18 > 12. This is expected because after swapoff, the overcommit
limit will be exceeded.

If 3GB is in use,

CommitLimit: 4+8 = 12GB
Committed_AS: 3GB

security_vm_enough_memory_mm() in swapoff() will succeed because
3+8 = 11 < 12. This is expected because after swapoff, the overcommit
limit will not be exceeded.
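
This reading can be checked with a little algebra. Under the same simplification as the example (CommitLimit = RAM + total swap, no reserves), the original call succeeds when Committed_AS + p->pages < RAM + total_swap_pages, which is the same inequality as Committed_AS < RAM + (total_swap_pages - p->pages): the current commitments still fit under the CommitLimit that remains after swapoff. A minimal sketch, sizes in GiB, with hypothetical function names:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Original swapoff check in OVERCOMMIT_NEVER mode (hypothetical model,
 * sizes in GiB, CommitLimit = RAM + total swap, no reserves).
 */
static bool swapoff_never_check(long committed_as, long ram, long total_swap,
				long dev_pages)
{
	/* The device being removed is part of the counted swap. */
	assert(dev_pages >= 0 && dev_pages <= total_swap);
	return committed_as + dev_pages < ram + total_swap;
}

/* The question swapoff wants answered in this mode. */
static bool fits_after_swapoff(long committed_as, long ram, long total_swap,
			       long dev_pages)
{
	return committed_as < ram + (total_swap - dev_pages);
}
```

With the numbers above, 10 + 8 < 12 is false and 3 + 8 < 12 is true, and the second form gives the same answers (10 < 4 is false, 3 < 4 is true).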

So, what's the real problem of the original implementation? Can you
show it with an example as above?

Best Regards,
Huang, Ying

> In order to fix it, p->pages should be retracted from total_swap_pages
> first and then check whether there's enough memory for inuse swap pages.
>
> Signed-off-by: Miaohe Lin <[email protected]>

[snip]

2022-06-20 08:53:15

by Huang, Ying

Subject: Re: [PATCH v2 3/3] mm/swap: remove swap_cache_info statistics

Miaohe Lin <[email protected]> writes:

> swap_cache_info are not statistics that could be easily used to tune system
> performance because they are not easily accessible. Also they can't provide
> really useful info when OOM occurs. Removing these statistics also helps
> mitigate unneeded global swap_cache_info cacheline contention.
>
> Suggested-by: David Hildenbrand <[email protected]>
> Signed-off-by: Miaohe Lin <[email protected]>
> ---
> mm/swap_state.c | 17 -----------------
> 1 file changed, 17 deletions(-)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 0a2021fc55ad..41c6a6053d5c 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -59,24 +59,11 @@ static bool enable_vma_readahead __read_mostly = true;
> #define GET_SWAP_RA_VAL(vma) \
> (atomic_long_read(&(vma)->swap_readahead_info) ? : 4)
>
> -#define INC_CACHE_INFO(x) data_race(swap_cache_info.x++)
> -#define ADD_CACHE_INFO(x, nr) data_race(swap_cache_info.x += (nr))
> -
> -static struct {
> - unsigned long add_total;
> - unsigned long del_total;
> - unsigned long find_success;
> - unsigned long find_total;
> -} swap_cache_info;
> -
> static atomic_t swapin_readahead_hits = ATOMIC_INIT(4);
>
> void show_swap_cache_info(void)
> {
> printk("%lu pages in swap cache\n", total_swapcache_pages());
> - printk("Swap cache stats: add %lu, delete %lu, find %lu/%lu\n",
> - swap_cache_info.add_total, swap_cache_info.del_total,
> - swap_cache_info.find_success, swap_cache_info.find_total);
> printk("Free swap = %ldkB\n",
> get_nr_swap_pages() << (PAGE_SHIFT - 10));
> printk("Total swap = %lukB\n", total_swap_pages << (PAGE_SHIFT - 10));
> @@ -133,7 +120,6 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry,
> address_space->nrpages += nr;
> __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
> __mod_lruvec_page_state(page, NR_SWAPCACHE, nr);
> - ADD_CACHE_INFO(add_total, nr);
> unlock:
> xas_unlock_irq(&xas);
> } while (xas_nomem(&xas, gfp));
> @@ -172,7 +158,6 @@ void __delete_from_swap_cache(struct page *page,
> address_space->nrpages -= nr;
> __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
> __mod_lruvec_page_state(page, NR_SWAPCACHE, -nr);
> - ADD_CACHE_INFO(del_total, nr);
> }
>
> /**
> @@ -348,12 +333,10 @@ struct page *lookup_swap_cache(swp_entry_t entry, struct vm_area_struct *vma,
> page = find_get_page(swap_address_space(entry), swp_offset(entry));
> put_swap_device(si);
>
> - INC_CACHE_INFO(find_total);
> if (page) {
> bool vma_ra = swap_use_vma_readahead();
> bool readahead;
>
> - INC_CACHE_INFO(find_success);
> /*
> * At the moment, we don't support PG_readahead for anon THP
> * so let's bail out rather than confusing the readahead stat.

This looks reasonable. And if we want to do some statistics for swap
cache in the future, we can use BPF, which is even more convenient.

Acked-by: "Huang, Ying" <[email protected]>

Best Regards,
Huang, Ying

2022-06-20 09:20:32

by Miaohe Lin

Subject: Re: [PATCH v2 3/3] mm/swap: remove swap_cache_info statistics

On 2022/6/20 16:08, Huang, Ying wrote:
> Miaohe Lin <[email protected]> writes:
>
>> [snip]
>
> This looks reasonable. And if we want to do some statistics for swap
> cache in the future, we can use BPF, which is even more convenient.

BPF should be very convenient. Many thanks for reviewing!

>
> Acked-by: "Huang, Ying" <[email protected]>
>
> Best Regards,
> Huang, Ying
>

2022-06-20 09:41:38

by Muchun Song

Subject: Re: [PATCH v2 3/3] mm/swap: remove swap_cache_info statistics

On Wed, Jun 08, 2022 at 10:40:31PM +0800, Miaohe Lin wrote:
> swap_cache_info are not statistics that could be easily used to tune system
> performance because they are not easily accessible. Also they can't provide
> really useful info when OOM occurs. Removing these statistics also helps
> mitigate unneeded global swap_cache_info cacheline contention.
>
> Suggested-by: David Hildenbrand <[email protected]>
> Signed-off-by: Miaohe Lin <[email protected]>

Reviewed-by: Muchun Song <[email protected]>

Thanks.

2022-06-20 12:26:29

by Miaohe Lin

Subject: Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected

On 2022/6/20 15:31, Huang, Ying wrote:
> Miaohe Lin <[email protected]> writes:
>
>> security_vm_enough_memory_mm() checks whether a process has enough memory
>> to allocate a new virtual mapping. And total_swap_pages is considered as
>> available memory while swapoff tries to make sure there's enough memory
>> that can hold the swapped out memory. But total_swap_pages contains the
>> swap space that is being swapoff. So security_vm_enough_memory_mm() will
>> success even if there's no memory to hold the swapped out memory because
>> total_swap_pages always greater than or equal to p->pages.
>
> Per my understanding, swapoff will not allocate a virtual mapping by
> itself. But after swapoff, the overcommit limit could be exceeded.
> security_vm_enough_memory_mm() is used to check that. For example, in a
> system with 4GB memory and 8GB swap, and 10GB is in use,
>
> CommitLimit: 4+8 = 12GB
> Committed_AS: 10GB
>
> security_vm_enough_memory_mm() in swapoff() will fail because
> 10+8 = 18 > 12. This is expected because after swapoff, the overcommit
> limit will be exceeded.
>
> If 3GB is in use,
>
> CommitLimit: 4+8 = 12GB
> Committed_AS: 3GB
>
> security_vm_enough_memory_mm() in swapoff() will succeed because
> 3+8 = 11 < 12. This is expected because after swapoff, the overcommit
> limit will not be exceeded.

In the OVERCOMMIT_NEVER case, I think you're right.

>
> So, what's the real problem of the original implementation? Can you
> show it with an example as above?

In the OVERCOMMIT_GUESS case, in a system with 4GB memory and 8GB swap, and 10GB in use,
pages in the code below is 8GB and totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed
instead of failing as expected, because 8 < 12. The overcommit limit is always *ignored* in the
case below.

	if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
		if (pages > totalram_pages() + total_swap_pages)
			goto error;
		return 0;
	}
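
What that branch means for swapoff can be sketched as a small userspace model. Sizes are in GiB, the function names are hypothetical, and the 6 GiB of in-use pages on the 8 GiB device is an assumed figure for illustration, not one from the thread:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * OVERCOMMIT_GUESS (hypothetical model, sizes in GiB): refuse only a
 * request larger than RAM plus the swap currently counted.
 */
static bool guess_enough_memory(long pages, long totalram, long total_swap)
{
	return !(pages > totalram + total_swap);
}

/* Before the patch: charge the whole device while it is still counted. */
static bool swapoff_guess_old(long ram, long total_swap, long dev_pages)
{
	assert(dev_pages >= 0 && dev_pages <= total_swap);
	return guess_enough_memory(dev_pages, ram, total_swap);
}

/* After the patch: drop the device first, charge only its in-use pages. */
static bool swapoff_guess_new(long ram, long total_swap, long dev_pages,
			      long dev_inuse)
{
	assert(dev_inuse >= 0 && dev_inuse <= dev_pages);
	assert(dev_pages <= total_swap);
	return guess_enough_memory(dev_inuse, ram, total_swap - dev_pages);
}
```

Because p->pages can never exceed total_swap_pages, the original call cannot fail in this mode; whether it should is what the rest of this exchange debates.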

Or am I missing something?

>
> Best Regards,
> Huang, Ying

Thanks!


2022-06-21 01:38:19

by Huang, Ying

Subject: Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected

Miaohe Lin <[email protected]> writes:

> On 2022/6/20 15:31, Huang, Ying wrote:
>> Miaohe Lin <[email protected]> writes:
>>
>>> security_vm_enough_memory_mm() checks whether a process has enough memory
>>> to allocate a new virtual mapping. And total_swap_pages is considered as
>>> available memory while swapoff tries to make sure there's enough memory
>>> that can hold the swapped out memory. But total_swap_pages contains the
>>> swap space that is being swapoff. So security_vm_enough_memory_mm() will
>>> success even if there's no memory to hold the swapped out memory because
>>> total_swap_pages always greater than or equal to p->pages.
>>
>> Per my understanding, swapoff will not allocate a virtual mapping by
>> itself. But after swapoff, the overcommit limit could be exceeded.
>> security_vm_enough_memory_mm() is used to check that. For example, in a
>> system with 4GB memory and 8GB swap, and 10GB is in use,
>>
>> CommitLimit: 4+8 = 12GB
>> Committed_AS: 10GB
>>
>> security_vm_enough_memory_mm() in swapoff() will fail because
>> 10+8 = 18 > 12. This is expected because after swapoff, the overcommit
>> limit will be exceeded.
>>
>> If 3GB is in use,
>>
>> CommitLimit: 4+8 = 12GB
>> Committed_AS: 3GB
>>
>> security_vm_enough_memory_mm() in swapoff() will succeed because
>> 3+8 = 11 < 12. This is expected because after swapoff, the overcommit
>> limit will not be exceeded.
>
> In the OVERCOMMIT_NEVER case, I think you're right.
>
>>
>> So, what's the real problem of the original implementation? Can you
>> show it with an example as above?
>
> In the OVERCOMMIT_GUESS case, in a system with 4GB memory and 8GB swap, and 10GB in use,
> pages in the code below is 8GB and totalram_pages() + total_swap_pages is 12GB, so swapoff() will succeed
> instead of failing as expected, because 8 < 12. The overcommit limit is always *ignored* in the
> case below.
>
> 	if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
> 		if (pages > totalram_pages() + total_swap_pages)
> 			goto error;
> 		return 0;
> 	}
>
> Or am I missing something?

Per my understanding, with OVERCOMMIT_GUESS, the number of in-use pages
isn't checked at all. The only restriction is that the size of the
virtual mapping created should not exceed total RAM + total swap
pages. Because swapoff() will not create a virtual mapping, it's
expected that security_vm_enough_memory_mm() in swapoff() always
succeeds.

Best Regards,
Huang, Ying

>
> Thanks!
>
>>
>>> In order to fix it, p->pages should be retracted from total_swap_pages
>>> first and then check whether there's enough memory for inuse swap pages.
>>>
>>> Signed-off-by: Miaohe Lin <[email protected]>
>>
>> [snip]
>>
>>

2022-06-21 08:03:11

by Miaohe Lin

Subject: Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected

On 2022/6/21 9:35, Huang, Ying wrote:
> Miaohe Lin <[email protected]> writes:
>
>> On 2022/6/20 15:31, Huang, Ying wrote:
>>> Miaohe Lin <[email protected]> writes:
>>>
>>>> security_vm_enough_memory_mm() checks whether a process has enough memory
>>>> to allocate a new virtual mapping. And total_swap_pages is considered
>>>> available memory when swapoff tries to make sure there's enough memory
>>>> to hold the swapped-out memory. But total_swap_pages includes the
>>>> swap space that is being swapped off. So security_vm_enough_memory_mm() will
>>>> succeed even if there's no memory to hold the swapped-out memory, because
>>>> total_swap_pages is always greater than or equal to p->pages.
>>>
>>> Per my understanding, swapoff will not allocate a virtual mapping by
>>> itself. But after swapoff, the overcommit limit could be exceeded.
>>> security_vm_enough_memory_mm() is used to check that. For example, on a
>>> system with 4GB memory and 8GB swap, with 10GB in use,
>>>
>>> CommitLimit: 4+8 = 12GB
>>> Committed_AS: 10GB
>>>
>>> security_vm_enough_memory_mm() in swapoff() will fail because
>>> 10+8 = 18 > 12. This is expected because after swapoff, the overcommit
>>> limit will be exceeded.
>>>
>>> If 3GB is in use,
>>>
>>> CommitLimit: 4+8 = 12GB
>>> Committed_AS: 3GB
>>>
>>> security_vm_enough_memory_mm() in swapoff() will succeed because
>>> 3+8 = 11 < 12. This is expected because after swapoff, the overcommit
>>> limit will not be exceeded.
>>
>> In the OVERCOMMIT_NEVER case, I think you're right.
>>
>>>
>>> So, what's the real problem with the original implementation? Can you
>>> show it with an example as above?
>>
>> In the OVERCOMMIT_GUESS case, on a system with 4GB memory and 8GB swap with 10GB in use,
>> the pages argument below is 8GB and totalram_pages() + total_swap_pages is 12GB, so swapoff()
>> will succeed instead of failing as expected, because 8 < 12. The overcommit limit is always
>> *ignored* in the case below:
>>
>> if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
>> 	if (pages > totalram_pages() + total_swap_pages)
>> 		goto error;
>> 	return 0;
>> }
>>
>> Or am I missing something?
>
> Per my understanding, with OVERCOMMIT_GUESS, the number of in-use pages
> isn't checked at all. The only restriction is that the size of the
> virtual mapping created should be less than total RAM + total swap

Do you mean the only restriction is that the size of the virtual mapping
*created every time* should be less than total RAM + total swap pages, but
the *total virtual mapping* is not limited in the OVERCOMMIT_GUESS case? If so,
the current behavior should be sane and I will drop this patch.

Thanks!

> pages. Because swapoff() will not create a virtual mapping, it's
> expected that security_vm_enough_memory_mm() in swapoff() always
> succeeds.
>
> Best Regards,
> Huang, Ying
>
>>
>> Thanks!
>>
>>>
>>>> In order to fix it, p->pages should be subtracted from total_swap_pages
>>>> first, and then we check whether there's enough memory for the in-use swap pages.
>>>>
>>>> Signed-off-by: Miaohe Lin <[email protected]>
>>>
>>> [snip]
>>>
>>>
>
>

2022-06-21 08:43:27

by Huang, Ying

Subject: Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected

Miaohe Lin <[email protected]> writes:

> On 2022/6/21 9:35, Huang, Ying wrote:
>> Miaohe Lin <[email protected]> writes:
>>
>>> On 2022/6/20 15:31, Huang, Ying wrote:
>>>> Miaohe Lin <[email protected]> writes:
>>>>
>>>>> security_vm_enough_memory_mm() checks whether a process has enough memory
>>>>> to allocate a new virtual mapping. And total_swap_pages is considered
>>>>> available memory when swapoff tries to make sure there's enough memory
>>>>> to hold the swapped-out memory. But total_swap_pages includes the
>>>>> swap space that is being swapped off. So security_vm_enough_memory_mm() will
>>>>> succeed even if there's no memory to hold the swapped-out memory, because
>>>>> total_swap_pages is always greater than or equal to p->pages.
>>>>
>>>> Per my understanding, swapoff will not allocate a virtual mapping by
>>>> itself. But after swapoff, the overcommit limit could be exceeded.
>>>> security_vm_enough_memory_mm() is used to check that. For example, on a
>>>> system with 4GB memory and 8GB swap, with 10GB in use,
>>>>
>>>> CommitLimit: 4+8 = 12GB
>>>> Committed_AS: 10GB
>>>>
>>>> security_vm_enough_memory_mm() in swapoff() will fail because
>>>> 10+8 = 18 > 12. This is expected because after swapoff, the overcommit
>>>> limit will be exceeded.
>>>>
>>>> If 3GB is in use,
>>>>
>>>> CommitLimit: 4+8 = 12GB
>>>> Committed_AS: 3GB
>>>>
>>>> security_vm_enough_memory_mm() in swapoff() will succeed because
>>>> 3+8 = 11 < 12. This is expected because after swapoff, the overcommit
>>>> limit will not be exceeded.
>>>
>>> In the OVERCOMMIT_NEVER case, I think you're right.
>>>
>>>>
>>>> So, what's the real problem with the original implementation? Can you
>>>> show it with an example as above?
>>>
>>> In the OVERCOMMIT_GUESS case, on a system with 4GB memory and 8GB swap with 10GB in use,
>>> the pages argument below is 8GB and totalram_pages() + total_swap_pages is 12GB, so swapoff()
>>> will succeed instead of failing as expected, because 8 < 12. The overcommit limit is always
>>> *ignored* in the case below:
>>>
>>> if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
>>> 	if (pages > totalram_pages() + total_swap_pages)
>>> 		goto error;
>>> 	return 0;
>>> }
>>>
>>> Or am I missing something?
>>
>> Per my understanding, with OVERCOMMIT_GUESS, the number of in-use pages
>> isn't checked at all. The only restriction is that the size of the
>> virtual mapping created should be less than total RAM + total swap
>
> Do you mean the only restriction is that the size of the virtual mapping
> *created every time* should be less than total RAM + total swap pages, but
> the *total virtual mapping* is not limited in the OVERCOMMIT_GUESS case? If so,
> the current behavior should be sane and I will drop this patch.

Yes. This is my understanding.

Best Regards,
Huang, Ying

> Thanks!
>
>> pages. Because swapoff() will not create a virtual mapping, it's
>> expected that security_vm_enough_memory_mm() in swapoff() always
>> succeeds.
>>
>> Best Regards,
>> Huang, Ying
>>
>>>
>>> Thanks!
>>>
>>>>
>>>>> In order to fix it, p->pages should be subtracted from total_swap_pages
>>>>> first, and then we check whether there's enough memory for the in-use swap pages.
>>>>>
>>>>> Signed-off-by: Miaohe Lin <[email protected]>
>>>>
>>>> [snip]
>>>>
>>>>
>>
>>

2022-06-21 08:54:42

by Miaohe Lin

Subject: Re: [PATCH v2 1/3] mm/swapfile: make security_vm_enough_memory_mm() work as expected

On 2022/6/21 15:42, Huang, Ying wrote:
> Miaohe Lin <[email protected]> writes:
>
>> On 2022/6/21 9:35, Huang, Ying wrote:
>>> Miaohe Lin <[email protected]> writes:
>>>
>>>> On 2022/6/20 15:31, Huang, Ying wrote:
>>>>> Miaohe Lin <[email protected]> writes:
>>>>>
>>>>>> security_vm_enough_memory_mm() checks whether a process has enough memory
>>>>>> to allocate a new virtual mapping. And total_swap_pages is considered
>>>>>> available memory when swapoff tries to make sure there's enough memory
>>>>>> to hold the swapped-out memory. But total_swap_pages includes the
>>>>>> swap space that is being swapped off. So security_vm_enough_memory_mm() will
>>>>>> succeed even if there's no memory to hold the swapped-out memory, because
>>>>>> total_swap_pages is always greater than or equal to p->pages.
>>>>>
>>>>> Per my understanding, swapoff will not allocate a virtual mapping by
>>>>> itself. But after swapoff, the overcommit limit could be exceeded.
>>>>> security_vm_enough_memory_mm() is used to check that. For example, on a
>>>>> system with 4GB memory and 8GB swap, with 10GB in use,
>>>>>
>>>>> CommitLimit: 4+8 = 12GB
>>>>> Committed_AS: 10GB
>>>>>
>>>>> security_vm_enough_memory_mm() in swapoff() will fail because
>>>>> 10+8 = 18 > 12. This is expected because after swapoff, the overcommit
>>>>> limit will be exceeded.
>>>>>
>>>>> If 3GB is in use,
>>>>>
>>>>> CommitLimit: 4+8 = 12GB
>>>>> Committed_AS: 3GB
>>>>>
>>>>> security_vm_enough_memory_mm() in swapoff() will succeed because
>>>>> 3+8 = 11 < 12. This is expected because after swapoff, the overcommit
>>>>> limit will not be exceeded.
>>>>
>>>> In the OVERCOMMIT_NEVER case, I think you're right.
>>>>
>>>>>
>>>>> So, what's the real problem with the original implementation? Can you
>>>>> show it with an example as above?
>>>>
>>>> In the OVERCOMMIT_GUESS case, on a system with 4GB memory and 8GB swap with 10GB in use,
>>>> the pages argument below is 8GB and totalram_pages() + total_swap_pages is 12GB, so swapoff()
>>>> will succeed instead of failing as expected, because 8 < 12. The overcommit limit is always
>>>> *ignored* in the case below:
>>>>
>>>> if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
>>>> 	if (pages > totalram_pages() + total_swap_pages)
>>>> 		goto error;
>>>> 	return 0;
>>>> }
>>>>
>>>> Or am I missing something?
>>>
>>> Per my understanding, with OVERCOMMIT_GUESS, the number of in-use pages
>>> isn't checked at all. The only restriction is that the size of the
>>> virtual mapping created should be less than total RAM + total swap
>>
>> Do you mean the only restriction is that the size of the virtual mapping
>> *created every time* should be less than total RAM + total swap pages, but
>> the *total virtual mapping* is not limited in the OVERCOMMIT_GUESS case? If so,
>> the current behavior should be sane and I will drop this patch.
>
> Yes. This is my understanding.

I see. Thank you.

>
> Best Regards,
> Huang, Ying
>
>> Thanks!
>>
>>> pages. Because swapoff() will not create a virtual mapping, it's
>>> expected that security_vm_enough_memory_mm() in swapoff() always
>>> succeeds.
>>>
>>> Best Regards,
>>> Huang, Ying
>>>
>>>>
>>>> Thanks!
>>>>
>>>>>
>>>>>> In order to fix it, p->pages should be subtracted from total_swap_pages
>>>>>> first, and then we check whether there's enough memory for the in-use swap pages.
>>>>>>
>>>>>> Signed-off-by: Miaohe Lin <[email protected]>
>>>>>
>>>>> [snip]
>>>>>
>>>>>
>>>
>>>
>
>