Currently the kernel adds the page, allocated for swapin, to the
swapcache before charging the page. This is fine but now we want a
per-memcg swapcache stat which is essential for folks who want to
transparently migrate from cgroup v1's memsw to cgroup v2's memory and
swap counters. In addition, charging a page before exposing it to other
parts of the kernel is a step in the right direction.

To correctly maintain the per-memcg swapcache stat, this patch charges
the page before adding it to the swapcache. One challenge with this
approach is the failure case of add_to_swap_cache(), where we need to
undo the mem_cgroup_charge(). Specifically, undoing
mem_cgroup_uncharge_swap() is not simple.

To resolve the issue, this patch introduces a transaction-like interface
to charge a page for swapin. The function mem_cgroup_charge_swapin_page()
initiates the charging of the page and mem_cgroup_finish_swapin_page()
completes the charging process. So, the kernel starts the charging
process of the page for swapin with mem_cgroup_charge_swapin_page(),
adds the page to the swapcache and, on success, completes the charging
process with mem_cgroup_finish_swapin_page().
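
In other words, the intended calling sequence is the one sketched below
(a simplified version of the __read_swap_cache_async() change in this
patch; allocation, locking and error unwinding trimmed):

	/* Start the charge before the page is visible to anyone else. */
	if (mem_cgroup_charge_swapin_page(page, NULL, gfp_mask, entry))
		goto fail_unlock;

	/* May fail (-ENOMEM) if XArray node allocation failed. */
	if (add_to_swap_cache(page, entry, gfp_mask & GFP_RECLAIM_MASK, &shadow))
		goto fail_unlock;

	/* Swapcache insertion succeeded, complete the charge transaction. */
	mem_cgroup_finish_swapin_page(page, entry);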
Signed-off-by: Shakeel Butt <[email protected]>
---
Changes since v2:
- fixed build for !CONFIG_MEMCG
- simplified failure path from add_to_swap_cache()
Changes since v1:
- Removed __GFP_NOFAIL and introduced a transaction interface for charging
(suggested by Johannes)
- Updated the commit message
 include/linux/memcontrol.h |  14 +++++
 mm/memcontrol.c            | 116 +++++++++++++++++++++++--------------
 mm/memory.c                |  14 ++---
 mm/swap_state.c            |  13 ++---
 4 files changed, 97 insertions(+), 60 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e6dc793d587d..d31e6dca397f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -596,6 +596,9 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
}
int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask);
+int mem_cgroup_charge_swapin_page(struct page *page, struct mm_struct *mm,
+ gfp_t gfp, swp_entry_t entry);
+void mem_cgroup_finish_swapin_page(struct page *page, swp_entry_t entry);
void mem_cgroup_uncharge(struct page *page);
void mem_cgroup_uncharge_list(struct list_head *page_list);
@@ -1141,6 +1144,17 @@ static inline int mem_cgroup_charge(struct page *page, struct mm_struct *mm,
return 0;
}
+static inline int mem_cgroup_charge_swapin_page(struct page *page,
+ struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
+{
+ return 0;
+}
+
+static inline void mem_cgroup_finish_swapin_page(struct page *page,
+ swp_entry_t entry)
+{
+}
+
static inline void mem_cgroup_uncharge(struct page *page)
{
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2db2aeac8a9e..226b7bccb44c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6690,6 +6690,27 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
atomic_long_read(&parent->memory.children_low_usage)));
}
+static int __mem_cgroup_charge(struct page *page, struct mem_cgroup *memcg,
+ gfp_t gfp)
+{
+ unsigned int nr_pages = thp_nr_pages(page);
+ int ret;
+
+ ret = try_charge(memcg, gfp, nr_pages);
+ if (ret)
+ goto out;
+
+ css_get(&memcg->css);
+ commit_charge(page, memcg);
+
+ local_irq_disable();
+ mem_cgroup_charge_statistics(memcg, page, nr_pages);
+ memcg_check_events(memcg, page);
+ local_irq_enable();
+out:
+ return ret;
+}
+
/**
* mem_cgroup_charge - charge a newly allocated page to a cgroup
* @page: page to charge
@@ -6699,55 +6720,70 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
* Try to charge @page to the memcg that @mm belongs to, reclaiming
* pages according to @gfp_mask if necessary.
*
+ * Do not use this for pages allocated for swapin.
+ *
* Returns 0 on success. Otherwise, an error code is returned.
*/
int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
{
- unsigned int nr_pages = thp_nr_pages(page);
- struct mem_cgroup *memcg = NULL;
- int ret = 0;
+ struct mem_cgroup *memcg;
+ int ret;
if (mem_cgroup_disabled())
- goto out;
+ return 0;
- if (PageSwapCache(page)) {
- swp_entry_t ent = { .val = page_private(page), };
- unsigned short id;
+ memcg = get_mem_cgroup_from_mm(mm);
+ ret = __mem_cgroup_charge(page, memcg, gfp_mask);
+ css_put(&memcg->css);
- /*
- * Every swap fault against a single page tries to charge the
- * page, bail as early as possible. shmem_unuse() encounters
- * already charged pages, too. page and memcg binding is
- * protected by the page lock, which serializes swap cache
- * removal, which in turn serializes uncharging.
- */
- VM_BUG_ON_PAGE(!PageLocked(page), page);
- if (page_memcg(compound_head(page)))
- goto out;
+ return ret;
+}
- id = lookup_swap_cgroup_id(ent);
- rcu_read_lock();
- memcg = mem_cgroup_from_id(id);
- if (memcg && !css_tryget_online(&memcg->css))
- memcg = NULL;
- rcu_read_unlock();
- }
+/**
+ * mem_cgroup_charge_swapin_page - charge a newly allocated page for swapin
+ * @page: page to charge
+ * @mm: mm context of the victim
+ * @gfp: reclaim mode
+ * @entry: swap entry for which the page is allocated
+ *
+ * This function marks the start of the transaction of charging the page for
+ * swapin. Complete the transaction with mem_cgroup_finish_swapin_page().
+ *
+ * Returns 0 on success. Otherwise, an error code is returned.
+ */
+int mem_cgroup_charge_swapin_page(struct page *page, struct mm_struct *mm,
+ gfp_t gfp, swp_entry_t entry)
+{
+ struct mem_cgroup *memcg;
+ unsigned short id;
+ int ret;
- if (!memcg)
- memcg = get_mem_cgroup_from_mm(mm);
+ if (mem_cgroup_disabled())
+ return 0;
- ret = try_charge(memcg, gfp_mask, nr_pages);
- if (ret)
- goto out_put;
+ id = lookup_swap_cgroup_id(entry);
+ rcu_read_lock();
+ memcg = mem_cgroup_from_id(id);
+ if (!memcg || !css_tryget_online(&memcg->css))
+ memcg = get_mem_cgroup_from_mm(mm);
+ rcu_read_unlock();
- css_get(&memcg->css);
- commit_charge(page, memcg);
+ ret = __mem_cgroup_charge(page, memcg, gfp);
- local_irq_disable();
- mem_cgroup_charge_statistics(memcg, page, nr_pages);
- memcg_check_events(memcg, page);
- local_irq_enable();
+ css_put(&memcg->css);
+ return ret;
+}
+/*
+ * mem_cgroup_finish_swapin_page - complete the swapin page charge transaction
+ * @page: page charged for swapin
+ * @entry: swap entry for which the page is charged
+ *
+ * This function completes the transaction of charging the page allocated for
+ * swapin.
+ */
+void mem_cgroup_finish_swapin_page(struct page *page, swp_entry_t entry)
+{
/*
* Cgroup1's unified memory+swap counter has been charged with the
* new swapcache page, finish the transfer by uncharging the swap
@@ -6760,20 +6796,14 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
* correspond 1:1 to page and swap slot lifetimes: we charge the
* page to memory here, and uncharge swap when the slot is freed.
*/
- if (do_memsw_account() && PageSwapCache(page)) {
- swp_entry_t entry = { .val = page_private(page) };
+ if (!mem_cgroup_disabled() && do_memsw_account()) {
/*
* The swap entry might not get freed for a long time,
* let's not wait for it. The page already received a
* memory+swap charge, drop the swap entry duplicate.
*/
- mem_cgroup_uncharge_swap(entry, nr_pages);
+ mem_cgroup_uncharge_swap(entry, thp_nr_pages(page));
}
-
-out_put:
- css_put(&memcg->css);
-out:
- return ret;
}
struct uncharge_gather {
diff --git a/mm/memory.c b/mm/memory.c
index c8e357627318..4cd3cd95bb70 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3307,21 +3307,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma,
vmf->address);
if (page) {
- int err;
-
__SetPageLocked(page);
__SetPageSwapBacked(page);
- set_page_private(page, entry.val);
-
- /* Tell memcg to use swap ownership records */
- SetPageSwapCache(page);
- err = mem_cgroup_charge(page, vma->vm_mm,
- GFP_KERNEL);
- ClearPageSwapCache(page);
- if (err) {
+
+ if (mem_cgroup_charge_swapin_page(page,
+ vma->vm_mm, GFP_KERNEL, entry)) {
ret = VM_FAULT_OOM;
goto out_page;
}
+ mem_cgroup_finish_swapin_page(page, entry);
shadow = get_shadow_from_swap_cache(entry);
if (shadow)
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 3cdee7b11da9..e69a8df7da33 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -497,16 +497,14 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
__SetPageLocked(page);
__SetPageSwapBacked(page);
- /* May fail (-ENOMEM) if XArray node allocation failed. */
- if (add_to_swap_cache(page, entry, gfp_mask & GFP_RECLAIM_MASK, &shadow)) {
- put_swap_page(page, entry);
+ if (mem_cgroup_charge_swapin_page(page, NULL, gfp_mask, entry))
goto fail_unlock;
- }
- if (mem_cgroup_charge(page, NULL, gfp_mask)) {
- delete_from_swap_cache(page);
+ /* May fail (-ENOMEM) if XArray node allocation failed. */
+ if (add_to_swap_cache(page, entry, gfp_mask & GFP_RECLAIM_MASK, &shadow))
goto fail_unlock;
- }
+
+ mem_cgroup_finish_swapin_page(page, entry);
if (shadow)
workingset_refault(page, shadow);
@@ -517,6 +515,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
return page;
fail_unlock:
+ put_swap_page(page, entry);
unlock_page(page);
put_page(page);
return NULL;
--
2.30.1.766.gb4fecdf3b7-goog
On Thu, Mar 4, 2021 at 7:48 AM Johannes Weiner <[email protected]> wrote:
>
> On Wed, Mar 03, 2021 at 05:42:29PM -0800, Shakeel Butt wrote:
> > Currently the kernel adds the page, allocated for swapin, to the
> > swapcache before charging the page. This is fine but now we want a
> > per-memcg swapcache stat which is essential for folks who want to
> > transparently migrate from cgroup v1's memsw to cgroup v2's memory and
> > swap counters. In addition, charging a page before exposing it to other
> > parts of the kernel is a step in the right direction.
> >
> > To correctly maintain the per-memcg swapcache stat, this patch charges
> > the page before adding it to the swapcache. One challenge with this
> > approach is the failure case of add_to_swap_cache(), where we need to
> > undo the mem_cgroup_charge(). Specifically, undoing
> > mem_cgroup_uncharge_swap() is not simple.
> >
> > To resolve the issue, this patch introduces a transaction-like interface
> > to charge a page for swapin. The function mem_cgroup_charge_swapin_page()
> > initiates the charging of the page and mem_cgroup_finish_swapin_page()
> > completes the charging process. So, the kernel starts the charging
> > process of the page for swapin with mem_cgroup_charge_swapin_page(),
> > adds the page to the swapcache and, on success, completes the charging
> > process with mem_cgroup_finish_swapin_page().
> >
> > Signed-off-by: Shakeel Butt <[email protected]>
>
> The patch looks good to me, I have just a minor documentation nit
> below. But with that addressed, please add:
>
> Acked-by: Johannes Weiner <[email protected]>
Thanks.
>
[...]
>
> It's possible somebody later needs to change things around in the
> swapin path and it's not immediately obvious when exactly these two
> functions need to be called in the swapin sequence.
>
> Maybe add here and above that charge_swapin_page needs to be called
> before we try adding the page to the swapcache, and finish_swapin_page
> needs to be called when swapcache insertion has been successful?
I will update the comments and send v4 after a day or so to see if
someone else has any comments.
On Wed, Mar 03, 2021 at 05:42:29PM -0800, Shakeel Butt wrote:
> Currently the kernel adds the page, allocated for swapin, to the
> swapcache before charging the page. This is fine but now we want a
> per-memcg swapcache stat which is essential for folks who want to
> transparently migrate from cgroup v1's memsw to cgroup v2's memory and
> swap counters. In addition, charging a page before exposing it to other
> parts of the kernel is a step in the right direction.
>
> To correctly maintain the per-memcg swapcache stat, this patch charges
> the page before adding it to the swapcache. One challenge with this
> approach is the failure case of add_to_swap_cache(), where we need to
> undo the mem_cgroup_charge(). Specifically, undoing
> mem_cgroup_uncharge_swap() is not simple.
>
> To resolve the issue, this patch introduces a transaction-like interface
> to charge a page for swapin. The function mem_cgroup_charge_swapin_page()
> initiates the charging of the page and mem_cgroup_finish_swapin_page()
> completes the charging process. So, the kernel starts the charging
> process of the page for swapin with mem_cgroup_charge_swapin_page(),
> adds the page to the swapcache and, on success, completes the charging
> process with mem_cgroup_finish_swapin_page().
>
> Signed-off-by: Shakeel Butt <[email protected]>
The patch looks good to me, I have just a minor documentation nit
below. But with that addressed, please add:
Acked-by: Johannes Weiner <[email protected]>
> +/**
> + * mem_cgroup_charge_swapin_page - charge a newly allocated page for swapin
> + * @page: page to charge
> + * @mm: mm context of the victim
> + * @gfp: reclaim mode
> + * @entry: swap entry for which the page is allocated
> + *
> + * This function marks the start of the transaction of charging the page for
> + * swapin. Complete the transaction with mem_cgroup_finish_swapin_page().
> + *
> + * Returns 0 on success. Otherwise, an error code is returned.
> + */
> +int mem_cgroup_charge_swapin_page(struct page *page, struct mm_struct *mm,
> + gfp_t gfp, swp_entry_t entry)
> +{
> + struct mem_cgroup *memcg;
> + unsigned short id;
> + int ret;
>
> - if (!memcg)
> - memcg = get_mem_cgroup_from_mm(mm);
> + if (mem_cgroup_disabled())
> + return 0;
>
> - ret = try_charge(memcg, gfp_mask, nr_pages);
> - if (ret)
> - goto out_put;
> + id = lookup_swap_cgroup_id(entry);
> + rcu_read_lock();
> + memcg = mem_cgroup_from_id(id);
> + if (!memcg || !css_tryget_online(&memcg->css))
> + memcg = get_mem_cgroup_from_mm(mm);
> + rcu_read_unlock();
>
> - css_get(&memcg->css);
> - commit_charge(page, memcg);
> + ret = __mem_cgroup_charge(page, memcg, gfp);
>
> - local_irq_disable();
> - mem_cgroup_charge_statistics(memcg, page, nr_pages);
> - memcg_check_events(memcg, page);
> - local_irq_enable();
> + css_put(&memcg->css);
> + return ret;
> +}
>
> +/*
> + * mem_cgroup_finish_swapin_page - complete the swapin page charge transaction
> + * @page: page charged for swapin
> + * @entry: swap entry for which the page is charged
> + *
> + * This function completes the transaction of charging the page allocated for
> + * swapin.
It's possible somebody later needs to change things around in the
swapin path and it's not immediately obvious when exactly these two
functions need to be called in the swapin sequence.
Maybe add here and above that charge_swapin_page needs to be called
before we try adding the page to the swapcache, and finish_swapin_page
needs to be called when swapcache insertion has been successful?
On Wed, 3 Mar 2021, Shakeel Butt wrote:
> Currently the kernel adds the page, allocated for swapin, to the
> swapcache before charging the page. This is fine but now we want a
> per-memcg swapcache stat which is essential for folks who want to
> transparently migrate from cgroup v1's memsw to cgroup v2's memory and
> swap counters. In addition, charging a page before exposing it to other
> parts of the kernel is a step in the right direction.
>
> To correctly maintain the per-memcg swapcache stat, this patch charges
> the page before adding it to the swapcache. One challenge with this
> approach is the failure case of add_to_swap_cache(), where we need to
> undo the mem_cgroup_charge(). Specifically, undoing
> mem_cgroup_uncharge_swap() is not simple.
>
> To resolve the issue, this patch introduces a transaction-like interface
> to charge a page for swapin. The function mem_cgroup_charge_swapin_page()
> initiates the charging of the page and mem_cgroup_finish_swapin_page()
> completes the charging process. So, the kernel starts the charging
> process of the page for swapin with mem_cgroup_charge_swapin_page(),
> adds the page to the swapcache and, on success, completes the charging
> process with mem_cgroup_finish_swapin_page().
>
> Signed-off-by: Shakeel Butt <[email protected]>
Quite apart from helping with the stat you want, what you've ended
up with here is a nice cleanup in several different ways (and I'm
glad Johannes talked you out of __GFP_NOFAIL: much better like this).
I'll say
Acked-by: Hugh Dickins <[email protected]>
but I am quite unhappy with the name mem_cgroup_finish_swapin_page():
it doesn't finish the swapin, it doesn't finish the page, and I'm
not persuaded by your paragraph above that there's any "transaction"
here (if there were, I'd suggest "commit" instead of "finish"; and
I'd get worried by the css_put before it's called - but no, that's
fine, it's independent).
How about complementing mem_cgroup_charge_swapin_page() with
mem_cgroup_uncharge_swapin_swap()? I think that describes well
what it does, at least in the do_memsw_account() case, and I hope
we can overlook that it does nothing at all in the other case.
And it really doesn't need a page argument: both places it's called
have just allocated an order-0 page, there's no chance of a THP here;
but you might have some idea of future expansion, or matching
put_swap_page() - I won't object if you prefer to pass in the page.
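
To make it concrete, such a helper could look roughly like this (just a
sketch of the idea: the body of your finish helper with the page
argument dropped and nr_pages fixed at 1):

	void mem_cgroup_uncharge_swapin_swap(swp_entry_t entry)
	{
		/*
		 * Cgroup1's unified memory+swap counter has been charged
		 * with the new swapcache page; drop the swap duplicate.
		 */
		if (!mem_cgroup_disabled() && do_memsw_account())
			mem_cgroup_uncharge_swap(entry, 1);
	}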
But more interesting, though off-topic, comments on it below...
> +/*
> + * mem_cgroup_finish_swapin_page - complete the swapin page charge transaction
> + * @page: page charged for swapin
> + * @entry: swap entry for which the page is charged
> + *
> + * This function completes the transaction of charging the page allocated for
> + * swapin.
> + */
> +void mem_cgroup_finish_swapin_page(struct page *page, swp_entry_t entry)
> +{
> /*
> * Cgroup1's unified memory+swap counter has been charged with the
> * new swapcache page, finish the transfer by uncharging the swap
> @@ -6760,20 +6796,14 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
> * correspond 1:1 to page and swap slot lifetimes: we charge the
> * page to memory here, and uncharge swap when the slot is freed.
> */
> - if (do_memsw_account() && PageSwapCache(page)) {
> - swp_entry_t entry = { .val = page_private(page) };
> + if (!mem_cgroup_disabled() && do_memsw_account()) {
I understand why you put that !mem_cgroup_disabled() check in there,
but I have a series of observations on that.
First I was going to say that it would be better left to
mem_cgroup_uncharge_swap() itself.
Then I was going to say that I think it's already covered here
by the cgroup_memory_noswap check inside do_memsw_account().
Then, going back to mem_cgroup_uncharge_swap(), I realized that 5.8's
2d1c498072de ("mm: memcontrol: make swap tracking an integral part of
memory control") removed the do_swap_account or cgroup_memory_noswap
checks from mem_cgroup_uncharge_swap() and swap_cgroup_swapon() and
swap_cgroup_swapoff() - so since then we have been allocating totally
unnecessary swap_cgroup arrays when mem_cgroup_disabled() (and
mem_cgroup_uncharge_swap() has worked by reading the zalloced array).
I think, or am I confused? If I'm right on that, one of us ought to
send another patch putting back, either cgroup_memory_noswap checks
or mem_cgroup_disabled() checks in those three places - I suspect the
static key mem_cgroup_disabled() is preferable, but I'm getting dozy.
Whatever we do with that - and it's really not any business for this
patch - I think you can drop the mem_cgroup_disabled() check from
mem_cgroup_uncharge_swapin_swap().
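I.e. the guard in that helper would (sketch) reduce to:

	if (do_memsw_account())
		mem_cgroup_uncharge_swap(entry, thp_nr_pages(page));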
> /*
> * The swap entry might not get freed for a long time,
> * let's not wait for it. The page already received a
> * memory+swap charge, drop the swap entry duplicate.
> */
> - mem_cgroup_uncharge_swap(entry, nr_pages);
> + mem_cgroup_uncharge_swap(entry, thp_nr_pages(page));
> }
> -
> -out_put:
> - css_put(&memcg->css);
> -out:
> - return ret;
> }
On Fri, Mar 05, 2021 at 12:06:31AM -0800, Hugh Dickins wrote:
> On Wed, 3 Mar 2021, Shakeel Butt wrote:
>
> > Currently the kernel adds the page, allocated for swapin, to the
> > swapcache before charging the page. This is fine but now we want a
> > per-memcg swapcache stat which is essential for folks who want to
> > transparently migrate from cgroup v1's memsw to cgroup v2's memory and
> > swap counters. In addition, charging a page before exposing it to other
> > parts of the kernel is a step in the right direction.
> >
> > To correctly maintain the per-memcg swapcache stat, this patch charges
> > the page before adding it to the swapcache. One challenge with this
> > approach is the failure case of add_to_swap_cache(), where we need to
> > undo the mem_cgroup_charge(). Specifically, undoing
> > mem_cgroup_uncharge_swap() is not simple.
> >
> > To resolve the issue, this patch introduces a transaction-like interface
> > to charge a page for swapin. The function mem_cgroup_charge_swapin_page()
> > initiates the charging of the page and mem_cgroup_finish_swapin_page()
> > completes the charging process. So, the kernel starts the charging
> > process of the page for swapin with mem_cgroup_charge_swapin_page(),
> > adds the page to the swapcache and, on success, completes the charging
> > process with mem_cgroup_finish_swapin_page().
> >
> > Signed-off-by: Shakeel Butt <[email protected]>
>
> Quite apart from helping with the stat you want, what you've ended
> up with here is a nice cleanup in several different ways (and I'm
> glad Johannes talked you out of __GFP_NOFAIL: much better like this).
> I'll say
>
> Acked-by: Hugh Dickins <[email protected]>
>
> but I am quite unhappy with the name mem_cgroup_finish_swapin_page():
> it doesn't finish the swapin, it doesn't finish the page, and I'm
> not persuaded by your paragraph above that there's any "transaction"
> > here (if there were, I'd suggest "commit" instead of "finish"; and
> I'd get worried by the css_put before it's called - but no, that's
> fine, it's independent).
>
> How about complementing mem_cgroup_charge_swapin_page() with
> mem_cgroup_uncharge_swapin_swap()? I think that describes well
> what it does, at least in the do_memsw_account() case, and I hope
> we can overlook that it does nothing at all in the other case.
Yes, that's better. The sequence is still somewhat mysterious for
people not overly familiar with memcg swap internals, but it's much
clearer for people who are.
I mildly prefer swapping the swapin bit:
mem_cgroup_swapin_charge_page()
mem_cgroup_swapin_uncharge_swap()
But either way works for me.
> And it really doesn't need a page argument: both places it's called
> have just allocated an order-0 page, there's no chance of a THP here;
> but you might have some idea of future expansion, or matching
> put_swap_page() - I won't object if you prefer to pass in the page.
Agreed.
> > + * mem_cgroup_finish_swapin_page - complete the swapin page charge transaction
> > + * @page: page charged for swapin
> > + * @entry: swap entry for which the page is charged
> > + *
> > + * This function completes the transaction of charging the page allocated for
> > + * swapin.
> > + */
> > +void mem_cgroup_finish_swapin_page(struct page *page, swp_entry_t entry)
> > +{
> > /*
> > * Cgroup1's unified memory+swap counter has been charged with the
> > * new swapcache page, finish the transfer by uncharging the swap
> > @@ -6760,20 +6796,14 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
> > * correspond 1:1 to page and swap slot lifetimes: we charge the
> > * page to memory here, and uncharge swap when the slot is freed.
> > */
> > - if (do_memsw_account() && PageSwapCache(page)) {
> > - swp_entry_t entry = { .val = page_private(page) };
> > + if (!mem_cgroup_disabled() && do_memsw_account()) {
>
> I understand why you put that !mem_cgroup_disabled() check in there,
> but I have a series of observations on that.
>
> First I was going to say that it would be better left to
> mem_cgroup_uncharge_swap() itself.
>
> Then I was going to say that I think it's already covered here
> by the cgroup_memory_noswap check inside do_memsw_account().
>
> Then, going back to mem_cgroup_uncharge_swap(), I realized that 5.8's
> 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of
> memory control") removed the do_swap_account or cgroup_memory_noswap
> checks from mem_cgroup_uncharge_swap() and swap_cgroup_swapon() and
> swap_cgroup_swapoff() - so since then we have been allocating totally
> unnecessary swap_cgroup arrays when mem_cgroup_disabled() (and
> mem_cgroup_uncharge_swap() has worked by reading the zalloced array).
>
> I think, or am I confused? If I'm right on that, one of us ought to
> send another patch putting back, either cgroup_memory_noswap checks
> or mem_cgroup_disabled() checks in those three places - I suspect the
> static key mem_cgroup_disabled() is preferable, but I'm getting dozy.
You're right, that patch was overzealous.
The point behind it was to ALWAYS track swap ownership when memcg is
enabled, so that even if swap space COUNTING has been disabled, we'll
charge pages back to their original owners during swap readahead and
on swapoff. Therefor, we have to allocate the arrays even if the user
requested cgroup_memory_noswap.
But we certainly don't need to allocate and maintain the array when
mem_cgroup_disabled() altogether. I'll send a patch to add those back.
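
Roughly this kind of early bail-out, I mean (a sketch only, the actual
patch may end up looking different):

	/* mm/swap_cgroup.c */
	int swap_cgroup_swapon(int type, unsigned long max_pages)
	{
		/* No memcg, no swap ownership records to allocate. */
		if (mem_cgroup_disabled())
			return 0;

		/* ... existing swap_cgroup array allocation unchanged ... */
	}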
I'd also rename cgroup_memory_noswap to cgroup_swapaccount - to match
the commandline and (hopefully) make a bit clearer what it affects.
> Whatever we do with that - and it's really not any business for this
> patch - I think you can drop the mem_cgroup_disabled() check from
> mem_cgroup_uncharge_swapin_swap().
Yes. do_memsw_account() implies !mem_cgroup_disabled(), as disabling
memcg sets cgroup_memory_noswap.
On Fri, Mar 5, 2021 at 8:25 AM Johannes Weiner <[email protected]> wrote:
>
[...]
> I'd also rename cgroup_memory_noswap to cgroup_swapaccount - to match
> the commandline and (hopefully) make a bit clearer what it affects.
Do we really need to keep supporting "swapaccount=0"? Is swap
page_counter really a performance issue for systems with memcg and
swap? To me, deprecating "swapaccount=0" simplifies already
complicated code.
On Fri, Mar 5, 2021 at 8:25 AM Johannes Weiner <[email protected]> wrote:
>
> On Fri, Mar 05, 2021 at 12:06:31AM -0800, Hugh Dickins wrote:
> > On Wed, 3 Mar 2021, Shakeel Butt wrote:
> >
> > > Currently the kernel adds the page, allocated for swapin, to the
> > > swapcache before charging the page. This is fine but now we want a
> > > per-memcg swapcache stat which is essential for folks who want to
> > > transparently migrate from cgroup v1's memsw to cgroup v2's memory and
> > > swap counters. In addition, charging a page before exposing it to other
> > > parts of the kernel is a step in the right direction.
> > >
> > > To correctly maintain the per-memcg swapcache stat, this patch charges
> > > the page before adding it to the swapcache. One challenge with this
> > > approach is the failure case of add_to_swap_cache(), where we need to
> > > undo the mem_cgroup_charge(). Specifically, undoing
> > > mem_cgroup_uncharge_swap() is not simple.
> > >
> > > To resolve the issue, this patch introduces a transaction-like interface
> > > to charge a page for swapin. The function mem_cgroup_charge_swapin_page()
> > > initiates the charging of the page and mem_cgroup_finish_swapin_page()
> > > completes the charging process. So, the kernel starts the charging
> > > process of the page for swapin with mem_cgroup_charge_swapin_page(),
> > > adds the page to the swapcache and, on success, completes the charging
> > > process with mem_cgroup_finish_swapin_page().
> > >
> > > Signed-off-by: Shakeel Butt <[email protected]>
> >
> > Quite apart from helping with the stat you want, what you've ended
> > up with here is a nice cleanup in several different ways (and I'm
> > glad Johannes talked you out of __GFP_NOFAIL: much better like this).
> > I'll say
> >
> > Acked-by: Hugh Dickins <[email protected]>
> >
> > but I am quite unhappy with the name mem_cgroup_finish_swapin_page():
> > it doesn't finish the swapin, it doesn't finish the page, and I'm
> > not persuaded by your paragraph above that there's any "transaction"
> > here (if there were, I'd suggest "commit" instead of "finish"; and
> > I'd get worried by the css_put before it's called - but no, that's
> > fine, it's independent).
> >
> > How about complementing mem_cgroup_charge_swapin_page() with
> > mem_cgroup_uncharge_swapin_swap()? I think that describes well
> > what it does, at least in the do_memsw_account() case, and I hope
> > we can overlook that it does nothing at all in the other case.
>
> Yes, that's better. The sequence is still somewhat mysterious for
> people not overly familiar with memcg swap internals, but it's much
> clearer for people who are.
>
> I mildly prefer swapping the swapin bit:
>
> mem_cgroup_swapin_charge_page()
> mem_cgroup_swapin_uncharge_swap()
>
> But either way works for me.
>
I will do a coin toss to select one.
> > And it really doesn't need a page argument: both places it's called
> > have just allocated an order-0 page, there's no chance of a THP here;
> > but you might have some idea of future expansion, or matching
> > put_swap_page() - I won't object if you prefer to pass in the page.
>
> Agreed.
Will remove the page parameter.
BTW I will keep the mem_cgroup_disabled() check in the uncharge swap
function as I am thinking of removing the "swapaccount=0" functionality.
On Fri, Mar 05, 2021 at 08:42:00AM -0800, Shakeel Butt wrote:
> On Fri, Mar 5, 2021 at 8:25 AM Johannes Weiner <[email protected]> wrote:
> >
> [...]
> > I'd also rename cgroup_memory_noswap to cgroup_swapaccount - to match
> > the commandline and (hopefully) make a bit clearer what it affects.
>
> Do we really need to keep supporting "swapaccount=0"? Is swap
> page_counter really a performance issue for systems with memcg and
> swap? To me, deprecating "swapaccount=0" simplifies already
> complicated code.
Now that you mention it, it's probably really not worth it.
I'll replace my cleanup patch with a removal patch that eliminates
everything behind swapaccount= except for a deprecation warning...
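
Presumably something along these lines (a sketch; it assumes the
existing "swapaccount=" __setup hook in mm/memcontrol.c, and the
warning text below is made up):

	static int __init setup_swap_account(char *s)
	{
		/* Sketch: warn and ignore the value instead of honoring it. */
		pr_warn_once("The swapaccount= commandline option is deprecated. "
			     "Swap accounting cannot be disabled anymore.\n");
		return 1;
	}
	__setup("swapaccount=", setup_swap_account);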