Dan Schatzberg writes:
>This is a dependency for 3/3
This can be omitted, since "3" won't mean anything in the change history (and
patch series are generally considered as a unit unless there are explicit
requests to split them out).
>memalloc_use_memcg() worked for kernel allocations but was silently
>ignored for user pages.
>
>This patch establishes a precedence order for who gets charged:
>
>1. If there is a memcg associated with the page already, that memcg is
> charged. This happens during swapin.
>
>2. If an explicit mm is passed, mm->memcg is charged. This happens
> during page faults, which can be triggered in remote VMs (eg gup).
>
>3. Otherwise consult the current process context. If it has configured
> a current->active_memcg, use that. Otherwise, current->mm->memcg.
>
>Signed-off-by: Dan Schatzberg <[email protected]>
>Acked-by: Johannes Weiner <[email protected]>
Thanks, this seems reasonable. One (minor and optional) suggestion would be to
make the title clearer that this changes try_charge/memalloc_use_memcg
behaviour overall rather than a single charge site, since that wasn't what I
expected to find when I saw the patch title :-)
I only have one other question about behaviour when there is no active_memcg
and mm/memcg in try_charge are NULL below, but assuming that's been checked:
Acked-by: Chris Down <[email protected]>
>---
> mm/memcontrol.c | 11 ++++++++---
> mm/shmem.c | 2 +-
> 2 files changed, 9 insertions(+), 4 deletions(-)
>
>diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>index f7da3ff135ed..69935d166bdb 100644
>--- a/mm/memcontrol.c
>+++ b/mm/memcontrol.c
>@@ -6812,7 +6812,8 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
> * @compound: charge the page as compound or small page
> *
> * Try to charge @page to the memcg that @mm belongs to, reclaiming
>- * pages according to @gfp_mask if necessary.
>+ * pages according to @gfp_mask if necessary. If @mm is NULL, try to
>+ * charge to the active memcg.
> *
> * Returns 0 on success, with *@memcgp pointing to the charged memcg.
> * Otherwise, an error code is returned.
>@@ -6856,8 +6857,12 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
> }
> }
>
>- if (!memcg)
>- memcg = get_mem_cgroup_from_mm(mm);
>+ if (!memcg) {
>+ if (!mm)
>+ memcg = get_mem_cgroup_from_current();
>+ else
>+ memcg = get_mem_cgroup_from_mm(mm);
>+ }
Just to do due diligence, did we double check whether this results in any
unintentional shift in accounting for those passing in both mm and memcg as
NULL with no current->active_memcg set, since previously we never even tried to
consult current->mm and always used root_mem_cgroup in get_mem_cgroup_from_mm?
It's entirely possible that this results in exactly the same outcome as before
just by different means, but with the number of try_charge callsites I'm not
totally certain of that.
>
> ret = try_charge(memcg, gfp_mask, nr_pages, false);
>
>diff --git a/mm/shmem.c b/mm/shmem.c
>index ca74ede9e40b..70aabd9aba1a 100644
>--- a/mm/shmem.c
>+++ b/mm/shmem.c
>@@ -1748,7 +1748,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
> }
>
> sbinfo = SHMEM_SB(inode->i_sb);
>- charge_mm = vma ? vma->vm_mm : current->mm;
>+ charge_mm = vma ? vma->vm_mm : NULL;
>
> page = find_lock_entry(mapping, index);
> if (xa_is_value(page)) {
>--
>2.17.1
>
On Fri, Feb 07, 2020 at 09:18:07PM +0000, Chris Down wrote:
> > @@ -6856,8 +6857,12 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
> > }
> > }
> >
> > - if (!memcg)
> > - memcg = get_mem_cgroup_from_mm(mm);
> > + if (!memcg) {
> > + if (!mm)
> > + memcg = get_mem_cgroup_from_current();
> > + else
> > + memcg = get_mem_cgroup_from_mm(mm);
> > + }
>
> Just to do due diligence, did we double check whether this results in any
> unintentional shift in accounting for those passing in both mm and memcg as
> NULL with no current->active_memcg set, since previously we never even tried
> to consult current->mm and always used root_mem_cgroup in
> get_mem_cgroup_from_mm?
Excellent question on a subtle issue.
But nobody actually passes NULL. They either pass current->mm (or a
destination mm) in syscalls, or vma->vm_mm in page faults.
The only time we end up with NULL is when kernel threads do something
and have !current->mm. We redirect those to root_mem_cgroup.
So this patch doesn't change those semantics.
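
To make the argument above concrete, here is a minimal user-space sketch of
the selection logic before and after the patch. It is a model, not kernel
code: the struct layouts, the mm->memcg shortcut, and the names from_mm(),
from_current(), charge_target_old() and charge_target_new() are all
hypothetical stand-ins chosen for illustration. The only behaviour taken from
the thread is that a NULL mm resolves to the root memcg and that
current->active_memcg takes priority when it is set.

```c
#include <stddef.h>

/* User-space model only: these types are stand-ins, not the kernel's. */
struct memcg { const char *name; };

struct mm { struct memcg *memcg; };   /* models "mm->memcg" from the changelog */

struct task {
	struct mm *mm;                /* NULL for kernel threads */
	struct memcg *active_memcg;   /* models current->active_memcg */
};

static struct memcg root_memcg = { "root" };

/* Models get_mem_cgroup_from_mm(): a NULL mm falls back to root. */
static struct memcg *from_mm(struct mm *mm)
{
	if (!mm || !mm->memcg)
		return &root_memcg;
	return mm->memcg;
}

/* Models get_mem_cgroup_from_current(): active_memcg first, then current->mm. */
static struct memcg *from_current(struct task *cur)
{
	if (cur->active_memcg)
		return cur->active_memcg;
	return from_mm(cur->mm);
}

/* Before the patch: page's memcg if present, otherwise the mm that was passed. */
static struct memcg *charge_target_old(struct memcg *page_memcg, struct mm *mm)
{
	if (page_memcg)
		return page_memcg;
	return from_mm(mm);
}

/* After the patch: a NULL mm now means "consult the current task". */
static struct memcg *charge_target_new(struct task *cur,
				       struct memcg *page_memcg, struct mm *mm)
{
	if (page_memcg)
		return page_memcg;
	if (!mm)
		return from_current(cur);
	return from_mm(mm);
}
```

In this model, a kernel thread (mm == NULL, no active_memcg) resolves to root
under both functions, and a task that previously passed current->mm and now
passes NULL resolves to the same memcg as before. The result differs only
when active_memcg is set, which is the case the patch is meant to handle.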