2021-03-19 16:40:24

by Muchun Song

Subject: [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg

rcu_read_lock()/rcu_read_unlock() can only guarantee that the memcg will
not be freed. They cannot guarantee that css_get() on the memcg (called
from refill_stock() when the cached memcg changes) is safe, because the
reference count may already have dropped to zero.

  rcu_read_lock()
  memcg = obj_cgroup_memcg(old)
  __memcg_kmem_uncharge(memcg)
      refill_stock(memcg)
          if (stock->cached != memcg)
              // css_get can change the ref counter from 0 back to 1.
              css_get(&memcg->css)
  rcu_read_unlock()

This fix is very similar to commit:

  eefbfa7fd678 ("mm: memcg/slab: fix use after free in obj_cgroup_charge")

Fix this by taking a reference to the memcg that is passed to
__memcg_kmem_uncharge() before calling it.

Fixes: 3de7d4f25a74 ("mm: memcg/slab: optimize objcg stock draining")
Signed-off-by: Muchun Song <[email protected]>
---
mm/memcontrol.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 845eec01ef9d..2cda76ff0629 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3181,9 +3181,17 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 		unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1);

 		if (nr_pages) {
+			struct mem_cgroup *memcg;
+
 			rcu_read_lock();
-			__memcg_kmem_uncharge(obj_cgroup_memcg(old), nr_pages);
+retry:
+			memcg = obj_cgroup_memcg(old);
+			if (unlikely(!css_tryget(&memcg->css)))
+				goto retry;
 			rcu_read_unlock();
+
+			__memcg_kmem_uncharge(memcg, nr_pages);
+			css_put(&memcg->css);
 		}

 		/*
--
2.11.0


2021-03-19 18:29:33

by Shakeel Butt

Subject: Re: [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg

On Fri, Mar 19, 2021 at 9:38 AM Muchun Song <[email protected]> wrote:
>
> rcu_read_lock()/rcu_read_unlock() can only guarantee that the memcg will
> not be freed. They cannot guarantee that css_get() on the memcg (called
> from refill_stock() when the cached memcg changes) is safe, because the
> reference count may already have dropped to zero.
>
>   rcu_read_lock()
>   memcg = obj_cgroup_memcg(old)
>   __memcg_kmem_uncharge(memcg)
>       refill_stock(memcg)
>           if (stock->cached != memcg)
>               // css_get can change the ref counter from 0 back to 1.
>               css_get(&memcg->css)
>   rcu_read_unlock()
>
> This fix is very similar to commit:
>
>   eefbfa7fd678 ("mm: memcg/slab: fix use after free in obj_cgroup_charge")
>
> Fix this by taking a reference to the memcg that is passed to
> __memcg_kmem_uncharge() before calling it.
>
> Fixes: 3de7d4f25a74 ("mm: memcg/slab: optimize objcg stock draining")
> Signed-off-by: Muchun Song <[email protected]>

Good catch.

Reviewed-by: Shakeel Butt <[email protected]>

2021-03-22 14:50:26

by Johannes Weiner

Subject: Re: [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg

On Sat, Mar 20, 2021 at 12:38:14AM +0800, Muchun Song wrote:
> rcu_read_lock()/rcu_read_unlock() can only guarantee that the memcg will
> not be freed. They cannot guarantee that css_get() on the memcg (called
> from refill_stock() when the cached memcg changes) is safe, because the
> reference count may already have dropped to zero.
>
>   rcu_read_lock()
>   memcg = obj_cgroup_memcg(old)
>   __memcg_kmem_uncharge(memcg)
>       refill_stock(memcg)
>           if (stock->cached != memcg)
>               // css_get can change the ref counter from 0 back to 1.
>               css_get(&memcg->css)
>   rcu_read_unlock()
>
> This fix is very similar to commit:
>
>   eefbfa7fd678 ("mm: memcg/slab: fix use after free in obj_cgroup_charge")
>
> Fix this by taking a reference to the memcg that is passed to
> __memcg_kmem_uncharge() before calling it.
>
> Fixes: 3de7d4f25a74 ("mm: memcg/slab: optimize objcg stock draining")
> Signed-off-by: Muchun Song <[email protected]>

Acked-by: Johannes Weiner <[email protected]>

Good catch! Did you trigger the WARN_ON() in
percpu_ref_kill_and_confirm() during testing?

2021-03-22 18:19:28

by Roman Gushchin

Subject: Re: [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg

On Sat, Mar 20, 2021 at 12:38:14AM +0800, Muchun Song wrote:
> rcu_read_lock()/rcu_read_unlock() can only guarantee that the memcg will
> not be freed. They cannot guarantee that css_get() on the memcg (called
> from refill_stock() when the cached memcg changes) is safe, because the
> reference count may already have dropped to zero.
>
>   rcu_read_lock()
>   memcg = obj_cgroup_memcg(old)
>   __memcg_kmem_uncharge(memcg)
>       refill_stock(memcg)
>           if (stock->cached != memcg)
>               // css_get can change the ref counter from 0 back to 1.
>               css_get(&memcg->css)
>   rcu_read_unlock()
>
> This fix is very similar to commit:
>
>   eefbfa7fd678 ("mm: memcg/slab: fix use after free in obj_cgroup_charge")
>
> Fix this by taking a reference to the memcg that is passed to
> __memcg_kmem_uncharge() before calling it.
>
> Fixes: 3de7d4f25a74 ("mm: memcg/slab: optimize objcg stock draining")
> Signed-off-by: Muchun Song <[email protected]>

Acked-by: Roman Gushchin <[email protected]>

Thanks!

2021-03-23 09:20:57

by Muchun Song

Subject: Re: [External] Re: [PATCH v5 1/7] mm: memcontrol: slab: fix obtain a reference to a freeing memcg

On Mon, Mar 22, 2021 at 10:46 PM Johannes Weiner <[email protected]> wrote:
>
> On Sat, Mar 20, 2021 at 12:38:14AM +0800, Muchun Song wrote:
> > rcu_read_lock()/rcu_read_unlock() can only guarantee that the memcg will
> > not be freed. They cannot guarantee that css_get() on the memcg (called
> > from refill_stock() when the cached memcg changes) is safe, because the
> > reference count may already have dropped to zero.
> >
> >   rcu_read_lock()
> >   memcg = obj_cgroup_memcg(old)
> >   __memcg_kmem_uncharge(memcg)
> >       refill_stock(memcg)
> >           if (stock->cached != memcg)
> >               // css_get can change the ref counter from 0 back to 1.
> >               css_get(&memcg->css)
> >   rcu_read_unlock()
> >
> > This fix is very similar to commit:
> >
> >   eefbfa7fd678 ("mm: memcg/slab: fix use after free in obj_cgroup_charge")
> >
> > Fix this by taking a reference to the memcg that is passed to
> > __memcg_kmem_uncharge() before calling it.
> >
> > Fixes: 3de7d4f25a74 ("mm: memcg/slab: optimize objcg stock draining")
> > Signed-off-by: Muchun Song <[email protected]>
>
> Acked-by: Johannes Weiner <[email protected]>
>
> Good catch! Did you trigger the WARN_ON() in
> percpu_ref_kill_and_confirm() during testing?

No. The race window is very small, so it should be difficult to trigger.
I only noticed the potential problem by chance while reviewing this
code.

Thanks.