2021-08-16 07:14:06

by Chen Huang

Subject: [PATCH 5.10.y 00/11] mm: memcontrol: fix nullptr in __mod_lruvec_page_state()

We hit a NULL pointer dereference in __mod_lruvec_page_state().
The UIO driver does:
    kmalloc(PAGE_SIZE)
The UIO user does:
    mmap(), then read
But before the user reads the page, another task may allocate from the
same compound page and install obj_cgroups on the head page, like this:
[ 94.845687] memcg_alloc_page_obj_cgroups+0x50/0xa0
[ 94.846334] slab_post_alloc_hook+0xc8/0x184
[ 94.846852] kmem_cache_alloc+0x148/0x2a4
[ 94.847346] __d_alloc+0x30/0x2e4
[ 94.847809] d_alloc+0x30/0xc0

Then, when the user reads the page, __mod_lruvec_page_state() gets a
NULL pointer from head->mem_cgroup:

[ 94.882699] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000080
[ 94.882773] Mem abort info:
[ 94.882819] ESR = 0x96000006
[ 94.882953] EC = 0x25: DABT (current EL), IL = 32 bits
[ 94.883000] SET = 0, FnV = 0
[ 94.883043] EA = 0, S1PTW = 0
[ 94.883089] Data abort info:
[ 94.883134] ISV = 0, ISS = 0x00000006
[ 94.883179] CM = 0, WnR = 0
[ 94.883402] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010c355000
[ 94.883495] [0000000000000080] pgd=000000010c046003, p4d=000000010c046003, pud=000000010c368003, pmd=0000000000000000
[ 94.884225] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 94.884480] Modules linked in:
[ 94.884788] CPU: 0 PID: 250 Comm: uio_user_mmap Tainted: G B 5.10.0-07799-ged92fcf8d408-dirty #112
[ 94.884837] Hardware name: linux,dummy-virt (DT)
[ 94.885052] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
[ 94.885169] pc : __mod_lruvec_page_state+0x118/0x180
[ 94.885249] lr : __mod_lruvec_page_state+0x118/0x180
[ 94.885297] sp : ffff2872ce25fb40
[ 94.885402] x29: ffff2872ce25fb40 x28: 0000000000000254
[ 94.885572] x27: 0000000000000000 x26: ffff2872fe2d7c38
[ 94.885724] x25: ffffa000242e7dc0 x24: 0000000000000001
[ 94.885872] x23: 0000000000000012 x22: ffffa00022bcfc60
[ 94.886030] x21: ffff2872fffeb380 x20: 0000000000000144
[ 94.886169] x19: 0000000000000000 x18: 0000000000000000
[ 94.886331] x17: 0000000000000000 x16: 0000000000000000
[ 94.886476] x15: 0000000000000000 x14: 3078303a7865646e
[ 94.886625] x13: 6920303030303030 x12: 1fffe50e5b713f20
[ 94.886765] x11: ffff850e5b713f20 x10: 616d20303a746e75
[ 94.886947] x9 : dfffa00000000000 x8 : 3266666666203d20
[ 94.887095] x7 : ffff2872db89f903 x6 : 0000000000000000
[ 94.887236] x5 : 0000000000000000 x4 : dfffa00000000000
[ 94.887381] x3 : ffffa00021e6c5dc x2 : 0000000000000000
[ 94.887515] x1 : 0000000000000008 x0 : 0000000000000000
[ 94.887702] Call trace:
[ 94.887840] __mod_lruvec_page_state+0x118/0x180
[ 94.887919] page_add_file_rmap+0xa8/0xe0
[ 94.887998] alloc_set_pte+0x2c4/0x2d0
[ 94.888074] finish_fault+0x94/0xcc
[ 94.888157] handle_mm_fault+0x7c8/0x1094
[ 94.888230] do_page_fault+0x358/0x490
[ 94.888300] do_translation_fault+0x38/0x54
[ 94.888370] do_mem_abort+0x5c/0xe4
[ 94.888435] el0_da+0x3c/0x4c
[ 94.888506] el0_sync_handler+0xd8/0x14c
[ 94.888573] el0_sync+0x148/0x180
[ 94.888963] Code: d2835101 8b0102b3 91020260 9400e8da (f9404260)
[ 94.889860] ---[ end trace 1de53a0bd9084cde ]---
[ 94.890244] Kernel panic - not syncing: Oops: Fatal exception
[ 94.890620] SMP: stopping secondary CPUs
[ 94.891117] Kernel Offset: 0x11c00000 from 0xffffa00010000000
[ 94.891179] PHYS_OFFSET: 0xffffd78e40000000
[ 94.891293] CPU features: 0x0660012,41002000
[ 94.891365] Memory Limit: none
[ 94.927552] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
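
To make the encoding behind this concrete: a minimal userspace model of
the tagged word (the constant mirrors the kernel's bit-0 tag; the
struct, helper, and address below are illustrative stand-ins, not
kernel code):

#include <assert.h>
#include <stdio.h>

/* stand-ins for the kernel types; only the bit layout matters here */
#define MEMCG_DATA_OBJCGS	(1UL << 0)

struct page {
	unsigned long memcg_data;	/* memcg pointer or tagged objcgs */
};

/* like page_memcg_check(): a tagged word is not a valid memcg pointer */
static void *page_memcg_check(const struct page *page)
{
	if (page->memcg_data & MEMCG_DATA_OBJCGS)
		return NULL;
	return (void *)page->memcg_data;
}

int main(void)
{
	struct page head = { .memcg_data = 0 };
	unsigned long objcgs = 0xffff000012345000UL;	/* fake vector */

	/* memcg_alloc_page_obj_cgroups() installs the tagged vector */
	head.memcg_data = objcgs | MEMCG_DATA_OBJCGS;

	/* the rmap path now sees NULL and must not dereference it */
	assert(page_memcg_check(&head) == NULL);
	puts("head page carries obj_cgroups, not a memcg");
	return 0;
}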

1. Roman Gushchin's four patches remove this limitation by moving the
PageKmemcg flag into one of the free bits of the page->mem_cgroup
pointer. They also formalize accesses to page->mem_cgroup and
page->obj_cgroups using new helpers, add several checks, and remove a
couple of obsolete functions.
Link: https://lkml.kernel.org/r/[email protected]

2. Muchun Song's patchset aims to make kmem pages drop their reference
to the memory cgroup by using the obj_cgroup APIs.
Link: https://lkml.kernel.org/r/[email protected]

3. Wang Hai's patch is a bugfix for "mm: memcontrol/slab: Use helpers to
access slab page's memcg_data"
Link: https://lkml.kernel.org/r/[email protected]

Muchun Song (6):
mm: memcontrol: introduce obj_cgroup_{un}charge_pages
mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c
mm: memcontrol: change ug->dummy_page only if memcg changed
mm: memcontrol: use obj_cgroup APIs to charge kmem pages
(backport conflict: commit c47d5032ed3002311a4188eae51f4641ec436beb is not merged in 5.10.y)
mm: memcontrol: inline __memcg_kmem_{un}charge() into
obj_cgroup_{un}charge_pages()
mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM

Roman Gushchin (4):
mm: memcontrol: Use helpers to read page's memcg data
(backport conflict in split_page_memcg(), because commit 002ea848d7fd3bdcb6281e75bdde28095c2cd549 is not merged in 5.10.y)
mm: memcontrol/slab: Use helpers to access slab page's memcg_data
mm: Introduce page memcg flags
mm: Convert page kmemcg type to a page memcg flag

Wang Hai (1):
mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()

fs/buffer.c | 2 +-
fs/iomap/buffered-io.c | 2 +-
include/linux/memcontrol.h | 320 +++++++++++++++++++++++++++--
include/linux/mm.h | 22 --
include/linux/mm_types.h | 5 +-
include/linux/page-flags.h | 11 +-
include/trace/events/writeback.h | 2 +-
kernel/fork.c | 7 +-
mm/debug.c | 4 +-
mm/huge_memory.c | 4 +-
mm/memcontrol.c | 336 +++++++++++++++----------------
mm/page_alloc.c | 8 +-
mm/page_io.c | 6 +-
mm/slab.h | 38 +---
mm/workingset.c | 2 +-
15 files changed, 493 insertions(+), 276 deletions(-)

--
2.18.0.huawei.25


2021-08-16 07:14:09

by Chen Huang

Subject: [PATCH 5.10.y 03/11] mm: Introduce page memcg flags

From: Roman Gushchin <[email protected]>

The lowest bit in page->memcg_data is used to distinguish between a
struct mem_cgroup pointer and a pointer to an objcgs array. All checks
and modifications of this bit are open-coded.

Let's formalize it using page memcg flags, defined in enum
page_memcg_data_flags.

Additional flags might be added later.
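
As a sanity check on the scheme, a compile-time sketch (userspace,
assuming only that pointers stored in memcg_data are word-aligned so
the low bits are free):

#include <assert.h>

enum page_memcg_data_flags {
	MEMCG_DATA_OBJCGS = (1UL << 0),
	__NR_MEMCG_DATA_FLAGS = (1UL << 1),
};

#define MEMCG_DATA_FLAGS_MASK	(__NR_MEMCG_DATA_FLAGS - 1)

int main(void)
{
	/* pointers are at least word-aligned, so the low bits are free */
	unsigned long vec = 0xffff000012345000UL;
	unsigned long memcg_data = vec | MEMCG_DATA_OBJCGS;

	/* the mask strips every flag, even ones added to the enum later */
	assert((memcg_data & ~MEMCG_DATA_FLAGS_MASK) == vec);
	return 0;
}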

Signed-off-by: Roman Gushchin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Chen Huang <[email protected]>
---
include/linux/memcontrol.h | 32 ++++++++++++++++++++------------
1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 2805fe81f97d..4a0feb9d4b82 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -343,6 +343,15 @@ struct mem_cgroup {

extern struct mem_cgroup *root_mem_cgroup;

+enum page_memcg_data_flags {
+ /* page->memcg_data is a pointer to an objcgs vector */
+ MEMCG_DATA_OBJCGS = (1UL << 0),
+ /* the next bit after the last actual flag */
+ __NR_MEMCG_DATA_FLAGS = (1UL << 1),
+};
+
+#define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
+
/*
* page_memcg - get the memory cgroup associated with a page
* @page: a pointer to the page struct
@@ -404,13 +413,7 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
*/
unsigned long memcg_data = READ_ONCE(page->memcg_data);

- /*
- * The lowest bit set means that memcg isn't a valid
- * memcg pointer, but a obj_cgroups pointer.
- * In this case the page is shared and doesn't belong
- * to any specific memory cgroup.
- */
- if (memcg_data & 0x1UL)
+ if (memcg_data & MEMCG_DATA_OBJCGS)
return NULL;

return (struct mem_cgroup *)memcg_data;
@@ -429,7 +432,11 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
*/
static inline struct obj_cgroup **page_objcgs(struct page *page)
{
- return (struct obj_cgroup **)(READ_ONCE(page->memcg_data) & ~0x1UL);
+ unsigned long memcg_data = READ_ONCE(page->memcg_data);
+
+ VM_BUG_ON_PAGE(memcg_data && !(memcg_data & MEMCG_DATA_OBJCGS), page);
+
+ return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
}

/*
@@ -444,10 +451,10 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
{
unsigned long memcg_data = READ_ONCE(page->memcg_data);

- if (memcg_data && (memcg_data & 0x1UL))
- return (struct obj_cgroup **)(memcg_data & ~0x1UL);
+ if (!memcg_data || !(memcg_data & MEMCG_DATA_OBJCGS))
+ return NULL;

- return NULL;
+ return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
}

/*
@@ -460,7 +467,8 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
static inline bool set_page_objcgs(struct page *page,
struct obj_cgroup **objcgs)
{
- return !cmpxchg(&page->memcg_data, 0, (unsigned long)objcgs | 0x1UL);
+ return !cmpxchg(&page->memcg_data, 0, (unsigned long)objcgs |
+ MEMCG_DATA_OBJCGS);
}
#else
static inline struct obj_cgroup **page_objcgs(struct page *page)
--
2.18.0.huawei.25

2021-08-16 07:14:21

by Chen Huang

Subject: [PATCH 5.10.y 08/11] mm: memcontrol: use obj_cgroup APIs to charge kmem pages

From: Muchun Song <[email protected]>

Since Roman's series "The new cgroup slab memory controller" was
applied, all slab objects are charged via the new obj_cgroup APIs. The
new APIs introduce a struct obj_cgroup to charge slab objects, which
prevents long-lived objects from pinning the original memory cgroup in
memory. But there are still some corner-case objects (e.g. allocations
larger than order-1 page on SLUB) which are not charged via the new
APIs. Those objects (including pages allocated directly from the buddy
allocator) are charged as kmem pages, which still hold a reference to
the memory cgroup.

We want to reuse the obj_cgroup APIs to charge the kmem pages. If we
do that, we should store an object cgroup pointer in page->memcg_data
for the kmem pages.

Finally, page->memcg_data will have 3 different meanings.

1) For the slab pages, page->memcg_data points to an object cgroups
vector.

2) For the kmem pages (excluding slab pages), page->memcg_data
points to an object cgroup.

3) For the user pages (e.g. the LRU pages), page->memcg_data points
to a memory cgroup.

We do not change the behavior of page_memcg() and page_memcg_rcu(). They
are also suitable for LRU pages and kmem pages. Why?

Because memory allocations that pin memcgs for a long time exist at a
larger scale and cause recurring problems in the real world: page cache
doesn't get reclaimed for a long time, or is used by the second, third,
fourth, ... instance of the same job that was restarted into a new
cgroup every time. Unreclaimable dying cgroups pile up, waste memory,
and make page reclaim very inefficient.

We can convert LRU pages and most other raw memcg pins to the objcg
direction to fix this problem, and then page->memcg will always point
to an object cgroup pointer. At that point, LRU pages and kmem pages
will be treated the same, and the implementation of page_memcg() will
remove the kmem page check.

This patch aims to charge the kmem pages by using the new obj_cgroup
APIs. Finally, the page->memcg_data of a kmem page points to an object
cgroup. We can use __page_objcg() to get the object cgroup associated
with a kmem page, or page_memcg() to get the memory cgroup associated
with a kmem page, but the caller must ensure that the returned memcg
won't be released (e.g. acquire the rcu_read_lock or css_set_lock).

Link: https://lkml.kernel.org/r/[email protected]

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Cc: Christian Borntraeger <[email protected]>
[[email protected]: fix forget to obtain the ref to objcg in split_page_memcg]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Conflicts:
include/linux/memcontrol.h
mm/memcontrol.c
Signed-off-by: Chen Huang <[email protected]>
---
include/linux/memcontrol.h | 126 ++++++++++++++++++++++++++++++-------
mm/memcontrol.c | 117 ++++++++++++++++------------------
2 files changed, 157 insertions(+), 86 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1a357edd2a1e..b8bb5d37d4ad 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -354,6 +354,62 @@ enum page_memcg_data_flags {

#define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)

+static inline bool PageMemcgKmem(struct page *page);
+
+/*
+ * After the initialization objcg->memcg is always pointing at
+ * a valid memcg, but can be atomically swapped to the parent memcg.
+ *
+ * The caller must ensure that the returned memcg won't be released:
+ * e.g. acquire the rcu_read_lock or css_set_lock.
+ */
+static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
+{
+ return READ_ONCE(objcg->memcg);
+}
+
+/*
+ * __page_memcg - get the memory cgroup associated with a non-kmem page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the memory cgroup associated with the page,
+ * or NULL. This function assumes that the page is known to have a
+ * proper memory cgroup pointer. It's not safe to call this function
+ * against some type of pages, e.g. slab pages or ex-slab pages or
+ * kmem pages.
+ */
+static inline struct mem_cgroup *__page_memcg(struct page *page)
+{
+ unsigned long memcg_data = page->memcg_data;
+
+ VM_BUG_ON_PAGE(PageSlab(page), page);
+ VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+ VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
+
+ return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
+
+/*
+ * __page_objcg - get the object cgroup associated with a kmem page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the object cgroup associated with the page,
+ * or NULL. This function assumes that the page is known to have a
+ * proper object cgroup pointer. It's not safe to call this function
+ * against some type of pages, e.g. slab pages or ex-slab pages or
+ * LRU pages.
+ */
+static inline struct obj_cgroup *__page_objcg(struct page *page)
+{
+ unsigned long memcg_data = page->memcg_data;
+
+ VM_BUG_ON_PAGE(PageSlab(page), page);
+ VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+ VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
+
+ return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
+
/*
* page_memcg - get the memory cgroup associated with a page
* @page: a pointer to the page struct
@@ -363,20 +419,23 @@ enum page_memcg_data_flags {
* proper memory cgroup pointer. It's not safe to call this function
* against some type of pages, e.g. slab pages or ex-slab pages.
*
- * Any of the following ensures page and memcg binding stability:
+ * For a non-kmem page any of the following ensures page and memcg binding
+ * stability:
+ *
* - the page lock
* - LRU isolation
* - lock_page_memcg()
* - exclusive reference
+ *
+ * For a kmem page a caller should hold an rcu read lock to protect memcg
+ * associated with a kmem page from being released.
*/
static inline struct mem_cgroup *page_memcg(struct page *page)
{
- unsigned long memcg_data = page->memcg_data;
-
- VM_BUG_ON_PAGE(PageSlab(page), page);
- VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
-
- return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ if (PageMemcgKmem(page))
+ return obj_cgroup_memcg(__page_objcg(page));
+ else
+ return __page_memcg(page);
}

/*
@@ -390,11 +449,19 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
*/
static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
{
+ unsigned long memcg_data = READ_ONCE(page->memcg_data);
+
VM_BUG_ON_PAGE(PageSlab(page), page);
WARN_ON_ONCE(!rcu_read_lock_held());

- return (struct mem_cgroup *)(READ_ONCE(page->memcg_data) &
- ~MEMCG_DATA_FLAGS_MASK);
+ if (memcg_data & MEMCG_DATA_KMEM) {
+ struct obj_cgroup *objcg;
+
+ objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ return obj_cgroup_memcg(objcg);
+ }
+
+ return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
}

/*
@@ -402,15 +469,21 @@ static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
* @page: a pointer to the page struct
*
* Returns a pointer to the memory cgroup associated with the page,
- * or NULL. This function unlike page_memcg() can take any page
+ * or NULL. This function unlike page_memcg() can take any page
* as an argument. It has to be used in cases when it's not known if a page
- * has an associated memory cgroup pointer or an object cgroups vector.
+ * has an associated memory cgroup pointer or an object cgroups vector or
+ * an object cgroup.
+ *
+ * For a non-kmem page any of the following ensures page and memcg binding
+ * stability:
*
- * Any of the following ensures page and memcg binding stability:
* - the page lock
* - LRU isolation
* - lock_page_memcg()
* - exclusive reference
+ *
+ * For a kmem page a caller should hold an rcu read lock to protect memcg
+ * associated with a kmem page from being released.
*/
static inline struct mem_cgroup *page_memcg_check(struct page *page)
{
@@ -423,6 +496,13 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
if (memcg_data & MEMCG_DATA_OBJCGS)
return NULL;

+ if (memcg_data & MEMCG_DATA_KMEM) {
+ struct obj_cgroup *objcg;
+
+ objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ return obj_cgroup_memcg(objcg);
+ }
+
return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
}

@@ -681,21 +761,15 @@ static inline void obj_cgroup_get(struct obj_cgroup *objcg)
percpu_ref_get(&objcg->refcnt);
}

-static inline void obj_cgroup_put(struct obj_cgroup *objcg)
+static inline void obj_cgroup_get_many(struct obj_cgroup *objcg,
+ unsigned long nr)
{
- percpu_ref_put(&objcg->refcnt);
+ percpu_ref_get_many(&objcg->refcnt, nr);
}

-/*
- * After the initialization objcg->memcg is always pointing at
- * a valid memcg, but can be atomically swapped to the parent memcg.
- *
- * The caller must ensure that the returned memcg won't be released:
- * e.g. acquire the rcu_read_lock or css_set_lock.
- */
-static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
+static inline void obj_cgroup_put(struct obj_cgroup *objcg)
{
- return READ_ONCE(objcg->memcg);
+ percpu_ref_put(&objcg->refcnt);
}

static inline void mem_cgroup_put(struct mem_cgroup *memcg)
@@ -1007,18 +1081,22 @@ static inline void __mod_lruvec_page_state(struct page *page,
enum node_stat_item idx, int val)
{
struct page *head = compound_head(page); /* rmap on tail pages */
- struct mem_cgroup *memcg = page_memcg(head);
+ struct mem_cgroup *memcg;
pg_data_t *pgdat = page_pgdat(page);
struct lruvec *lruvec;

+ rcu_read_lock();
+ memcg = page_memcg(head);
/* Untracked pages have no memcg, no lruvec. Update only the node */
if (!memcg) {
+ rcu_read_unlock();
__mod_node_page_state(pgdat, idx, val);
return;
}

lruvec = mem_cgroup_lruvec(memcg, pgdat);
__mod_lruvec_state(lruvec, idx, val);
+ rcu_read_unlock();
}

static inline void mod_lruvec_page_state(struct page *page,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 73418413958c..738051c79cdd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1068,20 +1068,6 @@ static __always_inline struct mem_cgroup *active_memcg(void)
return current->active_memcg;
}

-static __always_inline struct mem_cgroup *get_active_memcg(void)
-{
- struct mem_cgroup *memcg;
-
- rcu_read_lock();
- memcg = active_memcg();
- /* remote memcg must hold a ref. */
- if (memcg && WARN_ON_ONCE(!css_tryget(&memcg->css)))
- memcg = root_mem_cgroup;
- rcu_read_unlock();
-
- return memcg;
-}
-
static __always_inline bool memcg_kmem_bypass(void)
{
/* Allow remote memcg charging from any context. */
@@ -1095,20 +1081,6 @@ static __always_inline bool memcg_kmem_bypass(void)
return false;
}

-/**
- * If active memcg is set, do not fallback to current->mm->memcg.
- */
-static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
-{
- if (memcg_kmem_bypass())
- return NULL;
-
- if (unlikely(active_memcg()))
- return get_active_memcg();
-
- return get_mem_cgroup_from_mm(current->mm);
-}
-
/**
* mem_cgroup_iter - iterate over memory cgroup hierarchy
* @root: hierarchy root
@@ -3128,18 +3100,18 @@ void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
*/
int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
{
- struct mem_cgroup *memcg;
+ struct obj_cgroup *objcg;
int ret = 0;

- memcg = get_mem_cgroup_from_current();
- if (memcg && !mem_cgroup_is_root(memcg)) {
- ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
+ objcg = get_obj_cgroup_from_current();
+ if (objcg) {
+ ret = obj_cgroup_charge_pages(objcg, gfp, 1 << order);
if (!ret) {
- page->memcg_data = (unsigned long)memcg |
+ page->memcg_data = (unsigned long)objcg |
MEMCG_DATA_KMEM;
return 0;
}
- css_put(&memcg->css);
+ obj_cgroup_put(objcg);
}
return ret;
}
@@ -3151,16 +3123,16 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
*/
void __memcg_kmem_uncharge_page(struct page *page, int order)
{
- struct mem_cgroup *memcg = page_memcg(page);
+ struct obj_cgroup *objcg;
unsigned int nr_pages = 1 << order;

- if (!memcg)
+ if (!PageMemcgKmem(page))
return;

- VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
- __memcg_kmem_uncharge(memcg, nr_pages);
+ objcg = __page_objcg(page);
+ obj_cgroup_uncharge_pages(objcg, nr_pages);
page->memcg_data = 0;
- css_put(&memcg->css);
+ obj_cgroup_put(objcg);
}

static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -3299,12 +3271,13 @@ void split_page_memcg(struct page *head, unsigned int nr)
if (mem_cgroup_disabled() || !memcg)
return;

- for (i = 1; i < nr; i++) {
- head[i].memcg_data = (unsigned long)memcg;
- if (kmemcg)
- __SetPageKmemcg(head + i);
- }
- css_get_many(&memcg->css, nr - 1);
+ for (i = 1; i < nr; i++)
+ head[i].memcg_data = head->memcg_data;
+
+ if (PageMemcgKmem(head))
+ obj_cgroup_get_many(__page_objcg(head), nr - 1);
+ else
+ css_get_many(&memcg->css, nr - 1);
}

#ifdef CONFIG_MEMCG_SWAP
@@ -6852,7 +6825,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)

struct uncharge_gather {
struct mem_cgroup *memcg;
- unsigned long nr_pages;
+ unsigned long nr_memory;
unsigned long pgpgout;
unsigned long nr_kmem;
struct page *dummy_page;
@@ -6867,10 +6840,10 @@ static void uncharge_batch(const struct uncharge_gather *ug)
{
unsigned long flags;

- if (!mem_cgroup_is_root(ug->memcg)) {
- page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
+ if (ug->nr_memory) {
+ page_counter_uncharge(&ug->memcg->memory, ug->nr_memory);
if (do_memsw_account())
- page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
+ page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
memcg_oom_recover(ug->memcg);
@@ -6878,7 +6851,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)

local_irq_save(flags);
__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
- __this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_pages);
+ __this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_memory);
memcg_check_events(ug->memcg, ug->dummy_page);
local_irq_restore(flags);

@@ -6889,40 +6862,60 @@ static void uncharge_batch(const struct uncharge_gather *ug)
static void uncharge_page(struct page *page, struct uncharge_gather *ug)
{
unsigned long nr_pages;
+ struct mem_cgroup *memcg;
+ struct obj_cgroup *objcg;

VM_BUG_ON_PAGE(PageLRU(page), page);

- if (!page_memcg(page))
- return;
-
/*
* Nobody should be changing or seriously looking at
- * page_memcg(page) at this point, we have fully
+ * page memcg or objcg at this point, we have fully
* exclusive access to the page.
*/
+ if (PageMemcgKmem(page)) {
+ objcg = __page_objcg(page);
+ /*
+ * This get matches the put at the end of the function and
+ * kmem pages do not hold memcg references anymore.
+ */
+ memcg = get_mem_cgroup_from_objcg(objcg);
+ } else {
+ memcg = __page_memcg(page);
+ }
+
+ if (!memcg)
+ return;

- if (ug->memcg != page_memcg(page)) {
+ if (ug->memcg != memcg) {
if (ug->memcg) {
uncharge_batch(ug);
uncharge_gather_clear(ug);
}
- ug->memcg = page_memcg(page);
+ ug->memcg = memcg;
ug->dummy_page = page;

/* pairs with css_put in uncharge_batch */
- css_get(&ug->memcg->css);
+ css_get(&memcg->css);
}

nr_pages = compound_nr(page);
- ug->nr_pages += nr_pages;

- if (PageMemcgKmem(page))
+ if (PageMemcgKmem(page)) {
+ ug->nr_memory += nr_pages;
ug->nr_kmem += nr_pages;
- else
+
+ page->memcg_data = 0;
+ obj_cgroup_put(objcg);
+ } else {
+ /* LRU pages aren't accounted at the root level */
+ if (!mem_cgroup_is_root(memcg))
+ ug->nr_memory += nr_pages;
ug->pgpgout++;

- page->memcg_data = 0;
- css_put(&ug->memcg->css);
+ page->memcg_data = 0;
+ }
+
+ css_put(&memcg->css);
}

static void uncharge_list(struct list_head *page_list)
--
2.18.0.huawei.25

2021-08-16 07:14:28

by Chen Huang

Subject: [PATCH 5.10.y 06/11] mm: memcontrol: directly access page->memcg_data in mm/page_alloc.c

From: Muchun Song <[email protected]>

page_memcg() is not suitable for use by page_expected_state() and
page_bad_reason(), because it can BUG_ON() for slab pages when
CONFIG_DEBUG_VM is enabled. As neither lru, nor kmem, nor slab pages
should have anything left in there by the time the page is freed, what
we care about is whether the value of page->memcg_data is 0. So just
access page->memcg_data directly here.
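
A hedged sketch of the difference (userspace model, not the kernel
helpers): the raw test works on any encoding, while the page_memcg()
assertion is exactly what fires on slab pages:

#include <assert.h>
#include <stdio.h>

#define MEMCG_DATA_OBJCGS	(1UL << 0)
#define MEMCG_DATA_FLAGS_MASK	0x3UL

/* like page_memcg(): asserts the word is not an objcgs vector, which
 * is the check that BUG()s on a slab page with DEBUG_VM enabled */
static void *page_memcg(unsigned long memcg_data)
{
	assert(!(memcg_data & MEMCG_DATA_OBJCGS));
	return (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
}

int main(void)
{
	unsigned long lru = 0xffff000012345000UL;
	unsigned long slab = lru | MEMCG_DATA_OBJCGS;

	assert(page_memcg(lru) != NULL);	/* fine for an LRU page */

	/* the free path only asks: is anything still recorded here? */
	if (slab != 0)
		puts("page still charged to cgroup");

	/* page_memcg(slab) would abort here, hence the raw access */
	return 0;
}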

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Chen Huang <[email protected]>
---
mm/page_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8ec194271b91..12deac86a7ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1094,7 +1094,7 @@ static inline bool page_expected_state(struct page *page,
if (unlikely((unsigned long)page->mapping |
page_ref_count(page) |
#ifdef CONFIG_MEMCG
- (unsigned long)page_memcg(page) |
+ page->memcg_data |
#endif
(page->flags & check_flags)))
return false;
@@ -1119,7 +1119,7 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set";
}
#ifdef CONFIG_MEMCG
- if (unlikely(page_memcg(page)))
+ if (unlikely(page->memcg_data))
bad_reason = "page still charged to cgroup";
#endif
return bad_reason;
--
2.18.0.huawei.25

2021-08-16 07:14:38

by Chen Huang

Subject: [PATCH 5.10.y 11/11] mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()

From: Wang Hai <[email protected]>

When I use kfree_rcu() to free a large object allocated by
kmalloc_node(), the following dump occurs.

BUG: kernel NULL pointer dereference, address: 0000000000000020
[...]
Oops: 0000 [#1] SMP
[...]
Workqueue: events kfree_rcu_work
RIP: 0010:__obj_to_index include/linux/slub_def.h:182 [inline]
RIP: 0010:obj_to_index include/linux/slub_def.h:191 [inline]
RIP: 0010:memcg_slab_free_hook+0x120/0x260 mm/slab.h:363
[...]
Call Trace:
kmem_cache_free_bulk+0x58/0x630 mm/slub.c:3293
kfree_bulk include/linux/slab.h:413 [inline]
kfree_rcu_work+0x1ab/0x200 kernel/rcu/tree.c:3300
process_one_work+0x207/0x530 kernel/workqueue.c:2276
worker_thread+0x320/0x610 kernel/workqueue.c:2422
kthread+0x13d/0x160 kernel/kthread.c:313
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

When kmalloc_node() allocates a large object, a page is allocated
rather than a slab, so when the memory is freed via kfree_rcu(), it
must not be handled by memcg_slab_free_hook(), because
memcg_slab_free_hook() is only used for slab objects.

Use page_objcgs_check() instead of page_objcgs() in
memcg_slab_free_hook() to fix this bug.
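
The difference between the two helpers, modeled in userspace (stand-in
types and addresses; the real helpers live in
include/linux/memcontrol.h):

#include <assert.h>
#include <stdio.h>

#define MEMCG_DATA_OBJCGS	(1UL << 0)
#define MEMCG_DATA_KMEM		(1UL << 1)
#define MEMCG_DATA_FLAGS_MASK	0x3UL

/* like page_objcgs_check(): returns NULL unless the word really is a
 * tagged obj_cgroups vector, so it is safe for any page */
static unsigned long page_objcgs_check(unsigned long memcg_data)
{
	if (!memcg_data || !(memcg_data & MEMCG_DATA_OBJCGS))
		return 0;
	return memcg_data & ~MEMCG_DATA_FLAGS_MASK;
}

int main(void)
{
	/* a large kmalloc is backed by a kmem page: objcg, no OBJCGS bit */
	unsigned long kmem_page = 0xffff000012345000UL | MEMCG_DATA_KMEM;

	/* page_objcgs() would assert the OBJCGS bit here; the _check
	 * variant just reports "no vector" and the hook skips the page */
	assert(page_objcgs_check(kmem_page) == 0);
	puts("large kmalloc page skipped by memcg_slab_free_hook()");
	return 0;
}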

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 270c6a71460e ("mm: memcontrol/slab: Use helpers to access slab page's memcg_data")
Signed-off-by: Wang Hai <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Reviewed-by: Kefeng Wang <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Chen Huang <[email protected]>
---
mm/slab.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.h b/mm/slab.h
index 571757eb4a8f..9759992c720c 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -349,7 +349,7 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s_orig,
continue;

page = virt_to_head_page(p[i]);
- objcgs = page_objcgs(page);
+ objcgs = page_objcgs_check(page);
if (!objcgs)
continue;

--
2.18.0.huawei.25

2021-08-16 07:15:10

by Chen Huang

Subject: [PATCH 5.10.y 07/11] mm: memcontrol: change ug->dummy_page only if memcg changed

From: Muchun Song <[email protected]>

Just like the assignment to ug->memcg, we only need to update
ug->dummy_page if the memcg changed, so move the assignment there.
This is a very small optimization.
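
A toy model of the gather-and-flush pattern used by uncharge_page()
(illustrative names and values only, not kernel code): the batch key
and dummy_page are refreshed together, only when the key changes:

#include <stdio.h>

struct gather {
	int memcg;		/* batch key */
	unsigned long nr;	/* accumulated pages */
	int dummy_page;
};

static void flush(struct gather *ug)
{
	if (ug->nr)
		printf("uncharge %lu pages from memcg %d (page %d)\n",
		       ug->nr, ug->memcg, ug->dummy_page);
	ug->nr = 0;
}

int main(void)
{
	struct gather ug = { 0 };
	struct { int memcg, page; } pages[] = {
		{ 1, 10 }, { 1, 11 }, { 2, 20 },
	};

	for (unsigned int i = 0; i < 3; i++) {
		if (ug.memcg != pages[i].memcg) {
			flush(&ug);
			ug.memcg = pages[i].memcg;
			/* dummy_page only needs updating here */
			ug.dummy_page = pages[i].page;
		}
		ug.nr++;
	}
	flush(&ug);
	return 0;
}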

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Chen Huang <[email protected]>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d45c2c97d9b8..73418413958c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6907,6 +6907,7 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
uncharge_gather_clear(ug);
}
ug->memcg = page_memcg(page);
+ ug->dummy_page = page;

/* pairs with css_put in uncharge_batch */
css_get(&ug->memcg->css);
@@ -6920,7 +6921,6 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
else
ug->pgpgout++;

- ug->dummy_page = page;
page->memcg_data = 0;
css_put(&ug->memcg->css);
}
--
2.18.0.huawei.25

2021-08-16 07:15:31

by Chen Huang

Subject: [PATCH 5.10.y 10/11] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM

From: Muchun Song <[email protected]>

A page can only be marked as kmem when CONFIG_MEMCG_KMEM is enabled.
So move PageMemcgKmem() into the scope of CONFIG_MEMCG_KMEM.

As a bonus, some code can be compiled out on !CONFIG_MEMCG_KMEM builds.
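
A small standalone illustration of the #ifdef-stub idiom this relies
on (CONFIG_MEMCG_KMEM here is just a compile-time define, not the real
Kconfig plumbing): with the feature off, the stub constant-folds to
false and the dependent branch is compiled out:

#include <stdio.h>

#ifdef CONFIG_MEMCG_KMEM
static int PageMemcgKmem(unsigned long memcg_data)
{
	return memcg_data & (1UL << 1);
}
#else
static int PageMemcgKmem(unsigned long memcg_data)
{
	(void)memcg_data;
	return 0;	/* dead-code-eliminates every caller's branch */
}
#endif

int main(void)
{
	if (PageMemcgKmem(0x1000))
		puts("kmem page");
	else
		puts("not a kmem page (or feature compiled out)");
	return 0;
}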

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Chen Huang <[email protected]>
---
include/linux/memcontrol.h | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b8bb5d37d4ad..f07463cf7dac 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -506,6 +506,7 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
}

+#ifdef CONFIG_MEMCG_KMEM
/*
* PageMemcgKmem - check if the page has MemcgKmem flag set
* @page: a pointer to the page struct
@@ -520,7 +521,6 @@ static inline bool PageMemcgKmem(struct page *page)
return page->memcg_data & MEMCG_DATA_KMEM;
}

-#ifdef CONFIG_MEMCG_KMEM
/*
* page_objcgs - get the object cgroups vector associated with a page
* @page: a pointer to the page struct
@@ -575,6 +575,11 @@ static inline bool set_page_objcgs(struct page *page,
MEMCG_DATA_OBJCGS);
}
#else
+static inline bool PageMemcgKmem(struct page *page)
+{
+ return false;
+}
+
static inline struct obj_cgroup **page_objcgs(struct page *page)
{
return NULL;
--
2.18.0.huawei.25

2021-08-16 07:15:31

by Chen Huang

Subject: [PATCH 5.10.y 09/11] mm: memcontrol: inline __memcg_kmem_{un}charge() into obj_cgroup_{un}charge_pages()

From: Muchun Song <[email protected]>

There is only one user of __memcg_kmem_charge(), so manually inline
__memcg_kmem_charge() into obj_cgroup_charge_pages(). Similarly,
manually inline __memcg_kmem_uncharge() into
obj_cgroup_uncharge_pages() and call
obj_cgroup_uncharge_pages() in obj_cgroup_release().

This is just code cleanup without any functional changes.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Conflicts:
mm/memcontrol.c
Signed-off-by: Chen Huang <[email protected]>
---
mm/memcontrol.c | 60 +++++++++++++++++++++----------------------------
1 file changed, 26 insertions(+), 34 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 738051c79cdd..8932f986bf2e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -252,6 +252,9 @@ struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr)
#ifdef CONFIG_MEMCG_KMEM
extern spinlock_t css_set_lock;

+static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
+ unsigned int nr_pages);
+
static void obj_cgroup_release(struct percpu_ref *ref)
{
struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt);
@@ -287,7 +290,7 @@ static void obj_cgroup_release(struct percpu_ref *ref)
spin_lock_irqsave(&css_set_lock, flags);
memcg = obj_cgroup_memcg(objcg);
if (nr_pages)
- __memcg_kmem_uncharge(memcg, nr_pages);
+ obj_cgroup_uncharge_pages(objcg, nr_pages);
list_del(&objcg->list);
mem_cgroup_put(memcg);
spin_unlock_irqrestore(&css_set_lock, flags);
@@ -3018,46 +3021,45 @@ static void memcg_free_cache_id(int id)
ida_simple_remove(&memcg_cache_ida, id);
}

+/*
+ * obj_cgroup_uncharge_pages: uncharge a number of kernel pages from a objcg
+ * @objcg: object cgroup to uncharge
+ * @nr_pages: number of pages to uncharge
+ */
static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
unsigned int nr_pages)
{
struct mem_cgroup *memcg;

memcg = get_mem_cgroup_from_objcg(objcg);
- __memcg_kmem_uncharge(memcg, nr_pages);
- css_put(&memcg->css);
-}

-static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
- unsigned int nr_pages)
-{
- struct mem_cgroup *memcg;
- int ret;
+ if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+ page_counter_uncharge(&memcg->kmem, nr_pages);
+ refill_stock(memcg, nr_pages);

- memcg = get_mem_cgroup_from_objcg(objcg);
- ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
css_put(&memcg->css);
-
- return ret;
}

-/**
- * __memcg_kmem_charge: charge a number of kernel pages to a memcg
- * @memcg: memory cgroup to charge
+/*
+ * obj_cgroup_charge_pages: charge a number of kernel pages to a objcg
+ * @objcg: object cgroup to charge
* @gfp: reclaim mode
* @nr_pages: number of pages to charge
*
* Returns 0 on success, an error code on failure.
*/
-int __memcg_kmem_charge(struct mem_cgroup *memcg, gfp_t gfp,
- unsigned int nr_pages)
+static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
+ unsigned int nr_pages)
{
struct page_counter *counter;
+ struct mem_cgroup *memcg;
int ret;

+ memcg = get_mem_cgroup_from_objcg(objcg);
+
ret = try_charge(memcg, gfp, nr_pages);
if (ret)
- return ret;
+ goto out;

if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
!page_counter_try_charge(&memcg->kmem, nr_pages, &counter)) {
@@ -3069,25 +3071,15 @@ int __memcg_kmem_charge(struct mem_cgroup *memcg, gfp_t gfp,
*/
if (gfp & __GFP_NOFAIL) {
page_counter_charge(&memcg->kmem, nr_pages);
- return 0;
+ goto out;
}
cancel_charge(memcg, nr_pages);
- return -ENOMEM;
+ ret = -ENOMEM;
}
- return 0;
-}
-
-/**
- * __memcg_kmem_uncharge: uncharge a number of kernel pages from a memcg
- * @memcg: memcg to uncharge
- * @nr_pages: number of pages to uncharge
- */
-void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
- if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
- page_counter_uncharge(&memcg->kmem, nr_pages);
+out:
+ css_put(&memcg->css);

- refill_stock(memcg, nr_pages);
+ return ret;
}

/**
--
2.18.0.huawei.25

2021-08-16 07:16:44

by Chen Huang

Subject: [PATCH 5.10.y 05/11] mm: memcontrol: introduce obj_cgroup_{un}charge_pages

From: Muchun Song <[email protected]>

We know that the unit of slab object charging is bytes, while the unit
of kmem page charging is PAGE_SIZE. If we want to reuse the obj_cgroup
APIs to charge kmem pages, we should pass PAGE_SIZE (as the third
parameter) to obj_cgroup_charge(). Because the size is already
PAGE_SIZE-aligned, we can skip touching the objcg stock. So
obj_cgroup_{un}charge_pages() are introduced to charge in units of
pages.

In a later patch, we can also reuse those two helpers to charge or
uncharge a number of kernel pages to an object cgroup. This is just
code movement without any functional changes.
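
The bytes-to-pages rounding that obj_cgroup_charge() performs, with
the remainder refilled into the per-cpu objcg stock, as a small
arithmetic sketch (userspace, PAGE_SHIFT assumed 12):

#include <assert.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

int main(void)
{
	size_t size = 700;	/* a sub-page slab object */
	unsigned int nr_pages = size >> PAGE_SHIFT;
	unsigned int nr_bytes = size & (PAGE_SIZE - 1);

	if (nr_bytes)		/* charge whole pages, like the kernel */
		nr_pages += 1;

	assert(nr_pages == 1);
	/* the unused remainder goes back into the per-cpu objcg stock */
	printf("charge %u page(s), stock %lu bytes\n",
	       nr_pages, PAGE_SIZE - nr_bytes);
	return 0;
}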

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Chen Huang <[email protected]>
---
mm/memcontrol.c | 63 +++++++++++++++++++++++++++++++------------------
1 file changed, 40 insertions(+), 23 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index abeaf5cede74..d45c2c97d9b8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2888,6 +2888,20 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg)
page->memcg_data = (unsigned long)memcg;
}

+static struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg)
+{
+ struct mem_cgroup *memcg;
+
+ rcu_read_lock();
+retry:
+ memcg = obj_cgroup_memcg(objcg);
+ if (unlikely(!css_tryget(&memcg->css)))
+ goto retry;
+ rcu_read_unlock();
+
+ return memcg;
+}
+
#ifdef CONFIG_MEMCG_KMEM
/*
* The allocated objcg pointers array is not accounted directly.
@@ -3032,6 +3046,29 @@ static void memcg_free_cache_id(int id)
ida_simple_remove(&memcg_cache_ida, id);
}

+static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
+ unsigned int nr_pages)
+{
+ struct mem_cgroup *memcg;
+
+ memcg = get_mem_cgroup_from_objcg(objcg);
+ __memcg_kmem_uncharge(memcg, nr_pages);
+ css_put(&memcg->css);
+}
+
+static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
+ unsigned int nr_pages)
+{
+ struct mem_cgroup *memcg;
+ int ret;
+
+ memcg = get_mem_cgroup_from_objcg(objcg);
+ ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
+ css_put(&memcg->css);
+
+ return ret;
+}
+
/**
* __memcg_kmem_charge: charge a number of kernel pages to a memcg
* @memcg: memory cgroup to charge
@@ -3156,19 +3193,8 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1);

- if (nr_pages) {
- struct mem_cgroup *memcg;
-
- rcu_read_lock();
-retry:
- memcg = obj_cgroup_memcg(old);
- if (unlikely(!css_tryget(&memcg->css)))
- goto retry;
- rcu_read_unlock();
-
- __memcg_kmem_uncharge(memcg, nr_pages);
- css_put(&memcg->css);
- }
+ if (nr_pages)
+ obj_cgroup_uncharge_pages(old, nr_pages);

/*
* The leftover is flushed to the centralized per-memcg value.
@@ -3226,7 +3252,6 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)

int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
{
- struct mem_cgroup *memcg;
unsigned int nr_pages, nr_bytes;
int ret;

@@ -3243,24 +3268,16 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
* refill_obj_stock(), called from this function or
* independently later.
*/
- rcu_read_lock();
-retry:
- memcg = obj_cgroup_memcg(objcg);
- if (unlikely(!css_tryget(&memcg->css)))
- goto retry;
- rcu_read_unlock();
-
nr_pages = size >> PAGE_SHIFT;
nr_bytes = size & (PAGE_SIZE - 1);

if (nr_bytes)
nr_pages += 1;

- ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
+ ret = obj_cgroup_charge_pages(objcg, gfp, nr_pages);
if (!ret && nr_bytes)
refill_obj_stock(objcg, PAGE_SIZE - nr_bytes);

- css_put(&memcg->css);
return ret;
}

--
2.18.0.huawei.25