2022-09-26 15:49:34

by Johannes Weiner

[permalink] [raw]
Subject: [PATCH 0/4] memcg swap fix & cleanups

This is a refresh of older patches that fell through the cracks.

Applies on top of mm-unstable.

Documentation/admin-guide/cgroup-v1/memory.rst | 4 +-
Documentation/admin-guide/kernel-parameters.txt | 6 --
arch/mips/configs/db1xxx_defconfig | 1 -
arch/mips/configs/generic_defconfig | 1 -
arch/powerpc/configs/powernv_defconfig | 1 -
arch/powerpc/configs/pseries_defconfig | 1 -
arch/sh/configs/sdk7786_defconfig | 1 -
arch/sh/configs/urquell_defconfig | 1 -
include/linux/swap.h | 2 +-
include/linux/swap_cgroup.h | 4 +-
init/Kconfig | 5 --
mm/Makefile | 4 +-
mm/memcontrol.c | 79 ++++++++---------------
mm/swap_cgroup.c | 6 ++
tools/testing/selftests/cgroup/config | 1 -
15 files changed, 39 insertions(+), 78 deletions(-)



2022-09-26 15:51:09

by Johannes Weiner

[permalink] [raw]
Subject: [PATCH 1/4] mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled

Since commit 2d1c498072de ("mm: memcontrol: make swap tracking an
integral part of memory control"), the cgroup swap arrays are used to
track memory ownership at the time of swap readahead and swapoff, even
if swap space *accounting* has been turned off by the user via
swapaccount=0 (which sets cgroup_memory_noswap).

However, the patch was overzealous: by simply dropping the
cgroup_memory_noswap conditionals in the swapon, swapoff and uncharge
path, it caused the cgroup arrays being allocated even when the memory
controller as a whole is disabled. This is a waste of that memory.

Restore mem_cgroup_disabled() checks, implied previously by
cgroup_memory_noswap, in the swapon, swapoff, and swap_entry_free
callbacks.

Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control")
Reported-by: Hugh Dickins <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Acked-by: Michal Hocko <[email protected]>
---
mm/memcontrol.c | 3 +++
mm/swap_cgroup.c | 6 ++++++
2 files changed, 9 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6b74bbdc2659..9e3c010ca676 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7459,6 +7459,9 @@ void __mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_pages)
struct mem_cgroup *memcg;
unsigned short id;

+ if (mem_cgroup_disabled())
+ return;
+
id = swap_cgroup_record(entry, 0, nr_pages);
rcu_read_lock();
memcg = mem_cgroup_from_id(id);
diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c
index 5a9442979a18..db6c4a26cf59 100644
--- a/mm/swap_cgroup.c
+++ b/mm/swap_cgroup.c
@@ -170,6 +170,9 @@ int swap_cgroup_swapon(int type, unsigned long max_pages)
unsigned long length;
struct swap_cgroup_ctrl *ctrl;

+ if (mem_cgroup_disabled())
+ return 0;
+
length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);

array = vcalloc(length, sizeof(void *));
@@ -204,6 +207,9 @@ void swap_cgroup_swapoff(int type)
unsigned long i, length;
struct swap_cgroup_ctrl *ctrl;

+ if (mem_cgroup_disabled())
+ return;
+
mutex_lock(&swap_cgroup_mutex);
ctrl = &swap_cgroup_ctrl[type];
map = ctrl->map;
--
2.37.3

2022-09-26 15:52:43

by Johannes Weiner

[permalink] [raw]
Subject: [PATCH 2/4] mm: memcontrol: deprecate swapaccounting=0 mode

The swapaccounting= commandline option already does very little
today. To close a trivial containment failure case, the swap ownership
tracking part of the swap controller has recently become mandatory
(see commit 2d1c498072de ("mm: memcontrol: make swap tracking an
integral part of memory control") for details), which makes up the
majority of the work during swapout, swapin, and the swap slot map.

The only thing left under this flag is the page_counter operations and
the visibility of the swap control files in the first place, which are
rather meager savings. There also aren't many scenarios, if any, where
controlling the memory of a cgroup while allowing it unlimited access
to a global swap space is a workable resource isolation strategy.

On the other hand, there have been several bugs and confusion around
the many possible swap controller states (cgroup1 vs cgroup2 behavior,
memory accounting without swap accounting, memcg runtime disabled).

This puts the maintenance overhead of retaining the toggle above its
practical benefits. Deprecate it.

Suggested-by: Shakeel Butt <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 6 ---
mm/memcontrol.c | 50 ++++---------------
2 files changed, 10 insertions(+), 46 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3b95f65bafe2..99a13f2be2ef 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6036,12 +6036,6 @@
This parameter controls use of the Protected
Execution Facility on pSeries.

- swapaccount= [KNL]
- Format: [0|1]
- Enable accounting of swap in memory resource
- controller if no parameter or 1 is given or disable
- it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
-
swiotlb= [ARM,IA-64,PPC,MIPS,X86]
Format: { <int> [,<int>] | force | noforce }
<int> -- Number of I/O TLB slabs
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9e3c010ca676..4be1b48b9659 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -88,22 +88,6 @@ static bool cgroup_memory_nosocket __ro_after_init;
/* Kernel memory accounting disabled? */
static bool cgroup_memory_nokmem __ro_after_init;

-/* Whether the swap controller is active */
-#ifdef CONFIG_MEMCG_SWAP
-static bool cgroup_memory_noswap __initdata;
-
-static DEFINE_STATIC_KEY_FALSE(memcg_swap_enabled_key);
-static inline bool memcg_swap_enabled(void)
-{
- return static_branch_likely(&memcg_swap_enabled_key);
-}
-#else
-static inline bool memcg_swap_enabled(void)
-{
- return false;
-}
-#endif
-
#ifdef CONFIG_CGROUP_WRITEBACK
static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
#endif
@@ -111,7 +95,7 @@ static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
/* Whether legacy memory+swap accounting is active */
static bool do_memsw_account(void)
{
- return !cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg_swap_enabled();
+ return !cgroup_subsys_on_dfl(memory_cgrp_subsys);
}

#define THRESHOLDS_EVENTS_TARGET 128
@@ -7379,7 +7363,7 @@ void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry)
if (!mem_cgroup_is_root(memcg))
page_counter_uncharge(&memcg->memory, nr_entries);

- if (memcg_swap_enabled() && memcg != swap_memcg) {
+ if (memcg != swap_memcg) {
if (!mem_cgroup_is_root(swap_memcg))
page_counter_charge(&swap_memcg->memsw, nr_entries);
page_counter_uncharge(&memcg->memsw, nr_entries);
@@ -7431,7 +7415,7 @@ int __mem_cgroup_try_charge_swap(struct folio *folio, swp_entry_t entry)

memcg = mem_cgroup_id_get_online(memcg);

- if (memcg_swap_enabled() && !mem_cgroup_is_root(memcg) &&
+ if (!mem_cgroup_is_root(memcg) &&
!page_counter_try_charge(&memcg->swap, nr_pages, &counter)) {
memcg_memory_event(memcg, MEMCG_SWAP_MAX);
memcg_memory_event(memcg, MEMCG_SWAP_FAIL);
@@ -7466,7 +7450,7 @@ void __mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_pages)
rcu_read_lock();
memcg = mem_cgroup_from_id(id);
if (memcg) {
- if (memcg_swap_enabled() && !mem_cgroup_is_root(memcg)) {
+ if (!mem_cgroup_is_root(memcg)) {
if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
page_counter_uncharge(&memcg->swap, nr_pages);
else
@@ -7482,7 +7466,7 @@ long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg)
{
long nr_swap_pages = get_nr_swap_pages();

- if (!memcg_swap_enabled() || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
+ if (mem_cgroup_disabled() || do_memsw_account())
return nr_swap_pages;
for (; memcg != root_mem_cgroup; memcg = parent_mem_cgroup(memcg))
nr_swap_pages = min_t(long, nr_swap_pages,
@@ -7499,7 +7483,7 @@ bool mem_cgroup_swap_full(struct folio *folio)

if (vm_swap_full())
return true;
- if (!memcg_swap_enabled() || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
+ if (do_memsw_account())
return false;

memcg = folio_memcg(folio);
@@ -7519,10 +7503,9 @@ bool mem_cgroup_swap_full(struct folio *folio)

static int __init setup_swap_account(char *s)
{
- bool res;
-
- if (!kstrtobool(s, &res))
- cgroup_memory_noswap = !res;
+ pr_warn_once("The swapaccount= commandline option is deprecated. "
+ "Please report your usecase to [email protected] if you "
+ "depend on this functionality.\n");
return 1;
}
__setup("swapaccount=", setup_swap_account);
@@ -7791,24 +7774,11 @@ static struct cftype zswap_files[] = {
};
#endif /* CONFIG_MEMCG_KMEM && CONFIG_ZSWAP */

-/*
- * If mem_cgroup_swap_init() is implemented as a subsys_initcall()
- * instead of a core_initcall(), this could mean cgroup_memory_noswap still
- * remains set to false even when memcg is disabled via "cgroup_disable=memory"
- * boot parameter. This may result in premature OOPS inside
- * mem_cgroup_get_nr_swap_pages() function in corner cases.
- */
static int __init mem_cgroup_swap_init(void)
{
- /* No memory control -> no swap control */
if (mem_cgroup_disabled())
- cgroup_memory_noswap = true;
-
- if (cgroup_memory_noswap)
return 0;

- static_branch_enable(&memcg_swap_enabled_key);
-
WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, swap_files));
WARN_ON(cgroup_add_legacy_cftypes(&memory_cgrp_subsys, memsw_files));
#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
@@ -7816,6 +7786,6 @@ static int __init mem_cgroup_swap_init(void)
#endif
return 0;
}
-core_initcall(mem_cgroup_swap_init);
+subsys_initcall(mem_cgroup_swap_init);

#endif /* CONFIG_MEMCG_SWAP */
--
2.37.3

2022-09-26 15:56:58

by Johannes Weiner

[permalink] [raw]
Subject: [PATCH 4/4] mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol

Since 2d1c498072de ("mm: memcontrol: make swap tracking an integral
part of memory control"), CONFIG_MEMCG_SWAP hasn't been a user-visible
config option anymore, it just means CONFIG_MEMCG && CONFIG_SWAP.

Update the sites accordingly and drop the symbol.

[ While touching the docs, remove two references to CONFIG_MEMCG_KMEM,
which hasn't been a user-visible symbol for over half a decade. ]

Signed-off-by: Johannes Weiner <[email protected]>
---
Documentation/admin-guide/cgroup-v1/memory.rst | 4 +---
arch/mips/configs/db1xxx_defconfig | 1 -
arch/mips/configs/generic_defconfig | 1 -
arch/powerpc/configs/powernv_defconfig | 1 -
arch/powerpc/configs/pseries_defconfig | 1 -
arch/sh/configs/sdk7786_defconfig | 1 -
arch/sh/configs/urquell_defconfig | 1 -
include/linux/swap.h | 2 +-
include/linux/swap_cgroup.h | 4 ++--
init/Kconfig | 5 -----
mm/Makefile | 4 +++-
mm/memcontrol.c | 6 +++---
tools/testing/selftests/cgroup/config | 1 -
13 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 2cc502a75ef6..5b86245450bd 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -299,7 +299,7 @@ Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by
lruvec->lru_lock; PG_lru bit of page->flags is cleared before
isolating a page from its LRU under lruvec->lru_lock.

-2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
+2.7 Kernel Memory Extension
-----------------------------------------------

With the Kernel memory extension, the Memory Controller is able to limit
@@ -386,8 +386,6 @@ limit, and "K" the kernel limit. There are three possible ways limits can be

a. Enable CONFIG_CGROUPS
b. Enable CONFIG_MEMCG
-c. Enable CONFIG_MEMCG_SWAP (to use swap extension)
-d. Enable CONFIG_MEMCG_KMEM (to use kmem extension)

3.1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
-------------------------------------------------------------------
diff --git a/arch/mips/configs/db1xxx_defconfig b/arch/mips/configs/db1xxx_defconfig
index b8bd66300996..83cbdecb27e6 100644
--- a/arch/mips/configs/db1xxx_defconfig
+++ b/arch/mips/configs/db1xxx_defconfig
@@ -9,7 +9,6 @@ CONFIG_HIGH_RES_TIMERS=y
CONFIG_LOG_BUF_SHIFT=16
CONFIG_CGROUPS=y
CONFIG_MEMCG=y
-CONFIG_MEMCG_SWAP=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
diff --git a/arch/mips/configs/generic_defconfig b/arch/mips/configs/generic_defconfig
index 714169e411cf..48e4e251779b 100644
--- a/arch/mips/configs/generic_defconfig
+++ b/arch/mips/configs/generic_defconfig
@@ -3,7 +3,6 @@ CONFIG_NO_HZ_IDLE=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_MEMCG=y
-CONFIG_MEMCG_SWAP=y
CONFIG_BLK_CGROUP=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
index 49f49c263935..4acca5263404 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -17,7 +17,6 @@ CONFIG_LOG_CPU_MAX_BUF_SHIFT=13
CONFIG_NUMA_BALANCING=y
CONFIG_CGROUPS=y
CONFIG_MEMCG=y
-CONFIG_MEMCG_SWAP=y
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CPUSETS=y
diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig
index b571d084c148..fead14ebb1fc 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -16,7 +16,6 @@ CONFIG_LOG_CPU_MAX_BUF_SHIFT=13
CONFIG_NUMA_BALANCING=y
CONFIG_CGROUPS=y
CONFIG_MEMCG=y
-CONFIG_MEMCG_SWAP=y
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CPUSETS=y
diff --git a/arch/sh/configs/sdk7786_defconfig b/arch/sh/configs/sdk7786_defconfig
index a8662b6927ec..97b7356639ed 100644
--- a/arch/sh/configs/sdk7786_defconfig
+++ b/arch/sh/configs/sdk7786_defconfig
@@ -16,7 +16,6 @@ CONFIG_CPUSETS=y
# CONFIG_PROC_PID_CPUSET is not set
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_MEMCG=y
-CONFIG_CGROUP_MEMCG_SWAP=y
CONFIG_CGROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_BLK_CGROUP=y
diff --git a/arch/sh/configs/urquell_defconfig b/arch/sh/configs/urquell_defconfig
index cb2f56468fe0..be478f3148f2 100644
--- a/arch/sh/configs/urquell_defconfig
+++ b/arch/sh/configs/urquell_defconfig
@@ -14,7 +14,6 @@ CONFIG_CPUSETS=y
# CONFIG_PROC_PID_CPUSET is not set
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_MEMCG=y
-CONFIG_CGROUP_MEMCG_SWAP=y
CONFIG_CGROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_BLK_DEV_INITRD=y
diff --git a/include/linux/swap.h b/include/linux/swap.h
index fc8d98660326..a18cf4b7c724 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -666,7 +666,7 @@ static inline void folio_throttle_swaprate(struct folio *folio, gfp_t gfp)
cgroup_throttle_swaprate(&folio->page, gfp);
}

-#ifdef CONFIG_MEMCG_SWAP
+#if defined(CONFIG_MEMCG) && defined(CONFIG_SWAP)
void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry);
int __mem_cgroup_try_charge_swap(struct folio *folio, swp_entry_t entry);
static inline int mem_cgroup_try_charge_swap(struct folio *folio,
diff --git a/include/linux/swap_cgroup.h b/include/linux/swap_cgroup.h
index a12dd1c3966c..ae73a87775b3 100644
--- a/include/linux/swap_cgroup.h
+++ b/include/linux/swap_cgroup.h
@@ -4,7 +4,7 @@

#include <linux/swap.h>

-#ifdef CONFIG_MEMCG_SWAP
+#if defined(CONFIG_MEMCG) && defined(CONFIG_SWAP)

extern unsigned short swap_cgroup_cmpxchg(swp_entry_t ent,
unsigned short old, unsigned short new);
@@ -40,6 +40,6 @@ static inline void swap_cgroup_swapoff(int type)
return;
}

-#endif /* CONFIG_MEMCG_SWAP */
+#endif

#endif /* __LINUX_SWAP_CGROUP_H */
diff --git a/init/Kconfig b/init/Kconfig
index 532362fcfe31..7d86cf6b3012 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -958,11 +958,6 @@ config MEMCG
help
Provides control over the memory footprint of tasks in a cgroup.

-config MEMCG_SWAP
- bool
- depends on MEMCG && SWAP
- default y
-
config MEMCG_KMEM
bool
depends on MEMCG && !SLOB
diff --git a/mm/Makefile b/mm/Makefile
index cc23b0052584..8e105e5b3e29 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -98,7 +98,9 @@ obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
-obj-$(CONFIG_MEMCG_SWAP) += swap_cgroup.o
+ifdef CONFIG_SWAP
+obj-$(CONFIG_MEMCG) += swap_cgroup.o
+endif
obj-$(CONFIG_CGROUP_HUGETLB) += hugetlb_cgroup.o
obj-$(CONFIG_GUP_TEST) += gup_test.o
obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 76bb0a18a2f3..61e05fc281fb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3423,7 +3423,7 @@ void split_page_memcg(struct page *head, unsigned int nr)
css_get_many(&memcg->css, nr - 1);
}

-#ifdef CONFIG_MEMCG_SWAP
+#ifdef CONFIG_SWAP
/**
* mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record.
* @entry: swap entry to be moved
@@ -7296,7 +7296,7 @@ static int __init mem_cgroup_init(void)
}
subsys_initcall(mem_cgroup_init);

-#ifdef CONFIG_MEMCG_SWAP
+#ifdef CONFIG_SWAP
static struct mem_cgroup *mem_cgroup_id_get_online(struct mem_cgroup *memcg)
{
while (!refcount_inc_not_zero(&memcg->id.ref)) {
@@ -7788,4 +7788,4 @@ static int __init mem_cgroup_swap_init(void)
}
subsys_initcall(mem_cgroup_swap_init);

-#endif /* CONFIG_MEMCG_SWAP */
+#endif /* CONFIG_SWAP */
diff --git a/tools/testing/selftests/cgroup/config b/tools/testing/selftests/cgroup/config
index 84fe884fad86..97d549ee894f 100644
--- a/tools/testing/selftests/cgroup/config
+++ b/tools/testing/selftests/cgroup/config
@@ -4,5 +4,4 @@ CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_SCHED=y
CONFIG_MEMCG=y
CONFIG_MEMCG_KMEM=y
-CONFIG_MEMCG_SWAP=y
CONFIG_PAGE_COUNTER=y
--
2.37.3

2022-09-26 16:45:23

by Shakeel Butt

[permalink] [raw]
Subject: Re: [PATCH 4/4] mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol

On Mon, Sep 26, 2022 at 6:57 AM Johannes Weiner <[email protected]> wrote:
>
> Since 2d1c498072de ("mm: memcontrol: make swap tracking an integral
> part of memory control"), CONFIG_MEMCG_SWAP hasn't been a user-visible
> config option anymore, it just means CONFIG_MEMCG && CONFIG_SWAP.
>
> Update the sites accordingly and drop the symbol.
>
> [ While touching the docs, remove two references to CONFIG_MEMCG_KMEM,
> which hasn't been a user-visible symbol for over half a decade. ]
>
> Signed-off-by: Johannes Weiner <[email protected]>

Acked-by: Shakeel Butt <[email protected]>