2021-08-19 19:55:28

by Johannes Weiner

Subject: [PATCH 1/4] mm: Kconfig: move swap and slab config options to the MM section

These are currently under General Setup. MM seems like a better fit.

Signed-off-by: Johannes Weiner <[email protected]>
---
init/Kconfig | 120 ---------------------------------------------------
mm/Kconfig | 120 +++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 120 insertions(+), 120 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index a61c92066c2e..a2358cd5498a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -331,23 +331,6 @@ config DEFAULT_HOSTNAME
but you may wish to use a different default here to make a minimal
system more usable with less configuration.

-#
-# For some reason microblaze and nios2 hard code SWAP=n. Hopefully we can
-# add proper SWAP support to them, in which case this can be remove.
-#
-config ARCH_NO_SWAP
- bool
-
-config SWAP
- bool "Support for paging of anonymous memory (swap)"
- depends on MMU && BLOCK && !ARCH_NO_SWAP
- default y
- help
- This option allows you to choose whether you want to have support
- for so called swap devices or swap files in your kernel that are
- used to provide more virtual memory than the actual RAM present
- in your computer. If unsure say Y.
-
config SYSVIPC
bool "System V IPC"
help
@@ -1862,109 +1845,6 @@ config COMPAT_BRK

On non-ancient distros (post-2000 ones) N is usually a safe choice.

-choice
- prompt "Choose SLAB allocator"
- default SLUB
- help
- This option allows to select a slab allocator.
-
-config SLAB
- bool "SLAB"
- select HAVE_HARDENED_USERCOPY_ALLOCATOR
- help
- The regular slab allocator that is established and known to work
- well in all environments. It organizes cache hot objects in
- per cpu and per node queues.
-
-config SLUB
- bool "SLUB (Unqueued Allocator)"
- select HAVE_HARDENED_USERCOPY_ALLOCATOR
- help
- SLUB is a slab allocator that minimizes cache line usage
- instead of managing queues of cached objects (SLAB approach).
- Per cpu caching is realized using slabs of objects instead
- of queues of objects. SLUB can use memory efficiently
- and has enhanced diagnostics. SLUB is the default choice for
- a slab allocator.
-
-config SLOB
- depends on EXPERT
- bool "SLOB (Simple Allocator)"
- help
- SLOB replaces the stock allocator with a drastically simpler
- allocator. SLOB is generally more space efficient but
- does not perform as well on large systems.
-
-endchoice
-
-config SLAB_MERGE_DEFAULT
- bool "Allow slab caches to be merged"
- default y
- help
- For reduced kernel memory fragmentation, slab caches can be
- merged when they share the same size and other characteristics.
- This carries a risk of kernel heap overflows being able to
- overwrite objects from merged caches (and more easily control
- cache layout), which makes such heap attacks easier to exploit
- by attackers. By keeping caches unmerged, these kinds of exploits
- can usually only damage objects in the same cache. To disable
- merging at runtime, "slab_nomerge" can be passed on the kernel
- command line.
-
-config SLAB_FREELIST_RANDOM
- bool "Randomize slab freelist"
- depends on SLAB || SLUB
- help
- Randomizes the freelist order used on creating new pages. This
- security feature reduces the predictability of the kernel slab
- allocator against heap overflows.
-
-config SLAB_FREELIST_HARDENED
- bool "Harden slab freelist metadata"
- depends on SLAB || SLUB
- help
- Many kernel heap attacks try to target slab cache metadata and
- other infrastructure. This options makes minor performance
- sacrifices to harden the kernel slab allocator against common
- freelist exploit methods. Some slab implementations have more
- sanity-checking than others. This option is most effective with
- CONFIG_SLUB.
-
-config SHUFFLE_PAGE_ALLOCATOR
- bool "Page allocator randomization"
- default SLAB_FREELIST_RANDOM && ACPI_NUMA
- help
- Randomization of the page allocator improves the average
- utilization of a direct-mapped memory-side-cache. See section
- 5.2.27 Heterogeneous Memory Attribute Table (HMAT) in the ACPI
- 6.2a specification for an example of how a platform advertises
- the presence of a memory-side-cache. There are also incidental
- security benefits as it reduces the predictability of page
- allocations to compliment SLAB_FREELIST_RANDOM, but the
- default granularity of shuffling on the "MAX_ORDER - 1" i.e,
- 10th order of pages is selected based on cache utilization
- benefits on x86.
-
- While the randomization improves cache utilization it may
- negatively impact workloads on platforms without a cache. For
- this reason, by default, the randomization is enabled only
- after runtime detection of a direct-mapped memory-side-cache.
- Otherwise, the randomization may be force enabled with the
- 'page_alloc.shuffle' kernel command line parameter.
-
- Say Y if unsure.
-
-config SLUB_CPU_PARTIAL
- default y
- depends on SLUB && SMP
- bool "SLUB per cpu partial cache"
- help
- Per cpu partial caches accelerate objects allocation and freeing
- that is local to a processor at the price of more indeterminism
- in the latency of the free. On overflow these caches will be cleared
- which requires the taking of locks that may cause latency spikes.
- Typically one would choose no for a realtime system.
-
config MMAP_ALLOW_UNINITIALIZED
bool "Allow mmapped anonymous memory to be uninitialized"
depends on EXPERT && !MMU
diff --git a/mm/Kconfig b/mm/Kconfig
index 02d44e3420f5..894858536e7f 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -2,6 +2,126 @@

menu "Memory Management options"

+#
+# For some reason microblaze and nios2 hard code SWAP=n. Hopefully we can
+# add proper SWAP support to them, in which case this can be remove.
+#
+config ARCH_NO_SWAP
+ bool
+
+config SWAP
+ bool "Support for paging of anonymous memory (swap)"
+ depends on MMU && BLOCK && !ARCH_NO_SWAP
+ default y
+ help
+ This option allows you to choose whether you want to have support
+ for so called swap devices or swap files in your kernel that are
+ used to provide more virtual memory than the actual RAM present
+ in your computer. If unsure say Y.
+
+choice
+ prompt "Choose SLAB allocator"
+ default SLUB
+ help
+ This option allows to select a slab allocator.
+
+config SLAB
+ bool "SLAB"
+ select HAVE_HARDENED_USERCOPY_ALLOCATOR
+ help
+ The regular slab allocator that is established and known to work
+ well in all environments. It organizes cache hot objects in
+ per cpu and per node queues.
+
+config SLUB
+ bool "SLUB (Unqueued Allocator)"
+ select HAVE_HARDENED_USERCOPY_ALLOCATOR
+ help
+ SLUB is a slab allocator that minimizes cache line usage
+ instead of managing queues of cached objects (SLAB approach).
+ Per cpu caching is realized using slabs of objects instead
+ of queues of objects. SLUB can use memory efficiently
+ and has enhanced diagnostics. SLUB is the default choice for
+ a slab allocator.
+
+config SLOB
+ depends on EXPERT
+ bool "SLOB (Simple Allocator)"
+ help
+ SLOB replaces the stock allocator with a drastically simpler
+ allocator. SLOB is generally more space efficient but
+ does not perform as well on large systems.
+
+endchoice
+
+config SLAB_MERGE_DEFAULT
+ bool "Allow slab caches to be merged"
+ default y
+ help
+ For reduced kernel memory fragmentation, slab caches can be
+ merged when they share the same size and other characteristics.
+ This carries a risk of kernel heap overflows being able to
+ overwrite objects from merged caches (and more easily control
+ cache layout), which makes such heap attacks easier to exploit
+ by attackers. By keeping caches unmerged, these kinds of exploits
+ can usually only damage objects in the same cache. To disable
+ merging at runtime, "slab_nomerge" can be passed on the kernel
+ command line.
+
+config SLAB_FREELIST_RANDOM
+ bool "Randomize slab freelist"
+ depends on SLAB || SLUB
+ help
+ Randomizes the freelist order used on creating new pages. This
+ security feature reduces the predictability of the kernel slab
+ allocator against heap overflows.
+
+config SLAB_FREELIST_HARDENED
+ bool "Harden slab freelist metadata"
+ depends on SLAB || SLUB
+ help
+ Many kernel heap attacks try to target slab cache metadata and
+ other infrastructure. This options makes minor performance
+ sacrifices to harden the kernel slab allocator against common
+ freelist exploit methods. Some slab implementations have more
+ sanity-checking than others. This option is most effective with
+ CONFIG_SLUB.
+
+config SHUFFLE_PAGE_ALLOCATOR
+ bool "Page allocator randomization"
+ default SLAB_FREELIST_RANDOM && ACPI_NUMA
+ help
+ Randomization of the page allocator improves the average
+ utilization of a direct-mapped memory-side-cache. See section
+ 5.2.27 Heterogeneous Memory Attribute Table (HMAT) in the ACPI
+ 6.2a specification for an example of how a platform advertises
+ the presence of a memory-side-cache. There are also incidental
+ security benefits as it reduces the predictability of page
+ allocations to compliment SLAB_FREELIST_RANDOM, but the
+ default granularity of shuffling on the "MAX_ORDER - 1" i.e,
+ 10th order of pages is selected based on cache utilization
+ benefits on x86.
+
+ While the randomization improves cache utilization it may
+ negatively impact workloads on platforms without a cache. For
+ this reason, by default, the randomization is enabled only
+ after runtime detection of a direct-mapped memory-side-cache.
+ Otherwise, the randomization may be force enabled with the
+ 'page_alloc.shuffle' kernel command line parameter.
+
+ Say Y if unsure.
+
+config SLUB_CPU_PARTIAL
+ default y
+ depends on SLUB && SMP
+ bool "SLUB per cpu partial cache"
+ help
+ Per cpu partial caches accelerate objects allocation and freeing
+ that is local to a processor at the price of more indeterminism
+ in the latency of the free. On overflow these caches will be cleared
+ which requires the taking of locks that may cause latency spikes.
+ Typically one would choose no for a realtime system.
+
config SELECT_MEMORY_MODEL
def_bool y
depends on ARCH_SELECT_MEMORY_MODEL
--
2.32.0


2021-08-19 19:55:36

by Johannes Weiner

Subject: [PATCH 3/4] mm: Kconfig: simplify zswap configuration

Clean up option ordering; make prompts and help text more concise and
actionable for non-developers; turn depends into selects where
possible, so that users can simply select the functionality they want
without having to chase down obscure code dependencies.
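
A minimal sketch of the difference (illustrative, condensed from the
diff below): with

	config ZSWAP
		bool "Compressed cache for swap pages (EXPERIMENTAL)"
		depends on FRONTSWAP && CRYPTO=y

ZSWAP stays invisible until the user has tracked down and enabled
FRONTSWAP and CRYPTO elsewhere in the config; with

	config ZSWAP
		bool "Compressed cache for swap pages (EXPERIMENTAL)"
		select FRONTSWAP
		select CRYPTO

enabling ZSWAP pulls those symbols in automatically.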

Signed-off-by: Johannes Weiner <[email protected]>
---
drivers/block/zram/Kconfig | 3 ++-
mm/Kconfig | 53 ++++++++++++++++++--------------------
2 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index 668c6bf2554d..e4163d4b936b 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -1,8 +1,9 @@
# SPDX-License-Identifier: GPL-2.0
config ZRAM
tristate "Compressed RAM block device support"
- depends on BLOCK && SYSFS && ZSMALLOC && CRYPTO
+ depends on BLOCK && SYSFS
depends on CRYPTO_LZO || CRYPTO_ZSTD || CRYPTO_LZ4 || CRYPTO_LZ4HC || CRYPTO_842
+ select ZSMALLOC
help
Creates virtual block devices called /dev/zramX (X = 0, 1, ...).
Pages written to these disks are compressed and stored in memory
diff --git a/mm/Kconfig b/mm/Kconfig
index dbceaa2a04a4..62c6e6092a0a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -21,9 +21,13 @@ menuconfig SWAP

if SWAP

+config ZPOOL
+ bool
+
config ZSWAP
bool "Compressed cache for swap pages (EXPERIMENTAL)"
- depends on FRONTSWAP && CRYPTO=y
+ select FRONTSWAP
+ select CRYPTO
select ZPOOL
help
A lightweight compressed cache for swap pages. It takes
@@ -39,8 +43,18 @@ config ZSWAP
they have not be fully explored on the large set of potential
configurations and workloads that exist.

+config ZSWAP_DEFAULT_ON
+ bool "Enable the compressed cache for swap pages by default"
+ depends on ZSWAP
+ help
+ If selected, the compressed cache for swap pages will be enabled
+ at boot, otherwise it will be disabled.
+
+ The selection made here can be overridden by using the kernel
+ command line 'zswap.enabled=' option.
+
choice
- prompt "Compressed cache for swap pages default compressor"
+ prompt "Default compressor"
depends on ZSWAP
default ZSWAP_COMPRESSOR_DEFAULT_LZO
help
@@ -106,7 +120,7 @@ config ZSWAP_COMPRESSOR_DEFAULT
default ""

choice
- prompt "Compressed cache for swap pages default allocator"
+ prompt "Default allocator"
depends on ZSWAP
default ZSWAP_ZPOOL_DEFAULT_ZBUD
help
@@ -146,24 +160,9 @@ config ZSWAP_ZPOOL_DEFAULT
default "zsmalloc" if ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
default ""

-config ZSWAP_DEFAULT_ON
- bool "Enable the compressed cache for swap pages by default"
- depends on ZSWAP
- help
- If selected, the compressed cache for swap pages will be enabled
- at boot, otherwise it will be disabled.
-
- The selection made here can be overridden by using the kernel
- command line 'zswap.enabled=' option.
-
-config ZPOOL
- tristate "Common API for compressed memory storage"
- help
- Compressed memory storage API. This allows using either zbud or
- zsmalloc.
-
config ZBUD
- tristate "Low (Up to 2x) density storage for compressed pages"
+ tristate "2:1 compression allocator (zbud)"
+ depends on ZSWAP
help
A special purpose allocator for storing compressed pages.
It is designed to store up to two compressed pages per physical
@@ -172,8 +171,8 @@ config ZBUD
density approach when reclaim will be used.

config Z3FOLD
- tristate "Up to 3x density storage for compressed pages"
- depends on ZPOOL
+ tristate "3:1 compression allocator (z3fold)"
+ depends on ZSWAP
help
A special purpose allocator for storing compressed pages.
It is designed to store up to three compressed pages per physical
@@ -181,15 +180,13 @@ config Z3FOLD
still there.

config ZSMALLOC
- tristate "Memory allocator for compressed pages"
+ tristate
+ prompt "N:1 compression allocator (zsmalloc)" if ZSWAP
depends on MMU
help
zsmalloc is a slab-based memory allocator designed to store
- compressed RAM pages. zsmalloc uses virtual memory mapping
- in order to reduce fragmentation. However, this results in a
- non-standard allocator interface where a handle, not a pointer, is
- returned by an alloc(). This handle must be mapped in order to
- access the allocated space.
+ pages of various compression levels efficiently. It achieves
+ the highest storage density with the least amount of fragmentation.

config ZSMALLOC_STAT
bool "Export zsmalloc statistics"
--
2.32.0

2021-08-19 19:55:47

by Johannes Weiner

Subject: [PATCH 4/4] mm: zswap: add basic meminfo and vmstat coverage

Currently it requires poking at debugfs to figure out the size of
the zswap cache on a host. There are no counters for reads and writes
against the cache. This makes it difficult to understand behavior on
production systems.

Print zswap memory consumption in /proc/meminfo, count zswapouts and
zswapins in /proc/vmstat.
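
With the patch applied and zswap active, the new fields would look
roughly like this (values illustrative):

	$ grep Zswap /proc/meminfo
	Zswap:              4096 kB
	$ grep zswp /proc/vmstat
	zswpin 1294
	zswpout 5438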

Signed-off-by: Johannes Weiner <[email protected]>
---
fs/proc/meminfo.c | 4 ++++
include/linux/swap.h | 4 ++++
include/linux/vm_event_item.h | 4 ++++
mm/vmstat.c | 4 ++++
mm/zswap.c | 11 +++++------
5 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 6fa761c9cc78..2dc474940691 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -86,6 +86,10 @@ static int meminfo_proc_show(struct seq_file *m, void *v)

show_val_kb(m, "SwapTotal: ", i.totalswap);
show_val_kb(m, "SwapFree: ", i.freeswap);
+#ifdef CONFIG_ZSWAP
+ seq_printf(m, "Zswap: %8lu kB\n",
+ (unsigned long)(zswap_pool_total_size >> 10));
+#endif
show_val_kb(m, "Dirty: ",
global_node_page_state(NR_FILE_DIRTY));
show_val_kb(m, "Writeback: ",
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 144727041e78..3b23c88b6a8d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -696,6 +696,10 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem)
}
#endif

+#ifdef CONFIG_ZSWAP
+extern u64 zswap_pool_total_size;
+#endif
+
#if defined(CONFIG_SWAP) && defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP)
extern void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask);
#else
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index ae0dd1948c2b..9dbebea09c69 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -125,6 +125,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
SWAP_RA,
SWAP_RA_HIT,
#endif
+#ifdef CONFIG_ZSWAP
+ ZSWPIN,
+ ZSWPOUT,
+#endif
#ifdef CONFIG_X86
DIRECT_MAP_LEVEL2_SPLIT,
DIRECT_MAP_LEVEL3_SPLIT,
diff --git a/mm/vmstat.c b/mm/vmstat.c
index cccee36b289c..31aada15c571 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1369,6 +1369,10 @@ const char * const vmstat_text[] = {
"swap_ra",
"swap_ra_hit",
#endif
+#ifdef CONFIG_ZSWAP
+ "zswpin",
+ "zswpout",
+#endif
#ifdef CONFIG_X86
"direct_map_level2_splits",
"direct_map_level3_splits",
diff --git a/mm/zswap.c b/mm/zswap.c
index 20763267a219..f93a7c715f76 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -40,7 +40,7 @@
* statistics
**********************************/
/* Total bytes used by the compressed storage */
-static u64 zswap_pool_total_size;
+u64 zswap_pool_total_size;
/* The number of compressed pages currently stored in zswap */
static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
/* The number of same-value filled pages currently stored in zswap */
@@ -1231,6 +1231,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
/* update stats */
atomic_inc(&zswap_stored_pages);
zswap_update_total_size();
+ count_vm_event(ZSWAPOUT);

return 0;

@@ -1273,11 +1274,10 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
zswap_fill_page(dst, entry->value);
kunmap_atomic(dst);
ret = 0;
- goto freeentry;
+ goto stats;
}

if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
-
tmp = kmalloc(entry->length, GFP_ATOMIC);
if (!tmp) {
ret = -ENOMEM;
@@ -1292,10 +1292,8 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
src += sizeof(struct zswap_header);

if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
-
memcpy(tmp, src, entry->length);
src = tmp;
-
zpool_unmap_handle(entry->pool->zpool, entry->handle);
}

@@ -1314,7 +1312,8 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
kfree(tmp);

BUG_ON(ret);
-
+stats:
+ count_vm_event(ZSWAPIN);
freeentry:
spin_lock(&tree->lock);
zswap_entry_put(tree, entry);
--
2.32.0

2021-08-19 19:56:06

by Johannes Weiner

Subject: [PATCH 2/4] mm: Kconfig: group swap, slab, hotplug and thp options into submenus

There are several clusters of related config options spread throughout
the mostly flat MM submenu. Group them together and put specialization
options into further subdirectories to make the MM section a bit more
organized and easier to navigate.
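
The Kconfig pattern used for this is a menuconfig symbol gating an
if/endif block; as a minimal sketch (illustrative, condensed from the
diff below):

	menuconfig SWAP
		bool "Support for paging of anonymous memory (swap)"

	if SWAP

	config ZSWAP
		bool "Compressed cache for swap pages (EXPERIMENTAL)"

	endif # SWAP

Options inside the if/endif block are only visible when the
menuconfig symbol is enabled, and the config tools present them as a
collapsible submenu.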

Signed-off-by: Johannes Weiner <[email protected]>
---
mm/Kconfig | 428 +++++++++++++++++++++++++++--------------------------
1 file changed, 222 insertions(+), 206 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 894858536e7f..dbceaa2a04a4 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -9,7 +9,7 @@ menu "Memory Management options"
config ARCH_NO_SWAP
bool

-config SWAP
+menuconfig SWAP
bool "Support for paging of anonymous memory (swap)"
depends on MMU && BLOCK && !ARCH_NO_SWAP
default y
@@ -19,6 +19,192 @@ config SWAP
used to provide more virtual memory than the actual RAM present
in your computer. If unsure say Y.

+if SWAP
+
+config ZSWAP
+ bool "Compressed cache for swap pages (EXPERIMENTAL)"
+ depends on FRONTSWAP && CRYPTO=y
+ select ZPOOL
+ help
+ A lightweight compressed cache for swap pages. It takes
+ pages that are in the process of being swapped out and attempts to
+ compress them into a dynamically allocated RAM-based memory pool.
+ This can result in a significant I/O reduction on swap device and,
+ in the case where decompressing from RAM is faster that swap device
+ reads, can also improve workload performance.
+
+ This is marked experimental because it is a new feature (as of
+ v3.11) that interacts heavily with memory reclaim. While these
+ interactions don't cause any known issues on simple memory setups,
+ they have not be fully explored on the large set of potential
+ configurations and workloads that exist.
+
+choice
+ prompt "Compressed cache for swap pages default compressor"
+ depends on ZSWAP
+ default ZSWAP_COMPRESSOR_DEFAULT_LZO
+ help
+ Selects the default compression algorithm for the compressed cache
+ for swap pages.
+
+ For an overview what kind of performance can be expected from
+ a particular compression algorithm please refer to the benchmarks
+ available at the following LWN page:
+ https://lwn.net/Articles/751795/
+
+ If in doubt, select 'LZO'.
+
+ The selection made here can be overridden by using the kernel
+ command line 'zswap.compressor=' option.
+
+config ZSWAP_COMPRESSOR_DEFAULT_DEFLATE
+ bool "Deflate"
+ select CRYPTO_DEFLATE
+ help
+ Use the Deflate algorithm as the default compression algorithm.
+
+config ZSWAP_COMPRESSOR_DEFAULT_LZO
+ bool "LZO"
+ select CRYPTO_LZO
+ help
+ Use the LZO algorithm as the default compression algorithm.
+
+config ZSWAP_COMPRESSOR_DEFAULT_842
+ bool "842"
+ select CRYPTO_842
+ help
+ Use the 842 algorithm as the default compression algorithm.
+
+config ZSWAP_COMPRESSOR_DEFAULT_LZ4
+ bool "LZ4"
+ select CRYPTO_LZ4
+ help
+ Use the LZ4 algorithm as the default compression algorithm.
+
+config ZSWAP_COMPRESSOR_DEFAULT_LZ4HC
+ bool "LZ4HC"
+ select CRYPTO_LZ4HC
+ help
+ Use the LZ4HC algorithm as the default compression algorithm.
+
+config ZSWAP_COMPRESSOR_DEFAULT_ZSTD
+ bool "zstd"
+ select CRYPTO_ZSTD
+ help
+ Use the zstd algorithm as the default compression algorithm.
+endchoice
+
+config ZSWAP_COMPRESSOR_DEFAULT
+ string
+ depends on ZSWAP
+ default "deflate" if ZSWAP_COMPRESSOR_DEFAULT_DEFLATE
+ default "lzo" if ZSWAP_COMPRESSOR_DEFAULT_LZO
+ default "842" if ZSWAP_COMPRESSOR_DEFAULT_842
+ default "lz4" if ZSWAP_COMPRESSOR_DEFAULT_LZ4
+ default "lz4hc" if ZSWAP_COMPRESSOR_DEFAULT_LZ4HC
+ default "zstd" if ZSWAP_COMPRESSOR_DEFAULT_ZSTD
+ default ""
+
+choice
+ prompt "Compressed cache for swap pages default allocator"
+ depends on ZSWAP
+ default ZSWAP_ZPOOL_DEFAULT_ZBUD
+ help
+ Selects the default allocator for the compressed cache for
+ swap pages.
+ The default is 'zbud' for compatibility, however please do
+ read the description of each of the allocators below before
+ making a right choice.
+
+ The selection made here can be overridden by using the kernel
+ command line 'zswap.zpool=' option.
+
+config ZSWAP_ZPOOL_DEFAULT_ZBUD
+ bool "zbud"
+ select ZBUD
+ help
+ Use the zbud allocator as the default allocator.
+
+config ZSWAP_ZPOOL_DEFAULT_Z3FOLD
+ bool "z3fold"
+ select Z3FOLD
+ help
+ Use the z3fold allocator as the default allocator.
+
+config ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
+ bool "zsmalloc"
+ select ZSMALLOC
+ help
+ Use the zsmalloc allocator as the default allocator.
+endchoice
+
+config ZSWAP_ZPOOL_DEFAULT
+ string
+ depends on ZSWAP
+ default "zbud" if ZSWAP_ZPOOL_DEFAULT_ZBUD
+ default "z3fold" if ZSWAP_ZPOOL_DEFAULT_Z3FOLD
+ default "zsmalloc" if ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
+ default ""
+
+config ZSWAP_DEFAULT_ON
+ bool "Enable the compressed cache for swap pages by default"
+ depends on ZSWAP
+ help
+ If selected, the compressed cache for swap pages will be enabled
+ at boot, otherwise it will be disabled.
+
+ The selection made here can be overridden by using the kernel
+ command line 'zswap.enabled=' option.
+
+config ZPOOL
+ tristate "Common API for compressed memory storage"
+ help
+ Compressed memory storage API. This allows using either zbud or
+ zsmalloc.
+
+config ZBUD
+ tristate "Low (Up to 2x) density storage for compressed pages"
+ help
+ A special purpose allocator for storing compressed pages.
+ It is designed to store up to two compressed pages per physical
+ page. While this design limits storage density, it has simple and
+ deterministic reclaim properties that make it preferable to a higher
+ density approach when reclaim will be used.
+
+config Z3FOLD
+ tristate "Up to 3x density storage for compressed pages"
+ depends on ZPOOL
+ help
+ A special purpose allocator for storing compressed pages.
+ It is designed to store up to three compressed pages per physical
+ page. It is a ZBUD derivative so the simplicity and determinism are
+ still there.
+
+config ZSMALLOC
+ tristate "Memory allocator for compressed pages"
+ depends on MMU
+ help
+ zsmalloc is a slab-based memory allocator designed to store
+ compressed RAM pages. zsmalloc uses virtual memory mapping
+ in order to reduce fragmentation. However, this results in a
+ non-standard allocator interface where a handle, not a pointer, is
+ returned by an alloc(). This handle must be mapped in order to
+ access the allocated space.
+
+config ZSMALLOC_STAT
+ bool "Export zsmalloc statistics"
+ depends on ZSMALLOC
+ select DEBUG_FS
+ help
+ This option enables code in the zsmalloc to collect various
+ statistics about what's happening in zsmalloc and exports that
+ information to userspace via debugfs.
+ If unsure, say N.
+
+endif # SWAP
+
+menu "SLAB allocator options"
+
choice
prompt "Choose SLAB allocator"
default SLUB
@@ -87,6 +273,19 @@ config SLAB_FREELIST_HARDENED
sanity-checking than others. This option is most effective with
CONFIG_SLUB.

+config SLUB_CPU_PARTIAL
+ default y
+ depends on SLUB && SMP
+ bool "SLUB per cpu partial cache"
+ help
+ Per cpu partial caches accelerate objects allocation and freeing
+ that is local to a processor at the price of more indeterminism
+ in the latency of the free. On overflow these caches will be cleared
+ which requires the taking of locks that may cause latency spikes.
+ Typically one would choose no for a realtime system.
+
+endmenu # SLAB allocator options
+
config SHUFFLE_PAGE_ALLOCATOR
bool "Page allocator randomization"
default SLAB_FREELIST_RANDOM && ACPI_NUMA
@@ -111,17 +310,6 @@ config SHUFFLE_PAGE_ALLOCATOR

Say Y if unsure.

-config SLUB_CPU_PARTIAL
- default y
- depends on SLUB && SMP
- bool "SLUB per cpu partial cache"
- help
- Per cpu partial caches accelerate objects allocation and freeing
- that is local to a processor at the price of more indeterminism
- in the latency of the free. On overflow these caches will be cleared
- which requires the taking of locks that may cause latency spikes.
- Typically one would choose no for a realtime system.
-
config SELECT_MEMORY_MODEL
def_bool y
depends on ARCH_SELECT_MEMORY_MODEL
@@ -272,14 +460,16 @@ config ARCH_ENABLE_MEMORY_HOTPLUG
bool

# eventually, we can have this option just 'select SPARSEMEM'
-config MEMORY_HOTPLUG
- bool "Allow for memory hot-add"
+menuconfig MEMORY_HOTPLUG
+ bool "Memory hotplug"
select MEMORY_ISOLATION
depends on SPARSEMEM || X86_64_ACPI_NUMA
depends on ARCH_ENABLE_MEMORY_HOTPLUG
depends on 64BIT || BROKEN
select NUMA_KEEP_MEMINFO if NUMA

+if MEMORY_HOTPLUG
+
config MEMORY_HOTPLUG_SPARSE
def_bool y
depends on SPARSEMEM && MEMORY_HOTPLUG
@@ -313,6 +503,8 @@ config MHP_MEMMAP_ON_MEMORY
depends on MEMORY_HOTPLUG && SPARSEMEM_VMEMMAP
depends on ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE

+endif # MEMORY_HOTPLUG
+
# Heavily threaded applications may benefit from splitting the mm-wide
# page_table_lock, so that faults on different parts of the user address
# space can be handled with less contention: split it at this NR_CPUS.
@@ -521,7 +713,7 @@ config NOMMU_INITIAL_TRIM_EXCESS

See Documentation/admin-guide/mm/nommu-mmap.rst for more information.

-config TRANSPARENT_HUGEPAGE
+menuconfig TRANSPARENT_HUGEPAGE
bool "Transparent Hugepage Support"
depends on HAVE_ARCH_TRANSPARENT_HUGEPAGE
select COMPACTION
@@ -536,6 +728,8 @@ config TRANSPARENT_HUGEPAGE

If memory constrained on embedded, you may want to say N.

+if TRANSPARENT_HUGEPAGE
+
choice
prompt "Transparent Hugepage Support sysfs defaults"
depends on TRANSPARENT_HUGEPAGE
@@ -573,6 +767,19 @@ config THP_SWAP

For selection by architectures with reasonable THP sizes.

+config READ_ONLY_THP_FOR_FS
+ bool "Read-only THP for filesystems (EXPERIMENTAL)"
+ depends on TRANSPARENT_HUGEPAGE && SHMEM
+
+ help
+ Allow khugepaged to put read-only file-backed pages in THP.
+
+ This is marked experimental because it is a new feature. Write
+ support of file THPs will be developed in the next few release
+ cycles.
+
+endif # TRANSPARENT_HUGEPAGE
+
#
# UP and nommu archs use km based percpu allocator
#
@@ -680,186 +887,6 @@ config MEM_SOFT_DIRTY

See Documentation/admin-guide/mm/soft-dirty.rst for more details.

-config ZSWAP
- bool "Compressed cache for swap pages (EXPERIMENTAL)"
- depends on FRONTSWAP && CRYPTO=y
- select ZPOOL
- help
- A lightweight compressed cache for swap pages. It takes
- pages that are in the process of being swapped out and attempts to
- compress them into a dynamically allocated RAM-based memory pool.
- This can result in a significant I/O reduction on swap device and,
- in the case where decompressing from RAM is faster that swap device
- reads, can also improve workload performance.
-
- This is marked experimental because it is a new feature (as of
- v3.11) that interacts heavily with memory reclaim. While these
- interactions don't cause any known issues on simple memory setups,
- they have not be fully explored on the large set of potential
- configurations and workloads that exist.
-
-choice
- prompt "Compressed cache for swap pages default compressor"
- depends on ZSWAP
- default ZSWAP_COMPRESSOR_DEFAULT_LZO
- help
- Selects the default compression algorithm for the compressed cache
- for swap pages.
-
- For an overview what kind of performance can be expected from
- a particular compression algorithm please refer to the benchmarks
- available at the following LWN page:
- https://lwn.net/Articles/751795/
-
- If in doubt, select 'LZO'.
-
- The selection made here can be overridden by using the kernel
- command line 'zswap.compressor=' option.
-
-config ZSWAP_COMPRESSOR_DEFAULT_DEFLATE
- bool "Deflate"
- select CRYPTO_DEFLATE
- help
- Use the Deflate algorithm as the default compression algorithm.
-
-config ZSWAP_COMPRESSOR_DEFAULT_LZO
- bool "LZO"
- select CRYPTO_LZO
- help
- Use the LZO algorithm as the default compression algorithm.
-
-config ZSWAP_COMPRESSOR_DEFAULT_842
- bool "842"
- select CRYPTO_842
- help
- Use the 842 algorithm as the default compression algorithm.
-
-config ZSWAP_COMPRESSOR_DEFAULT_LZ4
- bool "LZ4"
- select CRYPTO_LZ4
- help
- Use the LZ4 algorithm as the default compression algorithm.
-
-config ZSWAP_COMPRESSOR_DEFAULT_LZ4HC
- bool "LZ4HC"
- select CRYPTO_LZ4HC
- help
- Use the LZ4HC algorithm as the default compression algorithm.
-
-config ZSWAP_COMPRESSOR_DEFAULT_ZSTD
- bool "zstd"
- select CRYPTO_ZSTD
- help
- Use the zstd algorithm as the default compression algorithm.
-endchoice
-
-config ZSWAP_COMPRESSOR_DEFAULT
- string
- depends on ZSWAP
- default "deflate" if ZSWAP_COMPRESSOR_DEFAULT_DEFLATE
- default "lzo" if ZSWAP_COMPRESSOR_DEFAULT_LZO
- default "842" if ZSWAP_COMPRESSOR_DEFAULT_842
- default "lz4" if ZSWAP_COMPRESSOR_DEFAULT_LZ4
- default "lz4hc" if ZSWAP_COMPRESSOR_DEFAULT_LZ4HC
- default "zstd" if ZSWAP_COMPRESSOR_DEFAULT_ZSTD
- default ""
-
-choice
- prompt "Compressed cache for swap pages default allocator"
- depends on ZSWAP
- default ZSWAP_ZPOOL_DEFAULT_ZBUD
- help
- Selects the default allocator for the compressed cache for
- swap pages.
- The default is 'zbud' for compatibility, however please do
- read the description of each of the allocators below before
- making a right choice.
-
- The selection made here can be overridden by using the kernel
- command line 'zswap.zpool=' option.
-
-config ZSWAP_ZPOOL_DEFAULT_ZBUD
- bool "zbud"
- select ZBUD
- help
- Use the zbud allocator as the default allocator.
-
-config ZSWAP_ZPOOL_DEFAULT_Z3FOLD
- bool "z3fold"
- select Z3FOLD
- help
- Use the z3fold allocator as the default allocator.
-
-config ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
- bool "zsmalloc"
- select ZSMALLOC
- help
- Use the zsmalloc allocator as the default allocator.
-endchoice
-
-config ZSWAP_ZPOOL_DEFAULT
- string
- depends on ZSWAP
- default "zbud" if ZSWAP_ZPOOL_DEFAULT_ZBUD
- default "z3fold" if ZSWAP_ZPOOL_DEFAULT_Z3FOLD
- default "zsmalloc" if ZSWAP_ZPOOL_DEFAULT_ZSMALLOC
- default ""
-
-config ZSWAP_DEFAULT_ON
- bool "Enable the compressed cache for swap pages by default"
- depends on ZSWAP
- help
- If selected, the compressed cache for swap pages will be enabled
- at boot, otherwise it will be disabled.
-
- The selection made here can be overridden by using the kernel
- command line 'zswap.enabled=' option.
-
-config ZPOOL
- tristate "Common API for compressed memory storage"
- help
- Compressed memory storage API. This allows using either zbud or
- zsmalloc.
-
-config ZBUD
- tristate "Low (Up to 2x) density storage for compressed pages"
- help
- A special purpose allocator for storing compressed pages.
- It is designed to store up to two compressed pages per physical
- page. While this design limits storage density, it has simple and
- deterministic reclaim properties that make it preferable to a higher
- density approach when reclaim will be used.
-
-config Z3FOLD
- tristate "Up to 3x density storage for compressed pages"
- depends on ZPOOL
- help
- A special purpose allocator for storing compressed pages.
- It is designed to store up to three compressed pages per physical
- page. It is a ZBUD derivative so the simplicity and determinism are
- still there.
-
-config ZSMALLOC
- tristate "Memory allocator for compressed pages"
- depends on MMU
- help
- zsmalloc is a slab-based memory allocator designed to store
- compressed RAM pages. zsmalloc uses virtual memory mapping
- in order to reduce fragmentation. However, this results in a
- non-standard allocator interface where a handle, not a pointer, is
- returned by an alloc(). This handle must be mapped in order to
- access the allocated space.
-
-config ZSMALLOC_STAT
- bool "Export zsmalloc statistics"
- depends on ZSMALLOC
- select DEBUG_FS
- help
- This option enables code in the zsmalloc to collect various
- statistics about what's happening in zsmalloc and exports that
- information to userspace via debugfs.
- If unsure, say N.
-
config GENERIC_EARLY_IOREMAP
bool

@@ -988,17 +1015,6 @@ comment "GUP_TEST needs to have DEBUG_FS enabled"
config GUP_GET_PTE_LOW_HIGH
bool

-config READ_ONLY_THP_FOR_FS
- bool "Read-only THP for filesystems (EXPERIMENTAL)"
- depends on TRANSPARENT_HUGEPAGE && SHMEM
-
- help
- Allow khugepaged to put read-only file-backed pages in THP.
-
- This is marked experimental because it is a new feature. Write
- support of file THPs will be developed in the next few release
- cycles.
-
config ARCH_HAS_PTE_SPECIAL
bool

--
2.32.0

2021-08-24 11:49:06

by Vlastimil Babka

Subject: Re: [PATCH 1/4] mm: Kconfig: move swap and slab config options to the MM section

On 8/19/21 21:55, Johannes Weiner wrote:
> These are currently under General Setup. MM seems like a better fit.

Right. I've also been wondering about that occasionally.

> Signed-off-by: Johannes Weiner <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>


2021-08-24 12:05:51

by Vlastimil Babka

Subject: Re: [PATCH 2/4] mm: Kconfig: group swap, slab, hotplug and thp options into submenus

On 8/19/21 21:55, Johannes Weiner wrote:
> There are several clusters of related config options spread throughout
> the mostly flat MM submenu. Group them together and put specialization
> options into further subdirectories to make the MM section a bit more
> organized and easier to navigate.
>
> Signed-off-by: Johannes Weiner <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>

Note:

> -config ZBUD
> - tristate "Low (Up to 2x) density storage for compressed pages"
> - help
> - A special purpose allocator for storing compressed pages.
> - It is designed to store up to two compressed pages per physical
> - page. While this design limits storage density, it has simple and
> - deterministic reclaim properties that make it preferable to a higher
> - density approach when reclaim will be used.
> -

The whole large hunk with the deletion part of the block move will be rejected
in current trees, because this is apparently based on an older commit than
2a03085ce887 ("mm/zbud: don't export any zbud API"), which adds a "depends on
ZPOOL" to the above. The line is thus also missing from the add-hunk part of
the move, and if one is not careful when resolving the reject, the dependency
will end up missing from the result.
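
For reference, a correctly resolved result in a current tree would
presumably keep that line, i.e. roughly:

	config ZBUD
		tristate "Low (Up to 2x) density storage for compressed pages"
		depends on ZPOOL
		help
		  A special purpose allocator for storing compressed pages.
		  ...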

2021-08-24 12:07:25

by Vlastimil Babka

Subject: Re: [PATCH 3/4] mm: Kconfig: simplify zswap configuration

+CC zswap maintainers

On 8/19/21 21:55, Johannes Weiner wrote:
> Clean up option ordering; make prompts and help text more concise and
> actionable for non-developers; turn depends into selects where
> possible, so that users can simply select the functionality they want
> without having to chase down obscure code dependencies.
>
> Signed-off-by: Johannes Weiner <[email protected]>

2021-08-24 12:08:36

by Vlastimil Babka

Subject: Re: [PATCH 4/4] mm: zswap: add basic meminfo and vmstat coverage

+CC zswap maintainers

On 8/19/21 21:55, Johannes Weiner wrote:
> Currently it requires poking at debugfs to figure out the size of
> the zswap cache on a host. There are no counters for reads and writes
> against the cache. This makes it difficult to understand behavior on
> production systems.
>
> Print zswap memory consumption in /proc/meminfo, count zswapouts and
> zswapins in /proc/vmstat.
>
> Signed-off-by: Johannes Weiner <[email protected]>

2021-08-24 14:56:20

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH 2/4] mm: Kconfig: group swap, slab, hotplug and thp options into submenus

On Tue, Aug 24, 2021 at 02:03:43PM +0200, Vlastimil Babka wrote:
> On 8/19/21 21:55, Johannes Weiner wrote:
> > There are several clusters of related config options spread throughout
> > the mostly flat MM submenu. Group them together and put specialization
> > options into further subdirectories to make the MM section a bit more
> > organized and easier to navigate.
> >
> > Signed-off-by: Johannes Weiner <[email protected]>
>
> Acked-by: Vlastimil Babka <[email protected]>

Thanks, Vlastimil!

> Note:
>
> > -config ZBUD
> > - tristate "Low (Up to 2x) density storage for compressed pages"
> > - help
> > - A special purpose allocator for storing compressed pages.
> > - It is designed to store up to two compressed pages per physical
> > - page. While this design limits storage density, it has simple and
> > - deterministic reclaim properties that make it preferable to a higher
> > - density approach when reclaim will be used.
> > -
>
> The whole large hunk with the deletion part of the block move will be rejected
> in current trees, because this is apparently based on an older commit than
> 2a03085ce887 ("mm/zbud: don't export any zbud API"), which adds a "depends on
> ZPOOL" to the above. It's thus also missing from the add-hunk part of the move,
> and if one isn't careful when resolving the reject, the dependency will be
> missing from the result.

Thanks for the heads-up. Yeah, I forgot to rebase before sending from an
older branch; I'll be sure to do that (paying attention to the zpool
depends) before sending the next version.
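
For reference, a sketch of what the moved ZBUD entry would need to look
like once rebased past 2a03085ce887, per Vlastimil's note above — the
"depends on ZPOOL" line is the addition to watch for (help text copied
from the quoted hunk):

config ZBUD
	tristate "Low (Up to 2x) density storage for compressed pages"
	depends on ZPOOL
	help
	  A special purpose allocator for storing compressed pages.
	  It is designed to store up to two compressed pages per physical
	  page. While this design limits storage density, it has simple and
	  deterministic reclaim properties that make it preferable to a higher
	  density approach when reclaim will be used.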

2021-08-30 18:06:35

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 4/4] mm: zswap: add basic meminfo and vmstat coverage

Hi Johannes,

I love your patch! Yet something to improve:

[auto build test ERROR on linux/master]
[cannot apply to hnaz-linux-mm/master block/for-next linus/master v5.14 next-20210830]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Johannes-Weiner/mm-Kconfig-move-swap-and-slab-config-options-to-the-MM-section/20210820-035613
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 349a2d52ffe59b7a0c5876fa7ee9f3eaf188b830
config: x86_64-rhel-8.3 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
# https://github.com/0day-ci/linux/commit/216a0ba919927dccd2dd26d7af1e395f4360002f
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Johannes-Weiner/mm-Kconfig-move-swap-and-slab-config-options-to-the-MM-section/20210820-035613
git checkout 216a0ba919927dccd2dd26d7af1e395f4360002f
# save the attached .config to linux build tree
mkdir build_dir
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

mm/zswap.c: In function 'zswap_frontswap_store':
>> mm/zswap.c:1234:17: error: 'ZSWAPOUT' undeclared (first use in this function); did you mean 'ZSWPOUT'?
1234 | count_vm_event(ZSWAPOUT);
| ^~~~~~~~
| ZSWPOUT
mm/zswap.c:1234:17: note: each undeclared identifier is reported only once for each function it appears in
mm/zswap.c: In function 'zswap_frontswap_load':
>> mm/zswap.c:1316:17: error: 'ZSWAPIN' undeclared (first use in this function); did you mean 'ZSWPIN'?
1316 | count_vm_event(ZSWAPIN);
| ^~~~~~~
| ZSWPIN


vim +1234 mm/zswap.c

1080
1081 /*********************************
1082 * frontswap hooks
1083 **********************************/
1084 /* attempts to compress and store a single page */
1085 static int zswap_frontswap_store(unsigned type, pgoff_t offset,
1086 struct page *page)
1087 {
1088 struct zswap_tree *tree = zswap_trees[type];
1089 struct zswap_entry *entry, *dupentry;
1090 struct scatterlist input, output;
1091 struct crypto_acomp_ctx *acomp_ctx;
1092 int ret;
1093 unsigned int hlen, dlen = PAGE_SIZE;
1094 unsigned long handle, value;
1095 char *buf;
1096 u8 *src, *dst;
1097 struct zswap_header zhdr = { .swpentry = swp_entry(type, offset) };
1098 gfp_t gfp;
1099
1100 /* THP isn't supported */
1101 if (PageTransHuge(page)) {
1102 ret = -EINVAL;
1103 goto reject;
1104 }
1105
1106 if (!zswap_enabled || !tree) {
1107 ret = -ENODEV;
1108 goto reject;
1109 }
1110
1111 /* reclaim space if needed */
1112 if (zswap_is_full()) {
1113 struct zswap_pool *pool;
1114
1115 zswap_pool_limit_hit++;
1116 zswap_pool_reached_full = true;
1117 pool = zswap_pool_last_get();
1118 if (pool)
1119 queue_work(shrink_wq, &pool->shrink_work);
1120 ret = -ENOMEM;
1121 goto reject;
1122 }
1123
1124 if (zswap_pool_reached_full) {
1125 if (!zswap_can_accept()) {
1126 ret = -ENOMEM;
1127 goto reject;
1128 } else
1129 zswap_pool_reached_full = false;
1130 }
1131
1132 /* allocate entry */
1133 entry = zswap_entry_cache_alloc(GFP_KERNEL);
1134 if (!entry) {
1135 zswap_reject_kmemcache_fail++;
1136 ret = -ENOMEM;
1137 goto reject;
1138 }
1139
1140 if (zswap_same_filled_pages_enabled) {
1141 src = kmap_atomic(page);
1142 if (zswap_is_page_same_filled(src, &value)) {
1143 kunmap_atomic(src);
1144 entry->offset = offset;
1145 entry->length = 0;
1146 entry->value = value;
1147 atomic_inc(&zswap_same_filled_pages);
1148 goto insert_entry;
1149 }
1150 kunmap_atomic(src);
1151 }
1152
1153 /* if entry is successfully added, it keeps the reference */
1154 entry->pool = zswap_pool_current_get();
1155 if (!entry->pool) {
1156 ret = -EINVAL;
1157 goto freepage;
1158 }
1159
1160 /* compress */
1161 acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
1162
1163 mutex_lock(acomp_ctx->mutex);
1164
1165 dst = acomp_ctx->dstmem;
1166 sg_init_table(&input, 1);
1167 sg_set_page(&input, page, PAGE_SIZE, 0);
1168
1169 /* zswap_dstmem is of size (PAGE_SIZE * 2). Reflect same in sg_list */
1170 sg_init_one(&output, dst, PAGE_SIZE * 2);
1171 acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, dlen);
1172 /*
1173 * It may look a little silly that we send an asynchronous request
1174 * and then wait for its completion synchronously; the process is
1175 * in fact synchronous.
1176 * Theoretically, acomp lets users send multiple requests to one
1177 * acomp instance and have them completed simultaneously, but
1178 * frontswap stores and loads page by page, and there is no way
1179 * to send the second page before the first page is done in the
1180 * one thread doing frontswap.
1181 * Different threads on different cpus do have different acomp
1182 * instances, so multiple threads can (de)compress in parallel.
1183 */
1184 ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
1185 dlen = acomp_ctx->req->dlen;
1186
1187 if (ret) {
1188 ret = -EINVAL;
1189 goto put_dstmem;
1190 }
1191
1192 /* store */
1193 hlen = zpool_evictable(entry->pool->zpool) ? sizeof(zhdr) : 0;
1194 gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;
1195 if (zpool_malloc_support_movable(entry->pool->zpool))
1196 gfp |= __GFP_HIGHMEM | __GFP_MOVABLE;
1197 ret = zpool_malloc(entry->pool->zpool, hlen + dlen, gfp, &handle);
1198 if (ret == -ENOSPC) {
1199 zswap_reject_compress_poor++;
1200 goto put_dstmem;
1201 }
1202 if (ret) {
1203 zswap_reject_alloc_fail++;
1204 goto put_dstmem;
1205 }
1206 buf = zpool_map_handle(entry->pool->zpool, handle, ZPOOL_MM_RW);
1207 memcpy(buf, &zhdr, hlen);
1208 memcpy(buf + hlen, dst, dlen);
1209 zpool_unmap_handle(entry->pool->zpool, handle);
1210 mutex_unlock(acomp_ctx->mutex);
1211
1212 /* populate entry */
1213 entry->offset = offset;
1214 entry->handle = handle;
1215 entry->length = dlen;
1216
1217 insert_entry:
1218 /* map */
1219 spin_lock(&tree->lock);
1220 do {
1221 ret = zswap_rb_insert(&tree->rbroot, entry, &dupentry);
1222 if (ret == -EEXIST) {
1223 zswap_duplicate_entry++;
1224 /* remove from rbtree */
1225 zswap_rb_erase(&tree->rbroot, dupentry);
1226 zswap_entry_put(tree, dupentry);
1227 }
1228 } while (ret == -EEXIST);
1229 spin_unlock(&tree->lock);
1230
1231 /* update stats */
1232 atomic_inc(&zswap_stored_pages);
1233 zswap_update_total_size();
> 1234 count_vm_event(ZSWAPOUT);
1235
1236 return 0;
1237
1238 put_dstmem:
1239 mutex_unlock(acomp_ctx->mutex);
1240 zswap_pool_put(entry->pool);
1241 freepage:
1242 zswap_entry_cache_free(entry);
1243 reject:
1244 return ret;
1245 }
1246
1247 /*
1248 * returns 0 if the page was successfully decompressed
1249 * returns -1 on entry not found or error
1250 */
1251 static int zswap_frontswap_load(unsigned type, pgoff_t offset,
1252 struct page *page)
1253 {
1254 struct zswap_tree *tree = zswap_trees[type];
1255 struct zswap_entry *entry;
1256 struct scatterlist input, output;
1257 struct crypto_acomp_ctx *acomp_ctx;
1258 u8 *src, *dst, *tmp;
1259 unsigned int dlen;
1260 int ret;
1261
1262 /* find */
1263 spin_lock(&tree->lock);
1264 entry = zswap_entry_find_get(&tree->rbroot, offset);
1265 if (!entry) {
1266 /* entry was written back */
1267 spin_unlock(&tree->lock);
1268 return -1;
1269 }
1270 spin_unlock(&tree->lock);
1271
1272 if (!entry->length) {
1273 dst = kmap_atomic(page);
1274 zswap_fill_page(dst, entry->value);
1275 kunmap_atomic(dst);
1276 ret = 0;
1277 goto stats;
1278 }
1279
1280 if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
1281 tmp = kmalloc(entry->length, GFP_ATOMIC);
1282 if (!tmp) {
1283 ret = -ENOMEM;
1284 goto freeentry;
1285 }
1286 }
1287
1288 /* decompress */
1289 dlen = PAGE_SIZE;
1290 src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
1291 if (zpool_evictable(entry->pool->zpool))
1292 src += sizeof(struct zswap_header);
1293
1294 if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
1295 memcpy(tmp, src, entry->length);
1296 src = tmp;
1297 zpool_unmap_handle(entry->pool->zpool, entry->handle);
1298 }
1299
1300 acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
1301 mutex_lock(acomp_ctx->mutex);
1302 sg_init_one(&input, src, entry->length);
1303 sg_init_table(&output, 1);
1304 sg_set_page(&output, page, PAGE_SIZE, 0);
1305 acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
1306 ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
1307 mutex_unlock(acomp_ctx->mutex);
1308
1309 if (zpool_can_sleep_mapped(entry->pool->zpool))
1310 zpool_unmap_handle(entry->pool->zpool, entry->handle);
1311 else
1312 kfree(tmp);
1313
1314 BUG_ON(ret);
1315 stats:
> 1316 count_vm_event(ZSWAPIN);
1317 freeentry:
1318 spin_lock(&tree->lock);
1319 zswap_entry_put(tree, entry);
1320 spin_unlock(&tree->lock);
1321
1322 return ret;
1323 }
1324

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
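
The fix implied by the robot's suggestions is a straight rename at the two
call sites so they match the ZSWPIN/ZSWPOUT names actually added to
include/linux/vm_event_item.h — a minimal sketch, in zswap_frontswap_store():

-	count_vm_event(ZSWAPOUT);
+	count_vm_event(ZSWPOUT);

and in zswap_frontswap_load():

-	count_vm_event(ZSWAPIN);
+	count_vm_event(ZSWPIN);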



2021-08-30 18:52:56

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 4/4] mm: zswap: add basic meminfo and vmstat coverage

Hi Johannes,

On Thu, Aug 19, 2021 at 03:55:33PM -0400, Johannes Weiner wrote:
> Currently it requires poking at debugfs to figure out the size of the
> zswap cache on a host. There are no counters for reads and writes
> against the cache. This makes it difficult to understand behavior on
> production systems.
>
> Print zswap memory consumption in /proc/meminfo, count zswapouts and
> zswapins in /proc/vmstat.
>
> Signed-off-by: Johannes Weiner <[email protected]>
> ---
> fs/proc/meminfo.c | 4 ++++
> include/linux/swap.h | 4 ++++
> include/linux/vm_event_item.h | 4 ++++
> mm/vmstat.c | 4 ++++
> mm/zswap.c | 11 +++++------
> 5 files changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index 6fa761c9cc78..2dc474940691 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -86,6 +86,10 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>
> show_val_kb(m, "SwapTotal: ", i.totalswap);
> show_val_kb(m, "SwapFree: ", i.freeswap);
> +#ifdef CONFIG_ZSWAP
> + seq_printf(m, "Zswap: %8lu kB\n",
> + (unsigned long)(zswap_pool_total_size >> 10));

Since we have zram as well as zswap, it would be great if
we could abstract both at once without introducing a separate
"Zram:" entry in meminfo. A note: zram can back a filesystem on
the zram block device as well as swap, so the term would better
be "compressed" rather than "swap".

How about this?

"Compressed: xx kB"

unsigned long total_compressed_memory(void)
{
	return zswap_compressed_mem() + zram_compressed_mem();
}

> +#endif
> show_val_kb(m, "Dirty: ",
> global_node_page_state(NR_FILE_DIRTY));
> show_val_kb(m, "Writeback: ",
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 144727041e78..3b23c88b6a8d 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -696,6 +696,10 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem)
> }
> #endif
>
> +#ifdef CONFIG_ZSWAP
> +extern u64 zswap_pool_total_size;
> +#endif
> +
> #if defined(CONFIG_SWAP) && defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP)
> extern void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask);
> #else
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index ae0dd1948c2b..9dbebea09c69 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -125,6 +125,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> SWAP_RA,
> SWAP_RA_HIT,
> #endif
> +#ifdef CONFIG_ZSWAP
> + ZSWPIN,
> + ZSWPOUT,

INMEM_SWP[IN|OUT] to represent both zram and zswap?
Feel free to suggest a better word.

> +#endif
> #ifdef CONFIG_X86
> DIRECT_MAP_LEVEL2_SPLIT,
> DIRECT_MAP_LEVEL3_SPLIT,
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index cccee36b289c..31aada15c571 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1369,6 +1369,10 @@ const char * const vmstat_text[] = {
> "swap_ra",
> "swap_ra_hit",
> #endif
> +#ifdef CONFIG_ZSWAP
> + "zswpin",
> + "zswpout",
> +#endif
> #ifdef CONFIG_X86
> "direct_map_level2_splits",
> "direct_map_level3_splits",
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 20763267a219..f93a7c715f76 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -40,7 +40,7 @@
> * statistics
> **********************************/
> /* Total bytes used by the compressed storage */
> -static u64 zswap_pool_total_size;
> +u64 zswap_pool_total_size;
> /* The number of compressed pages currently stored in zswap */
> static atomic_t zswap_stored_pages = ATOMIC_INIT(0);
> /* The number of same-value filled pages currently stored in zswap */
> @@ -1231,6 +1231,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
> /* update stats */
> atomic_inc(&zswap_stored_pages);
> zswap_update_total_size();
> + count_vm_event(ZSWAPOUT);
>
> return 0;
>
> @@ -1273,11 +1274,10 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> zswap_fill_page(dst, entry->value);
> kunmap_atomic(dst);
> ret = 0;
> - goto freeentry;
> + goto stats;
> }
>
> if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
> -
> tmp = kmalloc(entry->length, GFP_ATOMIC);
> if (!tmp) {
> ret = -ENOMEM;
> @@ -1292,10 +1292,8 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> src += sizeof(struct zswap_header);
>
> if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
> -
> memcpy(tmp, src, entry->length);
> src = tmp;
> -
> zpool_unmap_handle(entry->pool->zpool, entry->handle);
> }
>
> @@ -1314,7 +1312,8 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> kfree(tmp);
>
> BUG_ON(ret);
> -
> +stats:
> + count_vm_event(ZSWAPIN);
> freeentry:
> spin_lock(&tree->lock);
> zswap_entry_put(tree, entry);
> --
> 2.32.0
>
>
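
A hypothetical sketch of Minchan's "Compressed:" suggestion as it might
land in meminfo_proc_show(); zswap_pool_total_size is the export from the
patch under discussion, while the zram helper is illustrative — no such
export exists today:

static u64 total_compressed_memory(void)
{
	u64 bytes = 0;

#ifdef CONFIG_ZSWAP
	bytes += zswap_pool_total_size;		/* exported by the patch above */
#endif
#ifdef CONFIG_ZRAM
	bytes += zram_compressed_mem();		/* illustrative placeholder */
#endif
	return bytes;
}

	/* in meminfo_proc_show(): */
	seq_printf(m, "Compressed:     %8lu kB\n",
		   (unsigned long)(total_compressed_memory() >> 10));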

2021-11-02 19:20:05

by Johannes Weiner

[permalink] [raw]
Subject: Re: [PATCH 4/4] mm: zswap: add basic meminfo and vmstat coverage

Hi Minchan,

Sorry about the delay, I'm just now getting back to these patches.

On Mon, Aug 30, 2021 at 11:49:59AM -0700, Minchan Kim wrote:
> Hi Johannes,
>
> On Thu, Aug 19, 2021 at 03:55:33PM -0400, Johannes Weiner wrote:
> > Currently it requires poking at debugfs to figure out the size of the
> > zswap cache on a host. There are no counters for reads and writes
> > against the cache. This makes it difficult to understand behavior on
> > production systems.
> >
> > Print zswap memory consumption in /proc/meminfo, count zswapouts and
> > zswapins in /proc/vmstat.
> >
> > Signed-off-by: Johannes Weiner <[email protected]>
> > ---
> > fs/proc/meminfo.c | 4 ++++
> > include/linux/swap.h | 4 ++++
> > include/linux/vm_event_item.h | 4 ++++
> > mm/vmstat.c | 4 ++++
> > mm/zswap.c | 11 +++++------
> > 5 files changed, 21 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> > index 6fa761c9cc78..2dc474940691 100644
> > --- a/fs/proc/meminfo.c
> > +++ b/fs/proc/meminfo.c
> > @@ -86,6 +86,10 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
> >
> > show_val_kb(m, "SwapTotal: ", i.totalswap);
> > show_val_kb(m, "SwapFree: ", i.freeswap);
> > +#ifdef CONFIG_ZSWAP
> > + seq_printf(m, "Zswap: %8lu kB\n",
> > + (unsigned long)(zswap_pool_total_size >> 10));
>
> Since we have zram as well as zswap, it would be great if
> we could abstract both at once without introducing a separate
> "Zram:" entry in meminfo. A note: zram can back a filesystem on
> the zram block device as well as swap, so the term would better
> be "compressed" rather than "swap".
>
> How about this?
>
> "Compressed: xx kB"

Wouldn't it make more sense to keep separate counters? Zswap and zram
are quite different from each other.

From an MM perspective, zram is an opaque storage backend. zswap OTOH
is an explicit MM cache stage which may in the future make different
decisions than zram, be integrated into vmscan's LRU hierarchy
etc. And in theory, you could put zswap with fast compression in front
of a zram device with denser compression, right?
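
An illustrative setup for the stacking just described, using the standard
zswap module parameters and zram sysfs attributes (the compressor names
depend on what the running kernel has enabled):

    # zswap as a fast front cache
    echo lz4 > /sys/module/zswap/parameters/compressor
    echo 1 > /sys/module/zswap/parameters/enabled

    # zram swap device with a denser compressor behind it
    modprobe zram
    echo zstd > /sys/block/zram0/comp_algorithm
    echo 4G > /sys/block/zram0/disksize
    mkswap /dev/zram0
    swapon /dev/zram0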

I agree zram should probably also have memory counters, but I think it
makes sense to recognize zswap as a unique MM layer.

2021-11-10 19:11:31

by Minchan Kim

[permalink] [raw]
Subject: Re: [PATCH 4/4] mm: zswap: add basic meminfo and vmstat coverage

Hi Johannes,

On Tue, Nov 02, 2021 at 11:06:17AM -0400, Johannes Weiner wrote:
> Hi Minchan,
>
> Sorry about the delay, I'm just now getting back to these patches.
>
> On Mon, Aug 30, 2021 at 11:49:59AM -0700, Minchan Kim wrote:
> > Hi Johannes,
> >
> > On Thu, Aug 19, 2021 at 03:55:33PM -0400, Johannes Weiner wrote:
> > > Currently it requires poking at debugfs to figure out the size of the
> > > zswap cache on a host. There are no counters for reads and writes
> > > against the cache. This makes it difficult to understand behavior on
> > > production systems.
> > >
> > > Print zswap memory consumption in /proc/meminfo, count zswapouts and
> > > zswapins in /proc/vmstat.
> > >
> > > Signed-off-by: Johannes Weiner <[email protected]>
> > > ---
> > > fs/proc/meminfo.c | 4 ++++
> > > include/linux/swap.h | 4 ++++
> > > include/linux/vm_event_item.h | 4 ++++
> > > mm/vmstat.c | 4 ++++
> > > mm/zswap.c | 11 +++++------
> > > 5 files changed, 21 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> > > index 6fa761c9cc78..2dc474940691 100644
> > > --- a/fs/proc/meminfo.c
> > > +++ b/fs/proc/meminfo.c
> > > @@ -86,6 +86,10 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
> > >
> > > show_val_kb(m, "SwapTotal: ", i.totalswap);
> > > show_val_kb(m, "SwapFree: ", i.freeswap);
> > > +#ifdef CONFIG_ZSWAP
> > > + seq_printf(m, "Zswap: %8lu kB\n",
> > > + (unsigned long)(zswap_pool_total_size >> 10));
> >
> > Since we have zram as well as zswap, it would be great if
> > we could abstract both at once without introducing a separate
> > "Zram:" entry in meminfo. A note: zram can back a filesystem on
> > the zram block device as well as swap, so the term would better
> > be "compressed" rather than "swap".
> >
> > How about this?
> >
> > "Compressed: xx kB"
>
> Wouldn't it make more sense to keep separate counters? Zswap and zram
> are quite different from each other.
>
> From an MM perspective, zram is an opaque storage backend. zswap OTOH
> is an explicit MM cache stage which may in the future make different
> decisions than zram, be integrated into vmscan's LRU hierarchy
> etc. And in theory, you could put zswap with fast compression in front
> of a zram device with denser compression, right?

My view is that the allocators aim to store compressed memory.
As with the slab allocator, we could use the allocator anywhere,
display the total memory usage from the allocator in meminfo
rather than per subsystem, and look at slabinfo if we need a
further breakdown. I think that could work for this case, too.
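
Concretely, the slab precedent: /proc/meminfo carries only the aggregate
"Slab:" figure, while /proc/slabinfo provides the per-cache breakdown.
Illustrative:

    $ grep '^Slab:' /proc/meminfo
    Slab:             123456 kB
    $ head -2 /proc/slabinfo
    slabinfo - version: 2.1
    # name            <active_objs> <num_objs> <objsize> ...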

>
> I agree zram should probably also have memory counters, but I think it
> makes sense to recognize zswap as a unique MM layer.

Under your view, I think it would introduce Zram-swap and Zram-block
entries as well as Zswap. If folks think that's the better idea, I am
fine with it and happy to post a patch to merge along with this one.