2023-11-03 16:04:44

by Alexander Potapenko

Subject: [PATCH v8 0/3] Implement MTE tag compression for swapped pages

Currently, when MTE pages are swapped out, the tags are kept in kernel
memory, occupying PAGE_SIZE/32 bytes per page. This is especially
problematic for devices that use zram-backed in-memory swap, because
tags stored uncompressed in the heap effectively reduce the available
amount of swap memory.

The RLE-based algorithm suggested by Evgenii Stepanov and implemented in
this patch series efficiently compresses fixed-size tag buffers,
achieving a practical compression ratio of 2x. In many cases it is
possible to store the compressed data in 63-bit XArray values, so
no extra memory allocations are needed.
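
For a quick idea of the interface, here is a minimal usage sketch of the
API added in patch 1/3 (variable names are illustrative, error handling
omitted):

	u8 tags[MTE_PAGE_TAG_STORAGE];	/* filled by mte_save_page_tags() */
	void *handle;

	handle = mte_compress(tags);	/* NULL if the tags don't fit in 63 bits */
	if (handle) {
		/* Bit 0 of handle is set: store handle instead of the buffer. */
	}

	/* ... later, when restoring the page ... */
	if (mte_is_compressed(handle))
		mte_decompress(handle, tags);	/* rebuilds the raw tag buffer */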

This patch series depends on "lib/bitmap: add bitmap_{read,write}()"
(https://lore.kernel.org/linux-arm-kernel/[email protected]/T/)
that is mailed separately.

v8:
- split off the bitmap_read()/bitmap_write() series
- simplified the compression logic (only compress data if it fits into
a pointer)

v7:
- addressed comments by Yury Norov, Andy Shevchenko, and Rasmus Villemoes
- added perf tests for bitmap_read()/bitmap_write()
- more efficient bitmap_write() implementation (meant to be sent in v5)

v6:
- addressed comments by Yury Norov
- fixed handling of sizes divisible by MTE_GRANULES_PER_PAGE / 2
(caught while testing on a real device)

v5:
- addressed comments by Andy Shevchenko, Catalin Marinas, and Yury Norov
- added support for 16K and 64K pages
- more efficient bitmap_write() implementation

v4:
- addressed comments by Andy Shevchenko and Yury Norov
- added Documentation/arch/arm64/mte-tag-compression.rst

v3:
- as suggested by Andy Shevchenko, use
bitmap_get_value()/bitmap_set_value() written by Syed Nayyar Waris
- switched to unsigned long to reduce typecasts
- simplified the compression code

v2:
- as suggested by Yury Norov, replace the poorly implemented struct
bitq with <linux/bitmap.h>



Alexander Potapenko (3):
arm64: mte: implement CONFIG_ARM64_MTE_COMP
arm64: mte: add a test for MTE tags compression
arm64: mte: add compression support to mteswap.c

Documentation/arch/arm64/index.rst | 1 +
.../arch/arm64/mte-tag-compression.rst | 154 ++++++++
arch/arm64/Kconfig | 21 +
arch/arm64/include/asm/mtecomp.h | 39 ++
arch/arm64/mm/Makefile | 2 +
arch/arm64/mm/mtecomp.c | 257 +++++++++++++
arch/arm64/mm/mtecomp.h | 12 +
arch/arm64/mm/mteswap.c | 88 ++++-
arch/arm64/mm/test_mtecomp.c | 364 ++++++++++++++++++
9 files changed, 934 insertions(+), 4 deletions(-)
create mode 100644 Documentation/arch/arm64/mte-tag-compression.rst
create mode 100644 arch/arm64/include/asm/mtecomp.h
create mode 100644 arch/arm64/mm/mtecomp.c
create mode 100644 arch/arm64/mm/mtecomp.h
create mode 100644 arch/arm64/mm/test_mtecomp.c

--
2.42.0.869.gea05f2083d-goog


2023-11-03 16:05:02

by Alexander Potapenko

Subject: [PATCH v8 1/3] arm64: mte: implement CONFIG_ARM64_MTE_COMP

This config option implements an algorithm that compresses ARM MTE memory
tags during swapping.

The algorithm is based on RLE and specifically targets buffers of tags
corresponding to a single page. In many cases a buffer can be compressed
into 63 bits, making it possible to store it without additional memory
allocation.
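
As an illustration (hypothetical tag values; the exact bit layout is
documented in Documentation/arch/arm64/mte-tag-compression.rst added by
this patch), a 4K page whose 256 tag granules form two runs - 200 granules
of tag 0x3 followed by 56 granules of tag 0x7 - compresses into a single
64-bit value:

	bit 0      : 1                  (compressed data marker)
	bits 1-3   : 0                  (largest_idx: the 200-granule run)
	bits 4-27  : 3, 7, 0, 0, 0, 0   (r_tags[], padded with zeros)
	bits 28-62 : 56, 0, 0, 0, 0     (r_sizes[] with the largest run dropped)
	bit 63     : 0

During decompression the dropped size is recovered as 256 - 56 = 200.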

Suggested-by: Evgenii Stepanov <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>

---
v8:
- As suggested by Catalin Marinas, only compress tags if they can be
stored inline. This simplifies the code drastically.
- Update the documentation.
- Split off patches introducing bitmap_read()/bitmap_write().

v6:
- shuffle bits in inline handles so that they can't be confused with
canonical pointers;
- use kmem_cache_zalloc() to allocate compressed storage
- correctly handle range size overflow
- minor documentation fixes, clarify the special cases

v5:
- make the code PAGE_SIZE-agnostic, remove hardcoded constants, update
the docs
- implement debugfs interface
- Address comments by Andy Shevchenko:
- update description of mtecomp.c
- remove redundant assignments, simplify mte_tags_to_ranges()
- various code simplifications
- introduce mtecomp.h
- add :export: to Documentation/arch/arm64/mte-tag-compression.rst

v4:
- Addressed comments by Andy Shevchenko:
- expanded "MTE" to "Memory Tagging Extension" in Kconfig
- fixed kernel-doc comments, moved them to C source
- changed variables to unsigned where applicable
- some code simplifications, fewer unnecessary assignments
- added the mte_largest_idx_bits() helper
- added namespace prefixes to all functions
- added missing headers (but removed bits.h)
- Addressed comments by Yury Norov:
- removed test-only functions from mtecomp.h
- dropped the algorithm name (all functions are now prefixed with
"mte")
- added more comments
- got rid of MTE_RANGES_INLINE
- renamed bitmap_{get,set}_value() to bitmap_{read,write}()
- moved the big comment explaining the algorithm to
Documentation/arch/arm64/mte-tag-compression.rst, expanded it,
add a link to it from Documentation/arch/arm64/index.rst
- removed hardcoded ranges from mte_alloc_size()/mte_size_to_ranges()

v3:
- Addressed comments by Andy Shevchenko:
- use bitmap_{set,get}_value() written by Syed Nayyar Waris
- switched to unsigned long everywhere (fewer casts)
- simplified the code, removed redundant checks
- dropped ea0_compress_inline()
- added bit size constants and helpers to access the bitmap
- explicitly initialize all compressed sizes in ea0_compress_to_buf()
- initialize all handle bits

v2:
- as suggested by Yury Norov, switched from struct bitq (which is
not needed anymore) to <linux/bitmap.h>
- add missing symbol exports
---
Documentation/arch/arm64/index.rst | 1 +
.../arch/arm64/mte-tag-compression.rst | 154 +++++++++++
arch/arm64/Kconfig | 11 +
arch/arm64/include/asm/mtecomp.h | 39 +++
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/mtecomp.c | 257 ++++++++++++++++++
arch/arm64/mm/mtecomp.h | 12 +
7 files changed, 475 insertions(+)
create mode 100644 Documentation/arch/arm64/mte-tag-compression.rst
create mode 100644 arch/arm64/include/asm/mtecomp.h
create mode 100644 arch/arm64/mm/mtecomp.c
create mode 100644 arch/arm64/mm/mtecomp.h

diff --git a/Documentation/arch/arm64/index.rst b/Documentation/arch/arm64/index.rst
index d08e924204bf1..bf6c1583233a9 100644
--- a/Documentation/arch/arm64/index.rst
+++ b/Documentation/arch/arm64/index.rst
@@ -19,6 +19,7 @@ ARM64 Architecture
legacy_instructions
memory
memory-tagging-extension
+ mte-tag-compression
perf
pointer-authentication
ptdump
diff --git a/Documentation/arch/arm64/mte-tag-compression.rst b/Documentation/arch/arm64/mte-tag-compression.rst
new file mode 100644
index 0000000000000..8fe6b51a9db6d
--- /dev/null
+++ b/Documentation/arch/arm64/mte-tag-compression.rst
@@ -0,0 +1,154 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================================
+Tag Compression for Memory Tagging Extension (MTE)
+==================================================
+
+This document describes the algorithm used to compress memory tags used by the
+ARM Memory Tagging Extension (MTE).
+
+Introduction
+============
+
+MTE assigns tags to memory pages: for 4K pages those tags occupy 128 bytes
+(256 4-bit tags each corresponding to a 16-byte MTE granule), for 16K pages -
+512 bytes, for 64K pages - 2048 bytes. By default, MTE carves out 3.125% (1/16)
+of the available physical memory to store the tags.
+
+When MTE pages are saved to swap, their tags need to be stored in the kernel
+memory. If the system swap is used heavily, these tags may take a substantial
+portion of the physical memory. To reduce memory waste, ``CONFIG_ARM64_MTE_COMP``
+allows the kernel to store the tags in compressed form.
+
+Implementation details
+======================
+
+The algorithm attempts to compress an array of ``MTE_PAGE_TAG_STORAGE``
+tag bytes into a byte sequence that can be stored in an 8-byte pointer. If that
+is not possible, the data is stored uncompressed.
+
+Tag manipulation and storage
+----------------------------
+
+Tags for swapped pages are stored in an XArray that maps swap entries to 63-bit
+values (see ``arch/arm64/mm/mteswap.c``). Bit 0 of these values indicates how
+their contents should be treated:
+
+ - 0: value is a pointer to an uncompressed buffer allocated with kmalloc()
+ (always the case if ``CONFIG_ARM64_MTE_COMP=n``) with the lowest bit set
+ to 0;
+ - 1: value contains compressed data.
+
+``arch/arm64/include/asm/mtecomp.h`` declares the following functions for
+manipulating the tags:
+
+- mte_compress() - compresses the given ``MTE_PAGE_TAG_STORAGE``-byte ``tags``
+ buffer into a pointer;
+- mte_decompress() - decompresses the tags from a pointer;
+- mte_is_compressed() - returns ``true`` iff the pointer passed to it should be
+ treated as compressed data.
+
+Tag compression
+---------------
+
+The compression algorithm is a variation of RLE (run-length encoding) and works
+as follows (we will be considering 4K pages and 128-byte tag buffers, but the
+same approach scales to 16K and 64K pages):
+
+1. The input array of 128 (``MTE_PAGE_TAG_STORAGE``) bytes is transformed into
+ tag ranges (two arrays: ``r_tags[]`` containing tag values and ``r_sizes[]``
+ containing range lengths) by mte_tags_to_ranges(). Note that
+ ``r_sizes[]`` sums up to 256 (``MTE_GRANULES_PER_PAGE``).
+
+ If ``r_sizes[]`` consists of a single element
+ (``{ MTE_GRANULES_PER_PAGE }``), the corresponding range is split into two
+ halves, i.e.::
+
+ r_sizes_new[2] = { MTE_GRANULES_PER_PAGE/2, MTE_GRANULES_PER_PAGE/2 };
+ r_tags_new[2] = { r_tags[0], r_tags[0] };
+
+2. The index of the largest element of ``r_sizes[]`` is stored in
+ ``largest_idx``. The element itself is dropped from ``r_sizes[]``,
+ because it can be reconstructed from the sum of the remaining elements. Note
+ that now none of the remaining ``r_sizes[]`` elements exceeds
+ ``MTE_GRANULES_PER_PAGE/2``.
+
+3. If the number ``N`` of ranges does not exceed ``6``, the ranges can be
+ compressed into 64 bits. This is done by storing the following values packed
+ into the pointer (``i<size>`` means a ``<size>``-bit unsigned integer)
+ treated as a bitmap (see ``include/linux/bitmap.h``)::
+
+ bit 0 : (always 1) : i1
+ bits 1-3 : largest_idx : i3
+ bits 4-27 : r_tags[0..5] : i4 x 6
+ bits 28-62 : r_sizes[0..4] : i7 x 5
+ bit 63 : (always 0) : i1
+
+ If ``N`` is less than 6, ``r_tags`` and ``r_sizes`` are padded with zero
+ values. The unused bits in the pointer, including bit 63, are also set to 0,
+ so the compressed data can be stored in the XArray.
+
+ A range size of ``MTE_GRANULES_PER_PAGE/2`` (there can be at most one among
+ the stored sizes) does not fit into i7 and is written as 0. This case is
+ handled separately by the decompression procedure.
+
+Tag decompression
+-----------------
+
+The decompression algorithm performs the steps below.
+
+1. Read the lowest bit of the data from the input buffer and check that it is 1,
+ otherwise bail out.
+
+2. Read ``largest_idx``, ``r_tags[]`` and ``r_sizes[]`` from the
+ input buffer.
+
+ If ``largest_idx`` is zero, and all ``r_sizes[]`` are zero, set
+ ``r_sizes[0] = MTE_GRANULES_PER_PAGE/2``.
+
+ Calculate the removed largest element of ``r_sizes[]`` as
+ ``largest = 256 - sum(r_sizes)`` and insert it into ``r_sizes`` at
+ position ``largest_idx``.
+
+3. For each ``r_sizes[i] > 0``, add a 4-bit value ``r_tags[i]`` to the output
+ buffer ``r_sizes[i]`` times.
+
+
+Why these numbers?
+------------------
+
+To be able to reconstruct ``N`` tag ranges from the compressed data, we need to
+store the indicator bit together with ``largest_idx``, ``r_tags[N]``, and
+``r_sizes[N-1]`` in 63 bits.
+Knowing that the sizes do not exceed ``MTE_PAGE_TAG_STORAGE``, each of them can be
+packed into ``S = ilog2(MTE_PAGE_TAG_STORAGE)`` bits, whereas a single tag occupies
+4 bits.
+
+It is evident that the number of ranges that can be stored in 63 bits is
+strictly less than 8, therefore we only need 3 bits to store ``largest_idx``.
+
+The maximum values of ``N`` such that the total number of storage bits,
+``1 + 3 + N * 4 + (N-1) * S``, does not exceed 63 are shown in the table below::
+
+ +-----------+-----------------+----+---+-------------------+
+ | Page size | Tag buffer size | S | N | Storage bits |
+ +-----------+-----------------+----+---+-------------------+
+ | 4 KB | 128 B | 7 | 6 | 63 = 1+3+6*4+5*7 |
+ | 16 KB | 512 B | 9 | 5 | 60 = 1+3+5*4+4*9 |
+ | 64 KB | 2048 B | 11 | 4 | 53 = 1+3+4*4+3*11 |
+ +-----------+-----------------+----+---+-------------------+
+
+Note
+----
+
+Tag compression and decompression implicitly rely on the fixed MTE tag size
+(4 bits) and number of tags per page. Should these values change, the algorithm
+may need to be revised.
+
+
+Programming Interface
+=====================
+
+ .. kernel-doc:: arch/arm64/include/asm/mtecomp.h
+ .. kernel-doc:: arch/arm64/mm/mtecomp.c
+ :export:
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 78f20e6327120..c2fbe8e492e00 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2073,6 +2073,17 @@ config ARM64_EPAN
if the cpu does not implement the feature.
endmenu # "ARMv8.7 architectural features"

+config ARM64_MTE_COMP
+ bool "Tag compression for ARM64 Memory Tagging Extension"
+ default y
+ depends on ARM64_MTE
+ help
+ Enable tag compression support for ARM64 Memory Tagging Extension.
+
+ Tag buffers corresponding to swapped RAM pages are compressed using
+ RLE to conserve heap memory. In the common case compressed tags
+ occupy 2.5x less memory.
+
config ARM64_SVE
bool "ARM Scalable Vector Extension support"
default y
diff --git a/arch/arm64/include/asm/mtecomp.h b/arch/arm64/include/asm/mtecomp.h
new file mode 100644
index 0000000000000..b9a3a921a38d4
--- /dev/null
+++ b/arch/arm64/include/asm/mtecomp.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_MTECOMP_H
+#define __ASM_MTECOMP_H
+
+#include <linux/types.h>
+
+/**
+ * mte_is_compressed() - check if the supplied pointer contains compressed tags.
+ * @ptr: pointer returned by kmalloc() or mte_compress().
+ *
+ * Returns: true iff bit 0 of @ptr is 1, which is only possible if @ptr was
+ * returned by mte_compress().
+ */
+static inline bool mte_is_compressed(void *ptr)
+{
+ return ((unsigned long)ptr & 1);
+}
+
+#if defined(CONFIG_ARM64_MTE_COMP)
+
+void *mte_compress(u8 *tags);
+bool mte_decompress(void *handle, u8 *tags);
+
+#else
+
+static inline void *mte_compress(u8 *tags)
+{
+ return NULL;
+}
+
+static inline bool mte_decompress(void *data, u8 *tags)
+{
+ return false;
+}
+
+#endif // CONFIG_ARM64_MTE_COMP
+
+#endif // __ASM_MTECOMP_H
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index dbd1bc95967d0..46778f6dd83c2 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_TRANS_TABLE) += trans_pgd.o
obj-$(CONFIG_TRANS_TABLE) += trans_pgd-asm.o
obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
obj-$(CONFIG_ARM64_MTE) += mteswap.o
+obj-$(CONFIG_ARM64_MTE_COMP) += mtecomp.o
KASAN_SANITIZE_physaddr.o += n

obj-$(CONFIG_KASAN) += kasan_init.o
diff --git a/arch/arm64/mm/mtecomp.c b/arch/arm64/mm/mtecomp.c
new file mode 100644
index 0000000000000..c948921525030
--- /dev/null
+++ b/arch/arm64/mm/mtecomp.c
@@ -0,0 +1,257 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * MTE tag compression algorithm.
+ * See Documentation/arch/arm64/mte-tag-compression.rst for more details.
+ */
+
+#include <linux/bits.h>
+#include <linux/bitmap.h>
+#include <linux/bitops.h>
+#include <linux/export.h>
+#include <linux/gfp.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/types.h>
+
+#include <asm/mtecomp.h>
+
+#include "mtecomp.h"
+
+#define MTE_BITS_PER_LARGEST_IDX 3
+/* Range size cannot exceed MTE_GRANULES_PER_PAGE / 2. */
+#define MTE_BITS_PER_SIZE (ilog2(MTE_GRANULES_PER_PAGE) - 1)
+
+/*
+ * See Documentation/arch/arm64/mte-tag-compression.rst for details on how the
+ * maximum number of ranges is calculated.
+ */
+#if defined(CONFIG_ARM64_4K_PAGES)
+#define MTE_MAX_RANGES 6
+#elif defined(CONFIG_ARM64_16K_PAGES)
+#define MTE_MAX_RANGES 5
+#else
+#define MTE_MAX_RANGES 4
+#endif
+
+/**
+ * mte_tags_to_ranges() - break @tags into arrays of tag ranges.
+ * @tags: MTE_GRANULES_PER_PAGE-byte array containing MTE tags.
+ * @out_tags: u8 array to store the tag of every range.
+ * @out_sizes: unsigned short array to store the size of every range.
+ * @out_len: in/out parameter: on entry, the length of @out_tags and
+ * @out_sizes; on exit, the number of ranges written to them.
+ *
+ * This function is exported for testing purposes.
+ */
+void mte_tags_to_ranges(u8 *tags, u8 *out_tags, unsigned short *out_sizes,
+ size_t *out_len)
+{
+ u8 prev_tag = tags[0] / 16; /* First tag in the array. */
+ unsigned int cur_idx = 0, i, j;
+ u8 cur_tag;
+
+ memset(out_tags, 0, array_size(*out_len, sizeof(*out_tags)));
+ memset(out_sizes, 0, array_size(*out_len, sizeof(*out_sizes)));
+
+ out_tags[cur_idx] = prev_tag;
+ for (i = 0; i < MTE_GRANULES_PER_PAGE; i++) {
+ j = i % 2;
+ cur_tag = j ? (tags[i / 2] % 16) : (tags[i / 2] / 16);
+ if (cur_tag == prev_tag) {
+ out_sizes[cur_idx]++;
+ } else {
+ cur_idx++;
+ prev_tag = cur_tag;
+ out_tags[cur_idx] = prev_tag;
+ out_sizes[cur_idx] = 1;
+ }
+ }
+ *out_len = cur_idx + 1;
+}
+EXPORT_SYMBOL_NS(mte_tags_to_ranges, MTECOMP);
+
+/**
+ * mte_ranges_to_tags() - fill @tags using given tag ranges.
+ * @r_tags: u8[] containing the tag of every range.
+ * @r_sizes: unsigned short[] containing the size of every range.
+ * @r_len: length of @r_tags and @r_sizes.
+ * @tags: MTE_GRANULES_PER_PAGE-byte array to write the tags to.
+ *
+ * This function is exported for testing purposes.
+ */
+void mte_ranges_to_tags(u8 *r_tags, unsigned short *r_sizes, size_t r_len,
+ u8 *tags)
+{
+ unsigned int i, j, pos = 0;
+ u8 prev;
+
+ for (i = 0; i < r_len; i++) {
+ for (j = 0; j < r_sizes[i]; j++) {
+ if (pos % 2)
+ tags[pos / 2] = (prev << 4) | r_tags[i];
+ else
+ prev = r_tags[i];
+ pos++;
+ }
+ }
+}
+EXPORT_SYMBOL_NS(mte_ranges_to_tags, MTECOMP);
+
+static void mte_bitmap_write(unsigned long *bitmap, unsigned long value,
+ unsigned long *pos, unsigned long bits)
+{
+ bitmap_write(bitmap, value, *pos, bits);
+ *pos += bits;
+}
+
+/* Compress ranges into an unsigned long. */
+static void mte_compress_to_ulong(size_t len, u8 *tags, unsigned short *sizes,
+ unsigned long *result)
+{
+ unsigned long bit_pos = 0;
+ unsigned int largest_idx, i;
+ unsigned short largest = 0;
+
+ for (i = 0; i < len; i++) {
+ if (sizes[i] > largest) {
+ largest = sizes[i];
+ largest_idx = i;
+ }
+ }
+ /* Bit 1 in position 0 indicates compressed data. */
+ mte_bitmap_write(result, 1, &bit_pos, 1);
+ mte_bitmap_write(result, largest_idx, &bit_pos,
+ MTE_BITS_PER_LARGEST_IDX);
+ for (i = 0; i < len; i++)
+ mte_bitmap_write(result, tags[i], &bit_pos, MTE_TAG_SIZE);
+ if (len == 1) {
+ /*
+ * We are compressing MTE_GRANULES_PER_PAGE of identical tags.
+ * Split it into two ranges containing
+ * MTE_GRANULES_PER_PAGE / 2 tags, so that it falls into the
+ * special case described below.
+ */
+ mte_bitmap_write(result, tags[0], &bit_pos, MTE_TAG_SIZE);
+ i = 2;
+ } else {
+ i = len;
+ }
+ for (; i < MTE_MAX_RANGES; i++)
+ mte_bitmap_write(result, 0, &bit_pos, MTE_TAG_SIZE);
+ /*
+ * Most of the time sizes[i] fits into MTE_BITS_PER_SIZE, apart from a
+ * special case when:
+ * len = 2;
+ * sizes = { MTE_GRANULES_PER_PAGE / 2, MTE_GRANULES_PER_PAGE / 2};
+ * In this case largest_idx will be set to 0, and the size written to
+ * the bitmap will be also 0.
+ */
+ for (i = 0; i < len; i++) {
+ if (i != largest_idx)
+ mte_bitmap_write(result, sizes[i], &bit_pos,
+ MTE_BITS_PER_SIZE);
+ }
+ for (i = len; i < MTE_MAX_RANGES; i++)
+ mte_bitmap_write(result, 0, &bit_pos, MTE_BITS_PER_SIZE);
+}
+
+/**
+ * mte_compress() - compress the given tag array.
+ * @tags: MTE_GRANULES_PER_PAGE-byte array to read the tags from.
+ *
+ * Attempts to compress the user-supplied tag array.
+ *
+ * Returns: compressed data or NULL.
+ */
+void *mte_compress(u8 *tags)
+{
+ unsigned short *r_sizes;
+ void *result = NULL;
+ u8 *r_tags;
+ size_t r_len;
+
+ r_sizes = kmalloc_array(MTE_GRANULES_PER_PAGE, sizeof(unsigned short),
+ GFP_KERNEL);
+ r_tags = kmalloc(MTE_GRANULES_PER_PAGE, GFP_KERNEL);
+ if (!r_sizes || !r_tags)
+ goto ret;
+ r_len = MTE_GRANULES_PER_PAGE;
+ mte_tags_to_ranges(tags, r_tags, r_sizes, &r_len);
+ if (r_len <= MTE_MAX_RANGES)
+ mte_compress_to_ulong(r_len, r_tags, r_sizes,
+ (unsigned long *)&result);
+ret:
+ kfree(r_tags);
+ kfree(r_sizes);
+ return result;
+}
+EXPORT_SYMBOL_NS(mte_compress, MTECOMP);
+
+static unsigned long mte_bitmap_read(const unsigned long *bitmap,
+ unsigned long *pos, unsigned long bits)
+{
+ unsigned long start = *pos;
+
+ *pos += bits;
+ return bitmap_read(bitmap, start, bits);
+}
+
+/**
+ * mte_decompress() - decompress the tag array from the given pointer.
+ * @data: pointer returned by mte_compress().
+ * @tags: MTE_GRANULES_PER_PAGE-byte array to write the tags to.
+ *
+ * Reads the compressed data and writes it into the user-supplied tag array.
+ *
+ * Returns: true on success, false if the passed data is uncompressed.
+ */
+bool mte_decompress(void *data, u8 *tags)
+{
+ unsigned short r_sizes[MTE_MAX_RANGES];
+ u8 r_tags[MTE_MAX_RANGES];
+ unsigned int largest_idx, i;
+ unsigned long bit_pos = 0;
+ unsigned long *bitmap;
+ unsigned short sum;
+ size_t max_ranges;
+
+ if (!mte_is_compressed(data))
+ return false;
+
+ bitmap = (unsigned long *)&data;
+ max_ranges = MTE_MAX_RANGES;
+ /* Skip the leading bit indicating the inline case. */
+ mte_bitmap_read(bitmap, &bit_pos, 1);
+ largest_idx =
+ mte_bitmap_read(bitmap, &bit_pos, MTE_BITS_PER_LARGEST_IDX);
+ if (largest_idx >= MTE_MAX_RANGES)
+ return false;
+
+ for (i = 0; i < max_ranges; i++)
+ r_tags[i] = mte_bitmap_read(bitmap, &bit_pos, MTE_TAG_SIZE);
+ for (i = 0, sum = 0; i < max_ranges; i++) {
+ if (i == largest_idx)
+ continue;
+ r_sizes[i] =
+ mte_bitmap_read(bitmap, &bit_pos, MTE_BITS_PER_SIZE);
+ /*
+ * Special case: tag array consists of two ranges of
+ * `MTE_GRANULES_PER_PAGE / 2` tags.
+ */
+ if ((largest_idx == 0) && (i == 1) && (r_sizes[i] == 0))
+ r_sizes[i] = MTE_GRANULES_PER_PAGE / 2;
+ if (!r_sizes[i]) {
+ max_ranges = i;
+ break;
+ }
+ sum += r_sizes[i];
+ }
+ if (sum >= MTE_GRANULES_PER_PAGE)
+ return false;
+ r_sizes[largest_idx] = MTE_GRANULES_PER_PAGE - sum;
+ mte_ranges_to_tags(r_tags, r_sizes, max_ranges, tags);
+ return true;
+}
+EXPORT_SYMBOL_NS(mte_decompress, MTECOMP);
diff --git a/arch/arm64/mm/mtecomp.h b/arch/arm64/mm/mtecomp.h
new file mode 100644
index 0000000000000..b94cf0384f2af
--- /dev/null
+++ b/arch/arm64/mm/mtecomp.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef ARCH_ARM64_MM_MTECOMP_H_
+#define ARCH_ARM64_MM_MTECOMP_H_
+
+/* Functions exported from mtecomp.c for test_mtecomp.c. */
+void mte_tags_to_ranges(u8 *tags, u8 *out_tags, unsigned short *out_sizes,
+ size_t *out_len);
+void mte_ranges_to_tags(u8 *r_tags, unsigned short *r_sizes, size_t r_len,
+ u8 *tags);
+
+#endif // ARCH_ARM64_MM_MTECOMP_H_
--
2.42.0.869.gea05f2083d-goog

2023-11-03 16:06:25

by Alexander Potapenko

Subject: [PATCH v8 3/3] arm64: mte: add compression support to mteswap.c

Update mteswap.c to perform inline compression of memory tags when
possible.

If CONFIG_ARM64_MTE_COMP is enabled, mteswap.c will attemt to compress
saved tags for a struct page and store them directly in Xarray entry
instead of wasting heap space.

Soon after booting Android, tag compression saves ~2x memory previously
spent by mteswap.c on tag allocations. On a moderately loaded device with
~20% tagged pages, this leads to saving several megabytes of kernel heap:

# cat /sys/kernel/debug/mteswap/stats
8 bytes: 102496 allocations, 67302 deallocations
128 bytes: 212234 allocations, 178278 deallocations
uncompressed tag storage size: 8851200
compressed tag storage size: 4346368
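
(The two totals follow from the per-size counters above, as computed by
stats_show() below: the uncompressed size is the number of live allocations
of either kind times 128 bytes, (102496 + 212234 - 67302 - 178278) * 128 =
8851200, while the compressed size counts only the live out-of-line 128-byte
buffers, (212234 - 178278) * 128 = 4346368.)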

Signed-off-by: Alexander Potapenko <[email protected]>

---
v8:
- adapt to the new compression API, abandon mteswap_{no,}comp.c
- move stats collection to mteswap.c

v5:
- drop a dead variable from _mte_free_saved_tags() in mteswap_comp.c
- ensure MTE compression works with arbitrary page sizes
- update patch description

v4:
- minor code simplifications suggested by Andy Shevchenko, added
missing header dependencies
- changed compression API names to reflect modifications made to
memcomp.h (as suggested by Yury Norov)

v3:
- Addressed comments by Andy Shevchenko in another patch:
- fixed includes order
- replaced u64 with unsigned long
- added MODULE_IMPORT_NS(MTECOMP)
---
arch/arm64/mm/mteswap.c | 88 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 84 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c
index a31833e3ddc54..0f558942d88b8 100644
--- a/arch/arm64/mm/mteswap.c
+++ b/arch/arm64/mm/mteswap.c
@@ -1,28 +1,48 @@
// SPDX-License-Identifier: GPL-2.0-only

+#include <linux/debugfs.h>
#include <linux/pagemap.h>
#include <linux/xarray.h>
#include <linux/slab.h>
#include <linux/swap.h>
#include <linux/swapops.h>
#include <asm/mte.h>
+#include <asm/mtecomp.h>
+#include "mtecomp.h"
+
+enum mteswap_counters {
+ MTESWAP_CTR_INLINE = 0,
+ MTESWAP_CTR_NOINLINE,
+ MTESWAP_CTR_SIZE
+};
+static atomic_long_t alloc_counters[MTESWAP_CTR_SIZE];
+static atomic_long_t dealloc_counters[MTESWAP_CTR_SIZE];

static DEFINE_XARRAY(mte_pages);

void *mte_allocate_tag_storage(void)
{
- /* tags granule is 16 bytes, 2 tags stored per byte */
- return kmalloc(MTE_PAGE_TAG_STORAGE, GFP_KERNEL);
+ void *ret;
+
+ ret = kmalloc(MTE_PAGE_TAG_STORAGE, GFP_KERNEL);
+ if (ret)
+ atomic_long_inc(&alloc_counters[MTESWAP_CTR_NOINLINE]);
+ return ret;
}

void mte_free_tag_storage(char *storage)
{
- kfree(storage);
+ if (!mte_is_compressed(storage)) {
+ kfree(storage);
+ atomic_long_dec(&alloc_counters[MTESWAP_CTR_NOINLINE]);
+ } else {
+ atomic_long_dec(&alloc_counters[MTESWAP_CTR_INLINE]);
+ }
}

int mte_save_tags(struct page *page)
{
- void *tag_storage, *ret;
+ void *tag_storage, *ret, *compressed;

if (!page_mte_tagged(page))
return 0;
@@ -32,6 +52,12 @@ int mte_save_tags(struct page *page)
return -ENOMEM;

mte_save_page_tags(page_address(page), tag_storage);
+ compressed = mte_compress(tag_storage);
+ if (compressed) {
+ mte_free_tag_storage(tag_storage);
+ tag_storage = (void *)compressed;
+ atomic_long_inc(&alloc_counters[MTESWAP_CTR_INLINE]);
+ }

/* lookup the swap entry.val from the page */
ret = xa_store(&mte_pages, page_swap_entry(page).val, tag_storage,
@@ -50,13 +76,20 @@ int mte_save_tags(struct page *page)
void mte_restore_tags(swp_entry_t entry, struct page *page)
{
void *tags = xa_load(&mte_pages, entry.val);
+ void *tag_storage = NULL;

if (!tags)
return;

if (try_page_mte_tagging(page)) {
+ if (mte_is_compressed(tags)) {
+ tag_storage = mte_allocate_tag_storage();
+ mte_decompress(tags, tag_storage);
+ tags = tag_storage;
+ }
mte_restore_page_tags(page_address(page), tags);
set_page_mte_tagged(page);
+ mte_free_tag_storage(tag_storage);
}
}

@@ -83,3 +116,50 @@ void mte_invalidate_tags_area(int type)
}
xa_unlock(&mte_pages);
}
+
+/* DebugFS interface. */
+static int stats_show(struct seq_file *seq, void *v)
+{
+ unsigned long total_mem_alloc = 0, total_mem_dealloc = 0;
+ unsigned long total_num_alloc = 0, total_num_dealloc = 0;
+ unsigned long sizes[2] = { 8, MTE_PAGE_TAG_STORAGE };
+ long alloc, dealloc;
+ unsigned long size;
+ int i;
+
+ for (i = 0; i < MTESWAP_CTR_SIZE; i++) {
+ alloc = atomic_long_read(&alloc_counters[i]);
+ dealloc = atomic_long_read(&dealloc_counters[i]);
+ total_num_alloc += alloc;
+ total_num_dealloc += dealloc;
+ size = sizes[i];
+ /*
+ * Do not count 8-byte buffers towards compressed tag storage
+ * size.
+ */
+ if (i) {
+ total_mem_alloc += (size * alloc);
+ total_mem_dealloc += (size * dealloc);
+ }
+ seq_printf(seq,
+ "%lu bytes: %lu allocations, %lu deallocations\n",
+ size, alloc, dealloc);
+ }
+ seq_printf(seq, "uncompressed tag storage size: %lu\n",
+ (total_num_alloc - total_num_dealloc) *
+ MTE_PAGE_TAG_STORAGE);
+ seq_printf(seq, "compressed tag storage size: %lu\n",
+ total_mem_alloc - total_mem_dealloc);
+ return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(stats);
+
+static int mteswap_init(void)
+{
+ struct dentry *mteswap_dir;
+
+ mteswap_dir = debugfs_create_dir("mteswap", NULL);
+ debugfs_create_file("stats", 0444, mteswap_dir, NULL, &stats_fops);
+ return 0;
+}
+module_init(mteswap_init);
--
2.42.0.869.gea05f2083d-goog

2023-11-03 22:05:34

by Yury Norov

Subject: Re: [PATCH v8 3/3] arm64: mte: add compression support to mteswap.c

On Fri, Nov 03, 2023 at 05:03:35PM +0100, Alexander Potapenko wrote:
> Update mteswap.c to perform inline compression of memory tags when
> possible.
>
> If CONFIG_ARM64_MTE_COMP is enabled, mteswap.c will attemt to compress

attempt?

> saved tags for a struct page and store them directly in Xarray entry
> instead of wasting heap space.
>
> Soon after booting Android, tag compression saves ~2x memory previously
> spent by mteswap.c on tag allocations. On a moderately loaded device with
> ~20% tagged pages, this leads to saving several megabytes of kernel heap:
>
> # cat /sys/kernel/debug/mteswap/stats
> 8 bytes: 102496 allocations, 67302 deallocations
> 128 bytes: 212234 allocations, 178278 deallocations
> uncompressed tag storage size: 8851200
> compressed tag storage size: 4346368

Can you align them like this:

# cat /sys/kernel/debug/mteswap/stats
  8 bytes: 102496 allocations,  67302 deallocations
128 bytes: 212234 allocations, 178278 deallocations
uncompressed tag storage size: 8851200
  compressed tag storage size: 4346368

And also, can you mention a new file in the documentation?

IIRC, it was my suggestion to measure some stats... If so, can you add:

Suggested-by: Yury Norov <[email protected]> # for stats

> Signed-off-by: Alexander Potapenko <[email protected]>
>
> ---
> v8:
> - adapt to the new compression API, abandon mteswap_{no,}comp.c
> - move stats collection to mteswap.c
>
> v5:
> - drop a dead variable from _mte_free_saved_tags() in mteswap_comp.c
> - ensure MTE compression works with arbitrary page sizes
> - update patch description
>
> v4:
> - minor code simplifications suggested by Andy Shevchenko, added
> missing header dependencies
> - changed compression API names to reflect modifications made to
> memcomp.h (as suggested by Yury Norov)
>
> v3:
> - Addressed comments by Andy Shevchenko in another patch:
> - fixed includes order
> - replaced u64 with unsigned long
> - added MODULE_IMPORT_NS(MTECOMP)
> ---
> arch/arm64/mm/mteswap.c | 88 +++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 84 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c
> index a31833e3ddc54..0f558942d88b8 100644
> --- a/arch/arm64/mm/mteswap.c
> +++ b/arch/arm64/mm/mteswap.c
> @@ -1,28 +1,48 @@
> // SPDX-License-Identifier: GPL-2.0-only
>
> +#include <linux/debugfs.h>
> #include <linux/pagemap.h>
> #include <linux/xarray.h>
> #include <linux/slab.h>
> #include <linux/swap.h>
> #include <linux/swapops.h>
> #include <asm/mte.h>
> +#include <asm/mtecomp.h>
> +#include "mtecomp.h"
> +
> +enum mteswap_counters {
> + MTESWAP_CTR_INLINE = 0,
> + MTESWAP_CTR_NOINLINE,
> + MTESWAP_CTR_SIZE
> +};
> +static atomic_long_t alloc_counters[MTESWAP_CTR_SIZE];
> +static atomic_long_t dealloc_counters[MTESWAP_CTR_SIZE];

I think it's worth adding them and all the bookkeeping code in
a separate patch? Also can you consider making them configurable,
maybe depending on CONFIG_ARM64_MTE_SWAP_STAT?..

> static DEFINE_XARRAY(mte_pages);
>
> void *mte_allocate_tag_storage(void)
> {
> - /* tags granule is 16 bytes, 2 tags stored per byte */
> - return kmalloc(MTE_PAGE_TAG_STORAGE, GFP_KERNEL);
> + void *ret;
> +
> + ret = kmalloc(MTE_PAGE_TAG_STORAGE, GFP_KERNEL);
> + if (ret)
> + atomic_long_inc(&alloc_counters[MTESWAP_CTR_NOINLINE]);
> + return ret;
> }
>
> void mte_free_tag_storage(char *storage)

If you use the term 'free' here, the counter name should probably
reflect that.

> {
> - kfree(storage);
> + if (!mte_is_compressed(storage)) {
> + kfree(storage);
> + atomic_long_dec(&alloc_counters[MTESWAP_CTR_NOINLINE]);

s/NOINLINE/OUTLINE ?

> + } else {
> + atomic_long_dec(&alloc_counters[MTESWAP_CTR_INLINE]);
> + }
> }
>
> int mte_save_tags(struct page *page)
> {
> - void *tag_storage, *ret;
> + void *tag_storage, *ret, *compressed;
>
> if (!page_mte_tagged(page))
> return 0;
> @@ -32,6 +52,12 @@ int mte_save_tags(struct page *page)
> return -ENOMEM;
>
> mte_save_page_tags(page_address(page), tag_storage);
> + compressed = mte_compress(tag_storage);
> + if (compressed) {
> + mte_free_tag_storage(tag_storage);
> + tag_storage = (void *)compressed;

But 'compressed' is already 'void *', so why the typecast?
Also, it's bad naming - an adjective implies a bool type. I'd name it
'compressed_tag', or similar.

> + atomic_long_inc(&alloc_counters[MTESWAP_CTR_INLINE]);
> + }

Is it possible to move all of this conditional logic inside the function call?
I feel like it should be a single line:

tag_storage = mte_compress(tag_storage);

So that client code will be free of housekeeping details.
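
A minimal sketch of what that could look like (hypothetical helper local to
mteswap.c, not part of the posted series):

	static void *mte_compress_tag_storage(void *tag_storage)
	{
		void *handle = mte_compress(tag_storage);

		if (!handle)
			return tag_storage;	/* not compressible, keep the buffer */
		mte_free_tag_storage(tag_storage);
		atomic_long_inc(&alloc_counters[MTESWAP_CTR_INLINE]);
		return handle;			/* bit 0 set: inline handle */
	}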

>
> /* lookup the swap entry.val from the page */
> ret = xa_store(&mte_pages, page_swap_entry(page).val, tag_storage,
> @@ -50,13 +76,20 @@ int mte_save_tags(struct page *page)
> void mte_restore_tags(swp_entry_t entry, struct page *page)
> {
> void *tags = xa_load(&mte_pages, entry.val);
> + void *tag_storage = NULL;
>
> if (!tags)
> return;
>
> if (try_page_mte_tagging(page)) {
> + if (mte_is_compressed(tags)) {
> + tag_storage = mte_allocate_tag_storage();
> + mte_decompress(tags, tag_storage);
> + tags = tag_storage;
> + }

Same here, if it's possible, I'd prefer a single line call instead of
the above chunk:

tags = mte_decompress(tags);
if (!tags)
return;

And make sure that the decompress code takes care of the details.
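
Something along these lines, perhaps (again a hypothetical helper, just to
illustrate the idea):

	static void *mte_decompress_tag_storage(void *tags)
	{
		void *buf;

		if (!mte_is_compressed(tags))
			return tags;		/* already a plain tag buffer */
		buf = mte_allocate_tag_storage();
		if (buf)
			mte_decompress(tags, buf);
		return buf;			/* NULL on allocation failure */
	}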

This would have a clear meaning: we add a compression/decompression
option which is as simple as calling the corresponding functions in
mte_save/restore_tags(), and the rest of the MTE code is untouched.

Thanks,
Yury

> mte_restore_page_tags(page_address(page), tags);
> set_page_mte_tagged(page);
> + mte_free_tag_storage(tag_storage);
> }
> }
>
> @@ -83,3 +116,50 @@ void mte_invalidate_tags_area(int type)
> }
> xa_unlock(&mte_pages);
> }
> +
> +/* DebugFS interface. */
> +static int stats_show(struct seq_file *seq, void *v)
> +{
> + unsigned long total_mem_alloc = 0, total_mem_dealloc = 0;
> + unsigned long total_num_alloc = 0, total_num_dealloc = 0;
> + unsigned long sizes[2] = { 8, MTE_PAGE_TAG_STORAGE };
> + long alloc, dealloc;
> + unsigned long size;
> + int i;
> +
> + for (i = 0; i < MTESWAP_CTR_SIZE; i++) {
> + alloc = atomic_long_read(&alloc_counters[i]);
> + dealloc = atomic_long_read(&dealloc_counters[i]);
> + total_num_alloc += alloc;
> + total_num_dealloc += dealloc;
> + size = sizes[i];
> + /*
> + * Do not count 8-byte buffers towards compressed tag storage
> + * size.
> + */
> + if (i) {
> + total_mem_alloc += (size * alloc);
> + total_mem_dealloc += (size * dealloc);
> + }
> + seq_printf(seq,
> + "%lu bytes: %lu allocations, %lu deallocations\n",
> + size, alloc, dealloc);
> + }
> + seq_printf(seq, "uncompressed tag storage size: %lu\n",
> + (total_num_alloc - total_num_dealloc) *
> + MTE_PAGE_TAG_STORAGE);
> + seq_printf(seq, "compressed tag storage size: %lu\n",
> + total_mem_alloc - total_mem_dealloc);
> + return 0;
> +}
> +DEFINE_SHOW_ATTRIBUTE(stats);
> +
> +static int mteswap_init(void)
> +{
> + struct dentry *mteswap_dir;
> +
> + mteswap_dir = debugfs_create_dir("mteswap", NULL);
> + debugfs_create_file("stats", 0444, mteswap_dir, NULL, &stats_fops);
> + return 0;
> +}
> +module_init(mteswap_init);
> --
> 2.42.0.869.gea05f2083d-goog