2023-10-24 13:47:11

by Suren Baghdasaryan

Subject: [PATCH v2 00/39] Memory allocation profiling

Updates since the last version [1]:
- Simplified allocation tagging macros;
- Runtime enable/disable sysctl switch (/proc/sys/vm/mem_profiling)
instead of kernel command-line option;
- CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT to select the default enable state;
- Changed the user-facing API from debugfs to procfs (/proc/allocinfo);
- Removed context capture support to make patch incremental;
- Renamed uninstrumented allocation functions to use _noprof suffix;
- Added __GFP_LAST_BIT to make the code cleaner;
- Removed lazy per-cpu counters; it turned out the memory savings were
minimal and not worth the performance impact;

Things we could not address:
- Alternative way of instrumenting allocation functions. We discussed an
alternative way of instrumenting the allocators, and Steven Rostedt wrote
a proposal [2] to provide compiler support for Callsite Trampolines - a
special attribute for functions to be instrumented. So far we have spoken
to representatives of the GNU and CLANG communities, will be presenting
the proposal at LPC 2023, and are working on a proof of concept for CLANG
(see [example 1]). While we keep working with the compiler community on
adding this support, we are posting the latest version of the patchset as
an immediately available solution until compiler support is implemented;
- Reclaim memory used by pageexts, slabexts, pcpuexts when profiling is
disabled. The main obstacles to reclaiming the memory used for profiling:
1. References from already allocated objects still point to the
allocation tags we are trying to reclaim; we would have to scan and
kill all references before reclaiming;
2. pageext memory is allocated during early boot when fragmentation is
low. Once reclaimed we might not be able to get it back;
3. pageext and slabext objects used for profiling can be interleaved
with other data. Reformatting these vectors of objects at runtime is
a complex and racy task.

Overview

Memory allocation profiling infrastructure provides a low-overhead
mechanism to make all kernel allocations in the system visible. It can be
used to monitor memory usage, track memory hotspots, detect memory leaks,
and identify memory regressions.

To keep the overhead to a minimum, we record only the total size of the
allocations performed at each location in the codebase. The data is
exposed to user space via the /proc/allocinfo interface. Usage example:

$ sort -hr /proc/allocinfo | head
 153MiB     8599 mm/slub.c:1826 module:slub func:alloc_slab_page
6.08MiB       49 mm/slab_common.c:950 module:slab_common func:_kmalloc_order
5.09MiB     6335 mm/memcontrol.c:2814 module:memcontrol func:alloc_slab_obj_exts
4.54MiB       78 mm/page_alloc.c:5777 module:page_alloc func:alloc_pages_exact
1.32MiB      338 include/asm-generic/pgalloc.h:63 module:pgtable func:__pte_alloc_one
1.16MiB      603 fs/xfs/xfs_log_priv.h:700 module:xfs func:xlog_kvmalloc
1.00MiB      256 mm/swap_cgroup.c:48 module:swap_cgroup func:swap_cgroup_prepare
 734KiB     5380 fs/xfs/kmem.c:20 module:xfs func:kmem_alloc
 640KiB      160 kernel/rcu/tree.c:3184 module:tree func:fill_page_cache_func
 640KiB      160 drivers/char/virtio_console.c:452 module:virtio_console func:alloc_buf

Support for more detailed allocation context, including pid, tgid, task
name, allocation size, timestamp and call stack, is not included in this
patchset in order to keep it small.

The implementation utilizes a more generic concept of code tagging,
introduced as part of this patchset. A code tag is a structure, generated
at compile time, that identifies a specific location in the source code
and can be embedded in an application-specific structure. A number of
applications for code tagging were presented in the original RFC [3].
Code tagging uses the old trick of "define a special ELF section for
objects of a given type so that we can iterate over them at runtime" and
creates a proper library for it.
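
As a minimal illustration of that trick (a sketch with made-up names, not
the patchset's actual API):

struct my_tag {
	const char *file;
	int line;
};

/* Each DEFINE_MY_TAG() emits one instance into the "my_tags" ELF section;
 * the linker provides the __start_/__stop_ boundary symbols for it. */
#define DEFINE_MY_TAG(name)					\
	static struct my_tag name __used			\
	__section("my_tags") = { __FILE__, __LINE__ }

extern struct my_tag __start_my_tags[];
extern struct my_tag __stop_my_tags[];

static void for_each_my_tag(void)
{
	struct my_tag *tag;

	for (tag = __start_my_tags; tag != __stop_my_tags; tag++)
		pr_info("%s:%d\n", tag->file, tag->line);
}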

To profile memory allocations, we instrument the page, slab and percpu
allocators to record the total memory allocated in the code tag
associated with each allocation call site. Every time an allocation is
performed by an instrumented allocator, the code tag at that location
increments its counter by the allocation size; every time the memory is
freed, the counter is decremented. To decrement the counter upon freeing,
the allocated object needs a reference to its code tag. Page allocators
use page_ext to record this reference, while slab allocators use the
memcg_data field (renamed to the more generic slabobj_ext) of the slab
page.
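
Conceptually, the per-tag accounting reduces to the following (a
simplified sketch; the exact structures in include/linux/alloc_tag.h
differ):

struct alloc_tag {
	struct codetag ct;	/* file, line, function, module */
	u64 bytes_allocated;
};

/* at the allocation site, via the instrumented allocator macro: */
tag->bytes_allocated += size;

/* on free, via the reference stored in page_ext/slabobj_ext: */
tag->bytes_allocated -= size;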

Module allocations are accounted for in the same way as other kernel
allocations. Module loading and unloading is supported. If a module is
unloaded while one or more of its allocations is still not freed (a
rather rare condition), its data section is kept in memory so that the
code tags can still be referenced when those allocations are eventually
freed.

As part of this series we introduce several kernel configs:
CONFIG_CODE_TAGGING - enables the code tagging framework.
CONFIG_MEM_ALLOC_PROFILING - enables memory allocation profiling.
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT - enables memory allocation
profiling by default.
CONFIG_MEM_ALLOC_PROFILING_DEBUG - enables memory allocation profiling
validation.
Note: CONFIG_MEM_ALLOC_PROFILING enables CONFIG_PAGE_EXTENSION to store
code tag reference in the page_ext object.

The /proc/sys/vm/mem_profiling sysctl is provided to enable/disable the
functionality and to avoid the performance overhead when it is not needed.
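
For example, toggling it at runtime:

# echo 1 >/proc/sys/vm/mem_profiling   # enable
# echo 0 >/proc/sys/vm/mem_profiling   # disable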

Overhead
To measure the overhead we are comparing the following configurations:
(1) Baseline
(2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING &
!CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT)
(3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING &
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT)
(4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING &
!CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT & /proc/sys/vm/mem_profiling=1)
(5) Memcg (CONFIG_MEMCG_KMEM)
(6) Enabled by default with memcg (CONFIG_MEM_ALLOC_PROFILING &
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT & CONFIG_MEMCG_KMEM)

Performance overhead:
To evaluate performance we implemented an in-kernel test executing
multiple get_free_page/free_page and kmalloc/kfree calls with allocation
sizes growing from 8 to 240 bytes, with the CPU frequency set to max and
CPU affinity set to a specific CPU to minimize the noise. Below is a
performance comparison between the baseline kernel, profiling when
enabled, profiling when disabled and (for comparison purposes) the
baseline with CONFIG_MEMCG_KMEM enabled and allocations using
__GFP_ACCOUNT:

                      kmalloc              pgalloc
(1 baseline)          12.041s              49.190s
(2 default disabled)  14.970s (+24.33%)    49.684s (+1.00%)
(3 default enabled)   16.859s (+40.01%)    56.287s (+14.43%)
(4 runtime enabled)   16.983s (+41.04%)    55.760s (+13.36%)
(5 memcg)             33.831s (+180.96%)   51.433s (+4.56%)
(6 enabled & memcg)   39.145s (+225.10%)   56.874s (+15.62%)
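
For reference, the body of the test is essentially a tight allocate/free
loop of the following shape (an assumed sketch; the test module itself is
not part of this posting):

for (size = 8; size <= 240; size += 8) {
	for (i = 0; i < iterations; i++) {
		void *p = kmalloc(size, GFP_KERNEL);

		kfree(p);
	}
}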

Memory overhead:
Kernel size:

         text      data       bss       dec      hex
(1)  32638461  18286426  18325508  69250395  420ad5b
(2)  32710110  18646586  18071556  69428252  423641c
(3)  32706918  18646586  18071556  69425060  42357a4
(4)  32709664  18646586  18071556  69427806  423625e
(5)  32715893  18345334  18239492  69300719  42171ef
(6)  32786068  18701958  17993732  69481758  424351e

Memory consumption on a 56 core Intel CPU with 125GB of memory running
Fedora:
Code tags:       192 kB
PageExts:     262144 kB (256 MB)
SlabExts:       9876 kB (9.6 MB)
PcpuExts:        512 kB (0.5 MB)

Total overhead (192 kB + 256 MB + 9.6 MB + 0.5 MB, i.e. ~266 MB) is
about 0.2% of total memory.

[1] https://lore.kernel.org/all/[email protected]/
[2] https://docs.google.com/presentation/d/1zQnuMbEfcq9lHUXgJRUZsd1McRAkr3Xq6Wk693YA0To/edit?usp=sharing
[3] https://lore.kernel.org/all/[email protected]/

[example 1]:

/*
 * Proof-of-concept sketch for Callsite Trampolines. __callsite_wrapper,
 * __callsite_FILE, __callsite_LINE, func_ptr and the "func(...)" argument
 * forwarding are hypothetical compiler extensions - this does not build
 * with current GCC/CLANG and only illustrates the proposal.
 */
#include <stdio.h>

typedef struct codetag {
	const char *file;
	int line;
	int counter;
} codetag;

/* Called in place of each wrapped function; the compiler would emit one
 * static codetag per call site and forward the arguments. */
void my_trampoline(func_ptr func, ...) {
	static codetag callsite_data __section("alloc_tags") =
		{ __callsite_FILE, __callsite_LINE, 0 };
	callsite_data.counter++;
	func(...);
}

__callsite_wrapper(my_trampoline)
__attribute__ ((always_inline))
static inline void foo1(void) {
	printf("foo1 function\n");
}

__callsite_wrapper(my_trampoline)
__attribute__ ((always_inline))
static inline void foo2(void) {
	printf("foo2 function\n");
}

void bar(void) {
	foo1();
}

int main(int argc, char **argv) {
	foo1();		/* call site 1 */
	foo2();		/* call site 2 */
	bar();		/* call site 3, inside bar(): a separate codetag */
	return 0;
}

Kent Overstreet (16):
lib/string_helpers: Add flags param to string_get_size()
scripts/kallsyms: Always include __start and __stop symbols
fs: Convert alloc_inode_sb() to a macro
nodemask: Split out include/linux/nodemask_types.h
prandom: Remove unused include
change alloc_pages name in ivpu_bo_ops to avoid conflicts
mm/slub: Mark slab_free_freelist_hook() __always_inline
mempool: Hook up to memory allocation profiling
xfs: Memory allocation profiling fixups
timekeeping: Fix a circular include dependency
mm: percpu: Introduce pcpuobj_ext
mm: percpu: Add codetag reference into pcpuobj_ext
arm64: Fix circular header dependency
mm: vmalloc: Enable memory allocation profiling
rhashtable: Plumb through alloc tag
MAINTAINERS: Add entries for code tagging and memory allocation
profiling

Suren Baghdasaryan (23):
mm: enumerate all gfp flags
mm: introduce slabobj_ext to support slab object extensions
mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext
creation
mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation
mm: prevent slabobj_ext allocations for slabobj_ext and kmem_cache
objects
slab: objext: introduce objext_flags as extension to
page_memcg_data_flags
lib: code tagging framework
lib: code tagging module support
lib: prevent module unloading if memory is not freed
lib: add allocation tagging support for memory allocation profiling
lib: introduce support for page allocation tagging
change alloc_pages name in dma_map_ops to avoid name conflicts
mm: enable page allocation tagging
mm: create new codetag references during page splitting
mm/page_ext: enable early_page_ext when
CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
lib: add codetag reference into slabobj_ext
mm/slab: add allocation accounting into slab allocation and free paths
mm/slab: enable slab allocation tagging for kmalloc and friends
mm: percpu: enable per-cpu allocation tagging
lib: add memory allocations report in show_mem()
codetag: debug: skip objext checking when it's for objext itself
codetag: debug: mark codetags for reserved pages as empty
codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext
allocations

Documentation/admin-guide/sysctl/vm.rst | 16 ++
Documentation/filesystems/proc.rst | 28 ++
MAINTAINERS | 16 ++
arch/arm64/include/asm/spectre.h | 4 +-
arch/powerpc/mm/book3s64/radix_pgtable.c | 2 +-
arch/x86/kernel/amd_gart_64.c | 2 +-
drivers/accel/ivpu/ivpu_gem.c | 8 +-
drivers/accel/ivpu/ivpu_gem.h | 2 +-
drivers/block/virtio_blk.c | 4 +-
drivers/gpu/drm/gud/gud_drv.c | 2 +-
drivers/iommu/dma-iommu.c | 2 +-
drivers/mmc/core/block.c | 4 +-
drivers/mtd/spi-nor/debugfs.c | 6 +-
.../ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 4 +-
drivers/scsi/sd.c | 8 +-
drivers/staging/media/atomisp/pci/hmm/hmm.c | 2 +-
drivers/xen/grant-dma-ops.c | 2 +-
drivers/xen/swiotlb-xen.c | 2 +-
fs/xfs/kmem.c | 4 +-
fs/xfs/kmem.h | 10 +-
include/asm-generic/codetag.lds.h | 14 +
include/asm-generic/vmlinux.lds.h | 3 +
include/linux/alloc_tag.h | 188 +++++++++++++
include/linux/codetag.h | 83 ++++++
include/linux/dma-map-ops.h | 2 +-
include/linux/fortify-string.h | 5 +-
include/linux/fs.h | 6 +-
include/linux/gfp.h | 111 +++++---
include/linux/gfp_types.h | 101 +++++--
include/linux/hrtimer.h | 2 +-
include/linux/memcontrol.h | 56 +++-
include/linux/mempool.h | 73 +++--
include/linux/mm.h | 8 +
include/linux/mm_types.h | 4 +-
include/linux/nodemask.h | 2 +-
include/linux/nodemask_types.h | 9 +
include/linux/page_ext.h | 1 -
include/linux/pagemap.h | 9 +-
include/linux/percpu.h | 23 +-
include/linux/pgalloc_tag.h | 105 +++++++
include/linux/prandom.h | 1 -
include/linux/rhashtable-types.h | 11 +-
include/linux/sched.h | 26 +-
include/linux/slab.h | 180 ++++++------
include/linux/slab_def.h | 2 +-
include/linux/slub_def.h | 4 +-
include/linux/string.h | 4 +-
include/linux/string_helpers.h | 13 +-
include/linux/time_namespace.h | 2 +
include/linux/vmalloc.h | 60 +++-
init/Kconfig | 4 +
kernel/dma/mapping.c | 4 +-
kernel/kallsyms_selftest.c | 2 +-
kernel/module/main.c | 25 +-
lib/Kconfig.debug | 31 +++
lib/Makefile | 3 +
lib/alloc_tag.c | 212 ++++++++++++++
lib/codetag.c | 258 ++++++++++++++++++
lib/rhashtable.c | 52 +++-
lib/string_helpers.c | 24 +-
lib/test-string_helpers.c | 4 +-
mm/compaction.c | 7 +-
mm/filemap.c | 6 +-
mm/huge_memory.c | 2 +
mm/hugetlb.c | 8 +-
mm/kfence/core.c | 14 +-
mm/kfence/kfence.h | 4 +-
mm/memcontrol.c | 56 +---
mm/mempolicy.c | 42 +--
mm/mempool.c | 34 +--
mm/mm_init.c | 1 +
mm/page_alloc.c | 66 +++--
mm/page_ext.c | 13 +
mm/page_owner.c | 2 +-
mm/percpu-internal.h | 26 +-
mm/percpu.c | 120 ++++----
mm/show_mem.c | 15 +
mm/slab.c | 24 +-
mm/slab.h | 246 +++++++++++++----
mm/slab_common.c | 96 +++++--
mm/slub.c | 26 +-
mm/util.c | 44 +--
mm/vmalloc.c | 88 +++---
scripts/kallsyms.c | 13 +
scripts/module.lds.S | 7 +
85 files changed, 2117 insertions(+), 698 deletions(-)
create mode 100644 include/asm-generic/codetag.lds.h
create mode 100644 include/linux/alloc_tag.h
create mode 100644 include/linux/codetag.h
create mode 100644 include/linux/nodemask_types.h
create mode 100644 include/linux/pgalloc_tag.h
create mode 100644 lib/alloc_tag.c
create mode 100644 lib/codetag.c

--
2.42.0.758.gaed0368e0e-goog


2023-10-24 13:47:17

by Suren Baghdasaryan

Subject: [PATCH v2 03/39] fs: Convert alloc_inode_sb() to a macro

From: Kent Overstreet <[email protected]>

We're introducing alloc tagging, which tracks memory allocations by
callsite. Converting alloc_inode_sb() to a macro means the allocations
will be attributed to its caller, which is a bit more useful.
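
To see why, assume kmem_cache_alloc_lru() becomes an allocation-tagging
macro later in the series: an inline alloc_inode_sb() would pin every
inode allocation to fs.h itself, while the macro form expands at the
call site (the file names below are hypothetical illustrations):

static inline void *alloc_inode_sb(...)	/* all inodes tagged include/linux/fs.h */
#define alloc_inode_sb(...)		/* tagged at the caller, e.g. fs/xfs/... */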

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Cc: Alexander Viro <[email protected]>
---
include/linux/fs.h | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4a40823c3c67..c545b1839e96 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2862,11 +2862,7 @@ int setattr_should_drop_sgid(struct mnt_idmap *idmap,
* This must be used for allocating filesystems specific inodes to set
* up the inode reclaim context correctly.
*/
-static inline void *
-alloc_inode_sb(struct super_block *sb, struct kmem_cache *cache, gfp_t gfp)
-{
- return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp);
-}
+#define alloc_inode_sb(_sb, _cache, _gfp) kmem_cache_alloc_lru(_cache, &_sb->s_inode_lru, _gfp)

extern void __insert_inode_hash(struct inode *, unsigned long hashval);
static inline void insert_inode_hash(struct inode *inode)
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:47:24

by Suren Baghdasaryan

Subject: [PATCH v2 01/39] lib/string_helpers: Add flags param to string_get_size()

From: Kent Overstreet <[email protected]>

The new flags parameter allows controlling:
- Whether or not the units suffix is separated by a space, for
compatibility with sort -h
- Whether or not to append a B suffix - we're not always printing
bytes.
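
A usage sketch with the new flags (hypothetical call, not taken from
this patch):

char buf[16];

/* base-2 units without the space separator: buf becomes "153MiB",
 * which `sort -h` can order, as in the /proc/allocinfo example */
string_get_size(153 << 20, 1, STRING_SIZE_BASE2 | STRING_SIZE_NOSPACE,
		buf, sizeof(buf));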

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: "Noralf Trønnes" <[email protected]>
Cc: Jens Axboe <[email protected]>
---
arch/powerpc/mm/book3s64/radix_pgtable.c | 2 +-
drivers/block/virtio_blk.c | 4 ++--
drivers/gpu/drm/gud/gud_drv.c | 2 +-
drivers/mmc/core/block.c | 4 ++--
drivers/mtd/spi-nor/debugfs.c | 6 ++---
.../ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 4 ++--
drivers/scsi/sd.c | 8 +++----
include/linux/string_helpers.h | 13 +++++-----
lib/string_helpers.c | 24 +++++++++++++------
lib/test-string_helpers.c | 4 ++--
mm/hugetlb.c | 8 +++----
11 files changed, 44 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index c6a4ac766b2b..27aa5a083ff0 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -260,7 +260,7 @@ print_mapping(unsigned long start, unsigned long end, unsigned long size, bool e
if (end <= start)
return;

- string_get_size(size, 1, STRING_UNITS_2, buf, sizeof(buf));
+ string_get_size(size, 1, STRING_SIZE_BASE2, buf, sizeof(buf));

pr_info("Mapped 0x%016lx-0x%016lx with %s pages%s\n", start, end, buf,
exec ? " (exec)" : "");
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 1fe011676d07..59140424d755 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -986,9 +986,9 @@ static void virtblk_update_capacity(struct virtio_blk *vblk, bool resize)
nblocks = DIV_ROUND_UP_ULL(capacity, queue_logical_block_size(q) >> 9);

string_get_size(nblocks, queue_logical_block_size(q),
- STRING_UNITS_2, cap_str_2, sizeof(cap_str_2));
+ STRING_SIZE_BASE2, cap_str_2, sizeof(cap_str_2));
string_get_size(nblocks, queue_logical_block_size(q),
- STRING_UNITS_10, cap_str_10, sizeof(cap_str_10));
+ 0, cap_str_10, sizeof(cap_str_10));

dev_notice(&vdev->dev,
"[%s] %s%llu %d-byte logical blocks (%s/%s)\n",
diff --git a/drivers/gpu/drm/gud/gud_drv.c b/drivers/gpu/drm/gud/gud_drv.c
index 9d7bf8ee45f1..6b1748e1f666 100644
--- a/drivers/gpu/drm/gud/gud_drv.c
+++ b/drivers/gpu/drm/gud/gud_drv.c
@@ -329,7 +329,7 @@ static int gud_stats_debugfs(struct seq_file *m, void *data)
struct gud_device *gdrm = to_gud_device(entry->dev);
char buf[10];

- string_get_size(gdrm->bulk_len, 1, STRING_UNITS_2, buf, sizeof(buf));
+ string_get_size(gdrm->bulk_len, 1, STRING_SIZE_BASE2, buf, sizeof(buf));
seq_printf(m, "Max buffer size: %s\n", buf);
seq_printf(m, "Number of errors: %u\n", gdrm->stats_num_errors);

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 3a8f27c3e310..411dc8137f7c 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -2511,7 +2511,7 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,

blk_queue_write_cache(md->queue.queue, cache_enabled, fua_enabled);

- string_get_size((u64)size, 512, STRING_UNITS_2,
+ string_get_size((u64)size, 512, STRING_SIZE_BASE2,
cap_str, sizeof(cap_str));
pr_info("%s: %s %s %s%s\n",
md->disk->disk_name, mmc_card_id(card), mmc_card_name(card),
@@ -2707,7 +2707,7 @@ static int mmc_blk_alloc_rpmb_part(struct mmc_card *card,

list_add(&rpmb->node, &md->rpmbs);

- string_get_size((u64)size, 512, STRING_UNITS_2,
+ string_get_size((u64)size, 512, STRING_SIZE_BASE2,
cap_str, sizeof(cap_str));

pr_info("%s: %s %s %s, chardev (%d:%d)\n",
diff --git a/drivers/mtd/spi-nor/debugfs.c b/drivers/mtd/spi-nor/debugfs.c
index 6e163cb5b478..a1b61938fee2 100644
--- a/drivers/mtd/spi-nor/debugfs.c
+++ b/drivers/mtd/spi-nor/debugfs.c
@@ -85,7 +85,7 @@ static int spi_nor_params_show(struct seq_file *s, void *data)

seq_printf(s, "name\t\t%s\n", info->name);
seq_printf(s, "id\t\t%*ph\n", SPI_NOR_MAX_ID_LEN, nor->id);
- string_get_size(params->size, 1, STRING_UNITS_2, buf, sizeof(buf));
+ string_get_size(params->size, 1, STRING_SIZE_BASE2, buf, sizeof(buf));
seq_printf(s, "size\t\t%s\n", buf);
seq_printf(s, "write size\t%u\n", params->writesize);
seq_printf(s, "page size\t%u\n", params->page_size);
@@ -130,14 +130,14 @@ static int spi_nor_params_show(struct seq_file *s, void *data)
struct spi_nor_erase_type *et = &erase_map->erase_type[i];

if (et->size) {
- string_get_size(et->size, 1, STRING_UNITS_2, buf,
+ string_get_size(et->size, 1, STRING_SIZE_BASE2, buf,
sizeof(buf));
seq_printf(s, " %02x (%s) [%d]\n", et->opcode, buf, i);
}
}

if (!(nor->flags & SNOR_F_NO_OP_CHIP_ERASE)) {
- string_get_size(params->size, 1, STRING_UNITS_2, buf, sizeof(buf));
+ string_get_size(params->size, 1, STRING_SIZE_BASE2, buf, sizeof(buf));
seq_printf(s, " %02x (%s)\n", SPINOR_OP_CHIP_ERASE, buf);
}

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index 14e0d989c3ba..7d5fbebd36fc 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -3457,8 +3457,8 @@ static void mem_region_show(struct seq_file *seq, const char *name,
{
char buf[40];

- string_get_size((u64)to - from + 1, 1, STRING_UNITS_2, buf,
- sizeof(buf));
+ string_get_size((u64)to - from + 1, 1, STRING_SIZE_BASE2,
+ buf, sizeof(buf));
seq_printf(seq, "%-15s %#x-%#x [%s]\n", name, from, to, buf);
}

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 83b6a3f3863b..c37593f76b65 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2689,10 +2689,10 @@ sd_print_capacity(struct scsi_disk *sdkp,
if (!sdkp->first_scan && old_capacity == sdkp->capacity)
return;

- string_get_size(sdkp->capacity, sector_size,
- STRING_UNITS_2, cap_str_2, sizeof(cap_str_2));
- string_get_size(sdkp->capacity, sector_size,
- STRING_UNITS_10, cap_str_10, sizeof(cap_str_10));
+ string_get_size(sdkp->capacity, sector_size, STRING_SIZE_BASE2,
+ cap_str_2, sizeof(cap_str_2));
+ string_get_size(sdkp->capacity, sector_size, 0,
+ cap_str_10, sizeof(cap_str_10));

sd_printk(KERN_NOTICE, sdkp,
"%llu %d-byte logical blocks: (%s/%s)\n",
diff --git a/include/linux/string_helpers.h b/include/linux/string_helpers.h
index 9d1f5bb74dd5..a54467d891db 100644
--- a/include/linux/string_helpers.h
+++ b/include/linux/string_helpers.h
@@ -17,15 +17,14 @@ static inline bool string_is_terminated(const char *s, int len)
return memchr(s, '\0', len) ? true : false;
}

-/* Descriptions of the types of units to
- * print in */
-enum string_size_units {
- STRING_UNITS_10, /* use powers of 10^3 (standard SI) */
- STRING_UNITS_2, /* use binary powers of 2^10 */
+enum string_size_flags {
+ STRING_SIZE_BASE2 = (1 << 0),
+ STRING_SIZE_NOSPACE = (1 << 1),
+ STRING_SIZE_NOBYTES = (1 << 2),
};

-void string_get_size(u64 size, u64 blk_size, enum string_size_units units,
- char *buf, int len);
+int string_get_size(u64 size, u64 blk_size, enum string_size_flags flags,
+ char *buf, int len);

int parse_int_array_user(const char __user *from, size_t count, int **array);

diff --git a/lib/string_helpers.c b/lib/string_helpers.c
index 9982344cca34..b1496499b113 100644
--- a/lib/string_helpers.c
+++ b/lib/string_helpers.c
@@ -19,11 +19,17 @@
#include <linux/string.h>
#include <linux/string_helpers.h>

+enum string_size_units {
+ STRING_UNITS_10, /* use powers of 10^3 (standard SI) */
+ STRING_UNITS_2, /* use binary powers of 2^10 */
+};
+
/**
* string_get_size - get the size in the specified units
* @size: The size to be converted in blocks
* @blk_size: Size of the block (use 1 for size in bytes)
- * @units: units to use (powers of 1000 or 1024)
+ * @flags: units to use (powers of 1000 or 1024), whether to include space
+ * separator
* @buf: buffer to format to
* @len: length of buffer
*
@@ -32,14 +38,16 @@
* at least 9 bytes and will always be zero terminated.
*
*/
-void string_get_size(u64 size, u64 blk_size, const enum string_size_units units,
- char *buf, int len)
+int string_get_size(u64 size, u64 blk_size, enum string_size_flags flags,
+ char *buf, int len)
{
+ enum string_size_units units = flags & STRING_SIZE_BASE2
+ ? STRING_UNITS_2 : STRING_UNITS_10;
static const char *const units_10[] = {
- "B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"
+ "", "k", "M", "G", "T", "P", "E", "Z", "Y"
};
static const char *const units_2[] = {
- "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"
+ "", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"
};
static const char *const *const units_str[] = {
[STRING_UNITS_10] = units_10,
@@ -126,8 +134,10 @@ void string_get_size(u64 size, u64 blk_size, const enum string_size_units units,
else
unit = units_str[units][i];

- snprintf(buf, len, "%u%s %s", (u32)size,
- tmp, unit);
+ return snprintf(buf, len, "%u%s%s%s%s", (u32)size, tmp,
+ (flags & STRING_SIZE_NOSPACE) ? "" : " ",
+ unit,
+ (flags & STRING_SIZE_NOBYTES) ? "" : "B");
}
EXPORT_SYMBOL(string_get_size);

diff --git a/lib/test-string_helpers.c b/lib/test-string_helpers.c
index 9a68849a5d55..0b01ffca96fb 100644
--- a/lib/test-string_helpers.c
+++ b/lib/test-string_helpers.c
@@ -507,8 +507,8 @@ static __init void __test_string_get_size(const u64 size, const u64 blk_size,
char buf10[string_get_size_maxbuf];
char buf2[string_get_size_maxbuf];

- string_get_size(size, blk_size, STRING_UNITS_10, buf10, sizeof(buf10));
- string_get_size(size, blk_size, STRING_UNITS_2, buf2, sizeof(buf2));
+ string_get_size(size, blk_size, 0, buf10, sizeof(buf10));
+ string_get_size(size, blk_size, STRING_SIZE_BASE2, buf2, sizeof(buf2));

test_string_get_size_check("STRING_UNITS_10", exp_result10, buf10,
size, blk_size);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 52d26072dfda..37f2148d3b9c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3228,7 +3228,7 @@ static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid)
if (i == h->max_huge_pages_node[nid])
return;

- string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
+ string_get_size(huge_page_size(h), 1, STRING_SIZE_BASE2, buf, 32);
pr_warn("HugeTLB: allocating %u of page size %s failed node%d. Only allocated %lu hugepages.\n",
h->max_huge_pages_node[nid], buf, nid, i);
h->max_huge_pages -= (h->max_huge_pages_node[nid] - i);
@@ -3290,7 +3290,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
if (i < h->max_huge_pages) {
char buf[32];

- string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
+ string_get_size(huge_page_size(h), 1, STRING_SIZE_BASE2, buf, 32);
pr_warn("HugeTLB: allocating %lu of page size %s failed. Only allocated %lu hugepages.\n",
h->max_huge_pages, buf, i);
h->max_huge_pages = i;
@@ -3336,7 +3336,7 @@ static void __init report_hugepages(void)
for_each_hstate(h) {
char buf[32];

- string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
+ string_get_size(huge_page_size(h), 1, STRING_SIZE_BASE2, buf, 32);
pr_info("HugeTLB: registered %s page size, pre-allocated %ld pages\n",
buf, h->free_huge_pages);
pr_info("HugeTLB: %d KiB vmemmap can be freed for a %s page\n",
@@ -4227,7 +4227,7 @@ static int __init hugetlb_init(void)
char buf[32];

string_get_size(huge_page_size(&default_hstate),
- 1, STRING_UNITS_2, buf, 32);
+ 1, STRING_SIZE_BASE2, buf, 32);
pr_warn("HugeTLB: Ignoring hugepages=%lu associated with %s page size\n",
default_hstate.max_huge_pages, buf);
pr_warn("HugeTLB: Using hugepages=%lu for number of default huge pages\n",
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:47:31

by Suren Baghdasaryan

Subject: [PATCH v2 04/39] nodemask: Split out include/linux/nodemask_types.h

From: Kent Overstreet <[email protected]>

sched.h, which defines task_struct, needs nodemask_t - but sched.h is a
frequently used header and ideally shouldn't be pulling in any more code
than it needs to.

This splits out nodemask_types.h which has the definition sched.h needs,
which will avoid a circular header dependency in the alloc tagging patch
series, and as a bonus should speed up kernel build times.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
include/linux/nodemask.h | 2 +-
include/linux/nodemask_types.h | 9 +++++++++
include/linux/sched.h | 2 +-
3 files changed, 11 insertions(+), 2 deletions(-)
create mode 100644 include/linux/nodemask_types.h

diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 8d07116caaf1..b61438313a73 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -93,10 +93,10 @@
#include <linux/threads.h>
#include <linux/bitmap.h>
#include <linux/minmax.h>
+#include <linux/nodemask_types.h>
#include <linux/numa.h>
#include <linux/random.h>

-typedef struct { DECLARE_BITMAP(bits, MAX_NUMNODES); } nodemask_t;
extern nodemask_t _unused_nodemask_arg_;

/**
diff --git a/include/linux/nodemask_types.h b/include/linux/nodemask_types.h
new file mode 100644
index 000000000000..84c2f47c4237
--- /dev/null
+++ b/include/linux/nodemask_types.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_NODEMASK_TYPES_H
+#define __LINUX_NODEMASK_TYPES_H
+
+#include <linux/numa.h>
+
+typedef struct { DECLARE_BITMAP(bits, MAX_NUMNODES); } nodemask_t;
+
+#endif /* __LINUX_NODEMASK_TYPES_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 77f01ac385f7..12a2554a3164 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -20,7 +20,7 @@
#include <linux/hrtimer.h>
#include <linux/irqflags.h>
#include <linux/seccomp.h>
-#include <linux/nodemask.h>
+#include <linux/nodemask_types.h>
#include <linux/rcupdate.h>
#include <linux/refcount.h>
#include <linux/resource.h>
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:47:38

by Suren Baghdasaryan

Subject: [PATCH v2 02/39] scripts/kallsyms: Always include __start and __stop symbols

From: Kent Overstreet <[email protected]>

These symbols are used to denote section boundaries: by always including
them we can unify loading sections from modules with loading built-in
sections, which leads to some significant cleanup.
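
For a section named "alloc_tags", for example, the linker provides
boundary symbols that code can declare and iterate between (illustrative
declarations):

extern struct alloc_tag __start_alloc_tags[];
extern struct alloc_tag __stop_alloc_tags[];

Always emitting them in kallsyms lets module sections be located the same
way as built-in ones.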

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
scripts/kallsyms.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 653b92f6d4c8..47978efe4797 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -204,6 +204,11 @@ static int symbol_in_range(const struct sym_entry *s,
return 0;
}

+static bool string_starts_with(const char *s, const char *prefix)
+{
+ return strncmp(s, prefix, strlen(prefix)) == 0;
+}
+
static int symbol_valid(const struct sym_entry *s)
{
const char *name = sym_name(s);
@@ -211,6 +216,14 @@ static int symbol_valid(const struct sym_entry *s)
/* if --all-symbols is not specified, then symbols outside the text
* and inittext sections are discarded */
if (!all_symbols) {
+ /*
+ * Symbols starting with __start and __stop are used to denote
+ * section boundaries, and should always be included:
+ */
+ if (string_starts_with(name, "__start_") ||
+ string_starts_with(name, "__stop_"))
+ return 1;
+
if (symbol_in_range(s, text_ranges,
ARRAY_SIZE(text_ranges)) == 0)
return 0;
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:47:40

by Suren Baghdasaryan

Subject: [PATCH v2 06/39] mm: enumerate all gfp flags

Introduce a GFP bits enumeration to let the compiler track the number of
used bits (which depends on the config options) instead of hardcoding
them. That simplifies the __GFP_BITS_SHIFT calculation.

Suggested-by: Petr Tesařík <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/gfp_types.h | 90 +++++++++++++++++++++++++++------------
1 file changed, 62 insertions(+), 28 deletions(-)

diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 6583a58670c5..3fbe624763d9 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -21,44 +21,78 @@ typedef unsigned int __bitwise gfp_t;
* include/trace/events/mmflags.h and tools/perf/builtin-kmem.c
*/

+enum {
+ ___GFP_DMA_BIT,
+ ___GFP_HIGHMEM_BIT,
+ ___GFP_DMA32_BIT,
+ ___GFP_MOVABLE_BIT,
+ ___GFP_RECLAIMABLE_BIT,
+ ___GFP_HIGH_BIT,
+ ___GFP_IO_BIT,
+ ___GFP_FS_BIT,
+ ___GFP_ZERO_BIT,
+ ___GFP_UNUSED_BIT, /* 0x200u unused */
+ ___GFP_DIRECT_RECLAIM_BIT,
+ ___GFP_KSWAPD_RECLAIM_BIT,
+ ___GFP_WRITE_BIT,
+ ___GFP_NOWARN_BIT,
+ ___GFP_RETRY_MAYFAIL_BIT,
+ ___GFP_NOFAIL_BIT,
+ ___GFP_NORETRY_BIT,
+ ___GFP_MEMALLOC_BIT,
+ ___GFP_COMP_BIT,
+ ___GFP_NOMEMALLOC_BIT,
+ ___GFP_HARDWALL_BIT,
+ ___GFP_THISNODE_BIT,
+ ___GFP_ACCOUNT_BIT,
+ ___GFP_ZEROTAGS_BIT,
+#ifdef CONFIG_KASAN_HW_TAGS
+ ___GFP_SKIP_ZERO_BIT,
+ ___GFP_SKIP_KASAN_BIT,
+#endif
+#ifdef CONFIG_LOCKDEP
+ ___GFP_NOLOCKDEP_BIT,
+#endif
+ ___GFP_LAST_BIT
+};
+
/* Plain integer GFP bitmasks. Do not use this directly. */
-#define ___GFP_DMA 0x01u
-#define ___GFP_HIGHMEM 0x02u
-#define ___GFP_DMA32 0x04u
-#define ___GFP_MOVABLE 0x08u
-#define ___GFP_RECLAIMABLE 0x10u
-#define ___GFP_HIGH 0x20u
-#define ___GFP_IO 0x40u
-#define ___GFP_FS 0x80u
-#define ___GFP_ZERO 0x100u
+#define ___GFP_DMA BIT(___GFP_DMA_BIT)
+#define ___GFP_HIGHMEM BIT(___GFP_HIGHMEM_BIT)
+#define ___GFP_DMA32 BIT(___GFP_DMA32_BIT)
+#define ___GFP_MOVABLE BIT(___GFP_MOVABLE_BIT)
+#define ___GFP_RECLAIMABLE BIT(___GFP_RECLAIMABLE_BIT)
+#define ___GFP_HIGH BIT(___GFP_HIGH_BIT)
+#define ___GFP_IO BIT(___GFP_IO_BIT)
+#define ___GFP_FS BIT(___GFP_FS_BIT)
+#define ___GFP_ZERO BIT(___GFP_ZERO_BIT)
/* 0x200u unused */
-#define ___GFP_DIRECT_RECLAIM 0x400u
-#define ___GFP_KSWAPD_RECLAIM 0x800u
-#define ___GFP_WRITE 0x1000u
-#define ___GFP_NOWARN 0x2000u
-#define ___GFP_RETRY_MAYFAIL 0x4000u
-#define ___GFP_NOFAIL 0x8000u
-#define ___GFP_NORETRY 0x10000u
-#define ___GFP_MEMALLOC 0x20000u
-#define ___GFP_COMP 0x40000u
-#define ___GFP_NOMEMALLOC 0x80000u
-#define ___GFP_HARDWALL 0x100000u
-#define ___GFP_THISNODE 0x200000u
-#define ___GFP_ACCOUNT 0x400000u
-#define ___GFP_ZEROTAGS 0x800000u
+#define ___GFP_DIRECT_RECLAIM BIT(___GFP_DIRECT_RECLAIM_BIT)
+#define ___GFP_KSWAPD_RECLAIM BIT(___GFP_KSWAPD_RECLAIM_BIT)
+#define ___GFP_WRITE BIT(___GFP_WRITE_BIT)
+#define ___GFP_NOWARN BIT(___GFP_NOWARN_BIT)
+#define ___GFP_RETRY_MAYFAIL BIT(___GFP_RETRY_MAYFAIL_BIT)
+#define ___GFP_NOFAIL BIT(___GFP_NOFAIL_BIT)
+#define ___GFP_NORETRY BIT(___GFP_NORETRY_BIT)
+#define ___GFP_MEMALLOC BIT(___GFP_MEMALLOC_BIT)
+#define ___GFP_COMP BIT(___GFP_COMP_BIT)
+#define ___GFP_NOMEMALLOC BIT(___GFP_NOMEMALLOC_BIT)
+#define ___GFP_HARDWALL BIT(___GFP_HARDWALL_BIT)
+#define ___GFP_THISNODE BIT(___GFP_THISNODE_BIT)
+#define ___GFP_ACCOUNT BIT(___GFP_ACCOUNT_BIT)
+#define ___GFP_ZEROTAGS BIT(___GFP_ZEROTAGS_BIT)
#ifdef CONFIG_KASAN_HW_TAGS
-#define ___GFP_SKIP_ZERO 0x1000000u
-#define ___GFP_SKIP_KASAN 0x2000000u
+#define ___GFP_SKIP_ZERO BIT(___GFP_SKIP_ZERO_BIT)
+#define ___GFP_SKIP_KASAN BIT(___GFP_SKIP_KASAN_BIT)
#else
#define ___GFP_SKIP_ZERO 0
#define ___GFP_SKIP_KASAN 0
#endif
#ifdef CONFIG_LOCKDEP
-#define ___GFP_NOLOCKDEP 0x4000000u
+#define ___GFP_NOLOCKDEP BIT(___GFP_NOLOCKDEP_BIT)
#else
#define ___GFP_NOLOCKDEP 0
#endif
-/* If the above are modified, __GFP_BITS_SHIFT may need updating */

/*
* Physical address zone modifiers (see linux/mmzone.h - low four bits)
@@ -249,7 +283,7 @@ typedef unsigned int __bitwise gfp_t;
#define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)

/* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT (26 + IS_ENABLED(CONFIG_LOCKDEP))
+#define __GFP_BITS_SHIFT ___GFP_LAST_BIT
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))

/**
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:47:42

by Suren Baghdasaryan

Subject: [PATCH v2 05/39] prandom: Remove unused include

From: Kent Overstreet <[email protected]>

prandom.h doesn't use percpu.h - this fixes some circular header issues.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/prandom.h | 1 -
1 file changed, 1 deletion(-)

diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index f2ed5b72b3d6..f7f1e5251c67 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -10,7 +10,6 @@

#include <linux/types.h>
#include <linux/once.h>
-#include <linux/percpu.h>
#include <linux/random.h>

struct rnd_state {
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:48:02

by Suren Baghdasaryan

Subject: [PATCH v2 08/39] mm: introduce __GFP_NO_OBJ_EXT flag to selectively prevent slabobj_ext creation

Introduce the __GFP_NO_OBJ_EXT flag to prevent recursive allocations
when allocating a slabobj_ext vector on a slab.

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/gfp_types.h | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 3fbe624763d9..1c6573d69347 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -52,6 +52,9 @@ enum {
#endif
#ifdef CONFIG_LOCKDEP
___GFP_NOLOCKDEP_BIT,
+#endif
+#ifdef CONFIG_SLAB_OBJ_EXT
+ ___GFP_NO_OBJ_EXT_BIT,
#endif
___GFP_LAST_BIT
};
@@ -93,6 +96,11 @@ enum {
#else
#define ___GFP_NOLOCKDEP 0
#endif
+#ifdef CONFIG_SLAB_OBJ_EXT
+#define ___GFP_NO_OBJ_EXT BIT(___GFP_NO_OBJ_EXT_BIT)
+#else
+#define ___GFP_NO_OBJ_EXT 0
+#endif

/*
* Physical address zone modifiers (see linux/mmzone.h - low four bits)
@@ -133,12 +141,15 @@ enum {
* node with no fallbacks or placement policy enforcements.
*
* %__GFP_ACCOUNT causes the allocation to be accounted to kmemcg.
+ *
+ * %__GFP_NO_OBJ_EXT causes slab allocation to have no object extension.
*/
#define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE)
#define __GFP_WRITE ((__force gfp_t)___GFP_WRITE)
#define __GFP_HARDWALL ((__force gfp_t)___GFP_HARDWALL)
#define __GFP_THISNODE ((__force gfp_t)___GFP_THISNODE)
#define __GFP_ACCOUNT ((__force gfp_t)___GFP_ACCOUNT)
+#define __GFP_NO_OBJ_EXT ((__force gfp_t)___GFP_NO_OBJ_EXT)

/**
* DOC: Watermark modifiers
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:48:20

by Suren Baghdasaryan

Subject: [PATCH v2 10/39] mm: prevent slabobj_ext allocations for slabobj_ext and kmem_cache objects

Use __GFP_NO_OBJ_EXT to prevent recursion when allocating slabobj_ext
objects. Also prevent slabobj_ext allocations for kmem_cache objects.

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
mm/slab.h | 6 ++++++
mm/slab_common.c | 2 ++
2 files changed, 8 insertions(+)

diff --git a/mm/slab.h b/mm/slab.h
index 5a47125469f1..187acc593397 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -489,6 +489,12 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
if (!need_slab_obj_ext())
return NULL;

+ if (s->flags & SLAB_NO_OBJ_EXT)
+ return NULL;
+
+ if (flags & __GFP_NO_OBJ_EXT)
+ return NULL;
+
slab = virt_to_slab(p);
if (!slab_obj_exts(slab) &&
WARN(alloc_slab_obj_exts(slab, s, flags, false),
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 2b42a9d2c11c..446f406d2703 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -222,6 +222,8 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
void *vec;

gfp &= ~OBJCGS_CLEAR_MASK;
+ /* Prevent recursive extension vector allocation */
+ gfp |= __GFP_NO_OBJ_EXT;
vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
slab_nid(slab));
if (!vec)
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:48:29

by Suren Baghdasaryan

Subject: [PATCH v2 09/39] mm/slab: introduce SLAB_NO_OBJ_EXT to avoid obj_ext creation

Slab extension objects can't be allocated before the slab infrastructure
is initialized, but some caches, like kmem_cache and kmem_cache_node, are
created before that point, so objects from these caches can't have
extension objects. Introduce the SLAB_NO_OBJ_EXT slab flag to mark such
caches and avoid creating extensions for objects allocated from them.

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/slab.h | 7 +++++++
mm/slab.c | 2 +-
mm/slub.c | 5 +++--
3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 8228d1276a2f..11ef3d364b2b 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -164,6 +164,13 @@
#endif
#define SLAB_TEMPORARY SLAB_RECLAIM_ACCOUNT /* Objects are short-lived */

+#ifdef CONFIG_SLAB_OBJ_EXT
+/* Slab created using create_boot_cache */
+#define SLAB_NO_OBJ_EXT ((slab_flags_t __force)0x20000000U)
+#else
+#define SLAB_NO_OBJ_EXT 0
+#endif
+
/*
* ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
*
diff --git a/mm/slab.c b/mm/slab.c
index 9ad3d0f2d1a5..cefcb7499b6c 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1232,7 +1232,7 @@ void __init kmem_cache_init(void)
create_boot_cache(kmem_cache, "kmem_cache",
offsetof(struct kmem_cache, node) +
nr_node_ids * sizeof(struct kmem_cache_node *),
- SLAB_HWCACHE_ALIGN, 0, 0);
+ SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
list_add(&kmem_cache->list, &slab_caches);
slab_state = PARTIAL;

diff --git a/mm/slub.c b/mm/slub.c
index f7940048138c..d16643492320 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5043,7 +5043,8 @@ void __init kmem_cache_init(void)
node_set(node, slab_nodes);

create_boot_cache(kmem_cache_node, "kmem_cache_node",
- sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN, 0, 0);
+ sizeof(struct kmem_cache_node),
+ SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);

hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);

@@ -5053,7 +5054,7 @@ void __init kmem_cache_init(void)
create_boot_cache(kmem_cache, "kmem_cache",
offsetof(struct kmem_cache, node) +
nr_node_ids * sizeof(struct kmem_cache_node *),
- SLAB_HWCACHE_ALIGN, 0, 0);
+ SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);

kmem_cache = bootstrap(&boot_kmem_cache);
kmem_cache_node = bootstrap(&boot_kmem_cache_node);
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:48:36

by Suren Baghdasaryan

Subject: [PATCH v2 11/39] slab: objext: introduce objext_flags as extension to page_memcg_data_flags

Introduce objext_flags to store additional objext flags unrelated to memcg.

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/memcontrol.h | 29 ++++++++++++++++++++++-------
mm/slab.h | 4 +---
2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 4b17ebb7e723..f3ede28b6fa6 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -354,7 +354,22 @@ enum page_memcg_data_flags {
__NR_MEMCG_DATA_FLAGS = (1UL << 2),
};

-#define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
+#define __FIRST_OBJEXT_FLAG __NR_MEMCG_DATA_FLAGS
+
+#else /* CONFIG_MEMCG */
+
+#define __FIRST_OBJEXT_FLAG (1UL << 0)
+
+#endif /* CONFIG_MEMCG */
+
+enum objext_flags {
+ /* the next bit after the last actual flag */
+ __NR_OBJEXTS_FLAGS = __FIRST_OBJEXT_FLAG,
+};
+
+#define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
+
+#ifdef CONFIG_MEMCG

static inline bool folio_memcg_kmem(struct folio *folio);

@@ -388,7 +403,7 @@ static inline struct mem_cgroup *__folio_memcg(struct folio *folio)
VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_KMEM, folio);

- return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
}

/*
@@ -409,7 +424,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
VM_BUG_ON_FOLIO(!(memcg_data & MEMCG_DATA_KMEM), folio);

- return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ return (struct obj_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
}

/*
@@ -466,11 +481,11 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
if (memcg_data & MEMCG_DATA_KMEM) {
struct obj_cgroup *objcg;

- objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ objcg = (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
return obj_cgroup_memcg(objcg);
}

- return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
}

/*
@@ -509,11 +524,11 @@ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio)
if (memcg_data & MEMCG_DATA_KMEM) {
struct obj_cgroup *objcg;

- objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ objcg = (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
return obj_cgroup_memcg(objcg);
}

- return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
}

static inline struct mem_cgroup *page_memcg_check(struct page *page)
diff --git a/mm/slab.h b/mm/slab.h
index 187acc593397..60417fd262ea 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -448,10 +448,8 @@ static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
slab_page(slab));
VM_BUG_ON_PAGE(obj_exts & MEMCG_DATA_KMEM, slab_page(slab));

- return (struct slabobj_ext *)(obj_exts & ~MEMCG_DATA_FLAGS_MASK);
-#else
- return (struct slabobj_ext *)obj_exts;
#endif
+ return (struct slabobj_ext *)(obj_exts & ~OBJEXTS_FLAGS_MASK);
}

int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:48:36

by Suren Baghdasaryan

Subject: [PATCH v2 07/39] mm: introduce slabobj_ext to support slab object extensions

Currently slab pages can store only vectors of obj_cgroup pointers in
page->memcg_data. Introduce a slabobj_ext structure to allow more data
to be stored for each slab object, and wrap obj_cgroup into slabobj_ext
to support the current functionality while allowing slabobj_ext to be
extended in the future.
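
The point of the wrapper is that future users can add fields next to the
objcg pointer without adding another per-object vector; e.g. the
profiling patches later in this series add a codetag reference (sketch):

struct slabobj_ext {
	struct obj_cgroup *objcg;	/* CONFIG_MEMCG_KMEM */
	union codetag_ref ref;		/* added by the profiling patches */
} __aligned(8);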

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/memcontrol.h | 20 +++--
include/linux/mm_types.h | 4 +-
init/Kconfig | 4 +
mm/kfence/core.c | 14 ++--
mm/kfence/kfence.h | 4 +-
mm/memcontrol.c | 56 ++------------
mm/page_owner.c | 2 +-
mm/slab.h | 148 +++++++++++++++++++++++++------------
mm/slab_common.c | 47 ++++++++++++
9 files changed, 185 insertions(+), 114 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e4e24da16d2c..4b17ebb7e723 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -346,8 +346,8 @@ struct mem_cgroup {
extern struct mem_cgroup *root_mem_cgroup;

enum page_memcg_data_flags {
- /* page->memcg_data is a pointer to an objcgs vector */
- MEMCG_DATA_OBJCGS = (1UL << 0),
+ /* page->memcg_data is a pointer to an slabobj_ext vector */
+ MEMCG_DATA_OBJEXTS = (1UL << 0),
/* page has been accounted as a non-slab kernel page */
MEMCG_DATA_KMEM = (1UL << 1),
/* the next bit after the last actual flag */
@@ -385,7 +385,7 @@ static inline struct mem_cgroup *__folio_memcg(struct folio *folio)
unsigned long memcg_data = folio->memcg_data;

VM_BUG_ON_FOLIO(folio_test_slab(folio), folio);
- VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJCGS, folio);
+ VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_KMEM, folio);

return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
@@ -406,7 +406,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
unsigned long memcg_data = folio->memcg_data;

VM_BUG_ON_FOLIO(folio_test_slab(folio), folio);
- VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJCGS, folio);
+ VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
VM_BUG_ON_FOLIO(!(memcg_data & MEMCG_DATA_KMEM), folio);

return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
@@ -503,7 +503,7 @@ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio)
*/
unsigned long memcg_data = READ_ONCE(folio->memcg_data);

- if (memcg_data & MEMCG_DATA_OBJCGS)
+ if (memcg_data & MEMCG_DATA_OBJEXTS)
return NULL;

if (memcg_data & MEMCG_DATA_KMEM) {
@@ -549,7 +549,7 @@ static inline struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *ob
static inline bool folio_memcg_kmem(struct folio *folio)
{
VM_BUG_ON_PGFLAGS(PageTail(&folio->page), &folio->page);
- VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJCGS, folio);
+ VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio);
return folio->memcg_data & MEMCG_DATA_KMEM;
}

@@ -1593,6 +1593,14 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
}
#endif /* CONFIG_MEMCG */

+/*
+ * Extended information for slab objects stored as an array in page->memcg_data
+ * if MEMCG_DATA_OBJEXTS is set.
+ */
+struct slabobj_ext {
+ struct obj_cgroup *objcg;
+} __aligned(8);
+
static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
{
__mod_lruvec_kmem_state(p, idx, 1);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 36c5b43999e6..5b55c4752c23 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -180,7 +180,7 @@ struct page {
/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
atomic_t _refcount;

-#ifdef CONFIG_MEMCG
+#ifdef CONFIG_SLAB_OBJ_EXT
unsigned long memcg_data;
#endif

@@ -315,7 +315,7 @@ struct folio {
};
atomic_t _mapcount;
atomic_t _refcount;
-#ifdef CONFIG_MEMCG
+#ifdef CONFIG_SLAB_OBJ_EXT
unsigned long memcg_data;
#endif
/* private: the union with struct page is transitional */
diff --git a/init/Kconfig b/init/Kconfig
index 6d35728b94b2..78a7abe36037 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -937,10 +937,14 @@ config CGROUP_FAVOR_DYNMODS

Say N if unsure.

+config SLAB_OBJ_EXT
+ bool
+
config MEMCG
bool "Memory controller"
select PAGE_COUNTER
select EVENTFD
+ select SLAB_OBJ_EXT
help
Provides control over the memory footprint of tasks in a cgroup.

diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 3872528d0963..02b744d2e07d 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -599,9 +599,9 @@ static unsigned long kfence_init_pool(void)
continue;

__folio_set_slab(slab_folio(slab));
-#ifdef CONFIG_MEMCG
- slab->memcg_data = (unsigned long)&kfence_metadata_init[i / 2 - 1].objcg |
- MEMCG_DATA_OBJCGS;
+#ifdef CONFIG_MEMCG_KMEM
+ slab->obj_exts = (unsigned long)&kfence_metadata_init[i / 2 - 1].obj_exts |
+ MEMCG_DATA_OBJEXTS;
#endif
}

@@ -649,8 +649,8 @@ static unsigned long kfence_init_pool(void)

if (!i || (i % 2))
continue;
-#ifdef CONFIG_MEMCG
- slab->memcg_data = 0;
+#ifdef CONFIG_MEMCG_KMEM
+ slab->obj_exts = 0;
#endif
__folio_clear_slab(slab_folio(slab));
}
@@ -1143,8 +1143,8 @@ void __kfence_free(void *addr)
{
struct kfence_metadata *meta = addr_to_metadata((unsigned long)addr);

-#ifdef CONFIG_MEMCG
- KFENCE_WARN_ON(meta->objcg);
+#ifdef CONFIG_MEMCG_KMEM
+ KFENCE_WARN_ON(meta->obj_exts.objcg);
#endif
/*
* If the objects of the cache are SLAB_TYPESAFE_BY_RCU, defer freeing
diff --git a/mm/kfence/kfence.h b/mm/kfence/kfence.h
index f46fbb03062b..084f5f36e8e7 100644
--- a/mm/kfence/kfence.h
+++ b/mm/kfence/kfence.h
@@ -97,8 +97,8 @@ struct kfence_metadata {
struct kfence_track free_track;
/* For updating alloc_covered on frees. */
u32 alloc_stack_hash;
-#ifdef CONFIG_MEMCG
- struct obj_cgroup *objcg;
+#ifdef CONFIG_MEMCG_KMEM
+ struct slabobj_ext obj_exts;
#endif
};

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5b009b233ab8..aca777f45d34 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2859,13 +2859,6 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
}

#ifdef CONFIG_MEMCG_KMEM
-/*
- * The allocated objcg pointers array is not accounted directly.
- * Moreover, it should not come from DMA buffer and is not readily
- * reclaimable. So those GFP bits should be masked off.
- */
-#define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | __GFP_ACCOUNT)
-
/*
* mod_objcg_mlstate() may be called with irq enabled, so
* mod_memcg_lruvec_state() should be used.
@@ -2884,62 +2877,27 @@ static inline void mod_objcg_mlstate(struct obj_cgroup *objcg,
rcu_read_unlock();
}

-int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
- gfp_t gfp, bool new_slab)
-{
- unsigned int objects = objs_per_slab(s, slab);
- unsigned long memcg_data;
- void *vec;
-
- gfp &= ~OBJCGS_CLEAR_MASK;
- vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
- slab_nid(slab));
- if (!vec)
- return -ENOMEM;
-
- memcg_data = (unsigned long) vec | MEMCG_DATA_OBJCGS;
- if (new_slab) {
- /*
- * If the slab is brand new and nobody can yet access its
- * memcg_data, no synchronization is required and memcg_data can
- * be simply assigned.
- */
- slab->memcg_data = memcg_data;
- } else if (cmpxchg(&slab->memcg_data, 0, memcg_data)) {
- /*
- * If the slab is already in use, somebody can allocate and
- * assign obj_cgroups in parallel. In this case the existing
- * objcg vector should be reused.
- */
- kfree(vec);
- return 0;
- }
-
- kmemleak_not_leak(vec);
- return 0;
-}
-
static __always_inline
struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
{
/*
* Slab objects are accounted individually, not per-page.
* Memcg membership data for each individual object is saved in
- * slab->memcg_data.
+ * slab->obj_exts.
*/
if (folio_test_slab(folio)) {
- struct obj_cgroup **objcgs;
+ struct slabobj_ext *obj_exts;
struct slab *slab;
unsigned int off;

slab = folio_slab(folio);
- objcgs = slab_objcgs(slab);
- if (!objcgs)
+ obj_exts = slab_obj_exts(slab);
+ if (!obj_exts)
return NULL;

off = obj_to_index(slab->slab_cache, slab, p);
- if (objcgs[off])
- return obj_cgroup_memcg(objcgs[off]);
+ if (obj_exts[off].objcg)
+ return obj_cgroup_memcg(obj_exts[off].objcg);

return NULL;
}
@@ -2947,7 +2905,7 @@ struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
/*
* folio_memcg_check() is used here, because in theory we can encounter
* a folio where the slab flag has been cleared already, but
- * slab->memcg_data has not been freed yet
+ * slab->obj_exts has not been freed yet
* folio_memcg_check() will guarantee that a proper memory
* cgroup pointer or NULL will be returned.
*/
diff --git a/mm/page_owner.c b/mm/page_owner.c
index 4e2723e1b300..de6ea5746acd 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -372,7 +372,7 @@ static inline int print_page_owner_memcg(char *kbuf, size_t count, int ret,
if (!memcg_data)
goto out_unlock;

- if (memcg_data & MEMCG_DATA_OBJCGS)
+ if (memcg_data & MEMCG_DATA_OBJEXTS)
ret += scnprintf(kbuf + ret, count - ret,
"Slab cache page\n");

diff --git a/mm/slab.h b/mm/slab.h
index 799a315695c6..5a47125469f1 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -96,8 +96,8 @@ struct slab {
#endif

atomic_t __page_refcount;
-#ifdef CONFIG_MEMCG
- unsigned long memcg_data;
+#ifdef CONFIG_SLAB_OBJ_EXT
+ unsigned long obj_exts;
#endif
};

@@ -106,8 +106,8 @@ struct slab {
SLAB_MATCH(flags, __page_flags);
SLAB_MATCH(compound_head, slab_cache); /* Ensure bit 0 is clear */
SLAB_MATCH(_refcount, __page_refcount);
-#ifdef CONFIG_MEMCG
-SLAB_MATCH(memcg_data, memcg_data);
+#ifdef CONFIG_SLAB_OBJ_EXT
+SLAB_MATCH(memcg_data, obj_exts);
#endif
#undef SLAB_MATCH
static_assert(sizeof(struct slab) <= sizeof(struct page));
@@ -429,36 +429,106 @@ static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t fla
return false;
}

-#ifdef CONFIG_MEMCG_KMEM
+#ifdef CONFIG_SLAB_OBJ_EXT
+
/*
- * slab_objcgs - get the object cgroups vector associated with a slab
+ * slab_obj_exts - get the pointer to the slab object extension vector
+ * associated with a slab.
* @slab: a pointer to the slab struct
*
- * Returns a pointer to the object cgroups vector associated with the slab,
+ * Returns a pointer to the object extension vector associated with the slab,
* or NULL if no such vector has been associated yet.
*/
-static inline struct obj_cgroup **slab_objcgs(struct slab *slab)
+static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
{
- unsigned long memcg_data = READ_ONCE(slab->memcg_data);
+ unsigned long obj_exts = READ_ONCE(slab->obj_exts);

- VM_BUG_ON_PAGE(memcg_data && !(memcg_data & MEMCG_DATA_OBJCGS),
+#ifdef CONFIG_MEMCG
+ VM_BUG_ON_PAGE(obj_exts && !(obj_exts & MEMCG_DATA_OBJEXTS),
slab_page(slab));
- VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, slab_page(slab));
+ VM_BUG_ON_PAGE(obj_exts & MEMCG_DATA_KMEM, slab_page(slab));

- return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+ return (struct slabobj_ext *)(obj_exts & ~MEMCG_DATA_FLAGS_MASK);
+#else
+ return (struct slabobj_ext *)obj_exts;
+#endif
}

-int memcg_alloc_slab_cgroups(struct slab *slab, struct kmem_cache *s,
- gfp_t gfp, bool new_slab);
-void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
- enum node_stat_item idx, int nr);
+int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
+ gfp_t gfp, bool new_slab);

-static inline void memcg_free_slab_cgroups(struct slab *slab)
+static inline bool need_slab_obj_ext(void)
{
- kfree(slab_objcgs(slab));
- slab->memcg_data = 0;
+ /*
+ * CONFIG_MEMCG_KMEM creates vector of obj_cgroup objects conditionally
+ * inside memcg_slab_post_alloc_hook. No other users for now.
+ */
+ return false;
}

+static inline void free_slab_obj_exts(struct slab *slab)
+{
+ struct slabobj_ext *obj_exts;
+
+ obj_exts = slab_obj_exts(slab);
+ if (!obj_exts)
+ return;
+
+ kfree(obj_exts);
+ slab->obj_exts = 0;
+}
+
+static inline struct slabobj_ext *
+prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
+{
+ struct slab *slab;
+
+ if (!p)
+ return NULL;
+
+ if (!need_slab_obj_ext())
+ return NULL;
+
+ slab = virt_to_slab(p);
+ if (!slab_obj_exts(slab) &&
+ WARN(alloc_slab_obj_exts(slab, s, flags, false),
+ "%s, %s: Failed to create slab extension vector!\n",
+ __func__, s->name))
+ return NULL;
+
+ return slab_obj_exts(slab) + obj_to_index(s, slab, p);
+}
+
+#else /* CONFIG_SLAB_OBJ_EXT */
+
+static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
+{
+ return NULL;
+}
+
+static inline int alloc_slab_obj_exts(struct slab *slab,
+ struct kmem_cache *s, gfp_t gfp,
+ bool new_slab)
+{
+ return 0;
+}
+
+static inline void free_slab_obj_exts(struct slab *slab)
+{
+}
+
+static inline struct slabobj_ext *
+prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
+{
+ return NULL;
+}
+
+#endif /* CONFIG_SLAB_OBJ_EXT */
+
+#ifdef CONFIG_MEMCG_KMEM
+void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
+ enum node_stat_item idx, int nr);
+
static inline size_t obj_full_size(struct kmem_cache *s)
{
/*
@@ -526,16 +596,15 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
if (likely(p[i])) {
slab = virt_to_slab(p[i]);

- if (!slab_objcgs(slab) &&
- memcg_alloc_slab_cgroups(slab, s, flags,
- false)) {
+ if (!slab_obj_exts(slab) &&
+ alloc_slab_obj_exts(slab, s, flags, false)) {
obj_cgroup_uncharge(objcg, obj_full_size(s));
continue;
}

off = obj_to_index(s, slab, p[i]);
obj_cgroup_get(objcg);
- slab_objcgs(slab)[off] = objcg;
+ slab_obj_exts(slab)[off].objcg = objcg;
mod_objcg_state(objcg, slab_pgdat(slab),
cache_vmstat_idx(s), obj_full_size(s));
} else {
@@ -548,14 +617,14 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
void **p, int objects)
{
- struct obj_cgroup **objcgs;
+ struct slabobj_ext *obj_exts;
int i;

if (!memcg_kmem_online())
return;

- objcgs = slab_objcgs(slab);
- if (!objcgs)
+ obj_exts = slab_obj_exts(slab);
+ if (!obj_exts)
return;

for (i = 0; i < objects; i++) {
@@ -563,11 +632,11 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
unsigned int off;

off = obj_to_index(s, slab, p[i]);
- objcg = objcgs[off];
+ objcg = obj_exts[off].objcg;
if (!objcg)
continue;

- objcgs[off] = NULL;
+ obj_exts[off].objcg = NULL;
obj_cgroup_uncharge(objcg, obj_full_size(s));
mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s),
-obj_full_size(s));
@@ -576,27 +645,11 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
}

#else /* CONFIG_MEMCG_KMEM */
-static inline struct obj_cgroup **slab_objcgs(struct slab *slab)
-{
- return NULL;
-}
-
static inline struct mem_cgroup *memcg_from_slab_obj(void *ptr)
{
return NULL;
}

-static inline int memcg_alloc_slab_cgroups(struct slab *slab,
- struct kmem_cache *s, gfp_t gfp,
- bool new_slab)
-{
- return 0;
-}
-
-static inline void memcg_free_slab_cgroups(struct slab *slab)
-{
-}
-
static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
struct list_lru *lru,
struct obj_cgroup **objcgp,
@@ -633,7 +686,7 @@ static __always_inline void account_slab(struct slab *slab, int order,
struct kmem_cache *s, gfp_t gfp)
{
if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
- memcg_alloc_slab_cgroups(slab, s, gfp, true);
+ alloc_slab_obj_exts(slab, s, gfp, true);

mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
PAGE_SIZE << order);
@@ -642,8 +695,7 @@ static __always_inline void account_slab(struct slab *slab, int order,
static __always_inline void unaccount_slab(struct slab *slab, int order,
struct kmem_cache *s)
{
- if (memcg_kmem_online())
- memcg_free_slab_cgroups(slab);
+ free_slab_obj_exts(slab);

mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
-(PAGE_SIZE << order));
@@ -723,6 +775,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
unsigned int orig_size)
{
unsigned int zero_size = s->object_size;
+ struct slabobj_ext *obj_exts;
bool kasan_init = init;
size_t i;

@@ -765,6 +818,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
kmemleak_alloc_recursive(p[i], s->object_size, 1,
s->flags, flags);
kmsan_slab_alloc(s, p[i], flags);
+ obj_exts = prepare_slab_obj_exts_hook(s, flags, p[i]);
}

memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 9bbffe82d65a..2b42a9d2c11c 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -206,6 +206,53 @@ struct kmem_cache *find_mergeable(unsigned int size, unsigned int align,
return NULL;
}

+#ifdef CONFIG_SLAB_OBJ_EXT
+/*
+ * The allocated objcg pointers array is not accounted directly.
+ * Moreover, it should not come from DMA buffer and is not readily
+ * reclaimable. So those GFP bits should be masked off.
+ */
+#define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | __GFP_ACCOUNT)
+
+int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
+ gfp_t gfp, bool new_slab)
+{
+ unsigned int objects = objs_per_slab(s, slab);
+ unsigned long obj_exts;
+ void *vec;
+
+ gfp &= ~OBJCGS_CLEAR_MASK;
+ vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
+ slab_nid(slab));
+ if (!vec)
+ return -ENOMEM;
+
+ obj_exts = (unsigned long)vec;
+#ifdef CONFIG_MEMCG
+ obj_exts |= MEMCG_DATA_OBJEXTS;
+#endif
+ if (new_slab) {
+ /*
+ * If the slab is brand new and nobody can yet access its
+ * obj_exts, no synchronization is required and obj_exts can
+ * be simply assigned.
+ */
+ slab->obj_exts = obj_exts;
+ } else if (cmpxchg(&slab->obj_exts, 0, obj_exts)) {
+ /*
+ * If the slab is already in use, somebody can allocate and
+ * assign slabobj_exts in parallel. In this case the existing
+ * extension vector should be reused.
+ */
+ kfree(vec);
+ return 0;
+ }
+
+ kmemleak_not_leak(vec);
+ return 0;
+}
+#endif /* CONFIG_SLAB_OBJ_EXT */
+
static struct kmem_cache *create_cache(const char *name,
unsigned int object_size, unsigned int align,
slab_flags_t flags, unsigned int useroffset,
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:48:42

by Suren Baghdasaryan

Subject: [PATCH v2 13/39] lib: code tagging module support

Add support for code tagging from dynamically loaded modules.
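
A rough sketch of what this enables, assuming a hypothetical tag type
(the example_* names and the "example_tags" section are made up; only
the two callback fields added to codetag_type_desc are from this
patch):

  #include <linux/codetag.h>
  #include <linux/module.h>
  #include <linux/printk.h>

  static void example_module_load(struct codetag_type *cttype,
                                  struct codetag_module *cmod)
  {
      /* range.start/stop delimit the tags carried by this module */
      pr_info("example_tags: +%zu from %s\n",
              (size_t)(cmod->range.stop - cmod->range.start),
              cmod->mod ? cmod->mod->name : "(built-in)");
  }

  static void example_module_unload(struct codetag_type *cttype,
                                    struct codetag_module *cmod)
  {
      pr_info("example_tags: dropping tags of %s\n", cmod->mod->name);
  }

  static const struct codetag_type_desc example_desc = {
      .section       = "example_tags",
      .tag_size      = sizeof(struct codetag),
      .module_load   = example_module_load,
      .module_unload = example_module_unload,
  };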

Signed-off-by: Suren Baghdasaryan <[email protected]>
Co-developed-by: Kent Overstreet <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/codetag.h | 12 +++++++++
kernel/module/main.c | 4 +++
lib/codetag.c | 58 +++++++++++++++++++++++++++++++++++++++--
3 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index a9d7adecc2a5..386733e89b31 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -42,6 +42,10 @@ struct codetag_module {
struct codetag_type_desc {
const char *section;
size_t tag_size;
+ void (*module_load)(struct codetag_type *cttype,
+ struct codetag_module *cmod);
+ void (*module_unload)(struct codetag_type *cttype,
+ struct codetag_module *cmod);
};

struct codetag_iterator {
@@ -68,4 +72,12 @@ void codetag_to_text(struct seq_buf *out, struct codetag *ct);
struct codetag_type *
codetag_register_type(const struct codetag_type_desc *desc);

+#ifdef CONFIG_CODE_TAGGING
+void codetag_load_module(struct module *mod);
+void codetag_unload_module(struct module *mod);
+#else
+static inline void codetag_load_module(struct module *mod) {}
+static inline void codetag_unload_module(struct module *mod) {}
+#endif
+
#endif /* _LINUX_CODETAG_H */
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 98fedfdb8db5..c0d3f562c7ab 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -56,6 +56,7 @@
#include <linux/dynamic_debug.h>
#include <linux/audit.h>
#include <linux/cfi.h>
+#include <linux/codetag.h>
#include <linux/debugfs.h>
#include <uapi/linux/module.h>
#include "internal.h"
@@ -1242,6 +1243,7 @@ static void free_module(struct module *mod)
{
trace_module_free(mod);

+ codetag_unload_module(mod);
mod_sysfs_teardown(mod);

/*
@@ -2975,6 +2977,8 @@ static int load_module(struct load_info *info, const char __user *uargs,
/* Get rid of temporary copy. */
free_copy(info, flags);

+ codetag_load_module(mod);
+
/* Done! */
trace_module_load(mod);

diff --git a/lib/codetag.c b/lib/codetag.c
index 7708f8388e55..4ea57fb37346 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -108,15 +108,20 @@ static inline size_t range_size(const struct codetag_type *cttype,
static void *get_symbol(struct module *mod, const char *prefix, const char *name)
{
char buf[64];
+ void *ret;
int res;

res = snprintf(buf, sizeof(buf), "%s%s", prefix, name);
if (WARN_ON(res < 1 || res > sizeof(buf)))
return NULL;

- return mod ?
+ preempt_disable();
+ ret = mod ?
(void *)find_kallsyms_symbol_value(mod, buf) :
(void *)kallsyms_lookup_name(buf);
+ preempt_enable();
+
+ return ret;
}

static struct codetag_range get_section_range(struct module *mod,
@@ -157,8 +162,11 @@ static int codetag_module_init(struct codetag_type *cttype, struct module *mod)

down_write(&cttype->mod_lock);
err = idr_alloc(&cttype->mod_idr, cmod, 0, 0, GFP_KERNEL);
- if (err >= 0)
+ if (err >= 0) {
cttype->count += range_size(cttype, &range);
+ if (cttype->desc.module_load)
+ cttype->desc.module_load(cttype, cmod);
+ }
up_write(&cttype->mod_lock);

if (err < 0) {
@@ -197,3 +205,49 @@ codetag_register_type(const struct codetag_type_desc *desc)

return cttype;
}
+
+void codetag_load_module(struct module *mod)
+{
+ struct codetag_type *cttype;
+
+ if (!mod)
+ return;
+
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link)
+ codetag_module_init(cttype, mod);
+ mutex_unlock(&codetag_lock);
+}
+
+void codetag_unload_module(struct module *mod)
+{
+ struct codetag_type *cttype;
+
+ if (!mod)
+ return;
+
+ mutex_lock(&codetag_lock);
+ list_for_each_entry(cttype, &codetag_types, link) {
+ struct codetag_module *found = NULL;
+ struct codetag_module *cmod;
+ unsigned long mod_id, tmp;
+
+ down_write(&cttype->mod_lock);
+ idr_for_each_entry_ul(&cttype->mod_idr, cmod, tmp, mod_id) {
+ if (cmod->mod && cmod->mod == mod) {
+ found = cmod;
+ break;
+ }
+ }
+ if (found) {
+ if (cttype->desc.module_unload)
+ cttype->desc.module_unload(cttype, cmod);
+
+ cttype->count -= range_size(cttype, &cmod->range);
+ idr_remove(&cttype->mod_idr, mod_id);
+ kfree(cmod);
+ }
+ up_write(&cttype->mod_lock);
+ }
+ mutex_unlock(&codetag_lock);
+}
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:48:51

by Suren Baghdasaryan

Subject: [PATCH v2 12/39] lib: code tagging framework

Add basic infrastructure to support code tagging, which stores common tag
information consisting of the module name, function, file name and line
number. Provide functions to register a new code tag type and to navigate
between code tags.
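
To illustrate the intended flow (sketch only: the "example_tags"
section and its __start_/__stop_ boundary symbols are assumed to
already exist; linker-script plumbing for a real section comes later
in the series):

  #include <linux/codetag.h>
  #include <linux/err.h>
  #include <linux/printk.h>

  static struct codetag_type *example_cttype;

  static int __init example_codetag_init(void)
  {
      const struct codetag_type_desc desc = {
          .section  = "example_tags",
          .tag_size = sizeof(struct codetag),
      };
      struct codetag_iterator iter;
      struct codetag *ct;

      example_cttype = codetag_register_type(&desc);
      if (IS_ERR_OR_NULL(example_cttype))
          return PTR_ERR(example_cttype);

      /* Iteration requires holding the module list lock */
      codetag_lock_module_list(example_cttype, true);
      iter = codetag_get_ct_iter(example_cttype);
      while ((ct = codetag_next_ct(&iter)))
          pr_info("tag at %s:%u in %s()\n",
                  ct->filename, ct->lineno, ct->function);
      codetag_lock_module_list(example_cttype, false);

      return 0;
  }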

Co-developed-by: Kent Overstreet <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/codetag.h | 71 ++++++++++++++
lib/Kconfig.debug | 4 +
lib/Makefile | 1 +
lib/codetag.c | 199 ++++++++++++++++++++++++++++++++++++++++
4 files changed, 275 insertions(+)
create mode 100644 include/linux/codetag.h
create mode 100644 lib/codetag.c

diff --git a/include/linux/codetag.h b/include/linux/codetag.h
new file mode 100644
index 000000000000..a9d7adecc2a5
--- /dev/null
+++ b/include/linux/codetag.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * code tagging framework
+ */
+#ifndef _LINUX_CODETAG_H
+#define _LINUX_CODETAG_H
+
+#include <linux/types.h>
+
+struct codetag_iterator;
+struct codetag_type;
+struct seq_buf;
+struct module;
+
+/*
+ * An instance of this structure is created in a special ELF section at every
+ * code location being tagged. At runtime, the special section is treated as
+ * an array of these.
+ */
+struct codetag {
+ unsigned int flags; /* used in later patches */
+ unsigned int lineno;
+ const char *modname;
+ const char *function;
+ const char *filename;
+} __aligned(8);
+
+union codetag_ref {
+ struct codetag *ct;
+};
+
+struct codetag_range {
+ struct codetag *start;
+ struct codetag *stop;
+};
+
+struct codetag_module {
+ struct module *mod;
+ struct codetag_range range;
+};
+
+struct codetag_type_desc {
+ const char *section;
+ size_t tag_size;
+};
+
+struct codetag_iterator {
+ struct codetag_type *cttype;
+ struct codetag_module *cmod;
+ unsigned long mod_id;
+ struct codetag *ct;
+};
+
+#define CODE_TAG_INIT { \
+ .modname = KBUILD_MODNAME, \
+ .function = __func__, \
+ .filename = __FILE__, \
+ .lineno = __LINE__, \
+ .flags = 0, \
+}
+
+void codetag_lock_module_list(struct codetag_type *cttype, bool lock);
+struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype);
+struct codetag *codetag_next_ct(struct codetag_iterator *iter);
+
+void codetag_to_text(struct seq_buf *out, struct codetag *ct);
+
+struct codetag_type *
+codetag_register_type(const struct codetag_type_desc *desc);
+
+#endif /* _LINUX_CODETAG_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index fa307f93fa2e..2acbef24e93e 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -962,6 +962,10 @@ config DEBUG_STACKOVERFLOW

If in doubt, say "N".

+config CODE_TAGGING
+ bool
+ select KALLSYMS
+
source "lib/Kconfig.kasan"
source "lib/Kconfig.kfence"
source "lib/Kconfig.kmsan"
diff --git a/lib/Makefile b/lib/Makefile
index 740109b6e2c8..b50212b5b999 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -233,6 +233,7 @@ obj-$(CONFIG_OF_RECONFIG_NOTIFIER_ERROR_INJECT) += \
of-reconfig-notifier-error-inject.o
obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o

+obj-$(CONFIG_CODE_TAGGING) += codetag.o
lib-$(CONFIG_GENERIC_BUG) += bug.o

obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
diff --git a/lib/codetag.c b/lib/codetag.c
new file mode 100644
index 000000000000..7708f8388e55
--- /dev/null
+++ b/lib/codetag.c
@@ -0,0 +1,199 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/codetag.h>
+#include <linux/idr.h>
+#include <linux/kallsyms.h>
+#include <linux/module.h>
+#include <linux/seq_buf.h>
+#include <linux/slab.h>
+
+struct codetag_type {
+ struct list_head link;
+ unsigned int count;
+ struct idr mod_idr;
+ struct rw_semaphore mod_lock; /* protects mod_idr */
+ struct codetag_type_desc desc;
+};
+
+static DEFINE_MUTEX(codetag_lock);
+static LIST_HEAD(codetag_types);
+
+void codetag_lock_module_list(struct codetag_type *cttype, bool lock)
+{
+ if (lock)
+ down_read(&cttype->mod_lock);
+ else
+ up_read(&cttype->mod_lock);
+}
+
+struct codetag_iterator codetag_get_ct_iter(struct codetag_type *cttype)
+{
+ struct codetag_iterator iter = {
+ .cttype = cttype,
+ .cmod = NULL,
+ .mod_id = 0,
+ .ct = NULL,
+ };
+
+ return iter;
+}
+
+static inline struct codetag *get_first_module_ct(struct codetag_module *cmod)
+{
+ return cmod->range.start < cmod->range.stop ? cmod->range.start : NULL;
+}
+
+static inline
+struct codetag *get_next_module_ct(struct codetag_iterator *iter)
+{
+ struct codetag *res = (struct codetag *)
+ ((char *)iter->ct + iter->cttype->desc.tag_size);
+
+ return res < iter->cmod->range.stop ? res : NULL;
+}
+
+struct codetag *codetag_next_ct(struct codetag_iterator *iter)
+{
+ struct codetag_type *cttype = iter->cttype;
+ struct codetag_module *cmod;
+ struct codetag *ct;
+
+ lockdep_assert_held(&cttype->mod_lock);
+
+ if (unlikely(idr_is_empty(&cttype->mod_idr)))
+ return NULL;
+
+ ct = NULL;
+ while (true) {
+ cmod = idr_find(&cttype->mod_idr, iter->mod_id);
+
+ /* If module was removed move to the next one */
+ if (!cmod)
+ cmod = idr_get_next_ul(&cttype->mod_idr,
+ &iter->mod_id);
+
+ /* Exit if no more modules */
+ if (!cmod)
+ break;
+
+ if (cmod != iter->cmod) {
+ iter->cmod = cmod;
+ ct = get_first_module_ct(cmod);
+ } else
+ ct = get_next_module_ct(iter);
+
+ if (ct)
+ break;
+
+ iter->mod_id++;
+ }
+
+ iter->ct = ct;
+ return ct;
+}
+
+void codetag_to_text(struct seq_buf *out, struct codetag *ct)
+{
+ seq_buf_printf(out, "%s:%u module:%s func:%s",
+ ct->filename, ct->lineno,
+ ct->modname, ct->function);
+}
+
+static inline size_t range_size(const struct codetag_type *cttype,
+ const struct codetag_range *range)
+{
+ return ((char *)range->stop - (char *)range->start) /
+ cttype->desc.tag_size;
+}
+
+static void *get_symbol(struct module *mod, const char *prefix, const char *name)
+{
+ char buf[64];
+ int res;
+
+ res = snprintf(buf, sizeof(buf), "%s%s", prefix, name);
+ if (WARN_ON(res < 1 || res > sizeof(buf)))
+ return NULL;
+
+ return mod ?
+ (void *)find_kallsyms_symbol_value(mod, buf) :
+ (void *)kallsyms_lookup_name(buf);
+}
+
+static struct codetag_range get_section_range(struct module *mod,
+ const char *section)
+{
+ return (struct codetag_range) {
+ get_symbol(mod, "__start_", section),
+ get_symbol(mod, "__stop_", section),
+ };
+}
+
+static int codetag_module_init(struct codetag_type *cttype, struct module *mod)
+{
+ struct codetag_range range;
+ struct codetag_module *cmod;
+ int err;
+
+ range = get_section_range(mod, cttype->desc.section);
+ if (!range.start || !range.stop) {
+ pr_warn("Failed to load code tags of type %s from the module %s\n",
+ cttype->desc.section,
+ mod ? mod->name : "(built-in)");
+ return -EINVAL;
+ }
+
+ /* Ignore empty ranges */
+ if (range.start == range.stop)
+ return 0;
+
+ BUG_ON(range.start > range.stop);
+
+ cmod = kmalloc(sizeof(*cmod), GFP_KERNEL);
+ if (unlikely(!cmod))
+ return -ENOMEM;
+
+ cmod->mod = mod;
+ cmod->range = range;
+
+ down_write(&cttype->mod_lock);
+ err = idr_alloc(&cttype->mod_idr, cmod, 0, 0, GFP_KERNEL);
+ if (err >= 0)
+ cttype->count += range_size(cttype, &range);
+ up_write(&cttype->mod_lock);
+
+ if (err < 0) {
+ kfree(cmod);
+ return err;
+ }
+
+ return 0;
+}
+
+struct codetag_type *
+codetag_register_type(const struct codetag_type_desc *desc)
+{
+ struct codetag_type *cttype;
+ int err;
+
+ BUG_ON(desc->tag_size <= 0);
+
+ cttype = kzalloc(sizeof(*cttype), GFP_KERNEL);
+ if (unlikely(!cttype))
+ return ERR_PTR(-ENOMEM);
+
+ cttype->desc = *desc;
+ idr_init(&cttype->mod_idr);
+ init_rwsem(&cttype->mod_lock);
+
+ err = codetag_module_init(cttype, NULL);
+ if (unlikely(err)) {
+ kfree(cttype);
+ return ERR_PTR(err);
+ }
+
+ mutex_lock(&codetag_lock);
+ list_add_tail(&cttype->link, &codetag_types);
+ mutex_unlock(&codetag_lock);
+
+ return cttype;
+}
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:48:53

by Suren Baghdasaryan

Subject: [PATCH v2 14/39] lib: prevent module unloading if memory is not freed

Skip freeing a module's data section if it still contains non-zero
allocation tags: otherwise, once the outstanding allocations are freed,
accessing their code tags would be a use-after-free (UAF).
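
In sketch form, a tag type can now block the freeing of the module's
data (the helper below is hypothetical; the real implementation for
allocation tags follows in the next patch):

  /* Return false to keep the module's data sections alive. */
  static bool example_module_unload(struct codetag_type *cttype,
                                    struct codetag_module *cmod)
  {
      /*
       * If any tag in cmod->range still accounts for live
       * allocations, the tags must remain addressable after the
       * module is gone, so ask the module core not to free the
       * core data sections.
       */
      return !example_tags_in_use(cmod);  /* hypothetical check */
  }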

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/codetag.h | 6 +++---
kernel/module/main.c | 23 +++++++++++++++--------
lib/codetag.c | 11 ++++++++---
3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/include/linux/codetag.h b/include/linux/codetag.h
index 386733e89b31..d98e4c8e86f0 100644
--- a/include/linux/codetag.h
+++ b/include/linux/codetag.h
@@ -44,7 +44,7 @@ struct codetag_type_desc {
size_t tag_size;
void (*module_load)(struct codetag_type *cttype,
struct codetag_module *cmod);
- void (*module_unload)(struct codetag_type *cttype,
+ bool (*module_unload)(struct codetag_type *cttype,
struct codetag_module *cmod);
};

@@ -74,10 +74,10 @@ codetag_register_type(const struct codetag_type_desc *desc);

#ifdef CONFIG_CODE_TAGGING
void codetag_load_module(struct module *mod);
-void codetag_unload_module(struct module *mod);
+bool codetag_unload_module(struct module *mod);
#else
static inline void codetag_load_module(struct module *mod) {}
-static inline void codetag_unload_module(struct module *mod) {}
+static inline bool codetag_unload_module(struct module *mod) { return true; }
#endif

#endif /* _LINUX_CODETAG_H */
diff --git a/kernel/module/main.c b/kernel/module/main.c
index c0d3f562c7ab..079f40792ce8 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1211,15 +1211,19 @@ static void *module_memory_alloc(unsigned int size, enum mod_mem_type type)
return module_alloc(size);
}

-static void module_memory_free(void *ptr, enum mod_mem_type type)
+static void module_memory_free(void *ptr, enum mod_mem_type type,
+ bool unload_codetags)
{
+ if (!unload_codetags && mod_mem_type_is_core_data(type))
+ return;
+
if (mod_mem_use_vmalloc(type))
vfree(ptr);
else
module_memfree(ptr);
}

-static void free_mod_mem(struct module *mod)
+static void free_mod_mem(struct module *mod, bool unload_codetags)
{
for_each_mod_mem_type(type) {
struct module_memory *mod_mem = &mod->mem[type];
@@ -1230,20 +1234,23 @@ static void free_mod_mem(struct module *mod)
/* Free lock-classes; relies on the preceding sync_rcu(). */
lockdep_free_key_range(mod_mem->base, mod_mem->size);
if (mod_mem->size)
- module_memory_free(mod_mem->base, type);
+ module_memory_free(mod_mem->base, type,
+ unload_codetags);
}

/* MOD_DATA hosts mod, so free it at last */
lockdep_free_key_range(mod->mem[MOD_DATA].base, mod->mem[MOD_DATA].size);
- module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA);
+ module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA, unload_codetags);
}

/* Free a module, remove from lists, etc. */
static void free_module(struct module *mod)
{
+ bool unload_codetags;
+
trace_module_free(mod);

- codetag_unload_module(mod);
+ unload_codetags = codetag_unload_module(mod);
mod_sysfs_teardown(mod);

/*
@@ -1285,7 +1292,7 @@ static void free_module(struct module *mod)
kfree(mod->args);
percpu_modfree(mod);

- free_mod_mem(mod);
+ free_mod_mem(mod, unload_codetags);
}

void *__symbol_get(const char *symbol)
@@ -2295,7 +2302,7 @@ static int move_module(struct module *mod, struct load_info *info)
return 0;
out_enomem:
for (t--; t >= 0; t--)
- module_memory_free(mod->mem[t].base, t);
+ module_memory_free(mod->mem[t].base, t, true);
return ret;
}

@@ -2425,7 +2432,7 @@ static void module_deallocate(struct module *mod, struct load_info *info)
percpu_modfree(mod);
module_arch_freeing_init(mod);

- free_mod_mem(mod);
+ free_mod_mem(mod, true);
}

int __weak module_finalize(const Elf_Ehdr *hdr,
diff --git a/lib/codetag.c b/lib/codetag.c
index 4ea57fb37346..0ad4ea66c769 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -5,6 +5,7 @@
#include <linux/module.h>
#include <linux/seq_buf.h>
#include <linux/slab.h>
+#include <linux/vmalloc.h>

struct codetag_type {
struct list_head link;
@@ -219,12 +220,13 @@ void codetag_load_module(struct module *mod)
mutex_unlock(&codetag_lock);
}

-void codetag_unload_module(struct module *mod)
+bool codetag_unload_module(struct module *mod)
{
struct codetag_type *cttype;
+ bool unload_ok = true;

if (!mod)
- return;
+ return true;

mutex_lock(&codetag_lock);
list_for_each_entry(cttype, &codetag_types, link) {
@@ -241,7 +243,8 @@ void codetag_unload_module(struct module *mod)
}
if (found) {
if (cttype->desc.module_unload)
- cttype->desc.module_unload(cttype, cmod);
+ if (!cttype->desc.module_unload(cttype, cmod))
+ unload_ok = false;

cttype->count -= range_size(cttype, &cmod->range);
idr_remove(&cttype->mod_idr, mod_id);
@@ -250,4 +253,6 @@ void codetag_unload_module(struct module *mod)
up_write(&cttype->mod_lock);
}
mutex_unlock(&codetag_lock);
+
+ return unload_ok;
}
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:49:05

by Suren Baghdasaryan

Subject: [PATCH v2 16/39] lib: introduce support for page allocation tagging

Introduce helper functions to easily instrument page allocators by
storing, in a page_ext field, a pointer to the allocation tag associated
with the code that allocated the page.
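
Note the pairing these helpers encapsulate: get_page_tag_ref() looks
up the page_ext and pins it for the caller, so every successful lookup
must be matched by put_page_tag_ref(). A caller updating a tag
reference by hand would follow the same shape as pgalloc_tag_add() and
pgalloc_tag_sub() in the patch below:

  union codetag_ref *ref = get_page_tag_ref(page);

  if (ref) {
      /* ref is only valid until put_page_tag_ref() */
      alloc_tag_add(ref, current->alloc_tag, PAGE_SIZE << order);
      put_page_tag_ref(ref);
  }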

Signed-off-by: Suren Baghdasaryan <[email protected]>
Co-developed-by: Kent Overstreet <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/page_ext.h | 1 -
include/linux/pgalloc_tag.h | 73 +++++++++++++++++++++++++++++++++++++
lib/Kconfig.debug | 1 +
lib/alloc_tag.c | 17 +++++++++
mm/mm_init.c | 1 +
mm/page_alloc.c | 4 ++
mm/page_ext.c | 4 ++
7 files changed, 100 insertions(+), 1 deletion(-)
create mode 100644 include/linux/pgalloc_tag.h

diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index be98564191e6..07e0656898f9 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -4,7 +4,6 @@

#include <linux/types.h>
#include <linux/stacktrace.h>
-#include <linux/stackdepot.h>

struct pglist_data;

diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
new file mode 100644
index 000000000000..a060c26eb449
--- /dev/null
+++ b/include/linux/pgalloc_tag.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * page allocation tagging
+ */
+#ifndef _LINUX_PGALLOC_TAG_H
+#define _LINUX_PGALLOC_TAG_H
+
+#include <linux/alloc_tag.h>
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+
+#include <linux/page_ext.h>
+
+extern struct page_ext_operations page_alloc_tagging_ops;
+extern struct page_ext *page_ext_get(struct page *page);
+extern void page_ext_put(struct page_ext *page_ext);
+
+static inline union codetag_ref *codetag_ref_from_page_ext(struct page_ext *page_ext)
+{
+ return (void *)page_ext + page_alloc_tagging_ops.offset;
+}
+
+static inline struct page_ext *page_ext_from_codetag_ref(union codetag_ref *ref)
+{
+ return (void *)ref - page_alloc_tagging_ops.offset;
+}
+
+static inline union codetag_ref *get_page_tag_ref(struct page *page)
+{
+ if (page && mem_alloc_profiling_enabled()) {
+ struct page_ext *page_ext = page_ext_get(page);
+
+ if (page_ext)
+ return codetag_ref_from_page_ext(page_ext);
+ }
+ return NULL;
+}
+
+static inline void put_page_tag_ref(union codetag_ref *ref)
+{
+ page_ext_put(page_ext_from_codetag_ref(ref));
+}
+
+static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
+ unsigned int order)
+{
+ union codetag_ref *ref = get_page_tag_ref(page);
+
+ if (ref) {
+ alloc_tag_add(ref, task->alloc_tag, PAGE_SIZE << order);
+ put_page_tag_ref(ref);
+ }
+}
+
+static inline void pgalloc_tag_sub(struct page *page, unsigned int order)
+{
+ union codetag_ref *ref = get_page_tag_ref(page);
+
+ if (ref) {
+ alloc_tag_sub(ref, PAGE_SIZE << order);
+ put_page_tag_ref(ref);
+ }
+}
+
+#else /* CONFIG_MEM_ALLOC_PROFILING */
+
+static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
+ unsigned int order) {}
+static inline void pgalloc_tag_sub(struct page *page, unsigned int order) {}
+
+#endif /* CONFIG_MEM_ALLOC_PROFILING */
+
+#endif /* _LINUX_PGALLOC_TAG_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 475a14e70566..e1eda1450d68 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -972,6 +972,7 @@ config MEM_ALLOC_PROFILING
depends on PROC_FS
depends on !DEBUG_FORCE_WEAK_PER_CPU
select CODE_TAGGING
+ select PAGE_EXTENSION
help
Track allocation source code and record total allocation size
initiated at that code location. The mechanism can be used to track
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 4fc031f9cefd..2d5226d9262d 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -3,6 +3,7 @@
#include <linux/fs.h>
#include <linux/gfp.h>
#include <linux/module.h>
+#include <linux/page_ext.h>
#include <linux/proc_fs.h>
#include <linux/seq_buf.h>
#include <linux/seq_file.h>
@@ -124,6 +125,22 @@ static bool alloc_tag_module_unload(struct codetag_type *cttype,
return module_unused;
}

+static __init bool need_page_alloc_tagging(void)
+{
+ return true;
+}
+
+static __init void init_page_alloc_tagging(void)
+{
+}
+
+struct page_ext_operations page_alloc_tagging_ops = {
+ .size = sizeof(union codetag_ref),
+ .need = need_page_alloc_tagging,
+ .init = init_page_alloc_tagging,
+};
+EXPORT_SYMBOL(page_alloc_tagging_ops);
+
static struct ctl_table memory_allocation_profiling_sysctls[] = {
{
.procname = "mem_profiling",
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 50f2f34745af..8e72e431dc35 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -24,6 +24,7 @@
#include <linux/page_ext.h>
#include <linux/pti.h>
#include <linux/pgtable.h>
+#include <linux/stackdepot.h>
#include <linux/swap.h>
#include <linux/cma.h>
#include "internal.h"
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 95546f376302..d490d0f73e72 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -52,6 +52,7 @@
#include <linux/psi.h>
#include <linux/khugepaged.h>
#include <linux/delayacct.h>
+#include <linux/pgalloc_tag.h>
#include <asm/div64.h>
#include "internal.h"
#include "shuffle.h"
@@ -1093,6 +1094,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
__memcg_kmem_uncharge_page(page, order);
reset_page_owner(page, order);
page_table_check_free(page, order);
+ pgalloc_tag_sub(page, order);
return false;
}

@@ -1135,6 +1137,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
reset_page_owner(page, order);
page_table_check_free(page, order);
+ pgalloc_tag_sub(page, order);

if (!PageHighMem(page)) {
debug_check_no_locks_freed(page_address(page),
@@ -1535,6 +1538,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order,

set_page_owner(page, order, gfp_flags);
page_table_check_alloc(page, order);
+ pgalloc_tag_add(page, current, order);
}

static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 4548fcc66d74..3c58fe8a24df 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -10,6 +10,7 @@
#include <linux/page_idle.h>
#include <linux/page_table_check.h>
#include <linux/rcupdate.h>
+#include <linux/pgalloc_tag.h>

/*
* struct page extension
@@ -82,6 +83,9 @@ static struct page_ext_operations *page_ext_ops[] __initdata = {
#if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT)
&page_idle_ops,
#endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ &page_alloc_tagging_ops,
+#endif
#ifdef CONFIG_PAGE_TABLE_CHECK
&page_table_check_ops,
#endif
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:49:25

by Suren Baghdasaryan

Subject: [PATCH v2 15/39] lib: add allocation tagging support for memory allocation profiling

Introduce CONFIG_MEM_ALLOC_PROFILING, which provides definitions to easily
instrument memory allocators. It registers an "alloc_tags" codetag type
with a /proc/allocinfo interface to output allocation tag information when
the feature is enabled.
CONFIG_MEM_ALLOC_PROFILING_DEBUG is provided for debugging the memory
allocation profiling instrumentation.
Memory allocation profiling can be enabled or disabled at runtime using
the /proc/sys/vm/mem_profiling sysctl when CONFIG_MEM_ALLOC_PROFILING_DEBUG=n.
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT enables memory allocation
profiling by default.
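
In miniature, hand-instrumenting an allocation function with the
pieces introduced here looks like the sketch below (some_real_alloc()
is a stand-in; the convenience macro that automates this wrapping is
added later in the series):

  void *example_alloc(size_t size)
  {
      void *p;
      /*
       * Emits a static struct alloc_tag for this callsite into the
       * "alloc_tags" section and installs it as current->alloc_tag.
       */
      DEFINE_ALLOC_TAG(_alloc_tag, _old);

      p = some_real_alloc(size);
      alloc_tag_restore(&_alloc_tag, _old);
      return p;
  }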

Co-developed-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
Documentation/admin-guide/sysctl/vm.rst | 16 +++
Documentation/filesystems/proc.rst | 28 +++++
include/asm-generic/codetag.lds.h | 14 +++
include/asm-generic/vmlinux.lds.h | 3 +
include/linux/alloc_tag.h | 133 ++++++++++++++++++++
include/linux/sched.h | 24 ++++
lib/Kconfig.debug | 25 ++++
lib/Makefile | 2 +
lib/alloc_tag.c | 158 ++++++++++++++++++++++++
scripts/module.lds.S | 7 ++
10 files changed, 410 insertions(+)
create mode 100644 include/asm-generic/codetag.lds.h
create mode 100644 include/linux/alloc_tag.h
create mode 100644 lib/alloc_tag.c

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 45ba1f4dc004..0a012ac13a38 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -43,6 +43,7 @@ Currently, these files are in /proc/sys/vm:
- legacy_va_layout
- lowmem_reserve_ratio
- max_map_count
+- mem_profiling (only if CONFIG_MEM_ALLOC_PROFILING=y)
- memory_failure_early_kill
- memory_failure_recovery
- min_free_kbytes
@@ -425,6 +426,21 @@ e.g., up to one or two maps per allocation.
The default value is 65530.


+mem_profiling
+==============
+
+Enable memory profiling (when CONFIG_MEM_ALLOC_PROFILING=y)
+
+1: Enable memory profiling.
+
+0: Disable memory profiling.
+
+Enabling memory profiling introduces a small performance overhead for all
+memory allocations.
+
+The default value depends on CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT.
+
+
memory_failure_early_kill:
==========================

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 2b59cff8be17..41a7e6d95fe8 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -688,6 +688,7 @@ files are there, and which are missing.
============ ===============================================================
File Content
============ ===============================================================
+ allocinfo Memory allocation profiling information
apm Advanced power management info
buddyinfo Kernel memory allocator information (see text) (2.5)
bus Directory containing bus specific information
@@ -947,6 +948,33 @@ also be allocatable although a lot of filesystem metadata may have to be
reclaimed to achieve this.


+allocinfo
+~~~~~~~~~
+
+Provides information about memory allocations at all locations in the code
+base. Each allocation in the code is identified by its source file, line
+number, module and the function calling the allocation. The number of bytes
+allocated at each location is reported.
+
+Example output.
+
+::
+
+ > cat /proc/allocinfo
+
+ 153MiB mm/slub.c:1826 module:slub func:alloc_slab_page
+ 6.08MiB mm/slab_common.c:950 module:slab_common func:_kmalloc_order
+ 5.09MiB mm/memcontrol.c:2814 module:memcontrol func:alloc_slab_obj_exts
+ 4.54MiB mm/page_alloc.c:5777 module:page_alloc func:alloc_pages_exact
+ 1.32MiB include/asm-generic/pgalloc.h:63 module:pgtable func:__pte_alloc_one
+ 1.16MiB fs/xfs/xfs_log_priv.h:700 module:xfs func:xlog_kvmalloc
+ 1.00MiB mm/swap_cgroup.c:48 module:swap_cgroup func:swap_cgroup_prepare
+ 734KiB fs/xfs/kmem.c:20 module:xfs func:kmem_alloc
+ 640KiB kernel/rcu/tree.c:3184 module:tree func:fill_page_cache_func
+ 640KiB drivers/char/virtio_console.c:452 module:virtio_console func:alloc_buf
+ ...
+
+
meminfo
~~~~~~~

diff --git a/include/asm-generic/codetag.lds.h b/include/asm-generic/codetag.lds.h
new file mode 100644
index 000000000000..64f536b80380
--- /dev/null
+++ b/include/asm-generic/codetag.lds.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_CODETAG_LDS_H
+#define __ASM_GENERIC_CODETAG_LDS_H
+
+#define SECTION_WITH_BOUNDARIES(_name) \
+ . = ALIGN(8); \
+ __start_##_name = .; \
+ KEEP(*(_name)) \
+ __stop_##_name = .;
+
+#define CODETAG_SECTIONS() \
+ SECTION_WITH_BOUNDARIES(alloc_tags)
+
+#endif /* __ASM_GENERIC_CODETAG_LDS_H */
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 67d8dd2f1bde..f04661824939 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -50,6 +50,8 @@
* [__nosave_begin, __nosave_end] for the nosave data
*/

+#include <asm-generic/codetag.lds.h>
+
#ifndef LOAD_OFFSET
#define LOAD_OFFSET 0
#endif
@@ -367,6 +369,7 @@
. = ALIGN(8); \
BOUNDED_SECTION_BY(__dyndbg_classes, ___dyndbg_classes) \
BOUNDED_SECTION_BY(__dyndbg, ___dyndbg) \
+ CODETAG_SECTIONS() \
LIKELY_PROFILE() \
BRANCH_PROFILE() \
TRACE_PRINTKS() \
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
new file mode 100644
index 000000000000..cf55a149fa84
--- /dev/null
+++ b/include/linux/alloc_tag.h
@@ -0,0 +1,133 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * allocation tagging
+ */
+#ifndef _LINUX_ALLOC_TAG_H
+#define _LINUX_ALLOC_TAG_H
+
+#include <linux/bug.h>
+#include <linux/codetag.h>
+#include <linux/container_of.h>
+#include <linux/preempt.h>
+#include <asm/percpu.h>
+#include <linux/cpumask.h>
+#include <linux/static_key.h>
+
+struct alloc_tag_counters {
+ u64 bytes;
+ u64 calls;
+};
+
+/*
+ * An instance of this structure is created in a special ELF section at every
+ * allocation callsite. At runtime, the special section is treated as
+ * an array of these. Embedded codetag utilizes codetag framework.
+ */
+struct alloc_tag {
+ struct codetag ct;
+ struct alloc_tag_counters __percpu *counters;
+} __aligned(8);
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+
+static inline struct alloc_tag *ct_to_alloc_tag(struct codetag *ct)
+{
+ return container_of(ct, struct alloc_tag, ct);
+}
+
+#ifdef ARCH_NEEDS_WEAK_PER_CPU
+/*
+ * When percpu variables are required to be defined as weak, static percpu
+ * variables can't be used inside a function (see comments for DECLARE_PER_CPU_SECTION).
+ */
+#error "Memory allocation profiling is incompatible with ARCH_NEEDS_WEAK_PER_CPU"
+#endif
+
+#define DEFINE_ALLOC_TAG(_alloc_tag, _old) \
+ static DEFINE_PER_CPU(struct alloc_tag_counters, _alloc_tag_cntr); \
+ static struct alloc_tag _alloc_tag __used __aligned(8) \
+ __section("alloc_tags") = { \
+ .ct = CODE_TAG_INIT, \
+ .counters = &_alloc_tag_cntr }; \
+ struct alloc_tag * __maybe_unused _old = alloc_tag_save(&_alloc_tag)
+
+DECLARE_STATIC_KEY_MAYBE(CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
+ mem_alloc_profiling_key);
+
+static inline bool mem_alloc_profiling_enabled(void)
+{
+ return static_branch_maybe(CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
+ &mem_alloc_profiling_key);
+}
+
+static inline struct alloc_tag_counters alloc_tag_read(struct alloc_tag *tag)
+{
+ struct alloc_tag_counters v = { 0, 0 };
+ struct alloc_tag_counters *counter;
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ counter = per_cpu_ptr(tag->counters, cpu);
+ v.bytes += counter->bytes;
+ v.calls += counter->calls;
+ }
+
+ return v;
+}
+
+static inline void __alloc_tag_sub(union codetag_ref *ref, size_t bytes)
+{
+ struct alloc_tag *tag;
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+ WARN_ONCE(ref && !ref->ct, "alloc_tag was not set\n");
+#endif
+ if (!ref || !ref->ct)
+ return;
+
+ tag = ct_to_alloc_tag(ref->ct);
+
+ this_cpu_sub(tag->counters->bytes, bytes);
+ this_cpu_dec(tag->counters->calls);
+
+ ref->ct = NULL;
+}
+
+static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes)
+{
+ __alloc_tag_sub(ref, bytes);
+}
+
+static inline void alloc_tag_sub_noalloc(union codetag_ref *ref, size_t bytes)
+{
+ __alloc_tag_sub(ref, bytes);
+}
+
+static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag, size_t bytes)
+{
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+ WARN_ONCE(ref && ref->ct,
+ "alloc_tag was not cleared (got tag for %s:%u)\n",\
+ ref->ct->filename, ref->ct->lineno);
+
+ WARN_ONCE(!tag, "current->alloc_tag not set");
+#endif
+ if (!ref || !tag)
+ return;
+
+ ref->ct = &tag->ct;
+ this_cpu_add(tag->counters->bytes, bytes);
+ this_cpu_inc(tag->counters->calls);
+}
+
+#else
+
+#define DEFINE_ALLOC_TAG(_alloc_tag, _old)
+static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes) {}
+static inline void alloc_tag_sub_noalloc(union codetag_ref *ref, size_t bytes) {}
+static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
+ size_t bytes) {}
+
+#endif
+
+#endif /* _LINUX_ALLOC_TAG_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 12a2554a3164..d35901d8de06 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -767,6 +767,10 @@ struct task_struct {
unsigned int flags;
unsigned int ptrace;

+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ struct alloc_tag *alloc_tag;
+#endif
+
#ifdef CONFIG_SMP
int on_cpu;
struct __call_single_node wake_entry;
@@ -806,6 +810,7 @@ struct task_struct {
struct task_group *sched_task_group;
#endif

+
#ifdef CONFIG_UCLAMP_TASK
/*
* Clamp values requested for a scheduling entity.
@@ -2457,4 +2462,23 @@ static inline int sched_core_idle_cpu(int cpu) { return idle_cpu(cpu); }

extern void sched_set_stop_task(int cpu, struct task_struct *stop);

+#ifdef CONFIG_MEM_ALLOC_PROFILING
+static inline struct alloc_tag *alloc_tag_save(struct alloc_tag *tag)
+{
+ swap(current->alloc_tag, tag);
+ return tag;
+}
+
+static inline void alloc_tag_restore(struct alloc_tag *tag, struct alloc_tag *old)
+{
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+ WARN(current->alloc_tag != tag, "current->alloc_tag was changed:\n");
+#endif
+ current->alloc_tag = old;
+}
+#else
+static inline struct alloc_tag *alloc_tag_save(struct alloc_tag *tag) { return NULL; }
+#define alloc_tag_restore(_tag, _old)
+#endif
+
#endif
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 2acbef24e93e..475a14e70566 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -966,6 +966,31 @@ config CODE_TAGGING
bool
select KALLSYMS

+config MEM_ALLOC_PROFILING
+ bool "Enable memory allocation profiling"
+ default n
+ depends on PROC_FS
+ depends on !DEBUG_FORCE_WEAK_PER_CPU
+ select CODE_TAGGING
+ help
+ Track allocation source code and record total allocation size
+ initiated at that code location. The mechanism can be used to track
+ memory leaks with a low performance and memory impact.
+
+config MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
+ bool "Enable memory allocation profiling by default"
+ default y
+ depends on MEM_ALLOC_PROFILING
+
+config MEM_ALLOC_PROFILING_DEBUG
+ bool "Memory allocation profiler debugging"
+ default n
+ depends on MEM_ALLOC_PROFILING
+ select MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
+ help
+ Adds warnings with helpful error messages for memory allocation
+ profiling.
+
source "lib/Kconfig.kasan"
source "lib/Kconfig.kfence"
source "lib/Kconfig.kmsan"
diff --git a/lib/Makefile b/lib/Makefile
index b50212b5b999..4525469637fe 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -234,6 +234,8 @@ obj-$(CONFIG_OF_RECONFIG_NOTIFIER_ERROR_INJECT) += \
obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o

obj-$(CONFIG_CODE_TAGGING) += codetag.o
+obj-$(CONFIG_MEM_ALLOC_PROFILING) += alloc_tag.o
+
lib-$(CONFIG_GENERIC_BUG) += bug.o

obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
new file mode 100644
index 000000000000..4fc031f9cefd
--- /dev/null
+++ b/lib/alloc_tag.c
@@ -0,0 +1,158 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/alloc_tag.h>
+#include <linux/fs.h>
+#include <linux/gfp.h>
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_buf.h>
+#include <linux/seq_file.h>
+
+static struct codetag_type *alloc_tag_cttype;
+
+DEFINE_STATIC_KEY_MAYBE(CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
+ mem_alloc_profiling_key);
+
+static void *allocinfo_start(struct seq_file *m, loff_t *pos)
+{
+ struct codetag_iterator *iter;
+ struct codetag *ct;
+ loff_t node = *pos;
+
+ iter = kzalloc(sizeof(*iter), GFP_KERNEL);
+ m->private = iter;
+ if (!iter)
+ return NULL;
+
+ codetag_lock_module_list(alloc_tag_cttype, true);
+ *iter = codetag_get_ct_iter(alloc_tag_cttype);
+ while ((ct = codetag_next_ct(iter)) != NULL && node)
+ node--;
+
+ return ct ? iter : NULL;
+}
+
+static void *allocinfo_next(struct seq_file *m, void *arg, loff_t *pos)
+{
+ struct codetag_iterator *iter = (struct codetag_iterator *)arg;
+ struct codetag *ct = codetag_next_ct(iter);
+
+ (*pos)++;
+ if (!ct)
+ return NULL;
+
+ return iter;
+}
+
+static void allocinfo_stop(struct seq_file *m, void *arg)
+{
+ struct codetag_iterator *iter = (struct codetag_iterator *)m->private;
+
+ if (iter) {
+ codetag_lock_module_list(alloc_tag_cttype, false);
+ kfree(iter);
+ }
+}
+
+static void alloc_tag_to_text(struct seq_buf *out, struct codetag *ct)
+{
+ struct alloc_tag *tag = ct_to_alloc_tag(ct);
+ struct alloc_tag_counters counter = alloc_tag_read(tag);
+ s64 bytes = counter.bytes;
+ char val[10], *p = val;
+
+ if (bytes < 0) {
+ *p++ = '-';
+ bytes = -bytes;
+ }
+
+ string_get_size(bytes, 1,
+ STRING_SIZE_BASE2|STRING_SIZE_NOSPACE,
+ p, val + ARRAY_SIZE(val) - p);
+
+ seq_buf_printf(out, "%8s %8llu ", val, counter.calls);
+ codetag_to_text(out, ct);
+ seq_buf_putc(out, ' ');
+ seq_buf_putc(out, '\n');
+}
+
+static int allocinfo_show(struct seq_file *m, void *arg)
+{
+ struct codetag_iterator *iter = (struct codetag_iterator *)arg;
+ char *bufp;
+ size_t n = seq_get_buf(m, &bufp);
+ struct seq_buf buf;
+
+ seq_buf_init(&buf, bufp, n);
+ alloc_tag_to_text(&buf, iter->ct);
+ seq_commit(m, seq_buf_used(&buf));
+ return 0;
+}
+
+static const struct seq_operations allocinfo_seq_op = {
+ .start = allocinfo_start,
+ .next = allocinfo_next,
+ .stop = allocinfo_stop,
+ .show = allocinfo_show,
+};
+
+static void __init procfs_init(void)
+{
+ proc_create_seq("allocinfo", 0444, NULL, &allocinfo_seq_op);
+}
+
+static bool alloc_tag_module_unload(struct codetag_type *cttype,
+ struct codetag_module *cmod)
+{
+ struct codetag_iterator iter = codetag_get_ct_iter(cttype);
+ struct alloc_tag_counters counter;
+ bool module_unused = true;
+ struct alloc_tag *tag;
+ struct codetag *ct;
+
+ for (ct = codetag_next_ct(&iter); ct; ct = codetag_next_ct(&iter)) {
+ if (iter.cmod != cmod)
+ continue;
+
+ tag = ct_to_alloc_tag(ct);
+ counter = alloc_tag_read(tag);
+
+ if (WARN(counter.bytes, "%s:%u module %s func:%s has %llu allocated at module unload",
+ ct->filename, ct->lineno, ct->modname, ct->function, counter.bytes))
+ module_unused = false;
+ }
+
+ return module_unused;
+}
+
+static struct ctl_table memory_allocation_profiling_sysctls[] = {
+ {
+ .procname = "mem_profiling",
+ .data = &mem_alloc_profiling_key,
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+ .mode = 0444,
+#else
+ .mode = 0644,
+#endif
+ .proc_handler = proc_do_static_key,
+ },
+ { }
+};
+
+static int __init alloc_tag_init(void)
+{
+ const struct codetag_type_desc desc = {
+ .section = "alloc_tags",
+ .tag_size = sizeof(struct alloc_tag),
+ .module_unload = alloc_tag_module_unload,
+ };
+
+ alloc_tag_cttype = codetag_register_type(&desc);
+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
+ return PTR_ERR(alloc_tag_cttype);
+
+ register_sysctl_init("vm", memory_allocation_profiling_sysctls);
+ procfs_init();
+
+ return 0;
+}
+module_init(alloc_tag_init);
diff --git a/scripts/module.lds.S b/scripts/module.lds.S
index bf5bcf2836d8..45c67a0994f3 100644
--- a/scripts/module.lds.S
+++ b/scripts/module.lds.S
@@ -9,6 +9,8 @@
#define DISCARD_EH_FRAME *(.eh_frame)
#endif

+#include <asm-generic/codetag.lds.h>
+
SECTIONS {
/DISCARD/ : {
*(.discard)
@@ -47,12 +49,17 @@ SECTIONS {
.data : {
*(.data .data.[0-9a-zA-Z_]*)
*(.data..L*)
+ CODETAG_SECTIONS()
}

.rodata : {
*(.rodata .rodata.[0-9a-zA-Z_]*)
*(.rodata..L*)
}
+#else
+ .data : {
+ CODETAG_SECTIONS()
+ }
#endif
}

--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:49:26

by Suren Baghdasaryan

Subject: [PATCH v2 17/39] change alloc_pages name in dma_map_ops to avoid name conflicts

After redefining alloc_pages, all uses of that name are being replaced.
Change the conflicting names to prevent the preprocessor from replacing
them where that is not intended.
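
The hazard, in miniature: a function-like macro expands wherever its
name is followed by an opening parenthesis, which includes indirect
calls through an ops table. With a later patch in the series
effectively doing

  #define alloc_pages(...) alloc_hooks(alloc_pages_noprof(__VA_ARGS__))

a call through the old member name, such as

  page = ops->alloc_pages(dev, size, dma_handle, dir, gfp);

would be rewritten by the preprocessor into the nonsensical

  page = ops->alloc_hooks(alloc_pages_noprof(dev, size, dma_handle, dir, gfp));

hence the rename of the struct member.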

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
arch/x86/kernel/amd_gart_64.c | 2 +-
drivers/iommu/dma-iommu.c | 2 +-
drivers/xen/grant-dma-ops.c | 2 +-
drivers/xen/swiotlb-xen.c | 2 +-
include/linux/dma-map-ops.h | 2 +-
kernel/dma/mapping.c | 4 ++--
6 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
index 56a917df410d..842a0ec5eaa9 100644
--- a/arch/x86/kernel/amd_gart_64.c
+++ b/arch/x86/kernel/amd_gart_64.c
@@ -676,7 +676,7 @@ static const struct dma_map_ops gart_dma_ops = {
.get_sgtable = dma_common_get_sgtable,
.dma_supported = dma_direct_supported,
.get_required_mask = dma_direct_get_required_mask,
- .alloc_pages = dma_direct_alloc_pages,
+ .alloc_pages_op = dma_direct_alloc_pages,
.free_pages = dma_direct_free_pages,
};

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4b1a88f514c9..28b7b2d10655 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1603,7 +1603,7 @@ static const struct dma_map_ops iommu_dma_ops = {
.flags = DMA_F_PCI_P2PDMA_SUPPORTED,
.alloc = iommu_dma_alloc,
.free = iommu_dma_free,
- .alloc_pages = dma_common_alloc_pages,
+ .alloc_pages_op = dma_common_alloc_pages,
.free_pages = dma_common_free_pages,
.alloc_noncontiguous = iommu_dma_alloc_noncontiguous,
.free_noncontiguous = iommu_dma_free_noncontiguous,
diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
index 76f6f26265a3..29257d2639db 100644
--- a/drivers/xen/grant-dma-ops.c
+++ b/drivers/xen/grant-dma-ops.c
@@ -282,7 +282,7 @@ static int xen_grant_dma_supported(struct device *dev, u64 mask)
static const struct dma_map_ops xen_grant_dma_ops = {
.alloc = xen_grant_dma_alloc,
.free = xen_grant_dma_free,
- .alloc_pages = xen_grant_dma_alloc_pages,
+ .alloc_pages_op = xen_grant_dma_alloc_pages,
.free_pages = xen_grant_dma_free_pages,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 946bd56f0ac5..4f1e3f1fc44e 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -403,6 +403,6 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
.dma_supported = xen_swiotlb_dma_supported,
.mmap = dma_common_mmap,
.get_sgtable = dma_common_get_sgtable,
- .alloc_pages = dma_common_alloc_pages,
+ .alloc_pages_op = dma_common_alloc_pages,
.free_pages = dma_common_free_pages,
};
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index f2fc203fb8a1..3a8a015fdd2e 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -28,7 +28,7 @@ struct dma_map_ops {
unsigned long attrs);
void (*free)(struct device *dev, size_t size, void *vaddr,
dma_addr_t dma_handle, unsigned long attrs);
- struct page *(*alloc_pages)(struct device *dev, size_t size,
+ struct page *(*alloc_pages_op)(struct device *dev, size_t size,
dma_addr_t *dma_handle, enum dma_data_direction dir,
gfp_t gfp);
void (*free_pages)(struct device *dev, size_t size, struct page *vaddr,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index e323ca48f7f2..58e490e2cfb4 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -570,9 +570,9 @@ static struct page *__dma_alloc_pages(struct device *dev, size_t size,
size = PAGE_ALIGN(size);
if (dma_alloc_direct(dev, ops))
return dma_direct_alloc_pages(dev, size, dma_handle, dir, gfp);
- if (!ops->alloc_pages)
+ if (!ops->alloc_pages_op)
return NULL;
- return ops->alloc_pages(dev, size, dma_handle, dir, gfp);
+ return ops->alloc_pages_op(dev, size, dma_handle, dir, gfp);
}

struct page *dma_alloc_pages(struct device *dev, size_t size,
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:49:39

by Suren Baghdasaryan

Subject: [PATCH v2 18/39] change alloc_pages name in ivpu_bo_ops to avoid conflicts

From: Kent Overstreet <[email protected]>

After redefining alloc_pages, all uses of that name are being replaced.
Change the conflicting names to prevent the preprocessor from replacing
them where that is not intended.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
drivers/accel/ivpu/ivpu_gem.c | 8 ++++----
drivers/accel/ivpu/ivpu_gem.h | 2 +-
2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/accel/ivpu/ivpu_gem.c b/drivers/accel/ivpu/ivpu_gem.c
index d09f13b35902..d324eaf5bbe3 100644
--- a/drivers/accel/ivpu/ivpu_gem.c
+++ b/drivers/accel/ivpu/ivpu_gem.c
@@ -61,7 +61,7 @@ static void prime_unmap_pages_locked(struct ivpu_bo *bo)
static const struct ivpu_bo_ops prime_ops = {
.type = IVPU_BO_TYPE_PRIME,
.name = "prime",
- .alloc_pages = prime_alloc_pages_locked,
+ .alloc_pages_op = prime_alloc_pages_locked,
.free_pages = prime_free_pages_locked,
.map_pages = prime_map_pages_locked,
.unmap_pages = prime_unmap_pages_locked,
@@ -134,7 +134,7 @@ static void ivpu_bo_unmap_pages_locked(struct ivpu_bo *bo)
static const struct ivpu_bo_ops shmem_ops = {
.type = IVPU_BO_TYPE_SHMEM,
.name = "shmem",
- .alloc_pages = shmem_alloc_pages_locked,
+ .alloc_pages_op = shmem_alloc_pages_locked,
.free_pages = shmem_free_pages_locked,
.map_pages = ivpu_bo_map_pages_locked,
.unmap_pages = ivpu_bo_unmap_pages_locked,
@@ -186,7 +186,7 @@ static void internal_free_pages_locked(struct ivpu_bo *bo)
static const struct ivpu_bo_ops internal_ops = {
.type = IVPU_BO_TYPE_INTERNAL,
.name = "internal",
- .alloc_pages = internal_alloc_pages_locked,
+ .alloc_pages_op = internal_alloc_pages_locked,
.free_pages = internal_free_pages_locked,
.map_pages = ivpu_bo_map_pages_locked,
.unmap_pages = ivpu_bo_unmap_pages_locked,
@@ -200,7 +200,7 @@ static int __must_check ivpu_bo_alloc_and_map_pages_locked(struct ivpu_bo *bo)
lockdep_assert_held(&bo->lock);
drm_WARN_ON(&vdev->drm, bo->sgt);

- ret = bo->ops->alloc_pages(bo);
+ ret = bo->ops->alloc_pages_op(bo);
if (ret) {
ivpu_err(vdev, "Failed to allocate pages for BO: %d", ret);
return ret;
diff --git a/drivers/accel/ivpu/ivpu_gem.h b/drivers/accel/ivpu/ivpu_gem.h
index 6b0ceda5f253..b81cf2af0b2d 100644
--- a/drivers/accel/ivpu/ivpu_gem.h
+++ b/drivers/accel/ivpu/ivpu_gem.h
@@ -42,7 +42,7 @@ enum ivpu_bo_type {
struct ivpu_bo_ops {
enum ivpu_bo_type type;
const char *name;
- int (*alloc_pages)(struct ivpu_bo *bo);
+ int (*alloc_pages_op)(struct ivpu_bo *bo);
void (*free_pages)(struct ivpu_bo *bo);
int (*map_pages)(struct ivpu_bo *bo);
void (*unmap_pages)(struct ivpu_bo *bo);
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:50:08

by Suren Baghdasaryan

Subject: [PATCH v2 20/39] mm: create new codetag references during page splitting

When a high-order page is split into smaller ones, each newly split
page should get its own codetag reference. The original codetag is reused
for these pages, but each new reference is recorded as a 0-byte allocation
because the original codetag already accounts for the whole high-order
page.
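
The resulting bookkeeping, sketched for an order-2 page later split
into four order-0 pages (pseudocode, not code from the patch):

  /* allocation: the head page's reference carries the full size */
  alloc_tag_add(head_ref, tag, 4 * PAGE_SIZE);

  /* split: each of the three tail pages gets a 0-byte reference */
  pgalloc_tag_split(head, 4);    /* alloc_tag_add(tail_ref, tag, 0) x3 */

  /* each order-0 free then subtracts one page */
  pgalloc_tag_sub(page, 0);      /* x4, -4 * PAGE_SIZE in total */

Both the byte and call counters therefore return to zero once all four
pages have been freed.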

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/pgalloc_tag.h | 30 ++++++++++++++++++++++++++++++
mm/huge_memory.c | 2 ++
mm/page_alloc.c | 2 ++
3 files changed, 34 insertions(+)

diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index a060c26eb449..0174aff5e871 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -62,11 +62,41 @@ static inline void pgalloc_tag_sub(struct page *page, unsigned int order)
}
}

+static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
+{
+ int i;
+ struct page_ext *page_ext;
+ union codetag_ref *ref;
+ struct alloc_tag *tag;
+
+ if (!mem_alloc_profiling_enabled())
+ return;
+
+ page_ext = page_ext_get(page);
+ if (unlikely(!page_ext))
+ return;
+
+ ref = codetag_ref_from_page_ext(page_ext);
+ if (!ref->ct)
+ goto out;
+
+ tag = ct_to_alloc_tag(ref->ct);
+ page_ext = page_ext_next(page_ext);
+ for (i = 1; i < nr; i++) {
+ /* New reference with 0 bytes accounted */
+ alloc_tag_add(codetag_ref_from_page_ext(page_ext), tag, 0);
+ page_ext = page_ext_next(page_ext);
+ }
+out:
+ page_ext_put(page_ext);
+}
+
#else /* CONFIG_MEM_ALLOC_PROFILING */

static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
unsigned int order) {}
static inline void pgalloc_tag_sub(struct page *page, unsigned int order) {}
+static inline void pgalloc_tag_split(struct page *page, unsigned int nr) {}

#endif /* CONFIG_MEM_ALLOC_PROFILING */

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 064fbd90822b..392b6907d875 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -37,6 +37,7 @@
#include <linux/page_owner.h>
#include <linux/sched/sysctl.h>
#include <linux/memory-tiers.h>
+#include <linux/pgalloc_tag.h>

#include <asm/tlb.h>
#include <asm/pgalloc.h>
@@ -2545,6 +2546,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
/* Caller disabled irqs, so they are still disabled here */

split_page_owner(head, nr);
+ pgalloc_tag_split(head, nr);

/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 63dc2f8c7901..c4f0cd127e14 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2540,6 +2540,7 @@ void split_page(struct page *page, unsigned int order)
for (i = 1; i < (1 << order); i++)
set_page_refcounted(page + i);
split_page_owner(page, 1 << order);
+ pgalloc_tag_split(page, 1 << order);
split_page_memcg(page, 1 << order);
}
EXPORT_SYMBOL_GPL(split_page);
@@ -4669,6 +4670,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
struct page *last = page + nr;

split_page_owner(page, 1 << order);
+ pgalloc_tag_split(page, 1 << order);
split_page_memcg(page, 1 << order);
while (page < --last)
set_page_refcounted(last);
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:50:17

by Suren Baghdasaryan

Subject: [PATCH v2 19/39] mm: enable page allocation tagging

Redefine the page allocators to record allocation tags upon their
invocation. Instrument post_alloc_hook and free_pages_prepare to modify
the current allocation tag.
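
Concretely, each allocator keeps its implementation under a _noprof
name and the old name becomes a macro, so a callsite such as

  page = __alloc_pages(gfp, order, nid, NULL);

preprocesses, per the alloc_hooks() definition in this patch, into
roughly

  page = ({
      typeof(__alloc_pages_noprof(gfp, order, nid, NULL)) _res;
      DEFINE_ALLOC_TAG(_alloc_tag, _old);

      _res = __alloc_pages_noprof(gfp, order, nid, NULL);
      alloc_tag_restore(&_alloc_tag, _old);
      _res;
  });

which charges the allocation to this callsite's tag and restores the
previous tag on the way out.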

Co-developed-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/alloc_tag.h | 10 ++++
include/linux/gfp.h | 111 +++++++++++++++++++++++---------------
include/linux/pagemap.h | 9 ++--
mm/compaction.c | 7 ++-
mm/filemap.c | 6 +--
mm/mempolicy.c | 42 +++++++--------
mm/page_alloc.c | 60 ++++++++++-----------
7 files changed, 144 insertions(+), 101 deletions(-)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index cf55a149fa84..6fa8a94d8bc1 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -130,4 +130,14 @@ static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,

#endif

+#define alloc_hooks(_do_alloc) \
+({ \
+ typeof(_do_alloc) _res; \
+ DEFINE_ALLOC_TAG(_alloc_tag, _old); \
+ \
+ _res = _do_alloc; \
+ alloc_tag_restore(&_alloc_tag, _old); \
+ _res; \
+})
+
#endif /* _LINUX_ALLOC_TAG_H */
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 665f06675c83..20686fd1f417 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -6,6 +6,8 @@

#include <linux/mmzone.h>
#include <linux/topology.h>
+#include <linux/alloc_tag.h>
+#include <linux/sched.h>

struct vm_area_struct;

@@ -174,42 +176,43 @@ static inline void arch_free_page(struct page *page, int order) { }
static inline void arch_alloc_page(struct page *page, int order) { }
#endif

-struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
+struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
nodemask_t *nodemask);
-struct folio *__folio_alloc(gfp_t gfp, unsigned int order, int preferred_nid,
+#define __alloc_pages(...) alloc_hooks(__alloc_pages_noprof(__VA_ARGS__))
+
+struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
nodemask_t *nodemask);
+#define __folio_alloc(...) alloc_hooks(__folio_alloc_noprof(__VA_ARGS__))

-unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
+unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
nodemask_t *nodemask, int nr_pages,
struct list_head *page_list,
struct page **page_array);
+#define __alloc_pages_bulk(...) alloc_hooks(alloc_pages_bulk_noprof(__VA_ARGS__))

-unsigned long alloc_pages_bulk_array_mempolicy(gfp_t gfp,
+unsigned long alloc_pages_bulk_array_mempolicy_noprof(gfp_t gfp,
unsigned long nr_pages,
struct page **page_array);
+#define alloc_pages_bulk_array_mempolicy(...) alloc_hooks(alloc_pages_bulk_array_mempolicy_noprof(__VA_ARGS__))

/* Bulk allocate order-0 pages */
-static inline unsigned long
-alloc_pages_bulk_list(gfp_t gfp, unsigned long nr_pages, struct list_head *list)
-{
- return __alloc_pages_bulk(gfp, numa_mem_id(), NULL, nr_pages, list, NULL);
-}
+#define alloc_pages_bulk_list(_gfp, _nr_pages, _list) \
+ __alloc_pages_bulk(_gfp, numa_mem_id(), NULL, _nr_pages, _list, NULL)

-static inline unsigned long
-alloc_pages_bulk_array(gfp_t gfp, unsigned long nr_pages, struct page **page_array)
-{
- return __alloc_pages_bulk(gfp, numa_mem_id(), NULL, nr_pages, NULL, page_array);
-}
+#define alloc_pages_bulk_array(_gfp, _nr_pages, _page_array) \
+ __alloc_pages_bulk(_gfp, numa_mem_id(), NULL, _nr_pages, NULL, _page_array)

static inline unsigned long
-alloc_pages_bulk_array_node(gfp_t gfp, int nid, unsigned long nr_pages, struct page **page_array)
+alloc_pages_bulk_array_node_noprof(gfp_t gfp, int nid, unsigned long nr_pages, struct page **page_array)
{
if (nid == NUMA_NO_NODE)
nid = numa_mem_id();

- return __alloc_pages_bulk(gfp, nid, NULL, nr_pages, NULL, page_array);
+ return alloc_pages_bulk_noprof(gfp, nid, NULL, nr_pages, NULL, page_array);
}

+#define alloc_pages_bulk_array_node(...) alloc_hooks(alloc_pages_bulk_array_node_noprof(__VA_ARGS__))
+
static inline void warn_if_node_offline(int this_node, gfp_t gfp_mask)
{
gfp_t warn_gfp = gfp_mask & (__GFP_THISNODE|__GFP_NOWARN);
@@ -229,21 +232,23 @@ static inline void warn_if_node_offline(int this_node, gfp_t gfp_mask)
* online. For more general interface, see alloc_pages_node().
*/
static inline struct page *
-__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
+__alloc_pages_node_noprof(int nid, gfp_t gfp_mask, unsigned int order)
{
VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
warn_if_node_offline(nid, gfp_mask);

- return __alloc_pages(gfp_mask, order, nid, NULL);
+ return __alloc_pages_noprof(gfp_mask, order, nid, NULL);
}

+#define __alloc_pages_node(...) alloc_hooks(__alloc_pages_node_noprof(__VA_ARGS__))
+
static inline
struct folio *__folio_alloc_node(gfp_t gfp, unsigned int order, int nid)
{
VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
warn_if_node_offline(nid, gfp);

- return __folio_alloc(gfp, order, nid, NULL);
+ return __folio_alloc_noprof(gfp, order, nid, NULL);
}

/*
@@ -251,53 +256,69 @@ struct folio *__folio_alloc_node(gfp_t gfp, unsigned int order, int nid)
* prefer the current CPU's closest node. Otherwise node must be valid and
* online.
*/
-static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
- unsigned int order)
+static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
+ unsigned int order)
{
if (nid == NUMA_NO_NODE)
nid = numa_mem_id();

- return __alloc_pages_node(nid, gfp_mask, order);
+ return __alloc_pages_node_noprof(nid, gfp_mask, order);
}

+#define alloc_pages_node(...) alloc_hooks(alloc_pages_node_noprof(__VA_ARGS__))
+
#ifdef CONFIG_NUMA
-struct page *alloc_pages(gfp_t gfp, unsigned int order);
-struct folio *folio_alloc(gfp_t gfp, unsigned order);
-struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
+struct page *alloc_pages_noprof(gfp_t gfp, unsigned int order);
+struct folio *folio_alloc_noprof(gfp_t gfp, unsigned int order);
+struct folio *vma_alloc_folio_noprof(gfp_t gfp, int order, struct vm_area_struct *vma,
unsigned long addr, bool hugepage);
#else
-static inline struct page *alloc_pages(gfp_t gfp_mask, unsigned int order)
+static inline struct page *alloc_pages_noprof(gfp_t gfp_mask, unsigned int order)
{
- return alloc_pages_node(numa_node_id(), gfp_mask, order);
+ return alloc_pages_node_noprof(numa_node_id(), gfp_mask, order);
}
-static inline struct folio *folio_alloc(gfp_t gfp, unsigned int order)
+static inline struct folio *folio_alloc_noprof(gfp_t gfp, unsigned int order)
{
return __folio_alloc_node(gfp, order, numa_node_id());
}
-#define vma_alloc_folio(gfp, order, vma, addr, hugepage) \
- folio_alloc(gfp, order)
+#define vma_alloc_folio_noprof(gfp, order, vma, addr, hugepage) \
+ folio_alloc_noprof(gfp, order)
#endif
+
+#define alloc_pages(...) alloc_hooks(alloc_pages_noprof(__VA_ARGS__))
+#define folio_alloc(...) alloc_hooks(folio_alloc_noprof(__VA_ARGS__))
+#define vma_alloc_folio(...) alloc_hooks(vma_alloc_folio_noprof(__VA_ARGS__))
+
#define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
-static inline struct page *alloc_page_vma(gfp_t gfp,
+
+static inline struct page *alloc_page_vma_noprof(gfp_t gfp,
struct vm_area_struct *vma, unsigned long addr)
{
- struct folio *folio = vma_alloc_folio(gfp, 0, vma, addr, false);
+ struct folio *folio = vma_alloc_folio_noprof(gfp, 0, vma, addr, false);

return &folio->page;
}
+#define alloc_page_vma(...) alloc_hooks(alloc_page_vma_noprof(__VA_ARGS__))
+
+extern unsigned long get_free_pages_noprof(gfp_t gfp_mask, unsigned int order);
+#define __get_free_pages(...) alloc_hooks(get_free_pages_noprof(__VA_ARGS__))

-extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
-extern unsigned long get_zeroed_page(gfp_t gfp_mask);
+extern unsigned long get_zeroed_page_noprof(gfp_t gfp_mask);
+#define get_zeroed_page(...) alloc_hooks(get_zeroed_page_noprof(__VA_ARGS__))
+
+void *alloc_pages_exact_noprof(size_t size, gfp_t gfp_mask) __alloc_size(1);
+#define alloc_pages_exact(...) alloc_hooks(alloc_pages_exact_noprof(__VA_ARGS__))

-void *alloc_pages_exact(size_t size, gfp_t gfp_mask) __alloc_size(1);
void free_pages_exact(void *virt, size_t size);
-__meminit void *alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask) __alloc_size(2);

-#define __get_free_page(gfp_mask) \
- __get_free_pages((gfp_mask), 0)
+__meminit void *alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_mask) __alloc_size(2);
+#define alloc_pages_exact_nid(...) alloc_hooks(alloc_pages_exact_nid_noprof(__VA_ARGS__))
+
+#define __get_free_page(gfp_mask) \
+ __get_free_pages((gfp_mask), 0)

-#define __get_dma_pages(gfp_mask, order) \
- __get_free_pages((gfp_mask) | GFP_DMA, (order))
+#define __get_dma_pages(gfp_mask, order) \
+ __get_free_pages((gfp_mask) | GFP_DMA, (order))

extern void __free_pages(struct page *page, unsigned int order);
extern void free_pages(unsigned long addr, unsigned int order);
@@ -347,10 +368,14 @@ extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);

#ifdef CONFIG_CONTIG_ALLOC
/* The below functions must be run on a range from a single zone. */
-extern int alloc_contig_range(unsigned long start, unsigned long end,
+extern int alloc_contig_range_noprof(unsigned long start, unsigned long end,
unsigned migratetype, gfp_t gfp_mask);
-extern struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
- int nid, nodemask_t *nodemask);
+#define alloc_contig_range(...) alloc_hooks(alloc_contig_range_noprof(__VA_ARGS__))
+
+extern struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
+ int nid, nodemask_t *nodemask);
+#define alloc_contig_pages(...) alloc_hooks(alloc_contig_pages_noprof(__VA_ARGS__))
+
#endif
void free_contig_range(unsigned long pfn, unsigned long nr_pages);

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 351c3b7f93a1..cb387321f929 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -508,14 +508,17 @@ static inline void *detach_page_private(struct page *page)
#endif

#ifdef CONFIG_NUMA
-struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order);
+struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
#else
-static inline struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order)
+static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
{
- return folio_alloc(gfp, order);
+ return folio_alloc_noprof(gfp, order);
}
#endif

+#define filemap_alloc_folio(...) \
+ alloc_hooks(filemap_alloc_folio_noprof(__VA_ARGS__))
+
static inline struct page *__page_cache_alloc(gfp_t gfp)
{
return &filemap_alloc_folio(gfp, 0)->page;
diff --git a/mm/compaction.c b/mm/compaction.c
index 38c8d216c6a3..74a7ec71660d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1760,7 +1760,7 @@ static void isolate_freepages(struct compact_control *cc)
* This is a migrate-callback that "allocates" freepages by taking pages
* from the isolated freelists in the block we are migrating to.
*/
-static struct folio *compaction_alloc(struct folio *src, unsigned long data)
+static struct folio *compaction_alloc_noprof(struct folio *src, unsigned long data)
{
struct compact_control *cc = (struct compact_control *)data;
struct folio *dst;
@@ -1779,6 +1779,11 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
return dst;
}

+static struct folio *compaction_alloc(struct folio *src, unsigned long data)
+{
+ return alloc_hooks(compaction_alloc_noprof(src, data));
+}
+
/*
* This is a migrate-callback that "frees" freepages back to the isolated
* freelist. All pages on the freelist are from the same zone, so there is no
diff --git a/mm/filemap.c b/mm/filemap.c
index f0a15ce1bd1b..fc0e4b630b51 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -958,7 +958,7 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
EXPORT_SYMBOL_GPL(filemap_add_folio);

#ifdef CONFIG_NUMA
-struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order)
+struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
{
int n;
struct folio *folio;
@@ -973,9 +973,9 @@ struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order)

return folio;
}
- return folio_alloc(gfp, order);
+ return folio_alloc_noprof(gfp, order);
}
-EXPORT_SYMBOL(filemap_alloc_folio);
+EXPORT_SYMBOL(filemap_alloc_folio_noprof);
#endif

/*
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f1b00d6ac7ee..0c3d65fbf9e8 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2127,7 +2127,7 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
{
struct page *page;

- page = __alloc_pages(gfp, order, nid, NULL);
+ page = __alloc_pages_noprof(gfp, order, nid, NULL);
/* skip NUMA_INTERLEAVE_HIT counter update if numa stats is disabled */
if (!static_branch_likely(&vm_numa_stat_key))
return page;
@@ -2153,15 +2153,15 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
*/
preferred_gfp = gfp | __GFP_NOWARN;
preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
- page = __alloc_pages(preferred_gfp, order, nid, &pol->nodes);
+ page = __alloc_pages_noprof(preferred_gfp, order, nid, &pol->nodes);
if (!page)
- page = __alloc_pages(gfp, order, nid, NULL);
+ page = __alloc_pages_noprof(gfp, order, nid, NULL);

return page;
}

/**
- * vma_alloc_folio - Allocate a folio for a VMA.
+ * vma_alloc_folio_noprof - Allocate a folio for a VMA.
* @gfp: GFP flags.
* @order: Order of the folio.
* @vma: Pointer to VMA or NULL if not available.
@@ -2175,7 +2175,7 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
*
* Return: The folio on success or NULL if allocation fails.
*/
-struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
+struct folio *vma_alloc_folio_noprof(gfp_t gfp, int order, struct vm_area_struct *vma,
unsigned long addr, bool hugepage)
{
struct mempolicy *pol;
@@ -2246,7 +2246,7 @@ struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
* memory with both reclaim and compact as well.
*/
if (!folio && (gfp & __GFP_DIRECT_RECLAIM))
- folio = __folio_alloc(gfp, order, hpage_node,
+ folio = __folio_alloc_noprof(gfp, order, hpage_node,
nmask);

goto out;
@@ -2255,15 +2255,15 @@ struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,

nmask = policy_nodemask(gfp, pol);
preferred_nid = policy_node(gfp, pol, node);
- folio = __folio_alloc(gfp, order, preferred_nid, nmask);
+ folio = __folio_alloc_noprof(gfp, order, preferred_nid, nmask);
mpol_cond_put(pol);
out:
return folio;
}
-EXPORT_SYMBOL(vma_alloc_folio);
+EXPORT_SYMBOL(vma_alloc_folio_noprof);

/**
- * alloc_pages - Allocate pages.
+ * alloc_pages_noprof - Allocate pages.
* @gfp: GFP flags.
* @order: Power of two of number of pages to allocate.
*
@@ -2276,7 +2276,7 @@ EXPORT_SYMBOL(vma_alloc_folio);
* flags are used.
* Return: The page on success or NULL if allocation fails.
*/
-struct page *alloc_pages(gfp_t gfp, unsigned order)
+struct page *alloc_pages_noprof(gfp_t gfp, unsigned int order)
{
struct mempolicy *pol = &default_policy;
struct page *page;
@@ -2294,24 +2294,24 @@ struct page *alloc_pages(gfp_t gfp, unsigned order)
page = alloc_pages_preferred_many(gfp, order,
policy_node(gfp, pol, numa_node_id()), pol);
else
- page = __alloc_pages(gfp, order,
+ page = __alloc_pages_noprof(gfp, order,
policy_node(gfp, pol, numa_node_id()),
policy_nodemask(gfp, pol));

return page;
}
-EXPORT_SYMBOL(alloc_pages);
+EXPORT_SYMBOL(alloc_pages_noprof);

-struct folio *folio_alloc(gfp_t gfp, unsigned order)
+struct folio *folio_alloc_noprof(gfp_t gfp, unsigned int order)
{
- struct page *page = alloc_pages(gfp | __GFP_COMP, order);
+ struct page *page = alloc_pages_noprof(gfp | __GFP_COMP, order);
struct folio *folio = (struct folio *)page;

if (folio && order > 1)
folio_prep_large_rmappable(folio);
return folio;
}
-EXPORT_SYMBOL(folio_alloc);
+EXPORT_SYMBOL(folio_alloc_noprof);

static unsigned long alloc_pages_bulk_array_interleave(gfp_t gfp,
struct mempolicy *pol, unsigned long nr_pages,
@@ -2330,13 +2330,13 @@ static unsigned long alloc_pages_bulk_array_interleave(gfp_t gfp,

for (i = 0; i < nodes; i++) {
if (delta) {
- nr_allocated = __alloc_pages_bulk(gfp,
+ nr_allocated = alloc_pages_bulk_noprof(gfp,
interleave_nodes(pol), NULL,
nr_pages_per_node + 1, NULL,
page_array);
delta--;
} else {
- nr_allocated = __alloc_pages_bulk(gfp,
+ nr_allocated = alloc_pages_bulk_noprof(gfp,
interleave_nodes(pol), NULL,
nr_pages_per_node, NULL, page_array);
}
@@ -2358,11 +2358,11 @@ static unsigned long alloc_pages_bulk_array_preferred_many(gfp_t gfp, int nid,
preferred_gfp = gfp | __GFP_NOWARN;
preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);

- nr_allocated = __alloc_pages_bulk(preferred_gfp, nid, &pol->nodes,
+ nr_allocated = alloc_pages_bulk_noprof(preferred_gfp, nid, &pol->nodes,
nr_pages, NULL, page_array);

if (nr_allocated < nr_pages)
- nr_allocated += __alloc_pages_bulk(gfp, numa_node_id(), NULL,
+ nr_allocated += alloc_pages_bulk_noprof(gfp, numa_node_id(), NULL,
nr_pages - nr_allocated, NULL,
page_array + nr_allocated);
return nr_allocated;
@@ -2374,7 +2374,7 @@ static unsigned long alloc_pages_bulk_array_preferred_many(gfp_t gfp, int nid,
* It can accelerate memory allocation especially interleaving
* allocate memory.
*/
-unsigned long alloc_pages_bulk_array_mempolicy(gfp_t gfp,
+unsigned long alloc_pages_bulk_array_mempolicy_noprof(gfp_t gfp,
unsigned long nr_pages, struct page **page_array)
{
struct mempolicy *pol = &default_policy;
@@ -2390,7 +2390,7 @@ unsigned long alloc_pages_bulk_array_mempolicy(gfp_t gfp,
return alloc_pages_bulk_array_preferred_many(gfp,
numa_node_id(), pol, nr_pages, page_array);

- return __alloc_pages_bulk(gfp, policy_node(gfp, pol, numa_node_id()),
+ return alloc_pages_bulk_noprof(gfp, policy_node(gfp, pol, numa_node_id()),
policy_nodemask(gfp, pol), nr_pages, NULL,
page_array);
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d490d0f73e72..63dc2f8c7901 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4239,7 +4239,7 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
*
* Returns the number of pages on the list or array.
*/
-unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
+unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid,
nodemask_t *nodemask, int nr_pages,
struct list_head *page_list,
struct page **page_array)
@@ -4375,7 +4375,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
pcp_trylock_finish(UP_flags);

failed:
- page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
+ page = __alloc_pages_noprof(gfp, 0, preferred_nid, nodemask);
if (page) {
if (page_list)
list_add(&page->lru, page_list);
@@ -4386,13 +4386,13 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,

goto out;
}
-EXPORT_SYMBOL_GPL(__alloc_pages_bulk);
+EXPORT_SYMBOL_GPL(alloc_pages_bulk_noprof);

/*
* This is the 'heart' of the zoned buddy allocator.
*/
-struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
- nodemask_t *nodemask)
+struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
+ int preferred_nid, nodemask_t *nodemask)
{
struct page *page;
unsigned int alloc_flags = ALLOC_WMARK_LOW;
@@ -4454,12 +4454,12 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,

return page;
}
-EXPORT_SYMBOL(__alloc_pages);
+EXPORT_SYMBOL(__alloc_pages_noprof);

-struct folio *__folio_alloc(gfp_t gfp, unsigned int order, int preferred_nid,
+struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int preferred_nid,
nodemask_t *nodemask)
{
- struct page *page = __alloc_pages(gfp | __GFP_COMP, order,
+ struct page *page = __alloc_pages_noprof(gfp | __GFP_COMP, order,
preferred_nid, nodemask);
struct folio *folio = (struct folio *)page;

@@ -4467,29 +4467,29 @@ struct folio *__folio_alloc(gfp_t gfp, unsigned int order, int preferred_nid,
folio_prep_large_rmappable(folio);
return folio;
}
-EXPORT_SYMBOL(__folio_alloc);
+EXPORT_SYMBOL(__folio_alloc_noprof);

/*
* Common helper functions. Never use with __GFP_HIGHMEM because the returned
* address cannot represent highmem pages. Use alloc_pages and then kmap if
* you need to access high mem.
*/
-unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)
+unsigned long get_free_pages_noprof(gfp_t gfp_mask, unsigned int order)
{
struct page *page;

- page = alloc_pages(gfp_mask & ~__GFP_HIGHMEM, order);
+ page = alloc_pages_noprof(gfp_mask & ~__GFP_HIGHMEM, order);
if (!page)
return 0;
return (unsigned long) page_address(page);
}
-EXPORT_SYMBOL(__get_free_pages);
+EXPORT_SYMBOL(get_free_pages_noprof);

-unsigned long get_zeroed_page(gfp_t gfp_mask)
+unsigned long get_zeroed_page_noprof(gfp_t gfp_mask)
{
- return __get_free_page(gfp_mask | __GFP_ZERO);
+ return get_free_pages_noprof(gfp_mask | __GFP_ZERO, 0);
}
-EXPORT_SYMBOL(get_zeroed_page);
+EXPORT_SYMBOL(get_zeroed_page_noprof);

/**
* __free_pages - Free pages allocated with alloc_pages().
@@ -4681,7 +4681,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
}

/**
- * alloc_pages_exact - allocate an exact number physically-contiguous pages.
+ * alloc_pages_exact_noprof - allocate an exact number physically-contiguous pages.
* @size: the number of bytes to allocate
* @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
*
@@ -4695,7 +4695,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
*
* Return: pointer to the allocated area or %NULL in case of error.
*/
-void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
+void *alloc_pages_exact_noprof(size_t size, gfp_t gfp_mask)
{
unsigned int order = get_order(size);
unsigned long addr;
@@ -4703,13 +4703,13 @@ void *alloc_pages_exact(size_t size, gfp_t gfp_mask)
if (WARN_ON_ONCE(gfp_mask & (__GFP_COMP | __GFP_HIGHMEM)))
gfp_mask &= ~(__GFP_COMP | __GFP_HIGHMEM);

- addr = __get_free_pages(gfp_mask, order);
+ addr = get_free_pages_noprof(gfp_mask, order);
return make_alloc_exact(addr, order, size);
}
-EXPORT_SYMBOL(alloc_pages_exact);
+EXPORT_SYMBOL(alloc_pages_exact_noprof);

/**
- * alloc_pages_exact_nid - allocate an exact number of physically-contiguous
+ * alloc_pages_exact_nid_noprof - allocate an exact number of physically-contiguous
* pages on a node.
* @nid: the preferred node ID where memory should be allocated
* @size: the number of bytes to allocate
@@ -4720,7 +4720,7 @@ EXPORT_SYMBOL(alloc_pages_exact);
*
* Return: pointer to the allocated area or %NULL in case of error.
*/
-void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
+void * __meminit alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_mask)
{
unsigned int order = get_order(size);
struct page *p;
@@ -4728,7 +4728,7 @@ void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
if (WARN_ON_ONCE(gfp_mask & (__GFP_COMP | __GFP_HIGHMEM)))
gfp_mask &= ~(__GFP_COMP | __GFP_HIGHMEM);

- p = alloc_pages_node(nid, gfp_mask, order);
+ p = alloc_pages_node_noprof(nid, gfp_mask, order);
if (!p)
return NULL;
return make_alloc_exact((unsigned long)page_address(p), order, size);
@@ -6090,7 +6090,7 @@ int __alloc_contig_migrate_range(struct compact_control *cc,
}

/**
- * alloc_contig_range() -- tries to allocate given range of pages
+ * alloc_contig_range_noprof() -- tries to allocate given range of pages
* @start: start PFN to allocate
* @end: one-past-the-last PFN to allocate
* @migratetype: migratetype of the underlying pageblocks (either
@@ -6110,7 +6110,7 @@ int __alloc_contig_migrate_range(struct compact_control *cc,
* pages which PFN is in [start, end) are allocated for the caller and
* need to be freed with free_contig_range().
*/
-int alloc_contig_range(unsigned long start, unsigned long end,
+int alloc_contig_range_noprof(unsigned long start, unsigned long end,
unsigned migratetype, gfp_t gfp_mask)
{
unsigned long outer_start, outer_end;
@@ -6234,15 +6234,15 @@ int alloc_contig_range(unsigned long start, unsigned long end,
undo_isolate_page_range(start, end, migratetype);
return ret;
}
-EXPORT_SYMBOL(alloc_contig_range);
+EXPORT_SYMBOL(alloc_contig_range_noprof);

static int __alloc_contig_pages(unsigned long start_pfn,
unsigned long nr_pages, gfp_t gfp_mask)
{
unsigned long end_pfn = start_pfn + nr_pages;

- return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE,
- gfp_mask);
+ return alloc_contig_range_noprof(start_pfn, end_pfn, MIGRATE_MOVABLE,
+ gfp_mask);
}

static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
@@ -6277,7 +6277,7 @@ static bool zone_spans_last_pfn(const struct zone *zone,
}

/**
- * alloc_contig_pages() -- tries to find and allocate contiguous range of pages
+ * alloc_contig_pages_noprof() -- tries to find and allocate contiguous range of pages
* @nr_pages: Number of contiguous pages to allocate
* @gfp_mask: GFP mask to limit search and used during compaction
* @nid: Target node
@@ -6297,8 +6297,8 @@ static bool zone_spans_last_pfn(const struct zone *zone,
*
* Return: pointer to contiguous pages on success, or NULL if not successful.
*/
-struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask,
- int nid, nodemask_t *nodemask)
+struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
+ int nid, nodemask_t *nodemask)
{
unsigned long ret, pfn, flags;
struct zonelist *zonelist;
--
2.42.0.758.gaed0368e0e-goog
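
One detail worth noting from the compaction.c hunk above: alloc_hooks() is a
statement-expression macro, so it cannot itself be passed where a function
pointer is expected. Callback-style allocators therefore keep a real,
addressable wrapper that applies the hook, which is why compaction_alloc()
survives as a thin function around compaction_alloc_noprof(). A small sketch
of the shape (names are illustrative):

#include <stdio.h>
#include <stdlib.h>

/* Stand-in for alloc_hooks(): bookkeeping around the wrapped call. */
#define traced(call)					\
({							\
	void *_p = (call);				\
	fprintf(stderr, "alloc at %s:%d -> %p\n",	\
		__FILE__, __LINE__, _p);		\
	_p;						\
})

static void *buf_alloc_noprof(unsigned long size)
{
	return malloc(size);
}

/* Addressable wrapper: this symbol is what gets registered as a callback. */
static void *buf_alloc(unsigned long size)
{
	return traced(buf_alloc_noprof(size));
}

typedef void *(*alloc_cb_t)(unsigned long);

int main(void)
{
	alloc_cb_t cb = buf_alloc;	/* a macro could not be assigned here */
	free(cb(64));
	return 0;
}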

2023-10-24 13:50:17

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 22/39] lib: add codetag reference into slabobj_ext

To store a code tag for every slab object, a codetag reference is embedded
into slabobj_ext when CONFIG_MEM_ALLOC_PROFILING=y.
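
The per-object cost is one slabobj_ext per slab object, so the struct only
grows when the corresponding options are configured in. A standalone mock of
the layout, with the config macros defined here purely for illustration:

#include <stdio.h>

#define CONFIG_MEMCG_KMEM		1	/* assumed on for this demo */
#define CONFIG_MEM_ALLOC_PROFILING	1	/* assumed on for this demo */

struct obj_cgroup;			/* opaque, pointer-only use */
struct codetag;
union codetag_ref { struct codetag *ct; };

struct slabobj_ext {
#ifdef CONFIG_MEMCG_KMEM
	struct obj_cgroup *objcg;
#endif
#ifdef CONFIG_MEM_ALLOC_PROFILING
	union codetag_ref ref;
#endif
} __attribute__((aligned(8)));

int main(void)
{
	/* 16 bytes per object with both options on (LP64), 8 with one. */
	printf("%zu bytes of extension per slab object\n",
	       sizeof(struct slabobj_ext));
	return 0;
}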

Signed-off-by: Suren Baghdasaryan <[email protected]>
Co-developed-by: Kent Overstreet <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/memcontrol.h | 5 +++++
lib/Kconfig.debug | 1 +
mm/slab.h | 4 ++++
3 files changed, 10 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f3ede28b6fa6..853a24b5f713 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1613,7 +1613,12 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
* if MEMCG_DATA_OBJEXTS is set.
*/
struct slabobj_ext {
+#ifdef CONFIG_MEMCG_KMEM
struct obj_cgroup *objcg;
+#endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ union codetag_ref ref;
+#endif
} __aligned(8);

static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e1eda1450d68..482a6aae7664 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -973,6 +973,7 @@ config MEM_ALLOC_PROFILING
depends on !DEBUG_FORCE_WEAK_PER_CPU
select CODE_TAGGING
select PAGE_EXTENSION
+ select SLAB_OBJ_EXT
help
Track allocation source code and record total allocation size
initiated at that code location. The mechanism can be used to track
diff --git a/mm/slab.h b/mm/slab.h
index 60417fd262ea..293210ed10a9 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -457,6 +457,10 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,

static inline bool need_slab_obj_ext(void)
{
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ if (mem_alloc_profiling_enabled())
+ return true;
+#endif
/*
* CONFIG_MEMCG_KMEM creates vector of obj_cgroup objects conditionally
* inside memcg_slab_post_alloc_hook. No other users for now.
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:50:22

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 21/39] mm/page_ext: enable early_page_ext when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y

For all page allocations to be tagged, page_ext has to be initialized
before the first page allocation. Early tasks allocate their stacks
using the page allocator before alloc_node_page_ext() initializes the
page_ext area, unless early_page_ext is enabled, so those allocations
trigger a warning when CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled.
Enable early_page_ext whenever CONFIG_MEM_ALLOC_PROFILING_DEBUG=y to
ensure page_ext is initialized before any page allocation. This carries
all the downsides associated with early_page_ext, such as a possibly
longer boot time, so we enable it only for debugging with
CONFIG_MEM_ALLOC_PROFILING_DEBUG and not universally for
CONFIG_MEM_ALLOC_PROFILING.

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
mm/page_ext.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/mm/page_ext.c b/mm/page_ext.c
index 3c58fe8a24df..e7d8f1a5589e 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -95,7 +95,16 @@ unsigned long page_ext_size;

static unsigned long total_usage;

+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+/*
+ * To ensure correct allocation tagging for pages, page_ext should be available
+ * before the first page allocation. Otherwise early task stacks will be
+ * allocated before page_ext initialization and missing tags will be flagged.
+ */
+bool early_page_ext __meminitdata = true;
+#else
bool early_page_ext __meminitdata;
+#endif
static int __init setup_early_page_ext(char *str)
{
early_page_ext = true;
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:51:00

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 23/39] mm/slab: add allocation accounting into slab allocation and free paths

Account slab allocations using the codetag reference embedded in slabobj_ext.
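
The hooks pair up around each object's slot in the slab's extension vector:
obj_to_index() turns an object address into a slot, the post-alloc hook adds
the current tag at that slot, and the free hook subtracts the cache's object
size from whatever tag the slot references. A compact userspace model of that
indexing (illustrative names, single-threaded, no kernel API):

#include <assert.h>
#include <stddef.h>

struct alloc_tag { long bytes; long calls; };
union codetag_ref { struct alloc_tag *tag; };
struct slabobj_ext { union codetag_ref ref; };

struct slab {
	char *base;			/* first object */
	size_t obj_size;
	struct slabobj_ext *obj_exts;	/* one entry per object */
};

static size_t obj_to_index(const struct slab *s, const void *obj)
{
	return ((const char *)obj - s->base) / s->obj_size;
}

static void post_alloc_hook(struct slab *s, void *obj, struct alloc_tag *tag)
{
	struct slabobj_ext *ext = &s->obj_exts[obj_to_index(s, obj)];

	ext->ref.tag = tag;
	tag->bytes += s->obj_size;
	tag->calls++;
}

static void free_hook(struct slab *s, void *obj)
{
	struct slabobj_ext *ext = &s->obj_exts[obj_to_index(s, obj)];

	if (ext->ref.tag) {
		ext->ref.tag->bytes -= s->obj_size;
		ext->ref.tag->calls--;
		ext->ref.tag = NULL;
	}
}

int main(void)
{
	struct alloc_tag tag = { 0, 0 };
	char objects[8 * 64];
	struct slabobj_ext exts[8] = { 0 };
	struct slab s = { objects, 64, exts };

	post_alloc_hook(&s, &objects[3 * 64], &tag);	/* "allocate" object 3 */
	assert(tag.bytes == 64 && tag.calls == 1);
	free_hook(&s, &objects[3 * 64]);
	assert(tag.bytes == 0 && tag.calls == 0);
	return 0;
}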

Signed-off-by: Suren Baghdasaryan <[email protected]>
Co-developed-by: Kent Overstreet <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/slab_def.h | 2 +-
include/linux/slub_def.h | 4 ++--
mm/slab.c | 4 +++-
mm/slab.h | 32 ++++++++++++++++++++++++++++++++
4 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index a61e7d55d0d3..23f14dcb8d5b 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -107,7 +107,7 @@ static inline void *nearest_obj(struct kmem_cache *cache, const struct slab *sla
* reciprocal_divide(offset, cache->reciprocal_buffer_size)
*/
static inline unsigned int obj_to_index(const struct kmem_cache *cache,
- const struct slab *slab, void *obj)
+ const struct slab *slab, const void *obj)
{
u32 offset = (obj - slab->s_mem);
return reciprocal_divide(offset, cache->reciprocal_buffer_size);
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index deb90cf4bffb..43fda4a5f23a 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -182,14 +182,14 @@ static inline void *nearest_obj(struct kmem_cache *cache, const struct slab *sla

/* Determine object index from a given position */
static inline unsigned int __obj_to_index(const struct kmem_cache *cache,
- void *addr, void *obj)
+ void *addr, const void *obj)
{
return reciprocal_divide(kasan_reset_tag(obj) - addr,
cache->reciprocal_size);
}

static inline unsigned int obj_to_index(const struct kmem_cache *cache,
- const struct slab *slab, void *obj)
+ const struct slab *slab, const void *obj)
{
if (is_kfence_address(obj))
return 0;
diff --git a/mm/slab.c b/mm/slab.c
index cefcb7499b6c..18923f5f05b5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3348,9 +3348,11 @@ static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
static __always_inline void __cache_free(struct kmem_cache *cachep, void *objp,
unsigned long caller)
{
+ struct slab *slab = virt_to_slab(objp);
bool init;

- memcg_slab_free_hook(cachep, virt_to_slab(objp), &objp, 1);
+ memcg_slab_free_hook(cachep, slab, &objp, 1);
+ alloc_tagging_slab_free_hook(cachep, slab, &objp, 1);

if (is_kfence_address(objp)) {
kmemleak_free_recursive(objp, cachep->flags);
diff --git a/mm/slab.h b/mm/slab.h
index 293210ed10a9..4859ce1f8808 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -533,6 +533,32 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)

#endif /* CONFIG_SLAB_OBJ_EXT */

+#ifdef CONFIG_MEM_ALLOC_PROFILING
+
+static inline void alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab,
+ void **p, int objects)
+{
+ struct slabobj_ext *obj_exts;
+ int i;
+
+ obj_exts = slab_obj_exts(slab);
+ if (!obj_exts)
+ return;
+
+ for (i = 0; i < objects; i++) {
+ unsigned int off = obj_to_index(s, slab, p[i]);
+
+ alloc_tag_sub(&obj_exts[off].ref, s->size);
+ }
+}
+
+#else
+
+static inline void alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab,
+ void **p, int objects) {}
+
+#endif /* CONFIG_MEM_ALLOC_PROFILING */
+
#ifdef CONFIG_MEMCG_KMEM
void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
enum node_stat_item idx, int nr);
@@ -827,6 +853,12 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
s->flags, flags);
kmsan_slab_alloc(s, p[i], flags);
obj_exts = prepare_slab_obj_exts_hook(s, flags, p[i]);
+
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ /* obj_exts can be allocated for other reasons */
+ if (likely(obj_exts) && mem_alloc_profiling_enabled())
+ alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
+#endif
}

memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:51:05

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 25/39] mm/slub: Mark slab_free_freelist_hook() __always_inline

From: Kent Overstreet <[email protected]>

It seems we need to be more forceful with the compiler on this one.
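
Plain `inline` is only a hint, and compilers routinely decline it when their
heuristics say otherwise; __always_inline removes the choice, which matters
on the slab free fast path where an extra call per kfree() is measurable.
Roughly, for GCC/Clang (a sketch of what the kernel's macro expands to):

#include <stdio.h>

#define __always_inline	inline __attribute__((__always_inline__))

static __always_inline int fast_check(int x)
{
	return x & 1;	/* trivial body; the point is removing call overhead */
}

int main(void)
{
	printf("%d\n", fast_check(3));
	return 0;
}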

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
mm/slub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index f5e07d8802e2..222c16cef729 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1800,7 +1800,7 @@ static __always_inline bool slab_free_hook(struct kmem_cache *s,
return kasan_slab_free(s, x, init);
}

-static inline bool slab_free_freelist_hook(struct kmem_cache *s,
+static __always_inline bool slab_free_freelist_hook(struct kmem_cache *s,
void **head, void **tail,
int *cnt)
{
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:51:17

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 24/39] mm/slab: enable slab allocation tagging for kmalloc and friends

Redefine kmalloc, krealloc, kzalloc, kcalloc, etc. to record allocations
and deallocations done by these functions.
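
The reason several small inline helpers (kcalloc, kzalloc_node, kvmalloc, ...)
become macros in this patch is attribution: a macro expands at the caller, so
the static tag created by alloc_hooks() lands in the caller's file and line,
whereas an inline function in a header would pin every allocation to the
header. A short userspace demonstration of the difference:

#include <stdio.h>

/* Inside a function — even an inline one — __FILE__/__LINE__ resolve to
 * where the function is defined, i.e. the header in the kernel case. */
static inline void site_from_function(void)
{
	printf("function body: %s:%d\n", __FILE__, __LINE__);
}

/* A macro expands at the call site, so the same tokens resolve to the
 * caller — which is what per-callsite allocation tags need. */
#define site_from_macro() \
	printf("macro expansion: %s:%d\n", __FILE__, __LINE__)

int main(void)
{
	site_from_function();	/* prints the line inside the function above */
	site_from_macro();	/* prints this line */
	return 0;
}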

Signed-off-by: Suren Baghdasaryan <[email protected]>
Co-developed-by: Kent Overstreet <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/fortify-string.h | 5 +-
include/linux/slab.h | 173 ++++++++++++++++-----------------
include/linux/string.h | 4 +-
mm/slab.c | 18 ++--
mm/slab_common.c | 38 ++++----
mm/slub.c | 19 ++--
mm/util.c | 20 ++--
7 files changed, 137 insertions(+), 140 deletions(-)

diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h
index da51a83b2829..11319e7634a4 100644
--- a/include/linux/fortify-string.h
+++ b/include/linux/fortify-string.h
@@ -752,9 +752,9 @@ __FORTIFY_INLINE void *memchr_inv(const void * const POS0 p, int c, size_t size)
return __real_memchr_inv(p, c, size);
}

-extern void *__real_kmemdup(const void *src, size_t len, gfp_t gfp) __RENAME(kmemdup)
+extern void *__real_kmemdup(const void *src, size_t len, gfp_t gfp) __RENAME(kmemdup_noprof)
__realloc_size(2);
-__FORTIFY_INLINE void *kmemdup(const void * const POS0 p, size_t size, gfp_t gfp)
+__FORTIFY_INLINE void *kmemdup_noprof(const void * const POS0 p, size_t size, gfp_t gfp)
{
const size_t p_size = __struct_size(p);

@@ -764,6 +764,7 @@ __FORTIFY_INLINE void *kmemdup(const void * const POS0 p, size_t size, gfp_t gfp
fortify_panic(__func__);
return __real_kmemdup(p, size, gfp);
}
+#define kmemdup(...) alloc_hooks(kmemdup_noprof(__VA_ARGS__))

/**
* strcpy - Copy a string into another string buffer
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 11ef3d364b2b..0543e0f76c60 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -230,7 +230,9 @@ int kmem_cache_shrink(struct kmem_cache *s);
/*
* Common kmalloc functions provided by all allocators
*/
-void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __realloc_size(2);
+void * __must_check krealloc_noprof(const void *objp, size_t new_size, gfp_t flags) __realloc_size(2);
+#define krealloc(...) alloc_hooks(krealloc_noprof(__VA_ARGS__))
+
void kfree(const void *objp);
void kfree_sensitive(const void *objp);
size_t __ksize(const void *objp);
@@ -491,7 +493,10 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
static_assert(PAGE_SHIFT <= 20);
#define kmalloc_index(s) __kmalloc_index(s, true)

-void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_size(1);
+#include <linux/alloc_tag.h>
+
+void *__kmalloc_noprof(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_size(1);
+#define __kmalloc(...) alloc_hooks(__kmalloc_noprof(__VA_ARGS__))

/**
* kmem_cache_alloc - Allocate an object
@@ -503,9 +508,13 @@ void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_siz
*
* Return: pointer to the new object or %NULL in case of error
*/
-void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags) __assume_slab_alignment __malloc;
-void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
- gfp_t gfpflags) __assume_slab_alignment __malloc;
+void *kmem_cache_alloc_noprof(struct kmem_cache *cachep, gfp_t flags) __assume_slab_alignment __malloc;
+#define kmem_cache_alloc(...) alloc_hooks(kmem_cache_alloc_noprof(__VA_ARGS__))
+
+void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
+ gfp_t gfpflags) __assume_slab_alignment __malloc;
+#define kmem_cache_alloc_lru(...) alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
+
void kmem_cache_free(struct kmem_cache *s, void *objp);

/*
@@ -516,29 +525,40 @@ void kmem_cache_free(struct kmem_cache *s, void *objp);
* Note that interrupts must be enabled when calling these functions.
*/
void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);
-int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, void **p);
+
+int kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags, size_t size, void **p);
+#define kmem_cache_alloc_bulk(...) alloc_hooks(kmem_cache_alloc_bulk_noprof(__VA_ARGS__))

static __always_inline void kfree_bulk(size_t size, void **p)
{
kmem_cache_free_bulk(NULL, size, p);
}

-void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignment
+void *__kmalloc_node_noprof(size_t size, gfp_t flags, int node) __assume_kmalloc_alignment
__alloc_size(1);
-void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment
- __malloc;
+#define __kmalloc_node(...) alloc_hooks(__kmalloc_node_noprof(__VA_ARGS__))

-void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
+void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment
+ __malloc;
+#define kmem_cache_alloc_node(...) alloc_hooks(kmem_cache_alloc_node_noprof(__VA_ARGS__))
+
+void *kmalloc_trace_noprof(struct kmem_cache *s, gfp_t flags, size_t size)
__assume_kmalloc_alignment __alloc_size(3);

-void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
- int node, size_t size) __assume_kmalloc_alignment
+void *kmalloc_node_trace_noprof(struct kmem_cache *s, gfp_t gfpflags,
+ int node, size_t size) __assume_kmalloc_alignment
__alloc_size(4);
-void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment
+#define kmalloc_trace(...) alloc_hooks(kmalloc_trace_noprof(__VA_ARGS__))
+
+#define kmalloc_node_trace(...) alloc_hooks(kmalloc_node_trace_noprof(__VA_ARGS__))
+
+void *kmalloc_large_noprof(size_t size, gfp_t flags) __assume_page_alignment
__alloc_size(1);
+#define kmalloc_large(...) alloc_hooks(kmalloc_large_noprof(__VA_ARGS__))

-void *kmalloc_large_node(size_t size, gfp_t flags, int node) __assume_page_alignment
+void *kmalloc_large_node_noprof(size_t size, gfp_t flags, int node) __assume_page_alignment
__alloc_size(1);
+#define kmalloc_large_node(...) alloc_hooks(kmalloc_large_node_noprof(__VA_ARGS__))

/**
* kmalloc - allocate kernel memory
@@ -594,37 +614,39 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node) __assume_page_align
* Try really hard to succeed the allocation but fail
* eventually.
*/
-static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
+static __always_inline __alloc_size(1) void *kmalloc_noprof(size_t size, gfp_t flags)
{
if (__builtin_constant_p(size) && size) {
unsigned int index;

if (size > KMALLOC_MAX_CACHE_SIZE)
- return kmalloc_large(size, flags);
+ return kmalloc_large_noprof(size, flags);

index = kmalloc_index(size);
- return kmalloc_trace(
+ return kmalloc_trace_noprof(
kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
flags, size);
}
- return __kmalloc(size, flags);
+ return __kmalloc_noprof(size, flags);
}
+#define kmalloc(...) alloc_hooks(kmalloc_noprof(__VA_ARGS__))

-static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node)
+static __always_inline __alloc_size(1) void *kmalloc_node_noprof(size_t size, gfp_t flags, int node)
{
if (__builtin_constant_p(size) && size) {
unsigned int index;

if (size > KMALLOC_MAX_CACHE_SIZE)
- return kmalloc_large_node(size, flags, node);
+ return kmalloc_large_node_noprof(size, flags, node);

index = kmalloc_index(size);
- return kmalloc_node_trace(
+ return kmalloc_node_trace_noprof(
kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
flags, node, size);
}
- return __kmalloc_node(size, flags, node);
+ return __kmalloc_node_noprof(size, flags, node);
}
+#define kmalloc_node(...) alloc_hooks(kmalloc_node_noprof(__VA_ARGS__))

/**
* kmalloc_array - allocate memory for an array.
@@ -632,16 +654,17 @@ static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla
* @size: element size.
* @flags: the type of memory to allocate (see kmalloc).
*/
-static inline __alloc_size(1, 2) void *kmalloc_array(size_t n, size_t size, gfp_t flags)
+static inline __alloc_size(1, 2) void *kmalloc_array_noprof(size_t n, size_t size, gfp_t flags)
{
size_t bytes;

if (unlikely(check_mul_overflow(n, size, &bytes)))
return NULL;
if (__builtin_constant_p(n) && __builtin_constant_p(size))
- return kmalloc(bytes, flags);
- return __kmalloc(bytes, flags);
+ return kmalloc_noprof(bytes, flags);
+ return kmalloc_noprof(bytes, flags);
}
+#define kmalloc_array(...) alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))

/**
* krealloc_array - reallocate memory for an array.
@@ -650,18 +673,19 @@ static inline __alloc_size(1, 2) void *kmalloc_array(size_t n, size_t size, gfp_
* @new_size: new size of a single member of the array
* @flags: the type of memory to allocate (see kmalloc)
*/
-static inline __realloc_size(2, 3) void * __must_check krealloc_array(void *p,
- size_t new_n,
- size_t new_size,
- gfp_t flags)
+static inline __realloc_size(2, 3) void * __must_check krealloc_array_noprof(void *p,
+ size_t new_n,
+ size_t new_size,
+ gfp_t flags)
{
size_t bytes;

if (unlikely(check_mul_overflow(new_n, new_size, &bytes)))
return NULL;

- return krealloc(p, bytes, flags);
+ return krealloc_noprof(p, bytes, flags);
}
+#define krealloc_array(...) alloc_hooks(krealloc_array_noprof(__VA_ARGS__))

/**
* kcalloc - allocate memory for an array. The memory is set to zero.
@@ -669,16 +693,11 @@ static inline __realloc_size(2, 3) void * __must_check krealloc_array(void *p,
* @size: element size.
* @flags: the type of memory to allocate (see kmalloc).
*/
-static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flags)
-{
- return kmalloc_array(n, size, flags | __GFP_ZERO);
-}
+#define kcalloc(_n, _size, _flags) kmalloc_array(_n, _size, (_flags) | __GFP_ZERO)

-void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
+void *kmalloc_node_track_caller_noprof(size_t size, gfp_t flags, int node,
unsigned long caller) __alloc_size(1);
-#define kmalloc_node_track_caller(size, flags, node) \
- __kmalloc_node_track_caller(size, flags, node, \
- _RET_IP_)
+#define kmalloc_node_track_caller(...) alloc_hooks(kmalloc_node_track_caller_noprof(__VA_ARGS__, _RET_IP_))

/*
* kmalloc_track_caller is a special version of kmalloc that records the
@@ -688,11 +707,9 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node,
* allocator where we care about the real place the memory allocation
* request comes from.
*/
-#define kmalloc_track_caller(size, flags) \
- __kmalloc_node_track_caller(size, flags, \
- NUMA_NO_NODE, _RET_IP_)
+#define kmalloc_track_caller(...) kmalloc_node_track_caller(__VA_ARGS__, NUMA_NO_NODE)

-static inline __alloc_size(1, 2) void *kmalloc_array_node(size_t n, size_t size, gfp_t flags,
+static inline __alloc_size(1, 2) void *kmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags,
int node)
{
size_t bytes;
@@ -700,75 +717,51 @@ static inline __alloc_size(1, 2) void *kmalloc_array_node(size_t n, size_t size,
if (unlikely(check_mul_overflow(n, size, &bytes)))
return NULL;
if (__builtin_constant_p(n) && __builtin_constant_p(size))
- return kmalloc_node(bytes, flags, node);
- return __kmalloc_node(bytes, flags, node);
+ return kmalloc_node_noprof(bytes, flags, node);
+ return __kmalloc_node_noprof(bytes, flags, node);
}
+#define kmalloc_array_node(...) alloc_hooks(kmalloc_array_node_noprof(__VA_ARGS__))

-static inline __alloc_size(1, 2) void *kcalloc_node(size_t n, size_t size, gfp_t flags, int node)
-{
- return kmalloc_array_node(n, size, flags | __GFP_ZERO, node);
-}
+#define kcalloc_node(_n, _size, _flags, _node) kmalloc_array_node(_n, _size, (_flags) | __GFP_ZERO, _node)

/*
* Shortcuts
*/
-static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags)
-{
- return kmem_cache_alloc(k, flags | __GFP_ZERO);
-}
+#define kmem_cache_zalloc(_k, _flags) kmem_cache_alloc(_k, (_flags)|__GFP_ZERO)

/**
* kzalloc - allocate memory. The memory is set to zero.
* @size: how many bytes of memory are required.
* @flags: the type of memory to allocate (see kmalloc).
*/
-static inline __alloc_size(1) void *kzalloc(size_t size, gfp_t flags)
+static inline __alloc_size(1) void *kzalloc_noprof(size_t size, gfp_t flags)
{
- return kmalloc(size, flags | __GFP_ZERO);
+ return kmalloc_noprof(size, flags | __GFP_ZERO);
}
+#define kzalloc(...) alloc_hooks(kzalloc_noprof(__VA_ARGS__))
+#define kzalloc_node(_size, _flags, _node) kmalloc_node(_size, (_flags)|__GFP_ZERO, _node)

-/**
- * kzalloc_node - allocate zeroed memory from a particular memory node.
- * @size: how many bytes of memory are required.
- * @flags: the type of memory to allocate (see kmalloc).
- * @node: memory node from which to allocate
- */
-static inline __alloc_size(1) void *kzalloc_node(size_t size, gfp_t flags, int node)
-{
- return kmalloc_node(size, flags | __GFP_ZERO, node);
-}
+extern void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node) __alloc_size(1);
+#define kvmalloc_node(...) alloc_hooks(kvmalloc_node_noprof(__VA_ARGS__))

-extern void *kvmalloc_node(size_t size, gfp_t flags, int node) __alloc_size(1);
-static inline __alloc_size(1) void *kvmalloc(size_t size, gfp_t flags)
-{
- return kvmalloc_node(size, flags, NUMA_NO_NODE);
-}
-static inline __alloc_size(1) void *kvzalloc_node(size_t size, gfp_t flags, int node)
-{
- return kvmalloc_node(size, flags | __GFP_ZERO, node);
-}
-static inline __alloc_size(1) void *kvzalloc(size_t size, gfp_t flags)
-{
- return kvmalloc(size, flags | __GFP_ZERO);
-}
+#define kvmalloc(_size, _flags) kvmalloc_node(_size, _flags, NUMA_NO_NODE)
+#define kvzalloc(_size, _flags) kvmalloc(_size, _flags|__GFP_ZERO)

-static inline __alloc_size(1, 2) void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
-{
- size_t bytes;
-
- if (unlikely(check_mul_overflow(n, size, &bytes)))
- return NULL;
+#define kvzalloc_node(_size, _flags, _node) kvmalloc_node(_size, _flags|__GFP_ZERO, _node)

- return kvmalloc(bytes, flags);
-}
+#define kvmalloc_array(_n, _size, _flags) \
+({ \
+ size_t _bytes; \
+ \
+ !check_mul_overflow(_n, _size, &_bytes) ? kvmalloc(_bytes, _flags) : NULL; \
+})

-static inline __alloc_size(1, 2) void *kvcalloc(size_t n, size_t size, gfp_t flags)
-{
- return kvmalloc_array(n, size, flags | __GFP_ZERO);
-}
+#define kvcalloc(_n, _size, _flags) kvmalloc_array(_n, _size, _flags|__GFP_ZERO)

-extern void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
+extern void *kvrealloc_noprof(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
__realloc_size(3);
+#define kvrealloc(...) alloc_hooks(kvrealloc_noprof(__VA_ARGS__))
+
extern void kvfree(const void *addr);
extern void kvfree_sensitive(const void *addr, size_t len);

diff --git a/include/linux/string.h b/include/linux/string.h
index dbfc66400050..9516258d8117 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -176,7 +176,9 @@ extern void kfree_const(const void *x);
extern char *kstrdup(const char *s, gfp_t gfp) __malloc;
extern const char *kstrdup_const(const char *s, gfp_t gfp);
extern char *kstrndup(const char *s, size_t len, gfp_t gfp);
-extern void *kmemdup(const void *src, size_t len, gfp_t gfp) __realloc_size(2);
+extern void *kmemdup_noprof(const void *src, size_t len, gfp_t gfp) __realloc_size(2);
+#define kmemdup(...) alloc_hooks(kmemdup_noprof(__VA_ARGS__))
+
extern void *kvmemdup(const void *src, size_t len, gfp_t gfp) __realloc_size(2);
extern char *kmemdup_nul(const char *s, size_t len, gfp_t gfp);

diff --git a/mm/slab.c b/mm/slab.c
index 18923f5f05b5..f75519fa89b9 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3429,18 +3429,18 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
return ret;
}

-void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
+void *kmem_cache_alloc_noprof(struct kmem_cache *cachep, gfp_t flags)
{
return __kmem_cache_alloc_lru(cachep, NULL, flags);
}
-EXPORT_SYMBOL(kmem_cache_alloc);
+EXPORT_SYMBOL(kmem_cache_alloc_noprof);

-void *kmem_cache_alloc_lru(struct kmem_cache *cachep, struct list_lru *lru,
+void *kmem_cache_alloc_lru_noprof(struct kmem_cache *cachep, struct list_lru *lru,
gfp_t flags)
{
return __kmem_cache_alloc_lru(cachep, lru, flags);
}
-EXPORT_SYMBOL(kmem_cache_alloc_lru);
+EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);

static __always_inline void
cache_alloc_debugcheck_after_bulk(struct kmem_cache *s, gfp_t flags,
@@ -3452,8 +3452,8 @@ cache_alloc_debugcheck_after_bulk(struct kmem_cache *s, gfp_t flags,
p[i] = cache_alloc_debugcheck_after(s, flags, p[i], caller);
}

-int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
- void **p)
+int kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags, size_t size,
+ void **p)
{
struct obj_cgroup *objcg = NULL;
unsigned long irqflags;
@@ -3491,7 +3491,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
kmem_cache_free_bulk(s, i, p);
return 0;
}
-EXPORT_SYMBOL(kmem_cache_alloc_bulk);
+EXPORT_SYMBOL(kmem_cache_alloc_bulk_noprof);

/**
* kmem_cache_alloc_node - Allocate an object on the specified node
@@ -3506,7 +3506,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_bulk);
*
* Return: pointer to the new object or %NULL in case of error
*/
-void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
+void *kmem_cache_alloc_node_noprof(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
void *ret = slab_alloc_node(cachep, NULL, flags, nodeid, cachep->object_size, _RET_IP_);

@@ -3514,7 +3514,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)

return ret;
}
-EXPORT_SYMBOL(kmem_cache_alloc_node);
+EXPORT_SYMBOL(kmem_cache_alloc_node_noprof);

void *__kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
int nodeid, size_t orig_size,
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 446f406d2703..8ef5e47ff6a7 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1077,24 +1077,24 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
return ret;
}

-void *__kmalloc_node(size_t size, gfp_t flags, int node)
+void *__kmalloc_node_noprof(size_t size, gfp_t flags, int node)
{
return __do_kmalloc_node(size, flags, node, _RET_IP_);
}
-EXPORT_SYMBOL(__kmalloc_node);
+EXPORT_SYMBOL(__kmalloc_node_noprof);

-void *__kmalloc(size_t size, gfp_t flags)
+void *__kmalloc_noprof(size_t size, gfp_t flags)
{
return __do_kmalloc_node(size, flags, NUMA_NO_NODE, _RET_IP_);
}
-EXPORT_SYMBOL(__kmalloc);
+EXPORT_SYMBOL(__kmalloc_noprof);

-void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
- int node, unsigned long caller)
+void *kmalloc_node_track_caller_noprof(size_t size, gfp_t flags,
+ int node, unsigned long caller)
{
return __do_kmalloc_node(size, flags, node, caller);
}
-EXPORT_SYMBOL(__kmalloc_node_track_caller);
+EXPORT_SYMBOL(kmalloc_node_track_caller_noprof);

/**
* kfree - free previously allocated memory
@@ -1161,7 +1161,7 @@ size_t __ksize(const void *object)
return slab_ksize(folio_slab(folio)->slab_cache);
}

-void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
+void *kmalloc_trace_noprof(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
void *ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
size, _RET_IP_);
@@ -1171,9 +1171,9 @@ void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
}
-EXPORT_SYMBOL(kmalloc_trace);
+EXPORT_SYMBOL(kmalloc_trace_noprof);

-void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
+void *kmalloc_node_trace_noprof(struct kmem_cache *s, gfp_t gfpflags,
int node, size_t size)
{
void *ret = __kmem_cache_alloc_node(s, gfpflags, node, size, _RET_IP_);
@@ -1183,7 +1183,7 @@ void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
}
-EXPORT_SYMBOL(kmalloc_node_trace);
+EXPORT_SYMBOL(kmalloc_node_trace_noprof);

gfp_t kmalloc_fix_flags(gfp_t flags)
{
@@ -1213,7 +1213,7 @@ static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
flags = kmalloc_fix_flags(flags);

flags |= __GFP_COMP;
- page = alloc_pages_node(node, flags, order);
+ page = alloc_pages_node_noprof(node, flags, order);
if (page) {
ptr = page_address(page);
mod_lruvec_page_state(page, NR_SLAB_UNRECLAIMABLE_B,
@@ -1228,7 +1228,7 @@ static void *__kmalloc_large_node(size_t size, gfp_t flags, int node)
return ptr;
}

-void *kmalloc_large(size_t size, gfp_t flags)
+void *kmalloc_large_noprof(size_t size, gfp_t flags)
{
void *ret = __kmalloc_large_node(size, flags, NUMA_NO_NODE);

@@ -1236,9 +1236,9 @@ void *kmalloc_large(size_t size, gfp_t flags)
flags, NUMA_NO_NODE);
return ret;
}
-EXPORT_SYMBOL(kmalloc_large);
+EXPORT_SYMBOL(kmalloc_large_noprof);

-void *kmalloc_large_node(size_t size, gfp_t flags, int node)
+void *kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
{
void *ret = __kmalloc_large_node(size, flags, node);

@@ -1246,7 +1246,7 @@ void *kmalloc_large_node(size_t size, gfp_t flags, int node)
flags, node);
return ret;
}
-EXPORT_SYMBOL(kmalloc_large_node);
+EXPORT_SYMBOL(kmalloc_large_node_noprof);

#ifdef CONFIG_SLAB_FREELIST_RANDOM
/* Randomize a generic freelist */
@@ -1460,7 +1460,7 @@ __do_krealloc(const void *p, size_t new_size, gfp_t flags)
return (void *)p;
}

- ret = kmalloc_track_caller(new_size, flags);
+ ret = kmalloc_node_track_caller_noprof(new_size, flags, NUMA_NO_NODE, _RET_IP_);
if (ret && p) {
/* Disable KASAN checks as the object's redzone is accessed. */
kasan_disable_current();
@@ -1484,7 +1484,7 @@ __do_krealloc(const void *p, size_t new_size, gfp_t flags)
*
* Return: pointer to the allocated memory or %NULL in case of error
*/
-void *krealloc(const void *p, size_t new_size, gfp_t flags)
+void *krealloc_noprof(const void *p, size_t new_size, gfp_t flags)
{
void *ret;

@@ -1499,7 +1499,7 @@ void *krealloc(const void *p, size_t new_size, gfp_t flags)

return ret;
}
-EXPORT_SYMBOL(krealloc);
+EXPORT_SYMBOL(krealloc_noprof);

/**
* kfree_sensitive - Clear sensitive information in memory before freeing
diff --git a/mm/slub.c b/mm/slub.c
index d16643492320..f5e07d8802e2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3497,18 +3497,18 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
return ret;
}

-void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
+void *kmem_cache_alloc_noprof(struct kmem_cache *s, gfp_t gfpflags)
{
return __kmem_cache_alloc_lru(s, NULL, gfpflags);
}
-EXPORT_SYMBOL(kmem_cache_alloc);
+EXPORT_SYMBOL(kmem_cache_alloc_noprof);

-void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
+void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags)
{
return __kmem_cache_alloc_lru(s, lru, gfpflags);
}
-EXPORT_SYMBOL(kmem_cache_alloc_lru);
+EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);

void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
int node, size_t orig_size,
@@ -3518,7 +3518,7 @@ void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
caller, orig_size);
}

-void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
+void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t gfpflags, int node)
{
void *ret = slab_alloc_node(s, NULL, gfpflags, node, _RET_IP_, s->object_size);

@@ -3526,7 +3526,7 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)

return ret;
}
-EXPORT_SYMBOL(kmem_cache_alloc_node);
+EXPORT_SYMBOL(kmem_cache_alloc_node_noprof);

static noinline void free_to_partial_list(
struct kmem_cache *s, struct slab *slab,
@@ -3802,6 +3802,7 @@ static __fastpath_inline void slab_free(struct kmem_cache *s, struct slab *slab,
unsigned long addr)
{
memcg_slab_free_hook(s, slab, p, cnt);
+ alloc_tagging_slab_free_hook(s, slab, p, cnt);
/*
* With KASAN enabled slab_free_freelist_hook modifies the freelist
* to remove objects, whose reuse must be delayed.
@@ -4032,8 +4033,8 @@ static int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
#endif /* CONFIG_SLUB_TINY */

/* Note that interrupts must be enabled when calling this function. */
-int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
- void **p)
+int kmem_cache_alloc_bulk_noprof(struct kmem_cache *s, gfp_t flags, size_t size,
+ void **p)
{
int i;
struct obj_cgroup *objcg = NULL;
@@ -4057,7 +4058,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
slab_want_init_on_alloc(flags, s), s->object_size);
return i;
}
-EXPORT_SYMBOL(kmem_cache_alloc_bulk);
+EXPORT_SYMBOL(kmem_cache_alloc_bulk_noprof);


/*
diff --git a/mm/util.c b/mm/util.c
index 8cbbfd3a3d59..27ed6a5ac31a 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -115,7 +115,7 @@ char *kstrndup(const char *s, size_t max, gfp_t gfp)
EXPORT_SYMBOL(kstrndup);

/**
- * kmemdup - duplicate region of memory
+ * kmemdup_noprof - duplicate region of memory
*
* @src: memory region to duplicate
* @len: memory region length
@@ -124,16 +124,16 @@ EXPORT_SYMBOL(kstrndup);
* Return: newly allocated copy of @src or %NULL in case of error,
* result is physically contiguous. Use kfree() to free.
*/
-void *kmemdup(const void *src, size_t len, gfp_t gfp)
+void *kmemdup_noprof(const void *src, size_t len, gfp_t gfp)
{
void *p;

- p = kmalloc_track_caller(len, gfp);
+ p = kmalloc_node_track_caller_noprof(len, gfp, NUMA_NO_NODE, _RET_IP_);
if (p)
memcpy(p, src, len);
return p;
}
-EXPORT_SYMBOL(kmemdup);
+EXPORT_SYMBOL(kmemdup_noprof);

/**
* kvmemdup - duplicate region of memory
@@ -567,7 +567,7 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
EXPORT_SYMBOL(vm_mmap);

/**
- * kvmalloc_node - attempt to allocate physically contiguous memory, but upon
+ * kvmalloc_node_noprof - attempt to allocate physically contiguous memory, but upon
* failure, fall back to non-contiguous (vmalloc) allocation.
* @size: size of the request.
* @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
@@ -582,7 +582,7 @@ EXPORT_SYMBOL(vm_mmap);
*
* Return: pointer to the allocated memory or %NULL in case of failure
*/
-void *kvmalloc_node(size_t size, gfp_t flags, int node)
+void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node)
{
gfp_t kmalloc_flags = flags;
void *ret;
@@ -604,7 +604,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
kmalloc_flags &= ~__GFP_NOFAIL;
}

- ret = kmalloc_node(size, kmalloc_flags, node);
+ ret = kmalloc_node_noprof(size, kmalloc_flags, node);

/*
* It doesn't really make sense to fallback to vmalloc for sub page
@@ -633,7 +633,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
flags, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
node, __builtin_return_address(0));
}
-EXPORT_SYMBOL(kvmalloc_node);
+EXPORT_SYMBOL(kvmalloc_node_noprof);

/**
* kvfree() - Free memory.
@@ -672,7 +672,7 @@ void kvfree_sensitive(const void *addr, size_t len)
}
EXPORT_SYMBOL(kvfree_sensitive);

-void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
+void *kvrealloc_noprof(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
{
void *newp;

@@ -685,7 +685,7 @@ void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags)
kvfree(p);
return newp;
}
-EXPORT_SYMBOL(kvrealloc);
+EXPORT_SYMBOL(kvrealloc_noprof);

/**
* __vmalloc_array - allocate memory for a virtually contiguous array.
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:51:17

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 30/39] mm: percpu: Add codetag reference into pcpuobj_ext

From: Kent Overstreet <[email protected]>

To store a codetag for every per-cpu allocation, a codetag reference is
embedded into pcpuobj_ext when CONFIG_MEM_ALLOC_PROFILING=y. Hooks to
use the newly introduced codetag are added.
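
As a sketch of the bookkeeping (illustrative, not part of the diff
below; assuming PCPU_MIN_ALLOC_SHIFT == 2, i.e. 4-byte allocation
units): an allocation at byte offset off in a chunk uses pcpuobj_ext
slot off >> PCPU_MIN_ALLOC_SHIFT, the same indexing the memcg hooks
already use for ->cgroup:

  union codetag_ref *ref =
          &chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].tag;
  alloc_tag_add(ref, current->alloc_tag, size);  /* charge caller's tag */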

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
mm/percpu-internal.h | 11 +++++++++--
mm/percpu.c | 26 ++++++++++++++++++++++++++
2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index e62d582f4bf3..7e42f0ca3b7b 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -36,9 +36,12 @@ struct pcpuobj_ext {
#ifdef CONFIG_MEMCG_KMEM
struct obj_cgroup *cgroup;
#endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ union codetag_ref tag;
+#endif
};

-#ifdef CONFIG_MEMCG_KMEM
+#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MEM_ALLOC_PROFILING)
#define NEED_PCPUOBJ_EXT
#endif

@@ -86,7 +89,11 @@ struct pcpu_chunk {

static inline bool need_pcpuobj_ext(void)
{
- return !mem_cgroup_kmem_disabled();
+ if (IS_ENABLED(CONFIG_MEM_ALLOC_PROFILING))
+ return true;
+ if (!mem_cgroup_kmem_disabled())
+ return true;
+ return false;
}

extern spinlock_t pcpu_lock;
diff --git a/mm/percpu.c b/mm/percpu.c
index 5a6202acffa3..002ee5d38fd5 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1701,6 +1701,32 @@ static void pcpu_memcg_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
}
#endif /* CONFIG_MEMCG_KMEM */

+#ifdef CONFIG_MEM_ALLOC_PROFILING
+static void pcpu_alloc_tag_alloc_hook(struct pcpu_chunk *chunk, int off,
+ size_t size)
+{
+ if (mem_alloc_profiling_enabled() && likely(chunk->obj_exts)) {
+ alloc_tag_add(&chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].tag,
+ current->alloc_tag, size);
+ }
+}
+
+static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
+{
+ if (mem_alloc_profiling_enabled() && likely(chunk->obj_exts))
+ alloc_tag_sub_noalloc(&chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].tag, size);
+}
+#else
+static void pcpu_alloc_tag_alloc_hook(struct pcpu_chunk *chunk, int off,
+ size_t size)
+{
+}
+
+static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
+{
+}
+#endif
+
/**
* pcpu_alloc - the percpu allocator
* @size: size of area to allocate in bytes
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:51:18

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 32/39] arm64: Fix circular header dependency

From: Kent Overstreet <[email protected]>

Replace linux/percpu.h include with asm/percpu.h to avoid circular
dependency.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
arch/arm64/include/asm/spectre.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/spectre.h b/arch/arm64/include/asm/spectre.h
index 9cc501450486..75e837753772 100644
--- a/arch/arm64/include/asm/spectre.h
+++ b/arch/arm64/include/asm/spectre.h
@@ -13,8 +13,8 @@
#define __BP_HARDEN_HYP_VECS_SZ ((BP_HARDEN_EL2_SLOTS - 1) * SZ_2K)

#ifndef __ASSEMBLY__
-
-#include <linux/percpu.h>
+#include <linux/smp.h>
+#include <asm/percpu.h>

#include <asm/cpufeature.h>
#include <asm/virt.h>
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:51:23

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 28/39] timekeeping: Fix a circular include dependency

From: Kent Overstreet <[email protected]>

This avoids a circular header dependency in an upcoming patch by making
hrtimer.h depend only on percpu-defs.h.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
include/linux/hrtimer.h | 2 +-
include/linux/time_namespace.h | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 0ee140176f10..e67349e84364 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -16,7 +16,7 @@
#include <linux/rbtree.h>
#include <linux/init.h>
#include <linux/list.h>
-#include <linux/percpu.h>
+#include <linux/percpu-defs.h>
#include <linux/seqlock.h>
#include <linux/timer.h>
#include <linux/timerqueue.h>
diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h
index 03d9c5ac01d1..a9e61120d4e3 100644
--- a/include/linux/time_namespace.h
+++ b/include/linux/time_namespace.h
@@ -11,6 +11,8 @@
struct user_namespace;
extern struct user_namespace init_user_ns;

+struct vm_area_struct;
+
struct timens_offsets {
struct timespec64 monotonic;
struct timespec64 boottime;
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:51:24

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 26/39] mempool: Hook up to memory allocation profiling

From: Kent Overstreet <[email protected]>

This adds hooks to mempools to annotate mempool-backed allocations with
the correct source line, so they show up correctly in /proc/allocinfo.

Various inline functions are converted to wrappers so that we can invoke
alloc_hooks() in fewer places.
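
For illustration (a sketch of the expansion, based on the alloc_hooks()
definition introduced earlier in the series), a hypothetical call site

  element = mempool_alloc(pool, GFP_NOIO);

now expands roughly to

  ({
          DEFINE_ALLOC_TAG(_alloc_tag, _old);  /* tag for caller's file:line */
          void *_res = mempool_alloc_noprof(pool, GFP_NOIO);
          alloc_tag_restore(&_alloc_tag, _old);
          _res;
  });

so the element's backing allocation is charged to the caller's own
source line rather than to mm/mempool.c.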

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/mempool.h | 73 ++++++++++++++++++++---------------------
mm/mempool.c | 34 ++++++++-----------
2 files changed, 48 insertions(+), 59 deletions(-)

diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index 4aae6c06c5f2..9fa126aa19b5 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -5,6 +5,8 @@
#ifndef _LINUX_MEMPOOL_H
#define _LINUX_MEMPOOL_H

+#include <linux/sched.h>
+#include <linux/alloc_tag.h>
#include <linux/wait.h>
#include <linux/compiler.h>

@@ -39,18 +41,32 @@ void mempool_exit(mempool_t *pool);
int mempool_init_node(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data,
gfp_t gfp_mask, int node_id);
-int mempool_init(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
+
+int mempool_init_noprof(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data);
+#define mempool_init(...) \
+ alloc_hooks(mempool_init_noprof(__VA_ARGS__))

extern mempool_t *mempool_create(int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data);
-extern mempool_t *mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,
+
+extern mempool_t *mempool_create_node_noprof(int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data,
gfp_t gfp_mask, int nid);
+#define mempool_create_node(...) \
+ alloc_hooks(mempool_create_node_noprof(__VA_ARGS__))
+
+#define mempool_create(_min_nr, _alloc_fn, _free_fn, _pool_data) \
+ mempool_create_node(_min_nr, _alloc_fn, _free_fn, _pool_data, \
+ GFP_KERNEL, NUMA_NO_NODE)

extern int mempool_resize(mempool_t *pool, int new_min_nr);
extern void mempool_destroy(mempool_t *pool);
-extern void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask) __malloc;
+
+extern void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask) __malloc;
+#define mempool_alloc(...) \
+ alloc_hooks(mempool_alloc_noprof(__VA_ARGS__))
+
extern void mempool_free(void *element, mempool_t *pool);

/*
@@ -61,19 +77,10 @@ extern void mempool_free(void *element, mempool_t *pool);
void *mempool_alloc_slab(gfp_t gfp_mask, void *pool_data);
void mempool_free_slab(void *element, void *pool_data);

-static inline int
-mempool_init_slab_pool(mempool_t *pool, int min_nr, struct kmem_cache *kc)
-{
- return mempool_init(pool, min_nr, mempool_alloc_slab,
- mempool_free_slab, (void *) kc);
-}
-
-static inline mempool_t *
-mempool_create_slab_pool(int min_nr, struct kmem_cache *kc)
-{
- return mempool_create(min_nr, mempool_alloc_slab, mempool_free_slab,
- (void *) kc);
-}
+#define mempool_init_slab_pool(_pool, _min_nr, _kc) \
+ mempool_init(_pool, (_min_nr), mempool_alloc_slab, mempool_free_slab, (void *)(_kc))
+#define mempool_create_slab_pool(_min_nr, _kc) \
+ mempool_create((_min_nr), mempool_alloc_slab, mempool_free_slab, (void *)(_kc))

/*
* a mempool_alloc_t and a mempool_free_t to kmalloc and kfree the
@@ -82,17 +89,12 @@ mempool_create_slab_pool(int min_nr, struct kmem_cache *kc)
void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data);
void mempool_kfree(void *element, void *pool_data);

-static inline int mempool_init_kmalloc_pool(mempool_t *pool, int min_nr, size_t size)
-{
- return mempool_init(pool, min_nr, mempool_kmalloc,
- mempool_kfree, (void *) size);
-}
-
-static inline mempool_t *mempool_create_kmalloc_pool(int min_nr, size_t size)
-{
- return mempool_create(min_nr, mempool_kmalloc, mempool_kfree,
- (void *) size);
-}
+#define mempool_init_kmalloc_pool(_pool, _min_nr, _size) \
+ mempool_init(_pool, (_min_nr), mempool_kmalloc, mempool_kfree, \
+ (void *)(unsigned long)(_size))
+#define mempool_create_kmalloc_pool(_min_nr, _size) \
+ mempool_create((_min_nr), mempool_kmalloc, mempool_kfree, \
+ (void *)(unsigned long)(_size))

/*
* A mempool_alloc_t and mempool_free_t for a simple page allocator that
@@ -101,16 +103,11 @@ static inline mempool_t *mempool_create_kmalloc_pool(int min_nr, size_t size)
void *mempool_alloc_pages(gfp_t gfp_mask, void *pool_data);
void mempool_free_pages(void *element, void *pool_data);

-static inline int mempool_init_page_pool(mempool_t *pool, int min_nr, int order)
-{
- return mempool_init(pool, min_nr, mempool_alloc_pages,
- mempool_free_pages, (void *)(long)order);
-}
-
-static inline mempool_t *mempool_create_page_pool(int min_nr, int order)
-{
- return mempool_create(min_nr, mempool_alloc_pages, mempool_free_pages,
- (void *)(long)order);
-}
+#define mempool_init_page_pool(_pool, _min_nr, _order) \
+ mempool_init(_pool, (_min_nr), mempool_alloc_pages, \
+ mempool_free_pages, (void *)(long)(_order))
+#define mempool_create_page_pool(_min_nr, _order) \
+ mempool_create((_min_nr), mempool_alloc_pages, \
+ mempool_free_pages, (void *)(long)(_order))

#endif /* _LINUX_MEMPOOL_H */
diff --git a/mm/mempool.c b/mm/mempool.c
index 734bcf5afbb7..4fd949178449 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -230,17 +230,17 @@ EXPORT_SYMBOL(mempool_init_node);
*
* Return: %0 on success, negative error code otherwise.
*/
-int mempool_init(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
- mempool_free_t *free_fn, void *pool_data)
+int mempool_init_noprof(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
+ mempool_free_t *free_fn, void *pool_data)
{
return mempool_init_node(pool, min_nr, alloc_fn, free_fn,
pool_data, GFP_KERNEL, NUMA_NO_NODE);

}
-EXPORT_SYMBOL(mempool_init);
+EXPORT_SYMBOL(mempool_init_noprof);

/**
- * mempool_create - create a memory pool
+ * mempool_create_node - create a memory pool
* @min_nr: the minimum number of elements guaranteed to be
* allocated for this pool.
* @alloc_fn: user-defined element-allocation function.
@@ -255,17 +255,9 @@ EXPORT_SYMBOL(mempool_init);
*
* Return: pointer to the created memory pool object or %NULL on error.
*/
-mempool_t *mempool_create(int min_nr, mempool_alloc_t *alloc_fn,
- mempool_free_t *free_fn, void *pool_data)
-{
- return mempool_create_node(min_nr, alloc_fn, free_fn, pool_data,
- GFP_KERNEL, NUMA_NO_NODE);
-}
-EXPORT_SYMBOL(mempool_create);
-
-mempool_t *mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,
- mempool_free_t *free_fn, void *pool_data,
- gfp_t gfp_mask, int node_id)
+mempool_t *mempool_create_node_noprof(int min_nr, mempool_alloc_t *alloc_fn,
+ mempool_free_t *free_fn, void *pool_data,
+ gfp_t gfp_mask, int node_id)
{
mempool_t *pool;

@@ -281,7 +273,7 @@ mempool_t *mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,

return pool;
}
-EXPORT_SYMBOL(mempool_create_node);
+EXPORT_SYMBOL(mempool_create_node_noprof);

/**
* mempool_resize - resize an existing memory pool
@@ -377,7 +369,7 @@ EXPORT_SYMBOL(mempool_resize);
*
* Return: pointer to the allocated element or %NULL on error.
*/
-void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
+void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
{
void *element;
unsigned long flags;
@@ -444,7 +436,7 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
finish_wait(&pool->wait, &wait);
goto repeat_alloc;
}
-EXPORT_SYMBOL(mempool_alloc);
+EXPORT_SYMBOL(mempool_alloc_noprof);

/**
* mempool_free - return an element to the pool.
@@ -515,7 +507,7 @@ void *mempool_alloc_slab(gfp_t gfp_mask, void *pool_data)
{
struct kmem_cache *mem = pool_data;
VM_BUG_ON(mem->ctor);
- return kmem_cache_alloc(mem, gfp_mask);
+ return kmem_cache_alloc_noprof(mem, gfp_mask);
}
EXPORT_SYMBOL(mempool_alloc_slab);

@@ -533,7 +525,7 @@ EXPORT_SYMBOL(mempool_free_slab);
void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data)
{
size_t size = (size_t)pool_data;
- return kmalloc(size, gfp_mask);
+ return kmalloc_noprof(size, gfp_mask);
}
EXPORT_SYMBOL(mempool_kmalloc);

@@ -550,7 +542,7 @@ EXPORT_SYMBOL(mempool_kfree);
void *mempool_alloc_pages(gfp_t gfp_mask, void *pool_data)
{
int order = (int)(long)pool_data;
- return alloc_pages(gfp_mask, order);
+ return alloc_pages_noprof(gfp_mask, order);
}
EXPORT_SYMBOL(mempool_alloc_pages);

--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:51:26

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 33/39] mm: vmalloc: Enable memory allocation profiling

From: Kent Overstreet <[email protected]>

This wraps all external vmalloc allocation functions with the
alloc_hooks() wrapper, and switches internal allocations to _noprof
variants where appropriate, for the new memory allocation profiling
feature.
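
The intended effect, as an illustrative call-chain sketch (not part of
the diff): only the outermost, hooked entry point records a tag, and
the internal _noprof chain leaves that tag in place:

  p = vmalloc(size);                  /* caller's file:line is recorded */
    -> vmalloc_noprof(size)
      -> __vmalloc_node_noprof(size, 1, GFP_KERNEL, ...)
        -> __vmalloc_node_range_noprof(...)
          -> alloc_pages_noprof(...)  /* pages charged to the outer tag */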

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
drivers/staging/media/atomisp/pci/hmm/hmm.c | 2 +-
include/linux/vmalloc.h | 60 ++++++++++----
kernel/kallsyms_selftest.c | 2 +-
mm/util.c | 24 +++---
mm/vmalloc.c | 88 ++++++++++-----------
5 files changed, 103 insertions(+), 73 deletions(-)

diff --git a/drivers/staging/media/atomisp/pci/hmm/hmm.c b/drivers/staging/media/atomisp/pci/hmm/hmm.c
index bb12644fd033..3e2899ad8517 100644
--- a/drivers/staging/media/atomisp/pci/hmm/hmm.c
+++ b/drivers/staging/media/atomisp/pci/hmm/hmm.c
@@ -205,7 +205,7 @@ static ia_css_ptr __hmm_alloc(size_t bytes, enum hmm_bo_type type,
}

dev_dbg(atomisp_dev, "pages: 0x%08x (%zu bytes), type: %d, vmalloc %p\n",
- bo->start, bytes, type, vmalloc);
+ bo->start, bytes, type, vmalloc_noprof);

return bo->start;

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c720be70c8dd..106d78e75606 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -2,6 +2,8 @@
#ifndef _LINUX_VMALLOC_H
#define _LINUX_VMALLOC_H

+#include <linux/alloc_tag.h>
+#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/init.h>
#include <linux/list.h>
@@ -137,26 +139,54 @@ extern unsigned long vmalloc_nr_pages(void);
static inline unsigned long vmalloc_nr_pages(void) { return 0; }
#endif

-extern void *vmalloc(unsigned long size) __alloc_size(1);
-extern void *vzalloc(unsigned long size) __alloc_size(1);
-extern void *vmalloc_user(unsigned long size) __alloc_size(1);
-extern void *vmalloc_node(unsigned long size, int node) __alloc_size(1);
-extern void *vzalloc_node(unsigned long size, int node) __alloc_size(1);
-extern void *vmalloc_32(unsigned long size) __alloc_size(1);
-extern void *vmalloc_32_user(unsigned long size) __alloc_size(1);
-extern void *__vmalloc(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
-extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
+extern void *vmalloc_noprof(unsigned long size) __alloc_size(1);
+#define vmalloc(...) alloc_hooks(vmalloc_noprof(__VA_ARGS__))
+
+extern void *vzalloc_noprof(unsigned long size) __alloc_size(1);
+#define vzalloc(...) alloc_hooks(vzalloc_noprof(__VA_ARGS__))
+
+extern void *vmalloc_user_noprof(unsigned long size) __alloc_size(1);
+#define vmalloc_user(...) alloc_hooks(vmalloc_user_noprof(__VA_ARGS__))
+
+extern void *vmalloc_node_noprof(unsigned long size, int node) __alloc_size(1);
+#define vmalloc_node(...) alloc_hooks(vmalloc_node_noprof(__VA_ARGS__))
+
+extern void *vzalloc_node_noprof(unsigned long size, int node) __alloc_size(1);
+#define vzalloc_node(...) alloc_hooks(vzalloc_node_noprof(__VA_ARGS__))
+
+extern void *vmalloc_32_noprof(unsigned long size) __alloc_size(1);
+#define vmalloc_32(...) alloc_hooks(vmalloc_32_noprof(__VA_ARGS__))
+
+extern void *vmalloc_32_user_noprof(unsigned long size) __alloc_size(1);
+#define vmalloc_32_user(...) alloc_hooks(vmalloc_32_user_noprof(__VA_ARGS__))
+
+extern void *__vmalloc_noprof(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
+#define __vmalloc(...) alloc_hooks(__vmalloc_noprof(__VA_ARGS__))
+
+extern void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
unsigned long start, unsigned long end, gfp_t gfp_mask,
pgprot_t prot, unsigned long vm_flags, int node,
const void *caller) __alloc_size(1);
-void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
+#define __vmalloc_node_range(...) alloc_hooks(__vmalloc_node_range_noprof(__VA_ARGS__))
+
+void *__vmalloc_node_noprof(unsigned long size, unsigned long align, gfp_t gfp_mask,
int node, const void *caller) __alloc_size(1);
-void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
+#define __vmalloc_node(...) alloc_hooks(__vmalloc_node_noprof(__VA_ARGS__))
+
+void *vmalloc_huge_noprof(unsigned long size, gfp_t gfp_mask) __alloc_size(1);
+#define vmalloc_huge(...) alloc_hooks(vmalloc_huge_noprof(__VA_ARGS__))
+
+extern void *__vmalloc_array_noprof(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
+#define __vmalloc_array(...) alloc_hooks(__vmalloc_array_noprof(__VA_ARGS__))
+
+extern void *vmalloc_array_noprof(size_t n, size_t size) __alloc_size(1, 2);
+#define vmalloc_array(...) alloc_hooks(vmalloc_array_noprof(__VA_ARGS__))
+
+extern void *__vcalloc_noprof(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
+#define __vcalloc(...) alloc_hooks(__vcalloc_noprof(__VA_ARGS__))

-extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
-extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2);
-extern void *__vcalloc(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2);
-extern void *vcalloc(size_t n, size_t size) __alloc_size(1, 2);
+extern void *vcalloc_noprof(size_t n, size_t size) __alloc_size(1, 2);
+#define vcalloc(...) alloc_hooks(vcalloc_noprof(__VA_ARGS__))

extern void vfree(const void *addr);
extern void vfree_atomic(const void *addr);
diff --git a/kernel/kallsyms_selftest.c b/kernel/kallsyms_selftest.c
index b4cac76ea5e9..3ea9be364e32 100644
--- a/kernel/kallsyms_selftest.c
+++ b/kernel/kallsyms_selftest.c
@@ -82,7 +82,7 @@ static struct test_item test_items[] = {
ITEM_FUNC(kallsyms_test_func_static),
ITEM_FUNC(kallsyms_test_func),
ITEM_FUNC(kallsyms_test_func_weak),
- ITEM_FUNC(vmalloc),
+ ITEM_FUNC(vmalloc_noprof),
ITEM_FUNC(vfree),
#ifdef CONFIG_KALLSYMS_ALL
ITEM_DATA(kallsyms_test_var_bss_static),
diff --git a/mm/util.c b/mm/util.c
index 27ed6a5ac31a..7e6043ece040 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -629,7 +629,7 @@ void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node)
* about the resulting pointer, and cannot play
* protection games.
*/
- return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+ return __vmalloc_node_range_noprof(size, 1, VMALLOC_START, VMALLOC_END,
flags, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
node, __builtin_return_address(0));
}
@@ -688,12 +688,12 @@ void *kvrealloc_noprof(const void *p, size_t oldsize, size_t newsize, gfp_t flag
EXPORT_SYMBOL(kvrealloc_noprof);

/**
- * __vmalloc_array - allocate memory for a virtually contiguous array.
+ * __vmalloc_array_noprof - allocate memory for a virtually contiguous array.
* @n: number of elements.
* @size: element size.
* @flags: the type of memory to allocate (see kmalloc).
*/
-void *__vmalloc_array(size_t n, size_t size, gfp_t flags)
+void *__vmalloc_array_noprof(size_t n, size_t size, gfp_t flags)
{
size_t bytes;

@@ -701,18 +701,18 @@ void *__vmalloc_array(size_t n, size_t size, gfp_t flags)
return NULL;
return __vmalloc(bytes, flags);
}
-EXPORT_SYMBOL(__vmalloc_array);
+EXPORT_SYMBOL(__vmalloc_array_noprof);

/**
- * vmalloc_array - allocate memory for a virtually contiguous array.
+ * vmalloc_array_noprof - allocate memory for a virtually contiguous array.
* @n: number of elements.
* @size: element size.
*/
-void *vmalloc_array(size_t n, size_t size)
+void *vmalloc_array_noprof(size_t n, size_t size)
{
return __vmalloc_array(n, size, GFP_KERNEL);
}
-EXPORT_SYMBOL(vmalloc_array);
+EXPORT_SYMBOL(vmalloc_array_noprof);

/**
* __vcalloc - allocate and zero memory for a virtually contiguous array.
@@ -720,22 +720,22 @@ EXPORT_SYMBOL(vmalloc_array);
* @size: element size.
* @flags: the type of memory to allocate (see kmalloc).
*/
-void *__vcalloc(size_t n, size_t size, gfp_t flags)
+void *__vcalloc_noprof(size_t n, size_t size, gfp_t flags)
{
return __vmalloc_array(n, size, flags | __GFP_ZERO);
}
-EXPORT_SYMBOL(__vcalloc);
+EXPORT_SYMBOL(__vcalloc_noprof);

/**
- * vcalloc - allocate and zero memory for a virtually contiguous array.
+ * vcalloc_noprof - allocate and zero memory for a virtually contiguous array.
* @n: number of elements.
* @size: element size.
*/
-void *vcalloc(size_t n, size_t size)
+void *vcalloc_noprof(size_t n, size_t size)
{
return __vmalloc_array(n, size, GFP_KERNEL | __GFP_ZERO);
}
-EXPORT_SYMBOL(vcalloc);
+EXPORT_SYMBOL(vcalloc_noprof);

struct anon_vma *folio_anon_vma(struct folio *folio)
{
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a3fedb3ee0db..62fd33094bfd 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3025,12 +3025,12 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
* but mempolicy wants to alloc memory by interleaving.
*/
if (IS_ENABLED(CONFIG_NUMA) && nid == NUMA_NO_NODE)
- nr = alloc_pages_bulk_array_mempolicy(bulk_gfp,
+ nr = alloc_pages_bulk_array_mempolicy_noprof(bulk_gfp,
nr_pages_request,
pages + nr_allocated);

else
- nr = alloc_pages_bulk_array_node(bulk_gfp, nid,
+ nr = alloc_pages_bulk_array_node_noprof(bulk_gfp, nid,
nr_pages_request,
pages + nr_allocated);

@@ -3060,9 +3060,9 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
break;

if (nid == NUMA_NO_NODE)
- page = alloc_pages(alloc_gfp, order);
+ page = alloc_pages_noprof(alloc_gfp, order);
else
- page = alloc_pages_node(nid, alloc_gfp, order);
+ page = alloc_pages_node_noprof(nid, alloc_gfp, order);
if (unlikely(!page)) {
if (!nofail)
break;
@@ -3119,10 +3119,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,

/* Please note that the recursion is strictly bounded. */
if (array_size > PAGE_SIZE) {
- area->pages = __vmalloc_node(array_size, 1, nested_gfp, node,
+ area->pages = __vmalloc_node_noprof(array_size, 1, nested_gfp, node,
area->caller);
} else {
- area->pages = kmalloc_node(array_size, nested_gfp, node);
+ area->pages = kmalloc_node_noprof(array_size, nested_gfp, node);
}

if (!area->pages) {
@@ -3205,7 +3205,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
}

/**
- * __vmalloc_node_range - allocate virtually contiguous memory
+ * __vmalloc_node_range_noprof - allocate virtually contiguous memory
* @size: allocation size
* @align: desired alignment
* @start: vm area range start
@@ -3232,7 +3232,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
*
* Return: the address of the area or %NULL on failure
*/
-void *__vmalloc_node_range(unsigned long size, unsigned long align,
+void *__vmalloc_node_range_noprof(unsigned long size, unsigned long align,
unsigned long start, unsigned long end, gfp_t gfp_mask,
pgprot_t prot, unsigned long vm_flags, int node,
const void *caller)
@@ -3361,7 +3361,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
}

/**
- * __vmalloc_node - allocate virtually contiguous memory
+ * __vmalloc_node_noprof - allocate virtually contiguous memory
* @size: allocation size
* @align: desired alignment
* @gfp_mask: flags for the page level allocator
@@ -3379,10 +3379,10 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *__vmalloc_node(unsigned long size, unsigned long align,
+void *__vmalloc_node_noprof(unsigned long size, unsigned long align,
gfp_t gfp_mask, int node, const void *caller)
{
- return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
+ return __vmalloc_node_range_noprof(size, align, VMALLOC_START, VMALLOC_END,
gfp_mask, PAGE_KERNEL, 0, node, caller);
}
/*
@@ -3391,15 +3391,15 @@ void *__vmalloc_node(unsigned long size, unsigned long align,
* than that.
*/
#ifdef CONFIG_TEST_VMALLOC_MODULE
-EXPORT_SYMBOL_GPL(__vmalloc_node);
+EXPORT_SYMBOL_GPL(__vmalloc_node_noprof);
#endif

-void *__vmalloc(unsigned long size, gfp_t gfp_mask)
+void *__vmalloc_noprof(unsigned long size, gfp_t gfp_mask)
{
- return __vmalloc_node(size, 1, gfp_mask, NUMA_NO_NODE,
+ return __vmalloc_node_noprof(size, 1, gfp_mask, NUMA_NO_NODE,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(__vmalloc);
+EXPORT_SYMBOL(__vmalloc_noprof);

/**
* vmalloc - allocate virtually contiguous memory
@@ -3413,12 +3413,12 @@ EXPORT_SYMBOL(__vmalloc);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vmalloc(unsigned long size)
+void *vmalloc_noprof(unsigned long size)
{
- return __vmalloc_node(size, 1, GFP_KERNEL, NUMA_NO_NODE,
+ return __vmalloc_node_noprof(size, 1, GFP_KERNEL, NUMA_NO_NODE,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(vmalloc);
+EXPORT_SYMBOL(vmalloc_noprof);

/**
* vmalloc_huge - allocate virtually contiguous memory, allow huge pages
@@ -3432,16 +3432,16 @@ EXPORT_SYMBOL(vmalloc);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vmalloc_huge(unsigned long size, gfp_t gfp_mask)
+void *vmalloc_huge_noprof(unsigned long size, gfp_t gfp_mask)
{
- return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+ return __vmalloc_node_range_noprof(size, 1, VMALLOC_START, VMALLOC_END,
gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
NUMA_NO_NODE, __builtin_return_address(0));
}
-EXPORT_SYMBOL_GPL(vmalloc_huge);
+EXPORT_SYMBOL_GPL(vmalloc_huge_noprof);

/**
- * vzalloc - allocate virtually contiguous memory with zero fill
+ * vzalloc_noprof - allocate virtually contiguous memory with zero fill
* @size: allocation size
*
* Allocate enough pages to cover @size from the page level
@@ -3453,12 +3453,12 @@ EXPORT_SYMBOL_GPL(vmalloc_huge);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vzalloc(unsigned long size)
+void *vzalloc_noprof(unsigned long size)
{
- return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE,
+ return __vmalloc_node_noprof(size, 1, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(vzalloc);
+EXPORT_SYMBOL(vzalloc_noprof);

/**
* vmalloc_user - allocate zeroed virtually contiguous memory for userspace
@@ -3469,17 +3469,17 @@ EXPORT_SYMBOL(vzalloc);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vmalloc_user(unsigned long size)
+void *vmalloc_user_noprof(unsigned long size)
{
- return __vmalloc_node_range(size, SHMLBA, VMALLOC_START, VMALLOC_END,
+ return __vmalloc_node_range_noprof(size, SHMLBA, VMALLOC_START, VMALLOC_END,
GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL,
VM_USERMAP, NUMA_NO_NODE,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(vmalloc_user);
+EXPORT_SYMBOL(vmalloc_user_noprof);

/**
- * vmalloc_node - allocate memory on a specific node
+ * vmalloc_node_noprof - allocate memory on a specific node
* @size: allocation size
* @node: numa node
*
@@ -3491,15 +3491,15 @@ EXPORT_SYMBOL(vmalloc_user);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vmalloc_node(unsigned long size, int node)
+void *vmalloc_node_noprof(unsigned long size, int node)
{
- return __vmalloc_node(size, 1, GFP_KERNEL, node,
+ return __vmalloc_node_noprof(size, 1, GFP_KERNEL, node,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(vmalloc_node);
+EXPORT_SYMBOL(vmalloc_node_noprof);

/**
- * vzalloc_node - allocate memory on a specific node with zero fill
+ * vzalloc_node_noprof - allocate memory on a specific node with zero fill
* @size: allocation size
* @node: numa node
*
@@ -3509,12 +3509,12 @@ EXPORT_SYMBOL(vmalloc_node);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vzalloc_node(unsigned long size, int node)
+void *vzalloc_node_noprof(unsigned long size, int node)
{
- return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO, node,
+ return __vmalloc_node_noprof(size, 1, GFP_KERNEL | __GFP_ZERO, node,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(vzalloc_node);
+EXPORT_SYMBOL(vzalloc_node_noprof);

#if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
#define GFP_VMALLOC32 (GFP_DMA32 | GFP_KERNEL)
@@ -3529,7 +3529,7 @@ EXPORT_SYMBOL(vzalloc_node);
#endif

/**
- * vmalloc_32 - allocate virtually contiguous memory (32bit addressable)
+ * vmalloc_32_noprof - allocate virtually contiguous memory (32bit addressable)
* @size: allocation size
*
* Allocate enough 32bit PA addressable pages to cover @size from the
@@ -3537,15 +3537,15 @@ EXPORT_SYMBOL(vzalloc_node);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vmalloc_32(unsigned long size)
+void *vmalloc_32_noprof(unsigned long size)
{
- return __vmalloc_node(size, 1, GFP_VMALLOC32, NUMA_NO_NODE,
+ return __vmalloc_node_noprof(size, 1, GFP_VMALLOC32, NUMA_NO_NODE,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(vmalloc_32);
+EXPORT_SYMBOL(vmalloc_32_noprof);

/**
- * vmalloc_32_user - allocate zeroed virtually contiguous 32bit memory
+ * vmalloc_32_user_noprof - allocate zeroed virtually contiguous 32bit memory
* @size: allocation size
*
* The resulting memory area is 32bit addressable and zeroed so it can be
@@ -3553,14 +3553,14 @@ EXPORT_SYMBOL(vmalloc_32);
*
* Return: pointer to the allocated memory or %NULL on error
*/
-void *vmalloc_32_user(unsigned long size)
+void *vmalloc_32_user_noprof(unsigned long size)
{
- return __vmalloc_node_range(size, SHMLBA, VMALLOC_START, VMALLOC_END,
+ return __vmalloc_node_range_noprof(size, SHMLBA, VMALLOC_START, VMALLOC_END,
GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
VM_USERMAP, NUMA_NO_NODE,
__builtin_return_address(0));
}
-EXPORT_SYMBOL(vmalloc_32_user);
+EXPORT_SYMBOL(vmalloc_32_user_noprof);

/*
* Atomically zero bytes in the iterator.
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:51:30

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 27/39] xfs: Memory allocation profiling fixups

From: Kent Overstreet <[email protected]>

This adds an alloc_hooks() wrapper around kmem_alloc(), so that we can
have allocations accounted to the proper callsite.
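
For example (a hypothetical caller, for illustration only):

  struct xfs_da_args *args = kmem_alloc(sizeof(*args), KM_NOFS);

is now accounted to this line in the XFS code instead of to the
kmalloc() call inside fs/xfs/kmem.c.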

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
fs/xfs/kmem.c | 4 ++--
fs/xfs/kmem.h | 10 ++++------
2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index c557a030acfe..9aa57a4e2478 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -8,7 +8,7 @@
#include "xfs_trace.h"

void *
-kmem_alloc(size_t size, xfs_km_flags_t flags)
+kmem_alloc_noprof(size_t size, xfs_km_flags_t flags)
{
int retries = 0;
gfp_t lflags = kmem_flags_convert(flags);
@@ -17,7 +17,7 @@ kmem_alloc(size_t size, xfs_km_flags_t flags)
trace_kmem_alloc(size, flags, _RET_IP_);

do {
- ptr = kmalloc(size, lflags);
+ ptr = kmalloc_noprof(size, lflags);
if (ptr || (flags & KM_MAYFAIL))
return ptr;
if (!(++retries % 100))
diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index b987dc2c6851..c4cf1dc2a7af 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -6,6 +6,7 @@
#ifndef __XFS_SUPPORT_KMEM_H__
#define __XFS_SUPPORT_KMEM_H__

+#include <linux/alloc_tag.h>
#include <linux/slab.h>
#include <linux/sched.h>
#include <linux/mm.h>
@@ -56,18 +57,15 @@ kmem_flags_convert(xfs_km_flags_t flags)
return lflags;
}

-extern void *kmem_alloc(size_t, xfs_km_flags_t);
static inline void kmem_free(const void *ptr)
{
kvfree(ptr);
}

+extern void *kmem_alloc_noprof(size_t, xfs_km_flags_t);
+#define kmem_alloc(...) alloc_hooks(kmem_alloc_noprof(__VA_ARGS__))

-static inline void *
-kmem_zalloc(size_t size, xfs_km_flags_t flags)
-{
- return kmem_alloc(size, flags | KM_ZERO);
-}
+#define kmem_zalloc(_size, _flags) kmem_alloc((_size), (_flags) | KM_ZERO)

/*
* Zone interfaces
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:51:31

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 36/39] codetag: debug: skip objext checking when it's for objext itself

objext objects are created with the __GFP_NO_OBJ_EXT flag and therefore
have no corresponding objext themselves (otherwise we would get infinite
recursion). When freeing these objects, their codetag will be empty, and
when CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled this leads to false
warnings. Introduce a special codetag value, CODETAG_EMPTY, to mark
allocations which intentionally lack a codetag and thereby avoid these
warnings. Set objext codetags to CODETAG_EMPTY before freeing to indicate
that the codetag is expected to be empty.
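
The resulting free-path check, as a sketch (condensed from the
__alloc_tag_sub() hunk below):

  if (is_codetag_empty(ref)) {
          /* allocation was deliberately left untagged */
          ref->ct = NULL;
          return;  /* no warning, no counter update */
  }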

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/alloc_tag.h | 26 ++++++++++++++++++++++++++
mm/slab.h | 33 +++++++++++++++++++++++++++++++++
mm/slab_common.c | 1 +
3 files changed, 60 insertions(+)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 0a5973c4ad77..1f3207097b03 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -77,6 +77,27 @@ static inline struct alloc_tag_counters alloc_tag_read(struct alloc_tag *tag)
return v;
}

+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+
+#define CODETAG_EMPTY (void *)1
+
+static inline bool is_codetag_empty(union codetag_ref *ref)
+{
+ return ref->ct == CODETAG_EMPTY;
+}
+
+static inline void set_codetag_empty(union codetag_ref *ref)
+{
+ if (ref)
+ ref->ct = CODETAG_EMPTY;
+}
+
+#else /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+
+static inline bool is_codetag_empty(union codetag_ref *ref) { return false; }
+
+#endif /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+
static inline void __alloc_tag_sub(union codetag_ref *ref, size_t bytes)
{
struct alloc_tag *tag;
@@ -87,6 +108,11 @@ static inline void __alloc_tag_sub(union codetag_ref *ref, size_t bytes)
if (!ref || !ref->ct)
return;

+ if (is_codetag_empty(ref)) {
+ ref->ct = NULL;
+ return;
+ }
+
tag = ct_to_alloc_tag(ref->ct);

this_cpu_sub(tag->counters->bytes, bytes);
diff --git a/mm/slab.h b/mm/slab.h
index 4859ce1f8808..45216bad34b8 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -455,6 +455,31 @@ static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
gfp_t gfp, bool new_slab);

+
+#ifdef CONFIG_MEM_ALLOC_PROFILING_DEBUG
+
+static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
+{
+ struct slabobj_ext *slab_exts;
+ struct slab *obj_exts_slab;
+
+ obj_exts_slab = virt_to_slab(obj_exts);
+ slab_exts = slab_obj_exts(obj_exts_slab);
+ if (slab_exts) {
+ unsigned int offs = obj_to_index(obj_exts_slab->slab_cache,
+ obj_exts_slab, obj_exts);
+ /* codetag should be NULL */
+ WARN_ON(slab_exts[offs].ref.ct);
+ set_codetag_empty(&slab_exts[offs].ref);
+ }
+}
+
+#else /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+
+static inline void mark_objexts_empty(struct slabobj_ext *obj_exts) {}
+
+#endif /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */
+
static inline bool need_slab_obj_ext(void)
{
#ifdef CONFIG_MEM_ALLOC_PROFILING
@@ -476,6 +501,14 @@ static inline void free_slab_obj_exts(struct slab *slab)
if (!obj_exts)
return;

+ /*
+ * obj_exts was created with __GFP_NO_OBJ_EXT flag, therefore its
+ * corresponding extension will be NULL. alloc_tag_sub() will throw a
+ * warning if slab has extensions but the extension of an object is
+ * NULL, therefore replace NULL with CODETAG_EMPTY to indicate that
+ * the extension for obj_exts is expected to be NULL.
+ */
+ mark_objexts_empty(obj_exts);
kfree(obj_exts);
slab->obj_exts = 0;
}
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 8ef5e47ff6a7..db2cd7afc353 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -246,6 +246,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
* assign slabobj_exts in parallel. In this case the existing
* objcg vector should be reused.
*/
+ mark_objexts_empty(vec);
kfree(vec);
return 0;
}
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:51:34

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 34/39] rhashtable: Plumb through alloc tag

From: Kent Overstreet <[email protected]>

This gives better memory allocation profiling results; rhashtable
allocations will be accounted to the code that initialized the
rhashtable.
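
A sketch of the mechanism (condensed from the hunks below):
rhashtable_init() captures current->alloc_tag, and table (re)allocation
temporarily re-installs it, so resizes triggered later (e.g. from a
worker thread) are still charged to the initializing call site:

  /* at init time */
  ht->alloc_tag = current->alloc_tag;

  /* at table allocation time */
  old = rhashtable_alloc_tag_save(ht);
  tbl = kvmalloc_node_noprof(struct_size(tbl, buckets, nbuckets),
                             gfp | __GFP_ZERO, NUMA_NO_NODE);
  rhashtable_alloc_tag_restore(ht, old);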

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/rhashtable-types.h | 11 +++++--
lib/rhashtable.c | 52 +++++++++++++++++++++++++-------
2 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/include/linux/rhashtable-types.h b/include/linux/rhashtable-types.h
index 57467cbf4c5b..aac2984c2ef0 100644
--- a/include/linux/rhashtable-types.h
+++ b/include/linux/rhashtable-types.h
@@ -9,6 +9,7 @@
#ifndef _LINUX_RHASHTABLE_TYPES_H
#define _LINUX_RHASHTABLE_TYPES_H

+#include <linux/alloc_tag.h>
#include <linux/atomic.h>
#include <linux/compiler.h>
#include <linux/mutex.h>
@@ -88,6 +89,9 @@ struct rhashtable {
struct mutex mutex;
spinlock_t lock;
atomic_t nelems;
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ struct alloc_tag *alloc_tag;
+#endif
};

/**
@@ -127,9 +131,12 @@ struct rhashtable_iter {
bool end_of_table;
};

-int rhashtable_init(struct rhashtable *ht,
+int rhashtable_init_noprof(struct rhashtable *ht,
const struct rhashtable_params *params);
-int rhltable_init(struct rhltable *hlt,
+#define rhashtable_init(...) alloc_hooks(rhashtable_init_noprof(__VA_ARGS__))
+
+int rhltable_init_noprof(struct rhltable *hlt,
const struct rhashtable_params *params);
+#define rhltable_init(...) alloc_hooks(rhltable_init_noprof(__VA_ARGS__))

#endif /* _LINUX_RHASHTABLE_TYPES_H */
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 6ae2ba8e06a2..b62116f332b8 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -63,6 +63,27 @@ EXPORT_SYMBOL_GPL(lockdep_rht_bucket_is_held);
#define ASSERT_RHT_MUTEX(HT)
#endif

+#ifdef CONFIG_MEM_ALLOC_PROFILING
+static inline void rhashtable_alloc_tag_init(struct rhashtable *ht)
+{
+ ht->alloc_tag = current->alloc_tag;
+}
+
+static inline struct alloc_tag *rhashtable_alloc_tag_save(struct rhashtable *ht)
+{
+ return alloc_tag_save(ht->alloc_tag);
+}
+
+static inline void rhashtable_alloc_tag_restore(struct rhashtable *ht, struct alloc_tag *old)
+{
+ alloc_tag_restore(ht->alloc_tag, old);
+}
+#else
+#define rhashtable_alloc_tag_init(ht)
+static inline struct alloc_tag *rhashtable_alloc_tag_save(struct rhashtable *ht) { return NULL; }
+#define rhashtable_alloc_tag_restore(ht, old)
+#endif
+
static inline union nested_table *nested_table_top(
const struct bucket_table *tbl)
{
@@ -130,7 +151,7 @@ static union nested_table *nested_table_alloc(struct rhashtable *ht,
if (ntbl)
return ntbl;

- ntbl = kzalloc(PAGE_SIZE, GFP_ATOMIC);
+ ntbl = kmalloc_noprof(PAGE_SIZE, GFP_ATOMIC|__GFP_ZERO);

if (ntbl && leaf) {
for (i = 0; i < PAGE_SIZE / sizeof(ntbl[0]); i++)
@@ -157,7 +178,7 @@ static struct bucket_table *nested_bucket_table_alloc(struct rhashtable *ht,

size = sizeof(*tbl) + sizeof(tbl->buckets[0]);

- tbl = kzalloc(size, gfp);
+ tbl = kmalloc_noprof(size, gfp|__GFP_ZERO);
if (!tbl)
return NULL;

@@ -180,8 +201,10 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
size_t size;
int i;
static struct lock_class_key __key;
+ struct alloc_tag * __maybe_unused old = rhashtable_alloc_tag_save(ht);

- tbl = kvzalloc(struct_size(tbl, buckets, nbuckets), gfp);
+ tbl = kvmalloc_node_noprof(struct_size(tbl, buckets, nbuckets),
+ gfp|__GFP_ZERO, NUMA_NO_NODE);

size = nbuckets;

@@ -190,6 +213,8 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
nbuckets = 0;
}

+ rhashtable_alloc_tag_restore(ht, old);
+
if (tbl == NULL)
return NULL;

@@ -975,7 +1000,7 @@ static u32 rhashtable_jhash2(const void *key, u32 length, u32 seed)
}

/**
- * rhashtable_init - initialize a new hash table
+ * rhashtable_init_noprof - initialize a new hash table
* @ht: hash table to be initialized
* @params: configuration parameters
*
@@ -1016,7 +1041,7 @@ static u32 rhashtable_jhash2(const void *key, u32 length, u32 seed)
* .obj_hashfn = my_hash_fn,
* };
*/
-int rhashtable_init(struct rhashtable *ht,
+int rhashtable_init_noprof(struct rhashtable *ht,
const struct rhashtable_params *params)
{
struct bucket_table *tbl;
@@ -1031,6 +1056,8 @@ int rhashtable_init(struct rhashtable *ht,
spin_lock_init(&ht->lock);
memcpy(&ht->p, params, sizeof(*params));

+ rhashtable_alloc_tag_init(ht);
+
if (params->min_size)
ht->p.min_size = roundup_pow_of_two(params->min_size);

@@ -1076,26 +1103,26 @@ int rhashtable_init(struct rhashtable *ht,

return 0;
}
-EXPORT_SYMBOL_GPL(rhashtable_init);
+EXPORT_SYMBOL_GPL(rhashtable_init_noprof);

/**
- * rhltable_init - initialize a new hash list table
+ * rhltable_init_noprof - initialize a new hash list table
* @hlt: hash list table to be initialized
* @params: configuration parameters
*
* Initializes a new hash list table.
*
- * See documentation for rhashtable_init.
+ * See documentation for rhashtable_init_noprof.
*/
-int rhltable_init(struct rhltable *hlt, const struct rhashtable_params *params)
+int rhltable_init_noprof(struct rhltable *hlt, const struct rhashtable_params *params)
{
int err;

- err = rhashtable_init(&hlt->ht, params);
+ err = rhashtable_init_noprof(&hlt->ht, params);
hlt->ht.rhlist = true;
return err;
}
-EXPORT_SYMBOL_GPL(rhltable_init);
+EXPORT_SYMBOL_GPL(rhltable_init_noprof);

static void rhashtable_free_one(struct rhashtable *ht, struct rhash_head *obj,
void (*free_fn)(void *ptr, void *arg),
@@ -1222,6 +1249,7 @@ struct rhash_lock_head __rcu **rht_bucket_nested_insert(
unsigned int index = hash & ((1 << tbl->nest) - 1);
unsigned int size = tbl->size >> tbl->nest;
union nested_table *ntbl;
+ struct alloc_tag * __maybe_unused old = rhashtable_alloc_tag_save(ht);

ntbl = nested_table_top(tbl);
hash >>= tbl->nest;
@@ -1236,6 +1264,8 @@ struct rhash_lock_head __rcu **rht_bucket_nested_insert(
size <= (1 << shift));
}

+ rhashtable_alloc_tag_restore(ht, old);
+
if (!ntbl)
return NULL;

--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:51:41

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 29/39] mm: percpu: Introduce pcpuobj_ext

From: Kent Overstreet <[email protected]>

Upcoming alloc tagging patches require a place to stash per-allocation
metadata.

We already do this when memcg is enabled, so this patch generalizes the
obj_cgroup * vector in struct pcpu_chunk by creating a pcpuobj_ext type,
type, which we will be adding to in an upcoming patch - similarly to the
previous slabobj_ext patch.
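
In sketch form (condensed from the hunks below), the per-chunk metadata
vector changes from

  struct obj_cgroup **obj_cgroups;  /* one pointer per allocation unit */

to

  struct pcpuobj_ext {
  #ifdef CONFIG_MEMCG_KMEM
          struct obj_cgroup *cgroup;
  #endif
  };
  struct pcpuobj_ext *obj_exts;     /* room for more per-unit metadata */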

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: [email protected]
---
mm/percpu-internal.h | 19 +++++++++++++++++--
mm/percpu.c | 30 +++++++++++++++---------------
2 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index cdd0aa597a81..e62d582f4bf3 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -32,6 +32,16 @@ struct pcpu_block_md {
int nr_bits; /* total bits responsible for */
};

+struct pcpuobj_ext {
+#ifdef CONFIG_MEMCG_KMEM
+ struct obj_cgroup *cgroup;
+#endif
+};
+
+#ifdef CONFIG_MEMCG_KMEM
+#define NEED_PCPUOBJ_EXT
+#endif
+
struct pcpu_chunk {
#ifdef CONFIG_PERCPU_STATS
int nr_alloc; /* # of allocations */
@@ -64,8 +74,8 @@ struct pcpu_chunk {
int end_offset; /* additional area required to
have the region end page
aligned */
-#ifdef CONFIG_MEMCG_KMEM
- struct obj_cgroup **obj_cgroups; /* vector of object cgroups */
+#ifdef NEED_PCPUOBJ_EXT
+ struct pcpuobj_ext *obj_exts; /* vector of object cgroups */
#endif

int nr_pages; /* # of pages served by this chunk */
@@ -74,6 +84,11 @@ struct pcpu_chunk {
unsigned long populated[]; /* populated bitmap */
};

+static inline bool need_pcpuobj_ext(void)
+{
+ return !mem_cgroup_kmem_disabled();
+}
+
extern spinlock_t pcpu_lock;

extern struct list_head *pcpu_chunk_lists;
diff --git a/mm/percpu.c b/mm/percpu.c
index a7665de8485f..5a6202acffa3 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1392,9 +1392,9 @@ static struct pcpu_chunk * __init pcpu_alloc_first_chunk(unsigned long tmp_addr,
panic("%s: Failed to allocate %zu bytes\n", __func__,
alloc_size);

-#ifdef CONFIG_MEMCG_KMEM
+#ifdef NEED_PCPUOBJ_EXT
/* first chunk is free to use */
- chunk->obj_cgroups = NULL;
+ chunk->obj_exts = NULL;
#endif
pcpu_init_md_blocks(chunk);

@@ -1463,12 +1463,12 @@ static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp)
if (!chunk->md_blocks)
goto md_blocks_fail;

-#ifdef CONFIG_MEMCG_KMEM
- if (!mem_cgroup_kmem_disabled()) {
- chunk->obj_cgroups =
+#ifdef NEED_PCPUOBJ_EXT
+ if (need_pcpuobj_ext()) {
+ chunk->obj_exts =
pcpu_mem_zalloc(pcpu_chunk_map_bits(chunk) *
- sizeof(struct obj_cgroup *), gfp);
- if (!chunk->obj_cgroups)
+ sizeof(struct pcpuobj_ext), gfp);
+ if (!chunk->obj_exts)
goto objcg_fail;
}
#endif
@@ -1480,7 +1480,7 @@ static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp)

return chunk;

-#ifdef CONFIG_MEMCG_KMEM
+#ifdef NEED_PCPUOBJ_EXT
objcg_fail:
pcpu_mem_free(chunk->md_blocks);
#endif
@@ -1498,8 +1498,8 @@ static void pcpu_free_chunk(struct pcpu_chunk *chunk)
{
if (!chunk)
return;
-#ifdef CONFIG_MEMCG_KMEM
- pcpu_mem_free(chunk->obj_cgroups);
+#ifdef NEED_PCPUOBJ_EXT
+ pcpu_mem_free(chunk->obj_exts);
#endif
pcpu_mem_free(chunk->md_blocks);
pcpu_mem_free(chunk->bound_map);
@@ -1648,8 +1648,8 @@ static void pcpu_memcg_post_alloc_hook(struct obj_cgroup *objcg,
if (!objcg)
return;

- if (likely(chunk && chunk->obj_cgroups)) {
- chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT] = objcg;
+ if (likely(chunk && chunk->obj_exts)) {
+ chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].cgroup = objcg;

rcu_read_lock();
mod_memcg_state(obj_cgroup_memcg(objcg), MEMCG_PERCPU_B,
@@ -1665,13 +1665,13 @@ static void pcpu_memcg_free_hook(struct pcpu_chunk *chunk, int off, size_t size)
{
struct obj_cgroup *objcg;

- if (unlikely(!chunk->obj_cgroups))
+ if (unlikely(!chunk->obj_exts))
return;

- objcg = chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT];
+ objcg = chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].cgroup;
if (!objcg)
return;
- chunk->obj_cgroups[off >> PCPU_MIN_ALLOC_SHIFT] = NULL;
+ chunk->obj_exts[off >> PCPU_MIN_ALLOC_SHIFT].cgroup = NULL;

obj_cgroup_uncharge(objcg, pcpu_obj_full_size(size));

--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:51:46

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 31/39] mm: percpu: enable per-cpu allocation tagging

Redefine __alloc_percpu, __alloc_percpu_gfp and __alloc_reserved_percpu
to record allocations and deallocations done by these functions.
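
For example (hypothetical type, for illustration only):

  struct foo __percpu *p = alloc_percpu(struct foo);

now expands through __alloc_percpu() to

  alloc_hooks_pcpu(pcpu_alloc_noprof(sizeof(struct foo),
                                     __alignof__(struct foo),
                                     false, GFP_KERNEL));

so every per-cpu allocation is tagged with the requesting call site.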

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/alloc_tag.h | 15 +++++++++
include/linux/percpu.h | 23 +++++++++-----
mm/percpu.c | 64 +++++----------------------------------
3 files changed, 38 insertions(+), 64 deletions(-)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 6fa8a94d8bc1..3fe51e67e231 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -140,4 +140,19 @@ static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
_res; \
})

+/*
+ * workaround for a sparse bug: it complains about res_type_to_err() when
+ * typeof(_do_alloc) is a __percpu pointer, but gcc won't let us add a separate
+ * __percpu case to res_type_to_err():
+ */
+#define alloc_hooks_pcpu(_do_alloc) \
+({ \
+ typeof(_do_alloc) _res; \
+ DEFINE_ALLOC_TAG(_alloc_tag, _old); \
+ \
+ _res = _do_alloc; \
+ alloc_tag_restore(&_alloc_tag, _old); \
+ _res; \
+})
+
#endif /* _LINUX_ALLOC_TAG_H */
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 68fac2e7cbe6..338c1ef9c93d 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -2,6 +2,7 @@
#ifndef __LINUX_PERCPU_H
#define __LINUX_PERCPU_H

+#include <linux/alloc_tag.h>
#include <linux/mmdebug.h>
#include <linux/preempt.h>
#include <linux/smp.h>
@@ -9,6 +10,7 @@
#include <linux/pfn.h>
#include <linux/init.h>
#include <linux/cleanup.h>
+#include <linux/sched.h>

#include <asm/percpu.h>

@@ -121,7 +123,6 @@ extern int __init pcpu_page_first_chunk(size_t reserved_size,
pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
#endif

-extern void __percpu *__alloc_reserved_percpu(size_t size, size_t align) __alloc_size(1);
extern bool __is_kernel_percpu_address(unsigned long addr, unsigned long *can_addr);
extern bool is_kernel_percpu_address(unsigned long addr);

@@ -129,13 +130,15 @@ extern bool is_kernel_percpu_address(unsigned long addr);
extern void __init setup_per_cpu_areas(void);
#endif

-extern void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp) __alloc_size(1);
-extern void __percpu *__alloc_percpu(size_t size, size_t align) __alloc_size(1);
-extern void free_percpu(void __percpu *__pdata);
+extern void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
+ gfp_t gfp) __alloc_size(1);

-DEFINE_FREE(free_percpu, void __percpu *, free_percpu(_T))
-
-extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
+#define __alloc_percpu_gfp(_size, _align, _gfp) \
+ alloc_hooks_pcpu(pcpu_alloc_noprof(_size, _align, false, _gfp))
+#define __alloc_percpu(_size, _align) \
+ alloc_hooks_pcpu(pcpu_alloc_noprof(_size, _align, false, GFP_KERNEL))
+#define __alloc_reserved_percpu(_size, _align) \
+ alloc_hooks_pcpu(pcpu_alloc_noprof(_size, _align, true, GFP_KERNEL))

#define alloc_percpu_gfp(type, gfp) \
(typeof(type) __percpu *)__alloc_percpu_gfp(sizeof(type), \
@@ -144,6 +147,12 @@ extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
(typeof(type) __percpu *)__alloc_percpu(sizeof(type), \
__alignof__(type))

+extern void free_percpu(void __percpu *__pdata);
+
+DEFINE_FREE(free_percpu, void __percpu *, free_percpu(_T))
+
+extern phys_addr_t per_cpu_ptr_to_phys(void *addr);
+
extern unsigned long pcpu_nr_pages(void);

#endif /* __LINUX_PERCPU_H */
diff --git a/mm/percpu.c b/mm/percpu.c
index 002ee5d38fd5..328a5b3c943b 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1728,7 +1728,7 @@ static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t s
#endif

/**
- * pcpu_alloc - the percpu allocator
+ * pcpu_alloc_noprof - the percpu allocator
* @size: size of area to allocate in bytes
* @align: alignment of area (max PAGE_SIZE)
* @reserved: allocate from the reserved chunk if available
@@ -1742,7 +1742,7 @@ static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t s
* RETURNS:
* Percpu pointer to the allocated area on success, NULL on failure.
*/
-static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
+void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
gfp_t gfp)
{
gfp_t pcpu_gfp;
@@ -1909,6 +1909,8 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,

pcpu_memcg_post_alloc_hook(objcg, chunk, off, size);

+ pcpu_alloc_tag_alloc_hook(chunk, off, size);
+
return ptr;

fail_unlock:
@@ -1937,61 +1939,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,

return NULL;
}
-
-/**
- * __alloc_percpu_gfp - allocate dynamic percpu area
- * @size: size of area to allocate in bytes
- * @align: alignment of area (max PAGE_SIZE)
- * @gfp: allocation flags
- *
- * Allocate zero-filled percpu area of @size bytes aligned at @align. If
- * @gfp doesn't contain %GFP_KERNEL, the allocation doesn't block and can
- * be called from any context but is a lot more likely to fail. If @gfp
- * has __GFP_NOWARN then no warning will be triggered on invalid or failed
- * allocation requests.
- *
- * RETURNS:
- * Percpu pointer to the allocated area on success, NULL on failure.
- */
-void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp)
-{
- return pcpu_alloc(size, align, false, gfp);
-}
-EXPORT_SYMBOL_GPL(__alloc_percpu_gfp);
-
-/**
- * __alloc_percpu - allocate dynamic percpu area
- * @size: size of area to allocate in bytes
- * @align: alignment of area (max PAGE_SIZE)
- *
- * Equivalent to __alloc_percpu_gfp(size, align, %GFP_KERNEL).
- */
-void __percpu *__alloc_percpu(size_t size, size_t align)
-{
- return pcpu_alloc(size, align, false, GFP_KERNEL);
-}
-EXPORT_SYMBOL_GPL(__alloc_percpu);
-
-/**
- * __alloc_reserved_percpu - allocate reserved percpu area
- * @size: size of area to allocate in bytes
- * @align: alignment of area (max PAGE_SIZE)
- *
- * Allocate zero-filled percpu area of @size bytes aligned at @align
- * from reserved percpu area if arch has set it up; otherwise,
- * allocation is served from the same dynamic area. Might sleep.
- * Might trigger writeouts.
- *
- * CONTEXT:
- * Does GFP_KERNEL allocation.
- *
- * RETURNS:
- * Percpu pointer to the allocated area on success, NULL on failure.
- */
-void __percpu *__alloc_reserved_percpu(size_t size, size_t align)
-{
- return pcpu_alloc(size, align, true, GFP_KERNEL);
-}
+EXPORT_SYMBOL_GPL(pcpu_alloc_noprof);

/**
* pcpu_balance_free - manage the amount of free chunks
@@ -2301,6 +2249,8 @@ void free_percpu(void __percpu *ptr)

size = pcpu_free_area(chunk, off);

+ pcpu_alloc_tag_free_hook(chunk, off, size);
+
pcpu_memcg_free_hook(chunk, off, size);

/*
--
2.42.0.758.gaed0368e0e-goog
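
For context on the DEFINE_FREE(free_percpu, ...) hook added above: it wires
free_percpu() into the scope-based cleanup machinery from linux/cleanup.h,
so percpu pointers can be released automatically when they go out of scope.
A minimal usage sketch (the function below is hypothetical; __free(),
alloc_percpu() and this_cpu_inc() are existing kernel interfaces):

/*
 * Sketch: a pointer annotated with __free(free_percpu) is freed
 * automatically on every exit path, including the early error return.
 */
static int example(void)
{
        int __percpu *counter __free(free_percpu) = alloc_percpu(int);

        if (!counter)
                return -ENOMEM;

        this_cpu_inc(*counter);

        return 0;       /* counter is freed here as well */
}

A caller that wants to keep the allocation on success would hand it off
with no_free_ptr() instead of returning with the annotation still armed.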

2023-10-24 13:52:11

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 35/39] lib: add memory allocations report in show_mem()

Include allocations in show_mem reports.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/alloc_tag.h | 2 ++
lib/alloc_tag.c | 37 +++++++++++++++++++++++++++++++++++++
mm/show_mem.c | 15 +++++++++++++++
3 files changed, 54 insertions(+)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 3fe51e67e231..0a5973c4ad77 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -30,6 +30,8 @@ struct alloc_tag {

#ifdef CONFIG_MEM_ALLOC_PROFILING

+void alloc_tags_show_mem_report(struct seq_buf *s);
+
static inline struct alloc_tag *ct_to_alloc_tag(struct codetag *ct)
{
return container_of(ct, struct alloc_tag, ct);
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 2d5226d9262d..2f7a2e3ddf55 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -96,6 +96,43 @@ static const struct seq_operations allocinfo_seq_op = {
.show = allocinfo_show,
};

+void alloc_tags_show_mem_report(struct seq_buf *s)
+{
+ struct codetag_iterator iter;
+ struct codetag *ct;
+ struct {
+ struct codetag *tag;
+ size_t bytes;
+ } tags[10], n;
+ unsigned int i, nr = 0;
+
+ codetag_lock_module_list(alloc_tag_cttype, true);
+ iter = codetag_get_ct_iter(alloc_tag_cttype);
+ while ((ct = codetag_next_ct(&iter))) {
+ struct alloc_tag_counters counter = alloc_tag_read(ct_to_alloc_tag(ct));
+ n.tag = ct;
+ n.bytes = counter.bytes;
+
+ for (i = 0; i < nr; i++)
+ if (n.bytes > tags[i].bytes)
+ break;
+
+ if (i < ARRAY_SIZE(tags)) {
+ nr -= nr == ARRAY_SIZE(tags);
+ memmove(&tags[i + 1],
+ &tags[i],
+ sizeof(tags[0]) * (nr - i));
+ nr++;
+ tags[i] = n;
+ }
+ }
+
+ for (i = 0; i < nr; i++)
+ alloc_tag_to_text(s, tags[i].tag);
+
+ codetag_lock_module_list(alloc_tag_cttype, false);
+}
+
static void __init procfs_init(void)
{
proc_create_seq("allocinfo", 0444, NULL, &allocinfo_seq_op);
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 4b888b18bdde..660e9a78a34d 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -12,6 +12,7 @@
#include <linux/hugetlb.h>
#include <linux/mm.h>
#include <linux/mmzone.h>
+#include <linux/seq_buf.h>
#include <linux/swap.h>
#include <linux/vmstat.h>

@@ -426,4 +427,18 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
#ifdef CONFIG_MEMORY_FAILURE
printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
#endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+ {
+ struct seq_buf s;
+ char *buf = kmalloc(4096, GFP_ATOMIC);
+
+ if (buf) {
+ printk("Memory allocations:\n");
+ seq_buf_init(&s, buf, 4096);
+ alloc_tags_show_mem_report(&s);
+ printk("%s", buf);
+ kfree(buf);
+ }
+ }
+#endif
}
--
2.42.0.758.gaed0368e0e-goog
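
The alloc_tags_show_mem_report() function above keeps the ten largest tags
by walking every code tag once and insertion-sorting each candidate into a
small fixed-size array. A standalone sketch of that selection technique
(plain userspace C with illustrative names, not the kernel code):

#include <stdio.h>
#include <string.h>

#define TOP_N 10

struct item { const char *name; size_t bytes; };

static void report_top(const struct item *all, unsigned int count)
{
        struct item top[TOP_N];
        unsigned int nr = 0, i, j;

        for (j = 0; j < count; j++) {
                /* Find the first kept entry smaller than this candidate. */
                for (i = 0; i < nr; i++)
                        if (all[j].bytes > top[i].bytes)
                                break;

                if (i < TOP_N) {
                        /* If the array is full, drop the smallest entry. */
                        nr -= (nr == TOP_N);
                        memmove(&top[i + 1], &top[i],
                                sizeof(top[0]) * (nr - i));
                        top[i] = all[j];
                        nr++;
                }
        }

        for (i = 0; i < nr; i++)        /* largest first */
                printf("%10zu %s\n", top[i].bytes, top[i].name);
}

This is O(count * TOP_N) with no extra allocation, which matters here
because the report can run from a low-memory context.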

2023-10-24 13:52:46

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 38/39] codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations

If slabobj_ext vector allocation for a slab object fails and later
succeeds for another object in the same slab, the slabobj_ext for the
original object will be NULL and will trigger a warning when
CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled.
Mark failed slabobj_ext vector allocations using a new objext_flags flag
stored in the lower bits of slab->obj_exts. When a later allocation
succeeds, mark all tag references in the same slabobj_ext vector as empty
to avoid the warnings implemented by the CONFIG_MEM_ALLOC_PROFILING_DEBUG
checks.

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/memcontrol.h | 4 +++-
mm/slab.h | 25 +++++++++++++++++++++++++
mm/slab_common.c | 22 +++++++++++++++-------
3 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 853a24b5f713..6b680ca424e3 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -363,8 +363,10 @@ enum page_memcg_data_flags {
#endif /* CONFIG_MEMCG */

enum objext_flags {
+ /* slabobj_ext vector failed to allocate */
+ OBJEXTS_ALLOC_FAIL = __FIRST_OBJEXT_FLAG,
/* the next bit after the last actual flag */
- __NR_OBJEXTS_FLAGS = __FIRST_OBJEXT_FLAG,
+ __NR_OBJEXTS_FLAGS = (__FIRST_OBJEXT_FLAG << 1),
};

#define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
diff --git a/mm/slab.h b/mm/slab.h
index 45216bad34b8..1736268892e6 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -474,9 +474,34 @@ static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
}
}

+static inline void mark_failed_objexts_alloc(struct slab *slab)
+{
+ slab->obj_exts = OBJEXTS_ALLOC_FAIL;
+}
+
+static inline void handle_failed_objexts_alloc(unsigned long obj_exts,
+ struct slabobj_ext *vec, unsigned int objects)
+{
+ /*
+ * If vector previously failed to allocate then we have live
+ * objects with no tag reference. Mark all references in this
+ * vector as empty to avoid warnings later on.
+ */
+ if (obj_exts & OBJEXTS_ALLOC_FAIL) {
+ unsigned int i;
+
+ for (i = 0; i < objects; i++)
+ set_codetag_empty(&vec[i].ref);
+ }
+}
+
+
#else /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */

static inline void mark_objexts_empty(struct slabobj_ext *obj_exts) {}
+static inline void mark_failed_objexts_alloc(struct slab *slab) {}
+static inline void handle_failed_objexts_alloc(unsigned long obj_exts,
+ struct slabobj_ext *vec, unsigned int objects) {}

#endif /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */

diff --git a/mm/slab_common.c b/mm/slab_common.c
index db2cd7afc353..cea73314f919 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -218,29 +218,37 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
gfp_t gfp, bool new_slab)
{
unsigned int objects = objs_per_slab(s, slab);
- unsigned long obj_exts;
- void *vec;
+ unsigned long new_exts;
+ unsigned long old_exts;
+ struct slabobj_ext *vec;

gfp &= ~OBJCGS_CLEAR_MASK;
/* Prevent recursive extension vector allocation */
gfp |= __GFP_NO_OBJ_EXT;
vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
slab_nid(slab));
- if (!vec)
+ if (!vec) {
+ /* Mark vectors which failed to allocate */
+ if (new_slab)
+ mark_failed_objexts_alloc(slab);
+
return -ENOMEM;
+ }

- obj_exts = (unsigned long)vec;
+ new_exts = (unsigned long)vec;
#ifdef CONFIG_MEMCG
- obj_exts |= MEMCG_DATA_OBJEXTS;
+ new_exts |= MEMCG_DATA_OBJEXTS;
#endif
+ old_exts = slab->obj_exts;
+ handle_failed_objexts_alloc(old_exts, vec, objects);
if (new_slab) {
/*
* If the slab is brand new and nobody can yet access its
* obj_exts, no synchronization is required and obj_exts can
* be simply assigned.
*/
- slab->obj_exts = obj_exts;
- } else if (cmpxchg(&slab->obj_exts, 0, obj_exts)) {
+ slab->obj_exts = new_exts;
+ } else if (cmpxchg(&slab->obj_exts, old_exts, new_exts) != old_exts) {
/*
* If the slab is already in use, somebody can allocate and
* assign slabobj_exts in parallel. In this case the existing
--
2.42.0.758.gaed0368e0e-goog
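
The failure marker above rides in the low bits of the slab->obj_exts word,
which are otherwise always zero because the vector pointer is aligned. A
minimal standalone sketch of that pointer-tagging trick (userspace C; the
names and flag layout are illustrative, not the kernel's exact definitions):

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT_ALLOC_FAIL  0x1UL   /* "vector allocation failed" marker */
#define EXT_FLAGS_MASK  0x3UL   /* low bits reserved for flags */

static void *exts_vec(uintptr_t word)
{
        return (void *)(word & ~EXT_FLAGS_MASK);        /* strip flag bits */
}

static int exts_alloc_failed(uintptr_t word)
{
        return word & EXT_ALLOC_FAIL;
}

int main(void)
{
        /* An aligned object: the low bits of its address are zero. */
        static long vec[8] __attribute__((aligned(8)));
        uintptr_t word = (uintptr_t)vec;

        assert(exts_vec(word) == (void *)vec && !exts_alloc_failed(word));

        word = EXT_ALLOC_FAIL;  /* no vector, only the failure marker */
        assert(exts_vec(word) == NULL && exts_alloc_failed(word));
        return 0;
}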

2023-10-24 13:52:46

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 37/39] codetag: debug: mark codetags for reserved pages as empty

To avoid debug warnings while freeing reserved pages, which were not
allocated through the usual allocators, mark their codetags as empty
before freeing.
Maybe we can annotate reserved pages correctly and avoid this?

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/alloc_tag.h | 2 ++
include/linux/mm.h | 8 ++++++++
include/linux/pgalloc_tag.h | 2 ++
3 files changed, 12 insertions(+)

diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 1f3207097b03..102caf62c2a9 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -95,6 +95,7 @@ static inline void set_codetag_empty(union codetag_ref *ref)
#else /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */

static inline bool is_codetag_empty(union codetag_ref *ref) { return false; }
+static inline void set_codetag_empty(union codetag_ref *ref) {}

#endif /* CONFIG_MEM_ALLOC_PROFILING_DEBUG */

@@ -155,6 +156,7 @@ static inline void alloc_tag_sub(union codetag_ref *ref, size_t bytes) {}
static inline void alloc_tag_sub_noalloc(union codetag_ref *ref, size_t bytes) {}
static inline void alloc_tag_add(union codetag_ref *ref, struct alloc_tag *tag,
size_t bytes) {}
+static inline void set_codetag_empty(union codetag_ref *ref) {}

#endif

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf5d0b1b16f4..310129414833 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -5,6 +5,7 @@
#include <linux/errno.h>
#include <linux/mmdebug.h>
#include <linux/gfp.h>
+#include <linux/pgalloc_tag.h>
#include <linux/bug.h>
#include <linux/list.h>
#include <linux/mmzone.h>
@@ -3077,6 +3078,13 @@ extern void reserve_bootmem_region(phys_addr_t start,
/* Free the reserved page into the buddy system, so it gets managed. */
static inline void free_reserved_page(struct page *page)
{
+ union codetag_ref *ref;
+
+ ref = get_page_tag_ref(page);
+ if (ref) {
+ set_codetag_empty(ref);
+ put_page_tag_ref(ref);
+ }
ClearPageReserved(page);
init_page_count(page);
__free_page(page);
diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 0174aff5e871..ae9b0f359264 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -93,6 +93,8 @@ static inline void pgalloc_tag_split(struct page *page, unsigned int nr)

#else /* CONFIG_MEM_ALLOC_PROFILING */

+static inline union codetag_ref *get_page_tag_ref(struct page *page) { return NULL; }
+static inline void put_page_tag_ref(union codetag_ref *ref) {}
static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
unsigned int order) {}
static inline void pgalloc_tag_sub(struct page *page, unsigned int order) {}
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 13:52:46

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v2 39/39] MAINTAINERS: Add entries for code tagging and memory allocation profiling

From: Kent Overstreet <[email protected]>

The new code & libraries added are being maintained - mark them as such.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
MAINTAINERS | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2894f0777537..22e51de42131 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5118,6 +5118,13 @@ S: Supported
F: Documentation/process/code-of-conduct-interpretation.rst
F: Documentation/process/code-of-conduct.rst

+CODE TAGGING
+M: Suren Baghdasaryan <[email protected]>
+M: Kent Overstreet <[email protected]>
+S: Maintained
+F: include/linux/codetag.h
+F: lib/codetag.c
+
COMEDI DRIVERS
M: Ian Abbott <[email protected]>
M: H Hartley Sweeten <[email protected]>
@@ -13708,6 +13715,15 @@ F: mm/memblock.c
F: mm/mm_init.c
F: tools/testing/memblock/

+MEMORY ALLOCATION PROFILING
+M: Suren Baghdasaryan <[email protected]>
+M: Kent Overstreet <[email protected]>
+S: Maintained
+F: include/linux/alloc_tag.h
+F: include/linux/codetag_ctx.h
+F: lib/alloc_tag.c
+F: lib/pgalloc_tag.c
+
MEMORY CONTROLLER DRIVERS
M: Krzysztof Kozlowski <[email protected]>
L: [email protected]
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 14:26:51

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v2 01/39] lib/string_helpers: Add flags param to string_get_size()

(Minimized the list of people for my review / comments)

On Tue, Oct 24, 2023 at 06:45:58AM -0700, Suren Baghdasaryan wrote:
> From: Kent Overstreet <[email protected]>
>
> The new flags parameter allows controlling
> - Whether or not the units suffix is separated by a space, for
> compatibility with sort -h
> - Whether or not to append a B suffix - we're not always printing
> bytes.

...

> string_get_size(nblocks, queue_logical_block_size(q),
> - STRING_UNITS_10, cap_str_10, sizeof(cap_str_10));
> + 0, cap_str_10, sizeof(cap_str_10));

This doesn't seem right (even if it works). We shouldn't rely on the
implementation details.

...

> - string_get_size(sdkp->capacity, sector_size,
> - STRING_UNITS_10, cap_str_10, sizeof(cap_str_10));

> + string_get_size(sdkp->capacity, sector_size, 0,
> + cap_str_10, sizeof(cap_str_10));

Neither this.

...

> -/* Descriptions of the types of units to
> - * print in */
> -enum string_size_units {
> - STRING_UNITS_10, /* use powers of 10^3 (standard SI) */
> - STRING_UNITS_2, /* use binary powers of 2^10 */
> +enum string_size_flags {

So, please add UNITS_10 as it is now. It will help if anybody in the future
wants to add, e.g., 8-base numbers.

> + STRING_SIZE_BASE2 = (1 << 0),
> + STRING_SIZE_NOSPACE = (1 << 1),
> + STRING_SIZE_NOBYTES = (1 << 2),
> };

Please, add necessary comments.

...

> +enum string_size_units {
> + STRING_UNITS_10, /* use powers of 10^3 (standard SI) */
> + STRING_UNITS_2, /* use binary powers of 2^10 */
> +};

And what a point now in having these?

I assume you need to split this to a few patches:

1) rename parameter to be a flags without renaming the definitions (this will
touch only string_helpers part);
2) do the end job by renaming it all over the drivers;
3) add the other flags one-by-one (each in a separate change);
4) use new flags where it's needed;

Also see below.

...

> static const char *const units_10[] = {
> - "B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"
> + "", "k", "M", "G", "T", "P", "E", "Z", "Y"
> };
> static const char *const units_2[] = {
> - "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"
> + "", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"
> };

Ouch, instead of leaving this and actually "cutting the letter" with NO* flags,
you did something different.

...

Now the main part. Since in 50+% of cases (I briefly estimated, it may be
more) this is used in printf(), why not introduce a new pointer extension
for that?

Yes, it may be done separately, but it would look like double effort to me.
Instead, it might give us the possibility to scale w/o touching users each
time we want to do something, and at the same time hide this complete API
under the printf() implementation.

--
With Best Regards,
Andy Shevchenko
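
To make the change under discussion concrete, the call-site difference
amounts to the following sketch (names taken from the patch; size,
blk_size and buf are assumed to be in scope, and the before/after calls
target different versions of the API, so they would not compile together):

/* Before the patch: units chosen by an enum; the result always carries
 * a space and a "B" suffix, e.g. "1.32 MiB". */
string_get_size(size, blk_size, STRING_UNITS_2, buf, sizeof(buf));

/* After the patch: behavior chosen by flags; 0 keeps the decimal
 * default, and the flags below yield e.g. "1.32Mi" for sort -h. */
string_get_size(size, blk_size, 0, buf, sizeof(buf));
string_get_size(size, blk_size,
                STRING_SIZE_BASE2 | STRING_SIZE_NOSPACE | STRING_SIZE_NOBYTES,
                buf, sizeof(buf));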


2023-10-24 17:09:18

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v2 01/39] lib/string_helpers: Add flags param to string_get_size()

On Tue, Oct 24, 2023 at 7:26 AM Andy Shevchenko <[email protected]> wrote:
>
> (Minimized the list of people for my review / comments)

Thanks! I'll let Kent address your feedback.

>
> On Tue, Oct 24, 2023 at 06:45:58AM -0700, Suren Baghdasaryan wrote:
> > From: Kent Overstreet <[email protected]>
> >
> > The new flags parameter allows controlling
> > - Whether or not the units suffix is separated by a space, for
> > compatibility with sort -h
> > - Whether or not to append a B suffix - we're not always printing
> > bytes.
>
> ...
>
> > string_get_size(nblocks, queue_logical_block_size(q),
> > - STRING_UNITS_10, cap_str_10, sizeof(cap_str_10));
> > + 0, cap_str_10, sizeof(cap_str_10));
>
> This doesn't seem right (even if it works). We shouldn't rely on the
> implementation details.
>
> ...
>
> > - string_get_size(sdkp->capacity, sector_size,
> > - STRING_UNITS_10, cap_str_10, sizeof(cap_str_10));
>
> > + string_get_size(sdkp->capacity, sector_size, 0,
> > + cap_str_10, sizeof(cap_str_10));
>
> Neither this.
>
> ...
>
> > -/* Descriptions of the types of units to
> > - * print in */
> > -enum string_size_units {
> > - STRING_UNITS_10, /* use powers of 10^3 (standard SI) */
> > - STRING_UNITS_2, /* use binary powers of 2^10 */
> > +enum string_size_flags {
>
> So, please add UNITS_10 as it is now. It will help if anybody in the future
> wants to add, e.g., 8-base numbers.
>
> > + STRING_SIZE_BASE2 = (1 << 0),
> > + STRING_SIZE_NOSPACE = (1 << 1),
> > + STRING_SIZE_NOBYTES = (1 << 2),
> > };
>
> Please, add necessary comments.
>
> ...
>
> > +enum string_size_units {
> > + STRING_UNITS_10, /* use powers of 10^3 (standard SI) */
> > + STRING_UNITS_2, /* use binary powers of 2^10 */
> > +};
>
> And what a point now in having these?
>
> I assume you need to split this to a few patches:
>
> 1) rename parameter to be a flags without renaming the definitions (this will
> touch only string_helpers part);
> 2) do the end job by renaming it all over the drivers;
> 3) add the other flags one-by-one (each in a separate change);
> 4) use new flags where it's needed;
>
> Also see below.
>
> ...
>
> > static const char *const units_10[] = {
> > - "B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"
> > + "", "k", "M", "G", "T", "P", "E", "Z", "Y"
> > };
> > static const char *const units_2[] = {
> > - "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"
> > + "", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"
> > };
>
> Ouch, instead of leaving this and actually "cutting the letter" with NO* flags,
> you did something different.
>
> ...
>
> Now the main part. Since in 50+% of cases (I briefly estimated, it may be
> more) this is used in printf(), why not introduce a new pointer extension
> for that?
>
> Yes, it may be done separately, but it would look like double effort to me.
> Instead, it might give us the possibility to scale w/o touching users each
> time we want to do something, and at the same time hide this complete API
> under the printf() implementation.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>

2023-10-24 18:29:48

by Roman Gushchin

[permalink] [raw]
Subject: Re: [PATCH v2 00/39] Memory allocation profiling

On Tue, Oct 24, 2023 at 06:45:57AM -0700, Suren Baghdasaryan wrote:
> Updates since the last version [1]
> - Simplified allocation tagging macros;
> - Runtime enable/disable sysctl switch (/proc/sys/vm/mem_profiling)
> instead of kernel command-line option;
> - CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT to select default enable state;
> - Changed the user-facing API from debugfs to procfs (/proc/allocinfo);
> - Removed context capture support to make patch incremental;
> - Renamed uninstrumented allocation functions to use _noprof suffix;
> - Added __GFP_LAST_BIT to make the code cleaner;
> - Removed lazy per-cpu counters; it turned out the memory savings was
> minimal and not worth the performance impact;

Hello Suren,

> Performance overhead:
> To evaluate performance we implemented an in-kernel test executing
> multiple get_free_page/free_page and kmalloc/kfree calls with allocation
> sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
> affinity set to a specific CPU to minimize the noise. Below is performance
> comparison between the baseline kernel, profiling when enabled, profiling
> when disabled and (for comparison purposes) baseline with
> CONFIG_MEMCG_KMEM enabled and allocations using __GFP_ACCOUNT:
>
> kmalloc pgalloc
> (1 baseline) 12.041s 49.190s
> (2 default disabled) 14.970s (+24.33%) 49.684s (+1.00%)
> (3 default enabled) 16.859s (+40.01%) 56.287s (+14.43%)
> (4 runtime enabled) 16.983s (+41.04%) 55.760s (+13.36%)
> (5 memcg) 33.831s (+180.96%) 51.433s (+4.56%)

Some recent changes [1] to the kmem accounting should have made it quite a
bit faster. It would be great if you could provide new numbers for the
comparison. Maybe with the next revision?

And btw thank you (and Kent): your numbers inspired me to do this kmemcg
performance work. I expect it still to be ~twice as expensive as your
stuff, because on the memcg side we handle charging and statistics
separately, but hopefully the difference will be lower.

Thank you!

[1]:
patches from next tree, so no stable hashes:
mm: kmem: reimplement get_obj_cgroup_from_current()
percpu: scoped objcg protection
mm: kmem: scoped objcg protection
mm: kmem: make memcg keep a reference to the original objcg
mm: kmem: add direct objcg pointer to task_struct
mm: kmem: optimize get_obj_cgroup_from_current()

2023-10-24 18:39:28

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v2 00/39] Memory allocation profiling

On Tue, Oct 24, 2023 at 11:29 AM Roman Gushchin
<[email protected]> wrote:
>
> On Tue, Oct 24, 2023 at 06:45:57AM -0700, Suren Baghdasaryan wrote:
> > Updates since the last version [1]
> > - Simplified allocation tagging macros;
> > - Runtime enable/disable sysctl switch (/proc/sys/vm/mem_profiling)
> > instead of kernel command-line option;
> > - CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT to select default enable state;
> > - Changed the user-facing API from debugfs to procfs (/proc/allocinfo);
> > - Removed context capture support to make patch incremental;
> > - Renamed uninstrumented allocation functions to use _noprof suffix;
> > - Added __GFP_LAST_BIT to make the code cleaner;
> > - Removed lazy per-cpu counters; it turned out the memory savings was
> > minimal and not worth the performance impact;
>
> Hello Suren,
>
> > Performance overhead:
> > To evaluate performance we implemented an in-kernel test executing
> > multiple get_free_page/free_page and kmalloc/kfree calls with allocation
> > sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
> > affinity set to a specific CPU to minimize the noise. Below is performance
> > comparison between the baseline kernel, profiling when enabled, profiling
> > when disabled and (for comparison purposes) baseline with
> > CONFIG_MEMCG_KMEM enabled and allocations using __GFP_ACCOUNT:
> >
> > kmalloc pgalloc
> > (1 baseline) 12.041s 49.190s
> > (2 default disabled) 14.970s (+24.33%) 49.684s (+1.00%)
> > (3 default enabled) 16.859s (+40.01%) 56.287s (+14.43%)
> > (4 runtime enabled) 16.983s (+41.04%) 55.760s (+13.36%)
> > (5 memcg) 33.831s (+180.96%) 51.433s (+4.56%)
>
> Some recent changes [1] to the kmem accounting should have made it quite a
> bit faster. It would be great if you could provide new numbers for the
> comparison. Maybe with the next revision?
>
> And btw thank you (and Kent): your numbers inspired me to do this kmemcg
> performance work. I expect it still to be ~twice as expensive as your
> stuff, because on the memcg side we handle charging and statistics
> separately, but hopefully the difference will be lower.

Yes, I saw them! Well done! I'll definitely update my numbers once the
patches land in their final form.

>
> Thank you!

Thank you for the optimizations!

>
> [1]:
> patches from next tree, so no stable hashes:
> mm: kmem: reimplement get_obj_cgroup_from_current()
> percpu: scoped objcg protection
> mm: kmem: scoped objcg protection
> mm: kmem: make memcg keep a reference to the original objcg
> mm: kmem: add direct objcg pointer to task_struct
> mm: kmem: optimize get_obj_cgroup_from_current()

2023-10-24 19:47:09

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v2 01/39] lib/string_helpers: Add flags param to string_get_size()

On Tue, Oct 24, 2023 at 05:26:18PM +0300, Andy Shevchenko wrote:
> (Minimized the list of people for my review / comments)
>
> On Tue, Oct 24, 2023 at 06:45:58AM -0700, Suren Baghdasaryan wrote:
> > From: Kent Overstreet <[email protected]>
> >
> > The new flags parameter allows controlling
> > - Whether or not the units suffix is separated by a space, for
> > compatibility with sort -h
> > - Whether or not to append a B suffix - we're not always printing
> > bytes.
>
> ...
>
> > string_get_size(nblocks, queue_logical_block_size(q),
> > - STRING_UNITS_10, cap_str_10, sizeof(cap_str_10));
> > + 0, cap_str_10, sizeof(cap_str_10));
>
> This doesn't seem right (even if it works). We shouldn't rely on the
> implementation details.

It's now a flags parameter: passing an empty set of flags is not
"relying on an implementation detail".

> > -/* Descriptions of the types of units to
> > - * print in */
> > -enum string_size_units {
> > - STRING_UNITS_10, /* use powers of 10^3 (standard SI) */
> > - STRING_UNITS_2, /* use binary powers of 2^10 */
> > +enum string_size_flags {
>
> So, please add UNITS_10 as it is now. It will help if anybody in the future
> wants to add, e.g., 8-base numbers.

Octal human readable numbers? No, no one's wanted that so far and I
very much doubt anyone will want that in the future.

> > + STRING_SIZE_BASE2 = (1 << 0),
> > + STRING_SIZE_NOSPACE = (1 << 1),
> > + STRING_SIZE_NOBYTES = (1 << 2),
> > };
>
> Please, add necessary comments.

That I can do.

> > +enum string_size_units {
> > + STRING_UNITS_10, /* use powers of 10^3 (standard SI) */
> > + STRING_UNITS_2, /* use binary powers of 2^10 */
> > +};
>
> And what a point now in having these?

Minimizing the size of the diff and making it more reviewable. It's fine
as an internal implementation thing.

>
> I assume you need to split this to a few patches:
>
> 1) rename parameter to be a flags without renaming the definitions (this will
> touch only string_helpers part);
> 2) do the end job by renaming it all over the drivers;
> 3) add the other flags one-by-one (each in a separate change);
> 4) use new flags where it's needed;

No, those would not be atomic changes. In particular changing the
parameter to a flags without changing the callers - that's not how we do
things.

We're currently working towards _better_ type safety for enums, fyi.

The new flags _could_ be a separate patch, but since it would be
touching much the same code as the previous patch I don't see the point
in splitting it.

> > static const char *const units_10[] = {
> > - "B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"
> > + "", "k", "M", "G", "T", "P", "E", "Z", "Y"
> > };
> > static const char *const units_2[] = {
> > - "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"
> > + "", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"
> > };
>
> Ouch, instead of leaving this and actually "cutting the letter" with NO* flags,
> you did something different.

Not sure I understand your complaint? Were you attached to the redundant
Bs?

> Now the main part. Since in 50+% of cases (I briefly estimated, it may be
> more) this is used in printf(), why not introduce a new pointer extension
> for that?
>
> Yes, it may be done separately, but it would look like double effort to me.
> Instead, it might give us the possibility to scale w/o touching users each
> time we want to do something, and at the same time hide this complete API
> under the printf() implementation.

No, I would not be in favor of another %p extension: in particular,
since this takes integer inputs the lack of type safety for %p
extensions coupled with C's very relaxed approach to integer type
conversion would be a really nasty footgun.

2023-10-25 05:47:14

by Petr Tesařík

[permalink] [raw]
Subject: Re: [PATCH v2 06/39] mm: enumerate all gfp flags

On Tue, 24 Oct 2023 06:46:03 -0700
Suren Baghdasaryan <[email protected]> wrote:

> Introduce GFP bits enumeration to let compiler track the number of used
> bits (which depends on the config options) instead of hardcoding them.
> That simplifies __GFP_BITS_SHIFT calculation.
> Suggested-by: Petr Tesařík <[email protected]>
> Signed-off-by: Suren Baghdasaryan <[email protected]>
> ---
> include/linux/gfp_types.h | 90 +++++++++++++++++++++++++++------------
> 1 file changed, 62 insertions(+), 28 deletions(-)
>
> diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
> index 6583a58670c5..3fbe624763d9 100644
> --- a/include/linux/gfp_types.h
> +++ b/include/linux/gfp_types.h
> @@ -21,44 +21,78 @@ typedef unsigned int __bitwise gfp_t;
> * include/trace/events/mmflags.h and tools/perf/builtin-kmem.c
> */
>
> +enum {
> + ___GFP_DMA_BIT,
> + ___GFP_HIGHMEM_BIT,
> + ___GFP_DMA32_BIT,
> + ___GFP_MOVABLE_BIT,
> + ___GFP_RECLAIMABLE_BIT,
> + ___GFP_HIGH_BIT,
> + ___GFP_IO_BIT,
> + ___GFP_FS_BIT,
> + ___GFP_ZERO_BIT,
> + ___GFP_UNUSED_BIT, /* 0x200u unused */
> + ___GFP_DIRECT_RECLAIM_BIT,
> + ___GFP_KSWAPD_RECLAIM_BIT,
> + ___GFP_WRITE_BIT,
> + ___GFP_NOWARN_BIT,
> + ___GFP_RETRY_MAYFAIL_BIT,
> + ___GFP_NOFAIL_BIT,
> + ___GFP_NORETRY_BIT,
> + ___GFP_MEMALLOC_BIT,
> + ___GFP_COMP_BIT,
> + ___GFP_NOMEMALLOC_BIT,
> + ___GFP_HARDWALL_BIT,
> + ___GFP_THISNODE_BIT,
> + ___GFP_ACCOUNT_BIT,
> + ___GFP_ZEROTAGS_BIT,
> +#ifdef CONFIG_KASAN_HW_TAGS
> + ___GFP_SKIP_ZERO_BIT,
> + ___GFP_SKIP_KASAN_BIT,
> +#endif
> +#ifdef CONFIG_LOCKDEP
> + ___GFP_NOLOCKDEP_BIT,
> +#endif
> + ___GFP_LAST_BIT
> +};
> +
> /* Plain integer GFP bitmasks. Do not use this directly. */
> -#define ___GFP_DMA 0x01u
> -#define ___GFP_HIGHMEM 0x02u
> -#define ___GFP_DMA32 0x04u
> -#define ___GFP_MOVABLE 0x08u
> -#define ___GFP_RECLAIMABLE 0x10u
> -#define ___GFP_HIGH 0x20u
> -#define ___GFP_IO 0x40u
> -#define ___GFP_FS 0x80u
> -#define ___GFP_ZERO 0x100u
> +#define ___GFP_DMA BIT(___GFP_DMA_BIT)
> +#define ___GFP_HIGHMEM BIT(___GFP_HIGHMEM_BIT)
> +#define ___GFP_DMA32 BIT(___GFP_DMA32_BIT)
> +#define ___GFP_MOVABLE BIT(___GFP_MOVABLE_BIT)
> +#define ___GFP_RECLAIMABLE BIT(___GFP_RECLAIMABLE_BIT)
> +#define ___GFP_HIGH BIT(___GFP_HIGH_BIT)
> +#define ___GFP_IO BIT(___GFP_IO_BIT)
> +#define ___GFP_FS BIT(___GFP_FS_BIT)
> +#define ___GFP_ZERO BIT(___GFP_ZERO_BIT)
> /* 0x200u unused */

This comment can also be removed here, because it is already stated
above with the definition of ___GFP_UNUSED_BIT.

Then again, I think that the GFP bits have never been compacted after
Neil Brown removed __GFP_ATOMIC with commit 2973d8229b78 simply because
that would mean changing definitions of all subsequent GFP flags. FWIW
I am not aware of any code that would depend on the numeric value of
___GFP_* macros, so this patch seems like a good opportunity to change
the numbering and get rid of this unused 0x200u altogether.

@Neil: I have added you to the conversation in case you want to correct
my understanding of the unused bit.

Other than that LGTM.

Petr T

> -#define ___GFP_DIRECT_RECLAIM 0x400u
> -#define ___GFP_KSWAPD_RECLAIM 0x800u
> -#define ___GFP_WRITE 0x1000u
> -#define ___GFP_NOWARN 0x2000u
> -#define ___GFP_RETRY_MAYFAIL 0x4000u
> -#define ___GFP_NOFAIL 0x8000u
> -#define ___GFP_NORETRY 0x10000u
> -#define ___GFP_MEMALLOC 0x20000u
> -#define ___GFP_COMP 0x40000u
> -#define ___GFP_NOMEMALLOC 0x80000u
> -#define ___GFP_HARDWALL 0x100000u
> -#define ___GFP_THISNODE 0x200000u
> -#define ___GFP_ACCOUNT 0x400000u
> -#define ___GFP_ZEROTAGS 0x800000u
> +#define ___GFP_DIRECT_RECLAIM BIT(___GFP_DIRECT_RECLAIM_BIT)
> +#define ___GFP_KSWAPD_RECLAIM BIT(___GFP_KSWAPD_RECLAIM_BIT)
> +#define ___GFP_WRITE BIT(___GFP_WRITE_BIT)
> +#define ___GFP_NOWARN BIT(___GFP_NOWARN_BIT)
> +#define ___GFP_RETRY_MAYFAIL BIT(___GFP_RETRY_MAYFAIL_BIT)
> +#define ___GFP_NOFAIL BIT(___GFP_NOFAIL_BIT)
> +#define ___GFP_NORETRY BIT(___GFP_NORETRY_BIT)
> +#define ___GFP_MEMALLOC BIT(___GFP_MEMALLOC_BIT)
> +#define ___GFP_COMP BIT(___GFP_COMP_BIT)
> +#define ___GFP_NOMEMALLOC BIT(___GFP_NOMEMALLOC_BIT)
> +#define ___GFP_HARDWALL BIT(___GFP_HARDWALL_BIT)
> +#define ___GFP_THISNODE BIT(___GFP_THISNODE_BIT)
> +#define ___GFP_ACCOUNT BIT(___GFP_ACCOUNT_BIT)
> +#define ___GFP_ZEROTAGS BIT(___GFP_ZEROTAGS_BIT)
> #ifdef CONFIG_KASAN_HW_TAGS
> -#define ___GFP_SKIP_ZERO 0x1000000u
> -#define ___GFP_SKIP_KASAN 0x2000000u
> +#define ___GFP_SKIP_ZERO BIT(___GFP_SKIP_ZERO_BIT)
> +#define ___GFP_SKIP_KASAN BIT(___GFP_SKIP_KASAN_BIT)
> #else
> #define ___GFP_SKIP_ZERO 0
> #define ___GFP_SKIP_KASAN 0
> #endif
> #ifdef CONFIG_LOCKDEP
> -#define ___GFP_NOLOCKDEP 0x4000000u
> +#define ___GFP_NOLOCKDEP BIT(___GFP_NOLOCKDEP_BIT)
> #else
> #define ___GFP_NOLOCKDEP 0
> #endif
> -/* If the above are modified, __GFP_BITS_SHIFT may need updating */
>
> /*
> * Physical address zone modifiers (see linux/mmzone.h - low four bits)
> @@ -249,7 +283,7 @@ typedef unsigned int __bitwise gfp_t;
> #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
>
> /* Room for N __GFP_FOO bits */
> -#define __GFP_BITS_SHIFT (26 + IS_ENABLED(CONFIG_LOCKDEP))
> +#define __GFP_BITS_SHIFT ___GFP_LAST_BIT
> #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
>
> /**

2023-10-25 15:31:05

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v2 06/39] mm: enumerate all gfp flags

On Tue, Oct 24, 2023 at 10:47 PM Petr Tesařík <[email protected]> wrote:
>
> On Tue, 24 Oct 2023 06:46:03 -0700
> Suren Baghdasaryan <[email protected]> wrote:
>
> > Introduce GFP bits enumeration to let compiler track the number of used
> > bits (which depends on the config options) instead of hardcoding them.
> > That simplifies __GFP_BITS_SHIFT calculation.
> > Suggested-by: Petr Tesařík <[email protected]>
> > Signed-off-by: Suren Baghdasaryan <[email protected]>
> > ---
> > include/linux/gfp_types.h | 90 +++++++++++++++++++++++++++------------
> > 1 file changed, 62 insertions(+), 28 deletions(-)
> >
> > diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
> > index 6583a58670c5..3fbe624763d9 100644
> > --- a/include/linux/gfp_types.h
> > +++ b/include/linux/gfp_types.h
> > @@ -21,44 +21,78 @@ typedef unsigned int __bitwise gfp_t;
> > * include/trace/events/mmflags.h and tools/perf/builtin-kmem.c
> > */
> >
> > +enum {
> > + ___GFP_DMA_BIT,
> > + ___GFP_HIGHMEM_BIT,
> > + ___GFP_DMA32_BIT,
> > + ___GFP_MOVABLE_BIT,
> > + ___GFP_RECLAIMABLE_BIT,
> > + ___GFP_HIGH_BIT,
> > + ___GFP_IO_BIT,
> > + ___GFP_FS_BIT,
> > + ___GFP_ZERO_BIT,
> > + ___GFP_UNUSED_BIT, /* 0x200u unused */
> > + ___GFP_DIRECT_RECLAIM_BIT,
> > + ___GFP_KSWAPD_RECLAIM_BIT,
> > + ___GFP_WRITE_BIT,
> > + ___GFP_NOWARN_BIT,
> > + ___GFP_RETRY_MAYFAIL_BIT,
> > + ___GFP_NOFAIL_BIT,
> > + ___GFP_NORETRY_BIT,
> > + ___GFP_MEMALLOC_BIT,
> > + ___GFP_COMP_BIT,
> > + ___GFP_NOMEMALLOC_BIT,
> > + ___GFP_HARDWALL_BIT,
> > + ___GFP_THISNODE_BIT,
> > + ___GFP_ACCOUNT_BIT,
> > + ___GFP_ZEROTAGS_BIT,
> > +#ifdef CONFIG_KASAN_HW_TAGS
> > + ___GFP_SKIP_ZERO_BIT,
> > + ___GFP_SKIP_KASAN_BIT,
> > +#endif
> > +#ifdef CONFIG_LOCKDEP
> > + ___GFP_NOLOCKDEP_BIT,
> > +#endif
> > + ___GFP_LAST_BIT
> > +};
> > +
> > /* Plain integer GFP bitmasks. Do not use this directly. */
> > -#define ___GFP_DMA 0x01u
> > -#define ___GFP_HIGHMEM 0x02u
> > -#define ___GFP_DMA32 0x04u
> > -#define ___GFP_MOVABLE 0x08u
> > -#define ___GFP_RECLAIMABLE 0x10u
> > -#define ___GFP_HIGH 0x20u
> > -#define ___GFP_IO 0x40u
> > -#define ___GFP_FS 0x80u
> > -#define ___GFP_ZERO 0x100u
> > +#define ___GFP_DMA BIT(___GFP_DMA_BIT)
> > +#define ___GFP_HIGHMEM BIT(___GFP_HIGHMEM_BIT)
> > +#define ___GFP_DMA32 BIT(___GFP_DMA32_BIT)
> > +#define ___GFP_MOVABLE BIT(___GFP_MOVABLE_BIT)
> > +#define ___GFP_RECLAIMABLE BIT(___GFP_RECLAIMABLE_BIT)
> > +#define ___GFP_HIGH BIT(___GFP_HIGH_BIT)
> > +#define ___GFP_IO BIT(___GFP_IO_BIT)
> > +#define ___GFP_FS BIT(___GFP_FS_BIT)
> > +#define ___GFP_ZERO BIT(___GFP_ZERO_BIT)
> > /* 0x200u unused */
>
> This comment can also be removed here, because it is already stated
> above with the definition of ___GFP_UNUSED_BIT.

Ack.

>
> Then again, I think that the GFP bits have never been compacted after
> Neil Brown removed __GFP_ATOMIC with commit 2973d8229b78 simply because
> that would mean changing definitions of all subsequent GFP flags. FWIW
> I am not aware of any code that would depend on the numeric value of
> ___GFP_* macros, so this patch seems like a good opportunity to change
> the numbering and get rid of this unused 0x200u altogether.
>
> @Neil: I have added you to the conversation in case you want to correct
> my understanding of the unused bit.

Hmm. I would prefer to do that in a separate patch even though it
would be a one-line change. Seems safer to me in case something goes
wrong and we have to bisect and revert it. If that sounds ok I'll post
that in the next version.

>
> Other than that LGTM.

Thanks for the review!
Suren.

>
> Petr T
>
> > -#define ___GFP_DIRECT_RECLAIM 0x400u
> > -#define ___GFP_KSWAPD_RECLAIM 0x800u
> > -#define ___GFP_WRITE 0x1000u
> > -#define ___GFP_NOWARN 0x2000u
> > -#define ___GFP_RETRY_MAYFAIL 0x4000u
> > -#define ___GFP_NOFAIL 0x8000u
> > -#define ___GFP_NORETRY 0x10000u
> > -#define ___GFP_MEMALLOC 0x20000u
> > -#define ___GFP_COMP 0x40000u
> > -#define ___GFP_NOMEMALLOC 0x80000u
> > -#define ___GFP_HARDWALL 0x100000u
> > -#define ___GFP_THISNODE 0x200000u
> > -#define ___GFP_ACCOUNT 0x400000u
> > -#define ___GFP_ZEROTAGS 0x800000u
> > +#define ___GFP_DIRECT_RECLAIM BIT(___GFP_DIRECT_RECLAIM_BIT)
> > +#define ___GFP_KSWAPD_RECLAIM BIT(___GFP_KSWAPD_RECLAIM_BIT)
> > +#define ___GFP_WRITE BIT(___GFP_WRITE_BIT)
> > +#define ___GFP_NOWARN BIT(___GFP_NOWARN_BIT)
> > +#define ___GFP_RETRY_MAYFAIL BIT(___GFP_RETRY_MAYFAIL_BIT)
> > +#define ___GFP_NOFAIL BIT(___GFP_NOFAIL_BIT)
> > +#define ___GFP_NORETRY BIT(___GFP_NORETRY_BIT)
> > +#define ___GFP_MEMALLOC BIT(___GFP_MEMALLOC_BIT)
> > +#define ___GFP_COMP BIT(___GFP_COMP_BIT)
> > +#define ___GFP_NOMEMALLOC BIT(___GFP_NOMEMALLOC_BIT)
> > +#define ___GFP_HARDWALL BIT(___GFP_HARDWALL_BIT)
> > +#define ___GFP_THISNODE BIT(___GFP_THISNODE_BIT)
> > +#define ___GFP_ACCOUNT BIT(___GFP_ACCOUNT_BIT)
> > +#define ___GFP_ZEROTAGS BIT(___GFP_ZEROTAGS_BIT)
> > #ifdef CONFIG_KASAN_HW_TAGS
> > -#define ___GFP_SKIP_ZERO 0x1000000u
> > -#define ___GFP_SKIP_KASAN 0x2000000u
> > +#define ___GFP_SKIP_ZERO BIT(___GFP_SKIP_ZERO_BIT)
> > +#define ___GFP_SKIP_KASAN BIT(___GFP_SKIP_KASAN_BIT)
> > #else
> > #define ___GFP_SKIP_ZERO 0
> > #define ___GFP_SKIP_KASAN 0
> > #endif
> > #ifdef CONFIG_LOCKDEP
> > -#define ___GFP_NOLOCKDEP 0x4000000u
> > +#define ___GFP_NOLOCKDEP BIT(___GFP_NOLOCKDEP_BIT)
> > #else
> > #define ___GFP_NOLOCKDEP 0
> > #endif
> > -/* If the above are modified, __GFP_BITS_SHIFT may need updating */
> >
> > /*
> > * Physical address zone modifiers (see linux/mmzone.h - low four bits)
> > @@ -249,7 +283,7 @@ typedef unsigned int __bitwise gfp_t;
> > #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
> >
> > /* Room for N __GFP_FOO bits */
> > -#define __GFP_BITS_SHIFT (26 + IS_ENABLED(CONFIG_LOCKDEP))
> > +#define __GFP_BITS_SHIFT ___GFP_LAST_BIT
> > #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
> >
> > /**
>
>

2023-10-25 17:33:30

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v2 28/39] timekeeping: Fix a circular include dependency

On Tue, Oct 24 2023 at 06:46, Suren Baghdasaryan wrote:
> From: Kent Overstreet <[email protected]>
>
> This avoids a circular header dependency in an upcoming patch by only
> making hrtimer.h depend on percpu-defs.h

What's the actual dependency problem?

2023-10-26 13:21:11

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v2 01/39] lib/string_helpers: Add flags param to string_get_size()

On Tue, Oct 24, 2023 at 03:46:53PM -0400, Kent Overstreet wrote:
> On Tue, Oct 24, 2023 at 05:26:18PM +0300, Andy Shevchenko wrote:
> > On Tue, Oct 24, 2023 at 06:45:58AM -0700, Suren Baghdasaryan wrote:

...

> > > string_get_size(nblocks, queue_logical_block_size(q),
> > > - STRING_UNITS_10, cap_str_10, sizeof(cap_str_10));
> > > + 0, cap_str_10, sizeof(cap_str_10));
> >
> > This doesn't seem right (even if it works). We shouldn't rely on the
> > implementation details.
>
> It's now a flags parameter: passing an empty set of flags is not
> "relying on an implementation detail".

0 is the "default" flag which is definitely relies on the "implementation
detail". And I think that it's better that caller will explicitly tell what
they want.

...

> > > -/* Descriptions of the types of units to
> > > - * print in */
> > > -enum string_size_units {
> > > - STRING_UNITS_10, /* use powers of 10^3 (standard SI) */
> > > - STRING_UNITS_2, /* use binary powers of 2^10 */
> > > +enum string_size_flags {
> >
> > So, please add UNITS_10 as it is now. It will help if anybody in the future
> > wants to add, e.g., 8-base numbers.
>
> Octal human readable numbers? No, no one's wanted that so far and I
> very much doubt anyone will want that in the future.

I am also in doubt, but still, explicit is better than implicit in this
case, in my opinion.

> > > + STRING_SIZE_BASE2 = (1 << 0),
> > > + STRING_SIZE_NOSPACE = (1 << 1),
> > > + STRING_SIZE_NOBYTES = (1 << 2),
> > > };

...

> > > +enum string_size_units {
> > > + STRING_UNITS_10, /* use powers of 10^3 (standard SI) */
> > > + STRING_UNITS_2, /* use binary powers of 2^10 */
> > > +};
> >
> > And what a point now in having these?
>
> Minimizing the size of the diff and making it more reviewable. It's fine
> as an internal implementation thing.

It's not an issue to rename these all over the place, as you already did
that for most of the users. And minimizing the diff could be done better
with the --histogram algorithm or so. Otherwise it is not an objective
here, right?

...

> > I assume you need to split this to a few patches:
> >
> > 1) rename parameter to be a flags without renaming the definitions (this will
> > touch only string_helpers part);
> > 2) do the end job by renaming it all over the drivers;
> > 3) add the other flags one-by-one (each in a separate change);
> > 4) use new flags where it's needed;
>
> No, those would not be atomic changes. In particular changing the
> parameter to a flags without changing the callers - that's not how we do
> things.

> We're currently working towards _better_ type safety for enums, fyi.
>
> The new flags _could_ be a separate patch, but since it would be
> touching much the same code as the previous patch I don't see the point
> in splitting it.

Individual flags can be discussed, objected to, or approved, and won't
affect the rest of the changes. That's why I highly recommend
reconsidering splitting this change.

It would be possible to squash back if the maintainer wants this, but from
a review perspective, adding more burden to the reviewer's shoulders is
not good.

...

> > > static const char *const units_10[] = {
> > > - "B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"
> > > + "", "k", "M", "G", "T", "P", "E", "Z", "Y"
> > > };
> > > static const char *const units_2[] = {
> > > - "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"
> > > + "", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"
> > > };
> >
> > Ouch, instead of leaving this and actually "cutting the letter" with NO* flags,
> > you did something different.
>
> Not sure I understand your complaint? Were you attached to the redundant
> Bs?

The flag means "cutting", while in the code you are "adding" (doing the
opposite). Why not do exactly "cutting" without touching these? Or, since
you mentioned changes across all the callers, make them explicitly tell
that they want the Bytes suffix.

--
With Best Regards,
Andy Shevchenko


2023-10-26 18:33:43

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v2 28/39] timekeeping: Fix a circular include dependency

On Wed, Oct 25, 2023 at 5:33 PM Thomas Gleixner <[email protected]> wrote:
>
> On Tue, Oct 24 2023 at 06:46, Suren Baghdasaryan wrote:
> > From: Kent Overstreet <[email protected]>
> >
> > This avoids a circular header dependency in an upcoming patch by only
> > making hrtimer.h depend on percpu-defs.h
>
> What's the actual dependency problem?

Sorry for the delay.
When we instrument per-cpu allocations in [1] we need to include
sched.h in percpu.h to be able to use alloc_tag_save(). sched.h
includes hrtimer.h. So, without this change we end up with a circular
inclusion: percpu.h->sched.h->hrtimer.h->percpu.h

[1] https://lore.kernel.org/all/[email protected]/

>

2023-10-26 18:46:53

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v2 01/39] lib/string_helpers: Add flags param to string_get_size()

On Thu, Oct 26, 2023 at 04:12:52PM +0300, Andy Shevchenko wrote:
> On Tue, Oct 24, 2023 at 03:46:53PM -0400, Kent Overstreet wrote:
> > On Tue, Oct 24, 2023 at 05:26:18PM +0300, Andy Shevchenko wrote:
> > > On Tue, Oct 24, 2023 at 06:45:58AM -0700, Suren Baghdasaryan wrote:
>
> ...
>
> > > > string_get_size(nblocks, queue_logical_block_size(q),
> > > > - STRING_UNITS_10, cap_str_10, sizeof(cap_str_10));
> > > > + 0, cap_str_10, sizeof(cap_str_10));
> > >
> > > This doesn't seem right (even if it works). We shouldn't rely on the
> > > implementation details.
> >
> > It's now a flags parameter: passing an empty set of flags is not
> > "relying on an implementation detail".
>
> 0 is the "default" flag, which definitely relies on the "implementation
> detail". And I think it's better that the caller explicitly tells what
> they want.
>
> ...
>
> > > > -/* Descriptions of the types of units to
> > > > - * print in */
> > > > -enum string_size_units {
> > > > - STRING_UNITS_10, /* use powers of 10^3 (standard SI) */
> > > > - STRING_UNITS_2, /* use binary powers of 2^10 */
> > > > +enum string_size_flags {
> > >
> > > So, please add UNITS_10 as it is now. It will help if anybody in the future
> > > wants to add, e.g., 8-base numbers.
> >
> > Octal human readable numbers? No, no one's wanted that so far and I
> > very much doubt anyone will want that in the future.
>
> I am also in doubt, but still, explicit is better than implicit in this
> case, in my opinion.
>
> > > > + STRING_SIZE_BASE2 = (1 << 0),
> > > > + STRING_SIZE_NOSPACE = (1 << 1),
> > > > + STRING_SIZE_NOBYTES = (1 << 2),
> > > > };
>
> ...
>
> > > > +enum string_size_units {
> > > > + STRING_UNITS_10, /* use powers of 10^3 (standard SI) */
> > > > + STRING_UNITS_2, /* use binary powers of 2^10 */
> > > > +};
> > >
> > > And what a point now in having these?
> >
> > Minimizing the size of the diff and making it more reviewable. It's fine
> > as an internal implementation thing.
>
> It's not an issue to rename these all over the place, as you already did
> that for most of the users. And minimizing the diff could be done better
> with the --histogram algorithm or so. Otherwise it is not an objective
> here, right?
>
> ...
>
> > > I assume you need to split this to a few patches:
> > >
> > > 1) rename parameter to be a flags without renaming the definitions (this will
> > > touch only string_helpers part);
> > > 2) do the end job by renaming it all over the drivers;
> > > 3) add the other flags one-by-one (each in a separate change);
> > > 4) use new flags where it's needed;
> >
> > No, those would not be atomic changes. In particular changing the
> > parameter to a flags without changing the callers - that's not how we do
> > things.
>
> > We're currently working towards _better_ type safety for enums, fyi.
> >
> > The new flags _could_ be a separate patch, but since it would be
> > touching much the same code as the previous patch I don't see the point
> > in splitting it.
>
> Individual flags can be discussed, objected to, or approved, and won't
> affect the rest of the changes. That's why I highly recommend
> reconsidering splitting this change.
>
> It would be possible to squash back if the maintainer wants this, but from
> a review perspective, adding more burden to the reviewer's shoulders is
> not good.
>
> ...
>
> > > > static const char *const units_10[] = {
> > > > - "B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"
> > > > + "", "k", "M", "G", "T", "P", "E", "Z", "Y"
> > > > };
> > > > static const char *const units_2[] = {
> > > > - "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"
> > > > + "", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"
> > > > };
> > >
> > > Ouch, instead of leaving this and actually "cutting the letter" with NO* flags,
> > > you did something different.
> >
> > Not sure I understand your complaint? Were you attached to the redundant
> > Bs?
>
> The flag means "cutting", while in the code you are "adding" (doing the
> opposite). Why not do exactly "cutting" without touching these? Or, since
> you mentioned changes across all the callers, make them explicitly tell
> that they want the Bytes suffix.

Andy: to be blunt, you've been pretty hostile and hysterical ("breaking
the kernel!" over debug statements? really?) and the bikeshedding is
getting to be too much - I'm just going to drop this patch from the
series and we'll post-process the output as needed.

2023-10-26 23:06:19

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v2 28/39] timekeeping: Fix a circular include dependency

On Thu, Oct 26 2023 at 18:33, Suren Baghdasaryan wrote:
> On Wed, Oct 25, 2023 at 5:33 PM Thomas Gleixner <[email protected]> wrote:
>> > This avoids a circular header dependency in an upcoming patch by only
>> > making hrtimer.h depend on percpu-defs.h
>>
>> What's the actual dependency problem?
>
> Sorry for the delay.
> When we instrument per-cpu allocations in [1] we need to include
> sched.h in percpu.h to be able to use alloc_tag_save(). sched.h

Including sched.h in percpu.h is fundamentally wrong as sched.h is the
initial place of all header recursions.

There is a reason why a lot of functionality has been split out of
sched.h into separate headers over time in order to avoid that.

Thanks,

tglx

2023-10-26 23:55:09

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v2 28/39] timekeeping: Fix a circular include dependency

On Fri, Oct 27, 2023 at 01:05:48AM +0200, Thomas Gleixner wrote:
> On Thu, Oct 26 2023 at 18:33, Suren Baghdasaryan wrote:
> > On Wed, Oct 25, 2023 at 5:33 PM Thomas Gleixner <[email protected]> wrote:
> >> > This avoids a circular header dependency in an upcoming patch by only
> >> > making hrtimer.h depend on percpu-defs.h
> >>
> >> What's the actual dependency problem?
> >
> > Sorry for the delay.
> > When we instrument per-cpu allocations in [1] we need to include
> > sched.h in percpu.h to be able to use alloc_tag_save(). sched.h
>
> Including sched.h in percpu.h is fundamentally wrong as sched.h is the
> initial place of all header recursions.
>
> There is a reason why a lot of functionality has been split out of
> sched.h into separate headers over time in order to avoid that.

Yeah, it's definitely unfortunate. The issue here is that
alloc_tag_save() needs task_struct - we have to pull that in for
alloc_tag_save() to be inline, which we really want.

What if we moved task_struct to its own dedicated header? That might be
good to do anyways...
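
For context, the reason a forward declaration of task_struct is not
enough: the helper dereferences current, so the full layout must be
visible wherever the helper is inlined. A simplified sketch of the shape
of such a helper (assuming the alloc_tag field this series adds to
task_struct; the real definition lives in alloc_tag.h):

static inline struct alloc_tag *alloc_tag_save(struct alloc_tag *tag)
{
        /* Needs task_struct's definition: touches a field of current. */
        struct alloc_tag *old = current->alloc_tag;

        current->alloc_tag = tag;
        return old;
}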

2023-10-27 06:35:37

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 28/39] timekeeping: Fix a circular include dependency

On Fri, Oct 27, 2023, at 01:54, Kent Overstreet wrote:
> On Fri, Oct 27, 2023 at 01:05:48AM +0200, Thomas Gleixner wrote:
>> On Thu, Oct 26 2023 at 18:33, Suren Baghdasaryan wrote:
>> > On Wed, Oct 25, 2023 at 5:33 PM Thomas Gleixner <[email protected]> wrote:
>> >> > This avoids a circular header dependency in an upcoming patch by only
>> >> > making hrtimer.h depend on percpu-defs.h
>> >>
>> >> What's the actual dependency problem?
>> >
>> > Sorry for the delay.
>> > When we instrument per-cpu allocations in [1] we need to include
>> > sched.h in percpu.h to be able to use alloc_tag_save(). sched.h
>>
>> Including sched.h in percpu.h is fundamentally wrong as sched.h is the
>> initial place of all header recursions.
>>
>> There is a reason why a lot of functionality has been split out of
>> sched.h into separate headers over time in order to avoid that.
>
> Yeah, it's definitely unfortunate. The issue here is that
> alloc_tag_save() needs task_struct - we have to pull that in for
> alloc_tag_save() to be inline, which we really want.
>
> What if we moved task_struct to its own dedicated header? That might be
> good to do anyways...

Yes, I agree that is the best way to handle it. I've prototyped
a more thorough header cleanup with good results (much improved
build speed) in the past, and most of the work to get there is
to separate out structures like task_struct, mm_struct, net_device,
etc into headers that only depend on the embedded structure
definitions without needing all the inline functions associated
with them.

Arnd

2023-10-27 15:28:40

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH v2 28/39] timekeeping: Fix a circular include dependency

On Thu, Oct 26, 2023 at 11:35 PM Arnd Bergmann <[email protected]> wrote:
>
> On Fri, Oct 27, 2023, at 01:54, Kent Overstreet wrote:
> > On Fri, Oct 27, 2023 at 01:05:48AM +0200, Thomas Gleixner wrote:
> >> On Thu, Oct 26 2023 at 18:33, Suren Baghdasaryan wrote:
> >> > On Wed, Oct 25, 2023 at 5:33 PM Thomas Gleixner <[email protected]> wrote:
> >> >> > This avoids a circular header dependency in an upcoming patch by only
> >> >> > making hrtimer.h depend on percpu-defs.h
> >> >>
> >> >> What's the actual dependency problem?
> >> >
> >> > Sorry for the delay.
> >> > When we instrument per-cpu allocations in [1] we need to include
> >> > sched.h in percpu.h to be able to use alloc_tag_save(). sched.h
> >>
> >> Including sched.h in percpu.h is fundamentally wrong as sched.h is the
> >> initial place of all header recursions.
> >>
> >> There is a reason why a lot of functionality has been split out of
> >> sched.h into separate headers over time in order to avoid that.
> >
> > Yeah, it's definitely unfortunate. The issue here is that
> > alloc_tag_save() needs task_struct - we have to pull that in for
> > alloc_tag_save() to be inline, which we really want.
> >
> > What if we moved task_struct to its own dedicated header? That might be
> > good to do anyways...
>
> Yes, I agree that is the best way to handle it. I've prototyped
> a more thorough header cleanup with good results (much improved
> build speed) in the past, and most of the work to get there is
> to separate out structures like task_struct, mm_struct, net_device,
> etc into headers that only depend on the embedded structure
> definitions without needing all the inline functions associated
> with them.

This is something I'll add to our automation todos, which I plan to
talk about at Plumbers; I feel like it should be possible to write a
script that, given a header and an identifier, can split whatever
declaration out into a new header, update the old header, then add the
necessary includes for the newly created header to each dependent
(optional).
--
Thanks,
~Nick Desaulniers

2023-10-28 17:22:12

by Petr Tesařík

[permalink] [raw]
Subject: Re: [PATCH v2 06/39] mm: enumerate all gfp flags

On Wed, 25 Oct 2023 08:28:32 -0700
Suren Baghdasaryan <[email protected]> wrote:

> On Tue, Oct 24, 2023 at 10:47 PM Petr Tesařík <[email protected]> wrote:
> >
> > On Tue, 24 Oct 2023 06:46:03 -0700
> > Suren Baghdasaryan <[email protected]> wrote:
> >
> > > Introduce GFP bits enumeration to let compiler track the number of used
> > > bits (which depends on the config options) instead of hardcoding them.
> > > That simplifies __GFP_BITS_SHIFT calculation.
> > > Suggested-by: Petr Tesařík <[email protected]>
> > > Signed-off-by: Suren Baghdasaryan <[email protected]>
> > > ---
> > > include/linux/gfp_types.h | 90 +++++++++++++++++++++++++++------------
> > > 1 file changed, 62 insertions(+), 28 deletions(-)
> > >
> > > diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
> > > index 6583a58670c5..3fbe624763d9 100644
> > > --- a/include/linux/gfp_types.h
> > > +++ b/include/linux/gfp_types.h
> > > @@ -21,44 +21,78 @@ typedef unsigned int __bitwise gfp_t;
> > > * include/trace/events/mmflags.h and tools/perf/builtin-kmem.c
> > > */
> > >
> > > +enum {
> > > + ___GFP_DMA_BIT,
> > > + ___GFP_HIGHMEM_BIT,
> > > + ___GFP_DMA32_BIT,
> > > + ___GFP_MOVABLE_BIT,
> > > + ___GFP_RECLAIMABLE_BIT,
> > > + ___GFP_HIGH_BIT,
> > > + ___GFP_IO_BIT,
> > > + ___GFP_FS_BIT,
> > > + ___GFP_ZERO_BIT,
> > > + ___GFP_UNUSED_BIT, /* 0x200u unused */
> > > + ___GFP_DIRECT_RECLAIM_BIT,
> > > + ___GFP_KSWAPD_RECLAIM_BIT,
> > > + ___GFP_WRITE_BIT,
> > > + ___GFP_NOWARN_BIT,
> > > + ___GFP_RETRY_MAYFAIL_BIT,
> > > + ___GFP_NOFAIL_BIT,
> > > + ___GFP_NORETRY_BIT,
> > > + ___GFP_MEMALLOC_BIT,
> > > + ___GFP_COMP_BIT,
> > > + ___GFP_NOMEMALLOC_BIT,
> > > + ___GFP_HARDWALL_BIT,
> > > + ___GFP_THISNODE_BIT,
> > > + ___GFP_ACCOUNT_BIT,
> > > + ___GFP_ZEROTAGS_BIT,
> > > +#ifdef CONFIG_KASAN_HW_TAGS
> > > + ___GFP_SKIP_ZERO_BIT,
> > > + ___GFP_SKIP_KASAN_BIT,
> > > +#endif
> > > +#ifdef CONFIG_LOCKDEP
> > > + ___GFP_NOLOCKDEP_BIT,
> > > +#endif
> > > + ___GFP_LAST_BIT
> > > +};
> > > +
> > > /* Plain integer GFP bitmasks. Do not use this directly. */
> > > -#define ___GFP_DMA 0x01u
> > > -#define ___GFP_HIGHMEM 0x02u
> > > -#define ___GFP_DMA32 0x04u
> > > -#define ___GFP_MOVABLE 0x08u
> > > -#define ___GFP_RECLAIMABLE 0x10u
> > > -#define ___GFP_HIGH 0x20u
> > > -#define ___GFP_IO 0x40u
> > > -#define ___GFP_FS 0x80u
> > > -#define ___GFP_ZERO 0x100u
> > > +#define ___GFP_DMA BIT(___GFP_DMA_BIT)
> > > +#define ___GFP_HIGHMEM BIT(___GFP_HIGHMEM_BIT)
> > > +#define ___GFP_DMA32 BIT(___GFP_DMA32_BIT)
> > > +#define ___GFP_MOVABLE BIT(___GFP_MOVABLE_BIT)
> > > +#define ___GFP_RECLAIMABLE BIT(___GFP_RECLAIMABLE_BIT)
> > > +#define ___GFP_HIGH BIT(___GFP_HIGH_BIT)
> > > +#define ___GFP_IO BIT(___GFP_IO_BIT)
> > > +#define ___GFP_FS BIT(___GFP_FS_BIT)
> > > +#define ___GFP_ZERO BIT(___GFP_ZERO_BIT)
> > > /* 0x200u unused */
> >
> > This comment can also be removed here, because it is already stated
> > above with the definition of ___GFP_UNUSED_BIT.
>
> Ack.
>
> >
> > Then again, I think that the GFP bits have never been compacted after
> > Neil Brown removed __GFP_ATOMIC with commit 2973d8229b78 simply because
> > that would mean changing definitions of all subsequent GFP flags. FWIW
> > I am not aware of any code that would depend on the numeric value of
> > ___GFP_* macros, so this patch seems like a good opportunity to change
> > the numbering and get rid of this unused 0x200u altogether.
> >
> > @Neil: I have added you to the conversation in case you want to correct
> > my understanding of the unused bit.
>
> Hmm. I would prefer to do that in a separate patch even though it
> would be a one-line change. Seems safer to me in case something goes
> wrong and we have to bisect and revert it. If that sounds ok I'll post
> that in the next version.

You're right. If something does go wrong, it will be easier to fix if
the removal of the unused bit is in a commit of its own.

Petr T

2023-10-30 20:08:32

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v2 01/39] lib/string_helpers: Add flags param to string_get_size()

On Thu, Oct 26, 2023 at 9:45 PM Kent Overstreet
<[email protected]> wrote:
> On Thu, Oct 26, 2023 at 04:12:52PM +0300, Andy Shevchenko wrote:
> > On Tue, Oct 24, 2023 at 03:46:53PM -0400, Kent Overstreet wrote:
> > > On Tue, Oct 24, 2023 at 05:26:18PM +0300, Andy Shevchenko wrote:
> > > > On Tue, Oct 24, 2023 at 06:45:58AM -0700, Suren Baghdasaryan wrote:

...

> Andy: to be blunt, you've been pretty hostile and hysterical ("breaking
> the kernel!" over debug statements? really?)

Where did I say that?

And instead of judging me, look at it from the reviewer's and maintainer's
point of view. The patch is unreviewable and has poor maintainability in
this form, as it does a few things at once.

Also, the commit message is poorly written, as it has no mention of use cases.

> and the bikeshedding is
> getting to be too much - I'm just going to drop this patch from the
> series and we'll post-process the output as needed.

I don't think I bikeshed here too much. This is the regular kind of ask
reviewers and/or maintainers usually make, so is it too hard to split?

--
With Best Regards,
Andy Shevchenko

2023-10-30 22:36:51

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v2 01/39] lib/string_helpers: Add flags param to string_get_size()

On Mon, Oct 30, 2023 at 10:07:43PM +0200, Andy Shevchenko wrote:
> I don't think I bikeshed here too much. This is the regular kind of ask
> reviewers and/or maintainers usually make, so is it too hard to split?

Excuse me?

You were objecting to the new approach making it impossible to add
_octal_ support, and your proposed patch split was completely broken.

Are you trying to be more reasonable now...?